class: center, middle, inverse, title-slide

# Applying and Interpreting Deep Learning
## Two Stories from Insurance
kevinykuo.com/talk/2018/09/tdsc
### Kevin Kuo
@kevinykuo
### September 2018

---
class: center, middle

# Why should we care about deep learning?

---
class: center, middle

# Why should we care about deep learning *at work*?

---
class: center, middle

.Large[If your day-to-day is writing software for autonomous driving or playing video games, the answer is obvious.]

--

.Large[But what about for the rest of us??]

---
class: center, middle

<!-- -->

---

# The rest of us

- .large[Most data scientists at most places deal with structured tabular data most of the time.]

--

- .large[Traditional ML techniques like random forest/xgboost will usually beat neural nets with 1/10 of the tuning/training effort.]

--

- .large[Neural nets are a tougher sell in settings where transparency is more important (think regulated industries like financial services).]

--

.large[But...]

- .large[There are use cases, even for traditional predictive modeling problems, where deep learning techniques can be useful.]
- .large[Also, interpretability for neural networks (and ML in general) is an active area that has seen a lot of development recently.]

---
class: inverse, center, middle

# Deep learning in actuarial science

---
class: inverse, center, middle

# **Deep learning** in *actuarial science*?!?! 🤔

---

# Some context

From the Casualty Actuarial Society [press release](https://www.casact.org/press/index.cfm?fa=viewArticle&articleID=2831):

<img src="img/cas_title.png" width="75%" />

> Survey participants were also asked to identify top statistical concepts that predictive modelers should understand. Generalized linear modeling (GLM), a **cutting-edge mathematical tool** on which modern ratemaking depends, was the top answer.

(bold mine)

---
class: center, middle

<img src="img/glm_abstract.png" width="55%" />

--

```r
# Number of years a concept remains cutting-edge for actuaries
2015 - 1972
```

```
## [1] 43
```

---
class: inverse, center, middle

# Insurance claims

---

# Fender bender!

What happens in the life of a claim?

.pull-left[
]

.pull-right[
- Oops! <br />
- Call in the claim to an agent or file via app. A claims adjuster sets case reserves (approx. how much she expects the claim to cost eventually).
- Damages paid for <br /><br />
- Move on with life
]

---

# Workers' comp

What happens in the life of a (long-tailed) claim?

.pull-left[
]

.pull-right[
- Exposure to asbestos at a shipyard <br />
- Mesothelioma diagnosis (this could be decades later), claim filed <br />
- Claim settled out of court after months (or more) of litigation <br />
- Move on with life
]

---

# The loss reserving exercise

<br /><br />

.full-width[.content-box-blue[.Large[Basically, figure out what we gotta pay in the future due to claims.]]]

---

# Run-off triangles

Here's an example of a cumulative paid loss triangle:

```
## # A tibble: 9 x 10
##   accident_year   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`
##           <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          1988   133   333   431   570   615   615   615   614   614
## 2          1989   934  1746  2365  2579  2763  2966  2940  2978    NA
## 3          1990  2030  4864  6880  8087  8595  8743  8763    NA    NA
## 4          1991  4537 11527 15123 16656 17321 18076    NA    NA    NA
## 5          1992  7564 16061 22465 25204 26517    NA    NA    NA    NA
## 6          1993  8343 19900 26732 30079    NA    NA    NA    NA    NA
## 7          1994 12565 26922 33867    NA    NA    NA    NA    NA    NA
## 8          1995 13437 26012    NA    NA    NA    NA    NA    NA    NA
## 9          1996 12604    NA    NA    NA    NA    NA    NA    NA    NA
```

- .large[Each row represents an accident year, i.e. all the claims that occurred that year.]
- .large[Each column represents a development year, i.e. how many years have passed since the respective accident year.]

---

# Development factors

Traditionally, we calculate age-to-age factors, or link ratios, to take the amounts from one development period to the next...

```
## # A tibble: 8 x 9
## # Groups:   accident_year [8]
##   accident_year `1-2` `2-3` `3-4` `4-5` `5-6` `6-7` `7-8` `8-9`
##           <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          1988  2.50  1.29  1.32  1.08  1     1     0.998 1
## 2          1989  1.87  1.35  1.09  1.07  1.07  0.991 1.01     NA
## 3          1990  2.40  1.41  1.18  1.06  1.02  1.00     NA    NA
## 4          1991  2.54  1.31  1.10  1.04  1.04     NA    NA    NA
## 5          1992  2.12  1.40  1.12  1.05     NA    NA    NA    NA
## 6          1993  2.39  1.34  1.13     NA    NA    NA    NA    NA
## 7          1994  2.14  1.26     NA    NA    NA    NA    NA    NA
## 8          1995  1.94     NA    NA    NA    NA    NA    NA    NA
```

Then we'll judgmentally select link ratios for each period (usually by taking some sort of average), then assume that future years will develop similarly.
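---

# Development factors: a code sketch

To make the mechanics concrete, here is a minimal sketch of how age-to-age factors like those above could be computed in R from a long-format triangle. It is illustrative only: `triangle_data` is a hypothetical data frame whose column names are assumed to match the modeling dataset shown a few slides later.

```r
# Minimal sketch (not the code behind these slides). Assumes a hypothetical
# long-format data frame `triangle_data` with columns accident_year,
# development_lag, and cumulative_paid_loss.
library(dplyr)

link_ratios <- triangle_data %>%
  arrange(accident_year, development_lag) %>%
  group_by(accident_year) %>%
  mutate(link_ratio = lead(cumulative_paid_loss) / cumulative_paid_loss) %>%
  ungroup()

# One common "selection": a volume-weighted average link ratio per lag
selected <- link_ratios %>%
  filter(!is.na(link_ratio)) %>%
  group_by(development_lag) %>%
  summarize(selected_ratio = weighted.mean(link_ratio, w = cumulative_paid_loss))
```

Multiplying the latest diagonal by the cumulative product of the selected ratios then gives projected ultimate losses.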
---

# State of the art

<!-- -->

--

.Large[🤔]

---

# State of the art

`while (TRUE) {`

<!-- -->

.Large[🤔]

`}`

---

.pull-left[
The cynical view

<img src="img/chopper1.jpg" width="100%" />
]

--

.pull-right[
<img src="img/chopper2a.jpg" width="75%" />
]

---
class: center, middle

# Now, let's turn this into a supervised learning problem

---

# Treating this as a predictive modeling problem

.large[
Each cell of the triangle corresponds to a row in our modeling dataset, and the response we're trying to predict is the *incremental* paid loss at that cell. We just need to come up with some predictors...
]

```
## # A tibble: 8 x 4
##   accident_year development_lag cumulative_paid_loss predictors
##           <int>           <int>                <dbl> <chr>
## 1          1988               1                  133 ?!?!?!?!?!?
## 2          1989               1                  934 ?!?!?!?!?!?
## 3          1990               1                 2030 ?!?!?!?!?!?
## 4          1991               1                 4537 ?!?!?!?!?!?
## 5          1992               1                 7564 ?!?!?!?!?!?
## 6          1993               1                 8343 ?!?!?!?!?!?
## 7          1994               1                12565 ?!?!?!?!?!?
## 8          1995               1                13437 ?!?!?!?!?!?
```

.large[Then we can do something like]

```r
fancy_AI_algorithm(incremental_paid_loss ~ predictors, data = data)
```

---

# Response & predictors

.large[
**Response**:

- Incremental paid losses at each accident year
- Total claims outstanding/case reserves

**Predictors**:

- Time series of paid losses and case reserves along accident year
- Company (because we're using data from multiple companies simultaneously)
]

---

# Obligatory fancy neural net diagram

<img src="img/nn1.png" width="20%" style="display: block; margin: auto;" />

.Large[Note that we're predicting two quantities simultaneously, in one model. Also, we're combining different types of inputs: a couple of time series and a categorical factor.]

---

# Embedding of categorical variables

.large[
The embedding layer maps each company code index to a fixed-length vector. While the length of the vector is fixed, the actual values for each company are *learned* by the network during training, in order to optimize our objective.

For example, if the specified length is 5, company #2 might get mapped to `c(0.4, 1.2, -3.7, 3.3, 0.2)`.

We can think of this representation as a proxy for characteristics of the companies that are not captured by the time series data input, e.g. size of book, case reserving philosophy, etc. (A rough code sketch of this architecture is in the appendix at the end of the deck.)
]

---

# Some results

Sample results from the company with the most data in the dataset...

<img src="img/ppauto-results.png" width="55%" style="display: block; margin: auto;" />

---

# Benchmarking

Performance vs. existing techniques.

<img src="img/comparison_table.png" width="70%" style="display: block; margin: auto;" />

---

# Future work?

.large[
- Prediction intervals for reserve variability.
- Claims-level analytics, where we can take into account things like adjusters' notes and images.
- Policy-level analytics, towards a holistic approach to pricing + reserving.
]

---
class: inverse, center, middle

# Life insurance example: shock lapse modeling

---
class: center, middle

## This one is simple...

--

## Buy life insurance

--

## You kick the bucket, you get paid

---
class: center, middle

## This one is simple...

## Buy life insurance

## You kick the bucket, ~~you~~ your beneficiaries get paid

---

# Term life

.large[
- You buy coverage for a set number of years, e.g. 20, and pay a fixed premium every month.
- At the end of the 20 years, you can either lapse (stop paying premiums) or continue to get coverage while paying a higher rate.
- For risk and financial management purposes, insurers would like to be able to *predict* which policyholders are more likely to lapse.
]

---

# Modeling setup

Here are our predictors and responses:

```r
predictors
```

```
## [1] "gender"                       "issue_age"
## [3] "face_amount"                  "post_level_premium_structure"
## [5] "premium_jump_ratio"           "risk_class"
## [7] "premium_mode"
```

```r
responses
```

```
## [1] "lapse_count_rate"  "lapse_amount_rate"
```

---
class: center, middle

## Let's say we've built a neural network model: how do we *explain* it?

--

## Picture time!

---

# Model performance

Old-school actual vs. predicted charts still work.

<img src="img/actual_vs_predicted.png" width="70%" />

---

# Model performance

Comparing multiple models.

<img src="img/distribution_residuals.png" width="65%" />

---

# Variable importances

Seeing which predictors are the most important.

<img src="img/variable_importance.png" width="65%" />

---

# Prediction breakdown

So how'd ya come up with that prediction?

<img src="img/prediction_breakdown.png" width="75%" />

(Prediction breakdown for a single observation.)
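---

# Producing charts like these

Charts like the ones above can be generated with model-agnostic interpretability tooling such as the DALEX package linked on the next slide. The code below is a rough sketch only, not the exact code behind these charts; `lapse_model`, `lapse_data`, and `predict_fn` are hypothetical stand-ins for a fitted model, its data, and a prediction wrapper.

```r
# Rough sketch; `lapse_model`, `lapse_data`, and `predict_fn` are hypothetical.
library(DALEX)

explainer <- explain(
  model = lapse_model,
  data = lapse_data[, predictors],
  y = lapse_data$lapse_count_rate,
  predict_function = predict_fn,
  label = "neural net"
)

# Permutation-based variable importance
vi <- variable_importance(explainer, loss_function = loss_root_mean_square)
plot(vi)

# Contribution of each predictor to a single prediction
pb <- prediction_breakdown(explainer, observation = lapse_data[1, predictors])
plot(pb)
```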
---

# Discussion

.large[
- Neural networks are a technically viable alternative to traditional modeling methods.
- There will be challenges in gaining adoption in regulated industries, although initial reception has been (net 😉) positive, at least in insurance.
- What will you do with deep learning?

Link to this talk: [kevinykuo.com/talk/2018/09/tdsc](http://kevinykuo.com/talk/2018/09/tdsc/)

For more info on DeepTriangle, see:

- Paper: [arXiv:1804.09253](https://arxiv.org/abs/1804.09253)
- Code: [github.com/kevinykuo/deeptriangle](https://github.com/kevinykuo/deeptriangle)

For ML interpretability:

- R package [github.com/pbiecek/DALEX](https://github.com/pbiecek/DALEX) and associated paper [arXiv:1806.08915](https://arxiv.org/abs/1806.08915)
]
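---

# Appendix: architecture sketch

For readers who want to see what the "embedding plus time series, two outputs" idea looks like in code, here is a simplified keras sketch. It is **not** the actual DeepTriangle code (see the repo linked on the previous slide for that); layer sizes, sequence length, and the number of companies are made up for illustration.

```r
# Simplified sketch only -- not the actual DeepTriangle implementation.
# Layer sizes, sequence length, and company count are illustrative.
library(keras)

n_companies <- 200   # hypothetical number of company codes
timesteps   <- 9     # development lags observed so far

# Categorical input: company code index mapped to a learned embedding vector
company_input <- layer_input(shape = 1, name = "company_code")
company_embedding <- company_input %>%
  layer_embedding(input_dim = n_companies, output_dim = 5) %>%
  layer_flatten()

# Time-series input: paid losses and case reserves by development lag
history_input <- layer_input(shape = c(timesteps, 2), name = "claims_history")
history_encoded <- history_input %>%
  layer_gru(units = 32)

# Combine both representations, then predict two quantities in one model
combined <- layer_concatenate(list(company_embedding, history_encoded)) %>%
  layer_dense(units = 32, activation = "relu")

paid_output <- combined %>% layer_dense(units = 1, name = "paid_loss")
case_output <- combined %>% layer_dense(units = 1, name = "case_reserves")

model <- keras_model(
  inputs = list(company_input, history_input),
  outputs = list(paid_output, case_output)
)

model %>% compile(optimizer = "adam", loss = "mae")
```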