class: center, middle, inverse, title-slide

# Applying and Interpreting Deep Learning
## Two Stories from Insurance
kevinykuo.com/talk/2018/09/tdsc
### Kevin Kuo
@kevinykuo
### September 2018

---
class: center, middle

# Why should we care about deep learning?

---
class: center, middle

# Why should we care about deep learning *at work*?

---
class: center, middle

.Large[If your day-to-day is writing software for autonomous driving or playing video games, the answer is obvious.]

--

.Large[But what about for the rest of us??]

---
class: center, middle

<!-- -->

---

# The rest of us

- .large[Most data scientists at most places deal with structured tabular data most of the time.]

--

- .large[Traditional ML techniques like random forest/xgboost will usually beat neural nets with 1/10 of the tuning/training effort.]

--

- .large[Neural nets are a tougher sell in settings where transparency is more important (think regulated industries like financial services).]

--

.large[But...]

- .large[There are use cases, even for traditional predictive modeling problems, where deep learning techniques can be useful.]
- .large[Also, interpretability for neural networks (and ML in general) is an active area that has seen a lot of development recently.]

---
class: inverse, center, middle

# Deep learning in actuarial science

---
class: inverse, center, middle

# **Deep learning** in *actuarial science*?!?! 🤔

---

# Some context

From the Casualty Actuarial Society [press release](https://www.casact.org/press/index.cfm?fa=viewArticle&articleID=2831):

<img src="img/cas_title.png" width="75%" />

> Survey participants were also asked to identify top statistical concepts that predictive modelers should understand. Generalized linear modeling (GLM), a **cutting-edge mathematical tool** on which modern ratemaking depends, was the top answer.

(bold mine)

---
class: center, middle

<img src="img/glm_abstract.png" width="55%" />

--

```r
# Number of years a concept remains cutting-edge for actuaries
2015 - 1972
```

```
## [1] 43
```

---
class: inverse, center, middle

# Insurance claims

---

# Fender bender!

What happens in the life of a claim?

.pull-left[
]

.pull-right[
- Oops! <br />
- Call in the claim to an agent or file via app. A claims adjuster sets case reserves (approx. how much she expects the claim to cost eventually).
- Damages paid for <br /><br />
- Move on with life
]

---

# Workers' comp

What happens in the life of a (long-tailed) claim?

.pull-left[
]

.pull-right[
- Exposure to asbestos at a shipyard <br />
- Mesothelioma diagnosis (this could be decades later), claim filed <br />
- Claim settled out of court after months (or more) of litigation <br />
- Move on with life
]

---

# The loss reserving exercise

<br /><br />

.full-width[.content-box-blue[.Large[Basically, figure out what we gotta pay in the future due to claims.]]]

---

# Run-off triangles

Here's an example of a cumulative paid loss triangle:

```
## # A tibble: 9 x 10
##   accident_year   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`
##           <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          1988   133   333   431   570   615   615   615   614   614
## 2          1989   934  1746  2365  2579  2763  2966  2940  2978    NA
## 3          1990  2030  4864  6880  8087  8595  8743  8763    NA    NA
## 4          1991  4537 11527 15123 16656 17321 18076    NA    NA    NA
## 5          1992  7564 16061 22465 25204 26517    NA    NA    NA    NA
## 6          1993  8343 19900 26732 30079    NA    NA    NA    NA    NA
## 7          1994 12565 26922 33867    NA    NA    NA    NA    NA    NA
## 8          1995 13437 26012    NA    NA    NA    NA    NA    NA    NA
## 9          1996 12604    NA    NA    NA    NA    NA    NA    NA    NA
```

- .large[Each row represents an accident year, i.e. all the claims that occurred that year.]
- .large[Each column represents a development year, i.e. how many years have passed since the respective accident year.]

---

# Development factors

Traditionally, we calculate age-to-age factors, or link ratios, to take the amounts from one development period to the next...

```
## # A tibble: 8 x 9
## # Groups:   accident_year [8]
##   accident_year `1-2` `2-3` `3-4` `4-5` `5-6` `6-7` `7-8` `8-9`
##           <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          1988  2.50  1.29  1.32  1.08  1     1     0.998 1
## 2          1989  1.87  1.35  1.09  1.07  1.07  0.991 1.01     NA
## 3          1990  2.40  1.41  1.18  1.06  1.02  1.00     NA    NA
## 4          1991  2.54  1.31  1.10  1.04  1.04     NA    NA    NA
## 5          1992  2.12  1.40  1.12  1.05     NA    NA    NA    NA
## 6          1993  2.39  1.34  1.13     NA    NA    NA    NA    NA
## 7          1994  2.14  1.26     NA    NA    NA    NA    NA    NA
## 8          1995  1.94     NA    NA    NA    NA    NA    NA    NA
```

Then we'll judgmentally select link ratios for each period (usually by taking some sort of average), then assume that future years will develop similarly.
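---

# Development factors: a code sketch

To make the mechanics concrete, here is a minimal sketch of how age-to-age factors like those above could be computed in R from a long-format triangle. It is illustrative only: `triangle_data` is a hypothetical data frame whose column names are assumed to match the modeling dataset shown a few slides later.

```r
# Minimal sketch (not the code behind these slides). Assumes a hypothetical
# long-format data frame `triangle_data` with columns accident_year,
# development_lag, and cumulative_paid_loss.
library(dplyr)

link_ratios <- triangle_data %>%
  arrange(accident_year, development_lag) %>%
  group_by(accident_year) %>%
  mutate(link_ratio = lead(cumulative_paid_loss) / cumulative_paid_loss) %>%
  ungroup()

# One common "selection": a volume-weighted average link ratio per lag
selected <- link_ratios %>%
  filter(!is.na(link_ratio)) %>%
  group_by(development_lag) %>%
  summarize(selected_ratio = weighted.mean(link_ratio, w = cumulative_paid_loss))
```

Multiplying the latest diagonal by the cumulative product of the selected ratios then gives projected ultimate losses.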
---

# State of the art

<!-- -->

--

.Large[🤔]

---

# State of the art

`while (TRUE) {`

<!-- -->

.Large[🤔]

`}`

---

.pull-left[
The cynical view

<img src="img/chopper1.jpg" width="100%" />
]

--

.pull-right[
<img src="img/chopper2a.jpg" width="75%" />
]

---
class: center, middle

# Now, let's turn this into a supervised learning problem

---

# Treating this as a predictive modeling problem

.large[
Each cell of the triangle corresponds to a row in our modeling dataset, and the response we're trying to predict is the *incremental* paid loss at that cell. We just need to come up with some predictors...
]

```
## # A tibble: 8 x 4
##   accident_year development_lag cumulative_paid_loss predictors
##           <int>           <int>                <dbl> <chr>
## 1          1988               1                  133 ?!?!?!?!?!?
## 2          1989               1                  934 ?!?!?!?!?!?
## 3          1990               1                 2030 ?!?!?!?!?!?
## 4          1991               1                 4537 ?!?!?!?!?!?
## 5          1992               1                 7564 ?!?!?!?!?!?
## 6          1993               1                 8343 ?!?!?!?!?!?
## 7          1994               1                12565 ?!?!?!?!?!?
## 8          1995               1                13437 ?!?!?!?!?!?
```

.large[Then we can do something like]

```r
fancy_AI_algorithm(incremental_paid_loss ~ predictors, data = data)
```

---

# Response & predictors

.large[
**Response**:

- Incremental paid losses at each accident year
- Total claims outstanding/case reserves

**Predictors**:

- Time series of paid losses and case reserves along accident year
- Company (because we're using data from multiple companies simultaneously)
]

---

# Obligatory fancy neural net diagram

<img src="img/nn1.png" width="20%" style="display: block; margin: auto;" />

.Large[Note that we're predicting two quantities simultaneously, in one model. Also, we're combining different types of inputs: a couple of time series and a categorical factor.]

---

# Embedding of categorical variables

.large[
The embedding layer maps each company code index to a fixed-length vector. While the length of the vector is fixed, the actual values for each company are *learned* by the network during training, in order to optimize our objective.

For example, if the specified length is 5, company #2 might get mapped to `c(0.4, 1.2, -3.7, 3.3, 0.2)`.

We can think of this representation as a proxy for characteristics of the companies that are not captured by the time series data input, e.g. size of book, case reserving philosophy, etc. (A rough code sketch of this architecture is in the appendix at the end of the deck.)
]

---

# Some results

Sample results from the company with the most data in the dataset...

<img src="img/ppauto-results.png" width="55%" style="display: block; margin: auto;" />

---

# Benchmarking

Performance vs. existing techniques.

<img src="img/comparison_table.png" width="70%" style="display: block; margin: auto;" />

---

# Future work?

.large[
- Prediction intervals for reserve variability.
- Claims-level analytics, where we can take into account things like adjusters' notes and images.
- Policy-level analytics, towards a holistic approach to pricing + reserving.
]

---
class: inverse, center, middle

# Life insurance example: shock lapse modeling

---
class: center, middle

## This one is simple...

--

## Buy life insurance

--

## You kick the bucket, you get paid

---
class: center, middle

## This one is simple...

## Buy life insurance

## You kick the bucket, ~~you~~ your beneficiaries get paid

---

# Term life

.large[
- You buy coverage for a set number of years, e.g. 20, and pay a fixed premium every month.
- At the end of the 20 years, you can either lapse (stop paying premiums) or continue to get coverage while paying a higher rate.
- For risk and financial management purposes, insurers would like to be able to *predict* which policyholders are more likely to lapse.
]

---

# Modeling setup

Here are our predictors and responses:

```r
predictors
```

```
## [1] "gender"                       "issue_age"
## [3] "face_amount"                  "post_level_premium_structure"
## [5] "premium_jump_ratio"           "risk_class"
## [7] "premium_mode"
```

```r
responses
```

```
## [1] "lapse_count_rate"  "lapse_amount_rate"
```

---
class: center, middle

## Let's say we've built a neural network model: how do we *explain* it?

--

## Picture time!

---

# Model performance

Old-school actual vs. predicted charts still work.

<img src="img/actual_vs_predicted.png" width="70%" />

---

# Model performance

Comparing multiple models.

<img src="img/distribution_residuals.png" width="65%" />

---

# Variable importances

Seeing which predictors are the most important.

<img src="img/variable_importance.png" width="65%" />

---

# Prediction breakdown

So how'd ya come up with that prediction?

<img src="img/prediction_breakdown.png" width="75%" />

(Prediction breakdown for a single observation.)
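---

# Producing charts like these

Charts like the ones above can be generated with model-agnostic interpretability tooling such as the DALEX package linked on the next slide. The code below is a rough sketch only, not the exact code behind these charts; `lapse_model`, `lapse_data`, and `predict_fn` are hypothetical stand-ins for a fitted model, its data, and a prediction wrapper.

```r
# Rough sketch; `lapse_model`, `lapse_data`, and `predict_fn` are hypothetical.
library(DALEX)

explainer <- explain(
  model = lapse_model,
  data = lapse_data[, predictors],
  y = lapse_data$lapse_count_rate,
  predict_function = predict_fn,
  label = "neural net"
)

# Permutation-based variable importance
vi <- variable_importance(explainer, loss_function = loss_root_mean_square)
plot(vi)

# Contribution of each predictor to a single prediction
pb <- prediction_breakdown(explainer, observation = lapse_data[1, predictors])
plot(pb)
```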
---

# Discussion

.large[
- Neural networks are a technically viable alternative to traditional modeling methods.
- There will be challenges in gaining adoption in regulated industries, although initial reception has been (net 😉) positive, at least in insurance.
- What will you do with deep learning?

Link to this talk: [kevinykuo.com/talk/2018/09/tdsc](http://kevinykuo.com/talk/2018/09/tdsc/)

For more info on DeepTriangle, see:

- Paper: [arXiv:1804.09253](https://arxiv.org/abs/1804.09253)
- Code: [github.com/kevinykuo/deeptriangle](https://github.com/kevinykuo/deeptriangle)

For ML interpretability:

- R package [github.com/pbiecek/DALEX](https://github.com/pbiecek/DALEX) and associated paper [arXiv:1806.08915](https://arxiv.org/abs/1806.08915)
]
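---

# Appendix: architecture sketch

For readers who want to see what the "embedding plus time series, two outputs" idea looks like in code, here is a simplified keras sketch. It is **not** the actual DeepTriangle code (see the repo linked on the previous slide for that); layer sizes, sequence length, and the number of companies are made up for illustration.

```r
# Simplified sketch only -- not the actual DeepTriangle implementation.
# Layer sizes, sequence length, and company count are illustrative.
library(keras)

n_companies <- 200   # hypothetical number of company codes
timesteps   <- 9     # development lags observed so far

# Categorical input: company code index mapped to a learned embedding vector
company_input <- layer_input(shape = 1, name = "company_code")
company_embedding <- company_input %>%
  layer_embedding(input_dim = n_companies, output_dim = 5) %>%
  layer_flatten()

# Time-series input: paid losses and case reserves by development lag
history_input <- layer_input(shape = c(timesteps, 2), name = "claims_history")
history_encoded <- history_input %>%
  layer_gru(units = 32)

# Combine both representations, then predict two quantities in one model
combined <- layer_concatenate(list(company_embedding, history_encoded)) %>%
  layer_dense(units = 32, activation = "relu")

paid_output <- combined %>% layer_dense(units = 1, name = "paid_loss")
case_output <- combined %>% layer_dense(units = 1, name = "case_reserves")

model <- keras_model(
  inputs = list(company_input, history_input),
  outputs = list(paid_output, case_output)
)

model %>% compile(optimizer = "adam", loss = "mae")
```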