Data, Big or Otherwise, in the Insurance Industry

Mick Cooney

25 February 2021

Who Am I?

Physics

Quant Finance

Statistics

Computer Programming

Nerd

The Last Six Years…


  • Lapse Modelling – Survival analysis
  • Geospatial Modelling – PostgreSQL / PostGIS / Clustering
  • Mortality Swaps – MonteCarlo simulation
  • Underwriting Fraud – Bayesian networks
  • Data Generation – Life/Motor/Captive/Employer/Corporate
  • Car Insurance Pricing – GLMs / MonteCarlo simulation
  • Email Analysis – NLP / LDA
  • Property Insurance Loss Costing – Stan / Boosting / Dataviz
  • Business Planning – Data engineering / MonteCarlo simulation
  • Portfolio Profitability Analysis – Visualisation / MonteCarlo simulation
  • Risk Evaluation of a Fixed-Income Portfolio – Data engineering

Big Data / Machine Learning / AI

Dirty Little Secret


You (Probably) Do Not Have a Big Data / AI Problem

Sizes of Data


Hadley Image

Hadley Wickham – Interview with a Data Scientist (Peadar Coyle)

Three Data Sizes


In-Memory


On-Disk


Multi-Disk

Three Problem Sizes


Sampleable Problems


Parallelisable Problems


Irreducible Problems

Sampleable Problems


\(\sim\) 90% of problems


Parallelisable Problems


\(\sim\) 9% of problems


Irreducible Problems


\(\sim\) 1% of problems

What Does Big Data Mean?

Various definitions


Generally bigger than one machine

Cynical Definition


“Big Data” is a marketing term used by salespeople to extract large sums of money from companies with a severe case of FOMO.

Circular Definition


A big data problem is one where big data solutions work best

Practical Definition


Software infrastructure to assist in the processing and use of large datasets

Data in Insurance

Most classic insurance problems are small data

Actuarial Problems


Pricing


Reserving


Capital Modelling

Loss-Costing DNF Business


5-10k Policies


SoV data can be large


Claim data poor quality

Estimating Claims Reserves


Estimate total claim amounts


Data sparse


Chain Ladder often ineffective

ICT Pricing


Exposure data aggregated


Partial view of claims (censored)

Cost of Guarantees


With-profit Funds


Quant finance


MonteCarlo simulation

Estimating Lapse Risk


Customer segmentation


Survival analysis

Other Problems

Less traditionally ‘actuarial’

Marketing Campaigns


Limited marketing budget


Optimise call list

Underwriting Fraud


Most data missing


Find fraudulent or misleading applications

Internal Operations


Communications data, emails


Identify bottlenecks


Improve customer turnaround

One Big Prediction

Things will not stay the same…

… and the industry will fight this tooth and nail!


(Two! Two big predictions)

Getting Started

Suggested Learning


  • SQL
  • Linear Algebra
  • Statistics
  • Computer Programming
  • Source Control
  • Linear Models
  • Tree-based Models

Some Books


R for Data Science - Wickham & Grolemund

https://r4ds.had.co.nz/


Introduction to Probability - Grinstead & Snell

https://math.dartmouth.edu/~prob/prob/prob.pdf

Suggested Activities


Where to Learn


Help everyone you can!

Thank You

Get in Touch


GitHub: http://www.github.com/kaybenleroll