Regression · Risk Scoring

Underwriting Risk Assessment

What is the true risk for this applicant?



A real-world example

Insurers lose $15-30B annually from adverse selection (underpricing risky applicants) and over-conservatism (rejecting profitable applicants), according to Deloitte. Traditional underwriting models use 15-30 rating factors from the application and third-party data, but miss relational signals: an applicant's neighborhood claims history, the correlation between their vehicle type and local theft rates, or the interaction between their occupation and commute pattern. A 5-point improvement in loss ratio on a $10B book translates to $500M in annual savings.

Quick answer

Graph-based AI improves underwriting risk assessment by connecting applicant profiles with property records, geographic risk factors, claims history, and neighborhood patterns. Traditional GLMs use 15-30 rating factors and miss relational signals like how a zip code's pipe-burst claims correlate with a specific building vintage. Relational models produce risk scores that are 30-40% more predictive, improving loss ratios by 5-10 points on a given book.

Approaches compared

4 ways to solve this problem

1. Generalized Linear Models (GLMs)

The actuarial standard for decades. GLMs use rating factors (age, credit, territory, vehicle type) with multiplicative factor structures. Well-understood by regulators and easy to file.

Best for

Regulatory compliance and rate filings where interpretability is required by state DOIs.

Watch out for

GLMs assume independence between rating factors. They miss interactions like 'older roof + wildfire zone + wood frame' that compound risk non-linearly. Adding interaction terms manually is slow and limited.
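To make the independence assumption concrete, here is a toy numeric sketch (every relativity below is invented for illustration, not a filed rate): a multiplicative GLM can only ever charge the product of each factor's standalone relativity, even when the combined profile compounds risk well beyond that product.

```python
# Toy multiplicative rating structure: every factor contributes an
# independent relativity. All numbers are invented for illustration.
base_rate = 1000.0
relativities = {"old_roof": 1.15, "wildfire_zone": 1.25, "wood_frame": 1.10}

glm_premium = base_rate
for r in relativities.values():
    glm_premium *= r

# The GLM charges the product of standalone relativities, even when
# the three factors together compound risk well beyond that product.
print(round(glm_premium, 2))  # 1581.25
```

If observed losses for the combined "older roof + wildfire zone + wood frame" profile run 2x this level, the filed structure has no term that can express it without a manually added interaction factor.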

2. Gradient-Boosted Trees (XGBoost / LightGBM)

Tree-based models that capture non-linear interactions between rating factors. Used as a supplement to GLMs, often feeding into the GLM structure as an additional factor.

Best for

Capturing non-linear interactions between known rating factors. Good improvement over pure GLMs with modest engineering effort.

Watch out for

Still operates on flat feature vectors. Cannot see neighborhood-level patterns, property-record correlations, or claims-history network effects without heavy manual feature engineering.
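For contrast, this is the kind of manual cross-table feature engineering a flat-feature model depends on (tiny synthetic tables with invented values, loosely mirroring the schema shown later on this page): every neighborhood-level signal must be hand-aggregated and joined back onto the applicant row before the tree model can see it.

```python
# Sketch of manual cross-table feature engineering for a flat-feature model.
# Tables and values are tiny synthetic stand-ins, invented for illustration.
import pandas as pd

applicants = pd.DataFrame({
    "applicant_id": ["APP-3301", "APP-3302"],
    "zip_code": ["90210", "10001"],
})
claims = pd.DataFrame({
    "zip_code": ["90210", "90210", "10001"],
    "paid": [8200, 12000, 3000],
})

# Hand-built neighborhood signal: average paid claim per zip code,
# aggregated and then joined back onto each applicant row.
zip_avg = (claims.groupby("zip_code", as_index=False)["paid"].mean()
           .rename(columns={"paid": "zip_avg_paid"}))
features = applicants.merge(zip_avg, on="zip_code", how="left")
print(features)
```

Each such feature is one a data scientist had to think of in advance; combinations nobody anticipated never make it into the feature vector.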

3. Third-Party Enrichment Scores

Vendors like LexisNexis, Verisk, and TransUnion provide pre-built risk scores that incorporate credit, claims history, and property data. Drop-in scores that require minimal modeling work.

Best for

Quick lift on top of existing models when you need incremental accuracy without building new infrastructure.

Watch out for

Black-box scores that every competitor also buys. No differentiation, and you cannot customize them for your specific book mix or geographic footprint.

4. Relational Deep Learning (Kumo's Approach)

Connects applicant data with property records, geographic risk, neighborhood claims patterns, and historical outcomes in a single relational graph. Learns cross-table interactions automatically without manual feature engineering.

Best for

Finding hidden risk signals that span multiple data sources: zip-code weather trends + roof age + construction type + neighborhood claims density.

Watch out for

Regulatory acceptance varies by state. Some DOIs require rate-factor transparency that may need a GLM wrapper around the relational model's output.

Key metric: SAP benchmarked relational models at 91% accuracy vs. 75% for XGBoost on structured prediction tasks. A 5-point loss ratio improvement on a $10B book saves $500M annually.

Why relational data changes the answer

Traditional underwriting models flatten each applicant into a single row of 15-30 features. They can see that Sarah Mitchell is 42, has A-tier credit, and lives in 90210. But they cannot see that her specific zip code had a 3x spike in wildfire claims last year, her roof is 18 years old (past the typical replacement threshold for her construction type), and homes in her neighborhood with similar characteristics filed water-damage claims at 2x the regional average. These signals live in different tables: property records, geographic risk databases, and historical claims. A flat model needs a data scientist to manually engineer each cross-table feature, and they inevitably miss combinations they did not think to create.

Relational learning connects these tables directly. The model walks from applicant to property to zip code to historical claims, discovering risk patterns across the full data graph. It finds that the combination of aging roof + wildfire zone + wood-frame construction + rising neighborhood claims density creates a risk profile that is 2x what the individual factors suggest in isolation. This is not theoretical: SAP benchmarked relational models at 91% accuracy versus 75% for XGBoost on structured prediction tasks. The result is sharper risk segmentation that prices accurately for both high-risk applicants (rate up or decline) and low-risk applicants (write more at competitive rates).
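The "walk" from applicant to property to zip code can be pictured with a toy dictionary-based graph (all IDs and values invented for illustration). The key difference is that a hand-coded traversal like this follows fixed hops, whereas a relational model learns which hops carry signal.

```python
# Toy illustration of walking a relational graph from an applicant node
# out to property and neighborhood signals. IDs and values are invented.
applicants = {"APP-3301": {"property_id": "P-1", "zip_code": "90210"}}
properties = {"P-1": {"roof_age": 18, "construction": "wood frame"}}
zip_risk = {"90210": {"weather_risk": "Wildfire: High", "claims_per_1000": 42}}

def applicant_context(applicant_id: str) -> dict:
    """Gather cross-table context by following foreign keys."""
    node = applicants[applicant_id]
    return {
        **properties[node["property_id"]],  # applicant -> property
        **zip_risk[node["zip_code"]],       # applicant -> zip code
    }

print(applicant_context("APP-3301"))
```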

Underwriting from a flat applicant table is like evaluating a house for purchase using only the listing sheet. You see bedrooms, square footage, and asking price. But you cannot see that the house next door just had a foundation crack, the neighborhood's water main is 60 years old, and the previous owner filed two undisclosed insurance claims. Relational underwriting is the equivalent of hiring a local inspector who knows the neighborhood's history, the builder's track record, and every claim on the block.

How KumoRFM solves this

Relational intelligence built for insurance data

Kumo connects applicant profiles, claims history, policy data, geographic risk factors, vehicle databases, and external data sources into a relational graph. The model discovers that Applicant APP-3301 lives in a zip code where pipe-burst claims spiked 3x last winter, has a roof older than 15 years (from property records), and owns a breed of dog associated with 2.5x liability claim frequency. These cross-table signals produce a risk score that is 30-40% more predictive than traditional generalized linear models (GLMs), catching both overpriced low-risk and underpriced high-risk applicants.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Step 1: Your data

The relational tables Kumo learns from

APPLICANTS

applicant_id | name           | age | occupation        | credit_tier | zip_code
APP-3301     | Sarah Mitchell | 42  | Teacher           | A           | 90210
APP-3302     | David Kim      | 35  | Software Engineer | A+          | 10001
APP-3303     | Robert Brown   | 58  | Contractor        | B           | 33101

PROPERTY_DATA

applicant_id | property_type | year_built | roof_age | sqft  | replacement_cost
APP-3301     | Single Family | 1998       | 18 years | 2,400 | $420,000
APP-3302     | Condo         | 2019       | 6 years  | 1,100 | $280,000
APP-3303     | Single Family | 1985       | 12 years | 3,200 | $510,000

ZIP_CODE_RISK

zip_code | weather_risk    | theft_index | claims_per_1000 | trend
90210    | Wildfire: High  | Low         | 42              | Increasing
10001    | Flood: Medium   | High        | 38              | Stable
33101    | Hurricane: High | Medium      | 55              | Increasing

CLAIMS_HISTORY

applicant_id | prior_claims_5yr | total_paid | largest_claim | claim_type
APP-3301     | 1                | $8,200     | $8,200        | Water Damage
APP-3302     | 0                | $0         | $0            | N/A
APP-3303     | 3                | $45,000    | $22,000       | Wind Damage
Step 2: Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT SUM(CLAIMS_HISTORY.TOTAL_PAID, 0, 12, months)
FOR EACH APPLICANTS.APPLICANT_ID
Step 3: Prediction output

Every entity gets a score, updated continuously

APPLICANT_ID | NAME           | KUMO_RISK_SCORE | TRADITIONAL_SCORE | EXPECTED_LOSS | RECOMMENDATION
APP-3301     | Sarah Mitchell | 0.72            | 0.35              | $14,200       | Rate Up 40%
APP-3302     | David Kim      | 0.18            | 0.22              | $2,800        | Rate Down 15%
APP-3303     | Robert Brown   | 0.85            | 0.68              | $28,500       | Decline or Restrict
Step 4: Understand why

Every prediction includes feature attributions — no black boxes

Applicant APP-3301 (Sarah Mitchell)

Predicted: Risk score 0.72, expected loss $14,200

Top contributing features

Feature                                  | Value           | Attribution
Roof age exceeding replacement threshold | 18 years        | 27%
Zip code wildfire risk increasing        | High, +15% YoY  | 24%
Prior water damage claim                 | $8,200 in 5yr   | 20%
Property age and construction type       | 1998 wood frame | 17%
Neighborhood claims density              | 42 per 1,000    | 12%

Feature attributions are computed automatically for every prediction; no separate tooling is required. Learn more about Kumo explainability.

Frequently asked questions

Common questions about underwriting risk assessment

How does AI improve insurance underwriting accuracy?

AI improves underwriting accuracy by incorporating signals from connected data sources that traditional rating factors miss. Instead of evaluating each applicant in isolation, graph-based models connect applicant profiles with property records, geographic risk trends, neighborhood claims patterns, and historical outcomes to produce risk scores that are 30-40% more predictive than GLMs alone.

What is the ROI of AI underwriting in insurance?

A 5-point improvement in loss ratio on a $10B book translates to $500M in annual savings. Additionally, more accurate risk segmentation allows insurers to write 8-12% more business at profitable rates by identifying low-risk applicants that traditional models over-price.
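The headline arithmetic, spelled out: one loss-ratio "point" is one percentage point of earned premium, so 5 points on a $10B book is 5% of $10B.

```python
# ROI arithmetic: loss-ratio points are percentage points of earned premium.
book_premium = 10_000_000_000   # $10B book of earned premium
improvement_points = 5          # 5-point loss ratio improvement
annual_savings = book_premium * improvement_points / 100
print(f"${annual_savings / 1e6:,.0f}M")  # $500M
```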

Can AI underwriting models satisfy state insurance regulators?

Yes, but implementation matters. Most state DOIs require rate-factor transparency. The practical approach is to use the relational model's output as an additional rating factor within a filed GLM structure. This gives you the predictive power of graph-based learning with the interpretability regulators require.
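A minimal sketch of that wrapper pattern (bucket boundaries and relativities are invented for illustration, not actual filed values): the continuous relational score is discretized into a small set of filable relativities, and the GLM multiplies the result in like any other rating factor.

```python
# Sketch of the GLM-wrapper pattern: a relational model's 0-1 score is
# discretized into a filable rate relativity. All numbers are invented
# for illustration, not actual filed values.
def score_to_relativity(score: float) -> float:
    """Map a 0-1 relational risk score to a discrete rating factor."""
    if score < 0.3:
        return 0.90   # preferred tier
    if score < 0.6:
        return 1.00   # standard tier
    return 1.35       # surcharged tier

base_premium = 1200.0
premium = base_premium * score_to_relativity(0.72)  # e.g. a 0.72 risk score
print(premium)  # 1620.0
```

The buckets, not the underlying model, are what gets filed, which keeps the rate structure transparent to the regulator while the score itself stays model-driven.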

How does graph-based underwriting differ from traditional actuarial models?

Traditional actuarial models (GLMs) use multiplicative rating factors that assume independence between variables. Graph-based models learn non-linear interactions across connected data: property age interacting with neighborhood claims trends, construction type interacting with local contractor costs, and applicant behavior interacting with competitive market dynamics. These cross-table patterns are invisible to GLMs.

Bottom line: Improve loss ratios by 5-10 points while writing 8-12% more profitable business, translating to $500M+ in annual savings on a $10B book.

Topics covered

underwriting risk AI · insurance underwriting model · risk assessment machine learning · loss ratio improvement · graph neural network underwriting · KumoRFM · relational deep learning insurance · predictive underwriting · insurance risk scoring · actuarial AI model

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.