Length of Stay Prediction
“How many days will this patient stay?”


A real-world example
How many days will this patient stay?
Inaccurate length-of-stay estimates cascade into bed shortages, surgical cancellations, and staff misalignment. A 500-bed hospital where average LOS is off by 1.5 days loses $4.2M annually in underutilized capacity and overtime staffing. Traditional severity scores (APR-DRG) ignore the temporal progression of labs and the network effects of shared care teams.
Quick answer
AI predicts inpatient length of stay by connecting admission details, lab result trajectories, procedure sequences, medication adjustments, and care-team patterns into a dynamic relational graph. Traditional APR-DRG severity scores assign a static expected LOS at admission that ignores how the patient actually progresses. Graph-based models update predictions in real time as labs trend and treatments change, cutting LOS prediction error by 30-40% and saving $4.2M annually for a 500-bed hospital.
Approaches compared
4 ways to solve this problem
1. APR-DRG Severity Scores
All Patient Refined Diagnosis Related Groups (APR-DRG) assign each admission a severity-of-illness and risk-of-mortality level, each carrying an expected LOS. APR-DRG is the billing and capacity-planning standard across US hospitals.
Best for
Initial bed-planning and case-mix index calculations. Required for Medicare reimbursement regardless of whether you use it for prediction.
Watch out for
Static score set at admission. Does not update as the patient progresses. A patient assigned 6-day expected LOS who develops a post-op complication on day 2 still shows 6 days in the system until a coder manually adjusts it.
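The static nature of the score can be sketched as a plain table lookup; the DRG codes, severity levels, and day values below are made up for illustration, not real CMS figures:

```python
# Illustrative sketch: an APR-DRG expected LOS is a lookup fixed at admission.
# Nothing about the patient's subsequent course feeds into the estimate.
EXPECTED_LOS_DAYS = {
    ("190", 2): 4.5,   # (DRG code, severity-of-illness level) -> expected LOS
    ("190", 3): 6.0,
    ("190", 4): 8.5,
}

def expected_los(drg_code: str, severity: int) -> float:
    """Expected LOS assigned at admission; no input reflects the current day."""
    return EXPECTED_LOS_DAYS[(drg_code, severity)]

# Day 0 and day 2 (after a post-op complication) return the same number,
# until a coder manually re-assigns the severity level.
print(expected_los("190", 3))  # 6.0
```
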
2. Regression on Admission Features
Linear or tree-based regression trained on admission-level features: age, DRG, comorbidity count, admission type (elective vs. emergency). Provides a more nuanced estimate than APR-DRG by incorporating local data.
Best for
Improving on APR-DRG baselines with hospital-specific training data. Good for aggregate bed-planning over the next 7-14 days.
Watch out for
Still a point-in-time prediction at admission. Cannot incorporate the temporal trajectory of lab values, medication changes, or procedure sequences that unfold during the stay.
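This baseline can be sketched with scikit-learn on synthetic admission-level data; the feature choices and the coefficients generating the toy target below are ours, for illustration only:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic admission-level features: age, comorbidity count, emergency flag.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(18, 95, n),   # age
    rng.integers(0, 8, n),     # comorbidity count
    rng.integers(0, 2, n),     # 1 = emergency admission, 0 = elective
])
# Toy LOS target: longer for older, sicker, emergency patients, plus noise.
y = 2 + 0.03 * X[:, 0] + 0.8 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 1, n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
pred = model.predict(np.array([[78, 5, 1]]))  # 78yo, 5 comorbidities, emergency
print(f"Predicted LOS at admission: {pred[0]:.1f} days")
```

Note that every input is known on day 0: nothing in `X` can change on day 2, which is exactly the point-in-time limitation described above.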
3. Time-Series Models on Vitals/Labs
Recurrent neural networks or transformer models that track vital signs and lab results over time to predict remaining LOS. Captures temporal trends that admission-only models miss.
Best for
Patients in step-down or observation where the clinical trajectory is the primary driver of discharge timing.
Watch out for
Operates on a single patient's temporal data without considering care-team patterns, staffing dynamics, or concurrent patient load on the unit. A patient ready for discharge on a Friday may stay until Monday if weekend discharge processes are slow.
4. Graph Neural Networks (Kumo's Approach)
Models the temporal trajectory of labs, procedures, and medications as a dynamic graph connected to care teams, units, and concurrent patient load. Learns how staffing patterns, bed assignments, and care-team dynamics affect individual LOS.
Best for
Real-time LOS updates that account for both clinical trajectory and operational factors. Predicts that a clinically ready patient on a short-staffed weekend unit will discharge 1.5 days later than the same patient on a fully staffed weekday.
Watch out for
Requires connected data across clinical (EHR) and operational (staffing, bed management) systems. Many hospitals have these in separate platforms.
Key metric: APR-DRG severity scores miss individual LOS by 1.5-2.5 days on average. Graph-based models reduce this error by 30-40%, saving $4.2M annually for a 500-bed hospital.
Why relational data changes the answer
Flat LOS models see each admission as a row: DRG code, age, comorbidity count. They predict that a 78-year-old male in DRG 190 (COPD) will stay 8.5 days based on the median for that code. But they cannot see that this patient's WBC is trending upward (rising +2.1 in first 24 hours, suggesting a complication), the pulmonology department has 0.6x weekend staffing (slowing discharge processing), and the patient has had 3 prior admissions in 6 months (each progressively longer). These signals come from lab-result tables, staffing schedules, and admission history. A flat model would need manual features for each, and they would still miss the interaction between rising labs and weekend staffing.
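The manual feature work a flat model requires for just the signals above can be sketched with pandas; table and column names here (`hours_since_admit`, `prior_admits_6mo`, `weekend_staffing_ratio`) are assumed for illustration:

```python
import pandas as pd

# One hand-built pipeline per signal, using the ADM02 example values.
labs = pd.DataFrame({
    "patient_id": ["P3002", "P3002"],
    "test_name": ["WBC", "WBC"],
    "value": [12.1, 14.2],
    "hours_since_admit": [2, 24],
})
staffing = pd.DataFrame({
    "department": ["Pulmonology"],
    "weekend_staffing_ratio": [0.6],
})
admissions = pd.DataFrame({
    "admission_id": ["ADM02"],
    "patient_id": ["P3002"],
    "department": ["Pulmonology"],
    "prior_admits_6mo": [3],
})

# Feature 1: WBC trend over the first 24 hours (rising +2.1).
wbc = labs[labs.test_name == "WBC"].sort_values("hours_since_admit")
wbc_trend_24h = wbc.value.iloc[-1] - wbc.value.iloc[0]

# Feature 2: department weekend staffing ratio, joined onto the admission row.
flat = admissions.merge(staffing, on="department")
flat["wbc_trend_24h"] = wbc_trend_24h

# Feature 3 (the hard part): the interaction still has to be crossed by hand.
flat["rising_wbc_x_low_staffing"] = (
    (flat.wbc_trend_24h > 0) & (flat.weekend_staffing_ratio < 1.0)
).astype(int)
print(flat)
```

Every new signal means another pipeline like this, and every interaction another explicit cross; relational models learn these joins and interactions directly from the connected tables.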
Relational learning connects the clinical and operational data into a single graph. The model walks from the admission to the patient's lab trajectory (trending worse or better), to the procedures performed and pending, to the care team on duty (staffing ratio, discharge-planning capacity), to the unit's current occupancy and discharge rate. It discovers that the combination of rising WBC + weekend admission + high comorbidity count + department staffing ratio of 0.6x predicts 11.4 days, not the 8.5-day DRG median. This real-time, context-aware prediction enables capacity managers to plan beds, staff, and surgical schedules with much higher confidence.
Predicting length of stay from admission data alone is like estimating a construction project timeline based only on the blueprint. You miss that the electricians are behind schedule, a material shipment is delayed, and the inspector is booked two weeks out. The actual timeline depends on the interactions between trades, supply chains, and external dependencies. Relational LOS models see the full construction site, not just the blueprint.
How KumoRFM solves this
Graph-learned clinical intelligence across your entire patient network
Kumo models the temporal trajectory of lab results, procedure sequences, and medication adjustments as a dynamic graph. It learns that patients with specific lab trend patterns (declining creatinine + stable WBC) under certain care teams discharge predictably faster. The relational structure captures how staffing patterns, bed assignment history, and concurrent patient load affect individual LOS.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
PATIENTS
| patient_id | age | gender | comorbidity_count |
|---|---|---|---|
| P3001 | 65 | F | 3 |
| P3002 | 78 | M | 5 |
| P3003 | 42 | F | 1 |
ADMISSIONS
| admission_id | patient_id | admit_date | department | drg_code |
|---|---|---|---|---|
| ADM01 | P3001 | 2025-02-28 | Cardiology | 291 |
| ADM02 | P3002 | 2025-03-01 | Pulmonology | 190 |
| ADM03 | P3003 | 2025-03-02 | Orthopedics | 470 |
PROCEDURES
| procedure_id | admission_id | cpt_code | performed_date |
|---|---|---|---|
| PR01 | ADM01 | 33533 | 2025-02-28 |
| PR02 | ADM02 | 31624 | 2025-03-02 |
| PR03 | ADM03 | 27447 | 2025-03-02 |
LAB_RESULTS
| lab_id | patient_id | test_name | value | collected_date |
|---|---|---|---|---|
| L001 | P3001 | Troponin | 0.42 | 2025-02-28 |
| L002 | P3002 | WBC | 14.2 | 2025-03-01 |
| L003 | P3003 | Hemoglobin | 11.8 | 2025-03-02 |
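How rows and foreign keys become a graph can be illustrated with the sample rows above; the adjacency-list representation below is our sketch, not Kumo's internal format:

```python
from collections import defaultdict

# Edges follow the foreign keys in the sample tables:
# ADMISSIONS.patient_id, PROCEDURES.admission_id, LAB_RESULTS.patient_id.
edges = [
    ("ADM01", "P3001"), ("ADM02", "P3002"), ("ADM03", "P3003"),  # admissions
    ("PR01", "ADM01"), ("PR02", "ADM02"), ("PR03", "ADM03"),     # procedures
    ("L001", "P3001"), ("L002", "P3002"), ("L003", "P3003"),     # lab results
]

adj = defaultdict(set)
for a, b in edges:          # undirected adjacency list: every row is a node
    adj[a].add(b)
    adj[b].add(a)

# From admission ADM02, one hop reaches the patient and the procedure;
# a second hop reaches the patient's labs (L002, WBC 14.2).
print(sorted(adj["ADM02"]))   # ['P3002', 'PR02']
print(sorted(adj["P3002"]))   # ['ADM02', 'L002']
```

A graph model's message passing walks these same hops, which is how a rising WBC two tables away can influence an admission's predicted LOS.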
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT ADMISSIONS.LOS_DAYS
FOR EACH ADMISSIONS.ADMISSION_ID
-- Regression target: a numeric LOS_DAYS column (days from admit to discharge) on ADMISSIONS
Prediction output
Every entity gets a score, updated continuously
| ADMISSION_ID | PATIENT_ID | ADMIT_DATE | PREDICTED_LOS_DAYS |
|---|---|---|---|
| ADM01 | P3001 | 2025-02-28 | 6.2 |
| ADM02 | P3002 | 2025-03-01 | 11.4 |
| ADM03 | P3003 | 2025-03-02 | 3.1 |
Understand why
Every prediction includes feature attributions — no black boxes
Admission ADM02 -- 78yo Male, Pulmonology
Predicted length of stay: 11.4 days
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Comorbidity count | 5 conditions | 28% |
| WBC trend (first 24h) | Rising (+2.1) | 23% |
| DRG historical median LOS | 8.5 days | 19% |
| Prior admissions (last 6 months) | 3 stays | 16% |
| Department weekend staffing ratio | 0.6x | 14% |
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about length of stay prediction
How does AI predict hospital length of stay?
AI predicts LOS by connecting admission details with real-time lab trajectories, procedure sequences, medication changes, care-team staffing, and unit occupancy data. Graph-based models update predictions as the patient's condition evolves, accounting for both clinical trajectory and operational factors like weekend staffing that affect discharge timing.
Why is length of stay prediction important for hospitals?
Accurate LOS prediction drives bed management, surgical scheduling, and staffing decisions. When predictions are off by 1.5 days on average, hospitals experience bed shortages (leading to surgical cancellations worth $50K+ each), ED boarding (patients waiting hours for inpatient beds), and staff misalignment. A 500-bed hospital saves $4.2M annually by improving LOS prediction accuracy.
What factors affect hospital length of stay?
Clinical factors include diagnosis severity, comorbidities, lab trajectories, and procedure complexity. Operational factors include staffing levels, weekend vs. weekday admission, care-team discharge practices, and concurrent patient load on the unit. The interaction between clinical and operational factors is often more predictive than either alone.
How accurate are APR-DRG expected length of stay estimates?
APR-DRG expected LOS estimates are accurate in aggregate (the average across many patients matches reality) but poor for individual patients. Individual-level error averages 1.5-2.5 days for medical admissions and 2-4 days for surgical admissions. Graph-based models reduce individual error by 30-40% by incorporating real-time clinical and operational data.
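The arithmetic behind that band can be checked directly, applying the quoted 30-40% reduction to the 1.5-2.5 day individual error range:

```python
# Residual per-patient error after a 30-40% reduction in APR-DRG error.
for baseline in (1.5, 2.5):
    for reduction in (0.30, 0.40):
        improved = baseline * (1 - reduction)
        print(f"{baseline} days -> {improved:.2f} days at {reduction:.0%} reduction")
```

So individual error drops to roughly 0.9-1.75 days across the band, which is the headroom behind the capacity savings cited above.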
Bottom line: A 500-bed hospital improving LOS prediction accuracy by 1.5 days saves $4.2M annually through better bed utilization, fewer surgical cancellations, and optimized staffing. Kumo captures lab trajectories and care-team dynamics that severity scores ignore.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.




