Healthcare analytics · Databricks · Random Forest

Predicting Medicare patient churn before it happens

ACOs lose attributed revenue and quality scores when patients go inactive, shift care out-of-network, or stop seeing their assigned PCP. By the time it shows up in reporting, the window to act has closed. This project scores patients monthly so care coordinators can reach out while the relationship is still recoverable.

View notebook See findings
0.979
AUC-ROC
98.5%
Accuracy
87.5%
Churn recall
73
Churns preventable

What separates churned patients from active ones

Across 1,000 synthetic Medicare patients with 10,000+ encounters, three signals predict churn reliably. The gap in visit recency alone is striking.

Churned patients
172
avg days since last visit
Active patients
43
avg days since last visit
PCP engagement
44.6%
churned patients, vs 52.3% active
New member risk
<6 mo
the highest-risk window, due to a missed onboarding gap

Churn is defined as: 6+ months inactive, OR 60%+ of recent care out-of-network, OR zero PCP visits in the last 6 months. The overall churn rate is 8% (80 of 1,000 patients). The patterns are consistent enough across the population to catch reliably.

Random Forest on 15 behavioral features

Feature engineering in PySpark on Databricks, modeling in scikit-learn. Recent encounter volume is the top predictor, accounting for nearly half the model's explanatory power.

recent_encounters
44%
tenure_years
9%
pcp_engagement_rate
7%
primary_care_ratio
6%
out_of_network_rate
5%
care_fragmentation
4%
distance_to_pcp_miles
3%
ROC curve · churn prediction
AUC 0.979 · Random Forest · 200 test patients
Model achieves AUC of 0.979.
Risk score distribution
Separation between active and churned patients
Active patients cluster near 0, churned near 1.0.

Three tiers, three responses

Score all ACO members monthly. The 74 high-risk patients account for nearly all preventable churn, so that's where care coordinator capacity goes first.

High risk

Immediate outreach: assign care coordinator, PCP visit within 2 weeks, address access barriers

74
98.6% churn rate
Medium risk

Proactive monitoring: preventive care reminders, PCP touchpoint within 60 days

7
71.4% churn rate
Low risk

Standard wellness programs, quarterly monitoring for escalation

911
0.2% churn rate
Patient segmentation by risk tier
Total patients vs. actual churned per tier
Risk tier distribution: most churn concentrated in high-risk tier.

End-to-end pipeline

Built to run on Databricks at scale, using the same architecture common in production ACO analytics environments.

01
Data generation
1,000 patients · 10,000+ encounters · 3 years
02
Feature engineering
PySpark · in-network rate · PCP engagement · fragmentation
03
Modeling
Random Forest · 100 estimators · stratified split
04
Risk scoring
Probability scores → 3 tiers → action plan
Python PySpark Databricks scikit-learn pandas matplotlib seaborn Random Forest

From model to production

Now: Score all patients, flag top 74, assign care coordinators, schedule wellness visits within 2 weeks.

Next quarter: Build a PCP onboarding program targeting new members (under 6 months), reduce out-of-network leakage with proactive referral management.

6 to 12 months: Automate monthly scoring pipeline, build an operational dashboard for care managers, A/B test intervention strategies, track ROI on prevented churns.