October 3, 2025

AI Can Now Predict Disease and Death. The Real Challenge Is What We Do With It.

by Paras Karmacharya

Primary prevention has always felt just out of reach. We treat chronic disease well enough to extend life with it. We rarely delay its arrival.

Two new large-scale AI studies suggest that could change.

Not tomorrow. Not by magic. But in a way that looks practical, testable, and scalable.

From Medicine 1.0 → 2.0 → 3.0

Medicine 1.0: pre-scientific, trial and error.
Medicine 2.0: microscopy, sanitation, antibiotics. We doubled lifespan by beating “fast death” like trauma and infection.
Medicine 3.0: extend healthspan by delaying or preventing “slow death” from chronic disease. Real prevention, not just earlier diagnosis.

Outside of vaccines and parts of cardiovascular medicine, we’ve barely cracked primary prevention. Cancer and neurodegeneration remain mostly unprevented. The cost of late treatment keeps rising. The missing piece has been who to reach, when, and with what—before disease becomes visible.

Recent advances in AI suggest we may finally be getting closer. In this post, I’ll walk through two studies that borrow the same underlying architecture as ChatGPT and other large language models.

Proof-of-Concept 1: Delphi-2M — Learning the “Language” of Health

Delphi-2M takes inspiration from the way ChatGPT works. Instead of predicting the next word in a sentence, Delphi reads a patient’s health history and predicts the next medical event—and when it will happen.

Each diagnosis, hospitalization, or prescription is treated like a “token” in a sentence. (In a language model, a token is a chunk of text, roughly three-quarters of an English word on average.) By training on the longitudinal records of 400,000 participants in the UK Biobank, and then validating on records from 1.9 million people in Denmark, the model learned the “language” of health: the patterns that link one event to the next over time.
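To make the “next token” framing concrete, here is a toy sketch in Python. It is not Delphi-2M's method: Delphi uses a transformer over entire health histories, while this example (an assumption chosen for brevity) simply counts which event follows which in a few made-up timelines and turns those counts into next-event probabilities plus an average time gap. The event codes, timelines, and helper names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy timelines: (age in years, event code) pairs, sorted by age.
# Codes are illustrative ICD-10-style labels, not real patient data.
TIMELINES = [
    [(41, "E66 obesity"), (47, "I10 hypertension"), (55, "E11 type 2 diabetes")],
    [(38, "E66 obesity"), (44, "I10 hypertension"), (52, "N18 CKD")],
    [(50, "E66 obesity"), (53, "E11 type 2 diabetes")],
]


def fit_transitions(timelines):
    """Count event -> next-event transitions and collect the years between them."""
    counts = defaultdict(Counter)
    gaps = defaultdict(list)
    for timeline in timelines:
        for (age_a, code_a), (age_b, code_b) in zip(timeline, timeline[1:]):
            counts[code_a][code_b] += 1
            gaps[(code_a, code_b)].append(age_b - age_a)
    return counts, gaps


def next_event_probs(counts, current):
    """Turn raw transition counts into a probability for each possible next event."""
    total = sum(counts[current].values())
    return {code: n / total for code, n in counts[current].items()}


def mean_gap(gaps, current, nxt):
    """Average years between two events (a crude stand-in for 'when'); None if unseen."""
    observed = gaps[(current, nxt)]
    return sum(observed) / len(observed) if observed else None


counts, gaps = fit_transitions(TIMELINES)
print(next_event_probs(counts, "E66 obesity"))  # roughly 0.67 I10, 0.33 E11
print(mean_gap(gaps, "E66 obesity", "I10 hypertension"))
```

A transformer replaces the counting step with a learned function of the whole history, which is what lets models like Delphi condition on far more than the single most recent event.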

Results:

Predicted more than 1,000 ICD-10 conditions with an average AUC of ~0.76, and ~0.70 even when forecasting 10 years ahead.
Forecasted multiple diseases per individual along with the timing of onset.
Mortality prediction was especially accurate (AUC 0.97).
Accuracy improved when biomarkers or polygenic risk scores were layered in.
Performance held up across very different populations (UK vs Denmark).
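AUC (area under the ROC curve) is the probability that a randomly chosen person who develops a condition receives a higher risk score than a randomly chosen person who does not. A minimal Python sketch of that pairwise definition follows; the scores and labels are invented for illustration, and real evaluations typically use library routines on far larger cohorts.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (case, non-case) pairs where the case scores higher.

    Ties count as half a win. O(n_cases * n_controls), fine for small examples.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented risk scores: 1 = developed the condition, 0 = did not.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

For the invented data above the result is 8/9, about 0.89. An AUC of 0.5 is a coin flip and 1.0 is perfect ranking; note that AUC alone says nothing about how many alerts a given threshold would generate.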

Limitations are real: data-collection artifacts, cohort differences, and coding biases.

But the core idea is interesting: ordinary clinical data may already carry enough signal to forecast disease trajectories years in advance. And as multi-omic data (genetic, proteomic, metabolic, microbiome, environmental) are layered in, predictions may only get stronger.

Proof-of-Concept 2: CoMET — Generating Possible Futures From Routine EHR

CoMET, developed from Epic’s Cosmos dataset of over 118 million patients, takes a slightly different approach. Rather than predicting one disease at a time, it simulates many possible futures for each patient and then calculates the probabilities of different outcomes.

Think of it as running your health story forward on dozens of alternate timelines, then averaging the results to forecast what is most likely to occur.
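The “many futures” idea is Monte Carlo estimation: sample a timeline many times, then report the fraction of samples in which an outcome occurs. The sketch below assumes a hypothetical yearly transition table between a few coarse kidney-disease states; CoMET itself samples medical-event tokens from a transformer trained on Cosmos, so only the simulate-then-average logic carries over. All states and probabilities are invented.

```python
import random

# Hypothetical yearly transition probabilities between coarse health states.
# These numbers are invented for illustration only.
TRANSITIONS = {
    "stable": {"stable": 0.90, "ckd_stage_3": 0.08, "dead": 0.02},
    "ckd_stage_3": {"ckd_stage_3": 0.80, "ckd_stage_4": 0.15, "dead": 0.05},
    "ckd_stage_4": {"ckd_stage_4": 0.85, "dead": 0.15},
    "dead": {"dead": 1.0},
}


def simulate(start, years, rng):
    """Sample one possible future as a list of yearly states (including the start)."""
    state, path = start, [start]
    for _ in range(years):
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path


def outcome_probability(start, outcome, years, n_futures=10_000, seed=0):
    """Fraction of simulated futures in which `outcome` appears at least once."""
    rng = random.Random(seed)
    hits = sum(outcome in simulate(start, years, rng) for _ in range(n_futures))
    return hits / n_futures


print(outcome_probability("ckd_stage_3", "ckd_stage_4", years=3))
```

For this toy table the exact 3-year probability of reaching stage 4 is 0.15 + 0.8 × 0.15 + 0.8² × 0.15 = 0.366, so the simulated estimate should land close to that. The same simulated paths can be reused to read off any other outcome, which mirrors the zero-shot appeal described next.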

Unlike traditional models that must be custom-built for each task, CoMET is zero-shot: the same model can handle incident disease, exacerbations, differential diagnosis, or health system utilization without retraining.

Results:

At treatment decision points, predicted 1–3 year risks of outcomes such as MI, CKD progression, or neuropathy—matching or exceeding supervised models.
Anticipated acute-on-chronic exacerbations (asthma, COPD, CHF, sickle cell, alcohol use disorder) more accurately than existing tools.
Identified incident diseases in the general population, even when prevalence was <1.5%, showing strong precision where false positives usually dominate.
Improved differential diagnosis over time, especially in rheumatology and hepatobiliary disease clusters.
Forecasted utilization outcomes like readmissions and length-of-stay better than standard approaches.
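Why is precision hard when prevalence is below 1.5%? Positive predictive value (PPV, the share of flagged people who truly have the condition) follows from Bayes' rule. The sketch below uses an invented operating point (80% sensitivity, 95% specificity), not CoMET's reported figures, to show how quickly false positives dominate as a condition gets rarer.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative operating point (assumed, not taken from the CoMET paper).
for prev in (0.20, 0.05, 0.015, 0.005):
    print(f"prevalence {prev:.1%}: PPV {ppv(0.80, 0.95, prev):.1%}")
```

Under these assumptions PPV is 80% at 20% prevalence but only about 20% at 1.5% prevalence: roughly four false alarms for every true case, even at 95% specificity.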

Like all large EHR models, CoMET inherits biases from its training data (Epic’s Cosmos) and currently lacks richer inputs such as imaging, notes, or genomics, which limits generalizability. Its performance also varies by task, meaning disease-specific models can still outperform it in certain settings.

Even with these caveats, it’s an important proof of concept: a generalist forecaster that can simulate plausible futures from routine EHR data, surfacing risks for disease, exacerbations, and healthcare utilization in parallel. It appears versatile, scalable, and—because of its zero-shot ability—potentially deployable without waiting for disease-specific models.

The Temptation That Will Break Us

As the architectures behind today’s large language models improve, so will our ability to predict disease.

And if these models expand the set of people who might be at risk, the easy response is to cast a wider net.

More labels. More “pre-disease.” More downstream testing. More worry.

That is the old reflex: sensitivity first at all costs.

It made sense for one-shot, catastrophic diseases where missing a case is unacceptable and confirmatory testing is constrained. It does not scale for chronic conditions where confirmation is cheap, treatment takes years, and system capacity is finite.

We need a new reflex.

A Quick Refresher: SPIN and SNOUT

SNOUT: SeNsitive tests rule OUT disease. Great when the risk of missing a life-threatening event is high and you only get one shot.
SPIN: SPecific tests rule IN disease. Great when you need to trust that a positive truly deserves action.
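Both mnemonics follow from likelihood ratios. A negative result from a highly sensitive test has a small negative likelihood ratio (LR-), so it rules disease out; a positive result from a highly specific test has a large positive likelihood ratio (LR+), so it rules disease in. A minimal sketch with invented confusion-matrix counts for two hypothetical tests:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # healthy, test positive
    tn: int  # healthy, test negative

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self):
        # Factor by which a positive result multiplies the pre-test odds of disease.
        if self.specificity == 1:
            return float("inf")
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self):
        # Factor by which a negative result multiplies the pre-test odds of disease.
        if self.specificity == 0:
            return float("inf")
        return (1 - self.sensitivity) / self.specificity


# Invented counts: 100 diseased and 900 healthy people per hypothetical test.
snout_test = Confusion(tp=98, fn=2, fp=180, tn=720)  # very sensitive, less specific
spin_test = Confusion(tp=60, fn=40, fp=9, tn=891)    # very specific, less sensitive

for name, t in (("SNOUT-style", snout_test), ("SPIN-style", spin_test)):
    print(f"{name}: sens {t.sensitivity:.2f}, spec {t.specificity:.2f}, "
          f"LR+ {t.lr_positive:.1f}, LR- {t.lr_negative:.2f}")
```

With these made-up counts, a negative on the sensitive test cuts the odds of disease 40-fold (LR- 0.025), while a positive on the specific test raises them about 60-fold (LR+ 60).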

Traditional screening worshipped sensitivity for good reasons. But population-level AI case-finding has different constraints:

→ Millions of people. Always-on signals. Limited clinic slots. Real harms from cascades of false positives.

High specificity becomes the safety valve for the health system and, paradoxically, a protector of patients too.

Fortunately, we don’t have to look far for an example: the answer is already in our hands (or, more precisely, on our wrists).

Case Study On Your Wrist: Specificity-First Done On Purpose

Apple’s new Hypertension Notification is not diagnosis, not monitoring in the conventional sense, and not a stand-alone screening program. It is ambient, specificity-first case-finding.

Sensitivity is modest by design. Many hypertensives won’t be flagged on any given pass.
Specificity is high so that a notification is likely to reflect true hypertension.
That design makes sense because:
1. Confirmation is cheap and safe. A blood-pressure cuff, not a biopsy.
2. Scale is massive. Even a small false-positive rate would swamp clinics.
3. Signals are continuous. The device watches over months, not a one-off test.
4. Disease is slow-burn. Missing a few weeks is unlikely to change outcomes, while overtreating false positives at scale absolutely will.
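Point 3 can be made concrete. If a single measurement window has some false-positive rate, requiring several positive windows before notifying drives that rate down sharply, at the cost of sensitivity on any given pass. The sketch below assumes windows are statistically independent (real wearable data are not) and uses invented per-window rates; it is not Apple’s published algorithm.

```python
from math import comb


def at_least_k_of_n(p, k, n):
    """Probability of >= k positive windows out of n, assuming independent windows."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Invented per-window rates (assumptions for illustration only).
per_window_sensitivity = 0.50  # chance a hypertensive person's window is flagged
per_window_fpr = 0.05          # chance a normotensive person's window is flagged

n_windows = 4
for k in (1, 2, 3):
    sens = at_least_k_of_n(per_window_sensitivity, k, n_windows)
    fpr = at_least_k_of_n(per_window_fpr, k, n_windows)
    print(f"notify if >= {k} of {n_windows} windows: sensitivity {sens:.3f}, "
          f"false-positive rate {fpr:.5f}")
```

Under these made-up numbers, moving from one positive window to three cuts the false-positive rate from about 18.5% to about 0.05%, while sensitivity per pass falls from about 94% to about 31%: many hypertensives are missed on a given pass, but a notification is usually real.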

This might be the right pattern for many chronic conditions that are common, slow to harm, and easily confirmed.

But “cheap confirmation” still has limits. A million extra cuff checks, visits, and ambulatory blood-pressure monitoring (ABPM) studies can strain primary-care capacity; any rollout needs hard resource caps and throughput modeling. And because silence is often misread as safety, ambient pipelines must pair the algorithm with default patient nudges and periodic re-checks to reduce false reassurance.

Ambient Case-Finding: A Better Category Name

Call this ambient case-finding rather than screening.

Ambient because it runs passively in the background.
Case-finding because it surfaces likely positives for cheap confirmation rather than labeling the whole population at once.
Specificity-first because the main risk is system overload and downstream iatrogenesis, not a missed hour-zero of disease.

Hypertension is the cleanest starter. Atrial fibrillation, sleep apnea, chronic kidney disease, anemia signals, metabolic risk from continuous or intermittent sensors, and AI re-reads of legacy EKGs and images may follow—but portability isn’t automatic. Where confirmation is costly or capacity-limited (e.g., sleep studies, nephrology work-ups), thresholds and pathways must be redesigned.

Suggestions For AI-Enabled Primary Prevention

If you lead a clinic, service line, or population health program, I suggest incorporating these 7 steps into your playbook:

Lead with specificity. Tune models so a positive means something. Save the system from floods of false alarms. Then add sensitivity over time as pathways mature.
Demand a confirmatory step that is cheap, fast, and safe. If the next step is expensive, invasive, or capacity-limited, your threshold is too low (or the condition isn’t a good fit for ambient case-finding yet).
Model the cascade before you launch. For every 1,000 notifications, estimate follow-up visits, confirmatory tests, downstream positives, and treatment starts. If your clinic cannot absorb that load, raise the threshold.
Set patient-facing expectations. Silence ≠ safety. Ship with clear messaging, automated nudges, and periodic re-checks to prevent false reassurance. Also, a ping means “worth checking now,” not “you have a disease.”
Measure what matters. Do not celebrate AUC in a dashboard while referrals spike and no outcomes improve. Track time-to-confirmation, treatment initiation, adherence, blood-pressure control, A1c change, hospitalization rates, and patient-reported anxiety.
Stage the rollout. Start with cohorts where the confirmatory pathway is mature, capacity exists, and benefit is clear. Expand only when throughput and outcomes are stable.
Continuously recalibrate and audit equity. Re-tune for temporal drift; monitor performance and access by race/skin tone, language, insurance, and device ownership; add non-wearable EHR routes where wearables under-serve.
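The advice to model the cascade before you launch can start as simple arithmetic. The sketch below follows 1,000 notifications through follow-up visits, confirmatory tests, confirmation, and treatment, then checks visits against clinic capacity. Every rate, the capacity figure, and the `Pathway`, `cascade`, and `fits_capacity` names are hypothetical placeholders to replace with local data.

```python
from dataclasses import dataclass


@dataclass
class Pathway:
    ppv: float                      # share of notifications that are true cases (assumed)
    attend_rate: float              # share of notified people who book a follow-up visit
    confirm_tests_per_visit: float  # confirmatory tests ordered per visit (e.g. cuff series, ABPM)
    treat_rate: float               # share of confirmed cases who start treatment


def cascade(notifications, p):
    """Expected counts at each step of the follow-up funnel."""
    visits = notifications * p.attend_rate
    tests = visits * p.confirm_tests_per_visit
    # Assumes true and false positives attend follow-up at the same rate.
    confirmed = visits * p.ppv
    treated = confirmed * p.treat_rate
    return {"visits": visits, "tests": tests, "confirmed": confirmed, "treated": treated}


def fits_capacity(notifications_per_month, p, visit_slots_per_month):
    """True if expected monthly follow-up visits fit in the available slots."""
    return cascade(notifications_per_month, p)["visits"] <= visit_slots_per_month


# Placeholder numbers for illustration only.
pathway = Pathway(ppv=0.6, attend_rate=0.7, confirm_tests_per_visit=1.2, treat_rate=0.5)
print(cascade(1_000, pathway))
print(fits_capacity(1_000, pathway, visit_slots_per_month=500))
```

With these placeholder numbers, 1,000 notifications a month generate about 700 visits against 500 slots, so the threshold would need to rise (or capacity grow) before launch.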

The Way Forward…

Delphi-style sequence models and CoMET-style generative futures show that who, what, and when are learnable from routine data.

That seems to put primary prevention within reach.

But reach is not the same as readiness.

We should treat these as promising signals that now require prospective, workflow-integrated trials with clinical and system endpoints—not just retrospective AUC. If we deploy with sensitivity-first instincts, we’ll convert millions of healthy people into worried patients and jam the system. If we adopt specificity-first, ambient case-finding with cheap confirmation, equity guardrails, capacity modeling, and measured follow-through, we can nudge prevention from promise to practice.

Catch the right cases. At the right time. With a pathway that can actually deliver care.

That is Medicine 3.0 done with discipline.

Where in your own work do you see the greatest opportunity for AI-driven prevention—and the greatest risk of over-medicalization?

PROMPT OF THE WEEK

Professional Headshot

This one comes from Google itself.

Head over to Google AI Studio. (It doesn’t work as well in ChatGPT.)
Click “Try Nano Banana.”

Attach your selfie or any other casual picture.

Copy and paste this prompt and hit Enter.

A professional, high-resolution, profile photo, maintaining the exact facial structure, identity, and key features of the person in the input image. The subject is framed from the chest up, with ample headroom and negative space above their head, ensuring the top of their head is not cropped. The person looks directly at the camera, and the subject's body is also directly facing the camera. They are styled for a professional photo studio shoot, wearing a smart casual blazer. The background is a solid '#141414' neutral studio. Shot from a high angle with bright and airy soft, diffused studio lighting, gently illuminating the face and creating a subtle catchlight in the eyes, conveying a sense of clarity. Captured on an 85mm f/1.8 lens with a shallow depth of field, exquisite focus on the eyes, and beautiful, soft bokeh. Observe crisp detail on the fabric texture of the blazer, individual strands of hair, and natural, realistic skin texture. The atmosphere exudes confidence, professionalism, and approachability. Clean and bright cinematic color grading with subtle warmth and balanced tones, ensuring a polished and contemporary feel.

Source: Google