When "Detecting Everything" Misses the Point
In the world of medical diagnostics, the included meme captures a crucial yet often overlooked reality. We've all heard the hype about revolutionary tests – a single blood test to detect 50 different cancers, for example – that promise to catch illnesses early and save lives. It's the stuff of headlines and investor dreams. But looking under the hood of these promising diagnostics reveals a more complex story about what makes a medical test truly valuable.
To understand why a test that sounds amazing might not live up to the hype, we need to talk about three key concepts in diagnostics: sensitivity, specificity, and the often-overlooked positive predictive value (PPV). These three statistics play out much like characters in a film - sensitivity and specificity have long been the celebrated heroes of news articles and marketing brochures. PPV, on the other hand, is the underappreciated sidekick that actually determines whether our heroes' actions do more good than harm.
Sensitivity & Specificity – The Dynamic Duo (with a Catch)
In the realm of medical tests, sensitivity and specificity have been the dynamic duo that every clinician and medical student learns about early in their training. Think of sensitivity as a smoke alarm's ability to detect any hint of fire - a highly sensitive test catches almost everyone who is sick, rarely missing cases. When a test has 95% sensitivity, it means catching 95 out of 100 patients who actually have the disease. It's like having a guard who never sleeps, always on watch for any sign of trouble.
Specificity, on the other hand, is like having a guard dog that only barks when there's a real intruder - it's about correctly identifying those who don't have the disease. A test with 99% specificity means correctly giving a clean bill of health to 99 out of 100 healthy people, avoiding false alarms.
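If you like seeing the arithmetic, here is a tiny sketch of how those two numbers fall out of a test's raw results; the counts below are invented purely for illustration.

```python
# Hypothetical confusion-matrix counts, invented to illustrate the two formulas
true_positives = 95    # sick people the test correctly flags
false_negatives = 5    # sick people the test misses
true_negatives = 990   # healthy people correctly given the all-clear
false_positives = 10   # healthy people wrongly flagged

sensitivity = true_positives / (true_positives + false_negatives)   # 0.95 -> "95% sensitivity"
specificity = true_negatives / (true_negatives + false_positives)   # 0.99 -> "99% specificity"

print(f"Sensitivity: {sensitivity:.0%}, Specificity: {specificity:.0%}")
```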
Historically, the medical world has obsessed over these two metrics. The reasoning seems sound: we want tests that cast a wide net to catch every possible case (high sensitivity) while avoiding unnecessary scares for healthy folks (high specificity). The dream is to have both at 100% - never missing a sick person, never misidentifying a healthy one. But in reality, there's often a trade-off. Make a test super sensitive, and you might start flagging some healthy folks as positive. Make it super specific, and you risk missing some real cases.
But this dynamic duo, impressive as it sounds, only tells part of the story. Behind the celebrated heroes of the medical literature and the marketing materials stands that underappreciated sidekick - the Positive Predictive Value (PPV) - the metric that actually determines whether the heroics help patients or quietly cause harm.
Positive Predictive Value – The Unsung Hero (or Party Pooper)
PPV answers a deceptively simple question: 'When the alarm goes off, how often is there really a fire?' In medical terms, it tells us how much we should trust a positive test result. While sensitivity and specificity are like a test's factory specifications, fixed properties that don't change, PPV changes based on how common the condition is in the population being tested.
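To make that dependence concrete, here is a minimal sketch of the standard PPV calculation; the `ppv` helper and the prevalence figures are mine, invented to show how the same test behaves in different populations.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Chance that a positive result is a true positive, for a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test (99% sensitive, 99% specific), three different populations
print(f"{ppv(0.99, 0.99, 0.10):.0%}")   # ~92% when 1 in 10 of those tested has the disease
print(f"{ppv(0.99, 0.99, 0.01):.0%}")   # ~50% at 1% prevalence
print(f"{ppv(0.99, 0.99, 0.001):.0%}")  # ~9% at 0.1% prevalence
```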
To understand why this matters, imagine a superhero detector that is 99% sensitive and 99% specific at identifying aliens living among us. That sounds impressive, but if only 1 in a million people is actually an alien, screening a million people would flag roughly 10,000 'positives' - and essentially all of them would be false alarms, because true aliens are so scarce that even a 1% error rate among the humans swamps the lone real detection.
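Spelling out the alien arithmetic (the one-in-a-million prevalence is, of course, invented for the thought experiment):

```python
population = 1_000_000
aliens = 1                                # 1-in-a-million prevalence, made up for the example
sensitivity, specificity = 0.99, 0.99

true_hits = aliens * sensitivity                           # ~1 real alien flagged
false_alarms = (population - aliens) * (1 - specificity)   # ~10,000 humans flagged by mistake
ppv = true_hits / (true_hits + false_alarms)

print(f"Total positives: ~{true_hits + false_alarms:,.0f}")     # ~10,001
print(f"Chance a positive is really an alien: {ppv:.3%}")       # ~0.010%
```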
Translate this to medicine: a test with 99% sensitivity and specificity might sound perfect on paper, but if the condition is extremely rare, most positive results could still be wrong. So why has PPV historically been overlooked? One reason is that it's trickier to measure in early studies. Companies can easily showcase sensitivity and specificity in controlled trials, but PPV depends heavily on who you're testing.
This reality has profound implications. If we don't ask 'How many of those positives are real?', we might be impressed by a test's ability to catch cases while ignoring that it also flags too many healthy people. In short, while sensitivity tells us how good a test is at not missing illness, and specificity tells us how good it is at not false-alarming healthy folks, PPV tells us whether we should believe a positive result when we see one. And ultimately, for a patient and their doctor, that's what matters most.
Catch-22: When Finding Everything Finds Too Much
To see how prioritizing sensitivity and specificity over PPV can backfire, let's look at a high-profile real-world example: GRAIL's Galleri test, often hyped as the 'holy grail' of cancer screening. Backed by major players like Illumina, this blood test promises to detect 51.5% of cancers overall with a reported specificity of 99%. On paper, that means only ~1% false positives, which sounds terrific.
But here's where PPV creates the plot twist: cancer is relatively rare in the general screening population. In GRAIL's large study, called PATHFINDER, the test was used on ~6,600 people aged 50+ who were at higher risk for cancer. The reality check? Cancer turned up in only about 1% of participants – put another way, 99% of the people tested didn't have cancer. Even with the test's impressive 99% specificity, that low prevalence produced a sobering outcome: if you got a positive result, there was only about a 38% chance it was actually cancer. Think about that – more than 6 in 10 positive results were false alarms.
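A back-of-envelope check with rounded inputs near the reported figures (illustrative only, not the study's exact numbers) lands in the same ballpark:

```python
# Rounded, illustrative inputs: ~51.5% sensitivity, ~99% specificity, ~1% prevalence
sensitivity, specificity, prevalence = 0.515, 0.99, 0.01

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
print(f"PPV = {true_pos / (true_pos + false_pos):.0%}")   # ~34%: roughly a third of positives are real
```

That rough third is in the same neighborhood as the ~38% the study actually observed.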
This isn't to say GRAIL's test is worthless – it did find some dangerous cancers at treatable stages that might have been missed otherwise. In fact, it roughly doubled the number of cancers found compared to standard screening. But it's a classic example of the sensitivity trap: to catch those extra cancers, the test cast such a wide net that it also ensnared many healthy people in false scares.
Interestingly, context matters significantly. When the same test was used in the UK on people who already had cancer symptoms, the PPV jumped to around 75%. Why? Because cancer was much more common in that group. But for screening the general population, where most people are healthy, PPV becomes the Achilles' heel.
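The same arithmetic shows why. Hold the test's performance fixed and change only how common cancer is in the group being tested; the 7% figure below is an assumed, illustrative prevalence for a symptomatic clinic population, not a number taken from the UK study.

```python
def ppv(sens, spec, prev):
    # Same standard formula as before, repeated so this snippet runs on its own
    true_pos, false_pos = sens * prev, (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

sens, spec = 0.515, 0.99   # same rounded test performance as the screening example
print(f"Screening population (~1% prevalence):  PPV = {ppv(sens, spec, 0.01):.0%}")   # ~34%
print(f"Symptomatic patients (~7% prevalence):  PPV = {ppv(sens, spec, 0.07):.0%}")   # ~80%
```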
GRAIL isn't alone in facing the PPV challenge. Exact Sciences, the company behind the Cologuard at-home colon cancer test, recently entered the same arena with a multi-cancer blood screening test of its own. In 2024, the company announced results that seemed to mirror GRAIL's: about 50.9% sensitivity at 98.5% specificity. The numbers sounded impressive, especially the reported higher sensitivity (~64%) for aggressive cancers like pancreatic and lung cancer.
Let's run a thought experiment to see how PPV changes the story: Imagine using this test on 10,000 average-risk people. If about 1% have an asymptomatic cancer (consistent with most studies), that's 100 people with cancer. The test, with its ~51% sensitivity, might catch about 51 of those cancers. Meanwhile, with a 1.5% false positive rate, about 148 of the 9,900 cancer-free people would wrongly test positive. Do the math, and you'll find that three out of four positive results are false alarms. Even if we doubled the cancer rate in the population, we'd still have more false positives than true positives.
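Here is that thought experiment spelled out, using the same rounded inputs as the prose:

```python
population = 10_000
prevalence = 0.01            # ~1% with an asymptomatic cancer
sensitivity = 0.51           # rounded from the reported ~50.9%
false_positive_rate = 0.015  # 100% minus the reported 98.5% specificity

with_cancer = population * prevalence                                  # 100 people
true_positives = with_cancer * sensitivity                             # ~51 cancers caught
false_positives = (population - with_cancer) * false_positive_rate     # ~148 false alarms

ppv = true_positives / (true_positives + false_positives)
print(f"True positives: ~{true_positives:.0f}, false positives: ~{false_positives:.0f}")
print(f"PPV = {ppv:.0%}  (about 3 in 4 positives are false alarms)")
```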
No one wants a test that sends thousands of people for unnecessary colonoscopies, scans, or anxiety-inducing oncologist visits when only a fraction truly have cancer. To their credit, both GRAIL and Exact Sciences recognize this challenge and are working on improving specificity and focusing on higher-risk groups where PPV would naturally be better.
But their experiences underscore a crucial point: in the race to detect as many cancers as possible, we can't outrun the reality check of predictive value. A 'positive' result needs to actually mean something reliable for a test to be useful at scale.
The Alarm That Cried Wolf - Why PPV Problems Feel So Familiar
To understand why we're in this predicament, we need to recognize a crucial shift in medical testing. Historically, tests were primarily used to confirm suspicions in sick people - think testing for strep throat when you have a fever and sore throat. In that context, the traditional benchmarks for sensitivity and specificity made sense because the disease was already likely present.
But today's healthcare market is racing toward broad screening tests - trying to catch diseases in seemingly healthy people. Companies are marketing diagnostic tests originally designed for confirmation as screening tools for the general population. It's like taking a security system designed for a high-crime area and installing it in every home in a peaceful suburb - the false alarm rate that was acceptable in one context becomes problematic in another.
Let's consider the real impact of these false alarms. When Galleri gives a false positive, it's not just about anxiety or inconvenience. Each positive result typically leads to a cascade of follow-up tests: CT scans (radiation exposure), invasive biopsies, or other procedures that carry their own risks. In a cruel irony, a test meant to protect health might actually increase health risks for many people. If we run our earlier numbers again: for every 10,000 people tested, about 148 healthy people might undergo unnecessary procedures, while only 51 true cancers are found. This means nearly three people face unnecessary risks and procedures for every one person who genuinely benefits.
This reality highlights a critical need: we must evolve our standards for what makes a 'good' screening test. The benchmarks that worked for diagnostic confirmation simply aren't stringent enough for population screening. For a screening test to be truly valuable in a low-prevalence population, specificity in particular needs to be dramatically higher than our traditional standards - think 99.9% rather than 99% - before the PPV climbs high enough to make the test more helpful than harmful.
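As a rough illustration of what tighter specificity buys, here is the same assumed screening scenario as before (1% prevalence, ~51% sensitivity) with only the specificity dialed up:

```python
def ppv(sens, spec, prev):
    # Standard PPV formula, repeated so this snippet runs on its own
    true_pos, false_pos = sens * prev, (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

sens, prev = 0.51, 0.01   # assumed sensitivity and prevalence, as in the earlier example
for spec in (0.99, 0.995, 0.999):
    print(f"Specificity {spec:.1%}: PPV = {ppv(sens, spec, prev):.0%}")   # ~34%, ~51%, ~84%
```

Even at 99.9% specificity, roughly one in six positives in this scenario is still a false alarm, which is why pairing better assays with higher-risk populations matters as much as tightening the lab numbers.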
I wish this understanding were intuitive to everyone - patients, investors, healthcare providers, and companies alike. Imagine if every news article about a breakthrough test asked not just 'How many cases can it catch?' but 'How many false alarms will it create?' Imagine if investors demanded PPV projections in real-world populations before getting excited about sensitivity numbers. Imagine if patients knew to ask 'If this test comes back positive, what are the chances it's real?'
Because in the end, this isn't just about statistics - it's about people. Every false positive is a real person facing real anxiety, real procedures, and real risks. Every unnecessary follow-up test costs not just money but peace of mind. As we race toward the future of medical testing, we need to ensure our enthusiasm for detecting everything doesn't lead us to miss the point: tests should help more than they harm.
Back to the Pop Culture Wisdom
"One does not simply" trust sensitivity and specificity alone when judging a test. Like a movie that seems amazing in trailers but disappoints in theaters, a test that dazzles with high sensitivity might not deliver in the real world if its PPV isn't up to par. When the next breakthrough test makes headlines, pause and ask the crucial question: "When this test says something is wrong, how often is it right?" It's not about dampening innovation or progress - it's about understanding what makes a test truly valuable in practice.
The path forward is clear, though challenging. Companies developing new tests need to demonstrate not just their ability to detect disease, but their reliability in real-world populations where diseases are rare. More importantly, they need to show that their tests do more good than harm - that the benefits of early detection outweigh the risks and anxiety of false positives.
For patients and healthcare providers, this means looking beyond the impressive-sounding percentages in marketing materials. For investors and companies, it means thinking harder about the actual utility of tests in their intended populations. And for all of us, it means understanding that in medicine, as in life, if something sounds too good to be true, it probably is.
After all, the goal in medical testing isn't just to detect signal from noise - it's to improve people's lives. The most valuable tests aren't necessarily those that detect everything, but those that give us reliable information we can act on with confidence. In the end, that's the real holy grail we should be seeking.