The Average Patient Doesn't Exist: Why the Same Intervention Works Differently for Everyone

A drug that reduces blood pressure by 10 mmHg on average might lower yours by 20 and your colleague's by zero. The reasons are specific, measurable, and more common than most health advice acknowledges.

The same meals, the same lab, the same week: completely different glucose responses

In 2015, a research team at the Weizmann Institute in Israel recruited 800 people and continuously monitored their blood glucose for a week, during which participants ate a set of identical standardized meals alongside their usual diet. The researchers also collected detailed dietary logs, gut microbiome samples, blood tests, and activity data.

The same meal (the same bread, the same banana, the same glucose load) produced wildly different glucose responses in different people. Some participants' glucose spiked dramatically after eating a banana and barely responded to sushi. For others, the response was exactly reversed. One participant's blood glucose spiked sharply after both a cookie and white rice; another's barely moved with cookies but spiked with rice. Knowing how one person responded to a food said little about how the next person would.

The researchers built a personalized prediction model using gut microbiome composition that predicted individual glucose responses far better than standard carbohydrate counts. They then validated it in a separate group and used it to design personalized diets. People eating diets "predicted good" for their microbiome had significantly better glycemic outcomes than people eating diets predicted good for someone else's microbiome.

The study's central point was not simply that personalized nutrition can beat generic nutrition. It was that the "glycemic index," the idea that a given food raises glucose by a predictable amount, breaks down at the individual level. Population averages are real. Predicting an individual from a population average often fails.

The mechanisms behind individual variation

This is not a theoretical problem. The mechanisms are well-characterized.

Pharmacogenomics. Genetic variants in drug-metabolizing enzymes, particularly the CYP450 family, determine how quickly your body processes medications and many nutritional compounds. CYP2C19 variants affect how quickly you metabolize clopidogrel (a blood thinner), proton pump inhibitors, and several antidepressants. People who carry a reduced-function CYP2C19 variant, roughly 30% of the population, don't convert clopidogrel to its active form efficiently, so the standard dose doesn't work as intended. Poor metabolizers of certain antidepressants accumulate drug concentrations two to five times higher than extensive metabolizers at the same dose. Caffeine metabolism is largely determined by CYP1A2 variants: fast metabolizers may get a different cardiovascular and performance response to caffeine than slow metabolizers. These are not edge cases. They are common variants with large effects.

Gut microbiome composition. The Weizmann Institute study above is the clearest demonstration, but the principle extends broadly. Your gut microbiome — roughly 100 trillion bacteria comprising thousands of species, with composition as unique as a fingerprint — influences nutrient absorption, bile acid metabolism, short-chain fatty acid production, immune signaling, and neurotransmitter synthesis. Omega-3 conversion rates from ALA to EPA/DHA vary significantly with microbiome composition. The therapeutic response to metformin (the most common diabetes drug) is partly mediated by gut bacteria. Polyphenol bioavailability — the extent to which you actually absorb and use compounds in berries, green tea, and red wine — varies dramatically with microbiome composition. Two people eating the same diet may extract meaningfully different nutritional value from it.

Chronotype. Your circadian phase — the timing of your sleep-wake cycle, body temperature rhythm, and hormonal patterns — is substantially genetically determined, with ~50% heritability. The standard morning-to-evening circadian range across the population is roughly six hours: the earliest chronotypes have a circadian nadir (lowest point) around 2–4 AM; the latest have it around 8–10 AM. This is not a preference. It is a biological reality with broad downstream effects. Exercise at the same clock time may occur at a completely different circadian phase for two people with different chronotypes. Cognitive performance tests administered at 9 AM may catch one person at their peak and another in the equivalent of the middle of the night. The finding that "morning exercise improves cardiovascular outcomes" may mean different things for different chronotypes, because the circadian phase at the time of exercise — not the clock time — is what drives many of the biological effects.
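To make the clock-time versus circadian-phase distinction concrete, here is a minimal sketch. It assumes circadian phase can be summarized as "hours since the circadian nadir," which is a simplification, and it uses 3 AM and 9 AM as illustrative nadirs (the midpoints of the early and late ranges above). The function name is hypothetical.

```python
def hours_since_nadir(clock_hour, nadir_hour):
    """Hours elapsed since the most recent circadian nadir, on a 24-hour clock."""
    return (clock_hour - nadir_hour) % 24


# Illustrative nadirs: midpoints of the early (2-4 AM) and late (8-10 AM) ranges.
chronotypes = {"early": 3.0, "late": 9.0}

for name, nadir in chronotypes.items():
    phase = hours_since_nadir(clock_hour=9.0, nadir_hour=nadir)
    print(f"{name} chronotype at 9 AM: {phase:.0f} h past circadian nadir")
```

Under these assumptions, a 9 AM test lands six hours past the nadir for the early chronotype and right at the nadir for the late one: same clock time, different biology.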

Sleep architecture. Even within the same sleep duration, individuals differ substantially in how much time they spend in slow-wave sleep (SWS), REM sleep, and light sleep — and what drives these differences is partly genetic, partly environmental, and substantially individual. SWS is the stage most associated with physical recovery and growth hormone release. REM is most associated with memory consolidation and emotional regulation. Two people sleeping the same duration may have radically different restorative quality based on their sleep architecture, which is invisible without polysomnography or an accurate wearable. An intervention that increases total sleep time does not necessarily increase SWS in proportion. An individual who responds to magnesium glycinate with a substantial increase in SWS may get a much larger benefit than one who doesn't — and neither you nor the clinical trial can know which category you fall into without measurement.

Hormonal baseline and sensitivity. Testosterone, cortisol, insulin, leptin, ghrelin — baseline levels and receptor sensitivity vary substantially across individuals. Resistance training increases testosterone acutely in most people, but the magnitude varies by roughly 300% across individuals. Cortisol response to the same stressor varies by a factor of 10 between the lowest and highest responders. This is not noise. It means interventions targeting hormonal pathways — any supplement, diet, or training approach that works through these mechanisms — will produce systematically different outcomes for different baseline physiologies.

Micronutrient status. Baseline deficiency determines response. Magnesium supplementation improves sleep quality most in people with low magnesium status; the effect is negligible in people with adequate status. Vitamin D supplementation improves mood and energy most in people with serum levels below 20 ng/mL; the effect in people with sufficient levels is inconsistent. Iron supplementation improves endurance performance most in people who are iron-deficient; in people with normal iron status, the same dose has no performance effect and may be mildly harmful. The population average effect of supplementation conflates two groups — deficient and non-deficient — that are having completely different biological responses.
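The arithmetic behind that conflation is simple enough to sketch. The numbers below are invented for illustration (a 30% deficient share, a 10-point benefit in deficient people, no benefit in everyone else) and are not taken from any trial.

```python
def population_average_effect(groups):
    """Weighted mean effect across subgroups.

    groups: list of (share_of_population, effect_in_that_group) pairs.
    Shares are expected to sum to 1.
    """
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(share * effect for share, effect in groups)


# Hypothetical numbers, for illustration only.
groups = [
    (0.30, 10.0),  # deficient: large benefit
    (0.70, 0.0),   # not deficient: no benefit
]
average = population_average_effect(groups)
print(f"population average effect: {average:.1f}")
```

With these made-up inputs the reported average is a modest 3 points, an effect size that describes nobody in either group.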

What this looks like in practice

The gap between the average and the individual is routinely visible in clinical practice but rarely discussed explicitly.

Statins and LDL. The standard statin trial result (statins reduce LDL by 30–50% on average) is real. What is less often communicated is that approximately 5–10% of statin users report muscle pain or weakness, sometimes severe enough to discontinue the drug. The genetic variant most associated with statin myopathy (in the SLCO1B1 gene) can be identified with a genetic test, but the test is rarely ordered routinely. The 90–95% who tolerate statins well and the 5–10% who don't are receiving the same prescription based on the same average.

Antidepressants and response. Meta-analyses suggest that antidepressants produce a statistically significant reduction in depression scores versus placebo on average. What the average conceals: approximately one-third of patients respond fully to the first prescribed antidepressant, one-third respond partially, and one-third don't respond and must try another. There is currently no validated way to predict which category you will fall into without trying the drug. The current standard of care is sequential trial and error across different drugs and classes — an empirical n=1 approach by necessity.

Exercise training response. The HERITAGE Family Study, which put 742 adults through the same standardized endurance training program for 20 weeks, produced an extraordinary range of VO2max responses. The average improvement was about 400 mL/min. But the distribution ranged from −185 mL/min (some individuals got worse) to +1,100 mL/min. Roughly 26% of participants were classified as "low responders" who showed minimal improvement despite full compliance with training. Training response clustered within families, which points to a substantial genetic contribution. For a specific individual, the average VO2max improvement from an exercise program is nearly useless as a prediction of their personal response.

Dietary interventions. The average weight loss response to a low-carbohydrate versus low-fat diet is roughly equivalent over 12 months in most head-to-head trials. But the individual variation within each diet arm is enormous: some people lose 20+ kg on low-carb while others gain weight, and the low-fat arm shows the same spread. Christopher Gardner's DIETFITS trial (n=609) found that neither genotype nor insulin secretion status, the two most plausible predictors, reliably predicted which diet a given person would respond better to. The best predictor was early response: how you did in the first month predicted how you did over the year.

The implication that most health advice ignores

Most health advice is written as if the average effect applies to you. It doesn't — not because the research is wrong, but because you are not the average.

This is not a reason to ignore population research. It is a reason to treat it as a prior — a starting probability that updates your belief about whether an intervention is likely to work for you — rather than a prescription.
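One way to picture "treat it as a prior" is a two-hypothesis Bayes update. This is a toy sketch, not a validated model: it assumes you are either a "responder" or a "non-responder," that the population research gives you a starting probability of being a responder, and that each paired comparison (intervention day versus control day) is an independent coin flip that favors the intervention 75% of the time if you respond and 50% of the time if you don't. All three numbers are assumptions chosen for illustration.

```python
from math import comb


def posterior_responder(prior, wins, trials,
                        p_win_responder=0.75, p_win_nonresponder=0.5):
    """Posterior probability of being a responder after `wins` out of `trials`
    paired comparisons favored the intervention (simple two-hypothesis Bayes)."""

    def binomial_likelihood(p):
        return comb(trials, wins) * p**wins * (1 - p) ** (trials - wins)

    evidence_responder = prior * binomial_likelihood(p_win_responder)
    evidence_nonresponder = (1 - prior) * binomial_likelihood(p_win_nonresponder)
    return evidence_responder / (evidence_responder + evidence_nonresponder)


# Hypothetical prior of 0.6 taken from population research.
for wins in (5, 7, 9):
    post = posterior_responder(prior=0.6, wins=wins, trials=10)
    print(f"{wins}/10 comparisons favored the intervention -> P(responder) = {post:.2f}")
```

The point of the sketch is the direction of travel: the study sets where you start, and your own measurements decide where you end up.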

The population research on sleep and cognitive performance is strong. The average effect of going from 6 to 8 hours of sleep on cognitive performance is substantial. But some people function well on 6 hours (genuinely, not just tolerating the impairment), and some function poorly even on 8 and need more. Without measuring your own cognitive performance across different sleep durations, you don't know which you are.

The same logic applies to nearly every behavioral intervention in the wellness space. The average effect of a post-lunch walk on afternoon alertness is positive. Whether it's positive for you, and whether it's as positive as a 20-minute nap, or a cold shower, or 200mg of caffeine — you cannot know from population data. The mechanisms are real. Your individual response is an empirical question.

This is why the model of "read a study, implement the finding, hope for the best" is systematically less useful than the model of "read the study, estimate the probability it applies to you, test it, measure your response, update." The first model extracts a recommendation from research designed to answer a different question. The second model uses research for what it's actually good for: narrowing the space of things worth testing.

How to account for individual variation in self-experiments

Measure the right outcomes for you. Population research often uses outcomes optimized for detection across large samples: standardized cognitive tests, laboratory biomarkers, validated questionnaires. These are chosen because they have known properties in populations, not because they're the most sensitive to your specific physiology. If you're testing caffeine timing and you're the kind of person who doesn't clearly feel cognitive impairment, a measured reaction time will tell you more than a subjective alertness rating. If you're testing sleep duration, measuring HRV the next morning may be more sensitive than a sleep quality rating.

Run enough trials to average out noise. Individual variation isn't only between people; it's within you over time. Your glucose response to the same food varies by 20–30% depending on sleep, stress, prior meals, exercise, and gut microbiome state. Your cognitive performance varies hour to hour and day to day. A single measurement is nearly useless. Ten measurements in each condition give you something you can start to trust.
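Why ten rather than one? A rough sketch using the standard error of a difference between two means shows how quickly noise shrinks. It assumes independent measurements and the same day-to-day standard deviation in both conditions; the value of 25 (say, milliseconds of reaction time) is hypothetical.

```python
from math import sqrt


def standard_error_of_difference(day_to_day_sd, n_per_condition):
    """Standard error of (mean of condition A - mean of condition B),
    assuming independent measurements and equal SD in both conditions."""
    if n_per_condition < 1:
        raise ValueError("need at least one measurement per condition")
    return day_to_day_sd * sqrt(2.0 / n_per_condition)


# Hypothetical day-to-day SD of 25 units.
for n in (1, 5, 10, 20):
    se = standard_error_of_difference(day_to_day_sd=25.0, n_per_condition=n)
    print(f"n={n:>2} per condition -> standard error of the difference: {se:.1f}")
```

Because the error falls with the square root of the number of trials, going from 1 to 10 measurements per condition cuts it by roughly a factor of three, while going from 10 to 20 buys much less.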

Use active controls, not baselines. The placebo effect is real in self-experiments too. If you know you're eating "healthy" or taking a supplement you believe in, you may feel better independent of the pharmacology. A crossover design, alternating between intervention and control conditions in randomized order, controls for time trends and seasonal effects better than a simple before-after measurement. It only controls for expectation effects if you can also blind yourself to which condition you're in.
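A randomized crossover schedule is easy to generate. The sketch below uses block randomization, which is one reasonable choice among several: each block of two days contains one intervention day and one control day in random order, so the conditions stay balanced over time. The function and condition names are hypothetical.

```python
import random


def crossover_schedule(n_blocks, conditions=("intervention", "control"), seed=None):
    """Return a day-by-day schedule in which every block contains each
    condition exactly once, in a randomly shuffled order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)  # randomize order within this block only
        schedule.extend(block)
    return schedule


schedule = crossover_schedule(n_blocks=10, seed=42)
for day, condition in enumerate(schedule, start=1):
    print(f"day {day:>2}: {condition}")
```

Generating the schedule before you start, and sticking to it, keeps you from unconsciously assigning the intervention to days when you already feel good.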

Track the confounders that matter for you. Sleep, stress, alcohol, and exercise explain a large proportion of day-to-day variation in most outcome metrics. Controlling for these in your analysis — measuring and adjusting for them statistically — lets you see the intervention effect more clearly. If you had a terrible night of sleep on every intervention day and a good night on every control day, you'll see an apparent intervention effect that is actually a sleep effect. Logging three or four confounders consistently turns this from a guessing game into something you can actually adjust for.
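The sleep example can be worked through with invented data. The sketch below uses stratification, the simplest form of adjustment: compare intervention and control only within days that share the same confounder level, then combine. It handles one confounder at a time; adjusting for several at once would usually call for regression, which is not shown here. The record layout and function names are assumptions.

```python
from collections import defaultdict
from statistics import mean


def naive_effect(records):
    """Intervention mean minus control mean, ignoring confounders."""
    by_condition = defaultdict(list)
    for r in records:
        by_condition[r["condition"]].append(r["outcome"])
    return mean(by_condition["intervention"]) - mean(by_condition["control"])


def stratified_effect(records, confounder_key):
    """Size-weighted average of the intervention-minus-control difference
    within each level of one confounder. Strata missing a condition are skipped."""
    strata = defaultdict(lambda: {"intervention": [], "control": []})
    for r in records:
        strata[r[confounder_key]][r["condition"]].append(r["outcome"])
    weighted_sum, total_weight = 0.0, 0
    for groups in strata.values():
        if groups["intervention"] and groups["control"]:
            diff = mean(groups["intervention"]) - mean(groups["control"])
            weight = len(groups["intervention"]) + len(groups["control"])
            weighted_sum += diff * weight
            total_weight += weight
    if total_weight == 0:
        raise ValueError("no stratum contains both conditions")
    return weighted_sum / total_weight


# Invented data: the outcome depends only on sleep (good = 60, bad = 50),
# but intervention days happened to fall mostly on good-sleep days.
records = (
    [{"condition": "intervention", "sleep": "good", "outcome": 60}] * 3
    + [{"condition": "intervention", "sleep": "bad", "outcome": 50}]
    + [{"condition": "control", "sleep": "good", "outcome": 60}]
    + [{"condition": "control", "sleep": "bad", "outcome": 50}] * 3
)
naive = naive_effect(records)
adjusted = stratified_effect(records, "sleep")
print(f"naive effect: {naive:.1f}, sleep-adjusted effect: {adjusted:.1f}")
```

In this constructed example the naive comparison shows a 5-point "benefit" that disappears entirely once good-sleep days are compared with good-sleep days and bad with bad.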

Take your own n=1 data seriously. If a well-designed self-experiment shows that an intervention does nothing for you after 20+ trials, the population average effect is not a reason to conclude you're measuring wrong. You may simply be a non-responder. This is not failure. It is a result — and it's the result that is most useful for making decisions about your own behavior.
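To judge whether a difference in your own data is distinguishable from noise, one assumption-light option is a permutation test: shuffle the condition labels many times and see how often chance alone produces a gap as large as the one you observed. The sketch below is one way to do that, not the only valid analysis, and the reaction-time values are invented.

```python
import random
from statistics import fmean


def permutation_p_value(intervention, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means.
    Returns the share of label shuffles with a gap at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(fmean(intervention) - fmean(control))
    pooled = list(intervention) + list(control)
    n_int = len(intervention)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_int]) - fmean(pooled[n_int:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical reaction times in milliseconds, ten trials per condition.
intervention_ms = [412, 398, 405, 420, 391, 408, 415, 402, 399, 410]
control_ms = [409, 401, 411, 417, 395, 404, 413, 406, 397, 412]
p = permutation_p_value(intervention_ms, control_ms)
print(f"permutation p-value: {p:.3f}")
```

A large p-value after 20+ clean trials is consistent with being a non-responder; it does not prove the effect is exactly zero, only that your data can't distinguish it from noise.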

The average patient doesn't exist. The question was never whether caffeine improves cognitive performance on average. The question was whether it improves yours, at what dose, at what time, given your genetics, your microbiome, your sleep history, and your current stress load. That question has a specific answer. It just isn't in any study.