Wash-in and washout effects: mitigating bias in short term dietary and other trials
BMJ 2025; 389 doi: https://doi.org/10.1136/bmj-2024-082963 (Published 22 April 2025) Cite this as: BMJ 2025;389:e082963
- David S Ludwig, professor and senior consultant1234,
- Walter C Willett, professor4,
- Mary E Putt, professor5
- 1New Balance Foundation Obesity Prevention Center, Boston Children’s Hospital, Boston, MA 02115, USA
- 2Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- 3Steno Diabetes Center Copenhagen, Herlev, Denmark
- 4Department of Nutrition, Harvard TH Chan School of Public Health, Boston, MA, USA
- 5Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Correspondence to: D S Ludwig david.ludwig{at}childrens.harvard.edu
- Accepted 18 March 2025
Summary points
The effects of clinical interventions may take time to develop (wash-in) and dissipate (washout), threatening the validity of short term trials of chronic disease that use surrogate outcomes
Physiological adaptation to a major change in nutrients typically requires several weeks or longer, placing dietary trials at exceptionally high risk of bias
This problem, especially pertaining to crossover trials, has received extensive attention in the pharmacology literature, but it seems commonly disregarded in nutrition and some other areas of non-pharmaceutical research
The duration of wash-in and washout effects should be considered in the design of clinical trials
Diet related diseases, including obesity, diabetes, atherosclerosis, neurodegenerative conditions, and numerous cancers, cause immense economic losses and harm to humans. Despite considerable progress in understanding the health effects of specific dietary factors (eg, trans fatty acids, fiber, added sugar, micronutrients), answers to many basic clinical questions, such as whether a low carbohydrate or low fat diet is better for the management of diabetes, remain elusive. This knowledge gap is attributable in part to poor quality research.12345
The main limitations with animal research (uncertain translatability), observational research (confounding), and behavioral counseling trials (poor adherence) have received considerable attention. To overcome these challenges, nutrition researchers often turn to feeding studies in which participants receive prepared diets, typically in an inpatient setting. Feeding studies provide the opportunity to maintain rigorous control of dietary intake (minimizing non-adherence) and other environmental conditions. Owing to cost and complexity, however, these trials are typically of short duration (diet arms ≤2 weeks). For instance, the Nutrition for Precision Health initiative of the US National Institutes of Health (NIH) plans to enroll 1500 to 2000 community dwelling participants and 500 to 1000 participants in a residential setting for feeding trials with diet arms of about 14 days’ duration (request for applications RFA-RM-21-005, ClinicalTrials.gov NCT05701657). The short term nature of these trials raises concern for wash-in and washout effects.
Treatment wash-in
The purpose of a randomized clinical trial is to compare the effects of an intervention with a control (or other interventions) on outcomes of interest. Often in trials of chronic health conditions, these effects do not manifest immediately but instead require a prolonged wash-in period. Antidepressants, antidiabetes drugs, and statins require one to three months to achieve full effect; trials of shorter duration would underestimate their clinical effectiveness. For drugs such as glucagon-like peptide-1 receptor agonists, adverse effects predominate early, which could result in an unfavorable safety profile in short term trials. Intensive physical fitness training may first decrease exercise tolerance until transient responses (muscular soreness, fatigue) that mask long term effects subside.
Bias from wash-in effects often occurs in short term trials with surrogate measures for chronic conditions. For instance, initially after high dose calcium supplementation, calcium balance becomes positive. Extrapolation of this surrogate measure would imply protection against osteoporosis. However, calcium balance stabilizes after a few months of supplementation (ie, the intervention effect does not continue to accrue) and long term trials confirm little or no benefit for bone fracture.6 When uncertainty exists about the duration of wash-in, repeated measurements can be used to assess the stability of the outcome for each treatment group.
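The stability check described above can be sketched in code. The following Python fragment is a minimal illustration, not a validated statistical method: it fits an ordinary least squares slope to the last few repeated measurements of one treatment group and treats the outcome as stable when that slope falls within a tolerance. The function names, the window size, the tolerance, and the example values are all hypothetical.

```python
from statistics import mean

def ols_slope(times, values):
    """Ordinary least squares slope of values regressed on times."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx

def outcome_is_stable(times, values, window=4, tolerance=0.5):
    """Return True when the slope over the last `window` measurements is
    smaller in magnitude than `tolerance` (outcome units per time unit).
    Both thresholds are illustrative assumptions, not recommendations."""
    if len(times) < window:
        raise ValueError("need at least `window` measurements")
    slope = ols_slope(times[-window:], values[-window:])
    return abs(slope) < tolerance

# Hypothetical group means of a surrogate outcome, measured weekly.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
still_changing = [50, 44, 39, 35, 32, 30, 28, 27]      # still drifting downward
plateaued = [50, 42, 36, 33, 31.2, 31.0, 31.1, 30.9]   # flat over the last month

print(outcome_is_stable(weeks, still_changing))  # False
print(outcome_is_stable(weeks, plateaued))       # True
```

In practice the tolerance would need to reflect the measurement error and the smallest clinically relevant change for the outcome in question, and the check would be applied to each treatment group separately.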
Treatment washout
Short term trials commonly use a crossover design because of enhanced statistical power derived from within participant comparisons and relative feasibility. However, this design compounds the challenge of separating intervention effects from transient influences. In the typical 2×2 crossover trial where participants receive two treatments in two periods, time is required not only for effects in the first period to wash in, but also for those effects to wash out before the second period. Problems arise when the washout period is insufficient or omitted entirely, potentially leading to carryover effects (fig 1).
Fig 1: Types of carryover effects in crossover trials. Panel A: No carryover effect; each treatment has the same absolute and relative effect in periods 1 and 2. Panels B and C: Non-differential (equal) carryover effect that might occur, for example, with learning (increased effect) or boredom (diminished effect). In these examples, the absolute effects of the treatments differ between periods 1 and 2, but the relative difference between them remains the same. Panels D and E: Differential carryover effect that might occur with biological persistence of, or withdrawal from, a drug in period 2. Here, the relative effect of the treatments appears to differ according to period. This form of carryover effect is inseparable from the treatment effect, invalidating the crossover design.
Consider a theoretical crossover trial of a treatment for attention deficit/hyperactivity disorder that compares a drug with placebo for mental focus, with response speed on a computer app as the outcome. Participants might improve in the second period, having developed facility with the app in the first period. Alternatively, they might do worse in the second period because of boredom with the testing procedure. Here, systematic differences by period might occur, but the carryover effect is equal (the same regardless of which came first, drug or placebo), and the treatment effect can be estimated without bias (ceiling or floor effects notwithstanding) (fig 1, panels B and C).
A more concerning type of carryover effect is differential (unequal among interventions). If the biological actions of a drug administered in the first period persist into the second period (and the placebo has no lasting action), the effect of that drug will be underestimated. Differential carryover effects may also cause overestimation of effects if, for example, withdrawal from a drug administered in the first period produces adverse effects in the second period (fig 1, panels D and E).
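The distinction between equal and differential carryover can be made concrete with expected cell means. The sketch below assumes a standard additive model for a 2×2 crossover (a period effect, a direct treatment effect, and a first order carryover from the preceding period). The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def cell_means(mu, period2_shift, effect_a, effect_b, carry_a, carry_b):
    """Expected outcome in each sequence and period of a 2x2 crossover.

    Sequence AB receives A then B; sequence BA receives B then A.
    carry_a / carry_b is the residual effect of A / B in the next period.
    """
    return {
        ("AB", 1): mu + effect_a,
        ("AB", 2): mu + period2_shift + effect_b + carry_a,
        ("BA", 1): mu + effect_b,
        ("BA", 2): mu + period2_shift + effect_a + carry_b,
    }

def crossover_estimate(m):
    """Within participant estimate of A minus B using both periods."""
    diff_ab = m[("AB", 1)] - m[("AB", 2)]  # A minus B, plus period and carryover terms
    diff_ba = m[("BA", 1)] - m[("BA", 2)]  # B minus A, plus period and carryover terms
    return (diff_ab - diff_ba) / 2

def first_period_estimate(m):
    """Between participant estimate of A minus B from period 1 only."""
    return m[("AB", 1)] - m[("BA", 1)]

true_effect = 10.0  # hypothetical true effect of drug A versus placebo B

# Equal carryover (eg, learning): period 2 shifts for everyone, so no bias.
equal = cell_means(100, 5, true_effect, 0, carry_a=3, carry_b=3)
# Differential carryover: the drug persists into period 2, the placebo does not.
persist = cell_means(100, 5, true_effect, 0, carry_a=4, carry_b=0)

print(crossover_estimate(equal))       # 10.0
print(crossover_estimate(persist))     # 8.0, biased by -(carry_a - carry_b) / 2
print(first_period_estimate(persist))  # 10.0
```

Under this model the bias equals minus half the difference in carryover, so persistence of the drug leads to underestimation, whereas a negative `carry_a` (as with a withdrawal effect) would inflate the estimate. The first period comparison stays unbiased but loses the within participant advantage.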
Because differential carryover effects are aliased with, and inseparable from, treatment-by-period effects, no unbiased within participant comparisons can be made—as extensively reviewed in the statistics and pharmacology literature.7891011121314151617181920 Therefore, in the presence of a clinically relevant carryover effect, the crossover design is invalid and the second treatment period must be discarded.713141819 Furthermore, the between participant statistical tests for the presence of carryover effects have low statistical power, meaning that serious bias may be present even with a nominally negative test result.16 For this reason, statisticians have warned that crossover trials should “avoid the problems caused by differential carryover effects at all costs”21 and that the 2×2 design should only be used when investigators can make an informed determination that the washout period will be adequate.711152223 Alternatively, if carryover effects cannot be excluded and a crossover trial is needed for adequate power, a higher order design (ie, with >2 treatment sequences or time periods) can help mitigate bias.1924
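The low power of the between participant carryover test can be illustrated with a normal approximation. This is a rough sketch under stated assumptions, not a substitute for a formal power analysis: outcomes follow an additive model with a participant random effect, the carryover test compares participant totals across the two sequences, the treatment test compares within participant differences, and both are two sided z tests with known variances. All numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(true_difference, standard_error, alpha=0.05):
    """Approximate power of a two sided z test."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(true_difference) / standard_error
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

def crossover_powers(n_per_sequence, sd_between, sd_within,
                     treatment_effect, carryover_difference, alpha=0.05):
    """Power of the treatment test and of the carryover test in a 2x2 crossover."""
    scale = 2 / n_per_sequence  # 1/n + 1/n for two equally sized sequences
    # Within participant differences cancel the participant effect.
    se_treatment = 0.5 * sqrt(2 * sd_within**2 * scale)
    # Participant totals (period 1 + period 2) retain it, doubled.
    se_carryover = sqrt((4 * sd_between**2 + 2 * sd_within**2) * scale)
    return (z_test_power(treatment_effect, se_treatment, alpha),
            z_test_power(carryover_difference, se_carryover, alpha))

# Hypothetical trial: 10 participants per sequence, with a carryover
# difference as large as the treatment effect itself.
power_treatment, power_carryover = crossover_powers(
    n_per_sequence=10, sd_between=10.0, sd_within=5.0,
    treatment_effect=5.0, carryover_difference=5.0)
print(f"treatment test power: {power_treatment:.2f}")
print(f"carryover test power: {power_carryover:.2f}")
```

With these inputs the carryover difference would be detected far less often than the treatment effect, because participant totals carry the between participant variance that within participant differences remove. A non-significant carryover test in a trial of this size therefore says little about whether carryover is absent.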
Bias among interventions in the general medical literature
Conceptually, bias from wash-in and washout could apply to a variety of study designs, including observational studies, if an outcome is measured soon after change in an independent variable. However, these effects will be attenuated in long term cohorts, especially with use of repeated measures. Furthermore, a range of well recognized biases potentially affect observational research, whereas randomized clinical trials may be perceived as unbiased, with less critical appraisal of related methodological pitfalls.
To explore the pervasiveness of washout effects in the general medical literature, we searched PubMed for randomized clinical trials indicating a crossover design and found >1500 articles each year in the past decade. The supplemental table summarizes the most recent 75 randomized clinical trials published in 2024 focused on health outcomes, with variants of the term crossover in the title. Only a small proportion of these trials (∼15%) tested conventional pharmaceuticals. The rest involved diverse interventions (eg, behavioral treatments, physical therapies, procedures, diets and dietary supplements, and devices) for numerous health conditions (eg, obesity, diabetes, cancer, cardiovascular disease, kidney disease, cognitive function, exercise performance, and rare diseases).
At presumably low risk of bias from carryover effects are short interventions (eg, single session) with ostensibly transient biological actions (eg, physiotherapy method), conducted in a blinded fashion, and with relatively long washout periods. Interventions with potentially persistent effects on body structure (eg, body composition), functional state (eg, sleep deprivation, insulin resistance), or knowledge, and relatively short washout periods would seem to be at the high end of the risk spectrum. For numerous studies, the likelihood of bias seems hard to determine owing to limited knowledge about persistence of the intervention effect. Also of note, 13 trials justify their conclusions based on negative test results for carryover effects, despite the well known poor sensitivity of these post hoc tests and expert opinion against this practice.71115162223 (As these trials are predominantly short term, they will also be at risk of wash-in effects, but the heterogeneity of interventions and health conditions precludes a systematic assessment of this issue.)
Physiological basis for bias in nutrition research
Wash-in and washout effects are of special concern in dietary trials. Macronutrient composition—the relative amounts of protein, carbohydrate, and fat consumed—affects the secretion of many hormones (eg, insulin, glucagon, glucagon-like peptide-1, gastric inhibitory polypeptide, ghrelin, growth hormone, insulin-like growth factor-1, cortisol, fibroblast growth factor-21, leptin, and adiponectin). These hormonal responses, in turn, affect myriad biological pathways relating to energy metabolism, growth, reproduction, inflammation, and apoptosis. Dietary composition may also influence these systems through the gut microbiome.
Not surprisingly, considering the profound nature of these biological actions, adaptation to a major change in diet may require several weeks to months before transient effects subside and long term effects can be reliably observed. For instance, serum ketones reach steady state levels two to three weeks after a very low carbohydrate diet is started.25262728 Until then, nitrogen balance tends to be negative25 as the brain transitions from use of glucose (produced in part by gluconeogenesis from amino acids) to ketones as the primary metabolic fuel. Conversely, after switching from a ketogenic diet to a high carbohydrate diet, progressive changes in glucose tolerance can be observed for at least one month.29
Although macronutrient adaptation presently lacks a formal definition, this prolonged multiorgan process can be assessed with numerous biomarkers, as highlighted in table 1 and reviewed elsewhere.3746 The preponderance of evidence indicates that metabolic adaptation cannot be considered complete after just a few days, contradicting a commonly stated rationale for the use of short diet arms and the omission of washout periods in macronutrient focused dietary trials. These data inform suggestions for minimum duration of interventions and washout periods in dietary trials (box 1).
Table 1: Time course of macronutrient adaptation
Box 1: Suggestions to improve causal inference in short term trials
The crossover design:
- Should generally not be used without an adequate washout period, informed by reliable knowledge on the persistence of intervention effects, consistent with research conduct and reporting guidelines23*
- If adequacy of the washout period is uncertain, and a crossover trial must be used, higher order designs (>2 intervention sequences or treatment periods) can help mitigate bias1924
- A washout period may be omitted if the duration of intervention arms is long relative to the wash-in period and only end-of-intervention measures are used (ie, no baseline in the second period)
- A statistician with expertise in clinical trials should help guide the choice of study design and analysis
All trials using surrogate measures for chronic conditions:
- The length of interventions should exceed the wash-in period, as demonstrated by:
  - Reliable existing evidence on the time required for the intervention effects to occur, or
  - Attainment of a stable treatment effect through use of repeated measures (see glycemia example in Prins et al30)
- For outcomes that accrue over time (eg, change in bodyweight), the duration of the intervention must allow sufficient time after the wash-in for hypothesized effects to manifest, especially when transient and longer term changes are opposite in direction
Dietary macronutrient trials:
- Considering extensive available information on the time course of adaptation to a major change in nutrients (table 1), dietary interventions and washout periods should be at least one or two months, and ideally longer, depending upon the outcomes assessed
- To facilitate more reliable trials, logistically feasible alternatives to the inpatient research ward will be needed, which might involve provision of prepared meals on an outpatient basis or collaboration with residential institutions (eg, colleges48 or military facilities) with capacity to prepare and monitor consumption of controlled diets
- Recognizing the conceptual and pragmatic limitations of all clinical trials of chronic diseases that develop over many years, cohort studies with repeated assessments and high follow-up rates will also be needed
*Studies specifically focused on sequence effects in which the carryover effect has clinical relevance (such as the best order in which to conduct two procedures for sleep apnea49) are an exception to this rule.
Pervasive bias in nutrition research literature
Failure to account for this process of adaptation in trial design can produce grossly misleading findings. In a crossover feeding study lacking a washout period with dietary interventions of two weeks’ duration, ad libitum energy intake was 508 kcal/d (1 kcal=4.18 kJ=0.00418 MJ) greater in the ultra-processed diet group versus the unprocessed diet group, leading the investigators to conclude that, “Limiting consumption of ultra-processed foods may be an effective strategy for obesity prevention and treatment.”50 However, energy intake in the ultra-processed diet group decreased progressively by 25 kcal/d each day (P<0.001) throughout the two week period; no such trend was observed for the unprocessed diet group. With extrapolation, the difference between the two diets would cease to exist after another two weeks, providing clear evidence that the wash-in period was insufficient. (In a replication trial, the diet effect weakened after just one week.51) Energy density, the energy content per mass of food, likely accounts for much of this finding. Bell et al52 reported that a 31% increase in energy density increased energy intake by 424 kcal/d over two days—more than enough on a proportionate basis to explain the entire effect of the experimental ultra-processed diet,50 considering that energy density of the solid food was 85% greater. However, the acute effect of energy density on energy intake does not persist, as shown by the lack of meaningful differences in bodyweight or adiposity among long term trials, as previously reviewed.53
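The extrapolation can be checked with simple arithmetic. The sketch below uses only the figures quoted above (a mean difference of 508 kcal/d over 14 days and a decline of 25 kcal/d per day) and adds two assumptions of its own: that the decline is linear, and that the reported mean therefore applies at the midpoint of the period. It also scales the Bell et al result linearly with energy density, which is a simplification.

```python
MEAN_DIFFERENCE = 508.0  # kcal/d, averaged over the diet period (from the text)
DAILY_DECLINE = 25.0     # kcal/d per day (from the text)
PERIOD_DAYS = 14

def days_until_no_difference(mean_diff, decline, period_days):
    """Days after the end of the period until the extrapolated difference is zero,
    assuming a linear decline so that the mean applies at the period midpoint."""
    midpoint_day = (1 + period_days) / 2  # day 7.5 of days 1..14
    diff_on_last_day = mean_diff - decline * (period_days - midpoint_day)
    return diff_on_last_day / decline

extra_days = days_until_no_difference(MEAN_DIFFERENCE, DAILY_DECLINE, PERIOD_DAYS)
print(round(extra_days, 1))  # 13.8, ie, about two more weeks

# Proportionate scaling of the energy density effect reported by Bell et al:
# a 31% increase gave 424 kcal/d; the ultra-processed solid food was 85% denser.
scaled_effect = 424.0 * 85 / 31
print(round(scaled_effect))  # 1163 kcal/d, well above the observed 508 kcal/d
```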
In another crossover feeding study with dietary interventions of two weeks and lacking a washout period, the difference in ad libitum energy intake between a low fat and a low carbohydrate diet (689 kcal/d favoring low fat) also attenuated strongly with time.54 Moreover, the apparent superiority of the low fat diet derived entirely from data in the invalid second treatment period, related to the presence of a massive carryover effect. The time course of change in the apparent diet effect graphically shows this fundamental problem.5556 In week 1, energy intake was similar between the diet groups. In week 2 (after a one week wash-in period of the first diet), a difference of several hundred kcal/d emerged favoring the low carbohydrate diet. In week 3, after the crossover, energy intake decreased abruptly in the low fat group versus the low carbohydrate group, by ∼1800 kcal/d. Then, in week 4 (after a one week wash-in period of the second diet), this huge difference diminished by several hundred kcal/d. If the averaged effect (ie, the reported primary outcome of the trial) were valid, a low fat diet should produce rapid weight loss without conscious energy restriction, especially compared with a low carbohydrate diet. The dramatic increase in prevalence of obesity coinciding with public health recommendations to lower dietary fat in the late 20th century,5758 and numerous subsequent meta-analyses,596061626364 suggest otherwise.
Thus, feeding studies are highly susceptible to wash-in and washout effects because of the short duration of the interventions and the strong contrasts in dietary composition characteristically used. Specifically, the process of metabolic adaptation—such as from a very low fat to a very low carbohydrate diet—may greatly exceed the time frame of the trial. This problem can, however, affect any short term dietary trial, regardless of food preparatory method. (Although beyond the focus of this review, feeding studies can be affected by other methodological problems, as summarized in box 2.)
Box 2: Potential methodological issues with feeding studies, in addition to wash-in and washout effects
- Inconsistent definitions of diets (eg, diets labeled as low carbohydrate with carbohydrate content as much as 45% of total energy)*
- Confounding by uncontrolled dietary factors, such as energy density, fatty acid composition, fiber content, and palatability*
- Confounding by effects of the artificial environment of a metabolic ward on biopsychological state (sleep, stress, mood, and eating behavior)
- Unexamined effect modification by baseline metabolic status of participants (healthy lean versus overweight, insulin resistant)*
*Common to other dietary trial designs.
Indeed, low risk of bias seems to be the exception rather than the rule among recently published dietary crossover trials (table 2). Among 40 such trials of diets with different macronutrient compositions published in the past five years, 24 showed high risk of bias, with a diet intervention or washout period of <2 weeks or evidence of a carryover effect. Nine trials entirely omitted a washout period. Only six trials (15% of the total) were at relatively low risk of bias, with diet intervention and washout periods of ≥4 weeks.
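The screening thresholds described above can be written as a simple rule. The sketch below is illustrative only: the function and parameter names are hypothetical, durations are in weeks, and the label for trials falling between the two stated categories ("intermediate") is an assumption, since the text defines only the high and low risk ends.

```python
def carryover_risk(intervention_weeks, washout_weeks, carryover_evidence=False):
    """Classify a dietary crossover trial using the thresholds quoted in the text."""
    if carryover_evidence or intervention_weeks < 2 or washout_weeks < 2:
        return "high"
    if intervention_weeks >= 4 and washout_weeks >= 4:
        return "low"
    return "intermediate"  # assumed label for everything in between

# Hypothetical trials, not taken from table 2.
print(carryover_risk(2, 0))        # high: washout omitted
print(carryover_risk(6, 4))        # low
print(carryover_risk(3, 2))        # intermediate
print(carryover_risk(8, 8, True))  # high: evidence of carryover
```

A rule of this kind is only a coarse screen. It does not replace judgment about how long the effects of a particular intervention persist for the outcome being measured.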
Table 2: Risk of bias among recent dietary crossover trials
The adaptive process after a change in macronutrients may produce opposing results according to time frame. In a meta-analysis, low fat diets increased energy expenditure slightly for trials of ≤2 weeks, whereas low carbohydrate diets increased energy expenditure more substantially in longer trials.31 In behavioral obesity treatment trials, calorie restricted and low fat diets produce weight loss for about six months, with weight regain characteristic thereafter, highlighting the inherent limitations of short term studies for diseases that develop over decades.58 In contrast with macronutrient quantity, some dietary factors (eg, qualitative macronutrient comparisons such as plant based versus animal based protein source and variation in micronutrient amounts above minimum requirements) can be anticipated to have less pervasive effects on metabolism; for these, short term mechanistically oriented feeding trials could carry less risk of bias.
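How a transient response can reverse the apparent direction of an effect is easiest to see in a toy model. The sketch below is purely illustrative and is not fitted to any trial: it assumes that a transient effect decays exponentially while a long term effect of opposite sign builds up with the same time constant. The function name, the units, and every parameter value are hypothetical.

```python
from math import exp

def observed_effect(week, transient, long_term, adaptation_weeks):
    """Toy model: a transient effect decays while a long term effect builds,
    both with the same exponential time constant (an assumption)."""
    decay = exp(-week / adaptation_weeks)
    return transient * decay + long_term * (1 - decay)

# Hypothetical parameters in arbitrary outcome units.
TRANSIENT, LONG_TERM, ADAPTATION_WEEKS = -2.0, 1.0, 4.0

for week in (1, 2, 4, 8, 12):
    effect = observed_effect(week, TRANSIENT, LONG_TERM, ADAPTATION_WEEKS)
    print(f"week {week:2d}: apparent effect {effect:+.2f}")
```

With these made-up values, a measurement at two weeks has the opposite sign to a measurement at 12 weeks, so a trial ending at two weeks would point in the wrong direction relative to the long term effect.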
Conclusions
The risk of bias in crossover trials is well recognized but seems widely disregarded in nutrition research and potentially across a wide range of interventions in other specialties. Of main concern, carryover effects resulting from an inadequate washout period invalidate the within participant comparisons inherent to the crossover trial and can yield grossly misleading findings. The problem of carryover pertains not only to the interpretation of individual studies but also to meta-analytic knowledge extraction, as “reporting of such a limitation is unlikely to be found given that it would invalidate the trial results.”23
Bias from wash-in effects, including in trials with a parallel design, has received less attention. This problem is likely to arise in short term trials using surrogate measures for chronic disease, when attainment of a stable intervention effect takes time. Although wash-in effects do not violate a fundamental statistical principle in the manner of carryover effects, they can nevertheless yield misleading conclusions.
Short term dietary trials, and especially crossover feeding studies (eg, those in NIH’s Nutrition for Precision Health initiative), are highly susceptible to these methodological pitfalls. Transient adaptive processes after a major change in diet may inflate, deflate, or even reverse the true treatment effect in short term studies of risk factors for, or pathogenesis of, chronic disease. Because of their ability to maintain rigorous control of food intake and other factors, feeding trials seem to confer a level of rigor not possible with other study designs. This rigor is pointless if the biases considered here are not addressed. Other interventions may also be susceptible, to the extent that their effects take time to develop or dissipate. Attention to several suggestions (box 1) can mitigate these commonplace threats to causal inference and prevent the adverse impacts on patient care and public health that may ensue.
Acknowledgments
We thank Jørgen Rungby for critical review of the manuscript.
Footnotes
Contributors: DSL wrote the first draft of the manuscript with input from WCW and MEP. WCW and MEP revised the manuscript for critical intellectual content. All authors approved the final version. DSL acts as guarantor and attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: No specific funding.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; DSL reported receiving royalties from books on nutrition and obesity. The other authors declared no other relationships or activities that could appear to have influenced the submitted work.
Dissemination to participants and related patient and public communities: The authors plan to disseminate the study through conference presentations, conventional media, and social media—including to interest holders involved in the design, conduct, or sponsorship of relevant trials.
Provenance and peer review: Not commissioned; externally peer reviewed.