Nuancing the Interpretation of the Impact of Retracted Trials on the Evidence Ecosystem
Dear Editor,
We read with great interest the VITALITY Study I by Xu et al.[1] The authors provide a valuable large-scale empirical investigation quantifying the downstream cascade of retracted randomized controlled trials (RCTs) through systematic reviews (SRs) and into clinical practice guidelines. While acknowledging its significance, we wish to offer perspectives on the interpretation and broader implications of these findings.
First, evaluating evidence contamination should go beyond simply counting the number of affected SRs or guidelines, as not all outcomes within these documents are of equal clinical importance. The impact of retracted trials on critical outcomes, such as all-cause mortality or major morbidity, should arguably weigh more heavily in contamination assessments than effects on secondary or surrogate endpoints. We suggest that future evaluations of evidence contamination assign importance-based weights to outcomes, allowing contamination to be quantified not just by frequency but by its potential to affect key clinical decisions.
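A minimal sketch of such an importance-weighted contamination metric is shown below. The weights follow the familiar GRADE 1-9 outcome-importance scale (7-9 critical, 4-6 important, 1-3 of limited importance), but the outcomes, weights, and contamination flags are all invented for illustration and are not drawn from the VITALITY data.

```python
# Hypothetical sketch: weight contaminated outcomes by clinical importance
# instead of counting every affected meta-analysis equally.
# Weights follow the GRADE 1-9 importance scale (7-9 = critical,
# 4-6 = important, 1-3 = limited importance). All data are invented.

# Each record: (outcome, GRADE importance weight, contaminated by a retracted trial?)
outcomes = [
    ("all-cause mortality",  9, True),
    ("major bleeding",       8, False),
    ("surrogate biomarker",  3, True),
    ("patient satisfaction", 2, True),
]

n_contaminated = sum(1 for _, _, hit in outcomes if hit)
crude_rate = n_contaminated / len(outcomes)

weighted_hit = sum(w for _, w, hit in outcomes if hit)
weighted_total = sum(w for _, w, _ in outcomes)
weighted_rate = weighted_hit / weighted_total

print(f"Crude contamination rate:    {crude_rate:.2f}")    # 3/4   = 0.75
print(f"Weighted contamination rate: {weighted_rate:.2f}")  # 14/22 = 0.64
```

In this toy example the crude and weighted rates diverge because the only uncontaminated outcome happens to be a critical one; the same logic, applied with panel-assigned weights, would let contamination estimates reflect clinical stakes rather than counts alone.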
Second, we advise caution in interpreting the finding that one retracted trial reaches, on average, 3 SRs and potentially 9 guidelines as a direct amplification of negative impact. This “contamination chain” is moderated by two key factors. At the SR level, although 1330 retracted trials were included in 847 SRs (4095 meta-analyses), excluding these trials changed the direction or statistical significance of the pooled effect in only about 20.6% of cases. Thus, widespread inclusion of retracted trials does not equate to widespread distortion of synthesized evidence. Furthermore, at the guideline level, citation of a contaminated SR does not invariably mean the guideline's recommendations are distorted, given that guideline panels weigh evidence certainty, balance outcomes, and consider clinical context and patient values.[2] In many cases, a single flawed or low-certainty study may not meaningfully influence recommendations.
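To make concrete what a change in direction or significance means in practice, the sketch below re-pools a toy fixed-effect inverse-variance meta-analysis with and without a flagged trial. The log risk ratios, standard errors, and retraction flag are invented for illustration; this is not a re-analysis of the VITALITY data, and the original study's own sensitivity analyses may have used different models.

```python
import math

# Invented study-level log risk ratios and standard errors; not real data.
studies = [
    {"name": "Trial A", "log_rr": -0.25, "se": 0.12, "retracted": False},
    {"name": "Trial B", "log_rr": -0.05, "se": 0.18, "retracted": False},
    {"name": "Trial C", "log_rr": -0.60, "se": 0.10, "retracted": True},
]

def pool_fixed_effect(rows):
    """Inverse-variance fixed-effect pooled log risk ratio with 95% CI."""
    weights = [1 / r["se"] ** 2 for r in rows]
    est = sum(w * r["log_rr"] for w, r in zip(weights, rows)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, est - 1.96 * se, est + 1.96 * se

for label, rows in [("With retracted trial   ", studies),
                    ("Without retracted trial", [r for r in studies if not r["retracted"]])]:
    est, lo, hi = pool_fixed_effect(rows)
    sig = "significant" if hi < 0 or lo > 0 else "not significant"
    print(f"{label}: RR {math.exp(est):.2f} "
          f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f}) -> {sig}")
```

In this contrived case removing the retracted trial moves the pooled estimate from statistically significant to non-significant; the VITALITY finding is that such shifts occurred in only about a fifth of affected meta-analyses.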
Third, although the study quantifies contamination within the evidence synthesis process, the downstream impact on clinical practice remains challenging to assess, owing to the well-documented evidence-to-practice gap.[3] Translating research findings through SRs and guidelines into tangible clinical behavior change (implementation or de-implementation of interventions) is complex and influenced by numerous factors beyond the evidence itself. While identifying contamination in guidelines is a critical first step, directly extrapolating this to widespread adverse clinical practice impact requires caution.
Finally, beyond the potential impact on guideline recommendations, the study highlights a potentially more insidious long-term impact on the research ecosystem itself (pathways 3 and 4 in Figure 3). Misleading findings can misdirect future research efforts and funding, perpetuating flawed lines of inquiry and potentially even creating a feedback loop if guidelines influenced by such research then shape future studies. This underscores the critical need for systemic solutions. Strengthening the integrity of primary research (e.g., trial registration, rigorous peer review, data sharing mandates), enhancing SR methodology (e.g., explicitly requiring and reporting checks for retracted publications, or at a minimum reducing the inclusion of already retracted papers during SR development), and developing rapid, effective mechanisms to flag retracted papers and alert the authors of citing SRs/guidelines and the wider research community are all paramount. This requires collaboration among journals, databases, researchers, guideline developers, and healthcare practitioners. Emerging technologies such as artificial intelligence[4] and the framework of living systematic reviews[5] may offer promising avenues to address the timeliness issue.
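As one illustration of the lightweight tooling SR authors could adopt for such checks, the sketch below queries the public Crossref REST API for editorial update notices attached to a DOI and reports any retractions. It assumes Crossref's `updates` filter and `update-to` metadata behave as documented; the DOI shown is a placeholder, and a production workflow should also consult dedicated resources such as the Retraction Watch database.

```python
import requests  # third-party: pip install requests

CROSSREF = "https://api.crossref.org/works"

def retraction_notices(doi: str) -> list:
    """List Crossref update notices (retractions, errata, ...) that point at `doi`.

    Assumption: Crossref's `updates` filter returns notices whose `update-to`
    metadata references the queried DOI; dedicated retraction databases should
    be checked as well in a real screening pipeline.
    """
    resp = requests.get(CROSSREF, params={"filter": f"updates:{doi}", "rows": 20}, timeout=30)
    resp.raise_for_status()
    notices = []
    for item in resp.json()["message"]["items"]:
        for update in item.get("update-to", []):
            if update.get("DOI", "").lower() == doi.lower():
                notices.append({"type": update.get("type"), "notice_doi": item.get("DOI")})
    return notices

if __name__ == "__main__":
    # Hypothetical usage: screen every DOI on an SR's included-studies list.
    for included_doi in ["10.1000/xyz123"]:  # placeholder DOI, not a real trial
        hits = [n for n in retraction_notices(included_doi) if n["type"] == "retraction"]
        print(included_doi, "-> retraction notice found" if hits else "-> no retraction notice found")
```

Run routinely at protocol, pre-submission, and update stages, and coupled to alerts for the authors of citing SRs and guidelines, this kind of check is the sort of rapid flagging mechanism we argue for above.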
In summary, Xu et al.’s work makes an important contribution by quantifying the propagation of retracted trials and prompting further discussion. However, interpreting the true impact necessitates careful consideration of the complexities within evidence synthesis, guideline development, implementation science, and the research cycle itself. Continued systemic efforts are essential to enhance the integrity and resilience of our evidence ecosystem, and this study provides valuable data to guide such reforms.
Sincerely,
Na He
Department of Pharmacy, Peking University Third Hospital, Beijing, 100191, China
Drug Evaluation Center, Peking University Health Science Center, Beijing, 100191, China
Suodi Zhai
Department of Pharmacy, Peking University Third Hospital, Beijing, 100191, China
Drug Evaluation Center, Peking University Health Science Center, Beijing, 100191, China
Zhiling Zhang
Department of Pharmacy, Beijing Anzhen Hospital, Capital Medical University, Beijing, 100029, China
References:
1. Xu C, Fan S, Tian Y, et al. Investigating the impact of trial retractions on the healthcare evidence ecosystem (VITALITY Study I): retrospective cohort study. BMJ. 2025;389:e082068. doi:10.1136/bmj-2024-082068
2. Guyatt G, Agoritsas T, Brignardello-Petersen R, et al. Core GRADE 1: overview of the Core GRADE approach. BMJ. 2025;389:e081903. doi:10.1136/bmj-2024-081903
3. Morris ZS, Wooding S, Grant J. The answer is 17 years, what is the question: understanding time lags in translational research. J R Soc Med. 2011;104(12):510-520. doi:10.1258/jrsm.2011.110180
4. Luo X, Chen F, Zhu D, et al. Potential roles of large language models in the production of systematic reviews and meta-analyses. J Med Internet Res. 2024;26:e56780. doi:10.2196/56780
5. Elliott JH, Synnot A, Turner T, et al. Living systematic review: 1. Introduction-the why, what, when, and how. J Clin Epidemiol. 2017;91:23-30. doi:10.1016/j.jclinepi.2017.08.010
Competing interests: No competing interests
27 April 2025
Na He
Clinical pharmacist
Suodi Zhai, Zhiling Zhang
Department of Pharmacy, Peking University Third Hospital
49 North Garden Rd.,Haidian District, Beijing, 100191, China