Data dredging, bias, or confounding
BMJ 2002; 325 doi: https://doi.org/10.1136/bmj.325.7378.1437 (Published 21 December 2002) Cite this as: BMJ 2002;325:1437
Informing the public about controversies
Dear Editor,
The editorial by Smith GD and Ebrahim S, though written with seasonal
humour, calls for serious consideration of the two issues it addresses:
validity and public implications (1). First, the grouping of data
dredging, bias, and confounding is as old as epidemiology. It is ingrained
in what my teacher James Lee at the National University of Singapore
called the grand equation of truth. All observations are subject to error:
what we Observe equals the Truth plus or minus error (random error, bias,
and confounding), O = T ± e. In addition, low risks are more difficult to
resolve (do electric fields cause disease?); what makes epidemiological
sense does not necessarily make good public policy (high fertility reduces
breast cancer!); and public health action need not wait for breakthrough
evidence (AIDS prevention preceded the discovery of HIV) (2). Second, to
investigators, controversies are the engine of growth, leading to the
refinement of methods and to better studies that minimize but do not
eradicate errors. To the general public, they are causes of confusion and
dispute. The mechanisms to address the first issue are available, but they
need to be practised, continuously updated, and circulated as research
methods, as guidelines such as CONSORT, and in all learning (3,4). Some
journals do well in this respect and others should be encouraged.
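The relation O = T ± e can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the true value, the size of the systematic error, and the spread of the random error are all hypothetical numbers chosen for the example, and the error is assumed to add to the truth. It shows that averaging more observations shrinks the random part of e, while the systematic part (bias or confounding) stays the same.

```python
import random


def mean_observation(truth, bias, noise_sd, n, seed=0):
    """Average n simulated observations, each O = T + bias + random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n):
        total += truth + bias + rng.gauss(0.0, noise_sd)
    return total / n


TRUTH = 10.0     # hypothetical true value T
BIAS = 0.5       # hypothetical systematic error (bias or confounding)
NOISE_SD = 2.0   # hypothetical spread of the random error

for n in (10, 1000, 100000):
    observed = mean_observation(TRUTH, BIAS, NOISE_SD, n)
    # The gap between observed and truth approaches BIAS, not zero.
    print(n, round(observed - TRUTH, 3))
```

With a large sample the remaining gap between the observed mean and the truth is close to the assumed bias, which is one way of reading the remark that better methods minimize but do not eradicate errors.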
The second issue has not been adequately examined and overlaps
several areas, including research ethics and public communication ethics
(5). Because the latter issue can arise without the former, the
information that reaches the public as news may differ from what a study
shows and may be undesirable. Newscasters avoid ambiguity because the
general public loves rationalization: clear yes-or-no statements that are
easy to understand and to apply to oneself. So the mass media broadcast
that older adults tolerate more alcohol and that breast self-examination
is useless. The days are gone when journals and scientific advances were
the preserve of the profession, and the response should change
accordingly. What should be communicated to the general public, and by
whom? The author and the editor should give a take-home message to the
general public. The present BMJ box, "what is known and what the study
adds," is a good attempt and should be universally adopted by authors and
journals, but it should also include a cautiously crafted, evidence-based
message for the general public. Evidence is an appropriate term in both
science and general usage; in both, it implies probability.
No conflicting interest declared.
Anthony Lwegaba, Lecturer, UWI School of Clinical Medicine and
Research, QE Hosp., Barbados, W.I. Lwegaba@lycos.com.
1. Smith GD, Ebrahim S. Data dredging, bias, or confounding: they can
all get into the BMJ and the Friday papers. BMJ 2002; 325: 1437-8.
2. Savitz DP, Poole C, Miller WC. Reassessing the role of epidemiology in
public health. Am J Pub Health 1999; 89: 1158-61.
3. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised
recommendations for improving the quality of reports of parallel-group
randomised trials. Lancet 2001; 357: 1191-94.
4. Moher D, Jones A, Lepage L. Use of the CONSORT statement and quality of
reports of randomized trials: a comparative before-and-after evaluation. J
Am Med Ass 2001; 285: 2006-7. PMID 11308436 [PubMed].
5. Nelkin D. Scientific journal and public disputes. Lancet 1998; 352 s2:
8-12.
Competing interests:
None declared
Epidemiology needs to be taken seriously
Dear Sir,
In their editorial "Data dredging, bias, or confounding," George Davey
Smith and Shah Ebrahim (1) describe a common problem in epidemiology and
public health. All too often, even experienced epidemiologists cannot
resist using existing (large) data sets for further or new analyses
(often called “fishing expeditions”) that provide them with significant
associations.
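Why a "fishing expedition" so readily yields significant associations can be shown with simple arithmetic. The sketch below assumes independent tests with no true effects, each judged at the conventional 0.05 level; the function names are hypothetical, and the Bonferroni correction is used only as one familiar example of a more stringent threshold, not as the method any particular author recommends.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among n independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the overall error rate at or below alpha."""
    return alpha / n_tests


ALPHA = 0.05  # conventional significance level

for n_tests in (1, 10, 50, 100):
    print(
        n_tests,
        round(familywise_error(ALPHA, n_tests), 3),
        bonferroni_threshold(ALPHA, n_tests),
    )
```

Under these assumptions, ten unplanned comparisons already give roughly a 40% chance of at least one spurious "significant" finding.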
The results of any epidemiological study may reflect, as is often
assumed, a causal relation between exposure and outcome, or they may
reflect only the effects of chance, bias, or confounding. As Smith and
Ebrahim rightly state, the effect of chance (using p < 0.05) is often
underestimated in this kind of data dredging exercise. Following their
suggestion to use more stringent significance levels, the problem might be
reduced or controlled. But what about bias and confounding?
The data sets used for these re-analyses were generally not
designed with the new study question in mind. This can be a problem,
especially when controlling for possible confounding in the new analysis.
Experience shows that it is often very difficult to control for all known
confounders even when a study is systematically planned and all relevant
known variables are included in the data collection. An existing data set,
collected for a study with aims and objectives different from those of the
new data dredging exercise, does not allow any previously uncollected
variable (e.g. a possible confounder) to be included in the study. This
can open the door to strong forms of confounding, which might even reverse
the measured association (e.g. Simpson's paradox) (2).
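The reversal described above can be reproduced with a few counts. The numbers below are entirely hypothetical, invented for illustration and not taken from any study: within each stratum of a confounder the exposed group has the lower risk, yet the crude comparison that ignores the confounder shows the exposed group at higher risk.

```python
# Hypothetical (cases, total) counts per stratum of a confounder.
strata = {
    "low-risk stratum": {"exposed": (10, 100), "unexposed": (150, 1000)},
    "high-risk stratum": {"exposed": (400, 1000), "unexposed": (50, 100)},
}


def risk(cases, total):
    """Proportion of subjects who developed the outcome."""
    return cases / total


def crude_risk(group):
    """Risk for a group after pooling all strata (confounder ignored)."""
    cases = sum(stratum[group][0] for stratum in strata.values())
    total = sum(stratum[group][1] for stratum in strata.values())
    return cases / total


for name, stratum in strata.items():
    # Within each stratum the exposed risk is the lower of the two.
    print(name, risk(*stratum["exposed"]), risk(*stratum["unexposed"]))

# Pooled over strata the direction of the association is reversed.
print("crude", round(crude_risk("exposed"), 3), round(crude_risk("unexposed"), 3))
```

The reversal arises because most exposed subjects sit in the high-risk stratum. If the stratifying variable had never been collected, only the misleading crude comparison would be available.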
This can and will ultimately lead to a loss of public trust in
epidemiology. Therefore, epidemiological research needs to
be taken seriously by those who use the results for decision making, but
also by those who conduct and analyse the studies.
References:
1) Smith GD, Ebrahim S. Data dredging, bias, or confounding. BMJ 2002;
325(7378):1437-8
2) Reintjes R, de Boer A, van Pelt W, Mintjes-de Groot J. Simpson's
paradox: an example from hospital epidemiology. Epidemiology.
2000;11(1):81-3.
Competing interests:
None declared