Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses
BMJ 2010; 340 doi: https://doi.org/10.1136/bmj.c117 (Published 30 March 2010) Cite this as: BMJ 2010;340:c117
All rapid responses
I am troubled by the suggestion by Sun et al that a subgroup effect
consistent with a pre-specified direction will increase the credibility of
a subgroup analysis, and that getting the direction wrong weakens the case
for a real underlying subgroup effect. 1
Freedman has suggested that the concept of clinical equipoise in the
conduct of a study requires that there is uncertainty, not necessarily on
the part of the individual investigator, but within the clinical
community. 2
Greenland argues that the design and analysis of studies may be
biased towards results desired by an investigator. Investigator bias may
arise from many sources and can be a major source of uncertainty about
study effects. 3
While it may be appropriate for an investigator to reveal a bias in
one direction or another, it should not be necessary to accurately predict
what is going to be discovered in any sub-group analysis. If a bias is
revealed or a prediction made, it is important that neither influence the
conduct of the study, nor the validity or credibility of the results.
1. Sun X, Briel M, Walter SD, Guyatt GH. Is a subgroup effect
believable? Updating criteria to evaluate the credibility of subgroup
analyses BMJ 2010; 340: c117
2. Freedman, B. Equipoise and the ethics of clinical research. N Engl
J Med. 1987; 317:141-145.
3. Greenland, S. Accounting for uncertainty about investigator bias:
disclosure is informative: How could disclosure of interests work better
in medicine, epidemiology and public health? J Epidemiol Community Health
2009; 63:593-598
Competing interests:
None declared
A good example of an unbelievable overall result but a believable
subgroup result is given in the latest Royal College of General
Practitioners oral contraception mortality study.1,2
Oral contraceptives are mostly used by young women. The subgroup
analysis for deaths under age 30 found 3 times more deaths in Ever pill
takers compared with Controls.
The widely publicised overall result of a mortality reduction 40
years after the start of the study has to be wrong. The main fault in the
study, apart from not recruiting only new takers, large losses and
switching controls to be takers, is failing to record HRT use in the last
10 years of the study when 3 out of 4 deaths were recorded. Both combined
HRT and combined oral contraceptives contain progestogens and oestrogens
which increase the main causes of death from cancers, vascular diseases
and mental illness.
1 Sun X, Briel M, Walter SD, Guyatt GH.
Is a subgroup effect believable? Updating criteria to evaluate the
credibility of subgroup analyses
BMJ 2010; 340: c117
2 Hannaford PC, Iversen L, Macfarlane TV, et al. Mortality among
contraceptive pill users: cohort evidence from Royal College of General
Practitioners' Oral Contraception Study. BMJ 2010; 340: c927
Competing interests:
None declared
We read with great interest Sun et al’s paper about evaluating the
credibility of subgroup effects. We agree with the third point, “is the
significant subgroup effect independent”, although we would raise one
concern regarding the authors’ final suggestion that in a small sample
size it is acceptable to pre-specify a limited number of important
interactions to be considered.
In analyses of non-randomized interventions it is important to adjust for
all potential confounders since otherwise the estimated intervention
effect may be biased. In an RCT with randomized allocation to
interventions there should be no confounders and it is not typically
necessary to adjust for other factors (although doing so may increase
precision in the estimated intervention effect). However in an RCT, an
analysis concerning differences in intervention effects in a subgroup
should initially use a full model including all interactions between all
possible confounders and that subgroup, because otherwise the interaction
of interest may be confounded. This is well noted by the authors in their
smoking and fracture-type example.
The authors’ suggestion that it is acceptable to pre-specify a limited
number of important interactions to be considered is perhaps analogous
to, and as valid as, only including a pre-defined subset of possible
confounders in an analysis of observational data. It would be better to
create a full model and use a standard model selection process, such as
change in estimate or model fit, to create a parsimonious model. If the
sample size is too small to permit detailed consideration of all possible
confounders to a subgroup effect, it may be more appropriate not to
attempt to identify subgroup effects.
In our own work we have found a simple rule of thumb to be helpful in
judging whether subgroup effects might be confounded. If the coefficient
for the subgroup interaction of interest is very similar to the
difference between the coefficients from subgroup specific (i.e.
stratified) analysis, then the model may be adequate and the subgroup
difference inferred credible. Conversely, if there is a discrepancy it
suggests that the model is misspecified, additional interaction terms are
needed and the subgroup difference inferred may not be credible. Sun et
al’s paper provides a very helpful way forward for assessing the
credibility of subgroup analyses, which need not be confined to
randomized controlled trials and could be further improved by being both
more rigorous and more pragmatic.
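The stratified half of the rule of thumb above can be sketched in a few
lines. This is only an illustration under stated assumptions: the outcome
is binary, effects are measured as log odds ratios, the counts are made up
for the example, and the function names and the tolerance of 0.1 are
hypothetical choices, not values given in the response. The interaction
coefficient itself would come from whatever adjusted model is being
checked.

```python
from math import log

def log_odds_ratio(events_treated, n_treated, events_control, n_control):
    """Log odds ratio for treatment versus control within one stratum."""
    odds_treated = events_treated / (n_treated - events_treated)
    odds_control = events_control / (n_control - events_control)
    return log(odds_treated / odds_control)

def stratified_difference(stratum_a, stratum_b):
    """Difference between the stratum-specific log odds ratios."""
    return log_odds_ratio(*stratum_a) - log_odds_ratio(*stratum_b)

def roughly_agree(interaction_coef, stratified_diff, tolerance=0.1):
    """True if a model's interaction coefficient is close to the stratified
    difference. The tolerance is an arbitrary, hypothetical threshold."""
    return abs(interaction_coef - stratified_diff) <= tolerance

# Hypothetical counts: (events_treated, n_treated, events_control, n_control)
smokers = (20, 100, 40, 100)
nonsmokers = (30, 100, 40, 100)

diff = stratified_difference(smokers, nonsmokers)
print(f"stratified difference in log odds ratios: {diff:.3f}")
```

In a model containing only treatment, subgroup and their interaction, the
interaction coefficient equals this difference exactly, so a clear gap
between the two can only arise once other covariates are in the model.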
Competing interests:
None declared
Sun et al point up the distinction between the mathematical process
by which statistically-significant interactions are identified and the
interpretational process by which these interactions are placed in the
context of our wider understanding. As Sun et al make clear, the seeming
unambiguity implied by the mathematical process is not tenable:
interpretation must take into account a number of additional, sometimes
ill-defined, issues. Below, I suggest some supplementary comments to include
with their admirable overview:
(a) Choice of significance level is a very real issue (section 3).
The size of F in ANOVA, along with other statistical values such as t, is
heavily dependent on the size of the sample: these values are effectively
computed from standard errors rather than standard deviations, so the
bigger the sample, the smaller the p values will tend to be and the
greater will tend to be the number of significant Fs, everything else
being equal. The choice of significance level should therefore also
address this issue.
(b) The number of interaction Fs is dependent on the number of
independent variables. Interactions can become increasingly difficult to
interpret as the number of independent variables is increased. The
interpretation of the single interaction emanating from two independent
variables is not normally problematic. However, four independent variables
yield eleven interactions, in addition to four main effects:
interpretatively, a statistically-significant interaction concerning all
four independent variables may be virtually impenetrable.
(c) The reference to untoward associations regarding independent
variables, also in section 3, suggests that analysis-of-covariance
(ANCOVA) may be a useful alternative technique in some circumstances, such
as that described in the second paragraph. ANCOVA is capable of teasing
out the contributions of two associated independent variables, along with
assessing significant difference. Preferably independent variables should
be orthogonal, but real-world data may not conform to this ideal.
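The count in point (b) can be checked by enumeration. The sketch below
assumes a full factorial model in which every subset of two or more
factors defines one interaction term; the factor names are placeholders.

```python
from itertools import combinations

def interaction_terms(factors):
    """Every interaction (subset of two or more factors) in a full
    factorial model."""
    return [
        combo
        for order in range(2, len(factors) + 1)
        for combo in combinations(factors, order)
    ]

factors = ["A", "B", "C", "D"]  # hypothetical factor names
terms = interaction_terms(factors)

# Tally the terms by order: two-way, three-way, four-way.
by_order = {}
for term in terms:
    by_order[len(term)] = by_order.get(len(term), 0) + 1

print(len(terms), by_order)  # 11 {2: 6, 3: 4, 4: 1}
```

The total is 2^k - k - 1 for k factors, so the number of interactions to
interpret grows quickly as independent variables are added.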
Competing interests:
None declared
Interesting paper - thank you for attempting to clarify subgroup
analyses and effect modification in general. I do feel however that some
of your suggestions are not entirely convincing.
1) You note, correctly, that the presence of an effect modification
depends on the model that is used for combining risks, i.e., additive or
multiplicative. But then you affirm that the multiplicative model is the
correct one, on the basis of an example that merely shows that an additive
interaction may correspond to the absence of a multiplicative interaction.
How does that make the multiplicative model better? In real life, results
often fall in between models - e.g. the risk of disease may be 10% in
female nonsmokers, 30% in female smokers, 20% in male nonsmokers, and 50%
in male smokers. So the effect of smoking is STRONGER in men on the
additive scale (+20 percentage points in women but +30 in men), but WEAKER
on the multiplicative scale (x3 in women but x2.5 in men). How can that be
(or rather: what can that mean in biological terms)? In fact the "effect
modification" is an artefact caused by the choice of the statistical
model. I would suggest that interpreting statistical interactions in
biological terms is hazardous unless the baseline risks are nearly
identical (e.g. the risks of disease in male and female nonsmokers). More
discussion of this problem can be found in Rothman and Greenland's Modern
Epidemiology.
2) You submit that consistency (of the presence of an interaction)
across closely related outcomes lends credibility to the effect. That is
circular logic. If two outcomes are closely correlated within a sample,
their analysis MUST yield similar risk models. There is no corroborating
evidence in that observation.
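The smoking example in point 1) can be worked through directly. The sketch
below uses only the four risks quoted there; the function and variable
names are illustrative.

```python
# Risks of disease quoted in point 1), keyed by (sex, smoking status).
risks = {
    ("female", "nonsmoker"): 0.10,
    ("female", "smoker"): 0.30,
    ("male", "nonsmoker"): 0.20,
    ("male", "smoker"): 0.50,
}

def smoking_effect(sex):
    """Effect of smoking within one sex on the additive and the
    multiplicative scale."""
    base = risks[(sex, "nonsmoker")]
    exposed = risks[(sex, "smoker")]
    return {
        "risk_difference": exposed - base,  # additive scale
        "risk_ratio": exposed / base,       # multiplicative scale
    }

effects = {sex: smoking_effect(sex) for sex in ("female", "male")}

# Interaction contrasts, men relative to women, on each scale.
additive_interaction = (
    effects["male"]["risk_difference"] - effects["female"]["risk_difference"]
)
multiplicative_interaction = (
    effects["male"]["risk_ratio"] / effects["female"]["risk_ratio"]
)

for sex, effect in effects.items():
    print(f"{sex}: +{effect['risk_difference']:.2f}, x{effect['risk_ratio']:.2f}")
print(f"additive interaction: {additive_interaction:+.2f}")
print(f"multiplicative interaction: x{multiplicative_interaction:.2f}")
```

The additive contrast is positive while the multiplicative contrast is
below 1, so the same four risks give opposite answers about whether
smoking matters more in men.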
Thank you for your thoughts on this, and apologies for taking up so
much space...
Competing interests:
None declared
Congratulations to the authors on an excellent and very useful
article. I wanted to make one minor point that could avoid some
misinterpretation. The first criterion is written as: "Is the subgroup
variable a characteristic measured at baseline or after randomisation?"
This could be taken to mean that both of these are acceptable ways of
specifying subgroups, when of course only the first is. The other
criteria are not written in this way, and only mention the acceptable
choice (or it is clear which is better, as with "within rather than
between studies"). It would be clearer for the criterion simply to say
"Is the subgroup variable a characteristic measured at baseline?"
Competing interests:
None declared
Rethinking the premises of subgroup analyses
In updating criteria for evaluating sub-group analyses, Sun et al.[1]
emphasize that subgroup effects must be evaluated in terms of relative
effects on each subgroup. In doing so, they explain that the same
relative reduction of different base rates would yield different absolute
reductions.
I will not defend measuring subgroup effects in terms of absolute
reductions. But a fundamental flaw in subgroup analyses to date is the
premise that it is somehow normal that a factor will cause equal
proportionate changes in different base rates and that, when that does not
occur, one has identified some meaningful difference in the way the factor
affects the groups with the different base rates. Such a premise overlooks
the pattern, inherent in features of normal risk distributions, whereby
the rarer an outcome, the greater tends to be the relative difference in
experiencing it and the smaller tends to be the relative difference in
avoiding it.[2-7] A corollary to such a pattern is one whereby a factor
that reduces an outcome will tend to reduce it to a larger proportionate
degree in the group with the lower base rate while increasing the opposite
outcome to a larger proportionate degree in the other group.[3,4,6] It is
only with a recognition of these patterns that one may begin to identify a
meaningful subgroup effect (i.e., one that is not a function of the
differing base rates and one that would not yield an opposite
interpretation as to comparative effect size if one examined the opposite
outcome).
Even apart from the above patterns, it is illogical to regard it as
somehow normal that a factor will cause equal proportionate changes to
different base rates, for the simple reason that it is mathematically
impossible for a factor to cause equal proportionate changes in different
base rates of an outcome while at the same time causing equal
proportionate changes in the opposite outcome. That is, for example, if
one group has a base rate of 5% and another has a base rate of 10%, a
factor that reduces both rates by 20% (to 4% and 8%) would necessarily
increase the rates of experiencing the opposite outcome by different
proportionate amounts (95% increased to 96%, a 1.1% increase; 90%
increased to 92%, a 2.2% increase). And since there is no more reason to
regard it as normal for there to be equal proportionate decreases in one
outcome than to regard it as normal for there to be equal proportionate
increases in the opposite outcome, there is no reason to regard it as
normal for there to be equal proportionate changes in either outcome.
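The arithmetic in the example above can be reproduced as follows. The
function name is illustrative; the base rates and the 20% reduction are
those given in the text.

```python
def opposite_outcome_change(base_rate, relative_reduction):
    """Proportionate increase in the opposite outcome when the outcome
    itself is reduced by the given relative amount."""
    treated_rate = base_rate * (1 - relative_reduction)
    avoided_before = 1 - base_rate
    avoided_after = 1 - treated_rate
    return (avoided_after - avoided_before) / avoided_before

for base_rate in (0.05, 0.10):
    change = opposite_outcome_change(base_rate, 0.20)
    print(f"base rate {base_rate:.0%}: opposite outcome rises by {change:.2%}")
```

The same 20% reduction raises the opposite outcome by 0.01/0.95 (about
1.05%, quoted above as 1.1%) in one group and by 0.02/0.90 (about 2.2%)
in the other, so equal proportionate changes cannot hold for both
outcomes at once.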
In any case, I suggest that the only effective way to identify a true
subgroup effect is to derive from the base and treated rates for each
group the difference between the means of the hypothetical underlying
distributions, as discussed in reference 6 and the Subgroups Effects page
of the Scanlan’s Rule page of jpscanlan.com.[8]
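One possible reading of this suggestion is sketched below. It is an
assumption, not a statement of the method in reference 6: each group's
risk is treated as normally distributed with equal variance, so that a
rate corresponds to a cut-off on a standard normal curve, and the effect
is expressed as the shift in standard deviation units between the base
and treated rates. The function name is hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def implied_shift(base_rate, treated_rate):
    """Shift between the means of the hypothetical underlying
    distributions, in standard deviation units, under the assumption of
    normal distributions with equal variance."""
    return (
        _STANDARD_NORMAL.inv_cdf(base_rate)
        - _STANDARD_NORMAL.inv_cdf(treated_rate)
    )

# The two groups from the example above: 5% to 4% and 10% to 8%.
for base_rate, treated_rate in ((0.05, 0.04), (0.10, 0.08)):
    shift = implied_shift(base_rate, treated_rate)
    print(f"{base_rate:.0%} -> {treated_rate:.0%}: shift of {shift:.3f} SD")
```

Under these assumptions the measure has the same magnitude whether the
outcome or its opposite is examined, which is the property the response
asks of a meaningful subgroup effect.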
References:
1. Sun X, Briel M, Walter SD, Guyatt GH. Is a subgroup effect
believable? Updating criteria to evaluate the credibility of subgroup
analyses. BMJ 2010;340:850-854.
2. Scanlan JP. Can we actually measure health disparities? Chance
2006;19(2):47-51:
http://www.jpscanlan.com/images/Can_We_Actually_Measure_Health_Dispariti...
(Accessed 6 June 2010)
3. Scanlan JP. Race and mortality. Society 2000;37(2):19-35:
http://www.jpscanlan.com/images/Race_and_Mortality.pdf (Accessed 6 June
2010)
4. Scanlan JP. Divining difference. Chance 1994;7(4):38-9,48:
http://jpscanlan.com/images/Divining_Difference.pdf (Accessed 6 June 2010)
5. Scanlan JP. The Misinterpretation of Health Inequalities in the
United Kingdom, presented at the British Society for Population Studies
Conference 2006, Southampton, England, Sept. 18-20, 2006:
http://www.jpscanlan.com/images/BSPS_2006_Complete_Paper.pdf (Accessed 6
June 2010)
6. Scanlan JP. Interpreting Differential Effects in Light of
Fundamental Statistical Tendencies, presented at 2009 Joint Statistical
Meetings of the American Statistical Association, International Biometric
Society, Institute for Mathematical Statistics, and Canadian Statistical
Society, Washington, DC, 1-6 Aug. 2009:
PowerPoint Presentation :
http://www.jpscanlan.com/images/Scanlan_JSM_2009.ppt (Accessed 6 June
2010)
Oral Presentation: http://www.jpscanlan.com/images/JSM_2009_ORAL.pdf
(Accessed 6 June 2010)
7. Scanlan JP. Measuring Health Inequalities by an Approach
Unaffected by the Overall Prevalence of the Outcomes at Issue, presented
at the Royal Statistical Society Conference 2009, Edinburgh, Scotland, 7-
11 Sept. 2009.
PowerPoint Presentation:
http://www.jpscanlan.com/images/Scanlan_RSS_2009_Presentation.ppt
(Accessed 6 June 2010)
8. Subgroup Effects page of Scanlan’s Rule page of jpscanlan.com
(Accessed 6 June 2010)
Competing interests:
None declared