How should cost data in pragmatic randomised trials be analysed?

Simon G Thompson; Julie A Barber

doi:10.1136/bmj.320.7243.1197

Education And Debate


BMJ 2000; 320 doi: https://doi.org/10.1136/bmj.320.7243.1197 (Published 29 April 2000) Cite this as: BMJ 2000;320:1197

All rapid responses


t-Tests and Rank Sum Tests

To t-Test or Not

The article by Thompson and Barber (BMJ 2000;320:1197-1200) discusses
the relative merits of t-Tests and rank sum tests. They show that the
former strictly requires approximately normally distributed data but is
very robust in practice. The latter requires that the distributions of
the data in the samples are similar (Conover 1980).
Otherwise the two methods are similar, the rank test being like a t-Test
on the ranks (Conover 1980). However, as the authors point out, when it
comes to using averages and their differences and the ranges of values and
differences that are supported by the data, the method that employs means
is clearly more useful in most situations.

It is good that these authors have brought to attention problems with
analysis of the sort of data people working in quality improvement in
hospitals have to deal with regularly. However, I think that the
discussion needs to go further if we are to learn what we can from our
data.

The first figure in their paper shows costs of two methods of
treatment. It is clear that the two distributions differ in shape and
therefore one might expect the rank test to be inappropriate. However, if
we look at the first group there seems to be a subgroup that does very
well, at least from a cost perspective, and another group that does not.
It may be more useful to try to determine the characteristics of these
subgroups than to perform significance tests. If there are in fact two
distributions in the first group, it may not make sense to compare them,
taken together, with another group.

A further aspect can be illustrated by two groups of length of stay
data from the same diagnosis related group; there were 151 patients in the
first group and 210 in the second. A simple chart (not shown) revealed
that the two distributions had similar shapes but slightly different
locations; as is usual with these data they had a marked positive skew. In
addition, like most length of stay data there were many tied values at
most lengths of stay.

The difference between the means of the two groups was 0.102 days
(95% CI -2.074 to 2.304) and the average BCa bootstrap 95% CI, over 5
runs of 1000 resamples each, was -2.084 to 2.208. The t-Test was clearly
not significant. However, the rank
sum test value was Z=2.164, P=0.03 (Bayes factor 1/10, see Goodman 1999),
suggesting that a difference may have existed. The difference between the
medians was one day.
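
As a rough illustration of the two analyses contrasted above, the following Python sketch computes a bootstrap confidence interval for a difference in means and a tie-corrected rank sum Z statistic. It rests on several assumptions: the data are simulated (the original length of stay data are not available, so the figures quoted above will not be reproduced), the bootstrap uses the simple percentile method rather than BCa, and the function names and lognormal parameters are purely illustrative.

```python
import random
import statistics
from math import sqrt


def bootstrap_mean_diff_ci(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) (simpler than BCa)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    k = int(alpha / 2 * n_boot)  # number of resamples cut from each tail
    return diffs[k], diffs[n_boot - 1 - k]


def rank_sum_z(a, b):
    """Rank sum Z for group a (normal approximation, midranks, tie-corrected)."""
    pooled = sorted(list(a) + list(b))
    n1, n2 = len(a), len(b)
    n = n1 + n2
    midrank = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                               # size of this tie group
        midrank[pooled[i]] = (i + 1 + j) / 2    # average of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    w = sum(midrank[x] for x in a)              # rank sum of group a
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return (w - mean_w) / sqrt(var_w)


# Simulated, positively skewed lengths of stay in whole days (assumed values).
rng = random.Random(42)
group1 = [int(rng.lognormvariate(1.6, 0.8)) + 1 for _ in range(151)]
group2 = [int(rng.lognormvariate(1.5, 0.8)) + 1 for _ in range(210)]

print("difference in means:", statistics.fmean(group1) - statistics.fmean(group2))
print("bootstrap 95% CI:", bootstrap_mean_diff_ci(group1, group2))
print("rank sum Z:", rank_sum_z(group1, group2))
```

With skewed, heavily tied data of this kind the two approaches can disagree, because the mean difference is driven by the long right tail while the rank sum statistic reflects a shift across the whole distribution.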

Estimators for the rank sum test are available in CIA (Altman,
Machin, Bryant and Gardner 2000), and the 95% CI for both the binomial and
Wilcoxon methods was 0 to 2; this was also the bootstrap value. For these
data with many tied values, it may be better to think of them as being in
ordered categories. A suitable estimator for the rank sum test with
ordered categorical data is the RIDIT (Fleiss 1981). The mean RIDIT was
0.567 (95% CI 0.508 to 0.625) using group 2 as the reference group. This
means that a group 1 patient had odds of 0.567/(1-0.567) or 1.31 to 1 of
having a longer length of stay than a comparable group 2 patient (Fleiss
1981).
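
The mean RIDIT and the odds derived from it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the small data set is invented, and the confidence interval is not computed. Each category's RIDIT is taken as the proportion of the reference group in lower categories plus half the proportion in that category.

```python
from collections import Counter


def mean_ridit(reference, comparison):
    """Mean RIDIT of the comparison group relative to the reference group."""
    n_ref = len(reference)
    counts = Counter(reference)
    ridit = {}
    below = 0  # reference observations in strictly lower categories
    for value in sorted(set(reference) | set(comparison)):
        c = counts.get(value, 0)
        ridit[value] = (below + 0.5 * c) / n_ref
        below += c
    return sum(ridit[x] for x in comparison) / len(comparison)


def ridit_odds(r):
    """Odds that a comparison patient exceeds a reference patient."""
    return r / (1 - r)


# Invented lengths of stay in days, treated as ordered categories.
group2 = [2, 3, 3, 4, 4, 4, 5, 6, 9]    # reference group
group1 = [3, 4, 4, 5, 5, 6, 7, 12]      # comparison group
r = mean_ridit(group2, group1)
print("mean RIDIT:", r, "odds:", ridit_odds(r))

# The odds quoted in the text for a mean RIDIT of 0.567:
print(round(ridit_odds(0.567), 2))  # 1.31
```

A mean RIDIT of 0.5 corresponds to odds of 1 to 1, that is, no tendency for either group to have the longer stay.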

This is quite a modest difference and it may be of no practical
importance. However, if the group 2 data had come from a hospital that
went to extra lengths with its discharge planning and the group 1 data
from a hospital that did not, and if this modest difference were likely
to occur in other high volume diagnosis related groups as well, the view
of its importance might change.

References

Thompson S and Barber J “How Should Cost Data in Pragmatic Randomised
Trials be Analysed?” British Medical Journal 2000;320:1197-1200.

Conover W “Practical Nonparametric Statistics” Wiley New York 2nd edition
1980.

Goodman S “Toward Evidence-Based Medical Statistics 2: The Bayes Factor”
Annals of Internal Medicine 1999;130:1005-1013.

Altman D, Machin D, Bryant T, and Gardner M “Statistics with Confidence”
2nd edition British Medical Journal London 2000.

Fleiss J “Statistical Methods for Rates and Proportions” John Wiley and
Sons New York 1981.

Competing interests: No competing interests

23 August 2002

Anthony P Morton

Consultant, Infection Management Services

Princess Alexandra Hospital Brisbane 4102 Australia

Trimming cost data

Sir,
The article by Thompson and Barber addresses an important problem for the
costing of health service activity. It does however ignore a considerable
body of work undertaken around casemix methods, stimulated originally by
the work of Fetter on Diagnosis Related Groups (DRGs) in the early 1980s.
This casemix based work has used a trimming approach to exclude abnormally
long stay or costly patients and produce a less skewed population which
can be analysed with parametric methods and provide better comparisons.

Considerable debate and empirical work have been undertaken to find
suitable methods of trimming; these have included simple percentiles,
multiples of the standard deviation of the log distribution and multiples
of the interquartile range. The objective of this work has been to find a
technique which maximises the improvement of the remaining distribution,
whilst minimising the number of cases excluded from the analysis.
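
The three families of trim point mentioned above can be sketched as follows. This is only an illustration under assumptions: the function names, the 95th percentile, and the multipliers (2 log standard deviations, 1.5 interquartile ranges above the upper quartile) are not taken from any particular casemix system, and the lengths of stay are invented. The log rule requires strictly positive values.

```python
import math
import statistics


def trim_point_percentile(values, pct=95):
    """Trim point at a simple percentile (assumed 95th by default)."""
    return statistics.quantiles(values, n=100)[pct - 1]


def trim_point_log_sd(values, k=2.0):
    """Trim point at mean + k SD on the log scale, back-transformed."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.fmean(logs) + k * statistics.stdev(logs))


def trim_point_iqr(values, k=1.5):
    """Trim point at upper quartile + k times the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def trim(values, trim_point):
    """Split values into retained cases and excluded (outlier) cases."""
    kept = [v for v in values if v <= trim_point]
    excluded = [v for v in values if v > trim_point]
    return kept, excluded


# Invented lengths of stay (days) with two very long stays.
los = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 30, 45]

for name, point in [
    ("percentile", trim_point_percentile(los)),
    ("log SD", trim_point_log_sd(los)),
    ("IQR", trim_point_iqr(los)),
]:
    kept, excluded = trim(los, point)
    print(f"{name}: trim point {point:.1f}, "
          f"mean of retained {statistics.fmean(kept):.2f}, "
          f"proportion trimmed {len(excluded) / len(los):.2f}")
```

Reporting the proportion trimmed alongside the trimmed mean keeps the trade-off visible: a lower trim point gives a better behaved distribution but excludes more cases.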

Whilst this creates a problem of how to handle the costs of the
outlier patients, it permits a better comparison of the average patients
and has therefore been widely used in the many applications of casemix for
funding and performance monitoring in the US and many other countries. The
National Schedule of Reference Costs published by the NHS Executive
(January 2000) provides costs for Healthcare Resource Groups (the English
analogue of DRGs), excluding the trimmed cases, and the excess (outlier
cases) costs. Similarly, length of stay comparisons based on HRG analysis
of Hospital Episode Statistics (as supplied by a number of benchmarking
services) also use trimmed length of stay data together with a comparison
of the proportion of episodes trimmed.

Competing interests: No competing interests

05 May 2000

Hugh Sanderson

Assistant Medical Director (Clinical Governance)

Winchester and Eastleigh Healthcare Trust

Misleading arithmetics

Relying on the arithmetic mean to describe highly
skewed distributions seems counter-intuitive and
clearly misleading.

In effect, this assumes that the prevalence of factors
leading to high individual costs is fixed and/or that both
this prevalence and the estimate of the costs incurred
for these patients can be generalized. The variance of
the cost estimates will also be inflated by an unknown
factor, leading to increased Type II errors for
comparisons of alternative programs or interventions
such as reported by the authors.

If economic evaluation is intended to provide decision
makers with unbiased information that can help them
assess the costs and consequences of alternative
interventions, it might be more informative to report:

1) the actual distributions of the costs incurred for each
alternative

2) unbiased estimates of the central tendency and of
the dispersion of these distributions, i.e. medians and
quartiles or geometric means and confidence intervals
if the distributions are highly skewed, or else arithmetic
means after excluding "outliers". Nonparametric or
bootstrap tests would also seem adequate for
assessing differences between these distributions.

3) estimates of the prevalence of outliers, and of the
costs incurred for these patients.

Sensitivity analysis methods could then (and should)
be used to assess the impact of variations in the
prevalence of "outliers" on the overall conclusions of
the analysis.
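
One simple form such a sensitivity analysis could take is sketched below in Python. It is an assumption-laden illustration, not a method from the article: each arm is treated as a mixture of "typical" and "outlier" patients, the overall mean is the prevalence-weighted average of the two subgroup means, and all names and cost figures are invented.

```python
def overall_mean_cost(typical_mean, outlier_mean, outlier_prevalence):
    """Mean cost of a mixture of typical and outlier patients."""
    return ((1 - outlier_prevalence) * typical_mean
            + outlier_prevalence * outlier_mean)


def mean_difference_by_prevalence(arm_a, arm_b, prevalences):
    """Difference in overall mean cost (A minus B) at each assumed prevalence."""
    results = []
    for p in prevalences:
        mean_a = overall_mean_cost(arm_a["typical"], arm_a["outlier"], p)
        mean_b = overall_mean_cost(arm_b["typical"], arm_b["outlier"], p)
        results.append((p, mean_a - mean_b))
    return results


# Invented subgroup mean costs for two interventions.
arm_a = {"typical": 900.0, "outlier": 12000.0}
arm_b = {"typical": 1000.0, "outlier": 8000.0}

for p, diff in mean_difference_by_prevalence(arm_a, arm_b, [0.0, 0.02, 0.05, 0.10]):
    print(f"outlier prevalence {p:.2f}: mean difference (A - B) = {diff:.0f}")
```

Under these made-up figures intervention A is cheaper when outliers are rare but becomes more expensive once their prevalence exceeds roughly 2.4% (100/4100), which is the kind of dependence a decision maker would want to see reported.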

Competing interests: No competing interests

29 April 2000

Alain Fontaine

Medical Evaluation Unit

Hôpital Louis Mourier, Colombes, France
