Statistics notes: Calculating correlation coefficients with repeated observations: Part 1—correlation within subjects
=======================================================================================================================

* J Martin Bland
* Douglas G Altman

In an earlier Statistics Note<sup>1</sup> we commented on the analysis of paired data where there is more than one observation per subject, as shown in table I. We pointed out that it could be highly misleading to analyse such data by combining repeated observations from several subjects and then calculating the correlation coefficient as if the data were a simple sample. This note is a response to several letters about the appropriate analysis for such data.

[TABLE I](https://www.bmj.com/content/310/6977/446/T1)

TABLE I: Repeated measurements of intramural pH and PaCO2 for eight subjects<sup>2</sup>

The choice of analysis for the data in table I depends on the question we want to answer. If we want to know whether subjects with high values of intramural pH also tend to have high values of PaCO2 we are interested in whether the average pH for a subject is related to the subject's average PaCO2. We can use the correlation between the subject means, which we shall describe in a subsequent note. If we want to know whether an increase in pH within the individual was associated with an increase in PaCO2 we want to remove the differences between subjects and look only at changes within subjects.

To look at variation within the subject we can use multiple regression. We make one of our variables, pH or PaCO2, the outcome variable and the other variable and the subject the predictor variables. Subject is treated as a categorical factor using dummy variables<sup>3,4</sup> and so has seven degrees of freedom. We use the analysis of variance table<sup>3,4</sup> for the regression (table II), which shows how the variability in pH can be partitioned into components due to different sources. This method is also known as analysis of covariance and is equivalent to fitting parallel lines through each subject's data (see figure). The residual sum of squares in table II represents the variation about these lines.

We remove the variation due to subjects (and any other nuisance variables which might be present) and express the variation in pH due to PaCO2 as a proportion of what's left:

$$\frac{\text{Sum of squares for PaCO2}}{\text{Sum of squares for PaCO2} + \text{residual sum of squares}}$$

The magnitude of the correlation coefficient within subjects is the square root of this proportion. For table II this is:

$$\sqrt{\frac{0.1153}{0.1153 + 0.3337}} = 0.51$$

The sign of the correlation coefficient is given by the sign of the regression coefficient for PaCO2. Here the regression slope is -0.108, so the correlation coefficient within subjects is -0.51. The P value is found either from the F test in the associated analysis of variance table, or from the t test for the regression slope. It doesn't matter which variable we regress on which; we get the same correlation coefficient and P value either way.
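The arithmetic for table II can be checked with a short Python sketch. It uses only the two sums of squares and the slope quoted in the text. The function name is illustrative, and the residual degrees of freedom (38) are an assumption derived from the 47 observations on 8 subjects reported in this note, not a value quoted from table II.

```python
import math


def within_subject_r(ss_covariate, ss_residual, slope):
    """Within-subject correlation from an analysis of covariance table.

    Magnitude: sqrt(SS_covariate / (SS_covariate + SS_residual)).
    Sign: taken from the regression slope for the covariate.
    """
    proportion = ss_covariate / (ss_covariate + ss_residual)
    return math.copysign(math.sqrt(proportion), slope)


# Values quoted in the text for table II.
ss_paco2 = 0.1153   # sum of squares for PaCO2
ss_resid = 0.3337   # residual sum of squares
slope = -0.108      # regression slope of pH on PaCO2

r = within_subject_r(ss_paco2, ss_resid, slope)

# Assumed residual degrees of freedom:
# 47 observations - 1 (overall mean) - 7 (subjects) - 1 (PaCO2) = 38.
df_resid = 47 - 1 - 7 - 1

# F ratio for PaCO2 on 1 and df_resid degrees of freedom; its square
# root is the magnitude of the t statistic for the slope.
f_ratio = ss_paco2 / (ss_resid / df_resid)

print(round(r, 2))  # -0.51
```

The F ratio computed this way equals `df_resid * r**2 / (1 - r**2)`, which is why the F test and the t test for the slope lead to the same P value.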

[TABLE II](https://www.bmj.com/content/310/6977/446/T2)

TABLE II: Analysis of variance for the data in table I

  
![][1]

pH against PaCO2 for eight subjects, with parallel lines fitted for each subject

If we incorrectly calculate the correlation coefficient ignoring the fact that we have 47 observations on only 8 subjects, we get -0.07, P=0.7. Hence the correct analysis within subjects reveals a relation which the incorrect analysis misses.
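To illustrate how pooling can hide a within-subject relation, here is a minimal sketch on made-up numbers (these are not the data of table I). It uses a shortcut rather than an explicit regression: fitting subject as a categorical factor is equivalent to centring each subject's values on that subject's own mean, so the within-subject correlation is the ordinary correlation of the centred values. All names and data are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def centre_within_subject(subject, values):
    """Subtract each subject's own mean from that subject's values."""
    groups = defaultdict(list)
    for s, v in zip(subject, values):
        groups[s].append(v)
    means = {s: sum(vs) / len(vs) for s, vs in groups.items()}
    return [v - means[s] for s, v in zip(subject, values)]


# Hypothetical data: three subjects, three observations each.
# Within every subject y falls as x rises, but the subjects sit at
# very different levels of y.
subject = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [1, 2, 3, 3, 4, 5, 2, 3, 4]
y = [5.5, 4.9, 4.6, 6.0, 5.3, 5.2, 9.4, 9.1, 8.5]

# Incorrect: treats the nine observations as a simple sample.
r_pooled = pearson(x, y)

# Within subjects: remove subject differences first, then correlate.
r_within = pearson(centre_within_subject(subject, x),
                   centre_within_subject(subject, y))

print(f"pooled r = {r_pooled:.2f}, within-subject r = {r_within:.2f}")
```

With these numbers the pooled coefficient is close to zero while the within-subject coefficient is strongly negative. The centring shortcut gives the coefficient only; a P value should still use the residual degrees of freedom of the regression (number of observations minus number of subjects minus 1), not the n - 2 of a simple sample.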

## References

1.  Bland JM, Altman DG. Correlation, regression, and repeated data. BMJ 1994;308:896.
    
    [FREE Full Text](https://www.bmj.com/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjEyOiIzMDgvNjkzMy84OTYiO3M6NDoiYXRvbSI7czoyMjoiL2Jtai8zMTAvNjk3Ny80NDYuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

2.  Boyd O, Mackay CJ, Lamb G, Bland JM, Grounds RM, Bennett ED. Comparison of clinical information gained from routine blood-gas analysis and from gastric tonometry for intramural pH. Lancet 1993;341:142–6.
    
    [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/0140-6736(93)90005-2&link_type=DOI) 
    
    [PubMed](https://www.bmj.com/lookup/external-ref?access_num=8093745&link_type=MED&atom=%2Fbmj%2F310%2F6977%2F446.atom) 
    
    [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=A1993KG62700005&link_type=ISI) 

3.  Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991.
    
    

4.  Armitage P, Berry G. Statistical methods in medical research. 3rd ed. Oxford: Blackwell, 1994.

 [1]: /embed/graphic-1.gif