Re: Combining high quality data with rigorous methods: emulation of a target trial using electronic health records and a nested case-control design
Dear Editor
We enjoyed reading the paper by Rasouli et al. and agree with their point that regression analysis using the treatment and covariates recorded just before the time an outcome occurred (i.e., the index date) can lead to bias when estimating the treatment effect in nested case–control (NCC) studies.[1] Rasouli et al. provided valuable insights into causal inference in NCC studies through the target trial emulation framework.
However, from a statistical viewpoint, we wish to emphasize that caution is needed when applying their methodology. In this response, we focus on the situation in which the treatment effect is estimated by fitting the Cox model to cases and controls selected by risk-set sampling (we term these "NCC samples"), the sampling scheme most commonly used in NCC studies.
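To fix notation for the points below (the display is ours, not taken from Rasouli et al.): with risk-set sampling, the treatment effect is typically estimated by maximizing the partial likelihood restricted to the sampled risk sets,

\[
L(\beta)=\prod_{i=1}^{D}\frac{\exp\left(\beta^{\top}x_{i}(t_{i})\right)}{\sum_{j\in\tilde{R}(t_{i})}\exp\left(\beta^{\top}x_{j}(t_{i})\right)},
\]

where the product runs over the \(D\) cases, \(t_{i}\) is the index date of case \(i\), \(\tilde{R}(t_{i})\) is the sampled risk set (case \(i\) plus its matched controls), and \(x_{j}(t_{i})\) contains subject \(j\)'s treatment and covariates at that date. This is equivalent to conditional logistic regression on the matched sets, and it is the estimator to which our comments below refer.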
First, to calculate the inverse probability (IP) of treatment or censoring weights, Rasouli et al. fitted the regression models to the sampled control subjects only. However, because the control subjects are not a random sample of the full cohort, the treatment and censoring probabilities cannot be estimated consistently from the controls alone. Rather, the NCC samples should be treated as unequally sampled case–cohort data and analyzed with inverse probability of sampling (IPS) weights.[2] The IP weights can then be estimated consistently by IPS-weighted pseudolikelihood estimation, as sketched below. However, if the NCC samples include too few subjects whose treatment changes or whose follow-up is censored during the study period, the estimating equations for the IP weights can be difficult to solve from the NCC samples alone because of separation. To overcome this problem, researchers can additionally sample subjects whose treatment status changed and subjects whose follow-up was censored,[3] and modify the IPS weights to incorporate these additional sampling probabilities. Using the NCC samples together with the additional samples, the IP weights can be estimated consistently by the modified IPS-weighted pseudolikelihood estimation. This approach requires assembling the covariates only for the NCC samples and the additional samples.
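As a minimal illustration of this first point (our sketch in R, not code from Rasouli et al. or from our article[3]), suppose the NCC samples are held in a counting-process data frame ncc with one row per subject-interval and hypothetical columns id, tstart, tstop, event, treat, lag_treat (treatment in the previous interval), confounders x1 and x2, an indicator censored for censoring at the end of the interval, and a Samuelsen-type inclusion probability p_incl computed from the cohort's risk-set sizes (p_incl = 1 for cases).[2]

## IPS-weighted pseudolikelihood estimation of the IP weights (illustrative sketch)
library(survival)
library(dplyr)

ncc <- ncc %>% mutate(ipsw = 1 / p_incl)   # inverse probability of sampling weight

## Treatment and censoring models fitted with IPS weights
## (quasibinomial avoids warnings about non-integer weights)
fit_tr <- glm(treat ~ lag_treat + x1 + x2, family = quasibinomial,
              data = ncc, weights = ipsw)
fit_ce <- glm(censored ~ treat + x1 + x2, family = quasibinomial,
              data = ncc, weights = ipsw)

## Interval-specific probabilities, cumulated into (unstabilized) IP weights
ncc <- ncc %>%
  mutate(p_tr = ifelse(treat == 1, fitted(fit_tr), 1 - fitted(fit_tr)),
         p_ce = 1 - fitted(fit_ce)) %>%
  group_by(id) %>%
  arrange(tstart, .by_group = TRUE) %>%
  mutate(ipw = cumprod(1 / (p_tr * p_ce))) %>%
  ungroup()

## IPS x IP weighted Cox model for the marginal hazard ratio
## (ties = "breslow" to match the Breslow baseline hazard used in our last point;
## as discussed in our second point, this default robust variance may still
## understate the uncertainty for NCC samples)
fit_cox <- coxph(Surv(tstart, tstop, event) ~ treat, data = ncc,
                 weights = ipsw * ipw, cluster = id, robust = TRUE,
                 ties = "breslow")

Stabilized weights, and the additional sampling of treatment switchers and censored subjects described above, would modify ipsw and the weight models but not the overall structure of this sketch.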
Second, Rasouli et al. suggested applying the robust variance estimator if each patient is allowed to enroll in multiple trials or when using IP-weighted estimation. However, with risk-set sampling, the robust estimator commonly implemented in statistical software (e.g., the phreg procedure in SAS or the survival package in R) can substantially underestimate the covariance of the Cox model estimates for NCC sampling data.[4] This is because the robust estimator does not account for the sharing of subjects between the sampled risk sets. Although Xiang and Langholz proposed a robust variance estimator that addresses this problem,[4] their estimator does not account for clustering within each subject. Accordingly, further research on variance estimation for the parameter estimates is needed if researchers wish to apply a multiple-trial design in NCC studies.
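To make the concern concrete (again in our notation, not that of Rasouli et al.): the robust estimators implemented in standard software are working-independence sandwich estimators of the form

\[
\widehat{\operatorname{Var}}(\hat{\beta})=\hat{I}(\hat{\beta})^{-1}\left(\sum_{k}\hat{U}_{k}\hat{U}_{k}^{\top}\right)\hat{I}(\hat{\beta})^{-1},
\]

where \(\hat{I}\) is the (weighted) observed information and \(\hat{U}_{k}\) is the score residual aggregated over cluster \(k\) (e.g., a matched set). The middle sum treats the clusters as independent, whereas under risk-set sampling the same subject can belong to several sampled risk sets (and, in a multiple-trial emulation, to several trials), so the score contributions are correlated across clusters in a way this sum does not capture.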
Finally, Rasouli et al. explained that absolute risk measures (e.g., the risk difference) could not be estimated in NCC studies. However, by accounting for the sampling process, risk differences can be estimated from NCC sampling data just as from full cohort data.[2 5] By adjusting for non-random treatment changes and censoring with the IP-weighted Breslow estimator[5] and estimating hazard ratios with IP-weighted Cox models, researchers can estimate the counterfactual risks and, thereby, the marginal treatment effect on the risk difference scale, as sketched below.
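Continuing the sketch above from the fit_cox object (again our illustration; tau is a hypothetical time horizon, and the calculation is simplified in that it contrasts treatment set to 1 versus 0 throughout follow-up):

## Counterfactual risks at horizon tau via the IP-weighted Breslow baseline hazard
tau    <- 365                                 # hypothetical horizon (e.g., days)
H0     <- basehaz(fit_cox, centered = FALSE)  # weighted Breslow cumulative baseline hazard
H0_tau <- max(H0$hazard[H0$time <= tau])

hr    <- exp(coef(fit_cox)[["treat"]])        # IP-weighted hazard ratio
risk1 <- 1 - exp(-H0_tau * hr)                # counterfactual risk if treated
risk0 <- 1 - exp(-H0_tau)                     # counterfactual risk if untreated
rd    <- risk1 - risk0                        # marginal risk difference at tau

Confidence intervals for such a risk difference would, of course, still require a variance estimator that addresses the issues raised in our second point.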
The detailed procedure that underlies our suggestions is presented in our recently published article, in which we proposed methods to fit marginal structural Cox models to NCC sampling data to estimate the effect of time-varying treatments.[3] We believe that our suggestions provide further insights into emulating a target trial using the NCC design and will encourage the more appropriate use of this approach.
Sincerely
References
1. Rasouli B, Chubak J, Floyd JS, et al. Combining high quality data with rigorous methods: emulation of a target trial using electronic health records and a nested case-control design. BMJ 2023;383:e072346. doi: 10.1136/bmj-2022-072346
2. Samuelsen SO. A pseudolikelihood approach to analysis of nested case-control studies. Biometrika 1997;84(2):379-94.
3. Takeuchi Y, Hagiwara Y, Komukai S, et al. Estimation of the causal effects of time-varying treatments in nested case-control studies using marginal structural Cox models. Biometrics 2024;80(1). doi: 10.1093/biomtc/ujae005
4. Xiang AH, Langholz B. Robust variance estimation for rate ratio parameter estimates from individually matched case-control data. Biometrika 2003;90(3):741-46.
5. Borgan O, Goldstein L, Langholz B. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. The Annals of Statistics 1995;23(5):1749-78.
Competing interests: No competing interests