
Research Methods & Reporting

Combining high quality data with rigorous methods: emulation of a target trial using electronic health records and a nested case-control design

BMJ 2023; 383 doi: https://doi.org/10.1136/bmj-2022-072346 (Published 28 December 2023) Cite this as: BMJ 2023;383:e072346
  1. Bahareh Rasouli, postdoctoral researcher (1, 2)
  2. Jessica Chubak, senior investigator and affiliate professor (3, 4)
  3. James S Floyd, associate professor (5)
  4. Bruce M Psaty, professor (5, 6)
  5. Matthew Nguyen, data consultant (3)
  6. Rod L Walker, collaborative biostatistician (3)
  7. Kerri L Wiggins, research scientist (7)
  8. Roger W Logan, senior research scientist (8)
  9. Goodarz Danaei, professor (2, 8, 9)

  1. Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
  2. Department of Global Health and Population, Harvard TH Chan School of Public Health, Boston, MA 02115, USA
  3. Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
  4. Department of Epidemiology, University of Washington, Seattle, WA, USA
  5. Cardiovascular Health Research Unit, Departments of Medicine and Epidemiology, University of Washington, Seattle, WA, USA
  6. Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
  7. Department of Medicine, University of Washington, Seattle, WA, USA
  8. Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
  9. CAUSALab, Harvard TH Chan School of Public Health, Boston, MA, USA

  Correspondence to: G Danaei gdanaei@hsph.harvard.edu
  • Accepted 27 October 2023

Emulating a target trial reduces the potential for bias in observational comparative effectiveness research. Owing to feasibility constraints, large cohort studies often use electronic health records without validating key variables or collecting additional data. A case-control design allows researchers to validate, supplement, or collect additional data on key measurements in a much smaller sample compared with the entire cohort. In this article, Rasouli and colleagues describe methods to emulate a target trial using a nested case-control design, and provide a detailed guideline, an analytical program, and results of a clinical example.

Summary points

  • Case-control studies are efficient designs for studies that require validation of key variables; data collection can occur on all cases and a sample of controls rather than on an entire cohort

  • Case-control studies are vulnerable to several biases, including prevalent user bias and inappropriate adjustment for covariates when treatment and confounders are measured at the date when cases are identified and controls are sampled (ie, the index date)

  • Emulating the design and analysis of a target randomized controlled trial can minimize some of these biases in comparative effectiveness case-control studies

  • The proposed approach combines the benefits of measure validation in nested case-control designs with the strengths of target trial emulation, reducing bias

Randomized controlled trials are considered the ideal study design for comparative effectiveness research. Given that such trials are usually costly, lengthy, and, in some instances, unethical or infeasible, interest is increasing in using observational studies such as those conducted using data from electronic health records and administrative datasets to inform clinical decision making.123 Analysis of observational data, however, requires careful consideration of possible biases, including confounding, selection bias,1456 and measurement error.237891011 Methods and tools for minimizing these biases are therefore essential.

Suppose researchers aim to use electronic health record data to study the effect of initiating a treatment (eg, statins) on risk of cardiovascular disease events. Such events identified in the electronic health record using international classification of disease (ICD) codes are likely to be misclassified compared with review of medical records12 and this may lead to substantial bias.2378910 Collecting additional data to validate measurements (in this example for outcomes) using the entire electronic health record cohort is often impractical because it requires substantial time and resources.913 This impracticality of gathering data on the complete cohort has led many researchers to conduct nested case-control studies to allocate their limited resources to gather high quality data on a subsample rather than on the entire cohort. However, common approaches to designing and analyzing case-control studies are prone to several biases.1456 Emulating a target trial can reduce the potential for these types of bias.3 In this approach, investigators first specify a clear causal question and develop a detailed protocol for a target randomized trial to answer that causal question. Then, they modify the protocol to accommodate the observational nature of the data.14

Most previous studies that emulated a target trial used a cohort design. We have already shown that emulating a target trial using a cohort design can provide estimates of treatment effects that are much more consistent with those observed in randomized controlled trials than estimates based on conventional methods without emulating a target trial.1516 Methods to emulate a target trial using a case-control design have only recently been developed,17 and a detailed description and analytical guideline on how to implement these methods have not been published. Although it may seem counterintuitive to conceptualize emulating a target trial using a nested case-control design, nested case-control studies are simply an efficient way of sampling from an underlying cohort, and cohort studies aim to estimate the same underlying effect size that would have been observed in randomized controlled trials.

In this paper we discuss common biases in the design and analysis of case-control studies and how emulating a target trial may reduce these biases. Using a clinical example, we then describe the protocol of a target trial that we wish to emulate. For our clinical example we explain how a nested case-control design can be used to emulate the target trial, and we estimate the observational analog of the intention-to-treat and per protocol effects. Finally, we present the results of our clinical example. Supplemental file 1 provides a detailed guideline and an analytical code to implement the target trial emulation approach using a nested case-control design.

The methods we suggest can be applied in two ways: by reanalyzing a previously conducted case-control study, or by conducting a new case-control analysis to emulate a target trial. If, however, a previously conducted case-control study collected data on cases and controls only at the time of the event or sampling (known as the index date) or a few time points before that date, it may not be appropriate to use the proposed methods unless additional data across time can be obtained from the same electronic health record dataset. To implement the proposed methods, it is essential to have access to comprehensive data across time on eligibility, potential confounders, and treatment. In our clinical example, we reanalyzed a previous case-control study that had been linked to underlying electronic health record data.

Biased approaches in the design and analysis of case-control studies

Conventional case-control studies often evaluate the values of treatment and confounders at the event date for cases and the matched date for controls—that is, the index or reference date (see supplemental figure 1). This approach can introduce two major types of bias: prevalent user bias and bias due to inappropriate adjustment for covariates.

Prevalent user, or differential survival, bias

In case-control studies, assessing treatment or exposure at the index date may lead to prevalent user bias. Current users have survived and continue taking treatment; if treatment affects the outcome or shares common causes with the outcome, current users are not comparable to non-users, resulting in bias.181920 This differential survival bias is more obvious when treatment has a short term effect, such as the harmful effects of postmenopausal hormone replacement therapy on myocardial infarction.21

Bias due to inappropriate adjustment for covariates

Adjusting for potential confounders measured at the index date may create bias if those variables are affected by past treatment.17 Such adjustment may either remove part of the effect of interest (if the covariate measured at the index date is a mediator) or lead to collider stratification bias (if the covariate shares a common cause with the outcome).

Case-control studies that to some extent measure exposure, covariates, and eligibility before the index date may have less bias.

Trial emulation using case-control design to reduce bias

In a randomized controlled trial, participants are assigned randomly to a treatment strategy at time zero—that is, when they meet eligibility criteria and follow-up starts. Successful emulation of a target trial requires a clear definition of time zero, here referred to as the enrollment date. Enrollment date is a point in (or short period of) time at which eligibility criteria are satisfied, treatment is assessed, and follow-up starts. Assigning enrollment dates allows for comparison of treatment initiators (incident users) with non-initiators at a point in time to prevent prevalent user bias, and for measuring confounders before the observed treatment to prevent bias due to adjustment for covariates affected by past treatment.

The analytical dataset can be created in two ways. In the simplest approach, the entire period represented in the data is assigned as the enrollment period of a single trial, and each row of data corresponds to one person. Eligibility can be assessed for all individuals in the dataset, and as soon as a person becomes eligible, they enter the trial. Baseline for an individual is defined as the first time when all eligibility criteria are met (the first enrollment date). Values of baseline covariates should be assessed before this enrollment date, and the observed treatment should be recorded at the time of enrollment (fig 1). To assess whether imposing eligibility criteria may introduce selection bias, we suggest that researchers compare baseline characteristics of the enrolled population with those of ineligible patients. Restricting the study population to a subset of patients in the target population may introduce selection bias if selection into the study is associated with both the treatment and the outcome owing to shared underlying factors (common causes). For each eligible case, eligible controls can be randomly sampled as of the case’s enrollment date using incidence density or risk set sampling. The estimated odds ratio from such a case-control analysis approximates the hazard ratio or (under constant hazards) the incidence rate ratio from the target trial.2223
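As a rough illustration of this single-trial construction, the following Python sketch (hypothetical data and column names; the authors' own implementation is a SAS macro in supplemental file 1) assigns each person's first eligible month as the enrollment date and records treatment status at that date:

```python
import pandas as pd

# Hypothetical person-month data (illustrative column names): one row per
# person per month, with an eligibility flag and a treatment initiation
# indicator assessed in that month.
pm = pd.DataFrame({
    "id":        [1, 1, 1, 2, 2, 3, 3],
    "month":     [1, 2, 3, 1, 2, 1, 2],
    "eligible":  [0, 1, 1, 1, 1, 0, 0],
    "initiator": [0, 1, 0, 0, 0, 0, 0],
})

# Time zero (the enrollment date) = first month in which all eligibility
# criteria are met; treatment is assessed AT enrollment, and baseline
# covariates would be taken from months strictly before it.
enrolled = (pm[pm["eligible"] == 1]
            .sort_values("month")
            .groupby("id", as_index=False)
            .first())
# Person 3 never meets the eligibility criteria and is excluded.
```

Person 1 enrolls in month 2 as an initiator and person 2 in month 1 as a non-initiator; everything observed before those months can serve as baseline covariate data.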

Fig 1

Schematic diagram of a case-control study with incidence density sampling to emulate a target trial using the entire timeframe as the enrollment period. Squares represent selected control index dates and triangles represent index dates for cases (with letters depicting individuals)

The approach discussed above allows each patient to enroll in only one trial. Alternatively, we suggest discretizing the data by time, such as into days, weeks, or months, and using each time interval as an enrollment period, thus allowing each individual to enroll in every period for which they are eligible. In this approach, each row of data represents one copy of a person who enrolls in a trial, here referred to as a person trial. This approach maximizes the use of data and improves statistical efficiency. Figure 2 illustrates the process of sampling cases and controls using this approach. The detailed steps are explained in section 2.3 of the guideline (see supplemental file 1) and the supplemental text (see supplemental file 2). In supplemental figures 2 and 3 we also illustrate the steps of sampling case person trials and control person trials in diagrams. Briefly, all eligible person trials that lead to an event are included in the analytical dataset as “case person trials.” One or more “control person trials” are randomly sampled for each case person trial from all eligible person trials within the dataset.
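The expansion into person trials can be sketched as follows (a minimal Python illustration with hypothetical data, not the authors' SAS macro): every eligible person-month simply becomes one row of the person-trial dataset.

```python
import pandas as pd

# Hypothetical person-month eligibility data (illustrative names only).
pm = pd.DataFrame({
    "id":        [1, 1, 1, 2, 2],
    "month":     [1, 2, 3, 1, 2],
    "eligible":  [1, 1, 0, 0, 1],
    "initiator": [0, 1, 0, 0, 0],
})

# Sequential trials: every eligible person-month becomes one "person trial",
# with that month as its enrollment period, so one person can contribute to
# several trials (person 1 enrolls in trials 1 and 2, first as a
# non-initiator and then as an initiator).
person_trials = (pm[pm["eligible"] == 1]
                 .rename(columns={"month": "trial"})
                 .reset_index(drop=True))
```

Because the same person now appears in several rows, the later variance estimation must account for this within-person correlation.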

Fig 2

Schematic diagram of a case-control study with incidence density sampling to emulate a target trial using each calendar month as one enrollment period. Each month (m) is considered as the enrollment period for a trial. Each letter identifies one individual. Triangles represent cases and squares represent controls. It is possible for a case to be sampled as control before experiencing the event (for example, in trial 3, individual E is sampled as a control for case A and later becomes a case). To identify cases within each monthly trial, individuals who are eligible at m and experience an event between month m+1 and the end of follow-up are identified (A and E), the event month (q) is recorded, and the case status is validated. These observations are referred to as case person trials (see process 1 in the guideline (supplemental file 1) for more detail). To sample controls for each case within each monthly trial using incidence density sampling, all individuals who are eligible at month m are identified, n (2 in the example in the figure) controls are randomly selected for each case, and a randomly selected month between m and their end of follow-up is selected for each control (referred to as q and shown as a square). This process is repeated for subsequent cases within each trial (m) and for subsequent monthly trials. These observations are referred to as control person trials. It should be ensured that these sampled controls did not experience an event before month q, using existing or additional data. To conduct an intention-to-treat analysis analog and in the absence of differential loss to follow-up, information on treatment at month m, confounders before that month, and the case or control status are sufficient for analysis (no information is required at month q). 
However, to adjust for non-adherence using a per protocol analysis, or to adjust for differential loss to follow-up (in either the intention-to-treat or the per protocol analysis), information on time varying determinants of treatment and loss to follow-up between months m and q is required to estimate inverse probability weights (see process 2 in the guideline (supplemental file 1) for more detail). Alternatively, month q for all controls sampled for a given case can be set equal to the case’s event month, an approach often referred to as risk set sampling
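The sampling procedure described in the caption can be sketched in a few lines of Python (toy data and names; a full implementation would also re-check eligibility and event-free status at month q, as the caption notes):

```python
import random
import pandas as pd

random.seed(0)

# Hypothetical follow-up data for the individuals eligible in one monthly
# trial with enrollment month m (names and numbers are illustrative).
# event_month is the month of a validated event (None = no event);
# end_month is the end of follow-up.
cohort = pd.DataFrame({
    "id":          ["A", "B", "C", "D", "E"],
    "event_month": [9, None, None, None, 14],
    "end_month":   [9, 20, 18, 25, 14],
})
m = 3  # enrollment month of this trial

rows = []
for _, case in cohort[cohort["event_month"].notna()].iterrows():
    rows.append({"id": case["id"], "q": int(case["event_month"]), "case": 1})
    # Incidence density sampling: for each case, n=2 controls are drawn at
    # random from the eligible individuals, and each control's index month
    # q is sampled between m and their end of follow-up. (Risk set sampling
    # would instead set q to the case's event month.)
    pool = cohort[cohort["id"] != case["id"]]
    for _, ctrl in pool.sample(n=2, random_state=0).iterrows():
        rows.append({"id": ctrl["id"],
                     "q": random.randint(m, int(ctrl["end_month"])),
                     "case": 0})

sampled = pd.DataFrame(rows)
# One would then verify that each sampled control had no event before its
# month q; a future case (such as E) may legitimately be sampled as a
# control before its own event, exactly as in figure 2.
```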

Trial emulation using case-control design: Analysis

Once the analytical dataset is created, the observational analog of the intention-to-treat effect can be estimated by comparing the treatment initiation status at enrollment of case person trials with control person trials using a pooled logistic regression model, adjusting for confounders measured at or before enrollment. If each patient is allowed to enroll in multiple trials, the variance of the estimated effect size should be adjusted for the within person correlation of person trials using an appropriate variance estimator, such as a robust variance estimator.24 More details are provided in section 2.2 of the guideline (see supplemental file 1).
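As an illustration of this analysis (in Python with simulated data; the article's own implementation is a SAS macro in supplemental file 1), the sketch below fits a pooled logistic regression by Newton-Raphson and computes a cluster-robust (sandwich) variance that accounts for the within-person correlation of repeated person trials:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated person trials: 3 trials per person (cluster = person id), one
# baseline confounder that drives both treatment initiation and the outcome.
pid = np.repeat(np.arange(300), 3)
conf = rng.normal(size=pid.size)
treat = rng.binomial(1, 1 / (1 + np.exp(-conf)))
case = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 - 0.5 * treat + 0.8 * conf))))

X = np.column_stack([np.ones_like(conf), treat, conf])

# Pooled logistic regression fitted by Newton-Raphson.
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X),
                            X.T @ (case - p))

# Cluster-robust sandwich variance: score residuals are summed within each
# person before forming the "meat", which handles the correlation of that
# person's repeated person trials.
p = 1 / (1 + np.exp(-X @ beta))
bread = np.linalg.inv(X.T @ ((p * (1 - p))[:, None] * X))
scores = X * (case - p)[:, None]
meat = sum(np.outer(s, s) for s in
           (scores[pid == g].sum(axis=0) for g in np.unique(pid)))
cov = bread @ meat @ bread

odds_ratio = float(np.exp(beta[1]))      # intention-to-treat analog
se_log_or = float(np.sqrt(cov[1, 1]))    # cluster-robust standard error
```

Exponentiating the treatment coefficient gives the odds ratio comparing initiators with non-initiators at enrollment, conditional on the baseline confounder.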

To adjust for imperfect adherence and estimate a per protocol effect, we propose artificially censoring non-adherent person trials when they deviate from their assigned treatment strategy at the time of treatment discontinuation. The resulting dataset will only include person trials that are always adherent to assigned treatment. To adjust for potential selection bias due to censoring because of imperfect adherence, inverse probability weights should be estimated using time varying data on prognostic factors associated with the probability of treatment. In section 2.3 of the guideline (see supplemental file 1), we explain in detail how to include such time varying factors in an expanded dataset of case person trials and control person trials. Inverse probability weights should be estimated in the control population because controls represent the target population, whereas the association between treatment and confounders may be different among cases.25 A similar approach using inverse probability weights can also be used to adjust for differential loss to follow-up (eg, due to disenrollment or competing risks) by including factors associated with loss to follow-up in the time varying dataset.
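A minimal sketch of this weighting step, assuming simulated monthly adherence data among initiator person trials in the control population (a one-covariate logistic adherence model; all names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulated monthly follow-up for initiator person trials among controls:
# 4 follow-up months per trial, one time varying prognostic factor (ldl),
# and an indicator of staying on treatment in each month.
n = 2000
df = pd.DataFrame({
    "trial_id": np.arange(n) // 4,
    "ldl": rng.normal(150.0, 25.0, n),
})
df["adherent"] = rng.binomial(
    1, 1 / (1 + np.exp(-(2.0 + 0.02 * (df["ldl"] - 150)))))

# Fit the adherence model (a logistic regression) in the control
# population, because controls represent the target population.
X = np.column_stack([np.ones(n), df["ldl"] - 150])
y = df["adherent"].to_numpy()
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    b += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (y - p))
df["p_hat"] = 1 / (1 + np.exp(-X @ b))

# A person trial is censored at its first non-adherent month; the inverse
# probability weight is 1 over the cumulative probability of having
# remained adherent (uncensored) up to that month.
df["cum_p"] = df.groupby("trial_id")["p_hat"].cumprod()
df["uncensored"] = df.groupby("trial_id")["adherent"].cumprod()
weights = 1 / df.loc[df["uncensored"] == 1, "cum_p"]
# In practice the weights would be stabilized to tame extreme values.
```

The same construction, with loss to follow-up in place of non-adherence, yields weights for differential loss to follow-up.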

When using data from an existing matched case-control study, the covariates used for matching should be included in the outcome model, preferably in the same functional form used for matching. This is necessary because matching in case-control studies induces an association between the matching variables and case-control status.

Clinical example

Protocol for target trial and its observational emulation

We developed a detailed protocol of a target trial to estimate the effect of starting statin treatment on the prevention of myocardial infarction (table 1) and emulated this trial under a nested case-control design using electronic health record data. The few differences between the target trial and the emulated trial are presented in the last column of table 1 and are mostly due to data availability and lack of randomization. As is common in observational comparative effectiveness research using electronic health records, we restricted the population to patients who had been enrolled in the health plan for more than one year. This requirement serves at least two purposes. Firstly, it allows investigators to gather information on eligibility, treatment, and confounders for a period preceding the patient’s eligibility, which is crucial for adjusting for baseline confounding factors, and, secondly, it ensures that eligible patients are long term users of the healthcare system from which the study data are derived, thus reducing loss to follow-up. This restriction may, however, induce selection bias or limit the generalizability of the findings. For comparison of the protocol of the target trial with a previously conducted randomized controlled trial, we also describe the protocol for a clinical randomized controlled trial: the Justification for the Use of Statins in Primary Prevention: An Intervention Trial Evaluating Rosuvastatin trial (JUPITER)26 in supplemental file 2.

Table 1

Specification of protocols for a target trial on the effect of statin initiation and risk of myocardial infarction and emulation of the target trial using EHR data


We chose a case-control design rather than a cohort design because our aim was to validate outcome status (cases and non-cases) using manual review of medical records, which would not be practical to do on the entire cohort. Although bias correction methods have been proposed to adjust for outcome misclassification, they often rely on assumptions and modeling choices that may introduce additional uncertainty or bias.27

Data sources

As mentioned before, data from a previously conducted case-control study can be reanalyzed or a new case-control study conducted to emulate the target trial if electronic health record data are available over the entire span of the study. Here, we used data from a previously conducted case-control study that had validated diagnoses for myocardial infarction. We extracted additional data from the electronic health record database from which the cases and controls arose. The electronic health record data came from Kaiser Permanente Washington (KPWA), an integrated healthcare delivery system in the US providing medical care and coverage to about 700 000 members in Washington State. The main advantages of this setting were convenience and the availability of extensive electronic health record data as well as data from previously conducted nested case-control studies.2829 The previous case-control studies, however, had collected data on cases and controls only at the index date or at a few time points before it, but our access to the same electronic health record dataset allowed us to obtain comprehensive data on eligibility, potential confounders, and treatment across time. We were then able to compare results obtained from a cohort design with unvalidated outcomes to those obtained from a case-control design with validated outcomes in the same population.

KPWA maintains computerized data on diagnoses, hospital admissions, procedures, outpatient visits, laboratory test results, vital signs, and prescriptions. Information on statin prescription fills was derived from a pharmacy database, which included all outpatient prescription fills at KPWA pharmacies and prescription claims submitted by outside pharmacies. Pharmacy data comprised a unique patient identifier, drug name, strength, route of administration, date dispensed, quantity dispensed, and days’ supply.
We chose a large set of potential confounders based on a priori knowledge of determinants of statin initiation and incidence of myocardial infarction. However, similar to other studies based on electronic health records, we did not have data on diet or physical activity. Data on blood pressure and body mass index were available from 2005 onwards, and these variables were included only in a sensitivity analysis. We adjusted for total cholesterol and high density lipoprotein cholesterol in the main analysis and additionally for low density lipoprotein cholesterol in a sensitivity analysis. To ascertain comorbidities, we used ICD-9 (international classification of diseases, ninth revision) diagnosis codes. Analyses were adjusted for history of atrial fibrillation, chronic obstructive pulmonary disease, cancer, cataract, dementia, depression, diabetes, heart failure, hypertension, and chronic kidney disease.

The myocardial infarction diagnosis among cases and lack of a myocardial infarction event among controls was verified by medical record review as part of the Heart and Vascular Health study. This previously conducted case-control study among KPWA enrollees (1994-2010) is further described in supplemental file 2.282930

Analysis

We allowed each patient to enroll in multiple emulated monthly trials (see details in supplemental file 1). We used a pooled logistic regression model to estimate the observational analog of intention-to-treat effect of statin initiation on occurrence of myocardial infarction after adjusting for confounders at enrollment date as well as trial month and follow-up month. We also estimated the per protocol effect of statins after adjustment for time varying confounders through inverse probability weighting.

To evaluate the impact of validation for case or control status, we also used cohort data without outcome validation to emulate the same target trial, and we calculated the observational analog of intention-to-treat and per protocol effects of statin initiation on occurrence of myocardial infarction. Furthermore, we performed a set of sensitivity analyses to evaluate the impact of alternative designs and analytic approaches (see supplemental table 2).

Results

Among the 10 128 unique cases and controls in the Heart and Vascular Health study, 1221 and 4267, respectively, met the eligibility criteria for at least one person trial (fig 3). The main reason for exclusion was not having data on selected confounders in the six months before a potential enrollment period (see supplemental figure 4). Characteristics were overall similar between ineligible and eligible patients, although eligible patients had a slightly higher risk profile (eg, higher levels of low density lipoprotein cholesterol) and a higher prevalence of documented comorbidities (eg, diabetes) than ineligible patients (see supplemental table 1). We created 198 monthly emulated trials (between January 1994 and June 2010). Across all trials, there were 15 263 eligible case person trials, and we sampled five controls for each case using incidence density sampling, thereby creating 76 315 control person trials (fig 3). Some individuals selected as controls later became cases. The median follow-up time (time between enrollment and event date for cases and sampling date for controls) was 25 months (interquartile range 11-43 months) for initiators and 30 (12-55) months for non-initiators. Overall, statin treatment was initiated in 1.9% (287 of 15 263) of the eligible case person trials and 1.5% (1167 of 76 315) of the sampled control person trials. Statin initiators were generally less healthy than non-initiators (table 2). They were on average older and had a higher prevalence of diabetes and smoking, higher total cholesterol levels, and lower high density lipoprotein cholesterol levels.

Fig 3

Flowchart of person trials in case-control trial emulation of statin treatment initiation and risk of myocardial infarction using incidence density sampling (1994-2010). *Cases and controls overlap (ie, controls may later be selected as cases). †Some people were initiators in some trials and non-initiators in other trials

Table 2

Baseline characteristics of eligible person trials of statin treatment initiation on risk of myocardial infarction (1994-2010), case-control design with validated outcome status and EHR data. Values are number (percentage) unless stated otherwise


In a minimally adjusted pooled logistic regression model (adjusting only for trial month and follow-up month) for validated myocardial infarction, the intention-to-treat odds ratio was 1.26 (95% confidence interval 1.10 to 1.44). After further adjustment for confounders at enrollment date, the odds ratio was 0.80 (0.69 to 0.92) (table 3). Adherence to assigned treatment was assessed among sampled controls; 41% (596 of 1454) of initiators discontinued treatment within one year and 64% (931 of 1454) within five years. Conversely, 8% (7210 of 90 124) of non-initiators started treatment within one year and 38% (34 247 of 90 124) within five years (fig 4). The per protocol odds ratio after censoring non-adherent person trials and adjusting for determinants of adherence using inverse probability weights was 0.71 (95% confidence interval 0.58 to 0.87). Figure 5 summarizes the results obtained using different analytic approaches and study designs. The biased case-control analysis using validated outcome status, with covariates and treatment measured at the index date (as opposed to the enrollment date), produced an odds ratio for statin use of 1.12 (0.96 to 1.31) (see supplemental table 2).

Table 3

Estimated odds ratios for effect of statin treatment initiation on risk of myocardial infarction (1994-2010) in case-control design using validated outcome status and EHR data

Fig 4

Adherence to treatment by statin initiation status among controls (1994-2010) using case-control design with validated outcome status. Last deviation from protocol among initiators occurred at month 122 and among non-initiators at month 144

Fig 5

Estimated effects of statin treatment initiation on risk of myocardial infarction (1994-2010) using different study designs and analytic approaches. The effect size is a pooled hazard ratio for the meta-analysis and odds ratios elsewhere. *From Danaei et al 2012.20 No results for per protocol analysis were reported. On average across trials, 21% of statin initiators discontinued treatment and 25% of statin non-initiators started treatment. Whiskers represent 95% CIs. CI=confidence interval; ICD=international classification of diseases; MRR=medical record review; RCT=randomized controlled trial

In sensitivity analyses, a case-control analysis comparing unvalidated controls sampled from the KPWA dataset with validated cases from the Heart and Vascular Health study showed an odds ratio of 0.80 (0.70 to 0.90) (see supplemental table 2). Comparing unvalidated controls with unvalidated cases from electronic health record data, identified only by ICD codes, showed an odds ratio of 1.00 (0.90 to 1.10) (fig 5). Other sensitivity analyses are reported in supplemental table 7.

Concluding remarks

Using a nested case-control design, we emulated a target trial of the effect of statin initiation on incidence of myocardial infarction. We compared our findings to those reported in a previous meta-analysis of intention-to-treat results from randomized controlled trials (pooled hazard ratio of 0.69).20 The estimated intention-to-treat treatment effect from our emulated trial was consistent with benefit (odds ratio 0.80). In contrast, an odds ratio of 1.12 was obtained for a biased case-control analysis with confounders and treatment measured at the index date. The smaller protective effect estimated in the trial emulations compared with previous randomized controlled trials could be explained by unmeasured confounding, differences in eligibility criteria, longer follow-up time, and lower adherence in our study population compared with those enrolled in randomized controlled trials.

Strengths and limitations of this study

The proposed methods may not be usable in all settings. Firstly, the methods require time varying data on eligibility, treatment, and confounders. Therefore, such analyses cannot be implemented within existing case-control studies that have measured such factors only at the index date (or sporadically before it). In our clinical example, we resolved this by extracting additional data on these factors for our selected cases and controls from the healthcare system’s electronic health record. However, even in the context of a high quality electronic health record system, as in our study, information on major confounders may not be available for all potentially eligible individuals during the period of study. Therefore, investigators may limit the eligible population to those with recent measurements of major confounders, which reduces sample size and may introduce selection bias and limit generalizability. Secondly, the proposed inverse probability weighting to adjust for imperfect adherence and differential loss to follow-up is sensitive to violations of the positivity assumption. Such violations are more common with longer durations of follow-up and may lead to undue influence of a few observations. In sections 3.4 and 3.6 of the guideline (see supplemental file 1), we propose several ways to reduce the potential for such violations. Finally, the proposed methods, especially allowing each patient to enroll in multiple trials and estimating time varying inverse probability weights, are conceptually and analytically complicated. The guideline (see supplemental file 1) aims to resolve this issue by providing detailed guidance on how to conceptualize the methods, prepare the analytic datasets, and conduct the analysis using the provided SAS macro and sample dataset.

Despite these limitations, the proposed methods have several major strengths compared with biased analytical methods used in case-control studies that evaluate eligibility, treatment, and confounders at the index date. Drafting a protocol for the target trial helps clarify the causal question of interest by clearly defining eligibility criteria, treatment strategies, and outcome definitions. Emulating the protocol using observational data helps reduce bias due to prevalent user bias (ie, differential survival bias) by defining intervention strategies as treatment initiation among those who were eligible at study enrollment. In addition, measuring confounders at or before treatment assignment prevents inappropriate adjustment for factors that may be affected by previous treatment (ie, collider stratification bias and adjustment for a mediator). Notably, not all case-control analyses evaluate exposure and confounders at the index date; those that evaluate these variables before the index date may be less prone to bias.

Compared with previously proposed methods to emulate the design and analysis of a target trial using a cohort design, the case-control design allows researchers to focus resources on efficiently collecting high quality but expensive measures of eligibility, confounders, treatments, and outcomes. In our clinical example, the diagnosis of myocardial infarction by ICD codes had a sensitivity of 95-96% and a specificity of 98% compared with medical record review (the reference standard for validation). Yet even this small measurement error was enough to introduce substantial bias in the results. In other clinical examples, the measurement error may be much larger.237891012 Our clinical example illustrates the benefit of implementing these methods and shows the process of emulating a sequence of nested trials to increase statistical efficiency.
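A back-of-envelope calculation, using invented risks rather than the study's data, shows why even high sensitivity and specificity can distort results when the outcome is rare: nondifferential outcome misclassification pulls an observed risk ratio toward the null.

```python
# Hypothetical illustration of nondifferential outcome misclassification.
# The true risks below are invented; the sensitivity and specificity match
# the magnitudes discussed in the text.
se, sp = 0.96, 0.98   # sensitivity and specificity of the ICD code
p1, p0 = 0.02, 0.01   # assumed true risks in treated and untreated groups

def observed_risk(p, se, sp):
    # true cases detected + false positives among non-cases
    return se * p + (1 - sp) * (1 - p)

true_rr = p1 / p0
obs_rr = observed_risk(p1, se, sp) / observed_risk(p0, se, sp)
print(round(true_rr, 2), round(obs_rr, 2))  # true 2.0 vs observed about 1.32
```

Because false positives accumulate among the many non-cases, a 2% false positive rate swamps a rare outcome, shrinking a true risk ratio of 2.0 to roughly 1.3 under these assumed risks.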

In conclusion, emulating a target trial using a nested case-control design allows high quality data and validated measures to be combined with analytic methods that are less prone to common biases in comparative effectiveness research. The accompanying guideline and analytic code should allow other investigators to implement these analyses.

Ethics statements

Ethical approval

This study was approved by the Harvard TH Chan School of Public Health Research ethics committee (protocol No: IRB18-1075, institutional review board (IRB) effective date: 24 Sept 2018), and Kaiser Permanente Washington Region IRB (study No: 1191203-1, IRB effective date: 13 Jun 2018).

Data availability statement

No additional data available.

Acknowledgments

We thank Miguel A Hernán, who contributed to the design of the study and development of the methodology; Barbra A Dickerman who contributed to developing the methodology and interpreting the results and reviewed the manuscript and the guideline; all participants, investigators, and the staff of the Kaiser Permanente Washington Health Research Institute and Heart and Vascular Health study; and 22 researchers and data analysts for providing comments on an earlier draft of the guideline during our qualitative interviews. We are grateful to Kelly Meyers for project management support at Kaiser Permanente Washington Health Research Institute.

Footnotes

  • Contributors: GD, the principal investigator, conceived the study, led the development of the methodology, and contributed to the interpretation of the results and writing of the manuscript and the guideline. BR contributed to developing the methodology, data analysis, interpretation of the results, and writing of the manuscript and the guideline. JC, MN, and RLW (cohort electronic health record data) and JSF, BMP, and KLW (case-control data from the Heart and Vascular Health study) developed and implemented the data creation protocol, contributed to the interpretation of the results, and reviewed and revised the manuscript and the guideline. RWL developed the SAS macros for the data analysis. All authors read and approved the final version of the manuscript and the guideline. GD is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and take responsibility for the integrity of the data and analysis.

  • Funding: This study was funded by the Patient-Centered Outcomes Research Institute (award No/project ID: ME-1609-36748). The postdoctoral fellowship to BR is supported by Novo Nordisk Foundation (No: NNF17OC0027580). Kaiser Permanente also supported this research. The funders had no role in considering the study design or in the collection, analysis, interpretation of data, writing of the report, or decision to submit the article for publication.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/disclosure-of-interest/ and declare: support from the Patient-Centered Outcomes Research Institute, Novo Nordisk Foundation, and Kaiser Permanente for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • The lead author (GD) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

References