Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score
BMJ 2020;370 doi: https://doi.org/10.1136/bmj.m3339 (Published 09 September 2020) Cite this as: BMJ 2020;370:m3339
Linked Editorial: Prediction models for covid-19 outcomes

All rapid responses
Dear Editor
I would like to be brief and precise. Thank you for the helpful article on risk stratification. We have been using the score to risk stratify our patients and have found it very useful. To make it more practical and to save time while working on the COVID floor, we here in Pinderfields came up with a quick Excel calculator based on the 4C Mortality Score. We have been using it and finding it much more convenient than calculating the score manually, and I thought I would share it with others too. I am sharing the link here as I am unable to upload any files directly. Please feel free to distribute it. I am happy to take any comments or questions.
https://www.dropbox.com/s/o5tf29nbt9xplqn/4C%20Mortality%20Risk%20Calcul...
Thank you.
Kind Regards
Rahim
Competing interests: No competing interests
Dear Editor,
We read with great interest the excellent study by Knight et al.[1], “Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score” which has been recently published in your esteemed journal. It is a succinctly written article and we would like to commend the authors for their outstanding effort. It is a topic of much interest to us and we would like to add a few points which we feel would enrich the discussion.
Brahier T et al. [2] reported that lung involvement visualised with ultrasound correlated with disease severity, and that summarising this into a simple ordinal scoring system has the potential to discriminate patients requiring hospitalisation and thus better allocate scarce resources. Lung ultrasonography (LUS) is reliable, cheap and easy to use as a triage tool for early risk stratification in COVID-19 patients. LUS has already shown excellent performance in detecting non-COVID-19 pneumonia, compared with CT as a reference standard, and matches the discriminative power of CT in patients with acute respiratory distress syndrome (ARDS). [3]
Various studies have shown the relevance of evaluating IL-6 levels and platelet count alongside the parameters in the 4C Mortality Score. Both parameters have an independent role in stratifying COVID-19 patients by severity. Since these are readily obtainable in clinical laboratories, their inclusion could improve the score's mortality-predicting accuracy. [4,5]
REFERENCES
(1) Knight SR, Ho A, Pius R, Buchan I, Carson G, Drake TM, et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ. 2020;370:m3339. doi: https://doi.org/10.1136/bmj.m3339
(2) Brahier T, Meuwly J-Y, Pantet O, Vez M-JB, Donnet HG, Hartley M-A, et al. Lung ultrasonography for risk stratification in patients with COVID-19: a prospective observational cohort study. Clin Infect Dis. 2020;ciaa1408. doi: https://doi.org/10.1093/cid/ciaa1408
(3) Mayo PH, Copetti R, Feller-Kopman D, Mathis G, Maury E, Mongodi S, et al. Thoracic ultrasonography: a narrative review. Intensive Care Med. 2019;45(9):1200-11.
(4) Laguna-Goya R, Utrero-Rico A, Talayero P, Lazaro-Lasa M, Ramirez-Fernandez A, Naranjo L, et al. IL-6-based mortality risk model for hospitalized patients with COVID-19. J Allergy Clin Immunol. 2020. doi: 10.1016/j.jaci.2020.07.009
(5) Lippi G, Plebani M, Henry BM. Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: a meta-analysis. Clin Chim Acta. 2020;506:145-8. doi: https://doi.org/10.1016/j.cca.2020.03.022
Competing interests: No competing interests
Dear Editor,
It was with great interest that we read the study of Knight and colleagues regarding risk stratification of patients with COVID-19 according to the 4C mortality score [1]. The authors considered eight predictors of in-hospital mortality: age, sex, number of comorbidities, respiratory rate, peripheral oxygen saturation, Glasgow coma scale, urea level, and C reactive protein, all part of routinely collected information from medical records and easily obtained in most settings.
In order to externally validate it in a different population, we applied the 4C score in the COVIDAge cohort of very old patients hospitalised in acute care with COVID-19, who were ineligible for intensive care [2]. This cohort comprised 235 Caucasian patients with a mean age of 86 (range 65-102) years, of whom 102 (43%) were male, all admitted to acute geriatric care at the Geneva University Hospitals. Seventy-six patients (32.3%) died during hospitalisation [2]. The mortality rate in our cohort was similar to those of the derivation (32.2%) and validation (30.1%) datasets used in the study of Knight and colleagues, although our patients were older.
The eight variables of the 4C score were extracted and the score was computed respecting the weighting described by the authors, ranging from 0 to 21 points, with higher scores associated with greater mortality. Because the Glasgow coma scale was not systematically available, we modified this item by including all patients with a diagnosis of delirium as a dichotomised variable, which was systematically assessed by the Confusion Assessment Method and the DSM-5 criteria [3,4]. Respiratory rate was defined as the highest value within the first 24 hours after hospital admission. We obtained excellent completeness of data, with only a few missing values: three for urea and four for C-reactive protein. The missing data corresponded to four patients, who were all categorised as “high risk” even without these laboratory values.
The categorisation of our population by risk group resulted in the following distribution: 21.7% (51/235) were at “very high risk” (≥15 points), 74.9% (176/235) at “high risk” (9-14 points) and 3.4% (8/235) at “intermediate risk” (4-8 points) of mortality. No patient was categorised as “low risk” (0-3 points). This score distribution was significantly different from that of the initial validation cohort (p<0.001), mainly because our patients were older, which placed them directly in the “intermediate risk” category on the age criterion alone.
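For readers applying the same cut-offs, the published risk-group thresholds (low 0-3, intermediate 4-8, high 9-14, very high ≥15 points) amount to a simple lookup; a minimal sketch, with the function name being ours:

```python
def risk_category(score: int) -> str:
    """Map a 4C Mortality Score total (0-21 points) to its published risk group."""
    if not 0 <= score <= 21:
        raise ValueError("4C Mortality Score must be between 0 and 21")
    if score <= 3:
        return "Low"
    if score <= 8:
        return "Intermediate"
    if score <= 14:
        return "High"
    return "Very high"
```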
We then computed mortality rates with their respective exact binomial 95% confidence intervals for each risk category in our cohort. Although no patient in the “intermediate risk” category died (N=0/8; 0%, 0-36.9%), we observed increasing mortality from “high risk” (N=49/176; 27.8%, 21.4-35.1%) to “very high risk” (N=27/51; 52.9%, 38.4-67.1%). The mortality rate according to the 4C risk categories was similar in our population to that in the published derivation and validation cohorts (Table 5 of the paper), as their mortality rates fell within our confidence intervals. Additionally, there was no overlap of the confidence intervals between the “high risk” and “very high risk” mortality rates, which highlights the good discriminative power between these two categories. On the other hand, while the authors obtained an area under the receiver operating characteristic curve (AUROC) of 0.786 (95% CI 0.781-0.790) in the derivation cohort and of 0.767 (95% CI 0.760-0.773) in the validation cohort with their final eight-variable model, the AUROC was slightly lower in the COVIDAge cohort (0.743; 95% CI 0.68-0.81).
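For readers wishing to reproduce such exact binomial intervals, the Clopper-Pearson bounds can be obtained by bisection on the binomial distribution function; a minimal pure-Python sketch (function names ours, not the cohort's analysis code):

```python
from math import comb

def clopper_pearson(deaths: int, n: int, alpha: float = 0.05):
    """Exact (Clopper-Pearson) binomial confidence interval for deaths/n."""
    def binom_cdf(p, k):
        # P(X <= k) for X ~ Binomial(n, p)
        return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

    def bisect(k, target):
        # binom_cdf(p, k) decreases in p; find p with binom_cdf(p, k) == target
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(mid, k) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if deaths == 0 else bisect(deaths - 1, 1 - alpha / 2)
    upper = 1.0 if deaths == n else bisect(deaths, alpha / 2)
    return lower, upper
```

For instance, `clopper_pearson(0, 8)` reproduces the 0-36.9% interval quoted above for the “intermediate risk” group.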
We would like to call attention to the criterion of altered consciousness measured by a Glasgow coma scale <15, which is often of limited use and less reliable in older patients [5]. For this reason, we used the diagnosis of delirium as a simple alternative and suggest its use for this prognostic assessment [6]. This is based on our experience from the COVIDAge cohort, in which delirium was a strong predictor of in-hospital mortality in older patients with COVID-19 (HR 2.09; 95% CI 1.18-3.70; p=0.011) [2].
Despite our relatively small sample size, we were able to confirm the mortality rates published in both the validation and derivation cohorts, thus demonstrating that the 4C mortality score can also be applied to a population of very old inpatients.
Authors:
Professor Dr. François Herrmann
Division of Geriatrics, Department of Rehabilitation and Geriatrics
François.Herrmann@hcuge.ch
Professor Dr. Stephan Harbarth
Division of Infectious Diseases and Infection Control Program
Stephan.Harbarth@hcuge.ch
Dr. Christine Serratrice
Division of Internal Medicine of the Aged, Department of Rehabilitation and Geriatrics
Christine.Serratrice@hcuge.ch
Professor Dr. Christophe Graf
Division of Internal Medicine and Rehabilitation, Department of Rehabilitation and Geriatrics
Christophe.Graf@hcuge.ch
Professor Dr. Dina Zekry
Division of Internal Medicine of the Aged, Department of Rehabilitation and Geriatrics
Dina.Zekry@hcuge.ch
Professor Dr. Gabriel Gold
Division of Geriatrics, Department of Rehabilitation and Geriatrics
Gabriel.Gold@hcuge.ch
Dr. Aline Mendes
Division of Geriatrics, Department of Rehabilitation and Geriatrics
Aline.Mendes@hcuge.ch
Competing interests: No competing interests
Dear Editor,
In primary care, could blood ESR and urine specific gravity be substituted for blood CRP and urea, to enable primary care clinicians to provide a rapid, appropriate discussion of the further care plan and treatment?
A patient who prefers to stay at home first may benefit from a similar risk score to enable timely and appropriate escalation of palliative care.
Could such an adapted tool be used to stratify risk and further monitor ‘home first’ patients in general practice virtual wards, to improve the quality of intended care outcomes?
Competing interests: No competing interests
Dear Editor
Many thanks to all correspondents for their interest in our work on the 4C Mortality Score. The feedback is greatly appreciated, and we will respond to all points in time. We write here specifically in response to the letter from Professor Riley and colleagues (1).
We are pleased to be able to use the large and detailed ISARIC 4C dataset to provide a pragmatic decision support tool for use by clinicians in these challenging times. We thank the correspondents for highlighting the quality of the work; we are particularly indebted to the large and dedicated team of research nurses and students who worked hard to collect these data. We are grateful for the opportunity to address the technical points raised.
Development of this tool began during lockdown, when nurses and doctors in emergency departments and intensive care units were working incredibly hard in difficult circumstances to provide the best care for patients unwell with covid-19. The request from clinical teams was for a pragmatic decision support tool that could be applied quickly using admission information without the need for a web or mobile phone-based app, given contamination risks (2). This is what has been achieved.
The first and second points are around score calibration. Calibration describes the relationship between predictions and observations and is important in ensuring that a prognostic tool is accurate across the range of risk. Figure 2 refers to validation data and an erratum has been requested to correct a previous legend substituted in error. All performance metrics and analyses presented in this study were performed using the prognostic index (score), not using the regression models which were used to generate this score. The calibration in the validation dataset is excellent, though not perfect. It is correct to highlight that these are averages within deciles of risk. The good performance is due in part to this being an internal validation, albeit using data from different patients admitted at later points in time. Particularly in a pandemic, it is likely that calibration will change over time, perhaps differently by region and by patient subgroup. The simplicity of the 4C Mortality Score makes it susceptible to this, and we would not expect calibration to be as good in planned external validation exercises. Calibration-in-the-large (CITL) and the slope of the calibration curve were generated in the standard manner. The score was fitted to the outcome in a logistic regression model using derivation data. Predictions on the log-odds scale were made in the validation dataset and fitted in a logistic regression model to determine the calibration slope (1.034). Linear predictions were fitted as an offset to determine CITL (0.030). LOESS curve fitting did not alter the appearance of the plot.
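For readers unfamiliar with this procedure, the two fits described (regressing the outcome on the linear predictor to obtain the calibration slope, then refitting with the predictor as a fixed offset to obtain CITL) can be sketched as below. This is our illustration on simulated, perfectly calibrated data, not the study's analysis code:

```python
import numpy as np

def fit_logistic(X, y, offset=None, n_iter=50):
    """Newton-Raphson (Fisher scoring) for logistic regression with an optional offset."""
    n, p = X.shape
    beta = np.zeros(p)
    off = np.zeros(n) if offset is None else np.asarray(offset)
    for _ in range(n_iter):
        eta = X @ beta + off
        mu = 1.0 / (1.0 + np.exp(-eta))        # predicted probabilities
        w = mu * (1.0 - mu)                    # IRLS weights
        grad = X.T @ (y - mu)
        hess = (X * w[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 20_000
lp = rng.normal(-1.0, 1.5, size=n)             # linear predictor (log-odds) from a score model
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-lp))).astype(float)  # calibrated outcomes

# Calibration slope: coefficient on the linear predictor (expected ~1 here)
X = np.column_stack([np.ones(n), lp])
intercept, slope = fit_logistic(X, y)

# Calibration-in-the-large: intercept-only model with lp as offset (expected ~0 here)
citl = fit_logistic(np.ones((n, 1)), y, offset=lp)[0]
```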
With regard to clinical utility, we included a comparison of decision curves for the best discriminating scores that could be applied to >50% of the complete case cohort. It is asked why decision curves do not have the form of a step function (a staircase relationship in lay terms; in mathematical terms, a linear combination of indicator functions). We admit to being a little confused by this question, as we know the correspondents have a great deal of experience in this area. Net benefit is defined as the fraction of true positives minus the fraction of false positives at a given threshold odds, multiplied by the threshold odds (3). Given that a decision curve is a function of threshold odds, the result can never be a step function. The discrete changes in net benefit given the discrete nature of the prognostic index are seen as expected.
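The definition of net benefit stated above can be written down directly; a minimal sketch (names ours), which also makes clear why the curve varies continuously in the threshold even where the true- and false-positive counts are constant:

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of flagging patients whose predicted risk meets the threshold.

    NB = TP/n - FP/n * (pt / (1 - pt)), where pt is the threshold probability
    and pt / (1 - pt) is the threshold odds.
    """
    n = len(outcomes)
    flagged = [r >= threshold for r in risks]
    tp = sum(1 for f, o in zip(flagged, outcomes) if f and o)
    fp = sum(1 for f, o in zip(flagged, outcomes) if f and not o)
    return tp / n - fp / n * (threshold / (1 - threshold))
```

Even while `tp` and `fp` stay fixed across a range of thresholds, the odds multiplier changes smoothly, so the decision curve is not a step function.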
There are difficulties in using decision curves for points score models, given that such a model does not incorporate an underlying probability function in the way a regression model does. We considered three approaches to this. The first is to use the original outcome probabilities of comparison scores across the full range of risk if available; these are then fitted to validation data. The second is to refit the comparison scores in the derivation data, and to use these to predict outcomes for the validation data. The third is to refit all scores in the validation data and provide a direct comparison. We included the third approach in the paper and found no difference in conclusions using the second approach. All these analyses were performed in the same nested data for each score.
In point 3, it is asked why a machine learning model was developed and reported together with the other approaches. As stated in the paper, we believe it is important when presenting a simple and pragmatic score to understand what discrimination might be achieved using alternative classification tools. We challenged our simple derived score by asking “to what extent is discrimination being sacrificed for the sake of expediency?” It is an interesting philosophical question to consider the upper bound of predictive power given a set of information at a particular decision point. We try to get at that by using all included variables in a flexible modelling framework. The correspondents make no comment on the comparison, which is of interest. Differences in discrimination between modelling approaches were numerically small, possibly reflecting limitations of the area under the receiver operating characteristic curve metric. Much of the direct correspondence we have received for this study has been in praise of the inclusion of alternative approaches when defining a pragmatic score. With regard to continuous variables, perhaps capturing non-linearity would have benefits in other performance metrics or in score calibration in subgroups, at the expense of requiring a nomogram or calculator app.
In point 4, it is emphasised that performance should be ensured across geographical regions and patient subgroups. We completely agree. Discrimination was considered in geographical subsets, by sex, and by ethnicity. It will be useful to externally validate this model in further geographical regions and subgroups, and we look forward to doing so.
In point 5, clarification is asked on inclusion criteria. As stated, patients were required to have at least 4 weeks follow-up at the time of data extraction. Events occurring at a time-point after 4 weeks were considered at that time-point. Included patients who had no outcome, either because it was missing or it had not happened yet (derivation dataset 3.6%), were considered to have had no event. The outcome measure is therefore in-hospital mortality. It should be noted that outcomes after day 28 may not be as reliably collected due to the pressure the data collection teams were under during the pandemic.
In point 6, it is stated that this "new tool should be viewed as predicted risks in the context of current care [and] a low risk does not mean that the patient should immediately be sent home without care”. Yes, we emphasise that the key aim of risk stratification is to support clinical management decisions, not to replace them.
In the final point, the comparison with existing scores is described as problematic given the inability to apply particular scores to the data commonly available at admission to hospital. It is imperative that any new prognostic tool is put in the context of what has come before. As described above, the decision curves analysis of the best scores was performed within a nested dataset. That some existing prognostic scores can only be applied in a small proportion of patients in this dataset is itself an important result. This was a prospective non-interventional study using routine data. Prediction scores that require information not commonly available at the time of decision making have limited applicability in practice. This is particularly important in situations of surge when healthcare demand is high and clinical resources are limited.
Novel biomarkers may be important. Our deep phenotyping work progresses, and we hope to identify biomarkers that can be incorporated into similar tools to help characterise and guide the treatment of patients with covid-19.
Many thanks again for the opportunity to clarify these important points. As clinician-scientists working in hospitals, we are acutely aware of the balances that must be struck around pragmatism when creating decision support tools. We welcome data sharing requests (https://isaric4c.net) and, as with all our projects, the code is public (https://github.com/SurgicalInformatics/4C_mortality_score). We hope this code provides useful solutions to others working in this area.
1. Riley RD, Collins GS, van Smeden M, Snell KIE, Van Calster B, Wynants L. Is the 4C Mortality Score fit for purpose? Some comments and concerns. 2020 Sep 15 [cited 2020 Sep 16]; Available from: https://www.bmj.com/content/370/bmj.m3339/rr-3
2. Phua J, Weng L, Ling L, Egi M, Lim C-M, Divatia JV, et al. Intensive care management of coronavirus disease 2019 (COVID-19): challenges and recommendations. The Lancet Respiratory Medicine. 2020 May 1;8(5):506–17.
3. Vickers AJ, Calster BV, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ [Internet]. 2016 Jan 25 [cited 2020 Sep 16];352. Available from: https://www.bmj.com/content/352/bmj.i6
Competing interests: No competing interests
Dear Editor
Few models can perfectly predict coming events. Even having encountered the disastrous SARS-CoV-2 pandemic and the complex disease, COVID-19, that it causes, we still appear to have few effective means against it. The antiviral drugs and vaccines available now have very limited effects, and the novel strategies on the way have uncertain results. In addition, COVID-19 has greatly changed our views, from formerly being considered a predominantly pulmonary disease to a systemic syndrome affecting many organs [1, 2]. Furthermore, many more factors are tightly correlated with COVID-19, such as age, comorbidities, psychological state, public health policies, and even political aspects. Management and prevention partly guided by a pragmatic risk stratification tool to predict mortality in patients admitted to hospital with COVID-19 therefore become increasingly important.
Many prognostic stratification tools for COVID-19 have been developed [3], but most lack the capacity to identify the patients with COVID-19 who are at the highest risk of death. Most have a high risk of bias, a small sample size resulting in uncertainty, poor reporting, and a lack of formal validation, as described by Knight SR and colleagues, who designed an easy-to-use and valid prediction tool, the Coronavirus Clinical Characterisation Consortium Mortality Score (4C Mortality Score) [4]. They report that this tool stratifies patients with COVID-19 at every risk level with high sensitivity and specificity. They enrolled eight key factors as the elements for data analysis. Though these elements cannot completely depict the whole situation of patients with COVID-19, they are consistent with existing prediction tools, and the 4C score showed discriminatory performance higher than 15 pre-existing stratification scores.
Without discussing the other factors in these datasets, we focus on age stratification. As shown in Figure 3, for both sensitivity/specificity and standardised benefit/high-risk threshold, age has the lowest reliability, meaning that the age index has limited reliability as a key predictor among these elements. A recent report demonstrated that age has no reliable correlation with severity and mortality in patients with COVID-19 [5], yet other clinical studies claimed that age was tightly correlated with adverse outcomes in COVID-19 patients [6]. We find that these findings are not substantially contradictory: the divergence is possibly linked to comorbidities in patients with COVID-19 admitted to hospital. Among young adults hospitalised with COVID-19, in-hospital mortality is lower than that reported for older adults with COVID-19, but approximately double that of young adults with acute myocardial infarction; and young adults with more than one of certain conditions, including morbid obesity, hypertension, and diabetes, faced risks comparable with those observed in middle-aged adults without them [7]. This further suggests that age has limited value as a predictor; at least, it should not be regarded as an independent factor in the score. Hence the age index in the 4C Mortality Score should be modified and calibrated with reference to comorbidity.
On the other hand, COVID-19 is now recognised as a systemic syndrome: nearly all tissues and organs are at risk of being affected by SARS-CoV-2. The eight factors enrolled in the 4C prediction tool therefore seem biased, and a factor with good sensitivity and specificity across the life span and across organ systems is urgently needed. Holistically, the endothelium is the first barrier and one of the largest organs in the human body. The endothelium is susceptible to SARS-CoV-2 [8], and increased levels of circulating endothelial cells (CECs, derived from damaged endothelium) appear to be associated with severe forms of COVID-19. Indeed, those who required admission to the intensive care unit (ICU) had significantly higher CECs than patients who did not require ICU treatment, and the extent of endothelial injury correlated with disease severity and inflammatory cytokines [9]. In COVID-19 patients endothelial damage is usual, and CECs offer quick testing and high sensitivity for mortality scoring in contrast to the eight factors in the 4C prediction tool. We strongly suggest evaluating CECs as a predictor in the 4C Mortality Score.
Reference
1. Cheung KS, Hung IF, Chan PP, et al. Gastrointestinal manifestations of SARS-CoV-2 infection and virus load in fecal samples from the Hong Kong cohort and systematic review and meta-analysis. Gastroenterology. 2020;159:81-95.
2. Puelles VG, Lütgehetmann M, Lindenmeyer MT, et al. Multiorgan and renal tropism of SARS-CoV-2. N Engl J Med. 2020;383:590-592.
3. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ 2020;369:m1328.
4. Knight SR, Ho A, Pius R, et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ 2020;370:m3339.
5. Cunningham JW, Vaduganathan M, Claggett BL, et al. Clinical outcomes in young US adults hospitalized with COVID-19. JAMA Intern Med. 2020. doi:10.1001/jamainternmed.2020.5313
6. Ludvigsson JF. Systematic review of COVID-19 in children shows milder cases and a better prognosis than adults. Acta Paediatr. 2020;109:1088-1095.
7. Yang J, Biery DW, Singh A, et al. Risk factors and outcomes of very young adults who experience myocardial infarction: the Partners YOUNG-MI Registry. Am J Med. 2020; 133(5):605-612.
8. Varga Z, Flammer AJ, Steiger P, et al. Endothelial cell infection and endotheliitis in COVID-19. Lancet. 2020;395(10234):1417-1418.
9. Guervilly C, Burtey S, Sabatier F, et al. Circulating endothelial cells may serve as a marker of COVID-19 severity. J Infect Dis. 2020. doi:10.1093/infdis/jiaa528.
Competing interests: No competing interests
Dear Editor
We read with interest Knight et al’s 4C mortality model (1) and congratulate them on its impressive discriminatory performance. ISARIC has managed to gather one of the largest and most robust datasets for inpatient COVID-19 admissions in the world.
Although it was derived from United Kingdom (UK) data, given its scale, there is a temptation to look toward this model for its potential to benefit those in other countries. Future external validation studies will of course be needed to assess its predictive performance. Nonetheless, the tool utilises variables (such as CRP and urea) which may not be universally available to healthcare settings in many low- and middle-income countries (LMICs). At the outset this makes it unusable by an already underserved portion of the world’s population (2).
Ideally for use in LMICs, a tool would need to be developed using local datasets from local researchers. As an example, to our knowledge such a large local LMIC dataset does not exist in sub-Saharan Africa. It is likely that smaller regional datasets exist and they can be accessed through organisations such as the African Federation of Emergency Medicine (https://afem.africa/).
Therefore, we would urge the authors to consider adapting their model for LMIC populations by drawing on the expertise of local researchers, particularly for candidate variable selection. As pressing as this is, we equally urge caution in the approach: for such a tool to add value to clinical management in a lower-resourced setting, context is everything. Simply removing variables deemed less available has the potential to create a model which yields outcomes unsuited to the population it purports to serve, which in turn could have a distracting and even detrimental effect.
A similar adaptive approach has been suggested before and implemented in acute coronary syndrome risk scores. The scores were derived and validated based upon the availability of investigations in hospital (3,4) vs. prehospital settings (5). Whilst this kind of approach is justified with regard to the ISARIC data, much more careful and transparent assumptions will need to be made given that the original UK setting operates a very different health system for a population with a different baseline health status.
If the 4C mortality score was adapted (e.g. using model updating techniques (6)), such a tool should then be externally validated locally in a smaller LMIC dataset as has been suggested previously (7,8). This has the potential to add value to the current dearth of COVID-19 research which has direct relevance to LMICs. If targeted at the right stage in the patient journey, this could produce large gains through efficient resource allocation which may then translate to impact on mortality and morbidity.
References
1) Knight, S.R., Ho, A., Pius, R., Buchan, I., Carson, G., Drake, T.M., Dunning, J., Fairfield, C.J., Gamble, C., Green, C.A. and Gupta, R., 2020. Risk stratification of patients admitted to hospital in the United Kingdom with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of a multivariable prediction model for mortality. British Medical Journal.
2) Cattani, M., 2020. Global coalition to accelerate COVID-19 clinical research in resource-limited settings. Lancet, pp.30798-4.
3) Body R, Carley S, McDowell G, et al The Manchester Acute Coronary Syndromes (MACS) decision rule for suspected cardiac chest pain: derivation and external validation Heart 2014;100:1462-1468.
4) Body, R., Carlton, E., Sperrin, M., Lewis, P.S., Burrows, G., Carley, S., McDowell, G., Buchan, I., Greaves, K. and Mackway-Jones, K., 2017. Troponin-only Manchester Acute Coronary Syndromes (T-MACS) decision aid: single biomarker re-derivation and external validation in three cohorts. Emergency Medicine Journal, 34(6), pp.349-356.
5) Alghamdi, A., Howard, L., Reynard, C., Moss, P., Jarman, H., Mackway-Jones, K., Carley, S. and Body, R., 2019. Enhanced triage for patients with suspected cardiac chest pain: the history and Electrocardiogram-only Manchester acute coronary syndromes decision aid. European Journal of Emergency Medicine, 26(5), p.356.
6) Su, T.L., Jaki, T., Hickey, G.L., Buchan, I. and Sperrin, M., 2018. A review of statistical updating methods for clinical prediction models. Statistical methods in medical research, 27(1), pp.185-197.
7) Steyerberg, E.W., Moons, K.G., van der Windt, D.A., Hayden, J.A., Perel, P., Schroter, S., Riley, R.D., Hemingway, H., Altman, D.G. and PROGRESS Group, 2013. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med, 10(2), p.e1001381.
8) Janssen, K.J., Vergouwe, Y., Kalkman, C.J., Grobbee, D.E. and Moons, K.G., 2009. A simple method to adjust clinical prediction models to local circumstances. Canadian Journal of Anesthesia/Journal canadien d'anesthésie, 56(3), p.194.
Competing interests: No competing interests
Dear Editor
Knight and colleagues have developed and validated a pragmatic risk score that predicts mortality in patients with COVID-19. The study is arguably the largest of its kind; the statistical analysis is extensive and relatively robust; and the score is simple enough to be applied in clinical practice.
Clinicians must understand that this score will not differentiate COVID-19 from other similar clinical presentations at the front door, where the score is intended to be used. Often, a patient presenting with severe breathlessness to the emergency department or their GP will not yet have a confirmatory SARS-CoV-2 nasopharyngeal PCR result. The differentials for a breathless patient, even with bilateral shadowing on a concomitant chest x-ray, are broad, including acute heart failure, atypical bacterial pneumonia and, with winter looming, acute influenza.
To complicate matters further, any patient with severe disease from the aforementioned conditions would score highly on the 4C Mortality Score, but their management, and therefore prognosis, would differ from what the score predicts, depending on their condition and the appropriate subsequent use of diuretics or antimicrobial therapy. For example, an elderly patient in acute heart failure (giving them a drop in mental status, raised urea, an oxygen requirement and a raised respiratory rate) may be inadvertently given steroids because of a misdiagnosis of COVID-19 and a high 4C Mortality Score, when all they actually required was adequate diuresis.
The second wave of COVID-19 is on the horizon in the UK. Thanks to huge efforts in research, we are now better equipped to deal with the disease. However, the clinical history and examination remain critical in ensuring the diagnosis of COVID-19 is correct, so that the correct interventions are targeted at those who need them, and so that patients presenting to healthcare with similar diseases other than COVID-19 do not come to harm.
Competing interests: No competing interests
To the Editor
We read with interest the paper in the BMJ by Knight et al.,[1] proposing a new risk prediction model for patients admitted to hospital with COVID-19, which the Guardian indicates is expected to be rolled out in the NHS this week (https://www.theguardian.com/world/2020/sep/09/risk-calculator-for-covid-...). On the whole, the paper appears of higher quality than most other articles we have reviewed in our living review.[2] For example, the dataset was large enough;[3] there was a very clear target population; missing data were handled using multiple imputation; multiple metrics of predictive performance were considered (including calibration and net benefit, which are often ignored); and reporting followed the TRIPOD guideline.[4 5] However, we have identified some concerns and issues that we want to flag to BMJ readers.
Firstly, a potential issue is that calibration (i.e. agreement between observed and predicted risks) appears perfect in both the development dataset and the validation dataset, with a calibration-in-the-large (CITL) of 0 and a calibration slope of 1. We would expect these results in the model development dataset (at least when using unpenalised regression models; here the lasso was used, but the dataset is so large that penalisation is most likely negligible, so a CITL of 0 and a calibration slope of 1 may be possible). However, observing such perfect results upon validation is highly unusual. What is concerning is whether the model has been recalibrated (perhaps unknowingly) in the validation dataset before the calibration measures were estimated. Perhaps the authors could ask an independent statistician to check this? It would also help to know how they calculated the CITL and calibration slope. Perhaps their statistical analysis code could be made available?
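For readers less familiar with these two measures: both come from refitting a logistic recalibration model in the validation data. The calibration slope is the coefficient on the model's linear predictor (logit of the predicted risk); CITL is the intercept when that linear predictor is included as a fixed offset. A minimal pure-Python sketch of this standard construction (illustrative only — this is not the authors' code, and the function name is ours):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_intercept_slope(y, pred, iters=100, tol=1e-10):
    """Fit y ~ logistic(a + b * logit(pred)) by Newton-Raphson.

    Returns (citl, slope): slope is b from the two-parameter fit;
    citl is the intercept refitted with logit(pred) as an offset
    (i.e. with the slope fixed at 1).
    """
    lp = [logit(p) for p in pred]

    # Calibration slope: two-parameter logistic recalibration model.
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = [sigmoid(a + b * x) for x in lp]
        g0 = sum(yi - pi for yi, pi in zip(y, p))            # score for a
        g1 = sum(x * (yi - pi) for x, yi, pi in zip(lp, y, p))  # score for b
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, lp))
        h11 = sum(wi * x * x for wi, x in zip(w, lp))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det                     # 2x2 Newton step
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    slope = b

    # CITL: intercept-only model with logit(pred) as an offset.
    c = 0.0
    for _ in range(iters):
        p = [sigmoid(c + x) for x in lp]
        f = sum(yi - pi for yi, pi in zip(y, p))
        c += f / sum(pi * (1.0 - pi) for pi in p)            # 1-D Newton step
        if abs(f) < tol:
            break
    return c, slope
```

On perfectly calibrated data (observed event fractions equal to the predicted risks) this returns a CITL of 0 and a slope of 1, which is precisely the pattern whose appearance in an external validation dataset we find so unusual.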
A related point is that the calibration plot is given for the derivation dataset (Fig 2 and Appendix 11), but not for the validation dataset. Perhaps Figure 2 is for the validation dataset? Regardless, it appears that straight lines have been used to join the groups on the plot (which should be avoided), rather than a smoothed calibration curve being added.[6 7] This may hide some potential deviation.
Secondly, the authors have created a simplified score from their final prediction model (shown in Table 2), and it is the simplified score they are recommending for practice. However, it is not clear whether the score itself was validated, or the model equation based on the lasso. If the performance metrics relate to the original (lasso) model, then they do not reflect the performance of the simplified score. Unlike for the regression-based model, there are challenges in validating a simplified score developed in this manner, namely the lack of predicted risks, which impedes an assessment of calibration. How can the calibration slope be 1 (even in model development) when the score no longer maps one-to-one to the risk estimates of the original model? Clarity is needed. For example, how do the authors obtain predicted risks? Do they take the average risk for everyone with that score (Fig 2, centre panel)? How do they then examine CITL and calibration slope?
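To illustrate what we mean by attaching predicted risks to a points-based score: one plausible construction (we do not know whether it is the authors') is to group patients by their integer score and use the observed event fraction in each group as that score's predicted risk. A hypothetical sketch:

```python
from collections import defaultdict

def score_to_risk(scores, outcomes):
    """Map each integer score value to the observed event fraction
    among patients with that score -- one plausible way to attach
    predicted risks to a points-based score (illustrative only)."""
    events = defaultdict(int)
    counts = defaultdict(int)
    for s, y in zip(scores, outcomes):
        events[s] += y
        counts[s] += 1
    return {s: events[s] / counts[s] for s in counts}
```

If risks are derived this way from the same dataset in which calibration is then assessed, apparent calibration is guaranteed to look good, which is exactly why the provenance of the predicted risks needs to be stated.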
A related issue is that the decision curve of a score should have the form of a step function, but the one shown here is smooth. Hence, something appears amiss, and it is important that the authors can clarify this.
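To make this concrete, net benefit at threshold probability pt is TP/n - (FP/n) x pt/(1 - pt), treating patients whose predicted risk meets the threshold. For a points-based score, only a handful of distinct risk values are attainable, so TP and FP (and hence the curve) jump each time the threshold crosses one of those values, rather than varying smoothly. A minimal sketch on made-up data (not the study's):

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk meets the
    threshold: TP/n - (FP/n) * pt/(1 - pt).  With only a few attainable
    risk values, TP and FP change only when the threshold crosses one
    of those values, giving the curve its step-like appearance."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)
```

Plotting this function over a grid of thresholds for a discrete score shows the jumps; a smooth published curve therefore suggests it was computed from the continuous (lasso) model rather than the simplified score.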
Thirdly, as part of the three-stage model building strategy, the authors selected ‘optimal’ cut-off values to categorise (often dichotomise) continuous predictors. It is well known that this approach is biologically implausible and loses information.[8-11] A simplified score can still be created after modelling the continuous predictors on their continuous scale and allowing for non-linear relationships.[12 13] Indeed, the lack of allowance for non-linearity in the regression-based model is one potential reason why it does (slightly) worse in terms of discrimination than the machine learning based model (XGBoost). It is not clear why results from this particular machine learning method were presented (by definition, the method precludes bedside use), as they serve no obvious purpose except to distract, and they provide an unfair comparison with the regression-based model.
Fourthly, when evaluating the performance of a prediction model in large datasets (as done here), it is important to evaluate performance in relevant subgroups and settings, not just on average.[14 15] A model may perform well on average (e.g. across England, Wales and Scotland) but not in particular countries, regions, or hospitals. A particular concern is the potential for miscalibration at the region or individual hospital level, where heterogeneity (e.g. in case mix or clinical care) may lead to large differences from the average. Surely this should be checked closely before the NHS decides to implement this approach? Otherwise decisions would be made on potentially miscalibrated risk predictions in each hospital. A related point: it is good to see that discrimination was checked in ethnic and sex subgroups, but why not also calibration?
Fifthly, logistic regression models were used in the model development. The authors ‘included patients without an outcome after four weeks and considered to have had no event’. But what about longer term mortality? It should be made much clearer that this model predicts the risk of mortality by 4 weeks after admission, and not longer term outcomes.
Sixthly, we note that predictions from this new tool should be viewed as predicted risks in the context of current care. That is, a low risk does not mean that the patient should immediately be sent home without care; rather it means that in the context of current care and treatment pathways, the patient is unlikely to die within 4 weeks. They might still be severely ill well beyond 4 weeks. If the model is rolled out in the NHS, this needs to be absolutely clear to those that are implementing it.
Finally, the comparison with existing scores is problematic, as the sample size varies between 197 and 19361 (i.e. each model uses a different validation sample based on the availability of predictors), and the two models added to the decision curve are from 15 years ago for a related but different condition. The claim that the developed score is better than existing scores is therefore poorly supported.
We hope readers and the authors find our comments constructive.
With best wishes,
Richard D Riley, Gary S Collins, Maarten van Smeden, Kym Snell, Ben Van Calster, Laure Wynants
Reference List
1. Knight SR, Ho A, Pius R, et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ 2020;370:m3339.
2. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ 2020;369:m1328.
3. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020;368:m441.
4. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594.
5. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162(1):W1-73.
6. Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med 2014;33(3):517-35.
7. Van Calster B, Nieboer D, Vergouwe Y, et al. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol 2016;74:167-76.
8. Collins GS, Ogundimu EO, Cook JA, et al. Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model. Stat Med 2016;35(23):4124-35.
9. Cohen J. The cost of dichotomization. Appl Psychol Meas 1983;7:249-53.
10. Altman DG, Royston P. Statistics notes: The cost of dichotomising continuous variables. BMJ 2006;332:1080.
11. Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med 2006;25(1):127-41.
12. Bonnett LJ, Snell KIE, Collins GS, et al. Guide to presenting clinical prediction models for use in clinical settings. BMJ 2019;365:l737.
13. Sullivan LM, Massaro JM, D'Agostino RB, Sr. Presentation of multivariate data for clinical use: The Framingham Study risk score functions. Stat Med 2004;23(10):1631-60.
14. Riley RD, Ensor J, Snell KI, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ 2016;353:i3140.
15. Riley RD, van der Windt D, Croft P, et al., editors. Prognosis Research in Healthcare: Concepts, Methods and Impact. Oxford, UK: Oxford University Press, 2019.
Competing interests: No competing interests
Improving prognostication of patients admitted to hospital with COVID-19
Dear Editor,
We read with interest the elegant publication on the prognostic role of the 4C Mortality Score in patients with SARS-CoV-2 infection and have implemented this score in the everyday clinical practice of our department. There is indeed an unmet need for both prognostic and theragnostic indicators in patients with COVID-19, given disease heterogeneity and the contradictory results of recent studies in the field (1, 2).
The 4C Mortality Score has several methodological advantages over other COVID-19 prognostic scores, including clinical applicability at first assessment. Nonetheless, a major limitation of the score is that comorbidities were defined according to a modified Charlson comorbidity index (3). The Charlson comorbidity index is not a COVID-19-specific score; thus, comorbidities with a potential prognostic role in SARS-CoV-2 infection, such as arterial hypertension and thyroid disorders, were not included in the analysis (4, 5).
Several lines of evidence have shown that arterial hypertension is common in patients with COVID-19 and has a negative prognostic role with regard to disease progression and mortality (4, 6). The negative prognostic role of arterial hypertension seems to be retained irrespective of antihypertensive regimen, as renin-angiotensin-aldosterone system inhibitors did not increase the risk of severe COVID-19 (7, 8). Accordingly, previous evidence for the negative prognostic role of thyroid disorders in idiopathic pulmonary fibrosis, and the beneficial effects of thyroid hormone in experimental lung fibrosis, fuelled investigation of thyroid disorders in COVID-19 (9, 10). A large population-based case-control and cohort study demonstrated that the use of levothyroxine was associated with an increased risk of death, hospitalisation, intensive care unit admission, mechanical ventilation and dialysis before propensity score weighting (5). Given that these findings attenuated after propensity score weighting, the role of thyroid disorders in the prognostication of SARS-CoV-2 infection warrants further investigation.
Based on the above, we believe that incorporation of these two comorbidities in a modified 4C Mortality Score may better stratify patients admitted to hospital with COVID-19 and improve the discriminatory performance of this valuable risk score. Future studies implementing the modified 4C Mortality Score are greatly anticipated.
Authors
Theodoros Karampitsakos, Elli Malakounidou, Argyris Tzouvelekis
Department of Respiratory Medicine, University Hospital of Patras, Greece
References
1. Gordon AC, Mouncey PR. Interleukin-6 Receptor Antagonists in Critically Ill Patients with Covid-19. NEJM 2021.
2. Veiga VC, Prats JAGG, Farias DLC, Rosa RG, Dourado LK, Zampieri FG, et al. Effect of tocilizumab on clinical outcomes at 15 days in patients with severe or critical coronavirus disease 2019: randomised controlled trial. BMJ. 2021;372:n84.
3. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of chronic diseases. 1987;40(5):373-83.
4. Guan W-j, Liang W-h, Zhao Y, Liang H-r, Chen Z-s, Li Y-m, et al. Comorbidity and its impact on 1590 patients with Covid-19 in China: A Nationwide Analysis. European Respiratory Journal. 2020:2000547.
5. Brix TH, Hegedüs L, Hallas J, Lund LC. Risk and course of SARS-CoV-2 infection in patients treated for hypothyroidism and hyperthyroidism. The Lancet Diabetes & Endocrinology. 2021.
6. Cunningham JW, Vaduganathan M, Claggett BL, Jering KS, Bhatt AS, Rosenthal N, et al. Clinical Outcomes in Young US Adults Hospitalized With COVID-19. JAMA internal medicine. 2021;181(3):379-81.
7. Reynolds HR, Adhikari S, Pulgarin C, Troxel AB, Iturrate E, Johnson SB, et al. Renin–Angiotensin–Aldosterone System Inhibitors and Risk of Covid-19. New England Journal of Medicine. 2020;382(25):2441-8.
8. Mancia G, Rea F, Ludergnani M, Apolone G, Corrao G. Renin–Angiotensin–Aldosterone System Blockers and the Risk of Covid-19. New England Journal of Medicine. 2020;382(25):2431-40.
9. Oldham JM, Kumar D, Lee C, Patel SB, Takahashi-Manns S, Demchuk C, et al. Thyroid Disease Is Prevalent and Predicts Survival in Patients With Idiopathic Pulmonary Fibrosis. Chest. 2015;148(3):692-700.
10. Yu G, Tzouvelekis A, Wang R, Herazo-Maya JD, Ibarra GH, Srivastava A, et al. Thyroid hormone inhibits lung fibrosis in mice by improving epithelial mitochondrial function. Nature medicine. 2018;24(1):39-49.
Competing interests: No competing interests