How to develop a more accurate risk prediction model when there are few events

Menelaos Pavlou; Gareth Ambler; Shaun R Seaman; Oliver Guttmann; Perry Elliott; Michael King; Rumana Z Omar

doi:10.1136/bmj.h3868

Research Methods & Reporting

BMJ 2015; 351 doi: https://doi.org/10.1136/bmj.h3868 (Published 11 August 2015) Cite this as: BMJ 2015;351:h3868

This article has a correction. Please see:

Errata - June 08, 2016

Menelaos Pavlou, research associate¹,
Gareth Ambler, senior lecturer¹,
Shaun R Seaman, senior statistician²,
Oliver Guttmann, cardiology registrar³,
Perry Elliott, professor⁴,
Michael King, professor⁵,
Rumana Z Omar, professor¹

¹Department of Statistical Science, University College London, WC1E 6BT London, UK
²Medical Research Council Biostatistics Unit, Cambridge
³School of Life and Medical Sciences, Institute of Cardiovascular Science, University College London
⁴Inherited Cardiac Disease Unit, the Heart Hospital, London
⁵Division of Psychiatry, University College London

Correspondence to: Menelaos Pavlou m.pavlou@ucl.ac.uk

Accepted 21 June 2015

When the number of events is low relative to the number of predictors, standard regression could produce overfitted risk models that make inaccurate predictions. Use of penalised regression may improve the accuracy of risk prediction.

Summary points

Risk prediction models are used in clinical decision making and to help patients make an informed choice about their treatment
Model overfitting could arise when the number of events is small compared with the number of predictors in the risk model
In an overfitted model, the probability of an event tends to be underestimated in low risk patients and overestimated in high risk patients
In datasets with few events, penalised regression methods can provide better predictions than standard regression
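
To make the last point concrete, here is a minimal sketch in Python that fits a logistic regression twice to the same small simulated dataset, once without a penalty and once with a ridge penalty. Ridge is chosen here only as one familiar example of penalised regression. The simulated data, the function names (`fit_logistic`, `simulate`, `slope_size`), the penalty value of 5, and the fitting settings are all assumptions made for illustration; none of them comes from the article.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, ridge_penalty=0.0, n_iter=2000, step=0.1):
    """Gradient ascent on the log-likelihood minus (ridge_penalty / 2) * sum(beta_j^2).

    beta[0] is the intercept and is left unpenalised.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = yi - sigmoid(eta)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        # The ridge penalty pulls each slope towards zero.
        for j in range(1, p + 1):
            grad[j] -= ridge_penalty * beta[j]
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta


def simulate(n_patients=60, n_predictors=8, seed=1):
    """Hypothetical data: few events, and only two predictors truly matter."""
    rng = random.Random(seed)
    true_beta = [0.8, -0.8] + [0.0] * (n_predictors - 2)
    X, y = [], []
    for _ in range(n_patients):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n_predictors)]
        eta = -2.0 + sum(b * x for b, x in zip(true_beta, xi))
        y.append(1 if rng.random() < sigmoid(eta) else 0)
        X.append(xi)
    return X, y


def slope_size(beta):
    # Euclidean length of the slope coefficients (intercept excluded).
    return math.sqrt(sum(b * b for b in beta[1:]))


X, y = simulate()
standard = fit_logistic(X, y, ridge_penalty=0.0)
penalised = fit_logistic(X, y, ridge_penalty=5.0)  # assumed penalty value

print("events:", sum(y), "of", len(y), "patients")
print("slope size, standard :", round(slope_size(standard), 3))
print("slope size, penalised:", round(slope_size(penalised), 3))
```

The penalised fit returns smaller coefficients, so its predicted risks are less extreme. In a real analysis the size of the penalty would be chosen from the data, for example by cross validation, rather than fixed in advance as it is here.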

Risk prediction models, which typically use a number of predictors based on patient characteristics to predict health outcomes, are a cornerstone of modern clinical medicine.1 Models developed using data with few events compared with the number of predictors often underperform when applied to new patient cohorts.2 A key statistical reason for this is “model overfitting.” Overfitted models tend to underestimate the probability of an event in low risk patients and overestimate it in high risk patients, which could affect clinical decision making. In this paper, we discuss the potential of penalised regression methods to alleviate this problem and thus to produce more accurate prediction models.
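
The pattern of underestimation in low risk patients and overestimation in high risk patients can be sketched numerically. The Python example below assumes that the predictions of an overfitted model are too extreme on the log-odds scale by a constant factor, and rescales them with a slope below 1. The slope of 0.7, the zero intercept, and the function name `recalibrate` are hypothetical choices for illustration, not values from the article.

```python
import math


def logit(p):
    # Log-odds of a probability strictly between 0 and 1.
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate(predicted_risk, slope, intercept=0.0):
    """Map a predicted risk through intercept + slope * logit(risk)."""
    return inv_logit(intercept + slope * logit(predicted_risk))


ASSUMED_SLOPE = 0.7  # hypothetical: a value below 1 means predictions are too extreme

for risk in (0.02, 0.10, 0.50, 0.90):
    adjusted = recalibrate(risk, ASSUMED_SLOPE)
    print(f"predicted {risk:.2f} -> adjusted {adjusted:.3f}")
```

Under these assumptions the lowest predicted risks move up and the highest move down. With the intercept fixed at zero the adjustment pulls predictions towards 50%; in a real validation both the intercept and the slope would be estimated from new patient data.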

Statistical models are often used to predict the probability that an individual with a given set of risk factors will experience a health outcome, usually termed an “event.” These risk prediction models can help in clinical decision making and help patients make an informed choice regarding their treatment.3 4 5 6 Risk models are developed using several risk factors typically based on patient characteristics that are thought to be associated with the health event of interest (box …
