Variable selection under multiple imputation using the bootstrap in a prognostic study

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

BACKGROUND: Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty, which allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection.

METHOD: In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Across the outcome and prognostic variables, the proportion of missing data ranged from 0 to 48.1%. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels.
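As an illustration of the inclusion-frequency idea described above, the following minimal Python sketch computes, for each variable, the proportion of fitted models in which it was selected, and then keeps the variables that reach a chosen inclusion level. It is not the authors' implementation: the function names, the variable names, and the four example selections are hypothetical, and the step that actually fits a model on each bootstrap sample or imputed data set is assumed to have happened elsewhere.

```python
from collections import Counter


def inclusion_frequencies(selected_sets):
    """Return the proportion of fitted models that selected each variable.

    selected_sets: one collection of variable names per fitted model
    (for example, one per bootstrap sample and imputed data set).
    """
    n_models = len(selected_sets)
    if n_models == 0:
        raise ValueError("need at least one fitted model")
    # set(s) guards against a variable being counted twice within one model.
    counts = Counter(var for s in selected_sets for var in set(s))
    return {var: count / n_models for var, count in counts.items()}


def select_at_level(frequencies, level):
    """Keep variables whose inclusion frequency is at least `level` (0 to 1)."""
    return sorted(var for var, freq in frequencies.items() if freq >= level)


# Hypothetical selections from four fitted models (illustrative names only).
models = [
    {"pain_intensity", "age"},
    {"pain_intensity"},
    {"pain_intensity", "age", "job_satisfaction"},
    {"pain_intensity", "job_satisfaction"},
]

freqs = inclusion_frequencies(models)
print(freqs["pain_intensity"])       # 1.0
print(select_at_level(freqs, 0.6))   # ['pain_intensity']
print(select_at_level(freqs, 0.0))   # all three variables (the full model)
```

An inclusion level of 0 keeps every variable that was ever selected, which corresponds to the full model, while higher levels give progressively smaller models.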

RESULTS: We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined, at inclusion levels ranging from 0% (full model) to 90%, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found.
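For readers unfamiliar with the c-index reported above, the sketch below shows one common way to compute it for a binary outcome: the proportion of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half. This is an assumption-laden illustration, not the study's code. It assumes a binary outcome coded 0/1, uses made-up data, and yields only the apparent c-index; the bootstrap optimism correction and the calibration slope mentioned in the abstract are not implemented here.

```python
def c_index(y_true, y_score):
    """Apparent concordance index for a binary outcome coded 0/1.

    Counts, over all (event, non-event) pairs, how often the event has
    the higher predicted risk; tied predictions contribute 0.5.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Made-up outcomes and predicted risks, for illustration only.
outcomes = [1, 0, 1, 0]
risks = [0.8, 0.3, 0.4, 0.4]
print(c_index(outcomes, risks))  # 0.875
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect discrimination.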

CONCLUSION: We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values.

Original language: English
Pages (from-to): 33
Journal: BMC Medical Research Methodology
Volume: 7
DOIs: 10.1186/1471-2288-7-33
Publication status: Published - 13 Jul 2007

Cite this

@article{96274ca13aa444499969b8b50bf66d83,
title = "Variable selection under multiple imputation using the bootstrap in a prognostic study",
abstract = "BACKGROUND: Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. METHOD: In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables data were missing in the range of 0 and 48.1{\%}. We used four methods to investigate the influence of respectively sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. RESULTS: We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined at the range of 0{\%} (full model) to 90{\%} of variable selection, bootstrap corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. CONCLUSION: We recommend to account for both imputation and sampling variation in sets of missing data. The new procedure of combining MI with bootstrapping for variable selection, results in multivariable prognostic models with good performance and is therefore attractive to apply on data sets with missing values.",
keywords = "Biometry/methods, Chronic Disease, Cohort Studies, Data Interpretation, Statistical, Female, Humans, Low Back Pain/diagnosis, Male, Models, Statistical, Netherlands, Outcome Assessment (Health Care)/methods, Prognosis, Prospective Studies, Randomized Controlled Trials as Topic, Uncertainty",
author = "Heymans, {Martijn W} and {van Buuren}, Stef and Knol, {Dirk L} and {van Mechelen}, Willem and {de Vet}, {Henrica C W}",
year = "2007",
month = "7",
day = "13",
doi = "10.1186/1471-2288-7-33",
language = "English",
volume = "7",
pages = "33",
journal = "BMC Medical Research Methodology",
issn = "1471-2288",
publisher = "BioMed Central",

}

Variable selection under multiple imputation using the bootstrap in a prognostic study. / Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica C W.

In: BMC Medical Research Methodology, Vol. 7, 13.07.2007, p. 33.

TY - JOUR

T1 - Variable selection under multiple imputation using the bootstrap in a prognostic study

AU - Heymans, Martijn W

AU - van Buuren, Stef

AU - Knol, Dirk L

AU - van Mechelen, Willem

AU - de Vet, Henrica C W

PY - 2007/7/13

Y1 - 2007/7/13

AB - BACKGROUND: Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. METHOD: In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables data were missing in the range of 0 and 48.1%. We used four methods to investigate the influence of respectively sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. RESULTS: We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined at the range of 0% (full model) to 90% of variable selection, bootstrap corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. CONCLUSION: We recommend to account for both imputation and sampling variation in sets of missing data. The new procedure of combining MI with bootstrapping for variable selection, results in multivariable prognostic models with good performance and is therefore attractive to apply on data sets with missing values.

KW - Biometry/methods

KW - Chronic Disease

KW - Cohort Studies

KW - Data Interpretation, Statistical

KW - Female

KW - Humans

KW - Low Back Pain/diagnosis

KW - Male

KW - Models, Statistical

KW - Netherlands

KW - Outcome Assessment (Health Care)/methods

KW - Prognosis

KW - Prospective Studies

KW - Randomized Controlled Trials as Topic

KW - Uncertainty

U2 - 10.1186/1471-2288-7-33

DO - 10.1186/1471-2288-7-33

M3 - Article

VL - 7

SP - 33

JO - BMC Medical Research Methodology

JF - BMC Medical Research Methodology

SN - 1471-2288

ER -