Abstract
Original language | English |
---|---|
Pages (from-to) | 3113-3129 |
Number of pages | 17 |
Journal | Human Brain Mapping |
Volume | 43 |
Issue number | 10 |
Early online date | 2022 |
DOIs | 10.1002/hbm.25837 |
Publication status | Published - Jul 2022 |
Mind the gap: Performance metric evaluation in brain-age prediction. / de Lange, Ann-Marie G.; Anatürk, Melis; Rokicki, Jaroslav et al.
In: Human Brain Mapping, Vol. 43, No. 10, 07.2022, p. 3113-3129.
Research output: Contribution to journal › Article › Academic › peer-review
TY - JOUR
T1 - Mind the gap
T2 - Performance metric evaluation in brain-age prediction
AU - de Lange, Ann-Marie G.
AU - Anatürk, Melis
AU - Rokicki, Jaroslav
AU - Han, Laura K. M.
AU - Franke, Katja
AU - Alnæs, Dag
AU - Ebmeier, Klaus P.
AU - Draganski, Bogdan
AU - Kaufmann, Tobias
AU - Westlye, Lars T.
AU - Hahn, Tim
AU - Cole, James H.
N1 - Funding Information: This research was conducted using the UK Biobank under Application 27412. While working on this study, the authors received funding from the Swiss National Science Foundation (Ann‐Marie G. de Lange; PZ00P3_193658; Bogdan Draganski; NCCR Synapsy, project grants Number 32003B_135679, 32003B_159780, 324730_192755 and CRSK‐3_190185), the Leenaards Foundation (Bogdan Draganski), the Collaboratory on Research Definitions for Reserve and Resilience in Cognitive Ageing and Dementia (Melis Anatürk; 5R24AG061421‐03), the UK Medical Research Council (James H. Cole and Melis Anatürk; MR/R024790/2, Klaus P. Ebmeier; G1001354), the HDH Wills 1965 Charitable Trust (Klaus P. Ebmeier; 1117747), the research Council of Norway (Lars T. Westlye; 273345, 249795, 223273; Tobias Kaufmann; 276082), the European Research Council under the European Union's Horizon 2020 research and innovation programme (Lars T. Westlye; 802998), the South‐East Norway Regional Health Authority (Lars T. Westlye; 2015073, 2019107), the German Research Foundation (Katja Franke; FR 3709/1‐2; Tim Hahn; HA7070/2‐2, HA7070/3, HA7070/4), the Interdisciplinary Center for Clinical Research (IZKF) of the medical faculty of Münster (Tim Hahn; MzH 3/020/20), the Interdisciplinary Center for Clinical Research (IZKF) of the Jena University hospital (Katja Franke; AMSP 07) and the ERA‐Net Cofund through the ERA PerMed project “IMPLEMENT” (Jaroslav Rokicki). We thank Dr Dónal Hill, Institute of Physics, École polytechnique fédérale de Lausanne (EPFL), for valuable statistical input. Open access funding enabled and organized by Projekt DEAL.
Funding Information: Collaboratory on Research Definitions for Reserve and Resilience in Cognitive Aging and Dementia, Grant/Award Number: 5R24AG061421‐03; Deutsche Forschungsgemeinschaft, Grant/Award Numbers: FR 3709/1‐2, HA7070/2‐2, HA7070/3, HA7070/4; ERA‐Net Cofund, Grant/Award Number: ERA PerMed project “IMPLEMENT”; Fondation Leenaards; H2020 European Research Council, Grant/Award Number: 802998; HDH Wills 1965 Charitable Trust, Grant/Award Number: 1117747; Helse Sør‐Øst RHF, Grant/Award Numbers: 2015073, 2019107; Interdisciplinary Center for Clinical Research of the Jena University hospital, Grant/Award Number: AMSP 07; Interdisciplinary Center for Clinical Research of the Medical Faculty of Münster, Grant/Award Number: MzH 3/020/20; Medical Research Council, Grant/Award Numbers: G1001354, MR/R024790/2; Norges Forskningsråd, Grant/Award Numbers: 223273, 249795, 273345, 276082; Swiss National Science Foundation, Grant/Award Numbers: 32003B_135679, 32003B_159780, 324730_192755, CRSK‐3_190185, PZ00P3_193658. Publisher Copyright: © 2022 The Authors. Human Brain Mapping published by Wiley Periodicals LLC.
PY - 2022/7
Y1 - 2022/7
N2 - Estimating age based on neuroimaging-derived data has become a popular approach to developing markers for brain integrity and health. While a variety of machine-learning algorithms can provide accurate predictions of age based on brain characteristics, there is significant variation in model accuracy reported across studies. We predicted age in two population-based datasets, and assessed the effects of age range, sample size and age-bias correction on the model performance metrics Pearson's correlation coefficient (r), the coefficient of determination (R2), Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The results showed that these metrics vary considerably depending on cohort age range; r and R2 values are lower when measured in samples with a narrower age range. RMSE and MAE are also lower in samples with a narrower age range due to smaller errors/brain age delta values when predictions are closer to the mean age of the group. Across subsets with different age ranges, performance metrics improve with increasing sample size. Performance metrics further vary depending on prediction variance as well as mean age difference between training and test sets, and age-bias corrected metrics indicate high accuracy—also for models showing poor initial performance. In conclusion, performance metrics used for evaluating age prediction models depend on cohort and study-specific data characteristics, and cannot be directly compared across different studies. Since age-bias corrected metrics generally indicate high accuracy, even for poorly performing models, inspection of uncorrected model results provides important information about underlying model attributes such as prediction variance.
KW - brain-age prediction
KW - machine learning
KW - neuroimaging
KW - statistics
UR - http://www.scopus.com/inward/record.url?scp=85126743377&partnerID=8YFLogxK
U2 - 10.1002/hbm.25837
DO - 10.1002/hbm.25837
M3 - Article
C2 - 35312210
SN - 1065-9471
VL - 43
SP - 3113
EP - 3129
JO - Human Brain Mapping
JF - Human Brain Mapping
IS - 10
ER -
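
The four performance metrics named in the abstract (r, R2, RMSE and MAE) can be illustrated with a minimal sketch. This is not the authors' code or data: the function name `performance_metrics`, the simulated cohort, the 45 to 82 year age range and the 5-year prediction noise are all assumptions chosen only to show how r and R2 fall when the same predictions are evaluated in a narrower age range.

```python
import numpy as np


def performance_metrics(age, predicted):
    """Return r, R2, RMSE and MAE for true and predicted ages."""
    age = np.asarray(age, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - age  # per-subject brain-age delta
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((age - age.mean()) ** 2)
    return {
        "r": np.corrcoef(age, predicted)[0, 1],
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": np.sqrt(np.mean(errors ** 2)),
        "MAE": np.mean(np.abs(errors)),
    }


# Synthetic cohort (hypothetical values, not the study data):
# predicted age is true age plus independent Gaussian noise.
rng = np.random.default_rng(0)
age = rng.uniform(45, 82, size=5000)
predicted = age + rng.normal(0, 5, size=age.size)

# Evaluate the same predictions on the full range and on a narrow subset.
full = performance_metrics(age, predicted)
narrow_mask = (age >= 60) & (age <= 65)
narrow = performance_metrics(age[narrow_mask], predicted[narrow_mask])

print("full range:  ", full)
print("narrow range:", narrow)
```

In this simulation the prediction error is independent of age, so RMSE and MAE stay roughly constant across subsets while r and R2 drop sharply in the narrow subset. The lower RMSE and MAE that the abstract reports for narrow age ranges depend on predictions lying closer to the group mean age, which this sketch does not model.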