Accuracy and reproducibility of automated white matter hyperintensities segmentation with lesion segmentation tool: A European multi-site 3T study

Federica Ribaldi*, Daniele Altomare, Jorge Jovicich, Clarissa Ferrari, Agnese Picco, Francesca Benedetta Pizzini, Andrea Soricelli, Anna Mega, Antonio Ferretti, Antonios Drevelegas, Beatriz Bosch, Bernhard W. Müller, Camillo Marra, Carlo Cavaliere, David Bartrés-Faz, Flavio Nobili, Franco Alessandrini, Frederik Barkhof, Helene Gros-Dagnac, Jean-Philippe Ranjeva, Jens Wiltfang, Joost Kuijer, Julien Sein, Karl-Titus Hoffmann, Luca Roccatagliata, Lucilla Parnetti, Magda Tsolaki, Manos Constantinidis, Marco Aiello, Marco Salvatore, Martina Montalti, Massimo Caulo, Mira Didic, Núria Bargalló, Olivier Blin, Paolo M. Rossini, Peter Schonknecht, Piero Floridi, Pierre Payoux, Pieter Jelle Visser, Régis Bordet, Renaud Lopes, Roberto Tarducci, Stephanie Bombois, Tilman Hensch, Ute Fiedler, Jill C. Richardson, Giovanni B. Frisoni, Moira Marizzoni

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Brain vascular damage accumulates with aging and often manifests as white matter hyperintensities (WMHs) on MRI. Despite increased interest in automated methods to segment WMHs, a gold standard has not been established and their longitudinal reproducibility has been poorly investigated. The aim of the present work is to evaluate the accuracy and reproducibility of two freely available segmentation algorithms. A harmonized MRI protocol was implemented on 3T scanners across 13 European sites, each scanning five volunteers twice (test-retest) using 2D-FLAIR. Automated segmentation was performed using the Lesion Segmentation Tool (LST) algorithms: the lesion growth algorithm (LGA) in SPM8 and SPM12, and the lesion prediction algorithm (LPA). To assess reproducibility, we applied the LST longitudinal pipeline to the LGA and LPA outputs for both the test and retest scans. We evaluated volumetric and spatial accuracy by comparing LGA and LPA with manual tracing, and reproducibility by comparing test with retest. Median volume difference between automated and manual WMH segmentations (mL) was −0.22 [IQR = 0.50] for LGA-SPM8, −0.12 [0.57] for LGA-SPM12 and −0.09 [0.53] for LPA, while the spatial accuracy (Dice coefficient, DC) was 0.29 [0.31], 0.33 [0.26] and 0.41 [0.23], respectively. The reproducibility analysis showed a median reproducibility error of 20% [IQR = 41] for LGA-SPM8, 14% [31] for LGA-SPM12 and 10% [27] for the LPA cross-sectional pipeline. With the LST longitudinal pipeline, the reproducibility errors were considerably reduced (LGA: 0% [IQR = 0], p < 0.001; LPA: 0% [3], p < 0.001) compared to those derived using the cross-sectional algorithms. The DC using the longitudinal pipeline was excellent (median = 1) for LGA [IQR = 0] and LPA [0.02]. LST algorithms showed moderate accuracy and good reproducibility. LST can therefore be used as a reliable cross-sectional and longitudinal tool in multi-site studies.
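The three quantities reported in the abstract (volume difference, Dice coefficient, reproducibility error) can be illustrated with a minimal Python sketch. This is not the authors' code. Several things here are assumptions: masks are flat sequences of 0/1 voxels, the volume difference is taken as automated minus manual, and the reproducibility error is taken as the absolute test-retest volume difference relative to their mean; the abstract does not state the exact formulas. All function names are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) for two binary masks (flat 0/1 sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


def volume_ml(mask, voxel_volume_mm3):
    """Lesion volume in mL: voxel count times voxel volume (1 mL = 1000 mm^3)."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


def volume_difference_ml(auto_mask, manual_mask, voxel_volume_mm3):
    """Assumed sign convention: automated minus manual volume, in mL."""
    return volume_ml(auto_mask, voxel_volume_mm3) - volume_ml(manual_mask, voxel_volume_mm3)


def reproducibility_error_percent(vol_test, vol_retest):
    """Assumed definition: |test - retest| as a percentage of their mean."""
    mean_vol = (vol_test + vol_retest) / 2.0
    if mean_vol == 0:
        # Assumed convention: no lesion in either session means no error.
        return 0.0
    return 100.0 * abs(vol_test - vol_retest) / mean_vol


if __name__ == "__main__":
    automated = [1, 1, 0, 0]
    manual = [1, 0, 1, 0]
    print(dice_coefficient(automated, manual))  # 0.5
    print(reproducibility_error_percent(1.0, 1.2))
```

Under these assumed definitions, a negative volume difference means the automated method segments less WMH volume than manual tracing, and a Dice coefficient of 1 means the two masks coincide voxel for voxel.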
Original language: English
Pages (from-to): 108-115
Number of pages: 8
Journal: Magnetic Resonance Imaging
Publication status: Published - 1 Feb 2021
