Visual rating of age-related white matter changes on magnetic resonance imaging: Scale comparison, interrater agreement, and correlations with quantitative measurements

P. Kapeller, R. Barber, R. J. Vermeulen, H. Adèr, P. Scheltens, W. Freidl, O. Almkvist, M. Moretti, T. Del Ser, P. Vaghfeldt, C. Enzinger, F. Barkhof, D. Inzitari, T. Erkinjunti, R. Schmidt, Franz Fazekas*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background and Purpose - To provide further insight into the MRI assessment of age-related white matter changes (ARWMCs) with visual rating scales, 3 raters with different levels of experience tested the interrater agreement and comparability of 3 widely used rating scales in a cross-sectional and follow-up setting. Furthermore, the correlation between visual ratings and quantitative volumetric measurement was assessed.

Methods - Three raters from different sites using 3 established rating scales (Manolio, Fazekas and Schmidt, Scheltens) evaluated 74 baseline and follow-up scans from 5 European centers. One investigator also rated baseline scans in a set of 255 participants of the Austrian Stroke Prevention Study (ASPS) and measured the volume of ARWMCs.

Results - The interrater agreement for the baseline investigation was fair to good for all scales (κ values, 0.59 to 0.78). On the follow-up scans, all 3 raters depicted significant ARWMC progression; however, the direct interrater agreement for this task was poor (κ, 0.19 to 0.39). Comparison of the interrater reliability between the 3 scales revealed a statistically significant difference between the scale of Manolio and that of Fazekas and Schmidt for the baseline investigation (z value, -2.9676; P=0.003), demonstrating better interrater agreement for the Fazekas and Schmidt scale. The rating results obtained with all 3 scales were highly correlated with each other (Spearman rank correlation, 0.712 to 0.806; P≤0.01), and there was significant agreement between all 3 visual rating scales and the quantitative volumetric measurement of ARWMC (Kendall W, 0.37, 0.48, and 0.57; P<0.001).

Conclusions - Our data demonstrate that the 3 rating scales studied reflect the actual volume of ARWMCs well. The 2 scales that provide more detailed information on ARWMCs seemed preferable to the 1 that yields more global information. The visual assessment of ARWMC progression remains problematic and may require modifications or extensions of existing rating scales.
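For readers unfamiliar with the κ statistic reported above, the following is a minimal sketch of how chance-corrected agreement between two raters can be computed. It assumes unweighted Cohen's κ for a single pair of raters; the abstract does not state which κ variant the authors used, and the function name `cohen_kappa` and the example grades are hypothetical, not taken from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters grading the same scans.

    Assumption: the study may have used a different (e.g. weighted) variant.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of scans given the same grade by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over grades of the product of marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one grade only")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical grades (0 to 3) for ten scans; illustrative only.
rater_1 = [0, 1, 1, 2, 3, 0, 2, 1, 3, 2]
rater_2 = [0, 1, 2, 2, 3, 0, 2, 1, 2, 2]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 scans, but part of that agreement is expected by chance, so κ comes out lower than the raw 80% agreement rate.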

Original language: English
Pages (from-to): 441-445
Number of pages: 5
Journal: Stroke
Volume: 34
Issue number: 2
Publication status: Published - 1 Feb 2003
