Interobserver agreement in automated metabolic tumor volume measurements of Deauville score 4 and 5 lesions at interim 18F-FDG PET in DLBCL

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Metabolic tumor volume (MTV) on interim PET (I-PET) is a potential prognostic biomarker for diffuse large B-cell lymphoma (DLBCL). Implementing MTV on I-PET requires consensus on which semi-automated segmentation method delineates lesions most successfully with the least user interaction. Methods used for baseline PET are not necessarily optimal for I-PET because lesional standardized uptake values (SUVs) are lower at I-PET. Therefore, we aimed to evaluate which method provides the best delineation quality of Deauville score (DS) 4-5 DLBCL lesions on I-PET with the best interobserver agreement on delineation quality and, secondly, to assess the effect of lesional SUVmax on delineation quality and performance agreements.
Methods: DS4-5 lesions from 45 I-PET scans were delineated using six semi-automated methods: (i) SUV 2.5, (ii) SUV 4.0, (iii) adaptive threshold [A50%peak], (iv) 41% of maximum SUV [41%max], (v) majority vote including voxels detected by ≥2 methods [MV2], and (vi) majority vote including voxels detected by ≥3 methods [MV3]. Delineation quality per MTV was rated by three independent observers as acceptable or non-acceptable. For each method, observer scores on delineation quality, specific agreements, and MTV were assessed for all lesions and per category of lesional SUVmax (<5, 5-10, >10).
Results: In 60 DS4-5 lesions on I-PET, MV3 performed best, with acceptable delineation in 90% of lesions and a positive agreement (PA) of 93%. Delineation quality scores and agreements per method strongly depended on lesional SUV: the best delineation quality scores were obtained using MV3 in lesions with SUVmax<10 and SUV4.0 in more FDG-avid lesions. Consequently, overall delineation quality and PA improved when the preferred method per SUV category was applied instead of MV3 as the single best method. MV3- and SUV4.0-derived MTVs of lesions with SUVmax>10 were comparable after excluding visually failed MV3 contouring. For lesions with SUVmax<10, MTVs using different methods correlated poorly.
Conclusion: On I-PET, MV3 performed best and provided the highest interobserver agreement regarding acceptable delineations of DS4-5 DLBCL lesions. However, delineation method preference strongly depended on lesional SUV. Therefore, we suggest exploring an approach that identifies the optimal delineation method per lesion as a function of tumor FDG uptake characteristics, i.e., SUVmax.
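
The majority-vote methods named in the abstract (MV2, MV3) can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes a lesion is given as a flat list of voxel SUVs, so spatial connectivity and region growing are ignored. The abstract does not define the adaptive threshold, so A50%peak is replaced here by a simplified stand-in (50% of a caller-supplied SUVpeak, without background correction). All function names, the voxel volume, and the example SUVs are hypothetical.

```python
def threshold_mask(suv, threshold):
    """Mark every voxel whose SUV is at or above the threshold."""
    return [value >= threshold for value in suv]


def majority_vote(masks, min_votes):
    """Keep voxels selected by at least min_votes of the given masks."""
    return [sum(votes) >= min_votes for votes in zip(*masks)]


def delineate(suv, suv_peak):
    """Return one boolean mask per method for a single lesion."""
    suv_max = max(suv)
    masks = {
        "SUV2.5": threshold_mask(suv, 2.5),
        "SUV4.0": threshold_mask(suv, 4.0),
        # Assumption: simplified stand-in for the adaptive threshold,
        # 50% of SUVpeak with no local background correction.
        "A50%peak": threshold_mask(suv, 0.5 * suv_peak),
        "41%max": threshold_mask(suv, 0.41 * suv_max),
    }
    base_masks = list(masks.values())  # the four base methods only
    masks["MV2"] = majority_vote(base_masks, 2)
    masks["MV3"] = majority_vote(base_masks, 3)
    return masks


def mtv_ml(mask, voxel_volume_ml):
    """Metabolic tumor volume: number of selected voxels times voxel volume."""
    return sum(mask) * voxel_volume_ml


# Hypothetical low-uptake lesion, as might be seen at interim PET.
lesion_suv = [1.0, 2.6, 3.0, 4.5, 6.0]
masks = delineate(lesion_suv, suv_peak=5.6)
for name, mask in masks.items():
    print(name, mask, mtv_ml(mask, voxel_volume_ml=0.5))
```

In this toy lesion the voxel with SUV 2.6 is selected by only two of the four base methods, so it is included in MV2 but not in MV3, which shows how the stricter vote yields a smaller volume.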

Original language: English
Journal: Journal of Nuclear Medicine
Volume: 62
Issue number: 11
Early online date: 5 Mar 2021
Publication status: Published - 1 Nov 2021
