TY - JOUR
T1 - Privacy-preserving distributed learning of radiomics to predict overall survival and HPV status in head and neck cancer
AU - Bogowicz, Marta
AU - Jochems, Arthur
AU - Deist, Timo M.
AU - Tanadini-Lang, Stephanie
AU - Huang, Shao Hui
AU - Chan, Biu
AU - Waldron, John N.
AU - Bratman, Scott
AU - O’Sullivan, Brian
AU - Riesterer, Oliver
AU - Studer, Gabriela
AU - Unkelbach, Jan
AU - Barakat, Samir
AU - Brakenhoff, Ruud H.
AU - Nauta, Irene
AU - Gazzani, Silvia E.
AU - Calareso, Giuseppina
AU - Scheckenbach, Kathrin
AU - Hoebers, Frank
AU - Wesseling, Frederik W.R.
AU - Keek, Simon
AU - Sanduleanu, Sebastian
AU - Leijenaar, Ralph T.H.
AU - Vergeer, Marije R.
AU - Leemans, C. René
AU - Terhaard, Chris H.J.
AU - van den Brekel, Michiel W.M.
AU - Hamming-Vrieze, Olga
AU - van der Heijden, Martijn A.
AU - Elhalawani, Hesham M.
AU - Fuller, Clifton D.
AU - Guckenberger, Matthias
AU - Lambin, Philippe
PY - 2020/12/1
Y1 - 2020/12/1
N2 - A major challenge in radiomics is assembling data from multiple centers. Sharing data between hospitals is restricted by legal and ethical regulations. Distributed learning is a technique that enables training models on multicenter data without the data leaving the hospitals (“privacy-preserving” distributed learning). This study tested the feasibility of distributed learning of radiomics data for prediction of two-year overall survival and HPV status in head and neck cancer (HNC) patients. Pretreatment CT images were collected from 1174 HNC patients in 6 different cohorts. 981 radiomic features were extracted using the Z-Rad software implementation. Hierarchical clustering was performed to preselect features. Classification was done using logistic regression. In the validation dataset, the receiver operating characteristics (ROC) were compared between the models trained in the centralized and distributed manner. No difference in ROC was observed with respect to feature selection. The logistic regression coefficients were identical between the methods (absolute difference <10⁻⁷). In comparison of the full workflow (feature selection and classification), no significant difference in ROC was found between centralized and distributed models for either studied endpoint (DeLong p > 0.05). In conclusion, both feature selection and classification are feasible in a distributed manner using radiomics data, which opens new possibilities for training more reliable radiomics models.
AB - A major challenge in radiomics is assembling data from multiple centers. Sharing data between hospitals is restricted by legal and ethical regulations. Distributed learning is a technique that enables training models on multicenter data without the data leaving the hospitals (“privacy-preserving” distributed learning). This study tested the feasibility of distributed learning of radiomics data for prediction of two-year overall survival and HPV status in head and neck cancer (HNC) patients. Pretreatment CT images were collected from 1174 HNC patients in 6 different cohorts. 981 radiomic features were extracted using the Z-Rad software implementation. Hierarchical clustering was performed to preselect features. Classification was done using logistic regression. In the validation dataset, the receiver operating characteristics (ROC) were compared between the models trained in the centralized and distributed manner. No difference in ROC was observed with respect to feature selection. The logistic regression coefficients were identical between the methods (absolute difference <10⁻⁷). In comparison of the full workflow (feature selection and classification), no significant difference in ROC was found between centralized and distributed models for either studied endpoint (DeLong p > 0.05). In conclusion, both feature selection and classification are feasible in a distributed manner using radiomics data, which opens new possibilities for training more reliable radiomics models.
UR - http://www.scopus.com/inward/record.url?scp=85081744940&partnerID=8YFLogxK
U2 - 10.1038/s41598-020-61297-4
DO - 10.1038/s41598-020-61297-4
M3 - Article
C2 - 32161279
AN - SCOPUS:85081744940
VL - 10
JO - Scientific Reports
JF - Scientific Reports
SN - 2045-2322
IS - 1
M1 - 4542
ER -