Power estimation for non-standardized multisite studies

Anisha Keshavan*, Friedemann Paul, Mona K. Beyer, Alyssa H. Zhu, Nico Papinutto, Russell T. Shinohara, William Stern, Michael Amann, Rohit Bakshi, Antje Bischof, Alessandro Carriero, Manuel Comabella, Jason C. Crane, Sandra D'Alfonso, Philippe Demaerel, Benedicte Dubois, Massimo Filippi, Vinzenz Fleischer, Bertrand Fontaine, Laura Gaetano, An Goris, Christiane Graetz, Adriane Gröger, Sergiu Groppa, David A. Hafler, Hanne F. Harbo, Bernhard Hemmer, Kesshi Jordan, Ludwig Kappos, Gina Kirkish, Sara Llufriu, Stefano Magon, Filippo Martinelli-Boneschi, Jacob L. McCauley, Xavier Montalban, Mark Mühlau, Daniel Pelletier, Pradip M. Pattany, Margaret Pericak-Vance, Isabelle Cournu-Rebeix, Maria A. Rocca, Alex Rovira, Regina Schlaeger, Albert Saiz, Till Sprenger, Alessandro Stecco, Bernard M J Uitdehaag, Pablo Villoslada, Mike P. Wattjes, Howard Weiner, Jens Wuerfel, Claus Zimmer, Frauke Zipp, Stephen L. Hauser, Jorge R. Oksenberg, Roland G. Henry, International Multiple Sclerosis Genetics Consortium

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


A concern for researchers planning multisite studies is that scanner- and T1-weighted sequence-related biases on regional volumes could overshadow true effects, especially for studies with a heterogeneous set of scanners and sequences. Current approaches attempt to harmonize data by standardizing hardware, pulse sequences, and protocols, or by calibrating across sites using phantom-based corrections to ensure the same raw image intensities. We propose to avoid harmonization and phantom-based correction entirely. We hypothesized that the bias of estimated regional volumes is scaled between sites due to the contrast and gradient distortion differences between scanners and sequences. Given this assumption, we provide a new statistical framework and derive a power equation to define inclusion criteria for a set of sites based on the variability of their scaling factors. We estimated the scaling factors of 20 scanners with heterogeneous hardware and sequence parameters by scanning a single set of 12 subjects at sites across the United States and Europe. Regional volumes and their scaling factors were estimated for each site using FreeSurfer's segmentation algorithm and ordinary least squares, respectively. The scaling factors were validated by comparing the theoretical and simulated power curves, performing a leave-one-out calibration of regional volumes, and evaluating the absolute agreement of all regional volumes between sites before and after calibration. Using our derived power equation, we were able to define the conditions under which harmonization is not necessary to achieve 80% power. This approach can inform the choice of processing pipelines and outcome metrics for multisite studies based on scaling factor variability across sites, enabling collaboration between clinical and research institutions.
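The scaling assumption in the abstract can be illustrated with a minimal sketch. It is not the authors' model or their power equation: it assumes each site multiplies a subject's true regional volume by a site-specific factor plus noise, and it estimates each factor by ordinary least squares through the origin against the across-site mean volume of each subject. The function names, the simulated volumes, and the noise levels are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_scaling_factors(volumes):
    """Estimate one scaling factor per site for a single brain region.

    volumes: array of shape (n_sites, n_subjects), the same subjects
    scanned at every site. The reference for each subject is the mean
    volume across sites (an assumption of this sketch). Each factor is
    the OLS slope, with no intercept, of a site's volumes on the reference.
    """
    reference = volumes.mean(axis=0)
    return volumes @ reference / (reference @ reference)


def calibrate(volumes, factors):
    """Divide each site's volumes by its estimated scaling factor."""
    return volumes / factors[:, None]


def mean_between_site_cv(volumes):
    """Average over subjects of the between-site coefficient of variation."""
    return float((volumes.std(axis=0) / volumes.mean(axis=0)).mean())


# Simulated data: 20 scanners and 12 travelling subjects, as in the abstract.
# All numeric values below are hypothetical.
rng = np.random.default_rng(0)
n_sites, n_subjects = 20, 12
true_volume = rng.normal(4000.0, 400.0, size=n_subjects)   # mm^3
true_scale = rng.normal(1.0, 0.05, size=n_sites)           # site scaling
noise = rng.normal(0.0, 20.0, size=(n_sites, n_subjects))  # measurement noise
volumes = true_scale[:, None] * true_volume[None, :] + noise

factors = estimate_scaling_factors(volumes)
calibrated = calibrate(volumes, factors)

cv_before = mean_between_site_cv(volumes)
cv_after = mean_between_site_cv(calibrated)
print(f"between-site CV before calibration: {cv_before:.4f}")
print(f"between-site CV after calibration:  {cv_after:.4f}")
```

Because the reference is the across-site mean, the estimated factors average to one by construction, so they describe each site's scaling relative to the pool of sites rather than to a ground truth. The spread of `factors` across sites is the kind of quantity the abstract's inclusion criteria are based on.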

Original language: English
Pages (from-to): 281-294
Number of pages: 14
Publication status: Published - 1 Jul 2016
