Comparison of observer variability and accuracy of different criteria for lung scan interpretation

Petronella J. Hagen, Ieneke J.C. Hartmann, Otto S. Hoekstra*, Marcel P.M. Stokkel, Pieter E. Postmus, Martin H. Prins

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Different criteria have been advocated for the interpretation of ventilation/perfusion (V/Q) lung scans in patients with suspected pulmonary embolism (PE). Besides these predefined criteria, many physicians integrate the different sets of criteria with their own experience, the so-called Gestalt interpretation. The purpose of this study was to evaluate the interobserver variability and accuracy of 3 interpretation schemes: the Hull criteria, the PIOPED (Prospective Investigation of Pulmonary Embolism Diagnosis) criteria, and the Gestalt interpretation.

Methods: Two experienced observers interpreted the V/Q scans of all 328 patients according to the 3 schemes. The diagnostic classification obtained with each scheme was analyzed against the presence or absence of PE.

Results: Interobserver agreement, as assessed by the κ statistic, was 0.70 (95% confidence interval [CI], 0.64-0.76) for the PIOPED criteria, 0.79 (95% CI, 0.73-0.85) for the Hull criteria, and 0.65 (95% CI, 0.58-0.72) for the Gestalt interpretation. The differences in κ values between the Hull and PIOPED criteria and between the Hull criteria and the Gestalt interpretation were statistically significant (P < 0.05 and P < 0.001, respectively). For 16 patients (14 without PE) with a normal lung scan result according to the Hull criteria, the result according to the PIOPED criteria was low probability. For 21 patients (12 with PE), the scans were intermediate probability according to the PIOPED criteria, whereas the result with the Hull criteria was high probability. Analysis of receiver-operating-characteristic curves yielded a comparable area under the curve for all 3 schemes (0.87-0.90).

Conclusion: The Hull, PIOPED, and Gestalt interpretations of V/Q lung scans all have good accuracy and good interobserver agreement. However, the reproducibility of the Hull criteria is superior to that of the other 2 schemes.
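As a reading aid for the κ values reported above, the following is a minimal sketch of how an unweighted Cohen's κ can be computed for two observers. The abstract does not state which κ variant or software the authors used, so the unweighted form, the function name `cohen_kappa`, and the eight example readings are all assumptions made for illustration; they are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of scans on which both observers agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each observer's marginal category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical readings of eight scans (illustration only, not study data).
observer_1 = ["normal", "low", "high", "high",
              "intermediate", "normal", "low", "high"]
observer_2 = ["normal", "low", "high", "intermediate",
              "intermediate", "normal", "normal", "high"]

kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the observers agree on 6 of 8 scans (0.75) while chance agreement is 0.25, so κ = (0.75 - 0.25) / (1 - 0.25), which is about 0.67. Higher κ means better agreement beyond chance, which is why the Hull value of 0.79 indicates the most reproducible scheme of the three.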

Original language: English
Pages (from-to): 739-744
Number of pages: 6
Journal: Journal of Nuclear Medicine
Volume: 44
Issue number: 5
Publication status: Published - May 2003
