OBJECTIVE: To determine interobserver and intraobserver variability in pH-impedance interpretation among experts and the accuracy of automated analysis (AA).
STUDY DESIGN: Ten pediatric 24-hour pH-impedance tracings were analyzed by 10 observers from 7 groups worldwide and by AA. Detection of gastroesophageal reflux (GER) episodes was compared between observers and AA. Intraobserver agreement was assessed in 3 observers after 3 to 5 months.
RESULTS: Overall, 1242 liquid and mixed GER events were detected, of which 490 (42%) were scored by the majority of observers, yielding moderate agreement (Cohen's kappa [κ] = 0.46). The intraclass correlation coefficient for the number of GER events per study was 0.84 (P < .001). AA had a sensitivity of 94% and a specificity of 74% compared with majority consensus (≥6 observers). Agreement for gas GER was poor (κ = 0.11). Intraobserver agreement was κ = 0.49, κ = 0.71, and κ = 0.85 in the 3 observers.
CONCLUSION: Interobserver agreement in combined pH-multichannel intraluminal impedance analysis among experts is moderate; only 42% of GER episodes were detected by the majority of observers. Detection of the total number of GER episodes is more consistent. Considering these poor outcomes, AA seems favorable compared with manual analysis because of its reproducibility. However, its lower specificity suggests that AA needs refinement before widespread use can be advocated.
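
ILLUSTRATION: The comparison of AA with majority consensus can be sketched as follows. This is a minimal sketch, not the study's actual procedure: the function names, the toy data, and the assumption that every candidate event is listed as one row (so that events scored by only a minority of observers count as consensus negatives) are all hypothetical. Only the threshold of ≥6 of 10 observers comes from the abstract.

```python
from typing import List, Tuple


def majority_consensus(scores: List[List[int]], threshold: int) -> List[int]:
    """Return 1 per candidate event when at least `threshold` observers scored it.

    scores[i][j] is 1 if observer j scored candidate event i, else 0.
    """
    return [1 if sum(row) >= threshold else 0 for row in scores]


def sensitivity_specificity(reference: List[int], test: List[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of `test` against `reference` (both 0/1 lists).

    Assumes the reference contains at least one positive and one negative.
    """
    pairs = list(zip(reference, test))
    tp = sum(1 for r, t in pairs if r == 1 and t == 1)
    fn = sum(1 for r, t in pairs if r == 1 and t == 0)
    tn = sum(1 for r, t in pairs if r == 0 and t == 0)
    fp = sum(1 for r, t in pairs if r == 0 and t == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 6 candidate events scored by 10 observers.
observer_scores = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # scored by 10 observers
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # scored by 7
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # scored by 6
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # scored by 3
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # scored by 1
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # scored by 5
]
aa_scores = [1, 1, 0, 1, 0, 0]  # hypothetical automated-analysis detections

consensus = majority_consensus(observer_scores, threshold=6)
sens, spec = sensitivity_specificity(consensus, aa_scores)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this sketch, specificity depends entirely on how the set of candidate events is defined, which the abstract does not specify.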