AIM: To validate the Montreal classification system for Crohn's disease (CD) and ulcerative colitis (UC) in the Netherlands.
METHODS: Twenty de-identified medical records, selected to appropriately represent the inflammatory bowel disease (IBD) sub-phenotypes, were scored by 30 observers of different professions (gastroenterologists specialised in IBD, gastroenterologists in training and IBD nurses) and levels of experience with IBD patient care. Patients were classified according to the Montreal classification. In addition, participants were asked to score extra-intestinal manifestations (EIM) and disease severity in CD based on their clinical judgement. Inter-observer agreement was calculated as the percentage of correct answers (answers identical to the "expert evaluation") and as Fleiss' kappa (k). Kappa cutoffs: < 0.40, poor; 0.41-0.60, moderate; 0.61-0.80, good; > 0.80, excellent.
RESULTS: Inter-observer agreement was excellent for diagnosis (k = 0.96), perianal disease (k = 0.92) and disease location in CD (k = 0.82), and good for age of onset (k = 0.67), upper gastrointestinal disease (k = 0.62), disease behaviour in CD (k = 0.79) and disease extent in UC (k = 0.65). Agreement for disease severity in UC was poor (k = 0.23). The additional items showed good inter-observer agreement for EIM (k = 0.68) and moderate agreement for disease severity in CD (k = 0.44). Percentages of correct answers across all Montreal items reflected the inter-observer agreement well (> 80%), except for disease severity (48%-74%). IBD nurses scored upper gastrointestinal disease in CD significantly worse than gastroenterologists (P = 0.008) and gastroenterologists in training (P = 0.040). Observers with less than 10 years of experience with IBD patient care were significantly better at scoring UC severity than observers with 10-20 years (P = 0.003) and more than 20 years (P = 0.003) of experience. Observers with 10-20 years of experience were significantly better at scoring upper gastrointestinal disease in CD than observers with less than 10 years (P = 0.007) and more than 20 years (P = 0.007) of experience.
CONCLUSION: We found good to excellent inter-observer agreement for all Montreal items except disease severity in UC, which was poor.
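As an illustration of the agreement statistic used here, Fleiss' kappa and the reported cutoffs can be sketched in plain Python. This is a minimal sketch, not the study's actual analysis code; the rating matrix in the usage example is invented for demonstration and is not study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where counts[i][j] is the number of
    raters assigning subject i to category j. Every row must sum to the
    same total number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Mean per-subject observed agreement P_bar
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement P_e from overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

def interpret(k):
    """Cutoffs as reported in METHODS."""
    if k > 0.80:
        return "excellent"
    if k > 0.60:
        return "good"
    if k > 0.40:
        return "moderate"
    return "poor"

# Illustrative example: 3 subjects, 3 raters, 2 categories, full agreement
ratings = [[3, 0], [0, 3], [3, 0]]
print(interpret(fleiss_kappa(ratings)))  # full agreement gives k = 1.0
```

Note that k compares observed agreement against agreement expected by chance, which is why a high percentage of identical answers can still coexist with a modest kappa when one category dominates.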