Interpretation of microbiota-based diagnostics by explaining individual classifier decisions

A. Eck, L. M. Zintgraf, E. F.J. de Groot, T. G.J. de Meij, T. S. Cohen, P. H.M. Savelkoul, M. Welling, A. E. Budding

Research output: Contribution to journal / Article / Academic / peer-review

Abstract

Background: The human microbiota is associated with various disease states and holds great promise for non-invasive diagnostics. However, microbiota data are challenging for traditional diagnostic approaches: they are high-dimensional, sparse and show high inter-personal variation. State-of-the-art machine learning tools are therefore needed to realize this diagnostic potential. While these tools can learn from complex data and interpret patterns therein that humans cannot identify, they often operate as black boxes, offering no insight into their decision-making process. In most cases, it is difficult to represent what a classifier has learned in a comprehensible way, which makes such classifiers prone to being mistrusted, or even misused, in a clinical environment. In this study, we aim to elucidate microbiota-based classifier decisions in a biologically meaningful context to allow their interpretation.

Results: We applied a method for explaining classifier decisions to two microbiota datasets of increasing complexity: gut versus skin microbiota samples, and inflammatory bowel disease versus healthy gut microbiota samples. The algorithm simulates bacterial species as being unknown to a pre-trained classifier and measures the effect on the classification outcome. Consequently, each patient is assigned a unique quantitative estimate of which species in their microbiota defined the classification of their sample. As demonstrated by our validation method, the algorithm explained the classifier decisions well, and the explanations were biologically consistent with recent microbiota findings.

Conclusions: Applying a method for explaining individual classifier decisions to complex microbiota analysis proved feasible and opens perspectives on personalized therapy. Providing an explanation to support a microbiota-based diagnosis could guide the decisions of clinical microbiologists and has the potential to increase their confidence in the output of such decision support systems. This may facilitate the development of new diagnostic applications.
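The idea summarized in the Results paragraph, making one species unknown to a pre-trained classifier and measuring how the prediction changes, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration in the general style of prediction difference analysis, not the authors' implementation: the names `feature_relevance`, `log_odds` and `toy_classifier` are invented for this example, and the choices to simulate an "unknown" species by resampling its value from background (e.g. training) samples, to score the change as a difference in log2 odds with a Laplace correction, and to use a three-species logistic model as the classifier are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface (assumption): a pre-trained classifier that maps one
# sample's feature vector (e.g. per-species abundances) to P(class 1 | x).
Classifier = Callable[[Sequence[float]], float]


def log_odds(p: float, n_train: int, n_classes: int = 2) -> float:
    """Log2 odds of p after a Laplace correction that keeps p away from 0 and 1."""
    p = (p * n_train + 1) / (n_train + n_classes)
    return math.log2(p / (1 - p))


def feature_relevance(
    predict: Classifier,
    x: Sequence[float],
    background: List[Sequence[float]],
    n_samples: int = 50,
    seed: int = 0,
) -> List[float]:
    """Per-feature evidence scores explaining the prediction for one sample x.

    Assumed scheme: feature i is made 'unknown' by replacing its value with
    values drawn from background samples and averaging the classifier output.
    The score is the drop in log2 odds relative to the unmodified input:
    positive means the observed value of feature i supports class 1,
    negative means it speaks against class 1.
    """
    rng = random.Random(seed)
    n = len(background)
    base = log_odds(predict(x), n)
    scores = []
    for i in range(len(x)):
        total = 0.0
        for _ in range(n_samples):
            x_mod = list(x)
            x_mod[i] = rng.choice(background)[i]  # simulate feature i as unknown
            total += predict(x_mod)
        scores.append(base - log_odds(total / n_samples, n))
    return scores


def toy_classifier(x: Sequence[float]) -> float:
    """Hypothetical stand-in classifier: logistic model over three species."""
    z = 2.0 * x[0] + 0.0 * x[1] - 1.0 * x[2]
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical background samples and one patient sample to explain.
    background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.2, 0.8]]
    patient = [1.0, 0.7, 0.0]
    scores = feature_relevance(toy_classifier, patient, background)
    for species, score in enumerate(scores):
        print(f"species {species}: {score:+.3f}")
```

Each sample gets its own vector of scores, which corresponds to the per-patient explanations described above. In this toy case species 1 receives a score of essentially zero because the model ignores it, while species 0 (present) and species 2 (absent) both push the sample toward class 1. Resampling instead of setting an abundance to zero is a deliberate choice in this sketch: in sparse microbiota profiles a zero is itself informative, so "unknown" and "absent" should not be conflated.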

Original language: English
Article number: 441
Journal: BMC Bioinformatics
Volume: 18
Issue number: 1
DOI: 10.1186/s12859-017-1843-1
Publication status: Published, 4 Oct 2017

Cite this

@article{d8f640c2514e49a1adfba5d69ab4e2f9,
title = "Interpretation of microbiota-based diagnostics by explaining individual classifier decisions",
abstract = "Background: The human microbiota is associated with various disease states and holds great promise for non-invasive diagnostics. However, microbiota data are challenging for traditional diagnostic approaches: they are high-dimensional, sparse and show high inter-personal variation. State-of-the-art machine learning tools are therefore needed to realize this diagnostic potential. While these tools can learn from complex data and interpret patterns therein that humans cannot identify, they often operate as black boxes, offering no insight into their decision-making process. In most cases, it is difficult to represent what a classifier has learned in a comprehensible way, which makes such classifiers prone to being mistrusted, or even misused, in a clinical environment. In this study, we aim to elucidate microbiota-based classifier decisions in a biologically meaningful context to allow their interpretation. Results: We applied a method for explaining classifier decisions to two microbiota datasets of increasing complexity: gut versus skin microbiota samples, and inflammatory bowel disease versus healthy gut microbiota samples. The algorithm simulates bacterial species as being unknown to a pre-trained classifier and measures the effect on the classification outcome. Consequently, each patient is assigned a unique quantitative estimate of which species in their microbiota defined the classification of their sample. As demonstrated by our validation method, the algorithm explained the classifier decisions well, and the explanations were biologically consistent with recent microbiota findings. Conclusions: Applying a method for explaining individual classifier decisions to complex microbiota analysis proved feasible and opens perspectives on personalized therapy. 
Providing an explanation to support a microbiota-based diagnosis could guide the decisions of clinical microbiologists and has the potential to increase their confidence in the output of such decision support systems. This may facilitate the development of new diagnostic applications.",
keywords = "Inflammatory bowel disease (IBD), IS-pro, Machine learning, Microbiota, Supervised classification",
author = "A. Eck and Zintgraf, {L. M.} and {de Groot}, {E. F.J.} and {de Meij}, {T. G.J.} and Cohen, {T. S.} and Savelkoul, {P. H.M.} and M. Welling and Budding, {A. E.}",
year = "2017",
month = "10",
day = "4",
doi = "10.1186/s12859-017-1843-1",
language = "English",
volume = "18",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

Interpretation of microbiota-based diagnostics by explaining individual classifier decisions. / Eck, A.; Zintgraf, L. M.; de Groot, E. F.J.; de Meij, T. G.J.; Cohen, T. S.; Savelkoul, P. H.M.; Welling, M.; Budding, A. E.

In: BMC Bioinformatics, Vol. 18, No. 1, 441, 04.10.2017.


TY - JOUR

T1 - Interpretation of microbiota-based diagnostics by explaining individual classifier decisions

AU - Eck, A.

AU - Zintgraf, L. M.

AU - de Groot, E. F.J.

AU - de Meij, T. G.J.

AU - Cohen, T. S.

AU - Savelkoul, P. H.M.

AU - Welling, M.

AU - Budding, A. E.

PY - 2017/10/4

Y1 - 2017/10/4

N2 - Background: The human microbiota is associated with various disease states and holds great promise for non-invasive diagnostics. However, microbiota data are challenging for traditional diagnostic approaches: they are high-dimensional, sparse and show high inter-personal variation. State-of-the-art machine learning tools are therefore needed to realize this diagnostic potential. While these tools can learn from complex data and interpret patterns therein that humans cannot identify, they often operate as black boxes, offering no insight into their decision-making process. In most cases, it is difficult to represent what a classifier has learned in a comprehensible way, which makes such classifiers prone to being mistrusted, or even misused, in a clinical environment. In this study, we aim to elucidate microbiota-based classifier decisions in a biologically meaningful context to allow their interpretation. Results: We applied a method for explaining classifier decisions to two microbiota datasets of increasing complexity: gut versus skin microbiota samples, and inflammatory bowel disease versus healthy gut microbiota samples. The algorithm simulates bacterial species as being unknown to a pre-trained classifier and measures the effect on the classification outcome. Consequently, each patient is assigned a unique quantitative estimate of which species in their microbiota defined the classification of their sample. As demonstrated by our validation method, the algorithm explained the classifier decisions well, and the explanations were biologically consistent with recent microbiota findings. Conclusions: Applying a method for explaining individual classifier decisions to complex microbiota analysis proved feasible and opens perspectives on personalized therapy. 
Providing an explanation to support a microbiota-based diagnosis could guide the decisions of clinical microbiologists and has the potential to increase their confidence in the output of such decision support systems. This may facilitate the development of new diagnostic applications.

AB - Background: The human microbiota is associated with various disease states and holds great promise for non-invasive diagnostics. However, microbiota data are challenging for traditional diagnostic approaches: they are high-dimensional, sparse and show high inter-personal variation. State-of-the-art machine learning tools are therefore needed to realize this diagnostic potential. While these tools can learn from complex data and interpret patterns therein that humans cannot identify, they often operate as black boxes, offering no insight into their decision-making process. In most cases, it is difficult to represent what a classifier has learned in a comprehensible way, which makes such classifiers prone to being mistrusted, or even misused, in a clinical environment. In this study, we aim to elucidate microbiota-based classifier decisions in a biologically meaningful context to allow their interpretation. Results: We applied a method for explaining classifier decisions to two microbiota datasets of increasing complexity: gut versus skin microbiota samples, and inflammatory bowel disease versus healthy gut microbiota samples. The algorithm simulates bacterial species as being unknown to a pre-trained classifier and measures the effect on the classification outcome. Consequently, each patient is assigned a unique quantitative estimate of which species in their microbiota defined the classification of their sample. As demonstrated by our validation method, the algorithm explained the classifier decisions well, and the explanations were biologically consistent with recent microbiota findings. Conclusions: Applying a method for explaining individual classifier decisions to complex microbiota analysis proved feasible and opens perspectives on personalized therapy. 
Providing an explanation to support a microbiota-based diagnosis could guide the decisions of clinical microbiologists and has the potential to increase their confidence in the output of such decision support systems. This may facilitate the development of new diagnostic applications.

KW - Inflammatory bowel disease (IBD)

KW - IS-pro

KW - Machine learning

KW - Microbiota

KW - Supervised classification

UR - http://www.scopus.com/inward/record.url?scp=85030316587&partnerID=8YFLogxK

U2 - 10.1186/s12859-017-1843-1

DO - 10.1186/s12859-017-1843-1

M3 - Article

VL - 18

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 441

ER -