Incomplete patient data is a substantial problem that is not sufficiently addressed in current clinical research. Many published methods assume that study data are both complete and valid. However, this assumption is often violated: individual features may be unavailable because a patient examination is missing, or distorted or wrong because of inaccurate measurements or human error. In this work, we propose using the Latent Tree (LT) generative model to address the limitations caused by missing data. We show on 491 subjects of a challenging dementia dataset that LT feature estimation is more robust to incomplete data than mean or Gaussian Mixture Model imputation, and that it has a synergistic effect when combined with common classifiers (we use an SVM as an example). We also show that LTs allow incomplete samples to be included in classifier training. Using LTs, we obtain a balanced accuracy of 62% for the classification of all patients into five distinct dementia types even though 20% of the features are missing in both training and testing data (compared with 68% on complete data). Further, we confirm the potential of LTs to detect outlier samples within the dataset.
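
To make the comparison baseline and the evaluation metric concrete, the following minimal sketch illustrates mean imputation and balanced accuracy (the mean of per-class recall) in plain Python. It does not implement the Latent Tree model. All function names, the toy data, and the random masking scheme are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def mask_features(rows, fraction, rng):
    """Return a copy of rows in which each entry is set to None with
    probability `fraction` (hypothetical masking scheme)."""
    return [[None if rng.random() < fraction else v for v in row] for row in rows]


def column_means(rows):
    """Per-column mean over observed (non-None) values; 0.0 if a column is empty."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def mean_impute(rows, means):
    """Replace each None entry with the supplied mean of its column."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so every class counts equally
    regardless of how many samples it has."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy feature matrix: 10 samples, 5 features (synthetic, not study data).
    train = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(10)]
    masked = mask_features(train, 0.2, rng)  # about 20% of entries missing
    filled = mean_impute(masked, column_means(masked))
    print(filled[0])
    # Always predicting the majority class gives recall 1 for class 0
    # and recall 0 for class 1.
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```

With five classes, a classifier that always predicts a single class would reach a balanced accuracy of only 20%, which gives a reference point for reading the reported 62% and 68%.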
|Number of pages||9|
|Publication status||Published - 2016|