Applying machine learning on health record data from general practitioners to predict suicidality

Kasper van Mens*, Elke Elzinga, Mark Nielen, Joran Lokkerbol, Rune Poortvliet, Gé Donker, Marianne Heins, Joke Korevaar, Michel Dückers, Claire Aussems, Marco Helbich, Bea Tiemens, Renske Gilissen, Aartjan Beekman, Derek de Beurs

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Background: Suicidal behaviour is difficult to detect in general practice. Machine learning (ML) algorithms using routinely collected data might support general practitioners (GPs) in detecting suicidal behaviour. In this paper, we applied machine learning techniques to support GPs in recognizing suicidal behaviour in primary care patients using routinely collected general practice data.

Methods: This case-control study used data from a nationally representative primary care database including over 1.5 million patients (Nivel Primary Care Database). Patients with a suicide (attempt) in 2017 were selected as cases (N = 574), and an at-risk control group (N = 207,308) was selected from patients with psychological vulnerability but without a suicide attempt in 2017. A random forest model was trained on a small subsample of the data (training set) and evaluated on unseen data (test set).

Results: Almost two-thirds (65%) of the cases visited their GP within the last 30 days before the suicide (attempt). The random forest model showed a positive predictive value (PPV) of 0.05 (0.04–0.06), with a sensitivity of 0.39 (0.32–0.47) and an area under the curve (AUC) of 0.85 (0.81–0.88). Almost all controls were correctly labeled as controls (specificity = 0.98 (0.97–0.98)). Among a sample of 650 at-risk primary care patients, the algorithm would label 20 patients as high-risk. Of those, one would be an actual case; one additional case would be missed.

Conclusion: In this study, we applied machine learning to predict suicidal behaviour using general practice data. Our results showed that these techniques can be used as a complementary step in the identification and stratification of patients at risk of suicidal behaviour. The results are encouraging and provide a first step towards using automated screening directly in clinical practice. Additional data from different social domains, such as employment and education, might improve accuracy.
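The relationship between the reported sensitivity, specificity and PPV can be illustrated with a short calculation. The sketch below is not the authors' code; it assumes that the share of cases in the evaluated sample equals 574 / (574 + 207,308), which the abstract does not state, and the function name `expected_counts` is hypothetical. Because the published figures are rounded, the output only approximates the worked example of 650 patients given in the abstract.

```python
# Illustrative sketch only: expected screening counts from rounded
# sensitivity, specificity and an ASSUMED prevalence (not stated in the abstract).

def expected_counts(n_patients, prevalence, sensitivity, specificity):
    """Return expected confusion-matrix counts and PPV for a screened group."""
    cases = n_patients * prevalence          # expected true cases in the group
    controls = n_patients - cases            # expected non-cases
    true_pos = sensitivity * cases           # cases flagged as high-risk
    false_neg = cases - true_pos             # cases missed by the algorithm
    false_pos = (1 - specificity) * controls # controls wrongly flagged
    flagged = true_pos + false_pos           # everyone labeled high-risk
    ppv = true_pos / flagged if flagged else float("nan")
    return {
        "cases": cases,
        "flagged": flagged,
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "ppv": ppv,
    }

# Assumption: prevalence taken from the case and control counts in the abstract.
prevalence = 574 / (574 + 207_308)

# Rounded point estimates reported in the abstract.
result = expected_counts(650, prevalence, sensitivity=0.39, specificity=0.98)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the calculation shows why the PPV stays low despite a high specificity: with cases making up well under 1% of the at-risk group, even a small false-positive rate among controls produces far more false alarms than true detections.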

Original language: English
Article number: 100337
Journal: Internet Interventions
Publication status: Published - Sep 2020
