Individuals with subthreshold depression have an increased risk of developing major depressive disorder (MDD). The aim of this study was to develop a model that predicts the probability of MDD onset in subthreshold individuals from their proteomic, sociodemographic and clinical data. To this end, we analysed 198 features (146 peptides representing 77 serum proteins, measured using multiple reaction monitoring mass spectrometry (MRM-MS); 22 sociodemographic factors; and 30 clinical features) in 86 first-episode MDD patients (training set patient group), 37 subthreshold individuals who developed MDD within two or four years (extrapolation test set patient group), and 86 subthreshold individuals who did not develop MDD within four years (shared reference group). To obtain a robust and reproducible model, we applied feature extraction and model averaging across a set of 100 models obtained from repeated application of group LASSO regression with ten-fold cross-validation on the training set. This resulted in a 12-feature prediction model consisting of six serum proteins (AACT, APOE, APOH, FETUA, HBA and PHLD), three sociodemographic factors (body mass index, childhood trauma and education level) and three depressive symptoms (sadness, fatigue and leaden paralysis). Importantly, the model showed fair performance in predicting future MDD diagnosis among subthreshold individuals in the extrapolation test set (AUC = 0.75), a task that required extrapolating beyond the population on which the model was trained. These findings suggest that it may be possible to detect disease indications in subthreshold individuals up to four years prior to diagnosis, which has important clinical implications for the identification and treatment of high-risk individuals.
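The model-averaging and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the coefficient sets from the repeated sparse fits are already available, and it does not perform the group LASSO fitting itself. The selection-frequency cutoff `min_fraction`, the choice to average over all fits, the function names and the toy coefficient values are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def average_models(models, min_fraction=0.5):
    """Average coefficients over repeated sparse fits.

    models: list of dicts mapping feature name -> coefficient; a zero or
    missing entry means the feature was not selected in that fit.
    min_fraction: hypothetical selection-frequency cutoff (an assumption,
    not a value taken from the study).
    """
    n = len(models)
    counts = defaultdict(int)
    sums = defaultdict(float)
    for coefs in models:
        for name, value in coefs.items():
            if value != 0.0:
                counts[name] += 1
                sums[name] += value
    # Keep features selected often enough. Dividing by n (all fits) rather
    # than by the selection count shrinks less stable features towards zero.
    return {
        name: sums[name] / n
        for name in counts
        if counts[name] / n >= min_fraction
    }


def predict_probability(weights, intercept, sample):
    """Logistic-model probability for one sample (dict of feature values)."""
    z = intercept + sum(w * sample.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy coefficients from four imaginary fits (invented values, not study data).
toy_models = [
    {"APOE": 0.4, "sadness": 0.8},
    {"APOE": 0.6, "sadness": 1.0, "noise": 0.1},
    {"APOE": 0.0, "sadness": 0.6},
    {"APOE": 0.2, "sadness": 0.8},
]
averaged = average_models(toy_models, min_fraction=0.5)
toy_probability = predict_probability(averaged, 0.0, {"APOE": 1.0, "sadness": 1.0})
toy_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

In this toy run, the feature selected in only one of four fits is dropped, while the two features selected in at least half of the fits are retained with averaged weights. The AUC function is the standard pairwise-ranking formulation, which is the quantity reported for the extrapolation test set.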