Background: The course of illness in major depression (MD) is highly variable, which may lead to both under- and overtreatment if clinicians follow a 'one-size-fits-all' approach. Novel opportunities in data mining could yield prediction models that help clinicians tailor treatment decisions to the individual patient. This study assesses, in new data, how well a previously developed data mining algorithm predicts future episodes of MD from clinical information. Methods: We applied a prediction model based on baseline clinical characteristics of subjects who reported lifetime MD to two independent test samples (total n = 4226). We assessed the model's performance in predicting future episodes of MD, anxiety disorders, and disability during follow-up (1–9 years after baseline). In addition, we compared its predictive performance with that of well-known risk factors for a severe course of illness. Results: Our model consistently predicted future episodes of MD in both test samples (AUC 0.68–0.73, modest prediction). It predicted episodes of generalized anxiety disorder, panic disorder, and disability with similar accuracy (AUC 0.65–0.78). Our model predicted these outcomes more accurately than risk factors for a severe course of illness, such as a family history of MD and lifetime traumas. Limitations: Prediction accuracy may differ in specific subgroups, such as hospitalized patients or patients from a different cultural background. Conclusions: Our prediction model consistently predicted a range of adverse outcomes in MD across two independent test samples derived from studies conducted in different subpopulations and countries and using different measurement procedures. The results of this replication study hold promise for application in clinical practice.
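To help interpret the AUC values reported above, here is a minimal sketch of how an area under the ROC curve can be computed. It treats the AUC as the probability that a randomly chosen subject who develops the outcome gets a higher predicted risk than a randomly chosen subject who does not, with ties counted as one half. The function name `auc` and the example scores and labels are hypothetical illustrations, not study data. This is not the authors' evaluation code.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risks (higher = more likely to develop the outcome)
    labels: 1 if the outcome occurred during follow-up, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Count positive-negative pairs that the model ranks correctly;
    # tied scores contribute half a correct ranking.
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical predicted risks for six subjects and their observed outcomes.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
print(auc(example_scores, example_labels))  # 8 of 9 pairs ranked correctly
```

On this scale, 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination. This is why values around 0.68–0.73 are described as modest prediction.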