Background: We recently developed a model of stratified exercise therapy, consisting of (i) a stratification algorithm allocating patients with knee osteoarthritis (OA) into one of three subgroups (‘high muscle strength subgroup’ representing a post-traumatic phenotype, ‘low muscle strength subgroup’ representing an age-induced phenotype, and ‘obesity subgroup’ representing a metabolic phenotype) and (ii) subgroup-specific exercise therapy. In the present study, we aimed to test the construct validity of this algorithm. Methods: Data from five studies (four exercise therapy trial cohorts and one cross-sectional cohort) were used to test the construct validity of our algorithm using 63 a priori formulated hypotheses regarding three research questions: (i) are the proportions of patients in each subgroup similar across cohorts? (15 hypotheses); (ii) are the characteristics of each of the subgroups in line with their proposed underlying phenotypes? (30 hypotheses); (iii) are the effects of usual exercise therapy in the three subgroups in line with the proposed effect sizes? (18 hypotheses). Results: Baseline data from a total of 1211 patients with knee OA were analyzed for the first and second research questions, and follow-up data from 584 patients who were part of an exercise therapy arm within a trial were analyzed for the third research question. In total, 73% of the hypotheses were confirmed. Regarding our first research question, we found similar proportions in each of the three subgroups across cohorts, particularly in three of the five cohorts. Regarding our second research question, subgroup characteristics were almost completely in line with the proposed underlying phenotypes. Regarding our third research question, usual exercise therapy resulted in similar, medium to large effect sizes for knee pain and physical function in all three subgroups. Conclusion: We found mixed results regarding the construct validity of our stratification algorithm.
On the one hand, the algorithm is a valid instrument for consistently allocating patients into subgroups that align with our hypotheses. On the other hand, in contrast to our hypotheses, the subgroups did not differ substantially in the effects of usual exercise therapy. An ongoing trial will assess whether this algorithm, accompanied by subgroup-specific exercise therapy, improves clinical and economic outcomes.