Purpose: The Osteoarthritis Research Society International has identified a core set of performance-based tests of physical function for use in people with knee osteoarthritis (OA). The core set consists of the 30-second chair stand test (30-s CST), the 4 × 10 m fast-paced walk test (40 m FPWT) and a stair climb test. The aim of this study was to evaluate the reliability, validity and responsiveness of these performance-based tests for measuring physical function in patients with knee OA. Methods: A prospective cohort study was performed in 85 patients with knee OA who were indicated for total knee arthroplasty (TKA). Construct validity and responsiveness were assessed by testing predefined hypotheses. A subgroup (n = 30) underwent test–retest measurements for reliability analysis. The Oxford Knee Score, the Knee injury and Osteoarthritis Outcome Score-Physical Function Short Form, a pain during activity score and knee extensor strength were used as comparator instruments. Measurements were obtained at baseline and 12 months after TKA. Results: Appropriate test–retest reliability was found for all three tests. The intraclass correlation coefficient (ICC) was 0.90 (95% CI 0.68; 0.96) for the 30-s CST, 0.93 (0.85; 0.96) for the 40 m FPWT and 0.94 (0.89; 0.97) for the 10-step stair climb test (10-step SCT). Adequate construct validity could not be confirmed for any of the three tests: 42% of the predefined hypotheses were confirmed for the 30-s CST, 27% for the 40 m FPWT and 36% for the 10-step SCT. The 40 m FPWT was found to be responsive, with 75% of the predefined hypotheses confirmed. Responsiveness of the other two tests could not be confirmed: for the 30-s CST and the 10-step SCT, only 50% of the hypotheses were confirmed. Conclusions: The three performance-based tests had good reliability, but poor construct validity and responsiveness in assessing function in the domains of sit-to-stand movement, walking short distances and stair negotiation.
The findings of the present study do not justify the use of these tests in clinical practice. Level of evidence: Level 1. Diagnostic study.