Background/Purpose: The computerized Animated Activity Questionnaire (AAQ) is a newly developed, cross-culturally validated measurement tool for assessing activity limitations in patients with hip and knee osteoarthritis (HKOA). It consists of video animations from which patients choose the animation that best matches their own performance. To support its application in daily clinical practice and in research, this study aimed to determine the reliability, responsiveness, and interpretability of the AAQ.

Methods: First, 238 HKOA patients, drawn from a mix of hospital and rehabilitation center settings, completed the AAQ twice, 7 days apart. Test-retest reliability (intra-class correlation coefficient, ICC), the Standard Error of Measurement (SEM), and the Smallest Detectable Change (SDC) were calculated. Second, 92 other patients with hip or knee OA were followed for 6 months to assess responsiveness. These patients received conservative physical therapy treatment or joint replacement surgery and were measured before the intervention and 6 months later. We hypothesized that change scores on the AAQ (score range 0-100) would correlate at least 0.6 with self-report (ADL subscore of the Hip disability and Knee Injury Osteoarthritis Outcome Score), performance-based tests (Timed Up and Go test, Stair Climbing Test, and 30 seconds Chair Stand Test), and a Global Rating of Change (GRC). To estimate the Minimal Important Change (MIC) of the AAQ, an anchor-based MIC distribution method was used. The Receiver Operating Characteristic (ROC) method was used to find the AAQ change score that discriminates best. The MIC was compared with the SDC to facilitate the interpretation of change scores.

Results: The ICC for test-retest reliability was 0.93 (95% CI: 0.91-0.95). SEM and SDC were 4.9 and 13.5, respectively. After 6 months, the change scores of the AAQ correlated 0.67 with self-reports, 0.47-0.55 with performance-based tests, and 0.43 with GRC. The ROC curve showed an area under the curve of 0.71, with a sensitivity of 62% and a specificity of 79% at the optimal MIC of 9.12. The MIC was smaller than the SDC, meaning that a change of this size is important but cannot be distinguished from measurement error in individual patients.

Conclusion: The AAQ showed good internal consistency and test-retest reliability. Given the SDC, a mean AAQ score difference of more than 14% indicates a real improvement in activity limitations in a mix of surgically and conservatively treated HKOA patients. The AAQ is considered responsive, despite the moderate correlations with performance-based tests and GRC, which seem to be caused by the slightly different, new construct that the AAQ measures within the domain of activity limitations.
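
The relation between the SEM, SDC, and MIC reported above can be illustrated with a minimal sketch. It assumes the conventional individual-level formula SDC = 1.96 × √2 × SEM, which the abstract does not state explicitly; the function names are hypothetical and chosen only for illustration. With the rounded SEM of 4.9 this yields roughly 13.6, close to the reported SDC of 13.5 (the small gap is presumably due to rounding of the SEM).

```python
import math


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """Individual-level SDC, assuming SDC = z * sqrt(2) * SEM.

    The sqrt(2) term reflects that a change score combines the
    measurement error of two assessments (test and retest).
    """
    return z * math.sqrt(2) * sem


def mic_exceeds_sdc(mic: float, sdc: float) -> bool:
    """True if an important change is also larger than measurement error."""
    return mic >= sdc


# Values taken from the Results; SEM is the rounded, reported figure.
SEM = 4.9
MIC = 9.12

sdc = smallest_detectable_change(SEM)
print(f"SDC from rounded SEM: {sdc:.1f}")
print(f"MIC distinguishable from error in individuals: {mic_exceeds_sdc(MIC, sdc)}")
```

Under these assumptions the check returns `False`, matching the abstract's interpretation: an individual change of 9.12 points is important to patients but lies within the measurement error of the AAQ.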