Objectives: To synthesize the evidence on the psychometric properties of functional capacity evaluation (FCE) methods.

Methods: A systematic literature search was conducted in nine databases. Two reviewers independently screened the resulting articles against predefined inclusion and exclusion criteria. Included studies were appraised for methodological quality.

Results: The search yielded 20 eligible studies covering nine different FCE methods. The Baltimore Therapeutic Equipment work simulator showed moderate predictive validity. The Ergo-Kit (EK) showed moderate variability and high inter- and intra-rater reliability. Low discriminative ability and high convergent validity were found for the EK. Concurrent validity between the EK and the ERGOS Work Simulator was low to moderate. The Isernhagen Work-Systems (IWS) FCE showed moderate to high test–retest, inter-rater, and intra-rater reliability, but its predictive validity was low. The physical work performance evaluation (PWPE) showed moderate test–retest reliability and moderate to high inter-rater reliability. Internal and external responsiveness of the PWPE were low, whereas its predictive validity was high. The predictive validity of the short-form FCE was also high, but several of its other psychometric properties need further examination. Low discriminative and convergent validity were found for the work disability functional assessment battery. The WorkHab showed moderate to high test–retest, inter-rater, and intra-rater reliability.

Conclusion: Well-known FCE methods have been rigorously studied, but some of the research indicates weaknesses in their reliability and validity. Future research should address how these weaknesses can be overcome.