Objective: This study introduces a new method for establishing clinical thresholds for multi-item tests, based on item response theory (IRT), as an alternative to receiver operating characteristic (ROC) analysis. The performance of the IRT method was examined and compared with that of the ROC method across multiple simulated data sets and in a real data set. Study Design and Setting: Simulated data sets (sample size: 1,000) varied in the mean and variability of the test scores and in the prevalence of disease. The true clinical threshold was defined as a predetermined location on the latent trait underlying the questionnaire, together with its corresponding expected test score. The real data set (sample size: 295) comprised Hospital Anxiety and Depression Scale (HADS) depression scores and Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) diagnoses of major depressive disorder (MDD). Results: The IRT method recovered the clinical thresholds without bias, whereas the ROC method identified thresholds that were biased by the prevalence of disease. In the real data set, mild MDD was clinically diagnosed in 23%, moderate MDD in 12%, and severe MDD in 14% of the participants. The IRT method identified the following HADS depression score thresholds for mild, moderate, and severe MDD: 10.7, 13.2, and 15.1, respectively. Conclusion: The new IRT method identifies clinical thresholds that are unbiased by disease prevalence.
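
Illustration: The mapping from a location on the latent trait to its expected test score can be sketched as follows. This is a minimal sketch, not the study's implementation. The abstract does not name the IRT model or report item parameters, so the graded response model, the values in `ITEMS`, the cut point `theta_cut`, and all function names are assumptions chosen only to mirror a seven-item scale scored 0 to 3 per item.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_item_score(theta, a, bs):
    """Expected score on one graded item at latent trait level theta.

    Under a graded response model, P(X >= k | theta) = logistic(a * (theta - b_k)),
    and the expected item score is the sum of these cumulative probabilities.
    """
    return sum(logistic(a * (theta - b)) for b in bs)


def expected_test_score(theta, items):
    """Expected total score: the sum of the expected item scores."""
    return sum(expected_item_score(theta, a, bs) for a, bs in items)


def theta_for_score(score, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Invert the expected test score by bisection.

    The expected test score increases monotonically with theta,
    so a simple bracketing search is sufficient.
    """
    if not expected_test_score(lo, items) < score < expected_test_score(hi, items):
        raise ValueError("score lies outside the range reachable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_test_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters (assumption, not estimates from the study):
# seven items, each with a discrimination a and three ordered thresholds b_k.
ITEMS = [(a, (-1.0, 0.0, 1.0)) for a in (1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2)]

if __name__ == "__main__":
    theta_cut = 0.5  # hypothetical threshold location on the latent trait
    score_cut = expected_test_score(theta_cut, ITEMS)
    print(f"expected test score at theta={theta_cut}: {score_cut:.2f}")
    print(f"theta recovered from that score: {theta_for_score(score_cut, ITEMS):.3f}")
```

In this sketch the threshold is fixed on the latent trait first and only then translated to the score scale, which is why the resulting cut score depends on the item parameters and not on how many cases the sample happens to contain.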