Background: There is limited information about the agreement and reliability of clinical shoulder tests. Objectives: To assess the interrater agreement and reliability of clinical shoulder tests in patients with shoulder pain treated in primary care. Methods: Patients with a primary report of shoulder pain underwent a set of 21 clinical shoulder tests twice on the same day, performed by pairs of independent physical therapists. The outcome parameters were observed interrater agreement, specific interrater agreement for positive and negative scores, and interrater reliability (Cohen’s kappa (κ)). Positive and negative interrater agreement values of ≥0.75 were regarded as sufficient for clinical use. For Cohen’s κ, the following classification was used: <0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, and 0.81–1.00 very good reliability. Participating clinics were randomized into two groups: with or without a brief practical session on how to conduct the tests. Results: A total of 113 patients were assessed in 12 physical therapy practices by 36 physical therapists. Positive and negative interrater agreement values were both sufficient for 1 test (the Full Can Test), neither was sufficient for 5 tests, and only one of the two was sufficient for 15 tests. Interrater reliability was fair for 11 tests, moderate for 9 tests, and good for 1 test (the Full Can Test). An additional brief practical session did not result in better agreement or reliability. Conclusion: Clinicians should be aware that interrater agreement and reliability for most shoulder tests are questionable and that their value in clinical practice is limited.
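
As an illustration of the outcome parameters named in the Methods, the following minimal sketch computes observed agreement, positive and negative specific agreement, and Cohen's κ from a 2×2 table of two raters' test results. It uses the standard textbook formulas and is not the authors' analysis code. The names `TwoByTwo`, `agreement_stats`, and `classify_kappa` are hypothetical, the example counts are invented for demonstration and are not study data, and treating each category boundary (for example exactly 0.20) as belonging to the lower category is an assumption, since the abstract does not specify it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one test scored by two raters (hypothetical container)."""
    both_pos: int        # a: both raters score the test positive
    r1_pos_r2_neg: int   # b: rater 1 positive, rater 2 negative
    r1_neg_r2_pos: int   # c: rater 1 negative, rater 2 positive
    both_neg: int        # d: both raters score the test negative


def agreement_stats(t: TwoByTwo) -> dict:
    """Return observed agreement, specific agreement, and Cohen's kappa."""
    a, b, c, d = t.both_pos, t.r1_pos_r2_neg, t.r1_neg_r2_pos, t.both_neg
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table contains no observations")

    observed = (a + d) / n
    # Specific agreement: agreement conditional on a positive / negative score.
    pos_den = 2 * a + b + c
    neg_den = 2 * d + b + c
    positive = 2 * a / pos_den if pos_den else math.nan
    negative = 2 * d / neg_den if neg_den else math.nan
    # Agreement expected by chance, from each rater's marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else math.nan

    return {"observed": observed, "positive": positive,
            "negative": negative, "kappa": kappa}


def classify_kappa(kappa: float) -> str:
    """Map kappa to the abstract's categories (boundary handling is assumed)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Invented counts for demonstration only; not taken from the study.
example = TwoByTwo(both_pos=30, r1_pos_r2_neg=10, r1_neg_r2_pos=5, both_neg=55)
stats = agreement_stats(example)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
print("reliability:", classify_kappa(stats["kappa"]))
print("sufficient for clinical use:",
      stats["positive"] >= 0.75 and stats["negative"] >= 0.75)
```

With these invented counts, positive and negative agreement are 0.80 and 0.88, so both exceed the 0.75 threshold, and κ is about 0.68, which falls in the "good" category. Reporting specific agreement alongside κ is useful because κ depends on how often the test is positive, whereas positive and negative agreement show directly how often the raters concur on each type of result.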