We studied the psychometric properties of the 39-item v1.1 Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Pain Behavior item bank in a sample of 1,602 patients with musculoskeletal complaints. We evaluated the assumptions of the underlying item response theory (IRT) model: unidimensionality and local independence with confirmatory factor analyses, and monotonicity with scalability coefficients. We assessed the IRT model fit of all items and estimated the item parameters. Differential item functioning (DIF) was studied for age and gender, and DIF for language was studied as a measure of cross-cultural validity. Confirmatory factor analyses showed suboptimal fit of a unidimensional model, but a bifactor model indicated a low risk of bias when unidimensionality was assumed (Omega H = .92, explained common variance = .70). Fifteen item pairs (2%) were locally dependent. Five items showed poor scalability. All items fitted the IRT model; slope parameters ranged from .60 to 2.00, and threshold parameters from –2.05 to 6.80. One item showed DIF for age, 1 item showed DIF for gender, and 5 items showed DIF for language, but the impact on total scores was low. Our study shows limitations of the Dutch-Flemish PROMIS Pain Behavior item bank when used in a primary care population with musculoskeletal complaints.

Perspective: We studied the psychometric properties of the Dutch-Flemish PROMIS Pain Behavior item bank in a large primary care population of patients with musculoskeletal complaints. The results indicate that the item bank has limitations when used in this population.
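The reported share of locally dependent item pairs can be checked with a short calculation. The sketch below assumes that the 2% refers to all unordered pairs that can be formed from the 39 items; the abstract does not state the denominator, and the function name is illustrative only.

```python
from math import comb

N_ITEMS = 39            # number of items in the bank (from the text)
N_DEPENDENT_PAIRS = 15  # locally dependent item pairs (from the text)


def dependent_pair_share(n_items, n_dependent_pairs):
    """Return (total unordered item pairs, share of dependent pairs).

    Assumption: the denominator is every unordered pair of items,
    i.e. n_items choose 2.
    """
    total_pairs = comb(n_items, 2)
    return total_pairs, n_dependent_pairs / total_pairs


total_pairs, share = dependent_pair_share(N_ITEMS, N_DEPENDENT_PAIRS)
print(f"{total_pairs} item pairs, {share:.1%} locally dependent")
```

Under this assumption there are 741 possible item pairs, so 15 dependent pairs correspond to roughly 2%, in line with the reported figure.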