Aim
To investigate whether items of the SF-12, widely used to assess health outcomes in clinical practice and public health research, provide unbiased measurements of the underlying constructs across demographic groups defined by gender, age, educational level and ethnicity.

Methods
We included 23,146 men and women aged 18–70 years of Dutch, South-Asian Surinamese, African Surinamese, Ghanaian, Turkish, or Moroccan origin from the HELIUS study. To establish comparability of SF-12 items across demographic groups, we conducted both multiple group confirmatory factor analyses (MGCFA) with increasingly stringent model constraints, assessing configural, metric, strong and strict measurement invariance (MI), and regression analysis.

Results
MI regarding gender, age and education was tested in the ethnic Dutch group (N = 4,615). In each subsequent step of testing MI, the change in goodness-of-fit measures did not exceed 0.010 (RMSEA) or 0.004 (CFI). Moreover, goodness-of-fit indices showed good fit for the strict invariance models: RMSEA < 0.055; CFI > 0.97. Regarding ethnicity, RMSEA values of the metric and subsequent models exceeded 0.055, indicating violation of measurement invariance in factor loadings, thresholds and residual variances. Regression analysis revealed possible age-, education- and ethnicity-related differential item functioning (DIF). Adjustment for this DIF had little impact on the magnitude of age and educational differences in physical and mental health, but ethnic inequalities in physical health, and to a lesser extent in mental health, were reduced after DIF adjustment.

Conclusions
We found no evidence of violation of measurement invariance of the SF-12 regarding gender, age and educational level. Even if minor DIF remained undetected in our MGCFA, we showed that it would have a negligible effect on the magnitude of demographic health inequalities. Regarding ethnicity, the SF-12 was not measurement invariant. After accounting for DIF, we observed a reduction of ethnic inequalities in health, in particular in physical health. Caution is warranted when comparing SF-12 scores across population groups with different ethnic backgrounds.
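
The stepwise logic of measurement invariance testing, in which each more constrained model is compared with the previous one on its change in RMSEA and CFI, can be sketched as follows. This is only an illustrative sketch, not the analysis code of the study: the function name `first_violation`, the `Fit` record, the cutoff values passed as defaults, and all fit values in the example are assumptions made up for demonstration and are not results from HELIUS.

```python
from typing import NamedTuple, Optional, Sequence


class Fit(NamedTuple):
    """Fit indices of one invariance model (hypothetical record type)."""
    level: str    # e.g. "configural", "metric", "strong", "strict"
    rmsea: float  # lower is better
    cfi: float    # higher is better


def first_violation(
    fits: Sequence[Fit],
    max_delta_rmsea: float = 0.015,  # assumed cutoff, adjust to your own criterion
    max_delta_cfi: float = 0.010,    # assumed cutoff, adjust to your own criterion
) -> Optional[str]:
    """Return the first invariance level whose fit worsens beyond the cutoffs.

    Models must be ordered from least to most constrained. Returns None
    when no step exceeds either cutoff.
    """
    for prev, curr in zip(fits, fits[1:]):
        rmsea_increase = curr.rmsea - prev.rmsea  # positive means worse fit
        cfi_decrease = prev.cfi - curr.cfi        # positive means worse fit
        if rmsea_increase > max_delta_rmsea or cfi_decrease > max_delta_cfi:
            return curr.level
    return None


# Invented fit values, for illustration only (not taken from the study).
example_fits = [
    Fit("configural", 0.040, 0.990),
    Fit("metric", 0.042, 0.989),
    Fit("strong", 0.045, 0.987),
    Fit("strict", 0.050, 0.985),
]
print(first_violation(example_fits))
```

In practice the fit indices would come from the fitted MGCFA models themselves, and absolute fit (for example RMSEA and CFI of the most constrained model) would be inspected alongside the step-to-step changes.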