Background: Recognition of pain in people with dementia is challenging. Observational scales have been developed, but there is a need to harmonize and improve the assessment process. In the EU initiative COST-Action TD1005, 36 promising items were selected from existing scales to be tested further. We aimed to study the observer agreement of each item, and to analyse the factor structure of the complete set. Methods: One hundred and ninety older persons with dementia were recruited in four different countries (Italy, Serbia, Spain and the Netherlands) from different types of healthcare facilities. Patients formed a convenience sample, with no pre-selection for the presence of (suspected) pain. The Pain Assessment in Impaired Cognition (PAIC, research version) item pool includes facial expressions of pain (15 items), body movements (10 items) and vocalizations (11 items). Participants were observed by health professionals in two situations: at rest and during movement. Intrarater and interrater reliability was analysed by percentage agreement. The factor structure was examined with principal component analysis with orthogonal rotation. Results: Health professionals performed observations in 40–57 patients in each country. Intrarater and interrater agreement was generally high (≥70%), although agreement for some facial expression items was sometimes below 70%. Factor analysis yielded a six-component solution, with the components named as follows: Vocal pain expression, Face anatomical descriptors, Protective body movements, Vocal defence, Tension and Lack of affect. Conclusions: Observation of PAIC items can be done reliably in healthcare settings. Observer agreement is promising even without extensive training. Significance: In this international project, promising items from existing observational pain scales were identified and evaluated regarding their reliability as an alternative to pain self-report in people with dementia.
Analysis of the factor structure helped to clarify the character of the items. Health professionals from four countries, working in four different European languages, were able to rate items reliably. The results contributed to an informed reduction of items for a clinical observer scale (Pain Assessment in Impaired Cognition scale with 15 items: PAIC15).
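
As an illustration of the percentage agreement measure named in the Methods, the following minimal sketch computes agreement between two sets of ratings for a single item and compares it with the 70% level reported in the Results. It assumes that agreement means an exact match of item scores on paired observations; the abstract does not specify the scoring format or how agreement was defined. The function name and the example scores are hypothetical and are not study data.

```python
def percentage_agreement(ratings_a, ratings_b):
    """Percentage of paired observations on which two ratings match exactly.

    Assumption: agreement is an exact match of item scores; the source
    abstract does not state how agreement was operationalised.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("rating lists must have the same length")
    if not ratings_a:
        raise ValueError("at least one paired observation is required")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical scores from two observers for one item across ten patients
# (illustrative values only, not taken from the study).
rater_1 = [0, 1, 0, 0, 2, 0, 1, 0, 0, 3]
rater_2 = [0, 1, 0, 1, 2, 0, 0, 0, 0, 3]

agreement = percentage_agreement(rater_1, rater_2)
meets_threshold = agreement >= 70.0  # 70% level reported in the Results
print(f"Agreement: {agreement:.1f}% (meets 70% level: {meets_threshold})")
```

The same function would serve for intrarater agreement by passing two ratings made by the same observer on separate occasions. Percentage agreement does not correct for agreement expected by chance, which is worth keeping in mind for items that are rarely scored as present.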