Introduction
Both clinicians and researchers value observation as an important source of diagnostic information, especially in forensic, mental health and school settings. However, it is not well known how reliable the information collected through observation is.

Methods
The present study aimed to systematically review the literature on the inter-rater reliability (IRR) of observation of aggression and impulsivity.

Results
A total of 37 papers on the observation of aggression that reported IRR were selected and reviewed. Forms of observation ranged from videotaped observation in a lab to participant observation in a naturalistic setting (i.e. with the observer taking part in the situation). Relatively few studies focused on observation of aggression in naturalistic settings. For various reasons, no papers on the observation of impulsivity could be included. Regardless of differences in form and setting, the IRR of observing aggression was fair to excellent.

Conclusion
Different forms of observation (e.g. non-participant, direct) in different settings (e.g. naturalistic or lab) can be executed reliably. This finding is encouraging for clinicians who want to use systematic observation in naturalistic settings. However, the relatively sparse research on naturalistic observation underscores the need for further research on the topic.
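The abstract does not say which IRR statistics the reviewed studies reported. As a minimal sketch, assume two observers independently code the same observation intervals as aggressive (1) or not aggressive (0). Under that assumption, Cohen's kappa compares their observed agreement with the agreement expected by chance. The hypothetical helper `reliability_label` maps a coefficient onto the conventional bands from Cicchetti's guidelines (below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, 0.75 and above excellent). That mapping is one plausible reading of "fair to excellent" and is an assumption, not the review's stated method.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over codes.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def reliability_label(kappa):
    """Hypothetical helper: map a coefficient onto Cicchetti-style bands (assumed cut-offs)."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa < 0.75:
        return "good"
    return "excellent"


# Illustrative, made-up codes for 10 intervals (1 = aggressive act observed).
observer_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
observer_2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 2), reliability_label(kappa))  # prints: 0.78 excellent
```

Here the observers agree on 9 of 10 intervals (p_o = 0.90). Chance agreement, however, is already 0.54, so kappa is (0.90 - 0.54) / (1 - 0.54) ≈ 0.78. This illustrates why kappa is typically lower than raw percentage agreement.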