Background: The standard reference region (RR) for amyloid-beta (Aβ) PET studies is the cerebellar grey matter (GMCB), while alternative RRs have mostly been used without prior validation against the gold standard. This study compared five commonly used RRs to gold standard plasma input-based quantification using the GMCB. Methods: Thirteen subjects from a test–retest (TRT) study and 30 from a longitudinal study were retrospectively included (total: 17 Alzheimer’s disease, 13 mild cognitive impairment, 13 controls). Dynamic [11C]PiB PET (90 min) and T1-weighted MR scans were co-registered, and time–activity curves were extracted for cortical target regions and the following RRs: GMCB, whole cerebellum (WCB), white matter brainstem/pons (WMBS), whole brainstem (WBS) and eroded subcortical white matter (WMES). A two-tissue reversible plasma input model (2T4k_Vb) with GMCB as RR, reference Logan and the simplified reference tissue model were used to derive distribution volume ratios (DVRs), and standardized uptake value (SUV) ratios were calculated for the 40–60 min and 60–90 min intervals. Parameter variability was evaluated using TRT scans, and correlation and agreement with the gold standard (DVR from 2T4k_Vb with GMCB RR) were also assessed. Next, longitudinal changes in SUVs (both intervals) were assessed for each RR. Finally, the ability to discriminate between visually Aβ positive and Aβ negative scans was assessed. Results: All RRs yielded stable TRT performance (max 5.1% variability), with WCB consistently showing lower variability. All approaches were able to discriminate between Aβ positive and Aβ negative scans, with the highest effect sizes obtained for GMCB (range −0.9 to −0.7), followed by WCB (range −0.8 to −0.6).
Furthermore, all approaches provided good correlations with the gold standard (r ≥ 0.78), while the highest bias (as assessed by the regression slope) was observed using WMES (slope range 0.52–0.67), followed by WBS (slope range 0.58–0.92) and WMBS (slope range 0.62–0.91). Finally, RR SUVs were stable across a period of 2.6 years for all RRs except WBS and WMBS (60–90 min interval). Conclusions: GMCB and WCB are considered the best RRs for quantifying amyloid burden using [11C]PiB PET.
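The three summary quantities named above (SUV ratio over a fixed interval, TRT variability, and regression slope against the gold standard) can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names, the frame-duration weighting, and the TRT formula (absolute test–retest difference relative to the mean of the two scans) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def interval_mean(tac, frame_start, frame_end, t0, t1):
    """Frame-duration-weighted mean of a time-activity curve over the
    frames lying fully inside [t0, t1] (minutes). Weighting is an assumption."""
    tac = np.asarray(tac, dtype=float)
    frame_start = np.asarray(frame_start, dtype=float)
    frame_end = np.asarray(frame_end, dtype=float)
    inside = (frame_start >= t0) & (frame_end <= t1)
    if not inside.any():
        raise ValueError("no frames fall inside the requested interval")
    durations = frame_end[inside] - frame_start[inside]
    return float(np.sum(tac[inside] * durations) / np.sum(durations))


def suvr(target_tac, reference_tac, frame_start, frame_end, t0, t1):
    """SUV ratio of target to reference region over [t0, t1].
    Dose and body-weight normalisation cancel in the ratio, so raw
    activity concentrations can be used directly."""
    target = interval_mean(target_tac, frame_start, frame_end, t0, t1)
    reference = interval_mean(reference_tac, frame_start, frame_end, t0, t1)
    return target / reference


def trt_variability(test, retest):
    """Test-retest variability in percent, assuming the common definition
    |retest - test| / mean(test, retest) * 100."""
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    return 100.0 * np.abs(retest - test) / ((test + retest) / 2.0)


def slope_vs_gold(gold, estimate):
    """Least-squares slope of estimate regressed on the gold standard;
    a slope below 1 indicates underestimation relative to the gold standard."""
    slope, _intercept = np.polyfit(np.asarray(gold, dtype=float),
                                   np.asarray(estimate, dtype=float), 1)
    return float(slope)


# Hypothetical late frames (minutes) and activity values, for illustration only.
frame_start = [40, 50, 60, 75]
frame_end = [50, 60, 75, 90]
target_tac = [2.0, 2.0, 3.0, 3.0]
reference_tac = [1.0, 1.0, 2.0, 2.0]

suvr_40_60 = suvr(target_tac, reference_tac, frame_start, frame_end, 40, 60)
suvr_60_90 = suvr(target_tac, reference_tac, frame_start, frame_end, 60, 90)
```

With the reference TAC swapped between GMCB, WCB, WMBS, WBS and WMES, the same functions would give one SUV ratio per RR and interval, which could then be compared across test and retest scans or regressed against the plasma input DVR.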