Objective: To compare the performance of different methods for determining hippocampal atrophy rates from longitudinal MRI scans in aging and Alzheimer's disease (AD). Background: Quantifying hippocampal atrophy caused by neurodegenerative diseases is important for following the course of the disease. In dementia, the efficacy of new therapies can be partially assessed by measuring their effect on hippocampal atrophy. In radiotherapy, radiation-induced hippocampal volume loss is of interest as a measure of radiation damage. We evaluated the plausibility, reproducibility and sensitivity of eight commonly used methods for determining hippocampal atrophy rates using test-retest scans. Materials and methods: Manual measurement, FSL-FIRST, FreeSurfer, multi-atlas segmentation (MALF) and four non-linear registration methods (Elastix, NiftyReg, ANTs and MIRTK) were used to determine hippocampal atrophy rates on longitudinal T1-weighted MRI from the ADNI database. Appropriate parameters for the non-linear registration methods were determined using a small training dataset (N = 16) in which two-year hippocampal atrophy was measured using test-retest scans of 8 subjects with low and 8 subjects with high atrophy rates. On a larger dataset of 20 controls (CTRL), 40 mild cognitive impairment (MCI) patients and 20 AD patients, one-year hippocampal atrophy rates were measured. A repeated-measures ANOVA was performed to determine differences between controls, MCI and AD patients. For each method we calculated effect sizes and the sample sizes required to detect a difference in one-year volume change between controls and MCI (NCTRL_MCI) and between controls and AD (NCTRL_AD). Finally, reproducibility of hippocampal atrophy rates was assessed using within-session rescans and expressed as an average distance measure, DAve, defined as the difference in atrophy rate averaged over all subjects. The same DAve was used to determine the agreement between different methods. 
Results: Except for MALF, all methods detected a significant group difference between CTRL and AD, but none detected a significant difference between CTRL and MCI. FreeSurfer and MIRTK required the smallest sample sizes (FreeSurfer: NCTRL_MCI = 115, NCTRL_AD = 17 with DAve = 3.26%; MIRTK: NCTRL_MCI = 97, NCTRL_AD = 11 with DAve = 3.76%), while ANTs was the most reproducible (NCTRL_MCI = 162, NCTRL_AD = 37 with DAve = 1.06%), followed by Elastix (NCTRL_MCI = 226, NCTRL_AD = 15 with DAve = 1.78%) and NiftyReg (NCTRL_MCI = 193, NCTRL_AD = 14 with DAve = 2.11%). Manually measured hippocampal atrophy rates required the largest sample sizes to detect volume change and were poorly reproduced (NCTRL_MCI = 452, NCTRL_AD = 87 with DAve = 12.39%). Atrophy rates from the non-linear registration methods also agreed best with each other. Discussion and conclusion: Non-linear registration methods were the most consistent in determining hippocampal atrophy. Because of their better reproducibility, methods such as ANTs, Elastix and NiftyReg are preferred for determining hippocampal atrophy rates on longitudinal MRI. Since the performance of the non-linear registration methods is comparable, the choice among them would depend mostly on computational efficiency.
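
Illustrative sketch: the abstract reports effect sizes, required sample sizes and the DAve reproducibility measure without stating the formulas behind them. The Python sketch below shows one plausible way such quantities could be computed. Everything in it is an assumption made for illustration, not the authors' procedure: the effect size is taken to be Cohen's d with a pooled standard deviation, the sample size uses the normal approximation for a two-sided two-sample comparison with assumed alpha = 0.05 and power = 0.80, and DAve is taken to be the mean absolute difference between scan and rescan atrophy rates. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed effect-size definition)."""
    na, nb = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided two-sample test (normal approximation).

    alpha and power are assumed values; the abstract does not state them.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / abs(d)) ** 2)


def d_ave(rates_scan, rates_rescan):
    """Mean absolute difference in atrophy rate between paired measurements.

    One possible reading of DAve: inputs are per-subject atrophy rates (in %)
    from the scan and the within-session rescan, in the same subject order.
    """
    return mean(abs(a - b) for a, b in zip(rates_scan, rates_rescan))
```

Under these assumptions, a larger effect size lowers the required sample size and a smaller `d_ave` indicates better reproducibility. Passing the atrophy rates of two different methods to `d_ave` instead of scan and rescan rates would give the corresponding between-method agreement.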