Introduction: Manual quality assurance (QA) of radiotherapy contours for clinical trials is time and labor intensive and subject to inter-observer variability. Therefore, we investigated whether deep-learning (DL) can provide an automated solution to salivary gland contour QA. Material and methods: DL-models were trained to generate contours for parotid (PG) and submandibular glands (SMG). Sørensen–Dice coefficient (SDC) and Hausdorff distance (HD) were used to assess agreement between DL and clinical contours and thresholds were defined to highlight cases as potentially sub-optimal. 3 types of deliberate errors (expansion, contraction and displacement) were gradually applied to a test set, to confirm that SDC and HD were suitable QA metrics. DL-based QA was performed on 62 patients from the EORTC-1219-DAHANCA-29 trial. All highlighted contours were visually inspected. Results: Increasing the magnitude of all 3 types of errors resulted in progressively severe deterioration/increase in average SDC/HD. 19/124 clinical PG contours were highlighted as potentially sub-optimal, of which 5 (26%) were actually deemed clinically sub-optimal. 2/19 non-highlighted contours were false negatives (11%). 15/69 clinical SMG contours were highlighted, with 7 (47%) deemed clinically sub-optimal and 2/15 non-highlighted contours were false negatives (13%). For most incorrectly highlighted contours causes for low agreement could be identified. Conclusion: Automated DL-based contour QA is feasible but some visual inspection remains essential. The substantial number of false positives were caused by sub-optimal performance of the DL-model. Improvements to the model will increase the extent of automation and reliability, facilitating the adoption of DL-based contour QA in clinical trials and routine practice.