Metabolic tumor volume (MTV) is a promising biomarker of pretreatment risk in diffuse large B-cell lymphoma (DLBCL). Different segmentation methods can be used that predict prognosis equally well but give different optimal cutoffs for risk stratification. Segmentation can be cumbersome; a fast, easy, and robust method is needed. Our aims were to evaluate the best automated MTV workflow in DLBCL; determine whether uptake time, compliance or noncompliance with standardized recommendations for 18F-FDG scanning, and subsequent disease progression influence the success of segmentation; and assess differences in MTVs and discriminatory power of segmentation methods. Methods: One hundred forty baseline 18F-FDG PET/CT scans were selected from U.K. and Dutch studies on DLBCL to provide a balance between scans at 60 and 90 min of uptake, parameters compliant and noncompliant with standardized recommendations for scanning, and patients with and without progression. An automated tool was applied for segmentation using an SUV of 2.5 (SUV2.5), an SUV of 4.0 (SUV4.0), adaptive thresholding (A50P), 41% of SUVmax (41%), a majority vote including voxels detected by at least 2 methods (MV2), and a majority vote including voxels detected by at least 3 methods (MV3). Two independent observers rated the success of the tool in delineating MTV. Scans that required minimal interaction were rated as a success; scans that missed more than 50% of the tumor or required more than 2 editing steps were rated as a failure. Results: One hundred thirty-eight scans were evaluable, with significant differences in success and failure ratings among methods. The best-performing method was SUV4.0, with higher success and lower failure rates than any other method except MV2, which also performed well. SUV4.0 gave a good approximation of MTV in 105 (76%) scans, and simple editing gave a satisfactory result in an additional 20% of cases.
MTV was significantly different for all methods between patients with and without progression. The 41% segmentation method performed slightly worse with longer uptake times; otherwise, scanning conditions and patient outcome did not influence the tool's performance. The discriminative power was similar among methods, but MTVs were significantly greater using SUV4.0 and MV2 than using other thresholds, except for SUV2.5. Conclusion: SUV4.0 and MV2 are recommended for further evaluation. Automated estimation of MTV is feasible.
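The threshold-based and majority-vote segmentations named in the Methods can be illustrated with a minimal sketch. This is not the automated tool used in the study. It assumes a flat list of voxel SUVs for a single lesion, an inclusive (greater than or equal) comparison, and a hypothetical voxel volume of 0.064 mL; all function names are illustrative. The adaptive threshold (A50P) is omitted because it needs a background estimate, so the vote here runs over three masks only.

```python
from typing import List, Sequence

Mask = List[bool]


def fixed_threshold(suv: Sequence[float], cutoff: float) -> Mask:
    """Mark voxels whose SUV is at or above a fixed cutoff (e.g. 2.5 or 4.0)."""
    return [value >= cutoff for value in suv]


def percent_of_max(suv: Sequence[float], fraction: float = 0.41) -> Mask:
    """Mark voxels at or above a fraction of the maximum SUV in the input."""
    peak = max(suv)
    return [value >= fraction * peak for value in suv]


def majority_vote(masks: Sequence[Mask], min_votes: int) -> Mask:
    """Mark voxels selected by at least `min_votes` of the input masks."""
    return [sum(votes) >= min_votes for votes in zip(*masks)]


def volume_ml(mask: Mask, voxel_volume_ml: float) -> float:
    """Volume of the selected voxels: voxel count times volume per voxel."""
    return sum(mask) * voxel_volume_ml


# Toy SUVs for one lesion (made-up values, not study data).
suv = [0.8, 2.6, 4.2, 5.0, 9.0, 20.0]
voxel_ml = 0.064  # assumed 4 x 4 x 4 mm voxels

masks = {
    "SUV2.5": fixed_threshold(suv, 2.5),
    "SUV4.0": fixed_threshold(suv, 4.0),
    "41%": percent_of_max(suv, 0.41),
}
# Votes are taken over the three masks above (A50P is left out of this sketch).
masks["MV2"] = majority_vote(list(masks.values())[:3], min_votes=2)
masks["MV3"] = majority_vote(list(masks.values())[:3], min_votes=3)

for name, mask in masks.items():
    print(f"{name}: {sum(mask)} voxels, {volume_ml(mask, voxel_ml):.3f} mL")
```

On this toy input the lower fixed threshold selects the most voxels and the 41% rule the fewest, which shows how the choice of method can change the estimated volume for the same lesion.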