TY - JOUR
T1 - Posterior concentration for Bayesian regression trees and forests
AU - Ročková, Veronika
AU - van der Pas, Stéphanie
N1 - Funding Information:
The first author would like to gratefully acknowledge the support from the James S. Kemper Foundation Faculty Research Fund at the University of Chicago Booth School of Business.
Publisher Copyright:
© Institute of Mathematical Statistics, 2020.
PY - 2020/8
Y1 - 2020/8
AB - Since their inception in the 1980s, regression trees have been one of the more widely used nonparametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to overfitting. Strategies against overfitting have traditionally relied on pruning greedily grown trees. The Bayesian framework offers an alternative remedy against overfitting through priors. Roughly speaking, a good prior charges smaller trees where overfitting does not occur. While the consistency of random histograms, trees and their ensembles has been studied quite extensively, the theoretical understanding of their Bayesian counterparts has been missing. In this paper, we take a step toward understanding why/when Bayesian trees and forests do not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that regression trees (and forests) (a) are capable of recovering smooth regression surfaces (with smoothness not exceeding one), achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness, and (c) can perform effective dimension reduction when p > n. These results provide a piece of missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice.
KW - Additive regression
KW - Asymptotic minimaxity
KW - BART
KW - Bayesian CART
KW - Posterior concentration
KW - Recursive partitioning
KW - Regression trees
UR - http://www.scopus.com/inward/record.url?scp=85090441746&partnerID=8YFLogxK
U2 - 10.1214/19-AOS1879
DO - 10.1214/19-AOS1879
M3 - Article
AN - SCOPUS:85090441746
SN - 0090-5364
VL - 48
SP - 2108
EP - 2131
JO - Annals of Statistics
JF - Annals of Statistics
IS - 4
ER -