Estimating a sharp convergence bound for randomized ensembles

4 Mar 2013  ·  Miles E. Lopes

When randomized ensembles such as bagging or random forests are used for binary classification, the prediction error of the ensemble tends to decrease and stabilize as the number of classifiers increases. However, the precise relationship between prediction error and ensemble size is unknown in practice. In the standard case where classifiers are aggregated by majority vote, the present work offers a way to quantify this convergence in terms of "algorithmic variance," i.e., the variance of prediction error due only to the randomized training algorithm. Specifically, we study a theoretical upper bound on this variance and show that it is sharp, in the sense that it is attained by a specific family of randomized classifiers. Next, we address the problem of estimating the unknown value of the bound, which leads to a unique twist on the classical problem of non-parametric density estimation. In particular, we develop an estimator for the bound and show that its MSE matches optimal non-parametric rates under certain conditions. (Concurrent with this work, some closely related results have also been considered in Cannings and Samworth (2017) and Lopes (2019).)
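The notion of "algorithmic variance" above can be illustrated with a toy simulation (this is an illustrative sketch, not the paper's model or estimator): each of t randomized classifiers votes correctly on a test point independently with some probability, the ensemble predicts by majority vote, and we measure how the test error fluctuates across independent re-draws of the randomization alone. The parameters `p_correct`, `n_points`, and `runs` below are all hypothetical choices for the demonstration.

```python
import random
import statistics

def majority_vote_error(n_points=200, p_correct=0.7, t=1, rng=None):
    """Test error of a majority vote over t randomized classifiers.

    Toy model: each classifier is correct on each test point
    independently with probability p_correct. Ties count as errors.
    """
    rng = rng or random.Random()
    errors = 0
    for _ in range(n_points):
        votes = sum(rng.random() < p_correct for _ in range(t))
        if votes <= t / 2:  # majority is wrong (or tied)
            errors += 1
    return errors / n_points

def algorithmic_variance(t, runs=300, seed=0):
    """Variance of the test error over re-draws of the randomized
    training algorithm only (here, the classifiers' coin flips)."""
    rng = random.Random(seed)
    errs = [majority_vote_error(t=t, rng=rng) for _ in range(runs)]
    return statistics.pvariance(errs)

for t in (1, 5, 25, 101):
    print(f"t={t:4d}  algorithmic variance = {algorithmic_variance(t):.6f}")
```

Running this shows the variance shrinking as t grows, which is the stabilization phenomenon the paper's bound quantifies; the paper's contribution is a sharp bound on this variance and an estimator for that bound.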
