Richard Lawrence, Claudia Perlich, et al.
IBM Systems Journal
Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. At each iteration, observations are re-weighted using the gradient of the underlying loss function. We present a weight-decay approach for observation weights that is equivalent to "robustifying" the underlying loss function. At the extreme end of decay, this approach converges to bagging, which can be viewed as boosting with a linear underlying loss function. We illustrate the practical usefulness of weight decay for improving prediction performance and present an equivalence between one form of weight decay and "Huberizing", a statistical method for making loss functions more robust.
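To make the reweighting idea concrete, below is a minimal sketch of exponential-loss boosting with decision stumps in which the observation weights are decayed toward uniform after each update. The power-law decay form, the function names, and the scikit-learn stump weak learner are illustrative assumptions, not the paper's implementation: decay=1.0 recovers the standard boosting-style update, while decay near 0 flattens the weights toward uniform, the bagging limit described in the abstract.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosted_fit(X, y, n_rounds=50, decay=1.0):
    """Boosting with decayed observation weights (illustrative sketch).

    y must take values in {-1, +1}. decay=1.0 gives standard
    AdaBoost-style reweighting; decay -> 0 pushes the weights
    toward uniform, so the ensemble degenerates toward bagging.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error of the current weak learner, clipped for stability.
        err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # Standard exponential-loss update on the observation weights...
        w = w * np.exp(-alpha * y * pred)
        # ...followed by the decay step (assumed power-law form): raising
        # weights to a power in (0, 1] compresses them toward uniform.
        w = w ** decay
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def boosted_predict(stumps, alphas, X):
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)

With decay=0.0 every round trains on exactly uniform weights, so the procedure amounts to repeatedly fitting stumps to the unweighted data, which is the bagging-like extreme the abstract refers to.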
Robert Tibshirani, Michael Saunders, et al.
Journal of the Royal Statistical Society, Series B (Statistical Methodology)
Aurélie C. Lozano, Naoki Abe, et al.
KDD 2009
Claudia Perlich, Saharon Rosset, et al.
KDD 2007