Publication
ICML 2019
Conference paper
Trimming the ℓ1 regularizer: Statistical analysis, optimization, and applications to deep learning
Abstract
We study high-dimensional estimators with the trimmed ℓ1 penalty, which leaves the h largest parameter entries penalty-free. While optimization techniques for this nonconvex penalty have been studied, its statistical properties have not yet been analyzed. We present the first statistical analyses for M-estimation, and characterize the support recovery and ℓ2 error of the trimmed ℓ1 estimates as a function of the trimming parameter h. Our results show different regimes depending on how h compares to the true support size. Our second contribution is a new algorithm for the trimmed regularization problem, which has the same theoretical convergence rate as difference of convex (DC) algorithms, but in practice is faster and finds lower objective values. Empirical evaluation of ℓ1 trimming for sparse linear regression and graphical model estimation indicates that trimmed ℓ1 can outperform vanilla ℓ1 and nonconvex alternatives. Our last contribution is to show that the trimmed penalty is beneficial beyond M-estimation, and yields promising results for two deep learning tasks: input structure recovery and network sparsification.
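To make the penalty concrete, here is a minimal NumPy sketch (our illustration, not the authors' code) of the trimmed ℓ1 penalty as the abstract defines it: the sum of absolute values of all but the h largest-magnitude entries. The function names `trimmed_l1` and `objective`, and the use in a least-squares objective, are illustrative assumptions.

```python
import numpy as np

def trimmed_l1(theta, h):
    """Trimmed l1 penalty: sum |theta_i| over all but the h largest-magnitude entries.

    Illustrative sketch based on the definition in the abstract; not the authors' code.
    """
    mags = np.sort(np.abs(theta))            # magnitudes in ascending order
    return mags[:max(len(mags) - h, 0)].sum()  # drop the h largest; penalize the rest

def objective(theta, X, y, lam, h):
    """Hypothetical sparse linear regression objective with a trimmed-l1 penalty."""
    n = len(y)
    return 0.5 / n * np.sum((X @ theta - y) ** 2) + lam * trimmed_l1(theta, h)
```

Note that setting h = 0 recovers the vanilla ℓ1 penalty, while choosing h at least as large as the true support size leaves every true signal entry penalty-free, which is the distinction underlying the regimes mentioned above.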