Anupam Gupta, Moritz Hardt, et al.
SIAM Journal on Computing
We consider the problem of identifying the parameters of an unknown mixture of two arbitrary $d$-dimensional Gaussians from a sequence of independent random samples. Our main results are upper and lower bounds giving a computationally efficient moment-based estimator with an optimal convergence rate, thus resolving a problem introduced by Pearson (1894). Denoting by $\sigma^2$ the variance of the unknown mixture, we prove that $\Theta(\sigma^{12})$ samples are necessary and sufficient to estimate each parameter up to constant additive error when $d = 1$. Our upper bound extends to arbitrary dimension $d > 1$, up to a (provably necessary) logarithmic loss in $d$, using a novel yet simple dimensionality reduction technique. We further identify several interesting special cases where the sample complexity is notably smaller than our optimal worst-case bound. For instance, if the means of the two components are separated by $\Omega(\sigma)$, the sample complexity reduces to $O(\sigma^2)$, and this is again optimal. Our results also apply to learning each component of the mixture up to small error in total variation distance, where our algorithm gives strong improvements in sample complexity over previous work.
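A moment-based estimator in the tradition of Pearson matches the low-order empirical moments of the samples against the moments of a two-component mixture. The sketch below is a hedged illustration of those quantities only, not the authors' estimator. It assumes a one-dimensional mixture with weight `w` on $N(\mu_1, \sigma_1^2)$ and `1 - w` on $N(\mu_2, \sigma_2^2)$. It computes exact raw moments $E[X^k]$ through the Gaussian recurrence $m_k = \mu\, m_{k-1} + (k-1)\sigma^2 m_{k-2}$, then compares them with empirical moments from simulated samples. All function names are hypothetical.

```python
import random


def gaussian_raw_moments(mu, sigma, k_max):
    """Raw moments E[X^k] for k = 0..k_max of N(mu, sigma^2), using the
    recurrence m_k = mu * m_{k-1} + (k - 1) * sigma^2 * m_{k-2}."""
    m = [1.0, float(mu)]
    for k in range(2, k_max + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma ** 2 * m[k - 2])
    return m[: k_max + 1]


def mixture_raw_moments(w, mu1, sigma1, mu2, sigma2, k_max):
    """Raw moments of the mixture w*N(mu1, sigma1^2) + (1-w)*N(mu2, sigma2^2):
    each mixture moment is the weighted average of the component moments."""
    a = gaussian_raw_moments(mu1, sigma1, k_max)
    b = gaussian_raw_moments(mu2, sigma2, k_max)
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def sample_mixture(w, mu1, sigma1, mu2, sigma2, n, rng):
    """Draw n independent samples: pick component 1 with probability w."""
    return [
        rng.gauss(mu1, sigma1) if rng.random() < w else rng.gauss(mu2, sigma2)
        for _ in range(n)
    ]


def empirical_raw_moments(samples, k_max):
    """Sample averages of x^k for k = 0..k_max."""
    n = len(samples)
    return [sum(x ** k for x in samples) / n for k in range(k_max + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = (0.5, -1.0, 1.0, 1.0, 1.0)  # illustrative mixture, means at -1 and +1
    exact = mixture_raw_moments(*params, k_max=6)
    emp = empirical_raw_moments(sample_mixture(*params, n=100_000, rng=rng), 6)
    for k in range(7):
        print(k, exact[k], round(emp[k], 3))
    # Mixture variance sigma^2 = E[X^2] - E[X]^2, as in the abstract's notation.
    print("variance:", exact[2] - exact[1] ** 2)
```

For the symmetric example, the exact moments are $1, 0, 2, 0, 10, 0, 76$, so the mixture variance is $2$. The empirical high-order moments fluctuate far more than the low-order ones at the same sample size. This fits the abstract's point that the sample complexity grows with $\sigma$.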
Eric Price, David P. Woodruff
SODA 2013
Eric Price, David P. Woodruff
FOCS 2011
Vitaly Feldman, Will Perkins, et al.
STOC 2015