no code implementations • 26 Jun 2022 • Xin Bing, Florentina Bunea, Jonathan Niles-Weed
Our results establish this metric to be a canonical choice.
no code implementations • 12 Jul 2021 • Xin Bing, Florentina Bunea, Seth Strimas-Mackey, Marten Wegkamp
When $A$ is unknown, we estimate $T$ by optimizing the likelihood function corresponding to a generic plug-in estimator $\hat{A}$ of $A$.
no code implementations • 20 Jul 2020 • Xin Bing, Florentina Bunea, Seth Strimas-Mackey, Marten Wegkamp
Our primary contribution is in establishing finite sample risk bounds for prediction with the ubiquitous Principal Component Regression (PCR) method, under the factor regression model, with the number of principal components adaptively selected from the data -- a form of theoretical guarantee that is surprisingly lacking from the PCR literature.
no code implementations • 6 Feb 2020 • Florentina Bunea, Seth Strimas-Mackey, Marten Wegkamp
If the effective rank of the covariance matrix $\Sigma$ of the $p$ regression features is much larger than the sample size $n$, we show that the min-norm interpolating predictor is not desirable, as its risk approaches the risk of trivially predicting the response by 0.
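To make the object in this abstract concrete, the following is a minimal sketch of a min-norm interpolating predictor, assuming the usual definition as the minimum $\ell_2$-norm solution of $X\beta = y$ when $p > n$. The dimensions, the isotropic Gaussian design, the sparse `beta_true`, and the noise level are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Return the minimum l2-norm solution of X @ beta = y (via the pseudoinverse)."""
    return np.linalg.pinv(X) @ y


# Hypothetical setup: n samples, p >> n features, identity covariance.
rng = np.random.default_rng(0)
n, p = 20, 200
X = rng.standard_normal((n, p))

# Assumed ground truth and noise level, chosen only for illustration.
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = min_norm_interpolator(X, y)

# With p > n and X of full row rank, the fit reproduces the training responses.
train_residual = np.linalg.norm(X @ beta_hat - y)
print("training residual:", train_residual)
print("norm of beta_hat:", np.linalg.norm(beta_hat))
```

In this isotropic setup the effective rank of $\Sigma$ (taken here as trace over operator norm, which equals $p$ for the identity) is much larger than $n$, so it falls in the regime the abstract describes; the sketch only demonstrates interpolation and does not reproduce the paper's risk analysis.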
no code implementations • 22 Jan 2020 • Xin Bing, Florentina Bunea, Marten Wegkamp
We derive a finite sample upper bound for our estimator, and show that it matches the minimax lower bound in many scenarios.
no code implementations • 13 Jun 2018 • Carson Eisenach, Florentina Bunea, Yang Ning, Claudiu Dinicu
We employ model-assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable.
1 code implementation • 17 May 2018 • Xin Bing, Florentina Bunea, Marten Wegkamp
We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms and that estimates the number of topics $K$ from the observed data.
no code implementations • 23 Apr 2017 • Xin Bing, Florentina Bunea, Yang Ning, Marten Wegkamp
This work introduces a novel estimation method, called LOVE, of the entries and structure of a loading matrix $A$ in a sparse latent factor model $X = AZ + E$, for an observable random vector $X \in \mathbb{R}^p$, with correlated unobservable factors $Z \in \mathbb{R}^K$, with $K$ unknown, and independent noise $E$. Each row of $A$ is scaled and sparse.
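As an illustration of the model in this abstract, the sketch below simulates data from $X = AZ + E$ and compares the sample covariance with $A\,\mathrm{Cov}(Z)\,A^\top + \mathrm{Cov}(E)$, which follows if $E$ is independent of $Z$ (an assumption made here). The loading matrix `A`, the factor covariance `cov_Z`, the noise level `sigma`, and the Gaussian distributions are hypothetical choices for illustration; this simulates the model only and is not the LOVE estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 5000, 6, 2

# Hypothetical sparse loading matrix: each row loads on at most two factors.
A = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.5, 0.5],
    [0.0, 0.0],
])

# Correlated latent factors (assumed Gaussian for this sketch).
cov_Z = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
Z = rng.multivariate_normal(np.zeros(K), cov_Z, size=n)  # shape (n, K)

# Noise with independent coordinates, drawn independently of Z.
sigma = 0.5
E = sigma * rng.standard_normal((n, p))                  # shape (n, p)

# Each row of X is one draw of the observable vector X = A Z + E.
X = Z @ A.T + E

# Population covariance implied by the model under the assumptions above.
Sigma_model = A @ cov_Z @ A.T + sigma**2 * np.eye(p)
Sigma_sample = np.cov(X, rowvar=False)
max_abs_dev = np.max(np.abs(Sigma_sample - Sigma_model))
print("max |sample - model| covariance entry:", max_abs_dev)
```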
1 code implementation • 16 Jun 2016 • Florentina Bunea, Christophe Giraud, Martin Royer, Nicolas Verzelen
The problem of variable clustering is that of grouping similar components of a $p$-dimensional vector $X=(X_{1},\ldots, X_{p})$, and estimating these groups from $n$ independent copies of $X$.
Statistics Theory
1 code implementation • 8 Aug 2015 • Florentina Bunea, Christophe Giraud, Xi Luo, Martin Royer, Nicolas Verzelen
We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics.
no code implementations • 23 May 2014 • Jacob Bien, Florentina Bunea, Luo Xiao
Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results.