Recent and Selected Publications

The Sketched Wasserstein Distance for mixture distributions (2022), Xin Bing, Florentina Bunea and Jon Niles-Weed. New title: Estimation and Inference for the Wasserstein Distance between mixing measures in topic models (2023). Submitted. https://arxiv.org/abs/2206.12768

Asymptotic confidence sets for random linear programs (2023),  Shuyu Liu, Florentina Bunea, and Jon Niles-Weed; Submitted; https://arxiv.org/pdf/2302.12364.pdf

Interpolating predictors in high-dimensional factor regression  (2022), Florentina Bunea, Seth Strimas-Mackey and Marten Wegkamp.  Journal of Machine Learning Research, Vol 23. [ArXiv].

Likelihood estimation of sparse topic distributions in topic models and its applications to Wasserstein document distance calculations (2022), Xin Bing, Florentina Bunea, Seth Strimas-Mackey and Marten Wegkamp. Forthcoming in the Annals of Statistics. [ArXiv]

Detecting approximate replicate components of a high-dimensional random vector with latent structure (2023), Xin Bing, Florentina Bunea and Marten Wegkamp. Bernoulli, Vol. 29, pages 1368-1392. [ArXiv]

Inference  in  latent factor regression with clusterable features (2022), Xin Bing, Florentina Bunea and Marten Wegkamp.  Bernoulli, Vol 28. [ArXiv].

Essential Regression – a generalizable framework for inferring causal latent factors from multi-omic human datasets (2022), Xin Bing,  Tyler Lovelace, Florentina Bunea, Marten Wegkamp,  Harinder Singh, Panayiotis V Benos, Jishnu Das. Forthcoming in Patterns-Cell Press.

Prediction in latent factor regression: Adaptive PCR and beyond (2021), Xin Bing, Florentina Bunea, Seth Strimas-Mackey and Marten Wegkamp. Journal of Machine Learning Research  [ArXiv].

Optimal estimation of sparse topic models (2020),  Xin Bing, Florentina Bunea, Marten Wegkamp. Journal of Machine Learning Research, Vol. 21, 1-45. [ArXiv].

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics (2020), Xin Bing, Florentina Bunea and Marten Wegkamp. Bernoulli, Vol. 26 (3), 1765-1796. [ArXiv] (Python code is coming soon. For the beta version of the code, please contact xb43@cornell.edu)

Adaptive Estimation in Structured Factor Models with Applications to Overlapping Clustering (2020), Xin Bing, Florentina Bunea, Yang Ning and Marten Wegkamp. The Annals of Statistics, Vol. 48 (4), 2055-2081. [ArXiv] (An R package is coming soon. For the beta version of the code, please contact xb43@cornell.edu)

High-Dimensional Inference for Cluster-Based Graphical Models (2020), C. Eisenach, F. Bunea, Y. Ning and C. Dinicu, Journal of Machine Learning Research, Vol. 21, 1-55. [ArXiv].

Model-assisted variable clustering: minimax-optimal recovery and algorithms  (2020), Florentina Bunea,  Christophe Giraud, Xi Luo, Martin Royer and Nicolas Verzelen, The  Annals of Statistics, Vol. 48 (1), 111-137. [ArXiv].

Essential Regression (2019),  Xin Bing, Florentina Bunea, Marten Wegkamp and Seth Strimas-Mackey. [ArXiv].

Latent model-based clustering for biological discovery (2019), Xin Bing, Florentina Bunea, Martin Royer, Jishnu Das. iScience, ISSN 2589-0042. [PDF].

PECOK: a convex optimization approach to variable clustering (2017), Florentina Bunea, Christophe Giraud, Martin Royer, and Nicolas Verzelen. [Arxiv].

Minimax Optimal Variable Clustering in G-models via Cord (2016), Florentina Bunea, Christophe Giraud and Xi Luo. [Arxiv].

Convex banding of the covariance matrix (2016), J. Bien, F. Bunea and L. Xiao, Journal of the American Statistical Association, Vol. 111, 834-845. [ArXiv]

On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA (2015), F. Bunea and L. Xiao,  Bernoulli, Vol. 21, 1200-1230. [ArXiv]

The square root group lasso: theoretical properties and fast algorithms (2014), F. Bunea, J. Lederer and Y. She, IEEE-Information Theory, Vol. 60, 1313-1325, [ArXiv]; For Matlab Code, see http://stat.fsu.edu/~yshe/code/g-sqrtlasso.zip

Joint variable and rank selection for parsimonious estimation of high dimensional matrices (2012), F. Bunea, Y. She and M. Wegkamp, The Annals of Statistics, Vol. 40, 2359-2388, [ArXiv]

Optimal selection of reduced rank estimators of high-dimensional matrices (2011), F. Bunea, Y. She and M. Wegkamp, The Annals of Statistics, Vol. 39, 1282 – 1309, [ArXiv]; For Matlab Code, see http://stat.fsu.edu/~yshe/code/rsc.zip

Spades and Mixture Models (2010), F. Bunea, M. Wegkamp, A. Tsybakov and A. Barbu, The Annals of Statistics, Vol. 38, No. 4, 2525 – 2558, [ArXiv]

Honest variable selection in linear and logistic regression models via l1 and l1 + l2 penalization (2008), F. Bunea, The Electronic Journal of Statistics, Vol. 2, 1153-1194. [ArXiv]

Aggregation for Gaussian Regression (2007),  F. Bunea, M. Wegkamp and A. Tsybakov,  The Annals of Statistics,  35 (4), 1674 – 1697. [ArXiv]

Sparsity oracle inequalities for the lasso (2007),  F. Bunea, A. Tsybakov and M. Wegkamp, The Electronic Journal of Statistics, 169 – 194. [ArXiv]

Consistent Covariate Selection and Post Model Selection Inference in Semiparametric Regression (2004),  F. Bunea, The Annals of Statistics, Vol. 32, No. 3, 898-927. [ArXiv]