Distributed Optimization and Statistics: An Unconsummated Marriage

Interdisciplinary Areas: Internet of Things and Cyber Physical Systems, Data/Information/Computation

Project Description

In the era of big data, learning tasks are high-dimensional and highly nonconvex; examples include deep learning, recommendation systems, and autonomous navigation. These large-scale nonconvex models and their solution methods are poorly understood. The broad objective of this project is to develop a theory of the fundamental tradeoffs that govern statistical learning in a large-scale, nonconvex, networked setting. Current optimization algorithms provably fail to find (even near-)global optima efficiently, terminating instead at stationary points (e.g., local minima) whose statistical properties are not well characterized. Moreover, the sheer volume and spatial/temporal diversity of data often render centralized processing and storage either infeasible or highly inefficient. These interacting constraints on the learning environment lead to a fundamental, and still poorly understood, tradeoff between distributed computation and storage on the one hand and information-theoretic accuracy on the other. This project aims at shedding light on these tradeoffs, which will feed into algorithms that determine the optimal allocation of computational and communication resources, such as computing time and network topology, to obtain provably (near-)optimal distributed, nonconvex, statistical learning/estimation algorithms. A minimal sketch of the distributed, nonconvex setting appears below.
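As a rough illustration only, the following Python sketch runs plain decentralized gradient descent on a ring network of agents, each holding local data for a shared nonconvex fitting problem. The loss, network topology, and step size are illustrative assumptions and are not the algorithms developed in this project (see References 1-3 for the actual algorithmic framework).

```python
# Minimal sketch (illustrative, not the project's method): decentralized
# gradient descent (DGD) on a ring network. Each agent mixes its iterate with
# its neighbors' iterates (communication step) and then takes a local gradient
# step on its own data (computation step).
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, m_local = 8, 5, 20

# Local data: agent i holds (A_i, b_i); the global loss is
# sum_i || sigmoid(A_i x) - b_i ||^2, a smooth nonconvex function of x.
A = [rng.standard_normal((m_local, dim)) for _ in range(n_agents)]
b = [rng.uniform(size=m_local) for _ in range(n_agents)]

def local_grad(i, x):
    """Gradient of agent i's local loss at x."""
    s = 1.0 / (1.0 + np.exp(-(A[i] @ x)))          # sigmoid(A_i x)
    return A[i].T @ (2.0 * (s - b[i]) * s * (1.0 - s))

# Doubly stochastic mixing matrix for a ring: each agent averages its own
# iterate with those of its two neighbors.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = W[i, (i - 1) % n_agents] = W[i, (i + 1) % n_agents] = 1.0 / 3.0

X = rng.standard_normal((n_agents, dim))           # one row per agent
step = 0.05                                        # illustrative step size
for t in range(500):
    X = W @ X                                      # consensus / communication
    X -= step * np.array([local_grad(i, X[i]) for i in range(n_agents)])

disagreement = np.linalg.norm(X - X.mean(axis=0))
print(f"disagreement across agents after 500 rounds: {disagreement:.3e}")
```

The sketch exposes the tradeoff discussed above in its simplest form: the mixing step consumes communication over the network, the gradient step consumes local computation, and their interplay (together with the nonconvexity of the loss) determines where the agents' iterates settle and how accurate the resulting estimate is.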

Start Date

August 2019

Postdoc Qualifications

The ideal candidate is expected to have a strong background in mathematics and good knowledge of optimization and/or statistics.

Co-advisors

Gesualdo Scutari, Thomas and Jane Schmidt Rising Star Associate Professor, School of Industrial Engineering, Email: gscutari@purdue.edu; url: https://engineering.purdue.edu/~gscutari/

Guang Cheng, Professor, Dept. of Statistics, Email: chengg@purdue.edu; url: http://www.stat.purdue.edu/~chengg/

References

1. G. Scutari and Y. Sun, "Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization," C.I.M.E. Lecture Notes in Mathematics, Springer Verlag Series, 2018, 158 pages.

2. L. Cannelli, F. Facchinei, V. Kungurtsev, and G. Scutari, "Asynchronous Parallel Algorithms for Nonconvex Big-Data Optimization-Part I: Model and Convergence," preprint arXiv:1607.04818.

3. G. Scutari, F. Facchinei, and L. Lampariello, "Parallel and Distributed Methods for Nonconvex Optimization-Part I & Part II: Theory & Applications in Communications and Machine Learning," IEEE Trans. on Signal Processing, vol. 65, no. 8, pp. 1929-1960, April 2017.

4. M. Liu and G. Cheng, "Early Stopping for Nonparametric Testing," NIPS, 2018.

5. G. Xu, Z. Shang, and G. Cheng, "Optimal Tuning for Divide-and-Conquer Kernel Ridge Regression with Massive Data," ICML, 80:5479-5487, 2018.

6. S. Volgushev, S. Chao, and G. Cheng, "Distributed Inference for Quantile Regression Processes," Annals of Statistics, to appear, 2018.