
August 26, 2024

Yao Ji Defense

Event Date: August 26, 2024
Speaker: Yao Ji
Time: 10:00 am
Location: GRIS 302
School or Program: Industrial Engineering
Yao Ji PhD Defense

Abstract:

 

Distributed optimization problems defined over mesh networks are ubiquitous in signal processing, machine learning, and control. In contrast to centralized approaches, where all information and computation resources are available at a centralized server, agents in a distributed system can only use locally available information. As a result, much effort has been devoted, from a pure optimization perspective, to designing efficient distributed algorithms that account for communication constraints and make coordinated decisions in a fully distributed manner. However, as an increasing amount of data with high-dimensional features is generated inherently by distributed systems such as social media, sensor networks, and cloud-based databases, it is crucial to understand the statistical and computational guarantees of distributed algorithms for solving such high-dimensional problems over a mesh network.

  

A goal of this thesis is a first attempt at studying the behavior of distributed methods in the high-dimensional regime. The first task is to establish statistical and computational guarantees for distributed algorithms in this regime. Specifically, we study linear regression from data distributed over a network of agents (with no master node) by means of LASSO estimation, in high dimension, which allows the ambient dimension to grow faster than the sample size. While there is a vast literature on distributed algorithms applicable to the problem, the statistical and computational guarantees of most of them remain unclear in high dimensions. This thesis provides a first statistical study of Distributed Gradient Descent (DGD) in the Adapt-Then-Combine (ATC) form. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions (which hold with high probability for standard data-generation models) and suitable conditions on the network connectivity and algorithm tuning, DGD-ATC converges globally at a linear rate to an estimate within the centralized statistical precision of the model. In the worst case, the total number of communications needed to reach statistical optimality grows only logarithmically with the ambient dimension, improving on the communication complexity of DGD in the Combine-Then-Adapt (CTA) form, which scales linearly with the dimension. This reveals that mixing gradient information among agents, as DGD-ATC does, is critical in high dimensions to obtain favorable rate scalings.
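
To illustrate the ATC versus CTA distinction, below is a minimal, hypothetical Python sketch of subgradient-based DGD for a decentralized LASSO problem on a ring network. The synthetic data, Metropolis mixing weights, subgradient handling of the l1 term, and step size are all illustrative assumptions, not the exact scheme or tuning analyzed in the thesis; the only difference between ATC and CTA is whether the mixing matrix W is applied after or before the local gradient step (see the comment in the loop).

import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: m agents, each with n_i local samples; the ambient
# dimension d exceeds the per-agent sample size (high-dimensional regime).
m, n_i, d, s = 10, 5, 100, 5
x_true = np.zeros(d)
x_true[rng.choice(d, s, replace=False)] = rng.normal(size=s)
A = [rng.normal(size=(n_i, d)) for _ in range(m)]
b = [A_i @ x_true + 0.1 * rng.normal(size=n_i) for A_i in A]

# Ring network with Metropolis mixing weights (symmetric, doubly stochastic).
W = np.zeros((m, m))
for i in range(m):
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = W[i, i] = 1.0 / 3.0

lam, gamma, T = 0.1, 0.01, 1000   # illustrative regularization, step size, iterations

def local_subgrad(i, x):
    # Subgradient of agent i's local LASSO loss (1/(2 n_i))||A_i x - b_i||^2 + lam ||x||_1.
    return A[i].T @ (A[i] @ x - b[i]) / n_i + lam * np.sign(x)

X = np.zeros((m, d))              # row i = agent i's current estimate
for _ in range(T):
    G = np.vstack([local_subgrad(i, X[i]) for i in range(m)])
    X = W @ (X - gamma * G)       # ATC: local gradient step first, then mix with neighbors
    # CTA would instead mix first and step second: X = W @ X - gamma * G

print("average estimation error:", np.mean(np.linalg.norm(X - x_true, axis=1)))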

 

Next, we address the problem of distributed sparse recovery through stochastic optimization. We develop and analyze stochastic optimization algorithms for problems over a network, modeled as an undirected graph (with no centralized node), where the expected loss is strongly convex with respect to the Euclidean norm and the optimum is sparse. Assuming agents only have access to unbiased estimates of the gradients of the underlying expected objective, and that the stochastic gradients are sub-Gaussian, we use non-Euclidean distributed stochastic dual averaging (DSDA) as a building block to develop a fully decentralized restarting procedure for the recovery of sparse solutions over a network. We show that, with high probability, the iterates generated by all agents converge linearly to an approximate solution, quickly eliminating the initial error, and then converge sublinearly to the exact sparse solution in the steady-state stages, owing to the observation noise. The algorithm asymptotically achieves the optimal convergence rate and dimension dependence enjoyed by a non-Euclidean centralized scheme. Further, we precisely identify its non-asymptotic convergence rate as a function of characteristics of the objective functions and the network, and we characterize the transient time needed for the algorithm to approach the optimal rate of convergence. We illustrate the performance of the algorithm on the classical problems of sparse linear regression and low-rank matrix recovery. Numerical experiments demonstrate the tightness of the theoretical results.
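
For concreteness, the following is a minimal, hypothetical sketch of the overall structure just described: distributed stochastic dual averaging with a non-Euclidean (lp-based) prox mapping, wrapped in a restart loop that resets the dual variables and shrinks the step size. The synthetic streaming data, ring network, mixing weights, and restart schedule below are illustrative assumptions, not the exact DSDA procedure or tuning analyzed in the thesis.

import numpy as np

rng = np.random.default_rng(1)
m, d, s = 10, 200, 5                      # agents, ambient dimension, sparsity
x_true = np.zeros(d)
x_true[rng.choice(d, s, replace=False)] = 1.0

# Non-Euclidean distance-generating function (1/(2(p-1)))||.||_p^2 with p close to 1,
# a standard choice for sparse recovery; q is the conjugate exponent.
p = 1.0 + 1.0 / np.log(d)
q = p / (p - 1.0)

def mirror(z):
    # Gradient of the conjugate of (1/(2(p-1)))||.||_p^2: maps a dual vector back to the primal space.
    if not np.any(z):
        return np.zeros_like(z)
    return (p - 1.0) * np.sign(z) * np.abs(z) ** (q - 1) * np.linalg.norm(z, q) ** (2 - q)

def stoch_grad(x):
    # Unbiased one-sample gradient of the expected square loss E[(a^T x - y)^2] / 2.
    a = rng.normal(size=d)
    return a * (a @ x - (a @ x_true + 0.01 * rng.normal()))

# Ring network with Metropolis mixing weights (symmetric, doubly stochastic).
W = np.zeros((m, m))
for i in range(m):
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = W[i, i] = 1.0 / 3.0

X = np.zeros((m, d))                      # row i = agent i's current estimate
gamma = 0.5
for stage in range(6):                    # restart loop: reset duals, shrink the step size
    Z = np.zeros((m, d))                  # dual variables (accumulated gradients)
    centers = X.copy()                    # prox centers for this stage
    for _ in range(300):
        G = np.vstack([stoch_grad(X[i]) for i in range(m)])
        Z = W @ Z + G                     # mix dual variables, then add fresh local gradients
        X = centers + np.vstack([mirror(-gamma * Z[i]) for i in range(m)])
    gamma /= 2.0

print("average estimation error:", np.mean(np.linalg.norm(X - x_true, axis=1)))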

 

 

Bio:

Yao Ji is a fifth-year Ph.D. candidate in Industrial Engineering at Purdue University, co-advised by Professors Gesualdo Scutari and Harsha Honnappa. Her research focuses on decentralized estimation and inference, distributed optimization, stochastic optimization, and high-dimensional probability and statistics. She will be joining the H. Milton Stewart School of Industrial and Systems Engineering (ISyE) at the Georgia Institute of Technology as an H. Milton Stewart Postdoctoral Fellow in Fall 2024.