
July 16, 2021

Parallel and decentralized algorithms for big-data optimization over networks

Event Date: July 16, 2021
Sponsor: Dr. Gesualdo Scutari
Time: 11:00 AM EDT
Location: https://us02web.zoom.us/j/4833833577?pwd=Z21lYVdKMXdad2g1elljMW9kV2lnUT09
Contact Name: Anita Park
Contact Email: apark@purdue.edu
Priority: No
School or Program: Industrial Engineering
College Calendar: Show
Amir Daneshmand, Ph.D. Candidate

ABSTRACT

Recent decades have witnessed the rise of a data deluge generated incessantly by heterogeneous sources, e.g., social networks, streaming, and marketing services, which has naturally created a surge of interest in the theory and applications of large-scale convex and non-convex optimization problems. For example, real-world instances of statistical learning problems such as deep learning and recommendation systems can generate vast volumes of spatially/temporally diverse data (up to petabytes in commercial applications) with millions of decision variables to be optimized. Such problems are often referred to as Big-data problems. Solving them with standard optimization methods demands an intractable amount of centralized storage and computational resources, which is infeasible; overcoming this limitation is the foremost purpose of the parallel and decentralized algorithms developed in this thesis. This research consists of two parts: (I) Distributed Nonconvex Optimization and (II) Distributed Convex Optimization.

In Part (I), we start by studying a winning paradigm in Big-data optimization, Block Coordinate Descent (BCD) algorithms, which cease to be effective when problem dimensions grow overwhelmingly large. In particular, we consider a general family of constrained non-convex large-scale problems and design an efficient hybrid deterministic/random parallel algorithm built on a synergy of Successive Convex Approximation (SCA) and greedy/random dimensionality reduction techniques. We provide theoretical and empirical results showing the efficacy of the proposed scheme in the face of huge-scale problems. In the next step, we broaden the network setting to general ad-hoc directed topologies and propose a class of gradient-tracking algorithms with global convergence guarantees to critical points of the problem. We further explore the geometric landscape of the non-convex problems to establish second-order guarantees and strengthen our local-solution results to convergence to the global optima of a wide range of Machine Learning problems.
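
To give a flavor of this class of updates, the following is a minimal NumPy sketch, not the algorithm analyzed in the thesis, of one parallel step combining a separable convex surrogate (a proximal-gradient model of an l1-regularized least-squares objective) with greedy block selection. The scoring rule, damping parameter gamma, block size, and number of updated blocks k are illustrative choices introduced here for the example.

import numpy as np

# Minimal sketch (illustrative, not the thesis algorithm): one parallel step that
# combines a separable convex surrogate -- a proximal-gradient model of the
# l1-regularized least-squares objective 0.5*||Ax - b||^2 + lam*||x||_1 -- with
# greedy block selection. Block size, damping 'gamma', and the number of updated
# blocks 'k' are illustrative; x.size is assumed a multiple of block_size.
def greedy_block_sca_step(x, A, b, lam, gamma=0.5, block_size=50, k=4):
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2        # 1/L for the quadratic loss
    grad = A.T @ (A @ x - b)                          # gradient of 0.5*||Ax - b||^2
    z = x - step * grad
    x_hat = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # surrogate minimizer
    # Greedy dimensionality reduction: score each block by how much it would move
    # and update only the k most promising blocks (in parallel), damped by gamma.
    scores = np.add.reduceat(np.abs(x_hat - x), np.arange(0, x.size, block_size))
    chosen = np.argsort(scores)[-k:]
    x_new = x.copy()
    for j in chosen:
        sl = slice(j * block_size, (j + 1) * block_size)
        x_new[sl] = x[sl] + gamma * (x_hat[sl] - x[sl])
    return x_new

# Example usage on a random least-squares instance.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 1000)), rng.standard_normal(200)
x1 = greedy_block_sca_step(np.zeros(1000), A, b, lam=0.1)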

In Part (II), we focus on a family of distributed convex optimization problems defined over meshed networks. Relevant state-of-the-art algorithms often consider limited problem settings and suffer pessimistic communication complexities with respect to their centralized counterparts, which raises an important question: can we achieve the rate of centralized first-order methods over networks, and moreover, can we improve upon their communication costs by using higher-order local solvers? We answer this question by designing an algorithm that employs surrogate objective functions in the local solvers, coupled with a perturbed (push-sum) consensus mechanism, and provably matches the iteration complexity of its centralized counterpart up to factors depending on the network parameters. When considering Empirical Risk Minimization (ERM) problems with statistically homogeneous data, the algorithm provably achieves faster rates than first-order methods, for the first time without communication of Hessian matrices. Finally, we focus on the ill-conditioning issue and propose a "statistically informed" preconditioned cubic regularized Newton method which provably improves upon the rates of first-order methods. The proposed scheme does not require communication of Hessian information over the network, and yet achieves the iteration complexity of centralized second-order methods up to the statistical precision.
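
As background on the consensus primitive mentioned above, the sketch below shows plain push-sum average consensus on a directed graph, not the perturbed, gradient-tracking variant developed in the thesis: with column-stochastic mixing weights C, each agent's ratio z_i/w_i converges to the network-wide average of the initial values. The ring topology, weights, and iteration count are illustrative assumptions for the example.

import numpy as np

# Minimal sketch (illustrative only): push-sum average consensus on a directed
# graph. Each agent i mixes a value z_i and a weight w_i using column-stochastic
# weights C; the ratio z_i / w_i converges to the average of the initial values.
def push_sum_average(values, C, iters=200):
    values = np.asarray(values, dtype=float)
    z = values.copy()          # running numerators
    w = np.ones(values.size)   # running weights
    for _ in range(iters):
        z = C @ z              # each agent pushes shares of z to its out-neighbors
        w = C @ w              # and the same shares of its weight
    return z / w               # per-agent estimate of the network-wide average

# Example: ring digraph where each node keeps half and pushes half forward.
n = 5
C = 0.5 * (np.eye(n) + np.roll(np.eye(n), 1, axis=0))   # column-stochastic weights
print(push_sum_average(np.arange(n, dtype=float), C))    # approx. [2. 2. 2. 2. 2.]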