Abstract

Classic first-order optimization methods such as gradient descent (GD) have significant limitations: generally linear convergence, sensitivity to the choice of learning rate, and pronounced oscillations around the minimum, particularly in ill-conditioned problems. To overcome these limitations, some authors use the Quasi-Newton approach, which approximates the inverse of the Hessian matrix to obtain better curvature information. This makes it possible to move from the linear convergence rate of GD to the superlinear convergence rate of BFGS while using only first-order information. In this paper, we propose DASQN, a parallel implementation of the Quasi-Newton method that combines the advantages of DaveQN and AsySQN. We also prove the convergence of DASQN mathematically. Experiments on a 32-core server with MNIST and CIFAR-10 show that DASQN outperforms AsySQN in speedup while achieving the same accuracy.
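To illustrate the Quasi-Newton idea the abstract refers to, the following is a minimal sketch of the classic (serial) BFGS update, which maintains an approximation H of the inverse Hessian from gradient differences alone; the function names and the test quadratic are illustrative choices, not part of DASQN itself.

```python
import numpy as np

def backtracking(f, gx, x, p, a=1.0, c=1e-4, r=0.5):
    """Simple Armijo backtracking line search along descent direction p."""
    fx = f(x)
    while f(x + a * p) > fx + c * a * (gx @ p):
        a *= r
    return a

def bfgs(f, grad, x0, tol=1e-8, max_iter=100):
    """Minimal BFGS sketch: curvature is captured by updating H,
    an approximation of the inverse Hessian, from (s, y) pairs."""
    n = x0.size
    H = np.eye(n)                     # initial inverse-Hessian approximation
    x = x0.astype(float)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                    # quasi-Newton search direction
        a = backtracking(f, g, x, p)
        x_new = x + a * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:                # curvature condition; skip update otherwise
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Ill-conditioned quadratic f(x) = 0.5 x^T A x - b^T x (condition number 100)
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = bfgs(f, grad, np.zeros(2))   # converges to A^{-1} b = [1, 0.01]
```

On this ill-conditioned quadratic, plain GD with a safe fixed step oscillates along the stiff coordinate, whereas the BFGS direction corrects for curvature; DaveQN and AsySQN distribute and asynchronize variants of this update, which is the setting DASQN targets.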

Bio

Arsene Tayo Abichai holds a Master’s degree in Mathematical Modeling. They are interested in the applications of mathematical optimization in artificial intelligence and machine learning. Their work focuses on the development and analysis of optimization models applied to machine learning problems.