Shusen Wang, Farbod Roosta-Khorasani, Peng Xu, Michael W. Mahoney

For distributed computing environments, we consider the canonical machinelearning problem of empirical risk minimization (ERM) with quadraticregularization, and we propose a distributed and communication-efficientNewton-type optimization method. At every iteration, each worker locally findsan Approximate NewTon (ANT) direction, and then it sends this direction to themain driver. The driver, then, averages all the ANT directions received fromworkers to form a Globally Improved ANT (GIANT) direction. GIANT naturallyexploits the trade-offs between local computations and global communications inthat more local computations result in fewer overall rounds of communications.GIANT is highly communication efficient in that, for $d$-dimensional datauniformly distributed across $m$ workers, it has $4$ or $6$ rounds ofcommunication and $O (d \log m)$ communication complexity per iteration.Theoretically, we show that GIANT’s convergence rate is faster than first-ordermethods and existing distributed Newton-type methods. From a practicalpoint-of-view, a highly beneficial feature of GIANT is that it has only onetuning parameter—the iterations of the local solver for computing an ANTdirection. This is indeed in sharp contrast with many existing distributedNewton-type methods, as well as popular first-order methods, which have severaltuning parameters, and whose performance can be greatly affected by thespecific choices of such parameters. In this light, we empirically demonstratethe superior performance of GIANT compared with other competing methods.