Abstract: | Adding momentum to the SGD algorithm, an improvement known as the momentum-based SGD (mSGD) algorithm, is one of the most important techniques for accelerating convergence, but its practical application encounters several challenges. The classical mSGD algorithm is designed for an architecture in which a central server collects massive amounts of data from different edge devices and performs the optimization. However, this architecture raises data privacy concerns for the local edge devices, as well as communication overhead caused by the continuous transmission of large volumes of raw data. As a result, many distributed algorithms have been proposed, but there has been little corresponding convergence theory for distributed mSGD algorithms. This talk concerns the last-iterate convergence theory for a class of distributed mSGD algorithms with a decaying learning rate {ε_n}_{n≥0}, and is based on joint work with Ruinan Jin, Bo Zhang and Hong Qiao. |
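
For readers unfamiliar with the setting, the following is a minimal sketch of a single-node mSGD iteration with a decaying learning rate, written in the standard heavy-ball form. The decay schedule ε_n = ε_0/(n+1), the momentum parameter β, and the function names `msgd` and `grad` are illustrative assumptions; the specific class of distributed algorithms analyzed in the talk may differ.

```python
import numpy as np

def msgd(grad, x0, beta=0.9, eps0=0.1, n_steps=1000, rng=None):
    """Heavy-ball mSGD with decaying learning rate eps_n = eps0 / (n + 1).

    `grad(x, rng)` returns a stochastic gradient estimate at x.
    This is an illustrative sketch, not the algorithm from the talk.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)                # momentum buffer
    for n in range(n_steps):
        eps_n = eps0 / (n + 1)          # decaying learning rate, eps_n -> 0
        g = grad(x, rng)                # stochastic gradient sample
        m = beta * m + (1 - beta) * g   # momentum: exponential moving average
        x = x - eps_n * m               # parameter update (last iterate x_n)
    return x

# Example: minimize E[(x - xi)^2] with xi ~ N(0, 1); the minimizer is x* = 0.
noisy_grad = lambda x, rng: 2.0 * (x - rng.standard_normal(x.shape))
print(msgd(noisy_grad, x0=np.ones(3)))  # iterates approach the zero vector
```

Last-iterate convergence theory, as referenced in the abstract, concerns the behavior of the final iterate x_n itself (rather than an averaged or best iterate), which is what the sketch above returns.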