- Abstract: Existing learning rate schedules that do not require the optimization stopping step T are outperformed by schedules that depend on T. We propose an approach that avoids the need for this stopping time by forgoing schedules entirely, while achieving state-of-the-art performance across a wide range of problems. It introduces no additional hyperparameters over standard optimizers with momentum, and follows from a new theory that unifies scheduling and iterate averaging. An open-source implementation is available at [this https URL]. Schedule-Free AdamW is the core algorithm behind the winning entry in the MLCommons 2024 AlgoPerf Algorithmic Efficiency Challenge Self-Tuning track.
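  The abstract does not spell out the update equations, so as a rough illustration of how a constant-step method combined with iterate averaging can stand in for a schedule, here is a minimal Python sketch. The three-sequence structure (`z`, `y`, `x`), the interpolation coefficient `beta`, and the equal-weight averaging coefficient `c = 1/t` are assumptions made for illustration, not details taken from the text above.

  ```python
  import numpy as np

  def schedule_free_sgd(grad, x0, lr=0.01, beta=0.9, steps=5000):
      """Illustrative schedule-free-style SGD sketch (assumed update form).

      Keeps a base iterate z and a running average x, and evaluates the
      gradient at an interpolation point y between them. The constant
      step size `lr` is never decayed; averaging plays the role of a schedule.
      """
      z = x0.copy()   # base constant-step iterate
      x = x0.copy()   # running (Polyak-style) average, returned at the end
      for t in range(1, steps + 1):
          y = (1 - beta) * z + beta * x   # gradient evaluation point (assumed form)
          z = z - lr * grad(y)            # constant-step update, no schedule
          c = 1.0 / t                     # equal-weight averaging coefficient
          x = (1 - c) * x + c * z         # fold the new iterate into the average
      return x

  # Toy usage: minimize the convex quadratic f(w) = 0.5 * ||A w - b||^2.
  rng = np.random.default_rng(0)
  A = rng.standard_normal((20, 5))
  b = rng.standard_normal(20)
  grad = lambda w: A.T @ (A @ w - b)
  w_hat = schedule_free_sgd(grad, np.zeros(5))
  print(np.linalg.norm(A @ w_hat - b))
  ```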
- Subjects: Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Optimization and Control (math.OC), Machine Learning (stat.ML)
- Cite as: arXiv:2405.15682 [cs.LG] (or arXiv:2405.15682v4 [cs.LG] for this version); DOI: [https://doi.org/10.48550/arXi...] (via DataCite)
- Submission history: From Aaron Defazio. [v1] on Fri, 24 May 2024 16:20:46 UTC (688 KB), [v2] on Thu, 30 May 2024 21:50:15 UTC (689 KB), [v3] on Wed, 7 Aug 2024 17:44:58 UTC (689 KB), [v4] on Tue, 29 Oct 2024 22:40:23 UTC (654 KB)