The Road Less Scheduled

  • Abstract: Existing learning rate schedules that do not require the optimization stopping step T are greatly outperformed by schedules that depend on T. The paper proposes an approach that avoids the need for this stopping time by eschewing schedules entirely, while exhibiting state-of-the-art performance across a wide range of problems, from convex optimization to large-scale deep learning. The Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum, and is a direct consequence of a new theory unifying scheduling and iterate averaging (see the sketch after this list). An open source implementation is available at [this https URL]. Schedule-Free AdamW is the core algorithm behind the winning entry in the MLCommons 2024 AlgoPerf Algorithmic Efficiency Challenge Self-Tuning track.
  • Subjects: Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Optimization and Control (math.OC), Machine Learning (stat.ML)
  • Cite as: arXiv:2405.15682 [cs.LG] (or arXiv:2405.15682v4 [cs.LG] for this version); DOI: https://doi.org/10.48550/arXiv.2405.15682 via DataCite
  • Submission history: From Aaron Defazio. [v1] on Fri, 24 May 2024 16:20:46 UTC (688 KB), [v2] on Thu, 30 May 2024 21:50:15 UTC (689 KB), [v3] on Wed, 7 Aug 2024 17:44:58 UTC (689 KB), [v4] on Tue, 29 Oct 2024 22:40:23 UTC (654 KB)
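
The core idea, replacing a learning rate schedule with an interpolation-and-averaging scheme, can be illustrated with a minimal sketch. The update below follows the schedule-free SGD form described in the paper: gradients are evaluated at an interpolation y between the base iterate z and its running average x, and x is returned as the answer. The function name, hyper-parameter values, and test problem are illustrative assumptions, not the official implementation.

```python
import numpy as np

def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=1000):
    """Minimal schedule-free SGD sketch (illustrative, not the official code).

    Maintains three sequences:
      z: the base SGD iterate,
      x: a running (Polyak-style) average of z, returned as the output,
      y: the interpolation of z and x where gradients are evaluated.
    No learning-rate schedule or stopping time T is required.
    """
    z = np.array(x0, dtype=float)  # base iterate
    x = z.copy()                   # averaged iterate (the output)
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x  # gradient evaluation point
        z = z - lr * grad_fn(y)        # plain SGD step on z
        c = 1.0 / t                    # uniform-averaging weight
        x = (1 - c) * x + c * z        # online average of the z iterates
    return x

# Usage: minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
w_star = schedule_free_sgd(lambda w: w, x0=[5.0, -3.0])
print(w_star)  # approaches the minimizer [0, 0]
```

Note how the averaging weight c = 1/t plays the role a decaying schedule would otherwise play; this is the scheduling/iterate-averaging unification the abstract refers to. The paper's Schedule-Free AdamW applies the same y/z/x structure on top of the AdamW update.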