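Andrew Barto and Richard Sutton received the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning.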

Recipients: ACM named Andrew G. Barto and Richard S. Sutton as the 2024 ACM A.M. Turing Award recipients for developing the conceptual and algorithmic foundations of reinforcement learning (RL).
- Barto: Professor Emeritus at the University of Massachusetts, Amherst.
- Sutton: Professor at the University of Alberta, Research Scientist at Keen Technologies, and Fellow at Amii.
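ACM A.M. Turing Award: Often called the “Nobel Prize in Computing,” the award carries a $1 million prize with financial support from Google. It is named for Alan M. Turing, the British mathematician who articulated the mathematical foundations of computing.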
Reinforcement Learning (RL):
- RL is concerned with constructing agents that perceive and act, where a more intelligent agent is one that chooses better courses of action. A reward signal tells the agent how good its behavior is.
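- The idea of learning from reward has been known to animal trainers for thousands of years. Turing's 1950 paper "Computing Machinery and Intelligence" proposed an approach to machine learning based on rewards and punishments. Turing reported some initial experiments with this approach, and in the late 1950s Arthur Samuel developed a checkers-playing program that learned from self-play.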
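- Beginning in the 1980s, Barto and Sutton formulated RL as a general problem framework built on Markov decision processes (MDPs): the agent makes decisions in a stochastic environment, receives a reward signal after each transition, and aims to maximize its long-term cumulative reward. Unlike standard MDP theory, which assumes the agent knows everything about the MDP, the RL framework allows the environment and the rewards to be unknown.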
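- They also developed many of the basic algorithmic approaches, including temporal difference learning, policy-gradient methods, and the use of neural networks to represent learned functions. Their textbook, Reinforcement Learning: An Introduction (1998), remains the standard reference in the field.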
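To make the temporal difference idea above concrete, here is a minimal sketch of tabular TD(0) value estimation. Everything specific in it is an assumption chosen for illustration and does not come from the announcement: the environment is a five-state random walk (a standard textbook-style toy problem) in which the agent starts in the middle, moves left or right with equal probability, and receives a reward of 1 only when it exits on the right. The function name `td0_random_walk`, the step size `alpha`, the discount `gamma`, and the episode count are likewise hypothetical. The agent never sees the transition probabilities; it learns state values only from sampled transitions and rewards.

```python
import random


def td0_random_walk(num_episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Estimate state values with tabular TD(0) on a 5-state random walk.

    States 1..5 are non-terminal; states 0 and 6 are terminal.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_states = 5
    left_terminal, right_terminal = 0, n_states + 1
    # One value per state; terminal states keep value 0 throughout.
    values = [0.0] * (n_states + 2)

    for _ in range(num_episodes):
        state = (n_states + 1) // 2  # every episode starts in the middle
        while state not in (left_terminal, right_terminal):
            # The environment is stochastic: step left or right at random.
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == right_terminal else 0.0
            # TD error: one-step bootstrapped target minus current estimate.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state

    return values[1:-1]  # estimates for the non-terminal states only


if __name__ == "__main__":
    for index, value in enumerate(td0_random_walk(), start=1):
        print(f"state {index}: estimated value {value:.3f}")
```

For this particular walk with `gamma = 1`, the true value of each state is its probability of exiting on the right, that is 1/6, 2/6, 3/6, 4/6, and 5/6, so the printed estimates should land close to those numbers. A constant step size leaves some residual noise in the estimates; a smaller `alpha` reduces the noise at the cost of slower learning.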
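Practical Applications: Although Barto and Sutton's algorithms were developed decades ago, the major practical advances came in the past 15 years, when RL was merged with deep learning to produce deep reinforcement learning. Prominent examples are AlphaGo's victories over the best human Go players in 2016 and 2017 and ChatGPT, a large language model whose second training phase uses reinforcement learning from human feedback (RLHF). RL has also succeeded in robot motor skill learning, network congestion control, and chip design, among other areas.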
Inspiration and Impact: Research in cognitive science, psychology, and neuroscience inspired RL, and specific RL algorithms in turn provide explanations for findings about the brain's dopamine system. ACM President Yannis Ioannidis and Google's Jeff Dean both praised Barto and Sutton's work. The field continues to grow and offers potential for further advances.