
1 Background

Youdao Zongheng is an online Go product for children from NetEase Youdao, designed for children aged 4 to 8. Launched in 2019, it developed the first online interactive animated Go course in China. Starting from what children understand and enjoy, its live interactive classes make Go knowledge simple, fun, and easy to learn, and help children master the rules and techniques of Go. After class, children can also play against an AI that identifies their rank and matches them with practice games of suitable strength, building good thinking habits from the start. After each game, an intelligent analysis evaluates the game along five dimensions (whole-board judgment, reading, stability, fighting, and shape) to help children improve through review.

The AlphaGo, AlphaGo Zero, and AlphaZero algorithms proposed by Google's DeepMind demonstrated the extraordinary capabilities of deep reinforcement learning in board games. AlphaGo defeated the European Go champion Fan Hui, a professional 2 dan, in October 2015 (the result was published in January 2016). In March 2016 it defeated Lee Sedol, the Korean professional 9 dan and 14-time world champion, 4:1, and in May 2017 an upgraded AlphaGo defeated Ke Jie 9 dan, the youngest player to hold six world titles, 3:0. AlphaGo Zero, which learned purely through self-play without human game records, was published later in 2017. Since then, no one has questioned the dominance of AI in Go, and professional players have rushed to study AI moves. "Dog moves" (AI-style moves, named after AlphaGo's Chinese nickname) now appear regularly in professional games, and studying the logic behind AI moves has become required learning for professional players.

2 Problems with existing AI technology

GitHub already hosts excellent open-source Go AI projects based on the AlphaZero family of algorithms, such as Leela Zero and KataGo. Their main goal is to increase the playing strength of the AI, and these engines now far surpass human professional players. However, a strong AI transplanted into children's Go teaching turns out to be a poor fit for its new environment, for example:
• The AI is so strong that users rarely feel they are playing an evenly matched opponent, which easily leads to frustration.
• It hands the learner a fish without teaching how to fish: the AI only tells users what to play, not why.
• The AI's learning path differs from a human's. Some knowledge that humans master early when learning Go (such as the ladder, zhengzi) is only mastered by the AI late in training.

3 Achievements of the Youdao Go AI team

The Youdao Go AI team is part of Youdao's artificial intelligence speech group and is responsible for the R&D and deployment of the Go AI in the Youdao Zongheng product. To summarize the work so far, we quote a passage from CEO Zhou Feng:

What did Youdao Zongheng do?
In general, Youdao Zongheng is an introductory Go course for children, taught by renowned teachers in large live classes. Learning and practice are highly interactive, and the product also lets children play against an AI. At the same time, Youdao has integrated the five stages of teaching, learning, practice, testing, and evaluation very well, which together form the complete product.
There is a question everyone cares about: are AI teachers useful?
Technical teams always say that AI teachers are very useful, can solve the problem of personalized teaching, and can teach each student according to their aptitude. Teams with a teaching background often regard AI teachers as a scourge that is useless and has swindled a lot of VC money.
So are AI teachers useful?
In the Zongheng project we have thought about and experimented with AI teachers a great deal. Our view is that the public's perception of AI is a double-edged sword for a product team, and only a team that recognizes this can design the product correctly.
Why a double-edged sword? On the one hand, AI is a very good marketing tool. On the other hand, users do not know how to build products, so the team must find where AI adds real value on its own. If you simply build whatever users get excited about, you will often end up in a pit.
We spent a long time thinking about the AI scenario. The first thing that comes to mind is AlphaGo: however well you play, it can beat you. A teaching product obviously cannot treat its users that way. So in teaching, the difficulty and strength of the AI are not the metrics that matter; what matters is how to lower the difficulty and how to adjust it flexibly.
Therefore, first, our team put a lot of effort into building a Go AI with controllable difficulty and playing strength; second, the AI can review games as well as play at a controllable strength; third, we promote games between students and between students and teachers, emphasizing that people play each other rather than only playing the machine.
In this way we have built our own Go AI, which can take over part of the human work in the teaching process and improve the team's productivity.


4 Solutions and ideas

4.1 Human-computer play

An ideal teaching system for human-computer play has the following characteristics:
• The AI's moves are logical, so that users can hardly tell they are playing against an AI.
• The AI's level is controlled sensibly to avoid one-sided games.
• The AI follows the teaching progress and helps users consolidate what they have been taught (such as joseki).

When implementing human-computer play, some other solutions take a model from an early stage of AI training and then pick the move at random from the model's top-n outputs, so that the AI's play is not too monotonous.

This approach has little to recommend it beyond being the obvious one. First, because the early model has had little training, top-n sampling makes the AI's moves incoherent, and users can easily provoke the resulting logical blunders (in ladders, for example). Second, the model and the move-selection strategy stay fixed throughout the game, yet we found in practice that the AI does not learn the opening, middle game, and endgame at the same speed: it masters the opening far faster than the middle game and endgame. Using one model and one strategy therefore makes the AI's level vary widely within a single game. Third, AI self-play training has no concept of joseki (standard local sequences distilled from the experience of Go masters; learning them lets users improve quickly). A low-level AI rarely finds the locally optimal sequence, whereas people can pick up locally optimal play quickly by studying master games, even if they are far below the level of the masters who devised the joseki. The root of these problems is that AI and humans learn along very different paths, so an AI model cannot simply be transplanted into teaching.

After considering the above problems, the Go AI team has done the following work:
• Abandon top-n random sampling; instead, sample moves according to the probabilities in the AI engine's policy output. This keeps the AI's moves logical and coherent.
• Use different AI models at different stages of the game, chosen according to win rate and score difference. This keeps the AI's level similar across stages.
• Mix the AI model's output with joseki templates, in step with the teaching content. This consolidates the joseki the user has learned.
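The three measures in this list can be illustrated with a minimal Python sketch. It is not the team's implementation: the function names, the per-stage base levels, the win-rate and score thresholds, and the representation of the policy as a dictionary from move to probability are all assumptions made for illustration. It contrasts uniform top-n sampling with sampling in proportion to the policy, picks a model index from the game stage and the current win rate and score lead, and prefers a taught joseki move when one applies.

```python
import random

# Hypothetical base checkpoint per stage (0 = weakest). The opening is learned
# fastest, so this sketch assumes a weaker checkpoint is used there.
BASE_LEVEL = {"opening": 0, "middle": 1, "endgame": 2}


def sample_top_n(policy, n, rng):
    """Baseline: pick uniformly among the n most probable moves."""
    top = sorted(policy, key=policy.get, reverse=True)[:n]
    return rng.choice(top)


def sample_by_probability(policy, rng):
    """Pick each move with probability proportional to its policy value."""
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def choose_model(stage, ai_win_rate, ai_score_lead, num_models):
    """Return the index of the checkpoint to use (thresholds are made up)."""
    level = BASE_LEVEL[stage]
    if ai_win_rate > 0.75 or ai_score_lead > 15:
        level -= 1  # AI is far ahead: step down to a weaker model
    elif ai_win_rate < 0.25 or ai_score_lead < -15:
        level += 1  # AI is far behind: step up to a stronger model
    return max(0, min(num_models - 1, level))


def choose_move(policy, joseki_move, rng):
    """Prefer the taught joseki move when one applies, else sample the policy."""
    if joseki_move is not None and joseki_move in policy:
        return joseki_move
    return sample_by_probability(policy, rng)


rng = random.Random(0)
policy = {"Q16": 0.6, "D4": 0.3, "K10": 0.1}
print(choose_model("middle", ai_win_rate=0.9, ai_score_lead=20, num_models=3))
print(choose_move(policy, joseki_move="D4", rng=rng))
print(choose_move(policy, joseki_move=None, rng=rng))
```

With uniform top-n sampling, a move the model considers barely plausible is played as often as its first choice; sampling by probability keeps that variety while making weak moves correspondingly rare.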

4.2 Game review

Review (fupan) means replaying the game record after a game is over to examine the strengths and weaknesses of the moves and the turning points where the game was won or lost. It is generally used for self-study or when asking stronger players for guidance and analysis. Go masters review as a habit: after each game, both players replay it together, which deepens their memory of the game and exposes the flaws in each side's attack and defense, making it a good way to improve. In Youdao Zongheng, the AI takes on the role of reviewer.

In some other solutions, AI review mainly shows the win-rate or score-difference curve for the whole game, the AI's recommended variation diagrams, and some basic statistics. This content suits professional users, who need to quickly locate their own bad moves and then infer the AI's reasoning from the variation diagrams it provides. Such users can learn on their own from nothing more than the raw data of the Go AI engine.
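As an illustration of how bad moves can be located from a win-rate curve, here is a minimal Python sketch. The function name, the 10% threshold, and the input format (the reviewing player's win rate before the first move and after every subsequent move) are assumptions for this example and do not describe any particular engine's output.

```python
def find_mistakes(win_rates, player_moves_first=True, threshold=0.10):
    """Return (move_number, drop) pairs for the player's costliest moves.

    win_rates[0] is the player's win rate before move 1 and win_rates[i]
    is the win rate after move i, always from the player's point of view.
    """
    first = 1 if player_moves_first else 2
    mistakes = []
    # Step by 2 so that only the reviewing player's own moves are inspected.
    for move_no in range(first, len(win_rates), 2):
        drop = win_rates[move_no - 1] - win_rates[move_no]
        if drop >= threshold:
            mistakes.append((move_no, round(drop, 3)))
    # Largest drop first, so the most costly move heads the list.
    return sorted(mistakes, key=lambda m: m[1], reverse=True)


curve = [0.50, 0.52, 0.50, 0.30, 0.33, 0.31, 0.35, 0.10]
print(find_mistakes(curve))
```

A professional user can read such a list directly; it is exactly this kind of raw output that says where a game went wrong but not why.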

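However, when the target users are children, these solutions work far less well. Child users struggle to understand what the statistics mean, lack the ability to analyze the logic of the variation diagrams the AI provides, and often cannot even keep their attention on the diagrams, looking only at how the win rate and score difference change over the game. In addition, the reviews in other solutions consume a lot of GPU resources, and some users wait as long as half a day for the review of a game.

After considering the above problems, the Go AI team has done the following work: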

• Introduce the speech group's TTS technology, turning review results into spoken commentary that child users find easy to follow.
• Optimize performance. Child users do not need reviews generated by an AI running at high compute, so we devised a scheme that allocates compute according to the complexity of the position.
• Combine each review with the user's earlier review records to profile the user's Go level and produce a long-term learning report.
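Allocating compute by position complexity could look like the following Python sketch. The article does not say how complexity is measured; using the normalized entropy of the policy distribution as the complexity score, and the visit limits of 50 and 800, are assumptions chosen purely for illustration.

```python
import math


def policy_entropy(policy):
    """Shannon entropy (in nats) of a list of move probabilities."""
    return -sum(p * math.log(p) for p in policy if p > 0)


def visits_for_position(policy, min_visits=50, max_visits=800):
    """Map a position's (assumed) complexity to a search budget.

    A peaked policy means one move stands out, so few visits are spent;
    a flat policy means many candidate moves, so more visits are spent.
    """
    if len(policy) < 2:
        return min_visits
    # Normalize entropy to [0, 1] by dividing by its maximum, log(len).
    complexity = min(1.0, policy_entropy(policy) / math.log(len(policy)))
    return round(min_visits + complexity * (max_visits - min_visits))


print(visits_for_position([1.0, 0.0, 0.0]))   # forced move: minimum budget
print(visits_for_position([0.25] * 4))        # wide-open position: maximum budget
print(visits_for_position([0.7, 0.2, 0.1]))   # somewhere in between
```

Spending the search budget only where the position is unclear is one way a full-game review could be produced without running the engine at full strength on every move.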

5 Summary and Outlook

At present, Go AI technology focuses mainly on raising the AI's level, which certainly makes self-training much more convenient for professional users. However, because the reasoning behind high-level AI play is so advanced, young users find it hard to learn directly from a high-level AI when Go AI is offered to children.
Next, we hope to give users an AI sparring partner with a more suitable level and more coherent logic in human-computer play, and a clearer, easier-to-understand report in game review.


有道AI情报局