
For more than 30 years, Windows has been an indispensable partner for hundreds of millions of people around the world as they work, live and create. On October 5, 2021, the Microsoft product family officially welcomed its newest member, Windows 11. With hybrid work and remote collaboration becoming the new normal, Windows 11's innovative feature design, efficient and easy-to-use experience, and rich content are intended to help users improve their productivity and creativity. Behind the personalized and intelligent features of Windows 11 are not only the efforts of the Microsoft product teams but also the technical support of Microsoft's research organization. So how are these new features realized? How are the basic research innovations of Microsoft Research Asia translated into shipping products?

In Windows 11, officially released on October 5, 2021, everything from the Start menu, the position of the taskbar, and the design of icons and fonts to automatic recommendations, voice control and other features brings users one step closer to what they love. In this more intelligent and user-friendly system, a number of features built on technological innovations from Microsoft Research Asia give users a brand-new experience.

These innovations include not only underlying technical support but also features that users are already familiar with. For example, since Windows 7, the Windows product group has continuously improved handwriting recognition for Chinese, Japanese, and Korean based on algorithms provided by Microsoft Research Asia. For the "News and Interests" and "Voice Access" features in the latest version of Windows 11, the lab further optimized the underlying models and innovated on the algorithms. Behind this are Microsoft Research Asia's many years of accumulated work in recommendation algorithms, deep learning, and natural language processing.

The fusion of deep learning and NLP lets Windows 11 "News and Interests" get to know you

Xie Xing, chief researcher at Microsoft Research Asia, said, "Both the news content itself and the users who read the news can be represented as text, so the accuracy of a recommendation essentially comes down to how deeply these texts are understood semantically. Earlier recommendation systems either did not use deep learning, and so could not learn the inherent patterns in the sample data, or used deep learning without natural language processing (NLP) techniques, and so could not reach a deeper understanding of semantics." Microsoft Research Asia therefore integrated the latest deep learning and NLP techniques into the modeling of users and news, which greatly improved the performance and accuracy of the recommendation model. Based on this model, the Windows 11 "News and Interests" feature delivers diverse, personalized and more accurate news recommendations, and users can now see the news they are most interested in at any time through widgets and other entry points in Windows 11.

Specifically, this recommendation algorithm can be divided into three layers:
The first layer is understanding the text of the news content itself, which is a natural language understanding problem. This layer is mainly based on the Microsoft Turing Universal Language Representation Model. Its core models and algorithms use the lab's latest unified pre-trained language model, UniLM, and its multilingual pre-trained model, InfoXLM. Both have achieved leading results on language understanding, generation and translation tasks.

The second layer is understanding the user, that is, a series of text-understanding steps built around the user, which is more than simple text aggregation. Although a user can be viewed as the collection of news texts they have browsed or read, these texts cannot simply be concatenated. The system also needs to understand the order in which the user read them, the interest groups that result, and how important each interest is relative to the others. All of this is part of modeling the user. Once these factors are taken into account, the user's representation changes from a set of labels to a vector representation in deep learning, which greatly improves accuracy.
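To make the idea of order-aware user modeling concrete, the sketch below builds a user vector as a recency-weighted average of the vectors of the articles the user has read. It is only an illustration under stated assumptions: the three-dimensional article vectors, the `decay` parameter and the function name `user_vector` are hypothetical, and a real model would presumably learn such weights from data rather than fix them by hand.

```python
from typing import List, Sequence


def user_vector(read_history: Sequence[Sequence[float]], decay: float = 0.5) -> List[float]:
    """Combine the vectors of read articles into one user vector.

    read_history is ordered oldest -> newest. Each older article is
    down-weighted by `decay` relative to the next newer one, so reading
    order matters (unlike concatenation or an unweighted average).
    """
    if not read_history:
        raise ValueError("read_history must contain at least one article vector")
    n = len(read_history)
    # Newest article gets weight 1, the one before it `decay`, then decay**2, ...
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    dim = len(read_history[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, read_history)) / total
        for d in range(dim)
    ]


# Hypothetical article vectors with axes (sports, tech, finance).
sports = [1.0, 0.0, 0.0]
tech = [0.0, 1.0, 0.0]

recent_tech_reader = user_vector([sports, sports, tech])    # tech read last
recent_sports_reader = user_vector([tech, sports, sports])  # sports read last
print(recent_tech_reader)
print(recent_sports_reader)
```

The two users read different mixes of articles in different orders, and the weighting lets the most recent reading dominate each resulting vector.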

For example, we can tag a user by attribute, such as male, bachelor's degree, living in Beijing, so that when a news item popular with men in Beijing appears, the system pushes it to matching users. However, this approach only performs simple matching, because such tags cannot describe an individual accurately: they cannot establish, for instance, whether the user really is from Beijing, and they say nothing about the user's real interests. Deep learning can instead set the tags aside, turn each person into a series of numbers, that is, a vector, and recommend content by computing the similarity between vectors.

In this way, every user can be seen as a point in a high-dimensional space. Each news item is another point in the same space, so the distance between a user and a news item can be compared directly. Imagine a space containing many users and news items at once: the news closest to a given user is naturally the news that user is most likely to enjoy.
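The geometric picture above can be sketched in a few lines of code. The example below is a minimal illustration, not the production system: it uses cosine similarity as the closeness measure (one common choice; the article does not say which measure is used), and the item names, the three-dimensional vectors and the `recommend` function are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user: Sequence[float], news: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the ids of the k news items closest to the user vector."""
    ranked = sorted(news, key=lambda nid: cosine_similarity(user, news[nid]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings with axes (sports, tech, finance).
news_vectors = {
    "match_report": [0.9, 0.1, 0.0],
    "chip_launch": [0.1, 0.9, 0.2],
    "market_recap": [0.0, 0.2, 0.9],
}
user = [0.3, 0.9, 0.1]  # a user whose reading history leans towards tech

top = recommend(user, news_vectors, k=2)
print(top)  # ['chip_launch', 'match_report']
```

Because users and news live in the same space, no hand-written tags are needed: the tech-leaning user is matched to the tech article simply because the two vectors point in nearly the same direction.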
The last layer is ranking. In theory, news recommendation can be treated as computing the distance between users and news content in a high-dimensional space, but in practice more factors have to be considered, such as the diversity, fairness and interpretability of the recommendations.
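As an illustration of how a ranking step can trade pure relevance against diversity, the sketch below uses maximal marginal relevance (MMR), a standard re-ranking heuristic. MMR is swapped in here purely as an example; the article does not say which method Windows 11 uses, and the item names, scores and `trade_off` parameter are hypothetical.

```python
from typing import Callable, Dict, List


def mmr_rerank(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    trade_off: float = 0.7,
) -> List[str]:
    """Greedy maximal marginal relevance.

    At each step, pick the item that maximises
    trade_off * relevance - (1 - trade_off) * (max similarity to items already picked).
    """
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(item: str) -> float:
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1.0 - trade_off) * redundancy

        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical relevance scores coming from the user/news matching step.
relevance = {"chip_launch": 0.95, "chip_review": 0.93, "match_report": 0.60}
topic = {"chip_launch": "tech", "chip_review": "tech", "match_report": "sports"}


def same_topic(a: str, b: str) -> float:
    """Toy similarity: 1.0 if two items share a topic, else 0.0."""
    return 1.0 if topic[a] == topic[b] else 0.0


diverse = mmr_rerank(relevance, same_topic, k=2, trade_off=0.5)
relevance_only = mmr_rerank(relevance, same_topic, k=2, trade_off=1.0)
print(diverse)         # ['chip_launch', 'match_report']
print(relevance_only)  # ['chip_launch', 'chip_review']
```

With `trade_off=1.0` the list is ordered by relevance alone and shows two near-duplicate tech stories; lowering it penalises redundancy, so the second slot goes to a different topic.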

By integrating deep learning and NLP into the news recommendation system, Windows 11 can better meet users' needs for news. The data shows that for recommended news, users' real-time click-through rate has increased, and so has browsing time.

The recommendation algorithm is highly general and can be applied across personalized search and recommendation scenarios such as Microsoft advertising and Bing search. In addition, building on this research, Microsoft Research Asia and the Microsoft News team jointly released MIND, the world's largest English personalized news recommendation dataset, which established a relatively authoritative evaluation benchmark for news recommendation research. At the ACL conference in 2021, the two teams also collaborated on PENS (Personalized News headlineS), the first benchmark dataset for offline evaluation of personalized news headline generation methods.

Operating the computer by voice: Microsoft's accessibility features continue to improve

Accessibility enhancements have been incorporated into successive versions of Windows to provide support and convenience for people with different kinds of disabilities. Voice Access, the new accessibility feature in Windows 11, allows everyone, including people with reduced mobility, to control their computer and edit text by voice, for example to operate Windows applications, browse web pages and write emails.

"The Voice Access feature uses end-to-end ASR (Automatic Speech Recognition) technology. It fuses the acoustic model and the language model into a single unified model, which can not only recognize the user's instructions more accurately and quickly and complete the corresponding tasks promptly but, more importantly, also reduces the demand for computing resources, making it better suited to deployment on end devices such as laptops. Even without an Internet connection, the device can support fast speech recognition."

As shown in the figure above, Voice Access first numbers the items on the desktop, and users then control them through voice commands.
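The numbering interaction can be pictured with a small toy model. The sketch below is not Voice Access's implementation or API: the command grammar, the item names and the functions `number_items` and `resolve` are hypothetical, and it assumes the speech recognizer has already produced a transcript in which numbers appear as digits.

```python
import re
from typing import Dict, List, Optional, Tuple


def number_items(items: List[str]) -> Dict[int, str]:
    """Assign 1-based numbers to on-screen items, like a number overlay."""
    return {i: name for i, name in enumerate(items, start=1)}


# Hypothetical command grammar: "<verb> <number>", e.g. "click 2".
COMMAND = re.compile(r"^(click|double click|right click)\s+(\d+)$")


def resolve(transcript: str, overlay: Dict[int, str]) -> Optional[Tuple[str, str]]:
    """Map recognised text to (action, target item), or None if nothing matches."""
    match = COMMAND.match(transcript.strip().lower())
    if match is None:
        return None
    action, number = match.group(1), int(match.group(2))
    target = overlay.get(number)
    return (action, target) if target is not None else None


overlay = number_items(["Recycle Bin", "Microsoft Edge", "Documents"])
print(resolve("Click 2", overlay))         # ('click', 'Microsoft Edge')
print(resolve("double click 3", overlay))  # ('double click', 'Documents')
print(resolve("click 9", overlay))         # None
```

Numbering turns an open-ended pointing problem into a short, unambiguous utterance, which is easier to recognize reliably than a spoken description of the target.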

As early as 2019, Microsoft Research Asia began working with the speech group of the Microsoft Azure team on speech recognition research. The original ASR model was a hybrid of an acoustic model and a language model. The acoustic model converts the input speech into phonemes, the smallest units of pronunciation, and the phonemes are then combined with the language model to produce the speech recognition result. Because the model was large, the technology was mainly deployed on the Microsoft Azure cloud platform and offered to users as SaaS. As researchers continued to explore and improve ASR technology, Microsoft's product groups hoped that the upgraded ASR technology could be applied more widely on the product side, so that disadvantaged groups could use the related products more conveniently.
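The hybrid pipeline described above (acoustic model to phonemes, then phonemes plus language model to words) can be illustrated with a toy decoder. This is a sketch under heavy assumptions, not Microsoft's system: the acoustic model's output is taken as given and already grouped per word, and the lexicon entries, phoneme symbols and bigram probabilities are all made up for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical pronunciation lexicon: phoneme string -> candidate words (homophones).
LEXICON: Dict[str, List[str]] = {
    "R AY T": ["right", "write"],
    "AH N": ["an"],
    "IY M EY L": ["email"],
}

# Hypothetical bigram language model probabilities P(word | previous word).
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "write"): 0.6,
    ("<s>", "right"): 0.4,
    ("write", "an"): 0.5,
    ("right", "an"): 0.1,
    ("an", "email"): 0.9,
}
UNSEEN = 1e-4  # fallback probability for bigrams missing from the table


def decode(phoneme_groups: List[str]) -> List[str]:
    """Viterbi search for the most probable word sequence under the bigram model."""
    # best[word] = (log-probability, word sequence) of the best path ending in `word`.
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for group in phoneme_groups:
        step: Dict[str, Tuple[float, List[str]]] = {}
        for word in LEXICON[group]:
            step[word] = max(
                (logp + math.log(BIGRAM.get((prev, word), UNSEEN)), path + [word])
                for prev, (logp, path) in best.items()
            )
        best = step
    return max(best.values())[1]


# Phonemes as a (hypothetical) acoustic model might emit them, grouped per word.
words = decode(["R AY T", "AH N", "IY M EY L"])
print(words)  # ['write', 'an', 'email']
```

The phonemes alone cannot distinguish "right" from "write"; the language model resolves the ambiguity. An end-to-end model folds both stages into a single network instead of chaining separate components like this.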

However, it is not practical to deploy large-scale ASR models directly on end devices. Beyond making the model itself lighter and faster, the researchers also came to realize, while working with the speech team of the Azure team, that when turning technology into products, model optimization should not focus only on accuracy but should treat user experience as the first priority. As Liu Shujie, a senior researcher at Microsoft Research Asia, said, "When we do basic research, we tend to abstract problems and think about how to make the technology better and better at one particular point. Colleagues in the product groups think more from the user's point of view, such as what makes users more satisfied with the product and gives them a better experience when using it."

When testing the end-to-end ASR model, the research team and the product team found that their ideas collided. Liu Shujie explained that researchers focus on objective metrics and measure them on large datasets, whereas once the technology becomes a product, product managers pay more attention to users' subjective experience. Therefore, when Windows 11 runs on Microsoft Surface devices and on computers from various PC manufacturers, the ASR model also has to be optimized and adapted accordingly.

During development, Microsoft Research Asia worked closely with the speech team of the Azure team and the Windows product group, communicating repeatedly and iterating continuously, until the test results of the ASR model on multiple devices reached the level of normal human speech. Although the current model only supports speech recognition in American English, the approach is language-independent. In the future, the model only needs to be trained on data in other languages to support speech recognition and control in those languages.

Thanks to the development of deep learning and ample corpus support, automatic speech recognition has achieved excellent performance in widely spoken languages. However, many languages in the world still lack corpus data. These less widely spoken languages and local dialects have relatively few speakers, and collecting data for them consumes a great deal of manpower and resources, which makes it difficult to build ASR for them. To address this problem, Microsoft Research Asia has proposed new methods for speech recognition under extremely low-resource conditions, in particular WavLM, a pre-training model for ASR. It has ranked first across metrics on the SUPERB benchmark leaderboard (https://superbbenchmark.org/leaderboard).

Microsoft has always attached great importance to accessibility, with the aim of making products, devices, services and environments easier for people with disabilities to use. As a next step, Microsoft Research Asia will work with Microsoft's product teams to extend similar technologies to more products and application scenarios, breaking down barriers to communication and use, and empowering everyone.

The refreshed desktop, clean design, comfortable layout and flexible experience all demonstrate the efficiency and innovation of the next-generation operating system, Windows 11. Whether for work, study, daily life, games, artistic creation or software development, Windows 11 offers users a mode that suits them. Facing the new normal of hybrid work and new user needs, Microsoft Research Asia will continue to bring its latest research results into Microsoft products to help more users improve their productivity and spark their creativity.
