Abstract: To better understand the Pangu model and its hundreds of billions of parameters, the HUAWEI CLOUD community interviewed Xie Lingxi, a senior researcher on the HUAWEI CLOUD EI Pangu team. In very accessible terms, Dr. Xie walked us through the "past and present" of the Pangu large model's development and the hard road behind it.

This article is shared from the HUAWEI CLOUD community article "Huawei Senior Researcher Xie Lingxi: Where Will the Next Generation of AI Go? A Pathfinding Tour of the Pangu Large Model", originally selected by the HUAWEI CLOUD community.

"Everyone lives in a specific era, and each person has a different life path in a specific era. In the same era, some people lament that their lives are not at the right time, and some just want to rest in peace..." This is the 2021 Beijing college entrance examination proposition The beginning of the composition "On birth at the right time".

The author of one answer is a special candidate that has never attended elementary school, junior high school, or high school. It simply studied a large number of People's Daily articles in a short period of time, then relied on its reading comprehension, text association, and language generation abilities to produce this passable college entrance examination essay.

Yes, it is an AI: the HUAWEI CLOUD Pangu large model, which had just been named a "treasure of the exhibition hall" at the 2021 World Artificial Intelligence Conference (WAIC 2021)! At the venue, the audience could interact with the large model and put questions to it directly. One example was the Chinese pun sentence that roughly reads "Mingming clearly likes him in vain, but she just won't say it; he is very aloof." Here "Mingming" is a person's name that also reads as the adverb "obviously", so the sentence must be segmented correctly to be understood. When the reporter asked the large model who likes him in vain, it quickly answered "Mingming". The correct answer!

Although Pangu never put in a student's ten-plus years of hard study, it has gone through its own "schooling" across hundreds of billions of parameters.

Let's look at another example, understanding the following two sentences:

  1. Xiao Ming was reading a book; through constant persistence, he overcame all kinds of difficulties and finally finished it.
  2. Xiaohong encountered a lot of difficulties while painting, but finally finished the painting.

Although the two sentences describe different characters and events, Pangu can extract the same meaning from them, just as we humans do: perseverance. This ability was in fact demonstrated live at Huawei Developer Conference (Cloud) 2021. We cannot help but ask: how did the Pangu large model get so "smart"?

To better understand the Pangu model and its hundreds of billions of parameters, the HUAWEI CLOUD community interviewed Xie Lingxi, a senior researcher on the HUAWEI CLOUD EI Pangu team. Considering that some of the technologies involved in the large model are fairly obscure, Dr. Xie explained, in very accessible terms, the "past and present" of the Pangu large model's research and development and the hard road behind it.

[Photo: Xie Lingxi, Senior Researcher of the HUAWEI CLOUD EI Pangu team]

What is a large model: the only way for AI to land in thousands of industries

In myths and legends, Pangu opened up the world, and the universe changed from chaos to order. Speaking of the Pangu model, Xie Lingxi started with the birth of artificial intelligence.

"In the 1950s, the concept of AI was proposed. People used artificial design rules to define AI; in the 1980s, under the wave of big data, people realized AI by training data models; later as the scale of data expanded As well as the development of computing power, deep learning has set off a new wave, and various AI models continue to emerge."

"Until the past two years, we began to integrate cross-domain knowledge into the AI model. Various large models based on the Transformer structure appeared, including OpenAI's GPT-3, and the Pangu large model. They opened up the scale and scale of the deep learning model. The situation of common development of performance has reached a new height in the field of deep learning." Xie Lingxi said.

In the past ten years, the computing resources demanded by AI algorithms have grown about 400,000-fold, and the evolution of neural networks from small models to large models has become an inevitable trend. A large model can address the fragmentation of AI model customization and application development: it can absorb a large amount of knowledge, improve the model's generalization ability, and reduce the dependence on domain-specific data annotation.

The large model activates the self-supervised learning ability of deep neural networks on large-scale unlabeled data, while placing high demands on the deep optimization and parallelism of the AI framework; it is where the capabilities of AI under the deep learning paradigm converge. "From traditional methods to deep learning was a big leap. On the step of deep learning, the large model now stands at the forefront, waiting for the next step."

The current Pangu series of ultra-large-scale pre-trained models includes an NLP large model, a CV large model, a multi-modal large model, and a scientific computing large model. "Large" means the model has absorbed a massive amount of data and knowledge: the Pangu NLP model, for example, has learned from 40 TB of Chinese text data, and the Pangu CV model contains more than 3 billion parameters. These data improve the generalization ability of the large models and the algorithms' adaptability to new samples, allowing them to learn the laws hidden behind the data and reducing the dependence on domain-specific data annotation.

Xie Lingxi further explained that, on the one hand, a large model can transfer knowledge from unlabeled data to the target task more universally, improving task performance; on the other hand, the pre-training process learns a better initial point for the parameters, so the model can achieve good results on the target task with only a small amount of data.

When a large model can learn more from small data samples, it helps open the door to general AI and can solve the problems of AI model customization and fragmented application development.

Xie Lingxi did the math for us. In his view, the difficulty in putting AI algorithms into practice is not that they cannot solve real problems, but that application scenarios are too narrow: every pain point requires customized development, which drives investment costs and manpower up.

Once the scenario changes, the entire model may need to be redeveloped. The large model, by contrast, is a new paradigm of industrialized AI development: it solves the customization problem of small models, so that one model can serve many scenarios and AI can truly land in thousands of industries.

Therefore, as an inevitable product of this era, the large model is worth our effort to explore: what will the next stage of deep learning, and even of AI, look like?

Before that, we need to figure out how the large model is made.

More than parameters: the Pangu NLP and CV large models have more tricks up their sleeves

In January, Google proposed the Switch Transformer, a large model with 1.6 trillion parameters;
NVIDIA, Stanford, and Microsoft Research jointly trained a GPT model with one trillion parameters;
the Beijing Academy of Artificial Intelligence (Zhiyuan) released WuDao 2.0, a large model with 1.75 trillion parameters;
……

In the flood of news reports, it is easy to attribute the breakthrough of large models simply to their enormous parameter counts.

Xie Lingxi overturned this stereotype: "Large volume and large variety are inevitable requirements of a large model, but the parameter count is not the best yardstick of a model's capability. If you stored all the intermediate states of a large model's training and did a simple fusion, you could multiply the model's parameter count by a very large factor, and could even claim a model with an astronomical number of parameters, but that would not help the model's effectiveness much. So the parameter count is not the final standard for judging how strong a large model is."

A large model is a complete system that integrates data preprocessing, model architecture, algorithm training, and optimization. Even with sufficient computing power, raw data, and an original model, there is no guarantee a working large model can be built; it is a test of R&D and collaboration capabilities.

But there is no doubt that the more data there is, the more the large model learns. "As long as you give it enough data and let it 'rote memorize', its understanding will indeed improve." What data it sees determines the model's baseline quality. Xie Lingxi said that, on top of a large number of parameters, the model can learn the relationships within the data, abstract logical capabilities, and become more intelligent.

Pangu NLP Large Model

On the recent CLUE leaderboard, the Pangu NLP model ranked first in the overall ranking, the reading comprehension ranking, and the classification task ranking, with an overall score one percentage point higher than second place. To illustrate how close the Pangu NLP model comes to humans in understanding, Xie Lingxi returned to the "perseverance" example from the beginning of this article:

  • Xiao Ming was reading a book; through constant persistence, he overcame difficulties and finally finished it.
  • Xiaohong encountered a lot of difficulties while painting, but finally finished the painting.

Through logical judgment, humans can easily tell that the two sentences express the same meaning: perseverance. A large model, however, needs to be fed a great deal of data and to learn the relationships between elements, such as how several passages of text relate to each other and which two passages are closer, before it can reach such a logical conclusion.

Staying with the example above, if sentence 2 is changed to "Xiao Ming was reading a book; he encountered a lot of difficulties, and in the end he could not finish it", then the wording of sentences 1 and 2 is very similar, yet the two express completely different meanings.

Large models need to learn to judge these relationships. Xie Lingxi explained: "The association between representations (simple features extracted directly from text and images) and semantics is extremely complex. Humans can grasp it, but building a computational model that lets a computer grasp it is very difficult. Large models hope to accomplish this with big data and a large pile of trainable parameters."
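
To make that gap concrete, here is a tiny, hypothetical Python sketch (ours, not the Pangu team's): a purely surface-level measure such as word overlap rates two sentences as nearly identical even though a single word flips their meaning, and closing that gap between representation and semantics is exactly what the large model's trainable parameters are for. The example sentences are English paraphrases of the ones above.

```python
# Minimal illustration (assumed example, not Pangu code): surface similarity
# via word overlap vs. the actual meaning of two sentences.
def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two sentences' word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

s1 = "Xiao Ming kept reading, overcame many difficulties, and finally finished the book"
s2 = "Xiao Ming kept reading, met many difficulties, and finally never finished the book"

# High overlap score, yet the second sentence means the opposite of the first.
print(round(word_overlap(s1, s2), 2))
```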

If you want a large model to understand our logical world, the work beyond the parameters is just as crucial.

First of all, every optimization of a model with hundreds of billions of parameters is enormously expensive, and a small change can ripple through the whole system. Xie Lingxi and the team therefore chose to add prompt-based tasks in the pre-training stage to reduce the difficulty of fine-tuning, addressing the difficulty that used to come with fine-tuning large models for different industry scenarios. When downstream data is sufficient, the lower fine-tuning difficulty lets the model keep improving as the data grows; when downstream data is scarce, it significantly improves the model's few-shot learning performance.
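
As a rough illustration of the prompt idea (a generic sketch under assumed names, not the Pangu implementation), a downstream classification task can be rewritten as a fill-in-the-blank text so a pre-trained language model answers it through the same interface it was pre-trained on, which is what makes fine-tuning lighter and few-shot use feasible:

```python
# Hypothetical sketch of prompt-based task formatting; the template, verbalizer,
# and toy scorer below are illustrative stand-ins, not the Pangu NLP model.
TEMPLATE = "Review: {text} Overall, the experience was [MASK]."
VERBALIZER = {"great": "positive", "terrible": "negative"}  # label word -> label

def classify(text: str, score_mask_word) -> str:
    """score_mask_word(prompt, word) -> float, provided by any pretrained LM."""
    prompt = TEMPLATE.format(text=text)
    scores = {word: score_mask_word(prompt, word) for word in VERBALIZER}
    return VERBALIZER[max(scores, key=scores.get)]

def toy_scorer(prompt: str, word: str) -> float:
    # Stand-in for a real language model, so the sketch runs end to end.
    cues = sum(w in prompt.lower() for w in ("love", "excellent", "good"))
    return cues if word == "great" else 1 - cues

print(classify("I love this phone, excellent battery.", toy_scorer))  # -> positive
```
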
[Figure: Pangu NLP large model architecture]

In addition, in terms of model structure, and unlike the traditional NLP large models trained by other companies, Pangu values not only generation ability but also stronger understanding ability. Huawei adopts an encoder-decoder architecture to secure both the generation and the understanding performance of the Pangu large model.
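
The layout can be sketched with a standard Transformer encoder-decoder in PyTorch. This shows only the generic structure the paragraph refers to, with made-up sizes rather than Pangu's real configuration: the encoder builds a representation of the input (understanding), and the decoder generates output tokens conditioned on it (generation).

```python
# Generic encoder-decoder sketch (assumed toy sizes, not Pangu's actual config).
import torch
import torch.nn as nn

vocab, d_model = 1000, 128
embed = nn.Embedding(vocab, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=8,
                             num_encoder_layers=2, num_decoder_layers=2)
lm_head = nn.Linear(d_model, vocab)

src = torch.randint(0, vocab, (12, 4))  # (src_len, batch): text to understand
tgt = torch.randint(0, vocab, (7, 4))   # (tgt_len, batch): tokens generated so far

decoder_out = transformer(embed(src), embed(tgt))  # encoder reads src; decoder attends to it
logits = lm_head(decoder_out)                      # scores for the next token at each position
print(logits.shape)                                # torch.Size([7, 4, 1000])
```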

Pangu CV large model

For the Pangu CV large model, Xie Lingxi again started with an example: how do you distinguish a white cat from a white dog? Humans can tell at a glance which is the cat and which is the dog. So how does the large model handle this?

" We need to let the model understand the really strong correlation between these examples during the training process. " Xie Lingxi emphasized that one of the very important things in the image is the hierarchical information. "In the process of judging an image, we must first grasp the hierarchical information in the image, and be able to quickly locate which part of the information in the image is decisive, and let the algorithm focus on the more important places or content in an adaptive manner. , So it is easy to capture the relationship between the samples. In these two pictures, it is obvious that white is not the most important information, and the animal is the decisive information in the picture."

[Figure: Pangu CV large model architecture]

Based on this, the Pangu CV large model is, for the first time, designed for both image discrimination and image generation. It can meet the needs of low-level image processing and high-level semantic understanding at the same time, and it can incorporate industry knowledge through fine-tuning to quickly adapt to various downstream tasks.

In addition, to address the low learning efficiency and weak representation performance that come with large models and large amounts of data, the Pangu CV large model optimizes three aspects of the pre-training stage: data processing, architecture design, and model optimization. At present, the Pangu CV large model achieves leading small-sample classification accuracy on the ImageNet 1% and 10% data subsets.

The CV large model applies not only algorithms common in the industry but also Huawei's self-developed algorithms, such as forcing the model to take in hierarchical visual information so that it learns better.

Behind every self-developed algorithm is, in fact, the distilled experience the team gained from solving one difficulty after another.

It is difficult to develop large models, but fortunately there is help

Throughout the development of the Pangu large model there were many difficulties, such as the original algorithms mentioned above, because beyond the architecture and the data, the algorithm is a core technology.

Xie Lingxi described one of the technical difficulties in detail: whether for text or images, things that look similar at the representation level can be completely different at the semantic level.

"Starting from the problem, we found that visual features are a hierarchical capture process. Some of the features of representation are more concentrated in the shallow features, but when it comes to semantics, they are more reflected in the deep features. Therefore, we need to be different Align these features at a level so that you can learn better. Similarly, in NLP, you need to focus the model's attention on the most appropriate place. This key point is also found through a complex neural network, rather than casually Use algorithms to find key points in a paragraph of text."

This is a plain-language explanation; the technical details are considerably more complicated and hard to describe in the abstract. And this problem is only the tip of the iceberg. Throughout the development of the large model, Xie Lingxi and his team had to keep digging beneath surface symptoms to the essence of each problem and solve one such technical issue after another.

Another tricky issue is debugging and running the model. To gain more knowledge from pre-training, the data behind the Pangu large model will inevitably keep growing, which demands more of the underlying hardware platform. At that point, what determines the effect of pre-training is not the model itself but whether the infrastructure is well built.

For example, running a large model requires enough machines to supply sufficient computing power, but a single machine can hold at most 8 GPU cards. An NLP large model needs thousands of cards, and even a small CV large model needs 128 GPUs running at the same time, so there must be a very good mechanism for allocating resources rationally.

As the saying goes, even the cleverest cook cannot make a meal without rice. Xie Lingxi was also troubled at the start: who would support the running of the large model? Practice proved that HUAWEI CLOUD's multi-machine, multi-card parallel cloud platform played a big role. The Yundao platform can allocate resources easily, preventing infrastructure problems from holding back Pangu's R&D progress, and it can store data on the server in the most suitable format for more efficient reading during use.

Beyond that, large models are also hard from an engineering standpoint. Huawei's CANN, the MindSpore framework, and the ModelArts platform were co-optimized to fully unleash the computing power and provide strong backing for the Pangu large model:

  • For the performance of the underlying operators, Huawei CANN applies techniques such as operator quantization and operator fusion optimization, raising single-operator performance by more than 30%.
  • Huawei MindSpore innovatively adopts multi-dimensional automatic hybrid parallelism that combines pipeline parallelism, model parallelism, and data parallelism, greatly reducing the workload of manual parallel coding and improving cluster linearity by 20%. (A conceptual sketch of these three dimensions follows this list.)
  • The ModelArts platform provides exascale computing power scheduling and, combined with the physical network topology, offers dynamic routing planning for optimal network communication during large model training.
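
To make the three parallelism dimensions concrete, here is a NumPy-only sketch that simulates "devices" with array slicing; it illustrates the concepts only and is not MindSpore's implementation. Data parallelism splits the batch, model parallelism splits the weight matrix, and pipeline parallelism splits consecutive layers into stages, all computing the same math as a single device.

```python
# Conceptual sketch of data / model / pipeline parallelism (illustrative only;
# real frameworks add communication, gradient sync, and scheduling).
import numpy as np

batch, d_in, d_out, num_devices = 8, 16, 32, 4
x = np.random.randn(batch, d_in)
w = np.random.randn(d_in, d_out)

# Data parallelism: each device holds a full copy of w and a slice of the batch.
data_shards = np.array_split(x, num_devices, axis=0)
y_data_parallel = np.concatenate([shard @ w for shard in data_shards], axis=0)

# Model (tensor) parallelism: each device holds a column slice of w and the full batch.
weight_shards = np.array_split(w, num_devices, axis=1)
y_model_parallel = np.concatenate([x @ shard for shard in weight_shards], axis=1)

# Pipeline parallelism: consecutive layers live on different devices; micro-batches
# flow through the stages one after another.
w1, w2 = np.random.randn(d_in, d_in), np.random.randn(d_in, d_out)
stage1 = lambda a: np.tanh(a @ w1)   # "device 0"
stage2 = lambda a: a @ w2            # "device 1"
micro_batches = np.array_split(x, 2, axis=0)
y_pipeline = np.concatenate([stage2(stage1(mb)) for mb in micro_batches], axis=0)

# The first two partitionings reproduce the single-device result exactly.
assert np.allclose(y_data_parallel, x @ w)
assert np.allclose(y_model_parallel, x @ w)
print(y_pipeline.shape)  # (8, 32)
```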

But as we all know, a large model is "large" precisely because of its huge data and huge model, which makes training very expensive: training GPT-3, for example, is estimated to cost about 12 million US dollars. Xie Lingxi remarked, "Tuning a large model is hard in itself. Before each training run, you have to do verification work on many small scenarios. Every training run must be foolproof; you cannot have training already under way and then discover a bug."

Born for "applications", Pangu empowers more users

Breakthroughs in large-scale model training have also laid the track into the intelligent era for industries that lack large amounts of data. As Professor Tian Qi, chief scientist for artificial intelligence at Huawei Cloud and IEEE Fellow, said at the Pangu model's release, the Pangu model was born for industry applications, and it has unprecedented versatility, whether in 2B or 2C scenarios.

Industry knowledge comes from industry data. The Pangu team uses a large amount of industry speech and text data; fine-tuning on these data greatly improves the model's grasp of industry-specific intents and knowledge.

Take the Pangu CV large model as an example: it has shown strong application capability in the power-line inspection industry. It uses power-sector data for pre-training and combines it with a small number of labeled samples for fine-tuning, an efficient development mode that saves a great deal of manual labeling time. In terms of model versatility, Pangu's automatic data augmentation and class-adaptive loss function optimization strategies greatly reduce model maintenance costs.

Xie Lingxi also mentioned that, beyond industry applications, the Pangu large model is gradually being made available to developers in the AI asset sharing community (AI Gallery); invitation-based testing will open in stages, so stay tuned. On the platform, Pangu will provide easy-to-use workflows: developers with some foundation can do more customized development on top of the workflow to better unleash the pre-trained model's capability, while newcomers to AI development who simply want to use a large model for simple AI development will get a more approachable, drag-and-drop interface. Pangu will also follow up with a series of courses guiding developers to build applications for practical scenarios on top of the Pangu model.

On the other hand, Pangu also hopes to grow together with developers. "The large model is only a starting point; what matters is getting it applied to real scenarios. It not only helps users speed up training and shorten training time, but as the number of applications built on the model grows, users' costs naturally fall." Xie Lingxi said that Pangu cannot get far on the team's strength alone; the ecosystem has to be built together with developers.

Finally

Speaking of the future of the Pangu large model, Xie Lingxi has a simple little goal: to push Pangu to the next technological explosion point. The AI large model is the highest stage of deep learning so far; it may be a long straight road ahead, and everyone is waiting for the day of the leap. HUAWEI CLOUD has kept at it, using original technologies to push forward and solve the problems AI developers actually encounter, with the most essential purpose of empowering thousands of industries to implement AI.

The road is long and obstructed, but keep walking and we will arrive.

Just like its namesake, Huawei hopes to use the Pangu large model as a starting point to push AI to an unprecedented height, carry us toward the next generation of AI, and split open the "chaos" on the road ahead.
