On June 1, the third Beijing Zhiyuan Conference officially opened. At the opening ceremony, the Zhiyuan Research Institute released "Enlightenment 2.0", the world's largest super-large-scale intelligent model.
The "Enlightenment 2.0" model has a parameter scale of 1.75 trillion, ten times that of GPT-3, breaking the record of 1.6 trillion parameters previously set by Google's Switch Transformer pre-trained model. It is China's first trillion-parameter-scale model and currently the largest model in the world.
From 1.0 to 2.0: "Enlightenment" explores general artificial intelligence
On March 20 this year, the Zhiyuan Research Institute released the super-large-scale intelligent model "Enlightenment 1.0", which trained a series of models for Chinese, multimodality, cognition, and protein prediction. Introducing the thinking behind "Enlightenment", Professor Huang Tiejun, president of the Zhiyuan Research Institute, noted that in recent years the development of artificial intelligence has gradually moved from a stage of "mass-producing models" toward one of "refining large models": it is an inevitable trend to integrate as much data as possible through advanced design, pool a large amount of computing power, and intensively train large models for use by a large number of enterprises.
The "Enlightenment 2.0" released yesterday is another successful exploration of the "Large Model".
From GPT-2 with 1.5 billion parameters, to GPT-3 with 175 billion, to the Switch Transformer with 1.6 trillion, deep learning models have embraced a brute-force aesthetic of scale, yet none of these models are based on Chinese. With 1.75 trillion parameters, Enlightenment 2.0 is not only a breakthrough in parameter count; it is also the first trillion-parameter-scale Chinese pre-trained model. Zhang Hongjiang, chairman of the Zhiyuan Research Institute, believes that "large models plus large computing power" is currently a feasible path toward general artificial intelligence.
Professor Tang Jie, academic vice president of the Zhiyuan Research Institute, said that "Enlightenment" aims to create cognitive intelligence driven by the twin wheels of data and knowledge, allowing machines to think like humans and to achieve cognitive capabilities beyond the Turing test. In developing large-scale pre-trained models, the "Enlightenment" team has done a great deal of foundational work and formed an independent technical innovation system for super-large-scale intelligent models, covering the complete chain from pre-training theory and techniques, to pre-training tools, to model construction and evaluation; the chain is technically complete and mature. Through a series of original innovations and technical breakthroughs, the newly released "Enlightenment 2.0" achieves being "big and smart", featuring large scale, high precision, and high efficiency.
Enlightenment 2.0: "Big and Smart"
The parameter scale of Enlightenment 2.0 reached a record-breaking 1.75 trillion. According to reports, the new generation of FastMoE technology is the cornerstone that made the "trillion-parameter model" of Enlightenment 2.0 possible.
Previously, MoE (Mixture of Experts), the core technology behind Google's trillion-parameter model, was tightly bound to Google's distributed training framework and custom hardware, so most researchers had no opportunity to use or study it. FastMoE, developed and open-sourced by the "Enlightenment" team, is the first MoE system to support the PyTorch framework. It is simple to use, flexible, and high-performance, and supports large-scale parallel training. The new generation of FastMoE supports complex load-balancing strategies such as those of Switch and GShard, allows different experts to use different model structures, and fills in the last missing piece for realizing the trillion-parameter model.
Figure: FastMoE's data-parallel mode, in which each worker hosts multiple experts and data parallelism runs across workers. A top-2 gate means the gating network selects the two expert networks with the highest activation scores.
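To make the routing concrete, below is a minimal single-device sketch of a top-2 gated mixture-of-experts layer in plain PyTorch. It illustrates the gating mechanism described above, not FastMoE's actual API; the class name and expert design are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Minimal top-2 gated mixture-of-experts layer (illustrative sketch only)."""

    def __init__(self, d_model: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # the gating network
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)            # (num_tokens, num_experts)
        top_val, top_idx = scores.topk(self.top_k, dim=-1)  # keep the 2 highest-scoring experts
        out = torch.zeros_like(x)
        for rank in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, rank] == e                # tokens routed to expert e at this rank
                if mask.any():
                    out[mask] += top_val[mask, rank].unsqueeze(-1) * expert(x[mask])
        return out

moe = Top2MoE(d_model=64, num_experts=8)
y = moe(torch.randn(16, 64))  # 16 tokens, each processed by its top-2 experts
```

In a distributed system such as FastMoE, the experts would additionally be sharded across workers and tokens exchanged between workers before and after expert computation; this sketch keeps everything on one device to show only the gating logic.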
The high precision of "Enlightenment 2.0" comes from a series of core technological innovations, for example:
- GLM 2.0: an innovation in model architecture and a more general pre-trained model. Its first version broke the barrier between BERT and GPT for the first time, pioneering a single model compatible with all mainstream architectures. The new generation exemplifies doing more with less: with only 10 billion parameters, it matches Microsoft's 17-billion-parameter Turing-NLG model and achieves better results on multiple tasks.
- P-tuning 2.0: an algorithm that greatly narrows the gap between few-shot learning and fully supervised learning, putting its few-shot ability far ahead (see the sketch after this list).
- CogView: a new framework for text-to-image generation. It overcomes the key problem of numerical overflow and non-convergence when training joint text-image models, combines VQ-VAE with a Transformer, and delivers SOTA (state-of-the-art) performance: its FID on MS COCO is better than DALL·E and other models. The model can also score its own generations, similar to OpenAI's CLIP, and can generate multiple styles such as Chinese painting, oil painting, cartoon, and contour drawing.
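As a concrete illustration of the P-tuning idea mentioned above, learning continuous prompt embeddings while reusing a pre-trained backbone, here is a minimal PyTorch sketch. The tiny stand-in backbone and all names are hypothetical, and this shows only the shallow variant; P-tuning 2.0 additionally inserts trainable prompts at every layer of the model.

```python
import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    """Illustrative continuous-prompt tuning: only the prompt and head are trained."""

    def __init__(self, backbone: nn.Module, d_model: int,
                 num_prompts: int = 20, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the pre-trained backbone
            p.requires_grad = False
        # trainable continuous prompt vectors, prepended to every input sequence
        self.prompt = nn.Parameter(torch.randn(num_prompts, d_model) * 0.02)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, d_model)
        batch = token_embeds.size(0)
        prompts = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        h = self.backbone(torch.cat([prompts, token_embeds], dim=1))
        return self.head(h.mean(dim=1))        # mean-pool, then classify

# Stand-in for a real pre-trained encoder (hypothetical, for demonstration only)
d_model = 64
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
model = PromptTunedClassifier(backbone, d_model=d_model)
logits = model(torch.randn(8, 32, d_model))    # 8 sequences of 32 token embeddings
```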
In addition, during the development of the "Enlightenment" models, the Zhiyuan Research Institute built WuDaoCorpora2.0, the world's largest corpus database, which includes the world's largest Chinese text dataset (3 TB), the world's largest multimodal dataset (90 TB), and the world's largest dialogue dataset (181 GB), providing rich data support for the industry's research and development of large-scale intelligent models.
Beyond the release of the Enlightenment 2.0 model, this AI event invited more than 200 top experts in artificial intelligence from China and abroad for in-depth discussions of cutting-edge research progress and trends in the field. Around the international academic frontiers and industrial hot spots of artificial intelligence, the conference set up 29 thematic forums, including "Pre-trained Models", "Machine Learning", "Swarm Intelligence", "Mathematical Foundations of Artificial Intelligence", "Intelligent System Architecture and Chips", "Precision Intelligence", "Intelligent Information Retrieval and Mining", "Qingyuan Academic Annual Conference", "AI Entrepreneurship", "AI Pharmaceuticals", "AI Systems", "AI Openness and Sharing", and "Women in AI Technology".
At the opening ceremony on June 1, Turing Award winner Yoshua Bengio, Dr. Zhu Min, dean of the National Institute of Financial Research at Tsinghua University, and E Weinan, academician at Peking University, delivered keynote speeches on System 2 logical reasoning, data assets, and science and intelligence, respectively.
For more details, please refer to the official website of the conference: https://2021.baai.ac.cn/