A few days ago, the JD Research Institute and the University of Sydney proposed ViTAEv2, a very large deep learning model that is larger in scale, performs better, and adapts better to a variety of vision tasks. Notably, the ViTAEv2 model with 600 million parameters, without relying on any external data, ranked first in the world in classification accuracy on the ImageNet ReaL dataset, reaching 91.2% and setting a world-class record in image classification.

ImageNet has long been regarded as the largest public dataset for image classification. Its accuracy leaderboard has drawn the attention and participation of leading international technology companies, including Google, Microsoft, and Facebook, and of well-known universities such as Stanford University, the Massachusetts Institute of Technology, and the National University of Singapore. Its metrics have been widely used as an important standard for measuring the level of computer vision technology, with far-reaching influence.

As one of the core technologies of artificial intelligence, computer vision aims to give machines the ability to observe, perceive, and understand, and image classification is widely recognized as its basic task. The ViTAEv2 model that topped the list this time adopts the "pre-training and fine-tuning" paradigm. It makes breakthroughs in both model architecture and training paradigm, makes full use of the effectiveness of inductive bias in large-scale models, and adapts the model structure, the pre-training algorithm, and the transfer learning algorithm to achieve the target result.

"In addition, we also explored the few-shot learning capability of the large-scale ViTAEv2 model by fine-tuning it on 1%, 10%, and 100% of the data, respectively. When fine-tuned on only part of the data, the large-scale model already performs significantly better than the smaller model fine-tuned on all of the data. This further confirms that the large-scale model has strong few-shot learning ability, and shows that very deep models have strong representation ability, learning ability, and sample efficiency," the JD Research Institute said.
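The data-fraction experiment described above depends on drawing 1%, 10%, and 100% subsets of a labeled training set. The following is only a minimal sketch of one common way to do that, class-stratified sampling; the article does not say how the subsets were actually selected, and the helper `stratified_subset` and the toy labels are hypothetical names introduced purely for illustration.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices covering roughly `fraction` of every class.

    Hypothetical helper for illustration; not taken from ViTAEv2.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the subsets reproducible
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one example per class so no class disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy labels (assumption): 10 classes with 200 examples each.
labels = [c for c in range(10) for _ in range(200)]
subsets = {f: stratified_subset(labels, f) for f in (0.01, 0.10, 1.00)}
for f, idxs in subsets.items():
    print(f"{f:.0%} of data -> {len(idxs)} examples")
```

Each subset could then be passed to the same fine-tuning routine, so that the only variable across runs is the amount of labeled data.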

This result verifies that the ViTAEv2 model can help solve challenging tasks with few or even zero resources. It also shows the model's outstanding ability to reduce data labeling costs, shorten algorithm development cycles, simplify model deployment, and promote the development and practical deployment of a new generation of automated learning technologies.

The excellent performance of the ViTAEv2 model has helped the computer vision models of the JD Research Institute reach a new level, and it is expected to continue advancing a series of vision tasks, such as semantic segmentation, object detection, pose estimation, and video object segmentation. How to further improve the performance of ViTAEv2 through training methods and model architecture design, while reducing the cost of training and inference, is a research direction worth further exploration.


JD Cloud Developers

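JD Cloud Developers (Developer of JD Technology) is a JD Cloud platform for technical sharing and exchange among developers in AI, cloud computing, IoT, and related fields.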