ICCV (International Conference on Computer Vision), the biennial top academic conference in the field of computer vision, recently came to a successful close. At this year's conference, Meituan had two papers accepted, had another paper win the Best Paper Nomination Award at the HTCV workshop, and finished as runner-up in two well-known challenges, with work spanning face technology, human body technology, model optimization, low power consumption, and other fields. For the first time, Meituan also co-hosted the LargeFineFoodAI (large-scale fine-grained food analysis) workshop with the Institute of Computing Technology of the Chinese Academy of Sciences, Beijing Zhiyuan (the Beijing Academy of Artificial Intelligence), and the University of Barcelona. The workshop drew active participation and discussion from attendees across many time zones and promoted, on the international stage, the application of computer vision to food analysis, helping everyone eat better and live better.

ICCV is widely regarded as the most prestigious of the three top computer vision conferences, and its paper acceptance rate is low. This year, ICCV received a total of 6,236 valid submissions, of which 1,617 were accepted, an acceptance rate of just 25.9%. Chinese scholars accounted for almost half of the accepted papers, at 45.7%, nearly double the share of the second-place United States and nearly 13 times that of the third-place United Kingdom.

Meituan Holds Large-Scale Fine-Grained Food Analysis Workshop, Where Experts Discuss How Artificial Intelligence Can Support Food and Health

The workshop consisted of three parts: invited expert talks, challenge reports, and paper presentations. Experts and scholars offered insightful analysis and new problem definitions in the field of intelligent food analysis, discussed the direction and applications of computer vision in the food domain, and promoted the cross-disciplinary integration of computer vision, food science, and nutrition and health.

Professor Ramesh Jain of the University of California, Irvine, founding director of the Institute for Future Health, spoke about the importance of personalized food models. Such a model balances personal preference against the food the body needs in order to recommend the most suitable food to each user. Professor Kiyoharu Aizawa of the Department of Information and Communication Engineering at the University of Tokyo introduced a new food logging tool, FoodLog Athl, which can be used for diet-related health care and diet evaluation services. The tool supports food image recognition, nutritional diet evaluation, calculation of the nutritional value of food, and other functions. Professor Petia Radeva of the School of Mathematics and Computer Science at the University of Barcelona discussed why uncertainty estimation is necessary and demonstrated an uncertainty modeling method for food image recognition. In addition, papers submitted by Carnegie Mellon University and Purdue University were selected for presentation at the workshop.

Meituan Holds 2 Challenges in the Food Field to Promote Academic Exchange

At the same time, Meituan organized the first "Large-scale Food Image Recognition and Retrieval" Challenge, which attracted 143 teams from China and abroad. Participants included strong teams from universities such as Tsinghua University, the University of Science and Technology of China, Nanjing University of Science and Technology, the University of Barcelona, and Nanyang Technological University in Singapore, as well as from companies such as Alibaba, Shenlan Technology, OPPO, and Huanju Times.

As a leading life service platform in China, Meituan was the first to propose fine-grained analysis of food images using computer vision algorithms, in order to respond quickly to the needs of merchants and users for reviewing, managing, browsing, and evaluating large numbers of online food images. The datasets for the two tracks are drawn from Meituan's self-built dish image dataset "Food2K" and contain 1,500 categories and approximately 800,000 images. The images were taken by different individuals, with different equipment, in different environments, which makes this a rare image dataset that can fairly evaluate the robustness and effectiveness of an algorithm. Compared with other mainstream food image recognition datasets, "Food2K" is fully manually labeled, its noise ratio is kept within 1%, and its data distribution is consistent with real-world scenes. It is also built on a unified food classification standard that covers 2,000 food categories under 12 major groups of Chinese and Western food (pizza, for example, is subdivided into categories such as shrimp pizza and durian pizza).

Fine-grained food recognition and retrieval is harder than general image recognition and retrieval, because many different types of food look very similar, while the same type of food can look very different depending on how it is cooked. Differences in lighting, shooting angle, and background also affect the accuracy of the algorithm, and even professionals find it hard to tell dishes apart quickly and accurately. In the end, based on the competition results and technical solutions, the teams from Huanju Times, Nanjing University of Science and Technology, and OPPO took the top three places in the recognition track, and the teams from Shenlan Technology, the University of Science and Technology of China, and OPPO took the top three places in the retrieval track.

Meituan's Results at ICCV 2021: 2 papers accepted at the main conference, workshop papers accepted in the low-power and ReID fields, and runner-up finishes in challenges

At this ICCV conference, Meituan had two papers accepted in total. They are:

Paper title: "Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning"

Abstract: This paper proposes a general training framework for image classification in open-set semi-supervised learning scenarios. It designs a multi-modal matching mechanism that is compatible with the objective of the image classification task and uses it to remove outlier samples from the unlabeled data before the subsequent semi-supervised image classification task. It also uses self-supervised learning to make full use of all unlabeled data (including the outlier samples), which strengthens the ability of the model's feature extractor to understand the high-level semantics of an image.

Paper title: "Learn to Cluster Faces via Pairwise Classification"

Abstract: This paper proposes a fast face clustering method based on pairwise classification, which addresses the memory dependence and efficiency problems of inference on large-scale data. To reduce the effect that outliers have on cluster center estimation in clustering tasks, it also proposes a rank-weighted density calculation, which guides the selection of pairs in the prediction stage. The calculation weights the similarity between samples with a monotonically decreasing function of the k-nearest-neighbor rank, which estimates cluster centers more accurately and further improves clustering accuracy. The method achieves SOTA performance on public datasets such as MS1M and IJB-B.
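The rank-weighted density idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the exponential decay weight, the `decay` parameter, and the toy 2-D features are all assumptions chosen for illustration, since the abstract only states that the weight decreases monotonically with the k-nearest-neighbor rank.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_weighted_density(features, k, decay=1.0):
    """Density of each sample: similarities to its k nearest neighbours,
    each weighted by a monotonically decreasing function of its rank.

    The weight exp(-decay * rank) is a hypothetical choice; any
    monotonically decreasing function of the rank fits the description.
    """
    densities = []
    for i, fi in enumerate(features):
        # Similarities to all other samples, most similar first.
        sims = sorted(
            (cosine(fi, fj) for j, fj in enumerate(features) if j != i),
            reverse=True,
        )[:k]
        # rank 0 is the nearest neighbour and gets the largest weight.
        densities.append(
            sum(math.exp(-decay * rank) * s for rank, s in enumerate(sims))
        )
    return densities


# Toy data (hypothetical): three nearby samples and one outlier.
features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
densities = rank_weighted_density(features, k=2)
```

In this toy example the sample in the middle of the tight group receives the highest density and the outlier the lowest, which is the property that would make high-density samples natural candidates for cluster centers.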

Meituan also finished as runner-up in both the 5th LPCV (Low-Power Computer Vision) international challenge and the VIPriors person re-identification challenge.

In addition, at the HTCV workshop, a Meituan paper in the ReID field won the Best Paper Nomination Award:

Paper title: "Transformer Meets Part Model: Adaptive Part Division for Person Re-Identification"

Abstract: Part-based methods have become the mainstream approach in person re-identification, and there are two main ways to implement them. One is to divide each pedestrian into several fixed regions, but misalignment between pedestrian images degrades performance. The other is to introduce an additional pose estimation or pedestrian segmentation model, but this requires more computation and more annotated data. Inspired by the recent Vision Transformer, this paper proposes an adaptive part division method that automatically extracts different important local features, requires no additional annotation, and adds very little computation. The method currently achieves internationally leading results on the four most widely used datasets: Market-1501, CUHK03, DukeMTMC ReID, and MSMT17.

Following the AI Trend and Contributing to the Construction of Digital China

With the core goal of "helping everyone eat better and live better", Meituan AI is committed to exploring cutting-edge artificial intelligence technology in real business scenarios and putting it into practice quickly in real life service scenarios. The Visual Intelligence Department of Meituan is committed to building world-class core vision technology and platform services. Its technical work currently covers image processing, text recognition, video analysis, face and body recognition, visual perception for autonomous driving, and other fields. While accumulating internationally and domestically leading technical results, the department also pursues both methodological innovation and the translation of research into practice, providing in-depth support for business scenarios such as retail, smart transportation, logistics and warehousing, and unmanned delivery.

The "14th Five-Year" plan states that China must unswervingly build a digital China and accelerate digital development. As the digital wave sweeps in, artificial intelligence has become a new focus of competition and a key part of China's economic development. Meituan is using its artificial intelligence capabilities to keep exploring more application scenarios, so that more users can enjoy the benefits of technology, and to contribute to the construction of Digital China.


Meituan Technical Team