
Author: Zhijia Liu, Product Manager, Alibaba Cloud Intelligence

To watch the live video, click the live broadcast link.

Search has always been one of the main sources of traffic in the e-commerce industry. Building e-commerce search and improving its results has long been a hard problem for e-commerce developers. Basic search services can be built on traditional databases or open source engines, but as product data and business traffic grow, they inevitably run into bottlenecks in both performance and search quality. At the same time, with the continued development of e-commerce, live streaming, cloud computing, and related technologies, more and more traditional retailers are moving their business online and onto the cloud, a trend accelerated by factors such as the epidemic of the past two years. Online business has become an important source of growth for retail companies. In this context, quickly building an efficient search service has become a hard problem in the retail industry's cloud migration and transformation.

To solve these two problems, the Alibaba Cloud Computing Platform Division has launched a search solution for the e-commerce and retail industries based on MaxCompute and Open Search. It provides a single search development platform covering product data storage, product library building, search, and tuning.

This article describes how to build e-commerce search services quickly and efficiently on MaxCompute and Open Search. It covers four topics: product introduction, characteristics of the e-commerce industry, industry search development practice, and more solutions.

1. Product introduction

Introduction to MaxCompute

Simple, easy-to-use, fully managed open service

MaxCompute is a fully managed, analytics-oriented, enterprise-grade cloud data warehouse that Alibaba Cloud delivers in SaaS mode. It is simple to use and scales elastically to match business growth. For cloud developers, MaxCompute supports multiple analytics scenarios, including machine learning, data lakes, traditional data warehouses, and near-real-time data warehouses, and provides a more open development ecosystem.


Serverless flexible data warehouse

To minimize cost while meeting differentiated needs, MaxCompute provides a fast, fully managed online data warehouse service built on a serverless architecture. This removes the resource scalability and elasticity limits of traditional data platforms and meets users' needs for business agility, scheduling around cyclical fluctuations, assurance for critical tasks, and stability and predictability. It also minimizes operations and maintenance effort, so that users can analyze and process massive amounts of data economically and efficiently. These features make MaxCompute well suited to the e-commerce and retail industries and to the computing and storage needs of their developers.


In addition, MaxCompute provides serverless data access services, multiple computing environments, storage services, and resource management. This greatly reduces operations and maintenance costs and lets users focus on expanding and developing their own business.


Open ecosystem

MaxCompute offers a rich and comprehensive open ecosystem. It spans the product's own open ecosystem, the Alibaba Cloud product and solution ecosystem, the data application ecosystem, and integration with open source engines and tools. On top of MaxCompute, developers can freely choose how to develop their business and more flexibly build customized solutions.


Continuing to build an open product ecosystem

MaxCompute's data warehouse, which integrates offline processing, real-time processing, analysis, and serving, is particularly suited to scenarios such as enterprise real-time data warehouses, interactive queries for BI reports, and user profile analysis. These scenarios are an indispensable part of storing, guiding, and analyzing product data and user behavior in the e-commerce industry.

Within Alibaba Group, MaxCompute serves as the best practice for real-time query scenarios during Double 11. It supports write throughput of hundreds of millions of TPS and sub-second query responses over PB-scale data, which fully meets the timeliness requirements of e-commerce promotion scenarios. Based on these characteristics, MaxCompute has become the preferred storage and computing service for developers in the e-commerce industry.


As mentioned earlier, MaxCompute supports multiple open ecosystems, such as integration with open source software and with mainstream commercial software. It can also be combined with other Alibaba Cloud products into one-stop solutions for building big data applications commonly used in e-commerce, such as search and recommendation. For search services in the e-commerce and retail industries in particular, MaxCompute can be linked with another cloud product, Open Search, to form a one-stop search development platform.


Introduction to Open Search

Open Search originates from Alibaba Group's search business. It is an intelligent search cloud service built on an online serving system for big data and deep learning. Within Alibaba Group, it serves more than 500 businesses, including the Taobao family, Tmall, Hema, and Cainiao, and handles tens of billions of search requests per day. During Double 11, it stably supported search for products across Alibaba Group, with peak search QPS for a single business exceeding one million. Open Search has been commercially available on Alibaba Cloud since 2014 and now serves thousands of customers, including hundreds of e-commerce and retail companies.


One-stop intelligent search business development platform

Open Search provides the core engine, recall and ranking, search guidance, and other services and capabilities for the stages before, during, and after search, enabling one-stop search development. For experienced search developers, Open Search opens up multiple stages, such as application structure, recall, sorting, and algorithms, to meet personalized customization needs. For newcomers with no search background, and for product and operations staff, Open Search provides industry templates for e-commerce, education, and other industries, so that a better search service can be built in one click to help companies meet their business goals.

For the e-commerce industry in particular, Open Search provides search methods and solutions for multiple scenarios, such as product search, order search, store search, and database acceleration and analysis.


2. Characteristics of the e-commerce industry

The e-commerce industry is strongly transaction-oriented and GMV-oriented. Its ultimate goal is to drive more, and higher-value, purchases so that the platform, buyers, and sellers all benefit. Search and recommendation are currently the most important traffic entry points in the industry. In the three apps shown in the figure, the search entry sits at the core position of the whole app so that users find it immediately. Below it are entries to sub-applications or product category filters, and below those is the recommendation feed. Data shows that more than 90% of GMV comes from traffic guided by search and recommendation.

When a user with a clear purchase need opens an e-commerce app, they will very likely search for the target product. The purchase rate and conversion rate in this scenario are very high, so search quality is very important for the e-commerce industry.

[Figure: the search entry in three e-commerce apps]

So how do you measure the effectiveness of search? Based on many years of search experience in the e-commerce industry, we divide the core indicators of e-commerce search into effect indicators and performance indicators. Effect indicators include click-through rate and no-result rate. Performance indicators include search response time and data synchronization latency. Simply put, the goal is to let end users find the target product faster and more accurately.
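As a rough illustration of the two effect indicators, the sketch below computes click-through rate and no-result rate from a small in-memory search log. The `SearchEvent` record and its fields are assumptions made for this example, not the Open Search log format, and click-through rate is taken here as the share of searches with at least one click, which is only one of several common definitions.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    query: str
    result_count: int  # number of products returned for the query
    clicked: bool      # whether the user clicked at least one result


def click_through_rate(events):
    """Share of searches that led to at least one click."""
    if not events:
        return 0.0
    return sum(e.clicked for e in events) / len(events)


def no_result_rate(events):
    """Share of searches that returned no products at all."""
    if not events:
        return 0.0
    return sum(e.result_count == 0 for e in events) / len(events)


# Hypothetical log entries, for illustration only.
events = [
    SearchEvent("nike basketball shoes", 120, True),
    SearchEvent("huawei mobile phone", 85, False),
    SearchEvent("nikee shose", 0, False),
    SearchEvent("organic millet", 12, True),
]

print(click_through_rate(events))  # 0.5
print(no_result_rate(events))      # 0.25
```

Tracking both numbers over time, before and after a tuning change, is one simple way to see whether the change helped.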

In addition, search queries in the e-commerce industry differ from queries in other industries. E-commerce users habitually stack keywords: when a query fails to find the desired product, they keep appending supplementary terms to narrow the results. As a result, word order matters less for e-commerce queries than it does in other industries. For example, "Huawei mobile phone" and "mobile phone Huawei" can be treated as the same search. Moreover, many general-purpose e-commerce apps carry products from many industries, so the same word can mean different things in different contexts. Xiaomi (which also means "millet" in Chinese) is a phone brand when followed by "mobile phone", but a product category when preceded by "organic".

Because of these query characteristics, users who build their own search on databases or open source engines often run into problems such as low recall for colloquial queries, poor document relevance, and unsatisfactory ranking. These problems hurt search quality and can even reduce purchase conversion.

In terms of user intent recognition, the same word entered by different users in different scenarios may cover products in many fields. For example, a user who enters "Apple" may mean any of several categories, such as mobile phones, fruit, tablets, earphones, or laptops. This is one of the bad cases often encountered in the early stages of building e-commerce search on open source solutions.

So how can we solve these problems and bad cases, optimize search results in the e-commerce industry, and increase search-driven GMV?
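The two query characteristics described above can be sketched in a few lines. This is a toy illustration under stated assumptions: queries are compared as bags of whitespace-separated terms, and `CONTEXT_SENSES` is a hypothetical hand-written rule table, not how a production query analyzer resolves word senses.

```python
from collections import Counter


def term_bag(query):
    """Order-insensitive view of a query: a multiset of lowercase terms."""
    return Counter(query.lower().split())


def same_search(query_a, query_b):
    """True when two queries contain the same terms, in any order."""
    return term_bag(query_a) == term_bag(query_b)


# Hypothetical context rules: a neighbouring term decides how "xiaomi" is read.
CONTEXT_SENSES = {"xiaomi": {"phone": "brand", "organic": "category"}}


def term_sense(term, query):
    """Look up the sense of `term` from the other terms in the query."""
    context = set(query.lower().split()) - {term}
    for cue, sense in CONTEXT_SENSES.get(term, {}).items():
        if cue in context:
            return sense
    return "unknown"


print(same_search("Huawei mobile phone", "mobile phone Huawei"))  # True
print(term_sense("xiaomi", "Xiaomi mobile phone"))                # brand
print(term_sense("xiaomi", "organic xiaomi"))                     # category
```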


3. Industry search development practice

MaxCompute + Open Search industry search development practice

E-commerce search involves multiple dimensions, such as product data, search queries, and user behavior, as well as multiple stages before, during, and after the search. When working with different companies, we hear all kinds of questions from customers. Those without search experience may ask: how do I build a product library? How do I accurately understand the user's query intent? Experienced developers may ask: how do I give users a personalized search experience? How do I guarantee performance under high concurrency?

To help developers in the e-commerce and retail industries solve these problems faster and better, MaxCompute and Open Search jointly offer a corresponding industry search solution.

In general, users transfer the product data and behavior data stored in MaxCompute to Open Search, either through automatic database synchronization or through API/SDK synchronization. They then customize query analysis, sorting, search guidance, intervention, and extension functions in Open Search. The result is a fully managed, O&M-free search solution for the e-commerce industry that offers high performance, highly real-time updates, high reliability, and better search results.


Following the user's actual search behavior, this solution can be broken down into five key steps: building the search application, entering a query, recognizing user intent, accessing the search engine, and returning search results. These correspond to the development of five modules: MaxCompute product library building, search guidance, query analysis, the search engine, and the sorting service.


Product library building

In the product library building stage, users store their product data and user behavior data in MaxCompute. For the convenience of e-commerce developers, Open Search provides e-commerce industry templates, with which users can create a search application structure in one click and build the library quickly. Next, users define the type and meaning of the fields in each table, and the relationships between tables, based on the fields in MaxCompute or on a custom application structure in Open Search. Then, according to the search requirements of each business scenario, different fields are combined into a target index, and searches run against the corresponding index. For example, product names, store names, and product categories are all common search fields in e-commerce, and they can be combined into a single index. When the user enters a query, related products, stores, and other information are searched across these fields. Once the index structure is defined, Open Search starts building the search service. When the application status becomes "available", the basic version of the search service is ready.
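The idea of combining several fields into one target index can be sketched with a small in-memory inverted index. This is only an illustration: the product rows, the field names (`title`, `shop`, `category`), and the whitespace tokenization are assumptions for the example and do not reflect the actual Open Search application schema or its e-commerce word segmenter.

```python
from collections import defaultdict

# Hypothetical product rows, as they might be exported from a MaxCompute table.
PRODUCTS = [
    {"id": 1, "title": "nike basketball shoes high top",
     "shop": "nike official store", "category": "sports shoes"},
    {"id": 2, "title": "huawei mobile phone",
     "shop": "huawei flagship store", "category": "mobile phones"},
    {"id": 3, "title": "organic millet 1kg",
     "shop": "green farm", "category": "grains"},
]

# Fields that are merged into a single composite index.
INDEX_FIELDS = ("title", "shop", "category")


def build_index(products, fields=INDEX_FIELDS):
    """Map each term to the ids of products containing it in any indexed field."""
    index = defaultdict(set)
    for product in products:
        for field in fields:
            for term in product[field].lower().split():
                index[term].add(product["id"])
    return index


def search(index, query):
    """Return ids of products that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


index = build_index(PRODUCTS)
print(search(index, "nike shoes"))       # {1}
print(search(index, "huawei flagship"))  # {2}
```

Because the shop name and category are indexed together with the title, the query "huawei flagship" matches product 2 even though "flagship" never appears in its title.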


Search guide

Before the user enters a query, e-commerce apps often offer some preset queries. This process is called search guidance. Common pre-search guidance modules include hot searches and shading. Hot searches present popular search terms, derived from recent hot events and user search behavior, that users can click to search directly. Shading is a preset query displayed in the search box before the user types anything; clicking the search button searches for that query. Hot searches and shading are an important part of the search process. On the one hand, they guide users' search behavior and make tuning in later stages easier. On the other hand, they can be aligned with different operational goals at different times to improve search-driven purchases. Open Search currently supports automatic training of hot search and shading models, and also supports manual intervention at specified times and positions through blacklists and whitelists, so that operations staff can steer the guidance manually.
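A minimal sketch of hot searches and shading with manual intervention might look like the following. It simply counts query frequency; the Open Search models are trained automatically and are not reproduced here. The function names and the `blacklist` and `pinned` parameters are assumptions for illustration, with `pinned` playing the role of a whitelist that forces terms to the top positions.

```python
from collections import Counter


def hot_searches(query_log, top_n=3, blacklist=(), pinned=()):
    """Rank queries by frequency, drop blacklisted ones, put pinned ones first."""
    blocked = set(blacklist)
    counts = Counter(q.strip().lower() for q in query_log)
    ranked = [q for q, _ in counts.most_common()
              if q not in blocked and q not in pinned]
    result = [q for q in pinned if q not in blocked] + ranked
    return result[:top_n]


def shading_word(query_log, **kwargs):
    """Default query shown in the search box: here, simply the top hot search."""
    hot = hot_searches(query_log, top_n=1, **kwargs)
    return hot[0] if hot else None


# Hypothetical query log, for illustration only.
log = ["phone"] * 3 + ["shoes"] * 2 + ["gift card"] * 4 + ["tv"]

print(hot_searches(log))
# ['gift card', 'phone', 'shoes']
print(hot_searches(log, blacklist=["gift card"], pinned=["summer sale"]))
# ['summer sale', 'phone', 'shoes']
print(shading_word(log))
# gift card
```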

Another commonly used form of search guidance is the drop-down suggestion: as the user types a query, candidate queries are suggested automatically, which reduces input cost and directs traffic. Open Search currently supports several ways to build drop-down suggestion models, along with extensions such as high-frequency search terms, historical search terms, intelligent sorting, and manual intervention.
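One simple way to picture drop-down suggestions is prefix matching over historical queries, ranked by how often each query was searched. The sketch below assumes exactly that; the `suggest` function is hypothetical and much simpler than a trained suggestion model.

```python
from collections import Counter


def suggest(prefix, query_log, limit=5):
    """Suggest past queries that start with the typed prefix, most frequent first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    counts = Counter(q.strip().lower() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical historical queries, for illustration only.
log = (["nike shoes"] * 3 + ["nike socks"]
       + ["nike basketball shoes"] * 2 + ["huawei phone"])

print(suggest("ni", log))
# ['nike shoes', 'nike basketball shoes', 'nike socks']
print(suggest("nike b", log))
# ['nike basketball shoes']
```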

Search guidance through hot searches, shading, and drop-down suggestions improves the user's search experience and lets operations staff steer traffic toward purchase conversion.


User intention recognition

A search request starts once the user clicks a guided query or enters a query manually.

First, we need to understand the user's actual search intent. As mentioned earlier, e-commerce users sometimes use colloquial expressions or stack keywords when entering a query. We therefore need to convert the query, which the user phrases in terms of a purchase need, into a structured, relatively clear, and standardized form. This is the user intent recognition process.

Common user intent recognition functions include synonym expansion, stop word removal, spelling correction and rewriting, entity recognition, and category prediction.


Next, we use an example to introduce the user intent identification process in detail.

Suppose the user enters a query meaning "NIKE's basketball shoes, high top". First, punctuation and letter case are normalized, so "NIKE" becomes "nike". Next, the e-commerce word segmenter splits the query into terms: nike, 的 (the possessive particle), basketball shoes, high top. Stop word removal follows: "的" is configured as a meaningless word and is dropped, leaving nike, basketball shoes, high top. Spelling correction then fixes any typos in the terms. After that, entity recognition, a technique widely used in the industry, labels the meaning of each term: nike is a brand, basketball shoes is a category, and high top is a style. Open Search also supports category prediction. Based on these results, each term in the query is assigned a weight: high for nike, medium for basketball shoes, and medium for high top. Synonym expansion then broadens the terms, for example (nike OR 耐克) basketball shoes high top, where 耐克 is the Chinese brand name of Nike. Finally, a rewriting step outputs a query that the engine can understand, and this query is sent to the search engine.
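The steps of this example can be sketched as a small dictionary-driven pipeline. It is a toy stand-in for the real query analysis: all dictionaries (`STOP_WORDS`, `SPELLING_FIXES`, `ENTITIES`, `WEIGHT_BY_ENTITY`, `SYNONYMS`) are hand-written assumptions, whitespace splitting replaces the e-commerce word segmenter, the misspelling "baskteball" is invented to exercise the correction step, and category prediction is left out.

```python
import re

# Hypothetical dictionaries, for illustration only.
STOP_WORDS = {"的", "s", "of", "the"}
SPELLING_FIXES = {"baskteball": "basketball"}
ENTITIES = {"nike": "brand", "basketball shoes": "category", "high top": "style"}
WEIGHT_BY_ENTITY = {"brand": "high", "category": "medium", "style": "medium"}
SYNONYMS = {"nike": ["耐克"]}


def analyze(query):
    """Run a toy query-analysis pipeline and return each intermediate result."""
    # 1. Normalize case and punctuation, then split on whitespace.
    words = re.sub(r"[^\w\s]", " ", query.lower()).split()
    # 2. Drop stop words and 3. fix known typos.
    words = [SPELLING_FIXES.get(w, w) for w in words if w not in STOP_WORDS]
    # 4. Merge adjacent words that form a known two-word entity.
    terms, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in ENTITIES:
            terms.append(pair)
            i += 2
        else:
            terms.append(words[i])
            i += 1
    # 5. Tag each term with an entity type and a weight.
    entities = {t: ENTITIES.get(t, "unknown") for t in terms}
    weights = {t: WEIGHT_BY_ENTITY.get(entities[t], "low") for t in terms}
    # 6. Expand synonyms and assemble the rewritten query.
    parts = []
    for t in terms:
        alternatives = [t] + SYNONYMS.get(t, [])
        parts.append("(" + " OR ".join(alternatives) + ")"
                     if len(alternatives) > 1 else t)
    return {"terms": terms, "entities": entities,
            "weights": weights, "rewritten": " ".join(parts)}


result = analyze("NIKE's baskteball shoes, high top")
print(result["terms"])      # ['nike', 'basketball shoes', 'high top']
print(result["rewritten"])  # (nike OR 耐克) basketball shoes high top
```

Keeping every intermediate result in the returned dictionary makes it easier to see which step is responsible when a query is misunderstood.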


Search engine recall

After the query is rewritten, it enters the search engine recall stage. Open Search provides multiple recall strategies, including text recall, personalized recall, and vector recall. Text recall is the most common strategy in the search field: it matches the rewritten query against the text of the product data by relevance and uses an inverted index to retrieve results. Open Search uses Wentian 3, the text search engine developed in-house at Alibaba Group, which handles high-concurrency, write-heavy workloads with high performance and returns results faster. Personalized recall adds the user's personal information to the rewritten query and returns results tailored to each individual user. Vector recall adds vector information to the rewritten query and returns results based on the vector similarity between the query and the product data. Traditional text search may miss results that look unrelated on the surface but are actually what the user wants, and vector recall addresses this. Running text recall and vector recall together as multi-channel recall can greatly reduce the no-result rate and improve search quality.
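The benefit of combining text recall with vector recall can be shown with a tiny example. Everything here is an assumption for illustration: the three-dimensional vectors are hand-made rather than produced by an embedding model, a linear scan stands in for the inverted index and for approximate nearest-neighbour search, and the similarity threshold of 0.9 is arbitrary.

```python
import math

# Hypothetical products with hand-made embedding vectors.
PRODUCTS = {
    1: {"title": "nike basketball shoes", "vector": [0.9, 0.1, 0.0]},
    2: {"title": "adidas basketball sneakers", "vector": [0.8, 0.2, 0.1]},
    3: {"title": "huawei mobile phone", "vector": [0.0, 0.1, 0.9]},
}


def text_recall(query_terms):
    """Ids of products whose title contains every query term."""
    return {pid for pid, product in PRODUCTS.items()
            if all(term in product["title"].split() for term in query_terms)}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_recall(query_vector, threshold=0.9):
    """Ids of products whose vector is close enough to the query vector."""
    return {pid for pid, product in PRODUCTS.items()
            if cosine(query_vector, product["vector"]) >= threshold}


def multi_recall(query_terms, query_vector):
    """Union of the text and vector recall channels."""
    return text_recall(query_terms) | vector_recall(query_vector)


# "hoops footwear" shares no words with any title, so text recall finds nothing,
# but its (assumed) query vector is close to the basketball shoe products.
terms, vector = ["hoops", "footwear"], [0.9, 0.1, 0.0]
print(text_recall(terms))           # set()
print(multi_recall(terms, vector))  # {1, 2}
```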


Sort results

After the recall stage, we have a set of products related to the user's search need. Next, we sort the recalled products and return them in the most reasonable order, so that the results the user is most likely to click rank first, which improves search-driven conversion and GMV. Open Search provides a two-round sorting mechanism, coarse sorting followed by fine sorting, and supports sorting expressions, custom plug-ins, algorithm models, and other sorting methods. The internal sorting process is fully open to developers, who can customize sorting strategies to fit their own business needs.
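The two-round idea, a cheap coarse score over all recalled products followed by a costlier fine score over the shortlist, can be sketched as follows. The scoring formulas, the weights, and the `ctr` and `sales_norm` fields are invented for this example; they are not Open Search sorting expressions.

```python
def coarse_score(doc, query_terms):
    """Cheap text-match score: fraction of query terms found in the title."""
    if not query_terms:
        return 0.0
    title_terms = set(doc["title"].split())
    return sum(term in title_terms for term in query_terms) / len(query_terms)


def fine_score(doc, query_terms):
    """Costlier score mixing relevance with business signals (weights are made up)."""
    return (0.5 * coarse_score(doc, query_terms)
            + 0.3 * doc["ctr"]
            + 0.2 * doc["sales_norm"])


def rank(docs, query_terms, coarse_keep=3, top_n=2):
    """Coarse-sort all recalled docs, then fine-sort only the shortlist."""
    shortlist = sorted(docs, key=lambda d: coarse_score(d, query_terms),
                       reverse=True)[:coarse_keep]
    final = sorted(shortlist, key=lambda d: fine_score(d, query_terms),
                   reverse=True)
    return [d["id"] for d in final[:top_n]]


# Hypothetical recalled products with assumed click-through and sales signals.
docs = [
    {"id": 1, "title": "nike basketball shoes high top", "ctr": 0.1, "sales_norm": 0.9},
    {"id": 2, "title": "nike basketball shoes low top", "ctr": 0.4, "sales_norm": 0.5},
    {"id": 3, "title": "nike running shoes", "ctr": 0.5, "sales_norm": 1.0},
    {"id": 4, "title": "basketball", "ctr": 0.9, "sales_norm": 1.0},
]

print(rank(docs, ["nike", "basketball", "shoes"]))  # [2, 1]
```

Product 4 has strong business signals but is dropped in the coarse round because it matches only one query term, which is the point of filtering cheaply before scoring expensively.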


For custom plug-ins, Open Search provides the Cava language and its plug-in framework. Cava is a compiled language developed in-house at Alibaba. Its syntax is similar to Java, its performance is comparable to C++, and it supports object-oriented programming. The Open Search console integrates an IDE that supports Cava compilation, so users can compile custom Cava plug-ins directly in the console, which makes debugging and modification easier.

In summary, with MaxCompute and Open Search, users can implement product library building, search guidance, user intent recognition, search engine recall, and result sorting for e-commerce and retail search, and obtain a fully customized, better-performing search service. The next question is how to measure and optimize search results.


Solution features and effect optimization

First, word segmentation is the most basic part of search and is indispensable for Chinese search. For the e-commerce and retail industries, Open Search integrates the e-commerce word segmenter from the Taobao search team. Its training corpus consists of millions of labeled e-commerce records accumulated by Taobao Search over many years. We compared the Open Search general e-commerce segmenter with the open source IK segmenter. On 100 real e-commerce search queries, the Open Search segmenter produced better results than the open source segmenter on 63 queries, and the ratio of better to worse results exceeded 4:1.


Building on the general e-commerce word segmenter, we worked with the natural language processing team at Alibaba DAMO Academy to optimize the e-commerce industry template, and introduced an enhanced e-commerce analyzer and corresponding query analysis algorithms. Specifically, the word segmentation F1 score rises to 95%, the entity recognition F1 score rises to 80%, the spelling correction FAR falls to 1.4%, and more than 100,000 e-commerce synonyms are added. These results are at a leading level in e-commerce NLP.

The figure below compares the general-purpose analyzer with the enhanced e-commerce analyzer. In addition, for customers in different fields and vertical categories of the e-commerce and retail industries, we offer dedicated algorithm customization services, including customer-specific query analysis, CTR estimation, vector models, and personalization models, to improve search quality across the board.

[Figure: comparison of the general-purpose analyzer and the enhanced e-commerce analyzer]

One-click configuration

For e-commerce users, especially retailers that have just started moving online and onto the cloud, we provide one-click configuration. Users only need to select the search functions they want for recall, query analysis, sorting, and peripheral services in the console. The corresponding application structure, index structure, and function-specific strategies are then generated automatically, giving an all-round one-click configuration of e-commerce search.


Customer case

E-commerce industry customers

The following briefly introduces two typical customer cases in e-commerce and retail search. The first is an e-commerce shopping app that provides users with functions such as product search and coupon shopping guides. The customer initially built its own search, but soon hit bottlenecks. For example, with an index of hundreds of millions of products, complex search and filtering requirements often degraded search performance, especially during e-commerce promotions, when peak traffic rises sharply. After evaluating various products and solutions, the customer chose the MaxCompute + Open Search solution. MaxCompute's elastic operations model fits e-commerce scenarios well, and Open Search guarantees the performance and quality of the search service. After a period of continuous use, we received good feedback from the customer, especially on engineering and operational stability, which lets the customer concentrate on business and algorithm work and drive product revenue and growth.


Retail industry customers

The other customer is a retailer that onboarded only recently: a supermarket retail brand with more than 10,000 stores worldwide. With the rapid development of the domestic new retail market, online business is especially important for expanding quickly and increasing brand influence. This customer also started with a self-developed search solution for its online mall, but the results fell far short of expectations and the shopping experience was poor. The customer recently adopted the Open Search e-commerce industry template and used the built-in multi-channel recall and personalized sorting functions to greatly improve search quality. Half a month after onboarding, the overall add-to-cart conversion rate had increased by 10%, and the no-result rate had dropped significantly, from 29% to 7.5%. The customer also specifically mentioned that the fully managed cloud service model of MaxCompute + Open Search greatly reduces staffing and O&M costs, giving an extremely high overall price-performance ratio.


4. More solutions

Multi-modal, multi-scenario search optimization

In the e-commerce industry, besides product search, there are many search scenarios with simple conditions, such as order search, favorites search, and category search. In these scenarios, MaxCompute + Open Search can provide database search acceleration to ensure high-performance, highly real-time search.

In addition, the vector recall capability of Open Search enables a snap-to-search experience of searching for images with a picture, which has become another typical search application scenario.

On this basis, combined with other Alibaba Cloud products such as intelligent recommendation, the solution can cover search, recommendation, and advertising applications for the e-commerce industry end to end.


More open engine capabilities

In another direction, Open Search is currently opening up its engine capabilities, exposing the built-in core engine on the cloud for more developers to use. This is expected to launch officially at the end of September. By then, it will provide a more open ecosystem and comprehensive, all-round user customization capabilities.


