Introduction: Multi-channel recall refers to using different strategies, features, or simple models to each recall part of the candidate set, and then mixing these candidate sets together for use by subsequent ranking models. This article takes an in-depth look at how the multi-channel recall technology of the OpenSearch platform improves search results.

Background

The so-called "multi-channel recall" is an approach in which different strategies, features, or simple models each recall part of the candidate set, and the candidate sets are then mixed together for the subsequent ranking model.
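
The mixing step can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenSearch code: the function `multi_channel_recall` and the two toy channels are hypothetical names introduced here.

```python
from typing import Callable, Dict, List

# A recall channel maps a query string to a list of candidate document ids.
Channel = Callable[[str], List[str]]

def multi_channel_recall(query: str, channels: Dict[str, Channel]) -> Dict[str, List[str]]:
    """Run every channel and record which channels recalled each document."""
    merged: Dict[str, List[str]] = {}
    for name, channel in channels.items():
        for doc_id in channel(query):
            merged.setdefault(doc_id, []).append(name)
    return merged

# Hypothetical toy channels standing in for real recall strategies.
def text_channel(query: str) -> List[str]:
    return ["doc1", "doc2"]

def vector_channel(query: str) -> List[str]:
    return ["doc2", "doc3"]

candidates = multi_channel_recall(
    "example query", {"text": text_channel, "vector": vector_channel}
)
```

Here `doc2` is recalled by both channels but appears once in the merged set, and the record of which channels found it is passed on to the ranking stage.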

Alibaba Cloud OpenSearch is a one-stop intelligent search development platform built on a large-scale distributed search engine developed in-house by Alibaba. It currently supports search for Alibaba Group's core businesses, including Taobao and Tmall. At present, OpenSearch provides text retrieval: the query text is segmented, query analysis is applied, and the rewritten query is sent to the engine, which greatly improves search results. However, some scenarios place higher demands on search results. One example is question search by photo in education, which differs markedly from traditional web or e-commerce search. First, the queries are very long. Second, the query is text produced by OCR from a photo, so if a key term is recognized incorrectly, recall and ranking are seriously affected. One solution to these problems is to keep optimizing QP and strengthen its ability to process text. Another is to introduce vector recall, which recalls documents by computing distances in vector space, as a supplement to text recall.
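
To make "recalling documents by distance in vector space" concrete, here is a minimal brute-force sketch that uses cosine similarity. The choice of cosine similarity, the function names, and the toy vectors are all assumptions made for illustration; a production engine would typically use an approximate nearest-neighbor index rather than scanning every document.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def vector_recall(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], top_k: int
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the top_k closest."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy 2-dimensional vectors; real embeddings have far more dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = vector_recall([1.0, 0.1], docs, top_k=2)
```

Because the comparison is made on vectors rather than exact terms, a single misrecognized term in a long OCR query shifts the query vector only slightly instead of removing a document from the recall set.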

Functional value

In scenarios such as long queries, long-tail queries, and non-standard queries, text retrieval alone can suffer from inaccurate recall and too few results. Supplementing it with vector recall effectively improves on text recall and also makes it possible to expand the recall set.

OpenSearch provides multi-channel recall as an algorithm engineering capability, letting users in different industries customize multi-channel recall to their own requirements. It has been commercialized and is used in production by users in multiple industries. Its advantages are as follows:

1. Flexible algorithm capabilities: text vectorization can be technically optimized for the characteristics of different industries, taking into account the effects of ...;

2. Support for cava scripts, which provides more flexible custom sorting and scoring;

3. Support for both analyzers with a model and analyzers without a model, providing vector recall to users without algorithm capabilities and to users with algorithm capabilities respectively;

4. Compared with open source products, OpenSearch has a clear advantage in search accuracy and search latency: latency drops from seconds with open source products to tens of milliseconds.

Multi-channel recall architecture diagram

image

Multiple query

OpenSearch supports multi-channel queries. By configuring a query strategy, you can run a text query and a vector query at the same time; text-only and vector-only queries are also supported. If text vectorization is configured, OpenSearch vectorizes the text at query time to generate a vector query, and sorts the results of the two channels together after recall.
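
The three query strategies described above can be sketched as follows. This is not the OpenSearch API: the strategy names `"text"`, `"vector"`, and `"both"`, the `embed` callback, and the rule of keeping the best score per document are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, score)

def multi_query(
    text: str,
    strategy: str,  # hypothetical values: "text", "vector", "both"
    text_search: Callable[[str], List[Hit]],
    vector_search: Callable[[List[float]], List[Hit]],
    embed: Callable[[str], List[float]],
) -> List[Hit]:
    """Run the configured channels, then sort the combined hits by score."""
    if strategy not in ("text", "vector", "both"):
        raise ValueError(f"unknown strategy: {strategy}")
    hits: List[Hit] = []
    if strategy in ("text", "both"):
        hits.extend(text_search(text))
    if strategy in ("vector", "both"):
        # The text is vectorized at query time to build the vector query.
        hits.extend(vector_search(embed(text)))
    # Assumption: keep the best score per document, then sort high to low.
    best: Dict[str, float] = {}
    for doc_id, score in hits:
        best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Hypothetical stand-ins for the real text engine, vector engine, and model.
def fake_text_search(query: str) -> List[Hit]:
    return [("doc1", 0.9), ("doc2", 0.4)]

def fake_vector_search(vector: List[float]) -> List[Hit]:
    return [("doc2", 0.7), ("doc3", 0.5)]

def fake_embed(query: str) -> List[float]:
    return [float(len(query))]

results = multi_query("long question text", "both", fake_text_search, fake_vector_search, fake_embed)
```

Scores from a text channel and a vector channel are usually not on the same scale, so a real system would normalize or re-score them before a unified sort; the sketch skips that step for brevity.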

Vector analyzer

OpenSearch supports several types of vector analyzers: industry general vector analyzers, industry custom vector analyzers, and general vector analyzers (64, 128, and 256 dimensions). The general vector analyzers require users to convert their data into vectors themselves and store them in a DOUBLE_ARRAY field, which suits customers with strong algorithm capabilities.
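
For users who produce their own vectors, a pre-upload check along the following lines can catch dimension mismatches early. This is a sketch under assumptions: the field name `embedding`, the document layout, and both helper functions are hypothetical, and the actual DOUBLE_ARRAY upload format should be taken from the OpenSearch documentation.

```python
from typing import Dict, List, Sequence

# Dimensions of the general vector analyzers named in the text.
SUPPORTED_DIMS = (64, 128, 256)

def to_double_array(vector: Sequence[float]) -> List[float]:
    """Validate the dimension and coerce every component to a float."""
    if len(vector) not in SUPPORTED_DIMS:
        raise ValueError(
            f"vector has {len(vector)} dimensions, expected one of {SUPPORTED_DIMS}"
        )
    return [float(x) for x in vector]

def build_document(doc_id: str, title: str, vector: Sequence[float]) -> Dict[str, object]:
    """Assemble a document whose 'embedding' field (hypothetical name) holds the vector."""
    return {"id": doc_id, "title": title, "embedding": to_double_array(vector)}

doc = build_document("q-001", "Solve x + 2 = 5", [0] * 64)
```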

Query analysis

OpenSearch lets algorithm engineers customize vector models for different industries. Taking the education industry as an example, the optimizations made specifically for question search include:

  • The BERT model is StructBERT, developed in-house by DAMO Academy, with the model customized for the education industry
  • The vector search engine is Proxima, developed by DAMO Academy, which is far more accurate and faster than open source systems
  • Training data can be accumulated continuously from the customer's search logs, so the results keep improving
  • When the query is rewritten into a semantic vector query, the text terms are placed under RANK, so they only participate in score calculation and not in recall, which improves the quality of the top recalled results.

Sorting customization

OpenSearch exposes two-stage sorting: basic sorting and business sorting, that is, coarse sorting and fine sorting. Fine sorting supports cava scripts, which gives more flexible support for users' sorting needs.
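
The two-stage idea can be sketched generically. The example below is written in Python rather than cava, and the scoring functions, field names, and `coarse_keep` cutoff are hypothetical: a cheap score trims the candidate set, then a more expensive score orders the survivors.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

def two_stage_sort(
    docs: List[Doc],
    coarse_score: Callable[[Doc], float],
    fine_score: Callable[[Doc], float],
    coarse_keep: int,
) -> List[Doc]:
    """Coarse sort keeps the top coarse_keep documents; fine sort orders them."""
    survivors = sorted(docs, key=coarse_score, reverse=True)[:coarse_keep]
    return sorted(survivors, key=fine_score, reverse=True)

def coarse(doc: Doc) -> float:
    # Cheap signal only: text relevance.
    return doc["text_score"]

def fine(doc: Doc) -> float:
    # Hypothetical business formula mixing relevance and popularity.
    return 0.7 * doc["text_score"] + 0.3 * doc["popularity"]

docs = [
    {"id": "A", "text_score": 0.9, "popularity": 0.1},
    {"id": "B", "text_score": 0.8, "popularity": 0.9},
    {"id": "C", "text_score": 0.2, "popularity": 1.0},
]
ordered = two_stage_sort(docs, coarse, fine, coarse_keep=2)
```

Document `C` is dropped in the coarse stage despite its popularity, and the fine stage then places `B` ahead of `A`.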

At the end of the multi-channel recall process, OpenSearch performs a unified sort. Two modes are currently supported: internal sorting and fine-sorting model scoring. Internal sorting works directly on the multi-channel recall results and orders them by their returned scores from high to low. Fine-sorting model scoring requires the user to provide model information, and the multi-channel recall results are sorted by the model's scores.
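
The difference between the two modes comes down to where the sort key comes from. A minimal sketch, assuming each hit carries a `score` field and that the user-supplied model is a plain callable (both are assumptions for illustration):

```python
from typing import Any, Callable, Dict, List, Optional

Hit = Dict[str, Any]

def unified_sort(hits: List[Hit], model: Optional[Callable[[Hit], float]] = None) -> List[Hit]:
    """Sort by the returned score, or by a user-supplied model when one is given."""
    if model is None:
        key: Callable[[Hit], float] = lambda hit: hit["score"]
    else:
        key = model
    return sorted(hits, key=key, reverse=True)

def example_model(hit: Hit) -> float:
    # Hypothetical model: blends the recall score with a click-through feature.
    return 0.5 * hit["score"] + 0.5 * hit["ctr"]

hits = [
    {"id": "x", "score": 0.6, "ctr": 0.9},
    {"id": "y", "score": 0.8, "ctr": 0.1},
]
internal_order = unified_sort(hits)
model_order = unified_sort(hits, model=example_model)
```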

Multi-channel recall practice case

E-commerce/retail search

image

Community Forum Search

Comparison of the top titles returned before and after integration

image


If you need product guidance, you can fill out the questionnaire to get expert guidance >> https://survey.aliyun.com/apps/zhiliao/lKD_J8cRj

If you want to talk with more developers and learn about cutting-edge search and recommendation technology, you can scan the code to join the community

Copyright statement: The content of this article is contributed voluntarily by real-name registered users of Alibaba Cloud, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.

Alibaba Cloud Developer (阿里云开发者)