Elasticsearch Search Suggestions and Contextual Hints

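Search suggestions

Search suggestions are implemented through the Suggester API.

The suggester splits the input text into tokens, looks up similar terms in the index's term dictionary, and returns them.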
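The tokenize-then-look-up idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `edit_distance` and `suggest_terms` helpers, the tiny in-memory `dictionary`, and the `max_edits` cutoff are all hypothetical, and Elasticsearch's own candidate generation and scoring are more elaborate than plain Levenshtein distance.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_terms(text, dictionary, max_edits=2):
    """Split text into tokens and list dictionary terms close to each token."""
    tokens = re.findall(r"\w+", text.lower())
    result = {}
    for token in tokens:
        candidates = sorted(
            (edit_distance(token, term), term)
            for term in dictionary
            if term != token
        )
        result[token] = [term for dist, term in candidates if dist <= max_edits]
    return result


# Hypothetical term dictionary; a real one comes from the indexed field.
dictionary = {"lucene", "elasticsearch", "elastic", "rocks", "stack"}
print(suggest_terms("elasticserch rocks", dictionary))
```

Here the misspelled token is one insertion away from a dictionary term, so that term is the only candidate within the cutoff.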

Elasticsearch provides four kinds of suggesters for different scenarios:

  • Term Suggester
  • Phrase Suggester
  • Completion Suggester
  • Context Suggester

Term Suggester

It works like Google search: I typed a misspelled word, elasticserch, and the engine helpfully offered a search suggestion.

Implementing this in Elasticsearch is straightforward.

  1. Create an index and write some documents

    POST articles/_bulk
    { "index" : { } }
    { "body": "lucene is very cool"}
    { "index" : { } }
    { "body": "Elasticsearch builds on top of lucene"}
    { "index" : { } }
    { "body": "Elasticsearch rocks"}
    { "index" : { } }
    { "body": "elastic is the company behind ELK stack"}
    { "index" : { } }
    { "body": "Elk stack rocks"}
    { "index" : {} }
    {  "body": "elasticsearch is rock solid"}
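  2. Search the documents and call the suggest API.

    There are three suggestion modes:

    • missing: suggest only for terms that are not already in the index
    • popular: suggest only terms that occur more frequently than the original term
    • always: suggest whether or not the term exists in the index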

      POST /articles/_search
      {
        "size": 1,
        "query": {
          "match": {
            "body": "elasticserch"
          }
        },
        "suggest": {
          "term-suggestion": {
            "text": "elasticserch",
            "term": {
              "suggest_mode": "missing",
              "field": "body"
            }
          }
        }
      }
  3. The response

    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "suggest" : {
        "term-suggestion" : [
          {
            "text" : "elasticserch",
            "offset" : 0,
            "length" : 12,
            "options" : [
              {
                "text" : "elasticsearch",
                "score" : 0.9166667,
                "freq" : 3
              }
            ]
          }
        ]
      }
    }
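To make the three suggestion modes concrete, here is a rough Python sketch of how each mode could filter candidate corrections. It is a simplification, not Elasticsearch's implementation: `apply_suggest_mode` is a hypothetical helper, and the `doc_freq` numbers are counted by hand from the six sample documents, assuming the field is not stemmed.

```python
def apply_suggest_mode(term, candidates, doc_freq, mode="missing"):
    """Filter candidate corrections for `term` according to a suggest mode.

    doc_freq maps a term to the number of documents that contain it.
    """
    term_freq = doc_freq.get(term, 0)
    if mode == "missing":
        # Suggest only when the input term does not exist in the index.
        return [] if term_freq > 0 else list(candidates)
    if mode == "popular":
        # Keep only candidates found in more documents than the input term.
        return [c for c in candidates if doc_freq.get(c, 0) > term_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError(f"unknown suggest mode: {mode}")


# Hand-counted document frequencies (an assumption for illustration).
doc_freq = {"rock": 1, "rocks": 2, "elasticsearch": 3}

for mode in ("missing", "popular", "always"):
    print(mode, apply_suggest_mode("rock", ["rocks"], doc_freq, mode))
```

Under these assumptions, "rock" gets no suggestion in missing mode because it already exists in the index, while popular and always both offer "rocks".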

Phrase Suggester

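The Phrase Suggester adds extra logic on top of the Term Suggester, so that whole phrases are corrected rather than isolated terms.

Some of its parameters:

  • max_errors: the maximum number of terms that may be misspelled
  • confidence: a factor applied to the input phrase's score to form a threshold; only candidates that score above it are returned, which limits the number of results. Defaults to 1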

    POST /articles/_search
    {
      "suggest": {
        "my-suggestion": {
          "text": "lucne and elasticsear rock hello world ",
          "phrase": {
            "field": "body",
            "max_errors":2,
            "confidence":2,
            "direct_generator":[{
              "field":"body",
              "suggest_mode":"missing"
            }],
            "highlight": {
              "pre_tag": "<em>",
              "post_tag": "</em>"
            }
          }
        }
      }
    }
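The effect of max_errors and confidence can be approximated with a short Python sketch. Everything here is an assumption made for illustration: the helper names are hypothetical, the scores are made up, max_errors is treated only as an absolute count, and changed terms are counted by comparing tokens position by position. Elasticsearch derives real scores from its own language model.

```python
def within_max_errors(original, candidate, max_errors):
    """True if the candidate changes at most `max_errors` tokens."""
    changed = sum(
        1 for a, b in zip(original.split(), candidate.split()) if a != b
    )
    return changed <= max_errors


def filter_by_confidence(input_score, candidates, confidence=1.0):
    """Keep candidates scoring above input_score * confidence, best first."""
    threshold = input_score * confidence
    kept = [(phrase, score) for phrase, score in candidates if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


original = "lucne and elasticsear rock hello world"
# Hypothetical candidates with made-up scores.
candidates = [
    ("lucene and elasticsearch rock hello world", 0.05),
    ("lucne and elasticsearch rock hello world", 0.012),
]
input_score = 0.01  # made-up score of the uncorrected input phrase

allowed = [c for c in candidates if within_max_errors(original, c[0], 2)]
print(filter_by_confidence(input_score, allowed, confidence=2))
```

Raising confidence raises the threshold, so fewer and stronger corrections survive; a lower value lets weaker candidates through.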

Completion Suggester

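Autocomplete: every time the user types a character, a query has to be sent to the backend immediately to find matches.

This makes it very demanding on performance.

Elasticsearch encodes the analyzed data into an FST (finite state transducer) that is stored alongside the index. The whole FST is loaded into memory, so lookups are very fast.

An FST supports prefix lookups only.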
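The prefix-only restriction is easy to see with a plain trie, which is a simpler relative of an FST. The sketch below is not how Lucene builds its FST; `TrieNode` and `PrefixIndex` are hypothetical names used only to show why a prefix matches while text from the middle of an entry does not.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.is_end = False  # True if a stored entry ends at this node


class PrefixIndex:
    """In-memory trie that answers prefix lookups only."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, text):
        node = self.root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def complete(self, prefix):
        # Walk down the trie along the prefix; stop if the path is missing.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node, text, results):
        if node.is_end:
            results.append(text)
        for ch in sorted(node.children):
            self._collect(node.children[ch], text + ch, results)


index = PrefixIndex()
for title in ["php是什么", "php是世界上最好的语言", "php货币", "php面试题2019"]:
    index.add(title)

print(index.complete("php"))   # all four titles start with "php"
print(index.complete("货币"))  # empty: "货币" only appears mid-entry
```

Because a lookup always starts from the root, an entry can only be reached through its leading characters.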

This is the kind of suggestion dropdown you see on Baidu.

Implementing this in Elasticsearch is also straightforward.

  1. Create the index

    PUT titles
    {
      "mappings": {
        "properties": {
          "title_completion":{
            "type": "completion"
          }
        }
      }
    }
  2. Write the documents

    POST titles/_bulk
    { "index" : { } }
    { "title_completion": "php是什么"}
    { "index" : { } }
    { "title_completion": "php是世界上最好的语言"}
    { "index" : { } }
    { "title_completion": "php货币"}
    { "index" : { } }
    { "title_completion": "php面试题2019"}
  3. Search the data

    POST titles/_search?pretty
    {
      "size": 0,
      "suggest": {
        "article-suggester": {
          "prefix": "php",
          "completion": {
            "field": "title_completion"
          }
        }
      }
    }
  4. The response

    {
      "took" : 173,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "suggest" : {
        "article-suggester" : [
          {
            "text" : "php",
            "offset" : 0,
            "length" : 3,
            "options" : [
              {
                "text" : "php是世界上最好的语言",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "pv8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {
                  "title_completion" : "php是世界上最好的语言"
                }
              },
              {
                "text" : "php是什么",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "pf8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {
                  "title_completion" : "php是什么"
                }
              },
              {
                "text" : "php货币",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "p_8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {
                  "title_completion" : "php货币"
                }
              },
              {
                "text" : "php面试题2019",
                "_index" : "titles",
                "_type" : "_doc",
                "_id" : "qP8V8WwBISxFcLcZfDXl",
                "_score" : 1.0,
                "_source" : {
                  "title_completion" : "php面试题2019"
                }
              }
            ]
          }
        ]
      }
    }
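On the client side, the suggestion texts usually need to be pulled out of this nested response. A minimal Python sketch, assuming the response has already been parsed into a dict: `suggestion_texts` is a hypothetical helper, and the `response` literal is a trimmed copy of the output above.

```python
def suggestion_texts(response, name):
    """Collect the text of every option returned for the named suggester."""
    texts = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


# Trimmed copy of the response shown above (only the fields used here).
response = {
    "suggest": {
        "article-suggester": [
            {
                "text": "php",
                "offset": 0,
                "length": 3,
                "options": [
                    {"text": "php是世界上最好的语言", "_score": 1.0},
                    {"text": "php是什么", "_score": 1.0},
                    {"text": "php货币", "_score": 1.0},
                    {"text": "php面试题2019", "_score": 1.0},
                ],
            }
        ]
    }
}

print(suggestion_texts(response, "article-suggester"))
```

The same helper also works on the Term Suggester response earlier in this article, since it nests options the same way.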

Context Suggester

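The Context Suggester extends the Completion Suggester with context information.

For example:

In an electronics store, typing 苹果 (apple) means you are looking for an Apple laptop...
In a fruit store, typing 苹果 means you are looking for red apples, green apples...

  1. Create the index with a custom mapping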

    PUT comments
    {
      "mappings": {
        "properties": {
          "comment_autocomplete": {
            "type": "completion",
            "contexts": [
              {
                "type": "category",
                "name": "comment_category"
              }
            ]
          }
        }
      }
    }
  2. Add context information to each document

    POST comments/_doc
    {
      "comment":"苹果电脑",
      "comment_autocomplete":{
        "input":["苹果电脑"],
        "contexts":{
          "comment_category":"电器商城"
        }
      }
    }
    
    POST comments/_doc
    {
      "comment":"红红的冰糖心苹果",
      "comment_autocomplete":{
        "input":["苹果"],
        "contexts":{
          "comment_category":"水果商城"
        }
      }
    }
  3. Run a suggestion query with a context

    POST comments/_search
    {
      "suggest": {
        "MY_SUGGESTION": {
          "prefix": "苹",
          "completion":{
            "field":"comment_autocomplete",
            "contexts":{
              "comment_category":"电器商城"
            }
          }
        }
      }
    }
  4. The response

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "suggest" : {
        "MY_SUGGESTION" : [
          {
            "text" : "苹",
            "offset" : 0,
            "length" : 1,
            "options" : [
              {
                "text" : "苹果",
                "_index" : "comments",
                "_type" : "_doc",
                "_id" : "qf_s9WwBISxFcLcZszWh",
                "_score" : 1.0,
                "_source" : {
                  "comment" : "苹果电脑",
                  "comment_autocomplete" : {
                    "input" : [
                      "苹果电脑"
                    ],
                    "contexts" : {
                      "comment_category" : "电器商城"
                    }
                  }
                },
                "contexts" : {
                  "comment_category" : [
                    "电器商城"
                  ]
                }
              }
            ]
          }
        ]
      }
    }
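Conceptually, a context acts as a filter applied together with the prefix match. The Python sketch below mimics that behaviour over an in-memory list; `context_complete` and the `entries` structure are hypothetical and only mirror the two documents indexed above, not how Elasticsearch stores contexts internally.

```python
def context_complete(entries, prefix, category):
    """Return inputs that start with `prefix` and belong to `category`."""
    return [
        entry["input"]
        for entry in entries
        if entry["category"] == category and entry["input"].startswith(prefix)
    ]


# Mirrors the two documents indexed above (input text plus its category).
entries = [
    {"input": "苹果电脑", "category": "电器商城"},
    {"input": "苹果", "category": "水果商城"},
]

print(context_complete(entries, "苹", "电器商城"))
print(context_complete(entries, "苹", "水果商城"))
```

The same prefix yields different completions depending on the category, which is the behaviour the store example describes.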

Author: 小鸡