Elasticsearch: after splitting camelCase words with word_delimiter_graph, why do search results not include partial matches?

Steps to reproduce:

Create a sample index

Create an index with a single field, content, analyzed by the custom analyzer my_custom_analyzer. The analyzer uses a word_delimiter_graph filter, whose configuration is defined under custom_word_delimiter_graph_filter.

PUT test_index_demo
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    },
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "standard",
          "filter": [
            "custom_word_delimiter_graph_filter",
            "lowercase"
          ]
        }
      },
      "filter": {
        "custom_word_delimiter_graph_filter": {
          "type": "word_delimiter_graph",
          "catenate_all": true,
          "preserve_original": true,
          "stem_english_possessive": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}

Insert two sample documents

POST test_index_demo/_create/1
{"content":"这是文档1 onVideoDataCallback video 这是 callback"}

POST test_index_demo/_create/2
{"content":"能不能匹配到这里 video callback"}

Check the tokenization result

GET /test_index_demo/_analyze
{
  "field": "content",
  "text": "OnVideoDataCallback"
}
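
As a rough illustration of the tokens this filter chain is expected to emit for a camelCase word, here is a small Python sketch. It is a simplification under stated assumptions, not Elasticsearch's implementation: the helper name approximate_tokens is made up for this example, it only handles a single ASCII word, it splits only at lowercase-to-uppercase transitions, and it ignores token positions and offsets.

```python
import re


def approximate_tokens(word: str) -> list[str]:
    """Approximate my_custom_analyzer for one ASCII camelCase word (hypothetical helper)."""
    # Split at lowercase -> uppercase transitions, e.g. "onVideo" -> ["on", "Video"].
    parts = re.split(r"(?<=[a-z])(?=[A-Z])", word)
    # preserve_original keeps the whole word; catenate_all adds the joined parts.
    candidates = [word] + parts + ["".join(parts)]
    tokens: list[str] = []
    for candidate in candidates:
        lowered = candidate.lower()  # the lowercase filter runs last
        if lowered not in tokens:    # drop duplicates, keep first-seen order
            tokens.append(lowered)
    return tokens


print(approximate_tokens("OnVideoDataCallback"))
# ['onvideodatacallback', 'on', 'video', 'data', 'callback']
```

The order of the list is only the order in which this sketch builds it. The real _analyze response also reports a position and offsets for each token, which this model leaves out.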


Search with a match query

POST test_index_demo/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "content": {
        "query": "onVideoDataCallback"
      }
    }
  },
  "highlight": {
    "fields": {
      "content": {},
      "title": {}
    },
    "fragment_size": 200
  }
}


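The tokenization looks correct, yet the search results contain only documents that match the whole word. How can the results also include documents that match only some of the words inside the camelCase term?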

1 answer

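First, change the index settings to add an edge_ngram filter to my_custom_analyzer. A PUT against an index name that already exists is rejected, so delete test_index_demo before sending the request below.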

PUT test_index_demo
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    },
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "standard",
          "filter": [
            "custom_word_delimiter_graph_filter",
            "lowercase",
            "edge_ngram_filter"
          ]
        }
      },
      "filter": {
        "custom_word_delimiter_graph_filter": {
          "type": "word_delimiter_graph",
          "catenate_all": true,
          "preserve_original": true,
          "stem_english_possessive": true
        },
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 50
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
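
To see what the edge_ngram_filter defined above adds, here is a minimal Python sketch of edge n-gram expansion with the same min_gram and max_gram values. The function names and the token lists are assumptions made for illustration: the lists cover only the Latin-script words of the query and of document 2, and the sketch models term overlap only, not scoring or token positions.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 50) -> list[str]:
    """Return the prefixes of `token` whose length lies in [min_gram, max_gram]."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def expand(tokens: list[str]) -> set[str]:
    """Apply edge_ngrams to every token and collect the distinct grams."""
    return {gram for token in tokens for gram in edge_ngrams(token)}


# Assumed tokens (Latin-script words only; the CJK text is ignored here).
query_tokens = ["onvideodatacallback", "on", "video", "data", "callback"]
doc2_tokens = ["video", "callback"]

# Grams that the query and document 2 have in common.
shared = expand(query_tokens) & expand(doc2_tokens)

print(edge_ngrams("video"))  # ['v', 'vi', 'vid', 'vide', 'video']
print(sorted(shared, key=lambda gram: (gram[0], len(gram))))
```

Because content declares only one analyzer, the same expansion also runs on the query text. In this model the query therefore contains one-letter grams such as "v" and "c", so a document shares a term with the query as soon as one of its words starts with such a letter. A larger min_gram would narrow that.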

After re-inserting the two sample documents, run the match query again:

POST test_index_demo/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "content": {
        "query": "onVideoDataCallback"
      }
    }
  },
  "highlight": {
    "fields": {
      "content": {},
      "title": {}
    },
    "fragment_size": 200
  }
}