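Preface
- Recommended course: Ruan Yiming's "Elasticsearch 核心技术与实战" (Elasticsearch Core Technology and Practice)
- This article applies to Elasticsearch 7.17. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/...
- Synonyms can be applied at index time (index-time synonyms) or at search time (search-time synonyms); search time is the usual choice
- This article covers search-time synonyms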
Synonym file format
One-way synonyms (every term on the left is replaced by the term on the right)
ipod, i-pod, i pod => ipod
Two-way synonyms (all listed terms are treated as equivalent)
马铃薯, 土豆, potato
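To make the difference between the two rule styles concrete, here is a minimal Python sketch of how such rules could be read. The function name parse_synonym_rule is hypothetical, and the logic only approximates the Solr-style format under the default expand behavior; it is not Elasticsearch's actual parser.

```python
def parse_synonym_rule(line):
    """Turn one synonym rule into a {input_term: [output_terms]} mapping.

    Assumption: Solr-style rules, '#' starts a comment line, and
    equivalent terms expand to the full list (the default expand behavior).
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return {}
    if "=>" in line:
        # One-way rule: each term on the left maps to the terms on the right.
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
        return {term: outputs for term in inputs}
    # Two-way rule: every term maps to the whole group, itself included.
    terms = [t.strip() for t in line.split(",") if t.strip()]
    return {term: terms for term in terms}


one_way = parse_synonym_rule("ipod, i-pod, i pod => ipod")
two_way = parse_synonym_rule("马铃薯, 土豆, potato")
```

With the one-way rule, searching for i-pod is rewritten to ipod only. With the two-way rule, searching for any of the three terms also matches the other two.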
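Experiment steps
Add the synonym file
- Create an analysis directory under the Elasticsearch config directory and add the synonym file synonym.txt to it (/etc/elasticsearch/analysis/synonym.txt)
- With search-time synonyms, changing the synonym file requires neither restarting Elasticsearch nor rebuilding the index; you only need to reload the search analyzers of the index (my-index-000001 below stands for your index name)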
POST my-index-000001/_reload_search_analyzers
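Create the index
The ik_smart and ik_max_word analyzers come from the IK analysis plugin, which must be installed. In the request below, the word_syn filter sets updateable to true to allow hot reloading, and the filter array of ik_smart_syn lists the token filters applied after the tokenizer.
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "word_syn": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonym.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "ik_smart_syn": {
          "filter": [
            "stemmer",
            "word_syn"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "author": {
        "type": "keyword"
      }
    }
  }
}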
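One constraint worth checking before sending such a request: a filter marked updateable can only be used in search analyzers, so an analyzer that contains it should not be set as a field's index-time analyzer. The sketch below checks a request body for that mistake. The helper names updateable_analyzers and misused_fields are hypothetical, and the code is an illustration that is not part of any Elasticsearch client.

```python
def updateable_analyzers(settings):
    """Return the names of analyzers that use at least one updateable filter."""
    analysis = settings.get("analysis", {})
    hot_filters = {
        name
        for name, conf in analysis.get("filter", {}).items()
        if conf.get("updateable")
    }
    return {
        name
        for name, conf in analysis.get("analyzer", {}).items()
        if hot_filters & set(conf.get("filter", []))
    }


def misused_fields(body):
    """Return fields whose index-time analyzer contains an updateable filter."""
    hot = updateable_analyzers(body.get("settings", {}))
    properties = body.get("mappings", {}).get("properties", {})
    return [
        field
        for field, conf in properties.items()
        if conf.get("analyzer") in hot
    ]
```

Passing the body of the create-index request as a Python dict, an empty list from misused_fields means no field uses ik_smart_syn at index time.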
Hot-reload the search analyzers: https://www.elastic.co/guide/en/elasticsearch/reference/7.10/...
POST my_index/_reload_search_analyzers
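The same reload call can be scripted, for example after a job that rewrites synonym.txt. Below is a minimal sketch using only the Python standard library. It assumes an unsecured node at http://localhost:9200 (an assumption, adjust for your cluster), and the function names are hypothetical. In a multi-node cluster the updated file has to be present on every node before reloading.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_reload_request(index, base_url=ES_URL):
    """Build (but do not send) the POST request for the reload API."""
    url = f"{base_url}/{index}/_reload_search_analyzers"
    return urllib.request.Request(url, method="POST")


def reload_search_analyzers(index, base_url=ES_URL):
    """Send the reload request and return the decoded JSON response."""
    with urllib.request.urlopen(build_reload_request(index, base_url)) as resp:
        return json.load(resp)


# Usage against a running node:
#   result = reload_search_analyzers("my_index")
#   print(json.dumps(result, indent=2, ensure_ascii=False))
```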
Test the analyzer directly
Request
GET my_index/_analyze
{
  "analyzer": "ik_smart_syn",
  "text": "马铃薯"
}
Output
{
  "tokens" : [
    {
      "token" : "马铃薯",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "土豆",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    },
    {
      "token" : "potato",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}
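In the output, the original token and its two synonyms all sit at position 0, which is how the synonym filter marks them as alternatives for the same word. A small sketch, with the hypothetical helper tokens_by_position, groups the tokens of an _analyze response by position so this is easy to verify:

```python
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group token texts from an _analyze response by their position."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


# Trimmed copy of the response shown in this article.
sample = {
    "tokens": [
        {"token": "马铃薯", "type": "CN_WORD", "position": 0},
        {"token": "土豆", "type": "SYNONYM", "position": 0},
        {"token": "potato", "type": "SYNONYM", "position": 0},
    ]
}
grouped = tokens_by_position(sample)
```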
Add test data
Index a document
POST my_index/_doc/1
{
  "title": "马铃薯",
  "author": "土豆"
}
View the analysis result of a specific field in a document
GET my_index/_termvectors/1?fields=title
Search test
Request
GET my_index/_search
{
  "query": {
    "query_string": {
      "analyzer": "ik_smart_syn",
      "query": "title:potato AND author:potato"
    }
  }
}
Output
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "title" : "马铃薯",
          "author" : "土豆"
        }
      }
    ]
  }
}
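Related documents
- CSDN blog: Elasticsearch:使用同义词 synonyms 来提高搜索效率 (Elasticsearch: using synonyms to improve search efficiency)
- Official blog: 一样,却又不同:借助同义词让 Elasticsearch 更加强大 (The same, but different: boosting the power of Elasticsearch with synonyms)
- Synonym filters: Synonym token filter, Synonym graph token filter
This article is from qbit snap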