I have been busy with work lately, and a bit lazy too, so my output has slowed down again. It will recover gradually: I aim to get back to regular weekly updates without sacrificing quality.
Reading this article requires some basic knowledge of Elasticsearch. The article goes into some depth, but not too deep.
Overview
Most readers have probably used fields of the Join data type in Elasticsearch; this is the often-discussed parent-child document. An Elasticsearch Join cannot cross indexes or shards, so when saving documents, make sure that parent and child documents use the same routing parameter so that they are stored in the same shard of the same index. So what restrictions apply?
Restrictions on parent-child relationships
- There can only be one relational field in each index
- The parent document and its child documents must be in the same shard of the same index, so we need to set the routing value when adding, deleting, and modifying child documents to ensure that they land on the parent's shard
- A parent document can contain multiple child documents, but a child document can only have one parent document
- Relationships can only be established on fields of type Join
- Child documents can be added to an existing document only if that document is already a parent document
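The routing restriction above can be sketched with a few requests. This is only an illustration: it assumes the index my-index-000001 and the join_field mapping created later in this article, a parent document with id 1, and a hypothetical child document id 99 that is not part of the test data.

```console
# Index a child document; routing is the parent's id so both land on the same shard
PUT my-index-000001/_doc/99?routing=1&refresh
{
  "id": "99",
  "text": "hypothetical child document",
  "join_field": {
    "name": "details",
    "parent": "1"
  }
}

# Reading, updating, and deleting the child use the same routing value
GET my-index-000001/_doc/99?routing=1

POST my-index-000001/_update/99?routing=1
{
  "doc": {
    "text": "hypothetical child document, updated"
  }
}

DELETE my-index-000001/_doc/99?routing=1
```

Using the parent's id as the routing value is a convention rather than a requirement; any value works as long as the parent and all of its descendants end up on the same shard.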
Global ordinals
What is a global ordinal? According to the official documentation, it is a mechanism for speeding up queries, and it also lets the data be stored more compactly. I will not expand on the details here; I may explain global ordinals in a later article. For now, you can check the official documentation.
For this chapter, it is enough to know that the Join type uses global ordinals to speed up joins. By default, global ordinals are built eagerly: whenever the index changes, they are rebuilt as part of the refresh, which increases the refresh time. This behaviour can be turned off, but then the global ordinals are rebuilt on the first parent-join query or aggregation that follows, so the cost is shifted onto the user's request. The official documentation does not recommend this, because it is not friendly to users; it is mainly a trade-off. The worst case is when there are multiple writes at the same time, so that multiple global ordinals need to be rebuilt within a single refresh interval.
Of course, if the join field is used infrequently and writes are frequent, it makes sense to disable eager building. It can be disabled as follows:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "join_field": {
        "type": "join",
        "relations": {
          "goods": ["details", "evaluate"],
          "evaluate": "vote"
        },
        "eager_global_ordinals": false
      }
    }
  }
}
You can use the following requests to check how much heap the global ordinals occupy:
# Per-index
GET my-index-000001/_stats/fielddata?human&fields=join_field#goods
# Per-node per-index
GET _nodes/stats/indices/fielddata?human&fields=join_field#goods
Parent-child documents
First, create an ordinary parent-child index, with the product (goods) as the parent document and the details as the child document.
DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "join_field": {
        "type": "join",
        "relations": {
          "goods": "details"
        }
      }
    }
  }
}
- my-index-000001 : index name
- id : document primary key
- join_field : the parent-child relationship field; its type is set to join to mark it as a parent-child field
- relations : defines the parent-child relationship; goods is the name of the parent document type and details is the name of the child document type. These names are used later when inserting and querying data
Insert a few pieces of test data. The products are an iphone and a mac; the details cover colour, appearance, memory configuration, and so on.
PUT my-index-000001/_doc/1?refresh
{
  "id": "1",
  "text": "iphone 14 pro max",
  "join_field": {
    "name": "goods"
  }
}
PUT my-index-000001/_doc/2?refresh
{
  "id": "2",
  "text": "macbook pro ",
  "join_field": {
    "name": "goods"
  }
}
# Child documents are routed by the id of their parent so that both live on the same shard
PUT my-index-000001/_doc/3?routing=1&refresh
{
  "id": "3",
  "text": "512G 16 cores",
  "join_field": {
    "name": "details",
    "parent": "1"
  }
}
PUT my-index-000001/_doc/4?routing=1&refresh
{
  "id": "4",
  "text": "pink/silver/black/matcha green",
  "join_field": {
    "name": "details",
    "parent": "1"
  }
}
PUT my-index-000001/_doc/5?routing=2&refresh
{
  "id": "5",
  "text": "1T 32G",
  "join_field": {
    "name": "details",
    "parent": "2"
  }
}
PUT my-index-000001/_doc/6?routing=2&refresh
{
  "id": "6",
  "text": "silver/black",
  "join_field": {
    "name": "details",
    "parent": "2"
  }
}
Use parent_id to query the child documents of a given parent. Using the test data inserted above, find the details of the mac, whose id is 2.
GET my-index-000001/_search
{
  "query": {
    "parent_id": {
      "type": "details",
      "id": "2"
    }
  },
  "sort": ["id"]
}
In most cases, the query above cannot satisfy our needs, so we can also use the has_parent or has_child query.
Use has_parent to find all child documents whose goods parent document contains macbook (the grandchild documents introduced later in this article can be queried in the same way).
GET my-index-000001/_search
{
  "query": {
    "has_parent": {
      "parent_type": "goods",
      "query": {
        "match": {
          "text": "macbook"
        }
      }
    }
  }
}
Use has_child to find all parent documents that have a details child document containing the keyword 1T.
GET my-index-000001/_search
{
  "query": {
    "has_child": {
      "type": "details",
      "query": {
        "match": {
          "text": "1T"
        }
      }
    }
  }
}
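A has_child query returns only the parent documents. If you also want to see which child documents matched, one option is to add inner_hits. The request below is a minimal sketch against the test data above; the score_mode and min_children values are chosen purely for illustration.

```console
GET my-index-000001/_search
{
  "query": {
    "has_child": {
      "type": "details",
      "score_mode": "max",
      "min_children": 1,
      "query": {
        "match": {
          "text": "1T"
        }
      },
      "inner_hits": {}
    }
  }
}
```

Each returned goods document then carries an inner_hits section listing its matching details documents, and score_mode lets the score of the children influence the score of the parent.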
Query or aggregate on the parent-join field
When the Join data type is used, Elasticsearch automatically creates an additional field whose name is the Join field name, plus #, plus the parent type. For the mapping above, the additional field is join_field#goods. The following example queries and aggregates on this parent-join field; it follows the example in the official 8.1 documentation and uses a runtime field to return the parent id.
GET my-index-000001/_search
{
  "query": {
    "parent_id": {
      "type": "details",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "join_field#goods",
        "size": 10
      }
    }
  },
  "runtime_mappings": {
    "my_parent_field": {
      "type": "long",
      "script": """
        emit(Integer.parseInt(doc['join_field#goods'].value))
      """
    }
  },
  "fields": [
    { "field": "my_parent_field" }
  ]
}
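Besides aggregating on the join_field#goods field directly, the children aggregation can move from matching parent documents to their child documents. The request below is a sketch against the test data above; the aggregation names to_details and detail_ids are made up for this example.

```console
GET my-index-000001/_search
{
  "size": 0,
  "query": {
    "match": {
      "text": "macbook"
    }
  },
  "aggs": {
    "to_details": {
      "children": {
        "type": "details"
      },
      "aggs": {
        "detail_ids": {
          "terms": {
            "field": "id",
            "size": 10
          }
        }
      }
    }
  }
}
```

The query selects the goods documents, the children aggregation switches the context to their details documents, and the terms sub-aggregation buckets those children by the id keyword field.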
One parent type with multiple child types
Above we demonstrated a parent type with a single child type. The Join type also supports a parent type with multiple child types. Building on the above, run the following statements to test it (the index is recreated, so re-insert the earlier test data afterwards).
DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "join_field": {
        "type": "join",
        "relations": {
          "goods": ["details", "evaluate"]
        }
      }
    }
  }
}
# Each evaluate document is routed by the id of its goods parent
PUT my-index-000001/_doc/7?routing=1&refresh
{
  "id": "7",
  "text": "Runs smoothly, no lag, long standby time",
  "join_field": {
    "name": "evaluate",
    "parent": "1"
  }
}
PUT my-index-000001/_doc/8?routing=2&refresh
{
  "id": "8",
  "text": "Lightweight, easy to carry, great for coding",
  "join_field": {
    "name": "evaluate",
    "parent": "2"
  }
}
Careful readers will have noticed that grandchild documents were mentioned above. Yes, you read that right: grandchild documents, a third level. The hierarchy can go even deeper, but Elasticsearch does not recommend deep hierarchies; after all, Join is expensive, and the cost grows with every extra level. The following statements test a multi-level relationship in which goods is the parent of details and evaluate, and evaluate is the parent of vote.
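DELETE my-index-000001
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "join_field": {
        "type": "join",
        "relations": {
          "goods": ["details", "evaluate"],
          "evaluate": "vote"
        }
      }
    }
  }
}
# The index was recreated: re-insert the goods (1, 2) and evaluate (7, 8) documents first.
# A vote is a child of evaluate, so its parent is an evaluate document,
# while routing stays the id of the root goods document.
PUT my-index-000001/_doc/9?routing=1&refresh
{
  "id": "9",
  "text": "Vote: I bought the iphone because it is good value and holds its price",
  "join_field": {
    "name": "vote",
    "parent": "7"
  }
}
PUT my-index-000001/_doc/10?routing=2&refresh
{
  "id": "10",
  "text": "Vote: I bought the mac because it is light, easy to carry, and free of bloatware",
  "join_field": {
    "name": "vote",
    "parent": "8"
  }
}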
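Grandchild documents can be reached by nesting has_parent queries. The request below is a sketch that assumes the goods and evaluate test documents have been re-inserted into the recreated index and that every vote document's parent is an evaluate document, with routing set to the id of the root goods document.

```console
GET my-index-000001/_search
{
  "query": {
    "has_parent": {
      "parent_type": "evaluate",
      "query": {
        "has_parent": {
          "parent_type": "goods",
          "query": {
            "match": {
              "text": "macbook"
            }
          }
        }
      }
    }
  }
}
```

The inner has_parent selects the evaluate documents whose goods parent matches macbook, and the outer one returns the vote documents under those evaluate documents. Every extra level adds another join, which is one reason deep hierarchies are discouraged.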
Summary
As you can probably tell, the official documentation does not recommend using parent-child documents; after all, performance is a big problem. Most of us use Elasticsearch because it is fast, and the Join field makes it slow. Who would accept that? There are pros and cons, and the choice is yours. The next article will cover Nested, the alternative to the Join field that Elasticsearch recommends.
This article is published by mdnice Multiplatform