Recently I have been busy with work, and also a bit lazy, so my output has slowed down again. It will gradually recover: I will keep to at least one update per week and make sure each one is of good quality.

Reading this article requires some basic knowledge of Elasticsearch. The article has some depth, but it does not go very deep.

Overview

Most people have probably used the join field data type in Elasticsearch, that is, the often-discussed parent-child documents. Joins in Elasticsearch cannot span indices or shards, so when indexing documents you must make sure that parent and child documents use the same routing value, so that both are stored in the same shard of the same index. So what restrictions apply?

Restrictions on parent-child relationships

  • Only one join field is allowed per index
  • Parent and child documents must be in the same shard, so the same routing value must be provided when indexing, getting, deleting, or updating a child document
  • A parent document can have multiple child documents, but a child document can have only one parent
  • Relations can only be established on a field of type join
  • A new child relation can be added to an existing relation only if that relation is already a parent
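To see why the routing restriction matters, here is a minimal Python sketch of shard selection. The `_routing` documentation describes it roughly as `shard_num = hash(_routing) % num_primary_shards`. The real implementation uses a Murmur3 hash and can involve extra settings such as `index.routing_partition_size`, so `zlib.crc32` below is only a stand-in, and `pick_shard` / `routing_value` are hypothetical names used for illustration.

```python
import zlib
from typing import Optional


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Simplified shard_num = hash(_routing) % num_primary_shards.

    Elasticsearch uses a Murmur3 hash; zlib.crc32 is only a stand-in here.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def routing_value(doc_id: str, routing: Optional[str] = None) -> str:
    """Without an explicit ?routing=..., a document is routed by its _id."""
    return doc_id if routing is None else routing


NUM_SHARDS = 3

# Parent document 2 is indexed without routing, so it is routed by "2".
parent_shard = pick_shard(routing_value("2"), NUM_SHARDS)
# Child document 5 is indexed with ?routing=2 (its parent's id).
child_shard = pick_shard(routing_value("5", routing="2"), NUM_SHARDS)

# Same routing value, same shard: the parent and child can be joined.
assert parent_shard == child_shard
```

If the child were indexed with a different routing value, such as its own id, it could land on another shard, and a join running on the parent's shard would not see it.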

Global ordinals

What are global ordinals? The official documentation explains them: in short, they are a data structure that speeds up queries and aggregations by representing terms as compact integer ordinals. I won't go into the details here and may cover global ordinals in a later article; for now, please refer to the official documentation.

For this chapter, what matters is that the join field uses global ordinals to speed up joins. By default they are built eagerly: whenever the index changes, the global ordinals are rebuilt as part of the refresh, which adds time to each refresh. You can turn this off, but then the global ordinals are rebuilt on the first parent-join query or aggregation after the change, which pushes that cost onto your users' searches. The official documentation does not recommend this in most cases; it is a trade-off. The worst case is when several global ordinals have to be rebuilt at once, for example under heavy concurrent writes, so that multiple rebuilds land within a single refresh interval.
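A toy Python model can make the idea concrete. Following the official description, each segment numbers its distinct terms in sorted order (local ordinals), and global ordinals map every segment's local ordinal to one shard-wide ordinal. This is an illustration only: the function names are made up, and Elasticsearch's real structures are far more compact.

```python
from typing import Dict, List


def segment_terms(values: List[str]) -> List[str]:
    """Within one segment, each distinct term's ordinal is its sorted position."""
    return sorted(set(values))


def build_global_ordinals(segments: List[List[str]]) -> List[Dict[int, int]]:
    """For each segment, map local ordinal -> shard-wide (global) ordinal."""
    local = [segment_terms(seg) for seg in segments]
    all_terms = sorted({term for terms in local for term in terms})
    global_ord = {term: i for i, term in enumerate(all_terms)}
    return [{i: global_ord[term] for i, term in enumerate(terms)} for terms in local]


# Values of join_field#goods (parent ids) stored in two segments of one shard.
before = build_global_ordinals([["1", "1", "2"], ["2", "3"]])
# before == [{0: 0, 1: 1}, {0: 1, 1: 2}]

# A refresh adds a new segment containing parent id "0".
after = build_global_ordinals([["1", "1", "2"], ["2", "3"], ["0"]])
# after == [{0: 1, 1: 2}, {0: 2, 1: 3}, {0: 0}]
```

In the second call, one new segment shifts the global ordinal of every existing term, so the whole mapping has to be recomputed. That is the rebuild cost the refresh (or, with eager loading disabled, the first join query) has to pay.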

If the join field is used infrequently and writes are frequent, it may make sense to disable eager loading, as follows:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "join_field": {
        "type": "join",
        "relations": {
           "goods": ["details","evaluate"],
           "evaluate":"vote"
        },
        "eager_global_ordinals": false
      }
    }
  }
}

You can check how much heap memory the global ordinals use with the following requests:

# Per-index
GET my-index-000001/_stats/fielddata?human&fields=join_field#goods

# Per-node per-index
GET _nodes/stats/indices/fielddata?human&fields=join_field#goods
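The `#` in `join_field#goods` is part of the field name. If you send these requests from your own HTTP client, keep in mind that most clients treat `#` as the start of a URL fragment and do not send it, so the name should be percent-encoded as `%23`. A small sketch using Python's standard `urllib.parse` (the helper name `fielddata_stats_url` is hypothetical):

```python
from urllib.parse import quote, urlencode


def fielddata_stats_url(path: str, field: str) -> str:
    """Append ?human&fields=<field>, percent-encoding the field name.

    quote() leaves letters, digits and "_" alone but turns "#" into "%23",
    so the whole field name reaches Elasticsearch.
    """
    return f"{path}?human&" + urlencode({"fields": field}, quote_via=quote)


per_index = fielddata_stats_url("/my-index-000001/_stats/fielddata", "join_field#goods")
per_node = fielddata_stats_url("/_nodes/stats/indices/fielddata", "join_field#goods")
# per_index == "/my-index-000001/_stats/fielddata?human&fields=join_field%23goods"
```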

Parent-child documents

  • First, create a regular parent-child index, with the product (goods) as the parent document and the details as the child document

    DELETE my-index-000001
    PUT my-index-000001
    {
      "mappings": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "join_field": { 
            "type": "join",
            "relations": {
              "goods": "details" 
            }
          }
        }
      }
    }
    • my-index-000001 : index name
    • id : document primary key
    • join_field : the parent-child relation field; its type is join
    • relations : defines the relations: goods is the name of the parent relation and details is the name of the child relation; these names are used when indexing and querying documents
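  • Insert some test data: the products are an iphone and a macbook, and the details cover colors, memory configuration, and so on. The parent documents are indexed without a routing value, so they are routed by their _id; each child therefore uses its parent's id as the routing value

    PUT my-index-000001/_doc/1?refresh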
    {
      "id": "1",
      "text": "iphone 14 pro max",
      "join_field": {
        "name": "goods" 
      }
    }
    
    PUT my-index-000001/_doc/2?refresh
    {
      "id": "2",
      "text": "macbook pro ",
      "join_field": {
        "name": "goods"
      }
    }
    
    PUT my-index-000001/_doc/3?routing=1&refresh
    {
      "id": "3",
      "text": "512G 16-core",
      "join_field": {
        "name": "details", 
        "parent": "1" 
      }
    }
    
    PUT my-index-000001/_doc/4?routing=1&refresh
    {
      "id": "4",
      "text": "pink/silver/black/matcha green",
      "join_field": {
        "name": "details",
        "parent": "1"
      }
    }
    PUT my-index-000001/_doc/5?routing=2&refresh
    {
      "id": "5",
      "text": "1T 32G",
      "join_field": {
        "name": "details", 
        "parent": "2" 
      }
    }
    
    PUT my-index-000001/_doc/6?routing=2&refresh
    {
      "id": "6",
      "text": "silver/black",
      "join_field": {
        "name": "details",
        "parent": "2"
      }
    }
  • Use a parent_id query to find the child documents of a given parent. With the test data above, the following finds the details documents of the macbook (id 2)

    GET my-index-000001/_search
    {
      "query": {
        "parent_id": {
          "type": "details",
          "id":"2"
        }
      },
      "sort":["id"]
    }
  • In most cases that is not enough, so you can also use has_parent or has_child queries

    • Use has_parent to find all child documents whose parent goods document contains macbook (grandchild documents, introduced later, can be queried the same way by setting parent_type to their own parent type)

      GET my-index-000001/_search
      {
        "query": {
          "has_parent": {
            "parent_type": "goods",
            "query": {
              "match": {
                "text": "macbook"
              }
            }
          }
        }
      }
  • Use has_child to find all parent documents that have a details child document containing 1T

    GET my-index-000001/_search
    {
      "query": {
        "has_child": {
          "type": "details",
          "query": {
            "match": {
              "text": "1T"
            }
          }
        }
      }
    }
  • Querying and aggregating on the parent-join field

    When you use the join data type, Elasticsearch automatically creates an additional field whose name is the join field name, followed by #, followed by the parent relation name. For the index above, that is join_field#goods. The following example, adapted from the official 8.1 documentation, queries and aggregates on this field and also reads it through a runtime field

    GET my-index-000001/_search
    {
      "query": {
        "parent_id": { 
          "type": "details",
          "id": "1"
        }
      },
      "aggs": {
        "parents": {
          "terms": {
            "field": "join_field#goods", 
            "size": 10
          }
        }
      },
      "runtime_mappings": {
        "my_parent_field": {
          "type": "long",
          "script": """
            emit(Integer.parseInt(doc['join_field#goods'].value)) 
          """
        }
      },
      "fields": [
        { "field": "my_parent_field" }
      ]
    }
  • So far we have shown a parent type with a single child type. The join type also supports a parent type with multiple child types. Building on the above, run the following to test it (the index is deleted and recreated, so re-run the inserts for documents 1 to 6 after creating it)

    DELETE my-index-000001
    PUT my-index-000001
    {
      "mappings": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "join_field": { 
            "type": "join",
            "relations": {
              "goods": ["details","evaluate"] 
            }
          }
        }
      }
    }
    PUT my-index-000001/_doc/7?routing=1&refresh
    {
      "id": "7",
      "text": "Runs smoothly, no lag, long standby time",
      "join_field": {
        "name": "evaluate",
        "parent": "1"
      }
    }
    PUT my-index-000001/_doc/8?routing=2&refresh
    {
      "id": "8",
      "text": "Lightweight, easy to carry, a great machine for coding",
      "join_field": {
        "name": "evaluate",
        "parent": "2"
      }
    }
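  • Attentive readers will have noticed a grandchild relation in the mapping in the global ordinals section. Yes, you read that right: a grandchild document, a third level. Levels can go deeper, but Elasticsearch does not recommend deep hierarchies, because joins are expensive and every extra level adds memory and computation at query time. The following tests multiple levels: goods is the parent of details and evaluate, and evaluate is the parent of vote. A vote document's parent must be an evaluate document, and a grandchild must be indexed with the same routing value as its grandparent (the root of the lineage). As before, the index is recreated, so re-run the inserts for documents 1 to 8 after creating it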

    DELETE my-index-000001
    PUT my-index-000001
    {
      "mappings": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "join_field": {
            "type": "join",
            "relations": {
              "goods": ["details","evaluate"],
              "evaluate":"vote"
            }
          }
        }
      }
    }
    PUT my-index-000001/_doc/9?routing=1&refresh
    {
      "id": "9",
      "text": "Vote: I bought the iphone because it is good value for money and holds its value",
      "join_field": {
        "name": "vote",
        "parent": "7"
      }
    }
    PUT my-index-000001/_doc/10?routing=2&refresh
    {
      "id": "10",
      "text": "Vote: I bought the mac because it is light, easy to carry, and has no bloatware",
      "join_field": {
        "name": "vote",
        "parent": "8"
      }
    }
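To check routing values and parent types before indexing, the rules above can be modeled in a few lines of Python. This is a hypothetical in-memory sketch: `RELATIONS`, `DOCS`, `required_routing`, `parent_has_right_type` and `has_child` are made-up names, not part of any Elasticsearch client, and `has_child` uses a plain Python predicate instead of analyzed full-text matching.

```python
from typing import Callable, Dict, List, Union

Relations = Dict[str, Union[str, List[str]]]

# Hypothetical copy of the multi-level mapping and a subset of the test documents.
RELATIONS: Relations = {"goods": ["details", "evaluate"], "evaluate": "vote"}

DOCS: Dict[str, dict] = {
    "1": {"text": "iphone 14 pro max", "join_field": {"name": "goods"}},
    "2": {"text": "macbook pro", "join_field": {"name": "goods"}},
    "5": {"text": "1T 32G", "join_field": {"name": "details", "parent": "2"}},
    "7": {"text": "Runs smoothly", "join_field": {"name": "evaluate", "parent": "1"}},
    "8": {"text": "Lightweight", "join_field": {"name": "evaluate", "parent": "2"}},
    "10": {"text": "Vote: light", "join_field": {"name": "vote", "parent": "8"}},
}


def parent_type_of(relations: Relations) -> Dict[str, str]:
    """Invert relations into child type -> parent type (one parent per child)."""
    result: Dict[str, str] = {}
    for parent, children in relations.items():
        for child in ([children] if isinstance(children, str) else children):
            if child in result:
                raise ValueError(f"{child!r} is listed under more than one parent")
            result[child] = parent
    return result


def required_routing(doc_id: str, docs: Dict[str, dict]) -> str:
    """Walk parent links up to the root document; its _id is the routing value."""
    current = doc_id
    while "parent" in docs[current]["join_field"]:
        current = docs[current]["join_field"]["parent"]
    return current


def parent_has_right_type(doc_id: str, docs: Dict[str, dict], relations: Relations) -> bool:
    """True if the document's parent has the type the mapping expects."""
    join = docs[doc_id]["join_field"]
    if "parent" not in join:
        return True  # root documents have no parent to check
    expected = parent_type_of(relations)[join["name"]]
    return docs[join["parent"]]["join_field"]["name"] == expected


def has_child(docs: Dict[str, dict], child_type: str,
              matches: Callable[[dict], bool]) -> List[str]:
    """Toy has_child: ids of parents with at least one matching direct child."""
    return sorted({
        doc["join_field"]["parent"]
        for doc in docs.values()
        if doc["join_field"]["name"] == child_type and matches(doc)
    })


print(required_routing("10", DOCS))  # prints 2
print(has_child(DOCS, "details", lambda d: "1T" in d["text"]))  # prints ['2']
```

A vote document whose parent is a goods document would fail `parent_has_right_type`, and `required_routing` returns the root id that the `?routing=` parameter should carry for any level.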

Summary

As you can see, the official documentation does not recommend parent-child documents; performance is the big problem. Most of us use Elasticsearch because it is fast, and join fields are slow, so who would be happy with that? There are pros and cons, and the choice is yours. The next article covers nested, the alternative to join fields that Elasticsearch recommends.



醉鱼