Elasticsearch Basic Usage, Part 2

This article is the second part of the basic usage of Elasticsearch. It covers the following:

  • Querying specified fields
  • Limiting the number of returned results
  • Paging queries
  • Group queries
  • Highlighting
  • Autocomplete suggestions
  • Sorting
  • Aggregating results, such as counting documents or computing the sum and average of a field

For related background, see the first part of this series, Basic Usage of ElasticSearch, on the author's blog.
<!-- more -->

0. Data preparation

Initialize an index by writing some test data to it

 POST second-index/_doc
{
  "@timestamp": "2021-06-10 08:08:08",
  "url": "/test",
  "execute": {
    "args": "id=10&age=20",
    "cost": 10,
    "res": "test result"
  },
  "response_code": 200,
  "app": "yhh_demo"
}


POST second-index/_doc
{
  "@timestamp": "2021-06-10 08:08:09",
  "url": "/test",
  "execute": {
    "args": "id=20&age=20",
    "cost": 11,
    "res": "test result2"
  },
  "response_code": 200,
  "app": "yhh_demo"
}


POST second-index/_doc
{
  "@timestamp": "2021-06-10 08:08:10",
  "url": "/test",
  "execute": {
    "args": "id=10&age=20",
    "cost": 12,
    "res": "test result2"
  },
  "response_code": 200,
  "app": "yhh_demo"
}


POST second-index/_doc
{
  "@timestamp": "2021-06-10 08:08:09",
  "url": "/hello",
  "execute": {
    "args": "tip=welcome",
    "cost": 2,
    "res": "welcome"
  },
  "response_code": 200,
  "app": "yhh_demo"
}

POST second-index/_doc
{
  "@timestamp": "2021-06-10 08:08:09",
  "url": "/404",
  "execute": {
    "args": "tip=welcome",
    "cost": 2,
    "res": "xxxxxxxx"
  },
  "response_code": 404,
  "app": "yhh_demo"
}

1. Query the specified field

Suppose I only care about the url and the status code it returned. The fields to return are specified with _source, and the rest of the query syntax is the same as introduced in the previous part.

 GET second-index/_search
{
  "_source": [
    "url",
    "response_code"
  ],
  "query": {
    "match_all": {}
  }
}
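
To make the effect of _source concrete, here is a minimal Python sketch that mimics the filtering locally on one of the sample documents. The helper name filter_source is hypothetical, and it only handles top-level field names; Elasticsearch itself also accepts wildcards and dotted paths, which this sketch does not attempt.

```python
# Hypothetical helper: mimics top-level `_source` filtering locally.
# It does not talk to Elasticsearch and ignores wildcards and nested paths.
def filter_source(doc, fields):
    """Return only the requested top-level fields that exist in doc."""
    return {name: doc[name] for name in fields if name in doc}


# First sample document from the data preparation step.
sample = {
    "@timestamp": "2021-06-10 08:08:08",
    "url": "/test",
    "execute": {"args": "id=10&age=20", "cost": 10, "res": "test result"},
    "response_code": 200,
    "app": "yhh_demo",
}

print(filter_source(sample, ["url", "response_code"]))
```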

2. Return limit

Limiting the number of returned results is a common requirement. In ES it is set directly with size

 GET second-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 2
}

3. Paging query

size limits the number of documents returned, and from sets the offset of the first one, which together implement pagination

 GET second-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1,
  "from": 1
}

(Compared with the previous query, this one returns the second document, because from is a zero-based offset.)
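
The relation between a page number and from/size can be sketched as follows. This is only an illustration: page_request is a hypothetical helper, and the 1-based page convention is an assumption of the sketch, not something ES defines.

```python
import json


# Hypothetical helper: builds the search body for a 1-based page number.
def page_request(page, page_size):
    """Build a match_all search body for the given page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "size": page_size,
        "from": (page - 1) * page_size,  # `from` is a zero-based offset
    }


# Page 2 with one document per page gives from=1, size=1, as in the query above.
print(json.dumps(page_request(2, 1), indent=2))
```

Note that from + size is capped by the index.max_result_window setting (10000 by default), so this style of paging is not suited to very deep pages.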

4. Group query

This is equivalent to GROUP BY in SQL and is commonly used for counting within aggregation operations.

In ES it is implemented with aggs, using the following syntax

 "aggs": {
    "agg-name": { // agg-name is a custom name for this aggregation
        "terms": { // terms is the aggregation strategy: group by the given field
            "field": "",
            "size": 10
        }
    }
}

For example, to count the number of accesses per url, the query could be

 GET second-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1, 
  "aggs": {
    "my-agg": {
      "terms": {
        "field": "url",
        "size": 2
      }
    }
  }
}

Executing it, however, returns an error instead of a normal response.

The error message is: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [url] in order to load field data by uninverting the inverted index. Note that this can use significant memory.

Put simply, url is a text field. By default, text fields are analyzed and have fielddata disabled, so they do not support aggregation or sorting. To aggregate on url, either set fielddata=true on it, or use its keyword sub-field, url.keyword, which dynamic mapping creates for string values by default

 GET second-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1, 
  "aggs": {
    "my-agg": {
      "terms": {
        "field": "url.keyword",
        "size": 2
      }
    }
  }
}
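
What the terms aggregation above computes can be sketched locally in Python over the urls of the five sample documents. This is a rough model only: terms_buckets is a hypothetical helper, and breaking ties between equal counts by key is an assumption of the sketch.

```python
from collections import Counter

# The url values of the five sample documents.
urls = ["/test", "/test", "/test", "/hello", "/404"]


# Hypothetical helper: groups values and keeps the `size` largest groups.
def terms_buckets(values, size):
    """Return up to `size` buckets, ordered by descending count."""
    counts = Counter(values)
    # Assumption: ties on count are broken by key in ascending order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


print(terms_buckets(urls, 2))
```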

Note

  • Although the grouped results are what we care about here, the matching documents are still returned in hits. If you only want the aggregation results, set "size": 0 in the request.
  • Aggregations can be combined with query conditions, for example to get the count for a single url only, as in the query below
 GET second-index/_search
{
  "query": {
    "term": {
      "url.keyword": {
        "value": "/test"
      }
    }
  },
  "size": 1, 
  "aggs": {
    "my-agg": {
      "terms": {
        "field": "url.keyword",
        "size": 2
      }
    }
  }
}

The approach above aggregates on the keyword sub-field of the text field. The other approach is to set fielddata=true on the text field itself, in which case the aggregation runs on the analyzed terms. It is done as follows

 PUT second-index/_mapping
{
  "properties": {
    "url": {
      "type": "text",
      "fielddata": true
    }
  }
}

Once the mapping is updated, the group query on url no longer throws the exception.

5. Full text search

Full-text search can be implemented by configuring a dynamic index template that combines all fields into a single field, which is then used for searching.

6. Aggregation operations

The grouping above is itself one kind of aggregation. Next, take a closer look at ES aggregations and what they support.

Aggregation syntax:

 "aggs": {
    "agg_name": { // custom aggregation name
        "agg_type": { // aggregation type, e.g. min, max
            "agg_body" // the values to compute on
        }, 
        "meta": {}, 
        "aggregations": {} // sub-aggregations
    }
}
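
As an illustration of how this structure is filled in for a single-field metric, the following Python sketch builds a request body as a dictionary. metric_agg_body is a hypothetical helper, and setting "size": 0 to suppress hits is a choice of the sketch rather than part of the syntax.

```python
import json


# Hypothetical helper: fills agg_name / agg_type / agg_body for one field.
def metric_agg_body(agg_name, agg_type, field):
    """Build a search body with one metric aggregation and no hits."""
    return {
        "size": 0,  # only the aggregation result is wanted
        "aggs": {agg_name: {agg_type: {"field": field}}},
    }


body = metric_agg_body("min_cost", "min", "execute.cost")
print(json.dumps(body, indent=2))
```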

Aggregations fall into the following categories

  • Metric aggregation: computes metrics over the documents
  • Bucket aggregation: groups documents into buckets
  • Pipeline aggregation: works on the output of other aggregations
  • Matrix aggregation: matrix analysis over multiple fields

6.1 Metric Aggregation

The common ones are min, max, avg, sum, cardinality, and value count.

They are typically used to obtain values that have to be computed from the documents.

Some demonstrations follow.

6.1.1 min: minimum value

Get the lowest time cost among the requests

 GET second-index/_search
{
  "size": 0,
  "aggs": {
    "min_cost": {
      "min": {
        "field": "execute.cost"
      }
    }
  }
}

  • size: 0 means the original documents do not need to be returned
  • min_cost is the custom aggregation name
  • min is the aggregation type, here the minimum value
  • "field": "execute.cost" means the minimum is taken over the execute.cost field

6.1.2 max: maximum value

This is basically the same as min, so only the request is shown.

 GET second-index/_search
{
  "size": 0,
  "aggs": {
    "max_cost": {
      "max": {
        "field": "execute.cost"
      }
    }
  }
}

6.1.3 sum: summation

 GET second-index/_search
{
  "size": 0,
  "aggs": {
    "sum_cost": {
      "sum": {
        "field": "execute.cost"
      }
    }
  }
}

6.1.4 avg: average

When monitoring time cost, the average is a useful indicator of the overall performance of the service.

 GET second-index/_search
{
  "size": 0,
  "aggs": {
    "avg_cost": {
      "avg": {
        "field": "execute.cost"
      }
    }
  }
}

6.1.5 cardinality: distinct count

This is equivalent to the familiar distinct count. Note the difference from value count below, which counts every value, duplicates included.

 GET second-index/_search
{
  "_source": "url", 
  "aggs": {
    "cardinality_cost": {
      "cardinality": {
        "field": "url"
      }
    }
  }
}

This counts the distinct urls. The result is 3, although there are 5 documents in total. Note that the query aggregates on url directly, so it relies on the fielddata=true mapping set in section 4; without it, use url.keyword instead.

6.1.6 value count: counting values

This counts the values of a field. Unlike the distinct count above, duplicates are included

 GET second-index/_search
{
  "size": 0, 
  "aggs": {
    "count_cost": {
      "value_count": {
        "field": "url"
      }
    }
  }
}

Compare this output with the cardinality result above to see the difference.
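
The difference between the two counts can be sketched in Python over the urls of the five sample documents. The function names are only illustrative, and the exact set-based count stands in for the approximate algorithm that the cardinality aggregation uses on large data sets.

```python
# The url values of the five sample documents.
urls = ["/test", "/test", "/test", "/hello", "/404"]


def value_count(values):
    """Count every value, duplicates included."""
    return len(values)


def cardinality(values):
    """Count distinct values (exact here; ES approximates on large sets)."""
    return len(set(values))


print(value_count(urls), cardinality(urls))
```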

6.1.7 stats: multi-value calculation

A single stats aggregation returns count, min, max, avg, and sum together.

 GET second-index/_search
{
  "size": 0, 
  "aggs": {
    "mult_cost": {
      "stats": {
        "field": "execute.cost"
      }
    }
  }
}
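
As a cross-check, the values this request is expected to return for the five sample documents (costs 10, 11, 12, 2, 2) can be computed locally. This is a plain Python sketch, not ES code, and the stats function name is only illustrative.

```python
# The execute.cost values of the five sample documents.
costs = [10, 11, 12, 2, 2]


def stats(values):
    """Compute the five figures that a stats aggregation reports."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


print(stats(costs))
```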

6.1.8 extended_stats: extended multi-value calculation

This extends stats with additional figures such as variance and standard deviation.

 GET second-index/_search
{
  "size": 0, 
  "aggs": {
    "mult_cost": {
      "extended_stats": {
        "field": "execute.cost"
      }
    }
  }
}
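
The additional figures can be sketched in Python for the sample costs. This is a simplified illustration, assuming the population variance (dividing by the number of values); the function name and the subset of returned fields are choices of the sketch.

```python
import math

# The execute.cost values of the five sample documents.
costs = [10, 11, 12, 2, 2]


def extended_stats(values):
    """Compute sum of squares, population variance and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population variance, i.e. divide by n rather than n - 1.
    variance = sum((v - mean) ** 2 for v in values) / n
    return {
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


print(extended_stats(costs))
```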

6.1.9 percentiles: percentile statistics

This finds the value that xx% of the records are less than or equal to.

With the sample data, for example, 99% of the records have a cost of no more than 12.

The default percents are [1, 5, 25, 50, 75, 95, 99], and they can be overridden manually

 GET second-index/_search
{
  "size": 0, 
  "aggs": {
    "agg_cost": {
      "percentiles": {
        "field": "execute.cost",
        "percents": [
          10,
          50,
          90,
          99
        ]
      }
    }
  }
}

6.1.10 percentile_ranks: the percentile rank of a value

percentiles answers which value sits at a given percentage, for example when describing the age distribution of a company's staff. percentile_ranks answers the reverse question: given a value, such as my age of 18, what percentage of the records are at or below it.

 GET second-index/_search
{
  "size": 0, 
  "aggs": {
    "agg_cost": {
      "percentile_ranks": {
        "field": "execute.cost",
        "values": [6, 9]
      }
    }
  }
}
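
The idea of a percentile rank can be sketched with an exact computation over the sample costs. This is a naive illustration only: percentile_rank is a hypothetical helper, and since ES computes percentile ranks with an approximation algorithm, the numbers it returns may differ from this exact fraction.

```python
# The execute.cost values of the five sample documents.
costs = [10, 11, 12, 2, 2]


# Hypothetical helper: exact share of values at or below a threshold.
def percentile_rank(values, threshold):
    """Return the percentage of values that are <= threshold."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


for value in (6, 9):
    print(value, percentile_rank(costs, value))
```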

Believing everything in books is worse than having no books at all. The content above is only one person's view, and given my limited ability, omissions and mistakes are inevitable. If you find a bug or have a better suggestion, criticism and corrections are welcome. Thank you.

小灰灰Blog