Elasticsearch Basic Usage, Part 2
This article is the second part of the Elasticsearch basic usage series and covers the following topics
- Query the specified field
- Limit the number of returned items
- Paging query
- Group query
- Highlighting
- Autocomplete suggestions
- Sorting
- Aggregating results, such as the number of documents or the sum and average of a field
For more background, see the first part: Basic usage posture of ElasticSearch - a gray blog
0. Data preparation
Create an index by writing some test data into it
POST second-index/_doc
{
"@timestamp": "2021-06-10 08:08:08",
"url": "/test",
"execute": {
"args": "id=10&age=20",
"cost": 10,
"res": "test result"
},
"response_code": 200,
"app": "yhh_demo"
}
POST second-index/_doc
{
"@timestamp": "2021-06-10 08:08:09",
"url": "/test",
"execute": {
"args": "id=20&age=20",
"cost": 11,
"res": "test result2"
},
"response_code": 200,
"app": "yhh_demo"
}
POST second-index/_doc
{
"@timestamp": "2021-06-10 08:08:10",
"url": "/test",
"execute": {
"args": "id=10&age=20",
"cost": 12,
"res": "test result2"
},
"response_code": 200,
"app": "yhh_demo"
}
POST second-index/_doc
{
"@timestamp": "2021-06-10 08:08:09",
"url": "/hello",
"execute": {
"args": "tip=welcome",
"cost": 2,
"res": "welcome"
},
"response_code": 200,
"app": "yhh_demo"
}
POST second-index/_doc
{
"@timestamp": "2021-06-10 08:08:09",
"url": "/404",
"execute": {
"args": "tip=welcome",
"cost": 2,
"res": "xxxxxxxx"
},
"response_code": 404,
"app": "yhh_demo"
}
1. Query the specified field
For example, suppose I only care about the url and the status code it returned. Use _source to specify the fields to return; the query syntax is the same as in the previous article.
GET second-index/_search
{
"_source": [
"url",
"response_code"
],
"query": {
"match_all": {}
}
}
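To make the effect of _source concrete, here is a small Python sketch that mimics the filtering on a local copy of the first test document. The helper name pick_source is made up for illustration and only handles top-level field names; it is not how Elasticsearch implements source filtering, which also supports nested paths and wildcards.

```python
# Hypothetical helper (not an Elasticsearch API): mimics
# "_source": ["url", "response_code"] on a plain dict.
def pick_source(doc, fields):
    """Return a copy of doc containing only the requested top-level fields."""
    return {name: doc[name] for name in fields if name in doc}


# Local copy of the first test document from section 0.
doc = {
    "@timestamp": "2021-06-10 08:08:08",
    "url": "/test",
    "execute": {"args": "id=10&age=20", "cost": 10, "res": "test result"},
    "response_code": 200,
    "app": "yhh_demo",
}

filtered = pick_source(doc, ["url", "response_code"])
print(filtered)
```

Each hit returned by the query above should carry a _source of this reduced shape.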
2. Limit the number of returned items
Limiting the number of returned results is a common requirement. In ES it is specified directly with size
GET second-index/_search
{
"query": {
"match_all": {}
},
"size": 2
}
3. Paging query
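size limits the number of documents returned (the page size), and from sets how many matching documents to skip; together they implement pagination. By default, from + size cannot exceed 10,000 (the index.max_result_window setting)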
GET second-index/_search
{
"query": {
"match_all": {}
},
"size": 1,
"from": 1
}
(Compared with the previous query, the second document is returned here, because "from": 1 skips the first hit)
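Applications usually think in page numbers rather than offsets. The sketch below shows one way to derive from and size from a 1-based page number; page_request is a hypothetical helper name, and the body it builds simply mirrors the request above.

```python
# Hypothetical helper: turn a 1-based page number into a search request body.
def page_request(page, page_size):
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
    }


# Page 2 with one document per page reproduces the request above.
second_page = page_request(page=2, page_size=1)
print(second_page)
```

The resulting dict can be sent as the JSON body of GET second-index/_search with whatever HTTP client is at hand.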
4. Group query
Equivalent to group by in SQL, and commonly used for counting in aggregation scenarios
In ES this is done with aggs; the syntax is as follows
"aggs": {
"agg-name": { // agg-name is a custom aggregation name
"terms": { // terms is the aggregation strategy: group by the given field
"field": "",
"size": 10
}
}
}
For example, to count the number of accesses per url, the query could be
GET second-index/_search
{
"query": {
"match_all": {}
},
"size": 1,
"aggs": {
"my-agg": {
"terms": {
"field": "url",
"size": 2
}
}
}
}
But when you execute it, the request fails with the following error:
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [url] in order to load field data by uninverting the inverted index. Note that this can use significant memory
Put simply, the url field is of type text. Text fields are analyzed for full-text search and by default have no per-document field data, so aggregation and sorting on them are disabled. To aggregate on url, either set fielddata=true on the field, or use its keyword sub-field url.keyword, which the default dynamic mapping creates for string fields
GET second-index/_search
{
"query": {
"match_all": {}
},
"size": 1,
"aggs": {
"my-agg": {
"terms": {
"field": "url.keyword",
"size": 2
}
}
}
}
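For intuition, the grouping performed by the terms aggregation on url.keyword can be reproduced locally. This is only a sketch over the five test urls from section 0: terms_agg is a made-up helper, it returns every bucket rather than the top 2 requested by "size": 2, and the real response nests the counts under aggregations.my-agg.buckets.

```python
from collections import Counter

# The url values of the five test documents from section 0.
urls = ["/test", "/test", "/test", "/hello", "/404"]


def terms_agg(values):
    """Local stand-in for a terms aggregation: bucket key -> doc_count."""
    return dict(Counter(values))


buckets = terms_agg(urls)
print(buckets)
```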
Note
- Although we mainly care about the grouped results, the hit documents are still returned in hits. If you only want the grouped statistics, add "size": 0 to the request.
- Aggregations and query conditions can be combined, for example to count only the documents for a single url
GET second-index/_search
{
"query": {
"term": {
"url.keyword": {
"value": "/test"
}
}
},
"size": 1,
"aggs": {
"my-agg": {
"terms": {
"field": "url.keyword",
"size": 2
}
}
}
}
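The above aggregates a text field through its keyword sub-field. The other way is to set fielddata=true on the text field itself, as follows. Note that fielddata is built from the analyzed terms, so the buckets are tokens rather than the original strings, and loading it can use significant memory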
PUT second-index/_mapping
{
"properties": {
"url": {
"type": "text",
"fielddata": true
}
}
}
Once the mapping has been updated, a group query on url no longer throws an exception.
5. Full text search
By configuring a dynamic index template, the values of all fields can be combined into a single field, and full-text search is then run against that field
6. Aggregate operation
The grouping above is one kind of aggregation. Next, let's take a closer look at ES aggregations and what they support
Aggregation syntax:
"aggs": {
"agg_name": { // custom aggregation name
"agg_type": { // aggregation type, such as min or max
"agg_body" // what the aggregation computes on, e.g. the field
},
"meta": {},
"aggregations": {} // sub-aggregations
}
}
Aggregations fall into the following categories
- Metric aggregations: compute a metric (min, avg, ...) over field values
- Bucket aggregations: group documents into buckets
- Pipeline aggregations: operate on the output of other aggregations
- Matrix aggregations: operate on multiple fields and produce a matrix result
6.1 Metric aggregations
Common ones are min, max, avg, sum, cardinality and value_count
They are typically used to obtain values that have to be computed from the documents
Some demonstrations are given below
6.1.1 min: minimum
Get the cost of the fastest request
GET second-index/_search
{
"size": 0,
"aggs": {
"min_cost": {
"min": {
"field": "execute.cost"
}
}
}
}
- size: 0 means the original documents do not need to be returned
- min_cost: the custom aggregation name
- min: the aggregation type, here the minimum value
- "field": "execute.cost": the field whose minimum value is computed
6.1.2 max: maximum
Basically the same as above; only the request is shown here.
GET second-index/_search
{
"size": 0,
"aggs": {
"max_cost": {
"max": {
"field": "execute.cost"
}
}
}
}
6.1.3 sum: summation
GET second-index/_search
{
"size": 0,
"aggs": {
"sum_cost": {
"sum": {
"field": "execute.cost"
}
}
}
}
6.1.4 avg: average
In monitoring, the average cost is a useful indicator of the overall performance of a service.
GET second-index/_search
{
"size": 0,
"aggs": {
"avg_cost": {
"avg": {
"field": "execute.cost"
}
}
}
}
6.1.5 cardinality: distinct count
This is equivalent to the familiar distinct count
Note the difference from value_count below, which counts all values, duplicates included
GET second-index/_search
{
"_source": "url",
"aggs": {
"cardinality_cost": {
"cardinality": {
"field": "url"
}
}
}
}
This counts the distinct url values: the result is 3, while the actual number of documents is 5. Note that the query aggregates on url directly, which only works because fielddata=true was set on url in section 4; otherwise use url.keyword
6.1.6 value_count: count of values
Counts the number of values. Unlike the distinct count above, duplicates are included, so the full count is returned
GET second-index/_search
{
"size": 0,
"aggs": {
"count_cost": {
"value_count": {
"field": "url"
}
}
}
}
Compare this output with the cardinality result above to see the difference
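The difference between the two counts can be checked locally on the five test urls. This is a plain-Python sketch, not what Elasticsearch runs internally: cardinality in particular is an approximate algorithm there, although on a dataset this small the exact answer is what one would expect.

```python
# The url values of the five test documents from section 0.
urls = ["/test", "/test", "/test", "/hello", "/404"]

value_count = len(urls)       # every value counts, duplicates included
cardinality = len(set(urls))  # only distinct values count

print("value_count:", value_count)
print("cardinality:", cardinality)
```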
6.1.7 stats: several metrics at once
A single stats aggregation returns the count, min, max, avg and sum computed above in one response
GET second-index/_search
{
"size": 0,
"aggs": {
"mult_cost": {
"stats": {
"field": "execute.cost"
}
}
}
}
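As a cross-check, the same five figures can be computed by hand from the execute.cost values of the test documents. The stats function below is a local illustration only; its keys follow the field names of the stats response.

```python
# execute.cost of the five test documents from section 0.
costs = [10, 11, 12, 2, 2]


def stats(values):
    """Compute the metrics that a stats aggregation reports."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


result = stats(costs)
print(result)
```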
6.1.8 extended_stats: extended statistics
Extends stats above with additional metrics such as sum of squares, variance and standard deviation
GET second-index/_search
{
"size": 0,
"aggs": {
"mult_cost": {
"extended_stats": {
"field": "execute.cost"
}
}
}
}
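The additional metrics can again be reproduced locally with Python's statistics module. One assumption to flag: the sketch uses the population variance and standard deviation, which is my understanding of what the variance and std_deviation fields report; check the response of your Elasticsearch version if the distinction from the sampling variants matters.

```python
import statistics

# execute.cost of the five test documents from section 0.
costs = [10, 11, 12, 2, 2]

sum_of_squares = sum(c * c for c in costs)
# Assumption: population (not sample) variance and standard deviation.
variance = statistics.pvariance(costs)
std_deviation = statistics.pstdev(costs)

print("sum_of_squares:", sum_of_squares)
print("variance:", variance)
print("std_deviation:", std_deviation)
```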
6.1.9 percentiles: percentile statistics
For each requested percentage xx%, returns the value that xx% of the records are less than or equal to
For the test data, for example, 99% of the records have a cost of at most 12
The default percentages are [1, 5, 25, 50, 75, 95, 99], and they can be overridden with percents
GET second-index/_search
{
"size": 0,
"aggs": {
"agg_cost": {
"percentiles": {
"field": "execute.cost",
"percents": [
10,
50,
90,
99
]
}
}
}
}
6.1.10 percentile_ranks: the percentile rank of a value
percentiles above answers "which value sits at a given percentage", for example to describe the age distribution of a company's staff. percentile_ranks answers the reverse question: given a value (say, I am 18 years old), what percentage of the records are less than or equal to it
GET second-index/_search
{
"size": 0,
"aggs": {
"agg_cost": {
"percentile_ranks": {
"field": "execute.cost",
"values": [6, 9]
}
}
}
}
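The relationship between percentiles and percentile_ranks is easiest to see on the raw cost values. The two functions below are a naive, exact sketch with made-up names; Elasticsearch computes both with an approximate algorithm and may interpolate between values, so its numbers for such a tiny dataset can differ from these.

```python
import math

# execute.cost of the five test documents from section 0.
costs = [10, 11, 12, 2, 2]


def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to threshold."""
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)


def percentile_nearest_rank(values, percent):
    """Smallest value such that at least percent % of the values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Same thresholds as the "values" of the request above.
ranks = {v: percentile_rank(costs, v) for v in (6, 9)}
p99 = percentile_nearest_rank(costs, 99)

print("ranks:", ranks)
print("p99:", p99)
```

Only the two requests with cost 2 are at or below 6 and 9, so both thresholds get the same rank in this exact calculation.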
Contact the author
Better to have no books at all than to believe everything in them. The above is only one person's view. My ability is limited, so omissions and mistakes are inevitable. If you find a bug or have a better suggestion, corrections are welcome. Thank you
- Personal site: https://blog.hhui.top
- Weibo address: Xiaohuihui Blog
- QQ: A gray gray / 3302797840
- WeChat public account: a gray blog