Elasticsearch Study Notes - 11. Bucket Aggregations

1. Source Data


DELETE my-index

PUT my-index

PUT my-index/person/1
{
  "name":"张三",
  "age":27,
  "gender":"男",
  "salary":15000,
  "dep":"bigdata"
}

PUT my-index/person/2
{
  "name":"李四",
  "age":26,
  "gender":"女",
  "salary":15000,
  "dep":"bigdata"
}

PUT my-index/person/3
{
  "name":"王五",
  "age":26,
  "gender":"男",
  "salary":17000,
  "dep":"AI"
}
PUT my-index/person/4
{
  "name":"刘六",
  "age":27,
  "gender":"女",
  "salary":18000,
  "dep":"AI"
}

PUT my-index/person/5
{
  "name":"程裕强",
  "age":31,
  "gender":"男",
  "salary":20000,
  "dep":"bigdata"
}
PUT my-index/person/6
{
  "name":"hadron",
  "age":30,
  "gender":"男",
  "salary":20000,
  "dep":"AI"
}

2. Terms Aggregation

Group the documents by salary and count how many people fall into each salary level.

GET /my-index/person/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {
        "field": "salary"
      }
    }
  }
}

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 15000,
          "doc_count": 2
        },
        {
          "key": 20000,
          "doc_count": 2
        },
        {
          "key": 17000,
          "doc_count": 1
        },
        {
          "key": 18000,
          "doc_count": 1
        }
      ]
    }
  }
}
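
As a rough cross-check of the response above, the following is a minimal Python sketch of what a terms aggregation computes, not how Elasticsearch implements it. The helper name `terms_buckets` is hypothetical; the data is the six salary values from the source data, and the ordering (doc_count descending, ties broken by ascending key) is the default order assumed here.

```python
from collections import Counter

# Salary of each of the six people in the source data.
salaries = [15000, 15000, 17000, 18000, 20000, 20000]


def terms_buckets(values):
    """Hypothetical helper: one bucket per distinct value, with its doc count."""
    counts = Counter(values)
    # Assumed default order: doc_count descending, then key ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


buckets = terms_buckets(salaries)
for bucket in buckets:
    print(bucket)
```

The printed buckets should line up with the "buckets" array in the response above.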

Compute the average age within each of the salary groups above.

GET /my-index/person/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {
        "field": "salary"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 15000,
          "doc_count": 2,
          "avg_age": {
            "value": 26.5
          }
        },
        {
          "key": 20000,
          "doc_count": 2,
          "avg_age": {
            "value": 30.5
          }
        },
        {
          "key": 17000,
          "doc_count": 1,
          "avg_age": {
            "value": 26
          }
        },
        {
          "key": 18000,
          "doc_count": 1,
          "avg_age": {
            "value": 27
          }
        }
      ]
    }
  }
}
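
The nested "aggs" runs once per bucket of the parent aggregation. A minimal Python sketch of that idea, assuming the same six (age, salary) pairs as the source data; it only illustrates the grouping logic and is not Elasticsearch code.

```python
from collections import defaultdict
from statistics import mean

# (age, salary) for each of the six people in the source data.
people = [(27, 15000), (26, 15000), (26, 17000),
          (27, 18000), (31, 20000), (30, 20000)]

# Parent aggregation: group ages by salary (one bucket per salary).
ages_by_salary = defaultdict(list)
for age, salary in people:
    ages_by_salary[salary].append(age)

# Sub-aggregation: average age inside each bucket.
# Buckets are ordered by doc_count descending, then key ascending.
buckets = [
    {"key": salary, "doc_count": len(ages), "avg_age": mean(ages)}
    for salary, ages in sorted(ages_by_salary.items(),
                               key=lambda kv: (-len(kv[1]), kv[0]))
]

for bucket in buckets:
    print(bucket)
```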

Count the number of people in each department.

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "terms": {"field": "dep"}
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my-index",
        "node": "fQDwpdT2RfSfPr8ttHQCkA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

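The error message reads: "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead." In other words, dep was dynamically mapped as a text field, and a terms aggregation on a text field needs fielddata. One option is to set "fielddata": true on that field in the mapping, at the cost of extra heap memory.
The official documentation also advises: "Use the my_field.keyword field for aggregations, sorting, or in scripts". With the default dynamic mapping, a string field also gets a keyword sub-field, so the aggregation can target dep.keyword instead.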

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "terms": {"field": "dep.keyword"}
    }
  }
}
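
The difference between the two options matters for the bucket keys. The sketch below is a simplified illustration in Python: the `analyze` function is a crude, hypothetical stand-in for text analysis (whitespace split plus lowercasing) and not the real analyzer, while the keyword case simply uses the original string as the key.

```python
from collections import Counter

# The dep value of each of the six people in the source data.
deps = ["bigdata", "bigdata", "AI", "AI", "bigdata", "AI"]

# keyword field: the whole, unmodified string is the bucket key.
keyword_buckets = Counter(deps)


def analyze(value):
    """Crude stand-in for text analysis (assumption): split and lowercase."""
    return value.lower().split()


# text field with fielddata: buckets are built from the analyzed tokens.
text_buckets = Counter(token for dep in deps for token in analyze(dep))

print(dict(keyword_buckets))
print(dict(text_buckets))
```

Under these assumptions the keyword buckets keep "AI" as written, whereas the analyzed tokens would be lowercased, which is one reason the keyword sub-field is usually the better target for aggregations.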

2. Filter Aggregation

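Compute the average age of the male employees.

In other words, take the documents whose gender field contains the term "男" (male) and average their age values.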

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "filter": {
        "term":{"gender": "男"}
      },
      "aggs":{
        "avg_age":{
          "avg":{"field": "age"}
        }
      }
    }
  }
}
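
The original shows no response for this request. As a hedged sanity check, here is a minimal Python sketch of the same computation over the six source documents: a single bucket holding the matching documents, with the average taken inside it.

```python
# (age, gender) for each of the six people in the source data.
people = [(27, "男"), (26, "女"), (26, "男"),
          (27, "女"), (31, "男"), (30, "男")]

# Filter aggregation: one bucket containing only the matching documents.
male_ages = [age for age, gender in people if gender == "男"]

# Sub-aggregation: average age inside that single bucket.
result = {
    "doc_count": len(male_ages),
    "avg_age": sum(male_ages) / len(male_ages),
}
print(result)
```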

3. Filters Aggregation

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "filters":{
        "filters": [
          {"match":{"gender": "男"}},
          {"match":{"gender": "女"}}
        ]
      },
      "aggs":{
        "avg_age":{
            "avg":{"field": "age"}
        }
      }
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "doc_count": 4,
          "avg_age": {
            "value": 28.5
          }
        },
        {
          "doc_count": 2,
          "avg_age": {
            "value": 26.5
          }
        }
      ]
    }
  }
}
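
With an anonymous filter list, the buckets come back in the same order as the filters in the request. A minimal Python sketch of that behaviour, assuming the six source documents; the predicates stand in for the two match queries and are not Elasticsearch code.

```python
# (age, gender) for each of the six people in the source data.
people = [(27, "男"), (26, "女"), (26, "男"),
          (27, "女"), (31, "男"), (30, "男")]

# One predicate per filter, in request order (stand-ins for the match queries).
filters = [
    lambda gender: gender == "男",
    lambda gender: gender == "女",
]

buckets = []
for matches in filters:
    # Each filter is evaluated independently, so in general a document
    # could land in more than one bucket.
    ages = [age for age, gender in people if matches(gender)]
    buckets.append({"doc_count": len(ages), "avg_age": sum(ages) / len(ages)})

print(buckets)
```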

4. Range Aggregation

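Each from..to range is the half-open interval [from, to): it includes the from value and excludes the to value.
[Example] Count the employees whose salary falls in each of the three ranges [0, 10000), [10000, 20000), and [20000, +∞).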

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "range": {
        "field": "salary",
        "ranges": [
            {"to": 10000},
            {"from": 10000,"to":20000},  
            {"from": 20000}
        ]
      }
    }
  }
}

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "key": "*-10000.0",
          "to": 10000,
          "doc_count": 0
        },
        {
          "key": "10000.0-20000.0",
          "from": 10000,
          "to": 20000,
          "doc_count": 4
        },
        {
          "key": "20000.0-*",
          "from": 20000,
          "doc_count": 2
        }
      ]
    }
  }
}
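
The half-open rule explains why the two people earning exactly 20000 land in the last bucket rather than the middle one. A minimal Python sketch, assuming the six salaries from the source data; `None` marks an open-ended bound, and `in_range` is a hypothetical helper.

```python
# Salary of each of the six people in the source data.
salaries = [15000, 15000, 17000, 18000, 20000, 20000]

# (from, to) pairs; None means the bound is open on that side.
ranges = [(None, 10000), (10000, 20000), (20000, None)]


def in_range(value, lower, upper):
    """Half-open check: lower bound inclusive, upper bound exclusive."""
    return (lower is None or value >= lower) and (upper is None or value < upper)


counts = [sum(in_range(s, lower, upper) for s in salaries)
          for lower, upper in ranges]
print(counts)
```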

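5. Date Range Aggregation

A range aggregation dedicated to date values.
The main difference from the normal range aggregation is that the from and to values can be written as date math expressions, and that a date format can be specified for the from and to fields returned in the response.
Note that this aggregation includes the from value and excludes the to value of each range.

[Example] Count the blog posts published before the point one year ago and the posts published from that point onward.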

GET website/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "date_range": {
        "field": "postdate",
        "format":"yyyy-MM-dd",
        "ranges": [
            {"to": "now-12M/M"},
            {"from": "now-12M/M"}
        ]
      }
    }
  }
}



{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "key": "*-2017-10-01",
          "to": 1506816000000,
          "to_as_string": "2017-10-01",
          "doc_count": 8
        },
        {
          "key": "2017-10-01-*",
          "from": 1506816000000,
          "from_as_string": "2017-10-01",
          "doc_count": 1
        }
      ]
    }
  }
}
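
The expression now-12M/M means "go back 12 months from now, then round down to the start of that month". The Python sketch below illustrates that arithmetic only; the function name and the value of `now` are hypothetical (the original does not state when the query ran), chosen so that the boundary matches the 2017-10-01 shown in the response, and UTC is assumed.

```python
from datetime import datetime, timezone


def months_ago_floor_month(now, months):
    """Hypothetical helper: subtract whole months, then floor to month start (UTC)."""
    total_months = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total_months, 12)
    return datetime(year, month_index + 1, 1, tzinfo=timezone.utc)


# Assumed query time; any moment in October 2018 gives the same boundary.
now = datetime(2018, 10, 15, 8, 30, tzinfo=timezone.utc)

boundary = months_ago_floor_month(now, 12)
boundary_ms = int(boundary.timestamp() * 1000)

print(boundary.strftime("%Y-%m-%d"), boundary_ms)
```

The epoch-millisecond value corresponds to the "to" and "from" fields in the response, and the formatted date to "to_as_string" and "from_as_string".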

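6. Missing Aggregation

A field-data-based single-bucket aggregation that creates a bucket of all documents in the current document set that are missing a field value (that is, the field is absent or holds the configured NULL value).
It is often used together with other field-data bucket aggregations (such as range) to report on the documents that could not be placed in any other bucket because they lack a field value.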

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "noDep_count": {
      "missing": {"field": "salary"}
    }
  }
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "noDep_count": {
      "doc_count": 3
    }
  }
}
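
A minimal Python sketch of the idea, using a small hypothetical document list rather than the index queried above (whose nine documents are not shown in the original). It assumes no null_value is configured, so an explicit null counts as missing just like an absent field.

```python
# Hypothetical documents, for illustration only.
docs = [
    {"name": "a", "salary": 15000},
    {"name": "b"},                    # field absent
    {"name": "c", "salary": None},    # explicit null
    {"name": "d", "salary": 20000},
]

# Missing aggregation: a single bucket of documents with no value for the field.
missing_docs = [doc for doc in docs if doc.get("salary") is None]
result = {"doc_count": len(missing_docs)}

print(result)
```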


kyle
