Notes excerpted from an article about the Elasticsearch (ES) query DSL

Query DSL

Query all documents

GET /music/children/_search
{
    "query":{
        "match_all":{}
    }
}

Query with a condition, plus sorting

GET /music/children/_search
{
    "query":{
        "match":{
            "name":"gymbo"
        }
    },
    "sort":[{"length":"desc"}]
}

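Paginated query. from is a 0-based offset into the result list and size is the number of hits to return; the request below returns the hits at offsets 10 through 19 (that is, the 11th through 20th results)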

GET /music/children/_search
{
     "query":{
          "match_all":{}
     },
     "from":10,
     "size":10
}
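
The from/size arithmetic is easy to get off by one, so here is a minimal Python sketch of it. The helper name page_to_from_size and the 1-based page convention are assumptions made for illustration; the code only builds the request body and does not contact a cluster.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # from is a 0-based offset: page 1 starts at 0, page 2 at page_size, ...
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 2 with 10 hits per page is the request shown above.
body = {"query": {"match_all": {}}, **page_to_from_size(page=2, page_size=10)}
print(body)
```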

Return only the specified fields

GET /music/children/_search
{
    "query":{
        "match_all":{}
    },
    "_source":["name","content"]
}

Query filter

Filtering on multiple conditions

Songs whose name is gymbo and whose length is between 65 and 80 seconds

GET /music/children/_search
{
    "query":{
        "bool":{
            "must":[{
                "match":{
                    "name":"gymbo"
                }
            }],
            "filter":{
                "range":{
                    "length":{
                        "gte":65,
                        "lte":80
                    }
                }
            }
        }
    }
}
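
When the name and the bounds come from user input, it can help to assemble the body programmatically. The sketch below is one way to do that in plain Python; the function name name_and_length_query is hypothetical, and nothing is sent to a cluster. Note that the match clause under must contributes to the relevance score, while the range clause under filter only includes or excludes documents.

```python
import json


def name_and_length_query(name, min_length, max_length):
    """Build a bool query: match on name (scored) plus a range filter on length (not scored)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": name}}],
                "filter": {
                    "range": {"length": {"gte": min_length, "lte": max_length}}
                },
            }
        }
    }


body = name_and_length_query("gymbo", 65, 80)
print(json.dumps(body, indent=2))
```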

Full-text search

Find documents whose content field contains friend or smile

GET /music/children/_search
{
    "query":{
        "match":{
            "content":"friend smile"
        }
    }
}

Phrase search

GET /music/children/_search
{
    "query":{
      "match_phrase":{
        "content":"friend smile"
      }
    }
}

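A match query analyzes the query string into terms (with the default standard analyzer the terms are lowercased, so matching is case-insensitive) and looks each term up in the inverted index; a document matches if it contains any of the terms. A match_phrase query analyzes the query string in the same way, but a document matches only if all the terms appear in it in the same order and next to each other (with the default slop of 0).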
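
To make the difference concrete, here is a rough pure-Python simulation of the two matching rules. It is only a sketch: analyze is a crude stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), there is no scoring, and the function names are made up for this example.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase and split into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(doc, query):
    """match with the default OR operator: any query term occurs in the document."""
    doc_terms = set(analyze(doc))
    return any(term in doc_terms for term in analyze(query))


def match_phrase(doc, query):
    """match_phrase with slop 0: the query terms occur consecutively and in order."""
    doc_terms, query_terms = analyze(doc), analyze(query)
    if not query_terms:
        return False
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))


content = "I hava a friend who loves smile, gymbo is his name"
print(match(content, "Friend smile"))          # True: either term is enough
print(match_phrase(content, "friend smile"))   # False: the terms are not adjacent
print(match_phrase(content, "a friend who"))   # True: exact run of terms
```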

Search with highlighting

GET /music/children/_search
{
    "query":{
      "match_phrase":{
        "content":"friend smile"
      }
    },
    "highlight":{
      "fields":{
         "content":{}
      }
    }
}

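Matched keywords are highlighted: in the response, each highlighted fragment is wrapped in tags (<em> and </em> by default) to mark it.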
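
The effect of the tags can be sketched in a few lines of Python. This is a simplification, not how Elasticsearch implements highlighting: it wraps every word that equals one of the query terms, ignores phrase positions, and the pre_tag/post_tag parameter names are chosen here for illustration (the highlight section of a real request has pre_tags and post_tags options for the same purpose).

```python
import re


def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap each word of text that equals a query term (case-insensitively) in tags."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}

    def wrap(m):
        word = m.group(0)
        return pre_tag + word + post_tag if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)


print(highlight("I hava a friend who loves smile", "friend smile"))
# I hava a <em>friend</em> who loves <em>smile</em>
```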

Aggregations

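Aggregations are similar to grouped statistics (GROUP BY) in a relational database, and most of the aggregation names resemble their MySQL counterparts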

Grouping by a single field

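Requirement: count the number of songs in each language
Setting size to 0 omits the matching documents from the response and returns only the aggregation results; if size is not specified it defaults to 10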

GET /music/children/_search
{
    "size":0,
    "aggs":{
      "group_by_lang":{
        "terms":{
            "field":"language"
        }
      }
    }
}

Response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_lang": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "english",
          "doc_count": 1
        }
      ]
    }
  }
}
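
For intuition, a terms aggregation behaves much like a counter over the field values. The following Python sketch mimics the shape of the buckets array on in-memory data; the sample documents (in particular the third one) and the helper name terms_agg are hypothetical, and no cluster is involved.

```python
from collections import Counter

# Hypothetical sample documents; only the fields needed here are included.
songs = [
    {"name": "gymbo", "language": "english"},
    {"name": "wake me, shark me", "language": "english"},
    {"name": "hypothetical song", "language": "chinese"},
]


def terms_agg(docs, field, size=10):
    """Count documents per distinct field value, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


buckets = terms_agg(songs, "language")
print(buckets)
```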

If the aggregation request fails with an error like the following:

"root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [language] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ]

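set the fielddata property of the grouping field to true (as the error message itself notes, this can use significant memory, and a keyword field is an alternative)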

PUT /music/_mapping/children
{
  "properties":{
    "language":{
      "type":"text",
      "fielddata":true
    }
  }
}

Grouping with a query condition

Requirement: among the songs whose lyrics contain "friend", count the number of songs in each language

GET /music/children/_search
{
  "size": 0,
  "query": {
    "match": {
      "content": "friend"
    }
  },
  "aggs": {
    "all_languages": {
      "terms": {
        "field": "language"
      }
    }
  }
}

Computing an average

GET /music/children/_search
{
    "size": 0,
    "aggs": {
        "group_by_languages": {
            "terms": {
                "field": "language"
            },
            "aggs": {
                "avg_length": {
                    "avg": {
                        "field": "length"
                    }
                }
            }
        }
    }
}

Sorting after grouping

Requirement: compute the average song length for each language, and sort the groups by average length in descending order

GET /music/children/_search
{
    "size": 0,
    "aggs": {
        "group_by_languages": {
            "terms": {
                "field": "language",
                "order": {
                  "avg_length": "desc"
                }
            },
            "aggs": {
                "avg_length": {
                    "avg": {
                        "field": "length"
                    }
                }
            }
        }
    }
}
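
The same group, average, then sort logic can be sketched on in-memory data. This is an illustration only: the sample documents are hypothetical (lengths are given as numbers), the function name is made up, and the output only loosely follows the bucket shape of a real response.

```python
from collections import defaultdict

# Hypothetical sample documents with numeric lengths in seconds.
songs = [
    {"language": "english", "length": 75},
    {"language": "english", "length": 55},
    {"language": "chinese", "length": 120},
]


def avg_length_by_language(docs):
    """Group by language, average the length per group, sort by that average descending."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["language"]].append(doc["length"])
    buckets = [
        {"key": lang, "doc_count": len(vals),
         "avg_length": {"value": sum(vals) / len(vals)}}
        for lang, vals in groups.items()
    ]
    # Mirrors "order": {"avg_length": "desc"} in the terms aggregation.
    buckets.sort(key=lambda b: b["avg_length"]["value"], reverse=True)
    return buckets


buckets = avg_length_by_language(songs)
print(buckets)
```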

Nested aggregations: range buckets + grouping + average

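Requirement: bucket the songs by the given length ranges, then group by language within each bucket, and finally compute the average length of each group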

GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "group_by_length": {
      "range": {
        "field": "length",
        "ranges": [
          {
            "from": 0,
            "to": 60
          },
          {
            "from": 60,
            "to": 120
          },
          {
            "from": 120,
            "to": 180
          }
        ]
      },
      "aggs": {
        "group_by_languages": {
          "terms": {
            "field": "language"
          },
          "aggs": {
            "average_length": {
              "avg": {
                "field": "length"
              }
            }
          }
        }
      }
    }
  }
}
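
One detail worth knowing about the range aggregation is that each range includes its from value and excludes its to value, so a song of exactly 60 seconds lands in the 60 to 120 bucket. The Python sketch below reproduces the three nesting levels on hypothetical in-memory data; the function name and the simplified output shape are assumptions for illustration.

```python
# Hypothetical sample documents with numeric lengths in seconds.
songs = [
    {"language": "english", "length": 75},
    {"language": "english", "length": 55},
    {"language": "english", "length": 60},
    {"language": "chinese", "length": 120},
]


def range_then_language(docs, ranges):
    """Bucket by [from, to) length range, then by language, then average the length."""
    result = []
    for lo, hi in ranges:
        in_range = [d for d in docs if lo <= d["length"] < hi]  # from inclusive, to exclusive
        by_lang = {}
        for d in in_range:
            by_lang.setdefault(d["language"], []).append(d["length"])
        result.append({
            "from": lo,
            "to": hi,
            "doc_count": len(in_range),
            "group_by_languages": {
                lang: {"doc_count": len(v), "average_length": sum(v) / len(v)}
                for lang, v in by_lang.items()
            },
        })
    return result


result = range_then_language(songs, [(0, 60), (60, 120), (120, 180)])
for bucket in result:
    print(bucket)
```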

Batch retrieval

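The example requests above are each sent individually. ES also offers a syntax that combines several document lookups into one batch request, which avoids paying the network overhead of each request separately. The syntax is as follows: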

GET /_mget
{
  "docs": [
    {
      "_index" : "music",
       "_type" : "children",
       "_id" :    1
    },
    {
      "_index" : "music",
       "_type" : "children",
       "_id" :    2
    }
  ]
}

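The docs parameter of mget is an array. Each element identifies one document by its _index, _type and _id metadata; the _index values may be the same or may differ between elements. An element can also define _source to select the fields to return.
Sample response: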

{
  "docs": [
    {
      "_index": "music",
      "_type": "children",
      "_id": "1",
      "_version": 4,
      "found": true,
      "_source": {
        "name": "gymbo",
        "content": "I hava a friend who loves smile, gymbo is his name",
        "language": "english",
        "length": "75",
        "likes": 0
      }
    },
    {
      "_index": "music",
      "_type": "children",
      "_id": "2",
      "_version": 13,
      "found": true,
      "_source": {
        "name": "wake me, shark me",
        "content": "don't let me sleep too late, gonna get up brightly early in the morning",
        "language": "english",
        "length": "55",
        "likes": 9
      }
    }
  ]
}

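The response is likewise a docs array with the same length as the array in the request. Each document is looked up independently: if a document does not exist, is not found, or fails for some other reason, the overall result is unaffected and the HTTP status code of the mget response is still 200.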
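
Because the status code stays 200 either way, client code has to inspect each entry's found flag. A minimal Python sketch, assuming the response has already been parsed from JSON into a dict; the helper name split_mget_response and the sample response (with a missing document 3) are made up for illustration.

```python
def split_mget_response(response):
    """Separate an mget response into found sources (by _id) and missing ids."""
    found = {d["_id"]: d["_source"] for d in response["docs"] if d.get("found")}
    missing = [d["_id"] for d in response["docs"] if not d.get("found")]
    return found, missing


# Hypothetical parsed response: document 1 exists, document 3 does not.
response = {
    "docs": [
        {"_index": "music", "_type": "children", "_id": "1", "_version": 4,
         "found": True, "_source": {"name": "gymbo", "language": "english"}},
        {"_index": "music", "_type": "children", "_id": "3", "found": False},
    ]
}

found, missing = split_mget_response(response)
print(found)
print(missing)
```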
If all the requested documents are in the same index and type, the _index and _type metadata can be moved into the request line:

GET /music/children/_mget
{
  "docs": [
    {
       "_id" :    1
    },
    {
       "_id" :    2
    }
  ]
}

Or:

GET /music/children/_mget
{
  "ids":[1,2]
}

Why mget matters

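Whenever you need to fetch multiple documents at once, use the batch API (mget) instead of sending one request per document, so that the number of network round trips is kept as low as possible.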
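
As a rough illustration of the saving, the Python sketch below groups a list of ids into mget request bodies of the "ids" form. The generator name mget_bodies and the batch size of 100 are arbitrary assumptions for this example, not recommendations from the article; the code only builds the bodies and sends nothing.

```python
def mget_bodies(ids, batch_size=100):
    """Yield one mget request body per batch of at most batch_size ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(ids), batch_size):
        yield {"ids": ids[start:start + batch_size]}


ids = list(range(1, 251))            # 250 hypothetical document ids
bodies = list(mget_bodies(ids))
print(len(ids), "documents ->", len(bodies), "requests")
```

With these numbers, 250 single-document requests collapse into 3 batch requests.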


步履不停