Elasticsearch Getting Started & DSL Usage

Reference

6.4 (latest) English: https://www.elastic.co/guide/...
Chinese: https://www.elastic.co/guide/...
5.4 Chinese: http://cwiki.apachecn.org/pag...

Basic Concepts

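Near Realtime (NRT): there is a slight delay (usually about 1 s, the default index.refresh_interval) between writing data and being able to search it
Cluster: a collection of one or more nodes, uniquely identified by its cluster name
Node: a single Elasticsearch instance; in most cases each node runs in its own environment or virtual machine.
Index: a collection of documents, roughly analogous to a database in an RDBMS
Type: a logical category/partition within an index, roughly analogous to a table. (Mapping types are deprecated from 6.x on, where an index can hold only one type, and removed in later versions.)
Document: the basic unit of information that can be indexed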

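Shards & Replicas

Shards: each index has one or more shards, and the index's data is distributed across them, like pouring one bucket of water into N cups.
Replicas: copies of the primary shards. With one replica per shard (number_of_replicas: 1), each primary shard has a copy on a different node, so the cluster can lose any single node without losing data.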
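To make the "one bucket into N cups" picture concrete, here is a minimal Python sketch of how a document is assigned to a primary shard. In its basic form Elasticsearch routes with `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. Elasticsearch uses Murmur3 as the hash; this sketch substitutes `zlib.crc32` purely for illustration, so its shard numbers will not match a real cluster. `pick_shard` and `distribute` are hypothetical names.

```python
import zlib
from collections import Counter


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Return the primary shard for a document id (illustrative crc32 hash, not Murmur3)."""
    # _routing defaults to the document _id, so the same id always lands on the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def distribute(doc_ids, number_of_shards: int) -> Counter:
    """Count how many documents each primary shard receives."""
    return Counter(pick_shard(d, number_of_shards) for d in doc_ids)


# 12 primaries, matching number_of_shards in the settings example of this post.
counts = distribute([f"doc-{i}" for i in range(10_000)], 12)
```

Because the shard number comes from a modulo over the primary count, number_of_shards cannot be changed in place on an existing index; changing it means routing every document again.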

Definitions

DSL (Domain Specific Language): the query language defined by Elasticsearch

ES field types: https://blog.csdn.net/chengyu...

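Elasticsearch vs. traditional databases

Elasticsearch	Relational database	NoSQL database
Index	Database	Database
Document	Row	Document
Field	Column	Field

Drawbacks of Elasticsearch compared with traditional databases
· No support for transactions
· Read/write latency (NRT: newly written data is not searchable until the next refresh)
· Not suited to frequent updates (an update internally reindexes the whole document)
· Weaker security and reliability guarantees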

API

Settings API: get index settings

GET es-index_*/_settings

Response:
{
  "es-index_*": {
    "settings": {
      "index": {
        "mapping": {
          "ignore_malformed": "true"
        },
        "refresh_interval": "10s",
        "translog": {
          "durability": "async"
        },
        "max_result_window": "10000",
        "creation_date": "1551295476399",
        "requests": {
          "cache": {
            "enable": "true"
          }
        },
        "unassigned": {
          "node_left": {
            "delayed_timeout": "6h"
          }
        },
        "priority": "5",
        "number_of_replicas": "1",
        "uuid": "-JvfCJ3-TCaMMxqnOiOfNA",
        "version": {
          "created": "2030399"
        },
        "codec": "best_compression",
        "routing": {},
        "search": {
          "slowlog": {
            "threshold": {
              "fetch": {
                "warn": "1s",
                "trace": "200ms",
                "debug": "500ms",
                "info": "800ms"
              },
              "query": {
                "warn": "10s",
                "trace": "500ms",
                "debug": "1s",
                "info": "5s"
              }
            }
          }
        },
        "number_of_shards": "12",
        "merge": {
          "scheduler": {
            "max_thread_count": "1"
          }
        }
      },
      "tribe": {
        "name": "olap"
      }
    }
  }
}
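The nested settings above are hard to scan. Elasticsearch can return them flattened with the `flat_settings=true` query parameter (e.g. `GET es-index_*/_settings?flat_settings=true`); the Python sketch below does the same flattening client-side on an already-parsed response, which is handy for diffing settings across indices. `flatten_settings` and `diff_settings` are hypothetical helper names.

```python
from typing import Any, Dict


def flatten_settings(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {"index": {"translog": {"durability": "async"}}} into {"index.translog.durability": "async"}."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects.
            flat.update(flatten_settings(value, path))
        else:
            # Leaf values (and empty objects such as "routing": {}) are kept as-is.
            flat[path] = value
    return flat


def diff_settings(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, tuple]:
    """Return {setting: (value_in_a, value_in_b)} for every flattened setting that differs."""
    fa, fb = flatten_settings(a), flatten_settings(b)
    return {k: (fa.get(k), fb.get(k)) for k in sorted(fa.keys() | fb.keys()) if fa.get(k) != fb.get(k)}
```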

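Stats API: get index statistics (http://cwiki.apachecn.org/pag...)

GET es-index_*/_stats

Response:
{
  "_shards": {
    "total": 622,
    "successful": 622,
    "failed": 0
  },
  // Statistics are aggregated per index into "primaries" and "total": "primaries" covers only the primary shards, "total" covers primaries plus replicas.
  "_all": {
    "primaries": {
      "docs": {  // Number of documents, and of deleted documents not yet merged away. Note that this value is affected by index refreshes.
        "count": 2932357017,
        "deleted": 86610
      },
      "store": {  // Size of the index.
        "size_in_bytes": 2573317479532
      },
      "indexing": {},  // Indexing statistics; can be combined with a comma-separated list of types to get per-type statistics.
      "get": {},       // get API call statistics
      "search": {}     // search API call statistics
    },
    "total": {  // same structure as "primaries", summed over primaries and replicas
    }
  }
}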
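A quick way to read the `_stats` response is to pull out the primary-shard document counts and store size. The sketch below works on the response already parsed into a Python dict; `summarize_primaries` is a hypothetical name, and "bytes per document" is only a rough figure, since the store size also includes deleted-but-unmerged documents and index overhead.

```python
def summarize_primaries(stats: dict) -> dict:
    """Extract primary-shard doc counts and store size from a GET <index>/_stats response."""
    primaries = stats["_all"]["primaries"]
    count = primaries["docs"]["count"]
    deleted = primaries["docs"]["deleted"]
    size = primaries["store"]["size_in_bytes"]
    return {
        "docs": count,
        "deleted": deleted,
        # Share of deleted docs that still occupy space until segments are merged.
        "deleted_ratio": deleted / (count + deleted) if count + deleted else 0.0,
        "size_gib": size / 1024 ** 3,
        # Rough average on-disk size per live document (includes overhead).
        "bytes_per_doc": size / count if count else 0.0,
    }
```

With the numbers shown above, this gives roughly 2,397 GiB of primary data and about 878 bytes per document.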

Search API (two forms)

using a simple query string as a parameter

GET es-index_*/_search?q=eventid:OMGH5PageView

using a request body

GET es-index_*/_search
{
  "query": {
    "term": {
      "eventid": {
        "value": "OMGH5PageView"
      }
    }
  }
}
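The two forms above can be built programmatically. This Python sketch only constructs the request (path plus optional JSON body) and sends nothing; `uri_search` and `body_search` are hypothetical helper names. Note that `q=` uses Lucene query-string syntax, whose value is analyzed, while `term` is not analyzed, so the two forms return the same documents only when `eventid` is a not-analyzed (keyword) field.

```python
import json
from urllib.parse import quote, urlencode


def uri_search(index: str, q: str) -> str:
    """Form 1: the query string is passed as the q= URL parameter."""
    return f"/{quote(index, safe='*,')}/_search?{urlencode({'q': q})}"


def body_search(index: str, field: str, value: str) -> tuple:
    """Form 2: the Query DSL is sent as a JSON request body."""
    body = {"query": {"term": {field: {"value": value}}}}
    return f"/{quote(index, safe='*,')}/_search", json.dumps(body)
```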

Query DSL

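Leaf query clause: looks for a particular value in a particular field (e.g. match, term, range)
Compound query clause: wraps other leaf or compound clauses to combine them (e.g. bool)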

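Query context vs. filter context in the DSL

query context
In query context, the question answered is: How well does this document match this query clause?
Besides deciding whether a document matches, Elasticsearch also calculates how well it matches relative to other documents and records this in _score.
filter context
In filter context, the question answered is: Does this document match this query clause?
It only decides whether a document matches; no _score is calculated.
Typically used to filter structured data,
e.g. whether timestamp falls within 2017-2018, or whether status is published
Frequently used filters are cached automatically by Elasticsearch, which improves performance.

When searching, use filter clauses to narrow down the data first, then query clauses to match and score the remaining documents.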
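Following that advice, a common pattern is to put the structured conditions (time range, status) in the `filter` clause of a `bool` query and keep only the full-text part in `must`, so only the latter contributes to `_score`. The Python sketch below builds such a request body; the field names `title`, `status` and `timestamp` and the helper `build_search` are hypothetical, and the `bool`/`filter` form assumes Elasticsearch 2.x or later.

```python
def build_search(text: str, status: str, start: str, end: str) -> dict:
    """Build a bool query: filter context for structured data, query context for text."""
    return {
        "query": {
            "bool": {
                # Query context: scored, answers "how well does it match?"
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no only, not scored, cacheable.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"timestamp": {"gte": start, "lt": end}}},
                ],
            }
        }
    }


# "timestamp within 2017-2018 and status is published", plus a full-text condition.
body = build_search("elasticsearch", "published", "2017-01-01", "2019-01-01")
```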

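Filtering the returned fields

fields: return only the listed fields (ES 2.x; from 5.0 on, use _source filtering or stored_fields)
script_fields: compute new values from the original data

"fields": ["eh"],  // return only the eh field
"script_fields": {
   "test": {
      "script": "doc['eh'].value*2"
   }
} // return the value of eh multiplied by 2 as a field named test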
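To show the shape of what comes back, the sketch below mimics, on in-memory documents, what `fields` plus this `script_fields` entry return for one hit: the requested field and a computed field named `test`, both as arrays under the hit's `fields` key. It is a local emulation only (real script fields run inside Elasticsearch against doc values), and `project_hit` and `double_eh` are hypothetical names.

```python
from typing import Callable, Dict, List


def project_hit(source: dict, fields: List[str],
                script_fields: Dict[str, Callable[[dict], object]]) -> dict:
    """Emulate a hit's "fields" section: requested fields plus computed script fields."""
    out = {}
    for name in fields:
        if name in source:
            out[name] = [source[name]]  # field values come back as arrays
    for name, script in script_fields.items():
        out[name] = [script(source)]
    return {"fields": out}


def double_eh(doc: dict) -> object:
    """Local stand-in for the script "doc['eh'].value*2"."""
    return doc["eh"] * 2
```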

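Query filtering: query

bool: combining clauses

{
   "bool" : {
      "must" :     [], // all clauses must match, like AND in SQL
      "must_not" : [], // no clause may match, like NOT in SQL
      "should" :   [], // at least one clause must match, like OR in SQL (in query context, when must/filter are also present, should clauses are optional and only raise _score)
      "filter" :   []  // clauses must match, like must, but run in filter context: no _score, cacheable
                       // legacy form: an object such as { "and": [], "or": [], "not": [] } (these filters were deprecated in 2.0 and removed in 5.0)
   }
}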
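The matching rules of these four clauses can be sketched in Python over in-memory documents. Each clause is modelled as a plain predicate function instead of real DSL, scoring is ignored, and `should` follows the rule for a bool query without `must`/`filter` (at least one `should` must match); when `must` or `filter` is present, `should` is treated as optional. `term`, `bool_matches` and the `filters` parameter name are hypothetical.

```python
from typing import Callable, Dict, Sequence

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def term(field: str, value: object) -> Clause:
    """Exact-value predicate standing in for a term clause."""
    return lambda doc: doc.get(field) == value


def bool_matches(doc: Doc, must: Sequence[Clause] = (), filters: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = (), should: Sequence[Clause] = ()) -> bool:
    """Return True if doc satisfies the bool clauses (matching only, no scoring)."""
    if not all(c(doc) for c in must):        # must: every clause matches (AND)
        return False
    if not all(c(doc) for c in filters):     # filter: AND as well, but never scored
        return False
    if any(c(doc) for c in must_not):        # must_not: no clause may match (NOT)
        return False
    if should and not must and not filters:  # should alone: at least one must match (OR)
        return any(c(doc) for c in should)
    return True                              # otherwise should would only affect _score
```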

filtered query (deprecated in 2.0 and removed in 5.0; use bool with a filter clause instead)

{
    "filtered": {
          "query": {},
          "filter": {} // documents are filtered here first, then matched against query
    }
}
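Since `filtered` no longer exists from 5.0 on, old queries need rewriting: the `query` part becomes `bool.must` and the `filter` part becomes `bool.filter`. The Python sketch below performs that rewrite on a parsed query object; it only handles a top-level `filtered` like the one shown (nested ones would need recursion), and `rewrite_filtered` is a hypothetical name.

```python
def rewrite_filtered(query: dict) -> dict:
    """Rewrite {"filtered": {"query": Q, "filter": F}} as {"bool": {"must": Q, "filter": F}}."""
    if "filtered" not in query:
        return query
    filtered = query["filtered"]
    bool_query = {}
    if filtered.get("query"):   # an empty {} query is simply dropped
        bool_query["must"] = filtered["query"]
    if filtered.get("filter"):
        bool_query["filter"] = filtered["filter"]
    # A bool with no clauses at all matches every document, like match_all.
    return {"bool": bool_query}
```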

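match and term

match (full-text query): first checks whether the field is analyzed; if it is, the query text is analyzed into tokens first and those tokens are matched; if not, the value is matched directly as a token.
term (exact-value query): matches the token directly, without analyzing the query value.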
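The difference is easiest to see with a toy inverted index. The sketch below uses a simplified analyzer (lowercase, then split on anything that is not an ASCII letter or digit) as a rough stand-in for Elasticsearch's standard analyzer, which is more elaborate. `match_query` ORs the postings of the query tokens, mirroring the match query's default `operator` of OR. All function names are hypothetical.

```python
import re
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Simplified analyzer: lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index for an analyzed field: token -> ids of docs containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index: Dict[str, Set[int]], value: str) -> Set[int]:
    # term: the value is NOT analyzed; it must equal one indexed token exactly.
    return set(index.get(value, set()))


def match_query(index: Dict[str, Set[int]], text: str) -> Set[int]:
    # match: analyze the query text first, then OR the postings of each token.
    hits: Set[int] = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits
```

A term query for "Quick" finds nothing here, because only the lowercased token "quick" was indexed, while a match query for "QUICK" is analyzed to "quick" first and does find the document.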

terms: matches any of several exact values

{ "terms": { "user": ["tony", "kitty"] } }
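A `terms` query matches a document if the field contains any one of the listed exact values, so it behaves like an OR of `term` queries (similar to SQL `IN`). A minimal Python emulation over in-memory documents, with a hypothetical `terms_filter` helper that handles both single-valued and array fields:

```python
from typing import Iterable, List


def terms_filter(docs: List[dict], field: str, values: Iterable) -> List[dict]:
    """Keep docs whose field (scalar or list) contains at least one of the values."""
    wanted = set(values)
    result = []
    for doc in docs:
        raw = doc.get(field)
        field_values = raw if isinstance(raw, list) else [raw]
        if wanted.intersection(field_values):
            result.append(doc)
    return result
```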

range: range filter

For date fields, the bounds can also be written with Date Math

{
     "range" : {
          "born" : {
              "gte": "01/01/2012",
              "lte": "2013",
              "format": "dd/MM/yyyy||yyyy" 
           }
       }
 }
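The `format` value `dd/MM/yyyy||yyyy` lists alternative date formats separated by `||`, and a value is parsed with the first format that fits. The Python sketch below mimics that with `datetime.strptime`, translating only the Joda/Java pattern letters used here (the `_TOKENS` table is an assumption covering `dd`, `MM` and `yyyy`, not a general converter). It does not model how Elasticsearch fills in or rounds missing components such as the month and day of `"2013"`. `to_strptime` and `parse_multi` are hypothetical names.

```python
from datetime import datetime

# Minimal Joda/Java-style -> strptime translation for the patterns used above (assumption).
_TOKENS = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d")]


def to_strptime(pattern: str) -> str:
    for joda, py in _TOKENS:
        pattern = pattern.replace(joda, py)
    return pattern


def parse_multi(value: str, fmt: str) -> datetime:
    """Try each '||'-separated format in order and return the first successful parse."""
    for pattern in fmt.split("||"):
        try:
            return datetime.strptime(value, to_strptime(pattern))
        except ValueError:
            continue
    raise ValueError(f"{value!r} does not match any of {fmt!r}")
```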


{
     "range" : {
          "timestamp" : {
              "gte": "now-6d/d", // Date Math
              "lte": "now/d", // Date Math
              "time_zone": "+08:00"  // time zone used for rounding
           }
       }
 }
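Date Math expressions such as `now-6d/d` are resolved relative to the current time: `-6d` subtracts six days and `/d` rounds to the day in the given `time_zone`. In a range query the rounding direction depends on the operator: `gte` rounds down to the start of the day, while `lte` rounds up to the last millisecond of that day, so `now-6d/d` to `now/d` covers seven whole calendar days. The Python sketch below resolves only a small subset of the syntax (`now`, `+/-N` with `d` or `h`, and `/d`) and models only `gte`/`lte` rounding; `now` is passed in so the results are reproducible. `resolve` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

_EXPR = re.compile(r"^now(?P<ops>(?:[+-]\d+[dh])*)(?:/(?P<round>d))?$")
_OP = re.compile(r"([+-])(\d+)([dh])")


def resolve(expr: str, now: datetime, round_up: bool) -> datetime:
    """Resolve a Date Math subset against a timezone-aware `now`.

    round_up=False mimics gte (start of day); round_up=True mimics lte (end of day).
    """
    m = _EXPR.match(expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr}")
    result = now
    for sign, amount, unit in _OP.findall(m.group("ops")):
        delta = timedelta(days=int(amount)) if unit == "d" else timedelta(hours=int(amount))
        result = result + delta if sign == "+" else result - delta
    if m.group("round"):
        # Rounding happens in now's own time zone (here the +08:00 from time_zone).
        start = result.replace(hour=0, minute=0, second=0, microsecond=0)
        result = start + timedelta(days=1) - timedelta(milliseconds=1) if round_up else start
    return result
```

For example, with `now` at 2019-02-28 10:30 +08:00, `gte: now-6d/d` resolves to 2019-02-22 00:00 and `lte: now/d` to 2019-02-28 23:59:59.999, both in +08:00.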

exists: whether the document has a value for the given field

{
     "exists" : { "field" : "user" }
}
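`exists` matches documents that have at least one non-null indexed value for the field; a field set to `null`, to an empty array `[]`, or to an array of only nulls counts as missing, while an empty string counts as present. A minimal Python emulation on `_source` dicts (top-level fields only, ignoring mapping details such as `null_value`; `field_exists` is a hypothetical name):

```python
def field_exists(doc: dict, field: str) -> bool:
    """True if the field holds at least one non-null value (null, [] and [null] count as missing)."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(v is not None for v in value)
    return True
```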

wildcard: wildcard query (matched against the indexed terms)

Note that this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?

prefix: prefix query
regexp: regular-expression query
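Why a leading `*` or `?` is slow: these queries run against the field's term dictionary (the sorted list of indexed tokens). When the pattern starts with literal characters, that prefix narrows the search to a contiguous slice of the dictionary; a leading wildcard leaves nothing to narrow on, so every term has to be checked. The Python sketch below models this with a sorted list and `bisect` and reports how many terms each wildcard pattern examines. It is a simplification (Lucene actually compiles patterns into automata and intersects them with the term index); `prefix_range`, `wildcard_terms` and `regexp_terms` are hypothetical names, and `regexp_terms` uses Python `re` syntax, which differs from Lucene's regular-expression syntax (both are anchored to the whole term).

```python
import bisect
import re
from typing import List, Tuple


def prefix_range(terms: List[str], prefix: str) -> List[str]:
    """prefix query: all terms starting with `prefix`, located by binary search on sorted terms."""
    lo = bisect.bisect_left(terms, prefix)
    hi = lo
    while hi < len(terms) and terms[hi].startswith(prefix):
        hi += 1
    return terms[lo:hi]


def wildcard_terms(terms: List[str], pattern: str) -> Tuple[List[str], int]:
    """wildcard query (* = any run of chars, ? = one char). Returns (matches, terms examined)."""
    regex = re.compile("".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern))
    literal_prefix = re.split(r"[*?]", pattern, maxsplit=1)[0]
    # A literal prefix narrows the candidates; a leading wildcard forces a full scan.
    candidates = prefix_range(terms, literal_prefix) if literal_prefix else terms
    return [t for t in candidates if regex.fullmatch(t)], len(candidates)


def regexp_terms(terms: List[str], pattern: str) -> List[str]:
    """regexp query: the pattern must match the whole term (Python re syntax here)."""
    regex = re.compile(pattern)
    return [t for t in terms if regex.fullmatch(t)]
```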

Tips

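Handling values that contain -

If a value contains -, an analyzed field splits it into separate tokens by default, which makes search results inaccurate. One fix is to query the field's .raw sub-field instead (a not-analyzed copy of the field, assuming the mapping defines one):

{ "term": { "status": "pre-active" } }  =>  { "term": { "status.raw": "pre-active" } }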
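The `.raw` trick relies on a multi-field mapping: the same value is indexed twice, once through the analyzer under `status` and once verbatim under `status.raw` (typically a `not_analyzed` string in 2.x or a `keyword` field in 5.x and later). The mapping is not shown in this post, so that definition is an assumption. The Python sketch below models both index entries, with a simplified lowercase-and-split analyzer standing in for the standard analyzer; `index_with_raw` and `term_hits` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def index_with_raw(docs: Dict[int, str], field: str) -> Dict[str, Dict[str, Set[int]]]:
    """Index each value under `field` (analyzed) and `field.raw` (verbatim, single token)."""
    index = {field: defaultdict(set), f"{field}.raw": defaultdict(set)}
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[field][token].add(doc_id)
        index[f"{field}.raw"][value].add(doc_id)
    return index


def term_hits(index: Dict[str, Dict[str, Set[int]]], field: str, value: str) -> Set[int]:
    """term query: look the value up unchanged in the chosen field's term dictionary."""
    return set(index[field].get(value, set()))
```

Here "pre-active" is indexed under `status` as the two tokens "pre" and "active", so a term query for "pre-active" finds nothing while a term query for "active" also returns the pre-active document; only `status.raw` matches the whole value.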

sort

GET es-index_*/_search
{
  "fields" : ["eventid", "logtime"],
  "query": {
    "term": {
      "eventid": {
        "value": "OMGH5PageView"
      }
    }
  },
  "sort": [
    {
      "logtime": {
        "order": "asc"
      }
    }
  ]
}
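The `sort` array is applied in order: hits are compared on the first key, and later keys only break ties. The Python sketch below mimics this on in-memory hits with repeated stable sorts (least significant key first). It assumes every hit has every sort field, ignores options such as `missing` or `mode`, and defaults to ascending order (Elasticsearch sorts fields ascending by default and `_score` descending). `sort_hits` is a hypothetical name.

```python
from typing import List


def sort_hits(hits: List[dict], sort_spec: List[dict]) -> List[dict]:
    """Apply an ES-style sort spec like [{"logtime": {"order": "asc"}}] to in-memory hits."""
    result = list(hits)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for clause in reversed(sort_spec):
        (field, options), = clause.items()
        descending = options.get("order", "asc") == "desc"
        result.sort(key=lambda h: h[field], reverse=descending)
    return result
```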

Aggregations

date_histogram

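Like histogram, date_histogram by default returns only buckets with a non-zero document count. To return empty buckets as well, set two extra parameters:

"min_doc_count" : 0,  // forces empty buckets to be returned
"extended_bounds" : {  // forces buckets for the whole year to be returned
    "min" : "2014-01-01",
    "max" : "2014-12-31"
}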
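What these two parameters do can be emulated in Python: bucket documents by calendar month, then, with `min_doc_count: 0`, emit every month between the `extended_bounds` `min` and `max` even if it has no documents. As in Elasticsearch, the bounds only extend the range and do not drop buckets that fall outside it. The sketch hard-codes a monthly interval on plain `date` values and does not model time zones or offsets; `monthly_histogram` is a hypothetical name.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Optional, Tuple


def _next_month(y: int, m: int) -> Tuple[int, int]:
    return (y + 1, 1) if m == 12 else (y, m + 1)


def monthly_histogram(dates: Iterable[date], min_doc_count: int = 1,
                      extended_bounds: Optional[Tuple[date, date]] = None) -> List[Tuple[str, int]]:
    """Return [(YYYY-MM, doc_count), ...] like a monthly date_histogram."""
    counts = Counter((d.year, d.month) for d in dates)
    keys = set(counts)
    if extended_bounds:
        # extended_bounds widens the bucket range; it never filters buckets out.
        keys |= {(b.year, b.month) for b in extended_bounds}
    if not keys:
        return []
    buckets = []
    (y, m), last = min(keys), max(keys)
    while (y, m) <= last:
        n = counts.get((y, m), 0)
        if n >= min_doc_count:  # min_doc_count=0 keeps the empty buckets
            buckets.append((f"{y:04d}-{m:02d}", n))
        y, m = _next_month(y, m)
    return buckets
```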

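Search response fields

took: time Elasticsearch spent executing the search, in milliseconds
timed_out: whether the search timed out
_shards: information about the shards searched: how many were queried, how many succeeded, how many failed, etc.
hits: the search results, containing:
  total: the number of documents matching the query
  max_score: the highest _score among the matching documents
  hits: the matching documents, each with
    _score: how well the document matches the query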
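A small Python helper that pulls these fields out of a parsed search response. It accepts `hits.total` both as a plain number (2.x to 6.x) and as an object with `value`/`relation` (7.0 and later); `summarize_response` is a hypothetical name, and the sample response used to exercise it is made up for illustration.

```python
def summarize_response(resp: dict) -> dict:
    """Extract the main top-level fields of a search response."""
    hits = resp.get("hits", {})
    total = hits.get("total", 0)
    if isinstance(total, dict):  # 7.0+: {"value": N, "relation": "eq" or "gte"}
        total = total.get("value", 0)
    shards = resp.get("_shards", {})
    return {
        "took_ms": resp.get("took"),
        "timed_out": resp.get("timed_out", False),
        "shards_failed": shards.get("failed", 0),
        "total": total,
        "max_score": hits.get("max_score"),
        "scores": [h.get("_score") for h in hits.get("hits", [])],
    }
```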

spoonysnail
