1. 深度分页
1.1. 定义
早些年在企业表格类的页面中,我们总能见到分页的那种表格设计。当分页的页数太多,还设计了“跳转到x页”的功能。
我们这里讲的“深度分页”,就是指当分页太多(假设有几十万页),当访问第几万页时,必然会查询很慢。因为针对通常对于分页的搜索,都是将几万页的数据查出来,然后取差集,只返回当前页的数据。只为了返回一页的数据,却查询的很深入,带来的性能代价很大。
1.2. 需求上杜绝
1. 少分页
首先,我们先想想这样的需求设计是否合理,谁会在一个表格中跳到第几万页查看数据。我们如果想要找到想要的内容,应该是通过搜索,而非每页都点进去去看。
所以看看百度、谷歌搜索引擎的设计,都是最多只展示10页。
2. “翻页”替代“跳页”
那如果有些场景,我就需要查看所有页面的内容呢?例如:贴吧的评论等。ok,就算有这种情况,也不应该存在“跳页”的需求,可以设计成一页一页的“翻页”。
例如我们身边很多app的设计:下拉刷新更多。每下拉一次,就多展示一页数据。
总结来看,实在想不到需要深度“跳页”的需求,那就应该在需求上杜绝。如果为了一个没必要的需求,需要对研发、服务器投入大量的资源消耗,是很浪费的。
1.3. 技术实现
下面是从 mysql 和 elasticsearch 的维度讲讲深度分页的实现。这里先总结一下:
- mysql:深度分页,包括跳页也能做,可以通过减少回表提高性能
- elasticsearch:天然不建议跳页,但提供了翻页的优化方案
2. mysql深度分页优化
2.1. 减少回表
优化前
表格查询,基本上查询的字段都要求很多。按照通常分页查询的场景,假设正常分页查询的sql如下:
select * from table_name where userId = 'xxx' limit 10000, 10
就算我们基于 userId 创建普通索引,因为 select * ,我们依然要先基于普通索引查询 10010 条数据,然后回表查询 10010 次,但最后再将前 10000条数据丢掉,只取10条。
优化后
优化的sql是:
select * from table_name where id in (select id from table_name where userId = 'xxx' limit 10000, 10)
这里建立子查询,子查询中只查询了主键,可以走普通索引直接查到。而最终的回表查询,只是查 10 条数据。所以,避免了 10000 次的回表,而且避免了 10000 次 select *
带来的 io 浪费。
这里的优化,适用于翻页、跳页的场景。那么下面针对于翻页,还可以进一步优化。
2.2. 翻页优化
如果是翻页,那每一次查询,都能拿到上一次查询的最后一条数据。这里有个前提,就看我们分页搜索是根据什么字段排序的。这里先假设是根据id排序的,id是顺序的(自增长或雪花ID),假设上一次查询的id为 15000,那么sql可以是:
select * from table_name where userId = 'xxx' and id > 15000 limit 0, 10
3. es深度分页
3.1. from/size
这个是我们最常用的 es 分页语法了。但是它天然不支持深度分页。
要求:from + size < max_result_window ,否则es查询就会报错。而es max_result_window默认值为 10000。为什么呢?
假设 es 索引创建了8个分片,假设我们查询 from=1000,size=10
,第101页的10条数据,实际需要查询多少数据呢?
因为 es 不知道这第 101页的数据在哪个分片,所以协调节点发请求命令到每个分片对应的数据节点上,从每个分片都获取了 1010 条数据。
此时协调节点中一共获取到了 1010 * 8 = 8080 条数据,再基于协调节点的内存做排序,拿到符合 from=1000,size=10
的10条数据,最终将 8070 条数据丢弃。
这里就分页到了101页,如果分页上万呢,如果es的索引分片更多呢?就为了获取这10条数据,对es的内存等性能损害太大了,所以才有了 max_result_window 的限制。
3.2. scroll遍历
ES官方不再推荐使用Scroll API 进行深度分页。 如果您需要在分页超过 10,000 个点击时保留索引状态,请使用带有时间点 (PIT) 的 search_after 参数。所以这里就不多说了。
Scroll API 原理上是对某次查询生成一个游标 scroll_id , 后续的查询只需要根据这个游标去取数据,直到结果集中返回的 hits 字段为空,就表示遍历结束。scroll_id 的生成可以理解为建立了一个临时的历史快照,在此之后的增删改查等操作不会影响到这个快照的结果。
所有文档获取完毕之后,需要手动清理掉 scroll_id 。虽然es 会有自动清理机制,但是 srcoll_id 的存在会耗费大量的资源来保存一份当前查询结果集映像,并且会占用文件描述符。所以用完之后要及时清理。使用 es 提供的 CLEAR_API 来删除指定的 scroll_id。
3.3. search after
当我们在es搜索时用到 sort
排序,默认返回数据中每个对象都会带上 sort
值,如:
1. 全部数据
请求:
GET operation_log/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
]
}
返回:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0nPH_4MBMgWicIn2Nxwj",
"_score" : null,
"_source" : {
"ip" : "22.1.11.0",
"trace_id" : "780821e89b2dc653",
"operation_time" : "2022-10-02 12:31:10",
"module" : "资源中心",
"action_code" : "DELETE",
"location" : "资源中心->文件管理",
"object_id" : "fffff-1",
"object_name" : "《2022员工绩效打分细则》",
"operator_id" : "operator_id-3",
"operator_name" : "王五",
"operator_dept_id" : "operator_dept_id-2",
"operator_dept_name" : "人力资源部",
"changes" : [ ]
},
"sort" : [
1664713870000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1nPI_4MBMgWicIn2-hzS",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28768",
"operation_time" : "2022-10-02 09:32:00",
"module" : "资源中心",
"action_code" : "DELETE",
"location" : "资源中心->文件管理",
"object_id" : "fffff-b",
"object_name" : "《有NULL的文档》",
"operator_id" : "operator_id-b",
"operator_name" : "路人B",
"changes" : [ ]
},
"sort" : [
1664703120000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1XPI_4MBMgWicIn21xwt",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28ei6",
"operation_time" : "2022-10-02 09:31:00",
"module" : "资源中心",
"action_code" : "DELETE",
"location" : "资源中心->文件管理",
"object_id" : "fffff-a",
"object_name" : "《有空字符串的文档》",
"operator_id" : "operator_id-a",
"operator_dept_id" : "",
"operator_dept_name" : "",
"operator_name" : "路人A",
"changes" : [ ]
},
"sort" : [
1664703060000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0XPG_4MBMgWicIn2_Bxn",
"_score" : null,
"_source" : {
"trace_id" : "990821e89a2dc653",
"operator_id" : "operator_id-2",
"ip" : "22.1.11.0",
"module" : "资源中心",
"action_code" : "UPDATE",
"changes" : [
{
"old_value" : "仅李四可查看",
"new_value" : "全员可查看",
"field_name" : "查看权限"
},
{
"old_value" : "仅李四可查看",
"new_value" : "人力资源部可查看",
"field_name" : "编辑权限"
}
],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "fffff-1",
"operator_dept_name" : "人力资源部",
"operator_name" : "李四",
"operation_time" : "2022-09-05 11:31:10",
"object_name" : "《2022员工绩效打分细则》",
"location" : "资源中心->文件管理->文件权限"
},
"sort" : [
1662377470000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1HPI_4MBMgWicIn2rhyD",
"_score" : null,
"_source" : {
"trace_id" : "670021e89a2dc655",
"operator_id" : "operator_id-2",
"ip" : "10.1.11.5",
"module" : "企业组织",
"action_code" : "DELETE",
"changes" : [ ],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "xxxxx-1",
"operator_dept_name" : "人力资源部",
"operator_name" : "李四",
"operation_time" : "2022-05-05 10:35:12",
"object_name" : "成德善",
"location" : "企业组织->员工管理->身份管理"
},
"sort" : [
1651746912000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "03PI_4MBMgWicIn2chxb",
"_score" : null,
"_source" : {
"ip" : "10.1.11.1",
"trace_id" : "670021e89a2dc7b6",
"operation_time" : "2022-05-03 09:35:10",
"module" : "企业组织",
"action_code" : "ADD",
"location" : "企业组织->员工管理->身份管理",
"object_id" : "xxxxx-2",
"object_name" : "成宝拉",
"operator_id" : "operator_id-1",
"operator_name" : "张三",
"operator_dept_id" : "operator_dept_id-1",
"operator_dept_name" : "研发中心-后端一部",
"changes" : [
{
"field_name" : "姓名",
"new_value" : "成宝拉"
},
{
"field_name" : "性别",
"new_value" : "女"
},
{
"field_name" : "手机号码",
"new_value" : "13055770002"
},
{
"field_name" : "邮箱",
"new_value" : "baola@qq.com"
}
]
},
"sort" : [
1651570510000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0HPG_4MBMgWicIn2yxxE",
"_score" : null,
"_source" : {
"ip" : "10.1.11.1",
"trace_id" : "670021ff9a2dc6b7",
"operation_time" : "2022-05-02 09:31:18",
"module" : "企业组织",
"action_code" : "UPDATE",
"location" : "企业组织->员工管理->身份管理",
"object_id" : "xxxxx-1",
"object_name" : "成德善",
"operator_id" : "operator_id-1",
"operator_name" : "张三",
"operator_dept_id" : "operator_dept_id-1",
"operator_dept_name" : "研发中心-后端一部",
"changes" : [
{
"field_name" : "手机号码",
"old_value" : "13055660000",
"new_value" : "13055770001"
},
{
"field_name" : "姓名",
"old_value" : "成德善",
"new_value" : "成秀妍"
}
]
},
"sort" : [
1651483878000
]
}
]
}
}
可以查到一共7条数据,然后每条数据的 sort
。
2. 查 0,3 条数据
请求:
GET operation_log/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3
}
返回:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0nPH_4MBMgWicIn2Nxwj",
"_score" : null,
"_source" : {
"ip" : "22.1.11.0",
"trace_id" : "780821e89b2dc653",
"operation_time" : "2022-10-02 12:31:10",
"module" : "资源中心",
"action_code" : "DELETE",
"location" : "资源中心->文件管理",
"object_id" : "fffff-1",
"object_name" : "《2022员工绩效打分细则》",
"operator_id" : "operator_id-3",
"operator_name" : "王五",
"operator_dept_id" : "operator_dept_id-2",
"operator_dept_name" : "人力资源部",
"changes" : [ ]
},
"sort" : [
1664713870000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1nPI_4MBMgWicIn2-hzS",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28768",
"operation_time" : "2022-10-02 09:32:00",
"module" : "资源中心",
"action_code" : "DELETE",
"location" : "资源中心->文件管理",
"object_id" : "fffff-b",
"object_name" : "《有NULL的文档》",
"operator_id" : "operator_id-b",
"operator_name" : "路人B",
"changes" : [ ]
},
"sort" : [
1664703120000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1XPI_4MBMgWicIn21xwt",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28ei6",
"operation_time" : "2022-10-02 09:31:00",
"module" : "资源中心",
"action_code" : "DELETE",
"location" : "资源中心->文件管理",
"object_id" : "fffff-a",
"object_name" : "《有空字符串的文档》",
"operator_id" : "operator_id-a",
"operator_dept_id" : "",
"operator_dept_name" : "",
"operator_name" : "路人A",
"changes" : [ ]
},
"sort" : [
1664703060000
]
}
]
}
}
拿到最后一条(第3条)数据的 sort
值 [1664703060000]
,查询后续的3条。
3. 查 3,6 条数据
请求:
GET operation_log/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3,
"search_after": [
1664703060000
]
}
返回:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0XPG_4MBMgWicIn2_Bxn",
"_score" : null,
"_source" : {
"trace_id" : "990821e89a2dc653",
"operator_id" : "operator_id-2",
"ip" : "22.1.11.0",
"module" : "资源中心",
"action_code" : "UPDATE",
"changes" : [
{
"old_value" : "仅李四可查看",
"new_value" : "全员可查看",
"field_name" : "查看权限"
},
{
"old_value" : "仅李四可查看",
"new_value" : "人力资源部可查看",
"field_name" : "编辑权限"
}
],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "fffff-1",
"operator_dept_name" : "人力资源部",
"operator_name" : "李四",
"operation_time" : "2022-09-05 11:31:10",
"object_name" : "《2022员工绩效打分细则》",
"location" : "资源中心->文件管理->文件权限"
},
"sort" : [
1662377470000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1HPI_4MBMgWicIn2rhyD",
"_score" : null,
"_source" : {
"trace_id" : "670021e89a2dc655",
"operator_id" : "operator_id-2",
"ip" : "10.1.11.5",
"module" : "企业组织",
"action_code" : "DELETE",
"changes" : [ ],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "xxxxx-1",
"operator_dept_name" : "人力资源部",
"operator_name" : "李四",
"operation_time" : "2022-05-05 10:35:12",
"object_name" : "成德善",
"location" : "企业组织->员工管理->身份管理"
},
"sort" : [
1651746912000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "03PI_4MBMgWicIn2chxb",
"_score" : null,
"_source" : {
"ip" : "10.1.11.1",
"trace_id" : "670021e89a2dc7b6",
"operation_time" : "2022-05-03 09:35:10",
"module" : "企业组织",
"action_code" : "ADD",
"location" : "企业组织->员工管理->身份管理",
"object_id" : "xxxxx-2",
"object_name" : "成宝拉",
"operator_id" : "operator_id-1",
"operator_name" : "张三",
"operator_dept_id" : "operator_dept_id-1",
"operator_dept_name" : "研发中心-后端一部",
"changes" : [
{
"field_name" : "姓名",
"new_value" : "成宝拉"
},
{
"field_name" : "性别",
"new_value" : "女"
},
{
"field_name" : "手机号码",
"new_value" : "13055770002"
},
{
"field_name" : "邮箱",
"new_value" : "baola@qq.com"
}
]
},
"sort" : [
1651570510000
]
}
]
}
}
3.4. pit(search after)
使用 search_after 需要具有相同查询和排序值的多个搜索请求。 如果在这些请求之间发生刷新,结果的顺序可能会发生变化,从而导致跨页面的结果不一致。 为防止出现这种情况,您可以创建一个时间点 (PIT) 以保留搜索中的当前索引状态。
1. 创建pit
请求
POST operation_log/_pit?keep_alive=10m
返回:
{
"id" : "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA="
}
2. 带上pit去查询 0,3=
GET _search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3,
"pit": {
"id": "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA=",
"keep_alive": "10m"
}
}
这里要注意的是 GET _search
,并没有指定索引。因为从设计上来看,通过 pit id,就已经能找到当初创建pit的索引了。
2. 带上pit去查询 3,6=
GET _search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3,
"search_after": [
1664703060000
],
"pit": {
"id": "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA=",
"keep_alive": "10m"
}
}
3.5. 性能对比
分别分页获取1 - 10,49000 - 49010,99000 - 99010范围各10条数据(前提10w条),性能大致是这样:
分页方式 | 1~10 | 49000~49010 | 99000~99010 |
---|---|---|---|
from/size | 8ms | 30ms | 117ms |
scroll | 7ms | 66ms | 36ms |
search_after | 5ms | 8ms | 7ms |
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。