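This article uses logs to walk through OceanBase's freeze and minor compaction flow, starting from the freeze-check thread and using the thread names of tenant 1002 as examples.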

Author: Chen Huiming (陈慧明), test engineer at ActionSky (爱可生), mainly working on the DMP and DBLE automated testing projects.

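Produced by the ActionSky open source community (爱可生开源社区). Original content may not be used without authorization; to reprint, please contact the editor and credit the source.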


The content below is based on version 5.7.25 OceanBase_CE 4.2.0.0 (r100000152023080109-8024d8ff45c45cf7c62a548752b985648a5795c3).

The basic flow is as follows:

[Figure: basic flow of freeze and minor compaction]

T1002_Occam

1.1 Thread overview

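The freeze-check thread runs a check every 2 seconds. When a freeze is required, it generates a checkpoint task, which is then handled by the freeze thread. Searching the log for "tenant freeze timer task" confirms whether this thread is running normally.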
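The check described above can be automated. The following is a minimal sketch, assuming each log line starts with a bracketed timestamp in the format seen in the excerpts in this article; the function names, the tolerance factor, and the idea of judging health by the gap between occurrences are illustrative assumptions, not part of OceanBase.

```python
import re
from datetime import datetime

# Leading timestamp of a log line, e.g. "[2023-08-18 06:44:51.285827]".
TS_RE = re.compile(r"^\s*\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def heartbeat_gaps(lines, marker="tenant freeze timer task"):
    """Return the gaps (in seconds) between consecutive lines containing marker."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        m = TS_RE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


def looks_alive(lines, expected_interval=2.0, tolerance=3.0):
    """True if marker lines appear at least twice and no gap exceeds
    expected_interval * tolerance (the tolerance is an arbitrary choice)."""
    gaps = heartbeat_gaps(lines)
    return bool(gaps) and max(gaps) <= expected_interval * tolerance
```

Passing an open log file (or any iterable of lines) to looks_alive gives a quick yes/no answer. The marker parameter lets the same helper be reused for other periodic messages by adjusting expected_interval.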

1.2 Log flow

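When a freeze is required, the log records "[TenantFreezer] A minor freeze is needed". The trigger condition is that the tenant's active_memstore_used_ exceeds the memstore_freeze_trigger threshold. Once triggered, the system iterates over the tenant's log streams, then generates and submits the corresponding freeze tasks to the freeze thread.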

succeed to start ls_freeze_task(ret=0, ls_id={id:xxx})
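To see which log streams were frozen by each trigger, the two messages above can be paired up. This is a rough sketch that assumes the "succeed to start ls_freeze_task" lines follow their trigger line in the log before the next trigger appears; freeze_rounds is a hypothetical helper name.

```python
import re

NEED_FREEZE = "[TenantFreezer] A minor freeze is needed"
START_RE = re.compile(
    r"succeed to start ls_freeze_task\(ret=(-?\d+), ls_id=\{id:(\d+)\}\)"
)


def freeze_rounds(lines):
    """Return one list of (ret, ls_id) per 'A minor freeze is needed' line,
    in log order. Task lines seen before any trigger line are ignored."""
    rounds = []
    for line in lines:
        if NEED_FREEZE in line:
            rounds.append([])  # a new trigger starts a new round
            continue
        m = START_RE.search(line)
        if m and rounds:
            rounds[-1].append((int(m.group(1)), int(m.group(2))))
    return rounds
```

An empty inner list would mean a trigger was logged but no freeze task submission followed it, which is worth a closer look.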

T1002_LSFreeze

2.1 Thread overview

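This thread's main job is to move freeze checkpoints that meet the flush conditions from new_create_list to prepare_list. In doing so it applies the conditions defined by the road_to_flush and ready_for_flush_ methods, which include checking whether memtablerec_scn is in a frozen state and whether any replay references remain.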

Note: whenever a memtable is initialized, its associated freeze checkpoint is registered in a doubly linked list named new_create_list. The implementation can be found in ObTabletMemtableMgr::create_memtable().

2.2 Log flow

The log does not show every detail of the flow, but the following messages indicate whether it ran normally. "road_to_flush end" also marks the completion of the freeze flow.

[2023-08-18 06:44:51.285827] INFO  [STORAGE] road_to_flush (ob_data_checkpoint.cpp:333) [1553][T1002_LSFreeze1][T1002][Y0-0000000000000000-0-0] [lt=7] [Freezer] road_to_flush begin(ls_->get_ls_id()={id:1001})
[2023-08-18 06:44:51.285846] INFO  [STORAGE] road_to_flush (ob_data_checkpoint.cpp:341) [1553][T1002_LSFreeze1][T1002][Y0-0000000000000000-0-0] [lt=16] [Freezer] new_create_list to ls_frozen_list success(ls_->get_ls_id()={id:1001})
[2023-08-18 06:44:51.285861] INFO  [STORAGE] road_to_flush (ob_data_checkpoint.cpp:345) [1553][T1002_LSFreeze1][T1002][Y0-0000000000000000-0-0] [lt=3] [Freezer] ls_frozen_list to active_list success(ls_->get_ls_id()={id:1001})
[2023-08-18 06:44:51.285867] INFO  [STORAGE] road_to_flush (ob_data_checkpoint.cpp:355) [1553][T1002_LSFreeze1][T1002][Y0-0000000000000000-0-0] [lt=6] [Freezer] active_list to ls_frozen_list success(ls_->get_ls_id()={id:1001})
[2023-08-18 06:44:51.337395] INFO  [STORAGE] road_to_flush (ob_data_checkpoint.cpp:358) [1553][T1002_LSFreeze1][T1002][Y0-0000000000000000-0-0] [lt=16] [Freezer] road_to_flush end(ls_->get_ls_id()={id:1001})
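The five stages above lend themselves to a simple completeness check. The sketch below assumes the stage names and their order are exactly those in this excerpt (other versions may log different stages) and that the input covers a single freeze round per log stream; check_road_to_flush is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# Stage order as it appears in the excerpt above (an assumption for other versions).
EXPECTED_STAGES = [
    "road_to_flush begin",
    "new_create_list to ls_frozen_list success",
    "ls_frozen_list to active_list success",
    "active_list to ls_frozen_list success",
    "road_to_flush end",
]

LINE_RE = re.compile(
    r"^\s*\[(?P<ts>[\d\- :.]+)\].*?\[Freezer\] (?P<stage>.+?)"
    r"\(ls_->get_ls_id\(\)=\{id:(?P<ls>\d+)\}\)"
)


def check_road_to_flush(lines):
    """Group [Freezer] stage lines by log stream id and report, per id,
    whether all stages appeared in order and the elapsed microseconds."""
    seen = {}  # ls_id -> list of (timestamp, stage)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            seen.setdefault(int(m.group("ls")), []).append((ts, m.group("stage")))
    report = {}
    for ls_id, events in seen.items():
        report[ls_id] = {
            "complete": [s for _, s in events] == EXPECTED_STAGES,
            "elapsed_us": (events[-1][0] - events[0][0]) // timedelta(microseconds=1),
        }
    return report
```

For the five lines shown above, log stream 1001 is reported as complete, with 51568 microseconds between "begin" and "end".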

T1002_Flush

3.1 Thread overview

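The Flush thread runs every 5 seconds; its activity can be identified by the log message "traversal_flush timer task". Its main job is to traverse the checkpoint objects in prepare_list and generate the corresponding ObTabletMiniMergeDag objects, which are executed as DAG tasks.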

3.2 Log flow

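Minor compaction operates on tablets (data shards), and a single minor compaction may involve several tablets. The flow below uses the tablet with ID 200001 as an example: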

First, a DAG (directed acyclic graph) is created for tablet 200001 and added to the task queue.

[2023-08-18 06:44:51.335124] INFO  [COMMON] inner_add_dag (ob_dag_scheduler.cpp:3377) [1655][T1002_Flush][T1002][Y0-0000000000000000-0-0] [lt=29] add dag success(dag=0x7fa95f358b20, start_time=0, id=Y0-0000000000000000-0-0, dag->hash()=7887337314793470841, dag_cnt=23, dag_type_cnts=22)
[2023-08-18 06:44:51.335132] INFO  [COMMON] create_and_add_dag (ob_dag_scheduler.h:1119) [1655][T1002_Flush][T1002][Y0-0000000000000000-0-0] [lt=3] success to create and add dag(ret=0, dag=0x7fa95f358b20)

If the DAG is created successfully, the log contains the success marker "schedule tablet merge dag successfully", and the DAG's merge type is recorded as "MINI_MERGE".

[2023-08-18 06:44:51.335134] INFO  [STORAGE.TRANS] flush (ob_memtable.cpp:2095) [1655][T1002_Flush][T1002][Y0-0000000000000000-0-0] [lt=2] schedule tablet merge dag successfully(ret=0, param={merge_type:"MINI_MERGE", merge_version:0, ls_id:{id:1001}, tablet_id:{id:200001}, report_:null, for_diagnose:false,
...
recommend_snapshot_version:{val:18446744073709551615, v:3}})
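Since one flush round may schedule many tablets, it can help to list them all at once. The sketch below assumes the field order inside param={...} matches the line above; scheduled_merges is a hypothetical helper name.

```python
import re

SCHEDULE_RE = re.compile(
    r"schedule tablet merge dag successfully\(ret=(?P<ret>-?\d+), "
    r'param=\{merge_type:"(?P<type>\w+)".*?'
    r"ls_id:\{id:(?P<ls>\d+)\}, tablet_id:\{id:(?P<tablet>\d+)\}"
)


def scheduled_merges(lines):
    """Return (ls_id, tablet_id, merge_type, ret) for every line that
    reports a successfully scheduled tablet merge DAG."""
    found = []
    for line in lines:
        m = SCHEDULE_RE.search(line)
        if m:
            found.append(
                (int(m.group("ls")), int(m.group("tablet")),
                 m.group("type"), int(m.group("ret")))
            )
    return found
```

Applied to the line above, this yields a single entry for log stream 1001, tablet 200001, with merge type MINI_MERGE and ret 0.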

T1002_DagSchedu

4.1 Thread overview

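Depending on the task types in the DAG queue, the system creates the corresponding threads to execute them. Here, a thread named "T1002_MINI_MERGE" (shown truncated as T1002_MINI_MERG in the log) is created to run the minor compaction. The first task created is ObTabletMergePrepareTask, whose execution eventually generates two more tasks: ObTabletMergeTask and ObTabletMergeFinishTask.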

4.2 Log flow

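In the "T1002_DagScheduler" thread (shown truncated as T1002_DagSchedu in the log), filter by tablet_id to find the matching entries. Locate the record of type "DAG_MINI_MERGE" and note its task_id (YB427F000001-0006032C0D448715-0-0).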

[2023-08-18 06:44:51.420180] INFO  [SERVER] add_task (ob_sys_task_stat.cpp:142) [1597][T1002_DagSchedu][T1002][Y0-0000000000000000-0-0] [lt=9] succeed to add sys task(task={start_time:1692341091420175, task_id:YB427F000001-0006032C0D448715-0-0, task_type:3, svr_ip:"127.0.0.1:2882", tenant_id:1002, is_cancel:false, comment:"info="DAG_MINI_MERGE";ls_id=1001;tablet_id=200001;compaction_scn=0;extra_info="merge_type="MINI_MERGE"";"})

Still in the "T1002_DagScheduler" thread, filtering by this task_id shows the full scheduling of the DAG: 3 tasks are scheduled in total.

[2023-08-18 06:44:51.420180] INFO  [SERVER] add_task (ob_sys_task_stat.cpp:142) [1597][T1002_DagSchedu][T1002][Y0-0000000000000000-0-0] [lt=9] succeed to add sys task(task={start_time:1692341091420175, task_id:YB427F000001-0006032C0D448715-0-0, task_type:3, svr_ip:"127.0.0.1:2882", tenant_id:1002, is_cancel:false, comment:"info="DAG_MINI_MERGE";ls_id=1001;tablet_id=200001;compaction_scn=0;extra_info="merge_type="MINI_MERGE"";"})
[2023-08-18 06:44:51.420192] INFO  [COMMON] schedule_one (ob_dag_scheduler.cpp:2997) [1597][T1002_DagSchedu][T1002][YB427F000001-0006032C0D448715-0-0] [lt=12] schedule one task(task=0x7fa9264c8080, priority="PRIO_COMPACTION_HIGH", group id=0, total_running_task_cnt=6, running_task_cnts_[priority]=6, low_limits_[priority]=6, up_limits_[priority]=6, task->get_dag()->get_dag_net()=NULL)
[2023-08-18 06:44:51.421879] INFO  [COMMON] schedule_one (ob_dag_scheduler.cpp:2997) [1597][T1002_DagSchedu][T1002][YB427F000001-0006032C0D448715-0-0] [lt=8] schedule one task(task=0x7fa9264c81b0, priority="PRIO_COMPACTION_HIGH", group id=0, total_running_task_cnt=6, running_task_cnts_[priority]=6, low_limits_[priority]=6, up_limits_[priority]=6, task->get_dag()->get_dag_net()=NULL)
[2023-08-18 06:44:51.876070] INFO  [COMMON] schedule_one (ob_dag_scheduler.cpp:2997) [1597][T1002_DagSchedu][T1002][YB427F000001-0006032C0D448715-0-0] [lt=16] schedule one task(task=0x7fa9264c8390, priority="PRIO_COMPACTION_HIGH", group id=0, total_running_task_cnt=6, running_task_cnts_[priority]=6, low_limits_[priority]=6, up_limits_[priority]=6, task->get_dag()->get_dag_net()=NULL)
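The two filtering steps above (tablet_id to task_id, then task_id to scheduled tasks) can be scripted. This is a minimal sketch, assuming the bracketed context after the source location is always [thread id][thread name][tenant][trace id] as in these excerpts; both function names are hypothetical.

```python
import re

ADD_TASK_RE = re.compile(
    r"succeed to add sys task\(task=\{.*?task_id:(?P<task_id>[\w-]+),.*?"
    r'info="(?P<info>\w+)";ls_id=(?P<ls>\d+);tablet_id=(?P<tablet>\d+);'
)
# [thread id][thread name][tenant][trace id]
CTX_RE = re.compile(r"\[\d+\]\[[^\]]+\]\[[^\]]+\]\[(?P<trace>[^\]]+)\]")
SCHED_RE = re.compile(r"schedule one task\(task=(?P<task>0x[0-9a-f]+)")


def find_task_id(lines, tablet_id, info="DAG_MINI_MERGE"):
    """Return the task_id of the first sys task added for tablet_id, or None."""
    for line in lines:
        m = ADD_TASK_RE.search(line)
        if m and int(m.group("tablet")) == tablet_id and m.group("info") == info:
            return m.group("task_id")
    return None


def scheduled_tasks(lines, task_id):
    """Return the task pointers from 'schedule one task' lines whose
    trace id equals task_id, in log order."""
    tasks = []
    for line in lines:
        ctx = CTX_RE.search(line)
        m = SCHED_RE.search(line)
        if ctx and m and ctx.group("trace") == task_id:
            tasks.append(m.group("task"))
    return tasks
```

Because the log is read twice, pass a list of lines rather than an open file. For the four lines above, find_task_id(lines, 200001) returns YB427F000001-0006032C0D448715-0-0, and scheduled_tasks then returns three task pointers.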

T1002_MINI_MERG

5.1 Thread overview

This thread executes the tasks scheduled by "T1002_DagScheduler".

5.2 Log flow

Filtering the full log by the task_id shows that 3 tasks were scheduled in total. The log is split into the following 3 parts.

5.2.1 ObTabletMergePrepareTask

Prepare task: mainly performs initialization and checks in preparation for the subsequent tasks.

[2023-08-18 06:44:51.420180] INFO  [SERVER] add_task (ob_sys_task_stat.cpp:142) [1597][T1002_DagSchedu][T1002][Y0-0000000000000000-0-0] [lt=9] succeed to add sys task(task={start_time:1692341091420175, task_id:YB427F000001-0006032C0D448715-0-0, task_type:3, svr_ip:"127.0.0.1:2882", tenant_id:1002, is_cancel:false, comment:"info="DAG_MINI_MERGE";ls_id=1001;tablet_id=200001;compaction_scn=0;extra_info="merge_type="MINI_MERGE"";"})
[2023-08-18 06:44:51.420192] INFO  [COMMON] schedule_one (ob_dag_scheduler.cpp:2997) [1597][T1002_DagSchedu][T1002][YB427F000001-0006032C0D448715-0-0] [lt=12] schedule one task(task=0x7fa9264c8080, priority="PRIO_COMPACTION_HIGH", group id=0, total_running_task_cnt=6, running_task_cnts_[priority]=6, low_limits_[priority]=6, up_limits_[priority]=6, task->get_dag()->get_dag_net()=NULL)
...
[2023-08-18 06:44:51.421833] INFO  [STORAGE.COMPACTION] process (ob_tablet_merge_task.cpp:976) [1561][T1002_MINI_MERG][T1002][YB427F000001-0006032C0D448715-0-0] [lt=20] succeed to init merge ctx(task={this:0x7fa9264c8080, type:15, status:2, dag:{ObIDag:{this:0x7fa95f358b20, type:0, name:"MINI_MERGE", id:YB427F000001-0006032C0D448715-0-0, dag_ret:0, dag_status:2, start_time:1692341091420191, running_task_cnt:1, indegree:0, consumer_group_id:0, hash:7887337314793470841}, param:{merge_type:"MINI_MERGE", merge_version:0, ls_id:{id:1001}, tablet_id:{id:200001}, report_:null, for_diagnose:false, is_tenant_major_merge:false, need_swap_tablet_flag:false}, compat_mode:0, ctx:{sstable_version_range:{multi_version_start:1, base_version:0, snapshot_version:1692341091113671451}, scn_range:{start_scn:{val:1, v:0}, end_scn:{val:1692341091275445526, v:0}}}}})

5.2.2 ObTabletMergeTask

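Merge task: focuses on writing macro blocks and merging the multiple versions of a row into a single record, so that the data is compacted and consolidated.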

[2023-08-18 06:44:51.421879] INFO  [COMMON] schedule_one (ob_dag_scheduler.cpp:2997) [1597][T1002_DagSchedu][T1002][YB427F000001-0006032C0D448715-0-0] [lt=8] schedule one task(task=0x7fa9264c81b0, priority="PRIO_COMPACTION_HIGH", group id=0, total_running_task_cnt=6, running_task_cnts_[priority]=6, low_limits_[priority]=6, up_limits_[priority]=6, task->get_dag()->get_dag_net()=NULL)
...
...
[2023-08-18 06:44:51.875958] INFO  [STORAGE.COMPACTION] process (ob_tablet_merge_task.cpp:1555) [1595][T1002_MINI_MERG][T1002][YB427F000001-0006032C0D448715-0-0] [lt=25] merge macro blocks ok(idx_=0, task={this:0x7fa9264c81b0, type:1, status:2, dag:{ObIDag:{this:0x7fa95f358b20, type:0, name:"MINI_MERGE", id:YB427F000001-0006032C0D448715-0-0, dag_ret:0, dag_status:2, start_time:1692341091420191, running_task_cnt:1, indegree:0, consumer_group_id:0, hash:7887337314793470841}, param:{merge_type:"MINI_MERGE", merge_version:0, ls_id:{id:1001}, tablet_id:{id:200001}, report_:null, for_diagnose:false, is_tenant_major_merge:false, need_swap_tablet_flag:false}, compat_mode:0, ctx:{sstable_version_range:{multi_version_start:1, base_version:0, snapshot_version:1692341091113671451}, scn_range:{start_scn:{val:1, v:0}, end_scn:{val:1692341091275445526, v:0}}}}})

5.2.3 ObTabletMergeFinishTask

Finish task: mainly generates the new MINI SSTable and releases the associated MemTable.

 [2023-08-18 06:44:51.876070] INFO  [COMMON] schedule_one (ob_dag_scheduler.cpp:2997) [1597][T1002_DagSchedu][T1002][YB427F000001-0006032C0D448715-0-0] [lt=16] schedule one task(task=0x7fa9264c8390, priority="PRIO_COMPACTION_HIGH", group id=0, total_running_task_cnt=6, running_task_cnts_[priority]=6, low_limits_[priority]=6, up_limits_[priority]=6, task->get_dag()->get_dag_net()=NULL)
...

[2023-08-18 06:44:51.876907] INFO  [STORAGE.COMPACTION] create_sstable (ob_tablet_merge_ctx.cpp:344) [1589][T1002_MINI_MERG][T1002][YB427F000001-0006032C0D448715-0-0] [lt=50] succeed to merge sstable(param={table_key:{tablet_id:{id:200001}, column_group_idx:0, table_type:"MINI", scn_range:{start_scn:{val:1, v:0}, end_scn:{val:1692341091275445526, v:0}}}, sstable_logic_seq:0, schema_version:1692341087064224, ...
...

[2023-08-18 06:44:51.889896] INFO  [STORAGE] release_memtables (ob_i_memtable_mgr.cpp:164) [1589][T1002_MINI_MERG][T1002][YB427F000001-0006032C0D448715-0-0] [lt=6] succeed to release memtable(ret=0, i=1, scn={val:1692341091275445526, v:0})
[2023-08-18 06:44:51.889938] INFO  [STORAGE.COMPACTION] process (ob_tablet_merge_task.cpp:1209) [1589][T1002_MINI_MERG][T1002][YB427F000001-0006032C0D448715-0-0] [lt=12] sstable merge finish(ret=0, merge_info={is_inited:true, sstable_merge_info:{tenant_id:1002, ls_id:{id:1001}, tablet_id:{id:200001}, compaction_scn:1692341091275445526, merge_type:"MINI_MERGE", merge_cost_time:454652, merge_start_time:1692341091421154, merge_finish_time:1692341091875806, dag_id:YB427F000001-0006032C0D448715-0-0, occupy_size:63203471, new_flush_occupy_size:63203471, original_size:75545791, compressed_size:62951855, macro_block_count:31, multiplexed_macro_block_count:0, new_micro_count_in_new_macro:3823, multiplexed_micro_count_in_new_macro:0, total_row_count:333312, incremental_row_count:333312,
...
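The "sstable merge finish" line carries the most useful numbers for judging a minor compaction. The sketch below pulls out a few of them; it assumes the time fields are in microseconds (consistent with the timestamps in this excerpt, but not confirmed here), and the derived metrics are illustrative choices rather than anything OceanBase reports itself.

```python
import re

FIELDS = (
    "merge_cost_time", "merge_start_time", "merge_finish_time",
    "original_size", "compressed_size", "macro_block_count", "total_row_count",
)


def parse_merge_info(line):
    """Extract selected integer fields from an 'sstable merge finish' line.
    Returns None for any other line."""
    if "sstable merge finish" not in line:
        return None
    info = {}
    for key in FIELDS:
        # \b keeps e.g. 'occupy_size' from matching inside 'new_flush_occupy_size'.
        m = re.search(rf"\b{key}:(\d+)", line)
        if m:
            info[key] = int(m.group(1))
    return info


def summarize(info):
    """Derive a few sanity metrics; time fields are assumed to be microseconds."""
    cost_us = info["merge_cost_time"]
    return {
        "cost_matches_timestamps":
            info["merge_finish_time"] - info["merge_start_time"] == cost_us,
        "compression_ratio": info["compressed_size"] / info["original_size"],
        "rows_per_second": info["total_row_count"] / (cost_us / 1_000_000),
    }
```

For the line above, merge_finish_time minus merge_start_time equals merge_cost_time (454652), so the first check passes.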

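Finally, once the DAG has finished executing, the related tasks are removed, marking the successful completion of the freeze and minor compaction flow.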

[2023-08-18 06:44:51.890015] INFO  [COMMON] finish_dag_ (ob_dag_scheduler.cpp:2563) [1589][T1002_MINI_MERG][T1002][YB427F000001-0006032C0D448715-0-0] [lt=19] dag finished(dag_ret=0, runtime=469823, dag_cnt=9, dag_cnts_[dag.get_type()]=9, &dag=0x7fa95f358b20, dag={ObIDag:{this:0x7fa95f358b20, type:0, name:"MINI_MERGE", id:YB427F000001-0006032C0D448715-0-0, dag_ret:0, dag_status:3, start_time:1692341091420191, running_task_cnt:0, indegree:0, consumer_group_id:0, hash:7887337314793470841}, param:{merge_type:"MINI_MERGE", merge_version:0, ls_id:{id:1001}, tablet_id:{id:200001}, report_:null, for_diagnose:false, is_tenant_major_merge:false, need_swap_tablet_flag:false}, compat_mode:0, ctx:{sstable_version_range:{multi_version_start:1, base_version:0, snapshot_version:1692341091113671451}, scn_range:{start_scn:{val:1, v:0}, end_scn:{val:1692341091275445526, v:0}}}})
[2023-08-18 06:44:51.890035] INFO  [SERVER] del_task (ob_sys_task_stat.cpp:171) [1589][T1002_MINI_MERG][T1002][YB427F000001-0006032C0D448715-0-0] [lt=18] succeed to del sys task(removed_task={start_time:1692341091420175, task_id:YB427F000001-0006032C0D448715-0-0, task_type:3, svr_ip:"127.0.0.1:2882", tenant_id:1002, is_cancel:false, comment:"info="DAG_MINI_MERGE";ls_id=1001;tablet_id=200001;compaction_scn=0;extra_info="merge_type="MINI_MERGE"";"})
