In the previous installment of this source code series, the seventh article, "Understanding the Principles of Database Index Implementation in One Article", we briefly introduced OceanBase's index construction process from a code-reading perspective and walked through the relevant index-building code.

In this issue of "Source Code Interpretation", OceanBase development engineer Ke Qing presents "Transaction Log Submission and Playback". OceanBase's log (clog) is similar to the REDO log of a traditional database: this module persists transaction data when a transaction commits, and implements a distributed consensus protocol based on Multi_Paxos.

The design philosophy of the log module

Compared with the log module of a traditional relational database, OceanBase's log service faces the following main challenges:

  • Multi_Paxos replaces the traditional active-standby synchronization mechanism, achieving high system availability and high data reliability.
  • It must support global deployment, where network latency between replicas can reach tens or even hundreds of milliseconds.
  • As a distributed database, OceanBase uses the partition as the basic unit of data synchronization. A single machine must support on the order of 100,000 partitions while still writing and reading log data efficiently.
  • The log service maintains per-replica state such as the member list and the leader, and must efficiently support the rich functionality of a distributed system; at the same time, every operation must consider batching to speed up execution.

The life of a log

OceanBase's log module implements a standard Multi_Paxos protocol, which guarantees that all successfully committed data can be recovered as long as a majority of replicas have not permanently failed.

It also implements out-of-order log commit, ensuring that there are no commit dependencies between transactions. Before introducing OceanBase's engineering implementation of Multi_Paxos, we assume that readers already understand the core ideas of Multi_Paxos; if you want to learn more, see the Multi_Paxos discussion in the community Q&A area.

Taking a transaction log of one partition as an example, the normal flow is roughly as follows:

[Figure: normal flow of transaction log submission]

1. The transaction layer calls the log_service->submit_log() interface to submit the log, carrying an on_succ_cb callback pointer.

2. clog first assigns a log_id and submit_timestamp to the log, then submits it to the sliding window, generating a new log_task; the log is written to the local disk and synchronized to the followers via RPC.

3. When the local disk write completes, log_service->flush_cb() is called to update the log_task state and mark the log as locally persisted; a follower returns an ack to the leader after its own disk write succeeds.

4. The leader receives each ack and updates the ack_list of the log_task.

5. During steps 3 and 4 the leader counts acks toward a majority. Once a majority is reached, log_task->on_success is called in sequence to notify the transaction, a confirmed_info message is sent to the followers, and the log is slid out of the sliding window.

6. After receiving the confirmed_info message, the follower tries to slide the log out of its sliding window, and submits the log for playback as part of the slide-out action.
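The leader-side majority counting in steps 3 to 5 can be sketched as follows. This is a simplified illustration under our own naming, not the real log_task code: the leader's own successful flush counts as the first ack, and each follower ack adds one until a majority of replica_num is reached:

```cpp
#include <cassert>
#include <set>

// Hypothetical sketch of majority counting for one log entry.
// The real log_task tracks this via its ack_list; here we keep
// a set of replica ids that have persisted the log.
struct LogTaskSketch {
  int replica_num;               // total replicas in the member list
  std::set<int> acked_replicas;  // replicas that have persisted the log
  bool majority_reached = false;

  // Record an ack (the leader's own flush is also an ack).
  // Returns true exactly once, when majority is first reached;
  // at that point the caller would invoke on_success and send
  // confirmed_info to the followers.
  bool ack(int replica_id) {
    acked_replicas.insert(replica_id);
    if (!majority_reached &&
        static_cast<int>(acked_replicas.size()) > replica_num / 2) {
      majority_reached = true;
      return true;
    }
    return false;
  }
};
```

With three replicas, the log is confirmed as soon as the leader's flush plus one follower ack have arrived; the remaining ack changes nothing, which is why a slow third replica does not delay the commit.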

On a follower, committed logs are played back in real time. The playback flow is roughly as follows:

[Figure: log playback flow on the follower]

When the follower slides a log out of the window, it advances the corresponding submit_log_task, which records each partition's playback point. This task is submitted asynchronously to a global thread pool for consumption.

When the global thread pool consumes a partition's submit_log_task, it reads all pending playback logs recorded on disk and distributes them to the partition's four playback task queues. Distribution is based mainly on a hash of trans_id, which ensures that all logs of the same transaction are assigned to the same queue. A queue that receives a new task submits itself (task_queue) asynchronously as a task to the same global thread pool for consumption.

When the global thread pool consumes a task_queue, it traverses all subtasks in the queue in turn and executes the application logic corresponding to each log type. At this point the log is truly synchronized to the follower, and it becomes readable there.
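The queue-assignment rule above can be sketched in a few lines. The function name and the plain modulo hash are our illustrative choices; the key property is the one the article states, that logs of the same transaction always land in the same queue and therefore replay in order relative to each other:

```cpp
#include <cassert>
#include <cstdint>

// Each partition owns four playback task queues (per the article).
constexpr int kReplayQueueCount = 4;

// Hypothetical routing function: hash the transaction id to pick a
// queue. Any deterministic hash works; determinism is what guarantees
// that all logs of one transaction go to the same queue.
int assign_replay_queue(uint64_t trans_id) {
  return static_cast<int>(trans_id % kReplayQueueCount);
}
```

Because routing depends only on trans_id, two logs of the same transaction can never be consumed concurrently by different queues, while logs of different transactions spread across the four queues for parallel replay.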

After reading this installment, I hope you have a solid understanding of transaction log submission and playback. In the next issue, we will cover the OceanBase storage layer code, so stay tuned!
