B-tree and B+tree, a structure born for data query

B-tree

Introduction

B-tree (balanced multiple search tree) is self-balancing tree , which maintains sorted data. For the binary tree and other self-balancing trees, please previous article 161a0d3593b0f1 Red-Black Tree .

A tree of order \( m \) satisfies the following properties,

Each node has at most \( m \) child nodes.
If the root is not a leaf node, the root has at least two child nodes.
Every non-leaf node (except the root) has at least \( {\frac{m}{2}} \) child nodes.
A non-leaf node with \( k \) child nodes contains \( k-1 \) keys.
All leaf nodes have the same height.

The key of each non-internal node serves as a separator value that separates its subtrees. For example, the following is a fragment of a 5-level tree. The internal node contains two keys [7, 16], so it has three child nodes. The key of the leftmost child node satisfies less than 7, and the key of the middle child node satisfies greater than 7 less than 16. The key of the rightmost child node is greater than 16.

internal node : internal nodes are all nodes except leaf nodes and root nodes.

B-tree

scene

B-tree is red-black tree application scenarios. This data structure was invented to handle large amounts of data. It is optimized for reading and writing large data systems and is often used in databases and file systems.

red-black tree commonly used in applications, because the amount of data it processes generally does not exceed the capacity of the main memory (RAM). In the database scenario, the amount of data is GB, TB level, the data is stored on the disk, and each operation needs to access the disk to read the data.

The time of accessing data in computer storage hardware from fast to slow is as follows:

register
CPU cache (L1, L2, and L3)
Main memory (RAM)
Disk Drive (Solid State Disk)
Disk drive (disk)

执行典型指令   　　　　　　　　　　   1/1000000000 秒 = 1纳秒
从一级缓存中读取数据 　　　 　　　　   0.5 纳秒
从二级缓存中读取数据 　　　　　　　    7 纳秒
从主内存中读取数据 　　  　　　　        100 纳秒 
从新的磁盘位置读取数据 　　          8000000 纳秒 = 8毫秒

As can be seen from the above, the main memory access time is at the nanosecond level, and the disk access time is at the millisecond level. This means that if a program reads from disk, it will be 100,000 times slower than reading from main memory.

So B-tree optimization purposes is reduce disk access , through the following ways:

To reduce the tree height, use the multi-tree structure , allow a single node to store more keys .
Data is read in blocks, so that more data can be read at a time. Generally speaking, the node capacity is equal to the disk block size.

1 second = 1000 milliseconds = 1000000 microseconds = 1000000000 nanoseconds

Self-balancing

自平衡

split tree node

The following is a 6th-order B-tree (m=6). The maximum number of keys a node can have is 5. When the number 6 is inserted, the left node reaches the maximum key, and the node of the tree needs to be split.

btree-insert

Insert the new number 6 steps as follows:

First find the median of the node to be 3;
Create a new leaf node and copy all keys after the median of 3 to the new node;
Move the median 3 up and insert it into the appropriate position of the parent node;
Add a pointer from the parent node to the new node after the median of 3;
Add the new number 6 to the correct position of the new node.

merge two nodes of the tree

After deleting the key, if the number of node keys is less than the minimum number of keys, the nodes need to be merged. The minimum number of keys a node in the tree below can have is 2.

btree-del

Delete leaf node key 1:

Find 1 delete;
Delete the left pointer of 3, copy 2 to the [4, 5] node, and move 3 down;
After 3 moves down, there is only one key 6, and the parent node continues to move down until it is balanced.

btree-del

Delete internal node 20:

Find 20 to delete, select the maximum value of 19 in the left child node at position 20 to replace;
Delete the left pointer at position 19;
The 17 key moves down to the [15,16] node, and 18 is appended to the back.

B+tree

B+tree is an optimized version of B-tree, used for database indexing. There are two main differences between B+tree and B-tree:

B+tree non-leaf nodes only store keys, while all B-tree nodes can store key values;
The values corresponding to the B+tree keys are all stored in the leaf nodes and linked together through a linked list.

The following figure shows the situation where B+tree stores the key value, the key [1-7] corresponds to the value [d1-d7].

B+tree

MySQL storage engine B+tree in InnoDB

When MySQL creates a table, it generates an .ibd file. This ibd file is a fully functional space . Each space is divided into multiple pages (Page), and each page is assigned a 32-bit integer page number, and the size of each page is usually 16kB.

Each node in the B+tree index has the size of one page. The data types of leaf nodes and non-leaf nodes

Leaf nodes contain key and next record pointer .

B_Tree_Simplified_Leaf_Page

Non-leaf nodes contain subpages page and pointed least key subpages .

B_Tree_Simplified_Non_Leaf_Page

Between nodes of the same level, each node a pointer to the previous page and next page 161a0d3593b7c0, forming a doubly linked list of pages of the same level.

In summary, the index structure is as follows. Pages at the same level form two-way links, and ascending links are recorded in each page. Non-leaf pages contain sub-page "pointers" (sub-page numbers).

Regarding the efficiency of B+tree, you can check the following table

Tree height	Non-leaf pages	Leaf page	Rows	size
1	0	1	468	16.0 KB
2	1	1203	> 563,000	18.8 MB
3	1204	1447209	> 677 million	22.1 GB
4	1448413	1740992427	> 814 billion	25.9 TB

Most of the table index height is 1-3 levels, so generally only 1-3 disk IO operations can be completed. The above figure describes the clustered index (primary key).

The B+Tree index in the database is divided into clustered index (clustered index) and secondary index (secondary index), the leaf page of the clustered index is the page containing the entire row of data, the leaf page of the secondary index stores the corresponding The primary key of the row.

Use a clustered index query to directly obtain the entire row of data.
When using the secondary index query, first query the primary key value, and then find the complete row data in the clustered index through the primary key.

summary

B-tree is an optimized data structure for large data volume scenarios, designed to reduce the number of disk accesses (reducing tree height).

The well-known version of B-tree, B+tree, allows non-leaf nodes to store only keys, which takes up less space, allowing each layer of nodes to index more data, effectively controlling the height of the tree.

MYSQL's InnoDB uses B+tree as an index, and the tree height is maintained at levels 1-4 under normal circumstances.

B-tree and B+tree, a structure born for data query

B-tree

Introduction

scene

Self-balancing

B+tree

summary

编程码农

引用和评论

微信公众号的AI时代：使用扣子机器人连接大模型

Open WebUI：开源AI交互平台的全面解析

被 Manus 带火的 MCP 是什么｜一文看懂

MySQL × 向量数据库：大模型时代的黄金组合实战指南

金融行业MySQL索引实战：交易系统的最左原则深度优化

百万级群聊的设计实践

MySQL Binlog 深度解析