B-trees and B+ trees

[TOC]

Reference: B tree, B+ tree detailed explanation

foreword

B+ tree is a storage structure that is commonly used in database indexing.

Preparation Knowledge

Tree of order m: the maximum number of branches in the tree is m, that is, the maximum number of child nodes is m;

root node: a node without a parent node;

Leaf node: a node without child nodes;

Internal nodes: nodes that are not root or leaf nodes;

Binary search tree: the value in the left subtree is smaller , and the value in the right subtree is larger ;

Balanced binary tree: a special case of binary search tree, the height of the is the same as

1. B-tree

The B number is a balanced to fork tree, an m-order B-tree with the following characteristics:

internal node must have at least ceil(m/2) child nodes;
When the root node is not a leaf node, it must have at least 2 child nodes, that is, order 2;
m-order node contains m-1 data ;
All leaf nodes have the same height;

B树结构

2. B+ tree

A B+ tree is a B-tree variant with some changes to its rules. Change as follows:

A leaf node consists of an ordered array and a pointer to a leaf node to its right;
Non-leaf nodes of an ordered array of, but the array element consists of a index a pointer composition;
1. Pointer: points to a leaf node;
2. Index value: the smallest index value in the leaf node pointed to;
Non-leaf nodes are tool nodes used to quickly find the specified leaf nodes. Only leaf nodes store real data (a row of data);
The leaves nodes similar to an ordered linked list;
The node of order m contains m data;

B+树结构

Note: The leaf node of the mysql B+ tree is a double-linked list, and the tail leaf node also points to the head leaf node.

2.1 Why are B+ trees suitable for databases?

The B+ tree facilitates range queries, which is the main thing.

You only need to find the leftmost range, and after finding it, traverse the leaf nodes to the right until you reach the end of the right range, so that all data in the range is filtered out.

The range search of the B-tree uses in-order traversal, while the B+ tree uses the traversal on the linked list;

The disk read and write costs of B+ trees are lower.

The minimum storage unit of Mysql is a page, and each access to data is a page-by-page load (accessing a page is an IO). Mysql's index tree is naturally built in pages, that is, a node is a page. Since the B+ tree only stores index data in non-leaf nodes, reading a page of data, that is, reading it into the memory at one time, requires more keywords to be searched, and relatively speaking, the number of IO reads and writes is reduced;

B+ tree query efficiency is more stable

Since non-leaf nodes are not nodes that ultimately point to the contents of the file, any keyword search must follow a path from the root node to the leaf node. The path lengths of all keyword queries are the same, resulting in the same query efficiency for each data;

B-trees and B+ trees