1

A fan who has worked for 8 years sent me a private message with a question.

He said that this question was asked when he went to Ali for an interview. He searched a lot of information and couldn't figure it out. He hoped that I could help him answer it.

The question is: "Why does Mysql use B+Tree as the index structure"

On this question, look at the answers of ordinary people and experts.

Ordinary people:

The characteristic of B+ number is that compared with B number, its non-leaf node does not store data, and all data exists in leaf nodes.

Compared with the number of B, the number of queries and IO times will be more stable.

Expert:

I will answer this question in several ways.

First of all, conventional database storage engines generally use B-trees or B+ trees to store indexes.

Because the B-tree is a multi-way balanced tree, using this storage structure to store a large amount of data, its overall height will be much shorter than that of a binary tree.

For databases, all data must be stored on disk, and the efficiency of disk IO is actually very low, especially in the case of random disk IO.

Therefore, the height of the tree can determine the number of disk IOs. The less the number of disk IOs, the greater the performance improvement, which is why the B-tree is used as the index storage structure.

image-20220422124736684

But in Mysql's InnoDB storage engine, it uses an enhanced B-tree structure, that is, B+ tree, as the storage structure for indexes and data.

Compared with the B-tree structure, the B+ tree has been optimized in several aspects.

  1. All data of the B+ tree is stored in leaf nodes, and non-leaf nodes only store indexes.
  2. The data in the leaf nodes are associated using a doubly linked list.

image-20220422125222216

I think there are several reasons for using a B+ tree to implement an index.

  1. The non-leaf nodes of the B+ tree do not store data, so the number of indexes that can be stored in each layer will increase, which means that the B+ tree can store more data than the B tree with the same layer height, resulting in fewer disk IOs.
  2. In Mysql, range query is a relatively common operation, and all the data stored in the leaf nodes of the B+ tree is associated with a doubly linked list, so when querying, you only need to search two nodes for traversal, while the B tree requires Get all nodes, so B+ trees are more efficient on range queries.
  3. In terms of data retrieval, since all data is stored in leaf nodes, the IO times of the B+ tree will be more stable.
  4. Because the leaf nodes store all the data, the global scan capability of the B+ tree is stronger because it only needs to scan the leaf nodes. But B-tree needs to traverse the whole tree.

In addition, based on the structure of the B+ tree, if self-incrementing integer data is used as the primary key, it can better avoid the problem of a large number of operations caused by leaf node splitting when adding data.

In general, I think the selection of technical solutions is more about solving specific problems in the current scenario, not necessarily to say that B+ tree is the best choice, just like the B tree structure used in MongoDB, in essence In fact, it is the difference between relational databases and non-relational databases.

The above is my understanding of the problem.

Summarize

For the question of "why choose xx technology", it is actually a good answer.

As long as you know enough about the characteristics of the technology itself, you will naturally know why it is designed this way.

Just like, in business development, we know when to use List and when to use Map. The reason is the same.

If you have any interview questions, career development questions, or study questions, you can PM me.

file

Copyright notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated. Please indicate the source for Mic带你学架构 !
If this article is helpful to you, please help to follow and like, your persistence is the driving force for my continuous creation. Welcome to follow the WeChat public account of the same name to get more technical dry goods!

跟着Mic学架构
810 声望1.1k 粉丝

《Spring Cloud Alibaba 微服务原理与实战》、《Java并发编程深度理解及实战》作者。 咕泡教育联合创始人,12年开发架构经验,对分布式微服务、高并发领域有非常丰富的实战经验。