MySQL Index Interview Questions

Interviewer : I see that MySQL is written on your resume. Do you know the indexes of the MySQL InnoDB engine?

candidate : Well, using indexes can speed up the query speed, in fact, it is to turn disordered data into order (order can speed up retrieval)

candidate : In the InnoDB engine, the underlying data structure of the index is a B+ tree

Interviewer : Then why not use red-black tree or B-tree?

candidate : MySQL data is stored on the hard disk, and it is generally not possible to load all the data into the memory "at one time" when querying

Candidate : Red-black tree is a variant of "binary search tree", a Node node can only store one Key and one Value

Candidate : B and B+ trees are not the same as red-black trees. They are regarded as "multiple search trees". Compared with the "binary search tree", a Node node can store more information. The height of the "Road Search Tree" will be lower than that of the "Binary Search Tree".

candidate : After understanding the difference, it is actually easy to find that in the scenario where the data cannot be loaded into the memory at one time, the data needs to be retrieved. The reason for choosing B or B+ tree is very sufficient (a Node node stores information More (compared to binary search tree), the height of the tree is lower, and the height of the tree affects the speed of retrieval)

candidate : Compared with B-tree, B+ tree has two characteristics.

candidate : 1. The non-leaf nodes of the B+ tree do not store data. Under the same amount of data, the B+ tree is sturdier. (This should not need to be explained, the data is stored on the leaf nodes, and the storage of non-leaf nodes can store more indexes, so the whole tree is even sturdier)

candidate : two, B+ tree leaf nodes form a linked list, which is convenient for traversal queries (traversal operations are more common in MySQL)

Candidate : Let me explain a little bit, you can fill in the picture

candidate : Under the MySQL InnoDB engine, every time we create an index, it is equivalent to generating a B+ tree.

candidate : If the index is a "clustered (clustered) index", then the leaf node of the current B+ tree stores the "primary key and current row data"

candidate : If the index is a "non-clustered index", then the leaf node of the current B+ tree stores the "primary key and current index column value"

candidate : For example, write a sql: select * from user where id >=10, then just locate the record with id 10, and then traverse the linked list (linked list composed of leaf nodes) between the leaf nodes. Found the record in the future.

candidate : Since the B-tree also stores data in non-leaf nodes, cross-layer retrieval may be necessary when traversing, which is relatively troublesome.

candidate : Based on the level of the tree and the characteristics of the business use scenario, MySQL chose the B+ tree as the underlying data structure of the index.

candidate : For the hash structure, the InnoDB engine is actually an "adaptive" hash index (the creation of the hash index is automatically optimized and created by the InnoDB storage engine engine, and we cannot intervene)

Interviewer : Um... Then I understand, by the way, do you know what a return form is?

candidate : The so-called return table is actually, when we use the index to query data, the retrieved data may contain other columns, but the leaf nodes of the index tree can only find the current column value and the primary key ID, so we need to follow The primary key ID to check the data again, get the columns required by SQL

candidate : For example, I built an index for the order number ID, but my SQL is: select orderId,orderName from orderdetail where orderId = 123

candidate : SQL is indexed by order ID, but the leaf nodes of the index tree of order ID only have orderId and Id, and we also want to retrieve the orderName, so MySQL will get the ID and then find out the orderName and return it to us. This Back to the table

candidate : If you want to avoid returning to the table, you can also use a covering index (use it if you can, because it avoids returning to the table).

candidate : The so-called covering index is actually that the column you want to find out happens to exist on the leaf node. For example, I built a joint index of orderId and orderName. It just so happens that I need to query orderId and orderName. These data are all If it exists on the leaf nodes of the index tree, there is no need to return to the table.

Interviewer : Since you also mentioned the joint index, I would like to ask if you know the principle of leftmost matching?

candidate : Well, to illustrate this concept, it is easier to illustrate with an example

Candidate : If there is an index (a, b, c, d), and the query conditions a=1 and b=2 and c>3 and d=4, each node will hit a, b, c in turn, Unable to hit d

Candidate : Match the leftmost one first. The index can only be used to find whether the key exists (equal). When encountering range queries (>, <, between, like left matching), etc., no further matching can be made, and the subsequent degradation is linear Find

candidate : This is the leftmost matching principle

Interviewer : , I also want to ask how your primary keys are generated?

candidate : the primary key is incremented

Interviewer : Then suppose I don't use the MySQL auto-increment primary key, what do you think will be the problem?

candidate : First of all, the primary key must ensure its uniqueness and space as short as possible. These two areas need to be considered.

candidate : In addition, due to the characteristics of the index (order), if a primary key like uuid is generated, the performance of the insert is worse than the self-increment

candidate : Because the generated uuid may need to move the disk block when inserting it (for example, the space in the block is already full at the current moment, but the newly generated uuid needs to be inserted into the full block, so it needs to be moved Block data)

interviewer : OK...

This article summarizes :

Why is the B+ tree ? The data cannot be loaded into the memory at one time. The B+ tree is a multi-way search tree. Only the leaf nodes store data, and the linked lists between the leaf nodes are associated. (The tree is short and easy to traverse)
What is the return form ? Non-clustered indexes store only column values and primary key IDs in leaf nodes. Covering indexes should be used as much as possible to avoid table-back operations and improve query speed if possible.
What is the leftmost matching principle ? Start continuous matching from the leftmost as the starting point, and terminate when the range query is encountered
primary key is not auto-increment 1612ee5bc43988? Insertion efficiency is reduced, there is a data problem of moving blocks