MySQL Optimization Learning Notes (4) Single Table Access Method

This article is the foreshadowing of the introduction of the MySQL execution plan. Today, I finally figured out how to organize this part of the content. First, I will briefly introduce the implementation of the query, and then lead to the execution plan.

Overview

Our daily inquiries can basically be divided into three categories:

Single table query
subquery
connection query

These three can be combined or separated. The above order is also the order in which we learn SQL. We will introduce its implementation below, which is also in the above order. Before reading this article, it is recommended to read the first few articles of this article:

MySQL Optimization Learning Notes (1)
MySQL Optimization Learning Notes (2)
MySQL Optimization Learning Notes (3)

Of course, if you are familiar with MySQL's B+ tree index, you can skip it.

Single table access method

We wrote a single-table query statement. How does MySQL obtain the records corresponding to our query statement:

 SELECT * FROM Student WHERE ID = 1;

We ignore the steps of syntax parsing and connection establishment. These are all done, so how does MySQL locate records? There are two ways to execute a query in MySQL:

Full table scan (compare one record by one record)
There are different types of indexes for querying using indexes, so even using indexes for querying can be divided into several different situations:
- Equal query on primary key or unique secondary index
- Equivalent query against ordinary secondary indexes
- Range queries against indexed columns
- Scan the entire index directly

The way a query is executed in MySQL is called an access method or access type.

Locate records by matching primary key column equivalences - const

 SELECT * FROM Student WHERE ID = 1; ID是主键

Id is the primary key of the Student table. Let us recall the basic structure of MySQL storage data:

InnoDB divides data into several pages, and uses pages as the basic unit of interaction between disk and memory. The size of pages in InnoDB is generally 16KB.
The InnoDB storage engine will automatically create a clustered index for the primary key (if not, it will be automatically added for us), and the leaf nodes of the clustered index contain complete user records.
Each index corresponds to a B+ tree. The B+ tree is divided into several layers. The bottom layer is a leaf node, and the rest are inner nodes. All user records are stored in the leaf nodes of the B+ tree. All directory entry records are stored in the inner node
We can also build a secondary index for the columns we are interested in. The user records contained in the leaf nodes of the secondary index are composed of the index column + primary key, so if you want to find the complete user record through the secondary index, you need to return Table operation, that is, to find the complete user record in the clustered index after finding the primary key value through the secondary index.
The nodes of each layer of the B+ tree are sorted according to the order of the index column values from small to form a doubly linked list, and the records in each page (whether it is a user record or a directory entry record) are based on the value of the index column from small to small. The order of arrivals forms a singly linked list. If it is a joint index, the pages and records are first sorted according to the column in front of the joint index, and if the value of the column is the same, then they are sorted according to the column behind the joint index.
That is, the ID column is a clustered index, and it is sorted by ID, so this is quite fast. You can first locate the data page where the column of the directory entry is located, and then use binary search to locate the record in which location. Now let's add a column name to Student and create a unique index for this column, let's execute the following query:

 select  * from student where name ='aa'

It is also similar to the clustered index, but because the name column is a non-clustered index column, the leaf node does not have a complete record. After locating the record name='aa', the primary key corresponding to this record is used to go to the primary key index column. Go check the full record. Even with the cost of returning the table, MySQL developers still believe that this query method is very fast, and define this access method as: const, which is the constant level. But if you query for NULL values, the situation is different:

 select * from  student where name is null

Because the unique index does not limit the number of NULL values, the above query statement may access multiple records.

ref

Using a unique index column to find a NULL value will query multiple rows. This situation is similar to using a non-unique index column to match records. The query steps are the same as using a unique index column to find which data page this record is located on. , and then go to the specific page to match the record. Since we are select * , we need to return the table to query. If there are fewer matching records and the cost of returning to the table is relatively low, MySQL is more inclined to use the index + return table method to search, and uses the secondary index to query records of equal value, which MySQL defines as ref.

But for a secondary index column that contains multiple index columns, the ref access method may be used as long as the leftmost consecutive index column is an equivalent comparison with a constant. As for why it is possible, it is still a measure of cost. If you choose *, it will be returned to the table. If there are many records queried, MySQL feels that it is better to scan the whole table directly than index columns + return table.

Now let's add: age, sex to Student, and build a common index for age and sex. If our query statement is written as follows:

 select * from student where age = '18' and sex >  '女'

The query level of this access method in MySQL is not ref, because the range query is used for the sex column.

ref_or_null

Match against a normal index, but also look for values whose index column is null, like this:

 select * from  student where name = 'aa' and  name is  null

This query level is called ref_or_null in MySQL.

range - range query

In our daily development, range queries also appear frequently, like the following:

 select * from student where age in ('96','18') OR (age >= 28 and key <= 79);

For MySQL, there are two options for executing this query. The slow speed is naturally a full table scan. Of course, you can also use the secondary index + return table method. The above condition is a search range, in MySQL this query level is defined as range.

index

 select sex,age from student where sex = '男'

In the joint index composed of age and sex, age is in the front, and both sex and age have indexes. For MySQL, there are two options. One is to use the index on sex + return table to find out age. The other is to traverse the joint index corresponding to sex and age. This efficiency is actually higher, because the non-primary key index only stores the primary key column and the index column, the data page will be smaller, and there is no need to return to the table, and the cost is lower. This query level is defined as index in MySQL.

all

As the name suggests, scan the entire table.

How to return the form

We talked about the word return table earlier. In fact, rows have different return table strategies for different query scenarios.

Case 1, the query uses two index columns, how to return the table:

 select  *  from student where age > '50'  and name = '张三'

Both age and name are index columns. Which index should be used? The MySQL optimizer will generally use the Student's statistics to determine which condition corresponds to the secondary index that will scan fewer rows in the query. Then, the result of the query in the second level will be returned to the table to obtain the complete user record, and then the record will be filtered according to another condition. In general, an equivalent lookup needs to scan fewer ranges than a range match, assuming the query optimizer uses the name column for lookups. So for the above statement, the query steps are as follows:
According to name = 'Zhang San', go to the index corresponding to the name to find the record corresponding to the secondary index.
Then go back to the primary key corresponding to the secondary index of the table to find the corresponding complete user record in the clustered index, and then filter according to age > '50'.

Case 3: Some search conditions cannot use the index

Let's add a card field for Student and execute the following query:

 select * from student where age > 30 and card = '001'

In this case, there is no index on the card, MySQL will tend to use the index on the age and then return to the table to find the records, and then use card = '001' to filter.

What if it is or?

 select * from student where age > 30 or card = '001'

In this case, there is no way to use the index on age, because there are only index columns and primary keys on the age index, and MySQL may scan.

index merge

Under normal circumstances, MySQL will only use a single secondary index at most to execute a query. Therefore, there are generally special cases. In some special cases, multiple secondary indexes may be used in a query, and multiple secondary indexes may be used. This case of a level index is called in MySQL: index merge.

Intersection merge

In the following discussion, the separate index column on age is temporarily removed, and only the joint index of age and sex is retained.

Intersection means intersection. A simple understanding is that a query can use multiple secondary indexes to intersect the results queried in multiple secondary indexes:

 select * from  Student  where age = '18'and sex = '男' and name = '张三'

There are indexes on both age and name, so when the above statement is executed, there are two execution schemes for MySQL to choose (not to mention the full table):

Select a condition to go to the corresponding index column to find records, then go back to the table and filter with another condition.
The age index column is used to find records, and the name index column is used to find records. Since there are primary key columns stored, the two result sets are intersected.

Which access method is cheaper? In general, MySQL prefers to choose the method of reading multiple secondary indexes, because reading secondary indexes is sequential I/O, and returning tables is random I/O.

Therefore, if only one secondary index is read, the number of records that need to be returned to the table is very large, and the number of records that need to be returned to the table after reading multiple secondary indexes is very small. In this case, the loss of returning the table may be more than accessing multiple indexes. Index is higher.

The following two queries cannot be merged using the Intersection index:

 select * from Student where  age = '18' and sex = '男' and name > 'zhangsan'

Let's analyze why we can't use index merging to find intersection in this situation. The reason is the problem of time complexity, age is range matching (age and sex establish a joint index,), the scanned primary key columns may not be ordered, ordered collections The time complexity of finding the intersection is O(n), and the time complexity of finding the intersection of unordered sets is O(n^2).

For the same reason, the following query cannot use index merging:

 select * from Student where  age = '18'  and name =  'zhangsan'

The joint index age is the same, sorted by sex, but the primary key of the record obtained only from the age column may still be out of order, so index merging cannot be used.

Primary key columns can be range-matched queries:

 select  * from Student where id > 1 and name = '张三'

Didn't the above say that range matching cannot be used for index merging? But the index stores the index column and primary key, we can even think that this only uses the name column, and then return to the table. If the cost of returning to the table using only the name column is greater than the cost of merging the primary key column and the name column using the index column, MySQL may prefer to use the id column and the name column for index merging.

Union Merger

If there is an intersection, there is a union. This is a pair of twins. MySQL may only use Union index merging under certain specific college entrance examinations last year:

Case 1: The secondary index column is an equal-value match. For a joint index, each column in the joint index must be an equal-value match, and only part of the columns cannot be matched.

 select * from student where name = 'a' or (age = '18' and sex = '男')

The following two queries cannot perform Union indexing

 select * from student where name > 'a' or (age = '18' and sex = '男')
select * from student where name = 'a' or age = '18'

The reason is still related to sorting, the two sets are sorted.

Case 2: The primary key column can be a range match
Case 3: Search criteria combined using the Intersection index

It can be understood as the union of two intersections.

Sort-Union merge

The usage conditions of Union index merging are a bit harsh, and it may be used only under the condition that each secondary index column is matched by equal value.

 select * from Student  where name > 'a' and age < '25'

The above SQL statement cannot use Union sorting, because the primary key values found from the two columns of name and age are not sorted, but if the sorting cost is not high, MySQL is not expensive. will execute as follows:

First take the value according to the condition name > 'a', and then sort according to the primary key
Then take the value according to age < '25', and then sort according to the primary key.

The rest of the operations after the sequence is completed is the same as the union index merging method.

Considerations for Index Merging

Let's first add a normal index to the card property:

 select * from student where name = '张三' and card = '001'

The reason why this query may use index merging is that name and card are two indexes, and if the two columns are an index, they will not be merged with MySQL. This way you don't have to read the B+ tree.

write at the end

In fact, this article is also to look at the reference materials of this article, and use my own ideas to sort out and make study notes.

References

How MySQL Works: Understanding MySQL from the Roots