Overview of MySQL Architecture -> Query Execution Process -> SQL Parsing Sequence

Preface

I have always wanted to know how a SQL statement is executed, and what is the order in which it is executed, and then check and summarize the information of all parties, and there is the following blog post.

This article will discuss the knowledge from MySQL overall structure -> query execution process -> statement execution sequence.

Overview of MySQL architecture

The structure is best to look at the picture, with the necessary explanatory text.

The following picture is based on a picture in the reference book as the original, and then I have added my own understanding to it.

From the above figure, we can see that the entire architecture is divided into two layers. The upper layer is MySQLD's called'SQL Layer', and the lower layer is a variety of storage engines that provide interfaces to the upper layer, called'Storage Engine Layer' '. The functions of other modules and components can be easily understood from the names, so I won’t repeat them here.

Query execution process

Let me move forward a bit further, let me talk about the process of query execution based on my own knowledge:

connect

The client initiates a Query request and listens to the client's "connection management module" to receive the request;
Forward the request to the'connection into/thread module';
Call the'user module' to perform authorization check;
After passing the check, the'connection in/thread module' takes out the idle cached connection thread from the'thread connection pool' to connect with the client request, and if it fails, it creates a new connection request.

deal with

Query the cache first, check whether the Query statement is completely matched, and then check whether it has permission, and if it succeeds, it will directly fetch the data and return;
If there is a failure in the previous step, it will be transferred to the'command parser', after lexical analysis, the parse tree is generated after grammatical analysis;
Next is the preprocessing stage, which deals with semantics that the parser cannot solve, checks permissions, etc., and generates a new parse tree;
Then transfer it to the corresponding module for processing;
If it is a SELECT query, a lot of optimization will be done through the "query optimizer" to generate an execution plan;
After the module receives the request, it checks whether the connected user has the permission to access the target table and target field through the "access control module";
If there is, call the "table management module", first check whether there is in the table cache, if there is, directly correspond to the table and acquire the lock, otherwise reopen the table file;
According to the meta data of the table, obtain information such as the storage engine type of the table, and call the corresponding storage engine processing through the interface;
When the data changes in the above process, if the log function is turned on, it will be recorded in the corresponding binary log file.

result

After the Query request is completed, the result set is returned to the'connect/thread module';
The returned status can also be the corresponding status identifier, such as success or failure;
The'connection into/thread module' performs subsequent clean-up work, and continues to wait for requests or disconnects from the client.

SQL parsing order

Next step, let us look at the past and present of a SQL statement.

First look at the example statement:

SELECT DISTINCT
    < select_list >
FROM
    < left_table > < join_type >
JOIN < right_table > ON < join_condition >
WHERE
    < where_condition >
GROUP BY
    < group_by_list >
HAVING
    < having_condition >
ORDER BY
    < order_by_condition >
LIMIT < limit_number >

However, its execution order is as follows:

FROM <left_table>
ON <join_condition>
<join_type> JOIN <right_table>
WHERE <where_condition>
GROUP BY <group_by_list>
HAVING <having_condition>
SELECT 
DISTINCT <select_list>
ORDER BY <order_by_condition>
LIMIT <limit_number>

Although I didn't expect it to be like this, it was natural and harmonious at first glance. Where to get it, constant filtering conditions, you have to choose the same or different ones, sort them out, and then you know the first few ones to take.

That being the case, let us take a look at the details step by step.

Ready to work

Create a test database

create database testQuery

Create test table

CREATE TABLE table1
(
    uid VARCHAR(10) NOT NULL,
    name VARCHAR(10) NOT NULL,
    PRIMARY KEY(uid)
)ENGINE=INNODB DEFAULT CHARSET=UTF8;

CREATE TABLE table2
(
    oid INT NOT NULL auto_increment,
    uid VARCHAR(10),
    PRIMARY KEY(oid)
)ENGINE=INNODB DEFAULT CHARSET=UTF8;

Insert data

INSERT INTO table1(uid,name) VALUES('aaa','mike'),('bbb','jack'),('ccc','mike'),('ddd','mike');
INSERT INTO table2(uid) VALUES('aaa'),('aaa'),('bbb'),('bbb'),('bbb'),('ccc'),(NULL);

The final desired result

SELECT
    a.uid,
    count(b.oid) AS total
FROM
    table1 AS a
LEFT JOIN table2 AS b ON a.uid = b.uid
WHERE
    a. NAME = 'mike'
GROUP BY
    a.uid
HAVING
    count(b.oid) < 2
ORDER BY
    total DESC
LIMIT 1;

Now start the journey of SQL parsing!

FROM

When multiple tables are involved, the output of the left table will be used as the input of the right table, and then a virtual table VT1 will be generated.

(1-J1) Cartesian product

Calculate the Cartesian product (CROSS JOIN) of two related tables to generate a virtual table VT1-J1.

mysql> select * from table1,table2;
+-----+------+-----+------+
| uid | name | oid | uid  |
+-----+------+-----+------+
| aaa | mike |   1 | aaa  |
| bbb | jack |   1 | aaa  |
| ccc | mike |   1 | aaa  |
| ddd | mike |   1 | aaa  |
| aaa | mike |   2 | aaa  |
| bbb | jack |   2 | aaa  |
| ccc | mike |   2 | aaa  |
| ddd | mike |   2 | aaa  |
| aaa | mike |   3 | bbb  |
| bbb | jack |   3 | bbb  |
| ccc | mike |   3 | bbb  |
| ddd | mike |   3 | bbb  |
| aaa | mike |   4 | bbb  |
| bbb | jack |   4 | bbb  |
| ccc | mike |   4 | bbb  |
| ddd | mike |   4 | bbb  |
| aaa | mike |   5 | bbb  |
| bbb | jack |   5 | bbb  |
| ccc | mike |   5 | bbb  |
| ddd | mike |   5 | bbb  |
| aaa | mike |   6 | ccc  |
| bbb | jack |   6 | ccc  |
| ccc | mike |   6 | ccc  |
| ddd | mike |   6 | ccc  |
| aaa | mike |   7 | NULL |
| bbb | jack |   7 | NULL |
| ccc | mike |   7 | NULL |
| ddd | mike |   7 | NULL |
+-----+------+-----+------+
28 rows in set (0.00 sec)

(1-J2) ON filter

Filter based on the virtual table VT1-J1, filter out all the columns that meet the ON predicate condition, and generate the virtual table VT1-J2.

Note: Because of grammatical restrictions,'WHERE' is used instead, and readers can also feel the subtle relationship between the two.

mysql> SELECT
    -> *
    -> FROM
    -> table1,
    -> table2
    -> WHERE
    -> table1.uid = table2.uid
    -> ;
+-----+------+-----+------+
| uid | name | oid | uid  |
+-----+------+-----+------+
| aaa | mike |   1 | aaa  |
| aaa | mike |   2 | aaa  |
| bbb | jack |   3 | bbb  |
| bbb | jack |   4 | bbb  |
| bbb | jack |   5 | bbb  |
| ccc | mike |   6 | ccc  |
+-----+------+-----+------+
6 rows in set (0.00 sec)

(1-J3) Add external column

If outer joins (LEFT, RIGHT, FULL) are used, the columns in the main table (reserved table) that do not meet the ON conditions will also be added to VT1-J2 as external rows to generate virtual tables VT1-J3.

mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid;
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| aaa | mike |    1 | aaa  |
| aaa | mike |    2 | aaa  |
| bbb | jack |    3 | bbb  |
| bbb | jack |    4 | bbb  |
| bbb | jack |    5 | bbb  |
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
7 rows in set (0.00 sec)

Below, I found a very vivid explanatory picture about "SQL JOINS" from the Internet. If it violates your rights, please let me know and delete it, thank you.

WHERE

The temporary table generated in the VT1 process is filtered, and the columns satisfying the WHERE clause are inserted into the VT2 table.

Note: At this time, because of grouping, aggregation operations cannot be used; aliases created in SELECT cannot also be used;

The difference with ON:

If there are external columns, ON will filter the related table, and the main table (reserved table) will return all the columns;
If no external column is added, the effect of the two is the same.

application:

The filtering of the main table should be placed in WHERE;
For related tables, use ON for conditional query first and then connect, and use WHERE for conditional query first after connection.

mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike';
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| aaa | mike |    1 | aaa  |
| aaa | mike |    2 | aaa  |
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
4 rows in set (0.00 sec)

GROUP BY

This clause will group the tables generated in VT2 according to the columns in the GROUP BY to generate the VT3 table.

Note: For subsequent processing statements, such as SELECT and HAVING, the columns used must be included in the GROUP BY. For those that do not appear, aggregate functions must be used;

Reason: GROUP BY changes the reference to the table and converts it to a new reference method. The number of columns that can be used for the next level of logical operations will be reduced.

My understanding is: according to the grouping field, the records with the same grouping field are merged into one record, because each grouping can only return one record, unless it is filtered out, there may be multiple fields that are not in the grouping field Value, multiple values cannot be put into a record, so these multi-value columns must be converted into single values through aggregation functions;

mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid;
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| aaa | mike |    1 | aaa  |
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
3 rows in set (0.00 sec)

HAVING

This clause filters different groups in the VT3 table and only affects the grouped data. The clauses that meet the HAVING condition are added to the VT4 table.

mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2;
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
2 rows in set (0.00 sec)

SELECT

This clause processes the elements in the SELECT clause to generate the VT5 table.

(5-J1) Calculation expression Calculate the expression in the SELECT clause to generate VT5-J1.

(5-J2) DISTINCT

Look for duplicate columns in VT5-1 and delete them to generate VT5-J2.

If the DISTINCT clause is specified in the query, a temporary memory table will be created (if it cannot be stored in the internal storage, it needs to be stored on the hard disk). The table structure of this temporary table is the same as the virtual table VT5 generated in the previous step. The difference is that a unique index is added to the column for the DISTINCT operation to remove duplicate data.

mysql> SELECT
    -> a.uid,
    -> count(b.oid) AS total
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2;
+-----+-------+
| uid | total |
+-----+-------+
| ccc |     1 |
| ddd |     0 |
+-----+-------+
2 rows in set (0.00 sec)

ORDER BY

From the tables in VT5-J2, sort the results according to the conditions of the ORDER BY clause to generate the VT6 table.

Note: The only place where aliases in SELECT can be used.

mysql> SELECT
    -> a.uid,
    -> count(b.oid) AS total
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2
    -> ORDER BY
    -> total DESC;
+-----+-------+
| uid | total |
+-----+-------+
| ccc |     1 |
| ddd |     0 |
+-----+-------+
2 rows in set (0.00 sec)

LIMIT

The LIMIT clause selects the specified row of data from the specified position from the VT6 virtual table obtained in the previous step.

Notice:

The effect of the positive and negative of offset and rows;
When the offset is large, the efficiency is very low, you can do this;
Optimized by sub-query, first obtain the largest id from the index in the sub-query, then sort in reverse order, and then take the N row result set;
Using INNER JOIN optimization, the JOIN clause also first obtains the ID list from the index, and then directly correlate the query to obtain the final result.

mysql> SELECT
    -> a.uid,
    -> count(b.oid) AS total
    -> FROM
    -> table1 AS a
    -> LEFT JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2
    -> ORDER BY
    -> total DESC
    -> LIMIT 1;
+-----+-------+
| uid | total |
+-----+-------+
| ccc |     1 |
+-----+-------+
1 row in set (0.00 sec)

At this point, the SQL parsing journey is over. The above figure summarizes:

end

Well, this time of in-depth understanding tour here is almost really over, although it is not very in-depth, it is just a few things that put it together, referring to some books that have been read before, the master's pen is really true. Different. And in the process, I got a lot of things. The most important thing is to further realize that the world of computer software is magnificent~

Transfer from: AnnsShadoW
cnblogs.com/annsshadow/p/5037667.html

Overview of MySQL Architecture -> Query Execution Process -> SQL Parsing Sequence

Preface

Overview of MySQL architecture

Query execution process

connect

deal with

result

SQL parsing order

Ready to work

Now start the journey of SQL parsing!

end

民工哥

引用和评论

早知道有这么个吊炸天的 CI&CD 工具，我就不用 Jenkins 了！

70k star，取代Postman！这款轻量级API工具，太香了！

MySQL慢查询日志：性能优化的终极指南

C++ 中 VS 项目引入公共配置文件

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

大数据从业者必知必会的Hive SQL调优技巧

MySQL 备份 Shell 脚本：支持远程同步与阿里云 OSS 备份