Interviewer: How does MySQL execute a query?

For a development engineer, I think it is very necessary to understand how MySQL executes a query statement.

First of all, we need to understand what the architecture of MYSQL looks like? Then let's talk about the execution flow of a query statement?

MYSQL Architecture

First look at an architecture diagram, as follows:

Module details

Connector : used to support the interaction of various languages and SQL, such as PHP, Python, Java JDBC;
Management Serveices & Utilities : System management and control tools, including backup and recovery, MySQL replication, clustering, etc.;
Connection Pool : connection pool, manage resources that need to be buffered, including user password permission threads, etc.;
SQL Interface : used to receive the user's SQL command and return the query result required by the user;
Parser : used to parse SQL statements;
Optimizer : query optimizer;
Cache and Buffer : Query cache, in addition to row record cache, there are table cache, key cache, permission cache, etc.;
Pluggable Storage Engines : A plug-in storage engine, which provides APIs for the service layer to use to deal with specific files.

Architecture layering

Divide MySQL into three layers, the connection layer that connects with the client, the service layer that actually performs operations, and the storage engine layer that deals with hardware.

connection layer

To connect our client to MySQL server port 3306, we must establish a connection with the server, then manage all connections, verify the identity and permissions of the client, these functions are completed in the connection layer.

Service Layer

The connection layer will hand over the SQL statement to the service layer, which contains a series of processes:

For example, the judgment of query cache, calling the corresponding interface according to SQL, and lexical and grammatical analysis of our SQL statements (such as how to identify keywords, how to identify aliases, whether there are errors in grammar, etc.).

Then there is the optimizer. The bottom layer of MySQL will optimize our SQL statement according to certain rules, and finally hand it over to the executor for execution.

Storage Engine

The storage engine is where our data is really stored, and different storage engines are supported in MySQL. Next is memory or disk.

Execution process of SQL

Take a query statement as an example, let's see what the workflow of MySQL looks like.

select name from user where id=1 and age>20;

First, let's look at a picture, and the next process is based on this picture:

connect

For a program or tool to operate a database, the first step is to establish a connection with the database.

There are two kinds of connections in the database:

Short connection: A short connection is closed immediately after the operation is completed.
Long connection: The long connection can be kept open, reducing the consumption of the server to create and release the connection, and the connection can be used when the program accesses it later.

It is more troublesome to establish a connection. First, you need to send a request. After sending the request, you need to verify the account password. After the verification, you need to see the permissions you have. Therefore, during the use process, try to use a long connection.

Keeping a long connection consumes memory. After a long period of inactivity, the MySQL server will be disconnected. You can use the sql statement to view the default time:

show global variables like 'wait_timeout';

This time is controlled by wait_timeout, the default is 28800 seconds, 8 hours.

query cache

MySQL has a built-in cache module. After executing the same query we found that the cache is not working, why? MySQL's cache is turned off by default.

show variables like 'query_cache%';

The default closed means that it is not recommended to use, why is MySQL not recommended to use its own cache?

Mainly because the application scenarios of MySQL's own cache are limited:

The first one is that it requires that the SQL statements must be exactly the same, with one more space in the middle, and different uppercase and lowercase letters are considered to be different SQLs.

The second is that when any piece of data in the table changes, all caches of this table will be invalidated, so it is not suitable for applications with a large number of data updates.

Therefore, the cache is still handed over to the ORM framework (for example, MyBatis enables the first-level cache by default), or an independent cache service, such as Redis, is more suitable for processing.

In MySQL 8.0, the query cache has been removed.

Parsing and Preprocessing

Why can an SQL statement be recognized? If you randomly execute a string hello, the server reports a 1064 error:

[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'hello' at line 1

This is MySQL's parser and preprocessing module.

The main thing to do in this step is to perform lexical and syntactic analysis and semantic analysis on the statement based on SQL grammar.

Lexical Analysis

Lexical analysis is to break a complete SQL statement into individual words.

For example, a simple SQL statement: select name from user where id = 1 and age >20;

It will identify select , which is a query statement. Next, it will identify user . You want to do a query in this table, and then identify the conditions behind where . I need to find these contents. .

Syntax Analysis

The syntax analysis will perform some syntax checks on the SQL, such as whether the single quotes are closed, and then generate a data structure according to the SQL statement according to the syntax rules defined by MySQL. We call this data structure a parse tree (select_lex).

For example, the English grammar "I use is, you use are", if it is wrong, it is definitely not possible. After the grammar analysis, it is found that your SQL statement does not conform to the rules, and you will receive an error prompt of You hava an error in your SQL syntax .

Preprocessor

If you write a lexically and grammatically correct SQL, but the table name or field does not exist, where will the error be reported? Is it at the execution layer of the database or the parser? for example:
select * from hello;

It still reports an error during parsing. There is a preprocessor in the part of parsing SQL. It examines the resulting parse tree for semantics that the parser cannot resolve. For example, it checks the existence of table and column names, checks names and aliases, and ensures that there is no ambiguity. After preprocessing, a new parse tree is obtained.

query optimizer

Is there only one way to execute an SQL statement? Or is the final SQL executed by the database the same as the SQL we sent?

The answer to this is no. A SQL statement can be executed in many ways, and ultimately return the same result, they are equivalent. But if there are so many ways of implementation, how are these implementations obtained? Which one do you choose to execute in the end? On what criteria to choose?

This is the MySQL query optimizer module (Optimizer). The purpose of the query optimizer is to generate different execution plans (Execution Plan) according to the parse tree, and then select an optimal execution plan. MySQL uses a cost-based optimizer, which has the smallest execution plan overhead. Use whichever.

The cost of the query can be viewed with this command:

show status like 'Last_query_cost';

What optimization types can MySQL's optimizer handle?

Take two simple examples:

1. When we perform an associated query on multiple tables, which table data is used as the benchmark table.

2. When there are multiple indexes available, choose which index.

In fact, for every kind of database, the modules of the optimizer are essential, and they achieve the goal of optimizing query efficiency as much as possible through complex algorithms. However, the optimizer is not omnipotent. It does not automatically optimize even the junk SQL statements, and it does not always select the optimal execution plan. You should pay attention when writing SQL statements.

Implementation plan

The optimizer will eventually turn the parse tree into an execution plan (execution_plans), which is a data structure. Of course, this execution plan is not necessarily the optimal execution plan, because MySQL may not cover all execution plans.

How can we view the execution plan of MySQL? For example, if multiple tables are related to query, which table should be queried first? What indexes might be used when executing the query, and what indexes are actually used?

MySQL provides a tool for executing plans. We can see the information of the execution plan by adding EXPLAIN in front of the SQL statement.

EXPLAIN select name from user where id=1;

storage engine

Before introducing the storage engine, let's ask two questions:

1. From a logical point of view, where is our data placed, or what structure is it placed in?
2. Where is the execution plan executed? Who will do it?

introduction to storage

In relational databases, data is stored in tables. We can understand this table as an Excel spreadsheet. Therefore, while storing data in our table, we also need to organize the storage structure of the data. This storage structure is determined by our storage engine, so we can also call the storage engine a table type.

In MySQL, multiple storage engines are supported, and they can be replaced, so they are called plug-in storage engines. Why support so many storage engines? Is one not enough?

In MySQL, each table can specify its storage engine, instead of a database that can only use one storage engine. The usage of the storage engine is in units of tables. Also, the storage engine can be modified after the table is created.

How to choose a storage engine?

If the requirements for data consistency are relatively high and transaction support is required, InnoDB can be selected.
If there are more data queries and fewer updates, and the query performance is relatively high, MyISAM can be selected.
If you need a temporary table for querying, you can choose Memory.
If all storage engines cannot meet your needs and the technical capabilities are sufficient, you can develop a storage engine in C language according to the internal manual of the official website. ( https://dev.mysql.com/doc/internals/en/custom-engine.html )

execution engine

Who uses the execution plan to operate the storage engine? This is the execution engine (executor), which uses the corresponding API provided by the storage engine to complete the operation.

Why did we modify the storage engine of the table without any changes to the way it operates? Because the APIs implemented by storage engines with different functions are the same.

Finally, return the data to the client, even if there is no result.

Chestnut

Take the above sql statement as an example, and then sort out the entire sql execution process.

select name from user where id = 1 and age >20;

Query whether the role of the current executor has permissions through the connector, and make a query. If there is, continue to go down, if not, it will be rejected, and an error message of Access denied for user will be reported;
The next step is to query the cache. First, see if there is any in the cache. If there is, then there is no need to go down and return the result directly to the client; if there is no cache, then execute the parser and preprocessing module. (The MySQL 8.0 version directly deletes the entire function of the query cache)
The syntax parser and preprocessing are mainly to analyze whether the lexical and syntax of the sql statement are correct. If there is no problem, it will proceed to the next step and come to the query optimizer;
The query optimizer will perform some optimizations on the sql statement, depending on which method is the most cost-effective, which sql statement will be executed. The above sql has two optimization schemes:
- First query the name of the person whose id is 1 in the user table, and then look for the person whose age is greater than 20 years old.
- First query the table user who is older than 20 years old, and then find the id 1 from it.
After the optimizer decides which option to choose, the execution engine executes it. Then return the result to the client.

Epilogue

If the article is a little helpful to you, I still hope that after reading it, move your little finger, Like, follow and bookmark .