MySQL's common mistakes in design specifications

Thanks to the development of the Internet, we can use fragments of time, such as waiting for a bus or riding the subway, to learn and absorb information anytime and anywhere. The Internet also makes it easy for people to share what they know and to discuss it with others who have the same interests and needs.

However, sharing that is this convenient also means knowledge comes from many sources of varying quality, and it is easy to pick up wrong information. Most of these errors arise because technology moves quickly and authors have no time to update what they published. To spare later learners from being misled, let's look at a few common mistakes in MySQL design specifications.

The design of the primary key

Wrong design specification: use an auto-increment ID as the primary key; do not use UUID, MD5, hash, or string values as the primary key.

This design specification appears in many articles. The advantages of an auto-increment primary key are that it takes little space, is ordered, and is easy to use.

Let's first look at the disadvantages of auto-incrementing primary keys:

Because the auto-increment value is generated on the server side, it must be protected by an auto-increment (AUTO-INC) lock. With a large number of concurrent insert requests, the auto-increment itself can become a performance bottleneck;
An auto-increment primary key is only guaranteed to be unique within the current instance, not globally, so it cannot be used in a distributed architecture;
It exposes data values, which easily leads to security problems. If the product ID is an auto-increment primary key, a user can fetch other products simply by changing the ID value, and in severe cases can work out how many products the database holds in total;
It may cause performance problems with MGR (MySQL Group Replication).

Because the auto-increment value is generated on the MySQL server, it must be protected by an AUTO-INC lock, so a large number of insert requests can make auto-increment a performance bottleneck. In MySQL, the parameter innodb_autoinc_lock_mode controls how long the auto-increment lock is held. Although tuning innodb_autoinc_lock_mode can give the best auto-increment performance, the other problems listed above remain. Therefore, in concurrent scenarios it is more advisable to use a UUID or a custom, business-generated value as the primary key.

We can call the UUID() function in MySQL directly to get a UUID value.

mysql> select UUID();
+--------------------------------------+
| UUID()                               |
+--------------------------------------+
| 23ebaa88-ce89-11eb-b431-0242ac110002 |
+--------------------------------------+
1 row in set (0.00 sec)

Pay special attention to how a UUID stores time: the time bits are stored in reverse order, with the low time bits at the front and the high time bits at the end. That is, the first 4 bytes of the UUID keep changing "randomly" as time passes instead of increasing monotonically. Non-sequential values cause scattered I/O when inserted, which creates a performance bottleneck. This is the biggest drawback of UUID compared with an auto-increment value.
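
The reordering can be seen without a database. The following is a minimal Python sketch, not MySQL code: it uses the standard uuid module to build two version-1 UUIDs from hand-picked timestamps and shows that the later UUID sorts before the earlier one as a string. The helper make_v1 and the timestamp values are illustrative assumptions; the clock sequence and node are copied from the sample UUID above.

```python
import uuid

def make_v1(timestamp: int) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp (hypothetical helper).

    Clock sequence and node are fixed so that only the time fields differ.
    """
    time_low = timestamp & 0xFFFFFFFF        # lowest 32 bits, printed first
    time_mid = (timestamp >> 32) & 0xFFFF    # middle 16 bits
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # top 12 bits + version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0xB4, 0x31, 0x0242AC110002))

# Two timestamps a few ticks apart; the low 32 bits wrap around in between.
earlier_ts = (0x1EB << 48) | (0xCE89 << 32) | 0xFFFFFFF0
later_ts = earlier_ts + 0x20

earlier = make_v1(earlier_ts)
later = make_v1(later_ts)

print(earlier)  # fffffff0-ce89-11eb-b431-0242ac110002
print(later)    # 00000010-ce8a-11eb-b431-0242ac110002

# The later UUID sorts first, so inserts do not arrive in primary-key order.
print(str(later) < str(earlier))  # True
```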

To solve this problem, MySQL 8.0 introduced the function UUID_TO_BIN, which can convert UUID strings:

When its second argument is set, it places the high time bits first, which solves the problem of out-of-order UUID inserts;
It removes the meaningless "-" characters, which trims the storage space;
It converts the string into a binary value for storage, so the space finally shrinks from the original 36 bytes to 16 bytes.

Below we convert the previous UUID string 23ebaa88-ce89-11eb-b431-0242ac110002 through the function UUID_TO_BIN, and the binary value is as follows:

mysql> SELECT UUID_TO_BIN('23ebaa88-ce89-11eb-b431-0242ac110002',TRUE) as UUID_BIN;
+------------------------------------+
| UUID_BIN                           |
+------------------------------------+
| 0x11EBCE8923EBAA88B4310242AC110002 |
+------------------------------------+
1 row in set (0.01 sec)

In addition, MySQL 8.0 provides the function BIN_TO_UUID, which converts the binary value back into a UUID string.

Although the functions UUID_TO_BIN and BIN_TO_UUID do not exist before MySQL 8.0, the same result can be achieved with user-defined functions. In the application layer, you can write the corresponding functions in your own programming language.
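
As an illustration of such application-layer functions, here is a minimal Python sketch. The names uuid_to_bin and bin_to_uuid are our own helpers that mirror the byte reordering shown in the MySQL output above (time-high first, then time-mid, then time-low); they are not part of any MySQL client library.

```python
import uuid

def uuid_to_bin(text: str, swap: bool = True) -> bytes:
    """Convert a UUID string to 16 bytes, optionally moving the time-high bytes first."""
    raw = uuid.UUID(text).bytes  # layout: time_low(4) time_mid(2) time_hi(2) rest(8)
    if not swap:
        return raw
    return raw[6:8] + raw[4:6] + raw[0:4] + raw[8:]

def bin_to_uuid(data: bytes, swap: bool = True) -> str:
    """Inverse of uuid_to_bin: restore the original field order and format as text."""
    if len(data) != 16:
        raise ValueError("expected exactly 16 bytes")
    if swap:
        # swapped layout: time_hi(2) time_mid(2) time_low(4) rest(8)
        data = data[4:8] + data[2:4] + data[0:2] + data[8:]
    return str(uuid.UUID(bytes=data))

sample = "23ebaa88-ce89-11eb-b431-0242ac110002"
packed = uuid_to_bin(sample)
print(packed.hex().upper())  # 11EBCE8923EBAA88B4310242AC110002
print(bin_to_uuid(packed))   # 23ebaa88-ce89-11eb-b431-0242ac110002
```

The 16-byte result would be stored in a BINARY(16) column; the swap flag used when reading must match the one used when writing.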

Of course, many readers also worry about UUID performance and the storage space it occupies. I ran a related insert performance test; the results are shown in the following table:

As you can see, the ordered UUID provided by MySQL 8.0 has the best performance, even better than the auto-increment ID. In addition, since the result of UUID_TO_BIN is 16 bytes, only 8 bytes more than the auto-increment ID, the final storage footprint is only 3 GB larger than with auto-increment.

And because a UUID guarantees global uniqueness, the benefits of using UUID far outweigh those of an auto-increment ID. You may be used to auto-increment primary keys, but in concurrent scenarios a globally unique value such as a UUID is the better choice of primary key.

Of course, UUID is good, but in a distributed scenario the primary key often needs to carry some additional information to keep later secondary-index queries efficient, so there it is recommended to generate the primary key according to a business-defined scheme. When concurrency and data volume are not that large, the ordered UUID is still the recommended choice. Above all, do not assume that a UUID cannot be used as a primary key.

Design of financial fields

Wrong design specification: amount data in financial fields must use the DECIMAL type. The reasoning is that FLOAT and DOUBLE are both imprecise floating-point types, while DECIMAL is an exact numeric type. Therefore DECIMAL is generally used for fields such as user balance and product price, where values must be accurate to the fen (1/100 of a yuan).

However, design standards for high-volume Internet services do not recommend the DECIMAL type; they recommend converting DECIMAL amounts to an integer type instead. In other words, financial amounts are better stored in fen than in yuan. For example, 1 yuan is stored in the database as the integer 100.

The following are the advantages of the bigint type:

DECIMAL is an encoding implemented on top of binary storage, so computation on it is less efficient than on BIGINT;
A BIGINT field is fixed-length and stores efficiently, while the size of a DECIMAL depends on its declared width. In data design, fixed-length storage performs better;
A BIGINT holding amounts in fen can still represent enormous sums (a signed BIGINT goes up to 9,223,372,036,854,775,807, roughly 9.2 × 10^16 yuan), which is more than enough.
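
On the application side, storing amounts in fen means converting at the boundary and doing all arithmetic on integers. The following Python sketch shows one way this could look; to_fen and format_fen are hypothetical helper names, and the rule of rejecting more than two decimal places is an assumption rather than something the specification prescribes.

```python
from decimal import Decimal

def to_fen(amount: str) -> int:
    """Convert a yuan amount given as text (e.g. "19.99") to an integer number of fen."""
    fen = Decimal(amount) * 100
    if fen != fen.to_integral_value():
        raise ValueError(f"{amount!r} has more than two decimal places")
    return int(fen)

def format_fen(fen: int) -> str:
    """Render an integer number of fen as a yuan string with two decimals."""
    sign = "-" if fen < 0 else ""
    yuan, rest = divmod(abs(fen), 100)
    return f"{sign}{yuan}.{rest:02d}"

# Binary floating point cannot represent 0.1 exactly, integers of fen can.
print(0.1 + 0.2 == 0.3)                                   # False
print(to_fen("0.10") + to_fen("0.20") == to_fen("0.30"))  # True

print(to_fen("1"))                      # 100, i.e. 1 yuan stored as 100
print(format_fen(to_fen("19.99") * 3))  # 59.97
```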

Use of enumerated fields

Wrong design specification: avoid using ENUM type

In earlier development projects, fields such as user gender, whether a product is on the shelf, or whether a comment is hidden were simply designed as TINYINT, with a field comment noting which state 0 represents and which state 1 represents.

The problems with this design are fairly obvious:

Unclear meaning: the table may have been designed by a colleague. Unless you remember it well, you have to read the field comment every time, and sometimes you even have to go to the database to confirm what the field means while coding;
Dirty data: although the application layer can restrict inserted values in code, the values can still be changed through SQL and visual tools.

For fields with a fixed set of option values like this, it is recommended to use the ENUM enumerated string type together with strict mode in SQL_MODE.

In MySQL 8.0.16 and later, CHECK constraints can be used directly, so the ENUM field type is not needed.

We also generally use single characters such as "Y" and "N" when defining enumeration values, so they do not take up much space. However, if the option values are not fixed and may grow as the business develops, an enumerated field is not recommended.

Index number limit

Wrong design specification: limit the number of indexes on each table; a table must not have more than 5 indexes.

MySQL does not limit a single table to anywhere near so few indexes (InnoDB allows up to 64 secondary indexes per table). If business queries have a specific need, just create the index; do not be superstitious about a numeric limit.

Use of subqueries

Wrong design specification: avoid using subqueries

In fact, this specification is correct for older versions of MySQL, whose optimization of subqueries was limited. That is why many OLTP businesses required online services to avoid subqueries as far as possible.

However, MySQL 8.0 has greatly improved subquery optimization, so you can use subqueries with confidence in newer versions of MySQL.

Subqueries are easier to understand than JOINs. For example, suppose we want to count the users who did not publish any articles in 2020:

SELECT COUNT(*)
FROM user
WHERE id NOT IN (
    SELECT user_id
    FROM blog
    WHERE publish_time >= '2020-01-01' AND publish_time <= '2020-12-31'
);

As you can see, the logic of the subquery is very clear: use NOT IN to find the users who do not appear in the article table for that period.

If we write it with LEFT JOIN instead:

SELECT COUNT(*)
FROM user LEFT JOIN blog
ON user.id = blog.user_id AND blog.publish_time >= '2020-01-01' AND blog.publish_time <= '2020-12-31'
WHERE blog.user_id IS NULL;

Although LEFT JOIN can also fulfill the requirement, it is not as easy to understand.

Using EXPLAIN to view the execution plans of the two SQL statements, we find that they are identical.

From the above figure, it is clear that both the subquery and the LEFT JOIN are ultimately converted to a left hash join, so the two SQL statements take the same time to execute. That is, in MySQL 8.0 the optimizer automatically rewrites the IN subquery into the best JOIN execution plan, which significantly improves performance.

Summary

After reading the content above, I believe everyone has a new understanding of MySQL. These common mistakes can be summarized as follows:

UUID can also be used as the primary key. The ordered UUID performs better than an auto-increment primary key, and the extra space it takes is negligible;
For financial fields, besides DECIMAL you can also use BIGINT to store amounts in fen;
For fields with fixed option values, use enumerated fields before MySQL 8.0 and CHECK constraints from MySQL 8.0.16 onward. Do not use 0, 1, or 2 to represent the states;
There is no rule that a table may have at most 5 indexes; add and drop indexes according to business needs;
MySQL 8.0 has optimized subqueries, so they can be used with confidence.
