Issue 33: Standardized Design of MySQL Tables

Whether the relational table design is reasonable or not is one of the core elements that affect the performance of relational databases.

When it comes to relational database table design, the first thing that comes to mind is paradigm theory. That is to say, the design of a table must first meet a certain paradigm, and then use the anti-paradigm design according to certain requirements after it is finished, that is, redundant backup design.

The database paradigm generally contains six, namely 1NF, 2NF, 3NF, BCNF, 4NF, 5NF. These six paradigm levels start from the two key points of whether the data allows a certain range of redundancy and whether the data is more refined management. The later, the more standardized the data, the smaller the redundancy, and the stronger the readability.

Generally speaking, 3NF or BCNF is enough, or further, 4NF is enough, 5NF is more academic.

Suppose again that we are very clear about various dependencies (partial functional dependence, full functional dependence, transfer function dependence, multi-value dependence, connection dependence, etc.). Below I use the classic employee table and student table to illustrate the step-by-step optimization of each paradigm.

1NF: That is, the attribute is atomic and cannot be split.

The minimum requirement on how the data is stored is to keep the attributes (fields or columns) of the relational table atomic and cannot be split again.

1NF uses the employee table as an example, the table structure is as follows:

(debian-ytt1:3500)|(ytt)>desc employee;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| employee_number | varchar(64) | YES  |     | NULL    |       |
| employee_name   | varchar(64) | YES  |     | NULL    |       |
| salary          | json        | YES  |     | NULL    |       |
| dept            | varchar(64) | YES  |     | NULL    |       |
| dept_desc       | text        | YES  |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

The employee table has five fields, namely employee_number (employee number), employee_name (employee name), salary (employee salary), dept (department), dept_desc (department description information).

The data type of the employee salary field is JSON, which is used to store array or dictionary data. Obviously, the employee salary field is not atomic, and the employee table does not conform to 1NF. Insert a record, more intuitive:

(debian-ytt1:3500)|(ytt)>select * from employee\G
*************************** 1. row ***************************
employee_number: 202010001
  employee_name: 小王
         salary: {"base_salary": 10000, "extra_salary": 100000}
           dept: 财务部
      dept_desc: NULL
1 row in set (0.00 sec)

As can be seen from the above results, the employee salary field contains two attributes: basic salary and additional salary. Then change this field to two, and the structure of the employee table after the transformation is as follows:


(debian-ytt1:3500)|(ytt)>desc employee;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| employee_number | varchar(64) | YES  |     | NULL    |       |
| employee_name   | varchar(64) | YES  |     | NULL    |       |
| base_salary     | varchar(30) | YES  |     | NULL    |       |
| extra_salary    | varchar(30) | YES  |     | NULL    |       |
| dept            | varchar(64) | YES  |     | NULL    |       |
| dept_desc       | text        | YES  |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

The employee salary field is split into two: basic salary and additional salary. At this time, each field of the employee table is atomic, cannot be split, and conforms to 1NF.

However, the 1NF-compliant table is still too redundant. For example, if there are 10 employees in a department, there will be many duplicate records in the two fields of dept and dept_desc. can be understood in this way, except 1NF is used to ensure the atomicity of the column, all the later more advanced paradigms are to ensure the atomicity of the table in a specific scenario.

2NF: Non-primary key fields must be fully functionally dependent on primary key fields, and there is no partial functional dependence of non-primary key fields on primary key fields.

2NF is a further step in data specification than 1NF, reducing more redundancy.

2NF uses the student table as a demonstration. The structure of the student table is as follows:


(debian-ytt1:3500)|(ytt)>desc `学生表` ;
+--------------------+------------------+------+-----+---------+-------+
| Field              | Type             | Null | Key | Default | Extra |
+--------------------+------------------+------+-----+---------+-------+
| 学号               | varchar(64)      | NO   | PRI | NULL    |       |
| 姓名               | varchar(64)      | YES  |     | NULL    |       |
| 年龄               | tinyint unsigned | YES  |     | NULL    |       |
| 性别               | char(1)          | YES  |     | NULL    |       |
| 课程               | varchar(64)      | NO   | PRI | NULL    |       |
| 课程学分           | tinyint unsigned | YES  |     | NULL    |       |
| 所属系名称         | varchar(64)      | YES  |     | NULL    |       |
| 课程成绩           | varchar(10)      | YES  |     | NULL    |       |
| 系地址             | varchar(100)     | YES  |     | NULL    |       |
| 系电话             | varchar(20)      | YES  |     | NULL    |       |
+--------------------+------------------+------+-----+---------+-------+
10 rows in set (0.00 sec)

Here for the student table, the design primary key is (student number, course), and fields such as name, age, and gender depend on the student number field, but not on the course field; course grades and course credits depend on the course field, not on the student number field. Therefore, the non-primary key part of this table depends on the primary key, which does not satisfy 2NF and needs to be further split. The structure of the split table is as follows:

Student table:

(debian-ytt1:3500)|(ytt)>desc `学生表`;
+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| 学号            | varchar(64)      | NO   | PRI | NULL    |       |
| 姓名            | varchar(64)      | YES  |     | NULL    |       |
| 年龄            | tinyint unsigned | YES  |     | NULL    |       |
| 性别            | char(1)          | YES  |     | NULL    |       |
| 所属系名称      | varchar(64)      | YES  |     | NULL    |       |
| 系地址          | varchar(100)     | YES  |     | NULL    |       |
| 系电话          | varchar(20)      | YES  |     | NULL    |       |
+-----------------+------------------+------+-----+---------+-------+
7 rows in set (0.00 sec)

Class Schedule:

(debian-ytt1:3500)|(ytt)>desc `课程表`;
+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| 课程         | varchar(64)      | NO   | PRI | NULL    |       |
| 课程学分     | tinyint unsigned | YES  |     | NULL    |       |
+--------------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Course selection table:

(debian-ytt1:3500)|(ytt)>desc `选课表`;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| 学号         | varchar(64) | NO   | PRI | NULL    |       |
| 课程         | varchar(64) | NO   | PRI | NULL    |       |
| 课程成绩     | varchar(10) | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

After dismantling, the student table meets 2NF. Most scenarios are sufficient, but the split student table still has redundancy problems in some scenarios. For example, to open a few popular lines, no hire any student, the student will be inserted into the table in addition to a number of your system name, Department address, empty record of telephone lines. For this scenario, 2NF is temporarily unable to meet the requirements, and a higher-level specification design is required.

`3NF: There is no transitive dependency of non-primary key fields on primary key fields.`

`3NF is more detailed on the basis of 2NF, eliminating the transitive dependence of non-primary key fields on primary key fields in tables that meet 2NF.`

Take the split student table as an example: the non-primary key fields all rely on the primary key field student number, but at the same time the non-primary key fields department address and telephone number also depend on the department name of the . So there is such a relationship at this time:

Department address and line phone dependent your system name, while your system name in this table are also dependent school, so in this case does not meet the student table 3nf. In order to satisfy 3NF, we once again split the student table further:

(debian-ytt1:3500)|(ytt)>desc `学生表`;
+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| 学号            | varchar(64)      | NO   | PRI | NULL    |       |
| 姓名            | varchar(64)      | YES  |     | NULL    |       |
| 年龄            | tinyint unsigned | YES  |     | NULL    |       |
| 性别            | char(1)          | YES  |     | NULL    |       |
| 所属系名称      | varchar(64)      | YES  |     | NULL    |       |
+-----------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

Faculty list

(debian-ytt1:3500)|(ytt)>desc `院系表`;
+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| 系名称   | varchar(64)  | NO   | PRI | NULL    |       |
| 地址   | varchar(100) | YES  |     | NULL    |       |
| 电话   | varchar(20)  | YES  |     | NULL    |       |
+--------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

After splitting, the new student table satisfies 3NF. Next, transform the previous employee table employee into 3NF and split it into two tables, employee and dept.


(debian-ytt1:3500)|(ytt)>desc employee;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| employee_number | varchar(64) | NO   | PRI | NULL    |       |
| employee_name   | varchar(64) | NO   |     | NULL    |       |
| base_salary     | varchar(30) | YES  |     | NULL    |       |
| extra_salary    | varchar(30) | YES  |     | NULL    |       |
| dept            | varchar(64) | YES  |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

(debian-ytt1:3500)|(ytt)>desc dept;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dept      | varchar(64) | NO   | PRI | NULL    |       |
| dept_desc | text        | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

After re-split, the employee table also conforms to 3NF. In most scenarios, 3NF is sufficient.

The tables in my example above are all single primary keys. If it is a combined primary key, a more detailed split is required. At this time, a higher-level BCNF may be required.

`BCNF: On the basis of 3NF, the partial dependence and transitive dependence between each key in the joint primary key are eliminated.`

BCNF emphasizes the dependence of the fields in the joint primary key.

For example, the student table and employee table that have satisfied 3NF above, suppose there is such a scenario:

Some students may study in multiple departments. At this time, the relationship between students and departments is many pairs. primary key of the table from 16110d27cc2fd8 to the joint primary key ( student number, belongs to the department name)

(debian-ytt1:3500)|(ytt)>alter table `学生表` drop primary key, add primary key (`学号`,` 所属系名 `);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

At this time, the two fields of the joint primary key depend on each other and conform to 3NF, but do not conform to BCNF, which may cause duplicate records.

For example, Xiao Li studies very well and is interested in many other departments. At the same time, he is studying in the Department of Physics, Department of Mathematics, and Department of Psychology. The relevant records are as follows:

(debian-ytt1:3500)|(ytt)>select * from `学生表`;
+------------+--------+--------+--------+-----------------+
| 学号       | 姓名   | 年龄   | 性别   | 所属系名称      |
+------------+--------+--------+--------+-----------------+
| 2020100090 | 小李   |     21 | 男     | 心理学系        |
| 2020100090 | 小李   |     21 | 男     | 数学系          |
| 2020100090 | 小李   |     21 | 男     | 物理系          |
+------------+--------+--------+--------+-----------------+
3 rows in set (0.00 sec)

You can see that the non-primary key field has multiple duplicate values! So now split the student table according to such a scenario,

(debian-ytt1:3500)|(ytt)>desc `学生表`
    -> ;
+--------+------------------+------+-----+---------+-------+
| Field  | Type             | Null | Key | Default | Extra |
+--------+------------------+------+-----+---------+-------+
| 学号   | varchar(64)      | NO   | PRI | NULL    |       |
| 姓名   | varchar(64)      | YES  |     | NULL    |       |
| 年龄   | tinyint unsigned | YES  |     | NULL    |       |
| 性别   | char(1)          | YES  |     | NULL    |       |
+--------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

(debian-ytt1:3500)|(ytt)>desc `学生_系_关系表`;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int unsigned | NO   | PRI | NULL    | auto_increment |
| 学号            | varchar(64)  | YES  |     | NULL    |                |
| 所属系名称      | varchar(64)  | YES  |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

Look at the two records after the transformation

(debian-ytt1:3500)|(ytt)>select * from `学生表`;
+------------+--------+--------+--------+
| 学号       | 姓名   | 年龄   | 性别   |
+------------+--------+--------+--------+
| 2020100090 | 小李   |     21 | 男     |
+------------+--------+--------+--------+
1 row in set (0.00 sec)

(debian-ytt1:3500)|(ytt)>select * from  `学生_系_关系表`;
+----+------------+-----------------+
| id | 学号       | 所属系名称      |
+----+------------+-----------------+
|  1 | 2020100090 | 心理学系        |
|  2 | 2020100090 | 数学系          |
|  3 | 2020100090 | 物理系          |
+----+------------+-----------------+
3 rows in set (0.00 sec)

For the employee table, there are also such scenarios. For example, some employees may hold multiple positions and belong to multiple departments, and the transformation methods are similar.

(debian-ytt1:3500)|(ytt)>desc employee;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| employee_number | varchar(64) | NO   | PRI | NULL    |       |
| employee_name   | varchar(64) | NO   |     | NULL    |       |
| base_salary     | varchar(30) | YES  |     | NULL    |       |
| extra_salary    | varchar(30) | YES  |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

(debian-ytt1:3500)|(ytt)>desc employee_vs_dept;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int unsigned | NO   | PRI | NULL    | auto_increment |
| employee_number | varchar(64)  | YES  |     | NULL    |                |
| dept            | varchar(64)  | YES  |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)

`4NF: On the basis of 3NF, eliminate multi-value dependencies.`

It is more refined than 3NF, and still takes the employee table as an example. For example, a foreign company has a branch in China. The employees of the Chinese branch have Chinese names, but they must communicate with the foreign headquarters every day, so everyone takes an English name. At this time, the primary key of the employee table becomes the joint primary key (employee_no, employee_name) Some sample data are as follows:

(debian-ytt1:3500)|(ytt)>select * from employee;
+-----------------+---------------+-------------+--------------+
| employee_number | employee_name | base_salary | extra_salary |
+-----------------+---------------+-------------+--------------+
| 202010050       | lucy          | 50000       | 60000        |
| 202010050       | 小青          | 50000       | 60000        |
| 202010051       | simon         | 60000       | 20000        |
| 202010051       | 小张          | 60000       | 20000        |
+-----------------+---------------+-------------+--------------+
4 rows in set (0.00 sec)

From the record results, it is found that the two fields of salary have duplicate data, and the employee table generally has many other fields, so there will be more duplicate data in this scenario.

The way to eliminate this duplicate record is to split this table again and split it into three tables. The table structure is as follows:

(debian-ytt1:3500)|(ytt)>desc employee;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| employee_number | varchar(64) | NO   | PRI | NULL    |       |
| base_salary     | varchar(30) | YES  |     | NULL    |       |
| extra_salary    | varchar(30) | YES  |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)



(debian-ytt1:3500)|(ytt)>desc employee_zh;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| employee_number | varchar(64) | NO   | PRI | NULL    |       |
| employee_name   | varchar(64) | NO   |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

(debian-ytt1:3500)|(ytt)>desc employee_en;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| employee_number | varchar(64) | NO   | PRI | NULL    |       |
| employee_name   | varchar(64) | NO   |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)


(debian-ytt1:3500)|(ytt)>select  * from employee;
+-----------------+-------------+--------------+
| employee_number | base_salary | extra_salary |
+-----------------+-------------+--------------+
| 202010050       | 50000       | 60000        |
| 202010051       | 60000       | 20000        |
+-----------------+-------------+--------------+
2 rows in set (0.00 sec)

(debian-ytt1:3500)|(ytt)>select  * from employee_zh;
+-----------------+---------------+
| employee_number | employee_name |
+-----------------+---------------+
| 202010050       | 小青          |
| 202010051       | 小张          |
+-----------------+---------------+
2 rows in set (0.00 sec)

(debian-ytt1:3500)|(ytt)>select  * from employee_en;
+-----------------+---------------+
| employee_number | employee_name |
+-----------------+---------------+
| 202010050       | lucy          |
| 202010051       | simon         |
+-----------------+---------------+
2 rows in set (0.00 sec)

In this scenario, the redundant data is split again.

`Summarize:`

In fact, in the real scene, it is not necessary to design the table strictly according to the paradigm, and sometimes it may be necessary to go the other way. After all, the higher the paradigm theory followed, the finer the table splits, and the more table connections are required to retrieve table data, which is expensive.

What else do you want to know about the technical content of MySQL? Hurry up and leave a message to tell the editor!

Issue 33: Standardized Design of MySQL Tables

1NF: That is, the attribute is atomic and cannot be split.

The minimum requirement on how the data is stored is to keep the attributes (fields or columns) of the relational table atomic and cannot be split again.

2NF: Non-primary key fields must be fully functionally dependent on primary key fields, and there is no partial functional dependence of non-primary key fields on primary key fields.

2NF is a further step in data specification than 1NF, reducing more redundancy.

`3NF: There is no transitive dependency of non-primary key fields on primary key fields.`

`3NF is more detailed on the basis of 2NF, eliminating the transitive dependence of non-primary key fields on primary key fields in tables that meet 2NF.`

`BCNF: On the basis of 3NF, the partial dependence and transitive dependence between each key in the joint primary key are eliminated.`

`4NF: On the basis of 3NF, eliminate multi-value dependencies.`

`Summarize:`

爱可生开源社区

`引用和评论`

SQLShift 重大更新：Oracle→PostgreSQL 存储过程转换功能上线！

Full GC 频率优化实战

技术分享 | 国产达梦数据库DM8优化思路

书籍-《优化与最优控制简明教程》

如何进行OceanBase 运维工具的部署和表性能优化呢?

Java 请求合并技术：十倍提升系统性能

Go语言极致性能优化实战：生产级代码优化完全指南

Issue 33: Standardized Design of MySQL Tables

1NF: That is, the attribute is atomic and cannot be split.

The minimum requirement on how the data is stored is to keep the attributes (fields or columns) of the relational table atomic and cannot be split again.

2NF: Non-primary key fields must be fully functionally dependent on primary key fields, and there is no partial functional dependence of non-primary key fields on primary key fields.

2NF is a further step in data specification than 1NF, reducing more redundancy.

3NF: There is no transitive dependency of non-primary key fields on primary key fields.

3NF is more detailed on the basis of 2NF, eliminating the transitive dependence of non-primary key fields on primary key fields in tables that meet 2NF.

BCNF: On the basis of 3NF, the partial dependence and transitive dependence between each key in the joint primary key are eliminated.

4NF: On the basis of 3NF, eliminate multi-value dependencies.

Summarize:

爱可生开源社区

引用和评论

SQLShift 重大更新：Oracle→PostgreSQL 存储过程转换功能上线！​​

Full GC 频率优化实战

技术分享 | 国产达梦数据库DM8优化思路

书籍-《优化与最优控制简明教程》

如何进行OceanBase 运维工具的部署和表性能优化呢?

Java 请求合并技术：十倍提升系统性能

Go语言极致性能优化实战：生产级代码优化完全指南

`3NF: There is no transitive dependency of non-primary key fields on primary key fields.`

`3NF is more detailed on the basis of 2NF, eliminating the transitive dependence of non-primary key fields on primary key fields in tables that meet 2NF.`

`BCNF: On the basis of 3NF, the partial dependence and transitive dependence between each key in the joint primary key are eliminated.`

`4NF: On the basis of 3NF, eliminate multi-value dependencies.`

`Summarize:`

`引用和评论`

SQLShift 重大更新：Oracle→PostgreSQL 存储过程转换功能上线！