Author: Mo Shan

Senior DBA of an Internet company.

Source of this article: original submission

* Produced by the Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.

this article

1. Guide

2. Background

Three, analysis

Fourth, the text

Five, write at the end

1. Guide

DBA [MySQL DBA], other DBAs are beyond the scope of this article.

DDL [ALTER TABLE], other operations are beyond the scope of this article.

OMM【Only Modifies Metadata】。

NOMM【Not Only Modifies Metadata】。

2. Background

beep! beep! beep!

Hello, I am going to launch a new feature tonight, there is a table change to help me deal with it.

Hello, I have a billion-level (billion) table, and I need to add a field/index to help me deal with it.

Hello, I just added a field, the small table has not been added in half an hour, and it seems that I can't write data now.

DBAs should often receive this kind of demand.

DDL operation may be one of the most troublesome tasks for DBAs, and it is also the most daily task. Always add a field to expand the length. If unfortunately the unreasonable system is designed in the early stage, then the maintenance in the later stage will really be a greeting for parents. If it is unfortunate that the business table that has been running for 800 years needs to be launched with a new function, one or two fields, or an extension of the length, it is simply sour and refreshing.

MySQLL 5.6 started to support OnlineDDL. I hit Enter, but I still didn’t respond after waiting for a long time. I slapped one after another, and the business was urged over and over again. The awkward development was a second child, but it didn’t end. After running, you lay down and swiped your phone for a while, looking for some comfort to say goodbye to the hard working day, but found that it was another lush day, throwing your phone angrily to sleep, never thought that you had just been drowsy for a while, and the trouble came again. Up.

The business made a crazy call and was reluctantly connected. Business feedback is that there are problems with the data returned by many orders. Tactfully, you guessed that the table change must have caused the delay. The business is still complaining about it. Let's talk a lot, and you have to cut the read traffic to the main library. Is it so sour?

Why is this happening, let's talk about this problem.

There are two main ways to operate MySQL DDL: copy and inplace. The copy table needs to lock the table all the way, which has a great impact on the business, and inplace does not block dml operations. But before the release of OnlineDDL in MySQL 5.6, basically what copy, what can't be done, and what lock table is done. Version 5.6 launched OnlineDDL, DBAs simply love it. But inplace? Are you really in love? Don't forget, inplace is divided into rebuild and no-rebuild, which can explain why many operations are still very slow. The reason is: [Not Only Modifies Metadata].

Since OnlineDDL still has so many problems, what should we do? This is just a DDL operation. If there are many, dozens or hundreds of DDL requirements, is it going to be crazy? Does it have to be done? But don't panic, there is a later move. Life is very good, for example, you can get food delivered by riding a small eDonkey.

The nonsense time is over, now let’s get back to the subject, let’s talk about how the back-end logic is implemented in the table change ticket system.

Three, analysis

Why does DBA reject DDL operations so much?

1) Awe of the production environment. If you can't do it, you can make more mistakes. As long as you are operating in the online environment, you must be aware of this. Any simple operation may hide unpredictable risks. There is no small matter in the online environment, and an accident is a major event. Therefore, to avoid manual operation, try to avoid manual operation. (Is automation not fragrant, or is the workload not saturated to reflect the workload)

2) The DDL operation cost of large tables is high, and it is easy to affect the business during the day and cause delays. It is usually done early in the morning, staying up late is too bad for the kidneys. Also, most of the DDL overhead costs are extremely high. Simply put, the [Not Only Modifies Metadata] overhead cost is not small, then it will bring some problems, such as the delay mentioned above, and the mdl lock problem. There are also disk io contention, full disk space, etc., so there are many risks.
Whether it is [Only Modifies Metadata] , please refer to the following table :

Operation5.65.78.0
Dropping an indexYESYESYES
Renaming an indexNULLYESYES
Changing the index typeYESYESYES
Renaming a columnYESYESYES
Setting a column default valueYESYESYES
Extending VARCHAR column sizeNOYESYES
Dropping the column default valueYESYESYES
Modifying the definition of an ENUM or SET columnYESYESYES
Adding a VIRTUAL columnNULLYESYES
Dropping a VIRTUAL columnNULLYESYES
Adding a foreign key constraintYESYESYES
Dropping a foreign key constraintYESYESYES
Setting persistent table statisticsYESYESYES
Renaming a tableYESYESYES

Note: 8.0 supports fast column addition (instant algorithm), but there are many limitations. The common thing is (1) Position attributes such as after and first cannot be used, and can only be added to the last column of the table. (2) It cannot be a table with compression turned on. (3) It cannot be a table with a full-text index.

3) Repetitive work, most of which is a carriage return, and then watch the monitoring, pay attention to the alarm and wait for an hour or more. The problem is that it repeats a lot of work every day. (The dba is for big business, think about whether it recruited you in accordance with the kernel development requirements at that time).

Of course, in daily work, DBAs usually use gh-ost or pt-osc to handle DDL operations, but are these two tools okay? Below I list a few questions:

  1. pt-osc has trigger restrictions, resulting in deadlock, and the trigger probability is significantly higher than ghost. This is a risk point and can easily be overlooked. When pt-osc exits abnormally, the trigger will remain on the table. At this time, if you delete the temporary table (xx_new) without first deleting the trigger, it must be very sour and the business will blow your call.
  2. Gh-ost has relatively high requirements on the network. If the network delay is severe, the table change may never be completed. The requirement for binlog format is row, and the requirement for binlog image is full. There is no support for foreign key tables.
  3. The most important problem is that such tools are relatively dumb and only copy data. That is to say, even [Only Modifies Metadata] will copy dumbly. For large tables or businesses with relatively large TPS, a lot of binlogs will be written, which can easily blow up the disk.
  4. For slow query environments, mdl waits are still prone to occur. gh-ost is a little better, but pt-osc is better, and it will always be locked. Until human intervention.

In summary, you will find that DDL is really annoying, and the pain points encountered in actual work are estimated to be more than the above. (It is also recommended for the business. The database design must be done well in the early stage. If the conditions are available, dba must be involved in the design. If there are no conditions, conditions must be created.)

Fourth, the text

First on a flow chart:

Tip: link to the original image

https://gitee.com/mo-shan/myonlineddl/blob/master/img/myonlineddl.png

The main points are as follows:

  1. To format SQL, business SQL needs to be formatted uniformly to facilitate SQL parsing later.
  2. The first SQL parsing. Make a pre-defined usage judgment based on the DDL submitted by the user, and check whether all operations meet the pre-defined ones. This piece of logic is to control the operation of the work order so that all operations are controllable. What can be done and what can not be done.
  3. The second SQL parsing. Judge whether a single DDL satisfies [Only Modifies Metadata] If it is satisfied, go directly to OnlineDDL, if not, use ghost/ptosc tool. When choosing a tool, we will make our own judgments based on the above-mentioned advantages and disadvantages, and then give priority to ghost. To put it bluntly, the work order is adaptive. It can make judgments based on the current operation and different environments, and adjust the corresponding configuration to generate a safe, reasonable and efficient execution plan.
  4. Providing DDL review function does not mean that everything submitted by the business can be passed. For example, do you allow null values, do you require comments, do you require standardized index names, and so on. All can be reviewed through the rule switch. (This point is different from the above-mentioned predefined usage judgment, one is to restrict the operation, and the other is whether the control operation is standardized and reasonable)
  5. Plus monitoring, and exception handling. For example, for the monitoring of mdl, the number of connections, the number of active connections, disk space, etc., there will be e-mail notifications for triggering alarms.
  6. As much as possible to ensure the atomicity of each work order. It is to ensure that every DDL in every work order is successful as much as possible. Either you don’t do it, you have to be successful if you do it, you need to pay attention to it, what I’m talking about here is as much as possible.
  7. The API is used to implement dynamic configuration modification work orders. For example, the modification schedule that is being executed needs to be suspended, resumed, terminated, or the schedule modification progress needs to be obtained.
  8. Finishing work. There is nothing to say about this, it is very important to dry your butt to avoid stepping on the pit next time.

Of course, to achieve the above functions, the biggest difficulty is how to analyze the DDL operations of the business. Only when SQL parsing is implemented, the pain points can be eliminated to realize the function. You can also think about how to implement the SQL parsing function.

Here are a few problems I encountered in the implementation, for everyone to think about:

  1. When SQL is formatted, comments will have a great obstacle to your implementation, so how to deal with comments. ("/ / comment'' # --" and so on these comments)
  2. When parsing SQL, consider [Only Modifies Metadata] scenarios such as change, modify, and varchar extension (with limitations). How to identify whether these meet [OMM], and how to identify if it is the combination mode of modify (modify varchar and not varchar).
  3. How to determine whether to meet the 8.0 quick addition.
  4. The function of SQL audit, how to control DDL operation is reasonable.
  5. How to deal with the problem of reserved words? Do not process and return an error directly? The user experience is too bad. Some people may say that it’s fine to use backticks. Then the problem comes. The user doesn’t know that this is a keyword. The SQL he wrote doesn’t use backticks, so it needs to be added by the ticket system for a second time. How should it be added? Which ones need to be added?

You can think about how to implement the above points. You may think it is not difficult once you use your brain, but it should be somewhat difficult to actually implement the code. After all, the syntax of ALTER TABLE is ever-changing, and the combination is very complicated. For how I achieved it, interested friends can take a look at git, and welcome improvements. (Both installation, deployment and testing)

git:https://gitee.com/mo-shan/myonlineddl

Need to understand the test process, you can directly read the [README.md] test part

Five, write at the end

Once the sun is over, it will become a car one day. Just do it!


爱可生开源社区
426 声望207 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。