Author: Ren Kun

Based in Zhuhai, he has worked as a full-time Oracle and MySQL DBA and is now mainly responsible for maintaining MySQL, MongoDB, and Redis.

Source of this article: original contribution

*This original content is produced by the Aikesheng open source community and may not be used without authorization. For reprints, please contact the editor and credit the source.


1. Background

A core production MySQL instance, version 5.6, runs with one master and two slaves in the local data center, plus one additional slave deployed in a remote data center.

On February 14, the remote slave began raising replication delay alerts. At first this was attributed to network fluctuations and left alone. Two days later, however, the alert was still firing and the delay kept growing.

2. Diagnosis

Log in to the remote slave and first check whether the delay is caused by the IO replication thread.

This check is simple: see whether Master_Log_File in the output of show slave status matches the master's current binlog. If it does, the IO replication thread is not lagging, so the delay must be caused by the SQL replication thread.
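The check described above can be sketched as a small helper. This is only an illustration, not the author's tooling: it assumes one row of show slave status has already been fetched into a dict, and that the master's current binlog file name was read from show master status. The function name and return labels are made up for this example.

```python
from typing import Mapping


def classify_replication_lag(slave_status: Mapping[str, object],
                             master_binlog_file: str) -> str:
    """Return 'io_thread', 'sql_thread', or 'none'.

    slave_status: one row of SHOW SLAVE STATUS as a dict.
    master_binlog_file: the File column of SHOW MASTER STATUS on the master.
    """
    read_file = str(slave_status["Master_Log_File"])
    read_pos = int(slave_status["Read_Master_Log_Pos"])
    exec_file = str(slave_status["Relay_Master_Log_File"])
    exec_pos = int(slave_status["Exec_Master_Log_Pos"])

    # The IO thread is still reading a binlog other than the master's current one.
    if read_file != master_binlog_file:
        return "io_thread"

    # The IO thread is current, but the SQL thread has not yet applied
    # everything that was received.
    if (exec_file, exec_pos) != (read_file, read_pos):
        return "sql_thread"

    return "none"
```

The comparison is deliberately file-level for the IO thread, mirroring the article's check. A stricter version could also compare Read_Master_Log_Pos against the master's position.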

Get the process ID of mysqld (11029 here) and run perf record -ag -p 11029 -- sleep 10; perf report

This was repeated many times. In every run, deflate_slow appeared and accounted for the largest share of samples.

Expanding its call chain shows that it is associated with compressed pages.

Stack snapshots captured repeatedly with pstack 11029 were also related to compressed pages.
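When many pstack snapshots have been collected, tallying how often each function shows up makes a recurring frame such as deflate_slow easier to spot. The sketch below assumes the snapshots were saved as text and that frame lines follow the usual gdb-style layout (`#N  0xADDR in func (...)` or `#N  func (...)`). The regular expression and function name are illustrative assumptions, not part of pstack.

```python
import re
from collections import Counter
from typing import Iterable

# Matches gdb-style frame lines, with or without a leading address:
#   "#0  0x... in some_function () from /path/lib.so"
#   "#3  some_function (arg=...) at file.cc:123"
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)\s*\(")


def count_frames(samples: Iterable[str]) -> Counter:
    """Count in how many snapshots each function name appears."""
    counts: Counter = Counter()
    for sample in samples:
        seen = set()  # count a function at most once per snapshot
        for line in sample.splitlines():
            match = FRAME_RE.match(line.strip())
            if match:
                seen.add(match.group(1))
        counts.update(seen)
    return counts
```

Calling `count_frames(snapshots).most_common(10)` then lists the functions present in the most snapshots.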

The instance does have one large table, and page compression was enabled for it only on the remote slave. As a fix, the table's row format was converted to dynamic, which decompresses it.
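The row-format conversion mentioned above could be scripted roughly as follows. This is a sketch under stated assumptions: the input rows are (TABLE_SCHEMA, TABLE_NAME, ROW_FORMAT) tuples, for example selected from information_schema.TABLES, and the helper names are invented for this example. It only builds the statements; running them on a large table rebuilds the table and should be planned accordingly.

```python
from typing import Iterable, List, Tuple


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def decompress_statements(tables: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Build ALTER TABLE statements for tables whose ROW_FORMAT is compressed.

    tables: (TABLE_SCHEMA, TABLE_NAME, ROW_FORMAT) rows.
    """
    statements = []
    for schema, table, row_format in tables:
        # information_schema reports the format as e.g. 'Compressed';
        # compare case-insensitively to be safe.
        if row_format.lower() == "compressed":
            statements.append(
                f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
                "ROW_FORMAT=DYNAMIC"
            )
    return statements
```

Tables that are not compressed are skipped, so the same row list can be passed in unfiltered.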

Watching Seconds_Behind_Master afterwards, the delay began to fall steadily, indicating that the change had taken effect.
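Judging whether the delay is really falling is easier with several readings than with one. A minimal sketch, assuming Seconds_Behind_Master has been sampled periodically into a list (with None standing in for NULL readings); the function name is hypothetical:

```python
from typing import Optional, Sequence


def lag_is_shrinking(samples: Sequence[Optional[int]]) -> bool:
    """True if the last non-NULL reading is lower than the first non-NULL one."""
    readings = [s for s in samples if s is not None]
    if len(readings) < 2:
        return False  # not enough data to judge a trend
    return readings[-1] < readings[0]
```

Comparing only the first and last readings is a deliberately crude choice. It ignores short-lived spikes in between, which is usually acceptable when the samples span many minutes.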

Capture the perf and pstack output again.

--perf report

--pstack

The functions related to page compression have disappeared, which confirms that the replication delay was directly caused by enabling page compression on the large table.

3. Summary

With perf and pstack, the SQL thread replication delay caused by the compressed table was located quickly, and the problem was resolved by decompressing the large table.

