Failure analysis | MySQL off-site slave database replication delay case

Author: Ren Kun
Based in Zhuhai. Formerly a full-time Oracle and MySQL DBA, he is now mainly responsible for maintaining MySQL, MongoDB and Redis.
Source of this article: original contribution
The original content is produced by the Aikesheng open source community and may not be used without authorization. To reprint it, please contact the editor and credit the source.

1. Background

A core production MySQL instance, version 5.6, runs with 1 master and 2 slaves in the local data center, plus one slave deployed at a remote site.

Since February 14, the remote slave has been raising replication delay alarms. At first we assumed network fluctuations were the cause and took no action. Two days later, however, the alarm was still firing and the delay kept growing.

2. Diagnosis

The first step is simple. Check whether Master_Log_File in the output of show slave status is the master's current binlog. If it is, the IO replication thread is not lagging, and the delay is caused by the SQL replication thread, as it was here.
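The check described above can be sketched as a small Python helper. This is only an illustration: it assumes you have already captured the vertical text output of show slave status on the slave and show master status on the master, and the function names parse_status and classify_lag are made up for this example.

```python
def parse_status(text):
    """Turn 'Key: value' lines from vertical MySQL client output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip row separators such as '***** 1. row *****' and blank lines.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def classify_lag(slave, master):
    """Return 'io', 'sql' or 'none' depending on which thread is behind.

    slave  -- parsed output of SHOW SLAVE STATUS
    master -- parsed output of SHOW MASTER STATUS
    """
    # Position the IO thread has read up to on the master.
    io_pos = (slave["Master_Log_File"], int(slave["Read_Master_Log_Pos"]))
    # Position the SQL thread has executed up to, in master coordinates.
    sql_pos = (slave["Relay_Master_Log_File"], int(slave["Exec_Master_Log_Pos"]))

    # Same file-level comparison as in the text: is the IO thread
    # already reading the master's current binlog?
    if io_pos[0] != master["File"]:
        return "io"
    # The IO thread keeps up, so any remaining gap belongs to the SQL thread.
    if sql_pos != io_pos:
        return "sql"
    return "none"
```

A result of "sql" corresponds to the situation in this case: events arrive on the slave in time, but they are applied too slowly.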

Get the process ID of mysqld (11029 here) and run: perf record -ag -p 11029 -- sleep 10; perf report

We repeated this many times. In every run, deflate_slow appeared and took the largest share of the samples.

Expanding it shows call chains associated with compressed pages.

Several snapshots taken with pstack 11029 also show stacks related to compressed pages.
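When many pstack snapshots are taken, counting how often each function shows up makes the hot spot easier to see than reading the dumps by eye. The following is a minimal sketch, assuming the snapshots were saved as text and that frames use the usual gdb-style layout ("#0  0x... in function () from library"). The names FRAME_RE, count_frames and top_functions are hypothetical.

```python
import re
from collections import Counter

# Matches a stack frame line and captures the function name.
# The "0x... in" part is optional because some frames omit the address.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def count_frames(samples):
    """Count how many frames mention each function across all dumps."""
    counts = Counter()
    for dump in samples:
        for line in dump.splitlines():
            match = FRAME_RE.match(line.strip())
            if match:
                counts[match.group(1)] += 1
    return counts


def top_functions(samples, n=5):
    """Return the n most frequent function names with their frame counts."""
    return count_frames(samples).most_common(n)
```

If a function such as deflate_slow dominates the result across snapshots, that agrees with what perf report shows.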

The instance does have one large table, and page compression was enabled for it only on the remote slave. To fix this, we converted the table's row format to dynamic, which decompresses it.
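The article does not show the statement used for the conversion. As a hedged sketch, the helper below builds a plausible ALTER TABLE statement for every table whose row format is reported as Compressed. It assumes the input rows come from a query such as SELECT TABLE_SCHEMA, TABLE_NAME, ROW_FORMAT FROM information_schema.TABLES, and it adds KEY_BLOCK_SIZE=0 on the assumption that the table was created with an explicit KEY_BLOCK_SIZE that should be cleared. The function names are hypothetical.

```python
def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def decompress_statements(tables):
    """Build ALTER TABLE statements for compressed tables.

    tables -- iterable of (schema, table, row_format) tuples, for example
              rows selected from information_schema.TABLES.
    """
    statements = []
    for schema, table, row_format in tables:
        # Only tables currently stored in the compressed row format qualify.
        if (row_format or "").lower() != "compressed":
            continue
        # Assumption: KEY_BLOCK_SIZE=0 resets a previously set block size.
        statements.append(
            "ALTER TABLE %s.%s ROW_FORMAT=DYNAMIC KEY_BLOCK_SIZE=0;"
            % (quote_ident(schema), quote_ident(table))
        )
    return statements
```

Rebuilding a large table takes time and disk space, so review the generated statement and run it by hand on the affected slave rather than executing it automatically.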

After the change, Seconds_Behind_Master began to decrease gradually, indicating that the fix had taken effect.
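To judge whether the lag is really shrinking, and roughly how long catching up will take, Seconds_Behind_Master can be sampled at intervals and extrapolated linearly. This is a rough sketch under the assumption that the catch-up rate stays constant, which it often does not. The function name catch_up_eta and the sample layout are hypothetical.

```python
def catch_up_eta(samples):
    """Estimate the seconds remaining until the slave has caught up.

    samples -- list of (unix_time, seconds_behind_master), oldest first.
    Returns None if there are too few samples or the lag is not shrinking.
    """
    if len(samples) < 2:
        return None
    (t0, lag0), (t1, lag1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0 or lag1 >= lag0:
        return None
    # Seconds of lag removed per second of wall-clock time.
    rate = (lag0 - lag1) / elapsed
    return lag1 / rate
```

For example, if the lag drops from 7200 to 6600 seconds over 10 minutes, the slave removes one second of lag per second and needs about 6600 more seconds to catch up.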

Capture the perf report and pstack output again.

The functions related to page compression no longer appear, which confirms that this replication delay was directly caused by enabling page compression on the large table.

3. Summary

With the help of perf and pstack, the SQL thread replication delay caused by the compressed table was located quickly, and the problem was solved by decompressing the large table.
