oplog备份突然变得很慢

新手上路,请多包涵

问题描述

oplog备份突然变的很慢,6月4号,oplog备份半个小时就搞定,从6月5号开始oplog备份十几个小时都无法完成。

问题出现的平台版本及自己尝试过哪些方法

系统:CentOS 6.5
MongoDB版本:3.0.15
复制集:1个主节点 + 2个备份节点 + 1 隐藏节点(隐藏节点主要用于备份)
备份方式:每天一次物理备份 + 每天一次oplog增量备份

自行排查:目前检查服务器磁盘IO、内存、CPU、负载均正常,db.currentOp()也没有其他请求阻塞

相关代码

# 备份命令
$ mongodump --port 5080 -uxxx -pyyy --authenticationDatabase=admin -d local -c oplog.rs -o /path/to/backup -q {'ts': {'$lt':Timestamp(1559764762,1), '$gte':Timestamp(1559592001,1)}} 

# show dbs输出
$ rs001:SECONDARY> show dbs
admin                     0.000GB
local                    27.190GB
.....
other                    xxx.xxxGB

# db.printSlaveReplicationInfo()输出
$ rs001:SECONDARY> db.printSlaveReplicationInfo()
source: 192.168.0.58:5000
    syncedTo: Thu Jun 06 2019 12:16:48 GMT+0800 (HKT)
    0 secs (0 hrs) behind the primary 
source: 172.19.228.66:5000
    syncedTo: Thu Jun 06 2019 12:16:48 GMT+0800 (HKT)
    0 secs (0 hrs) behind the primary 
source: 192.168.0.99:5080
    syncedTo: Thu Jun 06 2019 12:16:21 GMT+0800 (HKT)
    27 secs (0.01 hrs) behind the primary 

# db.printReplicationInfo()输出
$ rs001:SECONDARY> db.printReplicationInfo()
configured oplog size:   102400MB
log length start to end: 158864secs (44.13hrs)
oplog first event time:  Tue Jun 04 2019 16:06:28 GMT+0800 (HKT)
oplog last event time:   Thu Jun 06 2019 12:14:12 GMT+0800 (HKT)
now:                     Thu Jun 06 2019 12:14:30 GMT+0800 (HKT)

# 复制集状态
$ rs001:SECONDARY> rs.status()
{
    "set" : "rs001",
    "date" : ISODate("2019-06-06T04:17:27.774Z"),
    "myState" : 2,
    "syncingTo" : "192.168.0.58:5000",
    "members" : [
        {
            "_id" : 1,
            "name" : "192.168.0.58:5000",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 12796420,
            "optime" : Timestamp(1559794646, 829),
            "optimeDate" : ISODate("2019-06-06T04:17:26Z"),
            "lastHeartbeat" : ISODate("2019-06-06T04:17:26.742Z"),
            "lastHeartbeatRecv" : ISODate("2019-06-06T04:17:26.532Z"),
            "pingMs" : 0,
            "syncingTo" : "192.168.0.55:5000",
            "configVersion" : 34
        },
        {
            "_id" : 2,
            "name" : "192.168.0.55:5000",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 12796420,
            "optime" : Timestamp(1559794646, 111),
            "optimeDate" : ISODate("2019-06-06T04:17:26Z"),
            "lastHeartbeat" : ISODate("2019-06-06T04:17:26.139Z"),
            "lastHeartbeatRecv" : ISODate("2019-06-06T04:17:26.115Z"),
            "pingMs" : 0,
            "electionTime" : Timestamp(1536714964, 1),
            "electionDate" : ISODate("2018-09-12T01:16:04Z"),
            "configVersion" : 34
        },
        {
            "_id" : 3,
            "name" : "172.19.228.66:5000",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 72274,
            "optime" : Timestamp(1559794647, 154),
            "optimeDate" : ISODate("2019-06-06T04:17:27Z"),
            "lastHeartbeat" : ISODate("2019-06-06T04:17:27.241Z"),
            "lastHeartbeatRecv" : ISODate("2019-06-06T04:17:27.666Z"),
            "pingMs" : 56,
            "syncingTo" : "192.168.0.58:5000",
            "configVersion" : 34
        },
        {
            "_id" : 5,
            "name" : "192.168.0.99:5080",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 12796422,
            "optime" : Timestamp(1559794618, 355),
            "optimeDate" : ISODate("2019-06-06T04:16:58Z"),
            "syncingTo" : "192.168.0.58:5000",
            "configVersion" : 34,
            "self" : true
        }
    ],
    "ok" : 1
}

# 备份oplog输出,很慢
$ less backup log
2019-06-06T10:00:02.541+0800    writing local.oplog.rs to /patch/to/backup_dir/local/oplog.rs.bson
2019-06-06T10:15:31.303+0800    [................]  local.oplog.rs  9536/192905661  (0.0%)
2019-06-06T10:15:34.303+0800    [................]  local.oplog.rs  536244/192905661  (0.3%)
.....
2019-06-06T12:41:55.303+0800    [##..............]  local.oplog.rs  23882578/192905661  (12.4%)
2019-06-06T12:41:58.303+0800    [##..............]  local.oplog.rs  23882578/192905661  (12.4%)

你期待的结果是什么?实际看到的错误信息又是什么?

猜测是oplog太大,但是奇怪的是4号为何半小时就备份完了

该如何进一步分析,缩短oplog备份时间?

阅读 2.5k
撰写回答
你尚未登录,登录后可以
  • 和开发者交流问题的细节
  • 关注并接收问题和回答的更新提醒
  • 参与内容的编辑和改进,让解决方法与时俱进