问题描述
oplog备份突然变的很慢,6月4号,oplog备份半个小时就搞定,从6月5号开始oplog备份十几个小时都无法完成。
问题出现的平台版本及自己尝试过哪些方法
系统:CentOS 6.5
MongoDB版本:3.0.15
复制集:1个主节点 + 2个备份节点 + 1 隐藏节点(隐藏节点主要用于备份)
备份方式:每天一次物理备份 + 每天一次oplog增量备份
自行排查:目前检查服务器磁盘IO、内存、CPU、负载均正常,db.currentOp()也没有其他请求阻塞
相关代码
# 备份命令
$ mongodump --port 5080 -uxxx -pyyy --authenticationDatabase=admin -d local -c oplog.rs -o /path/to/backup -q {'ts': {'$lt':Timestamp(1559764762,1), '$gte':Timestamp(1559592001,1)}}
# show dbs输出
$ rs001:SECONDARY> show dbs
admin 0.000GB
local 27.190GB
.....
other xxx.xxxGB
# db.printSlaveReplicationInfo()输出
$ rs001:SECONDARY> db.printSlaveReplicationInfo()
source: 192.168.0.58:5000
syncedTo: Thu Jun 06 2019 12:16:48 GMT+0800 (HKT)
0 secs (0 hrs) behind the primary
source: 172.19.228.66:5000
syncedTo: Thu Jun 06 2019 12:16:48 GMT+0800 (HKT)
0 secs (0 hrs) behind the primary
source: 192.168.0.99:5080
syncedTo: Thu Jun 06 2019 12:16:21 GMT+0800 (HKT)
27 secs (0.01 hrs) behind the primary
# db.printReplicationInfo()输出
$ rs001:SECONDARY> db.printReplicationInfo()
configured oplog size: 102400MB
log length start to end: 158864secs (44.13hrs)
oplog first event time: Tue Jun 04 2019 16:06:28 GMT+0800 (HKT)
oplog last event time: Thu Jun 06 2019 12:14:12 GMT+0800 (HKT)
now: Thu Jun 06 2019 12:14:30 GMT+0800 (HKT)
# 复制集状态
$ rs001:SECONDARY> rs.status()
{
"set" : "rs001",
"date" : ISODate("2019-06-06T04:17:27.774Z"),
"myState" : 2,
"syncingTo" : "192.168.0.58:5000",
"members" : [
{
"_id" : 1,
"name" : "192.168.0.58:5000",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 12796420,
"optime" : Timestamp(1559794646, 829),
"optimeDate" : ISODate("2019-06-06T04:17:26Z"),
"lastHeartbeat" : ISODate("2019-06-06T04:17:26.742Z"),
"lastHeartbeatRecv" : ISODate("2019-06-06T04:17:26.532Z"),
"pingMs" : 0,
"syncingTo" : "192.168.0.55:5000",
"configVersion" : 34
},
{
"_id" : 2,
"name" : "192.168.0.55:5000",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 12796420,
"optime" : Timestamp(1559794646, 111),
"optimeDate" : ISODate("2019-06-06T04:17:26Z"),
"lastHeartbeat" : ISODate("2019-06-06T04:17:26.139Z"),
"lastHeartbeatRecv" : ISODate("2019-06-06T04:17:26.115Z"),
"pingMs" : 0,
"electionTime" : Timestamp(1536714964, 1),
"electionDate" : ISODate("2018-09-12T01:16:04Z"),
"configVersion" : 34
},
{
"_id" : 3,
"name" : "172.19.228.66:5000",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 72274,
"optime" : Timestamp(1559794647, 154),
"optimeDate" : ISODate("2019-06-06T04:17:27Z"),
"lastHeartbeat" : ISODate("2019-06-06T04:17:27.241Z"),
"lastHeartbeatRecv" : ISODate("2019-06-06T04:17:27.666Z"),
"pingMs" : 56,
"syncingTo" : "192.168.0.58:5000",
"configVersion" : 34
},
{
"_id" : 5,
"name" : "192.168.0.99:5080",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 12796422,
"optime" : Timestamp(1559794618, 355),
"optimeDate" : ISODate("2019-06-06T04:16:58Z"),
"syncingTo" : "192.168.0.58:5000",
"configVersion" : 34,
"self" : true
}
],
"ok" : 1
}
# 备份oplog输出,很慢
$ less backup log
2019-06-06T10:00:02.541+0800 writing local.oplog.rs to /patch/to/backup_dir/local/oplog.rs.bson
2019-06-06T10:15:31.303+0800 [................] local.oplog.rs 9536/192905661 (0.0%)
2019-06-06T10:15:34.303+0800 [................] local.oplog.rs 536244/192905661 (0.3%)
.....
2019-06-06T12:41:55.303+0800 [##..............] local.oplog.rs 23882578/192905661 (12.4%)
2019-06-06T12:41:58.303+0800 [##..............] local.oplog.rs 23882578/192905661 (12.4%)
你期待的结果是什么?实际看到的错误信息又是什么?
猜测是oplog太大,但是奇怪的是4号为何半小时就备份完了
该如何进一步分析,缩短oplog备份时间?