Preface
We have previously deployed a Redis cluster of 4 master/slave pairs plus 1 master/slave pair serving as a session backup. If a node in the Redis cluster goes down, how do we keep the service available? In this article we add and start Sentinel services on the session servers and test how the cluster copes with failures.
The tidied-up set of network addresses in the environment:
"Name": "rm", "172.1.13.11/16", (session master, sentinel #1 for the cluster)
"Name": "rs", "172.1.13.12/16", (session slave, sentinel #2 for the cluster)
"Name": "clm1", "172.1.50.11/16",
"Name": "clm2", "172.1.50.12/16",
"Name": "clm3", "172.1.50.13/16",
"Name": "cls1", "172.1.30.11/16",
"Name": "cls2", "172.1.30.12/16",
"Name": "cls3", "172.1.30.13/16",
"Name": "rbt1", "172.1.12.13/16",
"Name": "rbt2", "172.1.12.14/16",
"Name": "p1", "172.1.1.11/16",
"Name": "p2", "172.1.1.12/16",
"Name": "p3", "172.1.1.13/16",
"Name": "mm", "172.1.11.11/16", (public-facing port, master)
"Name": "mm", "172.1.12.12/16", (public-facing port, slave)
"Name": "n1", "172.1.0.2/16", (public-facing port, internal IP assigned at random)
"Name": "n2", "172.1.0.3/16", (public-facing port, internal IP assigned at random)
1. Adding sentinels outside the cluster
a. Reorganizing the cluster
All connections to the cluster go over the internal network, as they would in a real deployment, so the container start commands and configuration files are changed to drop the Redis cluster's public port mappings and the cli connection password. There are also too many nodes to manage comfortably, so the cluster is shrunk a little.
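A minimal sketch of the kind of changes involved, assuming the earlier setup exposed each node with -p port mappings and protected it with requirepass/masterauth (names and ports here are only illustrative):
# In each cluster node's redis.conf, remove the authentication directives:
#   requirepass <password>
#   masterauth  <password>
# In each node's docker run command, remove the public port mappings, e.g.:
#   -p 7001:6379 -p 17001:16379
# keeping only the internal bridge network (--network=mybridge --ip=...).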
The target is a password-free cluster of 3 master/slave pairs, with clm4 and cls4 removed:
/ # redis-cli --cluster del-node 172.1.30.21:6379 c2b42a6c35ab6afb1f360280f9545b3d1761725e
>>> Removing node c2b42a6c35ab6afb1f360280f9545b3d1761725e from cluster 172.1.30.21:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
/ # redis-cli --cluster del-node 172.1.50.21:6379 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47
>>> Removing node 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47 from cluster 172.1.50.21:6379
[ERR] Node 172.1.50.21:6379 is not empty! Reshard data away and try again.
# The slots have to be emptied first (rebalance with weight=0)
/ # redis-cli --cluster rebalance 172.1.50.21:6379 --cluster-weight 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47=0
Moving 2186 slots from 172.1.50.21:6379 to 172.1.30.11:6379
###
Moving 2185 slots from 172.1.50.21:6379 to 172.1.50.11:6379
###
Moving 2185 slots from 172.1.50.21:6379 to 172.1.50.12:6379
###
/ # redis-cli --cluster del-node 172.1.50.21:6379 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47
>>> Removing node 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47 from cluster 172.1.50.21:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
The scale-down succeeded.
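To double-check the result, the cluster check subcommand can be run against any remaining node (a quick sanity check, not part of the original run):
/ # redis-cli --cluster check 172.1.50.11:6379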
b. Setting the configuration and starting the sentinel service
Here the container start commands for rm/rs are changed to:
docker run --name rm \
--restart=always \
--network=mybridge --ip=172.1.13.11 \
-v /root/tmp/dk/redis/data:/data \
-v /root/tmp/dk/redis/redis.conf:/etc/redis/redis.conf \
-v /root/tmp/dk/redis/sentinel.conf:/etc/redis/sentinel.conf \
-d cffycls/redis5:1.7
docker run --name rs \
--restart=always \
--network=mybridge --ip=172.1.13.12 \
-v /root/tmp/dk/redis_slave/data:/data \
-v /root/tmp/dk/redis_slave/redis.conf:/etc/redis/redis.conf \
-v /root/tmp/dk/redis_slave/sentinel.conf:/etc/redis/sentinel.conf \
-d cffycls/redis5:1.7
Following 《redis集群实现(六) 容灾与宕机恢复》 and 《Redis及其Sentinel配置项详细说明》, edit the configuration file sentinel.conf:
# Directory for any data the sentinel produces
dir /data/sentinel
#<master-name> <ip> <redis-port> <quorum>
# monitor name, ip, port, and the minimum number of sentinels that must agree (quorum)
sentinel monitor mymaster1 172.1.50.11 6379 2
sentinel monitor mymaster2 172.1.50.12 6379 2
sentinel monitor mymaster3 172.1.50.13 6379 2
#sentinel down-after-milliseconds <master-name> <milliseconds>
# monitor name and the timeout after which the node is considered down
# Default is 30 seconds.
sentinel down-after-milliseconds mymaster1 30000
sentinel down-after-milliseconds mymaster2 30000
sentinel down-after-milliseconds mymaster3 30000
#sentinel parallel-syncs <master-name> <numslaves>
# monitor name; 1 guarantees that only one slave at a time is unable to serve requests while resyncing
sentinel parallel-syncs mymaster1 1
sentinel parallel-syncs mymaster2 1
sentinel parallel-syncs mymaster3 1
# default value
# Default is 3 minutes.
sentinel failover-timeout mymaster1 180000
sentinel failover-timeout mymaster2 180000
sentinel failover-timeout mymaster3 180000
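Note that the config above does not set a listening port; Sentinel defaults to 26379, which matches the port seen in the logs further down. If you want it explicit, one extra line is enough:
port 26379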
Create the corresponding directories (xx/data/sentinel), restart the two containers, and enter rm:
/ # redis-sentinel /etc/redis/sentinel.conf
... ...
14:X 11 Jul 2019 18:25:24.418 # +monitor master mymaster3 172.1.50.13 6379 quorum 2
14:X 11 Jul 2019 18:25:24.419 # +monitor master mymaster1 172.1.50.11 6379 quorum 2
14:X 11 Jul 2019 18:25:24.419 # +monitor master mymaster2 172.1.50.12 6379 quorum 2
14:X 11 Jul 2019 18:25:24.421 * +slave slave 172.1.30.12:6379 172.1.30.12 6379 @ mymaster1 172.1.50.11 6379
14:X 11 Jul 2019 18:25:24.425 * +slave slave 172.1.30.13:6379 172.1.30.13 6379 @ mymaster2 172.1.50.12 6379
14:X 11 Jul 2019 18:26:14.464 # +sdown master mymaster3 172.1.50.13 6379
"There is no need to monitor the slaves; once a master is monitored, its slaves are picked up by the sentinel automatically." The +sdown still looked wrong, though, and on inspection sentinel.conf had been modified?? It was rewritten automatically by Redis at runtime; re-running the command now uses the stabilized configuration: /data/sentinel had been wrapped in quotes, and the monitor configuration had been rewritten like this:
sentinel monitor mymaster2 172.1.50.12 6379 2
sentinel config-epoch mymaster2 0
sentinel leader-epoch mymaster2 0
c. Dealing with the sentinel anomaly
As seen above, every subsequent sentinel start logs a +sdown (which does not match the actual state of the cluster). Looking inside the cluster:
819ad37676cc77b6691d0e74258c9f8b2d163121 172.1.50.13:6379@16379 slave cd2d78f87dd8a696dc127f762a168129ab91d9c6 0 1562843221035 10 connected
775bf0b33a34898a6a33bee85299982aae0d8a72 172.1.30.13:6379@16379 slave f02ee958993c79b63ffbef5238bb65b3cf552418 0 1562843220030 12 connected
ee0dcbbcc3634ca6e5d079835695bfe822ce17e6 172.1.50.11:6379@16379 myself,master - 0 1562843219000 11 connected 2185-5460 5462-7646
b69937a22d69d71596167104a3c2a9b8e308622c 172.1.30.12:6379@16379 slave ee0dcbbcc3634ca6e5d079835695bfe822ce17e6 0 1562843218000 11 connected
f02ee958993c79b63ffbef5238bb65b3cf552418 172.1.50.12:6379@16379 master - 0 1562843218000 12 connected 7647-13107
cd2d78f87dd8a696dc127f762a168129ab91d9c6 172.1.30.11:6379@16379 master - 0 1562843219029 10 connected 0-2184 5461 13108-16383
It turns out that 50.13 and 30.13 have swapped master/slave roles, probably left over from when clm4 and cls4 went down earlier.
Since Redis rewrites and saves the configuration files at runtime to match the environment, the initial state should be brought back in line, so the plan is to switch this master/slave pair first (sentinel.conf could instead be edited to point at the current master IPs, but by now that file has become rather messy).
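For reference, Redis Cluster also has a manual failover command that swaps a replica with its master without stopping any containers; running it on the replica that is supposed to be master (here 50.13, currently a slave of 30.11) would be the gentler route, though it is not the approach tried below:
/ # redis-cli -h 172.1.50.13 -p 6379 CLUSTER FAILOVER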
-- Plan 1: manually stop the 30.11 master
The result: 30.12 (cls2) became master, a ghost entry [:0@0 slave,noaddr] appeared, and 50.11 was still a slave. Failed.
The cluster was created with redis-cli --cluster create, so there is no fixed master/slave pairing between particular nodes; the PHP client could still read the same data as before.
-- Plan 2: go further and stop, by hand, all of the nodes that were meant to be slaves
docker stop cls1 cls2 cls3
Wait until all remaining nodes have become masters (refreshing the web page shows no change); the node list now reads:
b69937a22d69d71596167104a3c2a9b8e308622c 172.1.30.12:6379@16379 master,fail - 1562850125182 1562850123000 14 connected
5a95cbf53f635b1bd28dad6f25ed1e093bc5a2ba :0@0 slave,noaddr - 1562833027365 1562833027365 9 disconnected
775bf0b33a34898a6a33bee85299982aae0d8a72 172.1.30.13:6379@16379 slave,fail f02ee958993c79b63ffbef5238bb65b3cf552418 1562850125182 1562850121174 12 connected
f02ee958993c79b63ffbef5238bb65b3cf552418 172.1.50.12:6379@16379 myself,master - 0 1562850409000 12 connected 7647-13107
819ad37676cc77b6691d0e74258c9f8b2d163121 172.1.50.13:6379@16379 master - 0 1562850408000 15 connected 0-2184 5461 13108-16383
ee0dcbbcc3634ca6e5d079835695bfe822ce17e6 172.1.50.11:6379@16379 master - 0 1562850410051 16 connected 2185-5460 5462-7646
cd2d78f87dd8a696dc127f762a168129ab91d9c6 172.1.30.11:6379@16379 master,fail - 1562849524911 1562849524503 10 connected
OK. Run docker start cls1 cls2 cls3 and check again: all of the 30.x nodes are now slaves, which is the layout we wanted. Going back, kill redis-sentinel and start the sentinel again; the error in the log is gone.
2. Failure testing
a. First test: stopping a master node
Start a second sentinel by entering the session slave node (rs) and launching it there (for side-by-side observation):
16:X 11 Jul 2019 21:19:47.255 # +monitor master mymaster3 172.1.50.13 6379 quorum 2
16:X 11 Jul 2019 21:19:47.256 # +monitor master mymaster1 172.1.50.11 6379 quorum 2
16:X 11 Jul 2019 21:19:47.256 # +monitor master mymaster2 172.1.50.12 6379 quorum 2
16:X 11 Jul 2019 21:19:47.260 * +slave slave 172.1.30.11:6379 172.1.30.11 6379 @ mymaster3 172.1.50.13 6379
16:X 11 Jul 2019 21:19:47.264 * +slave slave 172.1.30.12:6379 172.1.30.12 6379 @ mymaster1 172.1.50.11 6379
16:X 11 Jul 2019 21:19:47.267 * +slave slave 172.1.30.13:6379 172.1.30.13 6379 @ mymaster2 172.1.50.12 6379
16:X 11 Jul 2019 21:19:48.252 * +sentinel sentinel 6b0995ba08e950c69848e3b2ffaf468bb6662626 172.1.13.11 26379 @ mymaster3 172.1.50.13 6379
16:X 11 Jul 2019 21:19:48.258 * +sentinel sentinel 6b0995ba08e950c69848e3b2ffaf468bb6662626 172.1.13.11 26379 @ mymaster1 172.1.50.11 6379
16:X 11 Jul 2019 21:19:48.261 * +sentinel sentinel 6b0995ba08e950c69848e3b2ffaf468bb6662626 172.1.13.11 26379 @ mymaster2 172.1.50.12 6379
# Take clm1 down with docker stop clm1 and observe:
16:X 11 Jul 2019 21:23:11.259 # +sdown master mymaster1 172.1.50.11 6379
16:X 11 Jul 2019 21:23:11.327 # +new-epoch 1
16:X 11 Jul 2019 21:23:11.329 # +vote-for-leader 6b0995ba08e950c69848e3b2ffaf468bb6662626 1
16:X 11 Jul 2019 21:23:12.370 # +odown master mymaster1 172.1.50.11 6379 #quorum 2/2
16:X 11 Jul 2019 21:23:12.371 # Next failover delay: I will not start a failover before Thu Jul 11 21:29:11 2019
16:X 11 Jul 2019 21:24:00.417 # +config-update-from sentinel 6b0995ba08e950c69848e3b2ffaf468bb6662626 172.1.13.11 26379 @ mymaster1 172.1.50.11 6379
16:X 11 Jul 2019 21:24:00.418 # +switch-master mymaster1 172.1.50.11 6379 172.1.30.12 6379
16:X 11 Jul 2019 21:24:00.418 * +slave slave 172.1.50.11:6379 172.1.50.11 6379 @ mymaster1 172.1.30.12 6379
16:X 11 Jul 2019 21:24:30.484 # +sdown slave 172.1.50.11:6379 172.1.50.11 6379 @ mymaster1 172.1.30.12 6379
It is clear that the current configuration is written back to the configuration file (the same happens on the masters and slaves; the command help even lists commands dedicated to persisting the configuration). Web access is still normal.
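The commands being referred to are presumably CONFIG REWRITE on a data node and SENTINEL FLUSHCONFIG on a sentinel, both of which force the in-memory configuration to be written back to the config file:
# On a data node: persist the running configuration to redis.conf
redis-cli -h 172.1.50.12 -p 6379 CONFIG REWRITE
# On a sentinel: force a rewrite of sentinel.conf
redis-cli -h 172.1.13.11 -p 26379 SENTINEL FLUSHCONFIG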
Continue with docker stop clm2. The line "Next failover delay: I will not start a failover" shows up on rm, the configuration file is updated, and refreshing the web page prints the warning Host is unreachable: 172.1.50.11:6379 but still returns results.
Continue again with docker stop clm3. The same "Next failover delay: I will not start a failover" line shows up on rs and the configuration file is updated. The web page now gets no data and shows
Host is unreachable: 172.1.50.11:6379 Host is unreachable: 172.1.50.12:6379 Host is unreachable: 172.1.50.13:6379. Entering a container and checking, the cluster state is normal, so the PHP access configuration needs to change: it should try every node in the cluster.
b. Web access test: data operations on the PHP side
require "../vendor/autoload.php";
//with swoole this list could be kept resident, read from cache, and refreshed as the cluster state changes
$servers = ['172.1.50.11:6379', '172.1.50.12:6379', '172.1.50.13:6379',
'172.1.30.11:6379', '172.1.30.12:6379', '172.1.30.13:6379'];
//discover the slot distribution across all nodes
$rs = [];
$slotNodes = [];
foreach ($servers as $addr){
$server=explode(':',$addr);
try{
$r = new Redis();
$r->connect($server[0], (int) $server[1], 0.2);
$slotInfo = $r->rawCommand('cluster','slots');
//slot ranges per node; note that one master may own non-contiguous slot ranges, hence the index suffix ix+1
foreach ($slotInfo as $ix => $value){
$slotNodes[$value[2][0].':'.$value[2][1].' '.($ix+1)]=[$value[0], $value[1]];
}
$rs[$addr] = $r;
foreach ($slotNodes as $slot => $value){
$addr = explode(' ', $slot)[0];
if(!isset($rs[$addr])){
$server = explode(':', $addr);
$r = new Redis();
$r->connect($server[0], (int) $server[1]);
$rs[$addr] = $r;
}
}
break;
}catch (\RedisException $e){
echo $e->getMessage(). ': '. $addr;
continue;
}
}
echo '<pre>';
//print_r($rs);
//compute the slot for each key (CRC16 mod 16384) and test a batch of reads
$crc = new \Predis\Cluster\Hash\CRC16();
$getAddr = function ($key) use (&$slotNodes, &$crc, &$rs) {
$code = $crc->hash($key) % 16384;
foreach ($slotNodes as $addr => $boundry){
if( $code>=$boundry[0] && $code<=$boundry[1] ){
$host =explode(' ', $addr)[0];
//print_r(['OK: '. $addr => $boundry, $host, $rs]);
return $addr. ' = '. $rs[$host]->get($key);
}
}
};
$result=[];
for($i=10; $i<30; $i++){
$key = 'set-'.$i;
$result[$key] = $getAddr($key);
}
print_r($result);
foreach ($rs as $r){
$r->close();
}
Result of taking the 50.x nodes down one by one until all were gone (the cluster holds the 10,000 records added earlier; the procedure is the same as above):
Operation timed out: 172.1.50.11:6379Operation timed out: 172.1.50.12:6379Operation timed out: 172.1.50.13:6379
Array
(
[set-10] => 172.1.30.11:6379 6 = bc1c1134c6b9da41dce82bb7b50d6fa5
[set-11] => 172.1.30.13:6379 1 = 78e23ac793c7ce7a7ec498f46c7a0ee0
[set-12] => 172.1.30.12:6379 3 = 90191fa0ba4d3ee127c5bc2295a524c7
[set-13] => 172.1.30.12:6379 2 = bb626b73081c69ae737a4f0b66af376f
[set-14] => 172.1.30.11:6379 6 = c7b5a610b9aa9640a277ec0d19336aea
[set-15] => 172.1.30.13:6379 1 = ef2a7c6c2ebc01c937551f59ce1be516
[set-16] => 172.1.30.12:6379 3 = d9cb45c5fe69875f9c3cea47f3d7c81d
[set-17] => 172.1.30.12:6379 2 = 2ecc3cb21debbc6d24c07f18c036c66f
[set-18] => 172.1.30.11:6379 6 = e6186afca37c42ccb828fdc94fb34be8
[set-19] => 172.1.30.13:6379 1 = 50400663e0ab9eea2cd0e3389f8e9007
[set-20] => 172.1.30.13:6379 1 = 46e1162866db417d987b64bc89690da3
[set-21] => 172.1.30.11:6379 6 = 08dbce4c73e6ba90e3f54da890e63ba3
[set-22] => 172.1.30.11:6379 4 = df10958fd828d505c9a91d97c8641355
[set-23] => 172.1.30.12:6379 3 = e8a9615af5b2ed5360987e5ed9d49cea
[set-24] => 172.1.30.13:6379 1 = 00cd8741e8828a1ddb7a272e89b64aeb
[set-25] => 172.1.30.11:6379 6 = c808e68289fb9dcd93f19629e2dd7795
[set-26] => 172.1.30.11:6379 4 = d85eeb441f895ff7cac12ecf7c08313b
[set-27] => 172.1.30.12:6379 3 = cbf006ee0c96b4d585cfbed7d0edecd0
[set-28] => 172.1.30.13:6379 1 = 01b0268951256097595c714bb90c3b8c
[set-29] => 172.1.30.11:6379 6 = 853fa04d88512a0dfe4f27e72c08b125
)
As expected. Run docker start clm1 clm2 clm3; at this point all of the 30.x nodes have become master nodes.
c. Combined test: taking down nodes in batches
Test 1: all master nodes down at once
Take all of the current 30.x master nodes offline in one go. Both sentinels keep printing election logs, and after roughly 10 minutes no leader has been elected. Starting one or all of the 30.x nodes does not help; entering a container, cluster info reports the cluster as failed. The cluster cannot re-form.
127.0.0.1:6379> cluster nodes
cd2d78f87dd8a696dc127f762a168129ab91d9c6 172.1.30.11:6379@16379 master - 0 1562854957611 21 connected 0-2184 5461 13108-16383
b69937a22d69d71596167104a3c2a9b8e308622c 172.1.30.12:6379@16379 master,fail? - 1562854249393 1562854248493 18 connected 2185-5460 5462-7646
819ad37676cc77b6691d0e74258c9f8b2d163121 172.1.50.13:6379@16379 slave cd2d78f87dd8a696dc127f762a168129ab91d9c6 0 1562854958615 21 connected
f02ee958993c79b63ffbef5238bb65b3cf552418 172.1.50.12:6379@16379 slave 775bf0b33a34898a6a33bee85299982aae0d8a72 0 1562854956000 19 connected
775bf0b33a34898a6a33bee85299982aae0d8a72 172.1.30.13:6379@16379 master,fail? - 1562854249393 1562854247000 19 connected 7647-13107
ee0dcbbcc3634ca6e5d079835695bfe822ce17e6 172.1.50.11:6379@16379 myself,slave b69937a22d69d71596167104a3c2a9b8e308622c 0 1562854955000 16 connected
Restarting a single node has no effect; only after starting all of them (restoring all of the old nodes) does the cluster return to normal.
Test 2: two nodes down at once
Same as above: the cluster fails to rebuild. This comes down to the Raft-style election mechanism, which needs a majority vote, so fewer than half of the master nodes may be down for a new master to be elected and the cluster's service state to recover (for example, with 3 masters at least 2 must stay reachable). If this does happen, the affected servers have to be restarted to get the cluster back to a state where an election can complete, and then the election is left to finish.
Test 3: switching masters and slaves in bulk
Shut down the 30.x (clsx) nodes one at a time and, at either sentinel, check whether a new master has been elected before moving on (if not, you will see that no new master can be elected and the previous step has to be undone first), until the only remaining nodes are the 50.x (clmx) ones, at which point every remaining node is a master.
Sentinel view of the completed switch:
43:X 11 Jul 2019 22:55:15.208 # +sdown master mymaster2 172.1.30.13 6379
43:X 11 Jul 2019 22:55:15.326 # +new-epoch 22
43:X 11 Jul 2019 22:55:15.329 # +vote-for-leader 82710666110f7241e0d4aa6fa445fb95790fac86 22
43:X 11 Jul 2019 22:55:16.268 # +odown master mymaster2 172.1.30.13 6379 #quorum 2/2
43:X 11 Jul 2019 22:55:16.269 # Next failover delay: I will not start a failover before Thu Jul 11 23:01:15 2019
43:X 11 Jul 2019 22:55:16.414 # +config-update-from sentinel 82710666110f7241e0d4aa6fa445fb95790fac86 172.1.13.12 26379 @ mymaster2 172.1.30.13 6379
43:X 11 Jul 2019 22:55:16.414 # +switch-master mymaster2 172.1.30.13 6379 172.1.50.12 6379
43:X 11 Jul 2019 22:55:16.415 * +slave slave 172.1.30.13:6379 172.1.30.13 6379 @ mymaster2 172.1.50.12 6379
43:X 11 Jul 2019 22:57:32.364 # +sdown slave 172.1.30.13:6379 172.1.30.13 6379 @ mymaster2 172.1.50.12 6379
Restart with docker start cls1 cls2 cls3; they automatically rejoin as slave nodes, and the master/slave switchover is complete.
Summary
a. Data migration (scaling out and in)
All operations use the cluster command redis-cli --cluster: add-node plus rebalance when scaling out, and rebalance with weight=0 followed by del-node when scaling in.
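The scale-out direction was not shown above; a rough sketch, assuming a new pair at the hypothetical addresses 172.1.50.14 and 172.1.30.14:
# Add a new empty master to the existing cluster
redis-cli --cluster add-node 172.1.50.14:6379 172.1.50.11:6379
# Spread slots onto it (empty masters are skipped unless this flag is given)
redis-cli --cluster rebalance 172.1.50.11:6379 --cluster-use-empty-masters
# Attach a replica to the new master (replace <new-master-id> with its node id)
redis-cli --cluster add-node 172.1.30.14:6379 172.1.50.11:6379 --cluster-slave --cluster-master-id <new-master-id>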
b. Outages
On an outage the sentinels run the election, the switchover and the down-marking automatically; watch the sentinel log (a logfile can be set in the configuration, or observe from the cli as above). If the service stays unavailable, the affected machines need to be restarted. Check the election progress: if the election cannot complete, it means more than half of the machines are already marked +down.
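For the logfile mentioned above, one directive in sentinel.conf is enough (the path is only an example):
logfile "/data/sentinel/sentinel.log"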
c. Disaster tolerance
The setup here has 3 master/slave pairs (6 nodes) in the cluster plus 1 master/slave pair (2 session nodes):
A. The session master must stay healthy
With a single master/slave pair, if the master goes down the slave cannot promote itself: it has to be promoted to master by hand, with the client adapted to fall back to the second connection, or the master has to be restarted.
Only when more nodes are added to the session pair to form a 1+n master/slave group, so that a vote can be held, does it gain automatic failover;
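A sketch of the manual promotion for the session pair, assuming rs (172.1.13.12) is the surviving slave; afterwards the client has to be pointed at the new master:
# Promote the session slave to a standalone master
redis-cli -h 172.1.13.12 -p 6379 SLAVEOF NO ONE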
B. The 6-node cluster of 3 master/slave pairs:
Sentinels are not strictly required for the cluster itself (confirmed in these tests);
all of the slave nodes can go down;
or fewer than half of the master nodes may go down, provided each failed master still has a surviving slave (a slave holding the only copy of that data must exist as the candidate to become the new master).