[Redis Study Notes] 2018-06-20 Redis Cluster

Shi Hongbao, Hitch (顺风车) Operations R&D Team
I. CAP Theory

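C (Consistency): whether the data on different nodes is the same.
A (Availability): whether the system can serve requests. The data returned is not guaranteed to be the latest (it may reflect some earlier point in time), but it is guaranteed not to be wrong.
P (Partition tolerance): whether the system keeps working when network failures split it into disconnected blocks.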

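How the three relate: in a distributed system the network starts out fully connected. Failures can then cut some nodes off from others, splitting the network into several regions, with the data scattered across regions that cannot reach one another. This is a partition. If a data item is stored on only one node, then once a partition occurs the parts of the system cut off from that node cannot access the item, and the partition cannot be tolerated. The way to improve partition tolerance is to replicate each data item to multiple nodes, so that after a partition the item is likely to be present in every region. Replication, however, raises a consistency problem: the copies on different nodes may disagree. To guarantee consistency, every write has to wait until all nodes have written successfully, and that waiting hurts availability. In short, the more nodes hold a piece of data, the higher the partition tolerance, but the more copies must be updated and the harder consistency becomes to guarantee; and the longer it takes to update every node for the sake of consistency, the lower the availability.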

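CAP theorem: a distributed computing system cannot fully satisfy all three properties at the same time.

Note: consistency, availability, and partition tolerance all come in degrees.

   I have not yet found a universally accepted definition of strong consistency, weak consistency, and eventual consistency.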

References:

     https://zh.wikipedia.org/wiki/CAP%E5%AE%9A%E7%90%86

     https://www.zhihu.com/question/54105974/answer/139037688

     https://en.wikipedia.org/wiki/Consistency_model

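One definition of consistency:

(1) Notation: N is the total number of replicas of a data item; W is the number of replicas that must acknowledge a write; R is the number of replicas consulted by a read.

(2) Strong consistency: R+W>N. Every read set then overlaps every write set, so a read always reaches at least one replica holding the latest version. With W=N and R=1, every replica must be updated on each write, which suits strongly consistent workloads with many reads and few writes. With R=N and W=1, a write updates only one replica and a read consults all replicas to find the latest version, which suits strongly consistent workloads with many writes and few reads.

(3) Weak consistency: R+W<=N. The read set and the write set may not overlap, so a read may return stale (dirty) data. This suits scenarios with relaxed consistency requirements.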
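II. Redis Cluster
Cluster is the distributed solution provided by Redis. A cluster consists of multiple nodes. Initially each node is independent and sits in a cluster of its own. The CLUSTER MEET command adds another node to the current node's cluster.

Note: a single "node" in the cluster is not necessarily a standalone server; it can also be the master of a master-slave group.

The main data structures underlying the Redis Cluster implementation are: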

typedef struct clusterNode {
    mstime_t ctime; /* Node object creation time. */
    char name[CLUSTER_NAMELEN]; /* Node name, hex string, sha1-size */
    int flags;      /* CLUSTER_NODE_... */
    uint64_t configEpoch; /* Last configEpoch observed for this node */
    unsigned char slots[CLUSTER_SLOTS/8]; /* slots handled by this node */
    int numslots;   /* Number of slots handled by this node */
    int numslaves;  /* Number of slave nodes, if this is a master */
    struct clusterNode **slaves; /* pointers to slave nodes */
    struct clusterNode *slaveof; /* pointer to the master node. Note that it
                                    may be NULL even if the node is a slave
                                    if we don't have the master node in our
                                    tables. */
    mstime_t ping_sent;      /* Unix time we sent latest ping */
    mstime_t pong_received;  /* Unix time we received the pong */
    mstime_t fail_time;      /* Unix time when FAIL flag was set */
    mstime_t voted_time;     /* Last time we voted for a slave of this master */
    mstime_t repl_offset_time;  /* Unix time we received offset for this node */
    mstime_t orphaned_time;     /* Starting time of orphaned master condition */
    long long repl_offset;      /* Last known repl offset for this node. */
    char ip[NET_IP_STR_LEN];  /* Latest known IP address of this node */
    int port;                   /* Latest known clients port of this node */
    int cport;                  /* Latest known cluster port of this node. */
    clusterLink *link;          /* TCP/IP link with this node */
    list *fail_reports;         /* List of nodes signaling this as failing */
} clusterNode;

clusterNode records the information about one node in the cluster.

/* clusterLink encapsulates everything needed to talk with a remote node. */
typedef struct clusterLink {
    mstime_t ctime;             /* Link creation time */
    int fd;                     /* TCP socket file descriptor */
    sds sndbuf;                 /* Packet send buffer */
    sds rcvbuf;                 /* Packet reception buffer */
    struct clusterNode *node;   /* Node related to this link if any, or NULL */
} clusterLink;

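The link field of clusterNode points to a clusterLink structure, which holds the connection information for that node: mainly the socket fd and the send and receive buffers.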

typedef struct clusterState {
    clusterNode *myself;  /* This node */
    uint64_t currentEpoch;
    int state;            /* CLUSTER_OK, CLUSTER_FAIL, ... */
    int size;             /* Num of master nodes with at least one slot */
    dict *nodes;          /* Hash table of name -> clusterNode structures */
    dict *nodes_black_list; /* Nodes we don't re-add for a few seconds. */
    clusterNode *migrating_slots_to[CLUSTER_SLOTS];
    clusterNode *importing_slots_from[CLUSTER_SLOTS];
    clusterNode *slots[CLUSTER_SLOTS]; /* Node responsible for handling each slot */
    uint64_t slots_keys_count[CLUSTER_SLOTS];
    rax *slots_to_keys;
    /* The following fields are used to take the slave state on elections. */
    mstime_t failover_auth_time; /* Time of previous or next election. */
    int failover_auth_count;    /* Number of votes received so far. */
    int failover_auth_sent;     /* True if we already asked for votes. */
    int failover_auth_rank;     /* This slave rank for current auth request. */
    uint64_t failover_auth_epoch; /* Epoch of the current election. */
    int cant_failover_reason;   /* Why a slave is currently not able to
                                   failover. See the CANT_FAILOVER_* macros. */
    /* Manual failover state in common. */
    mstime_t mf_end;            /* Manual failover time limit (ms unixtime).
                                   It is zero if there is no MF in progress. */
    /* Manual failover state of master. */
    clusterNode *mf_slave;      /* Slave performing the manual failover. */
    /* Manual failover state of slave. */
    long long mf_master_offset; /* Master offset the slave needs to start MF
                                   or zero if still not received. */
    int mf_can_start;           /* If non-zero signal that the manual failover
                                   can start requesting masters vote. */
    /* The following fields are used by masters to take state on elections. */
    uint64_t lastVoteEpoch;     /* Epoch of the last vote granted. */
    int todo_before_sleep; /* Things to do in clusterBeforeSleep(). */
    /* Messages received and sent by type. */
    long long stats_bus_messages_sent[CLUSTERMSG_TYPE_COUNT];
    long long stats_bus_messages_received[CLUSTERMSG_TYPE_COUNT];
    long long stats_pfail_nodes;    /* Number of nodes in PFAIL status,
                                       excluding nodes without address. */
} clusterState;

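clusterState records the state of the whole cluster. The figure below shows how the three structures relate: clusterState keeps a dictionary of clusterNode entries in its nodes field, and each clusterNode reaches its clusterLink through its link field.

Notes:

Every node in the cluster maintains the data structures above.
A client can connect to any node in the cluster. Alternatively, a proxy layer can be added so that clients connect to the proxy and the proxy connects to the Redis servers in the cluster.
When a new node is added, the node that brought it into the cluster notifies the other nodes in the cluster so that they handshake with the new node.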
