Dubbo Internals: Cluster Fault Tolerance

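This article examines Dubbo's cluster fault tolerance. It covers the following modules:

Cluster: fault-tolerance strategies
Directory: directory service
Router: routing rules
LoadBalance: soft load balancing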

I. Call chain

II. Fault-tolerance strategies

Configuring the cluster mode

<dubbo:service cluster="failsafe" /> (provider side)
<dubbo:reference cluster="failsafe" /> (consumer side)

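Cluster implementations

Interface: com.alibaba.dubbo.rpc.cluster.Cluster
Implementations of Cluster:

1.AvailableCluster
Calls the first available invoker. It iterates over all Invokers and checks Invoker.isAvailable(); as soon as one returns true, that invoker is called and its result returned, whether or not the call succeeds.

2.BroadcastCluster
Broadcast call. It iterates over all Invokers and calls each one in turn, catching the exception from each call so that a failure does not prevent the remaining invokers from being called.

3.FailbackCluster
Automatic recovery on failure. When an invocation fails, the failed request is recorded in the background and resent by a scheduled task. Typically used for notifications.

// FailbackClusterInvoker
// Records the failed invocations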
private final ConcurrentMap<Invocation, AbstractClusterInvoker<?>> failed = new ConcurrentHashMap<Invocation, AbstractClusterInvoker<?>>();

protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            checkInvokers(invokers, invocation);
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            // On failure, record the invocation through addFailed
            addFailed(invocation, this);
            return new RpcResult(); // ignore
        }
    }

private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
    if (retryFuture == null) {
        synchronized (this) {
            if (retryFuture == null) {
                retryFuture = scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {

                    public void run() {
                        // Retry the recorded failed invocations
                        try {
                            retryFailed();
                        } catch (Throwable t) { // defensive catch so the scheduled task keeps running
                            logger.error("Unexpected error occur at collect statistic", t);
                        }
                    }
                }, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
            }
        }
    }
    failed.put(invocation, router);
}

// Retry each failed invocation; once a retry succeeds, remove it from the map
void retryFailed() {
        if (failed.size() == 0) {
            return;
        }
        for (Map.Entry<Invocation, AbstractClusterInvoker<?>> entry : new HashMap<Invocation, AbstractClusterInvoker<?>>(
                failed).entrySet()) {
            Invocation invocation = entry.getKey();
            Invoker<?> invoker = entry.getValue();
            try {
                invoker.invoke(invocation);
                failed.remove(invocation);
            } catch (Throwable e) {
                logger.error("Failed retry to invoke method " + invocation.getMethodName() + ", waiting again.", e);
            }
        }
    }

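4.FailfastCluster
Fail fast. Only one call is made, and a failure is reported immediately. Typically used for non-idempotent operations.

5.FailoverCluster (default)
Failover. When a call fails, another server is tried. Typically used for read operations, although retries add latency.
(1) The directory service, directory.list(invocation), lists all invokers that can serve the method.
Read the retry count. The default is two retries, so len below (retries + 1) allows three attempts in total.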

int len = getUrl().getMethodParameter(invocation.getMethodName(), Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;

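(2) Select an Invoker according to the LoadBalance strategy.
(3) Execute invoker.invoke(invocation).
(4) Return if the call succeeds.
If the call fails and the number of attempts is still below len, go back to step (2), select an invoker again, and repeat the call. Once the number of attempts reaches len, throw an exception reporting the failed call.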
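The retry loop described in these steps can be sketched in plain Java. This is a simplified illustration, not the FailoverClusterInvoker source: FailoverSketch, invokeWithFailover and select are hypothetical names, providers are modelled as Supplier objects, and the stand-in "load balancer" simply prefers a provider that has not been tried yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of failover: up to retries + 1 attempts, each on a freshly selected provider.
class FailoverSketch {

    static <R> R invokeWithFailover(List<Supplier<R>> providers, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("no providers available");
        }
        int maxAttempts = retries + 1;                 // same idea as "retries + 1" in the article
        List<Supplier<R>> tried = new ArrayList<>();   // providers already attempted
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Supplier<R> chosen = select(providers, tried);
            tried.add(chosen);
            try {
                return chosen.get();                   // success: return immediately
            } catch (RuntimeException e) {
                last = e;                              // failure: remember it and try again
            }
        }
        throw new IllegalStateException("failed after " + maxAttempts + " attempts", last);
    }

    // Stand-in for load balancing: the first provider not tried yet, else the first provider.
    static <R> Supplier<R> select(List<Supplier<R>> providers, List<Supplier<R>> tried) {
        for (Supplier<R> provider : providers) {
            if (!tried.contains(provider)) {
                return provider;
            }
        }
        return providers.get(0);
    }
}
```

With one failing and one healthy provider and retries = 2, the first attempt fails and the second attempt returns the healthy provider's result.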

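6.FailsafeCluster
Fail safe. When an exception occurs it is simply ignored. Typically used for operations such as writing audit logs.

7.ForkingCluster
Parallel calls: the result is returned as soon as one call succeeds. Typically used for operations with strict real-time requirements, at the cost of more service resources.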
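The "first success wins" idea behind the forking strategy can be illustrated with the JDK's ExecutorService.invokeAny, which returns the result of one task that completed successfully. This is only a sketch of the concept under that assumption; ForkingSketch and invokeForking are hypothetical names and the code does not reproduce Dubbo's own implementation.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: call all providers in parallel and return the first successful result.
class ForkingSketch {

    static <R> R invokeForking(List<Callable<R>> calls) {
        ExecutorService pool = Executors.newFixedThreadPool(calls.size());
        try {
            // invokeAny blocks until one task succeeds, or throws if every task fails
            return pool.invokeAny(calls);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a result", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("all parallel calls failed", e);
        } finally {
            pool.shutdownNow(); // cancel the calls that are still running
        }
    }
}
```

Every call in the list consumes a thread and a provider's capacity even though only one result is used, which is the extra resource cost the description above mentions.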

Note:
There are also the MergeableCluster and MockClusterWrapper strategies, but I have not used them myself, so they are not covered here.

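III. Directory service

1. StaticDirectory

A static directory service: all of its Invokers are passed in through the constructor. It is used when a consumer references a service through multiple registries: the resulting collection of Invokers is passed straight to the StaticDirectory constructor.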

public StaticDirectory(URL url, List<Invoker<T>> invokers, List<Router> routers) {
    super(url == null && invokers != null && invokers.size() > 0 ? invokers.get(0).getUrl() : url, routers);
    if (invokers == null || invokers.size() == 0)
        throw new IllegalArgumentException("invokers == null");
    this.invokers = invokers;
}

The list method of StaticDirectory returns the whole invoker collection directly

@Override
protected List<Invoker<T>> doList(Invocation invocation) throws RpcException {
    return invokers;
}

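2. RegistryDirectory

A registry-backed directory service: its Invoker collection is obtained from the registry. It implements the NotifyListener interface and therefore the callback method notify(List<URL>).

For example, when a consumer wants to call a remote service, it subscribes at the registry to all providers of that service. At subscription time, and again whenever the provider data changes, the registry calls back NotifyListener.notify(List<URL>) on the consumer, passing in the URLs of all providers of the service. The URLs are then converted into invokers, that is, the remote services are referenced (refer). From that point on, the RegistryDirectory for the remote service holds all invokers that can call it.

RegistryDirectory.list(invocation) returns the invokers of all remote service references that match the method being called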
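The subscribe-and-notify flow can be pictured with a small stand-alone model. Everything here is hypothetical: DirectorySketch and notifyUrls are illustrative names, and plain strings stand in for Dubbo's URL and Invoker types. The only point is that each notification replaces the whole invoker snapshot, and list() returns whatever was published last.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a registry-backed directory; strings stand in for URLs and invokers.
class DirectorySketch {

    // volatile so that list() always sees the most recently published snapshot
    private volatile List<String> invokers = Collections.emptyList();

    // Callback in the spirit of NotifyListener.notify: rebuild the invoker list from the URLs.
    void notifyUrls(List<String> providerUrls) {
        List<String> refreshed = new ArrayList<>();
        for (String url : providerUrls) {
            refreshed.add("invoker(" + url + ")"); // stand-in for turning a URL into an invoker
        }
        this.invokers = Collections.unmodifiableList(refreshed);
    }

    // Returns the invokers currently known for the service.
    List<String> list() {
        return invokers;
    }
}
```

A cluster invoker would call list() on every invocation, so a provider that disappears from the registry stops being selected as soon as the next notification arrives.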

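IV. Service routing

Dubbo's routing feature does not seem to be widely used. Its main purpose is to filter the registered providers, for example to allow calls only to certain configured providers, or to disable certain providers.

1. ConditionRouter: condition routing

Rules are configured in the dubbo-admin console.

Entry point of the routing code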

public <T> List<Invoker<T>> route(List<Invoker<T>> invokers, URL url, Invocation invocation)
            throws RpcException {
    if (invokers == null || invokers.size() == 0) {
        return invokers;
    }
    try {
        if (!matchWhen(url, invocation)) {
            return invokers;
        }
        List<Invoker<T>> result = new ArrayList<Invoker<T>>();
        if (thenCondition == null) {
            logger.warn("The current consumer in the service blacklist. consumer: " + NetUtils.getLocalHost() + ", service: " + url.getServiceKey());
            return result;
        }
    .............................

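2. ScriptRouter: script routing

The rule is written as a script that follows Dubbo's script-routing rules and is evaluated by the program.

V. Soft load balancing

1. RandomLoadBalance `default`

Random selection, with the probability set by weight. The default weight is 100.
At any single moment the chance of collisions is high, but the larger the call volume, the more even the distribution. Weighted probabilities also give a fairly even spread, which makes it easy to adjust provider weights dynamically.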

    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        int length = invokers.size(); // number of invokers
        int totalWeight = 0; // total weight
        boolean sameWeight = true; // whether all weights are equal
        for (int i = 0; i < length; i++) {
            int weight = getWeight(invokers.get(i), invocation);
            totalWeight += weight; // accumulate the total weight
            if (sameWeight && i > 0
                    && weight != getWeight(invokers.get(i - 1), invocation)) {
                sameWeight = false; // record that the weights differ
            }
        }
        if (totalWeight > 0 && !sameWeight) {
            // Weights differ and the total weight is > 0: draw a random number below the total weight
            int offset = random.nextInt(totalWeight);
            // and find the segment the random value falls into
            for (int i = 0; i < length; i++) {
                offset -= getWeight(invokers.get(i), invocation);
                if (offset < 0) {
                    return invokers.get(i);
                }
            }
        }
        // Weights are equal or the total weight is 0: choose uniformly at random
        return invokers.get(random.nextInt(length));
    }

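What the algorithm means
If all providers have the same weight, one is picked uniformly at random among them. If the weights differ, a random number below the total weight is drawn, and each provider's weight is subtracted from it in turn; the provider at which the result turns negative is selected. In other words, the probability of choosing a provider is its weight divided by the total weight.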
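The claim that each provider is chosen with probability weight / total weight can be checked with a stand-alone simulation. The sketch below restates the same subtraction loop over a plain int array; WeightedRandomSketch, select and countSelections are hypothetical names, and the weights and seed are arbitrary example values.

```java
import java.util.Random;

// Hypothetical stand-alone version of weighted random selection over an array of weights.
class WeightedRandomSketch {

    // Returns the index of the selected provider.
    static int select(int[] weights, Random random) {
        int total = 0;
        boolean sameWeight = true;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            if (i > 0 && weights[i] != weights[i - 1]) {
                sameWeight = false;
            }
        }
        if (total > 0 && !sameWeight) {
            int offset = random.nextInt(total);   // uniform in [0, total)
            for (int i = 0; i < weights.length; i++) {
                offset -= weights[i];
                if (offset < 0) {
                    return i;                     // offset fell into provider i's segment
                }
            }
        }
        return random.nextInt(weights.length);    // equal weights: uniform choice
    }

    // Counts how often each provider is selected over a number of rounds.
    static int[] countSelections(int[] weights, int rounds, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[weights.length];
        for (int r = 0; r < rounds; r++) {
            counts[select(weights, random)]++;
        }
        return counts;
    }
}
```

With weights {1, 3}, the second provider owns three of the four possible offsets, so over many rounds it should receive roughly three quarters of the calls.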

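2. RoundRobinLoadBalance

Round robin, with the rotation ratio set by the weights after reducing them by their common divisor.
It suffers from requests piling up on slow providers. For example, if the second machine is very slow but not down, every request routed to it gets stuck there, and over time all requests end up stuck on the second machine.

This algorithm maintains a call sequence counter per method

private final ConcurrentMap<String, AtomicPositiveInteger> sequences = new ConcurrentHashMap<String, AtomicPositiveInteger>();

The key is built per method (the service key plus the method name)

Round robin comes in a plain and a weighted form. When all weights are equal, plain round robin with a modulo operation is used; otherwise weighted round robin is used.

The implementation is shown below
RoundRobinLoadBalance#doSelect

i. Plain round robin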

AtomicPositiveInteger sequence = sequences.get(key);
if (sequence == null) {
    sequences.putIfAbsent(key, new AtomicPositiveInteger());
    sequence = sequences.get(key);
}
// Get the sequence number for this call, then increment the counter
int currentSequence = sequence.getAndIncrement();

// Take the sequence number modulo the number of providers
return invokers.get(currentSequence % length);

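ii. Weighted round robin
The core implementation is shown below. Note these variables:

weightSum = sum of the provider weights
invokerToWeightMap = map of the invokers whose weight is > 0

int currentSequence = sequence.getAndIncrement();
if (maxWeight > 0 && minWeight < maxWeight) { // weights differ

    // mod < weightSum; the loops below decrement the weights, so providers with larger weights are selected more often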
    int mod = currentSequence % weightSum;
    for (int i = 0; i < maxWeight; i++) {
        for (Map.Entry<Invoker<T>, IntegerWrapper> each : invokerToWeightMap.entrySet()) {
            final Invoker<T> k = each.getKey();
            final IntegerWrapper v = each.getValue();
            if (mod == 0 && v.getValue() > 0) {
                return k;
            }
            if (v.getValue() > 0) {
                v.decrement();
                mod--;
            }
        }
    }
}

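An example
Two providers, A and B, with weights 1 and 2
Then mod takes the values 0, 1, 2 in turn, and the logic above yields the call sequence A B B A B B A B B ..... B clearly receives the larger share of calls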
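The A B B sequence can be reproduced with a stand-alone version of the same double loop. This is a sketch under stated assumptions: WeightedRoundRobinSketch and select are hypothetical names, providers are plain strings, the sequence number is assumed to be non-negative, and a LinkedHashMap is used so that A is visited before B.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone version of the weighted round-robin loop shown above.
class WeightedRoundRobinSketch {

    static String select(Map<String, Integer> weights, int sequence) {
        int weightSum = 0;
        int maxWeight = 0;
        // Fresh copy of the positive weights for this call, like invokerToWeightMap
        Map<String, Integer> remaining = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            int w = e.getValue();
            weightSum += w;
            maxWeight = Math.max(maxWeight, w);
            if (w > 0) {
                remaining.put(e.getKey(), w);
            }
        }
        if (weightSum == 0) {
            throw new IllegalStateException("no provider with positive weight");
        }
        int mod = sequence % weightSum;
        for (int i = 0; i < maxWeight; i++) {
            for (Map.Entry<String, Integer> e : remaining.entrySet()) {
                int left = e.getValue();
                if (mod == 0 && left > 0) {
                    return e.getKey();       // counter used up: this provider serves the call
                }
                if (left > 0) {
                    e.setValue(left - 1);    // consume one unit of this provider's weight
                    mod--;
                }
            }
        }
        throw new IllegalStateException("unreachable when mod < weightSum");
    }
}
```

For weights A=1 and B=2, sequence numbers 0 to 5 give A, B, B, A, B, B. The result depends on the iteration order of the map, which is why an insertion-ordered map is assumed here.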

3. LeastActiveLoadBalance

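Least active calls first. The active count is the difference between the counters before and after a call, that is, the number of calls in flight. Slow providers receive fewer requests, because the slower a provider is, the larger this difference grows.

Each provider has an active counter. Suppose there are two providers, A and B, both with a count of 0. When A starts processing a request its count is incremented by 1, and when it finishes the count is decremented by 1. Suppose A is still busy while B receives a request and handles it quickly. When B has finished and A has not, the counts of A and B are 1 and 0. A new request then goes to B, because B's active count is lower than A's. This is what the documentation means by slow providers receiving fewer requests.

int leastCount = 0; // number of invokers sharing the least active count
int[] leastIndexs = new int[length]; // indexes of the invokers sharing the least active count

i. Exactly one provider has the least active count: it is chosen

if (leastCount == 1) {
    // Only one least-active invoker: return it directly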
    return invokers.get(leastIndexs[0]);
}

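ii. More than one least-active provider, with differing weights: choose among them at random by weight

if (!sameWeight && totalWeight > 0) {
    // Weights differ and the total weight is > 0: draw a random number below the total weight
    int offsetWeight = random.nextInt(totalWeight);
    // and find the segment the random value falls into
    for (int i = 0; i < leastCount; i++) {
        int leastIndex = leastIndexs[i];
        // The larger the weight, the sooner offsetWeight drops to zero or below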
        offsetWeight -= getWeight(invokers.get(leastIndex), invocation);
        if (offsetWeight <= 0)
            return invokers.get(leastIndex);
    }
}

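iii. More than one least-active provider, with equal weights: choose uniformly at random among them

// Weights are equal or the total weight is 0: choose uniformly at random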
return invokers.get(leastIndexs[random.nextInt(leastCount)]);
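The A/B scenario described earlier (A still busy, B already finished) can be replayed with a small counter model. This is a simplified sketch: LeastActiveSketch, begin and end are hypothetical names, and ties are broken by the lowest index rather than by the weighted random choice used in the snippets above, so that the outcome is deterministic.

```java
// Hypothetical model of least-active selection using per-provider in-flight counters.
class LeastActiveSketch {

    private final int[] active; // calls currently in flight, per provider

    LeastActiveSketch(int providers) {
        this.active = new int[providers];
    }

    // Picks the provider with the fewest in-flight calls and marks a call as started.
    synchronized int begin() {
        int best = 0;
        for (int i = 1; i < active.length; i++) {
            if (active[i] < active[best]) {
                best = i; // strictly smaller, so ties keep the lowest index (a simplification)
            }
        }
        active[best]++;
        return best;
    }

    // Marks a call on the given provider as finished.
    synchronized void end(int provider) {
        active[provider]--;
    }
}
```

With two providers, the first call goes to provider 0 (A) and the second to provider 1 (B). If B's call ends while A's is still running, the counts are 1 and 0, and the next call goes to B again.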

4. ConsistentHashLoadBalance

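Consistent hashing: requests with the same arguments are always sent to the same provider.
When a provider goes down, the requests that were sent to it are spread over the other providers by means of virtual nodes, without causing drastic changes.
For the algorithm, see http://en.wikipedia.org/wiki/Consistent_hashing
By default only the first argument is hashed. To change this, configure <dubbo:parameter key="hash.arguments" value="0,1" />
By default 160 virtual nodes are used. To change this, configure <dubbo:parameter key="hash.nodes" value="320" />

Configuration example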

<dubbo:reference id="demoService" interface="com.youzan.dubbo.api.DemoService" loadbalance="consistenthash">
    <!-- by default only the first argument is hashed -->
    <dubbo:parameter key="hash.arguments" value="0,1" />
    <!-- by default 160 virtual nodes are used -->
    <dubbo:parameter key="hash.nodes" value="160" />
</dubbo:reference>

How the algorithm works

ConsistentHashLoadBalance maintains a map named selectors, with one entry for each service method that uses this algorithm:

key=invokers.get(0).getUrl().getServiceKey() + "." + invocation.getMethodName()
e.g. com.youzan.dubbo.api.DemoService.sayHello

#com.alibaba.dubbo.rpc.cluster.loadbalance.ConsistentHashLoadBalance

private final ConcurrentMap<String, ConsistentHashSelector<?>> selectors = new ConcurrentHashMap<String, ConsistentHashSelector<?>>();

@SuppressWarnings("unchecked")
@Override
protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
    String key = invokers.get(0).getUrl().getServiceKey() + "." + invocation.getMethodName();
    int identityHashCode = System.identityHashCode(invokers);

    // Get the ConsistentHashSelector for this service method, then use it to pick the invoker for this call
    ConsistentHashSelector<T> selector = (ConsistentHashSelector<T>) selectors.get(key);
    if (selector == null || selector.getIdentityHashCode() != identityHashCode) {
        selectors.put(key, new ConsistentHashSelector<T>(invokers, invocation.getMethodName(), identityHashCode));
        selector = (ConsistentHashSelector<T>) selectors.get(key);
    }
    return selector.select(invocation);
}

ConsistentHashSelector, an inner class of ConsistentHashLoadBalance, is the actual consistent hashing implementation.

Fields of ConsistentHashSelector

#com.alibaba.dubbo.rpc.cluster.loadbalance.ConsistentHashLoadBalance.ConsistentHashSelector

// All hash nodes of this service
private final TreeMap<Long, Invoker<T>> virtualInvokers;
// Number of virtual nodes
private final int replicaNumber;
// Identity hash code of the service's invoker list, obtained from System.identityHashCode(invokers)
private final int identityHashCode;

How are the virtual nodes of the service built?

public ConsistentHashSelector(List<Invoker<T>> invokers, String methodName, int identityHashCode) {
    // Create the TreeMap that holds the nodes
    this.virtualInvokers = new TreeMap<Long, Invoker<T>>();
    // Identity hash code of the invoker list
    this.identityHashCode = System.identityHashCode(invokers);
    // Get the URL, for example:
    //dubbo://192.168.0.4:20880/com.youzan.dubbo.api.DemoService?anyhost=true&application=consumer-of-helloworld-app&check=false&class=com.youzan.dubbo.provider.DemoServiceImpl&dubbo=2.5.4&generic=false&hash.arguments=0,1&hash.nodes=160&interface=com.youzan.dubbo.api.DemoService&loadbalance=consistenthash&methods=sayHello&pid=32710&side=consumer&timestamp=1527383363936
    URL url = invokers.get(0).getUrl();
    // Get the configured number of nodes; the default of 160 is used if it is not set
    this.replicaNumber = url.getMethodParameter(methodName, "hash.nodes", 160);
    // Get the indexes of the arguments to hash; by default the first argument is hashed
    String[] index = Constants.COMMA_SPLIT_PATTERN.split(url.getMethodParameter(methodName, "hash.arguments", "0"));
    argumentIndex = new int[index.length];
    for (int i = 0; i < index.length; i ++) {
        argumentIndex[i] = Integer.parseInt(index[i]);
    }
    // Create the virtual nodes:
    // replicaNumber virtual nodes are generated for each invoker and stored in the TreeMap
    for (Invoker<T> invoker : invokers) {
        for (int i = 0; i < replicaNumber / 4; i++) {
            // One MD5 digest is computed for every 4 nodes; the digest is 16 bytes (128 bits) long.
            byte[] digest = md5(invoker.getUrl().toFullString() + i);
            // The 128 bits are then split into 4 parts (bits 0-31, 32-63, 64-95, 96-127), giving four 32-bit numbers, each stored in a long whose upper 32 bits are 0
            // and used as the key of a virtual node.
            for (int h = 0; h < 4; h++) {
                long m = hash(digest, h);
                virtualInvokers.put(m, invoker);
            }
        }
    }
}

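If the code is hard to follow, there is no need to dig deeper (I did not fully understand it either; the comments are taken from the articles listed in the references). Roughly speaking, this code builds a hash table of the service's nodes that is as evenly distributed as possible.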
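The step of turning one 16-byte MD5 digest into four 32-bit keys can be tried out in isolation. The sketch below is an illustration of the idea in the comments above, not a copy of the Dubbo helpers: VirtualNodeHashSketch and ringKeys are hypothetical names, and the byte order used in hash (little-endian within each 4-byte group) and the UTF-8 encoding in md5 are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: derive four 32-bit ring keys from each MD5 digest.
class VirtualNodeHashSketch {

    // MD5 digest of a string: always 16 bytes (128 bits).
    static byte[] md5(String value) {
        try {
            return MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Packs bytes [4 * number, 4 * number + 3] of the digest into the low 32 bits of a long.
    static long hash(byte[] digest, int number) {
        return (((long) (digest[3 + number * 4] & 0xFF) << 24)
                | ((long) (digest[2 + number * 4] & 0xFF) << 16)
                | ((long) (digest[1 + number * 4] & 0xFF) << 8)
                | (digest[number * 4] & 0xFF))
                & 0xFFFFFFFFL;
    }

    // Ring keys for one provider address: one digest per group of 4 virtual nodes.
    static long[] ringKeys(String address, int replicaNumber) {
        long[] keys = new long[replicaNumber / 4 * 4];
        int n = 0;
        for (int i = 0; i < replicaNumber / 4; i++) {
            byte[] digest = md5(address + i);
            for (int h = 0; h < 4; h++) {
                keys[n++] = hash(digest, h);
            }
        }
        return keys;
    }
}
```

With replicaNumber = 160, each provider contributes 40 digests and therefore 160 keys, every one in the range 0 to 2^32 - 1, which spreads a single provider over many positions of the sorted map.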

How is the invoker for a call chosen from virtualInvokers?

// Select an invoker
public Invoker<T> select(Invocation invocation) {
    // Build the key from the call arguments
    String key = toKey(invocation.getArguments());
    // Compute the message digest of the key
    byte[] digest = md5(key);
    // hash(digest, 0) turns the digest into a hash code, using only bits 0-31 of the digest
    // sekectForKey then selects the node.
    Invoker<T> invoker = sekectForKey(hash(digest, 0));
    return invoker;
}

private String toKey(Object[] args) {
    StringBuilder buf = new StringBuilder();
    // When hash.arguments is not configured, only the first argument of the method is used as the key
    for (int i : argumentIndex) {
        if (i >= 0 && i < args.length) {
            buf.append(args[i]);
        }
    }
    return buf.toString();
}

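// Select the node for the given hash code
private Invoker<T> sekectForKey(long hash) {
    Invoker<T> invoker;
    Long key = hash;
    // If the hash code equals the key of a virtual node, that node is returned directly
    if (!virtualInvokers.containsKey(key)) {
        // Otherwise, find the first node whose key is greater than the given key.
        SortedMap<Long, Invoker<T>> tailMap = virtualInvokers.tailMap(key);
        // If there is none, wrap around and choose the first node in the TreeMap;
        // otherwise firstKey() of the tail map gives the least upper bound.
        if (tailMap.isEmpty()) {
            key = virtualInvokers.firstKey();
        } else {
           // A greater key exists: use it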
            key = tailMap.firstKey();
        }
    }
    invoker = virtualInvokers.get(key);
    return invoker;
}

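What is a consistent hash ring, and how does it relate to the algorithm above?

ConsistentHashSelector.virtualInvokers holds the hash nodes of our service. Looking at the data structure alone, there is indeed nothing ring-shaped about it. The current data structure is sketched first:
Figure: selector structure

Figure: virtualInvokers

The service nodes are stored in an ordinary sorted map, so how does a ring come about? The ring is only a logical view: ConsistentHashSelector.sekectForKey() implements the ring logic by combining TreeMap.tailMap(), tailMap().firstKey() and TreeMap.firstKey() with a case distinction. The steps below walk through it.

Step 1: the original data structure, with the hashes arranged in ascending order

A, B and C denote the providers of the service; the figure assumes the service nodes are hashed evenly

Step 2: selecting a service node

i. Suppose the current call yields key=2120. The code (in ConsistentHashSelector.sekectForKey) takes the tailMap.firstKey() branch

and reads node 3986, provider A

ii. Suppose the current call yields key=9991. tailMap is empty, so the code takes the virtualInvokers.firstKey() branch and wraps around to the start

and reads node 1579, provider A

These two cases essentially cover the node selection logic. When the hash hits a node key exactly, the corresponding provider is read directly, which needs no further explanation.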
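The two cases can be replayed with a small TreeMap. The node keys 1579 and 3986 (both provider A) come from the walkthrough above; the remaining keys and the names HashRingSketch, demoRing and selectForKey are made up for illustration. Because tailMap includes the given key itself, the exact-hit case needs no separate branch in this sketch.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the ring lookup; provider names stand in for invokers.
class HashRingSketch {

    // Small example ring. Only 1579 -> A and 3986 -> A are taken from the walkthrough;
    // the other entries are invented so that the map has several providers.
    static TreeMap<Long, String> demoRing() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(1579L, "A");
        ring.put(2056L, "B");
        ring.put(3986L, "A");
        ring.put(5500L, "C");
        ring.put(7421L, "B");
        ring.put(9130L, "C");
        return ring;
    }

    // Returns the provider at the first node whose key is >= hash, wrapping to the first node.
    static String selectForKey(TreeMap<Long, String> ring, long hash) {
        SortedMap<Long, String> tail = ring.tailMap(hash);              // nodes with key >= hash
        Long key = tail.isEmpty() ? ring.firstKey() : tail.firstKey();  // wrap around if none
        return ring.get(key);
    }
}
```

Looking up 2120 lands on node 3986 (A), and looking up 9991 finds an empty tail map and wraps around to node 1579 (A), matching the two cases described above.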

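Finally, forming the ring
The two steps above already describe the hash selection. So what is the ring about? It is simply a way to make the logic easier to picture: the linear sequence of hashes is bent into a ring, and a node is selected by moving clockwise from the hash (equivalent to the hash comparison above)
Figure: the ring
The node selection is equivalent to the one above; the figure is only illustrative. In an ideal hash ring the hashes would be equally spaced and evenly arranged.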

References:
https://blog.csdn.net/column/details/learningdubbo.html?&page=1
https://blog.csdn.net/revivedsun/article/details/71022871
https://www.jianshu.com/p/53feb7f5f5d9

青芒

引用和评论

What's new in dubbo-go v3.3.0

Dubbo 中的集群容错

Dubbo剖析-集群容错

一、调用链路

二、容错方案

集群模式的配置

集群容错实现

三、Directory目录服务

1. StaticDirectory

2. RegistryDirectory

四、服务路由

1. ConditionRouter条件路由

2. ScriptRouter脚本路由

五、软负载均衡

1. RandomLoadBalance default

2. RoundRobinLoadBalance

3. LeastActiveLoadBalance

4. ConsistentHashLoadBalance

配置样例

算法解析

青芒

引用和评论

What's new in dubbo-go v3.3.0

Dubbo 中的集群容错

1. RandomLoadBalance `default`