Spring Cloud Ribbon Source Code Analysis

The core role of Ribbon is client-side load balancing of requests. Its basic principle is shown in the figure below: the client integrates the Ribbon component, Ribbon runs a load-balancing calculation over the configured list of service provider addresses, and once a target address is obtained, the request is sent to it.

image-20211118135001876

Next, we analyze how Ribbon works from two angles:

  1. How the @LoadBalanced annotation gives an ordinary RestTemplate load-balancing capability
  2. How OpenFeign integrates Ribbon

@LoadBalanced annotation analysis

When using a RestTemplate, we add the @LoadBalanced annotation so that this RestTemplate performs client-side load balancing when it sends requests.

@Bean
@LoadBalanced
RestTemplate restTemplate() {
    return new RestTemplate();
}

If we open the @LoadBalanced annotation, we find that it merely declares a @Qualifier annotation.

@Target({ ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD })
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Inherited
@Qualifier
public @interface LoadBalanced {

}

The role of the @Qualifier annotation

We usually use @Autowired to inject a Bean, and @Autowired can also inject a List or Map of beans. Here is an example (in a Spring Boot application).

Define a TestClass
@AllArgsConstructor
@Data
public class TestClass {
    private String name;
}
Declare a configuration class that registers two TestClass beans
@Configuration
public class TestConfig {

    @Bean("testClass1")
    TestClass testClass(){
        return new TestClass("testClass1");
    }

    @Bean("testClass2")
    TestClass testClass2(){
        return new TestClass("testClass2");
    }
}
Define a Controller for testing. Note that here we use @Autowired to inject a List of TestClass beans
@RestController
public class TestController {

    @Autowired(required = false)
    List<TestClass> testClasses= Collections.emptyList();

    @GetMapping("/test")
    public Object test(){
        return testClasses;
    }
}
Visiting http://localhost:8080/test at this point returns:
[
    {
        name: "testClass1"
    },
    {
        name: "testClass2"
    }
]
Modify TestConfig and TestController: add @Qualifier to the testClass1 bean and to the injected List
@Configuration
public class TestConfig {

    @Bean("testClass1")
    @Qualifier
    TestClass testClass(){
        return new TestClass("testClass1");
    }

    @Bean("testClass2")
    TestClass testClass2(){
        return new TestClass("testClass2");
    }
}
@RestController
public class TestController {

    @Autowired(required = false)
    @Qualifier
    List<TestClass> testClasses= Collections.emptyList();

    @GetMapping("/test")
    public Object test(){
        return testClasses;
    }
}
Visiting http://localhost:8080/test again returns only the bean marked with @Qualifier:
[
    {
        name: "testClass1"
    }
]

@LoadBalanced annotation filtering and interception

With the role of @Qualifier understood, the @LoadBalanced annotation is no longer difficult to understand.

We need to collect exactly those RestTemplate instances that carry @LoadBalanced, and because @LoadBalanced is itself a @Qualifier, it can do this. The specific implementation code is as follows:

@Configuration(proxyBeanMethods = false)
@ConditionalOnClass(RestTemplate.class)
@ConditionalOnBean(LoadBalancerClient.class)
@EnableConfigurationProperties(LoadBalancerProperties.class)
public class LoadBalancerAutoConfiguration {

   @LoadBalanced
   @Autowired(required = false)
   private List<RestTemplate> restTemplates = Collections.emptyList();
}

It can be seen from this code that the configuration class LoadBalancerAutoConfiguration uses the same mechanism to inject every RestTemplate annotated with @LoadBalanced into the restTemplates collection.

After the RestTemplate instances are collected, they are given an interceptor in the LoadBalancerInterceptorConfig configuration class. The implementation code is as follows:

@Configuration(proxyBeanMethods = false)
@ConditionalOnClass(RestTemplate.class)
@ConditionalOnBean(LoadBalancerClient.class)
@EnableConfigurationProperties(LoadBalancerProperties.class)
public class LoadBalancerAutoConfiguration {

    @LoadBalanced
    @Autowired(required = false)
    private List<RestTemplate> restTemplates = Collections.emptyList();

    // omitted ...

    @Bean
    @ConditionalOnMissingBean
    public LoadBalancerRequestFactory loadBalancerRequestFactory(LoadBalancerClient loadBalancerClient) {
        return new LoadBalancerRequestFactory(loadBalancerClient, this.transformers);
    }

    @Configuration(proxyBeanMethods = false)
    @Conditional(RetryMissingOrDisabledCondition.class)
    static class LoadBalancerInterceptorConfig {
        
        // Register a LoadBalancerInterceptor instance in the IoC container.
        @Bean
        public LoadBalancerInterceptor loadBalancerInterceptor(LoadBalancerClient loadBalancerClient,
                LoadBalancerRequestFactory requestFactory) {
            return new LoadBalancerInterceptor(loadBalancerClient, requestFactory);
        }
        
        // Applied to every RestTemplate annotated with @LoadBalanced: adds a LoadBalancerInterceptor on top of the existing interceptors.
        @Bean
        @ConditionalOnMissingBean
        public RestTemplateCustomizer restTemplateCustomizer(final LoadBalancerInterceptor loadBalancerInterceptor) {
            return restTemplate -> {
                List<ClientHttpRequestInterceptor> list = new ArrayList<>(restTemplate.getInterceptors());
                list.add(loadBalancerInterceptor);
                restTemplate.setInterceptors(list);
            };
        }

    }
    // omitted ...
}

RestTemplate calling process

Suppose our program initiates a remote request with the following code:

restTemplate.getForObject(url,String.class);

Its entire calling process is as follows.

RestTemplate.getForObject()

 -----> AbstractClientHttpRequest.execute()

 -----> AbstractBufferingClientHttpRequest.executeInternal()

 -----> InterceptingClientHttpRequest.executeInternal()

 -----> InterceptingClientHttpRequest.InterceptingRequestExecution.execute()

The code of the execute() method, which lives in the inner class InterceptingRequestExecution of InterceptingClientHttpRequest, is as follows.
@Override
public ClientHttpResponse execute(HttpRequest request, byte[] body) throws IOException {
    if (this.iterator.hasNext()) { // Iterate over the interceptors and let each one process the request in turn.
        ClientHttpRequestInterceptor nextInterceptor = this.iterator.next();
        return nextInterceptor.intercept(request, body, this);
    }
    else {
        HttpMethod method = request.getMethod();
        Assert.state(method != null, "No standard HTTP method");
        ClientHttpRequest delegate = requestFactory.createRequest(request.getURI(), method);
        request.getHeaders().forEach((key, value) -> delegate.getHeaders().addAll(key, value));
        if (body.length > 0) {
            if (delegate instanceof StreamingHttpOutputMessage) {
                StreamingHttpOutputMessage streamingOutputMessage = (StreamingHttpOutputMessage) delegate;
                streamingOutputMessage.setBody(outputStream -> StreamUtils.copy(body, outputStream));
            }
            else {
                StreamUtils.copy(body, delegate.getBody());
            }
        }
        return delegate.execute();
    }
}
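
The way each interceptor hands control back to execute() can be hard to see in the real code. The following is a minimal sketch of the same chain pattern in plain Java. The Interceptor and Execution interfaces, the ChainExecution class, and the string-based "request" are hypothetical stand-ins for ClientHttpRequestInterceptor and ClientHttpRequestExecution, not Spring's actual types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class InterceptorChainDemo {

    // Hypothetical stand-in for ClientHttpRequestInterceptor.
    interface Interceptor {
        String intercept(String request, Execution execution);
    }

    // Hypothetical stand-in for ClientHttpRequestExecution.
    interface Execution {
        String execute(String request);
    }

    static class ChainExecution implements Execution {
        private final Iterator<Interceptor> iterator;

        ChainExecution(List<Interceptor> interceptors) {
            this.iterator = interceptors.iterator();
        }

        @Override
        public String execute(String request) {
            if (iterator.hasNext()) {
                // Hand the request to the next interceptor; it calls execute() again to continue the chain.
                return iterator.next().intercept(request, this);
            }
            // No interceptors left: this is where the real HTTP call would be made.
            return "sent:" + request;
        }
    }

    static String run(String request) {
        List<Interceptor> interceptors = new ArrayList<>();
        interceptors.add((req, exec) -> exec.execute(req + "|logging"));
        interceptors.add((req, exec) -> exec.execute(req + "|loadbalanced"));
        return new ChainExecution(interceptors).execute(request);
    }

    public static void main(String[] args) {
        System.out.println(run("GET /goods")); // sent:GET /goods|logging|loadbalanced
    }
}
```

Each interceptor sees the request before the next one does, and the final "send" only happens once the iterator is exhausted, which is why adding LoadBalancerInterceptor to the list is enough to put it in the request path.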

LoadBalancerInterceptor

LoadBalancerInterceptor is an interceptor. When a RestTemplate object annotated with @LoadBalanced initiates an HTTP request, the request is intercepted by the intercept method of LoadBalancerInterceptor.

This method calls getHost to obtain the service name: when we call a service through RestTemplate we put the service name, not a domain name, in the host position of the URL, so getHost returns the service name directly. It then calls the execute method to send the request.

@Override
public ClientHttpResponse intercept(final HttpRequest request, final byte[] body,
      final ClientHttpRequestExecution execution) throws IOException {
   final URI originalUri = request.getURI();
   String serviceName = originalUri.getHost();
   Assert.state(serviceName != null, "Request URI does not contain a valid hostname: " + originalUri);
   return this.loadBalancer.execute(serviceName, this.requestFactory.createRequest(request, body, execution));
}
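
To make the role of getHost concrete, here is a minimal sketch that uses only java.net.URI. The service name goods-service, the path, and the chosen instance localhost:9091 are illustrative assumptions, and the reconstruct helper is hypothetical: it only imitates the kind of URI rewrite a load balancer client performs after choosing a server, and is not Ribbon's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ServiceNameDemo {

    // Extracts the logical service name, which sits in the host position of the URI.
    static String serviceName(URI original) {
        String host = original.getHost();
        if (host == null) {
            throw new IllegalStateException("Request URI does not contain a valid hostname: " + original);
        }
        return host;
    }

    // Hypothetical helper: replaces the service name with a concrete host and port.
    static URI reconstruct(URI original, String host, int port) {
        try {
            return new URI(original.getScheme(), original.getUserInfo(), host, port,
                    original.getPath(), original.getQuery(), original.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI original = URI.create("http://goods-service/goods/1");
        System.out.println(serviceName(original));                    // goods-service
        System.out.println(reconstruct(original, "localhost", 9091)); // http://localhost:9091/goods/1
    }
}
```

The null check mirrors the Assert.state call in intercept: if the host position cannot be parsed as a hostname, getHost returns null and the request cannot be routed.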

LoadBalancerClient is actually an interface. Let's take a look at its class diagram. It has only one implementation class: RibbonLoadBalancerClient.

image-20211211152356718

RibbonLoadBalancerClient.execute

The RibbonLoadBalancerClient class is relatively long, so we focus on its core method, execute.

public <T> T execute(String serviceId, LoadBalancerRequest<T> request, Object hint)
    throws IOException {
    ILoadBalancer loadBalancer = getLoadBalancer(serviceId);
    Server server = getServer(loadBalancer, hint);
    if (server == null) {
        throw new IllegalStateException("No instances available for " + serviceId);
    }
    RibbonServer ribbonServer = new RibbonServer(serviceId, server,
                                                 isSecure(server, serviceId),
                                                 serverIntrospector(serviceId).getMetadata(server));

    return execute(serviceId, ribbonServer, request);
}

The implementation logic of the above code is as follows:

  • Obtain an ILoadBalancer for the serviceId, for example a ZoneAwareLoadBalancer
  • Call the getServer method to obtain a service instance
  • Check whether the returned Server is null. A Server represents a single service node and stores metadata about it, such as host and port.
  • Wrap the Server in a RibbonServer and call the overloaded execute method to send the request

getServer

getServer is used to obtain a specific service node, and its implementation is as follows

protected Server getServer(ILoadBalancer loadBalancer, Object hint) {
    if (loadBalancer == null) {
        return null;
    }
    // Use 'default' on a null hint, or just pass it on?
    return loadBalancer.chooseServer(hint != null ? hint : "default");
}

As you can see from the code, getServer actually calls ILoadBalancer.chooseServer. ILoadBalancer is the load balancer interface.

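public interface ILoadBalancer {
    // addServers adds service instances to the instance list maintained by the load balancer
    public void addServers(List<Server> newServers);
    // chooseServer selects a specific service instance from the load balancer according to some strategy
    public Server chooseServer(Object key);
    // markServerDown tells the load balancer that a specific instance has stopped serving; otherwise the load balancer keeps treating it as healthy until the instance list is next refreshed
    public void markServerDown(Server server);
    // getReachableServers returns the service instances that are currently working normally
    public List<Server> getReachableServers();
    // getAllServers returns all service instances, both healthy and stopped
    public List<Server> getAllServers();
}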

The class diagram of ILoadBalancer is as follows:

image-20211211153617850

Judging from the class diagram, BaseLoadBalancer implements basic load balancing, while DynamicServerListLoadBalancer and ZoneAwareLoadBalancer are functional extensions built on top of it.

  • AbstractLoadBalancer implements the ILoadBalancer interface. It defines the ServerGroup enumeration of server groups, chooseServer (used to select a service instance), getServerList (gets all service instances in a given group), and getLoadBalancerStats, which returns a LoadBalancerStats object holding status information about each service instance.
  • BaseLoadBalancer implements the basic functions of a load balancer, such as maintaining the server list, monitoring server liveness, and selecting a Server through a load balancing algorithm. It only covers these basics and cannot handle some more complex scenarios, such as a dynamic server list, server filtering, and zone awareness (calls between services should stay in the same zone as much as possible to reduce latency).
  • DynamicServerListLoadBalancer is a subclass of BaseLoadBalancer that extends basic load balancing. As the name suggests, it adds a dynamically updated server list.
  • ZoneAwareLoadBalancer extends DynamicServerListLoadBalancer and adds the ability to maintain a separate load balancer per zone.

So in the getServer method, which concrete implementation class does loadBalancer.chooseServer resolve to? The answer is in the RibbonClientConfiguration class:

@Bean
@ConditionalOnMissingBean
public ILoadBalancer ribbonLoadBalancer(IClientConfig config,
                                        ServerList<Server> serverList, ServerListFilter<Server> serverListFilter,
                                        IRule rule, IPing ping, ServerListUpdater serverListUpdater) {
    if (this.propertiesFactory.isSet(ILoadBalancer.class, name)) {
        return this.propertiesFactory.get(ILoadBalancer.class, config, name);
    }
    return new ZoneAwareLoadBalancer<>(config, rule, ping, serverList,
                                       serverListFilter, serverListUpdater);
}

From this bean definition we can see that if no custom ILoadBalancer is configured, a ZoneAwareLoadBalancer is returned.

ZoneAwareLoadBalancer

A zone is a region, that is, a geographical area. Large Internet companies generally deploy across regions, which has two main advantages: users in different regions can be served by the nearest access node, which reduces latency, and the deployment gains high availability and disaster recovery.

ZoneAwareLoadBalancer is a zone-aware load balancer. Its main function is to be aware of zones and keep the load balancing strategy of each zone isolated. It does not guarantee that requests from zone A are sent to servers in zone A; that requirement is fulfilled by ZonePreferenceServerListFilter/ZoneAffinityServerListFilter.

The core logic of ZoneAwareLoadBalancer is:

  • If zone awareness is enabled and the number of zones > 1, continue with the zone selection logic
  • Get the available zones through ZoneAvoidanceRule.getAvailableZones() (this removes completely unavailable zones as well as the zone that is available but has the highest load)
  • From the available zones, pick one at random through ZoneAvoidanceRule.randomChooseZone (the random pick is weighted: the more servers a zone has, the more likely it is to be selected)
  • Within the selected zone, choose a server using that zone's load balancer and its Rule
@Override
public Server chooseServer(Object key) {
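    // ENABLED indicates whether zone-aware selection is used to choose a Server; the default is true.
    // If zone awareness is disabled, or there is only one zone, delegate to the parent class logic, which uses round-robin by default.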
    if (!ENABLED.get() || getLoadBalancerStats().getAvailableZones().size() <= 1) {
        logger.debug("Zone aware logic disabled or there is only one zone");
        return super.chooseServer(key);
    }
    Server server = null;
    try {
        LoadBalancerStats lbStats = getLoadBalancerStats();
        Map<String, ZoneSnapshot> zoneSnapshot = ZoneAvoidanceRule.createSnapshot(lbStats);
        logger.debug("Zone snapshots: {}", zoneSnapshot);
        if (triggeringLoad == null) {
            triggeringLoad = DynamicPropertyFactory.getInstance().getDoubleProperty(
                "ZoneAwareNIWSDiscoveryLoadBalancer." + this.getName() + ".triggeringLoadPerServerThreshold", 0.2d);
        }

        if (triggeringBlackoutPercentage == null) {
            triggeringBlackoutPercentage = DynamicPropertyFactory.getInstance().getDoubleProperty(
                "ZoneAwareNIWSDiscoveryLoadBalancer." + this.getName() + ".avoidZoneWithBlackoutPercetage", 0.99999d);
        }
        // Compute the available zones according to the thresholds
        Set<String> availableZones = ZoneAvoidanceRule.getAvailableZones(zoneSnapshot, triggeringLoad.get(), triggeringBlackoutPercentage.get());
        logger.debug("Available zones: {}", availableZones);
        if (availableZones != null &&  availableZones.size() < zoneSnapshot.keySet().size()) {
            // Randomly pick one of the available zones; the more server nodes a zone has, the more likely it is to be picked
            String zone = ZoneAvoidanceRule.randomChooseZone(zoneSnapshot, availableZones);
            logger.debug("Zone chosen: {}", zone);
            if (zone != null) {
                // Get the load balancer of the chosen zone, then select a server with that zone's load balancing algorithm
                BaseLoadBalancer zoneLoadBalancer = getLoadBalancer(zone);
                server = zoneLoadBalancer.chooseServer(key);
            }
        }
    } catch (Exception e) {
        logger.error("Error choosing server using zone aware logic for load balancer={}", name, e);
    }
    if (server != null) {
        return server;
    } else {
        logger.debug("Zone avoidance logic is not invoked.");
        return super.chooseServer(key);
    }
}
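
The weighted random zone choice mentioned in the list above can be sketched in a few lines. This is a simplified illustration, not the ZoneAvoidanceRule source: the class WeightedZoneChooser and its method names are hypothetical, and a plain Map of zone name to instance count stands in for the ZoneSnapshot map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class WeightedZoneChooser {

    // Picks the zone whose cumulative instance count first reaches the given 1-based index.
    static String chooseByIndex(Map<String, Integer> instanceCounts, int index) {
        int sum = 0;
        String selected = null;
        for (Map.Entry<String, Integer> entry : instanceCounts.entrySet()) {
            selected = entry.getKey();
            sum += entry.getValue();
            if (index <= sum) {
                break;
            }
        }
        return selected;
    }

    // Draws an index in [1, total], so a zone's chance is proportional to its instance count.
    static String chooseRandomly(Map<String, Integer> instanceCounts, Random random) {
        int total = 0;
        for (int count : instanceCounts.values()) {
            total += count;
        }
        if (total == 0) {
            return null;
        }
        return chooseByIndex(instanceCounts, random.nextInt(total) + 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> zones = new LinkedHashMap<>();
        zones.put("zone-a", 3);
        zones.put("zone-b", 1);
        System.out.println(chooseByIndex(zones, 3)); // zone-a
        System.out.println(chooseByIndex(zones, 4)); // zone-b
        System.out.println(chooseRandomly(zones, new Random()));
    }
}
```

With three instances in zone-a and one in zone-b, indexes 1 to 3 map to zone-a and index 4 maps to zone-b, so zone-a is chosen roughly three times out of four.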

BaseLoadBalancer.chooseServer

Assuming we are not using a multi-zone deployment, the selection falls through to BaseLoadBalancer.chooseServer:

public Server chooseServer(Object key) {
    if (counter == null) {
        counter = createCounter();
    }
    counter.increment();
    if (rule == null) {
        return null;
    } else {
        try {
            return rule.choose(key);
        } catch (Exception e) {
            logger.warn("LoadBalancer [{}]:  Error choosing server for key {}", name, key, e);
            return null;
        }
    }
}

The target service node is obtained according to the configured load balancing rule. The default algorithm of BaseLoadBalancer is round-robin (RoundRobinRule).

rule.choose

rule is the load balancing algorithm rule, and it has many implementations. The class diagram of the IRule implementations is as follows.

image-20211211155112400

By default, the rule implementation is ZoneAvoidanceRule. It is defined in the RibbonClientConfiguration configuration class, as follows:

@Configuration(proxyBeanMethods = false)
@EnableConfigurationProperties
// Order is important here, last should be the default, first should be optional
// see
// https://github.com/spring-cloud/spring-cloud-netflix/issues/2086#issuecomment-316281653
@Import({ HttpClientConfiguration.class, OkHttpRibbonConfiguration.class,
        RestClientRibbonConfiguration.class, HttpClientRibbonConfiguration.class })
public class RibbonClientConfiguration {
    @Bean
    @ConditionalOnMissingBean
    public IRule ribbonRule(IClientConfig config) {
        if (this.propertiesFactory.isSet(IRule.class, name)) {
            return this.propertiesFactory.get(IRule.class, config, name);
        }
        ZoneAvoidanceRule rule = new ZoneAvoidanceRule();
        rule.initWithNiwsConfig(config);
        return rule;
    }
}

So when BaseLoadBalancer.chooseServer calls rule.choose(key), it actually enters the choose method of ZoneAvoidanceRule (inherited from its parent class PredicateBasedRule):

@Override
public Server choose(Object key) {
    ILoadBalancer lb = getLoadBalancer(); // get the load balancer
    Optional<Server> server = getPredicate().chooseRoundRobinAfterFiltering(lb.getAllServers(), key); // select the target server through this method
    if (server.isPresent()) {
        return server.get();
    } else {
        return null;
    }       
}
ZoneAvoidanceRule makes a compound judgment based on the performance of the zone where a server is located and the availability of the server itself.

The method to analyze here is chooseRoundRobinAfterFiltering.

chooseRoundRobinAfterFiltering

As the method name suggests, it first filters the target service cluster through the predicates and then applies round-robin to the remaining servers.

public Optional<Server> chooseRoundRobinAfterFiltering(List<Server> servers, Object loadBalancerKey) {
    List<Server> eligible = getEligibleServers(servers, loadBalancerKey);
    if (eligible.size() == 0) {
        return Optional.absent();
    }
    return Optional.of(eligible.get(incrementAndGetModulo(eligible.size())));
}
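
The snippet above relies on incrementAndGetModulo, which is not shown. A minimal sketch of how such a thread-safe round-robin index could work is given below. The class RoundRobinDemo is hypothetical and the CAS loop is an assumption about the approach, written to match the method name rather than copied from Ribbon.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinDemo {
    private final AtomicInteger nextIndex = new AtomicInteger();

    // Returns the current index and advances the shared counter with a CAS loop, wrapping at modulo.
    int incrementAndGetModulo(int modulo) {
        for (;;) {
            int current = nextIndex.get();
            int next = (current + 1) % modulo;
            // Retry if another thread moved the counter, or if the list shrank below the stored index.
            if (nextIndex.compareAndSet(current, next) && current < modulo) {
                return current;
            }
        }
    }

    // Picks the next server from the already filtered list, or null if nothing is eligible.
    String choose(List<String> eligible) {
        if (eligible.isEmpty()) {
            return null;
        }
        return eligible.get(incrementAndGetModulo(eligible.size()));
    }

    public static void main(String[] args) {
        RoundRobinDemo demo = new RoundRobinDemo();
        List<String> servers = Arrays.asList("localhost:9091", "localhost:9081");
        System.out.println(demo.choose(servers)); // localhost:9091
        System.out.println(demo.choose(servers)); // localhost:9081
        System.out.println(demo.choose(servers)); // localhost:9091
    }
}
```

Because the index is taken modulo the size of the filtered list, the rotation adapts automatically when filtering removes servers between calls.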

CompositePredicate.getEligibleServers

Filter all instances with the primary predicate and return the filtered list:

@Override
public List<Server> getEligibleServers(List<Server> servers, Object loadBalancerKey) {
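    // Filter with the primary predicate first
    List<Server> result = super.getEligibleServers(servers, loadBalancerKey);
    
    // If too few servers remain, apply the predicates stored in fallbacks in order (for ZoneAvoidanceRule: AvailabilityPredicate alone, then an always-true predicate)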
    Iterator<AbstractServerPredicate> i = fallbacks.iterator();
    while (!(result.size() >= minimalFilteredServers && result.size() > (int) (servers.size() * minimalFilteredPercentage))
           && i.hasNext()) {
        AbstractServerPredicate predicate = i.next();
        result = predicate.getEligibleServers(servers, loadBalancerKey);
    }
    return result;
}

If the primary filter leaves too few instances, the fallback (secondary) predicates are tried in turn, each against the full server list.

Whether the result comes from the primary predicate or from a fallback, the following two conditions are checked. Once both are met, no further filtering is done and the current result is returned for round-robin selection:

    • The first condition: the number of filtered instances >= the minimum number of filtered instances (default is 1)
    • The second condition: the number of filtered instances > total number of instances * the minimum filter percentage (default is 0)
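
The fallback behaviour is easier to follow with concrete input. The sketch below reproduces the loop condition from getEligibleServers using java.util.function.Predicate over plain strings. The class FallbackFilterDemo, its method names, and the server addresses are hypothetical; only the loop condition is taken from the code shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class FallbackFilterDemo {

    static List<String> filter(List<String> servers, Predicate<String> predicate) {
        List<String> result = new ArrayList<>();
        for (String server : servers) {
            if (predicate.test(server)) {
                result.add(server);
            }
        }
        return result;
    }

    static List<String> eligible(List<String> servers, Predicate<String> primary,
                                 List<Predicate<String>> fallbacks,
                                 int minimalFilteredServers, float minimalFilteredPercentage) {
        List<String> result = filter(servers, primary);
        Iterator<Predicate<String>> i = fallbacks.iterator();
        // Keep falling back while the result is too small and fallbacks remain.
        while (!(result.size() >= minimalFilteredServers
                && result.size() > (int) (servers.size() * minimalFilteredPercentage))
                && i.hasNext()) {
            // Each fallback is applied to the full server list, not to the previous result.
            result = filter(servers, i.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("localhost:9091", "localhost:9081");
        Predicate<String> rejectAll = s -> false;
        Predicate<String> only9091 = s -> s.endsWith(":9091");
        Predicate<String> alwaysTrue = s -> true;
        List<Predicate<String>> fallbacks = Arrays.asList(alwaysTrue);

        System.out.println(eligible(servers, rejectAll, fallbacks, 1, 0f)); // [localhost:9091, localhost:9081]
        System.out.println(eligible(servers, only9091, fallbacks, 1, 0f));  // [localhost:9091]
    }
}
```

When the primary predicate rejects every server, the always-true fallback restores the full list, so the caller still gets a server instead of an empty result. When the primary predicate keeps at least one server, the defaults (1 and 0) are already satisfied and no fallback runs.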

getEligibleServers

The implementation traverses the server list and calls this.apply on each server; the servers that pass the check are added to the result list.

public List<Server> getEligibleServers(List<Server> servers, Object loadBalancerKey) {
    if (loadBalancerKey == null) {
        return ImmutableList.copyOf(Iterables.filter(servers, this.getServerOnlyPredicate()));            
    } else {
        List<Server> results = Lists.newArrayList();
        for (Server server: servers) {
            if (this.apply(new PredicateKey(loadBalancerKey, server))) {
                results.add(server);
            }
        }
        return results;            
    }
}

The call to this.apply enters the CompositePredicate.apply method; the code is as follows.

//CompositePredicate.apply

@Override
public boolean apply(@Nullable PredicateKey input) {
    return delegate.apply(input);
}

The delegate is an AbstractServerPredicate instance created by the following code:

public static AbstractServerPredicate ofKeyPredicate(final Predicate<PredicateKey> p) {
    return new AbstractServerPredicate() {
        @Override
        @edu.umd.cs.findbugs.annotations.SuppressWarnings(value = "NP")
            public boolean apply(PredicateKey input) {
            return p.apply(input);
        }            
    };        
}

In other words, the apply method of this AbstractServerPredicate is invoked, where input wraps a specific node of the target server cluster.

Here, p is an AndPredicate instance. A composite predicate is used for the judgment, and the components are combined with a logical AND, which is implemented by AndPredicate.

 private static class AndPredicate<T> implements Predicate<T>, Serializable {
        private final List<? extends Predicate<? super T>> components;
        private static final long serialVersionUID = 0L;

        private AndPredicate(List<? extends Predicate<? super T>> components) {
            this.components = components;
        }

        public boolean apply(@Nullable T t) {
            for(int i = 0; i < this.components.size(); ++i) { // Iterate over the predicates and evaluate them one by one.
                if (!((Predicate)this.components.get(i)).apply(t)) {
                    return false;
                }
            }

            return true;
        }
 }

In the above code, components consists of two predicates:

  1. AvailabilityPredicate, which filters out servers whose circuit breaker is tripped and servers with too many concurrent connections.
  2. ZoneAvoidancePredicate, which filters out nodes in unavailable zones.

Therefore, the apply method of AndPredicate iterates over these two predicates and evaluates them one by one.
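
As a small illustration of this AND combination, the sketch below composes two predicates with java.util.function.Predicate. The ServerState class and its two boolean fields are hypothetical simplifications of what AvailabilityPredicate and ZoneAvoidancePredicate inspect; this is not the Guava or Ribbon code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class AndPredicateDemo {

    // Hypothetical, simplified view of the state the two predicates look at.
    static class ServerState {
        final boolean circuitTripped;
        final boolean zoneAvailable;

        ServerState(boolean circuitTripped, boolean zoneAvailable) {
            this.circuitTripped = circuitTripped;
            this.zoneAvailable = zoneAvailable;
        }
    }

    // Returns false as soon as one component rejects the input, true if all accept it.
    static <T> boolean applyAll(List<Predicate<T>> components, T input) {
        for (Predicate<T> component : components) {
            if (!component.test(input)) {
                return false;
            }
        }
        return true;
    }

    static boolean eligible(ServerState state) {
        Predicate<ServerState> availability = s -> !s.circuitTripped;
        Predicate<ServerState> zoneAvoidance = s -> s.zoneAvailable;
        return applyAll(Arrays.asList(availability, zoneAvoidance), state);
    }

    public static void main(String[] args) {
        System.out.println(eligible(new ServerState(false, true)));  // true
        System.out.println(eligible(new ServerState(true, true)));   // false
        System.out.println(eligible(new ServerState(false, false))); // false
    }
}
```

A server is kept only if every component accepts it, so a tripped circuit breaker or an unavailable zone is each sufficient on its own to exclude the server.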

AvailabilityPredicate

It filters out servers whose circuit breaker is tripped and servers with too many concurrent connections. The code is as follows:

@Override
public boolean apply(@Nullable PredicateKey input) {
    LoadBalancerStats stats = getLBStats();
    if (stats == null) {
        return true;
    }
    return !shouldSkipServer(stats.getSingleServerStat(input.getServer()));
}

The logic that decides whether to skip the target node is as follows.

private boolean shouldSkipServer(ServerStats stats) {  
    // CIRCUIT_BREAKER_FILTERING: whether niws.loadbalancer.availabilityFilteringRule.filterCircuitTripped is true
    if ((CIRCUIT_BREAKER_FILTERING.get() && stats.isCircuitBreakerTripped()) // whether the Server's circuit breaker is tripped
        || stats.getActiveRequestsCount() >= activeConnectionsLimit.get()) { // whether the number of unfinished requests from this client to the Server has reached the per-Server active connection limit
        return true;
    }
    return false;
}

How do we judge whether a Server's circuit breaker is tripped?

The logic is in ServerStats. We will not post the detailed source code here; the mechanism is as follows:

Circuit breaking is decided by time. The time of the last failure is recorded on every failure. When a failure occurs, a check is triggered: if the number of consecutive failures has reached the circuit breaker's minimum failure count, then:

  1. Calculate the trip duration: (2^number of failures) * trip timeout factor; if it exceeds the maximum trip duration, use the maximum.
  2. Check whether the current time is later than the last failure time + trip duration; if it is earlier, the Server is in the tripped state.

Three configuration properties are involved (replace default with the name of the microservice you call):

  • niws.loadbalancer.default.connectionFailureCountThreshold, default 3: the minimum number of consecutive failures before the circuit breaker check applies, that is, by default the check starts after three failures.
  • niws.loadbalancer.default.circuitTripTimeoutFactorSeconds, default 10: the trip timeout factor.
  • niws.loadbalancer.default.circuitTripMaxTimeoutSeconds, default 30: the maximum trip duration.
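
The arithmetic described above can be checked with a short sketch. It follows the formula exactly as stated in the text (2 to the power of the failure count, times the factor, capped at the maximum). The class CircuitBreakerDemo and its method names are hypothetical, the cap on the exponent is an added assumption to avoid overflow, and the real ServerStats may compute the exponent differently, for example relative to the threshold.

```java
class CircuitBreakerDemo {

    // Trip duration in seconds: 0 below the threshold, otherwise 2^failureCount * factor, capped at maxSeconds.
    static long blackoutSeconds(int failureCount, int threshold, int factorSeconds, int maxSeconds) {
        if (failureCount < threshold) {
            return 0;
        }
        int exponent = Math.min(failureCount, 16); // assumption: cap the shift so the value cannot overflow
        long seconds = (1L << exponent) * factorSeconds;
        return Math.min(seconds, maxSeconds);
    }

    // The server counts as tripped while the current time is still inside the trip window.
    static boolean isTripped(long nowMillis, long lastFailureMillis, long blackoutSeconds) {
        return blackoutSeconds > 0 && nowMillis < lastFailureMillis + blackoutSeconds * 1000L;
    }

    public static void main(String[] args) {
        // Defaults from the text: threshold 3, factor 10 seconds, maximum 30 seconds.
        System.out.println(blackoutSeconds(2, 3, 10, 30)); // 0
        System.out.println(blackoutSeconds(3, 3, 10, 30)); // 30
        long blackout = blackoutSeconds(3, 3, 10, 30);
        System.out.println(isTripped(10_000L, 0L, blackout)); // true
        System.out.println(isTripped(31_000L, 0L, blackout)); // false
    }
}
```

With the default values, 2^3 * 10 = 80 seconds already exceeds the 30 second maximum, so under this formula the trip duration is effectively always 30 seconds once the threshold is reached.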

ZoneAvoidancePredicate

ZoneAvoidancePredicate filters out nodes in unavailable zones. The code is as follows:

@Override
public boolean apply(@Nullable PredicateKey input) {
    if (!ENABLED.get()) { // check whether niws.loadbalancer.zoneAvoidanceRule.enabled is true (the default); if false, zone filtering is off and nothing is filtered
        return true;
    }
    // get the zone configured for the server (UNKNOWN by default)
    String serverZone = input.getServer().getZone();
    if (serverZone == null) { // no zone information, so no filtering is needed
        // there is no zone information from the server, we do not want to filter
        // out this server
        return true;
    }
    // get the load balancer statistics
    LoadBalancerStats lbStats = getLBStats();
    if (lbStats == null) {
        // no stats available, do not filter
        return true;
    }
    // if there is at most one available zone, no filtering is needed either
    if (lbStats.getAvailableZones().size() <= 1) {
        // only one zone is available, do not filter
        return true;
    }
    // take a zone snapshot of the current load information; later calculations use the snapshot so that concurrent changes do not make them inaccurate
    Map<String, ZoneSnapshot> zoneSnapshot = ZoneAvoidanceRule.createSnapshot(lbStats);
    if (!zoneSnapshot.keySet().contains(serverZone)) { // the snapshot does not contain the server's zone, so no judgment is needed
        // The server zone is unknown to the load balancer, do not filter it out 
        return true;
    }
    logger.debug("Zone snapshots: {}", zoneSnapshot);
    // compute the available zones
    Set<String> availableZones = ZoneAvoidanceRule.getAvailableZones(zoneSnapshot, triggeringLoad.get(), triggeringBlackoutPercentage.get());
    logger.debug("Available zones: {}", availableZones);
    if (availableZones != null) { // return true if the available zones contain the server's zone, otherwise false; false means the zone is unavailable and the node should not receive requests
        return availableZones.contains(input.getServer().getZone());
    } else {
        return false;
    }
} 
LoadBalancerStats holds this status information. When the load balancer for a client is initialized, the status is printed to the console as follows:
DynamicServerListLoadBalancer for client goods-service initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=goods-service,current list of Servers=[localhost:9091, localhost:9081],Load balancer stats=Zone stats: {unknown=[Zone:unknown;    Instance count:2;    Active connections count: 0;    Circuit breaker tripped count: 0;    Active connections per server: 0.0;]
},Server stats: [[Server:localhost:9091;    Zone:UNKNOWN;    Total Requests:0;    Successive connection failure:0;    Total blackout seconds:0;    Last connection made:Thu Jan 01 08:00:00 CST 1970;    First connection made: Thu Jan 01 08:00:00 CST 1970;    Active Connections:0;    total failure count in last (1000) msecs:0;    average resp time:0.0;    90 percentile resp time:0.0;    95 percentile resp time:0.0;    min resp time:0.0;    max resp time:0.0;    stddev resp time:0.0]
, [Server:localhost:9081;    Zone:UNKNOWN;    Total Requests:0;    Successive connection failure:0;    Total blackout seconds:0;    Last connection made:Thu Jan 01 08:00:00 CST 1970;    First connection made: Thu Jan 01 08:00:00 CST 1970;    Active Connections:0;    total failure count in last (1000) msecs:0;    average resp time:0.0;    90 percentile resp time:0.0;    95 percentile resp time:0.0;    min resp time:0.0;    max resp time:0.0;    stddev resp time:0.0]
]}ServerList:com.netflix.loadbalancer.ConfigurationBasedServerList@74ddb59a
The getAvailableZones method computes the set of available zones. Its code is as follows:
public static Set<String> getAvailableZones(
    Map<String, ZoneSnapshot> snapshot, double triggeringLoad,
    double triggeringBlackoutPercentage) {
    if (snapshot.isEmpty()) { // the snapshot is empty, return null
        return null;
    }
    // a set that holds the available zones
    Set<String> availableZones = new HashSet<String>(snapshot.keySet());
    if (availableZones.size() == 1) { // only one zone, return it directly
        return availableZones;
    }
    // the set of worst (most heavily loaded) zones
    Set<String> worstZones = new HashSet<String>();
    double maxLoadPerServer = 0; // the highest average load per server across all zones
    // true: only some zones are available
    // false: all zones are available
    boolean limitedZoneAvailability = false;
    
    // iterate over all zones and analyze each one
    for (Map.Entry<String, ZoneSnapshot> zoneEntry : snapshot.entrySet()) {
        String zone = zoneEntry.getKey();  // the zone name
        ZoneSnapshot zoneSnapshot = zoneEntry.getValue(); // the snapshot of this zone
        int instanceCount = zoneSnapshot.getInstanceCount();
        if (instanceCount == 0) { // the zone has no instances left, so it is completely unavailable: remove it and mark availability as limited
            availableZones.remove(zone);
            limitedZoneAvailability = true;
        } else {
            double loadPerServer = zoneSnapshot.getLoadPerServer(); // average load per server in this zone
            // the ratio of tripped instances to total instances has reached the threshold (0.99999 by default, i.e. effectively all instances tripped), so the zone is considered completely unavailable
            if (((double) zoneSnapshot.getCircuitTrippedCount())
                / instanceCount >= triggeringBlackoutPercentage
                || loadPerServer < 0) { // loadPerServer < 0 means all nodes in the zone have tripped circuit breakers
                availableZones.remove(zone); 
                limitedZoneAvailability = true;
            } else { // the zone is not completely unavailable, so look at its load
                // if the zone's load equals the current maximum, it is one of the worst zones: add it to worstZones
                if (Math.abs(loadPerServer - maxLoadPerServer) < 0.000001d) {
                    // they are the same considering double calculation
                    // round error
                    worstZones.add(zone);
                   
                } else if (loadPerServer > maxLoadPerServer) { // the zone's load exceeds the current maximum: it becomes the only worst zone
                    maxLoadPerServer = loadPerServer;
                    worstZones.clear();
                    worstZones.add(zone);
                }
            }
        }
    }
    // if the maximum load is below the configured threshold and limitedZoneAvailability is false,
    // all zones are available and none is overloaded, so return them all
    if (maxLoadPerServer < triggeringLoad && !limitedZoneAvailability) {
        // zone override is not needed here
        return availableZones;
    }
    // otherwise not all zones can be returned: randomly pick one of the most heavily loaded zones and remove it before returning the result
    String zoneToAvoid = randomChooseZone(snapshot, worstZones);
    if (zoneToAvoid != null) {
        availableZones.remove(zoneToAvoid);
    }
    return availableZones;

}

The logic above is fairly involved, so here it is in plain words:

  1. If there are no zones at all, there is nothing available, so null is returned.
  2. If there is only one zone, there is nothing to choose from, so that zone is returned.
  3. Set<String> worstZones records the zones that are in the worst state; maxLoadPerServer holds the highest per-server load among all zones; limitedZoneAvailability indicates whether only some zones are available (true: only some zones are available, false: all zones are available). Every zone is then examined one by one to work out the set of usable zones.

    1. If instanceCount of the current zone is 0, the zone is removed and limitedZoneAvailability is set to true.
    2. Otherwise, get the average load loadPerServer of the zone. If tripped instances / total instances >= triggeringBlackoutPercentage, or loadPerServer < 0, the zone has a problem: it is removed and limitedZoneAvailability is set to true.

      1. Tripped instances / total instances >= threshold marks the zone as unavailable (removed). The threshold is 0.99999d, which means the zone is treated as unavailable only when all of its Server instances are tripped.
      2. loadPerServer = -1 is reported when all instances are tripped. The two conditions are similar: both check whether the zone is usable.
    3. If the zone is below the threshold, its load is compared with the current maximum. The zones with the highest load (loads that differ by less than 0.000001d are treated as equal) are added to worstZones, so this set holds the most heavily loaded zones.
  4. After the traversal, the set of zones to return is decided.

    1. If the highest load maxLoadPerServer is still below the triggeringLoad threshold and limitedZoneAvailability=false (all zones are available), all zones are returned: availableZones. In other words, every zone is within the load threshold and still has live nodes.
    2. Otherwise (the maximum load exceeds the threshold, or some zones are unavailable), one zone is chosen at random from worstZones and removed from availableZones before the result is returned.
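
The steps above can be condensed into a small, self-contained sketch. ZoneInfo and ZoneSelectionSketch are hypothetical names used only for illustration, not Ribbon classes, and the final step is simplified: it drops an arbitrary zone from the worst set instead of calling Ribbon's randomChooseZone.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Ribbon's ZoneSnapshot; only the fields used here.
final class ZoneInfo {
    final int instanceCount;
    final int circuitTrippedCount;
    final double loadPerServer;

    ZoneInfo(int instanceCount, int circuitTrippedCount, double loadPerServer) {
        this.instanceCount = instanceCount;
        this.circuitTrippedCount = circuitTrippedCount;
        this.loadPerServer = loadPerServer;
    }
}

final class ZoneSelectionSketch {
    static Set<String> availableZones(Map<String, ZoneInfo> snapshot,
                                      double triggeringLoad,
                                      double triggeringBlackoutPercentage) {
        if (snapshot.isEmpty()) {
            return null; // step 1: no zones at all
        }
        Set<String> available = new HashSet<>(snapshot.keySet());
        if (available.size() == 1) {
            return available; // step 2: nothing to choose from
        }
        Set<String> worst = new HashSet<>();
        double maxLoad = 0;
        boolean limited = false;
        for (Map.Entry<String, ZoneInfo> entry : snapshot.entrySet()) {
            ZoneInfo z = entry.getValue();
            // step 3.1 and 3.2: empty zone, blackout ratio reached, or negative load
            boolean unusable = z.instanceCount == 0
                    || (double) z.circuitTrippedCount / z.instanceCount >= triggeringBlackoutPercentage
                    || z.loadPerServer < 0;
            if (unusable) {
                available.remove(entry.getKey());
                limited = true;
            } else if (Math.abs(z.loadPerServer - maxLoad) < 0.000001d) {
                worst.add(entry.getKey()); // step 3.3: ties with the current maximum
            } else if (z.loadPerServer > maxLoad) {
                maxLoad = z.loadPerServer; // step 3.3: new maximum
                worst.clear();
                worst.add(entry.getKey());
            }
        }
        if (maxLoad < triggeringLoad && !limited) {
            return available; // step 4.1: everything is healthy
        }
        // step 4.2 (simplified assumption): drop an arbitrary zone from the worst set
        if (!worst.isEmpty()) {
            available.remove(worst.iterator().next());
        }
        return available;
    }
}
```

With three zones where one has no instances and another has a much higher load than the threshold, only the lightly loaded zone remains in the result.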

AbstractServerPredicate

In the code below, getEligibleServers filters the server list down to the eligible nodes. If the filtered list is not empty, incrementAndGetModulo is called to pick the next node in round-robin order.

public Optional<Server> chooseRoundRobinAfterFiltering(List<Server> servers, Object loadBalancerKey) {
    List<Server> eligible = getEligibleServers(servers, loadBalancerKey);
    if (eligible.size() == 0) {
        return Optional.absent();
    }
    return Optional.of(eligible.get(incrementAndGetModulo(eligible.size())));
}

incrementAndGetModulo implements the round-robin selection; the code is as follows.

private int incrementAndGetModulo(int modulo) {
    for (;;) {
        int current = nextIndex.get();
        int next = (current + 1) % modulo;
        if (nextIndex.compareAndSet(current, next) && current < modulo)
            return current;
    }
}
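
To see the round-robin behaviour in isolation, here is a minimal sketch that copies the CAS loop into a standalone class. RoundRobinSketch and choose are hypothetical names, and java.util.Optional is used here in place of the Optional type that appears in the Ribbon code above.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSketch {
    private final AtomicInteger nextIndex = new AtomicInteger();

    // Returns the current index and atomically advances the counter, wrapping at modulo.
    int incrementAndGetModulo(int modulo) {
        for (;;) {
            int current = nextIndex.get();
            int next = (current + 1) % modulo;
            // Retry if another thread changed the counter, or if the stored index
            // is out of range because the list shrank since the last call.
            if (nextIndex.compareAndSet(current, next) && current < modulo) {
                return current;
            }
        }
    }

    // Picks the next element in round-robin order, or empty if there is nothing to pick.
    <T> Optional<T> choose(List<T> eligible) {
        if (eligible.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(eligible.get(incrementAndGetModulo(eligible.size())));
    }
}
```

Calling choose repeatedly on a three-element list returns the elements in order and then wraps back to the first one. The compareAndSet loop keeps the counter consistent under concurrent callers without a lock.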

Loading process of service list

In this example, the service list is configured in the application.properties file, so it must be loaded at some point and stored somewhere. When does that happen?

RibbonClientConfiguration contains the following Bean declaration (a conditional Bean), which defines the default load balancer implementation.

@Bean
@ConditionalOnMissingBean
public ILoadBalancer ribbonLoadBalancer(IClientConfig config,
                                        ServerList<Server> serverList, ServerListFilter<Server> serverListFilter,
                                        IRule rule, IPing ping, ServerListUpdater serverListUpdater) {
    if (this.propertiesFactory.isSet(ILoadBalancer.class, name)) {
        return this.propertiesFactory.get(ILoadBalancer.class, config, name);
    }
    return new ZoneAwareLoadBalancer<>(config, rule, ping, serverList,
                                       serverListFilter, serverListUpdater);
}

As analyzed earlier, its class diagram is as follows.

image-20211211153617850

When ZoneAwareLoadBalancer is initialized, the constructor of its parent class DynamicServerListLoadBalancer is called. The code is as follows.

public DynamicServerListLoadBalancer(IClientConfig clientConfig, IRule rule, IPing ping,
                                         ServerList<T> serverList, ServerListFilter<T> filter,
                                         ServerListUpdater serverListUpdater) {
        super(clientConfig, rule, ping);
        this.serverListImpl = serverList;
        this.filter = filter;
        this.serverListUpdater = serverListUpdater;
        if (filter instanceof AbstractServerListFilter) {
            ((AbstractServerListFilter) filter).setLoadBalancerStats(getLoadBalancerStats());
        }
        restOfInit(clientConfig);
    }

restOfInit

The restOfInit method mainly does two things.

  1. Enables dynamic updating of the Server list
  2. Updates the Server list once

void restOfInit(IClientConfig clientConfig) {
    boolean primeConnection = this.isEnablePrimingConnections();
    // turn this off to avoid duplicated asynchronous priming done in BaseLoadBalancer.setServerList()
    this.setEnablePrimingConnections(false);
    enableAndInitLearnNewServersFeature(); // enable dynamic updating of the Server list

    updateListOfServers(); // update the Server list

    if (primeConnection && this.getPrimeConnections() != null) {
        this.getPrimeConnections()
            .primeConnections(getReachableServers());
    }
    this.setEnablePrimingConnections(primeConnection);
    LOGGER.info("DynamicServerListLoadBalancer for client {} initialized: {}", clientConfig.getClientName(), this.toString());
}

updateListOfServers

updateListOfServers performs one full update of the server list.

public void updateListOfServers() {
    List<T> servers = new ArrayList<T>();
    if (serverListImpl != null) {
        servers = serverListImpl.getUpdatedListOfServers();
        LOGGER.debug("List of Servers for {} obtained from Discovery client: {}",
                     getIdentifier(), servers);

        if (filter != null) {
            servers = filter.getFilteredListOfServers(servers);
            LOGGER.debug("Filtered List of Servers for {} obtained from Discovery client: {}",
                         getIdentifier(), servers);
        }
    }
    updateAllServerList(servers);
}

The code above works as follows:

  1. Because the static list of service addresses is configured in application.properties, serverListImpl here is an instance of ConfigurationBasedServerList. Calling its getUpdatedListOfServers method returns the service list defined in application.properties.
  2. If a filter is configured, the service list is passed through it.

Finally, updateAllServerList is called to store all servers in the local cache.

protected void updateAllServerList(List<T> ls) {
    // other threads might be doing this - in which case, we pass
    if (serverListUpdateInProgress.compareAndSet(false, true)) {
        try {
            for (T s : ls) {
                s.setAlive(true); // set so that clients can start using these
                // servers right away instead
                // of having to wait out the ping cycle.
            }
            setServersList(ls);
            super.forceQuickPing();
        } finally {
            serverListUpdateInProgress.set(false);
        }
    }
}
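
The compareAndSet guard used by updateAllServerList can be shown on its own. The following is a minimal sketch under stated assumptions: GuardedUpdateSketch, update, and servers are hypothetical names, and servers are represented as plain strings rather than Ribbon Server objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class GuardedUpdateSketch {
    private final AtomicBoolean updateInProgress = new AtomicBoolean(false);
    private volatile List<String> servers = new ArrayList<>();

    // Returns true if this call performed the update,
    // false if another thread was already updating and this call was skipped.
    boolean update(List<String> latest) {
        if (!updateInProgress.compareAndSet(false, true)) {
            return false;
        }
        try {
            servers = new ArrayList<>(latest); // replace the cached list
            return true;
        } finally {
            updateInProgress.set(false); // always release the guard
        }
    }

    List<String> servers() {
        return servers;
    }
}
```

The design choice is to skip rather than block: if a refresh is already running, a second refresh started at the same moment would do the same work, so it simply returns.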

Dynamic Ping mechanism

In Ribbon, the target service addresses are also refreshed dynamically, and a Ping mechanism checks whether they are alive. The entry point is the DynamicServerListLoadBalancer.restOfInit method.

void restOfInit(IClientConfig clientConfig) {
    boolean primeConnection = this.isEnablePrimingConnections();
    // turn this off to avoid duplicated asynchronous priming done in BaseLoadBalancer.setServerList()
    this.setEnablePrimingConnections(false);
    enableAndInitLearnNewServersFeature();  // start the scheduled task that updates the list dynamically

    updateListOfServers();
    if (primeConnection && this.getPrimeConnections() != null) {
        this.getPrimeConnections()
            .primeConnections(getReachableServers());
    }
    this.setEnablePrimingConnections(primeConnection);
    LOGGER.info("DynamicServerListLoadBalancer for client {} initialized: {}", clientConfig.getClientName(), this.toString());
}
public void enableAndInitLearnNewServersFeature() {
    LOGGER.info("Using serverListUpdater {}", serverListUpdater.getClass().getSimpleName());
    serverListUpdater.start(updateAction);
}

Note that a scheduled task is started here. The logic it runs is updateAction, an anonymous inner class defined as follows.

protected final ServerListUpdater.UpdateAction updateAction = new ServerListUpdater.UpdateAction() {
    @Override
    public void doUpdate() {
        updateListOfServers();
    }
};

The start method that launches the scheduled task is shown below. After an initial delay of 1s, the task runs with a fixed delay of 30s between executions.

public synchronized void start(final UpdateAction updateAction) {
    if (isActive.compareAndSet(false, true)) {
        final Runnable wrapperRunnable = new Runnable() {
            @Override
            public void run() {
                if (!isActive.get()) {
                    if (scheduledFuture != null) {
                        scheduledFuture.cancel(true);
                    }
                    return;
                }
                try {
                    updateAction.doUpdate();  // run the actual update task
                    lastUpdated = System.currentTimeMillis();
                } catch (Exception e) {
                    logger.warn("Failed one update cycle", e);
                }
            }
        };

        scheduledFuture = getRefreshExecutor().scheduleWithFixedDelay(
            wrapperRunnable,
            initialDelayMs,  //1000
            refreshIntervalMs,  //30000 
            TimeUnit.MILLISECONDS 
        );
    } else {
        logger.info("Already active, no-op");
    }
}
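
The scheduling pattern above can be reduced to a small standalone sketch. PollingUpdaterSketch is a hypothetical class, not the Ribbon updater; it only illustrates how an AtomicBoolean guard and scheduleWithFixedDelay combine, and it uses a daemon thread so that a demo program can exit.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class PollingUpdaterSketch {
    private final AtomicBoolean isActive = new AtomicBoolean(false);
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "updater-sketch");
                thread.setDaemon(true); // do not keep the JVM alive
                return thread;
            });
    private volatile ScheduledFuture<?> future;

    // Returns true if the task was scheduled, false if the updater was already active.
    boolean start(Runnable action, long initialDelayMs, long intervalMs) {
        if (!isActive.compareAndSet(false, true)) {
            return false;
        }
        future = executor.scheduleWithFixedDelay(() -> {
            try {
                action.run();
            } catch (Exception e) {
                // Swallow the failure: an uncaught exception would cancel all later runs.
            }
        }, initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
        return true;
    }

    // Cancels the scheduled task and shuts the executor down; the sketch is not restartable.
    void stop() {
        if (isActive.compareAndSet(true, false) && future != null) {
            future.cancel(true);
        }
        executor.shutdownNow();
    }
}
```

With scheduleWithFixedDelay the interval is measured from the end of one run to the start of the next, so a slow update never overlaps with the following one. Catching exceptions inside the task matters because the executor stops rescheduling a task once it throws.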

Each time the doUpdate method is triggered (first after the initial delay, then every 30s), it eventually reaches the updateAllServerList method.

protected void updateAllServerList(List<T> ls) {
    // other threads might be doing this - in which case, we pass
    if (serverListUpdateInProgress.compareAndSet(false, true)) {
        try {
            for (T s : ls) {
                s.setAlive(true); // set so that clients can start using these
                // servers right away instead
                // of having to wait out the ping cycle.
            }
            setServersList(ls);
            super.forceQuickPing();
        } finally {
            serverListUpdateInProgress.set(false);
        }
    }
}

Inside it, super.forceQuickPing() is called to run a heartbeat health check on the servers.

public void forceQuickPing() {
    if (canSkipPing()) {
        return;
    }
    logger.debug("LoadBalancer [{}]:  forceQuickPing invoking", name);

    try {
        new Pinger(pingStrategy).runPinger();
    } catch (Exception e) {
        logger.error("LoadBalancer [{}]: Error running forceQuickPing()", name, e);
    }
}

RibbonLoadBalancerClient.execute

After the analysis above, we return to the RibbonLoadBalancerClient.execute method.

public <T> T execute(String serviceId, LoadBalancerRequest<T> request, Object hint)
    throws IOException {
    ILoadBalancer loadBalancer = getLoadBalancer(serviceId);
    Server server = getServer(loadBalancer, hint);
    if (server == null) {
        throw new IllegalStateException("No instances available for " + serviceId);
    }
    RibbonServer ribbonServer = new RibbonServer(serviceId, server,
                                                 isSecure(server, serviceId),
                                                 serverIntrospector(serviceId).getMetadata(server));

    return execute(serviceId, ribbonServer, request);
}

At this point, Server server = getServer(loadBalancer, hint); returns a specific target server.

Before the overloaded execute method is called, the server is wrapped in a RibbonServer object and passed down. The main job of the overloaded execute method is to record the statistics of the request.

@Override
public <T> T execute(String serviceId, ServiceInstance serviceInstance,
                     LoadBalancerRequest<T> request) throws IOException {
    Server server = null;
    if (serviceInstance instanceof RibbonServer) {
        server = ((RibbonServer) serviceInstance).getServer();
    }
    if (server == null) {
        throw new IllegalStateException("No instances available for " + serviceId);
    }

    RibbonLoadBalancerContext context = this.clientFactory
        .getLoadBalancerContext(serviceId);
    RibbonStatsRecorder statsRecorder = new RibbonStatsRecorder(context, server);

    try {
        T returnVal = request.apply(serviceInstance);
        statsRecorder.recordStats(returnVal);  // record the request result
        return returnVal;
    }
    // catch IOException and rethrow so RestTemplate behaves correctly
    catch (IOException ex) {
        statsRecorder.recordStats(ex); // record the request failure
        throw ex;
    }
    catch (Exception ex) {
        statsRecorder.recordStats(ex);
        ReflectionUtils.rethrowRuntimeException(ex);
    }
    return null;
}

request.apply

request is of type LoadBalancerRequest, an interface that declares an apply method. Looking through the code, however, we find no class that implements this interface, so where is apply implemented?

Tracing further, we find that the request object is passed in from the intercept method of LoadBalancerInterceptor.

public ClientHttpResponse intercept(final HttpRequest request, final byte[] body, final ClientHttpRequestExecution execution) throws IOException {
    URI originalUri = request.getURI();
    String serviceName = originalUri.getHost();
    Assert.state(serviceName != null, "Request URI does not contain a valid hostname: " + originalUri);
    return (ClientHttpResponse)this.loadBalancer.execute(serviceName, this.requestFactory.createRequest(request, body, execution));
}

The request that is passed in is created by this.requestFactory.createRequest(request, body, execution), so let us look at that method.

public LoadBalancerRequest<ClientHttpResponse> createRequest(final HttpRequest request, final byte[] body, final ClientHttpRequestExecution execution) {
    return (instance) -> {
        HttpRequest serviceRequest = new ServiceRequestWrapper(request, instance, this.loadBalancer);
        if (this.transformers != null) {
            // apply each configured transformer in order
            for (LoadBalancerRequestTransformer transformer : this.transformers) {
                serviceRequest = transformer.transformRequest(serviceRequest, instance);
            }
        }

        return execution.execute(serviceRequest, body);
    };
}

From the code we can see that createRequest returns a lambda expression that implements the LoadBalancerRequest interface. Inside the lambda, a ServiceRequestWrapper is created. ServiceRequestWrapper is a subclass of HttpRequestWrapper that overrides the getURI() method. The overridden method builds the URI that is actually accessed by calling the reconstructURI method of the LoadBalancerClient interface.
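
The wrapper idea described above can be illustrated without Spring. In this sketch, SimpleRequest and ResolvingRequestWrapper are hypothetical types standing in for HttpRequest and ServiceRequestWrapper, and the resolver function stands in for the call to reconstructURI.

```java
import java.net.URI;
import java.util.function.UnaryOperator;

// Hypothetical minimal request abstraction, standing in for Spring's HttpRequest.
interface SimpleRequest {
    URI getURI();
}

// Wraps a request and resolves the logical URI to a concrete one only when getURI() is called.
final class ResolvingRequestWrapper implements SimpleRequest {
    private final SimpleRequest delegate;
    private final UnaryOperator<URI> resolver;

    ResolvingRequestWrapper(SimpleRequest delegate, UnaryOperator<URI> resolver) {
        this.delegate = delegate;
        this.resolver = resolver;
    }

    @Override
    public URI getURI() {
        // The original request keeps its service-name URI; callers of the wrapper see the resolved one.
        return resolver.apply(delegate.getURI());
    }
}
```

Because the resolution happens inside getURI(), the code that finally creates the HTTP connection needs no knowledge of load balancing; it just asks the request for its URI.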

InterceptingClientHttpRequest.execute

The execution.execute call in the code above enters the InterceptingClientHttpRequest.execute method. The code is as follows.

public ClientHttpResponse execute(HttpRequest request, byte[] body) throws IOException {
    if (this.iterator.hasNext()) {
        ClientHttpRequestInterceptor nextInterceptor = this.iterator.next();
        return nextInterceptor.intercept(request, body, this);
    }
    else {
        HttpMethod method = request.getMethod();
        Assert.state(method != null, "No standard HTTP method");
        ClientHttpRequest delegate = requestFactory.createRequest(request.getURI(), method); // note this line
        request.getHeaders().forEach((key, value) -> delegate.getHeaders().addAll(key, value));
        if (body.length > 0) {
            if (delegate instanceof StreamingHttpOutputMessage) {
                StreamingHttpOutputMessage streamingOutputMessage = (StreamingHttpOutputMessage) delegate;
                streamingOutputMessage.setBody(outputStream -> StreamUtils.copy(body, outputStream));
            }
            else {
                StreamUtils.copy(body, delegate.getBody());
            }
        }
        return delegate.execute();
    }
}

Note that at this point the request object is an instance of HttpRequestWrapper (specifically, the ServiceRequestWrapper created earlier).
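
The iterator-driven interceptor chain used by execute can be sketched in a few lines. Interceptor, Chain, and IteratorChain are hypothetical names, and requests and responses are reduced to plain strings so that the example stays self-contained.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical interceptor contract: handle the request, then optionally continue the chain.
interface Interceptor {
    String intercept(String request, Chain chain);
}

interface Chain {
    String execute(String request);
}

final class IteratorChain implements Chain {
    private final Iterator<Interceptor> iterator;

    IteratorChain(List<Interceptor> interceptors) {
        this.iterator = interceptors.iterator();
    }

    @Override
    public String execute(String request) {
        if (iterator.hasNext()) {
            // Hand the request to the next interceptor, passing this chain along.
            return iterator.next().intercept(request, this);
        }
        // No interceptor left: this is where the real request would be sent.
        return "sent:" + request;
    }
}
```

Each interceptor decides whether and how to call chain.execute, which is how a load-balancing interceptor can swap the request for a wrapped one before the terminal step runs.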

request.getURI()

When request.getURI() is called to obtain the target address for creating the HTTP request, the getURI() method of ServiceRequestWrapper is invoked.

@Override
public URI getURI() {
    URI uri = this.loadBalancer.reconstructURI(this.instance, getRequest().getURI());
    return uri;
}

This method calls the reconstructURI method of the RibbonLoadBalancerClient instance, which generates the target address from the service id.

RibbonLoadBalancerClient.reconstructURI

public URI reconstructURI(ServiceInstance instance, URI original) {
        Assert.notNull(instance, "instance can not be null");
        String serviceId = instance.getServiceId(); // get the service id, i.e. the service name
        RibbonLoadBalancerContext context = this.clientFactory
                .getLoadBalancerContext(serviceId); // get the RibbonLoadBalancerContext, an object instance obtained from the Spring container

        URI uri;
        Server server;
        if (instance instanceof RibbonServer) { // the instance is a RibbonServer
            RibbonServer ribbonServer = (RibbonServer) instance;
            server = ribbonServer.getServer();  // get the Server information of the target server
            uri = updateToSecureConnectionIfNeeded(original, ribbonServer); // decide whether the URI must be upgraded to a secure connection
        }
        else { // otherwise it is an ordinary http address
            server = new Server(instance.getScheme(), instance.getHost(),
                    instance.getPort());
            IClientConfig clientConfig = clientFactory.getClientConfig(serviceId);
            ServerIntrospector serverIntrospector = serverIntrospector(serviceId);
            uri = updateToSecureConnectionIfNeeded(original, clientConfig,
                    serverIntrospector, server);
        }
        return context.reconstructURIWithServer(server, uri);  // assemble the real target server address
}
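
What "assembling the real target address" amounts to can be sketched with java.net.URI alone. UriReconstructionSketch and reconstruct are hypothetical names; the sketch only swaps the host and port of the original URI and does not reproduce every case that Ribbon handles.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriReconstructionSketch {
    // Rebuilds the URI with the chosen server's host and port, keeping scheme, path, query and fragment.
    static URI reconstruct(URI original, String host, int port) {
        try {
            return new URI(original.getScheme(), original.getUserInfo(), host, port,
                    original.getPath(), original.getQuery(), original.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI from " + original, e);
        }
    }
}
```

For example, a request to http://user-service/api/users?id=1 that is routed to 10.0.0.5 on port 8080 becomes http://10.0.0.5:8080/api/users?id=1.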

Copyright statement: Unless otherwise stated, all articles in this blog are licensed under CC BY-NC-SA 4.0. When reprinting, please credit "Mic takes you to learn architecture" (跟着Mic学架构).