Preface

Performance optimization is a road every programmer has to walk sooner or later, and it may also be one of the deepest rabbit holes out there. It requires not only a thorough understanding of the tooling, but often also a customized plan built around the specific business scenario. Of course, you could also quietly hide a Thread.sleep in the code and shave off a few milliseconds of "sleep" whenever an optimization is demanded (just kidding). The subject is so vast that no book on the market manages to cover it comprehensively, and even within each sub-area, the available optimization techniques are rich and dazzling.

This article will not attempt to cover all of those techniques. It only presents a general-purpose solution to a concurrent-call scenario I ran into during recent project development; you can wrap it as a library or copy-paste it straight into your project. Suggestions and additional optimization scenarios are also welcome.

Background

You may have run into this scenario during development: call service A first, then service B, assemble the data, and then call service C. (If you have never met this in a microservice system, I would say either your services are split too coarsely, or you are lucky enough to own a bottom-layer system with no downstream dependencies~)

The latency of this link is duration(A) + duration(B) + duration(C) + other operations. Experience says that most of the time is spent in downstream processing and network IO, while CPU time inside the application itself is basically negligible. Since the calls to services A and B have no dependency on each other, can we issue them concurrently and cut the synchronous waiting time? Ideally, the link latency drops to max(duration(A), duration(B)) + duration(C) + other operations.
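As a minimal sketch of this first scenario (callServiceA/B/C and the pool size are illustrative placeholders, not the project's real code), CompletableFuture makes the fan-out straightforward:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelCallSketch {

    // Placeholder downstream calls; in a real project these are RPC client stubs
    static String callServiceA() { return "A"; }
    static String callServiceB() { return "B"; }
    static String callServiceC(String a, String b) { return a + b + "C"; }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        // A and B do not depend on each other, so issue them concurrently
        CompletableFuture<String> futureA = CompletableFuture.supplyAsync(ParallelCallSketch::callServiceA, executor);
        CompletableFuture<String> futureB = CompletableFuture.supplyAsync(ParallelCallSketch::callServiceB, executor);
        // Wait for both, then call C with the assembled data:
        // total wait is roughly max(duration(A), duration(B)) + duration(C)
        String result = futureA.thenCombine(futureB, ParallelCallSketch::callServiceC).join();
        System.out.println(result);
        executor.shutdown();
    }
}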

To give another example, sometimes we need to call a downstream service in batches, such as querying user information in bulk. For its own protection, the downstream query interface usually caps the number of records per call, for example at most one hundred users per query. We therefore have to split one request into several calls, and the latency becomes n * duration(A) + other operations. With the same concurrency trick, as sketched below, this can ideally be reduced to max(duration(A)) + other operations.
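The splitting itself is one line with Guava (which the project already uses). The serial loop below is shown only for contrast with the concurrent version built later in this article; UserQueryService is a hypothetical stand-in for the real downstream client:

import java.util.List;

import com.google.common.collect.Lists;

public class BatchSplitSketch {

    // Hypothetical downstream client capped at 100 ids per call
    interface UserQueryService {
        List<String> queryUsers(List<Long> ids);
    }

    // Serial version: n batches cost roughly n * duration(A)
    static List<String> queryAll(UserQueryService service, List<Long> userIds) {
        List<String> result = Lists.newArrayList();
        for (List<Long> batch : Lists.partition(userIds, 100)) {
            result.addAll(service.queryUsers(batch));
        }
        return result;
    }
}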

The code for the two scenarios is essentially the same; this article presents the idea and a complete implementation for the second one.

Getting our hands dirty

The overall class diagram of the concurrent RPC call implementation is as follows:

(class diagram)

First we need a thread pool to run the concurrent calls on. Since a program usually uses thread pools in other scenarios as well, and we want RPC calls to have a pool of their own, we encapsulate this behind a factory:

import java.util.Map;
import java.util.concurrent.ThreadPoolExecutor;

import javax.annotation.Resource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.AsyncTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ThreadPoolExecutorFactory {

    @Resource
    private Map<String, AsyncTaskExecutor> executorMap;

    /**
     * Default thread pool
     */
    @Bean(name = ThreadPoolName.DEFAULT_EXECUTOR)
    public AsyncTaskExecutor baseExecutorService() {
        // These parameters can later be customized per service
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // Configure the pool
        taskExecutor.setCorePoolSize(10);
        taskExecutor.setMaxPoolSize(50);
        taskExecutor.setQueueCapacity(200);
        taskExecutor.setKeepAliveSeconds(60);
        taskExecutor.setThreadNamePrefix(ThreadPoolName.DEFAULT_EXECUTOR + "--");
        taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
        taskExecutor.setAwaitTerminationSeconds(60);
        taskExecutor.setDaemon(Boolean.TRUE);
        // Rejection policy: run the rejected task on the caller's own thread
        taskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        // Initialize the pool
        taskExecutor.initialize();

        return taskExecutor;
    }

    /**
     * Dedicated thread pool for concurrent RPC calls
     */
    @Bean(name = ThreadPoolName.RPC_EXECUTOR)
    public AsyncTaskExecutor rpcExecutorService() {
        // These parameters can later be customized per service
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // Configure the pool
        taskExecutor.setCorePoolSize(20);
        taskExecutor.setMaxPoolSize(100);
        taskExecutor.setQueueCapacity(200);
        taskExecutor.setKeepAliveSeconds(60);
        taskExecutor.setThreadNamePrefix(ThreadPoolName.RPC_EXECUTOR + "--");
        taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
        taskExecutor.setAwaitTerminationSeconds(60);
        taskExecutor.setDaemon(Boolean.TRUE);
        // Rejection policy: run the rejected task on the caller's own thread
        taskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        // Initialize the pool
        taskExecutor.initialize();

        return taskExecutor;
    }

    /**
     * Look up a thread pool by name.
     * Throws if no pool with that name exists.
     * @param name thread pool name
     * @return the thread pool
     * @throws RuntimeException if no pool with that name is found
     */
    public AsyncTaskExecutor fetchAsyncTaskExecutor(String name) {
        AsyncTaskExecutor executor = executorMap.get(name);
        if (executor == null) {
            throw new RuntimeException("no executor name " + name);
        }
        return executor;
    }
}

public class ThreadPoolName {

    /**
     * Default thread pool
     */
    public static final String DEFAULT_EXECUTOR = "defaultExecutor";

    /**
     * Thread pool used for concurrent RPC calls
     */
    public static final String RPC_EXECUTOR = "rpcExecutor";
}

As shown in the code, we declare two Spring AsyncTaskExecutor thread pools, the default pool and the pool for RPC calls, and let Spring collect them into a map keyed by bean name. A caller passes a pool name to the fetchAsyncTaskExecutor method to pick the pool that runs its tasks. One detail worth noting: the RPC pool is configured with noticeably more threads than the other pool. RPC calls are not CPU-intensive and mostly wait on IO, so raising the thread count effectively improves concurrency.
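As a rough rule of thumb (this sizing formula comes from Java Concurrency in Practice, not from this project; the utilization and wait-to-compute ratio below are assumed values), an IO-bound pool can be sized like this:

// Rule-of-thumb sizing for IO-bound pools (Java Concurrency in Practice):
// threads = cores * targetUtilization * (1 + waitTime / computeTime)
int cores = Runtime.getRuntime().availableProcessors();
double targetUtilization = 0.8;   // assumed target CPU utilization
double waitToComputeRatio = 10.0; // assumed: RPC wait time dominates compute time
int rpcPoolSize = (int) (cores * targetUtilization * (1 + waitToComputeRatio));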

import java.util.concurrent.Callable;
import java.util.concurrent.Future;

import javax.annotation.Resource;

import org.springframework.stereotype.Component;

@Component
public class TracedExecutorService {

    @Resource
    private ThreadPoolExecutorFactory threadPoolExecutorFactory;


    /**
     * Submit an asynchronous task to the named thread pool and get a handle to its result.
     * @param executorName thread pool name
     * @param tracedCallable asynchronous task
     * @param <T> return type
     * @return future holding the task result
     */
    public <T> Future<T> submit(String executorName, Callable<T> tracedCallable) {
        return threadPoolExecutorFactory.fetchAsyncTaskExecutor(executorName).submit(tracedCallable);
    }
}

The submit method encapsulates looking up the pool and submitting the asynchronous task; the Callable + Future combination is used to obtain the result of the asynchronous thread.
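A hypothetical usage sketch (UserInfo, UserQueryService and the one-second timeout are illustrative assumptions, not part of the project's code):

import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import javax.annotation.Resource;

import org.springframework.stereotype.Component;

// Hypothetical caller with TracedExecutorService injected
@Component
public class UserQueryFacade {

    @Resource
    private TracedExecutorService tracedExecutorService;

    @Resource
    private UserQueryService userQueryService;

    public UserInfo queryAsync(Long userId) throws Exception {
        // The Callable runs on the dedicated RPC pool
        Future<UserInfo> future = tracedExecutorService.submit(
                ThreadPoolName.RPC_EXECUTOR,
                () -> userQueryService.queryUser(userId));
        // Block for at most one second waiting for the asynchronous result
        return future.get(1, TimeUnit.SECONDS);
    }
}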

With the thread pools ready, we next declare an interface for submitting concurrent batch calls:

import java.util.List;
import java.util.function.Function;

public interface BatchOperateService {

    /**
     * Concurrent batch operation
     * @param function the logic to execute per request
     * @param requests all requests
     * @param config configuration for this batch call
     * @return all responses
     */
    <T, R> List<R> batchOperate(Function<T, R> function, List<T> requests, BatchOperateConfig config);
}

import java.util.concurrent.TimeUnit;

import lombok.Data;

@Data
public class BatchOperateConfig {

    /**
     * Timeout
     */
    private Long timeout;

    /**
     * Timeout unit
     */
    private TimeUnit timeoutUnit;

    /**
     * Whether every call must succeed
     */
    private Boolean needAllSuccess;

}

The function passed into batchOperate is the code to be executed concurrently. requests holds all the requests; the implementation iterates over them and submits each one to an asynchronous thread. The config object configures this particular call, e.g. the timeout of the concurrent query and whether the whole batch should fail when an individual call fails.
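To make the contract concrete, here is a hypothetical caller that queries users concurrently in batches of one hundred (UserInfo, UserQueryService and the two-second timeout are illustrative assumptions):

import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import javax.annotation.Resource;

import org.springframework.stereotype.Component;

import com.google.common.collect.Lists;

// Hypothetical caller: each sublist of at most 100 ids becomes one concurrent RPC
@Component
public class BatchUserQueryFacade {

    @Resource
    private BatchOperateService batchOperateService;

    @Resource
    private UserQueryService userQueryService;

    public List<UserInfo> batchQueryUsers(List<Long> userIds) {
        List<List<Long>> requests = Lists.partition(userIds, 100);

        BatchOperateConfig config = new BatchOperateConfig();
        config.setTimeout(2L);                   // assumed timeout
        config.setTimeoutUnit(TimeUnit.SECONDS);
        config.setNeedAllSuccess(true);

        List<List<UserInfo>> responses = batchOperateService.batchOperate(
                batch -> userQueryService.queryUsers(batch), requests, config);
        // Flatten the per-batch responses into a single list
        return responses.stream().flatMap(List::stream).collect(Collectors.toList());
    }
}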

Next, take a look at the implementation class:

import java.util.List;
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.function.Function;

import javax.annotation.Resource;

import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.lang3.BooleanUtils;
import org.springframework.stereotype.Service;

import com.alibaba.fastjson.JSON;
import com.google.common.collect.Lists;

import lombok.extern.slf4j.Slf4j;

@Service
@Slf4j
public class BatchOperateServiceImpl implements BatchOperateService {

    @Resource
    private TracedExecutorService tracedExecutorService;

    @Override
    public <T, R> List<R> batchOperate(Function<T, R> function, List<T> requests, BatchOperateConfig config) {
        log.info("batchOperate start function:{} request:{} config:{}", function, JSON.toJSONString(requests), JSON.toJSONString(config));

        // Start time
        long startTime = System.currentTimeMillis();

        // Initialization
        int numberOfRequests = CollectionUtils.size(requests);

        // Results of all asynchronous executions
        List<Future<R>> futures = Lists.newArrayListWithExpectedSize(numberOfRequests);
        // CountDownLatch coordinates the concurrent calls
        CountDownLatch countDownLatch = new CountDownLatch(numberOfRequests);
        List<BatchOperateCallable<T, R>> callables = Lists.newArrayListWithExpectedSize(numberOfRequests);

        // Submit each request to an asynchronous thread
        for (T request : requests) {
            BatchOperateCallable<T, R> batchOperateCallable = new BatchOperateCallable<>(countDownLatch, function, request);
            callables.add(batchOperateCallable);

            // Submit for asynchronous execution
            Future<R> future = tracedExecutorService.submit(ThreadPoolName.RPC_EXECUTOR, batchOperateCallable);
            futures.add(future);
        }

        try {
            // Wait for all calls to finish; if we time out and all calls must succeed, throw
            boolean allFinish = countDownLatch.await(config.getTimeout(), config.getTimeoutUnit());
            if (!allFinish && config.getNeedAllSuccess()) {
                throw new RuntimeException("batchOperate timeout and need all success");
            }
            // Check each result; if any call failed and all calls must succeed, throw
            boolean allSuccess = callables.stream().map(BatchOperateCallable::isSuccess).allMatch(BooleanUtils::isTrue);
            if (!allSuccess && config.getNeedAllSuccess()) {
                throw new RuntimeException("some batchOperate have failed and need all success");
            }

            // Collect all asynchronous results and return them.
            // Note: if allFinish is false but needAllSuccess is also false,
            // future.get() below still blocks until the remaining tasks complete.
            List<R> result = Lists.newArrayList();
            for (Future<R> future : futures) {
                R r = future.get();
                if (Objects.nonNull(r)) {
                    result.add(r);
                }
            }
            return result;
        } catch (Exception e) {
            // Wrap with the cause preserved instead of keeping only the message
            throw new RuntimeException(e);
        } finally {
            double duration = (System.currentTimeMillis() - startTime) / 1000.0;
            log.info("batchOperate finish duration:{}s function:{} request:{} config:{}", duration, function, JSON.toJSONString(requests), JSON.toJSONString(config));
        }
    }
}

Usually, after submitting to the thread pool, we would simply iterate over the Futures and block on each for its result. Here we use a CountDownLatch instead to manage one unified timeout for the whole batch. Take a look at the implementation of BatchOperateCallable:

import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.function.Function;

public class BatchOperateCallable<T, R> implements Callable<R> {

    private final CountDownLatch countDownLatch;

    private final Function<T, R> function;

    private final T request;

    /**
     * Whether this task completed successfully.
     * volatile: the coordinating thread may read it while worker threads are still running.
     */
    private volatile boolean success;

    public BatchOperateCallable(CountDownLatch countDownLatch, Function<T, R> function, T request) {
        this.countDownLatch = countDownLatch;
        this.function = function;
        this.request = request;
    }

    @Override
    public R call() {
        try {
            success = false;
            R result = function.apply(request);
            success = true;
            return result;
        } finally {
            // Count down whether the call succeeded or threw
            countDownLatch.countDown();
        }
    }

    public boolean isSuccess() {
        return success;
    }
}

Whether a call completes normally or throws, we decrement the counter when it ends. Once the counter reaches zero, all concurrent calls have finished. If it has not reached zero within the configured time, the concurrent call is considered timed out and an exception is thrown.

Potential problems

One problem with concurrent calls is that we amplify the traffic hitting downstream interfaces, in extreme cases by a factor of hundreds. If the downstream service has no defensive measures such as rate limiting in place, we may well take it down (failures caused this way are not uncommon). The whole concurrent call therefore needs flow control. There are two approaches. First, if the microservices run in mesh mode, you can configure the RPC QPS in the sidecar and control access to the downstream service globally. (Whether to choose per-instance or cluster-wide limiting depends on what the sidecar supports and on the traffic volume; generally, if average traffic is small, per-instance limiting is preferable, since cluster-wide limiting fluctuates more and very low traffic leads to misjudgments.) Second, if mesh is not enabled, you have to implement the rate limiter in code. Guava's RateLimiter class is recommended here, although it only supports single-machine limiting; implementing cluster-wide limiting pushes the complexity of the solution up considerably.
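For the in-code option, a minimal single-machine sketch with Guava's RateLimiter might look like this (the 500 permits per second is an assumed figure; tune it to what the downstream service can actually absorb):

import java.util.function.Function;

import com.google.common.util.concurrent.RateLimiter;

public class RateLimitedRpcSketch {

    // Assumed limit: 500 downstream calls per second, per instance
    private static final RateLimiter LIMITER = RateLimiter.create(500.0);

    public <T, R> R limitedCall(Function<T, R> rpc, T request) {
        // Blocks until a permit is available, smoothing the amplified traffic
        LIMITER.acquire();
        return rpc.apply(request);
    }
}

One natural place to hook this in would be inside BatchOperateCallable.call(), so that every concurrent task waits for a permit before issuing its RPC.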

Summary

Abstracting the scenarios we encounter in project development and distilling them into general solutions wherever possible is an important way for each of us to grow, and a powerful lever for code reusability and stability. Concurrent RPC calling is one such common solution, and I hope the implementation in this article is helpful to you.

