Java multi-threaded study notes (6) Changle Weiyang

I suddenly realized that my multithreading series is running out of titles: First Encounter, Acquaintance, Very Happy, Never Tire, Changle Wuji, Changle Weiyang. So far, my multithreading-related articles are:
"When we talk about multithreading and high concurrency"
"Java Multithreading Study Notes (1) First Encounter"
"Java Multithreading Study Notes (2) Acquaintance"
"Java Multithreading Study Notes (3) Very Happy"
"Java Multithreading Study Notes (4) Never Tire"
"Java Multithreading Study Notes (5) Changle Wuji"
"ThreadLocal Study Notes"
Today should be the final chapter of the multithreading study notes. With this article, the basic multithreading concepts in the JDK have been roughly covered. Concurrent collections and parallel streams have not been introduced yet.
For concurrent collections I will focus on their implementation, that is, the principles behind some of the basic classes introduced in the articles above, because using a concurrent collection differs little from using an ordinary collection. That will be the next series, the source-code series, which started last October with "When we talk about looking at the source code, what are we looking at"; it is time to fill in that hole. Parallel streams will be covered in the Stream series. Those articles are scattered across Juejin and SegmentFault and are not yet unified; when I have time, I will bring the articles on the three platforms in line. If you have found the above articles on the official account, the migration has been roughly completed.

Introduction to Fork Join Mode

The protagonist of this article is ForkJoin, which is the work of Doug Lea. While reading ForkJoinPool, I looked up, on a whim, what else this author has written in the concurrency package, and found the following (the list may well be incomplete):

CountDownLatch
CyclicBarrier
Semaphore
ThreadPoolExecutor
Future
Callable
ConcurrentHashMap
CopyOnWriteArrayList

It seems that the concurrency library in the JDK was largely built by this gentleman. While going through his work I also found that I had missed one class, Phaser, so there will be a supplementary article: the next one in the multithreading series will cover Phaser. Having said that, let's introduce ForkJoin. Searching globally for ForkJoin in IDEA gives the following results:

ForkJoin

Let's first look at the inheritance class diagram of ForkJoinPool:

ForkJoinPool inheritance class diagram

We can see that ForkJoinPool and ThreadPoolExecutor sit at the same level. ThreadPoolExecutor is the thread pool we are familiar with, so we can guess that ForkJoinPool is another kind of thread pool. What, then, is the difference between ForkJoinPool and ThreadPoolExecutor? With this question in mind, let's look at the class comment of ForkJoinPool. Note that ForkJoinPool and ThreadPoolExecutor both extend AbstractExecutorService, which implements the ExecutorService interface, so the comment does not describe ForkJoinPool as another kind of thread pool but as another kind of ExecutorService.

An ExecutorService for running ForkJoinTasks. A ForkJoinPool provides the entry point for submissions from non-ForkJoinTask clients, as well as management and monitoring operations.
A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute tasks submitted to the pool and/or created by other active tasks (eventually blocking waiting for work if none exist). This enables efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks), as well as when many small tasks are submitted to the pool from external clients. Especially when setting asyncMode to true in constructors, ForkJoinPools may also be appropriate for use with event-style tasks that are never joined. All worker threads are initialized with Thread.isDaemon set true.
In other words, ForkJoinPool is an ExecutorService that runs ForkJoinTasks (Fork means splitting into branches and Join means merging, so a ForkJoinTask can be understood as a task that is split and then merged). It also accepts submissions from clients that are not ForkJoinTasks, and it offers management and monitoring operations.
ForkJoinPool differs from other kinds of ExecutorService mainly in that it uses a work-stealing algorithm: every thread in the pool tries to find and execute tasks submitted to the pool and/or created by other active tasks (and eventually blocks waiting for work if there is none). This is efficient when most tasks spawn subtasks, as most ForkJoinTasks do, and also when external clients submit many small tasks to the pool. With asyncMode set to true in the constructor, ForkJoinPool may also suit event-style tasks whose results never need to be joined. All worker threads are initialized as daemon threads.
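Two points from the comment can be checked with a few lines of code: a ForkJoinPool accepts ordinary Callable submissions, and its worker threads are daemon threads. The following is only a minimal sketch; the class name ForkJoinAsExecutorService and the parallelism of 2 are my own choices for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

class ForkJoinAsExecutorService {

    // Submits an ordinary Callable (a "non-ForkJoinTask client") and reports
    // whether the thread that executed it was a daemon thread.
    static boolean ranOnDaemonThread() {
        ForkJoinPool pool = new ForkJoinPool(2); // parallelism 2 is arbitrary
        try {
            Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
            ForkJoinTask<Boolean> result = pool.submit(probe);
            return result.join(); // wait for the worker thread to finish the probe
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("ran on a daemon worker: " + ranOnDaemonThread());
    }
}
```

Because the workers are daemon threads, a ForkJoinPool that is never shut down does not keep the JVM alive on its own.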

Introduction to Work Stealing Algorithms

Before introducing the work-stealing algorithm, let's recall how ThreadPoolExecutor works. When a client submits a task, the pool first checks whether the number of worker threads is less than the core pool size. If it is, the pool adds a new worker thread to run the task. If not, the task is placed in the task queue. If the queue is full, the pool tries to add a non-core worker thread; if the number of worker threads has already reached the maximum pool size, the rejection policy is triggered.

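// The execute method of ThreadPoolExecutor (excerpt from the JDK source)
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    int c = ctl.get(); // ctl packs the pool's run state and worker count into one int
    // workerCountOf extracts the number of worker threads
    if (workerCountOf(c) < corePoolSize) {
        // the second argument of addWorker says whether a core thread is being added
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    // If the pool is running and the task was queued successfully, re-check the state
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        // the pool was shut down in the meantime: remove the task and reject it
        if (! isRunning(recheck) && remove(command))
            reject(command);
        // no worker is left: add one so that the queued task gets executed
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    // Queueing failed: try to add a non-core thread; if that fails too, reject the task
    else if (!addWorker(command, false))
        reject(command);
}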
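The order "core thread, then queue, then non-core thread, then rejection" can be observed from the outside. The sketch below is my own illustration, not part of the JDK: the class name ExecuteOrderDemo and the tiny sizes (core 1, maximum 2, queue capacity 1) are assumptions chosen so that each branch of execute is hit exactly once.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecuteOrderDemo {

    // Returns {pool size after task 1, queue size after task 2,
    //          pool size after task 3, 1 if task 4 was rejected else 0}.
    static int[] observe() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 1L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy until we release it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int[] seen = new int[4];
        try {
            pool.execute(blocker);             // 1: below corePoolSize, a core worker is added
            seen[0] = pool.getPoolSize();
            pool.execute(blocker);             // 2: core is full, the task goes into the queue
            seen[1] = pool.getQueue().size();
            pool.execute(blocker);             // 3: queue is full, a non-core worker is added
            seen[2] = pool.getPoolSize();
            try {
                pool.execute(blocker);         // 4: queue full and pool at maximum, rejected
            } catch (RejectedExecutionException e) {
                seen[3] = 1;                   // the default AbortPolicy throws
            }
        } finally {
            release.countDown();               // let the blocked tasks finish
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(observe()));
    }
}
```

With these sizes the method returns {1, 1, 2, 1}: one core worker, one queued task, a second (non-core) worker, and finally a rejection.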

In ThreadPoolExecutor, workers fetch tasks from the task queue by calling the blocking queue's poll and take methods. To make concurrent consumption by multiple threads safe, the blocking queue takes a lock whenever an element is fetched:

// Timed poll as implemented in ArrayBlockingQueue (excerpt from the JDK source)
public E poll(long timeout, TimeUnit unit) throws InterruptedException {
    long nanos = unit.toNanos(timeout);
    // our old friend ReentrantLock
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        // wait while the queue is empty, until the timeout runs out
        while (count == 0) {
            if (nanos <= 0L)
                return null;
            nanos = notEmpty.awaitNanos(nanos);
        }
        return dequeue();
    } finally {
        lock.unlock();
    }
}

If possible, we want a thread to wait as little as possible when fetching a task. One way is to prepare multiple queues, one per thread. When a consumer finishes the tasks in its own queue, it can "steal" tasks from the queues of other worker threads, so worker threads do not sit idle and the load on the other consumers is reduced. This is the idea behind the work-stealing algorithm. So does work stealing have no drawbacks? If it had none, ThreadPoolExecutor would presumably have been marked @Deprecated soon after ForkJoinPool appeared. It has not been, so ForkJoinPool must have its own suitable scenarios. I am surely not the only one with this question, and I found the answer I wanted on Stack Overflow:

ThreadPoolExecutor vs ForkJoinPool: stealing subtasks

Assume that I have created ThreadPoolExecutor with 10 threads and 3000 Callable tasks have been submitted. How these threads share the load of execution of sub tasks?
And How ForkJoin pool behaves differently for same use case?

Answer 1: If you submit 3000 tasks to the pool and these tasks cannot be split into subtasks, ForkJoinPool and ThreadPoolExecutor do not behave significantly differently: 10 threads execute 10 tasks at a time until all tasks are done. ForkJoinPool fits the case where your tasks can be broken down into smaller tasks. Another answer approaches the question from the angle of thread starvation when consuming tasks.
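To make the "one queue per thread, steal when idle" idea described earlier more concrete, here is a deliberately simplified, single-threaded simulation. It is a sketch under my own assumptions, not how ForkJoinPool is implemented (the real pool uses lock-free array-based queues and real threads): the class name WorkStealingSketch, the task names t1 to t4 and the round-based loop are hypothetical. It only shows the usual convention that the owner takes the newest task from one end of its deque while a thief takes the oldest task from the other end.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class WorkStealingSketch {

    // Simulates two workers in rounds: worker 0 starts with four tasks, worker 1 with none.
    static List<String> simulate() {
        List<Deque<String>> queues = new ArrayList<>();
        queues.add(new ArrayDeque<>(Arrays.asList("t1", "t2", "t3", "t4")));
        queues.add(new ArrayDeque<>());

        List<String> log = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int w = 0; w < queues.size(); w++) {
                // The owner works on its own deque first, newest task first (LIFO).
                String task = queues.get(w).pollLast();
                String how = "runs own";
                if (task == null) {
                    // Own deque is empty: steal the oldest task (FIFO end) from another worker.
                    for (int v = 0; v < queues.size() && task == null; v++) {
                        if (v != w) {
                            task = queues.get(v).pollFirst();
                        }
                    }
                    how = "steals";
                }
                if (task != null) {
                    log.add("w" + w + " " + how + " " + task);
                    progressed = true;
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

Because owner and thief work on opposite ends of the deque, they rarely compete for the same task, which is what keeps the waiting time low compared with a single locked queue.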

My own takeaway concerns task granularity. When we need to parallelize some computation, we turn to a thread pool and submit the tasks to it:

Diagram of how a thread pool works

Usually we want the granularity of a task to be as small as possible, so that more of the work can run concurrently once it is submitted to the thread pool. Suppose a task is submitted without being split, so its granularity is too coarse: our thread pool has 10 core threads, but only one task is submitted, so only one worker thread is busy. Assume this task could be divided into six independent subtasks, A, B, C, D, E and F, of which F takes the longest. Run as a single task, the total time is roughly the sum of the execution times of A+B+C+D+E+F. Split into six independent subtasks that run in parallel, the total time is roughly the execution time of the longest subtask, F. In this case the developer knows the smallest granularity of the task and can split it by hand before submitting it to ThreadPoolExecutor. But sometimes we cannot split a task by hand. We want to specify a granularity, let the program divide the task accordingly, execute the pieces, and finally combine the results. This is where ForkJoinPool shines.
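"Give a granularity and let the program split" can be sketched as a RecursiveTask that keeps halving a range until it is no larger than a threshold. This is only an illustration under my own assumptions: the class name RangeSumTask and the threshold of 1,000 elements are hypothetical, and a sensible threshold depends on the actual workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSumTask extends RecursiveTask<Long> {

    private static final int THRESHOLD = 1_000; // hypothetical granularity

    private final long[] data;
    private final int lo, hi; // this task sums data[lo, hi)

    RangeSumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: do the work directly instead of splitting further.
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        RangeSumTask left = new RangeSumTask(data, lo, mid);
        RangeSumTask right = new RangeSumTask(data, mid, hi);
        left.fork();                     // Fork: hand the left half to the pool
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // Join: combine the two partial results
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // sum of 1..10000
    }
}
```

The caller only states the threshold; how many subtasks exist and which thread runs each one is decided at run time.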

Splitting a task into subtasks is the Fork:

The Fork in ForkJoin

Merging the results of the subtasks is the Join:

ForkJoin-Join

This is what ForkJoin means. In fact, we can also use Fork alone. Merge sort works in a similar way:

The merge sort process

Doesn't the merge sort process look a bit like the ForkJoin described above? Let's look at the ForkJoin pattern through an example.
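Since merge sort has exactly this split-then-merge shape, it maps naturally onto a RecursiveAction. The following is a minimal sketch, assuming a hypothetical class name MergeSortAction and an arbitrary threshold of 16 elements below which the range is sorted directly.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class MergeSortAction extends RecursiveAction {

    private static final int THRESHOLD = 16; // hypothetical cut-off for direct sorting

    private final int[] a;
    private final int lo, hi; // this action sorts a[lo, hi)

    MergeSortAction(int[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(a, lo, hi); // small range: sort it directly
            return;
        }
        int mid = (lo + hi) >>> 1;
        // Fork both halves and wait for both of them to finish.
        invokeAll(new MergeSortAction(a, lo, mid), new MergeSortAction(a, mid, hi));
        merge(mid); // Join step: merge the two sorted halves
    }

    // Merges the sorted ranges a[lo, mid) and a[mid, hi) in place,
    // using a temporary copy of the left half.
    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(a, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) {
            a[k++] = left[i++];
        }
        // Any remaining elements of the right half are already in place.
    }

    static void sort(int[] a) {
        ForkJoinPool.commonPool().invoke(new MergeSortAction(a, 0, a.length));
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7, 2, 8};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

RecursiveAction is used here because the sort happens in place and there is no result to return.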

Usage

ForkJoinPool has four constructors in JDK 8 (the fourth one is private), and a fifth was added in JDK 9:

1. ForkJoinPool()
2. ForkJoinPool(int parallelism)
3. ForkJoinPool(int parallelism, ForkJoinWorkerThreadFactory factory, UncaughtExceptionHandler handler, boolean asyncMode)
4. ForkJoinPool(int parallelism, ForkJoinWorkerThreadFactory factory, UncaughtExceptionHandler handler, int mode, String workerNamePrefix)
5. ForkJoinPool(int parallelism, ForkJoinWorkerThreadFactory factory, UncaughtExceptionHandler handler, boolean asyncMode, int corePoolSize, int maximumPoolSize, int minimumRunnable, Predicate<? super ForkJoinPool> saturate, long keepAliveTime, TimeUnit unit) // added in JDK 9

In JDK 8, constructors 1, 2 and 3 all end up calling 4, so let's focus on 4.

parallelism: the parallelism level, which determines the number of worker threads in the pool.

With the no-argument constructor, parallelism is the return value of Runtime.getRuntime().availableProcessors(), which reports logical rather than physical cores. My computer has 8 cores and 16 logical processors, so the number I get is 16.

factory: the thread factory; the pool calls it whenever it needs to add a worker thread.
handler: the exception handler, which decides what to do when a worker thread runs into an unrecoverable error while executing a task.
asyncMode: controls the order in which a worker consumes the tasks it has forked.

if true, establishes local first-in-first-out scheduling mode for forked tasks that are never joined. This mode may be more appropriate than default locally stack-based mode in applications in which worker threads only process event-style asynchronous tasks. For default value, use false.
That is, if true, forked tasks that are never joined are scheduled locally in first-in-first-out order. This mode may be more appropriate than the default, locally stack-based mode in applications where worker threads only process event-style asynchronous tasks. The default is false.

workerNamePrefix: the prefix used for the names of worker threads.
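The public four-argument constructor can be tried out as follows. This is just a sketch: the class name PoolConfigDemo, the parallelism of 4 and the handler that prints to System.err are assumptions for illustration, while defaultForkJoinWorkerThreadFactory, getParallelism() and getAsyncMode() are part of the ForkJoinPool API.

```java
import java.util.concurrent.ForkJoinPool;

class PoolConfigDemo {

    // Builds a pool with an explicit parallelism, the default thread factory,
    // a simple uncaught-exception handler and FIFO (async) mode switched on.
    static ForkJoinPool newAsyncPool(int parallelism) {
        return new ForkJoinPool(
                parallelism,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory,
                (thread, error) -> System.err.println(thread.getName() + " failed: " + error),
                true);
    }

    public static void main(String[] args) {
        ForkJoinPool custom = newAsyncPool(4);
        System.out.println("parallelism = " + custom.getParallelism());
        System.out.println("asyncMode   = " + custom.getAsyncMode());
        custom.shutdown();

        // The no-argument constructor derives its parallelism from the logical core count.
        ForkJoinPool byDefault = new ForkJoinPool();
        System.out.println("default parallelism  = " + byDefault.getParallelism());
        System.out.println("availableProcessors  = " + Runtime.getRuntime().availableProcessors());
        byDefault.shutdown();
    }
}
```

Passing null as the handler is also allowed if no special handling is needed.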

Next, let's look at how to submit tasks to a ForkJoinPool. Here we meet a new parameter type: ForkJoinTask. Let's first get to know ForkJoin through an example.

// FibonacciForkJoinTask.java
import java.util.concurrent.RecursiveTask;

public class FibonacciForkJoinTask extends RecursiveTask<Integer> {

    private final int n;

    public FibonacciForkJoinTask(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n <= 1) {
            return n;
        }
        FibonacciForkJoinTask f1 = new FibonacciForkJoinTask(n - 1);
        f1.fork(); // fork f1: schedule it for asynchronous execution
        FibonacciForkJoinTask f2 = new FibonacciForkJoinTask(n - 2);
        f2.fork(); // fork f2
        return f1.join() + f2.join(); // join: merge the results of f1 and f2
    }
}

// ForkJoinDemo01.java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class ForkJoinDemo01 {
    public static void main(String[] args) throws Exception {
        ForkJoinPool forkJoinPool = new ForkJoinPool();
        ForkJoinTask<Integer> forkJoinTaskResult = forkJoinPool.submit(new FibonacciForkJoinTask(4));
        System.out.println(forkJoinTaskResult.get()); // prints 3
    }
}

Overview of ForkJoinTask and its subclasses

Subclasses of ForkJoinTask

ForkJoinTask is the abstract base class for tasks submitted to a ForkJoinPool. As their names suggest, RecursiveTask and RecursiveAction are used to split a task recursively; the difference is that RecursiveTask returns a result and RecursiveAction does not. Both were introduced in JDK 1.7. CountedCompleter, added in JDK 8, has a hook method, onCompletion(CountedCompleter<?> caller), which is invoked when tryComplete() is called and the pending count is zero, typically once all subtasks have finished. The Fibonacci calculation above only demonstrates the usage of Fork/Join: a single thread computes it very quickly, and there are faster algorithms as well. Parallel streams also use ForkJoinPool by default, namely the common pool. It is recommended to use ForkJoinPool through ForkJoinPool.commonPool().
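RecursiveTask already appears in the Fibonacci example, so here is a sketch of the less familiar CountedCompleter and its onCompletion hook. It follows the splitting pattern shown in the CountedCompleter Javadoc; the class name SquareSumCompleter, the shared AtomicLong accumulator and the seenAtCompletion field are my own assumptions for illustration.

```java
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicLong;

class SquareSumCompleter extends CountedCompleter<Void> {

    private final int[] data;
    private final int lo, hi;                    // this node covers data[lo, hi)
    private final AtomicLong total;              // accumulator shared by all nodes
    private volatile long seenAtCompletion = -1; // filled in by the root's onCompletion

    SquareSumCompleter(CountedCompleter<?> parent, int[] data, int lo, int hi, AtomicLong total) {
        super(parent);
        this.data = data;
        this.lo = lo;
        this.hi = hi;
        this.total = total;
    }

    @Override
    public void compute() {
        if (hi - lo >= 2) {
            int mid = (lo + hi) >>> 1;
            setPendingCount(2); // two children must complete before this node does
            new SquareSumCompleter(this, data, mid, hi, total).fork();
            new SquareSumCompleter(this, data, lo, mid, total).fork();
        } else if (hi > lo) {
            total.addAndGet((long) data[lo] * data[lo]); // leaf: process one element
        }
        tryComplete(); // decrement the pending count, or complete if it is already zero
    }

    @Override
    public void onCompletion(CountedCompleter<?> caller) {
        // Called on every node once its pending count reaches zero;
        // for the root (no completer) this means the whole tree has finished.
        if (getCompleter() == null) {
            seenAtCompletion = total.get();
        }
    }

    static long sumOfSquares(int[] data) {
        SquareSumCompleter root =
                new SquareSumCompleter(null, data, 0, data.length, new AtomicLong());
        ForkJoinPool.commonPool().invoke(root);
        return root.seenAtCompletion;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(new int[] {1, 2, 3, 4})); // 1 + 4 + 9 + 16
    }
}
```

Unlike RecursiveTask, no task ever blocks in join here: each child notifies its parent through tryComplete, and the hook on the root runs once the last child has reported in.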

Conclusion

Our understanding of things becomes clear gradually. Treat this article as an introduction to ForkJoin. I also tried hard to understand what the fork method does internally, but gave up in the end. ForkJoin is not suited to every scenario. If we can control the task granularity well ourselves, the execution speed of ForkJoinPool and ThreadPoolExecutor does not differ significantly. You can also just fork without joining.

References

"Java Multithreaded Programming Practical Guide" 2nd Edition by Huang Wenhai
ThreadPoolExecutor vs ForkJoinPool: stealing subtasks https://stackoverflow.com/questions/33448465/threadpoolexecutor-vs-forkjoinpool-stealing-subtasks
ForkJoin is applied in practice https://juejin.cn/post/6983953239049764871
What are the commonly used optimization strategies for Stack-based virtual machines? -From Zhihu


北冥有只鱼
