What is the Fork/Join framework
The Fork/Join framework was introduced in JDK 7. It splits a large task into multiple small tasks, executes them in parallel, and then combines the results of the small tasks into the result of the large task. As the name suggests, the framework works in two stages: Fork and Join. The Fork stage splits a large task into multiple subtasks that run in parallel, and the Join stage merges the results of those subtasks to produce the result of the large task.
The main process is easy to see: first check whether a task is small enough. If it is, compute it directly; otherwise split it into several smaller tasks and compute each one separately. This splitting can be repeated recursively until only small tasks remain. In other words, the Fork/Join framework is based on divide and conquer: it splits a large task into multiple independent small tasks, executes them in parallel, and merges their results, which lets the work use several CPU cores and improves efficiency.
Fork/Join framework usage example
Let's take a look at how to use the Fork/Join framework by calculating the sum of all elements in a list. The general idea is to divide the list into many sublists, sum the elements of each sublist, and then add up those partial sums to get the sum of the original list.
The Fork/Join framework defines the ForkJoinTask class to represent a Fork/Join task. It provides fork(), join(), and other operations. Normally we do not need to extend ForkJoinTask directly; instead we use one of the two subclasses the framework provides:
- RecursiveAction represents a Fork/Join task that does not return a result.
- RecursiveTask represents a Fork/Join task that returns a result.
In this example a result needs to be returned, so we define a SumTask class that extends RecursiveTask. The code is as follows:
import java.util.List;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.Collectors;

/**
 * @author mghio
 * @since 2021-07-25
 */
public class SumTask extends RecursiveTask<Long> {

    // Lists of this size or smaller are summed directly instead of being split
    private static final int SEQUENTIAL_THRESHOLD = 50;

    private final List<Long> data;

    public SumTask(List<Long> data) {
        this.data = data;
    }

    @Override
    protected Long compute() {
        if (data.size() <= SEQUENTIAL_THRESHOLD) {
            // Small task: compute the sum directly
            long sum = computeSumDirectly();
            System.out.format("Sum of %s: %d\n", data.toString(), sum);
            return sum;
        } else {
            // Large task: split the list into two halves
            int mid = data.size() / 2;
            SumTask firstSubtask = new SumTask(data.subList(0, mid));
            SumTask secondSubtask = new SumTask(data.subList(mid, data.size()));
            // Execute the subtasks
            firstSubtask.fork();
            secondSubtask.fork();
            // Wait for the subtasks to complete and get their results.
            // Joins are done in reverse fork order, as the ForkJoinTask
            // Javadoc recommends (innermost first).
            long secondSubTaskResult = secondSubtask.join();
            long firstSubTaskResult = firstSubtask.join();
            return firstSubTaskResult + secondSubTaskResult;
        }
    }

    private long computeSumDirectly() {
        long sum = 0;
        for (Long l : data) {
            sum += l;
        }
        return sum;
    }

    public static void main(String[] args) {
        Random random = new Random();
        // 1,000 random values in the range [1, 100)
        List<Long> data = random
                .longs(1_000, 1, 100)
                .boxed()
                .collect(Collectors.toList());
        ForkJoinPool pool = new ForkJoinPool();
        SumTask task = new SumTask(data);
        // invoke() submits the task and blocks until its result is available
        System.out.println("Sum: " + pool.invoke(task));
    }
}
Here, when the list size is less than or equal to SEQUENTIAL_THRESHOLD, the list is treated as a small task and the sum of its elements is computed directly; otherwise it is split into smaller tasks again. Running the program prints the partial sum of each small sublist, followed by the overall sum.
This sample shows the main difference between a Fork/Join task and an ordinary task: a ForkJoinTask subclass (RecursiveTask or RecursiveAction) must implement the abstract method compute() to define its calculation logic. The usual template for this method is to first check whether the current task is small enough. If it is, compute it directly; if not, split it into two subtasks. When fork() is called on a subtask, the subtask is scheduled and its own compute() method runs in turn, checking again whether it needs to be split: a small task is executed and returns its result, and a larger one continues to split. Finally, join() waits for the subtasks to complete and obtains their results. The pseudocode is as follows:
if (problem is small) {
directly solve problem.
} else {
Step 1. split problem into independent parts.
Step 2. fork new subtasks to solve each part.
Step 3. join all subtasks.
Step 4. compose result from subresults.
}
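For comparison with SumTask, the same template can be sketched for a task that returns nothing. The class name DoubleAction, its fields, and the threshold value are illustrative assumptions rather than part of the original example; the sketch uses invokeAll(), which forks both halves and waits for both to finish.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example: doubles every element of an array in place.
class DoubleAction extends RecursiveAction {

    // Assumed threshold for illustration only; tune it by measurement.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    DoubleAction(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            // Problem is small: solve it directly.
            for (int i = from; i < to; i++) {
                data[i] *= 2;
            }
        } else {
            // Steps 1 to 3: split into two independent halves,
            // fork them, and wait until both have completed.
            int mid = (from + to) >>> 1;
            invokeAll(new DoubleAction(data, from, mid),
                      new DoubleAction(data, mid, to));
            // Step 4 is empty: a RecursiveAction has no result to compose.
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        Arrays.fill(data, 1L);
        ForkJoinPool.commonPool().invoke(new DoubleAction(data, 0, data.length));
        System.out.println(data[0] + " " + data[data.length - 1]); // prints "2 2"
    }
}
```

Because each subtask works on its own index range, the halves are independent and need no synchronization.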
Fork/Join framework design
The core idea of the Fork/Join framework is to split a large task into several small tasks and then combine the results of the small tasks into the result of the large task. If you were to design such a framework, how would you do it? (It is worth thinking about before reading on.) As the name suggests, the Fork/Join framework works in two steps:
- Split the large task. A class is needed to split a large task into subtasks. The subtasks produced by one split may still be relatively large, so splitting is repeated until the subtasks are small enough by the threshold we define.
- Execute the tasks and merge the results. The subtasks produced in the first step are stored in double-ended queues (why a double-ended queue is used is explained below), and a thread is started for each queue to take tasks from it and execute them. The results of the subtasks are placed in a shared queue, and another thread takes data from that queue, merges it, and returns the final result.
The Fork/Join framework uses the following two classes to complete these two steps:
- ForkJoinTask, already mentioned in the example above, represents a Fork/Join task. To use the framework you first define a task. Usually you only need to extend one of the subclasses of ForkJoinTask: RecursiveAction (no result returned) or RecursiveTask (a result is returned).
- ForkJoinPool, as the name suggests, is the thread pool that executes ForkJoinTask instances. The subtasks split from a large task are added to the head of the deque maintained by the current thread.
If you think it through, you may picture the following scenario. To complete a large task, we first split it into multiple independent subtasks and place them in separate queues, then create one thread per queue to execute the tasks in it, so threads and queues correspond one to one. Some threads will then finish the tasks in their own queue while others are still busy, and the threads that finished early are left waiting with nothing to do. That is a problem.
Since this is concurrency, we want to squeeze as much performance out of the machine as possible. For this scenario, concurrency master Doug Lea uses the work-stealing algorithm: a thread that finishes the tasks in its own queue goes to another thread's queue to "steal" a task to execute, in the spirit of "when one is in trouble, everyone helps". The stealing thread and the thread being stolen from now access the same queue, so to reduce contention between them, Fork/Join uses a deque as the data structure. Tasks are then taken according to this rule: the thread that owns the queue always takes tasks from the head, and the thread that steals takes tasks from the tail. In most cases this makes full use of multiple threads for parallel computation, but some contention remains in edge cases, such as when the deque holds only one task.
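The head/tail rule can be illustrated with a toy deque. This is a minimal sketch under stated assumptions, not how ForkJoinPool is implemented: the class StealingDequeSketch and its method names are hypothetical, the tasks are plain strings, and a ConcurrentLinkedDeque stands in for the pool's internal work queue.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical toy model of one worker's task deque.
class StealingDequeSketch {

    private final ConcurrentLinkedDeque<String> tasks = new ConcurrentLinkedDeque<>();

    // The owning thread adds new subtasks at the head.
    void push(String task) {
        tasks.addFirst(task);
    }

    // The owning thread takes its next task from the head.
    String takeOwn() {
        return tasks.pollFirst();
    }

    // An idle thread steals from the tail, the opposite end.
    String steal() {
        return tasks.pollLast();
    }

    public static void main(String[] args) {
        StealingDequeSketch queue = new StealingDequeSketch();
        queue.push("task-1");
        queue.push("task-2");
        queue.push("task-3");
        System.out.println(queue.takeOwn()); // prints "task-3"
        System.out.println(queue.steal());   // prints "task-1"
        System.out.println(queue.takeOwn()); // prints "task-2"
        System.out.println(queue.steal());   // prints "null": nothing left to steal
    }
}
```

As long as the deque holds more than one task, the owner and the thief work on different ends and rarely touch the same element; with a single task left, both ends point at it, which is the contention case mentioned above.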
Fork/Join framework implementation principle
The core of the Fork/Join framework is the ForkJoinPool class. Its main components are a ForkJoinTask array and a ForkJoinWorkerThread array: the ForkJoinTask array stores the tasks that the program submits to the ForkJoinPool, and the ForkJoinWorkerThread array is responsible for executing them. A task has the following four states:
- NORMAL: completed
- CANCELLED: was cancelled
- SIGNAL: signal
- EXCEPTIONAL: an exception occurred
Let's take a look at how the core methods of these two classes work, starting with the fork() method of ForkJoinTask.
When the calling thread is a ForkJoinWorkerThread, fork() calls the push() method of that thread's workQueue to schedule the task for asynchronous execution, and then returns immediately without waiting for the result. Next, follow the call into the push() method in ForkJoinPool.
push() adds the current task to the ForkJoinTask task queue array and then calls the signalWork() method of ForkJoinPool to create or wake up a worker thread to execute the task. Now look at the join() method of ForkJoinTask.
join() first calls the doJoin() method, which returns the status of the current task, and then handles the returned status as follows:
- If the task completed normally, the result is returned directly.
- If the task was cancelled, a CancellationException is thrown.
- If the task ended with an exception, the corresponding exception is thrown.
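These three outcomes can be observed from calling code. The following is a minimal sketch: the JoinOutcomes and Answer classes and the describe() helper are hypothetical names made up for this illustration, while submit(), cancel(), and join() are the standard ForkJoinPool and ForkJoinTask methods.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class JoinOutcomes {

    // Hypothetical task that either returns 42 or fails with an exception.
    static class Answer extends RecursiveTask<Integer> {
        private final boolean fail;

        Answer(boolean fail) {
            this.fail = fail;
        }

        @Override
        protected Integer compute() {
            if (fail) {
                throw new IllegalArgumentException("boom");
            }
            return 42;
        }
    }

    // Calls join() and reports which of the three outcomes occurred.
    static String describe(ForkJoinTask<Integer> task) {
        try {
            return "result " + task.join();
        } catch (CancellationException e) {
            return "cancelled";
        } catch (RuntimeException e) {
            return "failed with " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = ForkJoinPool.commonPool();

        Answer ok = new Answer(false);
        pool.submit(ok);
        System.out.println(describe(ok)); // prints "result 42"

        Answer bad = new Answer(true);
        pool.submit(bad);
        System.out.println(describe(bad)); // prints "failed with IllegalArgumentException"

        Answer cancelled = new Answer(false);
        cancelled.cancel(false); // cancelled before it was ever submitted
        System.out.println(describe(cancelled)); // prints "cancelled"
    }
}
```

Note that join() rethrows failures as unchecked exceptions, so callers that care about them need an explicit try/catch like the one in describe().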
Next, follow the call into the doJoin() method. It first checks whether the current task has already finished; if so, it returns the task status directly. If not, it takes the task from the work queue (workQueue) and executes it. If the task completes normally, the task status is set to NORMAL; if an exception occurs, the exception is recorded and the task status is set to EXCEPTIONAL (in the doExec() method).
Summary
This article introduced the basic principles of the Fork/Join framework in Java, the work-stealing algorithm, the framework's design, and part of its source code. The Fork/Join framework is also used inside the JDK standard library. For example, since JDK 1.8 the standard library method Arrays.parallelSort(array) can sort in parallel: internally it uses Fork/Join to split a large array and sort the parts in parallel, which can speed up sorting. The Collection.parallelStream() method is likewise built on the Fork/Join framework. Finally, the threshold that defines a small task is usually chosen through testing and measurement, so that the program achieves the best performance.
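As a brief illustration of these two standard library entry points, here is a minimal sketch. The class StandardLibraryParallelism and its helper methods are hypothetical names; Arrays.parallelSort() and Collection.parallelStream() are the JDK methods mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class StandardLibraryParallelism {

    // Sorts a copy of the input with Arrays.parallelSort, leaving the input untouched.
    static long[] sortedCopy(long[] input) {
        long[] copy = Arrays.copyOf(input, input.length);
        Arrays.parallelSort(copy);
        return copy;
    }

    // Sums a list with a parallel stream instead of a hand-written RecursiveTask.
    static long parallelSum(List<Long> values) {
        return values.parallelStream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        long[] sorted = sortedCopy(new long[] {5, 3, 9, 1});
        System.out.println(Arrays.toString(sorted)); // prints "[1, 3, 5, 9]"

        List<Long> values = LongStream.rangeClosed(1, 100)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(parallelSum(values)); // prints "5050"
    }
}
```

For inputs as small as these the library may well do the work sequentially, so any speedup only shows up on large data sets and is worth measuring rather than assuming.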