
What is the Fork/Join framework

The Fork/Join framework was introduced in JDK 7. It executes a large task in parallel by splitting it into multiple small tasks, and then combines the results of the small tasks to obtain the result of the large task. As the name suggests, the framework works in two stages, Fork and Join. In the first stage, Fork, a large task is split into multiple subtasks that run in parallel. In the second stage, Join, the results of those subtasks are merged to produce the result of the large task.

The main flow follows directly from this: first check whether a task is small enough. If it is, compute it directly; otherwise, split it into several smaller tasks and compute each one separately. The splitting can be repeated until only sufficiently small tasks remain. The Fork/Join framework is based on the divide-and-conquer algorithm: it splits a large task into multiple independent small tasks, executes them in parallel, and merges their results into the final result of the large task, which improves efficiency.

Fork/Join framework usage example

Let's look at how to use the Fork/Join framework to calculate the sum of all elements in a list. The general idea is to divide the list into many sublists, sum the elements of each sublist, and then add up those partial sums to get the sum of the original list. The Fork/Join framework defines ForkJoinTask to represent a Fork/Join task, which provides fork(), join() and other operations. Normally we do not need to extend ForkJoinTask directly; instead we use one of the two ForkJoinTask subclasses the framework provides:

  • RecursiveAction represents a Fork/Join task that does not return a result.
  • RecursiveTask represents a Fork/Join task that returns a result.

The list-summing example needs to return a result, so we define a SumTask class that extends RecursiveTask. The code is as follows:
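For contrast with the result-returning case, here is a minimal sketch of a RecursiveAction. The class name DoubleAction, its threshold, and the helper doubleAll are made up for illustration; the task doubles every element of an array in place, so there is nothing to return or merge.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example class: doubles every element of an array in place.
class DoubleAction extends RecursiveAction {

  // Illustrative threshold, not a tuned value.
  private static final int THRESHOLD = 4;

  private final long[] data;
  private final int from; // inclusive
  private final int to;   // exclusive

  DoubleAction(long[] data, int from, int to) {
    this.data = data;
    this.from = from;
    this.to = to;
  }

  @Override
  protected void compute() {
    if (to - from <= THRESHOLD) {
      // Small enough: do the work directly.
      for (int i = from; i < to; i++) {
        data[i] *= 2;
      }
    } else {
      int mid = (from + to) >>> 1;
      // invokeAll runs both halves and returns once both are done.
      invokeAll(new DoubleAction(data, from, mid), new DoubleAction(data, mid, to));
    }
  }

  // Helper (hypothetical): works on a copy so the caller's array is untouched.
  static long[] doubleAll(long[] input) {
    long[] copy = Arrays.copyOf(input, input.length);
    ForkJoinPool.commonPool().invoke(new DoubleAction(copy, 0, copy.length));
    return copy;
  }

  public static void main(String[] args) {
    long[] result = doubleAll(new long[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
    System.out.println(Arrays.toString(result)); // [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
  }
}
```

Because compute() returns void here, the only "join" step is waiting for both halves to finish, which invokeAll does.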

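import java.util.List;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.Collectors;

/**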
 * @author mghio
 * @since 2021-07-25
 */
public class SumTask extends RecursiveTask<Long> {

  private static final int SEQUENTIAL_THRESHOLD = 50;

  private final List<Long> data;

  public SumTask(List<Long> data) {
    this.data = data;
  }

  @Override
  protected Long compute() {
    if (data.size() <= SEQUENTIAL_THRESHOLD) {
      long sum = computeSumDirectly();
      System.out.format("Sum of %s: %d\n", data.toString(), sum);
      return sum;
    } else {
      int mid = data.size() / 2;
      SumTask firstSubtask = new SumTask(data.subList(0, mid));
      SumTask secondSubtask = new SumTask(data.subList(mid, data.size()));
      // Fork the subtasks so they run asynchronously
      firstSubtask.fork();
      secondSubtask.fork();
      // Wait for the subtasks to finish and collect their results
      long firstSubTaskResult = firstSubtask.join();
      long secondSubTaskResult = secondSubtask.join();
      return firstSubTaskResult + secondSubTaskResult;
    }
  }

  private long computeSumDirectly() {
    long sum = 0;
    for (Long l : data) {
      sum += l;
    }
    return sum;
  }

  public static void main(String[] args) {
    Random random = new Random();

    List<Long> data = random
        .longs(1_000, 1, 100)
        .boxed()
        .collect(Collectors.toList());

    ForkJoinPool pool = new ForkJoinPool();
    SumTask task = new SumTask(data);
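    // invoke() blocks until the task completes and returns its result,
    // so the task is submitted only once
    long sum = pool.invoke(task);

    System.out.println("Sum: " + sum);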
  }
}

Here, when the list size is less than or equal to the SEQUENTIAL_THRESHOLD constant, the task is regarded as small and the sum of the list elements is calculated directly; otherwise the task is split into smaller tasks again. The output is as follows:

1.png

From this sample code we can see how a task in the Fork/Join framework differs from an ordinary task: the task class (here a RecursiveTask subclass) must implement the abstract method compute() to define the calculation logic. The usual template for this method is to first determine whether the current task is small. If it is, execute it directly and return the result. If it is not, split it into two subtasks and call fork() on each; each subtask then enters compute() again and checks whether it needs to be split further, and so on until the tasks are small enough. Finally, call join() to wait for the subtasks to complete and obtain their results. The pseudocode is as follows:

if (problem is small) {
  directly solve problem.
} else {
  Step 1. split problem into independent parts.
  Step 2. fork new subtasks to solve each part.
  Step 3. join all subtasks.
  Step 4. compose result from subresults.
}
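The four steps of this template can be mapped onto code one by one. The sketch below is an illustration only: MaxTask, its threshold, and the helper max are hypothetical names, and it uses the common variation of forking one half while computing the other half in the current thread.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: finds the maximum of an int array.
class MaxTask extends RecursiveTask<Integer> {

  // Illustrative threshold, not a tuned value.
  private static final int THRESHOLD = 1_000;

  private final int[] data;
  private final int from; // inclusive
  private final int to;   // exclusive

  MaxTask(int[] data, int from, int to) {
    this.data = data;
    this.from = from;
    this.to = to;
  }

  @Override
  protected Integer compute() {
    if (to - from <= THRESHOLD) {
      // Problem is small: solve it directly.
      int max = data[from];
      for (int i = from + 1; i < to; i++) {
        max = Math.max(max, data[i]);
      }
      return max;
    }
    // Step 1: split the problem into independent parts.
    int mid = (from + to) >>> 1;
    MaxTask left = new MaxTask(data, from, mid);
    MaxTask right = new MaxTask(data, mid, to);
    // Step 2: fork one part; compute the other in the current thread.
    left.fork();
    int rightMax = right.compute();
    // Step 3: join the forked subtask.
    int leftMax = left.join();
    // Step 4: compose the result from the sub-results.
    return Math.max(leftMax, rightMax);
  }

  // Helper (hypothetical): entry point for callers outside the pool.
  static int max(int[] data) {
    if (data.length == 0) {
      throw new IllegalArgumentException("empty array");
    }
    return ForkJoinPool.commonPool().invoke(new MaxTask(data, 0, data.length));
  }

  public static void main(String[] args) {
    int[] data = new int[10_000];
    for (int i = 0; i < data.length; i++) {
      data[i] = (i * 31) % 9_973;
    }
    System.out.println(max(data));
  }
}
```

Computing one half directly keeps the current thread busy instead of having it fork both halves and then wait for them.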

Fork/Join framework design

The core idea of the Fork/Join framework is to split a large task into several small tasks, and then combine the results of the small tasks to get the result of the large task. If you were to design such a framework, how would you do it? (It is worth thinking about this first.) As the name suggests, the Fork/Join framework works in two steps:

  1. Split the large task. A class is needed to split a large task into subtasks. The subtasks may still be too large after one split, so splitting is repeated until the subtasks are small enough by the threshold we define.
  2. Execute the tasks and merge the results. The subtasks produced in the first step are stored in double-ended queues (deques; see below for why a deque is used), and each queue gets a thread that takes tasks from the queue and executes them. The results of the subtasks are placed in a shared queue, and another thread takes data from that queue and merges it into the final result.

The Fork/Join framework uses the following two classes to complete the two steps above:

  • ForkJoinTask, already mentioned in the example above, represents a Fork/Join task. To use the framework you first define a task, usually by extending one of the ForkJoinTask subclasses: RecursiveAction (no result) or RecursiveTask (returns a result).
  • ForkJoinPool, as the name suggests, is the thread pool that executes ForkJoinTask instances. Subtasks split from a large task are added to the head of the current thread's deque.

If you think it through, a scenario like this probably comes to mind. To complete a large task, we first split it into multiple independent subtasks and place those subtasks in separate queues. A separate thread is created for each queue to execute the tasks in it, so threads and queues have a one-to-one relationship. Some threads will then finish the tasks in their own queues while others are still working, which leaves the finished threads idle. This is a problem worth solving.

Since this is concurrency, we want to squeeze as much performance out of the machine as possible. For this scenario, concurrency master Doug Lea uses the work-stealing algorithm. With work stealing, a thread that finishes the tasks in its own queue first goes to another thread's queue to "steal" a task and execute it. Haha, when one side is in trouble, everyone helps out. At that point, however, the stealing thread and the queue's own thread access the same queue. To reduce contention between them, ForkJoin chose the deque as its data structure, so tasks can be taken by this rule: the thread whose tasks are being stolen always takes tasks from the head of its deque, while the stealing thread takes tasks from the tail. This algorithm makes full use of multiple threads for parallel computation in most cases, but some contention remains in extreme cases, such as when only one task is left in the deque.

2.png
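The head/tail rule described above can be sketched with a plain deque. This is a toy model, not the ForkJoinPool implementation: the class WorkStealingSketch, the Worker type, and the task names are all hypothetical, and the steps run sequentially so the order is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical model for illustration; this is not how ForkJoinPool is implemented.
class WorkStealingSketch {

  static final class Worker {
    final String name;
    final ConcurrentLinkedDeque<String> deque = new ConcurrentLinkedDeque<>();

    Worker(String name) {
      this.name = name;
    }

    // The owner adds newly split subtasks at the head of its own deque.
    void push(String task) {
      deque.addFirst(task);
    }

    // The owner takes its next task from the head.
    String takeOwn() {
      return deque.pollFirst();
    }

    // An idle worker steals from the tail of the victim's deque.
    String stealFrom(Worker victim) {
      return victim.deque.pollLast();
    }
  }

  static List<String> run() {
    Worker busy = new Worker("busy");
    Worker idle = new Worker("idle");
    for (int i = 1; i <= 4; i++) {
      busy.push("task-" + i);
    }
    // Head-to-tail order is now: task-4, task-3, task-2, task-1.
    List<String> log = new ArrayList<>();
    log.add(busy.name + " runs " + busy.takeOwn());
    log.add(idle.name + " steals " + idle.stealFrom(busy));
    log.add(busy.name + " runs " + busy.takeOwn());
    log.add(idle.name + " steals " + idle.stealFrom(busy));
    return log;
  }

  public static void main(String[] args) {
    run().forEach(System.out::println);
  }
}
```

The owner and the thief touch opposite ends of the deque, so they only meet when a single task is left, which is the contention case mentioned above.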

Fork/Join framework implementation principle

The core of the Fork/Join framework implementation is the ForkJoinPool class. Its key components are a ForkJoinTask array and a ForkJoinWorkerThread array: the ForkJoinTask array stores the tasks that user code submits to the ForkJoinPool, and the ForkJoinWorkerThread array is responsible for executing those tasks. A task has the following four states:

  • NORMAL: completed normally
  • CANCELLED: was cancelled
  • SIGNAL: waiting threads need to be signalled when the task completes
  • EXCEPTIONAL: an exception occurred

Let's look at how the core methods of these two classes are implemented. First, the fork() method of ForkJoinTask. The source code is as follows:
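The state constants above are internal, but the completed, cancelled and exceptional outcomes can be observed through the public query methods of ForkJoinTask. The following is a small sketch; TaskStateDemo, Ok, Failing and describe are hypothetical names, and SIGNAL has no public query method, so it does not appear.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class: maps public query methods to the outcomes listed above.
class TaskStateDemo {

  static final class Ok extends RecursiveTask<Integer> {
    @Override
    protected Integer compute() {
      return 42;
    }
  }

  static final class Failing extends RecursiveTask<Integer> {
    @Override
    protected Integer compute() {
      throw new IllegalStateException("boom");
    }
  }

  static String describe(ForkJoinTask<?> task) {
    if (!task.isDone()) {
      return "not finished";
    }
    if (task.isCompletedNormally()) {
      return "completed normally";
    }
    if (task.isCancelled()) {
      return "cancelled";
    }
    return "failed with " + task.getException().getClass().getSimpleName();
  }

  static String[] run() {
    ForkJoinPool pool = ForkJoinPool.commonPool();

    Ok fresh = new Ok();
    String before = describe(fresh); // never submitted

    Ok ok = new Ok();
    pool.invoke(ok);

    Failing failing = new Failing();
    try {
      pool.invoke(failing);
    } catch (RuntimeException expected) {
      // invoke() rethrows the failure; the task keeps its exceptional state.
    }

    Ok cancelled = new Ok();
    cancelled.cancel(false); // cancelled before it ever ran

    return new String[] {before, describe(ok), describe(failing), describe(cancelled)};
  }

  public static void main(String[] args) {
    for (String line : run()) {
      System.out.println(line);
    }
  }
}
```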

6.png

If the current thread is a ForkJoinWorkerThread, the method calls push() on that thread's workQueue to schedule the task asynchronously, and then returns immediately. Next, follow the push() method in ForkJoinPool; its source code is as follows:

8.png

This method adds the current task to the ForkJoinTask array of the task queue, and then calls ForkJoinPool's signalWork() method to create or wake up a worker thread to execute the task. Next, look at the join() method of ForkJoinTask; its source code is as follows:

3.png

4.png

The method first calls doJoin(), which returns the status of the current task, and then handles the result according to that status:

  1. If the status is completed, the result is returned directly.
  2. If the status is cancelled, a CancellationException is thrown.
  3. If the status is exceptional, the corresponding exception is thrown.

Next, follow the doJoin() method; its source code is as follows:
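These three outcomes of join() can be reproduced from calling code. The sketch below is illustrative only: JoinOutcomeDemo and its nested task classes are hypothetical, and a small dedicated pool is used so that it can be shut down at the end.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class: shows what join() does for each final task status.
class JoinOutcomeDemo {

  static final class Ok extends RecursiveTask<Integer> {
    @Override
    protected Integer compute() {
      return 2;
    }
  }

  static final class Failing extends RecursiveTask<Integer> {
    @Override
    protected Integer compute() {
      throw new IllegalArgumentException("bad input");
    }
  }

  static String joinAndDescribe(ForkJoinTask<Integer> task) {
    try {
      return "result " + task.join();
    } catch (CancellationException e) {
      return "cancelled";
    } catch (RuntimeException e) {
      return "exception " + e.getClass().getSimpleName();
    }
  }

  static String[] run() {
    ForkJoinPool pool = new ForkJoinPool(2);
    try {
      // 1. Completed: join() returns the result.
      ForkJoinTask<Integer> ok = pool.submit(new Ok());
      // 3. Exceptional: join() rethrows the exception from compute().
      ForkJoinTask<Integer> failing = pool.submit(new Failing());
      // 2. Cancelled before it was ever submitted: join() throws CancellationException.
      ForkJoinTask<Integer> cancelled = new Ok();
      cancelled.cancel(false);
      return new String[] {
        joinAndDescribe(ok), joinAndDescribe(cancelled), joinAndDescribe(failing)
      };
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    for (String line : run()) {
      System.out.println(line);
    }
  }
}
```

CancellationException is caught first because it is itself a RuntimeException.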

5.png

The method first checks whether the current task has already finished, and if so, returns the task status directly. If it has not finished, the method takes the task from the task queue (workQueue) and executes it. If the task completes, its status is set to NORMAL; if an exception occurs, the exception is recorded and the status is set to EXCEPTIONAL (in the doExec() method).

Summary

This article introduced the basic principles of Java's Fork/Join framework, the work-stealing algorithm, the framework's design, and part of its source code. The Fork/Join framework is also used inside the JDK standard library. For example, Arrays.parallelSort(array) in JDK 1.8+ can sort in parallel: internally it uses Fork/Join to split a large array and sort the parts in parallel, which can speed up sorting. The Collection.parallelStream() method is likewise built on the Fork/Join framework. Finally, the threshold that defines a small task is usually chosen through testing and measurement so that the program achieves the best performance.
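The fork() and join() behaviour traced above can be poked at from user code. In this sketch (ForkJoinTraceDemo, Parent and Child are hypothetical names) a parent task forks a child inside a pool with a single worker and then joins it. Since no other worker exists to pick the child up, the joining thread would typically take the child from its own queue and run it itself, which is the path described for doJoin(). The thread names are printed for inspection only, because the exact scheduling is an implementation detail.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class for observing fork() and join() from user code.
class ForkJoinTraceDemo {

  // Reports the name of the thread that executed it.
  static final class Child extends RecursiveTask<String> {
    @Override
    protected String compute() {
      return Thread.currentThread().getName();
    }
  }

  static final class Parent extends RecursiveTask<String[]> {
    @Override
    protected String[] compute() {
      Child child = new Child();
      // fork() schedules the child and returns the task itself right away.
      ForkJoinTask<String> returned = child.fork();
      boolean forkReturnsThis = (returned == child);
      // join() returns the child's result once the child has been executed.
      String childThread = child.join();
      return new String[] {
        Thread.currentThread().getName(), childThread, String.valueOf(forkReturnsThis)
      };
    }
  }

  static String[] run() {
    ForkJoinPool pool = new ForkJoinPool(1); // a single worker thread
    try {
      return pool.invoke(new Parent());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    String[] r = run();
    System.out.println("parent ran on:     " + r[0]);
    System.out.println("child ran on:      " + r[1]);
    System.out.println("fork() returned this: " + r[2]);
  }
}
```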
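As a quick illustration of the two standard-library entry points mentioned here, the following sketch calls Arrays.parallelSort and Collection.parallelStream(). The class and helper names are hypothetical, and the inputs are tiny, so the library may well process them sequentially; the point is only the API shape.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical demo class for the standard-library APIs named in the summary.
class StdlibForkJoinDemo {

  // Sorts a copy of the input with Arrays.parallelSort.
  static int[] sortedCopy(int[] input) {
    int[] copy = Arrays.copyOf(input, input.length);
    Arrays.parallelSort(copy);
    return copy;
  }

  // Sums a list through a parallel stream.
  static long parallelSum(List<Long> data) {
    return data.parallelStream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(sortedCopy(new int[] {5, 3, 9, 1, 7}))); // [1, 3, 5, 7, 9]

    List<Long> data = LongStream.rangeClosed(1, 1_000).boxed().collect(Collectors.toList());
    System.out.println(parallelSum(data)); // 500500
  }
}
```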


mghio