1
头图

Hello, I am crooked.

A few days ago, I saw a bug in the JDK thread pool. I went to understand it and figured out the crux of it. I think this bug belongs to an unreasonable design of the thread pool method, and the official is aware of this bug. Afterwards, he said: It is indeed a bug, but I will not fix it. You can treat it as a feature.

Before taking you through this bug, let me ask a question:

What are the thread pool rejection policies that come with JDK?

This thing, the old eight-legged essay, has existed for longer than I have been in the industry, and it comes when I open my mouth:

  • AbortPolicy: Abort the task and throw a RejectedExecutionException, which is the default policy.
  • DiscardOldestPolicy: Discard the task at the front of the queue and execute the following tasks
  • CallerRunsPolicy: The task is handled by the calling thread
  • DiscardPolicy: It also discards tasks, but does not throw exceptions, which is equivalent to silent processing.

One of the trigger conditions of this bug this time is hidden in this DiscardPolicy.

But when you look at the source code, this thing is just an empty method, what kind of bug can it have?

It's wrong because it's an empty method that silently handles exceptions.

Don't worry, wait for me to put it to you slowly.

What BUG?

The link corresponding to the bug is this:

https://bugs.openjdk.org/browse/JDK-8286463

The title probably says: Oh, my old folks, listen to me, I found that when the thread pool's rejection policy DiscardPolicy encounters the invokerAll method, it may cause the thread to block all the time.

Then in the description part of the BUG mainly pay attention to these two paragraphs:

These two paragraphs reveal two messages:

  • 1. This bug has been raised before.
  • 2. Doug and Martin are also aware of this bug, but they feel that users can avoid the problem of forever blocking by coding.

So we have to go to the place where this bug first appeared first. That is this link:

https://bugs.openjdk.org/browse/JDK-8160037

From the title, these two problems are very similar, both have invokerAll and block, but the trigger conditions are different.

One is the DiscardPolicy rejection policy and the other is the shutdownNow method.

So my strategy is to take you to understand the shutdownNow method first, so that you can better understand the problems caused by DiscardPolicy.

Essentially, they say the same thing.

Phenomenon

In the description of this bug related to shutdownNow, the questioner gave his test case, I changed it a little, and just used it.

https://bugs.openjdk.org/browse/JDK-8160037

The code is posted here, you can also run it locally:

 public class MainTest {

    public static void main(String[] args) throws InterruptedException {
        
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            int finalI = i;
            tasks.add(() -> {
                System.out.println("callable "+ finalI);
                Thread.sleep(500);
                return null;
            });
        }

        ExecutorService executor = Executors.newFixedThreadPool(2);
        Thread executorInvokerThread = new Thread(() -> {
            try {
                executor.invokeAll(tasks);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("invokeAll returned");
        });
        executorInvokerThread.start();
    }
}

Then explain to you what the test code is doing.

First of all, the place marked ① is stuffed with 10 callable type tasks into the list.

What are you doing with so many tasks?

It must be thrown into the thread pool, right?

Therefore, in the place labeled ②, a thread pool with 2 threads and core threads is created. The invokerAll method of the thread pool is called in the thread:

What does this method do?

Executes the given tasks, returning a list of Futures holding their status and results when all complete.

Executes the given set of tasks, returning a list of Futures containing their status and results when all tasks are completed.

That is to say, when the thread is started, the thread pool will execute the tasks in the list one by one, and return a Futures list after the execution is completed.

When we write code, we can know whether this batch of tasks has been completed by holding this list.

But, friends, but ah, notice, you see in my case I don't care about the return value of the invokerAll method at all.

What I care about is the output of this sentence after the invokerAll method is executed:

invokeAll returned

OK, now you say what's wrong with this program running?

You can't see it, can you?

I can't see it either, because it doesn't have anything wrong at all, and the program ends normally:

Next, I modified the program to look like this, adding these lines of code labeled ③:

What is called here is the shutdown method of the thread pool, the purpose is to wait for the thread pool to process the task and let the program exit.

Come on, what's wrong with this program running?

You sure didn't look too bad, did you?

Neither did I, because there was nothing wrong with it at all, and the program could end normally:

Well, next, I'm going to start deforming again.

The program becomes like this:

Note that I am using the shutdownNow method here, which means that I want to close the previous thread pool immediately, and then let the whole program exit.

So what's wrong with this program?

It's really a problem, it's really hard to see with the naked eye, but we can look at the running results first:

The results are still very good to observe.

There is no output of "invokeAll returned" and the program does not exit.

Then the question arises: Are you saying this is a bug?

No matter what the reason is, from the phenomenon, this is properly a bug, right?

I have called shutdownNow, what I thought is to immediately close the thread pool, and then let the whole program exit. As a result, the task is indeed not executed, but the program does not exit, which is inconsistent with our expectations.

So, be bold, this is a bug!

Here's another output comparison chart about the shutdownNow and shutdown methods, which is more intuitive:

As for the difference between these two methods, I won't talk about it. If you don't know, go to the Internet and read it by heart.

Anyway, the bug has been stably reproduced now.

The next step is to find the root cause.

Root cause

How to find the root cause?

You think about this first: the program should exit but does not exit, does that mean there are still threads running, more precisely non-daemon threads running?

By the way, it's easy to think about it here.

Look at the thread stack.

What do you think?

Cameras, friends. Our old buddy, who has often appeared in previous articles, just this:

With just one tap, you can see that there's a thread that's not right:

It is in the WAITING state, and the code that caused it to enter this state can be located at a glance through the stack information, which is line 244 of the invokeAll method, which is this line of code:

at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:244)

Since the problem lies in the invokeAll method, you have to understand what this method is doing.

The source code is not complicated, I mainly focus on this part I framed:

The place labeled ① is to encapsulate the incoming task as a Future object, put it in a List first, and then call the execute method, that is, throw it into the thread pool for execution.

This operation is especially like calling the submit() method of the thread pool directly. Let me compare it to you:

The place marked with ② is the List of Futures in front of the loop. If the Futures are not executed, the get method of Futures will be called to block and wait for the results.

From the stack information, the thread is blocked in the get method of the Future, indicating that the Future has not been executed.

Why wasn't it executed?

OK, let's go back to this place for the test code:

10 tasks are thrown into the thread pool with 2 core threads.

Are there two that can be executed by the threads in the thread pool, and the remaining 8 enter the queue?

Okay, let me ask you: after calling shutdownNow, will the worker thread just die? Do the remaining 8 have no resources to execute?

Having said that, even if only 1 task is not executed? Does future.get() in the invokeAll method also block?

But, friends, but ah, in the case of such a clear bug, the above case was actually overturned by the authorities.

What's going on?

Let's take a look at the official reply.

Oh, sorry, it's not a big guy, it's the reply from the official giants Martin and Doug:

Martin said: Old Tie, I read your code, and it seems okay? Listen to me, the shutdownNow method returns a List of tasks that have not yet been executed. So you have to do something with the return of shutdownNow.

Doug says: Martin is right. An extra word:

that's why they are returned.

They refer to this list. That is to say, the old man took this situation into consideration when he wrote the code, so he returned all the tasks that were not executed to the caller.

Well, the shutdownNow method has a return value, I didn't notice this detail before:

But if you look carefully at this return value, it is a Runnable in a list, it is not a Future, so I cannot call the future.cancel() method.

So after getting this return value, how should I cancel the task?

Good question. Because the questioner also has the following question:

After seeing that the giants said they wanted to operate on the return value, he replied with a confused expression: "Brother, the shutdownNow method returns a List<Runnable>. At least for me, I don't know how to cancel these tasks. Shouldn't it be described in the documentation?

Brother Martin felt that this return was indeed a bit confusing, and he made the following reply:

There are two ways to submit tasks to the thread pool.

If you submit a Runnable task with the execute() method, shutdownNow returns a list of Runnables that have not been executed.

If you submit a Runnable task with the submit() method, it will be encapsulated as a FutureTask object, so calling the shutdownNow method returns a list of FutureTasks that have not been executed:

That is to say, the List collection returned by the shutdownNow method may contain either Runnable or FutureTask, depending on what method you call when you throw tasks into the thread pool.

FutureTask is a subclass of Runnable:

So, based on what Martin said and the code he provided, we can modify the test case to look like this:

Traverse the List collection returned by the shutdownNow method, and then judge whether it is a Future, if so, force it to a Future, and then call its cancel method.

In this way, the program can end normally.

In this way, it seems that it is indeed not a bug, and it can be avoided by coding.

reverse

But, my friends, but ah, the front is all my foreshadowing, and then the plot starts to reverse.

We come back to this link:

https://bugs.openjdk.org/browse/JDK-8286463

This link mentions the DiscardPolicy thread pool rejection policy.

As long as I change our Demo program a little bit to trigger the thread's DiscardPolicy rejection policy, the previous bug is really a bug that cannot be bypassed.

How should it be changed?

Very simple, just change the thread pool:

Replace our previous thread pool with 2 core threads and an infinite queue length with a custom thread pool.

The number of core threads, the maximum number of threads, and the queue length of this custom thread pool are all 1, and the thread rejection policy adopted is DiscardPolicy.

The code in other places does not move, and the whole code becomes like this. I will post the code for you to see, so that you can run it directly:

 public class MainTest {

    public static void main(String[] args) throws InterruptedException {

        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            int finalI = i;
            tasks.add(() -> {
                System.out.println("callable " + finalI);
                Thread.sleep(500);
                return null;
            });
        }
        ExecutorService executor = new ThreadPoolExecutor(
                1,
                1,
                1,
                TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.DiscardPolicy()
        );
//        ExecutorService executor = Executors.newFixedThreadPool(2);
        Thread executorInvokerThread = new Thread(() -> {
            try {
                executor.invokeAll(tasks);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("invokeAll returned");
        });
        executorInvokerThread.start();

        Thread.sleep(800);
        System.out.println("shutdown");
        List<Runnable> runnables = executor.shutdownNow();
        for (Runnable r : runnables) {
            if (r instanceof Future) ((Future<?>)r).cancel(false);
        }
        System.out.println("Shutdown complete");
    }
}

Then we first run the program to see the result:

Eh, what's going on?

I clearly dealt with the return value of shutdownNow, why the program does not output "invokeAll returned", and is blocked on the invokeAll method again?

Even if we don't know why the program didn't stop, from the performance point of view, this thing must be a bug, right?

Next, I will take you to analyze why this phenomenon occurs.

First of all, let me ask you, in our case, how many tasks can this thread pool hold at most?

Is it not possible to receive at most 2 tasks?

I can only receive 2 tasks at most. Does it mean that I have 8 tasks that cannot be processed and need to execute the thread pool rejection policy?

But what is our rejection policy?

It is DiscardPolicy, and its implementation is like this, that is, silent processing, discarding tasks, and not throwing exceptions:

OK, here you go again, what is the thing returned by shutdownNow, is it a task that has not yet been executed in the thread pool, that is, the task in the queue?

But there is only one task in the queue at most, and it is useless to return to you to cancel it.

So, this case has nothing to do with not handling the return value of shutdownNow.

The key is that these 8 tasks are rejected, or the key is that the DiscardPolicy rejection policy is triggered.

The effect of triggering once and triggering multiple times is the same. In our custom thread pool with the invokeAll method, as long as any task is silently processed, it will be considered an egg.

Why do you say this way?

Let's first look at the implementation of the default thread pool rejection policy AbortPolicy:

After being refused to execute, it will throw an exception, then execute the finally method, call cancel, and then it will be caught in the invokeAll method, so it will not block:

If it is silently processed, you have nowhere for the silently processed Future to throw an exception, and nowhere to call its cancel method, so it will always block here.

So, this is the bug.

So what is the official response to this bug?

The giant Martin replied: I think it should be explained in the document that the rejection policy of DiscardPolicy is rarely used in real scenarios, and it is not recommended for everyone to use it. Or do you treat it as a feature?

I think the implication is: I know this is a bug, but you have to use DiscardPolicy, a rejection policy that will not be used in actual coding.

I am not satisfied with this reply.

Brother Martin didn't know anything. During our interview, there was an eight-legged essay section, and one of the old eight-legged questions was as follows:

Have you customized the thread pool rejection policy?

If there is some cleverness, when customizing the thread pool rejection policy, write a fancy but equivalent rejection policy of DiscardPolicy.

That is to say, it is not put into the queue, and no exception is thrown. No matter how fancy your code is, there is the same problem.

Therefore, I think it is still the design of the invokeAll method. A Future that cannot be accessed by other threads outside the calling thread should not be designed.

This goes against the design theory of the Future object.

That's why I say it's a bug and a design issue.

What, you ask how should I design?

Sorry, no comment.


why技术
2.2k 声望6.8k 粉丝