
Hello, I'm Why.

A few days ago, while chatting with an expert friend, he mentioned that he'd recently been working on thread pool monitoring and had just finished building the dynamic-adjustment feature.

That reminded me of an article I'd written before, so I dug it out: "How should thread pool parameters be set? Meituan gave an answer that shocked the interviewer."

He then pointed out a problem with it. I thought it over carefully, and there did indeed seem to be a hole left behind.

To describe this pit properly, let me first review the key points of dynamic thread pool tuning.

First: why do thread pool parameters need to be adjusted dynamically at all?

Because as the business grows, a thread pool that was sufficient at first may gradually become saturated, and subsequently submitted tasks start getting rejected.

There is no once-and-for-all configuration; the parameters should float along with the system's load.

So we can monitor the thread pool along several dimensions; one of them is queue usage.

When queue usage exceeds 80%, an early-warning message goes out to remind the owner to stay alert; they can then open the corresponding management console and adjust the thread pool parameters before tasks start getting rejected.

When someone asks how to configure the various thread pool parameters, first recite the stock interview script: the formulas for IO-intensive versus CPU-intensive workloads.

Then add one line: "However, beyond those formulas, here is another approach I used when actually solving the problem."

And then recite the scheme described above.

So which thread pool parameters can actually be modified?

Normally, it's the core thread count and the maximum thread count that get adjusted.

ThreadPoolExecutor even provides the corresponding setters directly:
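
A minimal usage sketch (setCorePoolSize and setMaximumPoolSize are standard ThreadPoolExecutor API; the constructor arguments here are illustrative):

ThreadPoolExecutor executor = new ThreadPoolExecutor(
        2, 5, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(10));
// Both setters may be called on a live pool, at runtime:
executor.setCorePoolSize(4);     // adjust the core thread count
executor.setMaximumPoolSize(8);  // adjust the maximum thread count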

But in fact there is one more key parameter that needs adjusting: the length of the queue.

Oh, and to be clear: the queue used throughout this article is the default choice, LinkedBlockingQueue.

Its capacity field is final, meaning it cannot be changed once set:
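
For reference, the declaration in JDK 8's LinkedBlockingQueue:

/** The capacity bound, or Integer.MAX_VALUE if none */
private final int capacity;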

So adjusting the queue length takes a bit more ingenuity.

As for how to get around the final restriction, I'll come to that later; first, the code.

I usually don't post large blocks of code, so why this time?

Because I discovered my earlier article never included it, and the code I wrote back then is nowhere to be found.

So I typed it out again...

import cn.hutool.core.thread.NamedThreadFactory;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadChangeDemo {

    public static void main(String[] args) {
        dynamicModifyExecutor();
    }

    private static ThreadPoolExecutor buildThreadPoolExecutor() {
        // Core 2, max 5, bounded queue of 10: at most 15 tasks at once.
        return new ThreadPoolExecutor(2,
                5,
                60,
                TimeUnit.SECONDS,
                new ResizeableCapacityLinkedBlockingQueue<>(10),
                new NamedThreadFactory("why技术", false));
    }

    private static void dynamicModifyExecutor() {
        ThreadPoolExecutor executor = buildThreadPoolExecutor();
        for (int i = 0; i < 15; i++) {
            executor.execute(() -> {
                threadPoolStatus(executor, "task running");
                try {
                    TimeUnit.SECONDS.sleep(5);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            });
        }
        threadPoolStatus(executor, "before change");
        executor.setCorePoolSize(10);
        executor.setMaximumPoolSize(10);
        // The cast is safe because the pool was built with our resizable queue.
        ResizeableCapacityLinkedBlockingQueue<Runnable> queue =
                (ResizeableCapacityLinkedBlockingQueue<Runnable>) executor.getQueue();
        queue.setCapacity(100);
        threadPoolStatus(executor, "after change");
    }

    /**
     * Print the current status of the thread pool.
     *
     * @param executor the pool to inspect
     * @param name     a label for this snapshot
     */
    private static void threadPoolStatus(ThreadPoolExecutor executor, String name) {
        BlockingQueue<Runnable> queue = executor.getQueue();
        System.out.println(Thread.currentThread().getName() + "-" + name + "-:" +
                " core pool size: " + executor.getCorePoolSize() +
                " active threads: " + executor.getActiveCount() +
                " max pool size: " + executor.getMaximumPoolSize() +
                " pool activity: " +
                divide(executor.getActiveCount(), executor.getMaximumPoolSize()) +
                " completed tasks: " + executor.getCompletedTaskCount() +
                " queue capacity: " + (queue.size() + queue.remainingCapacity()) +
                " tasks in queue: " + queue.size() +
                " remaining capacity: " + queue.remainingCapacity() +
                " queue usage: " + divide(queue.size(), queue.size() + queue.remainingCapacity()));
    }

    private static String divide(int num1, int num2) {
        return String.format("%1.2f%%", (double) num1 / (double) num2 * 100);
    }
}

When you paste this code, you'll find you don't have the NamedThreadFactory class.

No matter; it comes from the hutool toolkit. If you don't have it, write your own or simply omit that constructor argument. It's not the point.

The real question is ResizeableCapacityLinkedBlockingQueue.

Where does that come from?

As mentioned in the previous article: just copy the source of LinkedBlockingQueue, rename it, remove the final modifier from the capacity field, and add the corresponding get/set methods.

Feels simple enough, and it does make the capacity dynamically changeable.

But even as I wrote it, it felt flawed.

After all, if it were really that simple, why would the JDK authors have made the field final?

So where is the pit?

I won't walk through LinkedBlockingQueue from scratch here; that's all standard interview material you should know by heart.

Let's stick to the scenario above: if we simply remove the final modifier and add the corresponding get/set methods, where exactly does that approach bite?

Unless otherwise noted, the source code in this article is from JDK 8.

Let's take a look at the put method.
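
Here is the JDK 8 source, with its long in-line comments trimmed; the ① and ② markers are mine and will come up again later:

public void put(E e) throws InterruptedException {
    if (e == null) throw new NullPointerException();
    int c = -1;
    Node<E> node = new Node<E>(e);
    final ReentrantLock putLock = this.putLock;
    final AtomicInteger count = this.count;
    putLock.lockInterruptibly();
    try {
        while (count.get() == capacity) { // ① park while the queue is full
            notFull.await();
        }
        enqueue(node);
        c = count.getAndIncrement();
        if (c + 1 < capacity)
            notFull.signal();             // ② wake another waiting producer
    } finally {
        putLock.unlock();
    }
    if (c == 0)
        signalNotEmpty();
}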

Focus on the while loop at the top.

The capacity in the while condition is the queue's current capacity.

And count.get()? It's the number of elements currently in the queue.

count.get() == capacity means the queue is full; notFull.await() then runs and the current put operation parks.

A simple experiment verifies this: create a queue of capacity 5, then call put in a loop. Once the queue is full, the program blocks.
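
A minimal version of that experiment:

import java.util.concurrent.LinkedBlockingQueue;

public class PutBlockDemo {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>(5);
        for (int i = 0; i < 10; i++) {
            queue.put(i); // the 6th put (i = 5) parks forever: the queue is full
            System.out.println("put " + i);
        }
        System.out.println("never reached");
    }
}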

Dump the threads at that point, and you can see the main thread is indeed parked right where we analyzed, inside notFull.await().

So, think about it: if I now change the queue's capacity to some larger value, will that blocked thread notice?

It can't. It's parked, waiting for someone else to wake it.

Now let's swap in my modified queue and verify.

The idea of the verification program: a child thread keeps calling put on the queue until the capacity is reached and the thread blocks; then the main thread changes the capacity to 100.
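
A sketch of that program, using the ResizeableCapacityLinkedBlockingQueue built earlier (still the naive version, without any wake-up logic in setCapacity):

import java.util.concurrent.TimeUnit;

public class ResizeDemo {
    public static void main(String[] args) throws InterruptedException {
        ResizeableCapacityLinkedBlockingQueue<Integer> queue =
                new ResizeableCapacityLinkedBlockingQueue<>(5);
        new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                try {
                    queue.put(i); // parks on the 6th element
                    System.out.println("put " + i);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "put-thread").start();

        TimeUnit.SECONDS.sleep(1); // let the child thread fill the queue and park
        queue.setCapacity(100);    // hope: the child thread resumes...
        System.out.println("capacity changed to 100");
    }
}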

The effect I want is that once the capacity is expanded, the child thread stops blocking.

But from the analysis above, we already know nobody will wake that child thread.

So the output looks like this: the child thread is still blocked. Expectation not met.

So what do we do?

Wake it up ourselves, of course.

That means modifying the logic of setCapacity:

public void setCapacity(int capacity) {
    final int oldCapacity = this.capacity;
    this.capacity = capacity;
    final int size = count.get();
    // If the capacity grew and the queue was full (or over-full) under the
    // old capacity, producers may be parked on notFull: wake them up.
    if (capacity > size && size >= oldCapacity) {
        signalNotFull();
    }
}

The core logic: if the capacity was expanded while the queue was full under the old capacity, call the signalNotFull method.
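
signalNotFull already exists in LinkedBlockingQueue; the JDK 8 version:

private void signalNotFull() {
    final ReentrantLock putLock = this.putLock;
    putLock.lock();
    try {
        notFull.signal();
    } finally {
        putLock.unlock();
    }
}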

It wakes the producer thread that was parked on notFull.

If you've read this far feeling a bit lost, unsure what all these LinkedBlockingQueue internals are doing, then hurry up and spend an hour brushing up on LinkedBlockingQueue. It comes up in interviews all the time.

Okay, back on track.

With our custom setCapacity in place, run the program again and the expected output appears.

Besides changing setCapacity, I accidentally stumbled on a second answer while writing this article: after calling setCapacity, just call put once more, and the expected output also appears.

Look back at the put method and the reason is plain: after setCapacity enlarges the queue, a new put call no longer satisfies the condition at ①, so it doesn't block. Execution then reaches ②, where notFull.signal() wakes the blocked thread.

So the goal of changing the queue length and waking the blocked tasks is achieved, if in a roundabout way.

In the end, someone has to perform the wake-up.

So which is more elegant?

The first, clearly: encapsulating the wake-up inside setCapacity makes the operation self-contained.

The second is mostly for the "no idea why, but programs are just written this way" crowd.

Now we know the pit of dynamically adjusting the queue length in a thread pool: once the queue fills up, a thread calling put blocks. Even if another thread then calls setCapacity to enlarge the queue, the blocked thread will never wake unless some thread triggers another put.

Right?

Makes sense?

Sure about that?

No, friends, it's not right.

If you've been nodding along through everything above, pay attention.

This is where the story turns.

The turn

When the thread pool adds a task to its queue, it calls offer, not put:

Let's see what offer does.
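
The JDK 8 source of offer:

public boolean offer(E e) {
    if (e == null) throw new NullPointerException();
    final AtomicInteger count = this.count;
    if (count.get() == capacity)   // queue full: fail fast, no parking
        return false;
    int c = -1;
    Node<E> node = new Node<E>(e);
    final ReentrantLock putLock = this.putLock;
    putLock.lock();
    try {
        if (count.get() < capacity) {
            enqueue(node);
            c = count.getAndIncrement();
            if (c + 1 < capacity)
                notFull.signal();
        }
    } finally {
        putLock.unlock();
    }
    if (c == 0)
        signalNotEmpty();
    return c >= 0;
}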

When the queue is full, it simply returns false without blocking.

In other words, the thread pool never needs a wake-up at all, because no thread ever blocks on the queue.

While we were chatting, the dev I mentioned at the start brought up a class called VariableLinkedBlockingQueue.

It lives in ActiveMQ, and the setCapacity change I showed above is borrowed straight from it.

Moreover, ActiveMQ does use the queue's put method in its own code, so the situation we analyzed, blocked producers that need waking, can genuinely occur there.

Now think about it: isn't the thread pool's use of offer instead of put precisely what avoids this situation?

Yes, it is.

Still, the naive "just remove final" version isn't rigorous: if you know the problem exists, why leave the hole?

Learn from ActiveMQ's VariableLinkedBlockingQueue and think it through: a queue that works even when put blocks. What's not to like?

Honestly, beyond getting you better acquainted with LinkedBlockingQueue, this looks like a useless bit of knowledge.

But I can make this useless bit of knowledge do real work.

Because it's exactly the kind of small detail that lands.

Suppose I go out for an interview and, while talking about dynamic tuning, I casually drop this little detail. Even if I've never actually shipped dynamic tuning, mentioning it makes the whole story sound real.

The interviewer hears it and thinks: very good, the big picture and the fine print. Can't be faking it.

VariableLinkedBlockingQueue hides several details like this. Take its put method: the blocking condition count.get() == capacity becomes count.get() >= capacity, to support the scenario where the capacity is shrunk below the current element count.
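
Side by side (the second loop is reconstructed from the description above, so read it as a sketch rather than a verbatim quote):

// JDK LinkedBlockingQueue.put: parks only on exact equality
while (count.get() == capacity)
    notFull.await();

// VariableLinkedBlockingQueue.put: parks on >=, so a capacity that has
// been shrunk below the current element count still blocks producers
while (count.get() >= capacity)
    notFull.await();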

There are several places like that; I won't list them all.

The devil is in the details.

Worth a careful read, folks.

JDK bug

Actually, the original plan was to end the article with the previous section, since all I wanted was to record a detail I'd missed before.

But I got itchy hands and searched the JDK bug list for LinkedBlockingQueue, to see whether there was anything else to harvest.

Unexpectedly, there really was a small windfall.

The first is this issue, filed on 2019-12-29:

https://bugs.openjdk.java.net/browse/JDK-8236580

The title proposes empowering LinkedBlockingQueue so that its capacity can be modified.

Beyond the scenario described in the report, I'd guess the reporter also wanted to pair it with the thread pool: find the "grip" on the queue, drill down into the underlying logic, hook up the monitoring system, align with the configuration page, and land a full combo of dynamic adaptation.

But the maintainers did not adopt the suggestion.

In the reply, the folks who maintain the concurrency package are very cautious about adding anything to the concurrent classes; they feel that giving ThreadPoolExecutor dynamically modifiable features will bring, or already has brought, plenty of bugs.

My plain-language summary: the suggestion is fine, but we dare not touch it. In concurrent code, pull one hair and the whole body moves; who knows what might crawl out.

So if you want this capability, you have to build it yourself.

That also explains why capacity is modified with final: the less a class can do, the fewer chances it has to be buggy.

The second bug is more interesting, and it lines up exactly with our dynamic-tuning theme:

https://bugs.openjdk.java.net/browse/JDK-8241094

Filed in March 2020, it describes how updating the pool's core thread count can cause a rejection exception to be thrown.

The reporter pasted a lot of code into the bug description, but it's convoluted and hard to follow.

Fortunately, Martin wrote a simplified version that's clear at a glance.
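
Below is a sketch of such a repro, assembled from the description that follows; the class name and the constants maxThreads and queueCapacity are illustrative, not Martin's originals:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CorePoolSizeRaceDemo {

    public static void main(String[] args) throws Exception {
        // Keep trying until the race is hit; the RejectedExecutionException
        // thrown inside test() ends the loop (and the program).
        while (true) {
            test();
        }
    }

    static void test() throws Exception {
        int maxThreads = 5;
        int queueCapacity = 10;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
        // A ForkJoinPool thread bumps the core size from 1 to 5
        // while the main thread is still submitting.
        CompletableFuture<Void> racer =
                CompletableFuture.runAsync(() -> pool.setCorePoolSize(maxThreads));
        try {
            // Exactly queueCapacity + maxThreads tasks: in theory the
            // pool can always hold all of them.
            for (int i = 0; i < queueCapacity + maxThreads; i++) {
                pool.execute(() -> { });
            }
        } finally {
            racer.join();
            pool.shutdown();
        }
    }
}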

What does this code do? Let me walk you through it briefly.

First, main contains a loop that keeps calling the test method; the loop ends when test throws an exception.

Inside test, a fresh thread pool is created each time; then exactly queue-capacity-plus-max-threads tasks are submitted to it; finally the pool is shut down.

Meanwhile, another thread changes the pool's core thread count from 1 to 5.

Open the bug link above, paste the code in, and run it; the result is hard to believe.

Martin also thought this was a bug.

To be honest, having run the case myself, I'd call it a bug too. But Doug Lea personally certified otherwise: he doesn't consider it a bug.

The thing genuinely sat outside my understanding, and since the thread never spells out the root cause, locating it took me a very long time; at one point I wanted to give up.

When I finally pinned it down, all I could sigh was: huh, that's it? How anticlimactic.

First, how the problem manifests: run the program above long enough and it throws a RejectedExecutionException, meaning the thread pool refuses a task.

But as we analyzed earlier, the for loop submits exactly as many tasks as the pool can accommodate.

In theory, nothing should go wrong, right?

That's exactly what puzzled the reporter.

He wrote: I'm very confused. The number of tasks I submit never exceeds queueCapacity + maxThreads, so why does the pool throw a RejectedExecutionException? And the problem is very hard to debug, because adding any form of delay to the tasks makes it stop reproducing.

His subtext: this problem is baffling. It reproduces reliably, but at a random iteration each time; I can't figure it out; I think it's a bug; please help.

Before giving the root cause I eventually found, let's look at what the old man said.

His view boils down to one line: he has not convinced himself that the program above should be expected to succeed.

In other words, he thinks throwing the exception is normal. But he didn't say why.

A day later, he added one more comment. Translated:

When the pool's submit runs in one thread while setCorePoolSize or prestartAllCoreThreads runs in another, they race. There's a short window while new threads are being prestarted but are not yet taking tasks from the queue, and during that window the queue may remain (transiently) full.

The fix would be simple enough: for example, remove the prestart logic from setCorePoolSize. But a program using prestartAllCoreThreads would still hit the same problem.

"In any case, I have still not convinced myself that this is a problem that needs fixing."

Feeling dizzy after the old man's reply?

Yes. I read it ten times at first and got nothing out of it. But once I'd worked out the cause, I had to admit:

his summary is dead on, without one wasted word.

What's the reason?

First, look at the two places in the sample code that touch the thread pool.

The core-thread-count change runs on one thread: a worker of CompletableFuture's default pool, ForkJoinPool.

Task submission runs on another thread: the main thread.

The first sentence of the old man said this:

Racing means driving, driving fast, and racing against...

This is a multi-threaded scenario. The main thread and the threads in the ForkJoinPool are raced, that is, the question of who comes first may appear.

Now let's look at what the setCorePoolSize method does.
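
Here is the JDK 8 source (the ① and ② markers are mine):

public void setCorePoolSize(int corePoolSize) {
    if (corePoolSize < 0)
        throw new IllegalArgumentException();
    int delta = corePoolSize - this.corePoolSize;   // ①
    this.corePoolSize = corePoolSize;
    if (workerCountOf(ctl.get()) > corePoolSize)
        interruptIdleWorkers();
    else if (delta > 0) {
        // We don't really know how many new threads are "needed".
        // As a heuristic, prestart enough new workers (up to new
        // core size) to handle the current number of tasks in
        // queue, but stop if queue becomes empty while doing so.
        int k = Math.min(delta, workQueue.size());  // ②
        while (k-- > 0 && addWorker(null, true)) {
            if (workQueue.isEmpty())
                break;
        }
    }
}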

The line marked ① computes the delta between the newly set core count and the old one.

That delta is used at the line marked ②: take the smaller of the delta and the number of tasks currently waiting in the queue.

For example, say the core count is currently 2 and I change it to 5, with 10 tasks waiting in the queue.

Then the delta is 5 - 2 = 3, i.e. delta = 3 at ①. workQueue.size() is the 10 queued tasks. Math.min(3, 10) gives k = 3 at ②.

Meaning: start 3 more core threads to help work through the queued tasks.

But is "add exactly 3" necessarily right? While the new workers are being added, queued tasks may already have been drained; perhaps three aren't needed after all.

So besides dutifully looping k times, what else terminates the loop? The queue becoming empty.

And if you read the large comment in that code, you'll see it describes exactly the same thing.

Okay, on to addWorker. Here's the part I want you to see: after a series of checks, the method reaches new Worker(), creating the worker thread, and then adds it to workers, a HashSet holding the pool's worker threads.
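
The field, plus the relevant stretch of addWorker (JDK 8, abridged):

private final HashSet<Worker> workers = new HashSet<Worker>();

// ... inside addWorker, after the worker count has been incremented:
w = new Worker(firstTask);
final Thread t = w.thread;
if (t != null) {
    final ReentrantLock mainLock = this.mainLock;
    mainLock.lock();
    try {
        // ... recheck pool state while holding the lock ...
        workers.add(w);          // the worker is now counted in the pool
        int s = workers.size();
        if (s > largestPoolSize)
            largestPoolSize = s;
        workerAdded = true;
    } finally {
        mainLock.unlock();
    }
    if (workerAdded) {
        t.start();               // only now does the thread start taking tasks
        workerStarted = true;
    }
}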

Look at the two key lines in that excerpt: workers.add(w) and t.start().

From joining the set to actually starting, some logic still has to run in between.

That brief in-between stretch is what the old man called the "window":

there's a window while new threads are in the process of being prestarted but not yet taking tasks.

That is: the new thread has been prestarted, but is not yet taking tasks, and in that gap there is a window.

What can happen inside that window?

The next sentence answers:

the queue may remain (transiently) full.

The queue may still be full, though only transiently.

Now let's string it all together. How should we understand that sentence?

Plug in a concrete scene: the earlier sample program, with the parameters adjusted:

This pool has 1 core thread, a maximum of 2 threads, and a queue of length 5, so it can hold at most 7 tasks at once.

Meanwhile another thread is changing the core thread count from 1 to 2.

Suppose 6 tasks have already been submitted, and the moment the 7th task is being submitted is T1.

Why emphasize that moment?

Because submitting the 7th task is what forces the pool to start a non-core thread.

The specific source code is here:

java.util.concurrent.ThreadPoolExecutor#execute
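
The JDK 8 source of execute; the final else-if is the line referred to below as line 1378:

public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    else if (!addWorker(command, false)) // "line 1378"
        reject(command);
}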

At that point the queue is full, so workQueue.offer(command) returns false and execution falls through to addWorker(command, false).

The moment execution reaches that line, line 1378, is T1.

If the addWorker at line 1378 returns false, adding a worker thread failed, and a rejection exception is thrown.

The sample program throws precisely because false is returned here.

So the question becomes: why does the addWorker at line 1378 return false?

Because this check inside addWorker trips: wc >= (core ? corePoolSize : maximumPoolSize).
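
The check sits in the first half of addWorker (JDK 8, abridged):

// inside addWorker's retry loop:
for (;;) {
    int wc = workerCountOf(c);
    if (wc >= CAPACITY ||
        wc >= (core ? corePoolSize : maximumPoolSize))
        return false; // the pool is already at its thread limit
    if (compareAndIncrementWorkerCount(c))
        break retry;
    c = ctl.get();  // re-read ctl and retry on CAS failure
    if (runStateOf(c) != rs)
        continue retry;
}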

wc is the pool's current worker-thread count.

Plug in our parameters and the check becomes wc >= (false ? 2 : 2).

It trips because wc is already 2.

Why 2 rather than 1? Where did the extra worker come from?

There is only one truth: at that instant, the addWorker triggered by setCorePoolSize had already executed workers.add(w), taking the worker count from 1 to 2, while its thread had not yet started taking tasks.

The two threads collided, and out came the rejection exception.

Then why is there no exception most of the time?

Because the stretch from workers.add(w) to t.start() is extremely short.

In most runs, by the time the main thread submits its next task, the worker prestarted by setCorePoolSize has already started and pulled a task off the queue.

The newly submitted task then finds a free slot, goes into the queue, and addWorker never runs at all.

That's the whole truth.

A race this hard to reproduce: how did I pin it down?

By adding logs.

And how do you add logs to JDK source? Not only did I build a custom queue; I also copied the ThreadPoolExecutor source into my project so I could instrument it.

That said, my approach wasn't rigorous.

When debugging multithreaded code, it's best not to use System.out.println: it synchronizes internally, which can perturb the interleaving and hide the very bug you're chasing. A pit of its own!

Scenario

Let's look back at the remedies the old man offered. There were actually two.

The first: remove the addWorker (prestart) logic from the setCorePoolSize method.

The second point concerns the reporter's original program: it used prestartAllCoreThreads, which must also call addWorker, so some chance of hitting the problem remains either way.

But the old man didn't say why anyone would write such code in the first place; my guess is he couldn't come up with a fitting scenario.

Yet the dynamic-tuning scenario described in this article is exactly where this "bug" could surface.

The probability is tiny and the conditions harsh.

But the chance is real.

And if it ever fires, while your colleagues are scratching their heads you can say: ah, I've seen this one. It's that known quirk; it doesn't reproduce every time.

That's another little detail you now have in hand.

If it comes up in an interview, though, it's a silly question.

Meaningless.

It belongs to the category of: the interviewer saw it somewhere, felt it was brilliant, and had to show it off.

One last word

Okay, since you've read this far, how about a like? Writing articles is tiring and needs a little positive feedback.

One more thing for my readers and friends: this article is also collected on my personal blog, and everyone is welcome to drop by:

https://www.whywhy.vip/
