Hello, I'm why.

Today I'm going to walk you through the time wheel. It's actually quite a practical thing.

It shows up in various frameworks and occasionally in interviews. It looks a bit hard to understand, but once you know the principle, you'll feel:

When most people talk about the time wheel, they start with Netty.

I'm different. I want to start with Dubbo. After all, my first encounter with the time wheel was in Dubbo, and it amazed me at the time.

Besides, Dubbo's time wheel is lifted from Netty's source code; the two are basically the same.

The time wheel is used in several places in Dubbo, such as sending heartbeat packets, detecting request-call timeouts, and cluster fault-tolerance strategies.

Let me start with this class in Dubbo:

org.apache.dubbo.rpc.cluster.support.FailbackClusterInvoker

Failback is a kind of cluster fault tolerance strategy:

It doesn't matter if you don't know Dubbo; you just need to know that the official website introduces it like this:

I'd like to highlight the words "periodic retransmission".

Before we look at the source code: what comes to mind when you hear "periodic resending"?

Did you think of timed tasks?

So how do you implement timed tasks?

Most people will think of the two classes the JDK provides: ScheduledExecutorService and Timer.

I won't say much about Timer; its performance isn't great, and it's no longer recommended.

ScheduledExecutorService is used much more often, and it mainly offers three kinds of methods:

Let me briefly cover the two methods scheduleAtFixedRate and scheduleWithFixedDelay.

With scheduleAtFixedRate, each execution is scheduled one interval after the start of the previous task.

With scheduleWithFixedDelay, each execution is scheduled one interval after the end of the previous task.

The former emphasizes the start time of the previous task; the latter emphasizes its end time.

You can also think of it this way: scheduleAtFixedRate schedules tasks at fixed time intervals, while with scheduleWithFixedDelay the interval between starts depends on how long each task takes, so tasks are scheduled at irregular intervals.

Therefore, if we want to implement the periodic retransmission above with ScheduledExecutorService, I think scheduleWithFixedDelay is the better fit: the next retry should happen a certain period after the previous retry completes.

That serializes the whole retry process.
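To make the difference concrete, here's a minimal, self-contained sketch of the scheduleWithFixedDelay style of retry. The "succeed on the 3rd attempt" condition is made up purely for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDelayRetryDemo {

    // Keep retrying, waiting a fixed delay AFTER each attempt finishes,
    // until the attempt that is configured to succeed.
    static int retryUntilSuccess(int succeedOnAttempt) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        CountDownLatch succeeded = new CountDownLatch(1);
        ScheduledFuture<?> retryTask = scheduler.scheduleWithFixedDelay(() -> {
            // Simulated remote call: fails until attempt #succeedOnAttempt.
            if (attempts.incrementAndGet() >= succeedOnAttempt) {
                succeeded.countDown();
            }
        }, 50, 50, TimeUnit.MILLISECONDS);
        succeeded.await();
        retryTask.cancel(false); // success: stop the periodic retry
        scheduler.shutdown();
        return attempts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("attempts before success: " + retryUntilSuccess(3));
    }
}
```

With scheduleAtFixedRate the next run would be measured from the previous run's start instead, so a slow attempt could make executions bunch up.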

So how does Dubbo actually implement this periodic-retry requirement?

Look at the source code. There are no secrets under the source code.

Ready to start.

Source code

Some of you may be getting anxious at this point: didn't you say you'd talk about the time wheel? Why are you digging into source code again?

Don't worry, we have to take it step by step.

Let me first walk you through a bit of Dubbo source code, so you can see what the problem with code written this way is, and then I'll talk about the solution.

Besides, if I just went snap and threw the solution in your face, you couldn't absorb it either.

I like a gentler teaching method.

Well, first look at the source code below.

It doesn't matter if you don't understand these lines of code; focus mainly on the logic in the catch block.

I matched the code with the introduction on the official website for you.

It means the call failed, and then there's an addFailed worth digging into.

What did addFailed do?

What it does is exactly that "periodic retransmission":

org.apache.dubbo.rpc.cluster.support.FailbackClusterInvoker#addFailed

This method answers the question we posed earlier: in Dubbo's cluster fault tolerance, how is the periodic-retry requirement implemented?

From the place labeled ①, we can see that a ScheduledExecutorService is used, specifically its scheduleWithFixedDelay method.

More concretely: if cluster fault tolerance uses the failback strategy, then when a request fails, a retry is scheduled RETRY_FAILED_PERIOD seconds later, and repeated every RETRY_FAILED_PERIOD seconds until the retry succeeds.

So what is RETRY_FAILED_PERIOD?

Look at line 52: it's 5 seconds.

In addition, you can see that the place labeled ③ in the addFailed method above puts things into failed.

So what is failed?

Look at line 61 above: it's a ConcurrentHashMap.

At the place labeled ③, the key put into failed is the request that needs to be retried, and the value is the server that the request corresponds to.

When is the failed map used?

Please see the retryFailed method labeled ②:

In this method, it traverses the failed map and re-issues every call in it.

If a call succeeds, the request is removed from the map with remove. If it fails, an exception is thrown, a log is printed, and the request will be retried the next time around.

Up to this point, we have uncovered the mystery of Dubbo's FailbackClusterInvoker class.

Under the veil hides nothing more than a map plus a ScheduledExecutorService.
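Stripped of the Dubbo-specific types, the pattern under the veil can be sketched like this. The String request id and Runnable retry action are simplified stand-ins, not Dubbo's real classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A stripped-down imitation of the FailbackClusterInvoker idea: failed
// calls go into a map, and a scheduled task periodically replays
// everything in the map, removing the entries that succeed.
public class FailbackSketch {
    private final Map<String, Runnable> failed = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    FailbackSketch(long periodMillis) {
        timer.scheduleWithFixedDelay(
                this::retryFailed, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void addFailed(String requestId, Runnable retryAction) {
        failed.put(requestId, retryAction);
    }

    private void retryFailed() {
        for (Map.Entry<String, Runnable> entry : failed.entrySet()) {
            try {
                entry.getValue().run();          // replay the call
                failed.remove(entry.getKey());   // success: drop it from the map
            } catch (RuntimeException ex) {
                // failure: keep it in the map and retry next period
                System.err.println("retry failed for " + entry.getKey());
            }
        }
    }

    int pending() {
        return failed.size();
    }

    void shutdown() {
        timer.shutdown();
    }
}
```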

It doesn't seem too difficult. A conventional solution; I could have thought of that too.

So you slowly type on the screen:

However, my friends, hold on tight and sit firmly: here comes the "but", here comes the turn.

There actually is a problem here. The most obvious one is this map: there is no limit on its size, so in some high-concurrency scenarios it can cause a memory overflow.

OK, so the question is: how do we prevent that memory overflow?

Very simple. First, we can limit the size of the map.

For example, limit its capacity to 1000.

What do we do when it's full?

We can adopt an eviction policy: first in first out (FIFO), or last in first out (LIFO).

And it can't keep retrying forever, either: once a request exceeds a certain number of retries, it should be dropped.
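A capacity limit with FIFO eviction is easy to sketch with LinkedHashMap's removeEldestEntry hook. Note this only illustrates the idea, it is not Dubbo's actual fix, and unlike ConcurrentHashMap it is not thread-safe:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-capped, insertion-ordered map: when a new entry pushes the size
// past the cap, the oldest entry (first in, first out) is evicted.
public class BoundedFailedMap<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedFailedMap(int capacity) {
        super(16, 0.75f, false); // false = iterate in insertion order (FIFO)
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```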

The memory overflow and the solutions above are not something I made up.

I have evidence: you can see the evolution in the commit history of the FailbackClusterInvoker class. The code in the earlier screenshot is itself an optimized version of older code, not the original:

That change is linked to the issue numbered 2425:

https://github.com/apache/dubbo/issues/2425

The problems and solutions discussed in that issue are exactly what I described above.

At last, the preparation is complete, and the story of the time wheel can officially begin.

Time wheel principle

At this point some friends are getting anxious again.

They want me to get to the time-wheel source code right away.

Don't rush. If I threw the source code at you directly, you'd definitely be confused.

So I decided to draw some pictures for you first and explain the principle.

Once you've seen what the time wheel basically looks like and how it works, the source code analysis that follows will be much easier to digest.

First of all, the most basic structure of a time wheel is just an array, for example this array of length 8:

How does it become a wheel?

Just join it end to end:

If each element represents one second, then one full circle of the array represents 8 seconds, like this:

Note that what I emphasized is one lap: 8 seconds.

Then 2 laps are 16 seconds, 3 laps are 24 seconds, and 100 laps are 800 seconds.

Does that make sense?

I will give you another picture:

Although the array's length is only 8, laps can be stacked one on top of another, so it can represent much more.

For example, I changed the first three circles of the above picture to be drawn like this:

I hope you can follow that; it's fine if not. The main thing I want you to know is that there is a concept of "number of laps".

OK, now let me prettify the earlier array and turn it into something that visually looks like a wheel.

What's the English word for wheel?

It's wheel, so our array is now called wheel:

Then fill in the earlier data, and it looks like this.

For clarity, I only filled in the slots at subscripts 0 and 3; the other slots work the same way:

Now here's a question. Suppose I have a task that needs to execute after 800 seconds. Where should it go?

800 mod 8 = 0, so it should hang off the slot at subscript 0:

Suppose there is another task that needs to execute after 400 seconds?

Same reasoning: just append it to that slot's list:

Don't mistakenly assume that the lap counts in a slot's linked list must be in ascending order; that's not required.

OK, now there's one more task, due in 403 seconds. Where should it hang?

403 mod 8 = 3, so it looks like this:
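The slot and lap arithmetic from these examples is just remainder and integer division. For a wheel of one-second slots:

```java
public class WheelMath {
    // For a task due after delaySeconds on a wheel of wheelSize
    // one-second slots: which slot does it hang on?
    static int slot(long delaySeconds, int wheelSize) {
        return (int) (delaySeconds % wheelSize);
    }

    // ...and how many full laps must pass before it fires?
    static long laps(long delaySeconds, int wheelSize) {
        return delaySeconds / wheelSize;
    }

    public static void main(String[] args) {
        System.out.println(slot(800, 8) + ", " + laps(800, 8)); // slot 0 after 100 laps
        System.out.println(slot(403, 8) + ", " + laps(403, 8)); // slot 3 after 50 laps
    }
}
```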

Why did I go to the trouble of showing you how to compute the slot each task lands in?

Because I still need to introduce one more thing: the queue of tasks waiting to be assigned.

When I drew the 800-second, 400-second and 403-second tasks above, I skipped a step.

In fact, it should be like this:

Tasks are not hung onto the time wheel in real time. They are first placed into a pending-assignment queue, and only at a specific moment are the tasks in that queue hung onto the wheel.

When exactly is that moment?

Let's talk about the source code below.

In fact, besides the pending-assignment queue, there is also a queue for cancelled tasks.

Because a task placed on the time wheel can be cancelled.

For example, in Dubbo, the time wheel mechanism is also used to check whether the call has timed out.

Suppose a call has a timeout of 5s. A task needs to fire after 5s and throw a timeout exception.

But if the request receives a response within 2s, with no timeout, then that task needs to be cancelled.

The corresponding source code is this one. It doesn't matter if you don't understand it; just take a look. I only want to prove I'm not lying to you:

org.apache.dubbo.remoting.exchange.support.DefaultFuture#received

That's roughly it for the principle drawings, except I still owe you one picture.

Let me map the field names in the source code onto the figure above.

These are the main objects to match up; with this mapping, reading the source code later won't be too hard:

The corresponding is like this:

Note the "worker" box in the upper left corner: it wraps the entire time wheel. When you read the source code later, you'll find there is actually no thread-safety issue in the wheel's core logic, because the single worker thread does all the work.

Finally, one more thing. In the earlier FailbackClusterInvoker scenario: the time wheel fires a retry task, but the retry fails again. What then?

Very simple: just put the task back in. And indeed, in the source code there is a method named rePut that does exactly that:

org.apache.dubbo.rpc.cluster.support.FailbackClusterInvoker.RetryTimerTask#run

The meaning here is: if the retry throws an exception and the specified retry limit has not been exceeded, the task is put back onto the time wheel.

Wait. What else can we do once we have this "retry count"?

For example, if you've ever integrated WeChat Pay, its callback notifications follow a schedule of increasing intervals like this:

Since I know the current retry count, I can, say, set the delay to 10 minutes on the 5th retry and throw the task back into the wheel.

The above requirement can be implemented with the time wheel.
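One way to sketch that "delay grows with the retry count" idea: look the delay up in a table and fall back to the last entry once the table runs out. The schedule below is hypothetical, not WeChat Pay's real one:

```java
public class BackoffSchedule {
    // Hypothetical escalating delays, in seconds, for retries 1, 2, 3, 4+.
    private static final long[] DELAY_SECONDS = {15, 30, 180, 600};

    // Pick the delay for the given (1-based) retry attempt; past the end
    // of the table, stick with the last value. When a retry fails, you
    // would reput the task into the wheel with delayFor(retries + 1).
    static long delayFor(int retryCount) {
        int idx = Math.min(retryCount, DELAY_SECONDS.length) - 1;
        return DELAY_SECONDS[idx];
    }
}
```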

Of course, an MQ delay queue can meet this requirement too, but that's outside the scope of this article.

But there's another catch when using the time wheel for this requirement: the tasks live in memory, so if the service goes down, they're gone. That's something to watch out for.

Besides FailbackClusterInvoker, I actually think an even better fit for the time wheel is heartbeats.

It fits perfectly, and Dubbo's heartbeat is indeed done with the time wheel.

org.apache.dubbo.remoting.exchange.support.header.HeartbeatTimerTask#doTask

As you can see from the figure above, the doTask method sends a heartbeat packet, and after each send it calls reput to hang the heartbeat task back onto the wheel.

Well, I won't expand on application scenarios any further.

Next, on to the source code analysis. Keep up with the rhythm, don't get flustered, and everyone can learn it.

Open the book!

Time wheel source code

With the principle understood, we can now take a look at the source code.

A quick note first: to make screenshots easier, I moved some code around in the screenshots below, so the layout may differ slightly from what you see in the actual source.

Let's review the usage of the time wheel in Dubbo's FailbackClusterInvoker class again.

First, the failTimer object uses the familiar double-checked-locking singleton pattern:

The failTimer initialized here is a HashedWheelTimer, and the key logic lies in its constructor.

So let's start with its constructor and tear it apart.

First, here's what its parameters mean:

  • threadFactory: thread factory; lets you set the thread name and whether it is a daemon thread.
  • tickDuration: the duration of one tick, i.e. the interval between two ticks.
  • unit: the time unit of tickDuration.
  • ticksPerWheel: the number of ticks (slots) in one lap of the wheel.
  • maxPendingTimeouts: the maximum number of pending tasks in the wheel.

So the meaning of Dubbo's time wheel here is:

Create a daemon thread named failback-cluster-timer that ticks once per second. The wheel has 32 slots, and the maximum number of pending tasks is failbackTasks, a configurable value that defaults to 100.

But in many other usage scenarios, such as Dubbo's check for call timeouts, maxPendingTimeouts is not passed at all:

org.apache.dubbo.remoting.exchange.support.DefaultFuture#TIME_OUT_TIMER

It doesn't even pass ticksPerWheel.

In fact, both parameters have defaults: ticksPerWheel defaults to 512, and maxPendingTimeouts defaults to -1, meaning there is no limit on the number of pending tasks:

OK, now let's look at the wheel's constructor as a whole. I've written a comment on what each line does:

A few places deserve a separate look.

For example, the createWheel method: if you've memorized your interview boilerplate well, you'll recognize that the capacity-normalization code here is the same as in HashMap.

This is what I noted in the source comments: the length of the time wheel must be a power of two.

Why, you ask?

Don't ask. The answer is the bitwise operations that come later: slick, fast, and high-style.

I trust the following snippet needs no explanation from me. If it's unclear, go brush up on your HashMap boilerplate again:

But there is one line I do want to talk about: mask = wheel.length - 1.

Because we already know that wheel.length is a power of two.

So suppose the delay of our timed task is x. Which slot of the wheel should it land in?

Take x modulo the length, right? That is: x % wheel.length.

However, the modulo operation is not actually that fast.

So how can we make this computation faster?

That's where wheel.length - 1 comes in.

wheel.length is a power of two, so after subtracting one, all of its low-order bits in binary are 1. For example:

So x % wheel.length = x & (wheel.length - 1).

Hence mask = wheel.length - 1 in the source code.
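Both tricks can be checked in a few lines. The normalize method below mimics the power-of-two rounding that createWheel performs; the loop is a simplified version, not Dubbo's exact code:

```java
public class MaskDemo {
    // Round ticksPerWheel up to the next power of two, the same idea
    // HashMap uses for its table size.
    static int normalize(int ticksPerWheel) {
        int n = 1;
        while (n < ticksPerWheel) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        int length = normalize(100); // 128
        int mask = length - 1;       // 127, whose low 7 bits are all 1
        long x = 403;
        // For a power-of-two length, modulo and bitwise AND agree:
        System.out.println(x % length == (x & mask)); // true
    }
}
```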

So where is the mask used?

One of them is in the run method of the Worker class:

org.apache.dubbo.common.timer.HashedWheelTimer.Worker

The idx computed here is the array subscript that needs to be processed on the current tick.

I'm only pointing out that mask really does take part in a & bit operation; it's fine if you don't follow this code yet, because I haven't explained it.

So don't panic if you're lost; keep reading.

We now have a time wheel, so how do we actually invoke it?

In fact, you call its newTimeout method:

This method takes three parameters:

The meaning is clear: trigger the specified task after the specified delay (delay, unit).

Next, interpret the newTimeout method:

The most critical code in it is the start method. Let me show you what it does:

It has two parts.

The upper part maintains and checks the current state of the HashedWheelTimer. From the source we know the state has three values:

  • 0: Initialization
  • 1: Started
  • 2: Closed

If the state is "initialized", it updates the state to "started" with a CAS operation and calls workerThread.start() to launch the worker thread.

The next part is a little puzzling.

If startTime equals 0, that is, it has not been initialized yet, it calls the CountDownLatch's await and waits a while.

And this await happens on the main thread: the main thread waits for startTime to be initialized. What kind of logic is that?

First, let's find out where startTime is initialized.

It's in Worker's run method, which is triggered by the workerThread.start() call we just saw:

org.apache.dubbo.common.timer.HashedWheelTimer.Worker

You can see that right after startTime is initialized, it is checked against 0 again. That's because System.nanoTime() may return 0. A small detail; it's interesting to dig into, but I won't expand on it here.

Once startTime is initialized, startTimeInitialized.countDown() is executed immediately.

Doesn't that echo the await we saw earlier?

With that, the main thread can continue running right away.

So here's the question: why go to all this trouble to initialize startTime, and block the main thread until it's done? What is it actually for?

Of course it's useful. Go back to the newTimeout method and read on:

Let's analyze the expression above.

First, System.nanoTime() is the current time at the moment this line of code executes.

Since delay is a fixed value, unit.toNanos(delay) is also a fixed value.

So System.nanoTime() + unit.toNanos(delay) is the instant, in nanoseconds, at which this task should fire.

For example:

Suppose System.nanoTime() = 1000 and unit.toNanos(delay)=100.

Then the time point when this task is triggered is 1000+100=1100.

Following so far?

So why subtract startTime?

As we analyzed earlier, startTime is simply System.nanoTime() captured at initialization; after that, it is a fixed value.

Doesn't that make System.nanoTime() - startTime almost 0?

Then what is the point of the expression System.nanoTime() + unit.toNanos(delay) - startTime?

Yes, that was exactly my question when I read the source.

But after analyzing it: in the whole expression, only System.nanoTime() is a variable.

System.nanoTime() - startTime is indeed close to 0 the first time it's computed. But the second time, that is, when the second task arrives and its deadline is computed, System.nanoTime() is already much larger than the fixed startTime.

So the execution time of the second task is the current time plus the specified delay minus the worker thread's start time; deadlines are all measured relative to the wheel's start time.
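A tiny sketch of that relative-deadline arithmetic, with the field name borrowed from the source and everything else simplified:

```java
public class DeadlineDemo {
    // Fixed once the worker thread starts, like HashedWheelTimer's startTime.
    static long startTime;

    // Deadlines are measured relative to startTime, so they share the
    // same origin as the worker's tick counter.
    static long deadline(long delayNanos) {
        return System.nanoTime() + delayNanos - startTime;
    }

    public static void main(String[] args) throws InterruptedException {
        startTime = System.nanoTime();
        long first = deadline(100);  // nanoTime - startTime is still ~0 here
        Thread.sleep(50);
        long second = deadline(100); // ~50 ms worth of nanoseconds larger
        System.out.println(second > first); // true
    }
}
```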

That wraps up the newTimeout method; this is where the main thread's part of the time-wheel logic ends.

What should we analyze next?

The worker thread, of course. That's where the time wheel really gets to play.

The logic of the worker thread is in the run method.

And the core logic is inside a do-while loop:

The loop ends when the current state of the time wheel is no longer the started state.

In other words, as long as the wheel's stop logic hasn't been invoked, this thread keeps running.

Next, let's look at the logic in the loop line by line. This part of the logic is the core logic of the time wheel.

The first line is final long deadline = waitForNextTick(), and it holds quite a story:

From the method name alone you can tell what it does.

It waits here until the next tick.

So the first line of the method computes the nanosecond timestamp of the next tick.

Next, look at the for loop. The early part is a bit awkward; only the place labeled ③ is easy to understand: it puts the current thread to sleep for a specified duration.

So the earlier part is all about computing what that specified duration is.

How is it computed?

At the place marked ①, the first part is still understandable: deadline - currentTime computes how long it will take to reach the next tick.

The part right after it is not so obvious at first glance.

The 1000000 in it is easy to understand: in nanoseconds, that's 1 millisecond.

But what is this 999999?

In fact, adding 999999 here makes the division round up, adding up to 1 millisecond to the computed value.

For example, if deadline - currentTime comes out to 1000123 nanoseconds, then 1000123 / 1000000 = 1 ms.

But (1000123 + 999999) / 1000000 = 2 ms.

In other words, the sleep at the place marked ③ below sleeps up to 1 ms longer.

Why is this?

I didn't know, so I set it aside for the moment, left a hole to fill later, no big deal, and kept reading.

Then comes the place labeled ②, which looks like special handling for the Windows operating system: sleepTimeMs is rounded to a multiple of 10.

Why?

Here I have to criticize Dubbo a little: it took Netty's implementation but dropped the key information. That's not appropriate.

In Netty's source code, this place looks like this:

There is a clear pointer here:

https://github.com/netty/netty/issues/356

Following that road all the way down, you will find this page:

https://www.javamex.com/tutorials/threads/sleep_issues.shtml

An unexpected detour, with unexpected rewards.

The first underlined part roughly says: when a thread calls Thread.sleep, the JVM makes a special call that sets the interrupt period to 1 ms.

Because the implementation of Thread.sleep relies on the interrupt checks provided by the operating system (at each interrupt, the OS checks whether any thread needs to be woken up and given CPU time), I think this also explains why it sleeps 1 ms longer.

The hole I left earlier got filled this quickly. Satisfying.

The second underlined part says that on Windows, the interrupt period may be 10 ms or 15 ms, depending on the hardware.

So on Windows, the sleep time needs to be adjusted to a multiple of 10.

A piece of obscure trivia, just for you.
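The two bits of arithmetic from this section fit in one small method. This is a sketch of the arithmetic only; the real waitForNextTick also loops and handles the result being zero or negative:

```java
public class SleepRounding {
    // Convert "nanoseconds until the next tick" into milliseconds to
    // sleep. Adding 999999 before dividing by 1000000 rounds UP,
    // which is the extra-millisecond trick discussed above.
    static long sleepMillis(long remainingNanos, boolean isWindows) {
        long ms = (remainingNanos + 999999) / 1000000;
        if (isWindows) {
            ms = ms / 10 * 10; // round down to a multiple of 10 ms
        }
        return ms;
    }

    public static void main(String[] args) {
        System.out.println(sleepMillis(1000123, false)); // 2
        System.out.println(sleepMillis(1000123, true));  // 0
    }
}
```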

With the earlier questions cleared up, waitForNextTick is easy to understand. What it does is wait: wait for one tick, one time scale's worth of time.

And after the wait?

We come to this line of code: int idx = (int) (tick & mask).

We analyzed this before: it computes the slot subscript for the current tick with a bit operation. Slick, fast, high-style. Not much more to say.

Then the code reaches this method: processCancelledTasks().

As you can tell from the name, it processes the queue of cancelled tasks:

The logic is very simple, clear at a glance: it drains the cancelledTimeouts queue.

That's the removing and cleaning-up side.

So where does the adding happen?

It is in the following method:

org.apache.dubbo.common.timer.HashedWheelTimer.HashedWheelTimeout#cancel

Once the cancel method of HashedWheelTimeout is called, the task counts as cancelled.

This method came up in the earlier drawings, and the logic is clear, so I won't explain it much.

But pay attention to where I drew an underline: MpscLinkedQueue.

What is this?

It's a very impressive lock-free queue.

But the data structure of Dubbo's cancelledTimeouts queue here is clearly a LinkedBlockingQueue?

What's going on?

Because that comment was written for Netty; it is Netty that uses MpscLinkedQueue.

Look, here is a comparison of the differences between Netty and Dubbo:

So the comment here is misleading. If you have time, you could submit a PR to Dubbo to fix it.

There is another small detail.

OK, scroll down to this line of code: HashedWheelBucket bucket = wheel[idx].

Clear at a glance; nothing to say.

It fetches the bucket at the specified subscript from the wheel.

The main thing to look at is the line below it: transferTimeoutsToBuckets().

Again, I've added a comment to each line:

So the core logic of this method is to distribute all the pending tasks into their designated buckets.

This also answers the question I left open while drawing the pictures: when do the tasks in the pending-assignment queue get hung onto the time wheel?

This is that moment.

Next, analyze this line of code: bucket.expireTimeouts(deadline).

Notice that the caller of this method is the bucket itself, meaning it is about to process the tasks in this bucket's linked list:

Finally, there is the line tick++.

It means the current tick has been handled, and preparation for the next time scale begins.

With that, the key code has been analyzed.
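To tie the pieces together, here's a toy, single-threaded time wheel that keeps only the tick/slot/laps mechanics. The real Netty/Dubbo implementation adds the worker thread, the pending and cancelled queues, and nanosecond deadlines, all stripped out here:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// A toy time wheel: each slot holds (lapsLeft, job) pairs, and each call
// to advance() processes one tick, running the jobs whose laps ran out.
public class ToyTimeWheel {
    static final class ToyTask {
        long lapsLeft;
        final Runnable job;
        ToyTask(long lapsLeft, Runnable job) {
            this.lapsLeft = lapsLeft;
            this.job = job;
        }
    }

    private final Deque<ToyTask>[] wheel;
    private final int mask; // wheel length is a power of two
    private long tick;      // current tick, starting at 0

    @SuppressWarnings("unchecked")
    ToyTimeWheel(int powerOfTwoSize) {
        wheel = new Deque[powerOfTwoSize];
        for (int i = 0; i < powerOfTwoSize; i++) {
            wheel[i] = new ArrayDeque<>();
        }
        mask = powerOfTwoSize - 1;
    }

    // Schedule job to run delayTicks ticks from the current tick.
    void schedule(long delayTicks, Runnable job) {
        long target = tick + delayTicks;
        wheel[(int) (target & mask)].add(new ToyTask(delayTicks / wheel.length, job));
    }

    // Process one tick: run every job in the current slot whose
    // remaining laps hit zero; decrement the rest.
    void advance() {
        Deque<ToyTask> bucket = wheel[(int) (tick & mask)];
        for (Iterator<ToyTask> it = bucket.iterator(); it.hasNext(); ) {
            ToyTask t = it.next();
            if (t.lapsLeft == 0) {
                it.remove();
                t.job.run();
            } else {
                t.lapsLeft--;
            }
        }
        tick++;
    }
}
```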

Read it again if it didn't click the first time, but I'd also suggest stepping through the source yourself; it will come together quickly.

I believe that when an interviewer asks about the time wheel, you can now go a solid round with them.

Why only one round?

Because after you've answered the time-wheel questions, the interviewer will generally follow up with:

"Well, that's pretty good. So can you also introduce the hierarchical time wheel?"

At that point you freeze: what, what on earth is a hierarchical time wheel? You never wrote about that!

Yes, blame me, I didn't write it. Next time, next time.

But I can point you in the right direction: go see how Kafka optimizes the time wheel. You'll applaud as you read.

Several related issues

Finally, regarding Dubbo's time wheel, there is a discussion in the issues:

https://github.com/apache/dubbo/issues/3324

If you are interested, you can check it out.

It raises an interesting question:

Netty used HashedWheelTimer extensively in 3.x, but in 4.1 we find that Netty keeps HashedWheelTimer yet no longer uses it in its own source, choosing ScheduledThreadPoolExecutor instead. I wonder what the purpose of that is.

This question was answered personally by one of Netty's maintainers:

https://github.com/netty/netty/issues/8774

What he meant was: there is nothing wrong with the time wheel; Netty stopped using it simply because it wanted scheduling to happen on the same thread as the channel's EventLoop.

In Netty, one contributor noticed that the time wheel was unused and even wanted to remove it:

I think it falls into the category of tools: keep it around, and it will always come in handy.

In addition, the earlier issue also mentions another one:

https://github.com/apache/dubbo/issues/1371

This is also an optimization made after Dubbo introduced the time wheel.

Take a look: the upper part is after the optimization, and the lower part is the old version:

In the old version, a background thread runs an infinite loop, scanning the entire collection over and over again:

That kind of scheme also meets the requirement, but compared with the time-wheel version, the difference in quality is plain to see.

Slick moves, high speed, high style.

One last word

OK, you made it all the way here. Reposts, "watching", and likes can be arranged as you please; I don't mind if you arrange all three. Writing articles is tiring and needs a little positive feedback.

A bow to my reader friends:

This article has also been published on my personal blog; everyone is welcome to visit.

https://www.whywhy.vip/

why技术