One: Background
1. Storytelling
The day before yesterday, a friend added a WeChat to ask for his program and the CPU burst high. The opening was a red envelope, which scared me! 🤣🤣🤣
Since it was a young year in the south, it was inconvenient for me to deal with Zhang Luo in my hometown. I didn’t help him deal with it while(true)
. My friend found out the problem the next morning, and the feedback said that it was caused by 061f3b09f6f3af, which is a bit interesting. Among the many CPU explosion cases I analyzed, I have never encountered while(true)
. I have always regretted it. I am really lucky and caught up a few years ago, haha 😄😄😄.
Next, let's use windbg to analyze it together.
Two: Windbg analysis
1. Check the CPU usage
Friends who have been following me all know that the !tp
command is enough.
0:022> !tp
CPU utilization: 95 Unknown format characterUnknown format control characterWorker Thread: Total: 11 Running: 11 Idle: 0 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 3
--------------------------------------
Completion Port Thread:Total: 4 Free: 4 MaxFree: 8 CurrentLimit: 4 MaxLimit: 1000 MinLimit: 4
There is a sentence Unknown format characterUnknown format control characterWorker
above, which is not very harmonious. In fact, it means %
. I don't know why this encoding problem occurs in !eeversion
.
0:022> !eeversion
4.700.21.56803 (3.x runtime) free
4,700,21,56803 @Commit: 28bb6f994c28bc91f09bc0ddb5dcb51d0f066806
Workstation mode
In plan phase of garbage collection
SOS Version: 5.0.4.36902 retail build
From the basic information, the current .netcore 3.x
, and it is obvious that the current GC is in the planning stage. So what is the planning stage?
2. What is the planning stage
In short, the GC in the planning phase needs to decide whether the current managed heap should perform a simple mark free operation or a heavy-weight compression operation. If compression is to be performed, it also needs to involve the relocation of managed heap objects. It often consumes a lot of CPU time slices. What is the next thing to explore that causes the GC to trigger?
3. GC trigger reasons
Since GC triggers are often caused by user threads allocating data, in the entire execution flow triggered by GC, one of the
is to freeze the 161f3b09f6f4b0 CLR execution engine, which is
SuspendEE
, which can be found in gc.app
.
Why do you have to mention SuspendEE
? It is because I can find out the SuspendEE
!t -special
, which is more accurate.
0:072> !t -special
ThreadCount: 54
UnstartedThread: 0
BackgroundThread: 40
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
OSID Special thread type
1 6328 DbgHelper
2 35c0 Finalizer
4 5aac Timer
5 38b0 ThreadpoolWorker
17 3530 ThreadpoolWorker
18 4484 ThreadpoolWorker
19 1e4c ThreadpoolWorker
21 6380 ThreadpoolWorker
44 5bc4 SuspendEE
52 8ac ThreadpoolWorker
54 4164 ThreadpoolWorker
56 61c8 ThreadpoolWorker
58 1fa4 ThreadpoolWorker
60 2788 ThreadpoolWorker
69 48f4 IOCompletion
70 5708 IOCompletion
71 3b58 ThreadpoolWorker
72 17a0 GC
73 2f00 Gate
74 35e8 IOCompletion
75 5730 IOCompletion
It can be seen that the current 44
is the thread that triggered the GC. Then it becomes clear. Let's see what thread 44 is doing? Cut to thread 44, then !clrstack
.
0:044> !clrstack
OS Thread Id: 0x5bc4 (44)
Child SP IP Call Site
000000A2B0C3E4C8 00007ffd471ead3a [HelperMethodFrame: 000000a2b0c3e4c8]
000000A2B0C3E5E0 00007ffce8b4e506 System.Collections.Generic.List`1[[System.__Canon, System.Private.CoreLib]].System.Collections.Generic.IEnumerable.GetEnumerator() [/_/src/System.Private.CoreLib/shared/System/Collections/Generic/List.cs @ 585]
000000A2B0C3E630 00007ffce85e7a10 xxx.Program.DeletexxxExipredDate()
000000A2B0C3E780 00007ffd46bc1f0b System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
000000A2B0C3E7B0 00007ffd46bb90b6 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 172]
000000A2B0C3E830 00007ffd46ba535b System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
000000A2B0C3EA48 00007ffd47236c93 [GCFrame: 000000a2b0c3ea48]
000000A2B0C3ECB0 00007ffd47236c93 [DebuggerU2MCatchHandlerFrame: 000000a2b0c3ecb0]
From the output information, the problem DeletexxxExipredDate()
in the 061f3b09f6f552 method, and then explore the source code of this method.
private static void DeletexxxExipredDate()
{
while (true)
{
foreach (string key in xxx.xxxSpeedLimit.Keys)
{
try
{
string[] array = xxx.xxxSpeedLimit[key].Split('$');
if (array.Length > 1)
{
DateTime dateTime = Convert.ToDateTime(array[1]);
if ((DateTime.Now - dateTime).TotalSeconds > 21600.0 && xxx.xxxSpeedLimit.ContainsKey(key))
{
xxx.xxxSpeedLimit.TryRemove(key, out var _);
}
}
}
catch (Exception ex)
{
LogHelper.WriteAppExceptionLog("删除数据出现异常:" + ex.Message, ex);
}
Thread.Sleep(20000);
}
}
}
If you have rich experience in stepping on the pit, I believe that you can see the problems in this code at a glance. Yes, when the xxxSpeedLimit
dictionary is empty, it is equivalent to an infinite loop of while(true)
In order to verify my statement, you can use !dso
find the memory address of dict, and then use !wconcurrentdict
.
0:044> !dso
OS Thread Id: 0x5bc4 (44)
RSP/REG Object Name
...
000000A2B0C3E708 000001ba8007f618 System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, System.Private.CoreLib],[System.String, System.Private.CoreLib]]
000000A2B0C3E760 000001ba88501cd0 System.Collections.Generic.List`1+Enumerator[[System.String, System.Private.CoreLib]]
000000A2B0C3E768 000001ba80050ec0 System.Threading.ContextCallback
000000A2B0C3E7F8 000001ba80a1a818 System.Threading.Thread
000000A2B0C3EA28 000001ba80a1a898 System.Threading.ThreadStart
0:044> !wconcurrentdict 000001ba8007f618
Empty ConcurrentDictionary
As you can see, it is currently a empty dictionary😂😂😂
Three: Summary
The main reason for this accident: the coders lacked a certain amount of programming experience, and when writing business logic, they lacked the process of processing the empty dictionary
while(true)
, and it was also possible that the Thread.Sleep(20000)
misplaced 🤣🤣 🤣
In general, I am very grateful for the dump provided by this friend, which made me really see it!
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。