More high-quality articles: see my GitHub: dotnetfly
One: background
1. Tell a story
This is the story of a friend who added me on WeChat in mid-June asking for help with a dump: his program's CPU had spiked and the UI was frozen, and he wanted to know how to fix it. The screenshot is as follows:
After getting the dump, I found out that it is an MRS program for robotic arms. Interesting, .NET is actually used in the field of intelligent robots such as robotic arms, which was a bit beyond my expectations. Who knows, maybe some of the robotic arms in JD's "employees are brothers" warehouses are also controlled by .NET. 😄😄😄
I won't discuss the UI freeze here; I'll only look at how to troubleshoot the high CPU. After all, friends following this series have had their fill of memory leaks and memory explosions, so let's change the flavor a bit.
Two: Windbg analysis
1. Verifying the phenomenon
When someone tells me the CPU is high, I still need data to prove whether that is really the case. The method is very simple: just use the !tp command.
0:000> !tp
CPU utilization: 100%
Worker Thread: Total: 151 Running: 151 Idle: 0 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 1
AsyncTimerCallbackCompletion TimerInfo@000000001dc25bb0
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread:Total: 2 Free: 1 MaxFree: 8 CurrentLimit: 2 MaxLimit: 1000 MinLimit: 4
From this output, two indicators are worth paying attention to:
- CPU utilization: 100%
This shows that at the moment the dump was captured, the machine's CPU was indeed at 100%, which is really bad. As a digression, my friend said that when he tried to capture this dump at 100% CPU, he found he couldn't even connect to the Alibaba Cloud remote desktop. Quite embarrassing...
- Total: 151 Running: 151
There are currently 151 worker threads in the thread pool, and all 151 are running. What does this mean? It means every thread is busy, the pool doesn't have enough threads, and it urgently needs to recruit more. Something is definitely pushing the system hard. (If you want to watch this kind of pressure from inside the application, see the sketch below.)
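As a side note, if you wanted to observe this kind of thread pool pressure from inside the application instead of from a dump, a minimal sketch (my own illustration, not part of the original program) could poll the thread pool counters like this:

```csharp
using System;
using System.Threading;

class ThreadPoolMonitor
{
    static void Main()
    {
        // Max and available counts for worker and IO completion port threads.
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);

        // Busy = max - available. When the busy count keeps climbing and
        // available stays near 0, the pool is saturated, which matches the
        // "Total: 151 Running: 151" picture that !tp showed above.
        Console.WriteLine($"Worker threads busy: {maxWorker - freeWorker} / {maxWorker}");
        Console.WriteLine($"IO threads busy:     {maxIo - freeIo} / {maxIo}");
    }
}
```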
2. View the thread list
Next, take a look at the current thread list using the !t command.
0:000> !t
ThreadCount: 171
UnstartedThread: 1
BackgroundThread: 167
PendingThread: 1
DeadThread: 2
Hosted Runtime: no
Lock
ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
0 1 7e0 00000000028901c0 26020 Preemptive 00000000049C9360:00000000049C98A8 0000000000602420 1 STA (GC)
9 2 df8 00000000028bc850 2b220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA (Finalizer)
11 3 144 000000001fdef570 102a220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA (Threadpool Worker)
14 9 dbc 0000000020703650 202b220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA
15 10 5a4 00000000206d5860 2b220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA
16 11 17c 00000000206df220 2b220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA
17 12 dd4 00000000205e49a0 2b220 Preemptive 00000000049A7A20:00000000049A98A8 0000000000602420 0 MTA
18 14 8fc 0000000020495000 2b220 Preemptive 00000000049A5A40:00000000049A78A8 0000000000602420 0 MTA
19 17 a84 0000000020817490 202b220 Preemptive 00000000049ADBB0:00000000049AF8A8 0000000000602420 0 MTA
...
180 167 12b8 0000000026436d70 1021220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA (Threadpool Worker)
181 168 11a4 0000000026437540 1021220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA (Threadpool Worker)
182 169 880 0000000026437d10 1021220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA (Threadpool Worker)
183 170 1334 00000000264384e0 1021220 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 MTA (Threadpool Worker)
184 171 278 0000000026438cb0 1400 Preemptive 0000000000000000:0000000000000000 0000000000602420 0 Ukn
From the output, the thread with ID=1 carries a (GC) mark, which suggests it is the one that triggered a GC. A reminder here: in Workstation GC mode, an ordinary worker thread can itself act as the GC collection thread, which is different from Server GC mode.
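If you want to double-check which GC mode a process is actually running in, a small sketch from managed code (my own illustration; the mode can also be configured via gcServer in the config file) looks like this:

```csharp
using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        // IsServerGC is false in Workstation mode, where a regular thread
        // (here the UI thread carrying the (GC) mark) can perform the
        // collection, and true in Server GC mode, which uses dedicated GC threads.
        Console.WriteLine($"Server GC:    {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
    }
}
```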
3. View the thread stack
Since a GC has been triggered, let's give the usual GC-troubleshooting routine a shot. The basic process is as follows:
- To call up all thread stacks, use the !EEStack command.
0:000> !EEStack
---------------------------------------------
Thread 0
Current frame: ntdll!NtGetContextThread+0xa
Child-SP RetAddr Caller, Callee
...
- Look for the WaitUntilGCComplete keyword to see how many threads are waiting for the GC to complete. Search with Ctrl+F; the screenshot is as follows:
As you can see from the screenshot, 96 threads are waiting for the GC to complete. At this point, the corners of my mouth are already lifting 😁😁😁...
- Look for the try_allocate_more_space keyword to determine which business logic triggered the GC.
Huh, what's going on? There are no hits at all. Since the try_allocate_more_space keyword doesn't appear, the current GC was probably not triggered automatically by an allocation. Then who triggered it? A bit weird, right?
4. How is GC triggered?
To find out, the simplest and crudest way is to look at what the GC-marked thread is doing. Use the !clrstack command here.
0:000> !clrstack
OS Thread Id: 0x7e0 (0)
Child SP IP Call Site
000000000043e470 00000000778c1fea [InlinedCallFrame: 000000000043e470] System.GC._Collect(Int32, Int32)
000000000043e470 000007feea38ce2a [InlinedCallFrame: 000000000043e470] System.GC._Collect(Int32, Int32)
000000000043e440 000007feea38ce2a System.GC.Collect()
000000000043e4f0 000007fe8bcd29ca xxx.xxx.T_Tick(System.Object, System.EventArgs)
000000000043e520 000007fee3d0ef6f System.Windows.Forms.Timer.OnTick(System.EventArgs)
000000000043e550 000007fee3d076fe System.Windows.Forms.Timer+TimerNativeWindow.WndProc(System.Windows.Forms.Message ByRef)
000000000043e580 000007fee3cea3c3 System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)
000000000043e620 000007fee43611f1 DomainBoundILStubClass.IL_STUB_ReversePInvoke(Int64, Int32, Int64, Int64)
000000000043e890 000007feeac1221e [InlinedCallFrame: 000000000043e890] System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
000000000043e890 000007fee3d6a378 [InlinedCallFrame: 000000000043e890] System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
000000000043e860 000007fee3d6a378 DomainBoundILStubClass.IL_STUB_PInvoke(MSG ByRef)
000000000043e920 000007fee3cff23e System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
000000000043ea10 000007fee3cfebd2 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
000000000043eab0 000007fee3cfe9df System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
000000000043eb10 000007fe8b6208a6 xxx.Program.Main()
000000000043ede0 000007feeac16bb3 [GCFrame: 000000000043ede0]
Well well, the thread stack shows a manual GC.Collect() call. What kind of business logic is this? Next, use !ip2md + !savemodule to export the module containing xxx.xxx.T_Tick and take a look at its source code.
0:000> !ip2md 000007fe8bcd29ca
MethodDesc: 000007fe8b50ae90
Method Name: xxx.xxx.T_Tick(System.Object, System.EventArgs)
Class: 000007fe8b6ac628
MethodTable: 000007fe8b50b080
mdToken: 00000000060002b5
Module: 000007fe8b504118
IsJitted: yes
CodeAddr: 000007fe8bcd29a0
Transparency: Critical
0:000> !savemodule 000007fe8b504118 D:\dumps\MRS-CPU\T_Tick.dll
3 sections in file
section 0 - VA=2000, VASize=1a85fc, FileAddr=200, FileSize=1a8600
section 1 - VA=1ac000, VASize=5088, FileAddr=1a8800, FileSize=5200
section 2 - VA=1b2000, VASize=c, FileAddr=1ada00, FileSize=200
Open T_Tick.dll with ILSpy; the screenshot is as follows:
From the code logic, my friend wrote business logic that triggers a GC every 3 minutes. I have no idea what he was trying to achieve, so I chatted with him on WeChat.
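The actual code is in the ILSpy screenshot; based on the call stack (Timer.OnTick -> T_Tick -> GC.Collect()), it is roughly equivalent to the following sketch. The class name, field names, and the 3-minute interval are my reconstruction, not the friend's literal source:

```csharp
using System;
using System.Windows.Forms;

// Hypothetical reconstruction of the friend's logic, inferred from !clrstack:
// a WinForms Timer whose Tick handler forces a full blocking collection.
public class MainForm : Form
{
    private readonly Timer _gcTimer = new Timer();

    public MainForm()
    {
        _gcTimer.Interval = 3 * 60 * 1000; // fires every 3 minutes
        _gcTimer.Tick += T_Tick;
        _gcTimer.Start();
    }

    private void T_Tick(object sender, EventArgs e)
    {
        // The manual trigger seen in the stack: GC.Collect() suspends all
        // managed threads for the duration of the collection.
        GC.Collect();
    }

    [STAThread]
    static void Main() => Application.Run(new MainForm());
}
```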
5. Is it really all about GC?
In fact, many of the high-CPU dumps I have shared were caused by frequently triggered GCs, for example from heavy string concatenation or misused regular expressions. But one GC every 3 minutes killing the CPU? That might fool a beginner, but it doesn't stand up to scrutiny for anyone with a bit of experience. The implication is that the real culprit has not been found yet... To find the suspect, the best way is to do a carpet search across all thread stacks. The screenshot is as follows:
As you can see from the figure above, 112 threads are currently stuck in System.Collections.Generic.Dictionary`2[[System.Int32, mscorlib],[System.__Canon, mscorlib]].FindEntry(Int32). You may say they are stuck simply because the triggered GC has suspended all threads. There is no need to refute that theory; it is indeed the case.
But I believe experienced friends will also spot another problem: a non-thread-safe Dictionary is being used in a multi-threaded environment. I mentioned this in a previous analysis of a CPU spike in an Internet-of-Vehicles program: concurrent writes can corrupt the internal buckets and cause FindEntry to spin in an infinite loop.
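For readers who haven't hit this before, here is a minimal standalone sketch (my own repro, not the friend's code) of how unsynchronized writes can corrupt a Dictionary and leave FindEntry spinning:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class DictionaryRaceDemo
{
    // WARNING: this may hang or throw. Dictionary<TKey,TValue> makes no
    // guarantees under concurrent writes; a resize racing with other writes
    // can corrupt the internal bucket/entry chains, after which FindEntry
    // can walk a cycle forever and pin a core at 100%.
    static readonly Dictionary<int, string> Cache = new Dictionary<int, string>();

    static void Main()
    {
        for (int i = 0; i < 8; i++)
        {
            Task.Run(() =>
            {
                var rnd = new Random();
                while (true)
                {
                    int key = rnd.Next(10000);
                    Cache[key] = "value";          // concurrent writes: not safe
                    Cache.TryGetValue(key, out _); // reads during writes: not safe either
                }
            });
        }
        Console.ReadLine(); // keep the process alive and watch the CPU climb
    }
}
```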
At this point, the cause of the high CPU has basically been found.
Three: Summary
This CPU spike was mainly caused by:
- Using a non-thread-safe Dictionary in a multi-threaded environment, which led to an infinite loop.
- Periodically calling GC.Collect(), which made things worse.
Once the problem is found, the solution is simple: change the Dictionary to a ConcurrentDictionary and remove the manual GC.Collect() call.
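A minimal sketch of the fix (assuming a cache shaped like the repro above):

```csharp
using System.Collections.Concurrent;

class FixedCache
{
    // ConcurrentDictionary is safe for concurrent reads and writes, so the
    // buckets can't be corrupted and there is no FindEntry spin.
    static readonly ConcurrentDictionary<int, string> Cache =
        new ConcurrentDictionary<int, string>();

    public static void Set(int key, string value) => Cache[key] = value;

    public static bool TryGet(int key, out string value) => Cache.TryGetValue(key, out value);

    // And simply delete the periodic GC.Collect(): let the runtime decide
    // when to collect.
}
```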