One: background
1. Tell a story
A few days ago, a friend added wx and complained that his program is always inexplicably high cpu during the peak period. How to analyze it for help?
After communicating with this friend, it is said that this problem has plagued them for several years, and I also invited a Microsoft engineer to come and solve it. . .
Regarding the problem of high CPU burst, old readers should know that I have written several articles, which are basically divided into two categories:
- GC trigger
- Large number of locks
A small part of it is a variety of inexplicable problems, I can't list them all 🤣🤣🤣, since my friend has found me, I have to find a way to solve it for him, not to talk more, go to windbg.
Two: windbg analysis
1. View the synchronization block table
Encountered this kind of problem, the first check synchronization block table is my inertial thinking, the command is very simple
!syncblk
.
0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
-----------------------------
Total 20779
CCW 16
RCW 21
ComClassFactory 0
Free 16490
I went and threw an empty one, and there was nothing in the sync block table. . . Since it has nothing to do with locks, let's take a look at threads. After all, threads are supported by the CPU.
2. View threads
To view the managed threads in the system, you can use the
!t
command. There are more threads, which is a bit simplified.
0:000> !t
ThreadCount: 135
UnstartedThread: 0
BackgroundThread: 132
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
Lock
ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
34 1 25d4 000001ea28702130 28220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 Ukn
74 2 3ed0 000001ea286fa940 2b220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 MTA (Finalizer)
76 3 4a70 000001f4447d7810 102a220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 MTA (Threadpool Worker)
77 4 326c 000001f4447dbe60 21220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 Ukn
78 6 2dac 000001f4447d9750 1020220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 Ukn (Threadpool Worker)
79 7 1468 000001f444a2d6f0 21220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 1 Ukn (GC)
80 8 f0 000001f444a2cf20 21220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 Ukn
81 9 3118 000001f444a2f630 21220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 Ukn
Let's sell it first, can any friends see what's so strange about these threads? ? ? Yes, the last column of 79
Ukn (GC)
tag. I think you must be curious. What does this mean? Due to some changes in the underlying GC mode, it tells you to a certain extent that your program triggered the GC. For further verification, you can use !t -special
see what type of thread 79 belongs to and more detailed information.
0:000> !t -special
ThreadCount: 135
UnstartedThread: 0
BackgroundThread: 132
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
Lock
ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
34 1 25d4 000001ea28702130 28220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 0 Ukn
79 7 1468 000001f444a2d6f0 21220 Preemptive 0000000000000000:0000000000000000 000001ea286ee080 1 Ukn (GC)
OSID Special thread type
41 38d8 DbgHelper
42 1c88 GC
74 3ed0 Finalizer
75 402c ProfilingAPIAttach
76 4a70 Timer
79 1468 GC SuspendEE
From the last line of output, you can see that 79 is the GC thread, and there is a strange SuspendEE
tag behind it. You are curious again. What does this mean?
SuspendEE
=Suspend CLR Execution Engine (EE)
In other words, thread No. 79 CLR execution engine. The purpose is very simple. It is to facilitate the other 31 GC threads manage the current
managed heap, but this brother machine is really 🐂👃, 32core, also I don't know which hospital is so powerful, but to add, use
!cpuid
verify.
0:000> !cpuid
CP F/M/S Manufacturer MHz
0 6,62,4 <unavailable> 2600
1 6,62,4 <unavailable> 2600
2 6,62,4 <unavailable> 2600
3 6,62,4 <unavailable> 2600
xxx
31 6,62,4 <unavailable> 2600
Now that the GC trigger is predicted, the managed and unmanaged stacks of all threads can be called out in the next step.
3. View each thread stack
To view the managed and unmanaged stacks of each thread is very simple, just use !eestack
, and then you can search for the keyword WaitUntilGCComplete
to determine how many threads are waiting for the GC to complete.
It can be seen from the figure that 40 threads are currently blocked, which is great. The problem is getting clearer and clearer. Next, we will analyze which thread did what it shouldn’t do, which caused the GC to be triggered. The same can be done. Search try_allocate_more_space
to determine which threads are allocating space.
When I go, it is obvious that the current 170
and 187
threads are allocating large objects. gc_heap::allocate_large_object
triggers the GC. The large object heap itself is a daunting thing, and its recycling and cleaning are very CPU-intensive. This It also coincides with that my friend said that the CPU dropped in about 1 minute.
4. Find the culprit
Now the focus is on these two threads. I saw that these two thread stacks are the same method, so here is a 187
thread for analysis. You can use !clrstack
see its managed stack.
0:187> !clrstack
OS Thread Id: 0x1ef0 (187)
Child SP IP Call Site
00000054ce631e30 00007ffc4021bde2 System.String.FillStringChecked(System.String, Int32, System.String)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
00000054ce631ed0 00007ffc40223528 System.String.Concat(System.String[])
00000054ce631f40 00007ffbe6dbdafb BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String, System.String ByRef)
00000054ce65cf40 00007ffbe6ab3187 BLL.xxx.xxx.ExpZB(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String)
From the stack, it seems to be System.String.Concat
splice the string. Good guy, I don’t know how many times have been criticized for string splicing for so many years. There are still many people stepping on the pit. For further verification, you can use !clrstack -p + !da -details xxx
Look at what this System.String[] is, simplified as follows:
0:187> !clrstack -p
OS Thread Id: 0x1ef0 (187)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
PARAMETERS:
values (<CLR reg>) = 0x000001ea69e8d2f8
totalLength = <no data>
0:187> !da -details 0x000001ea69e8d2f8
Name: System.String[]
Size: 128(0x80) bytes
Array: Rank 1, Number of elements 13, Type CLASS
Element Methodtable: 00007ffc403d6948
[0] 000001f2391a83f0
Name: System.String
MethodTable: 00007ffc403d6948
EEClass: 00007ffc3fcd50e0
Size: 445950(0x6cdfe) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: xxxxx
From the output information, we can see that String[]
, of which the largest string is 445950 bytes = 445k
, which is larger than the 85k definition of the large object, so Concat here is the crux, and the same is true for 170 threads. Next, I have to One problem to be solved is: Why is there such a large string produced, and what exactly is done in the code? ? ? To find the answer, you have to export the source code from the dump to find out.
5. View the problem code
To analyze the problem code, you can export the BLL.xxx.xxx.GetRowString method !ip2md + !savemodule
0:187> !ip2md 00007ffbe6dbdafb
MethodDesc: 00007ffbe5342118
Method Name: BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1<Int32>, System.Nullable`1<Int32>, System.String, System.Nullable`1<Single>, System.Nullable`1<Single>, System.Nullable`1<Single>, System.Nullable`1<Single>, System.String, System.String, System.String, System.String, System.String ByRef)
Class: 00007ffbe52fe328
MethodTable: 00007ffbe53421d8
mdToken: 0000000006000096
Module: 00007ffbe471a890
0:187> !savemodule 00007ffbe471a890 E:\dumps\RowString.dll
3 sections in file
section 0 - VA=2000, VASize=f7fcc, FileAddr=200, FileSize=f8000
section 1 - VA=fa000, VASize=3bc, FileAddr=f8200, FileSize=400
section 2 - VA=fc000, VASize=c, FileAddr=f8600, FileSize=200
Then ILSpy
decompile this code.
Good guy, this way of writing is really 🐂👃, countless string splicing, I feel that gen
and LOH
are too late to allocate the memory segment 😥😥😥, it really kills the GC. . .
Three: Summary
In fact, this is a textbook-style problem, and there are textbook-style solutions, and I saw that this method has more than 600 lines of code, basically it is to do string splicing, and finally I will talk about the solution.
- Refactor the method, try to use StringBuilder instead of String to minimize the number of triggers due to this GC.
The last little easter egg seems to coincide with the deep suspicion of my friends. . .
More high-quality dry goods: see my GitHub: dotnetfly
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。