One: background

1. Tell a story

A few days ago, a friend added wx and complained that his program is always inexplicably high cpu during the peak period. How to analyze it for help?



After communicating with this friend, it is said that this problem has plagued them for several years, and I also invited a Microsoft engineer to come and solve it. . .

Regarding the problem of high CPU burst, old readers should know that I have written several articles, which are basically divided into two categories:

  • GC trigger
  • Large number of locks

A small part of it is a variety of inexplicable problems, I can't list them all 🤣🤣🤣, since my friend has found me, I have to find a way to solve it for him, not to talk more, go to windbg.

Two: windbg analysis

1. View the synchronization block table

Encountered this kind of problem, the first check synchronization block table is my inertial thinking, the command is very simple !syncblk .


0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
-----------------------------
Total           20779
CCW             16
RCW             21
ComClassFactory 0
Free            16490

I went and threw an empty one, and there was nothing in the sync block table. . . Since it has nothing to do with locks, let's take a look at threads. After all, threads are supported by the CPU.

2. View threads

To view the managed threads in the system, you can use the !t command. There are more threads, which is a bit simplified.


0:000> !t
ThreadCount:      135
UnstartedThread:  0
BackgroundThread: 132
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                                                                                        Lock  
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
  34    1 25d4 000001ea28702130    28220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  74    2 3ed0 000001ea286fa940    2b220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     MTA (Finalizer) 
  76    3 4a70 000001f4447d7810  102a220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     MTA (Threadpool Worker) 
  77    4 326c 000001f4447dbe60    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  78    6 2dac 000001f4447d9750  1020220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn (Threadpool Worker) 
  79    7 1468 000001f444a2d6f0    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 1     Ukn (GC) 
  80    8   f0 000001f444a2cf20    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  81    9 3118 000001f444a2f630    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 

Let's sell it first, can any friends see what's so strange about these threads? ? ? Yes, the last column of 79 Ukn (GC) tag. I think you must be curious. What does this mean? Due to some changes in the underlying GC mode, it tells you to a certain extent that your program triggered the GC. For further verification, you can use !t -special see what type of thread 79 belongs to and more detailed information.


0:000> !t -special
ThreadCount:      135
UnstartedThread:  0
BackgroundThread: 132
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                                                                                        Lock  
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
  34    1 25d4 000001ea28702130    28220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 0     Ukn 
  79    7 1468 000001f444a2d6f0    21220 Preemptive  0000000000000000:0000000000000000 000001ea286ee080 1     Ukn (GC) 

OSID Special thread type
41 38d8 DbgHelper 
42 1c88 GC 
74 3ed0 Finalizer 
75 402c ProfilingAPIAttach 
76 4a70 Timer
79 1468 GC SuspendEE 

From the last line of output, you can see that 79 is the GC thread, and there is a strange SuspendEE tag behind it. You are curious again. What does this mean?

SuspendEE = Suspend CLR Execution Engine (EE)

In other words, thread No. 79 CLR execution engine. The purpose is very simple. It is to facilitate the other 31 GC threads manage the current managed heap, but this brother machine is really 🐂👃, 32core, also I don't know which hospital is so powerful, but to add, use !cpuid verify.


0:000> !cpuid
CP  F/M/S  Manufacturer     MHz
 0  6,62,4  <unavailable>   2600
 1  6,62,4  <unavailable>   2600
 2  6,62,4  <unavailable>   2600
 3  6,62,4  <unavailable>   2600
 xxx
31  6,62,4  <unavailable>   2600

Now that the GC trigger is predicted, the managed and unmanaged stacks of all threads can be called out in the next step.

3. View each thread stack

To view the managed and unmanaged stacks of each thread is very simple, just use !eestack , and then you can search for the keyword WaitUntilGCComplete to determine how many threads are waiting for the GC to complete.

It can be seen from the figure that 40 threads are currently blocked, which is great. The problem is getting clearer and clearer. Next, we will analyze which thread did what it shouldn’t do, which caused the GC to be triggered. The same can be done. Search try_allocate_more_space to determine which threads are allocating space.


When I go, it is obvious that the current 170 and 187 threads are allocating large objects. gc_heap::allocate_large_object triggers the GC. The large object heap itself is a daunting thing, and its recycling and cleaning are very CPU-intensive. This It also coincides with that my friend said that the CPU dropped in about 1 minute.

4. Find the culprit

Now the focus is on these two threads. I saw that these two thread stacks are the same method, so here is a 187 thread for analysis. You can use !clrstack see its managed stack.


0:187> !clrstack 
OS Thread Id: 0x1ef0 (187)
        Child SP               IP Call Site
00000054ce631e30 00007ffc4021bde2 System.String.FillStringChecked(System.String, Int32, System.String)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
00000054ce631ed0 00007ffc40223528 System.String.Concat(System.String[])
00000054ce631f40 00007ffbe6dbdafb BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String, System.String ByRef)
00000054ce65cf40 00007ffbe6ab3187 BLL.xxx.xxx.ExpZB(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1, System.Nullable`1, System.String, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.Nullable`1, System.String, System.String, System.String, System.String)

From the stack, it seems to be System.String.Concat splice the string. Good guy, I don’t know how many times have been criticized for string splicing for so many years. There are still many people stepping on the pit. For further verification, you can use !clrstack -p + !da -details xxx Look at what this System.String[] is, simplified as follows:


0:187> !clrstack -p
OS Thread Id: 0x1ef0 (187)
00000054ce631e70 00007ffc402222a8 System.String.ConcatArray(System.String[], Int32)
    PARAMETERS:
        values (<CLR reg>) = 0x000001ea69e8d2f8
        totalLength = <no data>
0:187> !da -details 0x000001ea69e8d2f8
Name:        System.String[]
Size:        128(0x80) bytes
Array:       Rank 1, Number of elements 13, Type CLASS
Element Methodtable: 00007ffc403d6948
[0] 000001f2391a83f0
    Name:        System.String
    MethodTable: 00007ffc403d6948
    EEClass:     00007ffc3fcd50e0
    Size:        445950(0x6cdfe) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    String:      xxxxx

From the output information, we can see that String[] , of which the largest string is 445950 bytes = 445k , which is larger than the 85k definition of the large object, so Concat here is the crux, and the same is true for 170 threads. Next, I have to One problem to be solved is: Why is there such a large string produced, and what exactly is done in the code? ? ? To find the answer, you have to export the source code from the dump to find out.

5. View the problem code

To analyze the problem code, you can export the BLL.xxx.xxx.GetRowString method !ip2md + !savemodule


0:187> !ip2md 00007ffbe6dbdafb
MethodDesc:   00007ffbe5342118
Method Name:  BLL.xxx.xxx.GetRowString(System.String, Boolean, Boolean, System.String, System.String, System.String, System.String, System.String, System.String, Int32, System.String, System.String, System.String, System.String, System.String, System.String, Int32, Int32, System.String, System.Nullable`1<Int32>, System.Nullable`1<Int32>, System.String, System.Nullable`1<Single>, System.Nullable`1<Single>, System.Nullable`1<Single>, System.Nullable`1<Single>, System.String, System.String, System.String, System.String, System.String ByRef)
Class:        00007ffbe52fe328
MethodTable:  00007ffbe53421d8
mdToken:      0000000006000096
Module:       00007ffbe471a890
0:187> !savemodule  00007ffbe471a890 E:\dumps\RowString.dll
3 sections in file
section 0 - VA=2000, VASize=f7fcc, FileAddr=200, FileSize=f8000
section 1 - VA=fa000, VASize=3bc, FileAddr=f8200, FileSize=400
section 2 - VA=fc000, VASize=c, FileAddr=f8600, FileSize=200

Then ILSpy decompile this code.

Good guy, this way of writing is really 🐂👃, countless string splicing, I feel that gen and LOH are too late to allocate the memory segment 😥😥😥, it really kills the GC. . .

Three: Summary

In fact, this is a textbook-style problem, and there are textbook-style solutions, and I saw that this method has more than 600 lines of code, basically it is to do string splicing, and finally I will talk about the solution.

  • Refactor the method, try to use StringBuilder instead of String to minimize the number of triggers due to this GC.

The last little easter egg seems to coincide with the deep suspicion of my friends. . .

More high-quality dry goods: see my GitHub: dotnetfly


一线码农
369 声望1.6k 粉丝