
One: Background

1. The story

The day before yesterday, a friend added me on WeChat asking for help: his program's CPU had spiked. He opened the conversation with a red envelope, which startled me! 🤣🤣🤣

Since that day was the Little New Year in southern China and I was busy with preparations back in my hometown, it wasn't convenient for me to look into it, so I didn't help him right away. My friend tracked down the problem himself the next morning and reported that it was caused by while(true), which is quite interesting: among the many high-CPU cases I have analyzed, I had never run into a while(true) and always regretted it. I'm really lucky to catch one right before the New Year, haha 😄😄😄.

Next, let's use windbg to analyze it together.

Two: Windbg analysis

1. Check the CPU usage

Friends who have been following me know that a single !tp command is all it takes.


0:022> !tp
CPU utilization: 95 Unknown format characterUnknown format control characterWorker Thread: Total: 11 Running: 11 Idle: 0 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 3
--------------------------------------
Completion Port Thread:Total: 4 Free: 4 MaxFree: 8 CurrentLimit: 4 MaxLimit: 1000 MinLimit: 4

There is a garbled "Unknown format character..." fragment in the output above, which doesn't look great; it actually stands for the % sign. I have no idea why !tp produces this encoding glitch. Next, use !eeversion to check the runtime environment.


0:022> !eeversion
4.700.21.56803 (3.x runtime) free
4,700,21,56803 @Commit: 28bb6f994c28bc91f09bc0ddb5dcb51d0f066806
Workstation mode
In plan phase of garbage collection
SOS Version: 5.0.4.36902 retail build

From this basic information, the program runs on .NET Core 3.x, and it is obvious that the GC is currently in the plan phase. So what is the plan phase?

2. What is the plan phase

In short, during the plan phase the GC decides whether the managed heap only needs a lightweight sweep, which simply marks dead objects as free space, or a heavyweight compaction. If it compacts, it also has to relocate objects on the managed heap, which consumes a lot of CPU time. The next question to explore is: what caused this GC to be triggered?
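To make the sweep-versus-compact distinction concrete, here is a small hedged C# sketch (not taken from the friend's program) that requests each kind of blocking gen2 collection explicitly; GC.Collect(generation, mode, blocking, compacting) and GCSettings.LargeObjectHeapCompactionMode are standard .NET APIs.

using System;
using System.Runtime;

class GcPlanDemo
{
    static void Main()
    {
        // Blocking gen2 collection; compacting: false leaves the
        // sweep-or-compact decision to the GC's plan phase.
        GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: false);

        // Blocking gen2 collection with compaction forced: live objects are
        // relocated to squeeze out fragmentation, which costs far more CPU.
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: true);

        Console.WriteLine($"Gen2 collections so far: {GC.CollectionCount(2)}");
    }
}

In production the GC makes this sweep-or-compact decision on its own during the plan phase, based on fragmentation and survival statistics.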

3. GC trigger reasons

A GC is usually triggered by a user thread allocating memory. In the execution flow of a GC, one of the steps is to freeze the CLR execution engine, the so-called SuspendEE, which you can see in coreclr's gc.cpp.

Why bring up SuspendEE at all? Because the thread that performs SuspendEE can be located with !t -special, which pinpoints the trigger more precisely.
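As a hedged aside (illustrative code, not part of the dump), the "allocation triggers GC" point is easy to reproduce in plain C#: a thread that does nothing but allocate drives the collection counters up, and each blocking collection briefly suspends the execution engine.

using System;

class AllocationTriggersGc
{
    static void Main()
    {
        int gen0Before = GC.CollectionCount(0);

        // Pure allocation on a user thread: once the gen0 budget is exhausted
        // the runtime starts a GC, and that GC calls SuspendEE to freeze all
        // managed threads before it marks, plans, and sweeps or compacts.
        for (int i = 0; i < 1_000_000; i++)
        {
            var buffer = new byte[1024];
            GC.KeepAlive(buffer);
        }

        Console.WriteLine($"Gen0 collections triggered: {GC.CollectionCount(0) - gen0Before}");
    }
}

Back in the dump, !t -special lists the special threads and flags the one performing SuspendEE: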


0:072> !t -special
ThreadCount:      54
UnstartedThread:  0
BackgroundThread: 40
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no

          OSID Special thread type
        1 6328 DbgHelper 
        2 35c0 Finalizer 
        4 5aac Timer 
        5 38b0 ThreadpoolWorker 
       17 3530 ThreadpoolWorker 
       18 4484 ThreadpoolWorker 
       19 1e4c ThreadpoolWorker 
       21 6380 ThreadpoolWorker 
       44 5bc4 SuspendEE 
       52  8ac ThreadpoolWorker 
       54 4164 ThreadpoolWorker 
       56 61c8 ThreadpoolWorker 
       58 1fa4 ThreadpoolWorker 
       60 2788 ThreadpoolWorker 
       69 48f4 IOCompletion 
       70 5708 IOCompletion 
       71 3b58 ThreadpoolWorker 
       72 17a0 GC 
       73 2f00 Gate 
       74 35e8 IOCompletion 
       75 5730 IOCompletion 

As you can see, thread 44 is the one marked SuspendEE, i.e. the thread that triggered this GC. Now things are getting clearer. What is thread 44 doing? Switch to thread 44 and run !clrstack.


0:044> !clrstack 
OS Thread Id: 0x5bc4 (44)
        Child SP               IP Call Site
000000A2B0C3E4C8 00007ffd471ead3a [HelperMethodFrame: 000000a2b0c3e4c8] 
000000A2B0C3E5E0 00007ffce8b4e506 System.Collections.Generic.List`1[[System.__Canon, System.Private.CoreLib]].System.Collections.Generic.IEnumerable.GetEnumerator() [/_/src/System.Private.CoreLib/shared/System/Collections/Generic/List.cs @ 585]
000000A2B0C3E630 00007ffce85e7a10 xxx.Program.DeletexxxExipredDate()
000000A2B0C3E780 00007ffd46bc1f0b System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
000000A2B0C3E7B0 00007ffd46bb90b6 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 172]
000000A2B0C3E830 00007ffd46ba535b System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
000000A2B0C3EA48 00007ffd47236c93 [GCFrame: 000000a2b0c3ea48] 
000000A2B0C3ECB0 00007ffd47236c93 [DebuggerU2MCatchHandlerFrame: 000000a2b0c3ecb0] 

From this output, the problem lies in the DeletexxxExipredDate() method, so let's dig into its source code.


        private static void DeletexxxExipredDate()
        {
            while (true)
            {
                foreach (string key in xxx.xxxSpeedLimit.Keys)
                {
                    try
                    {
                        string[] array = xxx.xxxSpeedLimit[key].Split('$');
                        if (array.Length > 1)
                        {
                            DateTime dateTime = Convert.ToDateTime(array[1]);
                            if ((DateTime.Now - dateTime).TotalSeconds > 21600.0 && xxx.xxxSpeedLimit.ContainsKey(key))
                            {
                                xxx.xxxSpeedLimit.TryRemove(key, out var _);
                            }
                        }
                    }
                    catch (Exception ex)
                    {
                        LogHelper.WriteAppExceptionLog("删除数据出现异常:" + ex.Message, ex);
                    }
                    Thread.Sleep(20000);
                }
            }
        }

If you have stepped into enough pits, I believe you can see the problem in this code at a glance. Yes: when the xxxSpeedLimit dictionary is empty, the foreach body, including the Thread.Sleep(20000), never runs, so the while (true) degenerates into a pure spin loop.
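To see why this pegs a core, here is a minimal, self-contained repro under the same assumptions (the names are hypothetical stand-ins for the anonymized ones above): with zero keys, the foreach body is never entered, so neither is the Sleep, and the thread spins flat out.

using System.Collections.Concurrent;
using System.Threading;

class SpinRepro
{
    static readonly ConcurrentDictionary<string, string> SpeedLimit =
        new ConcurrentDictionary<string, string>();

    static void Main()
    {
        var worker = new Thread(() =>
        {
            // Mirrors the faulty pattern: the Sleep sits INSIDE the foreach,
            // so with an empty dictionary nothing ever sleeps and this thread
            // keeps one CPU core at ~100%.
            while (true)
            {
                foreach (string key in SpeedLimit.Keys)
                {
                    Thread.Sleep(20000); // never reached while the dictionary is empty
                }
            }
        }) { IsBackground = true };
        worker.Start();

        Thread.Sleep(Timeout.Infinite);
    }
}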

To verify this, use !dso to find the memory address of the dictionary on thread 44's stack, and then inspect it with !wconcurrentdict.


0:044> !dso 
OS Thread Id: 0x5bc4 (44)
RSP/REG          Object           Name
...
000000A2B0C3E708 000001ba8007f618 System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, System.Private.CoreLib],[System.String, System.Private.CoreLib]]
000000A2B0C3E760 000001ba88501cd0 System.Collections.Generic.List`1+Enumerator[[System.String, System.Private.CoreLib]]
000000A2B0C3E768 000001ba80050ec0 System.Threading.ContextCallback
000000A2B0C3E7F8 000001ba80a1a818 System.Threading.Thread
000000A2B0C3EA28 000001ba80a1a898 System.Threading.ThreadStart
0:044> !wconcurrentdict 000001ba8007f618
Empty ConcurrentDictionary

As you can see, the dictionary is indeed empty at the moment 😂😂😂

Three: Summary

The root cause of this incident: the developer lacked some defensive-programming experience. The business logic never handles the empty-dictionary case inside the while (true), and the Thread.Sleep(20000) was also placed in the wrong spot, inside the foreach 🤣🤣🤣
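A hedged sketch of one possible fix, keeping the anonymized names from the decompiled code above and changing only the placement of the sleep: move Thread.Sleep(20000) out of the foreach so every pass through the while (true) pauses, even when the dictionary is empty.

private static void DeletexxxExipredDate()
{
    while (true)
    {
        foreach (string key in xxx.xxxSpeedLimit.Keys)
        {
            try
            {
                string[] array = xxx.xxxSpeedLimit[key].Split('$');
                if (array.Length > 1)
                {
                    DateTime dateTime = Convert.ToDateTime(array[1]);
                    if ((DateTime.Now - dateTime).TotalSeconds > 21600.0 && xxx.xxxSpeedLimit.ContainsKey(key))
                    {
                        xxx.xxxSpeedLimit.TryRemove(key, out var _);
                    }
                }
            }
            catch (Exception ex)
            {
                LogHelper.WriteAppExceptionLog("删除数据出现异常:" + ex.Message, ex);
            }
        }

        // Sleep once per full sweep, OUTSIDE the foreach, so the loop still
        // pauses when xxxSpeedLimit has no keys at all.
        Thread.Sleep(20000);
    }
}

A timer-based cleanup would arguably be cleaner still, but this minimal change is enough to stop the spin.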

All in all, I'm very grateful to this friend for providing the dump. It finally let me see a real while(true) case with my own eyes!


一线码农