One: background

1. Tell a story

A few days ago, a friend of wx asked for help. Its program CPU was often full, and I didn't find the reason. I hope to help.


I have received several dumps with high CPU bursts in a row these days. I am tired of seeing it. I hope that there will be a few other dumps later. From the communication point of view, the performance of this friend is very miserable, which may actually be worse. Now that I have found me, I will try my best to help him find the real murderer behind the scenes, without much to say, go to windbg.

Two: windbg analysis

1. View managed threads

Because threads are supported by cpu, it is also a good idea to start with threads. To view all managed threads of the program, you can use the !t command.


0:000> !t
ThreadCount:      38
UnstartedThread:  0
BackgroundThread: 34
PendingThread:    0
DeadThread:       3
Hosted Runtime:   no
                                                                                                        Lock  
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1 105c 000000000067f600    2a020 Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA 
   2    2 13d0 00000000192c4f40    2b220 Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Finalizer) 
...
XXXX   15    0 000000001bc64970  8039820 Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Threadpool Completion Port) 
  24   23 1380 000000001bc660e0  8029220 Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Threadpool Completion Port) 
XXXX   53    0 000000001bc63200  8039820 Preemptive  0000000000000000:0000000000000000 0000000000671ec0 0     MTA (Threadpool Completion Port) 
XXXX   27 10dc 000000001bd0dbf0  1029220 Preemptive  0000000002CB40F8:0000000002CB4108 0000000000671ec0 1     MTA (GC) (Threadpool Worker) 

From the perspective of the hexagram: the program has 38 threads, 3 dead, I went, there was a bright spot, the last line appeared a familiar MTA (GC) , what does this mean? This means that the current thread triggered the GC, but the strange thing is that the thread that triggered the GC is dead. You must ask how to see it. Look at the XXXX . Forget it, the dead horse is a living doctor, and the thread is called out. For all managed and unmanaged stacks, look for the words WaitUntilGCComplete and try_allocate_more_space

2. View the thread stack

To view the thread stacks of all threads, you can use the ~*e !dumpstack command.

  • Search for the WaitUntilGCComplete keyword.

From the picture, hey, there are really 18 threads waiting, and I also saw System.String.Concat . Is it the same as the his cpu burst height that I posted in the last article? 😘😘😘

  • Search for the try_allocate_more_space keyword.

I went, there is no try_allocate_more_space keyword, which is not a routine with his. It is possible that this dump is not a particularly good time, and it is possible that the program is in some weird behavior.

It seems that this path has come to an end, but again, the thread is raised by the cpu, then bite the bullet and see what each thread is doing. In order to make the result clearer, change the command ~*e !clrstack .

It can be seen from the figure that there are currently 25 threads stuck at FindEntry(System.__Canon) , and from the call stack, it looks like a dll encapsulated by aliyun. Why are so many threads stuck here? Does this give people a big question mark? Next, I will export this code of Alibaba Cloud to see what happened.

3. View the problem code

To export the problem code, use the classic !ip2md + !savemodule combination command.


0:000> !ip2md 000007fe9a1a0641
MethodDesc:   000007fe9a5678e0
Method Name:  Aliyun.Acs.Core.Utils.CacheTimeHelper.AddLastClearTimePerProduct(System.String, System.String, System.DateTime)
Class:        000007fe9a595a08
MethodTable:  000007fe9a567900
mdToken:      00000000060000a6
Module:       000007fe9a561f58
IsJitted:     yes
CodeAddr:     000007fe9a1a0610
Transparency: Critical
0:000> !savemodule 000007fe9a561f58 E:\dumps\AddLastClearTimePerProduct.dll
3 sections in file
section 0 - VA=2000, VASize=14148, FileAddr=200, FileSize=14200
section 1 - VA=18000, VASize=3fc, FileAddr=14400, FileSize=400
section 2 - VA=1a000, VASize=c, FileAddr=14800, FileSize=200

Then use ILSpy to decompile this dll, because it is Alibaba Cloud's code, I can release it with confidence.


// Aliyun.Acs.Core.Utils.CacheTimeHelper
using System;
using System.Collections.Generic;

public class CacheTimeHelper
{
    private static Dictionary<string, DateTime> lastClearTimePerProduct = new Dictionary<string, DateTime>();

    private const int ENDPOINT_CACHE_TIME = 3600;

    public static bool CheckCacheIsExpire(string product, string regionId)
    {
        string key = product + "_" + regionId;
        DateTime dateTime;
        if (lastClearTimePerProduct.ContainsKey(key))
        {
            dateTime = lastClearTimePerProduct[key];
        }
        else
        {
            dateTime = DateTime.Now;
            lastClearTimePerProduct.Add(key, dateTime);
        }
        if (3600.0 < (DateTime.Now - dateTime).TotalSeconds)
        {
            return true;
        }
        return false;
    }

    public static void AddLastClearTimePerProduct(string product, string regionId, DateTime lastClearTime)
    {
        string key = product + "_" + regionId;
        if (lastClearTimePerProduct.ContainsKey(key))
        {
            lastClearTimePerProduct.Remove(key);
        }
        lastClearTimePerProduct.Add(key, lastClearTime);
    }
}

It can be seen that the above code can not if (lastClearTimePerProduct.ContainsKey(key)) . If you chase down, you can refer to the source code of Dictionary.


public class Dictionary<TKey, TValue> 
{
    // System.Collections.Generic.Dictionary<TKey,TValue>
    public bool ContainsKey(TKey key)
    {
        return FindEntry(key) >= 0;
    }
}

CacheTimeHelper to this point, do you see what is wrong with this 060ab01a2d7d25? Yes, the non-thread-safe Dictionary<string, DateTime> is used in a multi-threaded environment, which is very problematic.

4. What are the problems with Dictionary?

Dictionary in a multithreaded environment will definitely cause data confusion. This is undoubtedly, and there will be some exceptions iterations, but if this misuse will cause the CPU to explode, I haven’t seen it in my field of vision. been. . . To be sure, search for on bing.

Hey, there really is such a case: High CPU in .NET app using a static , and then cut a picture.

From the description of the article, it is exactly the same 😁😁😁, which also concludes that operating the Dictionary in a multi-threaded environment may cause FindEntry(key) , and then 25 infinite loops will lift the cpu together to supplement the current Full CPU utilization. . .


0:000> !tp
CPU utilization: 100%
Worker Thread: Total: 27 Running: 27 Idle: 0 MaxLimit: 32767 MinLimit: 4
Work Request in Queue: 0
--------------------------------------
Number of Timers: 1
--------------------------------------
Completion Port Thread:Total: 4 Free: 3 MaxFree: 8 CurrentLimit: 3 MaxLimit: 1000 MinLimit: 4

Three: Summary

Since it is a bug in the sdk of Cloud 160ab01a2d7eae, this problem is troublesome. . . You can't change it, and then tell your friend to submit a work order to solve it.

I thought the matter was over like this, but I think about it, other Alibaba Cloud SDKs used a few years ago also encountered a problem similar to the high CPU. Later, I got it through upgrading the SDK. This time I also gambled and watched. Here is the assembly information.


[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: TargetFramework(".NETStandard,Version=v2.0", FrameworkDisplayName = "")]
[assembly: AssemblyCompany("Alibaba Cloud")]
[assembly: AssemblyConfiguration("Release")]
[assembly: AssemblyCopyright("©2009-2018 Alibaba Cloud")]
[assembly: AssemblyDescription("Alibaba Cloud SDK for C#")]
[assembly: AssemblyFileVersion("1.1.12.0")]
[assembly: AssemblyInformationalVersion("1.1.12")]
[assembly: AssemblyProduct("aliyun-net-sdk-core")]
[assembly: AssemblyTitle("aliyun-net-sdk-core")]
[assembly: AssemblyVersion("1.1.12.0")]

You can see that your friend is currently using the 1.1.12.0 version, then upgrade aliyun-net-sdk-core to the latest version and see if this CacheTimeHelper has been repaired?


As expected, it has been fixed in the new version, so experience tells me that with Alibaba Cloud's SDK, remember to upgrade frequently, otherwise all kinds of big pits are waiting for you. . . 😔😔😔

More high-quality dry goods: see my GitHub: dotnetfly


一线码农
369 声望1.6k 粉丝