One: Background

1. Tell a story

Last month, a friend reached out to me on WeChat to tell me that his program had hung, and asked how to analyze it further. The screenshot is as follows:

From the chat, this friend clearly has some analysis skills, but probably not much hands-on experience with this kind of problem. After a quick look at the dump I found it quite interesting: high CPU, high memory, and a hang, all in one. A program can hardly be more miserable than this...

Since he came to me, I had to find a way to ease his pain. Let's analyze the three causes one by one, from easy to hard.

Two: Analysis of the three highs

1. Cause of the hang

Based on experience from 40+ dump analyses, most hangs happen because something blocks the worker threads, which makes subsequent requests pile up in the thread pool. To verify this, you can inspect the thread pool queue with the !tp command.


0:000> !tp
CPU utilization: 81%
Worker Thread: Total: 65 Running: 65 Idle: 0 MaxLimit: 32767 MinLimit: 64
Work Request in Queue: 2831
    Unknown Function: 00007ffffcba1750  Context: 0000022ab04d4a58
    Unknown Function: 00007ffffcba1750  Context: 0000022ab03e4ce8
    ...
    Unknown Function: 00007ffffcba1750  Context: 0000022ab825ec88
    Unknown Function: 00007ffffcba1750  Context: 0000022ab825a458
    Unknown Function: 00007ffffcba1750  Context: 0000022ab8266500
    Unknown Function: 00007ffffcba1750  Context: 0000022ab8268198
    Unknown Function: 00007ffffcba1750  Context: 0000022ab826cb00
    Unknown Function: 00007ffffcba1750  Context: 0000022ab8281578
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread:Total: 2 Free: 2 MaxFree: 128 CurrentLimit: 2 MaxLimit: 32767 MinLimit: 64

It is clear that 2831 work items are queued in the thread pool, so newly arriving requests cannot be processed, which is exactly the hang the friend saw. Next, let's see what these threads are busy with and why they are so inefficient. You can dump all managed thread stacks with ~*e !clrstack; the screenshot is as follows:

Scanning the stacks, I found a lot of System.Net.HttpWebRequest.GetResponse() frames. Experienced friends will recognize the classic culprit: synchronous HTTP requests that are too slow for the program to keep up with. Some friends may be curious whether the requested URL can be dug out. Yes, just use the !dso command:


000000D2FBD3B840 0000023269e85698 System.Text.UTF8Encoding
000000D2FBD3B850 00000236e9dd2cb8 System.String    application/x-www-form-urlencoded
000000D2FBD3B870 0000023269e85698 System.Text.UTF8Encoding
000000D2FBD3B9A8 00000231aa221a38 System.String    uSyncAppxxx
000000D2FBD3B9B8 00000231aa201a70 System.String    VToken={0}&Vorigin={1}&QueryJson={2}
000000D2FBD3B9C0 00000231aa202200 System.String    http://xxx.xxx.com/API/xxx/BusinessCardFolder/Connector/xxx/GetPageList

Wow, the URL is even an external network address 🐂👃. The synchronous call is slow to begin with, and an external address makes it even worse... no wonder the program hangs 😄
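Putting the !dso strings together, the calling code presumably looks something like the sketch below. This is a hypothetical reconstruction, not the friend's actual source: the method name GetPageList and the token / origin / queryJson parameters are made up, and only the URL, content type and body template come from the dump.

// Hypothetical reconstruction from the !dso strings (requires System.Net, System.IO, System.Text).
static string GetPageList(string token, string origin, string queryJson)
{
    var request = (HttpWebRequest)WebRequest.Create(
        "http://xxx.xxx.com/API/xxx/BusinessCardFolder/Connector/xxx/GetPageList");
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";

    byte[] body = Encoding.UTF8.GetBytes(
        string.Format("VToken={0}&Vorigin={1}&QueryJson={2}", token, origin, queryJson));
    using (var stream = request.GetRequestStream())
        stream.Write(body, 0, body.Length);

    // The blocking call that shows up on dozens of thread stacks: the worker
    // thread sits here until the external service answers.
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
        return reader.ReadToEnd();
}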

2. Why the CPU is so high

The !tp output above also shows CPU utilization: 81%. Why is it so high? In my experience the usual suspects are locks, triggered GCs, and infinite loops, so let's rule them out one by one.

  1. Is it a lock?

You can use the !syncblk command to look at the synchronization block table.


0:000> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
  212 0000023ef3cdd028            3         1 0000023ef40efa40 8d70 209   000002396ad93788 System.Object
-----------------------------
Total           297
CCW             3
RCW             4
ComClassFactory 0
Free            139

From the output, the locks don't look like the problem. Next, use the !mlocks command to check other kinds of locks and see whether anything new turns up.


0:000> !mlocks
Examining SyncBlocks...
Scanning for ReaderWriterLock(Slim) instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...

ClrThread  DbgThread  OsThread    LockType    Lock              LockLevel
------------------------------------------------------------------------------
...
0x49       209        0x8d70      thinlock    000002396ad9ba90  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9baa8  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bac0  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bad8  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9baf0  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bb08  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bb20  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bb38  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bb50  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bb68  (recursion:0)
0x49       209        0x8d70      thinlock    000002396ad9bb80  (recursion:0)
0xe        152        0x8e68      thinlock    0000023669f7e428  (recursion:0)
0x41       208        0x8fb4      thinlock    00000235e9f6e8d0  (recursion:0)
0x17       161        0x9044      thinlock    00000238ea94db68  (recursion:0)
0x16       159        0x911c      thinlock    000002392a03ed40  (recursion:0)
0x47       206        0x9264      thinlock    000002322af08e28  (recursion:0)

Wow, there are a huge number of thinlocks, and the thread with DbgThread=209 alone holds 1000+ of them. The screenshot is as follows:

Some friends may not know what a thinlock is. Simply put, it is a lightweight spin-style lock that burns CPU under contention, similar to SpinLock. Next, pick one of these lock objects and trace its roots with !gcroot.


0:000> !gcroot 000002396ad9ba48
Thread 2580:
    000000d2fb0bef10 00007ff806945ab3 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
        rbp-80: 000000d2fb0bef50
            ->  0000023769dd4008 System.Threading.Thread
            ->  0000023269e776b8 System.Runtime.Remoting.Contexts.Context
            ->  0000023269e773b8 System.AppDomain
            ...
            ->  0000023269ee1e00 System.Threading.TimerCallback
            ->  0000023269ed2d30 System.Web.Caching.CacheExpires
            ->  0000023269ed2c78 System.Web.Caching.CacheSingle
            ->  0000023269ed2ce0 System.Collections.Hashtable
            ->  000002372ab91d90 System.Collections.Hashtable+bucket[]
            ->  00000239ab32fd10 System.Web.Caching.CacheEntry
            ->  000002396ad93748 System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[xxx].Application.Entity.BaseManage.UserRelationEntity, xxx.Application.Entity]]
            ->  00000239ab2a8248 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.String, mscorlib],[xxx.Application.Entity.BaseManage.UserRelationEntity, xxx.Application.Entity]]
            ->  000002396ad96b80 System.Object[]
            ->  000002396ad9ba48 System.Object

From this output, the thinlock belongs to a ConcurrentDictionary. Next, let's dump that dictionary with the !mdt command.


0:148> !mdt 000002396ad93748
000002396ad93748 (System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[xxx.Application.Entity.BaseManage.UserRelationEntity, xxx.Application.Entity]])
    m_tables:00000239ab2a8248 (System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.String, mscorlib],[xxx.Application.Entity.BaseManage.UserRelationEntity, xxx.Application.Entity]])
    m_comparer:NULL (System.Collections.Generic.IEqualityComparer`1[[System.__Canon, mscorlib]])
    m_growLockArray:true (System.Boolean)
    m_keyRehashCount:0x0 (System.Int32)
    m_budget:0x213 (System.Int32)
    m_serializationArray:NULL (System.Collections.Generic.KeyValuePair`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]][])
    m_serializationConcurrencyLevel:0x0 (System.Int32)
    m_serializationCapacity:0x0 (System.Int32)
0:148> !mdt 00000239ab2a8248
00000239ab2a8248 (System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.String, mscorlib],[xxx.Application.Entity.BaseManage.UserRelationEntity, xxx.Application.Entity]])
    m_buckets:0000023e9a2477e8 (System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.String, mscorlib],[xxx.Application.Entity.BaseManage.UserRelationEntity, xxx.Application.Entity]][], Elements: 543997)
    m_locks:000002396ad96b80 (System.Object[], Elements: 1024)
    m_countPerLock:00000239aa8472c8 (System.Int32[], Elements: 1024)
    m_comparer:0000023269e782b8 (System.Collections.Generic.GenericEqualityComparer`1[[System.String, mscorlib]])

From the above output, this dictionary holds roughly 540,000 records. Why is it so big? And it has 1024 internal locks, which is rather interesting. Let's take a look at the source code.
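Roughly, the internals look like the simplified sketch below; the field names come straight from the !mdt output above (this is a trimmed sketch, not the verbatim reference source):

// Simplified sketch of ConcurrentDictionary<TKey, TValue> internals (not verbatim);
// the field names match the !mdt output above.
public class ConcurrentDictionary<TKey, TValue>
{
    private class Tables
    {
        internal readonly Node[] m_buckets;            // hash buckets: 543,997 elements in this dump
        internal readonly object[] m_locks;            // the lock array: 1024 plain objects in this dump
        internal volatile int[] m_countPerLock;        // how many entries each lock guards
        internal readonly IEqualityComparer<TKey> m_comparer;
    }

    private volatile Tables m_tables;
    private readonly bool m_growLockArray;             // true here, so the lock array grows with the table
    private int m_budget;
    // Node is the internal linked-list entry type holding key, value and hash code.
}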

From the source, there is indeed a locks[] array inside. So what operation ends up traversing the whole locks[] array? To find the answer, search all thread stacks for the ConcurrentDictionary keyword.


OS Thread Id: 0x2844 (163)
        Child SP               IP Call Site
000000d2fb83abb8 00007ff80a229df8 [GCFrame: 000000d2fb83abb8] 
000000d2fb83aca0 00007ff80a229df8 [GCFrame: 000000d2fb83aca0] 
000000d2fb83acd8 00007ff80a229df8 [HelperMethodFrame: 000000d2fb83acd8] System.Threading.Monitor.Enter(System.Object)
000000d2fb83add0 00007ff80693ea56 System.Collections.Concurrent.ConcurrentDictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].AcquireLocks(Int32, Int32, Int32 ByRef)
000000d2fb83ae20 00007ff806918ef2 System.Collections.Concurrent.ConcurrentDictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].AcquireAllLocks(Int32 ByRef)
000000d2fb83ae60 00007ff8069153f9 System.Collections.Concurrent.ConcurrentDictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].GetValues()
000000d2fb83aee0 00007ff7ae17d8ec xxx.Util.DataHelper.ToEnumerable[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Collections.Concurrent.ConcurrentDictionary`2<System.__Canon,System.__Canon>)
000000d2fb83af20 00007ff7ad125241 xxx.Application.Code.CacheHelper.GetCaches[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.String)
000000d2fb83afa0 00007ff7ad12513b xxx.Application.Code.CacheHelper.GetCaches[[System.__Canon, mscorlib]](System.String)
000000d2fb83b000 00007ff7b10199e5 xxx.Application.Cache.CacheHelper.GetUserRelations()

Scanning the stacks, nearly 20 threads show the call stack above. Clearly the program got stuck acquiring the ConcurrentDictionary locks while calling the GetCaches method. Next, let's look at the source of xxx.Application.Cache.CacheHelper.GetUserRelations():


public static IEnumerable<UserRelationEntity> GetUserRelations()
{
    return xxx.Application.Code.CacheHelper.GetCaches<UserRelationEntity>("xxx.BaseManage-UserRelation");
}

protected static IEnumerable<T> GetCaches<T>(string cacheKeyName)
{
    return GetCaches<T, string>(cacheKeyName);
}

private static IEnumerable<T> GetCaches<T, TKey>(string cacheKeyName)
{
    return GetConcurrentDictionaryCache<T, TKey>(cacheKeyName)?.ToEnumerable();
}

public static IEnumerable<T> ToEnumerable<TKey, T>(this ConcurrentDictionary<TKey, T> dics)
{
    return dics.Values; // this is the line that hurts -- see below
}

From this logic, every cache read ultimately calls dics.Values. I was curious what the framework does underneath, so let's look at the relevant reference source:
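It boils down to the path in the simplified sketch below (trimmed from the reference source, and consistent with the AcquireAllLocks / AcquireLocks / Monitor.Enter frames on the thread stacks above):

// Simplified sketch of the path behind ConcurrentDictionary<TKey,TValue>.Values (not verbatim).
public ICollection<TValue> Values
{
    get { return GetValues(); }    // GetValues() is quoted in full in the next subsection
}

private void AcquireAllLocks(ref int locksAcquired)
{
    // take every lock in m_tables.m_locks -- 1024 of them in this dump
    AcquireLocks(0, m_tables.m_locks.Length, ref locksAcquired);
}

private void AcquireLocks(int fromInclusive, int toExclusive, ref int locksAcquired)
{
    object[] locks = m_tables.m_locks;
    for (int i = fromInclusive; i < toExclusive; i++)
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(locks[i], ref lockTaken);    // up to 1024 acquisitions per Values call
        }
        finally
        {
            if (lockTaken) locksAcquired++;
        }
    }
}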

Have you noticed? A single dict.Values call has to execute Monitor.Enter(locks[i], ref lockTaken) 1024 times, i.e. take 1024 locks that spin under contention. That is a key factor behind the high CPU.

3. Why memory spikes

The last question: why does memory explode? Careful friends may already have spotted the answer in the GetValues method we just walked past. Here is the code again:


private ReadOnlyCollection<TValue> GetValues()
{
    int locksAcquired = 0;
    try
    {
        AcquireAllLocks(ref locksAcquired); // first take every lock in m_locks (1024 in this dump)
        int countInternal = GetCountInternal();
        if (countInternal < 0)
        {
            throw new OutOfMemoryException();
        }
        List<TValue> list = new List<TValue>(countInternal); // a brand-new list sized for every value (~540k here)
        for (int i = 0; i < m_tables.m_buckets.Length; i++)
        {
            for (Node node = m_tables.m_buckets[i]; node != null; node = node.m_next)
            {
                list.Add(node.m_value);
            }
        }
        return new ReadOnlyCollection<TValue>(list);
    }
    finally
    {
        ReleaseLocks(0, locksAcquired);
    }
}

Every call to GetValues builds a brand-new List with roughly 540,000 elements. Note that this list is a fresh copy, not a reference to the data inside the ConcurrentDictionary. That is the pitfall: every single call allocates a huge List whose backing array lands on the LOH (about 544k references × 8 bytes ≈ 4.3 MB, far above the 85,000-byte LOH threshold). How could memory not explode?
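If you want to convince yourself that every access really builds a fresh copy, a minimal console snippet like the one below (illustrative, not the project's code) is enough:

// ConcurrentDictionary.Values materializes a new snapshot collection on every access.
using System;
using System.Collections.Concurrent;

class ValuesCopyDemo
{
    static void Main()
    {
        var dict = new ConcurrentDictionary<int, string>();
        for (int i = 0; i < 600000; i++)
            dict[i] = "value" + i;

        var v1 = dict.Values;   // builds one ReadOnlyCollection copy of ~600k references
        var v2 = dict.Values;   // builds another one

        // Prints False: two different objects, each backed by its own multi-MB array.
        Console.WriteLine(ReferenceEquals(v1, v2));
    }
}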

Three: Summary

In general, the three highs come down to the following two factors.

  1. Synchronous HttpWebRequest calls against an external network URL caused the program to hang.

Optimization: switch to asynchronous requests, for example along the lines of the sketch below.
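A rough sketch of the asynchronous alternative, assuming an HttpClient rewrite of the hypothetical GetPageList call from earlier (names and parameters are illustrative):

// Illustrative async version: a shared HttpClient plus async/await releases the
// worker thread while waiting on the network (requires System.Net.Http).
private static readonly HttpClient Client = new HttpClient
{
    Timeout = TimeSpan.FromSeconds(10)    // also give the external call a sane timeout
};

static async Task<string> GetPageListAsync(string token, string origin, string queryJson)
{
    var form = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        ["VToken"] = token,
        ["Vorigin"] = origin,
        ["QueryJson"] = queryJson
    });

    using (var response = await Client.PostAsync(
        "http://xxx.xxx.com/API/xxx/BusinessCardFolder/Connector/xxx/GetPageList", form))
    {
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}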

  2. The huge pit in ConcurrentDictionary.Values blew up both memory and CPU.

I suspect many friends never imagined that ConcurrentDictionary.Values hides such a big pit. It made me wonder: how does the thread-unsafe Dictionary.Values handle this?

public ValueCollection Values
{
    [__DynamicallyInvokable]
    get
    {
        if (values == null)
        {
            values = new ValueCollection(this);
        }
        return values;
    }
}

public sealed class ValueCollection
{
    public ValueCollection(Dictionary<TKey, TValue> dictionary)
    {
        if (dictionary == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.dictionary);
        }
        this.dictionary = dictionary;
    }
}

It is clear that Dictionary.Values does not build a new list; it merely wraps the dictionary. So the optimization measures are as follows:

  1. Avoid ConcurrentDictionary.Values; use lock + Dictionary instead.
  2. If you must keep ConcurrentDictionary, push the query condition down to the dictionary instead of pulling everything out through Values and filtering afterwards, which avoids the pointless memory churn. A sketch of both options follows.
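A minimal sketch of what the two options might look like; the UserRelationEntity type and the cache idea come from the dump, everything else (field names, method signatures) is illustrative:

// Option 1: plain Dictionary guarded by an ordinary lock. Dictionary.Values is
// only a lightweight wrapper, and filtering happens under one cheap lock.
private static readonly object _sync = new object();
private static readonly Dictionary<string, UserRelationEntity> _userRelations
    = new Dictionary<string, UserRelationEntity>();

public static List<UserRelationEntity> GetUserRelations(Func<UserRelationEntity, bool> predicate)
{
    var result = new List<UserRelationEntity>();
    lock (_sync)
    {
        foreach (var entity in _userRelations.Values)
            if (predicate(entity))
                result.Add(entity);
    }
    return result;
}

// Option 2: keep ConcurrentDictionary, but query by key instead of dumping Values.
private static readonly ConcurrentDictionary<string, UserRelationEntity> _cache
    = new ConcurrentDictionary<string, UserRelationEntity>();

public static UserRelationEntity GetUserRelation(string userId)
{
    // TryGetValue touches a single bucket and takes no lock at all, instead of
    // acquiring all 1024 locks and copying ~540k entries.
    UserRelationEntity entity;
    _cache.TryGetValue(userId, out entity);
    return entity;
}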

Finally, a little easter egg: after I sent the analysis results to this friend, he wanted me to come over and analyze it in person, a first for me... totally caught off guard 🤣🤣🤣

