One: background

1. Tell a story

At the end of last month, a friend on WeChat approached me and said that his multi-threaded program was processing RabbitMQ messages too slowly, and asked me to help analyze the cause. The screenshot is as follows:

As stated, the problem was a bit daunting: no CPU spike, no memory leak, no program hang... who knows why RabbitMQ processing was slow 🤣🤣🤣

Since there was no obvious cause, I decided to approach it from the angle of a freeze and let WinDbg do the talking.

Two: WinDbg analysis

1. Look for the cause of the freeze

There are many possible reasons for slow processing: maybe there are too few workers, or maybe the workers are slacking off; in short, the work is not saturated. Let's first check whether this group of workers is slacking, starting from the thread pool.


0:000> !tp
CPU utilization: 81%
Worker Thread: Total: 187 Running: 172 Idle: 0 MaxLimit: 32767 MinLimit: 8
Work Request in Queue: 0
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 16 CurrentLimit: 1 MaxLimit: 1000 MinLimit: 8

From the output, the CPU is currently at 81%, which is probably driven by this program. Since the CPU is this high, it could be some compute-intensive operation, it could be context switching caused by heavy lock contention, or it could be frequently triggered GCs.

2. Is it triggered by GC?

At this point in the series, I believe many friends already know how to tell from the thread stacks whether GC is the cause. Running ~*e !dumpstack over all threads shows that thread 214 triggered the GC, as shown in the following figure:

!dumpstack produces too much output, so to keep things simple, use !clrstack instead.


0:214> !clrstack 
OS Thread Id: 0x290c (214)
        Child SP               IP Call Site
000000bb2983c7d0 00007ffed8c3e335 System.Text.UTF8Encoding.GetString(Byte[], Int32, Int32)
000000bb2983c810 00007ffe7aaecee9 StackExchange.Redis.RedisValue.op_Implicit(StackExchange.Redis.RedisValue)
000000bb2983c860 00007ffe7aaecd92 xxxx.Util.Helper.RedisHelper.ConvertObj[[System.__Canon, mscorlib]](StackExchange.Redis.RedisValue)
000000bb2983c8f0 00007ffe7abc3997 xxxx.Util.Helper.RedisHelper+c__DisplayClass29_0`1[[System.__Canon, mscorlib]].b__0(StackExchange.Redis.IDatabase)
000000bb2983ca20 00007ffe7aaeaca0 xxxx.Util.Helper.RedisHelper.Do[[System.__Canon, mscorlib]](System.Func`2<StackExchange.Redis.IDatabase,System.__Canon>)
000000bb2983caa0 00007ffe7abc3762 xxxx.Util.Helper.RedisHelper.HashGet[[System.__Canon, mscorlib]](System.String, System.String)
000000bb2983cb40 00007ffe7abc2f3d xxxx.Implementation.xxx.GetCompany(System.String)
...
000000bb2983e170 00007ffe7ad2fadf xxx.xxx.MQ.xxx+c__DisplayClass21_0`1[[System.__Canon, mscorlib]].b__1()
000000bb2983e1c0 00007ffed8c11862 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].InnerInvoke()
000000bb2983e220 00007ffed8c10937 System.Threading.Tasks.Task.Execute()
000000bb2983e260 00007ffed8bc674e System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000bb2983e330 00007ffed8bc65e7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000bb2983e360 00007ffed8c10bdd System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
000000bb2983e410 00007ffed8c10303 System.Threading.Tasks.Task.ExecuteEntry(Boolean)
000000bb2983e450 00007ffed8bcfa10 System.Threading.ThreadPoolWorkQueue.Dispatch()
000000bb2983e8e8 00007ffed9d465d3 [DebuggerU2MCatchHandlerFrame: 000000bb2983e8e8] 

From this output, the program is sitting in RedisHelper.HashGet. Well now, is it reading some huge piece of data from Redis? Next, use !clrstack -a to see exactly what was read from Redis.


0:214> !clrstack -a
OS Thread Id: 0x290c (214)
        Child SP               IP Call Site
000000bb2983c860 00007ffe7aaecd92 xxx.Util.Helper.RedisHelper.ConvertObj[[System.__Canon, mscorlib]](StackExchange.Redis.RedisValue)
    PARAMETERS:
        this (0x000000bb2983c8f0) = 0x000000bb182da3d0
        value (0x000000bb2983c960) = 0x000000bb18440778
    LOCALS:
        0x000000bb2983c8c4 = 0x0000000000000000
        0x000000bb2983c8a0 = 0x0000000000000000
        0x000000bb2983c898 = 0x0000000000000000

0:214> !do 0x000000bb18440778
Name:        System.Byte[]
MethodTable: 00007ffed8db93d0
EEClass:     00007ffed87f4dc8
Size:        6679(0x1a17) bytes
Array:       Rank 1, Number of elements 6655, Type Byte (Print Array)
Content:     {"ID":104x,"Code":"130x","xxx":"1304","xxx":"8888","Name":"...............","xxx":"...............","MqConnStr"

As you can see, only about 6.6 KB was read from Redis. Such a tiny amount of data, and here I was expecting tens of MB 🤣🤣🤣. I don't believe for a second that this alone could trigger the GC.
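
For reference, based on the call sites visible on that stack (Do, HashGet, ConvertObj and the implicit RedisValue-to-string conversion), the helper presumably looks something like the following minimal sketch; the class layout and the JSON serializer are my assumptions, not the friend's actual code.

using System;
using Newtonsoft.Json;
using StackExchange.Redis;

public class RedisHelper
{
    private readonly ConnectionMultiplexer _conn;
    private readonly IDatabase _db;

    public RedisHelper(string connectionString)
    {
        _conn = ConnectionMultiplexer.Connect(connectionString);
        _db = _conn.GetDatabase();
    }

    // Reads a single hash field and deserializes it into T.
    public T HashGet<T>(string key, string field)
    {
        return Do(db => ConvertObj<T>(db.HashGet(key, field)));
    }

    // Funnels every operation through one entry point, matching the Do[T] frame on the stack.
    private T Do<T>(Func<IDatabase, T> func)
    {
        return func(_db);
    }

    // The implicit RedisValue -> string conversion here is what shows up as
    // UTF8Encoding.GetString on the thread stack; JSON deserialization is an assumption.
    private T ConvertObj<T>(RedisValue value)
    {
        return JsonConvert.DeserializeObject<T>(value);
    }
}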

3. What caused the GC to trigger?

So who is the real culprit? If you think about it carefully, a GC is nothing more than promoting surviving objects from gen0 to gen1, and then from gen1 to gen2. Since my friend's RabbitMQ processing was visibly slow, the program must have been triggering GCs over and over again.
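
As an aside, a quick way to confirm how often collections fire is to have the program itself sample GC.CollectionCount periodically. This is just a minimal, self-contained sketch of the idea (not part of the friend's code; in a real service you would run it on a background thread inside the process):

using System;
using System.Threading;

class GcWatcher
{
    static void Main()
    {
        int last0 = 0, last1 = 0, last2 = 0;
        while (true)
        {
            Thread.Sleep(5000);
            int g0 = GC.CollectionCount(0);
            int g1 = GC.CollectionCount(1);
            int g2 = GC.CollectionCount(2);
            // If gen2 keeps climbing in every 5-second window, full GCs are firing constantly.
            Console.WriteLine($"last 5s: gen0 +{g0 - last0}, gen1 +{g1 - last1}, gen2 +{g2 - last2}");
            last0 = g0; last1 = g1; last2 = g2;
        }
    }
}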

There is a breakthrough here: since multiple GCs were triggered, there must be a lot of rooted objects on gen2, so let's start there and use the !dumpgen command to take a look.


0:214> !dumpgen 2 -stat
       Count      Total Size      Type
-------------------------------------------------
      87,738     24,925,316   System.Int32[]
     509,643     56,399,716   System.String
      83,077     65,760,888   System.Reflection.Emit.__FixupData[]
   2,158,346     69,067,072   System.Reflection.Emit.GenericMethodInfo
      92,388    172,765,392   System.Object[]
     796,870    179,587,796   **** FREE ****
  14,030,441    336,730,584   System.RuntimeMethodHandle
     428,376    348,743,274   System.Byte[]

Wow, there are actually 14 million RuntimeMethodHandle instances. That is very curious, so let me pick a few addresses and see what their reference roots are.


0:214> !dumpgen 2 -type System.RuntimeMethodHandle
Object             MT                    Size   Name
-------------------------------------------------------------------
000000bac9f8fe68   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000bac9f8fe80   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000bac9f8fe98   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000bac9f8feb0   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000bac9f8fec8   00007FFED8DB9978        24   System.RuntimeMethodHandle
...
000000baca1af510   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000baca1af548   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000baca1af560   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000baca1afd90   00007FFED8DB9978        24   System.RuntimeMethodHandle
000000baca1afde0   00007FFED8DB9978        24   System.RuntimeMethodHandle

0:214> !gcroot 000000baca1afde0
Thread 187c0:
    000000baeeb5cb30 00007ffed8ba9c60 System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)
        rsi: 
            ->  000000baca4c6ef0 System.Collections.Generic.Dictionary`2[[System.Tuple`4[[System.String, mscorlib],[System.String, mscorlib],[System.Int32, mscorlib],[System.Int32, mscorlib]], mscorlib],[System.Delegate, mscorlib]]
            ->  000000badab3b8a0 System.Collections.Generic.Dictionary`2+Entry[[System.Tuple`4[[System.String, mscorlib],[System.String, mscorlib],[System.Int32, mscorlib],[System.Int32, mscorlib]], mscorlib],[System.Delegate, mscorlib]][]
            ->  000000baca1b5210 System.Func`2[[System.Data.IDataReader, System.Data],[xxx.BAR_AccountDC, xxxx]]
            ->  000000baca1ae8b0 System.Reflection.Emit.DynamicMethod+RTDynamicMethod
            ->  000000baca1ae7f8 System.Reflection.Emit.DynamicMethod
            ->  000000baca1b51b0 System.RuntimeMethodInfoStub
            ->  000000baca1b4a18 System.Reflection.Emit.DynamicResolver
            ->  000000baca1aea60 System.Reflection.Emit.DynamicScope
            ->  000000baca1aea78 System.Collections.Generic.List`1[[System.Object, mscorlib]]
            ->  000000baca1b3b20 System.Object[]
            ->  000000baca1afde0 System.RuntimeMethodHandle

The reference chain shows the root is a Dictionary, so let's dump this dictionary and see what it holds.


0:214> !do 000000baca4c6ef0
Name:        System.Collections.Generic.Dictionary`2[[System.Tuple`4[[System.String, mscorlib],[System.String, mscorlib],[System.Int32, mscorlib],[System.Int32, mscorlib]], mscorlib],[System.Delegate, mscorlib]]
MethodTable: 00007ffe7a937f18
EEClass:     00007ffed87f7cb8
Size:        80(0x50) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffed8dc3e30  400182a        8       System.Int32[]  0 instance 000000badaaa2c10 buckets
00007ffed9874d38  400182b       10 ...non, mscorlib]][]  0 instance 000000badab3b8a0 entries
00007ffed8dc3e98  400182c       38         System.Int32  1 instance            83000 count
00007ffed8dc3e98  400182d       3c         System.Int32  1 instance            83000 version
00007ffed8dc3e98  400182e       40         System.Int32  1 instance               -1 freeList
00007ffed8dc3e98  400182f       44         System.Int32  1 instance                0 freeCount
00007ffed8d9a430  4001830       18 ...Canon, mscorlib]]  0 instance 000000baca2ec958 comparer
00007ffed8d9c550  4001831       20 ...Canon, mscorlib]]  0 instance 0000000000000000 keys
00007ffed8dcef28  4001832       28 ...Canon, mscorlib]]  0 instance 0000000000000000 values
00007ffed8dc1c98  4001833       30        System.Object  0 instance 0000000000000000 _syncRoot

0:214> !objsize 000000baca4c6ef0
sizeof(000000baca4c6ef0) = 1116325152 (0x4289c520) bytes (System.Collections.Generic.Dictionary`2[[System.Tuple`4[[System.String, mscorlib],[System.String, mscorlib],[System.Int32, mscorlib],[System.Int32, mscorlib]], mscorlib],[System.Delegate, mscorlib]])

As you can see, the dictionary currently has count = 83,000 entries and a total size of roughly 1.1 GB. Next, use !mdt to inspect the contents of the dictionary.

It turns out the items in the dictionary look roughly like key = select * from xxx where AccountNo= xxxx and value = a Func delegate.
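
Incidentally, the !gcroot chain above (DynamicMethod -> DynamicResolver -> DynamicScope -> RuntimeMethodHandle) suggests each cached Func is backed by an emitted DynamicMethod, which is where the flood of RuntimeMethodHandle objects comes from. A hypothetical sketch of how such a materializer is typically built (not the framework's actual code):

using System;
using System.Data;
using System.Reflection.Emit;

static class MaterializerFactory
{
    // Emits a tiny method that reads column 0 of the IDataReader as a string.
    // Each DynamicMethod emitted like this keeps its own DynamicResolver, DynamicScope
    // and token table alive, exactly as seen in the !gcroot chain above.
    public static Func<IDataReader, string> BuildReader()
    {
        var dm = new DynamicMethod("ReadFirstColumn", typeof(string), new[] { typeof(IDataReader) });
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);   // push the IDataReader
        il.Emit(OpCodes.Ldc_I4_0);  // column index 0
        il.Emit(OpCodes.Callvirt, typeof(IDataRecord).GetMethod("GetString", new[] { typeof(int) }));
        il.Emit(OpCodes.Ret);
        return (Func<IDataReader, string>)dm.CreateDelegate(typeof(Func<IDataReader, string>));
    }
}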

4. View source code

From the earlier !gcroot output we already know that the root lives on thread 187c0, so let's switch to that thread and use its stack to find the source code.


0:089> ~~[187c0]s
ntdll!NtWaitForSingleObject+0xa:
00007ffe`e9ea06fa c3              ret
0:089> !clrstack 
OS Thread Id: 0x187c0 (89)
        Child SP               IP Call Site
000000baeeb5caa0 00007ffed9718dfe System.Tuple`4[[System.__Canon, mscorlib],[System.__Canon, mscorlib],[System.Int32, mscorlib],[System.Int32, mscorlib]].System.Collections.IStructuralEquatable.GetHashCode(System.Collections.IEqualityComparer)
000000baeeb5cb00 00007ffed8ba90a7 System.Collections.Generic.ObjectEqualityComparer`1[[System.__Canon, mscorlib]].GetHashCode(System.__Canon)
000000baeeb5cb30 00007ffed8ba9c60 System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)
000000baeeb5cba0 00007ffed8bc53a4 System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].TryGetValue(System.__Canon, System.__Canon ByRef)
000000baeeb5cbe0 00007ffe7abd8069 xxx.Internal.Cache`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Get(System.__Canon, System.Func`1<System.__Canon>)
000000baeeb5cc50 00007ffe7abd6d52 xxx.Database+d__49`1[[System.__Canon, mscorlib]].MoveNext()
000000baeeb5ccd0 00007ffed733768e System.Linq.Enumerable.FirstOrDefault[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
000000baeeb5cd40 00007ffe7af5489e xxx.xxx.ParsePapPayDebitRecord(xxx, xxx.BAR_AccountDC ByRef, System.String ByRef, xxx ByRef, Boolean, xxx, Boolean)
...

From the thread stack, the program is performing a DB query. The source code looks roughly like this:


public static InvalidTxnCode ParsePapPayDebitRecord(xxx)
{
    // the SQL text is built by concatenation, so every AccountNO produces a distinct SQL string
    string sql = "select * FROM  [dbo].[xxx]  where AccountNO = '" + transRecord.AccountNO + "'";
    IEnumerable<BAR_AccountDC> enumerable3 = new YiBll<BAR_AccountDC>(transRecord.AccountNO).QuerySql(sql);
}

The problem lies in this SQL concatenation. To speed up the reader-to-entity mapping, the underlying framework caches each SQL text together with its corresponding Func<IDataReader, T> materializer. Because every concatenated SQL string is different, the underlying dict keeps growing, up to the current 1.1 GB.
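
Based on the types visible in the dump (an xxx.Internal.Cache`2 keyed by a Tuple`4 containing the SQL string, mapping to a Delegate), the framework's cache presumably works roughly like the sketch below; the names are inferred from the dump, not the vendor's actual source.

using System;
using System.Collections.Generic;

internal class Cache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _store = new Dictionary<TKey, TValue>();
    private readonly object _gate = new object();

    public TValue Get(TKey key, Func<TValue> factory)
    {
        lock (_gate)
        {
            TValue value;
            if (!_store.TryGetValue(key, out value))
            {
                value = factory();     // e.g. emit a DynamicMethod that maps IDataReader -> entity
                _store[key] = value;   // never evicted: every distinct SQL text adds one more entry
            }
            return value;
        }
    }
}

With concatenated SQL the key is different for every AccountNO, so the cache, and every DynamicMethod it pins, only ever grows; with a constant SQL text there would be a single entry per statement.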

Three: Summary

In general, this incident was caused by the ever-growing dict constantly squeezing the small object heap while the program's threads kept allocating objects at full speed, which led to frequent GCs on the small object heap and, in turn, to the slow RabbitMQ processing.

The fix is to parameterize the SQL.


 sql = "select * FROM  [dbo].[xxx]  where AccountNO = '" + transRecord.AccountNO + "'";

Change to


 sql = "select * FROM  [dbo].[xxx]  where AccountNO = @AccountNO";

Finally, this dump is somewhat misleading, because the thread that triggered the GC is not the culprit; it is merely the straw that broke the camel's back.

