One: background

1. Tell a story

A friend approached me last month and said that his program had a memory leak. I don’t know how to analyze it further. The screenshot is as follows:

My friend has already said this very concisely and concisely, so let's talk on windbg.

Two: Windbg analysis

1. What kind of leak is it?

According to my friend's description, the memory exploded after the program was running for a period of time, and there should be no casualties. Otherwise, I won't chat with my wx. Here you can use .time see how long the current process has been running.


0:000> .time
Debug session time: Thu Oct 21 14:54:39.000 2021 (UTC + 8:00)
System Uptime: 6 days 4:37:27.851
Process Uptime: 0 days 0:40:14.000
  Kernel time: 0 days 0:01:55.000
  User time: 0 days 0:07:33.000

It can be seen that this dump was captured after the program ran for 40 minutes. Next, let's compare the memory of the process and the proportion of the gc heap to see which one is leaking.


0:000> !address -summary

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                327     7dfc`c665a000 ( 125.987 TB)           98.43%
MEM_RESERVE                             481      201`e91a2000 (   2.007 TB)  99.74%    1.57%
MEM_COMMIT                             2307        1`507f4000 (   5.258 GB)   0.26%    0.00%

0:000> !eeheap -gc
Number of GC Heaps: 2
------------------------------

GC Allocated Heap Size:    Size: 0x139923528 (5260850472) bytes.
GC Committed Heap Size:    Size: 0x13bf23000 (5300695040) bytes.

It is easy to know from the hexagram that this is a complete managed heap memory leak.

2. What is taking up such a large amount of memory?

Knowing that it is hosting layer, I feel happy all at once, and then use !dumpheap -stat see if there are any big objects to dig.


0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ffdeb1fc400  5362921    128710104 xxxBLLs.xxx.BundleBiz+<>c__DisplayClass20_0
00007ffdeaeff140  5362929    171613728 System.Collections.Generic.List`1[[xxx.xxx, xxx]]
00007ffdeaeff640  5362957    171615272 xxx.BLLs.Plan.Dto.xxx[]
00007ffde8171e18 16146362    841456072 System.String
00007ffdeb210098  5362921   1415811144 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
00007ffdea9ca260  5362921   2359685240 xxx.Bundle             

From the output point of view, the memory is mainly xxx.Bundle and AsyncTaskMethodBuilder , and the number is as high as 536w . Here is a very interesting place. If you understand asynchrony, I believe you can see AsyncTaskMethodBuilder + VoidTaskResult is doing. , according to experience, this friend should have mistakenly entered 161a6de3ad1278 asynchronous infinite recursion, then how to dig it? Then look down.

3. Find the problem code

See xxx.BundleBiz+<DistributionBundle>d__20 above? This asynchronous operation is involved in classes and methods, followed by ILSpy reflected BundleBiz anonymous class under <DistributionBundle>d__20 , as shown below:

Although the source code is found, the code is an asynchronous state machine decompiled by ILSpy. The next question is how to find the await and async codes in reverse according to the state machine code? There is a used by , which can be used here.

Double-click used by to see the real calling code, which is simplified as follows:


public async Task DistributionBundle(List<Bundle> list, List<xxx> bwdList, xxx item, List<xxx> sumDetails, List<xxx> details, BundleParameter bundleParameter, IEnumerable<dynamic> labels)
{
    int num = 0;
    foreach (xxx detail in sumDetails)
    {
        IEnumerable<xxx> woDetails = details.Where((xxx w) => w.Size == detail.Size && w.Color == detail.Color);
        foreach (xxx item2 in woDetails)
        {
            xxx
        }
        woDetails = woDetails.OrderBy((xxx s) => s.Seq).ToList();
        num++;
        xxx
        Bundle bundle = new Bundle();
        Bundle bundle2 = bundle;
        bundle2.BundleId = await _repo.CreateBundleId();
        
        foreach (xxx item3 in woDetails)
        {
            item3.TaskQty = item3.WoQty + Math.Ceiling(item3.WoQty * item3.OverCutRate);
            decimal value = default(decimal);
        }

        await DistributionBundle(list, bwdList, item, sumDetails, details, bundleParameter, labels);
    }
}

Looking at the above code carefully, I went, await DistributionBundle(list, bwdList, item, sumDetails, details, bundleParameter, labels); called itself again, and it seemed that under certain conditions, it fell into a dead recursion. . .

Some friends may ask, apart from experience, can it be analyzed from the dump? Of course, you can 500w+ to see its !gcroot .


0:000> !DumpHeap /d -mt 00007ffdeb210098
         Address               MT     Size
000001a297913a68 00007ffdeb210098      264     
000001a297913b70 00007ffdeb210098      264  

0:000> !gcroot 000001a297913a68
Thread 5ac:
    000000470B1EE4E0 00007FFE45103552 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
        rbp+10: 000000470b1ee550
            ->  000001A297A25D88 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
            ->  000001A29796D8C0 Microsoft.Extensions.Hosting.Internal.Host
            ...
            ->  000001A298213248 System.Data.SqlClient.TdsParserStateObjectNative
            ->  000001A32E6AB700 System.Threading.Tasks.TaskFactory`1+<>c__DisplayClass38_0`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient],[System.Data.CommandBehavior, System.Data.Common]]
            ->  000001A32E6AB728 System.Threading.Tasks.Task`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]]
            ->  000001A32E6ABB18 System.Threading.Tasks.StandardTaskContinuation
            ->  000001A32E6ABA80 System.Threading.Tasks.ContinuationTaskFromResultTask`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]]
            ->  000001A32E6AB6C0 System.Action`1[[System.Threading.Tasks.Task`1[[System.Data.SqlClient.SqlDataReader, System.Data.SqlClient]], System.Private.CoreLib]]
            ->  000001A32E6AB428 System.Data.SqlClient.SqlCommand+<>c__DisplayClass130_0
            ...
            ->  000001A32E6ABC08 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.String, System.Private.CoreLib],[Dapper.SqlMapper+<QueryRowAsync>d__34`1[[System.String, System.Private.CoreLib]], Dapper]]
            ->  000001A32E6ABD20 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.String, System.Private.CoreLib],[xxx.DALs.xxx.BundleRepo+<CreateBundleId>d__12, xxx]]
            ->  000001A32E6ABD98 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A32E6A6BD8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433250520 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A32E69E0F8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433247D28 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433246330 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A32E69A568 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ->  000001A433245408 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[xxx.BundleBiz+<DistributionBundle>d__20, xxx]]
            ...

Judging from the call stack, the code seems to fall into an endless loop in the process of reading records from the database.

4. Why there is no stack overflow

When I see the infinite loop, I believe many friends will definitely ask why there is no stack overflow. After all, the default thread stack space is only 1M . From !gcroot , these references are all hung on the 5ac thread, which is the output below. The main thread, and the main thread stack is also very clean.


0:000> !t
ThreadCount:      30
UnstartedThread:  0
BackgroundThread: 24
PendingThread:    0
DeadThread:       5
Hosted Runtime:   no
                                                                                                            Lock  
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1      5ac 000001A29752CDF0  202a020 Preemptive  0000000000000000:0000000000000000 000001a29754c570 0     MTA 
   4    2     1e64 000001A29752A490    2b220 Preemptive  0000000000000000:0000000000000000 000001a29754c570 0     MTA (Finalizer) 
...


0:000> !clrstack 
OS Thread Id: 0x5ac (0)
        Child SP               IP Call Site
000000470B1EE1D0 00007ffe5eb30544 [GCFrame: 000000470b1ee1d0] 
000000470B1EE318 00007ffe5eb30544 [HelperMethodFrame_1OBJ: 000000470b1ee318] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
000000470B1EE440 00007ffe45103c25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
000000470B1EE4E0 00007ffe45103552 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
000000470B1EE550 00007ffe451032cf System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2861]
000000470B1EE5D0 00007ffe45121b04 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task) [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/TaskAwaiter.cs @ 143]
000000470B1EE600 00007ffe4510482d System.Runtime.CompilerServices.TaskAwaiter.GetResult() [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/TaskAwaiter.cs @ 106]
000000470B1EE630 00007ffe4de36595 Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(Microsoft.Extensions.Hosting.IHost) [/_/src/Hosting/Abstractions/src/HostingAbstractionsHostExtensions.cs @ 49]
000000470B1EE660 00007ffde80f3b4b xxx.Program.Main(System.String[])
000000470B1EE8B8 00007ffe47c06c93 [GCFrame: 000000470b1ee8b8] 
000000470B1EEE50 00007ffe47c06c93 [GCFrame: 000000470b1eee50] 

If you know a little bit asynchronous gameplay of 161a6de3ad15e1, you should know that there is a IO completion port, which can realize handle and ThreadPool . Infinite recursion is just entering the waiting queue of IO completion port. In theory, it has nothing to do with stack space, and there will be no stack overflow.

Three: Summary

This memory leak accident is mainly due to the programmer's carelessness. Perhaps the long-term 996 has been in a trance. With this information, I believe it will be very simple to correct.


一线码农
366 声望1.6k 粉丝