One: background
1. Tell a story
A friend of wx found me last month and said that his program had a memory leak. How to solve it? As shown below:
As can be seen from the screenshots, this friend is still somewhat familiar with the operation of windbg, and may lack some practical experience, so after using a few commands, I don’t know how to troubleshoot.
Now that I have found me, I will continue to analyze and find the culprit on his dump based on my personal experience. Not much gossip, talk to windbg.
Two: Windbg analysis
1. Is there really a memory leak?
Friends who follow this series should know that I have used the !address -summary
and !eeheap -gc
countless times to determine whether the current memory leak belongs to the managed layer or the unmanaged layer.
0:000> !address -summary
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 358 7dfc`67f60000 ( 125.986 TB) 98.43%
<unknown> 1087 203`88b6e000 ( 2.014 TB) 99.99% 1.57%
Image 1532 0`09f11000 ( 159.066 MB) 0.01% 0.00%
Heap 249 0`03453000 ( 52.324 MB) 0.00% 0.00%
Stack 66 0`01fc0000 ( 31.750 MB) 0.00% 0.00%
Other 10 0`001d1000 ( 1.816 MB) 0.00% 0.00%
TEB 22 0`0002c000 ( 176.000 kB) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_MAPPED 183 200`00d06000 ( 2.000 TB) 99.30% 1.56%
MEM_PRIVATE 1252 3`8d479000 ( 14.207 GB) 0.69% 0.01%
MEM_IMAGE 1532 0`09f11000 ( 159.066 MB) 0.01% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 358 7dfc`67f60000 ( 125.986 TB) 98.43%
MEM_RESERVE 749 200`28a9b000 ( 2.001 TB) 99.33% 1.56%
MEM_COMMIT 2218 3`6f5f5000 ( 13.740 GB) 0.67% 0.01%
0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00000026DA8DA928
generation 1 starts at 0x00000026DA7FC348
generation 2 starts at 0x00000024C4691000
ephemeral segment allocation context: none
segment begin allocated size
00000024C4690000 00000024C4691000 00000024D468FF28 0xfffef28(268431144)
00000024EECF0000 00000024EECF1000 00000024FECF0000 0xffff000(268431360)
000000248D6F0000 000000248D6F1000 000000249D6EFEF8 0xfffeef8(268431096)
...
00000026D66D0000 00000026D66D1000 00000026DBA3CA30 0x536ba30(87472688)
Large object heap starts at 0x00000024D4691000
segment begin allocated size
00000024D4690000 00000024D4691000 00000024DC67C318 0x7feb318(134132504)
00000024E60F0000 00000024E60F1000 00000024EE0637C8 0x7f727c8(133638088)
0000002482140000 0000002482141000 000000248A08F338 0x7f4e338(133489464)
00000024A6770000 00000024A6771000 00000024AE76F6C0 0x7ffe6c0(134211264)
...
000000278E6D0000 000000278E6D1000 000000279635F2D0 0x7c8e2d0(130605776)
00000029233E0000 00000029233E1000 000000292AF672F8 0x7b862f8(129524472)
000000292B3E0000 000000292B3E1000 0000002931A5ED60 0x667dd60(107470176)
000000299B3E0000 000000299B3E1000 00000029A20095B0 0x6c285b0(113411504)
000000281E6D0000 000000281E6D1000 0000002825CD3F58 0x7602f58(123744088)
00000028266D0000 00000028266D1000 000000282D5CAD50 0x6ef9d50(116366672)
000000282E6D0000 000000282E6D1000 0000002833CA0880 0x55cf880(89979008)
00000029A33E0000 00000029A33E1000 00000029A684D300 0x346c300(54969088)
Total Size: Size: 0x353f96d88 (14293757320) bytes.
------------------------------
GC Heap Size: Size: 0x353f96d88 (14293757320) bytes.
From the output, the current process occupies MEM_COMMIT=13.7G
and the managed heap memory occupies 14293757320 = 13.3G
. It is obvious that this is a simple mode of managed memory leak. According to experience, there may be any large objects on the managed heap. Here, use the
!dumpheap -stat
command.
0:000> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
00007ff9ed6ea268 3956842 94964208 System.Collections.Generic.Dictionary`2+KeyCollection[[System.String, System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue, Serilog]]
00007ff9ed5e6d28 3842435 166405016 Serilog.Parsing.MessageTemplateToken[]
00007ff9ed5e5e28 3842434 184436832 Serilog.Events.MessageTemplate
00007ff9ecccf090 4011012 203304420 System.Int32[]
00007ff9ed647078 3956849 253238336 Serilog.Events.LogEvent
00007ff9ed6e7b48 3956849 284893128 System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue, Serilog]]
00007ff9ed5e74e8 9259598 296307136 Serilog.Parsing.TextToken
00007ff9ed6471b0 12551808 301243392 Serilog.Events.ScalarValue
00007ff9ed6e8308 3956849 729078048 System.Collections.Generic.Dictionary`2+Entry[[System.String, System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue, Serilog]][]
00007ff9eccb1e18 16546412 3987811940 System.String
00000024c3b8faf0 82904 7382993568 Free
Let me go, the largest object in the managed heap is actually Free
, which probably occupies 7.3G
. This is 🐂👃, don’t play cards according to the routine. For a better understanding, let’s briefly explain the free
object.
2. Simple explanation free
In fact, in short, a free object is an object that is marked as a free block that has been reclaimed by gc but has not been submitted to the operating system for release. How to prove it? You can find a free block on the managed heap first.
0:000> !dumpheap -type Free
Address MT Size
00000024c4691000 00000024c3b8faf0 24 Free
00000024c46a2448 00000024c3b8faf0 24 Free
00000024c46b26d8 00000024c3b8faf0 40 Free
00000024c47e4418 00000024c3b8faf0 40 Free
00000024c4925680 00000024c3b8faf0 40 Free
00000024c49284a8 00000024c3b8faf0 64 Free
00000024c4947a90 00000024c3b8faf0 192 Free
00000024c4951f70 00000024c3b8faf0 48 Free
000000249d6ea3a8 00000024c3b8faf0 640 Free
- Use the !do command to see if it is marked as a free block
0:000> !do 000000249d6ea3a8
Free Object
Size: 640(0x280) bytes
- Use dc to look at the content of this free block and see if there are traces of gc recycling.
0:000> dc 000000249d6ea3a8 000000249d6ea3a8+0x280
00000024`9d6ea3a8 c3b8faf0 00000024 00000268 00000000 ....$...h.......
00000024`9d6ea3b8 9d6ea6d0 00000024 00000000 00000000 ..n.$...........
00000024`9d6ea3c8 ed3ae2b8 00007ff9 9d6ea3a8 00000024 ..:.......n.$...
00000024`9d6ea3d8 00000000 00000000 edcc9118 00007ff9 ................
00000024`9d6ea3e8 00000000 00000000 00000000 00000000 ................
00000024`9d6ea3f8 00000000 00000000 00000000 00000000 ................
00000024`9d6ea408 eeb07a50 00007ff9 9d6ea3c8 00000024 Pz........n.$...
00000024`9d6ea418 00000000 00000000 ef292ee8 00007ff9 ..........).....
00000024`9d6ea428 9d6ea408 00000024 00000000 00000000 ..n.$...........
...
00000024`9d6ea4a8 eeb0a158 00007ff9 9d6ea420 00000024 X....... .n.$...
00000024`9d6ea4b8 00000000 00000000 00000000 00000000 ................
00000024`9d6ea4c8 ef293818 00007ff9 9d6ea4a8 00000024 .8).......n.$...
00000024`9d6ea4d8 00000000 00000000 ee8357e0 00007ff9 .........W......
...
00000024`9d6ea508 eed37b40 00007ff9 00000000 00000000 @{..............
00000024`9d6ea518 00000000 00000000 00000000 00000000 ................
00000024`9d6ea528 c4699b48 00000024 00000000 00000000 H.i.$...........
00000024`9d6ea538 00000000 07000440 00000001 00000000 ....@...........
00000024`9d6ea548 00000000 00000000 00000000 00000000 ................
00000024`9d6ea558 00000000 00000000 ef2af6e0 00007ff9 ..........*.....
00000024`9d6ea568 00000000 00000000 00000000 00000000 ................
00000024`9d6ea578 00000000 00000000 c4699b48 00000024 ........H.i.$...
00000024`9d6ea588 00000000 00000000 00000000 07000400 ................
...
00000024`9d6ea628 ef2afd08 ..*.
As you can see, this free block had some residual content characters before it was alive. Okay, after we have a basic understanding of the free block, let's continue to explore.
3. What is preventing the merging of free blocks?
According to normal logic, most of the free blocks will merge memory after gc is reclaimed. Those emptied segments will be released by the operating system, but this dump does not, which means there is something preventing the merging of free blocks. What the hell is it? Experienced friends will say that you can observe the handle table of gc, the command is !gchandles -stat
.
0:000> !gchandles -stat
Statistics:
MT Count TotalSize Class Name
...
00007ff9ed15c0f0 1008 72576 System.Reflection.Emit.DynamicResolver
00007ff9ecbf6618 38 409344 System.Object[]
Total 1784 objects
Handles:
Strong Handles: 233
Pinned Handles: 16
Async Pinned Handles: 18
Ref Count Handles: 1
Weak Long Handles: 1327
Weak Short Handles: 144
Dependent Handles: 45
From the output, there is nothing suspicious here, so what should I do? If you have a lot of practical experience, there are still some experiences worth sharing. For example, observing the layout characteristics of free on the heap, there are often major discoveries.
4. View the layout characteristics of the free block
In order to simplify the output result, I limited the scope to a certain segment on the heap, such as here: 00000029233E0000 000000292AF672F8
, so the command is !dumpheap 00000029233E0000 000000292AF672F8
0:000> !dumpheap 00000029233E0000 000000292AF672F8
Address MT Size
00000029233e1000 00000024c3b8faf0 8291896 Free
0000002923bc9638 00007ff9eccb1e18 108448
0000002923be3dd8 00000024c3b8faf0 29931248 Free
000000292586f4c8 00007ff9eccb1e18 301328
00000029258b8dd8 00000024c3b8faf0 41384784 Free
0000002928030928 00007ff9eccb1e18 301328
000000292807a238 00000024c3b8faf0 2542664 Free
00000029282e6e80 00007ff9eccb1e18 108448
0000002928301620 00000024c3b8faf0 29915032 Free
0000002929f88db8 00007ff9eccb1e18 301328
0000002929fd26c8 00000024c3b8faf0 2746688 Free
000000292a271008 00007ff9eccb1e18 291304
000000292a2b81f0 00000024c3b8faf0 1019600 Free
000000292a3b10c0 00007ff9eccb1e18 108448
000000292a3cb860 00000024c3b8faf0 10601048 Free
000000292ade7ab8 00007ff9eccb1e18 301328
000000292ae313c8 00000024c3b8faf0 280808 Free
000000292ae75cb0 00007ff9eccb1e18 280854
000000292aeba5c8 00000024c3b8faf0 416584 Free
000000292af20110 00007ff9eccb1e18 291304
Statistics:
MT Count TotalSize Class Name
00007ff9eccb1e18 10 2394118 System.String
00000024c3b8faf0 10 127130352 Free
Total 20 objects
It was really shocked at first sight, free and object alternated. This is the real reason why free blocks cannot be merged. To be honest, this kind of textbook-style memory fragmentation dump is really unattainable, so let’s draw a few. An object object between frees, and see what is being referenced that causes the gc to not be recycled.
5. Find the reference chain of object
If you want to see who is referring to an object, you can use the !gcroot
command. Here I will take a look at two.
0:000> !gcroot 0000002923bc9638
Thread 1878:
00000024C39BE4B0 00007FFA4C0B3522 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
rbp+10: 00000024c39be520
-> 00000024C48AD6E0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
-> 00000024C48AD200 Microsoft.Extensions.Hosting.Internal.Host
-> 00000024C48AC538 Microsoft.Extensions.Logging.Logger`1[[Microsoft.Extensions.Hosting.Internal.Host, Microsoft.Extensions.Hosting]]
-> 00000024C48AC658 Microsoft.Extensions.Logging.Logger
-> 00000024C48AC680 Microsoft.Extensions.Logging.LoggerInformation[]
-> 00000024C48AC6E0 Serilog.Extensions.Logging.SerilogLogger
-> 00000024C48AC808 Serilog.Core.Logger
-> 00000024C48AC760 Serilog.Core.Logger
-> 00000024C47AD468 Serilog.Core.Logger
-> 00000024C47ABF08 Serilog.Core.Sinks.SafeAggregateSink
-> 00000024C47ABF20 Serilog.Core.ILogEventSink[]
-> 00000024C479C548 Serilog.Sinks.Grafana.Loki.LokiSink
-> 00000024C479C778 Serilog.Sinks.Grafana.Loki.Infrastructure.BoundedQueue`1[[Serilog.Events.LogEvent, Serilog]]
-> 00000024C479C7B8 System.Collections.Generic.Queue`1[[Serilog.Events.LogEvent, Serilog]]
-> 00000026E8C1A800 Serilog.Events.LogEvent[]
-> 00000026148D3308 Serilog.Events.LogEvent
-> 00000026148D4EF0 Serilog.Events.MessageTemplate
-> 0000002923BC9638 System.String
Found 1 unique roots (run '!gcroot -all' to see all roots).
0:000> !gcroot 000000292586f4c8
Thread 1878:
00000024C39BE4B0 00007FFA4C0B3522 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
rbp+10: 00000024c39be520
-> 00000024C48AD6E0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
-> 00000024C48AD200 Microsoft.Extensions.Hosting.Internal.Host
-> 00000024C48AC538 Microsoft.Extensions.Logging.Logger`1[[Microsoft.Extensions.Hosting.Internal.Host, Microsoft.Extensions.Hosting]]
-> 00000024C48AC658 Microsoft.Extensions.Logging.Logger
-> 00000024C48AC680 Microsoft.Extensions.Logging.LoggerInformation[]
-> 00000024C48AC6E0 Serilog.Extensions.Logging.SerilogLogger
-> 00000024C48AC808 Serilog.Core.Logger
-> 00000024C48AC760 Serilog.Core.Logger
-> 00000024C47AD468 Serilog.Core.Logger
-> 00000024C47ABF08 Serilog.Core.Sinks.SafeAggregateSink
-> 00000024C47ABF20 Serilog.Core.ILogEventSink[]
-> 00000024C479C548 Serilog.Sinks.Grafana.Loki.LokiSink
-> 00000024C479C778 Serilog.Sinks.Grafana.Loki.Infrastructure.BoundedQueue`1[[Serilog.Events.LogEvent, Serilog]]
-> 00000024C479C7B8 System.Collections.Generic.Queue`1[[Serilog.Events.LogEvent, Serilog]]
-> 00000026E8C1A800 Serilog.Events.LogEvent[]
-> 0000002614BB7AC8 Serilog.Events.LogEvent
-> 0000002616D3CC40 Serilog.Events.MessageTemplate
-> 000000292586F4C8 System.String
Found 1 unique roots (run '!gcroot -all' to see all roots).
From the reference chain, these objects are all sent to Grafana.Loki
Serilog
log component. Through tracking the reference chain objects, I found that System.Collections.Generic.Queue
has major problems. The screenshot is as follows:
Yes, this queue actually has a backlog of 395w. What is the backlog? You can use !wdo
see the string content.
It seems that this is a failed retry. After analyzing this, the idea is probably clear, that is, what is Grafana.Loki
or Serilog
component, which causes the log to not be sent to Loki or the sending speed is too slow, and then the continuous backlog is caused. The analyzed information was communicated with a friend. The screenshot is as follows:
Three: Summary
The main reason for this memory fragmentation is the Serilog
docking with Loki
, but I can only analyze it to this point. As for why there is a backlog, this has to be further debugged and analyzed by friends. I believe this The problem will be solved soon 💪💪💪
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。