One: Background

1. Tell a story

A friend approached me last week and said that his program's CPU usage and handle count kept growing, with no sign of coming back down. After investigating for a few days without progress, he added me on WeChat to ask for help and sent a screenshot of the problem.


I will investigate the CPU spike in a separate article. This one covers the handle leak first. After all, I have written more than 20 articles, and this is the first time I have talked about handle leaks, which makes it a bit interesting.

2. What is a handle

My personal understanding of a handle: it is a reference, held in the managed layer, to an unmanaged resource. With this reference we can force the unmanaged resource to be released. So what is an unmanaged resource? My personal understanding is that it is a resource the GC does not manage.

Classes that typically hold this kind of handle include FileStream, Socket, and so on. With this background in place, we can move on to analyzing the dump with WinDbg!

Two: WinDbg analysis

1. Look at the surface of the problem

My friend saw Handles = 8770 in Task Manager, which means the program holds 8770 handles to unmanaged resources. How do we inspect them? Before answering, has everyone noticed this phenomenon: no matter how badly a program leaks, once we exit the exe all of its resources are released, managed or unmanaged alike. I believe many readers are curious about how this is achieved. Think about it for 10 seconds first.

The answer: simply put, the CLR maintains a handle table internally, and when the program shuts down, the handles in that table are released (the operating system also closes any kernel handles the process still owns). Since the CLR can reach this table, WinDbg should be able to as well, and it can, through the !gchandles command. Note that this table holds GC handles rather than the OS handles that Task Manager counts, but the objects it keeps alive often own OS handles, which makes it a good place to start.

2. View the handle table

A reminder: the handle table that !gchandles reads is scoped to the AppDomain, not to the process. Next, look at the command output:


0:000> !gchandles -stat
Statistics:
              MT    Count    TotalSize Class Name
...
00007ffccc1d2360        3       262280 System.Byte[]
00007ffccc116610       72       313224 System.Object[]
00007ffccc3814a0     8246       593712 System.Threading.OverlappedData
Total 10738 objects

Handles:
    Strong Handles:       312
    Pinned Handles:       18
    Async Pinned Handles: 8246
    Ref Count Handles:    1
    Weak Long Handles:    2080
    Weak Short Handles:   59
    Dependent Handles:    22

In this output one number stands out: Async Pinned Handles = 8246, matching the 8246 System.Threading.OverlappedData objects. What does this mean? The name tells us this is asynchronous IO. Some readers will know that during an asynchronous IO operation a byte[] buffer is pinned so the GC cannot move it, and that there is also an asynchronous IO context object, OverlappedData.

The next question: since this is asynchronous IO, what type of handle is behind it? As mentioned earlier, is it a FileStream or a Socket? To find out, we need to dig into the OverlappedData objects. The relevant commands are !dumpheap -mt xxx and !do ..., as follows:
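As a quick sanity check on the handle summary above, the counts can be parsed and compared. The following is only a minimal Python sketch: the text block is copied from the output shown, and the helper name parse_handle_summary is made up for illustration, not part of WinDbg or SOS.

```python
import re

# Handle summary copied from the !gchandles -stat output above.
GCHANDLES_SUMMARY = """\
    Strong Handles:       312
    Pinned Handles:       18
    Async Pinned Handles: 8246
    Ref Count Handles:    1
    Weak Long Handles:    2080
    Weak Short Handles:   59
    Dependent Handles:    22
"""


def parse_handle_summary(text):
    """Return {handle kind: count} from the 'Handles:' block (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?) Handles:\s+(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_handle_summary(GCHANDLES_SUMMARY)
total = sum(counts.values())              # should match "Total 10738 objects"
dominant = max(counts, key=counts.get)    # the kind with the most handles
share = counts[dominant] / total

print(f"total handles: {total}")
print(f"dominant kind: {dominant} ({counts[dominant]}, {share:.1%})")
```

The per-kind counts add up to the 10738 objects reported in the statistics, and the async pinned handles make up roughly three quarters of them, which is why they are the natural place to dig.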


0:000> !DumpHeap /d -mt 00007ffccc3814a0
         Address               MT     Size
000001aa2acb39c8 00007ffccc3814a0       72     
000001aa2acb3fd8 00007ffccc3814a0       72     
000001aa2ad323d0 00007ffccc3814a0       72     
...
0:000> !do 000001aa2acb39c8
Name:        System.Threading.OverlappedData
MethodTable: 00007ffccc3814a0
EEClass:     00007ffccc37ca18
Size:        72(0x48) bytes
File:        C:\xxx\xxx\vms_210819\System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffccc21f508  40006b2        8  System.IAsyncResult  0 instance 0000000000000000 _asyncResult
00007ffccc110ae8  40006b3       10        System.Object  0 instance 000001aa2acb4020 _callback
00007ffccc381150  40006b4       18 ...eading.Overlapped  0 instance 000001aa2acb3980 _overlapped
00007ffccc110ae8  40006b5       20        System.Object  0 instance 000001aa2acb9fe8 _userObject
00007ffccc11f130  40006b6       28                  PTR  0 instance 000001aa2a9bd830 _pNativeOverlapped
00007ffccc11ecc0  40006b7       30        System.IntPtr  1 instance 0000000000000000 _eventHandle
0:000> !DumpObj /d 000001aa2acb3980
Name:        System.Threading.ThreadPoolBoundHandleOverlapped
MethodTable: 00007ffccc3812a0
EEClass:     00007ffccc37c9a0
Size:        72(0x48) bytes
File:        C:\xxx\xxx\vms_210819\System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffccc3814a0  40006ba        8 ...ng.OverlappedData  0 instance 000001aa2acb39c8 _overlappedData
00007ffccc34fcd0  40006a4       10 ...ompletionCallback  0 instance 000001aa2acb3920 _userCallback
00007ffccc110ae8  40006a5       18        System.Object  0 instance 000001aa2acb38c8 _userState
00007ffccc380120  40006a6       20 ...locatedOverlapped  0 instance 000001aa2acb3960 _preAllocated
00007ffccc11f130  40006a7       30                  PTR  0 instance 000001aa2a9bd830 _nativeOverlapped
00007ffccc380eb8  40006a8       28 ...adPoolBoundHandle  0 instance 000001aa2acb3900 _boundHandle
00007ffccc1171c8  40006a9       38       System.Boolean  1 instance                0 _completed
00007ffccc34fcd0  40006a3      458 ...ompletionCallback  0   static 000001aa2acb4020 s_completionCallback
0:000> !DumpObj /d 000001aa2acb3900
Name:        System.Threading.ThreadPoolBoundHandle
MethodTable: 00007ffccc380eb8
EEClass:     00007ffccc37c870
Size:        32(0x20) bytes
File:        C:\xxx\xxx\vms_210819\System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffccc1d76b0  40006a1        8 ...rvices.SafeHandle  0 instance 000001aa2acb1d30 _handle
00007ffccc1171c8  40006a2       10       System.Boolean  1 instance                0 _isDisposed

0:000> !DumpObj /d 000001aa2acb1d30
Name:        Microsoft.Win32.SafeHandles.SafeFileHandle
MethodTable: 00007ffccc3807c8
EEClass:     00007ffccc37c548
Size:        48(0x30) bytes
File:        C:\xxx\xxx\xxx\System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffccc11ecc0  4000bb4        8        System.IntPtr  1 instance 0000000000000428 handle
00007ffccc11b1e8  4000bb5       10         System.Int32  1 instance                4 _state
00007ffccc1171c8  4000bb6       14       System.Boolean  1 instance                1 _ownsHandle
00007ffccc1171c8  4000bb7       15       System.Boolean  1 instance                1 _fullyInitialized
00007ffccc2f1ae0  4001c3d       20 ...Private.CoreLib]]  1 instance 000001aa2acb1d50 _isAsync
00007ffccc380eb8  4001c3e       18 ...adPoolBoundHandle  0 instance 0000000000000000 <ThreadPoolBinding>k__BackingField

The value 0000000000000428 in the handle field of the SafeFileHandle is the actual handle value. We can pass it to the !handle command to see what it refers to.


0:000> !handle 0000000000000428 7
Handle 428
  Type             File
  Attributes       0
  GrantedAccess    0x100081:
         Synch
         Read/List,ReadAttr
  HandleCount      2
  PointerCount     65489

Type: File tells us that these 8,000-plus handles are file handles...

At this point it looks like a dead end 😪😪😪. Some information has been dug up, but not enough to find the source of the problem. The objects listed by !gchandles sit at the top of the reference chain. In other words, I need to find data objects further down the chain, and a good entry point is the managed heap.
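The GrantedAccess value 0x100081 in the !handle output can also be decoded by hand. Below is a small Python sketch using the standard Windows access-right constants; the function name decode_access_mask is hypothetical. Bit 0x1 means FILE_READ_DATA for a file and FILE_LIST_DIRECTORY for a directory, so the mask is compatible with a directory opened for listing, although the mask alone does not prove that.

```python
# Access-right bits as defined in the Windows SDK headers (winnt.h).
ACCESS_BITS = {
    0x00100000: "SYNCHRONIZE",
    0x00000080: "FILE_READ_ATTRIBUTES",
    0x00000001: "FILE_READ_DATA / FILE_LIST_DIRECTORY",
}


def decode_access_mask(mask):
    """Split an access mask into known flag names plus any unrecognized bits."""
    names = [name for bit, name in ACCESS_BITS.items() if mask & bit]
    known = 0
    for bit in ACCESS_BITS:
        known |= bit
    leftover = mask & ~known   # bits this small table does not cover
    return names, leftover


names, leftover = decode_access_mask(0x100081)
for name in names:
    print(name)
print(f"unrecognized bits: {leftover:#x}")
```

The three decoded flags line up with the Synch, Read/List and ReadAttr lines that !handle prints.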

3. Find the grandchildren of OverlappedData from the managed heap

First, we use !dumpheap -stat to view the managed heap.


0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
...

00007ffccc3c5e18   939360     52604160 System.Collections.Generic.SortedSet`1+Node[[System.Collections.Generic.KeyValuePair`2[[System.String, System.Private.CoreLib],[System.String, System.Private.CoreLib]], System.Private.CoreLib]]
00007ffccc1d2360    16492     69081162 System.Byte[]
000001aa2a99af00    10365     76689384      Free
00007ffccc1d1e18  1904987    116290870 System.String

Since we are looking downstream in the reference chain, it makes sense to start with a basic type such as System.String or System.Byte[]. I chose the former and wrote a script to group all the addresses under that MT, because doing it by hand is impossible. From the script's output I picked a few addresses and ran !gcroot on them. The results all look roughly like this:


0:000> !gcroot 000001aa47a0c030
HandleTable:
    000001AA4469C090 (async pinned handle)
    -> 000001AA491EB908 System.Threading.OverlappedData
    -> 000001AA491EB8C0 System.Threading.ThreadPoolBoundHandleOverlapped
    -> 000001AA491EB860 System.Threading.IOCompletionCallback
    -> 000001AA491EAF30 System.IO.FileSystemWatcher
    -> 000001AA491EB458 System.IO.FileSystemEventHandler
    ...
    -> 000001AA47A0C030 System.String

0:000> !gcroot 000001aa2d3ea480
HandleTable:
    000001AA28FE9930 (async pinned handle)
    -> 000001AA2DD68220 System.Threading.OverlappedData
    -> 000001AA2DD681D8 System.Threading.ThreadPoolBoundHandleOverlapped
    -> 000001AA2DD68178 System.Threading.IOCompletionCallback
    -> 000001AA2DD67848 System.IO.FileSystemWatcher
    ...
    -> 000001AA2D3EA480 System.String    

The reference chain contains a System.IO.FileSystemWatcher, which is consistent with the earlier finding that the handle type is File. I then exported these strings and found that most of them are appSettings entries, as shown below:


string: appSettings:RabbitMQLogQueue
string: appSettings:MedicalMediaServerIP
string: appSettings:UseHttps
...
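A grouping step over exported strings like these could be sketched as follows. This is not the script used in the investigation, only a minimal Python illustration that assumes each exported line has the form "string: value" as in the excerpt above; the function name group_by_prefix and the last sample line are made up.

```python
from collections import Counter


def group_by_prefix(lines, marker="string: "):
    """Count exported strings by the text before the first ':' in their value."""
    groups = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith(marker):
            continue  # skip anything that is not an exported string line
        value = line[len(marker):]
        prefix, sep, _rest = value.partition(":")
        groups[prefix if sep else "(no prefix)"] += 1
    return groups


sample = [
    "string: appSettings:RabbitMQLogQueue",
    "string: appSettings:MedicalMediaServerIP",
    "string: appSettings:UseHttps",
    "string: some other value",  # hypothetical non-config string
]

groups = group_by_prefix(sample)
for prefix, count in groups.most_common():
    print(f"{count:>8}  {prefix}")
```

Run over a full export, a summary like this makes a dominant prefix such as appSettings obvious without reading individual strings.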

Then we used the !strings command to do a fuzzy match and found as many as 610,000 such strings...

At this point we can basically conclude that appsettings is being watched, but something is wrong with the way it is watched...

4. Find the final answer

After handing the findings to my friend, I asked him to focus on whether anything was wrong with the way appsettings was being watched. A few hours later, he finally found it.


Roughly speaking: reloadOnChange=true was already set, but the person who wrote the code was not familiar with this area and also polled appsettings every 10s to pick up changes, and that is where the problem lies...

Three: Summary

In fact, the main cause of this incident was unfamiliarity with how to pick up the latest appsettings data in real time. The code used .NET Core's built-in reloadOnChange monitoring and, on top of that, polled for changes. So the basics are still very important. Don't take them for granted! 😁😁😁

For more in-depth material, see my GitHub: dotnetfly
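The friend's code is not shown here, so the following is only a toy Python model of one plausible version of the bug: it assumes that each 10s poll built a fresh watched configuration and never released it. FakeWatcher, poll_leaky and poll_shared are hypothetical names, and nothing here calls a real .NET API.

```python
class FakeWatcher:
    """Stand-in for a file watcher that holds one OS handle until closed."""
    open_handles = 0

    def __init__(self):
        FakeWatcher.open_handles += 1

    def close(self):
        FakeWatcher.open_handles -= 1


def poll_leaky(polls):
    """Assumed bug pattern: every poll creates a new watcher and never closes it."""
    FakeWatcher.open_handles = 0
    kept_alive = []  # models watchers that stay rooted by their pending IO
    for _ in range(polls):
        kept_alive.append(FakeWatcher())
    return FakeWatcher.open_handles


def poll_shared(polls):
    """One watcher created up front; each poll only reads current values."""
    FakeWatcher.open_handles = 0
    watcher = FakeWatcher()
    for _ in range(polls):
        pass  # read settings from the single, auto-reloading configuration
    return FakeWatcher.open_handles


POLL_INTERVAL_S = 10        # polling interval mentioned in the article
leaked = poll_leaky(8246)   # 8246 = Async Pinned Handles seen in the dump
hours = leaked * POLL_INTERVAL_S / 3600

print(leaked, poll_shared(8246), round(hours, 1))
```

Under the one-leak-per-poll assumption, 8246 handles would correspond to roughly 23 hours of uptime, while a single shared watcher keeps the count at one no matter how often the settings are read.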


一线码农