One: background
1. Tell a story
In the middle of this month, a friend added me to wx for help. His program thread share is very high, asking how to solve it. The screenshot is as follows:
To be honest, chatting with programmers in different industries is quite interesting. Making friends can also expand your circle. Friends said that because of this bug, he also caused the project to become pornographic... 😂😂😂
Haha, it seems that if the customer doesn't buy it, the acceptance won't work, and it hurts. . . Find me early, this customer won't get back, maybe this is the value of technology! 😁😁😁
Now that you have found me, let this hang up problem completely disappear, and talk to windbg.
Two: Windbg analysis
1. View thread situation
Since my friend said that the thread is high, start with the thread and use the !t
command.
0:000> !t
ThreadCount: 1006
UnstartedThread: 0
BackgroundThread: 1005
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
Lock
DBG ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
0 1 10c8 00000000004D89A0 2a020 Preemptive 0000000000000000:0000000000000000 00000000002f1070 0 MTA
2 2 13c0 000000000031FF70 2b220 Preemptive 0000000000000000:0000000000000000 00000000002f1070 0 MTA (Finalizer)
3 3 12cc 000000000032B780 102a220 Preemptive 0000000000000000:0000000000000000 00000000002f1070 0 MTA (Threadpool Worker)
4 5 138c 000000000039E3C0 8029220 Preemptive 00000000B6D3CCA0:00000000B6D3D260 00000000002f1070 0 MTA (Threadpool Completion Port)
6 6 106c 0000000019E562A0 3029220 Preemptive 0000000000000000:0000000000000000 00000000002f1070 0 MTA (Threadpool Worker)
8 11 7f0 0000000019F8F9E0 20220 Preemptive 0000000000000000:0000000000000000 00000000002f1070 0 Ukn
9 1949 323c 000000009AA69E40 8029220 Preemptive 00000000B6BB8AD0:00000000B6BB94E0 00000000002f1070 0 MTA (Threadpool Completion Port)
10 1637 b3c 000000009AA1C260 8029220 Preemptive 00000000B6CD4220:00000000B6CD47E0 00000000002f1070 0 MTA (Threadpool Completion Port)
11 1947 223c 000000009ADB72E0 8029220 Preemptive 00000000B6D88D68:00000000B6D89550 00000000002f1070 0 MTA (Threadpool Completion Port)
12 1968 2e74 000000009AA1E330 8029220 Preemptive 00000000B6A8CD40:00000000B6A8D300 00000000002f1070 0 MTA (Threadpool Completion Port)
...
994 313 1fa4 000000009A81FFC0 8029220 Preemptive 00000000B6BFC1B8:00000000B6BFC410 00000000002f1070 0 MTA (Threadpool Completion Port)
995 1564 18ec 000000009A835510 8029220 Preemptive 00000000B6AC1ED0:00000000B6AC2490 00000000002f1070 0 MTA (Threadpool Completion Port)
996 1581 4ac 000000001C2E36E0 8029220 Preemptive 00000000B6C51500:00000000B6C51AC0 00000000002f1070 0 MTA (Threadpool Completion Port)
997 814 2acc 000000009A73B5E0 8029220 Preemptive 00000000B6D67BF8:00000000B6D683E0 00000000002f1070 0 MTA (Threadpool Completion Port)
998 517 25dc 000000009A838990 8029220 Preemptive 00000000B6D2CA10:00000000B6D2CFD0 00000000002f1070 0 MTA (Threadpool Completion Port)
999 670 2a10 000000001C2E4400 8029220 Preemptive 00000000B6CD0490:00000000B6CD0A50 00000000002f1070 0 MTA (Threadpool Completion Port)
1000 183 1704 000000009A81F930 8029220 Preemptive 00000000B6AE8670:00000000B6AE8C30 00000000002f1070 0 MTA (Threadpool Completion Port)
1001 117 1bcc 000000009A73BC70 8029220 Preemptive 00000000B6B92780:00000000B6B92D40 00000000002f1070 0 MTA (Threadpool Completion Port)
1002 1855 1d68 000000009A81E580 8029220 Preemptive 00000000B6B28460:00000000B6B28A20 00000000002f1070 0 MTA (Threadpool Completion Port)
1003 1070 2ef0 000000009A73C300 8029220 Preemptive 00000000B6B8F640:00000000B6B8FC00 00000000002f1070 0 MTA (Threadpool Completion Port)
1004 1429 210c 000000001C2E4A90 8029220 Preemptive 00000000B6D5F488:00000000B6D5FC70 00000000002f1070 0 MTA (Threadpool Completion Port)
1005 1252 2f38 000000009A838300 8029220 Preemptive 00000000B6A99240:00000000B6A99800 00000000002f1070 0 MTA (Threadpool Completion Port)
1006 1317 3118 000000001C2E5120 8029220 Preemptive 00000000B6DA3A30:00000000B6DA4440 00000000002f1070 0 MTA (Threadpool Completion Port)
1007 1837 3120 000000009A8375E0 8029220 Preemptive 00000000B6D38F10:00000000B6D394D0 00000000002f1070 0 MTA (Threadpool Completion Port)
1009 1964 2f64 000000009A81DEF0 1029220 Preemptive 0000000000000000:0000000000000000 00000000002f1070 0 MTA (Threadpool Worker)
You can see that there are currently 1006 threads, of which 1000 are Threadpool Completion Port
. This is the first time that so many IO threads are stuck, 🐂👃.
To be honest, when Threadpool Completion Port
I thought that this is an asynchronous operation callback, so why are so many IO threads stuck? To find the answer, take a thread and take a look.
0:1000> ~1000s
ntdll!NtNotifyChangeDirectoryFile+0xa:
00000000`77c7a75a c3 ret
0:1000> !clrstack
OS Thread Id: 0x1704 (1000)
Child SP IP Call Site
00000000A99FF4C0 0000000077c7a75a [InlinedCallFrame: 00000000a99ff4c0] Interop+Kernel32.ReadDirectoryChangesW(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Boolean, Int32, Int32 ByRef, System.Threading.NativeOverlapped*, IntPtr)
00000000A99FF4C0 000007fe8e87bd20 [InlinedCallFrame: 00000000a99ff4c0] Interop+Kernel32.ReadDirectoryChangesW(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Boolean, Int32, Int32 ByRef, System.Threading.NativeOverlapped*, IntPtr)
00000000A99FF470 000007fe8e87bd20 DomainBoundILStubClass.IL_STUB_PInvoke(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Boolean, Int32, Int32 ByRef, System.Threading.NativeOverlapped*, IntPtr)
00000000A99FF560 000007fef19dab6e System.IO.FileSystemWatcher.Monitor(AsyncReadState) [E:\A\_work\322\s\corefx\src\System.IO.FileSystem.Watcher\src\System\IO\FileSystemWatcher.Win32.cs @ 141]
00000000A99FF5E0 000007fef19dae6c System.IO.FileSystemWatcher.ReadDirectoryChangesCallback(UInt32, UInt32, System.Threading.NativeOverlapped*) [E:\A\_work\322\s\corefx\src\System.IO.FileSystem.Watcher\src\System\IO\FileSystemWatcher.Win32.cs @ 227]
00000000A99FF630 000007feedbe0af9 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [E:\A\_work\191\s\src\mscorlib\shared\System\Threading\ExecutionContext.cs @ 167]
00000000A99FF6B0 000007feede094dc System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*) [E:\A\_work\191\s\src\mscorlib\src\System\Threading\Overlapped.cs @ 108]
00000000A99FF7F0 000007feee359ed3 [GCFrame: 00000000a99ff7f0]
00000000A99FF9D0 000007feee359ed3 [DebuggerU2MCatchHandlerFrame: 00000000a99ff9d0]
Let me go and see FileSystemWatcher
. Friends who follow this series should know that I analyzed an article last month. NET a streaming unicorn API handle leak analysis , which is the file handle explosion caused by FileSystemWatcher Gao, the reason for his formation is
regularly refreshing appsetttings + reloadOnChange=true. This world is so small that there will be no crashes. . . Next we will focus on it.
2. Explore FileSystemWatcher
For further analysis, first use the !dso
command to look at the current thread stack object.
0:1000> !dso
OS Thread Id: 0x1704 (1000)
RSP/REG Object Name
00000000A99FF508 00000000263285d8 System.Byte[]
00000000A99FF518 00000000242aeb10 System.Threading._IOCompletionCallback
00000000A99FF560 00000000242ae1b0 Microsoft.Win32.SafeHandles.SafeFileHandle
00000000A99FF568 00000000242aeaa8 System.Threading.PreAllocatedOverlapped
00000000A99FF578 00000000242aeb10 System.Threading._IOCompletionCallback
00000000A99FF5E0 00000000242a8538 System.IO.FileSystemWatcher
00000000A99FF5E8 00000000242aea10 System.IO.FileSystemWatcher+AsyncReadState
00000000A99FF608 00000000242aea10 System.IO.FileSystemWatcher+AsyncReadState
00000000A99FF610 0000000023206e30 System.Threading.ExecutionContext
00000000A99FF618 0000000001032928 System.Threading.ContextCallback
00000000A99FF630 00000000242a8538 System.IO.FileSystemWatcher
00000000A99FF678 00000000b6a69a40 System.Threading.Thread
00000000A99FF688 00000000242aeb10 System.Threading._IOCompletionCallback
00000000A99FF690 0000000023206e30 System.Threading.ExecutionContext
00000000A99FF6C0 0000000021fa55d8 System.Threading._IOCompletionCallback
00000000A99FF6C8 000000002052e6e0 System.Threading.ExecutionContext
00000000A99FF7E0 000000000560d2b0 System.Threading.OverlappedData
Since the objects on the thread stack are expanded to small, you can see the System.Byte[]
on the smallest address to know what the current callback is. The screenshot is as follows:
After the analysis experience of the previous campaign, I have basically figured it out here. This is another reloadOnChange: true
when continuously building ConfigurationRoot. Its consequences will cause a large number of FileSystemWatcher
and ConfigurationRoot
be added to the memory and cannot be released, and it is induced The point is the stuck case triggered by the massive callback function triggered by the continuous change of the log file in the above figure. The specific details... Please listen to me slowly and verify the number of these two classes on the managed heap.
0:1000> !dumpheap -stat -type FileSystemWatcher
Statistics:
MT Count TotalSize Class Name
000007fe8ed5bc90 2 160 System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib],[System.IO.FileSystemWatcher, System.IO.FileSystem.Watcher]]
000007fe8e9f11a0 34480 1930880 System.IO.FileSystemWatcher+AsyncReadState
000007fe8e9d69c8 34480 4137600 System.IO.FileSystemWatcher
Total 68962 objects
0:1000> !dumpheap -stat -type ConfigurationRoot
Statistics:
MT Count TotalSize Class Name
000007fe8e9f1e70 34480 827520 Microsoft.Extensions.Configuration.ConfigurationRoot+<>c__DisplayClass2_0
000007fe8e999560 34480 1103360 Microsoft.Extensions.Configuration.ConfigurationRoot
Total 68960 objects
Sure enough, the managed heap has 3.4w
, and then I have to communicate with my friends.
3. What exactly caused the code?
A friend asked why there 3.4w
of ConfigurationRoot
objects, there is only a theory, a program, based on this information, friends soon found the code in question, shots are as follows:
It is because the reloadOnChange: true
above figure allows the FileSystemWatcher to appsettings.json
a real-time monitoring 3.4w
of objects of 0615672b84851d in the memory that cannot be released.
Three: Why log changes will cause the program to freeze
1. The original confusion
In fact, I was very confused when I came to the analysis. Even if there is 3.4w
in the memory, it is only appsettings.json
. As long as the file does not change, it will not trigger the callback of 3.4w
But after I analyzed the ConfigurationRoot
, I found myself really naive.
2. Find the answer from the source code
First, let's take a look at what FileSystemWatcher monitors? You can set a breakpoint in its constructor, as shown in the following figure:
Obviously, it is watching the root directory of the program, which can explain why the log file changes will trigger the callback function of the file change. For verification, I can ReadDirectoryChangesCallback
method and drop another log. Go to the root directory and see if it can be triggered. . . Screenshot below:
3.4w
back to this case, that is to say, once there is a log change, it will trigger 0615672b84865f callback functions, if it changes 100 times, it will trigger 340w
callbacks, and the log changes will not stop, naturally, the program will be killed by the massive callbacks. . . . Right. . .
Four: Summary
This accident may be due to a friend's laziness and did not write Configuration
or IOptions
. Instead, the ConnectionString was obtained by rebuilding the ConfigurationRoot and incorrectly configured reloadOnChange: true
, which caused the IO thread to be unable to process the massive callback functions due to the change of the log file in time. , Which in turn causes the program to hang.
After knowing the whole ins and outs, the optimization measures are very simple, and two methods are provided.
reloadOnChange: true
toreloadOnChange: false
.- Find a way to inject Configuration into the DataBaseConfig class and make it a static variable. 😁😁😁
The last easter egg, my friend is too polite.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。