I haven't posted for about 11 days, and it's really not out of laziness. I've been meaning to write for days, but lately a surprising number of friends have come to me for help, two or three dumps a day to analyze. The interesting part is that most of them are wrestling with problems their companies have carried around for months or even years. It's exhausting, but at least being tired means I'm climbing uphill 😂😂😂.
Back to the topic. About a month ago a friend found me on WeChat; he has been learning dump analysis recently but, perhaps for lack of experience, got stuck partway through. The screenshot is as follows:
Although the problems hiding in dumps come in all shapes and sizes, they can still be sorted into a few big categories, such as frequent GC triggering, a large number of locks, and so on; a more detailed summary can be found on my Knowledge Planet. Well, since his analysis couldn't go any further, let's bring out WinDbg.
Two: WinDbg analysis
1. View CPU utilization
Since the report says the CPU is too high, I need data to back that up, so start with the old command !tp.
0:057> !tp
CPU utilization: 100%
Worker Thread: Total: 51 Running: 30 Idle: 0 MaxLimit: 400 MinLimit: 4
Work Request in Queue: 11
Unknown Function: 6a0bbb30 Context: 1b4ca258
Unknown Function: 6a0bbb30 Context: 1b4ca618
Unknown Function: 6a0bbb30 Context: 1b4ca758
Unknown Function: 6a0bbb30 Context: 1cb88d60
Unknown Function: 6a0bbb30 Context: 1b4ca798
Unknown Function: 6a0bbb30 Context: 1b5a54d0
AsyncTimerCallbackCompletion TimerInfo@01f6e530
Unknown Function: 6a0bbb30 Context: 1b5a5a50
Unknown Function: 6a0bbb30 Context: 1cb892a0
Unknown Function: 6a0bbb30 Context: 1b4ca8d8
Unknown Function: 6a0bbb30 Context: 1cb88da0
--------------------------------------
Number of Timers: 1
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 8 CurrentLimit: 1 MaxLimit: 400 MinLimit: 4
Well, the CPU really is maxed out. By the way, a reminder: the CPU utilization: 100% here refers to the whole machine, not this program alone. The implication is that when the machine's CPU sits at 100%, it is not necessarily the program you dumped that is responsible.
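To make the machine-vs-process distinction concrete, here is a minimal sketch (my own illustration, not part of the original troubleshooting, assuming a .NET Framework console app running on Windows with access to performance counters) that samples the machine-wide counter that !tp reflects next to this process's own CPU share:

using System;
using System.Diagnostics;
using System.Threading;

class CpuCheck
{
    static void Main()
    {
        // Machine-wide CPU, the same figure that !tp's "CPU utilization" reflects.
        var machine = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        machine.NextValue();                                  // first sample of a rate counter is 0, prime it

        Process proc = Process.GetCurrentProcess();
        TimeSpan cpuBefore = proc.TotalProcessorTime;
        Stopwatch sw = Stopwatch.StartNew();

        Thread.Sleep(1000);                                   // one-second sampling window

        double machineCpu = machine.NextValue();              // % across all cores, whole machine
        proc.Refresh();
        double processCpu = (proc.TotalProcessorTime - cpuBefore).TotalMilliseconds
                            / (Environment.ProcessorCount * sw.ElapsedMilliseconds) * 100;

        Console.WriteLine("machine: " + machineCpu.ToString("F0") + "%  this process: " + processCpu.ToString("F0") + "%");
    }
}

If the two numbers diverge wildly, some other process on the box is burning the CPU and the process you dumped may be an innocent bystander.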
2. Is it a GC trigger?
Faced with this unfamiliar dump, start with a few checks based on experience, for example: was it triggered by GC? How do you verify that hypothesis? To make the result more convincing, use !t -special to export the thread list and see whether a GC SuspendEE thread shows up.
0:057> !t -special
ThreadCount: 109
UnstartedThread: 0
BackgroundThread: 74
PendingThread: 0
DeadThread: 35
Hosted Runtime: no
OSID Special thread type
14 2594 DbgHelper
15 2be4 GC SuspendEE
16 dc4 GC
17 2404 GC
18 bb4 GC
19 2498 Finalizer
20 312c ProfilingAPIAttach
21 858 Timer
22 3a78 ADUnloadHelper
27 290c GC
28 2e24 GC
29 28b0 GC
30 1e64 GC
38 3b24 ThreadpoolWorker
...
90 2948 Gate
Sure enough, there it is in the output, which indicates the pause really was triggered by GC. If you still don't believe it, take a look at the coreclr source code:
size_t
GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason)
{
    dprintf (2, ("triggered a GC!"));
    gc_heap::gc_started = TRUE;
    {
        init_sync_log_stats();
#ifndef MULTIPLE_HEAPS
        cooperative_mode = gc_heap::enable_preemptive ();
        dprintf (2, ("Suspending EE"));
        BEGIN_TIMING(suspend_ee_during_log);
        GCToEEInterface::SuspendEE(SUSPEND_FOR_GC);
        END_TIMING(suspend_ee_during_log);
        gc_heap::proceed_with_gc_p = gc_heap::should_proceed_with_gc();
        gc_heap::disable_preemptive (cooperative_mode);
        if (gc_heap::proceed_with_gc_p)
            pGenGCHeap->settings.init_mechanisms();
        else
            gc_heap::update_collection_counts_for_no_gc();
#endif //!MULTIPLE_HEAPS
    }
}
The SuspendEE above is short for Suspend CLR Execution Engine. Next, run ~*e !dumpstack across all threads to see which one ended up in the CLR's GarbageCollectGeneration method.
The figure shows that thread 53 is the trigger. Switch to it and run !clrstack: the managed stack shows the program in the middle of an XXX.GetAll() call, a name that looks scary enough at first glance. So what exactly does it do? Let's look at the source. The simplified code is as follows:
public static List<xxxx> GetAll()
{
    string text = "xxxProperty_GetAll";
    SqlDatabase val = new SqlDatabase(m_strConnectionString);
    xxxPropertyTreeInfo xxxPropertyTreeInfo = null;
    List<xxxPropertieInfo> list = new List<xxxPropertieInfo>();
    DbCommand storedProcCommand = ((Database)val).GetStoredProcCommand(text);
    using (IDataReader reader = ((Database)val).ExecuteReader(storedProcCommand))
    {
        while (DataBase.DataReaderMoveNext(reader))
        {
            xxxPropertyTreeInfo = new xxxPropertyTreeInfo();
            xxxPropertyTreeInfo.LoadDataReader(reader);
            list.Add(xxxPropertyTreeInfo);
        }
    }
    return list;
}

public virtual void LoadDataReader(MethodBase method, object obj, IDataReader reader)
{
    Hashtable hashtable = new Hashtable();
    for (int i = 0; i < reader.FieldCount; i++)
    {
        hashtable.Add(reader.GetName(i).ToLower(), reader.GetValue(i));
    }
    Hashtable fieldProperties = GetFieldProperties(method, FieldType.DBField);
    foreach (object key in fieldProperties.Keys)
    {
        PropertyInfo p = (PropertyInfo)fieldProperties[key];
        object v = null;
        if (hashtable.Contains(key))
        {
            v = hashtable[key];
        }
        if (v != null)
        {
            SetPropertieValue(ref obj, ref p, ref v);
        }
    }
}
Looking at the source logic: it executes the stored procedure xxxProperty_GetAll and then uses reflection to map each row from the reader onto an xxxPropertyTreeInfo, and the GC was triggered right in the middle of that mapping.
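To see why this mapping is allocation-heavy, note that LoadDataReader builds a fresh Hashtable of boxed column values and re-resolves the property list for every single row. The sketch below (hypothetical RowMapper/MapAll names, not the project's real code, and with type-conversion handling omitted) shows the same idea with the reflection and column lookups hoisted out of the per-row loop:

using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

static class RowMapper
{
    // Resolve the public instance properties once per type instead of once per row.
    static readonly Dictionary<Type, PropertyInfo[]> s_props = new Dictionary<Type, PropertyInfo[]>();

    public static List<T> MapAll<T>(IDataReader reader) where T : class, new()
    {
        PropertyInfo[] props;
        lock (s_props)
        {
            if (!s_props.TryGetValue(typeof(T), out props))
            {
                props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
                s_props[typeof(T)] = props;
            }
        }

        // Resolve column ordinals once per reader, not once per row.
        var ordinals = new Dictionary<PropertyInfo, int>();
        foreach (PropertyInfo p in props)
        {
            for (int i = 0; i < reader.FieldCount; i++)
            {
                if (string.Equals(reader.GetName(i), p.Name, StringComparison.OrdinalIgnoreCase))
                {
                    ordinals[p] = i;
                    break;
                }
            }
        }

        var list = new List<T>();
        while (reader.Read())
        {
            var item = new T();
            foreach (KeyValuePair<PropertyInfo, int> kv in ordinals)
            {
                object value = reader.GetValue(kv.Value);      // still boxes the column value
                if (value != DBNull.Value && kv.Key.CanWrite)
                    kv.Key.SetValue(item, value, null);        // type conversion omitted in this sketch
            }
            list.Add(item);
        }
        return list;
    }
}

Each row still boxes the column values it actually uses, but it no longer builds a throwaway Hashtable and lowercased key strings for every record.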
3. Is it caused by excessive data?
Going by past experience, this should be caused by pulling too much data back from the database, and the dump is right here to confirm or refute that. To find the answer, first use the !dso command to dump all the objects on the thread stack, then use !do xxx to inspect the List<xxxPropertieInfo> list, as shown in the following figure:
From the figure, the size is not big at all, so why is GC being triggered so frequently? Even if the reflection churns out a pile of small objects, it shouldn't matter that much... That threw me back into contemplation...
4. Find the source of the problem
After poking around, I found a few suspicious points.
- There are 24 threads executing the XXX.GetAll() method.
- 123 of these lists were found on the managed heap, and the largest holds 1298 items, so the total volume is not small after all...
0:053> !dumpheap -mt 1b9eadd0
Address MT Size
02572a9c 1b9eadd0 24
026eca58 1b9eadd0 24
0273d2a0 1b9eadd0 24
...
Statistics:
MT Count TotalSize Class Name
1b9eadd0 123 2952 System.Collections.Generic.List`1[[xxxPropertieInfo, xxx.Model]]
0:053> !DumpObj /d 28261894
Name: System.Collections.Generic.List`1[[xxxPropertieInfo, xxx.Model]]
MethodTable: 1b9eadd0
EEClass: 6e2c6f8c
Size: 24(0x18) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
6e6ff32c 4001891 4 System.__Canon[] 0 instance 23710638 _items
6e6f1bc0 4001892 c System.Int32 1 instance 1298 _size
6e6f1bc0 4001893 10 System.Int32 1 instance 1298 _version
6e6f0100 4001894 8 System.Object 0 instance 00000000 _syncRoot
6e6ff32c 4001895 4 System.__Canon[] 0 static <no information>
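A quick back-of-the-envelope check (my own arithmetic, not part of the dump output): the 2952 bytes in the statistics cover only the 123 List wrapper objects themselves (123 × 24 bytes). For a list with _size = 1298 on a 32-bit heap, the backing System.__Canon[] alone is roughly 1298 × 4 bytes of references plus array overhead, about 5 KB, and behind it sit 1298 xxxPropertieInfo instances, so the memory actually retained per call is far larger than the statistics line suggests.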
- The program is 32-bit.
From the memory addresses alone you can tell the process is 32-bit, which means its GC segments are small, which in turn means more frequent GC collections.
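As a side note, the same facts can be confirmed from inside the process; a minimal sketch (my own, not from the original analysis) using standard .NET APIs:

using System;
using System.Runtime;

class GcInfo
{
    static void Main()
    {
        Console.WriteLine("64-bit process : " + Environment.Is64BitProcess);   // false for the process in this dump
        Console.WriteLine("Pointer size   : " + IntPtr.Size + " bytes");       // 4 in a 32-bit process
        Console.WriteLine("Server GC      : " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode   : " + GCSettings.LatencyMode);
    }
}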
Three: Summary
The accident came down to three factors:
- Multiple threads frequently and repeatedly call the GetAll() method, and each call brings back a list of 1298 items.
- The inefficient reflection-based model mapping churns out a large number of short-lived small objects on every call.
- The 32-bit process has small GC segments (32M).
The combination of the three leads to frequent GC triggering.
The fix is also very simple; a rough sketch of both options follows this list.
- The simplest and crudest way: cache a copy of the database query results.
- The slightly more proper way: replace the inefficient hand-rolled reflection mapping with Dapper, and rebuild the program as 64-bit.
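Neither fix is shown as code in the original post, so the following is only a rough sketch under assumed names (xxxPropertyRepository, GetAllCached, a five-minute TTL, and the m_strConnectionString field from the code above) of what the two options could look like together:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

public static class xxxPropertyRepository
{
    static List<xxxPropertieInfo> _cache;                       // cached copy of the query result
    static DateTime _cachedAt;
    static readonly object _gate = new object();
    static readonly TimeSpan _ttl = TimeSpan.FromMinutes(5);    // assumed refresh interval

    // Option 1: cache the result so 24 threads stop hammering the stored procedure.
    public static List<xxxPropertieInfo> GetAllCached()
    {
        lock (_gate)
        {
            if (_cache == null || DateTime.UtcNow - _cachedAt > _ttl)
            {
                _cache = GetAllWithDapper();
                _cachedAt = DateTime.UtcNow;
            }
            return _cache;
        }
    }

    // Option 2: let Dapper handle the reader-to-model mapping.
    static List<xxxPropertieInfo> GetAllWithDapper()
    {
        using (var conn = new SqlConnection(m_strConnectionString))
        {
            return conn.Query<xxxPropertieInfo>(
                       "xxxProperty_GetAll",
                       commandType: CommandType.StoredProcedure)
                       .ToList();
        }
    }

    static readonly string m_strConnectionString = "...";       // assumed, as in the original code
}

With the cache in place the stored procedure runs at most once per TTL window instead of once per call on each of the 24 worker threads, and Dapper's cached, IL-emitted materializers replace the per-row Hashtable-and-reflection dance.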
I discussed it with my friend; he went with the first option, and in the end the CPU came back down and everything was calm again!