One: Background

1. Tell a story

A few days ago a friend added me on WeChat and said his program's memory had surged, and asked whether I could help analyze it.

After chatting with him a bit, it turned out this dump was taken from yet another HIS (hospital information system). As my friend put it, I really do seem to have it in for hospitals 🤣🤣🤣. Fine — at least it saves me some case material 😁😁😁. Enough chit-chat, let's go to WinDbg.

Two: WinDbg analysis

1. Managed or unmanaged?

Since the memory is ballooning, let's first check how much memory the current process has committed.


0:000> !address -summary

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                174     7ffe`baac0000 ( 127.995 TB)          100.00%
MEM_COMMIT                             1153        1`33bd3000 (   4.808 GB)  94.59%    0.00%
MEM_RESERVE                             221        0`1195d000 ( 281.363 MB)   5.41%    0.00%

The process has committed roughly 4.8 GB. Next, let's look at the managed heap.


0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00000207a4fc48c8
generation 1 starts at 0x00000207a3dc3138
generation 2 starts at 0x0000020697fc1000
ephemeral segment allocation context: none
------------------------------
GC Heap Size:            Size: 0x1241b3858 (4900730968) bytes.

The last line shows that the managed heap occupies 4900730968 / 1024 / 1024 / 1024 ≈ 4.5 GB. Comparing the two numbers, almost all of the committed memory is managed heap, so this is a managed memory problem — which makes things much easier to handle.

2. View the managed heap

Since the memory is being eaten by the managed heap, let's see what is actually on it.


0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
...
00007ffd00397b98  1065873    102323808 System.Data.DataRow
00000206978b8250  1507805    223310768      Free
00007ffd20d216b8  4668930    364025578 System.String
00007ffd20d22aa8      797    403971664 System.String[]
00007ffd20d193d0   406282   3399800382 System.Byte[]
Total 9442152 objects

I have to admit I was shocked: System.Byte[] alone accounts for roughly 3.3 GB, which means the GC heap has been all but swallowed by it. Experience says something big is hiding behind those arrays. So how do we dig into them? One option is to write a script that groups the byte[] instances by size; the purely manual alternative is !heapstat, which shows which generations these objects live in.
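For the script route, a rough sketch using the ClrMD library (the Microsoft.Diagnostics.Runtime NuGet package) could look like the following; the dump path is a placeholder and this is just an illustration of the idea, not what was actually run against this dump:

using System;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

// open the dump and group every System.Byte[] instance by its size
using var dataTarget = DataTarget.LoadDump(@"D:\dumps\his.dmp");   // placeholder path
var runtime = dataTarget.ClrVersions[0].CreateRuntime();

var buckets = runtime.Heap.EnumerateObjects()
    .Where(o => o.Type?.Name == "System.Byte[]")
    .GroupBy(o => o.Size)
    .OrderByDescending(g => g.Count())
    .Take(10);

foreach (var bucket in buckets)
    Console.WriteLine($"{bucket.Key,10} bytes  x  {bucket.Count():N0}");

Back to the manual route — !heapstat gives the generation breakdown directly: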


0:000> !heapstat
Heap             Gen0         Gen1         Gen2          LOH
Heap0         2252000     18880400   3968704192    910894376

Free space:                                                 Percentage
Heap0           43128       770160    185203264     39849984   SOH:  4% LOH:  4%

The output shows that the bulk of these objects sits in Gen2, so the next step is to use !eeheap -gc to get the address ranges of the Gen2 segments and narrow down what we dump from the heap.


0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00000207a4fc48c8
generation 1 starts at 0x00000207a3dc3138
generation 2 starts at 0x0000020697fc1000
ephemeral segment allocation context: none
         segment             begin         allocated              size
0000020697fc0000  0000020697fc1000  00000206a7fbec48  0xfffdc48(268426312)
00000206bbeb0000  00000206bbeb1000  00000206cbeaef50  0xfffdf50(268427088)
00000206ccc40000  00000206ccc41000  00000206dcc3f668  0xfffe668(268428904)
00000206dcc40000  00000206dcc41000  00000206ecc3f098  0xfffe098(268427416)
0000020680000000  0000020680001000  000002068ffff8c0  0xfffe8c0(268429504)
00000206ff4d0000  00000206ff4d1000  000002070f4cf588  0xfffe588(268428680)
000002070f4d0000  000002070f4d1000  000002071f4cf9f0  0xfffe9f0(268429808)
000002071f4d0000  000002071f4d1000  000002072f4cfef0  0xfffeef0(268431088)
000002072f4d0000  000002072f4d1000  000002073f4cf748  0xfffe748(268429128)
000002073f4d0000  000002073f4d1000  000002074f4ce900  0xfffd900(268425472)
00000207574d0000  00000207574d1000  00000207674cfe70  0xfffee70(268430960)
00000207674d0000  00000207674d1000  00000207774ceaf8  0xfffdaf8(268425976)
00000207774d0000  00000207774d1000  00000207874cf270  0xfffe270(268427888)
00000207874d0000  00000207874d1000  00000207974cf7a8  0xfffe7a8(268429224)
00000207974d0000  00000207974d1000  00000207a51ea5a8  0xdd195a8(231839144)

The ephemeral segment, which holds gen0 and gen1, is the last one listed here — you can tell because the gen0/gen1 start addresses fall inside its range — and the remaining segments all belong to gen2. Next, I pick the segment 00000206dcc41000 - 00000206ecc3f098 and use !dumpheap to dump every object in that address range.


0:000> !dumpheap -stat 00000206dcc41000 00000206ecc3f098
Statistics:
              MT    Count    TotalSize Class Name
00007ffd00397b98   191803     18413088 System.Data.DataRow
00007ffd20d216b8   662179     37834152 System.String
00007ffd20d193d0    23115    187896401 System.Byte[]

Looking at this one segment, there are roughly 23,000 Byte[] instances — not an overwhelming number — so let's dump them all and see what they look like.


0:000> !dumpheap -mt 00007ffd20d193d0 00000206dcc41000 00000206ecc3f098
         Address               MT     Size
00000206dcc410e8 00007ffd20d193d0     8232     
00000206dcc43588 00007ffd20d193d0     8232     
00000206dcc45a48 00007ffd20d193d0     8232     
00000206dcc47d78 00007ffd20d193d0     8232     
00000206dcc4a028 00007ffd20d193d0     8232     
00000206dcc4c4b0 00007ffd20d193d0     8232     
00000206dcc4eb08 00007ffd20d193d0     8232     
00000206dcc50e88 00007ffd20d193d0     8232     
00000206dcc535b0 00007ffd20d193d0     8232     
00000206dcc575d8 00007ffd20d193d0     8232     
00000206dcc5a5a8 00007ffd20d193d0     8232     
00000206dcc5cbf8 00007ffd20d193d0     8232     
00000206dcc5eef8 00007ffd20d193d0     8232     
00000206dcc611f8 00007ffd20d193d0     8232     
00000206dcc634e8 00007ffd20d193d0     8232     
00000206dcc657f0 00007ffd20d193d0     8232     
00000206dcc67af8 00007ffd20d193d0     8232     
00000206dcc69e00 00007ffd20d193d0     8232   
...

Wow, 99% of them are 8232 bytes each — a big pile of roughly 8 KB buffers. So who is holding on to them? Use !gcroot to find the reference root.


0:000> !gcroot 00000206dcc410e8
Thread 8c1c:
        rsi: 
            ->  00000206983d5730 System.ServiceProcess.ServiceBase[]
                ...
            ->  000002069dcb6d38 OracleInternal.ConnectionPool.OraclePool
                ...
            ->  000002069dc949c0 OracleInternal.TTC.OraBufReader
            ->  000002069dc94a70 System.Collections.Generic.List`1[[OracleInternal.Network.OraBuf, Oracle.ManagedDataAccess]]
            ->  00000206ab8c2200 OracleInternal.Network.OraBuf[]
            ->  00000206dcc41018 OracleInternal.Network.OraBuf
            ->  00000206dcc410e8 System.Byte[]

Judging from the reference chain, the byte[] is ultimately held by an OracleInternal.Network.OraBuf[] array — the backing store of a List&lt;OraBuf&gt; inside OracleInternal.TTC.OraBufReader — which is puzzling. Could this be a bug in the Oracle SDK? Curiosity piqued, let's look at how many elements that array holds and how large it is.


0:000> !do 00000206ab8c2200
Name:        OracleInternal.Network.OraBuf[]
MethodTable: 00007ffcc7833c68
EEClass:     00007ffd20757728
Size:        4194328(0x400018) bytes
Array:       Rank 1, Number of elements 524288, Type CLASS (Print Array)
Fields:
None

0:000> !objsize 00000206ab8c2200
sizeof(00000206ab8c2200) = -1086824024 (0xbf3861a8) bytes (OracleInternal.Network.OraBuf[])

The array has 524,288 elements — 524,288 references × 8 bytes plus the 24-byte array header accounts exactly for the 4,194,328 bytes of the array object itself — and !objsize reports an outright negative total size 😓.
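The negative number is almost certainly just a display artifact: the transitive size is printed as a signed 32-bit integer, so anything over 2 GB wraps around. Reinterpreting the printed value as unsigned (a quick check outside the debugger) recovers the real size:

// !objsize printed -1086824024 (0xbf3861a8); read it back as an unsigned 32-bit value
uint realSize = unchecked((uint)-1086824024);
Console.WriteLine(realSize);                          // 3208143272
Console.WriteLine(realSize / 1024.0 / 1024 / 1024);   // ≈ 2.99 GB

So this single OraBuf[] array transitively holds about 3 GB, which lines up nicely with the ~3.3 GB of System.Byte[] we saw in !dumpheap -stat.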

3. Find the problem code

With the symptom pinned down, the next step was to decompile the Oracle SDK with ILSpy and cross-check it against the types in the reference chain.

It turns out that the m_tempOBList field is the culprit behind the memory surge, which is rather awkward. Why does it keep growing? Why is it never released? Not being familiar with Oracle internals, I could only turn to the magical StackOverflow — and sure enough, someone had hit exactly the same thing: Huge managed memory allocation when reading (iterating) data with DbDataReader

In short, the phenomenon is caused by a bug in the Oracle SDK when reading Clob-typed fields. The fix is also very simple: release the clob explicitly after use, as shown in the code at the end of this post.

4. Find the truth

Since the post says the problem occurs when reading Clob columns, let's pull up all the thread stacks and see whether any of them carries a trace of Clob handling at this moment.
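A quick way to do this in WinDbg is to run !clrstack across every thread with the ~*e qualifier and then search the combined output for "Clob":

0:000> ~*e !clrstack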

Judging from the thread stacks, the application calls a ToDataTable method that converts an IDataReader into a DataTable; while the rows are being converted, the large fields naturally go through GetCompleteClobData — a perfect match for what the post describes. To make the conclusion more solid, let's dig out how many rows the current DataReader has already read.


0:028> !clrstack -a
OS Thread Id: 0xbab0 (28)
000000e78ef7d520 00007ffd00724458 System.Data.DataTable.Load(System.Data.IDataReader, System.Data.LoadOption, System.Data.FillErrorEventHandler)
    PARAMETERS:
        this = <no data>
        reader (<CLR reg>) = 0x00000206a530ac20
        loadOption = <no data>
        errorHandler = <no data>
0:028> !do 0x00000206a530ac20
Name:        Oracle.ManagedDataAccess.Client.OracleDataReader
MethodTable: 00007ffcc7933b10
EEClass:     00007ffcc78efd30
Size:        256(0x100) bytes
File:        D:\xxx.dll
Fields:
00007ffd20d23e98  4000337       d0         System.Int32  1 instance          1061652 m_RowNumber

m_RowNumber shows that 1,061,652 rows have already been read. Reading a million-plus records in one go is not exactly common, and doing it with large fields in the mix is really something 🐂👃.
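For reference, the problematic pattern boils down to something like the following reconstruction; oracleDataReader stands in for the application's reader, and this is not the actual HIS code:

// everything is pulled into memory at once, and along the way the Oracle driver
// keeps accumulating Clob buffers in m_tempOBList without releasing them
var table = new DataTable();
table.Load(oracleDataReader);   // 1,000,000+ rows with Clob columns -> a multi-GB managed heap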

Three: Summary

Putting it all together, this incident was caused by loading millions of rows containing large fields into a DataTable in one shot. The solution is simple: iterate the DataReader with a loop yourself, and release each OracleClob immediately after it has been processed. The code from the post, for reference:


var item = oracleDataReader.GetOracleValue(columnIndex);

// the "is" pattern match already guarantees clob is non-null here
if (item is OracleClob clob)
{
    // use clob.Value ...

    clob.Close();
}
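And here is a minimal end-to-end sketch of the safer reading pattern; the SQL text, column layout and the already-open connection are placeholders rather than the original system's code:

using Oracle.ManagedDataAccess.Client;
using Oracle.ManagedDataAccess.Types;

// iterate the reader row by row instead of DataTable.Load(),
// and close each OracleClob as soon as its value has been copied out
using (var cmd = new OracleCommand("SELECT id, big_clob_column FROM some_table", connection))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        var id = reader.GetDecimal(0);

        string text = null;
        if (reader.GetOracleValue(1) is OracleClob clob)
        {
            text = clob.Value;   // copy the data out
            clob.Close();        // release the clob so its temporary buffers can be reclaimed
        }

        // process (id, text) here instead of accumulating millions of rows in a DataTable
    }
}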

More quality content: see my GitHub: dotnetfly

