One: Background

1. Tell a story

In mid-July, a friend added me on WeChat to ask for help. His program's memory kept climbing in production and showed no sign of coming back down, and he asked how to fix it. The screenshot is as follows:

After chatting with this friend, I got the impression that he is something like a small business owner in a small county town: a regular life, local resources, a web of small connections, and a taste of financial freedom. This is the way of life I have always longed for 😄😄😄.

Since my friend came to me, I have to find a way to solve the problem for him. With memory skyrocketing, my bet is on the managed layer. As usual, let WinDbg do the talking.

Two: WinDbg analysis

1. Managed or unmanaged

Readers who have been following this series know that I have used !address -summary and !eeheap -gc countless times to determine whether the memory in question belongs to the managed layer or the unmanaged layer.


0:000> !address -summary

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                393     7dfe`f2105000 ( 125.996 TB)           98.43%
MEM_RESERVE                            1691      200`0f1e4000 (   2.000 TB)  99.81%    1.56%
MEM_COMMIT                             6191        0`fed07000 (   3.981 GB)   0.19%    0.00%


0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x000001D2E572BBC8
generation 1 starts at 0x000001D2E54F70E0
generation 2 starts at 0x000001D252051000
ephemeral segment allocation context: none
         segment             begin         allocated              size
000001D252050000  000001D252051000  000001D26204FFE0  0xfffefe0(268431328)
Large object heap starts at 0x000001D262051000
         segment             begin         allocated              size
000001D262050000  000001D262051000  000001D2655F3F80  0x35a2f80(56242048)
Total Size:              Size: 0xbf4dbf80 (3209543552) bytes.
------------------------------
GC Heap Size:    Size: 0xbf4dbf80 (3209543552) bytes.

The process has 3.98 GB of committed memory (MEM_COMMIT), and the GC heap is 3209543552 bytes, roughly 3 GB. Clearly, this problem lies in the managed layer.
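
As a quick sanity check, the two hexadecimal sizes printed above can be converted and compared. The sketch below uses Python purely for convenience (it was not part of the original debugging session); the variable names are my own, and the values are copied from the MEM_COMMIT row and the GC Heap Size line.

```python
# Values copied from the debugger output above.
mem_commit = 0xfed07000   # MEM_COMMIT total size from !address -summary
gc_heap = 0xbf4dbf80      # GC Heap Size from !eeheap -gc

GIB = 1024 ** 3
managed_share = gc_heap / mem_commit

print(f"committed: {mem_commit / GIB:.3f} GB")
print(f"GC heap:   {gc_heap / GIB:.3f} GB")
print(f"managed share of committed memory: {managed_share:.1%}")
```

About three quarters of the committed memory sits on the GC heap, which is why the managed layer is the place to look.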

2. Look for large objects in the managed layer

We all know that C# is a managed language, so every object, useful or not, lives on the GC heap. In other words, the next step is to look at the GC heap and pick out the types that stand out.


0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ff98a68f090   391475     43869284 System.Int32[]
00007ff98b6adfa0  1902760     45666240 System.Collections.ObjectModel.ReadOnlyCollection`1[[System.Linq.Expressions.Expression, System.Linq.Expressions]]
00007ff98b6ac3c0  1951470     46835280 System.Linq.Expressions.ConstantExpression
00007ff98bc452e0  1681178     53797696 System.Linq.Expressions.TypedConstantExpression
00007ff98eacb6b8  1902708     60886656 System.Dynamic.Utils.ListArgumentProvider
00007ff98f236518  1774982     70999280 Microsoft.EntityFrameworkCore.Query.Expressions.ColumnExpression
00007ff98c650c58  1681142     80694816 System.Linq.Expressions.MethodCallExpression3
00007ff98a82bc38  3414094     81938256 System.RuntimeMethodHandle
00007ff98fd96fc0    17750     83936016 System.Collections.Generic.Dictionary`2+Entry[[System.Reflection.MemberInfo, System.Private.CoreLib],[System.Linq.Expressions.Expression, System.Linq.Expressions]][]
00007ff98e5ed5d8    35493    101740504 System.Collections.Generic.Dictionary`2+Entry[[Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceCacheKey, Microsoft.Extensions.DependencyInjection],[System.Object, System.Private.CoreLib]][]
00007ff98bcff6a8  3639389    116460448 System.Linq.Expressions.PropertyExpression
00007ff98b85cf00  5028347    160907104 System.Reflection.Emit.GenericFieldInfo
00007ff98a671e18  2178117    168395994 System.String
00007ff98a5b6610   160565    171498416 System.Object[]
00007ff98eaa8ab0  4981589    199263560 System.Linq.Expressions.MemberAssignment
00007ff98a672360   398740    391928469 System.Byte[]
00007ff98a746d68   181886    486150592 System.Char[] 

On the managed heap, there are as many as 4.98 million System.Linq.Expressions.MemberAssignment objects, which is clearly abnormal. Judging from the class name, they are probably related to expression trees. Next, take a few of these objects and check whether anything in their reference chain is unusually large.
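
When the !dumpheap -stat table is long, it can help to paste it into a small script and sort it by object count as well as by total size. The following is a minimal Python sketch under my own assumptions: the regular expression and the helper name parse_rows are hypothetical, and only a handful of rows from the table above are included.

```python
import re

# A few rows copied from the !dumpheap -stat output above (not the full table).
SAMPLE = """\
00007ff98b85cf00  5028347    160907104 System.Reflection.Emit.GenericFieldInfo
00007ff98a671e18  2178117    168395994 System.String
00007ff98eaa8ab0  4981589    199263560 System.Linq.Expressions.MemberAssignment
00007ff98a672360   398740    391928469 System.Byte[]
00007ff98a746d68   181886    486150592 System.Char[]
"""

# Columns: MT (16 hex digits), object count, total bytes, class name.
ROW = re.compile(r"^([0-9a-fA-F]{16})\s+(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse_rows(text):
    """Return (class name, count, total size) tuples for every table row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            _mt, count, total, name = match.groups()
            rows.append((name, int(count), int(total)))
    return rows

rows = parse_rows(SAMPLE)
by_size = sorted(rows, key=lambda r: r[2], reverse=True)
by_count = sorted(rows, key=lambda r: r[1], reverse=True)

for name, count, total in by_count:
    print(f"{count:>9}  {total / count:>8.1f} bytes each  {name}")
```

Sorted by total size, Char[] and Byte[] dominate; sorted by count, the small but extremely numerous types such as MemberAssignment stand out, which is the signal followed in this analysis.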


0:000> !gcroot 000001d25399f690
HandleTable:
    000001D251B715A8 (pinned handle)
    -> 000001D262068CF0 System.Object[]
    -> 000001D2531C3B78 Microsoft.EntityFrameworkCore.Internal.ServiceProviderCache
    -> 000001D25399E3D0 Remotion.Linq.QueryModel
    -> 000001D25399E3B8 Remotion.Linq.Clauses.SelectClause
    -> 000001D25442C068 System.Linq.Expressions.MemberInitExpression
    -> 000001D25442C050 System.Runtime.CompilerServices.TrueReadOnlyCollection`1[[System.Linq.Expressions.MemberBinding, System.Linq.Expressions]]
    -> 000001D2539A0290 System.Linq.Expressions.MemberBinding[]
    -> 000001D25399F690 System.Linq.Expressions.MemberAssignment

The reference chain is very long, so I have truncated it here. After some digging, I found that the large object is actually Remotion.Linq.Clauses.SelectClause: !objsize on it is so large that the reported size comes out negative. It is really weird, as shown below:


0:000> !objsize 000001D25399E3B8
sizeof(000001D25399E3B8) = -1187378032 (0xb93a0c90) bytes (Remotion.Linq.Clauses.SelectClause)
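
The negative number is easier to read once it is reinterpreted as an unsigned 32-bit value, which is exactly the hexadecimal figure printed next to it. A small Python sketch, assuming the size wrapped around a signed 32-bit integer at most once (plausible here, since the whole GC heap is only about 3 GB):

```python
reported = -1187378032                 # decimal value printed by !objsize
low_32_bits = reported & 0xFFFFFFFF    # reinterpret as unsigned 32-bit

print(hex(low_32_bits))
print(f"{low_32_bits} bytes = {low_32_bits / 1024 ** 3:.2f} GB")
```

Under that assumption, this single SelectClause retains close to 2.9 GB, which is nearly the entire GC heap.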

Surprisingly, this object is the culprit. Judging from the reference chain, it is a component that EF uses to build expression trees. What is certain is that something is wrong with how my friend uses EF, but I still had to bite the bullet and keep digging. I found that this object holds a large number of char[] arrays; exported, their content looks like the following.


Logistics.Text30), 
|           Text31 = string TryReadValue(t1.Outer.Outer, 42, WmsOutboundConfirmLogistics.Text31), 
|           Text32 = string TryReadValue(t1.Outer.Outer, 43, WmsOutboundConfirmLogistics.Text32), 
|           Text33 = string TryReadValue(t1.Outer.Outer, 44, WmsOutboundConfirmLogistics.Text33), 
|           Text34 = string TryReadValue(t1.Outer.Outer, 45, WmsOutboundConfirmLogistics.Text34), 
|           Text35 = string TryReadValue(t1.Outer.Outer, 46, WmsOutboundConfirmLogistics.Text35), 
|           IsQueue = Nullable<bool> TryReadValue(t1.Outer.Outer, 47, WmsOutboundConfirmLogistics.IsQueue), 
|           IsStop = Nullable<bool> TryReadValue(t1.Outer.Outer, 48, WmsOutboundConfirmLogistics.IsStop), 
|           CheckCode = string TryReadValue(t1.Outer.Outer, 49, WmsOutboundConfirmLogistics.CheckCode), 
|           ClientCode = string TryReadValue(t1.Outer.Inner, 50, WmsOutboundOrder.ClientCode), 
|           WarehouseCode = string TryReadValue(t1.Outer.Inner, 51, WmsOutboundOrder.WarehouseCode), 
|           ErpNumber = string TryReadValue(t1.Outer.Inner, 52, WmsOutboundOrder.ErpNumber), 
|           OrderCategory = string TryReadValue(t1.Outer.Inner, 53, WmsOutboundOrder.OrderCategory), 
|           OrderStatus = string TryReadValue(t1.Outer.Inner, 54, WmsOutboundOrder.OrderStatus), 
|           OrderType = string TryReadValue(t1.Outer.Inner, 55, WmsOutboundOrder.OrderType), 
|           SendCompany = string TryReadValue(t1.Outer.Inner, 56, WmsOutboundOrder.SendCompany), 
|           SendName = string TryReadValue(t1.Outer.Inner, 57, WmsOutboundOrder.SendName), 
|           SendTel = string TryReadValue(t1.Outer.Inner, 58, WmsOutboundOrder.SendTel), 
|           SendMobile = string TryReadValue(t1.Outer.Inner, 59, WmsOutboundOrder.SendMobile), 
|           SendProvince = string TryReadValue(t1.Outer.Inner, 60, WmsOutboundOrder.SendProvince), 
|           SendCity = string TryReadValue(t1.Outer.Inner, 61, WmsOutboundOrder.SendCity), 
|           SendArea = string TryReadValue(t1.Outer.Inner, 62, WmsOutboundOrder.SendArea), 
|           ...
|           CategoryName = string TryReadValue(t1.Outer.Inner, 88, WmsOutboundOrder.CategoryName), 
|           SourcePlatformCode = string TryReadValue(t1.Outer.Inner, 89, WmsOutboundOrder.SourcePlatformCode), 
|           PayMode = (string)string TryReadValue(t1.Outer.Outer, 90, null), 
|           List = List<WmsOutboundConfirmLogisticsLinesDTO> WmsOutboundConfirmLogisticsBusiness.GetOrderLines(string TryReadValue(t1.Outer.Outer, 5, WmsOutboundConfirmLogistics.OrderNumber)), 
|           ConfirmTime = DateTime TryReadValue(t1.Inner, 91, WmsOutboundOrderConfirmation.CreateTime), 
|           ReturnUrl = (string)string TryReadValue(t1.Outer.Outer, 92, null) 
|       }
|__ ), 
|__ contextType: Core.DataRepository.BaseDbContext, 
|__ logger: DiagnosticsLogger<Query>, 
|__ queryContext: Unhandled parameter: queryContext)                                                      

Judging from the content, this is the expression tree of a select statement. I asked my friend, who said it probably belongs to a report feature, but this information did not seem to help him much. To be honest, at this point I did not know how to continue the investigation, and I fell into despair.

3. Find hope from despair

I kept thinking: since EF has built such a large number of expression trees, there must be a problem, but I could not figure out what it was. After a long while, I had a flash of inspiration. If EF has built the trees, the SQL has probably been generated as well, so why not search the heap directly for select statements?


0:000> !strings /m:*select*
Address            Gen    Length   Value
000001d2e4de64e0    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4e11e78    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4e3d1f0    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4e673c8    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4e91760    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4ebb2e8    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4ee54f8    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4f10758    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...
000001d2e4f398d0    2       1964   SELECT a."Id", a."CreateTime" AS "CreateTime0", a."CreatorId", a."CreatorRealName", a."Deleted", a."OrderNumber", a."CarrierId",...

---------------------------------------
18128 matching strings

Sure enough, there are a large number of duplicate select statements, and the memory addresses in the leftmost column are very close together, which suggests they were generated by the same operation at around the same time. Next, let's export a few of the SQL statements.
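
How close the addresses are can be made concrete by computing the distance between neighbouring strings. The sketch below is in Python and uses only the nine addresses listed above; reading closeness in memory as closeness in time is a heuristic, not a guarantee.

```python
# Addresses copied from the !strings output above.
addresses = [
    0x000001d2e4de64e0,
    0x000001d2e4e11e78,
    0x000001d2e4e3d1f0,
    0x000001d2e4e673c8,
    0x000001d2e4e91760,
    0x000001d2e4ebb2e8,
    0x000001d2e4ee54f8,
    0x000001d2e4f10758,
    0x000001d2e4f398d0,
]

# Distance in bytes between each string and the next one.
gaps = [later - earlier for earlier, later in zip(addresses, addresses[1:])]

print([hex(gap) for gap in gaps])
print(f"largest gap: {max(gaps)} bytes")
```

Every neighbouring pair is well under 200 KB apart, a fairly regular stride that is consistent with the same query being built over and over in a loop.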


SELECT a."Id", ....
FROM "WmsOutboundConfirmLogistics" AS a
INNER JOIN "WmsOutboundOrder" AS b ON a."OrderNumber" = b."OrderNumber"
INNER JOIN "WmsOutboundOrderConfirmation" AS c ON a."OrderNumber" = c."OrderNumber"
WHERE (a."OrderNumber" = @__pagination_OrderNumber_0) AND (b."FreezeStatus" = FALSE)
ORDER BY a."Id";

SELECT a."Id", ....
FROM "WmsOutboundConfirmLogistics" AS a
INNER JOIN "WmsOutboundOrder" AS b ON a."OrderNumber" = b."OrderNumber"
INNER JOIN "WmsOutboundOrderConfirmation" AS c ON a."OrderNumber" = c."OrderNumber"
WHERE (a."OrderNumber" = @__pagination_OrderNumber_0) AND (b."FreezeStatus" = FALSE)
ORDER BY a."Id";

I sent these roughly 18,000 statements to my friend, who said this is the SQL for the report query.

4. Putting all the clues together

This is where the big problem shows. Since it is a report query, why are there about 18,000 copies of the same SQL, differing only in the order number bound to a."OrderNumber" = @__pagination_OrderNumber_0? Shouldn't it be a."OrderNumber" in (xxxx), or a join against the related table? Putting it all together leads to the following conjecture:


-- ideal
select * from a where a.id in (1,2,3)

-- reality
select * from a where a.id=1;
select * from a where a.id=2;
select * from a where a.id=3;
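
The difference between the two shapes can be reproduced with a throwaway in-memory database. This is only an illustrative Python sketch using the standard sqlite3 module, not the friend's EF code: the table a, its columns, and the run helper are all made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO a (id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 6)],
)
conn.commit()

statements = 0  # number of queries sent to the database

def run(sql, params=()):
    """Execute one query, count it, and return all rows."""
    global statements
    statements += 1
    return conn.execute(sql, params).fetchall()

wanted = [1, 2, 3]

# Reality: one round trip per id.
reality_rows = []
for row_id in wanted:
    reality_rows += run("SELECT id, name FROM a WHERE id = ?", (row_id,))
reality_count = statements

# Ideal: a single query with an IN list.
statements = 0
placeholders = ", ".join("?" for _ in wanted)
ideal_rows = run(
    f"SELECT id, name FROM a WHERE id IN ({placeholders}) ORDER BY id",
    wanted,
)
ideal_count = statements

print(reality_count, ideal_count, reality_rows == ideal_rows)
```

Both approaches return the same rows, but the per-id loop issues one statement per id. With about 18,000 order numbers, that means about 18,000 statements, each with its own query text and, in EF's case, its own expression tree.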

Taking the adjacent memory addresses of these SQL strings together with the off-the-charts Remotion.Linq.Clauses.SelectClause object, the whole process is probably this: what should have been a table join or an in operation turned into countless single-row SQL queries, causing explosive memory growth inside EF.

Three: Summary

After reading how my friend wrote his EF queries, my guess is that most of them hand-build expression trees to query the database. Impressive, in capital letters 🐂👃. See the picture below:

The solution is to have my friend review how the expression trees are written, or simply write the SQL by hand. To be honest, this dump took a lot of effort. I thought it would be simple, but in practice I still ran into some difficulties. Consider it growth through experience!

For more high-quality content, see my GitHub: dotnetfly


一线码农