One: Background

1. Storytelling

A friend added me on WeChat a few years ago to ask for help. He said that his program had occasional CPU spikes and he wanted to know how to track them down. The screenshot is as follows:

I suggested that he use procdump to capture two dumps in a row while the CPU was high, since two snapshots make the analysis more reliable, and he captured them as planned. Next, let's analyze them together in WinDbg.

Two: WinDbg analysis

1. Check the CPU usage

First, use !tp to check the CPU utilization recorded in each of the two dumps.


0:112> !tp
CPU utilization: 100%
Worker Thread: Total: 138 Running: 128 Idle: 10 MaxLimit: 2000 MinLimit: 400
Work Request in Queue: 17
    Unknown Function: 00007ffe1a6617d0  Context: 000001fd9bcb20c8
    ...
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 2 Free: 2 MaxFree: 16 CurrentLimit: 2 MaxLimit: 2000 MinLimit: 400


0:014> !tp
CPU utilization: 96%
Worker Thread: Total: 173 Running: 67 Idle: 106 MaxLimit: 2000 MinLimit: 400
Work Request in Queue: 1
    Unknown Function: 00007ffe1a6617d0  Context: 000001fda1a20be8
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 7 Free: 7 MaxFree: 16 CurrentLimit: 7 MaxLimit: 2000 MinLimit: 400

Sure enough, the CPU is as high as my friend said: 100% in the first dump and 96% in the second. A reasonable first guess is to check whether the spike is caused by GC.
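To compare the two snapshots side by side, the !tp text can be parsed with a few regular expressions. The following is a minimal sketch, assuming the output has been copied into strings; the function name `parse_tp` and the dictionary keys are my own choices, not part of SOS or WinDbg.

```python
import re


def parse_tp(text):
    """Extract CPU utilization and worker-thread counters from !tp output."""
    cpu = re.search(r"CPU utilization:\s*(\d+)%", text)
    worker = re.search(
        r"Worker Thread:\s*Total:\s*(\d+)\s+Running:\s*(\d+)\s+Idle:\s*(\d+)", text
    )
    queue = re.search(r"Work Request in Queue:\s*(\d+)", text)
    if not (cpu and worker and queue):
        raise ValueError("text does not look like !tp output")
    total, running, idle = (int(g) for g in worker.groups())
    return {
        "cpu": int(cpu.group(1)),
        "total": total,
        "running": running,
        "idle": idle,
        "queued": int(queue.group(1)),
    }


# The two snapshots from the article, trimmed to the lines the parser reads.
DUMP_1 = """CPU utilization: 100%
Worker Thread: Total: 138 Running: 128 Idle: 10 MaxLimit: 2000 MinLimit: 400
Work Request in Queue: 17
"""
DUMP_2 = """CPU utilization: 96%
Worker Thread: Total: 173 Running: 67 Idle: 106 MaxLimit: 2000 MinLimit: 400
Work Request in Queue: 1
"""

if __name__ == "__main__":
    for name, text in (("dump 1", DUMP_1), ("dump 2", DUMP_2)):
        s = parse_tp(text)
        print(f"{name}: cpu={s['cpu']}% running={s['running']}/{s['total']} queued={s['queued']}")
```

Read this way, the first dump has 128 of 138 worker threads running with 17 requests queued, while the second has 67 of 173 running with 1 queued, so the pressure was already easing between the two captures.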

2. Check if GC is triggered

Simply use ~*e !dumpstack to export the managed and unmanaged stacks of all threads, then search the output for GarbageCollectGeneration.
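When the exported stacks run to thousands of lines, the search can be scripted. The sketch below assumes the output was saved to a text file (for example with WinDbg's .logopen) and that each thread's section starts with an "OS Thread Id: 0x... (N)" header like the one in the !clrstack output. The helper name `threads_containing` and the sample text are made up for illustration and are not taken from the dumps.

```python
import re

# Assumption: every thread section begins with a header such as
# "OS Thread Id: 0x3278 (112)"; the number in parentheses is the debugger thread number.
THREAD_HEADER = re.compile(r"^OS Thread Id: (0x[0-9a-fA-F]+) \((\d+)\)", re.MULTILINE)


def threads_containing(log_text, keyword):
    """Return the debugger thread numbers whose stack section mentions keyword."""
    headers = list(THREAD_HEADER.finditer(log_text))
    hits = []
    for i, header in enumerate(headers):
        # A section runs from the end of its header to the start of the next header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        if keyword in log_text[header.end():end]:
            hits.append(int(header.group(2)))
    return hits


# Made-up sample in the assumed layout; thread ids and frames are illustrative only.
SAMPLE = """\
OS Thread Id: 0x1 (0)
clr!SVR::GCHeap::GarbageCollectGeneration
clr!SVR::gc_heap::try_allocate_more_space
OS Thread Id: 0x2 (1)
ntdll!NtWaitForSingleObject
OS Thread Id: 0x3 (2)
clr!SVR::GCHeap::GarbageCollectGeneration
"""

if __name__ == "__main__":
    print(threads_containing(SAMPLE, "GarbageCollectGeneration"))
```

The function takes the keyword as a parameter, so it can be reused for any other frame name of interest.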

Sure enough, a GC is being triggered. Judging from the call stack, the managed layer may be calling new at a high frequency, so that data is written to only one heap and the heaps become imbalanced. Under server-mode GC, the runtime then performs a heap balance operation to even out the multiple heaps. The next clue to chase is why there is such frantic allocation. For that we again have to look at the managed layer, using the !clrstack command.


0:112> !clrstack 
OS Thread Id: 0x3278 (112)
        Child SP               IP Call Site
000000b4ddc79098 00007ffe28b9fa74 [HelperMethodFrame: 000000b4ddc79098] 
000000b4ddc791a0 00007ffda6c229cb System.Data.Entity.ModelConfiguration.Utilities.EdmPropertyPath.System.Collections.Generic.IEnumerable<System.Data.Entity.Core.Metadata.Edm.EdmProperty>.GetEnumerator()
000000b4ddc79200 00007ffe01a179eb System.Linq.Enumerable.SequenceEqual[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)
000000b4ddc79280 00007ffda6c2297e System.Data.Entity.ModelConfiguration.Configuration.Mapping.EntityMappingConfiguration+c__DisplayClass14.b__11(System.Data.Entity.Core.Mapping.ColumnMappingBuilder)
000000b4ddc792b0 00007ffe01a13f8f System.Linq.Enumerable.SingleOrDefault[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Func`2<System.__Canon,Boolean>)
000000b4ddc79330 00007ffda6c2087c System.Data.Entity.ModelConfiguration.Configuration.Mapping.EntityMappingConfiguration.Configure(System.Data.Entity.Core.Metadata.Edm.DbDatabaseMapping, System.Data.Entity.Core.Common.DbProviderManifest, System.Data.Entity.Core.Metadata.Edm.EntityType, System.Data.Entity.Core.Mapping.StorageEntityTypeMapping ByRef, Boolean, Int32, Int32)
000000b4ddc79520 00007ffda6c20128 System.Data.Entity.ModelConfiguration.Configuration.Types.EntityTypeConfiguration.ConfigureUnconfiguredType(System.Data.Entity.Core.Metadata.Edm.DbDatabaseMapping, System.Data.Entity.Core.Common.DbProviderManifest, System.Data.Entity.Core.Metadata.Edm.EntityType)
000000b4ddc795a0 00007ffda6c1ffaf System.Data.Entity.ModelConfiguration.Configuration.Types.EntityTypeConfiguration.ConfigureTablesAndConditions(System.Data.Entity.Core.Mapping.StorageEntityTypeMapping, System.Data.Entity.Core.Metadata.Edm.DbDatabaseMapping, System.Data.Entity.Core.Common.DbProviderManifest)
000000b4ddc79620 00007ffda6c055c0 System.Data.Entity.ModelConfiguration.Configuration.ModelConfiguration.ConfigureEntityTypes(System.Data.Entity.Core.Metadata.Edm.DbDatabaseMapping, System.Data.Entity.Core.Common.DbProviderManifest)
000000b4ddc79680 00007ffda6c05474 System.Data.Entity.ModelConfiguration.Configuration.ModelConfiguration.Configure(System.Data.Entity.Core.Metadata.Edm.DbDatabaseMapping, System.Data.Entity.Core.Common.DbProviderManifest)
000000b4ddc796d0 00007ffda69ae5c2 System.Data.Entity.DbModelBuilder.Build(System.Data.Entity.Core.Common.DbProviderManifest, System.Data.Entity.Infrastructure.DbProviderInfo)
000000b4ddc79740 00007ffda6649ccf System.Data.Entity.DbModelBuilder.Build(System.Data.Common.DbConnection)
000000b4ddc79780 00007ffda7b4b2d3 System.Data.Entity.Infrastructure.EdmxWriter.WriteEdmx(System.Data.Entity.DbContext, System.Xml.XmlWriter)
000000b4ddc797c0 00007ffda7b4acbe Class125.smethod_0(System.Data.Entity.DbContext)
000000b4ddc79820 00007ffda7b4aba4 Class617.smethod_22(System.Data.Entity.DbContext)
000000b4ddc79860 00007ffda7b4aa90 Class617.smethod_27(System.Data.Entity.DbContext)
000000b4ddc798c0 00007ffda7b3e9ec DbContextExtensions.GetModel(System.Data.Entity.DbContext)
000000b4ddc79910 00007ffda7b3e49b Class124.smethod_0(System.Data.Entity.DbContext, System.String)
000000b4ddc79950 00007ffda7b3d6c3 Class486.smethod_3[[System.__Canon, mscorlib]](System.Data.Entity.DbContext, Z.BulkOperations.BulkOperation`1<System.__Canon>, System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Collections.Generic.List`1<System.Object>)
000000b4ddc79a00 00007ffda7b36871 DbContextExtensions.BulkInsert[[System.__Canon, mscorlib]](System.Data.Entity.DbContext, System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Action`1<Z.EntityFramework.Extensions.EntityBulkOperation`1<System.__Canon>>)
000000b4ddc79ab0 00007ffda7b32c65 xxx.EFRepository`1[[System.__Canon, mscorlib]].BulkInsert(System.__Canon[])
...

The stack gives roughly three pieces of information:

  1. EF is performing a bulk insert through BulkInsert.
  2. The Z.EntityFramework plugin is in use.
  3. Many frames contain the words Build and Configure, which suggests that some kind of configuration or model building is going on.

3. Is it caused by inserting too much data?

The first thing that comes to mind is that the list being inserted is too large. To verify this, use !clrstack -a to dump the parameter passed to the BulkInsert method.


0:112> !clrstack -a
OS Thread Id: 0x3278 (112)
        Child SP               IP Call Site
000000b4ddc79b90 00007ffda7b31ee8 xxx.BLL.BaseService`1[[System.__Canon, mscorlib]].BulkInsert(System.__Canon[])
    PARAMETERS:
        this (0x000000b4ddc79d10) = 0x000001fa14bbb630
        _tArr (0x000000b4ddc79d18) = 0x000001fa14c1a2f8

0:112> !do 0x000001fa14c1a2f8
Name:        xxx.EntityModel.xxx[]
MethodTable: 00007ffda9437968
EEClass:     00007ffe02f556b0
Size:        56(0x38) bytes
Array:       Rank 1, Number of elements 4, Type CLASS (Print Array)
Fields:
None

Judging from the output, the array holds only 4 elements, which is puzzling. Since the heaps are being balanced, could several threads be inserting at the same time? To verify, search all managed thread stacks for the keyword DbContextExtensions.BulkInsert.

The search shows 10 threads running BulkInsert at the same time. Looking at their lists in turn, none of them is large. I was confused.

4. Pre-judgment of the problem

With these findings, the reason the GC is triggered still seems strange, but it is certain that the problem lies in the Z.EntityFramework plugin. Following the traditional approach of the 4S store (car dealership), replacing the part rather than repairing it would be fine. However, since I am not familiar with Z.EntityFramework, I could only pass these findings on to my friend.

Strangely enough, my friend noticed something odd the next day: every time the CPU spiked, w3wp had restarted abnormally. After the restart, because Z.EntityFramework needs to warm up, the requests that followed drove the CPU high for a period of time.

From my friend's message and the dump analysis above, the problem can basically be located. The Build, Configure and heap balance operations are all explained. It also turns out that this so-called warm-up is not serialized: 10 threads warm up together, and the CPU drops only once the warm-up ends.
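To make "not serialized" concrete, here is a language-neutral sketch in Python of what a serialized, run-once warm-up looks like: the first caller does the expensive work while the others wait and then reuse the result. It only illustrates the idea. `WarmupOnce`, `build_model` and `run_requests` are hypothetical names, and how the warm-up would actually be serialized in the real application depends on Z.EntityFramework internals that the dumps do not show.

```python
import threading
import time


class WarmupOnce:
    """Run an expensive warm-up at most once, even with concurrent callers."""

    def __init__(self, build):
        self._build = build            # the expensive warm-up function
        self._lock = threading.Lock()
        self._value = None
        self._ready = False

    def get(self):
        if self._ready:                # fast path once the warm-up has finished
            return self._value
        with self._lock:               # only one thread may run the warm-up
            if not self._ready:        # re-check: another thread may have finished it
                self._value = self._build()
                self._ready = True
        return self._value


def run_requests(n):
    """Simulate n concurrent requests; return their results and the warm-up count."""
    calls = []                         # one entry per actual warm-up run

    def build_model():
        # Hypothetical stand-in for the expensive Build/Configure work.
        calls.append(1)
        time.sleep(0.05)
        return "model"

    cache = WarmupOnce(build_model)
    results = []

    def worker():
        results.append(cache.get())

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, len(calls)


if __name__ == "__main__":
    results, warmups = run_requests(10)
    print(f"{len(results)} requests served, warm-up ran {warmups} time(s)")
```

Without the lock, each of the 10 simulated requests would run its own copy of the warm-up, which is the pattern the dumps suggest.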

Three: Summary

In short, this high-CPU incident was caused by the unexpected restart of the w3wp process. The restart led multiple threads to warm up Z.EntityFramework concurrently, which in turn triggered multiple GCs during the warm-up. Why w3wp terminated unexpectedly is another topic, but the good news is that my friend found the problematic code in a crash dump captured a few days later.


一线码农