One: Background
1. Tell a story
I have written up quite a few real-world cases of memory skyrocketing lately, which is getting a bit numbing, so let me change the flavor and share a case of a CPU spike. Some time ago a friend reached out to me on WeChat and said that one of his old projects kept receiving alerts that
CPU > 90%
which was quite embarrassing.
Since he came to me, there was nothing for it but to take the dump and analyze it with windbg; what else could I do?
Two: windbg analysis
1. Examining the scene
Since the alert says CPU > 90%, let me first verify whether that is really the case.
0:359> !tp
CPU utilization: 100%
Worker Thread: Total: 514 Running: 514 Idle: 0 MaxLimit: 2400 MinLimit: 32
Work Request in Queue: 1
Unknown Function: 00007ff874d623fc Context: 0000003261e06e40
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 2 Free: 2 MaxFree: 48 CurrentLimit: 2 MaxLimit: 2400 MinLimit: 32
From this reading it is quite a spectacle: the CPU is pegged at 100%, and all 514 threads in the thread pool are running flat out. So what are they all busy doing? My first suspicion is that these threads are blocked on some lock.
2. View the synchronization block table
To inspect the lock situation, start with the synchronization block table; after all, everyone likes to use lock for multi-threaded synchronization. It can be viewed with the !syncblk command.
0:359> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
53 000000324cafdf68 498 0 0000000000000000 none 0000002e1a2949b0 System.Object
-----------------------------
Total 1025
CCW 3
RCW 4
ComClassFactory 0
Free 620
Well, this reading looks strange: what on earth is MonitorHeld=498? The textbooks say the owner counts +1 and each waiter counts +2, so with an owner present the number you see should always be odd. What does an even number mean, then? After consulting the magical StackOverflow, it boils down to the following two situations:
- Memory corruption
Hitting this is harder than winning the lottery, and I firmly believe that kind of luck will not land on me...
- Lock convoy
Some time ago I shared a real-world case, "A recorded analysis of a .NET CPU spike at a travel agency website", which was also a CPU spike caused by a lock convoy. What a small world, I've run into it again... To make it easier to follow, I'll post the picture.
Once you have seen this picture you should get the idea: threads compete for the lock so frequently within their time slices that it is easy to catch a moment when the thread holding the lock has just released it and none of the waiting threads has actually acquired it yet; the dump happened to be captured in exactly that gap. In other words, the current 498 consists entirely of waiters, i.e. 498 / 2 = 249 waiting threads. That is easy to verify: dump out the stacks of all threads and search for the Monitor.Enter keyword.
As the figure shows, 220 threads are currently stuck at Monitor.Enter, so 29 seem to be unaccounted for; either way, a large number of threads are blocked on the lock. Judging from the stacks, they get stuck after setting the context in xxx.Global.PreProcess. To satisfy my curiosity, let's export the problem code.
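As an aside, before we look at the actual problem code, here is a minimal, self-contained sketch of the access pattern that produces a lock convoy. This is my own illustration, not code from the dump: many threads grab the same lock over and over for a trivially small piece of work, so the CPU time goes into contention and context switching rather than the work itself.
using System;
using System.Threading.Tasks;

class LockConvoySketch
{
    private static readonly object _sync = new object();
    private static long _counter;

    static void Main()
    {
        var tasks = new Task[64];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                for (int j = 0; j < 1000000; j++)
                {
                    // every iteration fights for the same lock...
                    lock (_sync)
                    {
                        _counter++; // ...to protect a trivially small critical section
                    }
                }
            });
        }
        Task.WaitAll(tasks);
        Console.WriteLine(_counter);
    }
}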
3. View the problem code
Still the old combination of commands: !ip2md + !savemodule.
0:359> !ip2md 00007ff81ae98854
MethodDesc: 00007ff819649fa0
Method Name: xxx.Global.PreProcess(xxx.JsonRequest, System.Object)
Class: 00007ff81966bdf8
MethodTable: 00007ff81964a078
mdToken: 0000000006000051
Module: 00007ff819649768
IsJitted: yes
CodeAddr: 00007ff81ae98430
Transparency: Critical
0:359> !savemodule 00007ff819649768 E:\dumps\PreProcess.dll
3 sections in file
section 0 - VA=2000, VASize=b6dc, FileAddr=200, FileSize=b800
section 1 - VA=e000, VASize=3d0, FileAddr=ba00, FileSize=400
section 2 - VA=10000, VASize=c, FileAddr=be00, FileSize=200
Then open the saved module with ILSpy; the screenshot of the problem code is as follows:
Sure enough, every DataContext.SetContextItem() call takes a lock, which perfectly matches the lock convoy pattern.
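The screenshot itself is not reproduced here, but the shape of the code is roughly the following hypothetical reconstruction; the field names and the dictionary are my assumptions, and only the SetContextItem-takes-a-lock pattern comes from the dump:
using System.Collections.Generic;

public class DataContext
{
    // a single process-wide lock guarding the shared context items (assumed)
    private static readonly object _locker = new object();
    private static readonly Dictionary<string, object> _items = new Dictionary<string, object>();

    public void SetContextItem(string key, object value)
    {
        // every request serializes on this one lock just to write a single entry,
        // which is exactly the fine-grained, high-frequency locking that breeds a lock convoy
        lock (_locker)
        {
            _items[key] = value;
        }
    }
}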
4. Is that really the end of it?
I was about to write up the report, but since I already had more than 500 thread stacks dumped out and some time on my hands, I gave them another scan. Unexpectedly, I found 134 threads stuck at ReaderWriterLockSlim.TryEnterReadLockCore, as shown in the following figure:
As the name suggests, this is the slimmed-down, optimized version of the reader-writer lock: ReaderWriterLockSlim. Why are 134 threads stuck here? Curious again, I exported this problem code as well.
internal class LocalMemoryCache : ICache
{
    private string CACHE_LOCKER_PREFIX = "xx_xx_";

    private static readonly NamedReaderWriterLocker _namedRwlocker = new NamedReaderWriterLocker();

    public T GetWithCache<T>(string cacheKey, Func<T> getter, int cacheTimeSecond, bool absoluteExpiration = true) where T : class
    {
        T val = null;
        // one ReaderWriterLockSlim per cache key
        ReaderWriterLockSlim @lock = _namedRwlocker.GetLock(cacheKey);
        try
        {
            @lock.EnterReadLock();
            val = (MemoryCache.Default.Get(cacheKey) as T);
            if (val != null)
            {
                return val;
            }
        }
        finally
        {
            @lock.ExitReadLock();
        }
        try
        {
            @lock.EnterWriteLock();
            // re-check under the write lock before building the value
            val = (MemoryCache.Default.Get(cacheKey) as T);
            if (val != null)
            {
                return val;
            }
            val = getter();
            CacheItemPolicy cacheItemPolicy = new CacheItemPolicy();
            if (absoluteExpiration)
            {
                cacheItemPolicy.AbsoluteExpiration = new DateTimeOffset(DateTime.Now.AddSeconds(cacheTimeSecond));
            }
            else
            {
                cacheItemPolicy.SlidingExpiration = TimeSpan.FromSeconds(cacheTimeSecond);
            }
            if (val != null)
            {
                MemoryCache.Default.Set(cacheKey, val, cacheItemPolicy);
            }
            return val;
        }
        finally
        {
            @lock.ExitWriteLock();
        }
    }
}
Reading the above code, the intent is apparently to implement a GetOrAdd operation on top of MemoryCache, and for safety's sake each cacheKey is given its own ReaderWriterLockSlim. The logic is a bit odd, because MemoryCache itself already provides thread-safe methods that implement exactly this, for example:
public class MemoryCache : ObjectCache, IEnumerable, IDisposable
{
    public override object AddOrGetExisting(string key, object value, DateTimeOffset absoluteExpiration, string regionName = null)
    {
        if (regionName != null)
        {
            throw new NotSupportedException(R.RegionName_not_supported);
        }
        CacheItemPolicy cacheItemPolicy = new CacheItemPolicy();
        cacheItemPolicy.AbsoluteExpiration = absoluteExpiration;
        return AddOrGetExistingInternal(key, value, cacheItemPolicy);
    }
}
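One thing worth knowing when using it as a GetOrAdd: AddOrGetExisting returns the value that was already cached, or null when the value you passed in was the one inserted. A quick illustration with made-up key and values:
// requires: using System; using System.Runtime.Caching;
object first = MemoryCache.Default.AddOrGetExisting("demo_key", "v1", DateTimeOffset.Now.AddMinutes(5));  // null: "v1" was just inserted
object second = MemoryCache.Default.AddOrGetExisting("demo_key", "v2", DateTimeOffset.Now.AddMinutes(5)); // "v1": an entry already existed, "v2" is discarded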
5. Is there any problem with ReaderWriterLockSlim?
Haha, I'm sure many friends will ask exactly that 😅😅😅. Indeed, what could be wrong with it? First, how many ReaderWriterLockSlim instances does the _namedRwlocker collection currently hold? That is easy to verify: just search the managed heap.
0:359> !dumpheap -type System.Threading.ReaderWriterLockSlim -stat
Statistics:
MT Count TotalSize Class Name
00007ff8741631e8 70234 6742464 System.Threading.ReaderWriterLockSlim
You can see that there are currently 70,000+ ReaderWriterLockSlim instances on the managed heap. So what? Don't forget that ReaderWriterLockSlim carries the word Slim precisely because it spins in user mode before blocking, and that spinning burns a bit of CPU. Amplify that a few hundred times over, and how could the CPU not shoot up?
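The NamedReaderWriterLocker class was not exported above, but judging from how it is used it is presumably something along the lines of the sketch below (the internals and names are my assumptions): one lock created on demand per cache key and never discarded, which is exactly how 70,000+ instances pile up on the heap.
using System.Collections.Concurrent;
using System.Threading;

internal class NamedReaderWriterLocker
{
    private readonly ConcurrentDictionary<string, ReaderWriterLockSlim> _locks =
        new ConcurrentDictionary<string, ReaderWriterLockSlim>();

    public ReaderWriterLockSlim GetLock(string name)
    {
        // one ReaderWriterLockSlim per distinct cache key, kept forever;
        // every Enter*Lock call on a contended lock first spins in user mode before parking the thread
        return _locks.GetOrAdd(name, _ => new ReaderWriterLockSlim());
    }
}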
Three: Summary
Overall, this dump points to two reasons why the CPU is maxed out.
- The frequent contention and context switching caused by the lock convoy gave the CPU one crit.
- The user-mode spinning of ReaderWriterLockSlim, amplified a few hundred times over, gave the CPU another crit.
Once the cause is known, the remedies are straightforward.
- Batch the work so the serialized lock is taken far less often, instead of contending on it for every tiny write.
- Drop the per-key ReaderWriterLockSlim and rely on the thread-safe methods MemoryCache already provides, as sketched below.
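For the second point, here is a minimal sketch of what a simplified GetWithCache could look like on top of MemoryCache alone; it assumes absolute expiration only and accepts that the getter may run more than once under a race, which is usually fine for a cache.
using System;
using System.Runtime.Caching;

internal class LocalMemoryCache
{
    public T GetWithCache<T>(string cacheKey, Func<T> getter, int cacheTimeSecond) where T : class
    {
        // fast path: the entry is already cached
        T cached = MemoryCache.Default.Get(cacheKey) as T;
        if (cached != null)
        {
            return cached;
        }

        T created = getter();
        if (created == null)
        {
            return null;
        }

        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddSeconds(cacheTimeSecond)
        };

        // AddOrGetExisting is thread-safe: it returns the value another thread
        // cached first, or null when our value was the one inserted
        T existing = MemoryCache.Default.AddOrGetExisting(cacheKey, created, policy) as T;
        return existing ?? created;
    }
}
If a single execution of the getter must be guaranteed, the usual trick is to cache a Lazy<T> wrapper instead of the raw value.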
For more quality content, see my GitHub: dotnetfly