One: Background
1. Storytelling
At the beginning of this month, a friend found me on wx and said that after a period of time, his api would only have requests but no responses. The screenshots are as follows:
From the description of my friend, it seems that the program is stuck by something. This kind of stuck problem is relatively simple to solve. Next, I will use windbg to analyze it for you.
Two: Windbg analysis
1. Request What is the request doing?
Since my friend said that the api has a request but no response, how can I verify my friend's words, right? We all know that .NET uses HttpContext
to represent a request. The implication is that we can grab the duration attribute under HttpContext. There is a !whttp
command in Netext that can help us.
0:000> !whttp
HttpContext Thread Time Out Running Status Verb Url
000000563bf803b0 42 00:01:50 00:01:24 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x-HN
000000563bf84660 -- 00:01:50 Finished 200 GET http://xxx.com:30003/
000000563c4a0470 51 00:01:50 00:00:12 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2C
00000056bbf63590 30 00:01:50 00:02:41 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
00000056bc82a038 -- 00:01:50 Finished 200 GET http://localhost:30003/
00000056bc84a3e8 44 00:01:50 00:00:51 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
00000056bc8671c8 46 00:01:50 00:00:45 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
000000573bf44698 35 00:01:50 00:02:39 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
000000573bf483c0 33 00:01:50 00:02:41 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x-HN
000000573bf97e80 40 00:01:50 00:02:32 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=ZJB2C
000000573c583b08 -- 00:01:50 Finished 200 GET http://localhost:30003/
000000573c589ec8 -- 00:01:50 Finished 200 GET http://xxx.com:30003/Wms/xxx/xxx/xxx
000000573c760e28 -- 00:01:50 Finished 200 POST http://xxx.com:30003/Wms/xxx/xxx/xxx
000000573c95f990 48 00:01:50 00:00:31 200 POST http://xxx.com:30003/Wms/Common/xxx?xxx=xxx&xxx=x-HN
00000057bbf4f8e8 31 00:01:50 00:02:12 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
00000057bc080340 50 00:01:50 00:00:19 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
000000583c4aee80 43 00:01:50 00:01:11 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
000000583c4d0c50 53 00:01:50 00:00:01 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
00000058bbf8f1a0 34 00:01:50 00:02:22 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2B
000000593bfe1758 41 00:01:50 00:01:22 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx2C
000000593c892160 -- 00:01:50 Finished 200 GET http://xxx.com:30003/Wms/xxx/xxx/xxxJob
000000593ca813b0 45 00:01:50 00:00:30 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-HN
000000593caa45d8 -- 00:01:50 Finished 200 GET http://xxx.com:30003/
00000059bc1ad808 32 00:01:50 00:01:45 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=xxx-B2C
00000059bc1c3d70 36 00:01:50 00:01:29 200 POST http://xxx.com:30003/Wms/xxx/xxx?xxx=xxx&xxx=x
25 HttpContext object(s) found matching criteria
From the Running
column, it can be seen that most of the requests have reached more than 1 minute, which also verifies the stuck problem mentioned by friends. According to experience, you can take Running
column is located, that is, the above 30
and 33
. Threads, what are they doing?
2. Explore the longest running thread
Next, cut to 30
and 33
and look at their thread stacks.
0:000> ~30s
ntdll!NtWaitForSingleObject+0xa:
00007ffd`b81f024a c3 ret
0:030> !clrstack
OS Thread Id: 0x29d0 (30)
Child SP IP Call Site
0000005acc3ac590 00007ffdb81f024a [PrestubMethodFrame: 0000005acc3ac590] xxx.xxx.RedisConnectionHelp.get_Instance()
0000005acc3ac850 00007ffd4dd78911 xxx.xxx.RedisCache..ctor(Int32, System.String)
0000005acc3ac8c0 00007ffd4dd78038 xxx.xxx.CacheByRedis.HashGet[[System.__Canon, mscorlib]](System.String, System.String, Int32)
0000005acc3ac968 00007ffdabef1f7c [StubHelperFrame: 0000005acc3ac968]
0000005acc3ac9c0 00007ffd4dd77f18 xxx.xxx.Cache.xxx.GetCacheNotAreaDataEntity[[System.__Canon, mscorlib]](System.String, System.String, System.String)
...
0:030> ~33s
ntdll!NtWaitForMultipleObjects+0xa:
00007ffd`b81f07ba c3 ret
0:033> !clrstack
OS Thread Id: 0x3ad4 (33)
Child SP IP Call Site
0000005accabae90 00007ffdb81f07ba [GCFrame: 0000005accabae90]
0000005accabafb8 00007ffdb81f07ba [HelperMethodFrame_1OBJ: 0000005accabafb8] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
0000005accabb0d0 00007ffdaac60d64 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
0000005accabb160 00007ffdaac5b4bb System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
0000005accabb1d0 00007ffdab5a01d1 System.Threading.Tasks.Task.InternalWait(Int32, System.Threading.CancellationToken)
0000005accabb2a0 00007ffdab59cfa7 System.Threading.Tasks.Task`1[[System.__Canon, mscorlib]].GetResultxxx(Boolean)
0000005accabb2e0 00007ffd4d8d338f xxx.Config.xxx.Config`1[[System.__Canon, mscorlib]].GetConfig(xxx.Config.Model.ConfigListener, System.Func`2<xxx.Config.Request.GetConfigRequest,System.Threading.Tasks.Task`1<System.String>>)
0000005accabb340 00007ffd4d8d2f40 xxx.Config.xxx.Config`1[[System.__Canon, mscorlib]].get_Item(System.String, System.String)
0000005accabb3c0 00007ffd4dd78f7f xxx.Util.BaseConfig.get_GetRedisConn()
0000005accabb440 00007ffd4dd78e9c xxx.xxx.RedisConnectionHelp.GetConnectionString()
0000005accabb4a0 00007ffd4dd789cb xxx.xxx.RedisConnectionHelp..cctor()
0000005accabb940 00007ffdabef6953 [GCFrame: 0000005accabb940]
0000005accabc5b0 00007ffdabef6953 [PrestubMethodFrame: 0000005accabc5b0] xxx.xxx.RedisConnectionHelp.get_Instance()
0000005accabc870 00007ffd4dd78911 xxx.xxx.RedisCache..ctor(Int32, System.String)
0000005accabc8e0 00007ffd4dd78038 xxx.xxx.CacheByRedis.HashGet[[System.__Canon, mscorlib]](System.String, System.String, Int32)
0000005accabc988 00007ffdabef1f7c [StubHelperFrame: 0000005accabc988]
0000005accabc9e0 00007ffd4dd77f18 xxx.Core.Cache.xxx.GetCacheNotAreaDataEntity[[System.__Canon, mscorlib]](System.String, System.String, System.String)
...
The above information is not difficult to find 30
number of threads are stuck in RedisConnectionHelp.get_Instance()
at 33
Line has entered RedisConnectionHelp.get_Instance()
method, and finally GetConfig()
waiting for the results Result at the press of thumb is, 30
number of threads are evidently waiting for the lock, 33
No. Waiting for the asynchronous result, the next breakthrough point is to explore the code at RedisConnectionHelp.Instance
3. Find the problem code
Next, use the decompilation tool ILSpy to find the problem code.
public static class RedisConnectionHelp
{
public static ConnectionMultiplexer Instance
{
get
{
if (_instance == null)
{
lock (Locker)
{
if (_instance == null || !_instance.IsConnected)
{
_instance = GetManager();
}
}
}
return _instance;
}
}
}
Thread No. 30 is really stuck at Locker
. Next, dig into the GetManager()
method executed by No. 33 thread. The simplified code is as follows:
public T this[string dataId, string key = ""]
{
get
{
try
{
string config = GetConfig(configListener, new NacosConfigClient(Base.NacosConfiguration).DoGetConfigAsync);
return JsonConvert.DeserializeObject<T>(config);
}
catch (Exception ex)
{
return default(T);
}
}
}
private string GetConfig(ConfigListener listener, Func<GetConfigRequest, Task<string>> action)
{
var text2 = action(new GetConfigRequest
{
DataId = listener.DataId,
Group = listener.Group,
Tenant = text
}).Result;
return text2;
}
internal async Task<string> DoGetConfigAsync(GetConfigRequest request)
{
IRestResponse restResponse = await HttpUtil.Request(currentServerAddr, Method.GET, request.ParamValues(), xxx);
return restResponse.Content;
}
It can be seen that the code is stuck on Result
and waits indefinitely. Here I think of the synchronization context. I think his program is
.NET 4.8
under ASP.NET MVC
. I remember that the good context should be AspNetSynchronizationContext
. Article: A Task.Result deadlocks, how can I write this code? , there are about four solutions.
- Use .ConfigureAwait(false)
- change to fully asynchronous
- Wrap another layer with Task.
- change to full sync
Three: Summary
In fact, this accident is mainly due asynchronous code.Result
synchronous code. There are many articles criticizing this phenomenon, and this
pit of synchronization context, the suggestion to friends is to change to 161e777b911241 full synchronization , and the deadlock problem also disappears.
Haha, so happy for my friends!
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。