内存泄漏Bug现场
一个做BI
数据展示的服务在一个晚上重启了5次,由于是通过k8s容器编排,服务挂了以后会自动重启,所以服务还能继续提供服务。
第一时间先上日志系统查看错误日志,发现如下报错:
java.lang.OutOfMemoryError ERROR java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.start(CloseableHttpAsyncClientBase.java:83)
at org.elasticsearch.client.RestClientBuilder.build(RestClientBuilder.java:190)
at com.xiaohongshu.fls.jbds.handler.JbdsImpl.get_realtime_trend_data_from_es(JbdsImpl.java:637)
at com.xiaohongshu.fls.jbds.handler.JbdsImpl.get_seller_trend(JbdsImpl.java:316)
at com.xiaohongshu.fls.jbds.handler.JbdsImpl$$FastClassBySpringCGLIB$$bd2466f7.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:738)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:85)
at com.xiaohongshu.fls.jbds.aspect.RpcMethodExceptionAspect.requestControllerLog(RpcMethodExceptionAspect.java:33)
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:629)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:618)
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673)
at com.xiaohongshu.fls.jbds.handler.JbdsImpl$$EnhancerBySpringCGLIB$$d374b954.get_seller_trend(<generated>)
at sun.reflect.GeneratedMethodAccessor81.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.xiaohongshu.infra.rpc.core.ThriftServerBaseProxy.call(ThriftServerBaseProxy.java:119)
at com.xiaohongshu.infra.rpc.core.ThriftServerCGlibProxy.intercept(ThriftServerCGlibProxy.java:27)
at com.xiaohongshu.fls.jbds.handler.JbdsImpl$$EnhancerByCGLIB$$19473b8f.get_seller_trend(<generated>)
at com.xiaohongshu.fls.rpc.jbds.JBusinessDataService$Processor$get_seller_trend.getResult(JBusinessDataService.java:1450)
at com.xiaohongshu.fls.rpc.jbds.JBusinessDataService$Processor$get_seller_trend.getResult(JBusinessDataService.java:1435)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
at org.apache.thrift.server.Invocation.run(Invocation.java:18)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
发现是OOM
的错误,并且有unable to create new native thread
的错误信息,笔者的第一直觉是创建了大量线程从而导致堆外内存空间不足。
随即去CAT
监控系统上去查看线程的活跃情况,如下图:
发现服务的活跃线程数达到了1W左右,这显然有问题。服务本身是一个Thrift Server
,开起的worker
线程数为200,再加上一些其他IO
线程,总的线程数不会超过300(服务内自己没有显示创建线程或者使用线程池)。
查找线索
笔者登录到线上的Docker实例上,通过jmap -histo:live pid
命令,查看JVM中存活的对象,输出如下内容:
num #instances #bytes class name
----------------------------------------------
1: 19303 656995672 [B
2: 116670 13942144 [C
3: 15645 5298648 [I
4: 10098 3796848 java.lang.Thread
5: 75817 3639216 java.util.HashMap
6: 113889 2733336 java.lang.String
7: 27521 2421848 java.lang.reflect.Method
8: 68662 2197184 java.util.concurrent.ConcurrentHashMap$Node
9: 10041 1767032 [J
10: 372 1525568 [Ljava.nio.ByteBuffer;
11: 36807 1472280 java.util.LinkedHashMap$Entry
12: 16504 1352488 [Ljava.util.HashMap$Node;
13: 40687 1301984 java.lang.ThreadLocal$ThreadLocalMap$Entry
14: 11495 1277344 java.lang.Class
15: 29903 1196120 java.util.WeakHashMap$Entry
16: 70474 1127584 java.lang.Object
17: 15346 918592 [Ljava.lang.Object;
18: 51556 824896 java.util.HashSet
19: 25528 816896 java.util.HashMap$Node
20: 9989 812176 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
21: 9661 792032 [Ljava.util.WeakHashMap$Entry;
22: 9504 760320 org.apache.http.impl.nio.reactor.BaseIOReactor
23: 30551 733224 java.util.concurrent.ConcurrentLinkedQueue$Node
24: 9914 713808 sun.nio.ch.EPollArrayWrapper
25: 9914 713808 sun.nio.ch.EPollSelectorImpl
26: 29682 712368 java.util.concurrent.ConcurrentLinkedQueue
27: 3525 694672 [Ljava.util.concurrent.ConcurrentHashMap$Node;
28: 15664 501248 java.lang.ref.WeakReference
29: 8364 468384 java.util.LinkedHashMap
30: 9656 463488 java.util.WeakHashMap
31: 20064 436536 [Ljava.lang.Class;
32: 9267 370680 java.security.AccessControlContext
33: 4856 349632 java.lang.reflect.Field
34: 10278 328896 java.lang.ref.ReferenceQueue
35: 9914 317248 sun.nio.ch.AllocatedNativeObject
36: 4955 317120 java.util.concurrent.ConcurrentHashMap
37: 7597 303880 java.lang.ref.SoftReference
38: 9245 293568 [Ljava.security.ProtectionDomain;
39: 11484 275616 java.util.ArrayList
40: 3330 239760 org.springframework.core.annotation.AnnotationAttributes
41: 9989 239736 java.lang.ThreadLocal$ThreadLocalMap
42: 9951 238824 java.util.Collections$SynchronizedSet
43: 9922 238128 java.util.BitSet
44: 9504 228096 org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker
45: 5460 207944 [Ljava.lang.String;
46: 2459 196720 java.lang.reflect.Constructor
47: 12103 193648 java.util.HashMap$KeySet
48: 8003 192072 java.beans.MethodRef
49: 3657 175536 org.aspectj.weaver.reflect.ShadowMatchImpl
50: 10339 165424 java.util.concurrent.atomic.AtomicBoolean
51: 10281 164496 java.lang.ref.ReferenceQueue$Lock
52: 10203 163248 java.util.Collections$UnmodifiableSet
53: 2913 163128 java.beans.MethodDescriptor
54: 2547 161536 [Ljava.lang.reflect.Method;
55: 9914 158624 java.nio.channels.spi.AbstractSelector$1
56: 9913 158608 sun.nio.ch.Util$3
57: 5481 131544 org.springframework.core.MethodClassKey
58: 3707 118624 java.util.LinkedList
59: 3637 116384 org.aspectj.weaver.patterns.ExposedState
60: 2047 114632 java.lang.Class$ReflectionData
61: 1155 110880 org.springframework.beans.GenericTypeAwarePropertyDescriptor
62: 399 109984 [Ljava.lang.Thread;
63: 4387 105288 sun.reflect.generics.tree.SimpleClassTypeSignature
64: 1388 99936 java.beans.PropertyDescriptor
65: 4087 90560 [Ljava.lang.reflect.Type;
66: 5361 85776 org.springframework.core.annotation.AnnotationUtils$DefaultValueHolder
67: 1296 82944 io.netty.buffer.PoolSubpage
68: 4387 82520 [Lsun.reflect.generics.tree.TypeArgument;
69: 674 75488 org.springframework.boot.loader.jar.JarEntry
70: 1848 73920 java.util.TreeMap$Entry
71: 3061 73464 sun.reflect.annotation.AnnotationInvocationHandler
72: 2737 65688 java.util.LinkedList$Node
73: 3989 63824 sun.reflect.generics.tree.ClassTypeSignature
74: 858 61776 org.apache.ibatis.mapping.ResultMapping
75: 832 53248 org.springframework.core.MethodParameter
76: 78 51168 io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
77: 2120 50880 java.util.Collections$UnmodifiableRandomAccessList
78: 3165 50640 java.util.LinkedHashMap$LinkedKeySet
79: 2419 49728 [Lsun.reflect.generics.tree.FieldTypeSignature;
标红的这个类org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker
非常值得怀疑,有9504个实例,从类名看是一个实现了Reactor
模式的http
客户端,了解Reactor
模式的同学都知道,其包含了一个accept
线程处理连接事件,N
个IO bounding
的线程处理read
、write
事件,和M
个worker
线程处理业务逻辑。
进一步查找
接下来需要查找哪里使用了这个http客户端,在build.gradle的文件里看到依赖了ES
的http client
group: 'org.elasticsearch.client', name: 'elasticsearch-rest-client', version: '6.3.2'
通过全局搜索,发现了如下的代码:
private String get_realtime_data_from_es (String seller_id) throws IOException{
Date current_date = new Date( );
SimpleDateFormat sdf = new SimpleDateFormat ("yyyyMMdd");
String date=sdf.format(current_date);
RestClient restClient = RestClient.builder(
new HttpHost("xxxxxxxxx", 9200, "http"),
new HttpHost("xxxxxxxxx", 9201, "http")).build();
org.elasticsearch.client.Response response = restClient.performRequest("GET", "/seller/"+date+"/"+seller_id);
restClient.close();
return EntityUtils.toString(response.getEntity());
}
笔者发现这里调用ES
的RestClient
的非常可疑,居然是每次调用创建一个对象,而不是使用单例模式。
使用过RestTemplate
或者HttpClient
的同学都知道,Http
客户端一般都是通过单例的方式注入到Spring
容器中,客户端都是线程安全的,多线程情况下不会出现串包的情况。(具体线程安全的实现机制,可参考笔者之前的博客:《Http请求连接池-HttpClient的AbstractConnPool源码分析》,地址:https://segmentfault.com/a/11...)
源码探究
直接进到RestClientBuilder
类的build()
方法中一探究竟,代码如下:
public RestClient build() {
if (failureListener == null) {
failureListener = new RestClient.FailureListener();
}
CloseableHttpAsyncClient httpClient = AccessController.doPrivileged(new PrivilegedAction<CloseableHttpAsyncClient>() {
@Override
public CloseableHttpAsyncClient run() {
return createHttpClient();
}
});
RestClient restClient = new RestClient(httpClient, maxRetryTimeout, defaultHeaders, hosts, pathPrefix, failureListener);
httpClient.start();
return restClient;
}
build
方法是实际创建RestClient
的地方,设置了超时时间、host等参数。再点进start
方法,他是抽象类CloseableHttpAsyncClient
的一个抽象方法,具体实现在实现类CloseableHttpAsyncClientBase
中,如下:
@Override
public void start() {
if (this.status.compareAndSet(Status.INACTIVE, Status.ACTIVE)) {
if (this.reactorThread != null) {
this.reactorThread.start();
}
}
}
其调用了成员变量reactorThread
的start
方法,而成员变量reactorThread
是一个Thread类,它在构造方法中初始化,如下:
public CloseableHttpAsyncClientBase(
final NHttpClientConnectionManager connmgr,
final ThreadFactory threadFactory,
final NHttpClientEventHandler handler) {
super();
this.connmgr = connmgr;
if (threadFactory != null && handler != null) {
this.reactorThread = threadFactory.newThread(new Runnable() {
@Override
public void run() {
try {
final IOEventDispatch ioEventDispatch = new InternalIODispatch(handler);
connmgr.execute(ioEventDispatch);
} catch (final Exception ex) {
log.error("I/O reactor terminated abnormally", ex);
} finally {
status.set(Status.STOPPED);
}
}
});
} else {
this.reactorThread = null;
}
this.status = new AtomicReference<Status>(Status.INACTIVE);
}
可见,每创建一个CloseableHttpAsyncClient
对象,就会创建一个reactorThread
线程,而connmgr.execute(ioEventDispatch)
是一个永久for循环执行的方法,所以线程的run方法不会主动退出,即,reactorThread
线程不会销毁。
问题解决
找到bug所在后,结局方案很简单,只需将ES
的RestClient
通过单例的方式注入到服务中,即可解决堆外内存泄漏的问题。
总结
查找bug时,需要从多个方面去定位,通过尽可能多的现场信息去定量分析,监控、线上机器、源码的查看都需要,深入下去,会有不少收获。
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。