Strange symptoms
Have you ever had the upstream still report timeouts even though thread timeouts are configured inside your service?
Have you ever had a downstream service respond quickly, with no expensive logic left in the current service, yet the overall execution time is still extremely long?
Have you ever seen GC time grow in a regular pattern that bears no relation to traffic?
Have you ever seen the Eden space suddenly shrink and stay small for a long time?
Background
It was a quiet afternoon when a GC-time alert interrupted the coding. The GC dashboard showed that although only a single container had alerted, most GC pauses had already climbed to nearly 1 s.
After quickly pulling the alerting machine out of the load balancer, I started analyzing the GC behavior:
- No full GC
- Young GC frequency and pause time had both risen sharply
- Traffic showed no significant fluctuation
- The collector was G1GC, with a maximum pause target of 200 ms
Treating the symptom
Since only the young-generation GC frequency and pause time had gone up, the obvious quick fix was to enlarge the young generation.
A larger young generation naturally lowers GC frequency and raises overall throughput.
Entering the container to check the heap configuration: the heap was set to 10 GB, with no explicit Eden/young-generation size.
With G1 and no explicit young-generation size, the young generation is resized adaptively (by default between roughly 5% and 60% of the heap); in theory, with flat traffic, GC should not suddenly become this frequent.
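To confirm the effective defaults of a given JDK build, something like the following should work (G1NewSizePercent is an experimental flag, hence the unlock option):
java -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version | grep NewSizePercent
On JDK 8 this typically reports G1NewSizePercent=5 and G1MaxNewSizePercent=60.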
Checking the GC log
During one time window the GC frequency was extremely high, and the young generation had shrunk dramatically at the same time.
The log looks like this:
[Parallel Time: 206.4 ms, GC Workers: 8]
[GC Worker Start (ms): Min: 3867072696.3, Avg: 3867072696.3, Max: 3867072696.4, Diff: 0.1]
[Ext Root Scanning (ms): Min: 179.4, Avg: 183.6, Max: 185.9, Diff: 6.5, Sum: 1468.4]
[Update RS (ms): Min: 2.3, Avg: 2.5, Max: 3.0, Diff: 0.7, Sum: 20.3]
[Processed Buffers: Min: 56, Avg: 76.2, Max: 100, Diff: 44, Sum: 610]
[Scan RS (ms): Min: 0.2, Avg: 0.3, Max: 0.4, Diff: 0.2, Sum: 2.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 17.6, Avg: 19.7, Max: 24.0, Diff: 6.3, Sum: 157.8]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.5]
[Termination Attempts: Min: 1, Avg: 100.8, Max: 128, Diff: 127, Sum: 806]
[GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
[GC Worker Total (ms): Min: 206.2, Avg: 206.3, Max: 206.4, Diff: 0.2, Sum: 1650.1]
[GC Worker End (ms): Min: 3867072902.5, Avg: 3867072902.6, Max: 3867072902.6, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.7 ms]
[Other: 7.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 5.1 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 1.0 ms]
[Eden: 6120.0M(6120.0M)->0.0B(1356.0M) Survivors: 24.0M->24.0M Heap: 8625.7M(10.0G)->2506.0M(10.0G)]
From the log above we can see that Eden has shrunk dramatically.
With traffic unchanged, young GC therefore becomes far more frequent, large numbers of surviving objects keep getting copied, and the GC time spent per unit of time rises sharply.
That hypothesis sounds reasonable. To verify it, we checked the region usage in the JVM monitoring.
From the charts we can see:
Chart 1, committed space:
the committed young-generation space oscillates rapidly → the total young-generation space drops to a low level → it stays low → it eventually recovers to a high level.
Chart 2, used space:
the young generation is collected normally → as the total space shrinks, the usable young-generation space stays low → the number of GCs spikes.
Since the young generation was clearly the problem, the first step was to stop Eden from shrinking.
We were running G1GC, so we forced a larger minimum young-generation share by raising -XX:G1NewSizePercent from the default 5% to 60%.
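For reference, G1NewSizePercent is an experimental flag on JDK 8, so this change roughly amounts to adding -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=60 alongside the existing -XX:MaxGCPauseMillis=200.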
After the change and a restart, both GC frequency and GC pause time dropped sharply.
Overall data:
Per-container data:
After the change, both the single-pause GC time and the GC frequency dropped significantly, roughly to 25% of the previous level, and the young GC time alerts disappeared.
Trouble returns
After watching every container for a longer period, we found that each one eventually hit a GC spike, and in every case a full GC had occurred.
Experience says that G1GC has a mixed GC mechanism: alongside young collections it tries to clean up the old generation in mixed-GC mode, so in theory a full GC should be unlikely. What was going on?
The GC logs showed nothing about mixed GC at all, so let's first review what mixed GC is and what triggers it.
Mixed GC trigger conditions
- When the IHOP threshold -XX:InitiatingHeapOccupancyPercent (default 45%) is reached, i.e. when usage hits 45% of the total heap, G1 starts a concurrent marking cycle followed by mixed GCs.
- After global concurrent marking finishes, if the reclaimable garbage exceeds 5% of the heap (-XX:G1HeapWastePercent, default 5%), multiple rounds of mixed GC are triggered.
- System.gc() is called and -XX:ExplicitGCInvokesConcurrent is enabled.
The answer is now obvious: the default marking threshold is 45% (before JDK 8b12 it was measured against total heap usage, afterwards against old-generation occupancy), but the young generation was now pinned to at least 60% of the heap, so the old generation could never reach the remaining 40% and mixed GC could never be triggered.
So we adjusted both G1NewSizePercent and InitiatingHeapOccupancyPercent and kept observing.
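As an illustration only (these numbers are placeholders, not necessarily the values that were deployed), a combination that leaves the old generation enough room to reach the marking threshold could look like -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=40 -XX:InitiatingHeapOccupancyPercent=55.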
After this change full GCs no longer occurred, and the GC cycle stretched out into steady, gradual growth. But peak GC pauses were still as high as 1 s.
We kept reading the GC logs, looking for further optimizations.
Going further
A careful read of the GC logs turned up a large number of entries like the following.
144392.185: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 9537847296 bytes, allocation request: 2097168 bytes, threshold: 4831838190 bytes (45.00 %), source: concurrent humongous allocation]
In G1, an object larger than 50% of a region is classified as humongous; a very large object can span several contiguous humongous regions, which are accounted to the old generation. So we tried increasing G1HeapRegionSize.
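A minimal sketch of that threshold (the sizes are illustrative):
// With -XX:G1HeapRegionSize=4m, any single allocation larger than half a region (2 MB)
// is treated as humongous and is placed directly into contiguous humongous regions,
// which count against the old generation.
byte[] big   = new byte[3 * 1024 * 1024]; // > 2 MB  -> humongous with 4 MB regions
byte[] small = new byte[1 * 1024 * 1024]; // <= 2 MB -> ordinary young-generation allocation
// Doubling the region size to 8 MB raises the threshold to 4 MB, so the 3 MB array
// above would no longer be humongous.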
Normal container (RegionSize = 4 MB)
Debug container (RegionSize = 8 MB)
As the charts show, after enlarging the region size some former humongous objects no longer qualify as humongous and so no longer go straight into the old generation, which delays the point at which the old generation reaches the marking threshold. The spikes caused by the automatic resizing between young and old generations also become much smoother, and the usable young-generation space stays well above that of the small-region setup.
At this point the GC tuning looked finished.
But was it really? Why did Eden keep shrinking in the first place?
With that question in mind, the investigation continued.
A turning point
Credit here goes to 张承 for the hint. He had seen this before: print more detailed logs with -XX:G1LogLevel=finest, because the scanning cost may come from an oversized StringTable; in an earlier case a serialization problem had caused the StringTable to grow out of control.
Scanning cost! That was the wake-up call.
In a GC cycle the critical phases are root scanning and object copying, and object copying is usually the expensive one. But once any other phase starts eating into the 200 ms pause target, the time left for copying live objects during young GC necessarily shrinks, and a squeezed copying budget necessarily means a smaller Eden. That gave us a working hypothesis for the shrinking Eden, to be verified below.
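The ergonomics lines in the log below make the squeeze concrete:
  target pause time        200.00 ms
  predicted base time    - 181.10 ms   (root scanning, dominated by the StringTable)
  left for object copy   =  18.90 ms
G1 then sizes the next Eden so that the predicted copying time fits into those remaining ~19 ms, which is exactly how Eden ends up collapsing.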
We collected more detailed logs with -XX:G1LogLevel=finest.
{Heap before GC invocations=214 (full 0):
garbage-first heap total 10485760K, used 5631601K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
region size 16384K, 192 young (3145728K), 16 survivors (262144K)
Metaspace used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
class space used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
2024-03-06T19:41:01.013+0800: 83631.062: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 201326592 bytes, new threshold 15 (max 15)
- age 1: 15568080 bytes, 15568080 total
- age 2: 6251328 bytes, 21819408 total
- age 3: 9652016 bytes, 31471424 total
- age 4: 7445392 bytes, 38916816 total
- age 5: 7072600 bytes, 45989416 total
- age 6: 6826088 bytes, 52815504 total
- age 7: 5464128 bytes, 58279632 total
- age 8: 5080296 bytes, 63359928 total
- age 9: 5480008 bytes, 68839936 total
- age 10: 6519056 bytes, 75358992 total
- age 11: 7353016 bytes, 82712008 total
- age 12: 6458120 bytes, 89170128 total
- age 13: 4577144 bytes, 93747272 total
- age 14: 5488152 bytes, 99235424 total
- age 15: 5709096 bytes, 104944520 total
83631.063: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 10026, predicted base time: 181.10 ms, remaining time: 18.90 ms, target pause time: 200.00 ms]
83631.063: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 176 regions, survivors: 16 regions, predicted young region time: 16.59 ms]
83631.063: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 176 regions, survivors: 16 regions, old: 0 regions, predicted pause time: 197.68 ms, target pause time: 200.00 ms]
, 0.2069120 secs]
[Parallel Time: 202.5 ms, GC Workers: 8]
[GC Worker Start (ms): 83631062.9 83631062.9 83631062.9 83631062.9 83631062.9 83631062.9 83631062.9 83631062.9
Min: 83631062.9, Avg: 83631062.9, Max: 83631062.9, Diff: 0.1]
[Ext Root Scanning (ms): 173.3 173.1 173.0 172.3 172.6 173.2 173.1 172.8
Min: 172.3, Avg: 172.9, Max: 173.3, Diff: 1.0, Sum: 1383.4]
[Thread Roots (ms): 0.1 0.8 0.8 0.8 0.8 0.7 0.7 0.7
Min: 0.1, Avg: 0.7, Max: 0.8, Diff: 0.7, Sum: 5.4]
[StringTable Roots (ms): 187.6 189.9 189.8 188.5 190.1 189.7 189.8 189.8
Min: 187.6, Avg: 189.4, Max: 190.1, Diff: 2.5, Sum: 1515.0]
[Universe Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[JNI Handles Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[ObjectSynchronizer Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.0
Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.2]
[FlatProfiler Roots (ms): 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Management Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[SystemDictionary Roots (ms): 0.0 0.0 0.0 1.4 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.2, Max: 1.4, Diff: 1.4, Sum: 1.4]
[CLDG Roots (ms): 2.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.4, Max: 2.8, Diff: 2.8, Sum: 2.8]
[JVMTI Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[CodeCache Roots (ms): 2.1 1.8 1.9 1.8 1.6 2.0 1.8 2.0
Min: 1.6, Avg: 1.9, Max: 2.1, Diff: 0.5, Sum: 14.9]
[CM RefProcessor Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Wait For Strong CLD (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Weak CLD Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[SATB Filtering (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Update RS (ms): 2.0 1.8 1.9 1.8 1.5 2.0 1.7 1.9
Min: 1.5, Avg: 1.8, Max: 2.0, Diff: 0.5, Sum: 14.6]
[Processed Buffers: 36 55 78 88 78 88 96 44
Min: 36, Avg: 70.4, Max: 96, Diff: 60, Sum: 563]
[Scan RS (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
[Code Root Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1
Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Object Copy (ms): 26.9 27.3 27.2 28.0 27.9 26.8 27.2 27.3
Min: 26.8, Avg: 27.3, Max: 28.0, Diff: 1.1, Sum: 218.7]
[Termination (ms): 0.0 0.0 0.1 0.0 0.1 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[Termination Attempts: 1 98 101 88 89 67 87 82
Min: 1, Avg: 76.6, Max: 101, Diff: 100, Sum: 613]
[GC Worker Other (ms): 0.0 0.0 0.1 0.0 0.2 0.0 0.1 0.0
Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
[GC Worker Total (ms): 202.2 202.2 202.3 202.2 202.4 202.2 202.2 202.1
Min: 202.1, Avg: 202.2, Max: 202.4, Diff: 0.2, Sum: 1617.7]
[GC Worker End (ms): 83631265.1 83631265.1 83631265.1 83631265.1 83631265.3 83631265.1 83631265.1 83631265.1
Min: 83631265.1, Avg: 83631265.1, Max: 83631265.3, Diff: 0.2]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.3 ms]
[Other: 4.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 2.6 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Parallel Redirty: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Redirtied Cards: 2746 1281 1119 473 463 0 0 0
Min: 0, Avg: 760.2, Max: 2746, Diff: 2746, Sum: 6082]
[Humongous Register: 0.0 ms]
[Humongous Total: 0]
[Humongous Candidate: 0]
[Humongous Reclaim: 0.0 ms]
[Humongous Reclaimed: 0]
[Free CSet: 0.2 ms]
[Young Free CSet: 0.2 ms]
[Non-Young Free CSet: 0.0 ms]
[Eden: 2816.0M(2816.0M)->0.0B(3088.0M) Survivors: 256.0M->256.0M Heap: 5499.6M(10.0G)->2699.5M(10.0G)]
Heap after GC invocations=215 (full 0):
garbage-first heap total 10485760K, used 2764320K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
region size 16384K, 16 young (262144K), 16 survivors (262144K)
Metaspace used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
class space used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
}
[Times: user=1.62 sys=0.00, real=0.20 secs]
2024-03-06T19:41:01.221+0800: 83631.270: Total time for which application threads were stopped: 0.2095341 seconds, Stopping threads took: 0.0002805 seconds
In this log the StringTable root scanning takes close to 200 ms. So what is the StringTable?
It is a JVM-level cache for strings; readers who are interested can dig into the details. In short, many workloads contain large numbers of duplicate strings, and to save space the JVM caches them in a single StringTable. When a string literal is used, or String.intern() is called, the JVM checks whether an equal string already exists in the StringTable: if it does, that instance is returned; if not, the string is added to the table. That is how string caching works.
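A tiny example of the semantics:
String a = "hello";             // literal: resolved through the string pool (StringTable)
String b = new String("hello"); // a fresh heap object, not automatically pooled
String c = b.intern();          // looks "hello" up in the StringTable, inserting it if absent
System.out.println(a == b);     // false: different objects
System.out.println(a == c);     // true: both refer to the pooled instance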
Having pinned the problem on the StringTable, we first set out to verify it:
- Pull a container whose Eden had already shrunk out of the load balancer, so production is unaffected.
- Run jmap -histo:live <pid>, which forces a full GC, and observe the effect on GC.
After the full GC the StringTable cost did drop sharply, but Eden stayed small. That did not match expectations. Was the StringTable not the cause after all?
A closer look at the post-full-GC log shows that the real GC cost had indeed dropped sharply:
[Times: user=0.09 sys=0.05, real=0.02 secs]
But then why does the log still show Eden: 4588.0M(4588.0M)->0.0B(2468.0M)?
2024-03-02T17:45:27.190+0800: 400.411: Application time: 3.1242468 seconds
{Heap before GC invocations=10 (full 0):
garbage-first heap total 10485760K, used 2790066K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
region size 4096K, 632 young (2588672K), 18 survivors (73728K)
Metaspace used 136316K, capacity 142388K, committed 142592K, reserved 1173504K
class space used 16630K, capacity 17875K, committed 17920K, reserved 1048576K
2024-03-02T17:45:27.192+0800: 400.413: [Full GC (Heap Inspection Initiated GC) 2024-03-02T17:45:27.192+0800: 400.413: [Class Histogram (before full gc):
num #instances #bytes class name
----------------------------------------------
1: 7591768 1216542784 [C
2: 148811 488969976 [B
3: 205621 209905680 [I
4: 4033851 129083232 java.util.HashMap$Node
.....
.....
2024-03-02T17:50:33.086+0800: 706.307: Application time: 1.2055260 seconds
{Heap before GC invocations=12 (full 2):
garbage-first heap total 10485760K, used 4982437K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
region size 4096K, 1147 young (4698112K), 0 survivors (0K)
Metaspace used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
class space used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
2024-03-02T17:50:33.088+0800: 706.309: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 301989888 bytes, new threshold 15 (max 15)
- age 1: 43347440 bytes, 43347440 total
706.310: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 11409, predicted base time: 14.25 ms, remaining time: 185.75 ms, target pause time: 200.00 ms]
706.310: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1147 regions, survivors: 0 regions, predicted young region time: 183.96 ms]
706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms]
, 0.0212262 secs]
[Parallel Time: 16.3 ms, GC Workers: 8]
[GC Worker Start (ms): 706310.8 706310.8 706310.8 706310.8 706310.8 706310.9 706310.9 706310.9
Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
[Ext Root Scanning (ms): 1.3 3.1 2.0 1.2 1.2 1.2 1.2 1.2
Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
[Thread Roots (ms): 0.6 0.1 0.6 0.6 0.6 0.6 0.6 0.6
Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
[StringTable Roots (ms): 1.8 0.0 0.7 1.9 1.9 1.9 1.8 1.9
Min: 0.0, Avg: 1.5, Max: 1.9, Diff: 1.9, Sum: 11.8]
[Universe Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[JNI Handles Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[ObjectSynchronizer Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[FlatProfiler Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0
Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Management Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[SystemDictionary Roots (ms): 0.0 0.0 1.2 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.1, Max: 1.2, Diff: 1.2, Sum: 1.2]
[CLDG Roots (ms): 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.4, Max: 3.0, Diff: 3.0, Sum: 3.0]
[JVMTI Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[CodeCache Roots (ms): 1.8 1.2 2.2 1.8 1.8 1.8 1.8 1.8
Min: 1.2, Avg: 1.8, Max: 2.2, Diff: 1.0, Sum: 14.2]
[CM RefProcessor Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Wait For Strong CLD (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Weak CLD Roots (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[SATB Filtering (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Update RS (ms): 1.5 1.1 1.5 1.6 1.6 1.5 1.6 1.6
Min: 1.1, Avg: 1.5, Max: 1.6, Diff: 0.6, Sum: 12.0]
[Processed Buffers: 171 58 79 117 132 101 91 133
Min: 58, Avg: 110.2, Max: 171, Diff: 113, Sum: 882]
[Scan RS (ms): 0.2 0.2 0.2 0.2 0.2 0.2 0.3 0.1
Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.1, Sum: 1.7]
[Code Root Scanning (ms): 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, Sum: 0.5]
[Object Copy (ms): 12.6 11.3 11.4 12.6 12.6 12.7 12.5 12.6
Min: 11.3, Avg: 12.3, Max: 12.7, Diff: 1.3, Sum: 98.4]
[Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: 9 11 9 11 12 11 1 13
Min: 1, Avg: 9.6, Max: 13, Diff: 12, Sum: 77]
[GC Worker Other (ms): 0.3 0.0 0.1 0.1 0.1 0.1 0.0 0.0
Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.8]
[GC Worker Total (ms): 16.0 15.7 15.7 15.7 15.7 15.7 15.6 15.6
Min: 15.6, Avg: 15.7, Max: 16.0, Diff: 0.4, Sum: 125.7]
[GC Worker End (ms): 706326.8 706326.5 706326.5 706326.5 706326.5 706326.5 706326.5 706326.5
Min: 706326.5, Avg: 706326.5, Max: 706326.8, Diff: 0.3]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.6 ms]
[Other: 4.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.4 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.1 ms]
[Parallel Redirty: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Redirtied Cards: 1776 898 719 458 0 0 0 0
Min: 0, Avg: 481.4, Max: 1776, Diff: 1776, Sum: 3851]
[Humongous Register: 0.1 ms]
[Humongous Total: 15]
[Humongous Candidate: 15]
[Humongous Reclaim: 0.1 ms]
[Humongous Reclaimed: 15]
[Free CSet: 0.8 ms]
[Young Free CSet: 0.8 ms]
[Non-Young Free CSet: 0.0 ms]
[Eden: 4588.0M(4588.0M)->0.0B(2468.0M) Survivors: 0.0B->132.0M Heap: 4865.7M(10.0G)->372.6M(10.0G)]
Heap after GC invocations=13 (full 2):
garbage-first heap total 10485760K, used 381544K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
region size 4096K, 33 young (135168K), 33 survivors (135168K)
Metaspace used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
class space used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
}
[Times: user=0.09 sys=0.05, real=0.02 secs]
2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds
Reading on, the real pause time differs wildly from the predicted one.
706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms]
, 0.0212262 secs]
[Parallel Time: 16.3 ms, GC Workers: 8]
[GC Worker Start (ms): 706310.8 706310.8 706310.8 706310.8 706310.8 706310.9 706310.9 706310.9
Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
[Ext Root Scanning (ms): 1.3 3.1 2.0 1.2 1.2 1.2 1.2 1.2
Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
[Thread Roots (ms): 0.6 0.1 0.6 0.6 0.6 0.6 0.6 0.6
Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
[StringTable Roots (ms): 1.8 0.0 0.7 1.9 1.9 1.9 1.8 1.9
[Times: user=0.09 sys=0.05, real=0.02 secs]
2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds
The explanation: the pause time is a prediction, the prediction is computed from accumulated historical data, and G1's adaptive resizing simply had not caught up yet.
We kept watching, and after roughly two minutes the predicted time steadily shrank towards the real cost while the Eden space kept growing back.
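G1 derives these predictions from decaying averages over recent samples of each phase, so a run of newly cheap samples only pulls the prediction down gradually, which matches the roughly two-minute lag seen here. A rough hand-written illustration of the idea (an assumed simplification, not G1's actual code):
// Decaying-average predictor sketch: history keeps a weight of `alpha` for every new sample.
double predicted = 180.0;                        // ms, dominated by the old expensive samples
double alpha = 0.7;                              // assumed weight given to history
double[] newSamples = {16, 15, 17, 15, 16, 15};  // cheap samples observed after the full GC
for (double sample : newSamples) {
    predicted = alpha * predicted + (1 - alpha) * sample;
}
System.out.printf("predicted ~= %.1f ms%n", predicted); // still far above the real ~16 ms after a few GCs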
The effect:
With that, we confirmed once more that the StringTable was the cause.
Why is the StringTable so large?
To understand why the StringTable got so big, we first need to know how entries get into it.
Entries come from string literals and from String.intern(). Literals cannot plausibly produce this many StringTable entries, which leaves intern(). So how do we find out where intern() is being called so heavily?
- arthas: arthas works by rewriting bytecode for instrumentation, but String is a core JDK class, so its calls cannot be monitored this way.
- Grepping the codebase for intern() calls: there are plenty of calls deep inside libraries, which makes this hard to narrow down.
- Inspecting a heap dump: from the analysis above, young GC cannot stop the StringTable from growing but full GC can, so dump the heap right after a young GC to make sure the data we need is still there (a command sketch follows this list).
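For that last step, note the absence of the live option, which would itself trigger a full GC and throw away exactly the strings we want to look at: jmap -dump:format=b,file=after-younggc.hprof <pid>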
Analyzing the String objects in the dump revealed large numbers of equal-length strings made of letters and digits. Suspecting these were the culprit, we copied one and searched the logs for it: it turned out to be a key in a downstream response.
Combining this with articles found online, our initial judgment was that serializing these effectively random keys was what kept growing the StringTable.
After local debugging plus a read through the underlying code, we finally found where Jackson calls intern():
package com.fasterxml.jackson.core.util;
import java.util.concurrent.ConcurrentHashMap;
// (static fields such as MAX_ENTRIES and the lock object are omitted here; see jackson-core for the full class)
public final class InternCache extends ConcurrentHashMap<String,String> {
public String intern(String input) {
String result = get(input);
if (result != null) { return result; }
if (size() >= MAX_ENTRIES) {
synchronized (lock) {
if (size() >= MAX_ENTRIES) {
clear();
}
}
}
result = input.intern();
put(result, result);
return result;
}
}
Going further up the call stack, it turns out that Hello's SOA middleware parses response values through Jackson serialization.
package com.hellobike.soa.rpc.utils;
public class RpcParseUtil {
public static <T> T parseResult(Serializer serializer,
String resultJson, Type returnType) {
if (resultJson == null) {
return null;
}
if (returnType == void.class || returnType == Void.class || returnType == null) {
return null;
}
return (T) serializer.decode(new StringWrapperByteBuf(resultJson), returnType, null);
}
}
Why does Jackson do this?
Under high concurrency, serialization produces huge numbers of objects, and in the common case the keys of those objects never change. Without key caching, in the vast majority of scenarios those keys would pile up on the heap as duplicate String objects and waste memory.
As a serialization library, Jackson has to care about performance, so it interns these repeated strings to cache them and cut the memory waste.
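When this key caching is not wanted, Jackson exposes it as a switch on the JsonFactory (the same feature the SOA-level fix later in the article relies on). A minimal sketch, with a hypothetical helper name:
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;

static ObjectMapper nonInterningMapper() {
    JsonFactory factory = new JsonFactory();
    factory.disable(JsonFactory.Feature.INTERN_FIELD_NAMES);          // stop calling String.intern() on field names
    // factory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES); // optionally skip Jackson's symbol table as well
    return new ObjectMapper(factory);
}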
That completes the root-cause analysis. Two fixes were available:
- Ask the downstream to change its response format so that the keys become fixed values and the random values move out of the keys.
- Change the serialization used for this interface and turn off key caching, so the random keys are no longer interned.
In the end, because of differing scenarios and coordination constraints, the two approaches were each used in a different service.
Tracing back to the source
Many of the clues above were found by luck. How should we troubleshoot when a different scenario comes up?
- Print more detailed logs with -XX:G1LogLevel=finest to confirm which GC phase is slow.
- If the cause is a growing StringTable, its contents can be examined in several ways:
  - use the sun.jvm.hotspot.memory.StringTable class from the JDK's sa-jdi tooling to try to read the objects referenced by the StringTable in memory;
  - analyze memory:
    - visualvm's OQL query【select u from java.lang.String u where count(referrers(u)) == 0】finds String objects that are still dead after a young GC;
    - use MAT or visualvm to follow the references around such objects and work out which component the neighbouring strings belong to.
In our case this confirmed the strings were referenced by ItemBean.
The fix
With the cause known, the fix is straightforward.
We had confirmed that in our scenario the keys of the SOA response are close to random values (perhaps not truly random, just almost never repeated), which leaves two ways to solve the problem.
The problematic data format:
{
"msg": "ok",
"code": 0,
"data": {
"outputs": {
"resultMap": {
"d32d56e8a1b34457af26d94a02c16cba": 0.32004153617699793,
"5bc9df88acea4da8bc829de13aa3afb2": 0.434173387080695,
"4900533c94bc443ea086de0936bfb59e": 0.5555319629335832,
"575360f1d0544f95b4c430dd8759839a": 0.908782703771497,
"00fba5cbd24749999dd719f9ec338f4b": 0.5919287549198423,
"1b7d5916f0454847a8ab90ce2a49e3a1": 0.5869043763908277,
"22d5a24229374a4b9d04c9bf371e7eb7": 0.09601455380787982,
"5a35beafa5d8492f85e5aa6ca3223129": 0.5386071460222013,
"61488dab533b4ec4ba6ae1ea61802c9d": 0.22098097729212451,
"0aaebddbe3e44677a47f5f4becb6ea51": 0.11469034353064256,
"21d6c73689514d00ae72b5eb7d1a1785": 0.7861965800121863,
"3709b11d98194297b788e6068c14760f": 0.8005589118418589,
"cb1b093134d049f9bebc49e8cc3d791d": 0.22801438927340578,
"6718a455bcf1416fb1839c79254c4d7c": 0.936704202541788,
"1200f2d71e5944fcbd645d7084cfbfe8": 0.43563843328912855,
"6f9ebebbeec247119a3bcf2b7c52b76f": 0.9499259171150704,
"cea737f0c1cb4be0969d692196287b61": 0.6442754643503128,
"a1da06ba41d74faf9c88374bf302e578": 0.46847853036230913,
"b5f1177c0bfa4f67a69d67606665a87b": 0.19847468498415466,
"efe70c3f037e44cb995dd8a6a6aff8f1": 0.6239117074575771
}
},
"code": 0,
"cost": 43,
"seqNo": "1097369524402581504",
"extraOutputs": {
"traceId": "96939a6e0c65b3f4e6ebadcaa5e4c4e8",
"decision_version": 21
},
"projectId": 4,
"version": 21
}
}
Solutions:
- Change the SOA contract so the near-random values move from the keys into the values.
- Skip key interning when deserializing the problematic interface.
AlgorithmMarket service:
The response format was changed so that the near-random keys are carried inside a list, which solves the problem.
AppProphetOnlineService service:
The underlying SOA serialization was customized so that keys are no longer interned, as follows.
private SoaClient getNonInternSoaClient(String clientName, String group) {
ClientConf clientConf = new ClientConf();
clientConf.setClientName(clientName);
clientConf.setGroup(group);
SoaClient client = CustomizedClientCreator.createSoaClient2(clientConf);
client.start();
return client;
}
public class CustomizedBlockingGrpcClientInvoker extends GrpcClientInvoker {
protected final Serializer serializer2;
public CustomizedBlockingGrpcClientInvoker(ManagedChannel channel, InternalClientInvokerOption option, DynamicConfigHelper helper) {
super(channel, option, helper);
this.serializer2 = new CustomizedJacksonSerializer();
}
public SoaResponse doInvoke(SoaRequest request) throws RpcException {
try {
return this.invokeInternal(request);
} catch (RpcException var3) {
return SoaResponse.buildErrorResponse(var3);
} catch (Throwable var4) {
return SoaResponse.buildErrorResponse(new RpcException(RpcResultStatus.UNKNOWN_ERROR, var4));
}
}
private SoaResponse invokeInternal(SoaRequest soaRequest) throws RpcException {
String requestJson = this.buildRequestJson(soaRequest);
SoaInvokerRequest request = this.buildSoaRequest(soaRequest, requestJson);
SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub soaInvokerServiceStub = (SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub) (new BlockingStubBuilder(this.channel, soaRequest)).build();
SoaInvokerResponse soaInvokerResponse = soaInvokerServiceStub.call(request);
checkResponseCode(soaInvokerResponse);
this.helper.getSoaClientLogger().logForResponse(soaRequest, soaInvokerResponse.getResultJson());
Object response = RpcParseUtil.parseResult(this.serializer2, soaInvokerResponse.getResultJson(), soaRequest.getReturnType());
return SoaResponse.buildSuccessResponse(response);
}
}
private static ObjectMapper initObjectMapper() {
// Key line: disable field-name interning so the random keys never reach the StringTable.
JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES);
SimpleModule module = new SimpleModule();
module.addSerializer(LocalDate.class, new LocalDateSerializer()).addDeserializer(LocalDate.class, new LocalDateDeserializer());
module.addSerializer(LocalDateTime.class, new LocalDateTimeSerializer()).addDeserializer(LocalDateTime.class, new LocalDateTimeDeserializer());
ObjectMapper mapper = (new ObjectMapper(jsonFactory))
.registerModule(module)
.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
.configure(SerializationFeature.WRITE_NULL_MAP_VALUES, false)
.setSerializationInclusion(JsonInclude.Include.NON_NULL);
return mapper;
}
Other pitfalls
fastjson
Before version 1.1.24, fastjson called intern() on every JSON key and cached it in the string pool; since JSON keys are normally fixed, this makes repeated parsing very fast and saves both time and space. What it failed to consider is that a large number of changing JSON keys puts a heavy burden on the string pool.
fastjson fixed this in 1.1.24 by adding a maximum cache size; once the limit is exceeded, keys are no longer put into the string pool.
The relevant code in com.alibaba.fastjson.parser.SymbolTable#addSymbol() (version 1.1.24, around line 113):
public static final int MAX_SIZE = 1024;
if (size >= MAX_SIZE) {return new String(buffer, offset, len);}
The reported issue was triggered with a data volume of around 700,000; with several million entries the cost would be far more than 30 ms. So think twice before reaching for the JVM-level String#intern!
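A cautionary sketch of the failure mode (illustrative only):
// Interning strings that are effectively unique (random keys, UUIDs) just keeps growing the
// JVM StringTable, and on JDK 8 the whole table is scanned as GC roots during every young GC.
for (int i = 0; i < 1_000_000; i++) {
    java.util.UUID.randomUUID().toString().intern();
}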
ClassLoader
Since Ext Root Scanning cost can cause the problems above, and the class loaders we use all the time are GC roots too, could they trigger the same kind of issue?
I built a local test that keeps creating class loaders and loading class files with them, and printed the GC logs.
Test demo:
// Create a fresh class loader and load a class through it; repeating this in a loop
// keeps growing the number of loaded classes and class loaders.
MyClassLoader loader = new MyClassLoader(ClassLoadGCTest.class.getClassLoader());
Object targetBiz = loader.loadClass("com.jiaxz.hotload.DiffBizMethod").newInstance();
targetBiz = null; // drop the reference; the loader and its classes still linger until unloaded
The result:
[Parallel Time: 103.4 ms, GC Workers: 3]
[GC Worker Start (ms): 48087.4 48087.4 48087.4
Min: 48087.4, Avg: 48087.4, Max: 48087.4, Diff: 0.0]
[Ext Root Scanning (ms): 47.0 34.8 47.0
Min: 34.8, Avg: 43.0, Max: 47.0, Diff: 12.2, Sum: 128.9]
[Thread Roots (ms): 0.0 0.1 0.1
Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
[StringTable Roots (ms): 0.0 0.0 0.1
Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Universe Roots (ms): 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[JNI Handles Roots (ms): 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[ObjectSynchronizer Roots (ms): 0.0 0.0 0.2
Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
[FlatProfiler Roots (ms): 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Management Roots (ms): 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[SystemDictionary Roots (ms): 0.0 30.3 0.0
Min: 0.0, Avg: 10.1, Max: 30.3, Diff: 30.3, Sum: 30.3]
[CLDG Roots (ms): 0.6 0.0 0.0
Min: 0.0, Avg: 0.2, Max: 0.6, Diff: 0.6, Sum: 0.6]
[JVMTI Roots (ms): 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[CodeCache Roots (ms): 0.1 0.0 1.2
Min: 0.0, Avg: 0.4, Max: 1.2, Diff: 1.2, Sum: 1.3]
[CM RefProcessor Roots (ms): 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Wait For Strong CLD (ms): 0.0 0.0 0.2
Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
[Weak CLD Roots (ms): 46.5 16.6 46.5
Min: 16.6, Avg: 36.5, Max: 46.5, Diff: 29.9, Sum: 109.5]
[SATB Filtering (ms): 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
The SystemDictionary is a key internal JVM data structure that stores information about loaded classes, so when class loaders keep loading more and more classes that are never unloaded, Ext Root Scanning grows as well.
How to locate and fix this at Hello
In the Hello environment, if a service uses G1GC and its SOA requests or responses carry large numbers of non-repeating values as keys, the scenario above can be reproduced. It can be located and temporarily mitigated as follows.
Temporary mitigation
Override the InternCache class
Re-declare the InternCache class in your project; note that the package path must be identical.
package com.fasterxml.jackson.core.util;
import java.util.concurrent.ConcurrentHashMap;
public final class InternCache extends ConcurrentHashMap<String, String> {
// The switch is off by default: interning behaves as usual unless -DonOff is set.
private static volatile boolean onOff = false;
static {
// Read the switch from a system property when the class is loaded.
onOff = System.getProperty("onOff") != null;
System.out.println("InternCache onOff=" + onOff);
}
public String intern(String input) {
// With the switch on, skip interning entirely and return the input unchanged.
if (onOff) {
return input;
}
String result = (String) this.get(input);
...
}
}
By overriding Jackson's InternCache we add a switch that controls whether field names are interned.
Canary release
You can change the default of onOff to true and release a single instance to watch its GC logs, or keep the default as false, release a single machine, and flip the system property on a debug container to run the canary.
Changing the environment on a debug machine:
- Open the startup script: vim init.script
- Add the system property: -DonOff=true
- Restart with the script: bash init.script docker_start
Checking the effect
Once the canary is running, check ext root scan in the GC logs to see whether the scanning time has come down.
Verification
Verified on the AppAlgorithmMarket service.
Debug node GC: [Ext Root Scanning (ms): Min: 1.2, Avg: 1.5, Max: 2.6, Diff: 1.4, Sum: 11.7]
Normal node GC: [Ext Root Scanning (ms): Min: 51.6, Avg: 51.7, Max: 61.2, Diff: 0.6, Sum: 460.0]
Recommendation for a full rollout
Have the middleware support switching the SOA serialization behavior dynamically via Apollo, and turn off key caching with JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES).
(Author: 贾学智)