
Strange Symptoms

Have you ever hit a case where, even though the service sets a timeout internally, the upstream still reports timeout exceptions?

[image]

Have you ever seen a downstream service respond quickly, with no time-consuming logic left in the current service, yet the overall execution time is extremely long?

[image]

Have you ever seen GC pause times grow in a regular pattern, out of all proportion to traffic?

[image]

[image]

Have you ever seen the Eden space shrink suddenly and stay low for a long time?

[image]

[image]

Background

It was a quiet afternoon when a GC pause-time alert interrupted this worker's focused coding. The GC dashboard showed that although only a single container had alerted, most GC pauses had climbed to nearly 1s.

[image]

After quickly pulling the alerting machine out of the load balancer, I started analyzing the GC behavior:

  • No full GC
  • Young-generation GC frequency and pause times had both risen sharply
  • Traffic showed no significant fluctuation
  • The service runs G1GC with a max pause target of 200ms

Treating the Symptom

If it's just the young generation collecting more often and taking longer, isn't the fix obvious? Enlarge the young generation.

A larger young generation naturally lowers GC frequency and raises total throughput.

I entered the container and checked the heap settings: the heap was 10G, and the Eden size had not been set manually.

By default, G1 lets the young generation float between 5% (-XX:G1NewSizePercent) and 60% (-XX:G1MaxNewSizePercent) of the heap. In theory, with steady traffic, GC should not suddenly become this frequent.
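To watch the generation sizes live, something like jstat works (a sketch; <pid> is a placeholder for the Java process id):

jstat -gc <pid> 1000    # samples Eden/survivor/old capacity and usage every second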

Checking the GC log, I found that within a certain window the GC frequency was extremely high, and at the same time the young generation had shrunk dramatically.

The log looked like this:

[Parallel Time: 206.4 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 3867072696.3, Avg: 3867072696.3, Max: 3867072696.4, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 179.4, Avg: 183.6, Max: 185.9, Diff: 6.5, Sum: 1468.4]
      [Update RS (ms): Min: 2.3, Avg: 2.5, Max: 3.0, Diff: 0.7, Sum: 20.3]
         [Processed Buffers: Min: 56, Avg: 76.2, Max: 100, Diff: 44, Sum: 610]
      [Scan RS (ms): Min: 0.2, Avg: 0.3, Max: 0.4, Diff: 0.2, Sum: 2.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 17.6, Avg: 19.7, Max: 24.0, Diff: 6.3, Sum: 157.8]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.5]
         [Termination Attempts: Min: 1, Avg: 100.8, Max: 128, Diff: 127, Sum: 806]
      [GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
      [GC Worker Total (ms): Min: 206.2, Avg: 206.3, Max: 206.4, Diff: 0.2, Sum: 1650.1]
      [GC Worker End (ms): Min: 3867072902.5, Avg: 3867072902.6, Max: 3867072902.6, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.7 ms]
   [Other: 7.7 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 5.1 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 1.0 ms]
   [Eden: 6120.0M(6120.0M)->0.0B(1356.0M) Survivors: 24.0M->24.0M Heap: 8625.7M(10.0G)->2506.0M(10.0G)]

From this log we can see that Eden shrank drastically: 6120.0M before the pause, resized down to 1356.0M after it.

With traffic unchanged, young GC becomes far more frequent, surviving objects get copied over and over, and the GC cost per unit of wall-clock time rises sharply.

A plausible theory. To verify it, I checked the region usage through the JVM monitoring dashboard.

[image]

[image]

Looking at the two charts above:

Chart 1, committed space: the young generation's committed space jitters rapidly --> the total young space drops to a low level --> stays low --> recovers to a high level.

Chart 2, used space: young GC behaves normally --> as the total space drops, the usable young space stays pinned low --> the GC count spikes.

Knowing the young generation was the culprit, the first step was to stop Eden from shrinking.

Since we're on G1GC, we forced a larger minimum young generation by raising -XX:G1NewSizePercent from the default 5 to 60.
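A minimal sketch of the flag change (on JDK 8, G1NewSizePercent is an experimental option and must be unlocked; the heap and pause-target flags reflect the setup described above):

# Before: the young generation floats between the G1 defaults (5% and 60% of heap)
-Xms10g -Xmx10g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
# After: pin the young-generation floor at 60% of the heap
-Xms10g -Xmx10g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=60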

After the change and a restart, both GC frequency and GC pause time dropped substantially.

Cluster-wide data:
[image]

Single-container data:
[image]

After the change, both per-collection pause time and GC frequency dropped sharply; a rough estimate put the total GC cost at 25% of its previous level, and the young GC pause alert disappeared.

The Wind Rises Again

After watching every container over an extended period, we found each one eventually hit a GC spike, and every spike turned out to be a full GC.

[image]

But in our experience, G1GC has a mixed GC mechanism: after concurrent marking, young collections are extended into mixed collections that also clean up old-generation regions. In theory a full GC should be unlikely. So what happened?

The GC logs contained no trace of mixed GC at all. So let's first review what mixed GC is and what triggers it.

Mixed GC trigger conditions

  • When heap occupancy reaches the IHOP threshold, -XX:InitiatingHeapOccupancyPercent (default 45%), G1 starts a concurrent marking cycle followed by mixed GCs
  • After global concurrent marking finishes, if the reclaimable garbage exceeds -XX:G1HeapWastePercent (default 5%) of the heap, multiple rounds of mixed GC are triggered
  • System.gc() is called with -XX:+ExplicitGCInvokesConcurrent enabled

The answer was staring at us. The mixed GC trigger threshold defaults to 45% (before 8b12 it was measured against total heap usage; afterwards against old-generation occupancy), but with the young generation now floored at 60% of the heap, the old generation could never exceed 40%, so mixed GC could never fire.

So we adjusted G1NewSizePercent and InitiatingHeapOccupancyPercent together and kept observing.
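The final values aren't recorded here, but an illustrative combination (an assumption, not the numbers actually deployed) lowers the young floor so old-generation occupancy can actually cross the marking threshold:

# Illustrative only: with the young floor at 30%, the old generation can grow
# past 45% of the heap, so concurrent marking + mixed GC can trigger again
-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=30 -XX:InitiatingHeapOccupancyPercent=45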

[image]

After this change, full GCs no longer occurred, and the GC cycle stretched out into steady, stable growth. But peak GC pauses were still as high as 1s.

We kept reading the GC logs, looking for further optimizations.

Going Further

Reading the GC logs carefully, I found many entries like the following.

144392.185: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 9537847296 bytes, allocation request: 2097168 bytes, threshold: 4831838190 bytes (45.00 %), source: concurrent humongous allocation]

In G1, an object larger than 50% of a region is classified as humongous. A very large object occupies multiple contiguous humongous regions, which count against the old generation. So we tried enlarging -XX:G1HeapRegionSize.
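The humongous threshold is simple arithmetic on the region size, which is what this change exploits:

# humongous threshold = region size * 50%
#   region size 4M -> objects larger than 2M are humongous
#   region size 8M -> objects larger than 4M are humongous
# Objects between 2M and 4M therefore stop being allocated as humongous:
-XX:G1HeapRegionSize=8m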

Normal container (RegionSize = 4M)

[image]

[image]

Debug container (RegionSize = 8M)

[image]

[image]

After the regionSize change, some objects that used to be humongous no longer are, so they stop going into the old generation, which lengthens the time the old generation takes to reach the threshold. The spikes caused by the automatic rebalancing between young and old generations also became smoother, and, as the charts show, the overall usable young space is far larger than with the small regions.

At this point, the GC tuning was done.

But was it really over? Why did Eden keep shrinking in the first place?

With that question in mind, I kept digging.

Light at the End of the Tunnel

Credit here goes to Zhang Cheng for the tip. He had seen this before and suggested printing more detailed logs with -XX:G1LogLevel=finest: the cost might be scanning time driven by an oversized StringTable, and a previous occurrence had been traced to a serialization problem bloating the StringTable.

Scanning time! That one remark woke me up.

In a young collection, the critical phases are root scanning and object copying, and copying usually dominates. But under a 200ms pause target, once any other phase gets expensive, the time left for copying shrinks, and G1's ergonomics respond by shrinking Eden until the predicted copy time fits the budget. (You can see this in the log below: the predicted base time is 181.10 ms, leaving only 18.90 ms of the 200 ms target for the young regions.) That gave us a hypothesis for why Eden shrank; we'll verify it later.

We captured logs with -XX:G1LogLevel=finest (a diagnostic option, so it also requires -XX:+UnlockDiagnosticVMOptions).

{Heap before GC invocations=214 (full 0):
 garbage-first heap   total 10485760K, used 5631601K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
  region size 16384K, 192 young (3145728K), 16 survivors (262144K)
 Metaspace       used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
  class space    used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
2024-03-06T19:41:01.013+0800: 83631.062: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 201326592 bytes, new threshold 15 (max 15)
- age   1:   15568080 bytes,   15568080 total
- age   2:    6251328 bytes,   21819408 total
- age   3:    9652016 bytes,   31471424 total
- age   4:    7445392 bytes,   38916816 total
- age   5:    7072600 bytes,   45989416 total
- age   6:    6826088 bytes,   52815504 total
- age   7:    5464128 bytes,   58279632 total
- age   8:    5080296 bytes,   63359928 total
- age   9:    5480008 bytes,   68839936 total
- age  10:    6519056 bytes,   75358992 total
- age  11:    7353016 bytes,   82712008 total
- age  12:    6458120 bytes,   89170128 total
- age  13:    4577144 bytes,   93747272 total
- age  14:    5488152 bytes,   99235424 total
- age  15:    5709096 bytes,  104944520 total
 83631.063: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 10026, predicted base time: 181.10 ms, remaining time: 18.90 ms, target pause time: 200.00 ms]
 83631.063: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 176 regions, survivors: 16 regions, predicted young region time: 16.59 ms]
 83631.063: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 176 regions, survivors: 16 regions, old: 0 regions, predicted pause time: 197.68 ms, target pause time: 200.00 ms]
, 0.2069120 secs]
   [Parallel Time: 202.5 ms, GC Workers: 8]
      [GC Worker Start (ms):  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9  83631062.9
       Min: 83631062.9, Avg: 83631062.9, Max: 83631062.9, Diff: 0.1]
      [Ext Root Scanning (ms):  173.3  173.1  173.0  172.3  172.6  173.2  173.1  172.8
       Min: 172.3, Avg: 172.9, Max: 173.3, Diff: 1.0, Sum: 1383.4]
         [Thread Roots (ms):  0.1  0.8  0.8  0.8  0.8  0.7  0.7  0.7
          Min: 0.1, Avg: 0.7, Max: 0.8, Diff: 0.7, Sum: 5.4]
         [StringTable Roots (ms):  187.6  189.9  189.8  188.5  190.1  189.7  189.8  189.8
          Min: 187.6, Avg: 189.4, Max: 190.1, Diff: 2.5, Sum: 1515.0]
         [Universe Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [JNI Handles Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [ObjectSynchronizer Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.2  0.0
          Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2, Sum: 0.2]
         [FlatProfiler Roots (ms):  0.0  0.1  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Management Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SystemDictionary Roots (ms):  0.0  0.0  0.0  1.4  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.2, Max: 1.4, Diff: 1.4, Sum: 1.4]
         [CLDG Roots (ms):  2.8  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.4, Max: 2.8, Diff: 2.8, Sum: 2.8]
         [JVMTI Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [CodeCache Roots (ms):  2.1  1.8  1.9  1.8  1.6  2.0  1.8  2.0
          Min: 1.6, Avg: 1.9, Max: 2.1, Diff: 0.5, Sum: 14.9]
         [CM RefProcessor Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Wait For Strong CLD (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Weak CLD Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Update RS (ms):  2.0  1.8  1.9  1.8  1.5  2.0  1.7  1.9
       Min: 1.5, Avg: 1.8, Max: 2.0, Diff: 0.5, Sum: 14.6]
         [Processed Buffers:  36  55  78  88  78  88  96  44
          Min: 36, Avg: 70.4, Max: 96, Diff: 60, Sum: 563]
      [Scan RS (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [Code Root Scanning (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.1
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Object Copy (ms):  26.9  27.3  27.2  28.0  27.9  26.8  27.2  27.3
       Min: 26.8, Avg: 27.3, Max: 28.0, Diff: 1.1, Sum: 218.7]
      [Termination (ms):  0.0  0.0  0.1  0.0  0.1  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
         [Termination Attempts:  1  98  101  88  89  67  87  82
          Min: 1, Avg: 76.6, Max: 101, Diff: 100, Sum: 613]
      [GC Worker Other (ms):  0.0  0.0  0.1  0.0  0.2  0.0  0.1  0.0
       Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
      [GC Worker Total (ms):  202.2  202.2  202.3  202.2  202.4  202.2  202.2  202.1
       Min: 202.1, Avg: 202.2, Max: 202.4, Diff: 0.2, Sum: 1617.7]
      [GC Worker End (ms):  83631265.1  83631265.1  83631265.1  83631265.1  83631265.3  83631265.1  83631265.1  83631265.1
       Min: 83631265.1, Avg: 83631265.1, Max: 83631265.3, Diff: 0.2]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 4.1 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 2.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
         [Parallel Redirty:  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Redirtied Cards:  2746  1281  1119  473  463  0  0  0
          Min: 0, Avg: 760.2, Max: 2746, Diff: 2746, Sum: 6082]
      [Humongous Register: 0.0 ms]
         [Humongous Total: 0]
         [Humongous Candidate: 0]
      [Humongous Reclaim: 0.0 ms]
         [Humongous Reclaimed: 0]
      [Free CSet: 0.2 ms]
         [Young Free CSet: 0.2 ms]
         [Non-Young Free CSet: 0.0 ms]
   [Eden: 2816.0M(2816.0M)->0.0B(3088.0M) Survivors: 256.0M->256.0M Heap: 5499.6M(10.0G)->2699.5M(10.0G)]
Heap after GC invocations=215 (full 0):
 garbage-first heap   total 10485760K, used 2764320K [0x0000000540000000, 0x0000000541001400, 0x00000007c0000000)
  region size 16384K, 16 young (262144K), 16 survivors (262144K)
 Metaspace       used 147267K, capacity 154132K, committed 154240K, reserved 1183744K
  class space    used 17488K, capacity 18983K, committed 19072K, reserved 1048576K
}
 [Times: user=1.62 sys=0.00, real=0.20 secs] 
2024-03-06T19:41:01.221+0800: 83631.270: Total time for which application threads were stopped: 0.2095341 seconds, Stopping threads took: 0.0002805 seconds

In this log, the StringTable root-scanning time is close to 200ms per worker. So what is the StringTable?

It is the JVM's built-in cache for strings; interested readers can dig into the details on their own. Briefly: many workloads contain large numbers of duplicate strings, and to save space the JVM caches them in a single StringTable. When you use a string literal or call String.intern(), the JVM looks the string up in the underlying StringTable; if an equal string already exists it is returned, otherwise the string is inserted. That is how string caching works.
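A minimal sketch of that behavior with plain JDK strings (the UUID loop mimics this article's failure mode and is illustrative only):

public class InternDemo {
    public static void main(String[] args) {
        String literal = "hello";               // literals are interned automatically
        String built = new String("hello");     // a fresh heap object, not interned
        System.out.println(literal == built);           // false: two distinct objects
        System.out.println(literal == built.intern());  // true: intern returns the pooled entry

        // The failure mode in this article: interning strings that never repeat
        // only grows the StringTable without any deduplication benefit.
        for (int i = 0; i < 1_000_000; i++) {
            java.util.UUID.randomUUID().toString().intern();
        }
    }
}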

Having fingered the StringTable, we first set out to verify it:

  • Pull a container whose Eden had already shrunk out of the load balancer, so production traffic is unaffected.
  • Force a full GC with jmap -histo:live <pid> and observe the resulting GC behavior.

After the full GC, the StringTable scanning time dropped sharply, but Eden was still small. That didn't match expectations. Was the cause something else after all?

Looking carefully at the post-full-GC log, the real GC cost had indeed dropped dramatically:
[Times: user=0.09 sys=0.05, real=0.02 secs]

But why was Eden still 4588.0M(4588.0M)->0.0B(2468.0M)?

2024-03-02T17:45:27.190+0800: 400.411: Application time: 3.1242468 seconds
{Heap before GC invocations=10 (full 0):
 garbage-first heap   total 10485760K, used 2790066K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 632 young (2588672K), 18 survivors (73728K)
 Metaspace       used 136316K, capacity 142388K, committed 142592K, reserved 1173504K
  class space    used 16630K, capacity 17875K, committed 17920K, reserved 1048576K
2024-03-02T17:45:27.192+0800: 400.413: [Full GC (Heap Inspection Initiated GC) 2024-03-02T17:45:27.192+0800: 400.413: [Class Histogram (before full gc): 
 num     #instances         #bytes  class name
----------------------------------------------
   1:       7591768     1216542784  [C
   2:        148811      488969976  [B
   3:        205621      209905680  [I
   4:       4033851      129083232  java.util.HashMap$Node
  .....
  .....

2024-03-02T17:50:33.086+0800: 706.307: Application time: 1.2055260 seconds
{Heap before GC invocations=12 (full 2):
 garbage-first heap   total 10485760K, used 4982437K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 1147 young (4698112K), 0 survivors (0K)
 Metaspace       used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
  class space    used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
2024-03-02T17:50:33.088+0800: 706.309: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 301989888 bytes, new threshold 15 (max 15)
- age   1:   43347440 bytes,   43347440 total
 706.310: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 11409, predicted base time: 14.25 ms, remaining time: 185.75 ms, target pause time: 200.00 ms]
 706.310: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1147 regions, survivors: 0 regions, predicted young region time: 183.96 ms]
 
 706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms]
, 0.0212262 secs]
   [Parallel Time: 16.3 ms, GC Workers: 8]
      [GC Worker Start (ms):  706310.8  706310.8  706310.8  706310.8  706310.8  706310.9  706310.9  706310.9
       Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
      [Ext Root Scanning (ms):  1.3  3.1  2.0  1.2  1.2  1.2  1.2  1.2
       Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
         [Thread Roots (ms):  0.6  0.1  0.6  0.6  0.6  0.6  0.6  0.6
          Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
         [StringTable Roots (ms):  1.8  0.0  0.7  1.9  1.9  1.9  1.8  1.9
          Min: 0.0, Avg: 1.5, Max: 1.9, Diff: 1.9, Sum: 11.8]
         [Universe Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [JNI Handles Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [ObjectSynchronizer Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [FlatProfiler Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.1  0.0
          Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Management Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SystemDictionary Roots (ms):  0.0  0.0  1.2  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.1, Max: 1.2, Diff: 1.2, Sum: 1.2]
         [CLDG Roots (ms):  0.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.4, Max: 3.0, Diff: 3.0, Sum: 3.0]
         [JVMTI Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [CodeCache Roots (ms):  1.8  1.2  2.2  1.8  1.8  1.8  1.8  1.8
          Min: 1.2, Avg: 1.8, Max: 2.2, Diff: 1.0, Sum: 14.2]
         [CM RefProcessor Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Wait For Strong CLD (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Weak CLD Roots (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SATB Filtering (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Update RS (ms):  1.5  1.1  1.5  1.6  1.6  1.5  1.6  1.6
       Min: 1.1, Avg: 1.5, Max: 1.6, Diff: 0.6, Sum: 12.0]
         [Processed Buffers:  171  58  79  117  132  101  91  133
          Min: 58, Avg: 110.2, Max: 171, Diff: 113, Sum: 882]
      [Scan RS (ms):  0.2  0.2  0.2  0.2  0.2  0.2  0.3  0.1
       Min: 0.1, Avg: 0.2, Max: 0.3, Diff: 0.1, Sum: 1.7]
      [Code Root Scanning (ms):  0.0  0.0  0.5  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.1, Max: 0.5, Diff: 0.5, Sum: 0.5]
      [Object Copy (ms):  12.6  11.3  11.4  12.6  12.6  12.7  12.5  12.6
       Min: 11.3, Avg: 12.3, Max: 12.7, Diff: 1.3, Sum: 98.4]
      [Termination (ms):  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
       Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Termination Attempts:  9  11  9  11  12  11  1  13
          Min: 1, Avg: 9.6, Max: 13, Diff: 12, Sum: 77]
      [GC Worker Other (ms):  0.3  0.0  0.1  0.1  0.1  0.1  0.0  0.0
       Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.8]
      [GC Worker Total (ms):  16.0  15.7  15.7  15.7  15.7  15.7  15.6  15.6
       Min: 15.6, Avg: 15.7, Max: 16.0, Diff: 0.4, Sum: 125.7]
      [GC Worker End (ms):  706326.8  706326.5  706326.5  706326.5  706326.5  706326.5  706326.5  706326.5
       Min: 706326.5, Avg: 706326.5, Max: 706326.8, Diff: 0.3]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.6 ms]
   [Other: 4.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.4 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
         [Parallel Redirty:  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Redirtied Cards:  1776  898  719  458  0  0  0  0
          Min: 0, Avg: 481.4, Max: 1776, Diff: 1776, Sum: 3851]
      [Humongous Register: 0.1 ms]
         [Humongous Total: 15]
         [Humongous Candidate: 15]
      [Humongous Reclaim: 0.1 ms]
         [Humongous Reclaimed: 15]
      [Free CSet: 0.8 ms]
         [Young Free CSet: 0.8 ms]
         [Non-Young Free CSet: 0.0 ms]
   [Eden: 4588.0M(4588.0M)->0.0B(2468.0M) Survivors: 0.0B->132.0M Heap: 4865.7M(10.0G)->372.6M(10.0G)]
Heap after GC invocations=13 (full 2):
 garbage-first heap   total 10485760K, used 381544K [0x0000000540800000, 0x0000000540c05000, 0x00000007c0800000)
  region size 4096K, 33 young (135168K), 33 survivors (135168K)
 Metaspace       used 136951K, capacity 143136K, committed 143360K, reserved 1175552K
  class space    used 16658K, capacity 17919K, committed 17920K, reserved 1048576K
}
 [Times: user=0.09 sys=0.05, real=0.02 secs] 
 2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds

Reading further, the real pause time differed wildly from the predicted time: the CSet was chosen for a predicted 198.21 ms pause, yet the actual pause took only 0.021 s.

706.310: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1147 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 198.21 ms, target pause time: 200.00 ms]
, 0.0212262 secs]
   [Parallel Time: 16.3 ms, GC Workers: 8]
      [GC Worker Start (ms):  706310.8  706310.8  706310.8  706310.8  706310.8  706310.9  706310.9  706310.9
       Min: 706310.8, Avg: 706310.8, Max: 706310.9, Diff: 0.1]
      [Ext Root Scanning (ms):  1.3  3.1  2.0  1.2  1.2  1.2  1.2  1.2
       Min: 1.2, Avg: 1.5, Max: 3.1, Diff: 1.9, Sum: 12.3]
         [Thread Roots (ms):  0.6  0.1  0.6  0.6  0.6  0.6  0.6  0.6
          Min: 0.1, Avg: 0.5, Max: 0.6, Diff: 0.5, Sum: 4.3]
         [StringTable Roots (ms):  1.8  0.0  0.7  1.9  1.9  1.9  1.8  1.9
        
 [Times: user=0.09 sys=0.05, real=0.02 secs] 
 2024-03-02T17:50:33.110+0800: 706.331: Total time for which application threads were stopped: 0.0240503 seconds, Stopping threads took: 0.0002914 seconds

The likely explanation: the prediction is computed from accumulated historical data, so G1's self-tuning simply hadn't caught up with the post-full-GC reality yet.

We waited and watched: over roughly the next two minutes the predicted time steadily shrank toward the real cost, and the Eden space kept growing back.

The effect:

[image]

[image]

This confirmed once more that the StringTable was the cause.

Why was the StringTable so large?
To find out, we first need to understand how entries get into it.

StringTable entries come from string literals and from intern() calls. Literals are a finite set baked into the code and can't produce this many entries, which leaves intern(). So how do we find where it's being called at scale?

  • arthas: but arthas works by rewriting bytecode for instrumentation, and String is a core JDK class, so its intern() calls can't be monitored this way.
  • Grep for intern() calls: but plenty of call sites live deep inside libraries, making this hard to audit.
  • Inspect a heap dump: from the above we know young GC cannot curb the StringTable's growth while full GC can, so dumping right after a young GC preserves plenty of the data we need.

Analyzing the String objects in the middle of the dump, we found a huge number of equal-length letter-and-digit strings. Suspecting them as the culprit, we copied one and searched the logs: they were keys in a downstream response.

Combining articles found online with the symptoms above, our preliminary judgment was that serialized, near-random keys were gradually inflating the StringTable.

After local debugging plus reading the library source, we finally found where jackson calls intern().

package com.fasterxml.jackson.core.util;

public final class InternCache extends ConcurrentHashMap<String,String> {
    // Excerpt from jackson-core; MAX_ENTRIES and lock are fields of the full class.
    public String intern(String input) {
        String result = get(input);
        if (result != null) { return result; }

        // The cache is bounded and simply cleared when full, but every entry
        // has already been pushed into the JVM StringTable via intern().
        if (size() >= MAX_ENTRIES) {
            synchronized (lock) {
                if (size() >= MAX_ENTRIES) {
                    clear();
                }
            }
        }
        result = input.intern();
        put(result, result);
        return result;
    }
}

Tracing further up the stack, Hellobike's SOA middleware parses response payloads via jackson serialization.

package com.hellobike.soa.rpc.utils;

public class RpcParseUtil {

    public static <T> T parseResult(Serializer serializer,
                                    String resultJson, Type returnType) {
        if (resultJson == null) {
            return null;
        }

        if (returnType == void.class || returnType == Void.class || returnType == null) {
            return null;
        }

        return (T) serializer.decode(new StringWrapperByteBuf(resultJson), returnType, null);
    }
}

Why does jackson do this?
In high-concurrency interfaces, deserialization produces huge numbers of objects, and normally their keys are a small, fixed set. Without caching, those keys would be duplicated as String objects all over the heap, wasting memory.

As a serialization library, jackson has to care about performance, so it interns these repeating strings to deduplicate them and cut the waste.
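A small sketch of that default behavior, using the standard jackson-core API (the JSON payload here is made up):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

public class JacksonInternDemo {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory(); // INTERN_FIELD_NAMES is enabled by default
        for (int i = 0; i < 2; i++) {
            try (JsonParser p = factory.createParser("{\"userId\":1}")) {
                p.nextToken();                    // START_OBJECT
                p.nextToken();                    // FIELD_NAME
                // Field names are canonicalized and interned, so this prints true
                // on every parse: the key is the same String instance each time.
                System.out.println(p.getCurrentName() == "userId");
            }
        }
    }
}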

That completes the root-cause analysis. There were two ways forward:

  • Ask the downstream to restructure the response so the keys become fixed values
  • Change this interface's serialization to turn off key interning, so the random keys stop being cached

In the end, owing to differing scenarios and coordination constraints, the two services each adopted a different one.

Tracing Back to the Source

Much of the investigation above hinged on lucky coincidences. How should we go about it in other scenarios?

  • Print more detailed logs with -XX:G1LogLevel=finest to confirm which phase the GC time goes to
  • If the growth comes from the StringTable, its contents can be obtained in several ways
  • Use the sun.jvm.hotspot.memory.StringTable class from the JDK's sa-jdi tool to read the objects the in-memory StringTable references
  • Analyze a heap dump
  • Use VisualVM's OQL statement [select u from java.lang.String u where count(referrers(u)) == 0] to find the objects still dead after a young GC
  • Use MAT or VisualVM to trace the reference chains of such objects and inspect the surrounding strings to identify which component they belong to

[image]

[image]

Confirmed: they were referenced by ItemBean.

Solutions

With the cause known, the fix is straightforward.

We had confirmed that in our scenario the keys of the SOA response were near-random values (perhaps not truly random, just almost never repeating), so there were two paths to a fix.

The problematic data format

{
  "msg": "ok",
  "code": 0,
  "data": {
    "outputs": {
      "resultMap": {
        "d32d56e8a1b34457af26d94a02c16cba": 0.32004153617699793,
        "5bc9df88acea4da8bc829de13aa3afb2": 0.434173387080695,
        "4900533c94bc443ea086de0936bfb59e": 0.5555319629335832,
        "575360f1d0544f95b4c430dd8759839a": 0.908782703771497,
        "00fba5cbd24749999dd719f9ec338f4b": 0.5919287549198423,
        "1b7d5916f0454847a8ab90ce2a49e3a1": 0.5869043763908277,
        "22d5a24229374a4b9d04c9bf371e7eb7": 0.09601455380787982,
        "5a35beafa5d8492f85e5aa6ca3223129": 0.5386071460222013,
        "61488dab533b4ec4ba6ae1ea61802c9d": 0.22098097729212451,
        "0aaebddbe3e44677a47f5f4becb6ea51": 0.11469034353064256,
        "21d6c73689514d00ae72b5eb7d1a1785": 0.7861965800121863,
        "3709b11d98194297b788e6068c14760f": 0.8005589118418589,
        "cb1b093134d049f9bebc49e8cc3d791d": 0.22801438927340578,
        "6718a455bcf1416fb1839c79254c4d7c": 0.936704202541788,
        "1200f2d71e5944fcbd645d7084cfbfe8": 0.43563843328912855,
        "6f9ebebbeec247119a3bcf2b7c52b76f": 0.9499259171150704,
        "cea737f0c1cb4be0969d692196287b61": 0.6442754643503128,
        "a1da06ba41d74faf9c88374bf302e578": 0.46847853036230913,
        "b5f1177c0bfa4f67a69d67606665a87b": 0.19847468498415466,
        "efe70c3f037e44cb995dd8a6a6aff8f1": 0.6239117074575771
      }
    },
    "code": 0,
    "cost": 43,
    "seqNo": "1097369524402581504",
    "extraOutputs": {
      "traceId": "96939a6e0c65b3f4e6ebadcaa5e4c4e8",
      "decision_version": 21
    },
    "projectId": 4,
    "version": 21
  }
}

The two paths:

  • Change the SOA contract so the near-random values move out of the keys and into the values
  • Skip key interning when deserializing this interface's response

AlgorithmMarket service:
Fixed by restructuring the response: the near-random keys were moved into a list.
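A sketch of the restructuring (the "after" field names are illustrative, not the actual contract):

// Before: near-random ids as map keys -> every key gets interned during parsing
"resultMap": {
  "d32d56e8a1b34457af26d94a02c16cba": 0.32004153617699793,
  "5bc9df88acea4da8bc829de13aa3afb2": 0.434173387080695
}

// After: fixed keys; the random ids become values -> only "id"/"score" are interned
"resultList": [
  { "id": "d32d56e8a1b34457af26d94a02c16cba", "score": 0.32004153617699793 },
  { "id": "5bc9df88acea4da8bc829de13aa3afb2", "score": 0.434173387080695 }
]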

AppProphetOnlineService service:
Fixed by changing the underlying SOA serialization so keys are not interned.

private SoaClient getNonInternSoaClient(String clientName, String group) {
    ClientConf clientConf = new ClientConf();
    clientConf.setClientName(clientName);
    clientConf.setGroup(group);
    SoaClient client = CustomizedClientCreator.createSoaClient2(clientConf);
    client.start();
    return client;
}
public class CustomizedBlockingGrpcClientInvoker extends GrpcClientInvoker {

    protected final Serializer serializer2;


    public CustomizedBlockingGrpcClientInvoker(ManagedChannel channel, InternalClientInvokerOption option, DynamicConfigHelper helper) {
        super(channel, option, helper);
        this.serializer2 = new CustomizedJacksonSerializer();
    }

    public SoaResponse doInvoke(SoaRequest request) throws RpcException {
        try {
            return this.invokeInternal(request);
        } catch (RpcException var3) {
            return SoaResponse.buildErrorResponse(var3);
        } catch (Throwable var4) {
            return SoaResponse.buildErrorResponse(new RpcException(RpcResultStatus.UNKNOWN_ERROR, var4));
        }
    }

    private SoaResponse invokeInternal(SoaRequest soaRequest) throws RpcException {
        String requestJson = this.buildRequestJson(soaRequest);
        SoaInvokerRequest request = this.buildSoaRequest(soaRequest, requestJson);
        SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub soaInvokerServiceStub = (SoaInvokerServiceGrpc.SoaInvokerServiceBlockingStub) (new BlockingStubBuilder(this.channel, soaRequest)).build();
        SoaInvokerResponse soaInvokerResponse = soaInvokerServiceStub.call(request);
        checkResponseCode(soaInvokerResponse);
        this.helper.getSoaClientLogger().logForResponse(soaRequest, soaInvokerResponse.getResultJson());

        Object response = RpcParseUtil.parseResult(this.serializer2, soaInvokerResponse.getResultJson(), soaRequest.getReturnType());
        return SoaResponse.buildSuccessResponse(response);
    }


}
private static ObjectMapper initObjectMapper() {
    // The key line: stop jackson from interning field names
    JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES);
    SimpleModule module = new SimpleModule();
    module.addSerializer(LocalDate.class, new LocalDateSerializer()).addDeserializer(LocalDate.class, new LocalDateDeserializer());
    module.addSerializer(LocalDateTime.class, new LocalDateTimeSerializer()).addDeserializer(LocalDateTime.class, new LocalDateTimeDeserializer());
    ObjectMapper mapper = (new ObjectMapper(jsonFactory))
            .registerModule(module)
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
            .configure(SerializationFeature.WRITE_NULL_MAP_VALUES, false)
            .setSerializationInclusion(JsonInclude.Include.NON_NULL);
    return mapper;
}

Other Pitfalls

fastjson

Before 1.1.24, fastjson called intern() on every JSON key, caching it in the string constant pool. That makes subsequent reads very fast and saves a lot of time and space, and JSON keys are usually stable. What it failed to consider is that a large volume of changing keys puts enormous pressure on the string constant pool.

fastjson patched this in 1.1.24 by adding a maximum cache size; beyond that limit, keys are no longer placed in the string constant pool.

The code in 1.1.24, at com.alibaba.fastjson.parser.SymbolTable#addSymbol() (line 113):

public static final int MAX_SIZE = 1024;
...
if (size >= MAX_SIZE) {
    // Past the cap, return a fresh String instead of pooling it
    return new String(buffer, offset, len);
}

That incident was triggered at around 700k entries; at several million entries, the damage would be far more than 30ms. So always think twice before reaching for the JVM-level String#intern().

ClassLoader

Since Ext Root Scanning cost can cause the problems above, and the ClassLoaders we use so heavily are also GC roots, could they trigger the same issue?

I built a local test that keeps creating ClassLoaders and loading class files, with GC logging enabled.

Test demo:

// Each pass creates a fresh ClassLoader and loads the same class through it;
// run in a loop so classes pile up without being unloaded.
MyClassLoader loader = new MyClassLoader(ClassLoadGCTest.class.getClassLoader());
Object targetBiz = loader.loadClass("com.jiaxz.hotload.DiffBizMethod").newInstance();
targetBiz = null;
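MyClassLoader isn't shown in the article; a minimal sketch of what such a loader could look like (the class-file directory is a placeholder):

public class MyClassLoader extends ClassLoader {
    public MyClassLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            // Placeholder path: read the raw .class bytes from disk
            java.nio.file.Path path = java.nio.file.Paths.get(
                    "/tmp/classes/" + name.replace('.', '/') + ".class");
            byte[] bytes = java.nio.file.Files.readAllBytes(path);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (java.io.IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        // Define our test classes ourselves so each loader gets its own copy;
        // delegate everything else to the parent
        if (name.startsWith("com.jiaxz.")) {
            return findClass(name);
        }
        return super.loadClass(name);
    }
}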

The result:

[Parallel Time: 103.4 ms, GC Workers: 3]
      [GC Worker Start (ms):  48087.4  48087.4  48087.4
       Min: 48087.4, Avg: 48087.4, Max: 48087.4, Diff: 0.0]
      [Ext Root Scanning (ms):  47.0  34.8  47.0
       Min: 34.8, Avg: 43.0, Max: 47.0, Diff: 12.2, Sum: 128.9]
         [Thread Roots (ms):  0.0  0.1  0.1
          Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
         [StringTable Roots (ms):  0.0  0.0  0.1
          Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
         [Universe Roots (ms):  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [JNI Handles Roots (ms):  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [ObjectSynchronizer Roots (ms):  0.0  0.0  0.2
          Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
         [FlatProfiler Roots (ms):  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Management Roots (ms):  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [SystemDictionary Roots (ms):  0.0  30.3  0.0
          Min: 0.0, Avg: 10.1, Max: 30.3, Diff: 30.3, Sum: 30.3]
         [CLDG Roots (ms):  0.6  0.0  0.0
          Min: 0.0, Avg: 0.2, Max: 0.6, Diff: 0.6, Sum: 0.6]
         [JVMTI Roots (ms):  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [CodeCache Roots (ms):  0.1  0.0  1.2
          Min: 0.0, Avg: 0.4, Max: 1.2, Diff: 1.2, Sum: 1.3]
         [CM RefProcessor Roots (ms):  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Wait For Strong CLD (ms):  0.0  0.0  0.2
          Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
         [Weak CLD Roots (ms):  46.5  16.6  46.5
          Min: 16.6, Avg: 36.5, Max: 46.5, Diff: 29.9, Sum: 109.5]
         [SATB Filtering (ms):  0.0  0.0  0.0
          Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]

The SystemDictionary is a key internal JVM data structure that records loaded classes. As class loaders load more and more classes that never get unloaded, Ext Root Scanning grows as well; in the log above, the cost shows up in the SystemDictionary Roots and Weak CLD Roots lines.

How to locate and fix this at Hellobike

In the Hellobike environment, if a service uses G1GC and its SOA requests or responses carry large numbers of non-repeating values as keys, the scenario above can be reproduced. Here is how to locate it and apply a temporary fix.

Temporary workaround

Shadow the InternCache class

Re-declare InternCache in your project, keeping the identical package path so your copy shadows the jackson class on the classpath.

package com.fasterxml.jackson.core.util;

import java.util.concurrent.ConcurrentHashMap;

public final class InternCache extends ConcurrentHashMap<String, String> {

    // Kill switch, off by default; static so the static initializer below compiles
    private static volatile boolean onOff = false;

    static {
        // Initialize from the -DonOff system property
        onOff = System.getProperty("onOff") != null;
        System.out.println("InternCache onOff: " + onOff);
    }

    public String intern(String input) {
        // With the switch on, skip interning entirely and return the input
        if (onOff) {
            return input;
        }
        String result = (String) this.get(input);
        ...
    }
}

By shadowing jackson's InternCache with a switch, we can control at runtime whether key interning is enabled.

Canary release

You can either set the onOff default to true and release a single instance to watch its GC logs, or leave the default as false, release one machine, and flip the system property in a debug container to run the canary.

Changing the environment on the debug machine:

  • Open the startup script: vim init.script
  • Add the system property: -DonOff=true
  • Restart via the script: bash init.script docker_start

Verifying the effect

Once the canary is live, check the GC log's Ext Root Scanning lines to see whether the scanning time has shrunk.

Validation

Verified on the AppAlgorithmMarket service.
Debug node GC: [Ext Root Scanning (ms): Min: 1.2, Avg: 1.5, Max: 2.6, Diff: 1.4, Sum: 11.7]
Normal node GC: [Ext Root Scanning (ms): Min: 51.6, Avg: 51.7, Max: 61.2, Diff: 0.6, Sum: 460.0]

Recommendation for a full rollout

Have the middleware support switching the SOA serialization dynamically via Apollo, and disable key caching with JsonFactory jsonFactory = new JsonFactory().disable(JsonFactory.Feature.INTERN_FIELD_NAMES).

(Author: Jia Xuezhi)
