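1. Problem

spring-cloud-gateway serves as the single entry point for requests and forwards each one to the appropriate microservice.

The Spring Cloud version in use is Finchley SR2.

While load-testing one endpoint, throughput plateaued at around 1000 req/s and would not go any higher.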

[root@hystrix-dashboard wrk]# wrk -t 10 -c 200 -d 30s --latency -s post-test.lua 'http://10.201.0.28:8888/api/v1/json'
Running 30s test @ http://10.201.0.28:8888/api/v1/json
  10 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   188.34ms  110.13ms   2.00s    78.43%
    Req/Sec   106.95     37.19   333.00     77.38%
  Latency Distribution
     50%  165.43ms
     75%  243.48ms
     90%  319.47ms
     99%  472.64ms
  30717 requests in 30.04s, 7.00MB read
  Socket errors: connect 0, read 0, write 0, timeout 75
Requests/sec:   1022.62
Transfer/sec:    238.68KB

The contents of post-test.lua are as follows:

request = function()
    local headers = {}
    headers["Content-Type"] = "application/json"
    local body = [[{
        "biz_code": "1109000001",
        "channel": "7",
        "param": {
            "custom_id": "ABCD",
            "type": "test",
            "animals": ["cat", "dog", "lion"],
            "retcode": "0"
        }
    }]]
    return wrk.format('POST', nil, headers, body)
end

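The gateway reads the request body, looks up the biz_code field in an in-memory routing table to find a matching route, and then forwards the request to the corresponding microservice.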
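The lookup step described above can be sketched roughly as follows. This is only an illustration: the class name BizCodeRouteTable, its methods, and the mapping of biz_code 1109000001 to the eeams-service URL are assumptions made for the example, not the gateway's actual code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory routing table keyed by biz_code (names are illustrative). */
class BizCodeRouteTable {
    private final Map<String, String> routes = new ConcurrentHashMap<>();

    /** Registers (or replaces) the target URI for a biz_code. */
    void register(String bizCode, String targetUri) {
        routes.put(bizCode, targetUri);
    }

    /** Returns the target URI for a biz_code, or empty if no route matches. */
    Optional<String> resolve(String bizCode) {
        if (bizCode == null) {
            return Optional.empty(); // ConcurrentHashMap rejects null keys
        }
        return Optional.ofNullable(routes.get(bizCode));
    }

    public static void main(String[] args) {
        BizCodeRouteTable table = new BizCodeRouteTable();
        // Assumed mapping, for illustration only
        table.register("1109000001", "http://10.201.0.32:8776/eeams-service/api/v1/json");
        System.out.println(table.resolve("1109000001").orElse("no route"));
        System.out.println(table.resolve("unknown").orElse("no route"));
    }
}
```

The lookup itself is a constant-time map access, so it is unlikely to be what limits throughput; the expensive part is getting biz_code out of the body.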

2. Investigation

  1. Test the performance of the backend endpoint itself, bypassing the gateway:
[root@hystrix-dashboard wrk]# wrk -t 10 -c 200 -d 30s --latency -s post-test.lua 'http://10.201.0.32:8776/eeams-service/api/v1/json'
Running 30s test @ http://10.201.0.32:8776/eeams-service/api/v1/json
  10 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    26.72ms    8.59ms 260.23ms   89.66%
    Req/Sec   752.18    101.46     0.94k    78.67%
  Latency Distribution
     50%   23.52ms
     75%   28.02ms
     90%   35.58ms
     99%   58.25ms
  224693 requests in 30.02s, 50.83MB read
Requests/sec:   7483.88
Transfer/sec:      1.69MB

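The endpoint itself can sustain 7000+ req/s, so the bottleneck is in the gateway.

  2. Check the gateway's CPU and memory usage through spring-boot-admin: neither is saturated. The thread view, however, shows that threads in the reactor-http-nio group are blocked. In reactive programming, a blocked reactor-http-nio thread is disastrous, because a small number of these event-loop threads serve all requests.
  3. Analyze thread states with jstack to locate the blocking code (line 19 of the dump, the DefaultServerRequest.<init> frame):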
"reactor-http-nio-4" #19 daemon prio=5 os_prio=0 tid=0x00007fb784d7f240 nid=0x80b waiting for monitor entry [0x00007fb71befc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
    - waiting to lock <0x000000008b0cec30> (a java.lang.Object)
    at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:93)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.springframework.util.ClassUtils.forName(ClassUtils.java:282)
    at org.springframework.http.converter.json.Jackson2ObjectMapperBuilder.registerWellKnownModulesIfAvailable(Jackson2ObjectMapperBuilder.java:753)
    at org.springframework.http.converter.json.Jackson2ObjectMapperBuilder.configure(Jackson2ObjectMapperBuilder.java:624)
    at org.springframework.http.converter.json.Jackson2ObjectMapperBuilder.build(Jackson2ObjectMapperBuilder.java:608)
    at org.springframework.http.codec.json.Jackson2JsonEncoder.<init>(Jackson2JsonEncoder.java:54)
    at org.springframework.http.codec.support.AbstractCodecConfigurer$AbstractDefaultCodecs.getJackson2JsonEncoder(AbstractCodecConfigurer.java:177)
    at org.springframework.http.codec.support.DefaultServerCodecConfigurer$ServerDefaultCodecsImpl.getSseEncoder(DefaultServerCodecConfigurer.java:99)
    at org.springframework.http.codec.support.DefaultServerCodecConfigurer$ServerDefaultCodecsImpl.getObjectWriters(DefaultServerCodecConfigurer.java:90)
    at org.springframework.http.codec.support.AbstractCodecConfigurer.getWriters(AbstractCodecConfigurer.java:121)
    at org.springframework.http.codec.support.DefaultServerCodecConfigurer.getWriters(DefaultServerCodecConfigurer.java:39)
    at org.springframework.web.reactive.function.server.DefaultHandlerStrategiesBuilder.build(DefaultHandlerStrategiesBuilder.java:103)
    at org.springframework.web.reactive.function.server.HandlerStrategies.withDefaults(HandlerStrategies.java:90)
    at org.springframework.cloud.gateway.support.DefaultServerRequest.<init>(DefaultServerRequest.java:81)
    at com.glsc.imf.dbg.route.RouteForJsonFilter.filter(RouteForJsonFilter.java:34)
    at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain.lambda$filter$0(FilteringWebHandler.java:115)
    at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain$$Lambda$800/1871561393.get(Unknown Source)
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44)
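
The same check can also be done from inside the JVM with the standard java.lang.management API. The following is a minimal sketch, assuming it runs in the gateway process itself (for example from a diagnostic endpoint); the class name BlockedThreadProbe is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that counts BLOCKED threads in the current JVM. */
class BlockedThreadProbe {

    /** Counts live threads whose name starts with the prefix and that are BLOCKED on a monitor. */
    static long countBlocked(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long blocked = 0;
        // false, false: skip lock-ownership details, only names and states are needed
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null
                    && info.getThreadName().startsWith(namePrefix)
                    && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("blocked reactor-http-nio threads: "
                + countBlocked("reactor-http-nio"));
    }
}
```

A count that stays above zero under load is the same signal as the BLOCKED state in the jstack output above.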

This pinpoints the problematic code:

DefaultServerRequest req = new DefaultServerRequest(exchange);    // this line is the performance problem
return req.bodyToMono(JSONObject.class).flatMap(body -> {
    ...
});

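The logic here is that I need to read the request body, convert it to JSON, match a route based on a specific field in it, and then forward the request. I chose to wrap the exchange in a DefaultServerRequest first so that I could use its bodyToMono method, which makes the conversion convenient. As the stack trace shows, though, every new DefaultServerRequest(exchange) calls HandlerStrategies.withDefaults(), which rebuilds the default codecs and ends up in ClassLoader.loadClass, where the reactor-http-nio threads block waiting for the class loader's lock.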
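
The general pattern behind the problem, rebuilding an expensive default configuration on every request instead of building it once and sharing it, can be illustrated with a plain-JDK sketch. Everything here is a stand-in: buildDefaults() and the class name com.example.NotOnClasspath are hypothetical and only mimic a builder that probes the class loader for optional modules; this is not the Spring code itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative stand-in: per-request build versus a shared, lazily built instance. */
class BuildOnceDemo {
    /** Counts how often the (hypothetical) expensive build runs. */
    static final AtomicInteger BUILDS = new AtomicInteger();

    /** Stand-in for an expensive default build that probes the class loader. */
    static String buildDefaults() {
        BUILDS.incrementAndGet();
        try {
            Class.forName("com.example.NotOnClasspath"); // hypothetical optional module
            return "with-optional-module";
        } catch (ClassNotFoundException e) {
            return "defaults";
        }
    }

    /** Initialization-on-demand holder: the JVM initializes this class at most once. */
    private static final class Holder {
        static final String DEFAULTS = buildDefaults();
    }

    /** Rebuilds the defaults on every call, like constructing them per request. */
    static String perRequest() {
        return buildDefaults();
    }

    /** Reuses the single shared instance. */
    static String shared() {
        return Holder.DEFAULTS;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            perRequest();
        }
        for (int i = 0; i < 3; i++) {
            shared();
        }
        System.out.println("builds: " + BUILDS.get());
    }
}
```

With the shared variant the class-loader probe happens once at startup rather than on every request, so request threads no longer queue on the class loader's lock.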

3. Solution

Rewrite the code to do the same thing without DefaultServerRequest:

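return exchange.getRequest().getBody().collectList()
        .map(dataBuffers -> {
            // Aggregate all body chunks into one heap buffer
            ByteBuf byteBuf = Unpooled.buffer();
            dataBuffers.forEach(buffer -> {
                try {
                    byteBuf.writeBytes(IOUtils.toByteArray(buffer.asInputStream()));
                } catch (IOException e) {
                    // Propagate the failure instead of silently parsing a partial body
                    throw new UncheckedIOException(e);
                } finally {
                    // The bytes have been copied, so the chunk can go back to the pool
                    DataBufferUtils.release(buffer);
                }
            });
            // Decode only the bytes actually written; byteBuf.array() would also include unused capacity
            return JSON.parseObject(byteBuf.toString(StandardCharsets.UTF_8));
        })
        .flatMap(body -> {
            ...
        });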
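
One detail worth care when joining body chunks by hand is to decode exactly the bytes that were received, with an explicit charset, and only after all chunks have been joined. Netty's ByteBuf.array() returns the whole backing array, which can be larger than the data written to it. The following plain-JDK sketch shows the idea with a ByteArrayOutputStream standing in for the Netty buffer; the class name BodyJoiner is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper: joins body chunks and decodes them as UTF-8. */
class BodyJoiner {

    /** Concatenates the chunks, then decodes once so split multi-byte characters survive. */
    static String join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        // toByteArray() returns exactly the bytes written, with no trailing padding
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] all = "{\"biz_code\":\"1109000001\"}".getBytes(StandardCharsets.UTF_8);
        // Split the payload at an arbitrary point to simulate two network chunks
        byte[] first = Arrays.copyOfRange(all, 0, 10);
        byte[] second = Arrays.copyOfRange(all, 10, all.length);
        System.out.println(join(Arrays.asList(first, second)));
    }
}
```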

Then rerun the test:

[root@hystrix-dashboard wrk]# wrk -t 10 -c 200 -d 30s --latency -s post-test.lua 'http://10.201.0.28:8888/api/v1/json'
Running 30s test @ http://10.201.0.28:8888/api/v1/json
  10 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    48.47ms   45.85ms 325.87ms   88.55%
    Req/Sec   548.13    202.55   760.00     80.01%
  Latency Distribution
     50%   31.18ms
     75%   39.44ms
     90%  112.18ms
     99%  227.19ms
  157593 requests in 30.02s, 35.94MB read
Requests/sec:   5249.27
Transfer/sec:      1.20MB

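Throughput rose from about 1000 req/s to 5000+ req/s, roughly a 5x improvement and about 70% of what the backend achieves when called directly. Problem solved.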


pggsnap