Super awesome performance monitoring artifact! Quickly locate online problems

What I want to summarize today is the testing tool JMH and the Java runtime monitoring tool Arthas used in my recent work. They also helped me a lot in my actual work. So here are some tips on how to use these tools. At the same time, deepen your familiarity with these tools. For these two tools, I will first briefly introduce the general usage scenarios of these tools, and then use a performance troubleshooting that I really encounter in my work as an example to explain in detail the actual usage of these two tools. Not much to say, straight to the topic.

Problem Description

In order to allow my later examples to run through the use of these two tools, I first briefly describe the actual performance problems we encountered in the development. Then we will introduce the actual use of these two performance tools to see how we can use these two tools to successfully locate the performance bottleneck.

The problem is as follows: In order to be able to support the loss rate, we revised the original log4j2 Async+custom Appender method, and put the asynchronous logic in the revised Appender. However, we found that the modified log performance is much lower than the previous Async+custom Appender method. For privacy reasons, I didn't use the actual company example, here I used another way that can also reflect the problem. We will not give a specific configuration file for the time being, but first give the performance test code and results.

Code

package com.bryantchang.appendertest;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AppenderTest {

    private static final String LOGGER_NAME_DEFAULT = "defaultLogger";
    private static final String LOGGER_NAME_INCLUDE = "includeLocationLogger";
    private static final Logger LOGGER = LoggerFactory.getLogger(LOGGER_NAME_INCLUDE);
    public static final long BATCH = 10000;

    public static void main(String[] args) throws InterruptedException {
        while(true) {
            long start, end;
            start = System.currentTimeMillis();
            for (int i = 0; i < BATCH; i++) {
                LOGGER.info("msg is {}", i);
            }
            end = System.currentTimeMillis();
            System.out.println("duration of " + LOGGER_NAME_INCLUDE +  " is " + (end - start) + "ms");
            Thread.sleep(1000);
        }
    }
}

The code logic is extremely simple, that is, calling logger.info to print 10,000 logs each time, and then recording time-consuming. The comparison between the two is as follows. Java technology resource sharing (including Java advanced programming, architects, SSM,

From these two pictures, we can see the same logic. The time-consuming gap between the two programs is dozens of times different. But looking at the pictures, it seems that only the logger name is different. Analyzing the above experimental results, we may have two questions

The above code test is standard and normative
If it is really a performance problem, then which method of these two codes has such a big gap that led to the final performance difference

The following two tools will answer these two questions separately

Introduction to JMH

The first question is whether the test method is standard. When we write the code, we use the method of infinite loop + "stopwatch" before and after. If we want to add a multi-threaded test, we also need to build a thread pool, in addition to the logic of the code itself, we must also care about the logic of the test. We will wonder if there is a tool that can completely liberate us from the test logic and only need to care about the code logic we need to test. JMH is such a Java testing framework. The following is the official definition of JMH

JMH 是一个面向 Java 语言或者其他 Java 虚拟机语言的性能基准测试框架

What we need to pay attention to here is that the method tested by JMH is as simple as possible, and with less dependencies, the most suitable scenario is to test the performance of two sets of put and get, such as the comparison of ArrayList and LinkedList. Here we need The test is the time required to print a batch of logs in batches, which is basically in line with the test scenario using JMH. Below is the test code, the bench framework code and the main function.

Method to be tested

public class LogBenchMarkWorker {

    private LogBenchMarkWorker() {}

    private static class LogBenchMarkWorkerInstance {
        private static final LogBenchMarkWorker instance = new LogBenchMarkWorker();
    }

    public static LogBenchMarkWorker getInstance() {
        return LogBenchMarkWorkerInstance.instance;
    }

    private static final String LOGGER_DEFAULT_NAME = "defaultLogger";
    private static final String LOGGER_INCLUDE_LOCATION = "includeLocationLogger";
    private static final Logger LOGGER = LoggerFactory.getLogger(LOGGER_DEFAULT_NAME);
    private static long BATCH_SIZE = 10000;

    public void logBatch() {
        for (int i = 0; i < BATCH_SIZE; i++) {
            LOGGER.info("msg is {}", i);
        }
    }
}

It can be seen that the method to be tested is very simple, that is, the operation of printing 10,000 logs at a time in a single batch, so there is no need for additional explanation. Let's look at the benchmark section again.

public class LogBenchMarkMain {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(1)
    public void testLog1() {
        LogBenchMarkWorker.getInstance().logBatch();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(4)
    public void testLog4() {
        LogBenchMarkWorker.getInstance().logBatch();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(8)
    public void testLog8() {
        LogBenchMarkWorker.getInstance().logBatch();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(value = 1)
    @Threads(16)
    public void testLog16() {
        LogBenchMarkWorker.getInstance().logBatch();
    }

In this code, we will find that there are some unique things in JMH, I will briefly introduce them below.

Benchmark注解：标识在某个具体方法上，表示这个方法将是一个被测试的最小方法，在JMH中成为一个OPS
BenmarkMode:测试类型，JMH提供了几种不同的Mode
    Throughput:整体吞吐量
    AverageTime:调用的平均时间，每次OPS执行的时间
    SampleTime:取样，给出不同比例的ops时间，例如99%的ops时间，99.99%的ops时间
fork：fork的次数，如果设置为2，JMH会fork出两个进程来测试
Threads：很容易理解，这个参数表示这个方法同时被多少个线程执行

In the above code, I defined 4 methods to be tested. Fork and BenchmarkMode of the method are the average time of a single OPS test, but the thread counts of the 4 methods are different. In addition to these parameters, there are a few more parameters, I will talk about it in the main function, the main code is as follows. Java technology resource sharing (including Java advanced programming, architects, SSM,

public class Main {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(LogBenchMarkMain.class.getSimpleName())
                .warmupIterations(5)
                .measurementIterations(5)
                .output("logs/BenchmarkCommon.log")
                .build();
        new Runner(options).run();
    }
}

We can see that in the main function, we just want to set the basic configuration of JMH. The meanings of several configuration parameters are as follows:

include:benchmark所在类的名字，可以使用正则表达
warmupIteration:预热的迭代次数，这里为什么要预热的原因是由于JIT的存在，随着代码的运行，会动态对代码的运行进行优化。因此在测试过程中需要先预热几轮，让代码运行稳定后再实际进行测试
measurementIterations:实际测试轮次
output:测试报告输出位置

I run the test with two loggers separately to view the comparison of performance test reports

Use normal logger

LogBenchMarkMain.testLog1   avgt    5  0.006 ± 0.001   s/op
LogBenchMarkMain.testLog16  avgt    5  0.062 ± 0.026   s/op
LogBenchMarkMain.testLog4   avgt    5  0.006 ± 0.002   s/op
LogBenchMarkMain.testLog8   avgt    5  0.040 ± 0.004   s/op

Logger using INCLUDE_LOCATION

LogBenchMarkMain.testLog1   avgt    5  0.379 ± 0.007   s/op
LogBenchMarkMain.testLog16  avgt    5  1.784 ± 0.670   s/op
LogBenchMarkMain.testLog4   avgt    5  0.378 ± 0.003   s/op
LogBenchMarkMain.testLog8   avgt    5  0.776 ± 0.070   s/op

Here we see that the performance gap is emerging. The performance of using INCLUDE_LOCATION is significantly lower than that of using ordinary loggers. This is we must be very curious, and ask "Where is the slow?" !

Arthas What exactly does my code do at runtime

Arthas is an open source java debugging artifact of Ali, similar to greys, but has more powerful functions than greys, for example, it can directly draw flame graphs of java method calls. The principle of these two tools is to use Java's powerful bytecode technology. After all, I am not a JVM boss, so the specific implementation details cannot be expanded. What we have to do is to stand on the shoulders of giants, accept and use these useful performance monitoring tools proficiently.

Actual operation

talk is cheap, show me your code, since it is a tool, we directly carry out the actual operation. We run the program we started on this machine, and then explain how to use arthas.

We first find the process number of the program through jps, and then directly use the as.sh given by arthas to enhance the bytecode of the program we are running, and then start arthas, the command is as follows

./as.sh pid

As you can see, arthas supports viewing the current jvm status, viewing the current thread status, monitoring the call time of certain methods, trace, profile generating flame graphs, etc., and the functions are all available. We only include a few more commonly used commands here. If you are interested in other commands, please refer to the official website of arthas. This article mainly introduces the following functions

Decompile code
Monitor the call of a method
View the call and return value of a method
trace a method

Monitoring method call

This main command is monitor. According to the introduction on the official website, the commonly used method is

monitor -c duration className methodName

The duration represents the statistical results displayed every few seconds, that is, a single statistical period, className is the fully qualified name of the class, and methodname is the name of the method. The method we look at here is the info method of the Logger class. Two different logger info methods. The class here is org.slf4j.Logger, the method is info, and our monitoring statement is

monitor -c 5 org.slf4j.Logger info

The monitoring results are as follows

Use ordinary appender

We can see that the log printing method using include appeder is 3 times higher than that of the ordinary appender, which makes us wonder how the steps of these two methods are time-consuming. Below we introduce the second command, the trace method.

trace command & jad command

The log4j2 configuration files of these two programs are as follows

<?xml version="1.0" encoding="UTF-8"?>
<!--status：日志等级   monitorInterval：更新配置文件的时间间隔，单位秒-->
<configuration status="warn" monitorInterval="30">
    <!--定义所有的appender -->
    <appenders>
        <!--这个输出控制台的配置 -->
        <Console name="console" target="SYSTEM_OUT">
            <!--控制台只输出level及以上级别的信息（onMatch），其他的直接拒绝（onMismatch） -->
            <ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
            <!--日志打印格式 -->
            <PatternLayout pattern="[%d{HH:mm:ss.SSS}] [%-5p] %l - %m%n"/>
        </Console>


        <Async name="AsyncDefault" blocking="false" includeLocation="false">
            <AppenderRef ref="fileAppender"/>
        </Async>

        <Async name="AsyncIncludeLocation" blocking="false" includeLocation="true">
            <AppenderRef ref="fileAppender"/>
        </Async>

        <!--文件会打印出所有信息，这个log每次运行程序会自动清空，由append属性决定，这个也挺有用的，适合临时测试用 -->
        <!--append为TRUE表示消息增加到指定文件中，false表示消息覆盖指定的文件内容，默认值是true -->
        <File name="fileAppender" fileName="log/test.log" append="false">
            <PatternLayout pattern="[%d{HH:mm:ss.SSS}] [%-5p] %l - %m%n"/>
        </File>

        <!--添加过滤器ThresholdFilter,可以有选择的输出某个级别以上的类别  onMatch="ACCEPT" onMismatch="DENY"意思是匹配就接受,否则直接拒绝  -->
        <File name="ERROR" fileName="logs/error.log">
            <ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{yyyy.MM.dd 'at' HH:mm:ss z}] [%-5p] %l - %m%n"/>
        </File>

        <!--这个会打印出所有的信息，每次大小超过size，则这size大小的日志会自动存入按年份-月份建立的文件夹下面并进行压缩，作为存档 -->
        <RollingFile name="rollingFile" fileName="logs/app.log"
                     filePattern="logs/$${date:yyyy-MM}/web-%d{yyyy-MM-dd}.log.gz">
            <PatternLayout pattern="[%d{yyyy-MM-dd HH:mm:ss}] [%-5p] %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy modulate="true" interval="1"/>
            </Policies>
            <DefaultRolloverStrategy>
                <Delete basePath="logs" maxDepth="2">
                    <IfFileName glob="*/*.log.gz" />
                    <IfLastModified age="7d" />
                </Delete>
            </DefaultRolloverStrategy>
        </RollingFile>
    </appenders>

    <!--然后定义logger，只有定义了logger并引入的appender，appender才会生效 -->
    <loggers>
        <!--过滤掉spring和mybatis的一些无用的DEBUG信息-->
        <logger name="defaultLogger" additivity="false">
            <appender-ref ref="AsyncDefault"></appender-ref>
        </logger>

        <logger name="includeLocationLogger" additivity="false">
            <appender-ref ref="AsyncIncludeLocation"></appender-ref>
        </logger>

        <!--建立一个默认的root的logger -->
        <root level="INFO">

        </root>
    </loggers>

</configuration>

We all use an AsyncAppender and apply a FileAppender. Looking at fileAppender, I found that the two are the same and there is no difference. Only one option in asyncAppender is different, which is includeLocation, one is false, and the other is true. As for the meaning of this parameter, we will not discuss it for the time being. We now know that the problem may lie in AsyncAppender, but how do we verify it. The trace command is very useful.

The basic usage of the trace command is similar to that of monitor, one of the main parameters is -n, which means how many times to trace

trace -n trace_times className methodName

I mentioned in the previous Log4j2 related blog that the core method of any appender is his append method. So we trace the append methods of the two programs separately.

trace -n 5 org.apache.logging.log4j.core.appender.AsyncAppender append

The trace result is as follows

Use ordinary appender

We can immediately find that the hot methods of the two traces are different. Among the appenders that use include, the longest method is the createMemento method in the org.apache.logging.log4j.core.impl.Log4jLogEvent class, so how can it be done? If you know what this method does, please come up with our next commonly used command jad, which can decompile the code of the corresponding method. Here we jad the createMemento method mentioned above, the command is very simple

jad org.apache.logging.log4j.core.impl.Log4jLogEvent createMemento

The result is as follows

watch command

The watch command can observe the input parameters, return value and other information of a specific method. We use this command to check the input parameters of the createMemento method. If the input parameters of the two programs are different, it can basically be concluded that this is the reason for the command as follows

watch org.apache.logging.log4j.core.impl.Log4jLogEvent createMemento "params" -x 2 -n 5 -b -f

The meaning of the parameters here are as follows

-x parameter expansion level
-n number of executions
-b View the state before the method is called
-f after method call

Among them, param represents the call parameter list of the view method, and there are other monitoring items detailed in the official website.

The final watch result is as follows

Use normal logger

Sure enough, these two parameters are true and false. Let's take a brief look at how this parameter is passed in. Let's jad the append method of AsyncAppender.

But to find out, I still followed this code statically

This includeLocation will be used in the createMemento of the event, and a LogEventProxy will be created when the object is serialized. The code is as follows

public LogEventProxy(final LogEvent event, final boolean includeLocation) {
    this.loggerFQCN = event.getLoggerFqcn();
    this.marker = event.getMarker();
    this.level = event.getLevel();
    this.loggerName = event.getLoggerName();

    final Message msg = event.getMessage();
    this.message = msg instanceof ReusableMessage
            ? memento((ReusableMessage) msg)
            : msg;
    this.timeMillis = event.getTimeMillis();
    this.thrown = event.getThrown();
    this.thrownProxy = event.getThrownProxy();
    this.contextData = memento(event.getContextData());
    this.contextStack = event.getContextStack();
    this.source = includeLocation ? event.getSource() : null;
    this.threadId = event.getThreadId();
    this.threadName = event.getThreadName();
    this.threadPriority = event.getThreadPriority();
    this.isLocationRequired = includeLocation;
    this.isEndOfBatch = event.isEndOfBatch();
    this.nanoTime = event.getNanoTime();
}

If includeLocation is true, then the getSource function will be called and follow up to check, the code is as follows

public StackTraceElement getSource() {
        if (source != null) {
            return source;
        }
        if (loggerFqcn == null || !includeLocation) {
            return null;
        }
        source = Log4jLogEvent.calcLocation(loggerFqcn);
        return source;
    }
     public static StackTraceElement calcLocation(final String fqcnOfLogger) {
        if (fqcnOfLogger == null) {
            return null;
        }
        // LOG4J2-1029 new Throwable().getStackTrace is faster than Thread.currentThread().getStackTrace().
        final StackTraceElement[] stackTrace = new Throwable().getStackTrace();
        StackTraceElement last = null;
        for (int i = stackTrace.length - 1; i > 0; i--) {
            final String className = stackTrace[i].getClassName();
            if (fqcnOfLogger.equals(className)) {
                return last;
            }
            last = stackTrace[i];
        }
        return null;
    }

We see that he will look for the line of code that calls this method from the entire call stack, and its performance can be imagined. Let's use arthas to monitor and verify.

First, we trace crateMemento method

trace -n 5 org.apache.logging.log4j.core.impl.Log4jLogEvent createMemento

trace -n 5 org.apache.logging.log4j.core.impl.Log4jLogEvent serialize

trace -n 5 org.apache.logging.log4j.core.impl.Log4jLogEvent$LogEventProxy <init>

trace -n 5 trace -n 5 org.apache.logging.log4j.core.LogEvent getSource

So far we have identified an online performance problem by combining JMH and arthas. But what I introduced is just the tip of the iceberg. More commonly used commands are hoped that you can learn and practice them yourself through the official website. After a few hands-on practice, this tool will become familiar.

Source | bryantchang.github.io/2019/12/08/java-profile-tools/

Super awesome performance monitoring artifact! Quickly locate online problems

Problem Description

Introduction to JMH

Method to be tested

Use normal logger

Logger using INCLUDE_LOCATION

Arthas What exactly does my code do at runtime

Actual operation

Monitoring method call

Use ordinary appender

trace command & jad command

Use ordinary appender

watch command

First, we trace crateMemento method

民工哥

引用和评论

早知道有这么个吊炸天的 CI&CD 工具，我就不用 Jenkins 了！

Java8的新特性

Java11的新特性

Java5的新特性

Java9的新特性

Java13的新特性

Java7的新特性