Another performance monitoring tool: the flame graph

Preface

The evolution of tools has always been a sign of progress in human productivity. Using tools well can greatly improve our efficiency, and when we run into problems it can speed up troubleshooting. This is also why I like the shell so much: its rich set of command-line tools and its pipelines handle text data precisely and elegantly, which I find fascinating.

But in many cases the expressive power of text is very limited, even inadequate. Text works naturally for absolute values, but it is stretched thin when showing relative values, let alone multi-dimensional data.

With the shell we can quickly find the cumulative value, the maximum, and so on in a text file, but when we need to analyze the correlation between two sets of values we are at a loss. At that point we need another kind of analysis tool, the graph; a scatter plot, for example, shows correlation clearly.

This article introduces one such graph, the flame graph. Experts in our group have shared how to use it before, but I had not used it for a long time, so it left little impression on me. Recently I tried it while troubleshooting a load problem in our Java application, and gained some experience with it.

Introduction

Background

When troubleshooting performance problems, we usually dump the thread stacks and then use a shell command such as

grep --no-group-separator -A 1 java.lang.Thread.State jstack.log | awk 'NR%2==0' | sort | uniq -c | sort -nr

to see what most threads are doing. How often a call appears at the top of the thread stacks is used to infer the most time-consuming calls in the JVM.

As for the principle, imagine a big screen in a square that plays various advertisements in a loop. If we photograph the screen at random moments, and do so enough times, then counting how often each advertisement appears in the photos gives us roughly the share of playing time of each advertisement.

The resources of our application are like the big screen, and each call is like an advertisement being played. By counting the share of each thread stack in the dumps, we can roughly see the share of time each stack consumes. Any single sample carries some error, but over many samples the statistics should not be far off. This is also why some parents see the desktop every time they walk into their child's room and conclude that the child likes to stare blankly at the desktop. :)
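The sampling idea can be illustrated with a small simulation. This is a minimal sketch that is not part of the original article: the advertisement names, their durations, and the function name estimate_shares are all made up for illustration.

```python
import random
from collections import Counter

def estimate_shares(schedule, samples, seed=0):
    """Estimate each item's share of a repeating schedule by random sampling.

    schedule: list of (name, duration) pairs played back to back in a loop.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(duration for _, duration in schedule)
    counts = Counter()
    for _ in range(samples):
        moment = rng.uniform(0, total)  # a random "photo" moment within one loop
        elapsed = 0.0
        for name, duration in schedule:
            elapsed += duration
            if moment <= elapsed:
                counts[name] += 1  # this item was on screen at that moment
                break
    return {name: counts[name] / samples for name, _ in schedule}

# Hypothetical ad loop: ad_A plays for 60s, ad_B for 30s, ad_C for 10s.
ads = [("ad_A", 60), ("ad_B", 30), ("ad_C", 10)]
shares = estimate_shares(ads, samples=10_000)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With enough samples the estimated shares approach the true 60/30/10 split, which is the same reasoning that lets thread dumps stand in for time measurements.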

The output of the command above looks like this:

2444  at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1200)
1587  at sun.misc.Unsafe.park(Native Method)
 795  at java.security.Provider.getService(Provider.java:1035)
 293  at java.lang.Object.wait(Native Method)
 292  at java.lang.Thread.sleep(Native Method)
  73  at org.apache.logging.log4j.core.layout.TextEncoderHelper.copyDataToDestination(TextEncoderHelper.java:61)
  71  at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
  70  at java.lang.Class.forName0(Native Method)
  54  at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.checkRollover(RollingFileManager.java:217)

But this approach has some problems. First, writing the shell command is tedious. Second, if I want to see the most frequent calls at the second frame from the top of the stack, the shell command has to be modified, and even then the result is not intuitive.

The root cause is that thread stacks have a call relationship, so we need to consider two dimensions at once: the call chain and the frequency. Plain text has difficulty expressing both. This is why the well-known performance analysis expert Brendan Gregg proposed the flame graph.

Overview

The flame graph is named for its flame-like shape. Its open source code is at:

https://github.com/brendangregg/FlameGraph

It is an interactive SVG graphic; we can display more information by clicking and hovering with the mouse. The picture below is a typical flame graph. Structurally, it is made up of many squares of different sizes and colors, each labeled with text. Their bottoms join together to form the base of the flame, and many "little flames" rise at the top.

When we click a square, the graph expands upward from that square, and when we hover over a square, its detailed description is displayed.

Characteristics

Before analyzing a flame graph, we must first explain its characteristics:

A unique call chain can be traced from the bottom to the top; the lower square is the parent call of the square above it.
Squares called by the same parent are arranged in alphabetical order from left to right.
The text on a square is the name of a call. In parentheses are the number of times that call appears in the flame graph and the width of this square as a percentage of the bottommost square.
The color of a square has no practical meaning; the color differences between adjacent squares only make them easier to tell apart.
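These properties follow directly from how stack counts are aggregated. The sketch below is an illustration under assumptions, not the code flamegraph.pl actually uses: the function names and the sample data are hypothetical, and percentages are taken relative to the total number of samples.

```python
from collections import defaultdict

def frame_widths(folded):
    """Map each call path (tuple of frames, root first) to its sample count.

    folded: dict of "a;b;c" -> count. A square's width is the sum of the
    counts of all stacks that pass through it.
    """
    widths = defaultdict(int)
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            widths[tuple(frames[:depth])] += count
    return dict(widths)

def layout(folded):
    """Return (depth, name, count, percent) rows, parents before children."""
    widths = frame_widths(folded)
    total = sum(folded.values())
    rows = []
    # Sorting the path tuples lists each parent before its children and
    # keeps siblings in alphabetical order.
    for path in sorted(widths):
        count = widths[path]
        rows.append((len(path), path[-1], count, round(100.0 * count / total, 2)))
    return rows

# Hypothetical sample data, not taken from a real dump.
sample = {"main;handle;parse": 6, "main;handle;write": 2, "main;idle": 2}
for depth, name, count, percent in layout(sample):
    print("  " * (depth - 1) + f"{name} ({count} samples, {percent}%)")
```

Each printed row corresponds to one square: the indentation is its height in the graph, and the numbers in parentheses are what the square's description shows.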

Analysis

So, given a flame graph, how do we see where the system has a problem?

Given the characteristics above, our main focus when reading a flame graph should be the width of the squares. The width represents the number of times a call stack appears overall, the number of times represents its frequency, and the frequency in turn indicates the time consumed.

However, looking at the width of squares at the bottom or in the middle of the flame graph is not very meaningful. In the flame graph above, for example, the do_redirections function in the middle has a width of 24.87%, which means it accounts for nearly a quarter of the whole application. But it is not do_redirections itself that consumes the time; it is the other functions called inside do_redirections. Its sub-calls are split into many squares, and none of them takes an abnormal amount of time.

What we should pay more attention to are the "flat-topped mountains" at the top of the flame graph. Being at the top means the call has no sub-calls, and a wide square means it takes a long time, hangs for a long time, or is called very frequently. The call such a square points to is the culprit of the performance problem.
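Looking for flat tops can also be sketched in code. The example below is a minimal illustration with hypothetical names and data: it assumes stack counts are already available as a dict of "root;...;top" strings, and ranks stacks by how many samples ended exactly in their topmost call.

```python
def flat_tops(folded, limit=3):
    """Rank stacks by the number of samples that ended in their topmost frame.

    folded: dict of "root;...;top" -> count. The count of a stack is the
    width of the flat top formed by its last frame.
    Returns (top frame, percent of all samples, calling chain) tuples.
    """
    total = sum(folded.values())
    ranked = sorted(folded.items(), key=lambda item: item[1], reverse=True)
    result = []
    for stack, count in ranked[:limit]:
        frames = stack.split(";")
        result.append((frames[-1], round(100.0 * count / total, 2), frames[:-1]))
    return result

# Hypothetical counts; frame names are shortened for readability.
sample = {
    "main;serve;loadClass": 50,
    "main;serve;render;encode": 5,
    "main;poll;park": 30,
    "main;serve;render": 15,
}
for top, percent, chain in flat_tops(sample, limit=2):
    print(f"{top}: {percent}% via {' > '.join(chain)}")
```

The returned calling chain is what lets us walk down from the flat top to our own business code.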

Once we find the abnormal call, we can optimize it directly, or follow the call chain in the flame graph back to our business code and optimize that, and the job is done.

Application scenarios

Each tool has its suitable application scenarios. The flame graph is suitable for:

Code loop analysis: If the code contains a large loop or an infinite loop, an obvious "flat top" appears at or near the top of the flame graph, indicating that the code keeps moving up and down within a certain thread stack. Note, however, that if the total time spent in the loop is not long, it will not be obvious on the flame graph.
I/O bottleneck and lock analysis: Calls in our application code are generally synchronous, so during a network call, a file I/O operation, or a failed attempt to acquire a lock, the thread stays on a certain call waiting for the I/O response or the lock. If the wait is long, the thread hangs on that call, which shows up very clearly on the flame graph. The flip side is that a flame graph built from application threads cannot accurately express CPU consumption, because the application thread stacks contain no system call stacks. While an application thread is hung, the CPU may be doing other things, so a call can appear to take a long time while the CPU is actually idle.
Inverted flame graph for global analysis: Inverting the flame graph is sometimes very useful. If N different branches of our code call a certain method, then after inversion all identical calls at the top of the stack are merged together, and we can see the total time consumed by that method, which makes it easy to evaluate the benefit of optimizing it.
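The effect of inversion can be sketched as follows. This is an illustration with hypothetical function names and data, assuming stack counts are available as a dict of "root;...;top" strings; it is not the way flamegraph.pl itself implements inversion.

```python
from collections import defaultdict

def invert(folded):
    """Reverse every stack so the topmost call becomes the base, merging duplicates."""
    inverted = defaultdict(int)
    for stack, count in folded.items():
        inverted[";".join(reversed(stack.split(";")))] += count
    return dict(inverted)

def base_totals(folded):
    """Total samples per base frame (the first frame of each stack)."""
    totals = defaultdict(int)
    for stack, count in folded.items():
        totals[stack.split(";")[0]] += count
    return dict(totals)

# Hypothetical data: three different branches all end up calling "log".
sample = {"main;a;log": 4, "main;b;log": 3, "main;c;x;log": 2, "main;c;calc": 1}
inverted = invert(sample)
totals = base_totals(inverted)  # {"log": 9, "calc": 1}
print(totals)
```

In the original orientation the 9 samples spent in log are scattered over three separate squares; after inversion they merge into a single base square, so the total cost of the method is visible at a glance.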

Implementation

Since the flame graph is so powerful, how do we generate one?

Generation tool

Brendan Gregg has implemented flame graph generation in Perl. The open source code is in the GitHub repository above; the flamegraph.pl file in the root directory is the executable Perl script.

The script also accepts various parameters that let us change the color and size of the flame graph.

But flamegraph.pl can only handle input in a specific format, for example:

a;b;c 12
a;d 3
b;c 3
z;d 5
a;c;e 3

The first part of each line is the call chain, with a semicolon (;) between calls. The number at the end of each line is the number of times that call stack appears.
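The format is simple enough to parse in a few lines. The following is a minimal sketch (parse_folded_line is a hypothetical helper, not part of the FlameGraph repository) applied to the example data above.

```python
def parse_folded_line(line):
    """Split one folded line into (list of frames, count).

    The count is the last space-separated field; everything before it is
    the call chain, so frame names may themselves contain spaces.
    """
    stack, _, count = line.rstrip().rpartition(" ")
    if not stack or not count.isdigit():
        raise ValueError(f"not a folded stack line: {line!r}")
    return stack.strip().split(";"), int(count)

lines = ["a;b;c 12", "a;d 3", "b;c 3", "z;d 5", "a;c;e 3"]
parsed = [parse_folded_line(line) for line in lines]
total = sum(count for _, count in parsed)  # 26 samples in all
for frames, count in parsed:
    print(" -> ".join(frames), count)
```

Taking the count from the end of the line matters for Java stacks, where frames such as "Native Method" contain spaces.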

For the data above, flamegraph.pl generates the following flame graph:

Data preparation

As for converting jstack output into the format above, the repository provides tools for common dump formats. For example, stackcollapse-perf.pl processes the output of perf, stackcollapse-jstack.pl processes jstack output, and stackcollapse-gdb.pl processes stack output from gdb.

You can also process jstack output with a simple shell pipeline, which reads the dump from standard input:

grep -v -P '.+prio=\d+ os_prio=\d+' | grep -v -E 'locked <' | awk '{if ($0==""){print $0}else{printf"%s;",$0}}' | sort | uniq -c | awk '{a=$1;$1="";print $0,a}'
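The same folding step can be sketched in Python. This is a rough illustration, not a replacement for stackcollapse-jstack.pl: collapse_jstack and the sample dump are hypothetical, and the sketch assumes each thread's stack is followed by a blank line or a new quoted thread header. Unlike the pipeline above, it keeps only the "at ..." lines and reverses them, because jstack prints the topmost frame first while the folded format lists the outermost call first.

```python
from collections import Counter

def collapse_jstack(dump):
    """Fold a jstack text dump into {"root;...;top": count}.

    Only lines starting with "at " are treated as frames; thread headers,
    state lines and lock annotations ("- locked <...>") are skipped.
    """
    folded = Counter()
    frames = []

    def flush():
        if frames:
            # jstack lists the topmost frame first; folded stacks start at the root.
            folded[";".join(reversed(frames))] += 1
            frames.clear()

    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("at "):
            frames.append(line[3:])
        elif not line or line.startswith('"'):
            flush()  # a blank line or a new thread header ends the current stack
    flush()
    return dict(folded)

# Hypothetical two-thread dump, shortened for illustration.
sample_dump = '''
"worker-1" #11 prio=5 os_prio=0 tid=0x01 nid=0x1 runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Dao.query(Dao.java:42)
    - locked <0x000000076ab> (a java.lang.Object)
    at com.example.Service.handle(Service.java:17)

"worker-2" #12 prio=5 os_prio=0 tid=0x02 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Dao.query(Dao.java:42)
    at com.example.Service.handle(Service.java:17)
'''
for stack, count in collapse_jstack(sample_dump).items():
    print(stack, count)
```

Each printed line has the "call chain, then count" shape described earlier, so the output can be written to a file and passed to flamegraph.pl.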

Summary

That concludes the flame graph; we now have one more way to tackle performance problems.

The longer I work in development, the more I appreciate the importance of tools, so I plan to start a series introducing the tools I use. Of course, this also requires me to learn, use, and summarize more new tools.

Source: https://zhenbianshu.github.io

民工哥
