
Summary

Java performance analysis is both a science and an art. The science lies in the fact that performance analysis generally involves a large amount of data, measurement and analysis; the art lies in applying knowledge, experience and intuition. Performance analysis tools and methods each have their own strengths, and the analysis process differs considerably from case to case. This article shares some well-known and practical Java performance analysis tips to help readers understand and apply them.

Tip 1: Thread Stack Analysis

Thread stack analysis is a snapshot analysis of the running Java threads. It is a lightweight analysis method that users can try first when they are unsure where an application's performance problem lies. Although there is no uniform standard for judging whether a Java thread is abnormal, users can make a quantitative evaluation through a few indicators. The following four indicators are shared:

1) Thread deadlock check
Thread deadlock checking is a very valuable detection indicator. If threads are deadlocked, problems such as wasted system resources or reduced service capacity generally follow, and once a deadlock is found it needs to be handled promptly. Deadlock detection shows the deadlock relationship between threads and the corresponding stack information, and analyzing it locates the code that triggers the deadlock. The deadlock model in Figure 1 shows a complex four-thread deadlock scenario.

Figure 1
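A deadlock like this usually comes from inconsistent lock ordering. The sketch below is a minimal, hypothetical reproduction of the classic two-thread case (class, lock and thread names are made up); a stack dump taken with jstack or jcmd Thread.print reports it under "Found one Java-level deadlock".

```java
// Minimal sketch: each thread holds one lock and waits for the other.
public class DeadlockDemo {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (LOCK_A) {
                sleep(100);                   // give the other thread time to take LOCK_B
                synchronized (LOCK_B) { }     // blocked forever: LOCK_B is held by worker-2
            }
        }, "worker-1").start();

        new Thread(() -> {
            synchronized (LOCK_B) {
                sleep(100);
                synchronized (LOCK_A) { }     // blocked forever: LOCK_A is held by worker-1
            }
        }, "worker-2").start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```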

2) Thread state statistics check
State statistics group the running threads by their run state and summarize the counts. When users do not fully understand the load on their business, they generally configure a very generous number of available threads, and too many threads cause performance degradation or exhaustion of system resources. As shown in Figure 2, more than 90% of the threads are in the blocked or waiting state, so appropriately reducing the number of threads cuts down thread-scheduling overhead and unnecessary waste of resources.

Figure 2
As shown in Figure 3, the number of threads in the RUNNABLE state exceeds 90%; further analysis may reveal a thread leak. At the same time, with this many threads running, the overhead of thread switching is also considerable.

Figure 3
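If no dedicated analysis tool is at hand, a rough version of this state statistic can be pulled from a plain stack dump; a hedged example, with <pid> as a placeholder:

```
jstack <pid> | grep "java.lang.Thread.State" | sort | uniq -c | sort -rn
```

The per-state counts (RUNNABLE, BLOCKED, WAITING, TIMED_WAITING) give the same ratios discussed above.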

3) Thread CPU usage check
Collect and sort the CPU usage of each Java thread, then analyze the stacks of threads with extremely high CPU usage to quickly locate program hot spots. As shown in Figure 4, the CPU usage of the first task thread has reached 100%, and the developer can decide whether to optimize the code according to the business logic.

Figure 4
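Per-thread CPU usage normally comes from the analysis tool itself, but it can also be sampled in-process through the standard java.lang.management API. A minimal sketch, with a hypothetical class name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Lists per-thread CPU time from inside the JVM. Threads whose CPU time keeps
// growing between two samples are the ones whose stacks are worth reading.
public class ThreadCpuSampler {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.err.println("per-thread CPU time is not supported on this JVM");
            return;
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id);   // -1 if the thread is no longer alive
            if (info != null && cpuNanos > 0) {
                System.out.printf("%-40s cpu=%d ms%n", info.getThreadName(), cpuNanos / 1_000_000);
            }
        }
    }
}
```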
4) GC thread count check
The number of GC threads is an indicator that users easily overlook. When setting the number of parallel GC threads, it is easy to ignore the resources actually available to the system, or to deploy the application casually on a physical machine with many CPU cores. As shown in Figure 5, we found that the number of parallel GC threads for G1 in a 4-core 8 GB container was 9 (the number of concurrent GC threads generally defaults to about a quarter of the parallel GC threads), which means that when a GC occurs there may be 9 GC threads running in parallel. In that case CPU resources can be exhausted in a short time, stalling both the system and the business. Therefore, when using GC collectors such as CMS or G1, set, or at least pay attention to, the number of GC threads.

Figure 5
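A hedged example of pinning the GC thread counts for a small container instead of letting the JVM derive them from the cores it sees (the values are illustrative, not a general recommendation):

```
java -XX:+UseG1GC -XX:ParallelGCThreads=4 -XX:ConcGCThreads=1 -jar app.jar
```

On newer JDKs, -XX:ActiveProcessorCount=<n> can also be used to cap the core count from which these defaults are computed.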

Tip 2: GC Log Analysis

GC log analysis works on the data recorded by the Java program's GC, and collecting this data requires specific options to be turned on. Therefore, before starting the Java program, you must add the logging parameters (for example, JDK 8: -Xloggc:logs/gc-%t.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps; JDK 11: -Xlog:gc*:logs/gc-%t.log:time,uptime). The result of GC log analysis describes the memory-reclamation behaviour of the Java program over a past period of time. By analyzing this information, users can easily obtain data for tuning GC parameters and even the Java code. The following three analysis indicators are discussed:
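For reference, the same options written out as full launch commands (the log path and jar name are placeholders):

```
# JDK 8
java -Xloggc:logs/gc-%t.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -jar app.jar

# JDK 11
java -Xlog:gc*:logs/gc-%t.log:time,uptime -jar app.jar
```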

1) GC throughput rate
The throughput rate describes the percentage of the JVM's running time that is available for business processing, i.e. the time not occupied by GC. The larger the value, the less time GC takes and the better the JVM performs. It is generally expected that this value should not fall below 90%, otherwise the overhead of the JVM itself will seriously affect business performance. Figure 6 shows the log analysis result of a JDK 8 CMS application that ran for about 3 hours. The analysis shows a throughput rate above 99.2%, so the performance loss caused by the JVM (GC) is quite low.

Figure 6
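As a rough sanity check on that figure: a run of about 3 hours is roughly 10,800 s, so a throughput rate of 99.2% means GC pauses accumulated to at most about 0.8% of it, i.e. 0.008 × 10,800 ≈ 86 s over the whole run.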

2) Pause time statistics
GC pause time is the time for which business threads have to be stopped during a GC, and it needs to stay within a reasonable range. If most pauses exceed expectations (the range the user can accept), the GC parameters and heap size need to be adjusted, and possibly also the number of parallel GC threads. As shown in Figure 7, more than 95% of the GC pauses are within 40 ms, while the pauses above 100 ms may be the main cause of latency spikes in service requests. To smooth out pause-time fluctuations, you can switch to a collector such as G1 GC or ZGC, or adjust the number of parallel threads and other GC parameters.

Figure 7
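A hedged sketch of the two options mentioned above (the 50 ms pause goal is illustrative):

```
# state the pause-time goal explicitly with G1
java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -jar app.jar

# or move to ZGC (experimental in JDK 11, production-ready from JDK 15)
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -jar app.jar
```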

3) Scatter plot of memory released per GC
The scatter plot shows the distribution of the amount of memory released by each GC operation. As shown in Figure 8, the amount released is roughly the same each time, which indicates that the memory-release process is fairly stable. If there are large fluctuations, or a relatively large number of Full GCs, the young generation may be too small, causing a large amount of promotion; if the amount released by each GC is relatively small, the cause may be that the G1 GC adaptive algorithm has sized the young generation too small, and so on. Because a scatter plot shows only limited data, it generally needs to be combined with other indicators and the user's JVM parameters for a joint analysis.

Figure 8

Tip 3: JFR Event Analysis

JFR is short for Java Flight Recorder, the event-based monitoring and recording framework built into the JDK's JVM. In the community, JFR was first released in OpenJDK 11 and later backported to higher OpenJDK 8 update releases (8u262 and later), with jcmd as the unified interface and operating command. Because JFR recording generally has little impact on the application (the performance impact with the default settings is within 1%), it is suitable for being left enabled for long periods; and since JFR collects rich information such as runtime, GC, thread stack, heap and IO events, it makes it very convenient for users to understand the running state of a Java program.
JFR records more than 100 kinds of events. For a complex program, a JFR file recorded over less than 10 minutes can exceed 500 MB, so users usually do not look at all of the information during analysis. Here are some of the events commonly used in business performance analysis:
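As a hedged example of the jcmd workflow, with <pid> and the recording/file names as placeholders:

```
jcmd <pid> JFR.start name=perf settings=default   # start a low-overhead recording
jcmd <pid> JFR.dump name=perf filename=perf.jfr   # dump what has been recorded so far
jcmd <pid> JFR.stop name=perf                     # stop the recording when done
```

A recording can also be started together with the application via -XX:StartFlightRecording.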

1) Process CPU usage
The default CPU sampling interval is 1 s, which is basically enough to reflect the average CPU usage of the current process over time. When the CPU stays high continuously, or occasionally spikes as shown in Figure 9, it is worth investigating. Further analysis located the CPU spikes here as coinciding with GC trigger times, so it is preliminarily confirmed that the CPU changes were caused by GC.

Figure 9

2) GC configuration and pause distribution
The GC configuration helps us understand the current process's collector and its main configuration parameters. Because different collectors behave differently, analyzing the heap-space and trigger-control parameters is also very important; these parameters help us understand the GC collection process. For example, for the G1 collector shown in Figure 10, the maximum heap is 8 GB and the GC pause time is about 40 ms (the default target is 200 ms), far below the expected value. Further analysis of the parameters found that NewRatio was set to 2 (users who are not familiar with G1 GC easily set this parameter), which caused young-generation GC to be triggered very frequently, while the data shows that mixed GC was never triggered. To raise the utilization of the heap, the NewRatio parameter can be removed and the maximum proportion of the young generation increased (since mixed GC is never triggered, the amount promoted during collection is very low), and the thresholds that trigger collection can be adjusted so that the whole heap gets used. After this optimization, heap usage rose from the original 4 GB to 7 GB, the young-GC interval went from about 20 s to an average of 40 s, and the GC pause time did not change significantly.

Figure 10
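A hedged sketch of the direction of this adjustment; the values are illustrative and not the exact settings behind Figure 10:

```
# problematic: -XX:+UseG1GC -Xmx8g -XX:NewRatio=2
#   NewRatio pins the young generation to one third of the heap and overrides
#   G1's pause-time-based sizing
# adjusted: drop NewRatio and let the pause goal size the young generation,
#   optionally raising its ceiling (experimental flag, hence the unlock option)
java -XX:+UseG1GC -Xmx8g -XX:MaxGCPauseMillis=200 \
     -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=70 \
     -jar app.jar
```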

3) Method sampling flame graph
The method flame graph is a statistic of how many times each method appears in the samples; the larger a method's proportion, the more often it was being executed. Because the complete stack information is available for every sample, the result is very intuitive for users and is a great help to performance optimization. As shown in Figure 11, the execution frequency of GroupHeap.match is close to 30%, so it can serve as a performance optimization point.

Figure 11

4) IO read and write performance
IO performance is checked mostly in scenarios where the program's processing performance changes suddenly, such as a drop or a spike. For example, a surge in the amount of data read from a socket can make the CPU used for business processing soar; or, because more data has to be written out, business threads are blocked and processing capacity drops. As shown in Figure 12, the IO capability during the monitoring period can be judged from the read/write trend graph.

Figure 12

Tip 4: Heap Content Analysis

Heap content analysis is a common method for analyzing the causes of Java heap OOM (OutOfMemoryError). OOM mainly includes heap space overflow, Metaspace overflow, stack overflow and direct memory overflow, but not all overflow situations can be diagnosed through heap content analysis. For a heap dump file, the cause of the memory overflow cannot be determined with certainty, but it can be judged through some quantitative indicators or agreed conditions and then finally confirmed by developers or testers. Three valuable indicators are shared below:

1) Large object check
Statistics on the distribution of large objects help us understand what proportion of memory these objects consume and whether their existence is reasonable. If too many large objects cannot be released, memory is used up faster and OOM appears. Compared with a full analysis of all objects, inspecting the large objects is representative, as shown in Figure 13.

Figure 13

2) Class loading check
Class loading statistics count all the class information currently loaded by the program and are important data for calculating the space occupied by the Metaspace. Loading too much class information also causes a large amount of Metaspace to be occupied. In RPC-like scenarios, caching loaded class information can easily trigger OOM.
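To see which classes are actually filling the Metaspace, class load/unload logging can be turned on; a hedged example with a placeholder jar name:

```
# JDK 8
java -verbose:class -jar app.jar

# JDK 11
java -Xlog:class+load=info,class+unload=info -jar app.jar
```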

3) Object leak check
First, three concepts:
Shallow heap: the memory occupied by an object itself; it is unrelated to the object's contents and determined only by the object's structure.
Deep heap (retained size): the memory that can actually be released once the object is reclaimed by GC, i.e. the sum of the shallow heaps of all objects reachable only through this object (its dominator tree).
Dominator tree: in the object reference graph, if every path to object B passes through object A, then A dominates B; if A is the dominator closest to B, A is B's immediate dominator.
According to the GC strategy, objects in the heap can only be in one of two states: reachable from GC roots, or unreachable from GC roots. Unreachable objects are reclaimed by the GC collector and the corresponding memory is returned to the system. Reachable objects are those directly or indirectly referenced by the user, so object leakage concerns objects that are indirectly referenced by the user but will never be used again: they cannot be released because they are still referenced. Object leakage is not absolute but relative, and there is generally no exact standard, but it can be evaluated through the object's deep heap size. For example, a HashMap was found to hold 4,844 objects (as shown in Figure 14), and its shallow heap was calculated to be only about 115 KB, so you might think there is no problem here; but calculating the object's deep heap shows that it exceeds 500 MB. In this case, if the HashMap cannot be released and new key-value pairs keep being added, heap memory may be exhausted and OOM may occur.

Figure 14
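The pattern behind this kind of finding is usually a long-lived collection that is only ever added to. A minimal sketch, with hypothetical class, field and method names:

```java
import java.util.HashMap;
import java.util.Map;

// A static map that is only ever added to: the static reference keeps everything
// it holds reachable, so GC can never release it.
public class SessionCache {
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static void put(String sessionId, byte[] payload) {
        CACHE.put(sessionId, payload);   // no eviction and no expiry: retained size only grows
    }
}
```

The map's shallow heap counts only the map object itself, while its deep (retained) heap includes every key and value it dominates; bounding the cache or adding eviction/expiry is what actually lets the memory go.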

About the author:

Nianwu, senior back-end engineer

Mainly responsible for the Java performance platform and JDK support, with in-depth research on defect detection and compilers.

For more exciting content, please follow the [OPPO Internet Technology] public account

