
Background

Anyone who regularly maintains or develops back-end services has probably run into abnormally high CPU load, especially on weekends or in the middle of the night. (Do you get the same feeling? Everything is fine during working hours, and the failures pile up on days off or after work. If so, say so in the comments at the end of the article.) Suddenly someone in the group chat reports that a production machine's load is extremely high, and those unfamiliar with the troubleshooting process may log on to the server in a fluster and take many detours before finding the cause.


Many people have therefore written up their own process or methodology, much like the old joke about the steps required to put an elephant in the refrigerator.

The traditional approach generally takes 4 steps:

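1. top, then press P to sort by CPU: 1040    // first sort processes by load to find the one with the highest load, maxLoad(pid)
2. top -Hp <process PID>: 1073    // find the PID of the busiest thread in that process
3. printf "0x%x\n" <thread PID>: 0x431    // convert the thread PID to hexadecimal, in preparation for searching the jstack output
4. jstack <process PID> | vim +/<hex thread PID> -    // e.g. jstack 1040 | vim +/0x431 -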
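
Steps 3 and 4 boil down to a base conversion and a text search. The sketch below shows that idea in Java, assuming the jstack output has already been captured as a string. The class name, method names, and the sample dump are made up for illustration; only the `nid=0x...` header format is taken from real jstack output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring steps 3 and 4: thread PID -> hex nid -> stack block. */
final class NidLocator {

    /** Step 3: same result as printf "0x%x\n" <thread PID>. */
    static String toNid(long threadPid) {
        return "0x" + Long.toHexString(threadPid);
    }

    /** Step 4: return the lines of the thread block whose header contains nid=<hex>. */
    static List<String> findThreadBlock(String jstackOutput, long threadPid) {
        // The trailing space keeps nid=0x431 from matching nid=0x4310.
        String marker = "nid=" + toNid(threadPid) + " ";
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : jstackOutput.split("\n")) {
            if (line.contains(marker)) {
                inBlock = true;                      // header line of the wanted thread
            } else if (inBlock && line.trim().isEmpty()) {
                break;                               // a blank line ends the thread block
            }
            if (inBlock) {
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) {
        // Made-up sample in the layout jstack uses for thread headers.
        String dump = String.join("\n",
                "\"main\" prio=10 tid=0x00007f0c64006800 nid=0x431 runnable [0x00007f0c6a64a000]",
                "   java.lang.Thread.State: RUNNABLE",
                "        at RegexLoad.main(RegexLoad.java:27)",
                "",
                "\"VM Periodic Task Thread\" prio=10 tid=0x00007f0c640a2800 nid=0x43f waiting on condition");
        System.out.println(toNid(1073));
        findThreadBlock(dump, 1073).forEach(System.out::println);
    }
}
```

The conversion matters because top reports thread IDs in decimal while jstack prints them in hexadecimal as `nid`.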

But when troubleshooting production problems, every second counts, and the 4 steps above are still too cumbersome and time-consuming. Could they be packaged into a tool that locates the problematic line of code in seconds, with a single command, whenever a problem occurs?

Of course!

The maturity of a developer's tool chain reflects not only their operations skills but also their awareness of efficiency.

oldratlee from Taobao has packaged the process above into a tool:

show-busy-java-threads.sh https://github.com/oldratlee/useful-scripts

It makes locating this kind of problem in production very convenient. Two examples below show it in action.

Quick installation and use:

source <(curl -fsSL https://raw.githubusercontent.com/oldratlee/useful-scripts/master/test-cases/self-installer.sh)

Java regular expression backtracking causes 100% CPU

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demo: the nested quantifier ([\w\s]+)+ backtracks exponentially on inputs that cannot match.
public class RegexLoad {
    public static void main(String[] args) {
        String[] patternMatch = {"([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)",
                "([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)"};
        List<String> patternList = new ArrayList<>(); // input strings to test

        patternList.add("Avg Volume Units product A + Volume Units product A");
        patternList.add("Avg Volume Units /  Volume Units product A");
        patternList.add("Avg retailer On Hand / Volume Units Plan / Store Count");
        patternList.add("Avg Hand Volume Units Plan Store Count");
        patternList.add("1 - Avg merchant Volume Units");
        patternList.add("Total retailer shipment Count");

        for (String s : patternList) {

            for (int i = 0; i < patternMatch.length; i++) {
                Pattern pattern = Pattern.compile(patternMatch[i]);

                Matcher matcher = pattern.matcher(s);
                System.out.println(s);
                if (matcher.matches()) {
                    System.out.println("Passed");
                } else {
                    System.out.println("Failed");
                }
            }
        }
    }
}

After compiling and running the code above, the server shows an additional java process at 100% CPU:

[Image]

How to use it?

show-busy-java-threads.sh
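# Find the threads consuming the most CPU across all Java processes (5 by default) and print their thread stacks.

show-busy-java-threads.sh -c <number of thread stacks to show>

show-busy-java-threads.sh -c <number of thread stacks to show> -p <target Java process>
# -F option: run jstack with -F (force jstack); normally not needed
show-busy-java-threads.sh -p <target Java process> -F

show-busy-java-threads.sh -s <full path to the jstack command>
# When running via sudo, the JAVA_HOME environment variable is not passed to root,
# and root often has no JAVA_HOME configured and configuring it is inconvenient,
# so explicitly specifying the path to jstack is actually more convenient

show-busy-java-threads.sh -a <file to record the output to>

show-busy-java-threads.sh -t <number of repetitions> -i <seconds between repetitions>
# Runs once by default; the default interval is 3 seconds

##############################
# Note:
##############################
# If the Java process's user differs from the user running the script, jstack cannot inspect that Java process.
# Run the script with sudo so that it can switch to the Java process's user: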
sudo show-busy-java-threads.sh
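
The script inspects Java processes from the outside via jstack. Where shell access or jstack is not available, a similar ranking can be approximated from inside a JVM with the standard `java.lang.management.ThreadMXBean`. The sketch below is not how show-busy-java-threads.sh works: it ranks threads by cumulative CPU time rather than by current CPU percentage, it sees only the JVM it runs in, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-process counterpart: rank this JVM's threads by CPU time used so far. */
final class BusyThreads {

    /** A thread name together with its cumulative CPU time in nanoseconds. */
    static final class Usage {
        final String name;
        final long cpuNanos;

        Usage(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns at most {@code count} threads, busiest first; empty if CPU timing is unavailable. */
    static List<Usage> top(int count) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> usages = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return usages;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);      // -1 if the thread is no longer alive
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread is no longer alive
            if (cpu >= 0 && info != null) {
                usages.add(new Usage(info.getThreadName(), cpu));
            }
        }
        usages.sort(Comparator.comparingLong((Usage u) -> u.cpuNanos).reversed());
        return new ArrayList<>(usages.subList(0, Math.min(count, usages.size())));
    }

    public static void main(String[] args) {
        for (Usage u : top(5)) {
            System.out.println(u.name + ": " + u.cpuNanos / 1_000_000 + " ms CPU");
        }
    }
}
```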

Example:

work@dev_zz_Master 10.48.186.32 23:45:50 ~/demo >
bash show-busy-java-threads.sh
[1] Busy(96.2%) thread(8577/0x2181) stack of java process(8576) under user(work):
"main" prio=10 tid=0x00007f0c64006800 nid=0x2181 runnable [0x00007f0c6a64a000]
   java.lang.Thread.State: RUNNABLE
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
        ...
        at java.util.regex.Matcher.match(Matcher.java:1127)
        at java.util.regex.Matcher.matches(Matcher.java:502)
        at RegexLoad.main(RegexLoad.java:27)

[2] Busy(1.5%) thread(8591/0x218f) stack of java process(8576) under user(work):
"C2 CompilerThread1" daemon prio=10 tid=0x00007f0c64095800 nid=0x218f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

[3] Busy(0.8%) thread(8590/0x218e) stack of java process(8576) under user(work):
"C2 CompilerThread0" daemon prio=10 tid=0x00007f0c64093000 nid=0x218e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

[4] Busy(0.2%) thread(8593/0x2191) stack of java process(8576) under user(work):
"VM Periodic Task Thread" prio=10 tid=0x00007f0c640a2800 nid=0x2191 waiting on condition 

[5] Busy(0.1%) thread(25159/0x6247) stack of java process(25137) under user(work):
"VM Periodic Task Thread" prio=10 tid=0x00007f13340b4000 nid=0x6247 waiting on condition 
work@dev_zz_Master 10.48.186.32 23:46:04 ~/demo >

As you can see, a single command points directly at the offending line of code (RegexLoad.java:27). Convenient, isn't it?
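
Once the stack points at `Matcher.matches`, the usual cause is a nested quantifier such as `([\w\s]+)+`. The sketch below shows one possible fix for the first pattern, assuming the captured groups are not needed: flattening the nested quantifier accepts exactly the same strings but leaves the engine only a linear number of ways to split the input. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical rewrite of the first RegexLoad pattern without nested quantifiers. */
final class SafeRegex {

    // ([\w\s]+)+ and [\w\s]+ accept the same strings, but the nested form can split
    // a run of n characters in exponentially many ways when the overall match fails.
    static final Pattern TWO_OPERANDS = Pattern.compile("[\\w\\s]+[+\\-/*]+[\\w\\s]+");

    /** True if the input is "operand operator operand", e.g. "a + b". */
    static boolean isBinaryExpression(String input) {
        return TWO_OPERANDS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Avg Volume Units product A + Volume Units product A",
            "Avg Hand Volume Units Plan Store Count",
            "1 - Avg merchant Volume Units",
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + (isBinaryExpression(s) ? "Passed" : "Failed"));
        }
    }
}
```

Compiling the pattern once into a constant also avoids calling `Pattern.compile` inside the loop, as the demo does.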

Thread deadlock, program hangs

import java.util.*;
public class SimpleDeadLock extends Thread {
    public static Object l1 = new Object();
    public static Object l2 = new Object();
    private int index;
    public static void main(String[] a) {
        Thread t1 = new Thread1();
        Thread t2 = new Thread2();
        t1.start();
        t2.start();
    }
    private static class Thread1 extends Thread {
        public void run() {
            synchronized (l1) {
                System.out.println("Thread 1: Holding lock 1...");
                try { Thread.sleep(10); }
                catch (InterruptedException e) {}
                System.out.println("Thread 1: Waiting for lock 2...");
                synchronized (l2) {
                    System.out.println("Thread 1: Holding lock 1 & 2...");
                }
            }
        }
    }
    private static class Thread2 extends Thread {
        public void run() {
            synchronized (l2) {
                System.out.println("Thread 2: Holding lock 2...");
                try { Thread.sleep(10); }
                catch (InterruptedException e) {}
                System.out.println("Thread 2: Waiting for lock 1...");
                synchronized (l1) {
                    System.out.println("Thread 2: Holding lock 2 & 1...");
                }
            }
        }
    }
}

Output after running it:

[Image]

Locating the problem with the tool:

[Image]

One-command diagnosis: the output clearly shows each thread holding the lock the other is waiting for, which results in a deadlock, and it points directly at the line of code and the specific cause.
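
The same kind of deadlock can also be detected from inside the JVM through the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call. The sketch below is only an illustration with hypothetical class and method names, not part of oldratlee's scripts: it starts two daemon threads that take two locks in opposite order, as SimpleDeadLock does, and then reports who waits for whom.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical in-process deadlock check based on ThreadMXBean. */
final class DeadlockProbe {

    /** One line per thread stuck in a monitor deadlock; empty when there is none. */
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlock() {
        Object l1 = new Object();
        Object l2 = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        newDaemon("demo-1", l1, l2, bothHoldFirstLock).start();
        newDaemon("demo-2", l2, l1, bothHoldFirstLock).start();
    }

    private static Thread newDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this lock
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        return t;
    }

    /** Polls until a deadlock is reported or the timeout elapses. */
    static List<String> awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> lines = describeDeadlocks();
        while (lines.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            lines = describeDeadlocks();
        }
        return lines;
    }

    public static void main(String[] args) {
        startDeadlock();
        awaitDeadlock(5000).forEach(System.out::println);
    }
}
```

The lasting fix for this class of bug is to make every thread acquire the locks in the same order.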

After these two examples, you should have a clearer picture of what problems this tool can solve, and there is no need to panic the next time CPU hits 100%.

Still, the rest is up to your own practice; after all, real knowledge comes from practice.

Bonus: a bundle of free, practical script tools

Besides show-busy-java-threads.sh mentioned above, oldratlee has also collected many other scripts commonly needed in development and operations. I find them particularly useful and will briefly list them:

(1) show-duplicate-java-classes

Sometimes everything works in local development and testing, yet inexplicable class-related exceptions appear after going live. After a long investigation, the cause turns out to be a Jar conflict!

This tool finds duplicate classes in Java libraries (Jar files) or class directories.

A troublesome problem in Java development is Jar conflicts (that is, multiple versions of the same Jar), or more precisely, duplicate classes. They lead to errors such as NoSuchMethod, though not necessarily right away. Finding the Jars that contain duplicate classes lets you head off problems before they occur.

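# Find duplicate classes in all Jars under the current directory
show-duplicate-java-classes

# Find duplicate classes in all Jars under several specified directories
show-duplicate-java-classes path/to/lib_dir1 /path/to/lib_dir2

# Find duplicate classes under several specified class directories. Class directories are given with the -c option
show-duplicate-java-classes -c path/to/class_dir1 -c /path/to/class_dir2

# Find duplicate classes across the specified class directories and all Jars under the specified directories
show-duplicate-java-classes path/to/lib_dir1 /path/to/lib_dir2 -c path/to/class_dir1 -c path/to/class_dir2

For example:
# Run in the war module directory to build the war file
$ mvn install
...
# Unzip the war file; it contains the application's dependency Jar files
$ unzip target/*.war -d target/war
...
# Check for duplicate classes
$ show-duplicate-java-classes -c target/war/WEB-INF/classes target/war/WEB-INF/lib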
...
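
The core idea of such a check fits in a few lines of Java: list the `.class` entries of each Jar, then keep the entries that show up in more than one Jar. This is only a sketch of the idea with hypothetical names, not the script's actual implementation, and it ignores class directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical duplicate-class finder over a set of Jar files. */
final class DuplicateClasses {

    /** Lists the .class entries of one Jar. */
    static List<String> classEntries(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps each class entry found in more than one Jar to the Jars that contain it. */
    static Map<String, List<String>> findDuplicates(Map<String, List<String>> entriesByJar) {
        Map<String, List<String>> jarsByClass = new TreeMap<>();
        entriesByJar.forEach((jar, entries) -> {
            for (String entry : entries) {
                jarsByClass.computeIfAbsent(entry, key -> new ArrayList<>()).add(jar);
            }
        });
        jarsByClass.values().removeIf(jars -> jars.size() < 2); // keep duplicates only
        return jarsByClass;
    }

    public static void main(String[] args) {
        // Usage: java DuplicateClasses a.jar b.jar ...
        Map<String, List<String>> entriesByJar = new TreeMap<>();
        for (String arg : args) {
            entriesByJar.put(arg, classEntries(Paths.get(arg)));
        }
        findDuplicates(entriesByJar)
                .forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```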

(2) find-in-jars

Search for class or resource files in all jar files in the current directory.

Usage (note that the pattern is a grep extended regular expression):

find-in-jars 'log4j.properties'
find-in-jars 'log4j.xml$' -d /path/to/find/directory
find-in-jars log4j.xml
find-in-jars 'log4j.properties|log4j.xml'

Example:

$ ./find-in-jars 'Service.class$'
./WEB-INF/libs/spring-2.5.6.SEC03.jar!org/springframework/stereotype/Service.class
./rpc-benchmark-0.0.1-SNAPSHOT.jar!com/taobao/rpc/benchmark/service/HelloService.class

(3) housemd pid [java_home]

In the early days we used BTrace to troubleshoot problems. While marveling at BTrace's power, we also managed to hang production systems several times while fiddling with it.

In 2012, Ju Shi from Taobao wrote HouseMD, which bundles several commonly used BTrace scripts into a standalone application. Its core code is written in Scala.

HouseMD is a diagnostic tool based on bytecode technology, so besides Java it can diagnose any language that ultimately runs on the JVM as bytecode, such as Clojure (thanks to @Killme2008 for the getting-started guide), Scala, Groovy, JRuby, Jython, and Kotlin.

Use housemd to trace Java programs at runtime.

(4) jvm pid

Runs the jvm debugging tool, which inspects the state of the Java stack, heap, threads, GC, and so on. The supported functions are:

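========Thread related=======
1 : Show the threads using the most CPU
2 : Print all threads
3 : Print the thread count
4 : Count threads by thread state
========GC related=======
5 : GC statistics (including cause); interval and count can be specified, default 1 second, 10 times
6 : Show the space of each generation in the heap; interval and count can be specified, default 1 second, 5 times
7 : GC statistics; interval and count can be specified, default 1 second, 10 times
8 : Print perm-gen memory usage *pauses the application*
9 : Show direct buffer usage
========Heap object related=======
10 : Dump the heap to a file *pauses the application*; saved to `pwd`/dump.bin by default, another path can be specified
11 : Trigger a full GC. *pauses the application*
12 : Print JVM heap statistics *pauses the application*
13 : Print the top 20 objects in the JVM heap. *pauses the application* Parameter: 1: sort by instance count, 2: sort by memory usage; default 1
14 : Trigger a full GC, then print the top 20 objects in the JVM heap. *pauses the application* Parameter: 1: sort by instance count, 2: sort by memory usage; default 1
15 : Print the objects created in perm by all class loaders. Interval and count can be specified
========Other=======
16 : Print the finalizer queue
17 : Show classloader statistics
18 : Show JIT compilation statistics
19 : Deadlock detection
20 : Wait X seconds, default 1
q : exit
After entering the jvm tool, type a number to run the corresponding command
Multiple commands can be run at once, separated by semicolons ";", e.g. 1;3;4;5;6
Each command can take arguments after a colon ":"; arguments of the same command are separated by commas, e.g.: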
Enter command queue:1;5:1000,100;10:/data1/output.bin
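
To make the queue syntax concrete, here is a small Java sketch that splits such a line into commands and arguments. It only illustrates the `;` / `:` / `,` separators described above; the class is hypothetical and says nothing about how the jvm tool itself parses its input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser for a command queue such as 1;5:1000,100;10:/data1/output.bin */
final class CommandQueue {

    /** One menu number plus its (possibly empty) argument list. */
    static final class Command {
        final String id;
        final List<String> args;

        Command(String id, List<String> args) {
            this.id = id;
            this.args = args;
        }
    }

    static List<Command> parse(String queue) {
        List<Command> commands = new ArrayList<>();
        for (String part : queue.split(";")) {          // ";" separates commands
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] idAndArgs = trimmed.split(":", 2); // ":" separates id from arguments
            List<String> args = new ArrayList<>();
            if (idAndArgs.length == 2) {
                args.addAll(Arrays.asList(idAndArgs[1].split(","))); // "," separates arguments
            }
            commands.add(new Command(idAndArgs[0], args));
        }
        return commands;
    }

    public static void main(String[] args) {
        for (Command c : parse("1;5:1000,100;10:/data1/output.bin")) {
            System.out.println("command " + c.id + " args " + c.args);
        }
    }
}
```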

(5) greys[@IP:PORT]

PS: Greys currently supports only Java 6+ on Linux/Unix/Mac; Windows is not supported for now.

Greys is a diagnostic tool for troubleshooting a running JVM process; it makes it easy to investigate problems without interrupting the program.

Like HouseMD, Greys-Anatomy takes its name from an American TV drama (Grey's Anatomy) as a tribute to its predecessors. Its code draws on the ideas of BTrace and HouseMD.

Use greys to trace Java programs at runtime (if no argument is given, run greys -C pid first, then greys). The supported operations are:

  • View information about loaded classes and methods
  • View basic information about the current JVM
  • Monitor method execution (call volume, failure rate, response time, etc.)
  • Observe, record, and replay method execution data (parameters, return values, exceptions, etc.)
  • Render method call traces
  • For more information, please refer to: https://github.com/oldmanpushcart/greys-anatomy/wiki

(6) sjk (sjk --commands, sjk --help)

  • Use sjk for Java diagnostics, performance troubleshooting, and tuning
  • ttop: monitor the CPU usage of each thread of a given JVM process
  • jps: an enhanced jps
  • hh: an enhanced jmap -histo
  • gc: report garbage collection information in real time
  • For more information, please refer to: https://github.com/aragozin/jvm-tools
Source: https://my.oschina.net/leejun2005/blog/1524687


民工哥