
Introduction to ASM

ASM is a general-purpose Java bytecode manipulation and analysis framework. It can be used to modify existing classes or to generate classes dynamically, directly in binary form. ASM provides common bytecode transformation and analysis algorithms from which you can build custom, complex transformation and code analysis tools. It offers functionality similar to other Java bytecode frameworks, but focuses on performance: because it is designed and implemented to be as small and as fast as possible, it is well suited to dynamic systems (it can, of course, also be used statically, for example in a compiler).

ASM is used in many projects, including the following:

  • OpenJDK, to generate lambda call sites, and the Nashorn compiler;
  • the Groovy and Kotlin compilers;
  • Cobertura and JaCoCo, which instrument classes to measure code coverage;
  • CGLIB, to generate proxy classes dynamically;
  • Gradle, to generate some classes at runtime.

For more information, see the official website: https://asm.ow2.io/

IDE plugin

ASM manipulates bytecode directly, so writing ASM code is hard if you are not familiar with the bytecode instruction set. To help with this, the BytecodeOutline plugin is available for mainstream IDEs.

Taking IntelliJ IDEA as an example, right-click inside the class and choose Show Bytecode outline; the result looks roughly like the figure below:

[Figure: the Bytecode outline panel in IntelliJ IDEA]

The panel contains three tabs:

  • Bytecode: the bytecode of the class, in textual form;
  • ASMified: the ASM code that generates this bytecode;
  • Groovified: the bytecode instructions of the class.

ASM API

The ASM library provides two APIs for generating and transforming compiled classes: the core API, which represents classes as a sequence of events, and the tree API, which represents classes as objects. This is comparable to the two ways of parsing XML documents: the core API corresponds to SAX, and the tree API corresponds to DOM. Each approach has its own advantages and disadvantages:

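  • The event-based API is faster and requires less memory than the object-based API. However, some class transformations can be harder to implement with it, because only one element of the class is available at any given time;
  • The object-based API loads the entire class into memory as a tree of objects, which costs more time and memory but gives access to the whole class at once.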

The ASM library is organized into several packages, distributed in separate JAR files:

  • org.objectweb.asm and org.objectweb.asm.signature: define the event-based API and provide the class parser and writer components; contained in asm.jar;
  • org.objectweb.asm.util: provides various tools based on the core API that can be used when developing and debugging ASM applications; contained in asm-util.jar;
  • org.objectweb.asm.commons: provides several useful predefined class transformers, mostly based on the core API; contained in asm-commons.jar;
  • org.objectweb.asm.tree: defines the object-based API and provides tools to convert between the event-based and object-based representations; contained in asm-tree.jar;
  • org.objectweb.asm.tree.analysis: provides a class analysis framework based on the tree API, plus several predefined class analyzers; contained in asm-analysis.jar.

Core API

Before learning the core API, it is worth understanding the visitor pattern, because ASM's manipulation and analysis of bytecode are built on it.

Visitor pattern

The visitor pattern suggests placing new behavior in a separate class, called a visitor, instead of trying to integrate it into existing classes. The original object on which the operation must be performed is passed as an argument to one of the visitor's methods, which gives that method access to all the data the object contains. Common use cases:

  • Performing an operation on all elements of a complex object structure (such as an object tree);
  • Separating auxiliary behavior from the core business logic;
  • Adding behavior that makes sense for only some classes of a class hierarchy but not for others.
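A minimal sketch of the pattern in plain Java, with no ASM involved. The element and visitor types (Element, FieldElement, MethodElement, ElementVisitor, CountingVisitor, VisitorDemo) are hypothetical names chosen for illustration: each element's accept method hands itself to the visitor (double dispatch), so a new operation such as counting is added as a new visitor class without touching the element classes.

```java
import java.util.List;

// Hypothetical elements standing in for the parts of a class structure
interface Element {
    void accept(ElementVisitor visitor);
}

// One visit method per element type; a new operation is a new implementation
interface ElementVisitor {
    void visitField(FieldElement field);
    void visitMethod(MethodElement method);
}

class FieldElement implements Element {
    final String name;
    FieldElement(String name) { this.name = name; }
    @Override public void accept(ElementVisitor visitor) { visitor.visitField(this); }
}

class MethodElement implements Element {
    final String name;
    MethodElement(String name) { this.name = name; }
    @Override public void accept(ElementVisitor visitor) { visitor.visitMethod(this); }
}

// The new behavior (counting) lives entirely in the visitor
class CountingVisitor implements ElementVisitor {
    int fields;
    int methods;
    @Override public void visitField(FieldElement field) { fields++; }
    @Override public void visitMethod(MethodElement method) { methods++; }
}

class VisitorDemo {
    static CountingVisitor count(List<Element> elements) {
        CountingVisitor visitor = new CountingVisitor();
        for (Element element : elements) {
            element.accept(visitor); // the element chooses which visit method runs
        }
        return visitor;
    }
}
```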

Bytecode is itself a complex object structure, and Sharding-JDBC likewise uses the visitor pattern to process SQL. What these cases have in common is data with a relatively stable structure and a fixed syntax.

For more information, see: visitor pattern

Class

The visitor pattern involves two core roles: the visitor, which is independent of the data structure, and the event generator, which accepts the visitor and calls it. In ASM these roles are played by two core classes, ClassVisitor and ClassReader, introduced below.

ClassVisitor

The ASM API for generating and transforming compiled classes is based on the ClassVisitor abstract class. Each method of this class corresponds to the class file section of the same name:

// signature summary; method bodies omitted
public abstract class ClassVisitor {
    public ClassVisitor(int api);
    public ClassVisitor(int api, ClassVisitor cv);
    public void visit(int version, int access, String name, String signature, String superName, String[] interfaces);
    public void visitSource(String source, String debug);
    public void visitOuterClass(String owner, String name, String desc);
    public AnnotationVisitor visitAnnotation(String desc, boolean visible);
    public void visitAttribute(Attribute attr);
    public void visitInnerClass(String name, String outerName, String innerName, int access);
    public FieldVisitor visitField(int access, String name, String desc, String signature, Object value);
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions);
    public void visitEnd();
}

Sections whose content can have arbitrary length and complexity are not visited by these methods directly; instead, the corresponding method returns an auxiliary visitor: AnnotationVisitor, FieldVisitor or MethodVisitor. For details of the class file structure, see the Java Virtual Machine Specification.

When ClassReader parses a class, it calls these methods and supplies all of their arguments. Whoever calls them, they must be called in the following order:

visit visitSource? visitOuterClass? ( visitAnnotation | visitAttribute )* ( visitInnerClass | visitField | visitMethod )* visitEnd

That is, visit is called first, followed by at most one call to visitSource, followed by at most one call to visitOuterClass, followed by any number of calls, in any order, to visitAnnotation and visitAttribute, followed by any number of calls, in any order, to visitInnerClass, visitField and visitMethod, and terminated by a single call to visitEnd.
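To make this grammar concrete, here is a sketch that encodes it as a regular expression over a space-separated list of method names. The ClassVisitorOrder class and its isValid method are hypothetical and built only on java.util.regex; ASM does not expose such a checker, although its CheckClassAdapter checks parts of this order at runtime.

```java
import java.util.regex.Pattern;

class ClassVisitorOrder {
    // visit visitSource? visitOuterClass? (visitAnnotation | visitAttribute)*
    // (visitInnerClass | visitField | visitMethod)* visitEnd
    private static final Pattern ORDER = Pattern.compile(
            "visit"
            + "( visitSource)?"
            + "( visitOuterClass)?"
            + "( (visitAnnotation|visitAttribute))*"
            + "( (visitInnerClass|visitField|visitMethod))*"
            + " visitEnd");

    // calls: method names separated by single spaces, e.g. "visit visitField visitEnd"
    static boolean isValid(String calls) {
        return ORDER.matcher(calls).matches();
    }
}
```

For example, "visit visitSource visitAnnotation visitField visitMethod visitEnd" is accepted, while "visit visitMethod visitAnnotation visitEnd" is rejected because annotations must come before members.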

ClassReader

The main job of this class is to parse a class file and notify a ClassVisitor of the data it reads. The class file can be supplied in several ways:

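  • public ClassReader(final InputStream inputStream): from an input stream;
  • public ClassReader(final String className): from a class name such as "com/zh/asm/TestService"; the class file is loaded as a resource through the system class loader;
  • public ClassReader(final byte[] classFile): from the raw bytes of a class file.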

The common usage is as follows:

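// ClassReader(String) throws IOException if the class file cannot be found
ClassReader classReader = new ClassReader("com/zh/asm/TestService");
ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
// the reader sends every event to the writer, which rebuilds an equivalent class
classReader.accept(classWriter, 0);
byte[] bytes = classWriter.toByteArray();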

ClassReader's accept method takes the visitor that receives the events, plus a parsingOptions argument (a bit mask; 0 means no options). The options include:

  • SKIP_CODE: skip the compiled code of methods (useful if you only need the class structure);
  • SKIP_DEBUG: skip debug information, and do not create artificial labels for it;
  • SKIP_FRAMES: skip stack map frames;
  • EXPAND_FRAMES: uncompress (expand) stack map frames.
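A sketch of a read-only visitor that lists a class's methods, assuming ASM 9.x (for the Opcodes.ASM9 constant) on the classpath; MethodLister is a hypothetical name. Because only the class structure is needed, it passes SKIP_CODE | SKIP_DEBUG to accept, and it returns null from visitMethod, which tells the reader it does not need to visit that method's contents.

```java
import java.util.ArrayList;
import java.util.List;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

class MethodLister {
    // Returns "name + descriptor" for every method declared in the class file
    static List<String> listMethods(byte[] classFile) {
        List<String> methods = new ArrayList<>();
        ClassVisitor collector = new ClassVisitor(Opcodes.ASM9) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                methods.add(name + desc);
                return null; // null: the reader skips this method's contents
            }
        };
        // Only the structure is needed: skip method bodies and debug information
        new ClassReader(classFile).accept(collector, ClassReader.SKIP_CODE | ClassReader.SKIP_DEBUG);
        return methods;
    }
}
```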

ClassWriter

The example above uses ClassWriter, a subclass of ClassVisitor. It is mainly used to generate classes and can also be used on its own, as shown below:

// assumes: import static org.objectweb.asm.Opcodes.*;
ClassWriter cw = new ClassWriter(0);
cw.visit(V1_5, ACC_PUBLIC + ACC_ABSTRACT + ACC_INTERFACE, "pkg/Comparable", null, "java/lang/Object", new String[]{"pkg/Mesurable"});
cw.visitField(ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "LESS", "I", null, Integer.valueOf(-1)).visitEnd();
cw.visitField(ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "EQUAL", "I", null, Integer.valueOf(0)).visitEnd();
cw.visitField(ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "GREATER", "I", null, Integer.valueOf(1)).visitEnd();
cw.visitMethod(ACC_PUBLIC + ACC_ABSTRACT, "compareTo", "(Ljava/lang/Object;)I", null, null).visitEnd();
cw.visitEnd();
byte[] b = cw.toByteArray();

// write the generated class file to disk (try-with-resources closes the stream)
try (FileOutputStream fileOutputStream = new FileOutputStream(new File("F:/asm/Comparable.class"))) {
    fileOutputStream.write(b);
}

The code above builds the class with ClassWriter, converts it into a byte array with toByteArray, and finally writes it to a file with FileOutputStream. Decompiling the file gives:

package pkg;

public interface Comparable extends Mesurable {
    int LESS = -1;
    int EQUAL = 0;
    int GREATER = 1;

    int compareTo(Object var1);
}

When instantiating ClassWriter you must pass a flags argument. The options are:

  • COMPUTE_MAXS: the sizes of the local variables and operand stack are computed for you. You must still call visitMaxs, but with any arguments: they are ignored and recomputed. With this option you still have to compute the stack map frames yourself;
  • COMPUTE_FRAMES: everything is computed automatically. You don't have to call visitFrame, but you must still call visitMaxs (its arguments are ignored and recomputed);
  • 0: nothing is computed automatically; you must compute the frames and the sizes of the local variables and operand stack yourself.
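A sketch of COMPUTE_FRAMES in practice, assuming ASM 9.x: it generates a class (the name pkg.MathUtil is hypothetical) with a static max(int, int) method. The branch target needs a stack map frame in a Java 8 class file, yet the code never calls visitFrame and passes dummy arguments to visitMaxs(0, 0); ASM computes both. A small ClassLoader subclass then defines the class so the method can be called through reflection.

```java
import java.lang.reflect.Method;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Label;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

class MaxGenerator {
    // Generates: public class pkg.MathUtil { public static int max(int a, int b) {...} }
    static byte[] generate() {
        // COMPUTE_FRAMES computes stack map frames and also implies COMPUTE_MAXS
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
        cw.visit(V1_8, ACC_PUBLIC | ACC_SUPER, "pkg/MathUtil", null, "java/lang/Object", null);
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "max", "(II)I", null, null);
        mv.visitCode();
        Label returnA = new Label();
        mv.visitVarInsn(ILOAD, 0);            // push a
        mv.visitVarInsn(ILOAD, 1);            // push b
        mv.visitJumpInsn(IF_ICMPGE, returnA); // pop both, jump if a >= b
        mv.visitVarInsn(ILOAD, 1);
        mv.visitInsn(IRETURN);                // return b
        mv.visitLabel(returnA);               // branch target: its frame is computed by ASM
        mv.visitVarInsn(ILOAD, 0);
        mv.visitInsn(IRETURN);                // return a
        mv.visitMaxs(0, 0);                   // dummy arguments, recomputed
        mv.visitEnd();
        cw.visitEnd();
        return cw.toByteArray();
    }

    // Minimal loader that turns the generated bytes into a Class
    static class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static int callMax(int a, int b) {
        try {
            Class<?> c = new BytesLoader().define("pkg.MathUtil", generate());
            Method max = c.getMethod("max", int.class, int.class);
            return (Integer) max.invoke(null, a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```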

The example above uses ClassWriter alone, but combining the three core classes is more useful. Let's focus on transformations.

Transforming classes

To transform a class, place a ClassVisitor between the class reader and the class writer, combining all three. The general structure of the code is as follows:

ClassReader classReader = new ClassReader("com/zh/asm/TestService");
ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
// transform: the adapter sits between the reader and the writer
ClassVisitor classVisitor = new AddFieldAdapter(classWriter...);
classReader.accept(classVisitor, 0);

The corresponding architecture of the above code is shown in the figure below:

[Figure: architecture of the ClassReader -> AddFieldAdapter -> ClassWriter chain]

Here is an adapter that adds a field to a class: it overrides the visitEnd method and writes the new field there. The code is as follows:

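import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.FieldVisitor;

import static org.objectweb.asm.Opcodes.ASM4;

public class AddFieldAdapter extends ClassVisitor {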
    private int fAcc;
    private String fName;
    private String fDesc;
    // whether a field with the same name already exists
    private boolean isFieldPresent;

    public AddFieldAdapter(ClassVisitor cv, int fAcc, String fName,
                           String fDesc) {
        super(ASM4, cv);
        this.fAcc = fAcc;
        this.fName = fName;
        this.fDesc = fDesc;
    }

    @Override
    public FieldVisitor visitField(int access, String name, String desc,
                                   String signature, Object value) {
        // remember whether a field with the same name exists; visitEnd adds the field only if it does not
        if (name.equals(fName)) {
            isFieldPresent = true;
        }
        return cv.visitField(access, name, desc, signature, value);
    }

    @Override
    public void visitEnd() {
        if (!isFieldPresent) {
            FieldVisitor fv = cv.visitField(fAcc, fName, fDesc, null, null);
            if (fv != null) {
                fv.visitEnd();
            }
        }
        cv.visitEnd();
    }
}

Following the order in which ClassVisitor methods are called, if the class has several fields, visitField is called once for each of them. Each call checks whether the field to be added already exists and records the answer in the isFieldPresent flag, so that the final visitEnd call can decide whether the new field needs to be added.

ClassVisitor classVisitor = new AddFieldAdapter(classWriter, ACC_PUBLIC + ACC_FINAL + ACC_STATIC, "id", "I");

This adds the field public static final int id. Write the resulting byte array to a class file and decompile it to check:

public class TestService {
    public static final int id;
    ......
}
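Removing a member works the other way around. A sketch, assuming ASM 9.x, of an adapter that drops one method by returning null from visitMethod instead of forwarding the call to the next visitor; the class name RemoveMethodAdapter and the helper removeMethod are illustrative, and the reader -> adapter -> writer chain is the same one used with AddFieldAdapter.

```java
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

class RemoveMethodAdapter extends ClassVisitor {
    private final String mName;
    private final String mDesc;

    RemoveMethodAdapter(ClassVisitor cv, String mName, String mDesc) {
        super(Opcodes.ASM9, cv);
        this.mName = mName;
        this.mDesc = mDesc;
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
        if (name.equals(mName) && desc.equals(mDesc)) {
            // not delegated to the next visitor, so the writer never sees this method
            return null;
        }
        return cv.visitMethod(access, name, desc, signature, exceptions);
    }

    // reader -> adapter -> writer
    static byte[] removeMethod(byte[] classFile, String name, String desc) {
        ClassReader reader = new ClassReader(classFile);
        ClassWriter writer = new ClassWriter(0);
        reader.accept(new RemoveMethodAdapter(writer, name, desc), 0);
        return writer.toByteArray();
    }
}
```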

Tools

In addition to the core classes above, ASM provides some utility classes for convenience:

  • Type
    A Type object represents a Java type and can be constructed from a type descriptor or from a Class object. The Type class also contains static constants representing the primitive types;
  • TraceClassVisitor
    Extends the ClassVisitor class and builds a textual representation of the visited class; use TraceClassVisitor to get a readable trace of what is actually generated;
  • CheckClassAdapter
    The ClassWriter class does not check that its methods are called in the proper order or that their arguments are valid, so it can generate invalid classes that the Java Virtual Machine verifier rejects. To detect some of these errors as early as possible, use the CheckClassAdapter class;
  • ASMifier
    This class provides an alternative backend for the TraceClassVisitor tool, which by default uses a Textifier backend that produces a plain-text listing. With this backend, TraceClassVisitor prints the Java code that uses ASM to generate the visited class.
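A sketch combining these tools, assuming ASM 9.x with asm-util and its dependencies on the classpath; ToolsDemo is a hypothetical name. Type builds descriptors from Class objects, CheckClassAdapter validates calls before they reach the writer, and TraceClassVisitor prints a Textifier listing of what is generated. The generated class is a trimmed version of the Comparable interface from the ClassWriter example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Type;
import org.objectweb.asm.util.CheckClassAdapter;
import org.objectweb.asm.util.TraceClassVisitor;
import static org.objectweb.asm.Opcodes.*;

class ToolsDemo {
    // Type converts Class objects into JVM descriptors and internal names
    static String compareToDescriptor() {
        return Type.getMethodDescriptor(Type.INT_TYPE, Type.getType(Object.class));
    }

    // Events flow: CheckClassAdapter -> TraceClassVisitor -> ClassWriter
    static String traceComparable() {
        StringWriter text = new StringWriter();
        ClassWriter cw = new ClassWriter(0);
        ClassVisitor trace = new TraceClassVisitor(cw, new PrintWriter(text));
        ClassVisitor cv = new CheckClassAdapter(trace);
        cv.visit(V1_5, ACC_PUBLIC | ACC_ABSTRACT | ACC_INTERFACE, "pkg/Comparable", null,
                Type.getInternalName(Object.class), new String[]{"pkg/Mesurable"});
        cv.visitField(ACC_PUBLIC | ACC_FINAL | ACC_STATIC, "LESS",
                Type.INT_TYPE.getDescriptor(), null, -1).visitEnd();
        cv.visitMethod(ACC_PUBLIC | ACC_ABSTRACT, "compareTo", compareToDescriptor(), null, null).visitEnd();
        cv.visitEnd(); // TraceClassVisitor prints its listing at this point
        return text.toString();
    }

    // CheckClassAdapter rejects a member visited before visit(...)
    static boolean rejectsBadOrder() {
        ClassVisitor cv = new CheckClassAdapter(new ClassWriter(0));
        try {
            cv.visitField(ACC_PUBLIC, "x", "I", null, null);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

To see ASMifier output instead of the plain listing, the TraceClassVisitor can be built with an ASMifier as its Printer argument.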

Method

As mentioned in the ClassVisitor section above, components of arbitrary complexity are visited through auxiliary visitor classes: AnnotationVisitor, FieldVisitor and MethodVisitor. Before introducing MethodVisitor, let's look at the execution model of the Java Virtual Machine.

Execution model

Each time a method is invoked, the Java Virtual Machine creates a stack frame that stores the local variable table, the operand stack, the dynamic link, the method return address, and other information. Each method call, from invocation to completion, corresponds to a stack frame being pushed onto and then popped off the virtual machine stack.

  • Local variable table: contains variables that can be accessed in random order by their index;
  • Operand stack: a stack of values used as operands by bytecode instructions.

Here is an execution stack with 3 frames:

[Figure: an execution stack with three frames]

The first frame contains 3 local variables; its operand stack has a maximum size of 4 and currently holds 2 values.

The second frame contains 2 local variables; its operand stack has a maximum size of 3 and currently holds 2 values.

The third frame contains 4 local variables; its operand stack has a maximum size of 2 and currently holds 2 values.

Bytecode instructions

A bytecode instruction consists of an opcode that identifies the instruction and a fixed number of arguments:

  • Opcode: an unsigned byte value, identified by a mnemonic symbol. For example, opcode value 0 is designated by the mnemonic NOP and corresponds to an instruction that does nothing.
  • Arguments: static values that determine the precise behavior of the instruction. They are given immediately after the opcode.

Bytecode instructions fall into two categories:

  • A small set of instructions transfers values from local variables to the operand stack, and vice versa;
  • The other instructions act only on the operand stack: they pop some values from the stack, compute a result from them, and push it back onto the stack.
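To see both categories at work, here is a toy model in plain Java of a single frame: an int array as the local variable table and a Deque as the operand stack. ToyFrame is a hypothetical class; it supports only ILOAD, ISTORE, IADD, IMUL and IRETURN, and it is in no way how the JVM is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one stack frame executing a few int instructions
class ToyFrame {
    final int[] locals;                              // local variable table, indexed access
    final Deque<Integer> stack = new ArrayDeque<>(); // operand stack

    ToyFrame(int maxLocals) {
        locals = new int[maxLocals];
    }

    // Each instruction is "MNEMONIC" or "MNEMONIC index"
    int run(String... code) {
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "ILOAD":  // local variable -> operand stack
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "ISTORE": // operand stack -> local variable
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "IADD": { // pop two values, push their sum
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case "IMUL": { // pop two values, push their product
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a * b);
                    break;
                }
                case "IRETURN":
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        throw new IllegalStateException("no IRETURN");
    }
}
```

For a static method computing int c = a + b; return c * b; with a = 2 in local 0 and b = 5 in local 1, running ILOAD 0, ILOAD 1, IADD, ISTORE 2, ILOAD 2, ILOAD 1, IMUL, IRETURN leaves 7 in local 2 and returns 35.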

Local variable instructions:

  • ILOAD: loads a boolean, byte, char, short or int local variable onto the operand stack;
  • LLOAD, FLOAD, DLOAD: load a long, float or double value, respectively;
  • ALOAD: loads any non-primitive value, that is, an object or array reference;
  • ISTORE: pops a boolean, byte, char, short or int value from the operand stack and stores it in the local variable designated by its index i;
  • LSTORE, FSTORE, DSTORE: pop a long, float or double value, respectively;
  • ASTORE: pops any non-primitive value.

Operand stack instructions (field and method access):

  • GETFIELD, PUTFIELD
    GETFIELD owner name desc pops an object reference and pushes the value of its name field. PUTFIELD owner name desc pops a value and an object reference, and stores the value in the object's name field. In both cases the object must be of type owner, and its field must be of type desc. GETSTATIC and PUTSTATIC are similar instructions, but for static fields.
  • INVOKEVIRTUAL, INVOKESTATIC, INVOKESPECIAL, INVOKEINTERFACE, INVOKEDYNAMIC
    INVOKEVIRTUAL owner name desc invokes the name method defined in class owner, whose method descriptor is desc. INVOKESTATIC is used for static methods, INVOKESPECIAL for private methods and constructors, and INVOKEINTERFACE for methods defined in interfaces. Finally, for Java 7 classes, INVOKEDYNAMIC is used for the new dynamic method invocation mechanism.
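A sketch, assuming ASM 9.x, that uses these field and method instructions to generate a small class (the name pkg.Counter and the CounterGenerator helper are hypothetical): the constructor uses INVOKESPECIAL to call the Object constructor and PUTFIELD to store its argument, and get uses GETFIELD. ClassWriter.COMPUTE_MAXS lets the code pass 0, 0 to visitMaxs.

```java
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

class CounterGenerator {
    static byte[] generate() {
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        cw.visit(V1_8, ACC_PUBLIC | ACC_SUPER, "pkg/Counter", null, "java/lang/Object", null);
        cw.visitField(ACC_PRIVATE, "value", "I", null, null).visitEnd();

        // public Counter(int v) { super(); this.value = v; }
        MethodVisitor init = cw.visitMethod(ACC_PUBLIC, "<init>", "(I)V", null, null);
        init.visitCode();
        init.visitVarInsn(ALOAD, 0);                                 // push this
        init.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V", false);
        init.visitVarInsn(ALOAD, 0);                                 // push this
        init.visitVarInsn(ILOAD, 1);                                 // push v
        init.visitFieldInsn(PUTFIELD, "pkg/Counter", "value", "I");  // pops v and this
        init.visitInsn(RETURN);
        init.visitMaxs(0, 0);                                        // recomputed (COMPUTE_MAXS)
        init.visitEnd();

        // public int get() { return this.value; }
        MethodVisitor get = cw.visitMethod(ACC_PUBLIC, "get", "()I", null, null);
        get.visitCode();
        get.visitVarInsn(ALOAD, 0);
        get.visitFieldInsn(GETFIELD, "pkg/Counter", "value", "I");   // pops this, pushes value
        get.visitInsn(IRETURN);
        get.visitMaxs(0, 0);
        get.visitEnd();

        cw.visitEnd();
        return cw.toByteArray();
    }

    static class Loader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    // Defines the class, calls new Counter(v).get() reflectively
    static int roundTrip(int v) {
        try {
            Class<?> c = new Loader().define("pkg.Counter", generate());
            Object counter = c.getConstructor(int.class).newInstance(v);
            return (Integer) c.getMethod("get").invoke(counter);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```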

MethodVisitor

The ASM API for generating and transforming compiled methods is based on the MethodVisitor abstract class, whose instances are returned by ClassVisitor's visitMethod method. In addition to methods for annotations and debug information, this class defines one method per bytecode instruction category, depending on the number and types of the instruction's arguments. These methods must be called in the following order:

visitAnnotationDefault? ( visitAnnotation | visitParameterAnnotation | visitAttribute )* ( visitCode ( visitTryCatchBlock | visitLabel | visitFrame | visitXxxInsn | visitLocalVariable | visitLineNumber )* visitMaxs )? visitEnd

Let's look at an example that transforms an existing method by adding start and end logs to it.

  1. Prepare the class to transform; logs will be added at the start and end of the query method:

    public class TestService {
        public void query(int param) {
            System.out.println("service handle...");
        }
    }
  2. Override visitMethod in a ClassVisitor:

    public class MyClassVisitor extends ClassVisitor implements Opcodes {
        public MyClassVisitor(ClassVisitor cv) {
            super(ASM5, cv);
        }
    
        @Override
        public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
            MethodVisitor methodVisitor = cv.visitMethod(access, name, desc, signature,
                    exceptions);
            if (!name.equals("<init>") && methodVisitor != null) {
                methodVisitor = new MyMethodVisitor(methodVisitor);
            }
            return methodVisitor;
        }
    }

The constructor (<init>) is filtered out; every other method's visitor is wrapped in MyMethodVisitor, which overrides MethodVisitor methods:

  3. Extend MethodVisitor:

    public class MyMethodVisitor extends MethodVisitor implements Opcodes {
        public MyMethodVisitor(MethodVisitor mv) {
            super(Opcodes.ASM4, mv);
        }
    
        @Override
        public void visitCode() {
            super.visitCode();
            mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
            mv.visitLdcInsn("start");
            mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        }
    
        @Override
        public void visitInsn(int opcode) {
            if ((opcode >= Opcodes.IRETURN && opcode <= Opcodes.RETURN)
                    || opcode == Opcodes.ATHROW) {
                // print "end" before the method returns or throws
                mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
                mv.visitLdcInsn("end");
                mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
            }
            mv.visitInsn(opcode);
        }
    }

visitCode is called at the start of the method body, so the start log is inserted there. visitInsn must check whether the opcode is a return instruction (IRETURN through RETURN) or ATHROW: a method executes such an instruction (for example RETURN) right before it exits, so checking the opcode shows where to insert the end log.

  4. Decompile the newly generated class:

    public class TestService {
        public TestService() {
        }
    
        public void query(int var1) {
            System.out.println("start");
            System.out.println("service handle...");
            System.out.println("end");
        }
    }

Tools

ASM also provides some tools for methods:

  • LocalVariablesSorter: this method adapter renumbers the local variables used in a method in the order in which they appear in the method. It also lets you create new local variables with the newLocal method;
  • AdviceAdapter: this abstract method adapter can be used to insert code at the beginning of a method and before any RETURN or ATHROW instruction. Its main advantage is that it also works for constructors, where code must not be inserted right at the beginning of the constructor, but after the call to the super constructor.
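A sketch of AdviceAdapter, assuming ASM 9.x with asm-commons on the classpath. LogAdvice, AdviceDemo and the sample class pkg.Greeter are hypothetical names. It produces the same start/end logging as the MyMethodVisitor example, but onMethodEnter and onMethodExit replace the manual checks in visitCode and visitInsn, and constructors would be handled correctly as well. Because AdviceAdapter extends LocalVariablesSorter, the reader runs with EXPAND_FRAMES.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.commons.AdviceAdapter;
import static org.objectweb.asm.Opcodes.*;

// Prints "start" on method entry and "end" before every return or throw
class LogAdvice extends AdviceAdapter {
    LogAdvice(MethodVisitor mv, int access, String name, String desc) {
        super(ASM9, mv, access, name, desc);
    }

    private void printLine(String text) {
        mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
        mv.visitLdcInsn(text);
        mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
    }

    @Override
    protected void onMethodEnter() {
        printLine("start"); // for constructors, this runs after the super(...) call
    }

    @Override
    protected void onMethodExit(int opcode) {
        printLine("end"); // opcode is the xRETURN or ATHROW about to be emitted
    }
}

class AdviceDemo {
    // Sample input: public class pkg.Greeter { public static int twice(int x) { return x * 2; } }
    static byte[] sample() {
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        cw.visit(V1_8, ACC_PUBLIC | ACC_SUPER, "pkg/Greeter", null, "java/lang/Object", null);
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "twice", "(I)I", null, null);
        mv.visitCode();
        mv.visitVarInsn(ILOAD, 0);
        mv.visitInsn(ICONST_2);
        mv.visitInsn(IMUL);
        mv.visitInsn(IRETURN);
        mv.visitMaxs(0, 0);
        mv.visitEnd();
        cw.visitEnd();
        return cw.toByteArray();
    }

    // reader -> ClassVisitor that wraps each MethodVisitor in LogAdvice -> writer
    static byte[] weave(byte[] classFile) {
        ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        ClassVisitor cv = new ClassVisitor(ASM9, cw) {
            @Override
            public MethodVisitor visitMethod(int access, String name, String desc,
                                             String signature, String[] exceptions) {
                MethodVisitor next = super.visitMethod(access, name, desc, signature, exceptions);
                return next == null ? null : new LogAdvice(next, access, name, desc);
            }
        };
        // LocalVariablesSorter (a superclass of AdviceAdapter) requires expanded frames
        new ClassReader(classFile).accept(cv, ClassReader.EXPAND_FRAMES);
        return cw.toByteArray();
    }

    static class Loader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    // Calls the woven twice(x) and returns everything printed, followed by the result
    static String callWoven(int x) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            Class<?> c = new Loader().define("pkg.Greeter", weave(sample()));
            System.setOut(new PrintStream(buffer, true));
            Object result = c.getMethod("twice", int.class).invoke(null, x);
            System.out.println("result=" + result);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        } finally {
            System.setOut(original);
        }
        return buffer.toString();
    }
}
```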

Use cases

ASM is used in many projects. Here are two common use cases: AOP and replacing reflection.

AOP

Aspect-oriented programming is mainly used to handle system-level, cross-cutting concerns in program development, such as logging, transactions and permission checks. Its key technique is proxying, which can be static or dynamic, and there are many ways to implement it:

  • AspectJ: static weaving, based on the static proxy principle;
  • JDK dynamic proxy: built on two core classes, Proxy and InvocationHandler;
  • CGLIB dynamic proxy: wraps ASM and can generate new classes dynamically; it is more powerful than the JDK dynamic proxy (for example, it can proxy classes, while the JDK proxy only supports interfaces).
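For comparison, a minimal JDK dynamic proxy built on Proxy and InvocationHandler, with no ASM involved; QueryService, RealQueryService and LoggingProxy are hypothetical names. It adds the same kind of start/end logging around each call, but only for methods declared in an interface, which is one reason libraries such as CGLIB generate subclasses with ASM instead.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical service interface and implementation
interface QueryService {
    String query(int param);
}

class RealQueryService implements QueryService {
    @Override
    public String query(int param) {
        return "result-" + param;
    }
}

class LoggingProxy {
    // Returns a proxy implementing iface that logs around every call to target
    static <T> T wrap(T target, Class<T> iface, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("start " + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            } finally {
                log.add("end " + method.getName());
            }
        };
        Object instance = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface}, handler);
        return iface.cast(instance);
    }
}
```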

Among these, CGLIB's dynamic proxy is built on ASM. The method transformation example above also showed ASM's bytecode enhancement capability.

Replacing reflection

FastJson is known for its speed; one of the reasons is that it uses ASM instead of Java reflection. There is also ReflectASM, a library designed specifically to replace Java reflection.

ReflectASM is a very small Java library that provides high-performance reflection through code generation, automatically generating access classes to get and set fields. The access classes use generated bytecode instead of Java reflection, so they are very fast.

Here is a simple example of using ReflectASM:

TestBean testBean = new TestBean(1, "zhaohui", 18);
MethodAccess methodAccess = MethodAccess.get(TestBean.class);
String[] mns = methodAccess.getMethodNames();

for (int i = 0; i < mns.length; i++) {
    System.out.println(methodAccess.invoke(testBean, mns[i]));
}

The return values of TestBean's methods are printed as expected. Why is this fast? Because ReflectASM uses ASM to generate a TestBeanMethodAccess class at runtime whose invoke method calls the bean's methods directly. Decompiled, it looks like this:

public Object invoke(Object var1, int var2, Object... var3) {
    TestBean var4 = (TestBean) var1;
    switch (var2) {
    case 0:
        return var4.getName();
    case 1:
        return var4.getId();
    case 2:
        return var4.getAge();
    default:
        throw new IllegalArgumentException("Method not found: " + var2);
    }
}

As you can see, invoke is just a switch over ordinary method calls, so it avoids the overhead of Java reflection and is generally faster.
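The speed comes from the shape of the generated class. The sketch below is a hand-written imitation in plain Java, not ReflectASM's actual output; Bean, SimpleMethodAccess and BeanMethodAccess are hypothetical names loosely modeled on MethodAccess (a list of method names, a getIndex lookup and an int-based invoke). The name lookup happens once, and each call is then a switch plus an ordinary, statically linked method call, with no java.lang.reflect.Method involved.

```java
// Hypothetical bean mirroring the TestBean example above
class Bean {
    private final int id;
    private final String name;
    private final int age;

    Bean(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public int getId() { return id; }
    public String getName() { return name; }
    public int getAge() { return age; }
}

// Imitates the shape of a generated accessor: names, index lookup, int-based invoke
abstract class SimpleMethodAccess {
    abstract String[] getMethodNames();

    abstract Object invoke(Object target, int methodIndex);

    // Resolve a name to an index once, then reuse the index for every call
    int getIndex(String methodName) {
        String[] names = getMethodNames();
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(methodName)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Method not found: " + methodName);
    }
}

class BeanMethodAccess extends SimpleMethodAccess {
    private static final String[] NAMES = {"getName", "getId", "getAge"};

    @Override
    String[] getMethodNames() {
        return NAMES.clone();
    }

    @Override
    Object invoke(Object target, int methodIndex) {
        Bean bean = (Bean) target;
        switch (methodIndex) {
            case 0: return bean.getName();
            case 1: return bean.getId();  // int is autoboxed to Integer
            case 2: return bean.getAge();
            default: throw new IllegalArgumentException("Method not found: " + methodIndex);
        }
    }
}
```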



ksfzhaohui