This article is collated and published with the authorization of Kumo (Ma Zhi), chief lecturer of the HeapDump performance community.

Chapter 9-Definition of Bytecode Instructions

The previous article introduced how Java stack frames are created and how bytecode is dispatched during interpreted execution, but it did not explain how the virtual machine actually executes the bytecode of a Java method. Before looking at bytecode execution, we first need to know how bytecode instructions are defined. Some attributes of the bytecode instructions are defined in the Bytecodes::initialize() function. The call chain of this function is as follows:

init_globals()
bytecodes_init() 
Bytecodes::initialize()

There is a definition similar to this in the Bytecodes::initialize() function:

//  bytecode               bytecode name           format   wide f.   result tp  stk traps
def(_nop                 , "nop"                 , "b"    , NULL    , T_VOID   ,  0, false);
def(_aconst_null         , "aconst_null"         , "b"    , NULL    , T_OBJECT ,  1, false);
def(_iconst_m1           , "iconst_m1"           , "b"    , NULL    , T_INT    ,  1, false);
def(_iconst_0            , "iconst_0"            , "b"    , NULL    , T_INT    ,  1, false);
def(_iconst_1            , "iconst_1"            , "b"    , NULL    , T_INT    ,  1, false);
// ...

The 202 bytecode instructions defined by the Java Virtual Machine Specification are all defined by calling the def() function as shown above. We need to focus on the arguments passed to def(), such as bytecode name and format. They are explained one by one below:

  • bytecode name is the name of the bytecode;
  • wide indicates whether the bytecode can be prefixed with wide; if it can, the value is the wide format string, for example "wbii";
  • result tp is the result type after the instruction is executed. T_ILLEGAL means that the result type cannot be determined from the bytecode alone. For example, the result type of the _invokevirtual method call instruction is the return type of the called method, which cannot be determined from the call instruction itself;
  • stk is the effect on the depth of the expression stack. For example, _nop performs no operation, so it does not change the stack depth and stk is 0; _iconst_0 pushes 0 onto the stack, so the depth increases by 1 and stk is 1; for _lconst_0 the depth increases by 2, and for _lstore_0 it decreases by 2;
  • traps means can_trap. This attribute is important and will be described in detail later;
  • format expresses two things: the format of the bytecode and the length of the bytecode.

Below we focus on the format parameter. The length of the format string gives the length of the bytecode: a string of one character means the bytecode is one byte long, a string of 2 characters means it is 2 bytes long, and so on. For example, the format of _iconst_0 is "b", so it is one byte wide, and the format of _istore is "bi", so it is 2 bytes wide. The format may also be an empty string, which marks a variable-length instruction such as tableswitch or lookupswitch whose length cannot be read from the format. Bytecodes that are not defined in the Java Virtual Machine Specification, such as _fast_agetfield and _fast_bgetfield, are defined inside the virtual machine to make interpretation more efficient. The format string also expresses the format of the bytecode; the meaning of each character is as follows:

b: the bytecode instruction has a fixed length. For variable-length instructions such as tableswitch and lookupswitch, the format string does not contain the b character;
c: the operand is a signed constant. For example, bipush sign-extends a byte to an int value and pushes it onto the operand stack;
i: the operand is an unsigned local variable table index. For example, iload loads an int value from the local variable table onto the operand stack;
j: the operand is a constant pool cache index. Note that the constant pool cache index is different from the constant pool index. The constant pool index is described in detail in the basic volume of "In-depth analysis of the Java virtual machine: source code analysis and detailed examples", so it is not introduced again here;
k: the operand is an unsigned constant pool index. For example, ldc reads data from the runtime constant pool and pushes it onto the operand stack, so its format is "bk";
o: the operand is a branch offset. For example, ifeq compares an int with zero; if the value is 0 the comparison succeeds and the operand is used as the branch offset to jump to, so its format is "boo";
_: can be ignored;
w: marks the wide form of a bytecode, which extends the local variable table index. These bytecodes include iload, fload, etc., so their wide format is "wbii".

The implementation of the called def() function is as follows:

void Bytecodes::def(
Code          code,
const char*   name,
const char*   format,
const char*   wide_format,
BasicType     result_type,
int           depth,
bool          can_trap,
Code          java_code
) {
  int len  = (format      != NULL ? (int) strlen(format)      : 0);
  int wlen = (wide_format != NULL ? (int) strlen(wide_format) : 0);

  _name          [code] = name;
  _result_type   [code] = result_type;
  _depth         [code] = depth;
  _lengths       [code] = (wlen << 4) | (len & 0xF); // 0xF is 1111 in binary
  _java_code     [code] = java_code;


  int bc_flags = 0;
  if (can_trap){
    // Bytecode instructions such as ldc, ldc_w, ldc2_w, _aload_0, iaload,
    // iastore, idiv, ldiv and ireturn all carry _bc_can_trap
    bc_flags |= _bc_can_trap; 
  }
  if (java_code != code){
    bc_flags |= _bc_can_rewrite; // instructions defined inside the virtual machine all carry _bc_can_rewrite
  }

  // _flags is assigned here
  _flags[(u1)code+0*(1<<BitsPerByte)] = compute_flags(format,      bc_flags);
  _flags[(u1)code+1*(1<<BitsPerByte)] = compute_flags(wide_format, bc_flags);
}
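
The way def() packs the two format lengths into a single byte of _lengths can be sketched as follows. This is a minimal standalone illustration, not HotSpot code: pack_lengths(), length_of() and wide_length_of() are hypothetical helper names, and only the expression (wlen << 4) | (len & 0xF) is taken from the listing above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper mirroring the expression used in Bytecodes::def():
// the wide length goes into the high 4 bits, the normal length into the low 4 bits.
inline unsigned char pack_lengths(const char* format, const char* wide_format) {
  int len  = (format      != nullptr ? (int) std::strlen(format)      : 0);
  int wlen = (wide_format != nullptr ? (int) std::strlen(wide_format) : 0);
  assert(len <= 0xF && wlen <= 0xF);  // each length must fit in 4 bits
  return (unsigned char) ((wlen << 4) | (len & 0xF));
}

// Hypothetical accessors that undo the packing.
inline int length_of(unsigned char packed)      { return packed & 0xF; }
inline int wide_length_of(unsigned char packed) { return packed >> 4; }
```

With the format "bi" and the wide format "wbii", the packed value is 0x42: a normal length of 2 bytes and a wide length of 4 bytes.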

_name, _result_type and the others are static arrays defined in the Bytecodes class. The array index is the opcode value, and the stored values are the name, result_type and so on. These variables are defined as follows:

const char*     Bytecodes::_name          [Bytecodes::number_of_codes];
BasicType       Bytecodes::_result_type   [Bytecodes::number_of_codes];
s_char          Bytecodes::_depth         [Bytecodes::number_of_codes];
u_char          Bytecodes::_lengths       [Bytecodes::number_of_codes];
Bytecodes::Code Bytecodes::_java_code     [Bytecodes::number_of_codes];
u_short         Bytecodes::_flags         [(1<<BitsPerByte)*2];

The value of Bytecodes::number_of_codes is 234, which is enough to store all bytecode instructions (including instructions for internal expansion of the virtual machine).

Looking back at the Bytecodes::def() function, some attributes of the bytecode are computed by calling compute_flags() once with format and once with wide_format. The result for format is stored in the first half of the _flags array, at index code, and the result for wide_format is stored in the second half, at index code + (1<<BitsPerByte). The compute_flags() function is implemented as follows:

int Bytecodes::compute_flags(const char* format, int more_flags) {
  if (format == NULL) {
      return 0;  // not even more_flags
  }

  int flags = more_flags;
  const char* fp = format;
  switch (*fp) {
  case '\0':
    flags |= _fmt_not_simple; // but variable
    break;
  case 'b':
    flags |= _fmt_not_variable;  // but simple
    ++fp;  // skip 'b'
    break;
  case 'w':
    flags |= _fmt_not_variable | _fmt_not_simple;
    ++fp;  // skip 'w'
    guarantee(*fp == 'b', "wide format must start with 'wb'");
    ++fp;  // skip 'b'
    break;
  }

  int has_nbo = 0, has_jbo = 0, has_size = 0;
  for (;;) {
    int this_flag = 0;
    char fc = *fp++;
    switch (fc) {
    case '\0':  // end of string
      assert(flags == (jchar)flags, "change _format_flags");
      return flags;

    case '_': continue;         // ignore these

    case 'j': this_flag = _fmt_has_j; has_jbo = 1; break;
    case 'k': this_flag = _fmt_has_k; has_jbo = 1; break;
    case 'i': this_flag = _fmt_has_i; has_jbo = 1; break;
    case 'c': this_flag = _fmt_has_c; has_jbo = 1; break;
    case 'o': this_flag = _fmt_has_o; has_jbo = 1; break;

    case 'J': this_flag = _fmt_has_j; has_nbo = 1; break;
    ...
    default:  guarantee(false, "bad char in format");
    } // end of switch

    flags |= this_flag;

    guarantee(!(has_jbo && has_nbo), "mixed byte orders in format");
    if (has_nbo){
      flags |= _fmt_has_nbo;
    }

    int this_size = 1;
    if (*fp == fc) {
      // advance beyond run of the same characters
      this_size = 2;
      while (*++fp == fc){
          this_size++;
      }
      switch (this_size) {
      case 2: flags |= _fmt_has_u2; break; // e.g. sipush, ldc_w, ldc2_w, wide iload
      case 4: flags |= _fmt_has_u4; break; // e.g. goto_w and invokedynamic
      default:
          guarantee(false, "bad rep count in format");
      }
    }

    has_size = this_size;
  }
}

The function computes the value of flags from format or wide_format. The bits in flags record which of the characters b, c, i, j, k, o and w (introduced earlier together with format) appear in the format string, and the size of the operand (2 bytes or 4 bytes). The constants beginning with _fmt are defined in an enumeration, as follows:

// Flag bits derived from format strings, can_trap, can_rewrite, etc.:
enum Flags {
// semantic flags:
_bc_can_trap      = 1<<0,     // bytecode execution can trap or block
// bytecode instructions defined inside the virtual machine all carry this flag
_bc_can_rewrite   = 1<<1,     // bytecode execution has an alternate form

// format bits (determined only by the format string):
_fmt_has_c        = 1<<2,     // constant, such as sipush "bcc"
_fmt_has_j        = 1<<3,     // constant pool cache index, such as getfield "bjj"
_fmt_has_k        = 1<<4,     // constant pool index, such as ldc "bk"
_fmt_has_i        = 1<<5,     // local index, such as iload
_fmt_has_o        = 1<<6,     // offset, such as ifeq

_fmt_has_nbo      = 1<<7,     // contains native-order field(s)
_fmt_has_u2       = 1<<8,     // contains double-byte field(s)
_fmt_has_u4       = 1<<9,     // contains quad-byte field
_fmt_not_variable = 1<<10,    // not of variable length (simple or wide)
_fmt_not_simple   = 1<<11,    // either wide or variable length
_all_fmt_bits     = (_fmt_not_simple*2 - _fmt_has_c),

// ...
};

The corresponding relationship with format is as follows:

In this way, different values can be expressed through combination. Commonly used combinations are defined in the enumeration class as follows:

_fmt_b      = _fmt_not_variable,
_fmt_bc     = _fmt_b | _fmt_has_c,
_fmt_bi     = _fmt_b | _fmt_has_i,
_fmt_bkk    = _fmt_b | _fmt_has_k | _fmt_has_u2,
_fmt_bJJ    = _fmt_b | _fmt_has_j | _fmt_has_u2 | _fmt_has_nbo,
_fmt_bo2    = _fmt_b | _fmt_has_o | _fmt_has_u2,
_fmt_bo4    = _fmt_b | _fmt_has_o | _fmt_has_u4

For example, the format of the bipush bytecode is "bc", so the format bits of its flags value are _fmt_b | _fmt_has_c; the format of the ldc bytecode is "bk", so the format bits of its flags value are _fmt_b | _fmt_has_k.
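
To make the mapping from a format string to format bits concrete, here is a simplified sketch of the logic in compute_flags(). It is an illustration under stated assumptions rather than the HotSpot implementation: sketch_compute_flags() is a hypothetical name, the flag values are copied from the enumeration above, and the sketch handles only the lowercase operand characters, leaving out more_flags and the byte-order checks.

```cpp
#include <cassert>

// Flag values copied from the Flags enumeration shown above.
enum Flags {
  _fmt_has_c        = 1 << 2,
  _fmt_has_j        = 1 << 3,
  _fmt_has_k        = 1 << 4,
  _fmt_has_i        = 1 << 5,
  _fmt_has_o        = 1 << 6,
  _fmt_has_u2       = 1 << 8,
  _fmt_has_u4       = 1 << 9,
  _fmt_not_variable = 1 << 10,
  _fmt_not_simple   = 1 << 11,
  _fmt_b            = _fmt_not_variable
};

// Hypothetical, simplified version of compute_flags(): format bits only.
inline int sketch_compute_flags(const char* format) {
  if (format == nullptr) return 0;
  int flags = 0;
  const char* fp = format;
  if (*fp == '\0') {
    return _fmt_not_simple;                 // empty string: variable length
  } else if (*fp == 'b') {
    flags |= _fmt_not_variable;             // fixed length, simple
    ++fp;
  } else if (*fp == 'w') {
    flags |= _fmt_not_variable | _fmt_not_simple;
    ++fp;
    assert(*fp == 'b');                     // a wide format must start with "wb"
    ++fp;
  }
  while (*fp != '\0') {
    char fc = *fp++;
    switch (fc) {
      case '_': continue;                   // padding, ignored
      case 'c': flags |= _fmt_has_c; break;
      case 'j': flags |= _fmt_has_j; break;
      case 'k': flags |= _fmt_has_k; break;
      case 'i': flags |= _fmt_has_i; break;
      case 'o': flags |= _fmt_has_o; break;
      default:  assert(false && "character not handled by this sketch");
    }
    int run = 1;                            // length of the run of equal characters
    while (*fp == fc) { ++fp; ++run; }
    if (run == 2)      flags |= _fmt_has_u2;
    else if (run == 4) flags |= _fmt_has_u4;
    else               assert(run == 1);
  }
  return flags;
}
```

Under these assumptions, "bc" yields _fmt_b | _fmt_has_c, and "bkk" yields the same bits as _fmt_bkk.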


Chapter 10-Initialize the template table

In Chapter 9-Definition of Bytecode Instructions we introduced the bytecode instructions and saw that the information related to each instruction is stored in the corresponding arrays. Given an opcode, the corresponding information can be read from those arrays.

After init_globals() calls bytecodes_init() to initialize the bytecode instructions, it calls interpreter_init() to initialize the interpreter. That call eventually reaches the TemplateInterpreter::initialize() function, which is implemented as follows:

Source code location: /src/share/vm/interpreter/templateInterpreter.cpp

void TemplateInterpreter::initialize() {
  if (_code != NULL) 
       return;
   
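  // Initialize the abstract interpreter AbstractInterpreter.
  // AbstractInterpreter is the common base class of the assembly-based interpreters
  // and defines the abstract interface of the interpreter and the interpreter generator
  AbstractInterpreter::initialize();
   
  // Initialize the template table TemplateTable, which stores the template of each bytecode
  TemplateTable::initialize();
   
  // generate interpreter
  {
     ResourceMark rm;
     int code_size = InterpreterCodeSize;
     // Initialize the StubQueue, the stub queue in the CodeCache
     _code = new StubQueue(new InterpreterCodeletInterface, code_size, NULL,"Interpreter");
     // Instantiate the template interpreter generator object TemplateInterpreterGenerator
     InterpreterGenerator g(_code);
  }
   
  // Initialize the bytecode dispatch table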
  _active_table = _normal_table;
}

This function involves a lot of fairly complicated initialization logic. We divide the initialization into 4 parts:

(1) Initialization of the abstract interpreter AbstractInterpreter. AbstractInterpreter is the common base class of the assembly-based interpreters and defines the abstract interface of the interpreter and the interpreter generator;
(2) Initialization of the template table TemplateTable, which stores the template of each bytecode (target code generation function and parameters);
(3) Initialization of the StubQueue, the stub queue in the CodeCache;
(4) Initialization of the InterpreterGenerator.

The initialization of the abstract interpreter involves some counters. These counters are mainly related to compiled execution, so they are not introduced here; we will come back to them when we discuss compiled execution.

The remaining three parts are introduced one by one. This article covers only the initialization of the template table.

The template table TemplateTable stores the execution template (target code generation function and parameters) of each bytecode. The definition of the bytecodes was introduced in detail in the previous article. The execution template defines how each bytecode is executed in interpreted mode. The TemplateTable::initialize() function is implemented as follows:

Source code location: /src/share/vm/interpreter/templateTable.cpp

void TemplateTable::initialize() {
  if (_is_initialized) return;
  
  
  _bs = Universe::heap()->barrier_set();
  
  // For better readability
  const char _    = ' ';
  const int  ____ = 0;
  const int  ubcp = 1 << Template::uses_bcp_bit;
  const int  disp = 1 << Template::does_dispatch_bit;
  const int  clvm = 1 << Template::calls_vm_bit;
  const int  iswd = 1 << Template::wide_bit;
  //                                    interpr. templates
  // Java spec bytecodes                ubcp|disp|clvm|iswd  in    out   generator             argument
  def(Bytecodes::_nop                 , ____|____|____|____, vtos, vtos, nop                 ,  _           );
  def(Bytecodes::_aconst_null         , ____|____|____|____, vtos, atos, aconst_null         ,  _           );
  def(Bytecodes::_iconst_m1           , ____|____|____|____, vtos, itos, iconst              , -1           );
  def(Bytecodes::_iconst_0            , ____|____|____|____, vtos, itos, iconst              ,  0           );
  // ...
  def(Bytecodes::_tableswitch         , ubcp|disp|____|____, itos, vtos, tableswitch         ,  _           );
  def(Bytecodes::_lookupswitch        , ubcp|disp|____|____, itos, itos, lookupswitch        ,  _           );
  def(Bytecodes::_ireturn             , ____|disp|clvm|____, itos, itos, _return             , itos         );
  def(Bytecodes::_lreturn             , ____|disp|clvm|____, ltos, ltos, _return             , ltos         );
  def(Bytecodes::_freturn             , ____|disp|clvm|____, ftos, ftos, _return             , ftos         );
  def(Bytecodes::_dreturn             , ____|disp|clvm|____, dtos, dtos, _return             , dtos         );
  def(Bytecodes::_areturn             , ____|disp|clvm|____, atos, atos, _return             , atos         );
  def(Bytecodes::_return              , ____|disp|clvm|____, vtos, vtos, _return             , vtos         );
  def(Bytecodes::_getstatic           , ubcp|____|clvm|____, vtos, vtos, getstatic           , f1_byte      );
  def(Bytecodes::_putstatic           , ubcp|____|clvm|____, vtos, vtos, putstatic           , f2_byte      );
  def(Bytecodes::_getfield            , ubcp|____|clvm|____, vtos, vtos, getfield            , f1_byte      );
  def(Bytecodes::_putfield            , ubcp|____|clvm|____, vtos, vtos, putfield            , f2_byte      );
  def(Bytecodes::_invokevirtual       , ubcp|disp|clvm|____, vtos, vtos, invokevirtual       , f2_byte      );
  def(Bytecodes::_invokespecial       , ubcp|disp|clvm|____, vtos, vtos, invokespecial       , f1_byte      );
  def(Bytecodes::_invokestatic        , ubcp|disp|clvm|____, vtos, vtos, invokestatic        , f1_byte      );
  def(Bytecodes::_invokeinterface     , ubcp|disp|clvm|____, vtos, vtos, invokeinterface     , f1_byte      );
  def(Bytecodes::_invokedynamic       , ubcp|disp|clvm|____, vtos, vtos, invokedynamic       , f1_byte      );
  def(Bytecodes::_new                 , ubcp|____|clvm|____, vtos, atos, _new                ,  _           );
  def(Bytecodes::_newarray            , ubcp|____|clvm|____, itos, atos, newarray            ,  _           );
  def(Bytecodes::_anewarray           , ubcp|____|clvm|____, itos, atos, anewarray           ,  _           );
  def(Bytecodes::_arraylength         , ____|____|____|____, atos, itos, arraylength         ,  _           );
  def(Bytecodes::_athrow              , ____|disp|____|____, atos, vtos, athrow              ,  _           );
  def(Bytecodes::_checkcast           , ubcp|____|clvm|____, atos, atos, checkcast           ,  _           );
  def(Bytecodes::_instanceof          , ubcp|____|clvm|____, atos, itos, instanceof          ,  _           );
  def(Bytecodes::_monitorenter        , ____|disp|clvm|____, atos, vtos, monitorenter        ,  _           );
  def(Bytecodes::_monitorexit         , ____|____|clvm|____, atos, vtos, monitorexit         ,  _           );
  def(Bytecodes::_wide                , ubcp|disp|____|____, vtos, vtos, wide                ,  _           );
  def(Bytecodes::_multianewarray      , ubcp|____|clvm|____, vtos, atos, multianewarray      ,  _           );
  def(Bytecodes::_ifnull              , ubcp|____|clvm|____, atos, vtos, if_nullcmp          , equal        );
  def(Bytecodes::_ifnonnull           , ubcp|____|clvm|____, atos, vtos, if_nullcmp          , not_equal    );
  def(Bytecodes::_goto_w              , ubcp|____|clvm|____, vtos, vtos, goto_w              ,  _           );
  def(Bytecodes::_jsr_w               , ubcp|____|____|____, vtos, vtos, jsr_w               ,  _           );
  
  // wide Java spec bytecodes
  def(Bytecodes::_iload               , ubcp|____|____|iswd, vtos, itos, wide_iload          ,  _           );
  def(Bytecodes::_lload               , ubcp|____|____|iswd, vtos, ltos, wide_lload          ,  _           );
  // ...
  
  // JVM bytecodes
  // ...
  
  def(Bytecodes::_shouldnotreachhere   , ____|____|____|____, vtos, vtos, shouldnotreachhere ,  _           );
}

TemplateTable::initialize() calls def() to save the target code generation function and parameters of every bytecode in the _template_table or _template_table_wide (wide instructions) template array. In addition to the bytecode instructions defined by the Java Virtual Machine Specification, HotSpot defines some bytecode instructions of its own, which help the virtual machine implement certain features more efficiently. For example, Bytecodes::_return_register_finalizer, introduced earlier, makes it easier to register objects that have a finalizer.

Only the template definitions of some bytecode instructions are shown above. The def() function is called to define the template of each bytecode instruction, and its arguments are what we focus on:

(1) The bytecode instruction for which the template is defined;
(2) ubcp|disp|clvm|iswd is a combination of flag bits. The bits correspond to the enumeration defined in the Template class, whose constants are as follows:

enum Flags {
 uses_bcp_bit,        // set if template needs the bcp pointing to bytecode
 does_dispatch_bit,   // set if template dispatches on its own
 calls_vm_bit,        // set if template calls the vm
 wide_bit             // set if template belongs to a wide instruction
};

These flags are explained in detail below:

  • uses_bcp_bit: the template needs the bytecode pointer (bcp, whose value is the bytecode base address plus the bytecode offset), that is, the generated template code needs a pointer to the bytecode instruction. In practice this means the template has to read the operands of the bytecode instruction, so most instructions that have operands need bcp. Some instructions do not, such as monitorenter and monitorexit: their operand is at the top of the expression stack and does not have to be read from the class file, so bcp is not needed;
  • does_dispatch_bit: the template contains its own control-flow dispatch logic. Bytecode instructions such as tableswitch, lookupswitch, invokevirtual and ireturn have to dispatch on their own;
  • calls_vm_bit: the template needs to call a JVM function. The flag is checked when TemplateTable::call_VM() is called. Templates usually call JVM functions indirectly through TemplateTable::call_VM(). JVM functions are functions written in C++;
  • wide_bit: the template belongs to a wide instruction (one that uses additional bytes to extend the local variable index).
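
How these bit positions combine into a flags value, and how a single flag is tested, can be sketched as follows. This is a minimal standalone illustration: the bit positions are copied from the enumeration above, the constant none stands in for the ____ placeholder used in the listing, and has_flag() is a hypothetical helper rather than a HotSpot function.

```cpp
#include <cassert>

// Bit positions copied from the Template::Flags enumeration shown above.
enum TemplateFlagBits {
  uses_bcp_bit,
  does_dispatch_bit,
  calls_vm_bit,
  wide_bit
};

const int none = 0;                       // plays the role of ____ in the listing
const int ubcp = 1 << uses_bcp_bit;       // 1
const int disp = 1 << does_dispatch_bit;  // 2
const int clvm = 1 << calls_vm_bit;       // 4
const int iswd = 1 << wide_bit;           // 8

// Hypothetical helper: tests whether the given bit position is set in flags.
inline bool has_flag(int flags, int bit) {
  assert(bit >= uses_bcp_bit && bit <= wide_bit);
  return (flags & (1 << bit)) != 0;
}
```

For example, the flags written as ubcp|disp|clvm|____ for invokevirtual have the three low bits set, while ubcp|____|____|iswd for the wide form of iload sets only the bcp and wide bits.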

(3) _tos_in and _tos_out: the TosState (the data type of the element at the top of the operand stack, Top of Stack) before and after the template is executed. They are used to check that the input and output types declared by the template match those of the generator function, so that the top-of-stack element is used correctly.

The values of _tos_in and _tos_out must be constants defined in the following enumeration:

enum TosState {         // describes the tos cache contents
  btos = 0,             // byte, bool tos cached
  ctos = 1,             // char tos cached
  stos = 2,             // short tos cached
  itos = 3,             // int tos cached
  ltos = 4,             // long tos cached
  ftos = 5,             // float tos cached
  dtos = 6,             // double tos cached
  atos = 7,             // object cached
  vtos = 8,             // tos not cached
  number_of_states,
  ilgl                  // illegal state: should not occur
};

Take the iload instruction as an example. Its top-of-stack state before execution is vtos, which means it does not use the value at the top of the stack. If the interpreter has cached the result of the previous instruction in a register to improve efficiency, the value in that register must be pushed onto the stack before iload is executed. The top-of-stack state after iload is itos, because iload pushes an int onto the operand stack, so this value can be cached in a register. If the next instruction is ireturn, whose states before and after execution are itos and itos, the int cached in the register can be returned directly without any operation on the operand stack.
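
The reasoning about iload and ireturn can be modelled with a small sketch. It is a deliberate simplification and not how HotSpot implements the top-of-stack cache: TosTransition, must_push_cached_value() and can_reuse_cached_value() are hypothetical names that only encode the two rules described in the paragraph above.

```cpp
#include <cassert>

// Copied from the TosState enumeration shown above.
enum TosState {
  btos = 0, ctos = 1, stos = 2, itos = 3, ltos = 4,
  ftos = 5, dtos = 6, atos = 7, vtos = 8,
  number_of_states,
  ilgl
};

// Hypothetical model of the stack-top states declared by one template.
struct TosTransition {
  TosState in;   // state expected before the template runs
  TosState out;  // state left behind after the template runs
};

// True when the previous template left a value cached in a register but the
// next template expects nothing cached (vtos), so the value must be pushed first.
inline bool must_push_cached_value(TosTransition prev, TosTransition next) {
  assert(prev.out < number_of_states && next.in < number_of_states);
  return prev.out != vtos && next.in == vtos;
}

// True when the next template can consume the cached register value directly.
inline bool can_reuse_cached_value(TosTransition prev, TosTransition next) {
  assert(prev.out < number_of_states && next.in < number_of_states);
  return prev.out != vtos && prev.out == next.in;
}

// The two templates discussed above, with the states from the def() calls.
const TosTransition iload_states   = { vtos, itos };
const TosTransition ireturn_states = { itos, itos };
```

In this model, iload followed by ireturn reuses the cached int, while iload followed by another iload first has to push the cached value onto the operand stack.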

(4) _gen and _arg: _gen is the template generator (a function pointer) that generates the execution logic for the corresponding bytecode; _arg is the argument passed to the template generator. On each platform, calling the function pointer generates different machine instructions for each bytecode instruction according to its semantics. Here we discuss only the 64-bit implementation on the x86 architecture. Since machine instructions are difficult to read, from now on we will read only the assembly disassembled from them.

Let's take a look at the TemplateTable::def() function called in the TemplateTable::initialize() function, as follows:

void TemplateTable::def(
  Bytecodes::Code code,    // bytecode instruction
  int flags,               // flag bits
  TosState in,             // TosState before the template is executed
  TosState out,            // TosState after the template is executed
  void (*gen)(int arg),    // template generator, the core component of the template
  int arg 
) {
  // whether the bcp pointer is needed
  const int ubcp = 1 << Template::uses_bcp_bit;     
  // whether dispatch is done inside the template
  const int disp = 1 << Template::does_dispatch_bit; 
  // whether a JVM function needs to be called
  const int clvm = 1 << Template::calls_vm_bit;   
  // whether this is a wide instruction
  const int iswd = 1 << Template::wide_bit;          
  
  // For instructions that may be prefixed with the wide bytecode,
  // the _template_table_wide template array is used for dispatch;
  // otherwise the _template_table template array is used
  bool is_wide = (flags & iswd) != 0;
  Template* t = is_wide ? template_for_wide(code) : template_for(code);
  
  // Call initialize() on the Template t to initialize the template
  t->initialize(flags, in, out, gen, arg); 
}

The template table consists of template arrays and a set of generators.

The template arrays are _template_table and _template_table_wide. The array index is the bytecode opcode and the value is a Template. They are defined as follows:

Template  TemplateTable::_template_table[Bytecodes::number_of_codes];
Template TemplateTable::_template_table_wide[Bytecodes::number_of_codes];

The elements of the template arrays are of type Template. The Template class defines the _flags attribute, which stores the flags; _tos_in and _tos_out, which store the top-of-stack cache states in and out; and _gen and _arg, which store the generator gen and its argument arg. Calling t->initialize() therefore simply initializes these fields of the Template. The initialize() function is implemented as follows:

void Template::initialize(
 int flags, 
 TosState tos_in, 
 TosState tos_out, 
 generator gen, 
 int arg
) {
  _flags   = flags;
  _tos_in  = tos_in;
  _tos_out = tos_out;
  _gen     = gen;
  _arg     = arg;
}

Note that the gen function is not called here to generate assembly code. The information passed to def() is only saved in a Template instance: TemplateTable::def() obtains the Template instance from the array through template_for() or template_for_wide() and then calls Template::initialize() to store the information in it. Later, the Template instance can be looked up in the array by bytecode index to obtain the template information of that bytecode instruction.

Although gen is not called here to generate the machine instructions for a bytecode instruction, we can look ahead and see how the function pointer gen generates machine instructions for one.
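
The point that def() only records the generator without running it can be seen in a miniature model of the two template arrays. This sketch is an assumption-laden stand-in for the HotSpot classes: the array size of 256, the free function def(), the counter generated and the generator iinc_gen() are all hypothetical, and only the is_wide selection and the field assignments follow the listings above.

```cpp
#include <cassert>

enum TosState { itos = 3, vtos = 8 };     // subset of the real enumeration
typedef void (*generator)(int arg);

const int ubcp = 1 << 0;                  // uses_bcp_bit
const int clvm = 1 << 2;                  // calls_vm_bit
const int iswd = 1 << 3;                  // wide_bit

// Miniature Template: the same five fields as in Template::initialize().
struct Template {
  int       _flags;
  TosState  _tos_in;
  TosState  _tos_out;
  generator _gen;
  int       _arg;

  void initialize(int flags, TosState tos_in, TosState tos_out, generator gen, int arg) {
    _flags = flags; _tos_in = tos_in; _tos_out = tos_out; _gen = gen; _arg = arg;
  }
};

const int number_of_codes = 256;          // hypothetical size for this sketch
static Template template_table[number_of_codes];
static Template template_table_wide[number_of_codes];

static int generated = 0;                 // counts how often a generator actually ran
void iinc_gen(int) { ++generated; }       // hypothetical stand-in for a generator

// Stores the template in the normal or the wide array; never calls gen.
void def(int code, int flags, TosState in, TosState out, generator gen, int arg) {
  assert(0 <= code && code < number_of_codes);
  bool is_wide = (flags & iswd) != 0;
  Template* t = is_wide ? &template_table_wide[code] : &template_table[code];
  t->initialize(flags, in, out, gen, arg);
}
```

After def(0x84, ubcp|clvm, vtos, vtos, iinc_gen, 0), where 0x84 is the opcode of iinc, the entry in template_table holds the function pointer, and the counter changes only once that pointer is invoked explicitly.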

Look at a call to def() in the TemplateTable::initialize() function. Taking _iinc (which adds a constant to the value of a slot in the local variable table) as an example, the call is as follows:

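def(
 Bytecodes::_iinc,     // bytecode instruction
 ubcp|____|clvm|____,  // flags
 vtos,                 // TosState before the template is executed
 vtos,                 // TosState after the template is executed
 iinc ,                // template generator, a pointer to the iinc() function
 _                     // no template generator argument is needed
); 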

Setting the uses_bcp_bit and calls_vm_bit flags indicates that the generator of the iinc instruction needs the bcp pointer function at_bcp() and needs to call a JVM function. The generator is defined as follows:

Source code location: /hotspot/src/cpu/x86/vm/templateTable_x86_64.cpp

void TemplateTable::iinc() {
  transition(vtos, vtos);
  __ load_signed_byte(rdx, at_bcp(2)); // get constant
  locals_index(rbx);
  __ addl(iaddress(rbx), rdx);
}

Since iinc operates only on the local variable table, it neither affects the operand stack nor uses the value at the top of the operand stack, so the top-of-stack state is vtos both before and after execution. The call to transition() simply verifies that the top-of-stack states are correct.

The bytecode format of the iinc instruction is as follows:

iinc
index // local variable table index
const // value added to the slot at that index

The opcode iinc occupies one byte, and index and const occupy one byte each. at_bcp() is used to read the operands of iinc: 2 means an offset of 2 bytes, so const is read and stored in rdx. locals_index() is called to read index; locals_index() is a JVM function. The generated assembly is as follows:

// %r13 holds the pointer to the bytecode; const is read at an
// offset of 2 bytes and stored in %edx
movsbl 0x2(%r13),%edx
// read index and store it in %ebx
movzbl 0x1(%r13),%ebx
neg    %rbx
// %r14 points to the start of the local variable table; add %edx
// to the value stored at the address %r14+%rbx*8.
// %rbx is negated with neg because,
// on Linux,
// the stack grows toward lower addresses
add    %edx,(%r14,%rbx,8)

The comments explain the code clearly, so no further explanation is given here.
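
The effect of these four instructions can be reproduced in a small C++ model. This is only a sketch for illustration: Slot and simulate_iinc() are hypothetical names, bcp plays the role of %r13, and locals plays the role of %r14, pointing at local 0 with higher-numbered locals at lower addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 8-byte local variable slot holding an int.
struct Slot {
  int32_t value;
  int32_t padding;
};
static_assert(sizeof(Slot) == 8, "a slot is modelled as 8 bytes wide");

// Simulates the assembly generated for iinc.
// bcp:    pointer to the iinc opcode (the role of %r13)
// locals: pointer to local 0 (the role of %r14); local i lives at locals - i
inline void simulate_iinc(const uint8_t* bcp, Slot* locals) {
  assert(bcp[0] == 0x84);                // 0x84 is the opcode of iinc
  int32_t delta = (int8_t) bcp[2];       // movsbl 0x2(%r13),%edx : sign-extend const
  int64_t index = bcp[1];                // movzbl 0x1(%r13),%ebx : zero-extend index
  index = -index;                        // neg %rbx
  locals[index].value += delta;          // add %edx,(%r14,%rbx,8)
}
```

For a frame with four slots whose local 0 sits at the highest address, executing iinc 2, -5 on a local 2 that holds 10 leaves 5 in that slot.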


Chapter 11-Understanding Stub and StubQueue

In Chapter 10-Initialize the template table we introduced the TemplateInterpreter::initialize() function. This function calls TemplateTable::initialize() to initialize the template table and then uses the new keyword to initialize the _code static attribute defined in the AbstractInterpreter class, which is declared as follows:

static StubQueue* _code;

Since TemplateInterpreter inherits from AbstractInterpreter, the _code attribute initialized in TemplateInterpreter is actually the _code attribute defined in the AbstractInterpreter class.

The code to initialize the _code variable in the initialize() function is as follows:

// InterpreterCodeSize is defined in the
// platform-specific templateInterpreter_x86.hpp;
// on 64-bit it is 256 * 1024
int code_size = InterpreterCodeSize;
_code = new StubQueue(
                new InterpreterCodeletInterface, 
                code_size, 
                NULL,
                "Interpreter");

StubQueue is a queue of Stubs used to store the generated native code. Each element of the queue is an InterpreterCodelet object. InterpreterCodelet inherits from the abstract base class Stub and contains the native code of a bytecode together with some debugging and printing information. Below we introduce the StubQueue class and the related Stub, InterpreterCodelet and CodeletMark classes.

1. InterpreterCodelet and Stub class

The definition of the Stub class is as follows:

class Stub VALUE_OBJ_CLASS_SPEC { ... };

The InterpreterCodelet class inherits from the Stub class, and the specific definition is as follows:

class InterpreterCodelet: public Stub {
 private:
  int                _size;         // the size in bytes
  const char*        _description;  // a description of the codelet, for debugging & printing
  Bytecodes::Code    _bytecode;     // associated bytecode if any
 
 public:
  // Code info
  address code_begin() const  {
     return (address)this + round_to(sizeof(InterpreterCodelet), CodeEntryAlignment);
  }
  address code_end() const {
     return (address)this + size();
  }
 
  int size() const {
     return _size;
  }
  // ...
  int code_size() const { 
     return code_end() - code_begin();  
  }
  // ...
};

InterpreterCodelet instances are stored in the StubQueue. Each InterpreterCodelet instance represents a fragment of machine instructions (the machine code for a bytecode plus some debugging and printing information). For example, each bytecode has an InterpreterCodelet instance, so when a bytecode is executed in interpreted mode, the machine instruction fragment represented by its InterpreterCodelet instance is executed.

Three attributes and some functions are defined in the class, and the memory layout is shown in the figure below.

The generated target code follows the InterpreterCodelet header and starts at the first address aligned to CodeEntryAlignment.
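The layout can be sketched with a few lines of offset arithmetic. This is a simplified stand-in rather than HotSpot code: CodeletHeader, kCodeEntryAlignment and this round_to are names assumed for illustration, and the alignment value 32 is the one quoted later in this article for the author's platform.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value for illustration; the real CodeEntryAlignment is platform dependent.
constexpr std::size_t kCodeEntryAlignment = 32;

// Round x up to the next multiple of a power-of-two alignment.
constexpr std::size_t round_to(std::size_t x, std::size_t alignment) {
    return (x + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for InterpreterCodelet. Offsets are measured from the
// start of the header, which is also the start of the codelet.
struct CodeletHeader {
    int         size;         // total size in bytes: header, padding and code
    const char* description;  // text for debugging and printing
    int         bytecode;     // associated bytecode, if any

    // The code starts after the header, padded up to the entry alignment.
    std::size_t code_begin_offset() const {
        return round_to(sizeof(CodeletHeader), kCodeEntryAlignment);
    }
    // The code ends where the codelet ends.
    std::size_t code_end_offset() const {
        assert(size >= 0);
        return static_cast<std::size_t>(size);
    }
    std::size_t code_size() const {
        return code_end_offset() - code_begin_offset();
    }
};
```

With a total size of 96 bytes, for instance, the header occupies the first 32 bytes after padding and the remaining 64 bytes hold the code.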

2. StubQueue class

StubQueue is a Stub queue used to store the generated local machine instruction fragments. Each element of the queue is an InterpreterCodelet instance.

The definition of the StubQueue class is as follows:

class StubQueue: public CHeapObj<mtCode> {
 private:
  StubInterface* _stub_interface;     // the interface prototype
  address        _stub_buffer;        // where all stubs are stored
 
  int            _buffer_size;       // the buffer size in bytes
  int            _buffer_limit;      // the (byte) index of the actual buffer limit (_buffer_limit <= _buffer_size)
 
  int            _queue_begin;       // the (byte) index of the first queue entry (word-aligned)
  int            _queue_end;         // the (byte) index of the first entry after the queue (word-aligned)
 
  int            _number_of_stubs;   // the number of buffered stubs
 
 
  bool is_contiguous() const {
      return _queue_begin <= _queue_end;
  }
  int index_of(Stub* s) const {
      int i = (address)s - _stub_buffer;
      return i;
  }
  Stub* stub_at(int i) const {
      return (Stub*)(_stub_buffer + i);
  }
  Stub* current_stub() const {
      return stub_at(_queue_end);
  }
 
  // ...
};

The constructor of this class is as follows:

StubQueue::StubQueue(
 StubInterface* stub_interface,  // an InterpreterCodeletInterface object
 int            buffer_size,     // 256*1024
 Mutex*         lock,
 const char*    name) : _mutex(lock)
{
  intptr_t     size = round_to(buffer_size, 2*BytesPerWord); // BytesPerWord is 8
  BufferBlob*  blob = BufferBlob::create(name, size); // create the BufferBlob that backs this StubQueue
 
  _stub_interface  = stub_interface;
 
  _buffer_size     = blob->content_size();
  _buffer_limit    = blob->content_size();
  _stub_buffer     = blob->content_begin();
 
  _queue_begin     = 0;
  _queue_end       = 0;
  _number_of_stubs = 0;
}

The _stub_interface attribute holds an instance of type InterpreterCodeletInterface. This class defines the functions that operate on a Stub, which avoids having to define virtual functions in Stub itself. Each StubQueue has one InterpreterCodeletInterface, which is used to manipulate every Stub instance stored in that StubQueue.

BufferBlob::create() is called to allocate memory for the StubQueue. Remember that the memory used by a StubQueue is allocated through a BufferBlob; in other words, a BufferBlob may in essence be a StubQueue. The create() function is described in detail below.

BufferBlob* BufferBlob::create(const char* name, int buffer_size) {
  // ...
  BufferBlob*    blob = NULL;
  unsigned int   size = sizeof(BufferBlob);
 
  // align the size to CodeEntryAlignment
  size = align_code_offset(size);
  size += round_to(buffer_size, oopSize); // oopSize is the width of a pointer, 8 on 64-bit
 
  {
     MutexLockerEx mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
     blob = new (size) BufferBlob(name, size);
  }
 
  return blob;
}

Memory for the BufferBlob is allocated with the new keyword. The overloaded operator new is as follows:

void* BufferBlob::operator new(size_t s, unsigned size, bool is_critical) throw() {
  void* p = CodeCache::allocate(size, is_critical);
  return p;
}

The memory is allocated from the CodeCache. The CodeCache uses native memory and has its own memory management scheme, which will be described in detail later.
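The pattern used here, an operator new that takes an extra size argument so that a small header object and its content area live in one allocation, can be sketched as follows. This is only an illustration under stated assumptions: Blob, TotalSize and create_blob are hypothetical names, std::malloc stands in for CodeCache::allocate(), and the values 32 and 8 are assumed for the code alignment and oopSize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kCodeAlignment = 32;  // assumed alignment of the content area
constexpr std::size_t kOopSize = 8;         // assumed pointer width on 64-bit

constexpr std::size_t round_to(std::size_t x, std::size_t a) {
    return (x + a - 1) & ~(a - 1);
}

// Tag type carrying the total number of bytes to allocate (hypothetical).
struct TotalSize { std::size_t bytes; };

// Stand-in for BufferBlob: a header whose storage is over-allocated so that
// a content area follows it in the same block.
class Blob {
public:
    explicit Blob(std::size_t total_size) : _total_size(total_size) {}

    // Used as: new (TotalSize{n}) Blob(n). The compiler passes sizeof(Blob)
    // as the first argument; the caller supplies the real allocation size.
    static void* operator new(std::size_t header_size, TotalSize total) noexcept {
        assert(total.bytes >= header_size);
        return std::malloc(total.bytes);
    }
    // Matching placement form (used if the constructor throws) and the usual form.
    static void operator delete(void* p, TotalSize) noexcept { std::free(p); }
    static void operator delete(void* p) noexcept { std::free(p); }

    static std::size_t header_size() { return round_to(sizeof(Blob), kCodeAlignment); }
    unsigned char* content_begin() {
        return reinterpret_cast<unsigned char*>(this) + header_size();
    }
    std::size_t content_size() const { return _total_size - header_size(); }

private:
    std::size_t _total_size;
};

// Mirrors the size computation in BufferBlob::create(): aligned header plus
// the requested buffer rounded up to the pointer width.
Blob* create_blob(std::size_t buffer_size) {
    std::size_t total = Blob::header_size() + round_to(buffer_size, kOopSize);
    return new (TotalSize{total}) Blob(total);
}
```

Calling create_blob(100) under these assumptions yields a block whose content area starts 32 bytes after the header and is 104 bytes long; the object is released with a plain delete.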

The layout structure of StubQueue is shown in the figure below.

The InterpreterCodelet in the queue represents a small routine, such as the machine code corresponding to iconst_1, the machine code corresponding to invokedynamic, the code corresponding to exception handling, and the code corresponding to the method entry point. These codes are all InterpreterCodelets. The entire interpreter is composed of these small pieces of code routines, and each small piece of routine completes part of the interpreter's functions, thereby realizing the entire interpreter.


Chapter 12-Meet CodeletMark

InterpreterCodelet relies on CodeletMark for automatic creation and initialization. CodeletMark inherits from ResourceMark, so it is destroyed automatically when it goes out of scope. Its main job is to allocate memory for an InterpreterCodelet and then commit only as much as the machine instruction fragment actually stored in it requires. The class is defined as follows:

class CodeletMark: ResourceMark {
 private:
  InterpreterCodelet*           _clet; // InterpreterCodelet inherits from Stub
  InterpreterMacroAssembler**   _masm;
  CodeBuffer                    _cb;
  
 public:
  // Constructor
  CodeletMark(
     InterpreterMacroAssembler*&    masm,
     const char*                    description,
     Bytecodes::Code                bytecode = Bytecodes::_illegal):
      // AbstractInterpreter::code() returns a StubQueue*; calling request() on it
      // returns a Stub*. request() is implemented in vm/code/stubs.cpp
      _clet( (InterpreterCodelet*)AbstractInterpreter::code()->request(codelet_size()) ),
      _cb(_clet->code_begin(), _clet->code_size()) 
  {
  
     // Initialize the _description and _bytecode attributes of the InterpreterCodelet
     _clet->initialize(description, bytecode);
  
     // InterpreterMacroAssembler->MacroAssembler->Assembler->AbstractAssembler
     // The insts section of the cb passed in is used to initialize the
     // _code_section and _oop_recorder attributes of AbstractAssembler
     // create assembler for code generation
     masm  = new InterpreterMacroAssembler(&_cb); // the constructor sets up r13 to point to bcp and r14 to the local variable table
     _masm = &masm;
  }
  
  // ... destructor omitted
};

The constructor mainly performs two tasks:

(1) It initializes the variable _clet of type InterpreterCodelet and assigns values to the three attributes of the InterpreterCodelet instance;
(2) It creates an InterpreterMacroAssembler instance and assigns it to masm and _masm. This instance writes machine instructions into the InterpreterCodelet instance through the CodeBuffer.

The destructor is usually called automatically at the end of the enclosing code block. It commits the memory actually used by the InterpreterCodelet and resets the related variables.

1. CodeletMark constructor

The CodeletMark constructor allocates memory for the InterpreterCodelet from the StubQueue and initializes the related variables.

When the _clet variable is initialized, AbstractInterpreter::code() is called to return the value of the _code attribute of the AbstractInterpreter class; this value was initialized earlier in TemplateInterpreter::initialize(). The request() function of the StubQueue class is then called with the size needed to store the code, which is obtained by calling the codelet_size() function, as follows:

int codelet_size() {
  // Request the whole code buffer (minus a little for alignment).
  // The commit call below trims it back for each codelet.
  int codelet_size = AbstractInterpreter::code()->available_space() - 2*K;
  
  return codelet_size;
}

Note that when an InterpreterCodelet is created, almost all of the available memory in the StubQueue is allocated to that InterpreterCodelet instance. This looks very wasteful, but the destructor commits only the memory the InterpreterCodelet instance actually uses, so there is no need to worry about waste. The main reason for doing this is to store the InterpreterCodelet instances contiguously in memory. This has a very important application: whether a stack frame is an interpreted frame can be determined simply by checking the pc, as will be described in detail later.

Allocate memory from StubQueue by calling the StubQueue::request() function. The implementation of the function is as follows:

Stub* StubQueue::request(int  requested_code_size) {
 
  Stub* s = current_stub();
 
  int x = stub_code_size_to_size(requested_code_size);
  int requested_size = round_to( x , CodeEntryAlignment);  // CodeEntryAlignment=32
 
  // Compare the memory needed by the new InterpreterCodelet with the available memory
  if (requested_size <= available_space()) {
    if (is_contiguous()) { // is_contiguous() returns true when _queue_begin <= _queue_end
      // Queue: |...|XXXXXXX|.............|
      //        ^0  ^begin  ^end          ^size = limit
      assert(_buffer_limit == _buffer_size, "buffer must be fully usable");
      if (_queue_end + requested_size <= _buffer_size) {
        // code fits in at the end => nothing to do
        CodeStrings  strings;
        stub_initialize(s, requested_size, strings);
        return s; // there is enough room, so return directly
      } else {
        // stub doesn't fit in at the queue end
        // => reduce buffer limit & wrap around
        assert(!is_empty(), "just checkin'");
        _buffer_limit = _queue_end;
        _queue_end = 0;
      }
    }
  }
 
  // ...
 
  return NULL;
}

The function above shows clearly how memory for an InterpreterCodelet is allocated from the StubQueue.

The first step is to calculate how much memory needs to be allocated from the StubQueue this time. This is done by the stub_code_size_to_size() function and the functions it calls, which are implemented as follows:

// Function defined in the StubQueue class
int stub_code_size_to_size(int code_size) const {  
  return _stub_interface->code_size_to_size(code_size);
}
 
// Function defined in the InterpreterCodeletInterface class
virtual int  code_size_to_size(int code_size) const { 
    return InterpreterCodelet::code_size_to_size(code_size);
}
 
// Function defined in the InterpreterCodelet class
static  int code_size_to_size(int code_size) { 
  // CodeEntryAlignment = 32
  // sizeof(InterpreterCodelet)  = 32
  return round_to(sizeof(InterpreterCodelet), CodeEntryAlignment) + code_size;
}

With the size calculated in this way, the memory structure is as follows:

After StubQueue::request() has calculated the memory size to allocate from the StubQueue, it performs the allocation. The listing of StubQueue::request() above shows only the most common case, in which all InterpreterCodelet instances are allocated consecutively starting from the _stub_buffer address of the StubQueue. The is_contiguous() function determines whether the occupied area is contiguous, and is implemented as follows:

bool is_contiguous() const {
  return _queue_begin <= _queue_end;
} 

The available_space() function is called to get the size of the StubQueue available area, which is implemented as follows:

// Function defined in the StubQueue class
int available_space() const {
 int d = _queue_begin - _queue_end - 1;
 return d < 0 ? d + _buffer_size : d;
} 

The size obtained after calling the above function is the yellow area in the figure below.
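The arithmetic of available_space() is easier to follow with concrete numbers. The free function below is a hypothetical restatement that takes the three attributes as parameters; it is a sketch for experimentation, not the HotSpot member function itself.

```cpp
#include <cassert>

// Free bytes in a circular buffer of buffer_size bytes, using the same
// arithmetic as available_space(). One byte is always left unused, so
// queue_begin == queue_end can only mean "empty", never "full".
int available_space(int queue_begin, int queue_end, int buffer_size) {
    assert(buffer_size > 0);
    int d = queue_begin - queue_end - 1;
    // d < 0 means the used region does not wrap around: the free space is
    // everything outside [queue_begin, queue_end), minus the reserved byte.
    return d < 0 ? d + buffer_size : d;
}
```

For a 1024-byte buffer, an empty queue (begin = end = 0) reports 1023 free bytes; with begin = 0 and end = 256 it reports 767; and in the wrapped case begin = 512, end = 128 it reports 383, the gap between end and begin minus the reserved byte.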

Returning to StubQueue::request(): when the memory required by the InterpreterCodelet instance is available, the stub_initialize() function is called. It is implemented as follows:

// The Stub is always operated on through the StubInterface
void  stub_initialize(Stub* s, int size,CodeStrings& strings)    {
  // Operate on the Stub through _stub_interface; this ends up calling s's initialize() function
  _stub_interface->initialize(s, size, strings);
}
 
// Function defined in the InterpreterCodeletInterface class
virtual void  initialize(Stub* self, int size,CodeStrings& strings){
  cast(self)->initialize(size, strings);
}  
 
// Function defined in the InterpreterCodelet class
void initialize(int size,CodeStrings& strings) {
  _size = size;
}

We operate on a Stub through the functions defined in the StubInterface class. The reason is that there are many Stub instances: a C++ class that contains virtual functions needs space in every instance for a pointer to the virtual function table, so keeping virtual functions out of Stub avoids wasting that memory.
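The saving can be made visible with sizeof. The sketch below uses hypothetical types (PlainStub, VirtualStub, StubOps) modeled loosely on Stub and StubInterface; the exact overhead depends on the compiler and ABI, so only the direction of the difference is claimed.

```cpp
#include <cassert>
#include <cstddef>

// A stub header with no virtual functions: only the data.
struct PlainStub {
    int size;
};

// The same header if it declared virtual functions: common C++ ABIs add a
// hidden vtable pointer to every instance.
struct VirtualStub {
    int size;
    virtual void initialize(int s) { size = s; }
    virtual ~VirtualStub() = default;
};

// A single shared interface object carries the virtual functions instead.
struct StubOps {
    virtual void initialize(PlainStub* s, int size) const { s->size = size; }
    virtual int  size(const PlainStub* s) const { return s->size; }
    virtual ~StubOps() = default;
};

// Per-instance bytes saved by keeping virtual functions out of the stub.
constexpr std::size_t vtable_overhead() {
    return sizeof(VirtualStub) - sizeof(PlainStub);
}
```

One StubOps object can serve any number of PlainStub instances, so the cost of the vtable pointer is paid once per queue rather than once per stub.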

In the end, the three functions above do only one thing: they record the memory size allocated this time in the _size attribute of the InterpreterCodelet. As mentioned in the discussion of codelet_size(), this size usually leaves a lot of space unused after the machine instruction fragment has been stored. The destructor described below updates this attribute according to the size of the machine instructions actually generated in the InterpreterCodelet instance.

2. CodeletMark destructor

The implementation of the destructor is as follows:

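// Destructor
~CodeletMark() {
   // Align the InterpreterCodelet
   (*_masm)->align(wordSize);
  
   // Make sure all generated machine instruction fragments have been stored in the InterpreterCodelet instance
   (*_masm)->flush();
  
   // Update the related attribute values of the InterpreterCodelet instance
   AbstractInterpreter::code()->commit((*_masm)->code()->pure_insts_size(), (*_masm)->code()->strings());
  
   // Reset _masm so that no more machine instructions can be generated into this InterpreterCodelet instance through it
   *_masm = NULL;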
}

AbstractInterpreter::code() returns the StubQueue. (*_masm)->code()->pure_insts_size() returns the memory size actually required by the machine instruction fragment of the InterpreterCodelet instance.

The implementation of the StubQueue::commit() function is as follows:

void StubQueue::commit(int committed_code_size, CodeStrings& strings) {
  int x = stub_code_size_to_size(committed_code_size);
  int committed_size = round_to(x, CodeEntryAlignment);
 
  Stub* s = current_stub();
  assert(committed_size <= stub_size(s), "committed size must not exceed requested size");
 
  stub_initialize(s, committed_size, strings);
  _queue_end += committed_size;
  _number_of_stubs++;
}

stub_initialize() is called to record, in the _size attribute of the InterpreterCodelet instance, the memory actually occupied by the instance (the header plus the machine instruction fragment). The _queue_end and _number_of_stubs attributes of the StubQueue are updated at the same time, so that memory can be allocated for the next InterpreterCodelet instance.
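The interplay of request(), commit() and the CodeletMark constructor and destructor can be modeled with a toy allocator. Everything below is a simplified sketch under assumptions: ToyQueue and ToyMark are hypothetical names, only the contiguous case is handled, offsets replace real addresses, and the header size and alignment are both assumed to be 32.

```cpp
#include <cassert>

constexpr int kAlign  = 32;   // assumed CodeEntryAlignment
constexpr int kHeader = 32;   // assumed aligned header size

constexpr int round_to(int x, int a) { return (x + a - 1) & ~(a - 1); }

// Toy model of the contiguous case of StubQueue: request() reserves space
// at the queue end, commit() trims the reservation and advances the end.
class ToyQueue {
public:
    explicit ToyQueue(int buffer_size) : _buffer_size(buffer_size) {}

    int available_space() const { return _buffer_size - _queue_end - 1; }

    // Reserve room for code_size bytes of code; returns the stub's offset,
    // or -1 if it does not fit. The queue end does not move yet.
    int request(int code_size) {
        int requested = round_to(kHeader + code_size, kAlign);
        if (requested > available_space()) return -1;
        _current_size = requested;
        return _queue_end;
    }

    // Shrink the current stub to what was really emitted and advance.
    void commit(int committed_code_size) {
        int committed = round_to(kHeader + committed_code_size, kAlign);
        assert(committed <= _current_size);
        _current_size = 0;
        _queue_end += committed;
        ++_number_of_stubs;
    }

    int queue_end() const { return _queue_end; }
    int number_of_stubs() const { return _number_of_stubs; }

private:
    int _buffer_size;
    int _queue_end = 0;
    int _current_size = 0;
    int _number_of_stubs = 0;
};

// RAII helper in the spirit of CodeletMark: request almost everything in
// the constructor, commit only what was emitted in the destructor.
class ToyMark {
public:
    explicit ToyMark(ToyQueue& q)
        : _q(q), _offset(q.request(q.available_space() - 2 * 1024)) {
        assert(_offset >= 0);
    }
    ~ToyMark() { _q.commit(_emitted); }

    void emit(int nbytes) { _emitted += nbytes; }  // pretend to generate code
    int offset() const { return _offset; }

private:
    ToyQueue& _q;
    int _offset;
    int _emitted = 0;
};
```

With a 256 * 1024 byte queue, a first mark that emits 10 bytes leaves the queue end at 64, and a second mark that emits 100 bytes starts at offset 64 and leaves the queue end at 224, so the two codelets sit back to back even though each one temporarily reserved almost the whole buffer.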


Chapter 13-Store machine instruction fragments through InterpreterCodelet

The TemplateInterpreterGenerator::generate_all() function generates the machine instruction fragments for many bytecode instructions, as well as fragments that assist the virtual machine at run time. For example, the entry that throws a null pointer exception is generated as follows:

{
    CodeletMark cm(_masm, "throw exception entrypoints");
    // ...
    Interpreter::_throw_NullPointerException_entry = generate_exception_handler("java/lang/NullPointerException",NULL);
    // ...
}

The generate_exception_handler() function is called to generate the code fragment that throws a NullPointerException.

address generate_exception_handler(const char* name, const char* message) {
    return generate_exception_handler_common(name, message, false);
}

The implementation of the called generate_exception_handler_common() function is as follows:

address TemplateInterpreterGenerator::generate_exception_handler_common(
const char* name, 
const char* message, 
bool pass_oop
) {
 
  assert(!pass_oop || message == NULL, "either oop or message but not both");
  address entry = __ pc();
  if (pass_oop) {
    // object is at TOS
    __ pop(c_rarg2);
  }
 
  // expression stack must be empty before entering the VM if an
  // exception happened
  __ empty_expression_stack();
 
  // setup parameters
  __ lea(c_rarg1, ExternalAddress((address)name));
 
  if (pass_oop) {
    __ call_VM(rax,
               CAST_FROM_FN_PTR(address,InterpreterRuntime::create_klass_exception),
               c_rarg1,c_rarg2);
  } else {
    // kind of lame ExternalAddress can't take NULL because
    // external_word_Relocation will assert.
    if (message != NULL) {
      __ lea(c_rarg2, ExternalAddress((address)message));
    } else {
      __ movptr(c_rarg2, NULL_WORD);
    }
    __ call_VM(rax,
               CAST_FROM_FN_PTR(address, InterpreterRuntime::create_exception),
               c_rarg1, c_rarg2);
  }
 
  // throw exception
  __ jump(ExternalAddress(Interpreter::throw_exception_entry()));
 
  return entry;
}

The generated assembly code is as follows:

0x00007fffe10101cb: mov    -0x40(%rbp),%rsp
0x00007fffe10101cf: movq   $0x0,-0x10(%rbp)
0x00007fffe10101d7: movabs $0x7ffff6e09878,%rsi
0x00007fffe10101e1: movabs $0x0,%rdx
0x00007fffe10101eb: callq  0x00007fffe10101f5
0x00007fffe10101f0: jmpq   0x00007fffe1010288
0x00007fffe10101f5: lea    0x8(%rsp),%rax
0x00007fffe10101fa: mov    %r13,-0x38(%rbp)
0x00007fffe10101fe: mov    %r15,%rdi
0x00007fffe1010201: mov    %rbp,0x200(%r15)
0x00007fffe1010208: mov    %rax,0x1f0(%r15)
0x00007fffe101020f: test   $0xf,%esp
0x00007fffe1010215: je     0x00007fffe101022d
0x00007fffe101021b: sub    $0x8,%rsp
0x00007fffe101021f: callq  0x00007ffff66b3fbc
0x00007fffe1010224: add    $0x8,%rsp
0x00007fffe1010228: jmpq   0x00007fffe1010232
0x00007fffe101022d: callq  0x00007ffff66b3fbc
0x00007fffe1010232: movabs $0x0,%r10
0x00007fffe101023c: mov    %r10,0x1f0(%r15)
0x00007fffe1010243: movabs $0x0,%r10
0x00007fffe101024d: mov    %r10,0x200(%r15)
0x00007fffe1010254: cmpq   $0x0,0x8(%r15)
0x00007fffe101025c: je     0x00007fffe1010267
0x00007fffe1010262: jmpq   0x00007fffe1000420
0x00007fffe1010267: mov    0x250(%r15),%rax
0x00007fffe101026e: movabs $0x0,%r10
0x00007fffe1010278: mov    %r10,0x250(%r15)
0x00007fffe101027f: mov    -0x38(%rbp),%r13
0x00007fffe1010283: mov    -0x30(%rbp),%r14
0x00007fffe1010287: retq   
0x00007fffe1010288: jmpq   0x00007fffe100f3d3

The point here is not to understand the logic of the TemplateInterpreterGenerator::generate_exception_handler_common() function or the generated assembly code, but to see how CodeletMark is used and how the machine instructions generated by generate_exception_handler_common() are written into the InterpreterCodelet instance. The InterpreterCodelet and CodeBuffer classes have been introduced before, as follows:

The memory area storing machine instruction fragments of the InterpreterCodelet instance is operated by CodeBuffer, and the code section (CodeSection) in CodeBuffer is assigned to AbstractAssembler::_code_section. In this way, we can write machine instructions to the InterpreterCodelet instance through the _code_section attribute.

The _masm parameter passed to CodeletMark is defined in the AbstractInterpreterGenerator class, as follows:

class AbstractInterpreterGenerator: public StackObj {
   protected:
      InterpreterMacroAssembler* _masm;
      // ...
}

The __ in the generate_exception_handler_common() function is a macro defined as follows:

#define __ _masm->

This actually calls the relevant functions of the InterpreterMacroAssembler class to write machine instructions, for example:

__ pop(c_rarg2);

The pop() function called is as follows:

// Defined in InterpreterMacroAssembler
void pop(Register r ) {
  ((MacroAssembler*)this)->pop(r);
}
 
// Defined in the Assembler class
void Assembler::pop(Register dst) {
  int encode = prefix_and_encode(dst->encoding());
  emit_int8(0x58 | encode);
}
 
// Defined in the AbstractAssembler class
void emit_int8(   int8_t  x) { 
   code_section()->emit_int8(   x); 
}

The code_section() function gets the value of the _code_section attribute of AbstractAssembler.
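To see which bytes such a call chain produces, the following miniature assembler emits the x86-64 pop instruction into a byte vector. It is a sketch under assumptions: TinyAssembler is a hypothetical class, registers are passed as their hardware numbers (2 for rdx, 13 for r13), and the vector stands in for the CodeSection.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature of the Assembler -> CodeSection chain: the
// "code section" is just a growable byte vector.
class TinyAssembler {
public:
    // Emit x86-64 'pop r64' for register number 0..15.
    // The opcode is 0x58 plus the low three bits of the register number;
    // registers 8..15 additionally need a REX.B prefix (0x41).
    void pop(int reg) {
        assert(0 <= reg && reg < 16);
        int encode = prefix_and_encode(reg);
        emit_int8(static_cast<std::uint8_t>(0x58 | encode));
    }

    const std::vector<std::uint8_t>& code() const { return _code; }

private:
    // Emit the prefix if needed and return the low three bits.
    int prefix_and_encode(int reg) {
        if (reg >= 8) {
            emit_int8(0x41);
            reg -= 8;
        }
        return reg;
    }
    void emit_int8(std::uint8_t x) { _code.push_back(x); }

    std::vector<std::uint8_t> _code;
};
```

Popping into rdx (register 2, which is what c_rarg2 appears to map to in the assembly listing above) produces the single byte 0x5A, while popping into r13 produces the two bytes 0x41 0x5D.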


Chapter 14-Generate important routines

The TemplateInterpreter::initialize() function was introduced before. In this function, the template table and the StubQueue instance are initialized, and the InterpreterGenerator instance is created in the following way:

InterpreterGenerator g(_code);

The generate_all() function is called when creating an InterpreterGenerator instance, as follows:

InterpreterGenerator::InterpreterGenerator(StubQueue* code)
  : TemplateInterpreterGenerator(code) {
   generate_all(); 
}

The generate_all() function generates various routines (machine instruction fragments) and stores them in InterpreterCodelet instances. In the HotSpot VM there are not only routines corresponding to bytecodes but also many routines that assist the virtual machine at run time, such as the entry_point routines for ordinary method entry introduced earlier, routines for handling exceptions, and so on. These routines are all stored in the StubQueue, as shown in the following figure.

Some important routines generated are shown in the following table.

Among them, the entries for non-native methods, the entries for native methods, and the entries for bytecodes are the most important, and they are the focus of later articles. This article covers the entries for non-native methods and for bytecodes. The entries for native methods will be covered in detail when native methods are introduced, so they are not discussed further here.

1. Entry points of non-native methods

As mentioned earlier when introducing the creation of Java stack frames for ordinary non-native methods, the main non-native method entries are as follows:

enum MethodKind {
    zerolocals,  // ordinary method
    zerolocals_synchronized,  // ordinary synchronized method
    ...
}

In the generate_all() function, the entries for ordinary methods and ordinary synchronized methods are generated as follows:

{
 CodeletMark cm(_masm, "method entry point (kind = " "zerolocals" ")");
 Interpreter::_entry_table[Interpreter::zerolocals] = generate_method_entry(Interpreter::zerolocals);
}
{
 CodeletMark cm(_masm, "method entry point (kind = " "zerolocals_synchronized" ")");
 Interpreter::_entry_table[Interpreter::zerolocals_synchronized] = generate_method_entry(Interpreter::zerolocals_synchronized);
}

The generate_method_entry() function called here was described in detail in Chapter 6. It generates a routine that creates a Java stack frame, and the first address of that routine is stored in the Interpreter::_entry_table array.

The stack frame setup and special handling for synchronized methods will be covered in detail together with locking, so they are not discussed here.

In addition to ordinary methods, dedicated entry addresses are generated for some special methods, such as the routines for java.lang.Math.sin(), java.lang.Math.cos() and similar methods. Interested readers can study these on their own; they are not covered in detail here.

2. Bytecode entry

In the generate_all() function, the set_entry_points_for_all_bytes() function is called. This function generates routines for all defined bytecodes and saves the entries through the corresponding attributes. These entries point to the first address of the routine. The implementation of set_entry_points_for_all_bytes() function is as follows:

void TemplateInterpreterGenerator::set_entry_points_for_all_bytes() {
  for (int i = 0; i < DispatchTable::length; i++) {
     Bytecodes::Code code = (Bytecodes::Code)i;
     if (Bytecodes::is_defined(code)) {
         set_entry_points(code);
     } else {
         set_unimplemented(i);
     }
  }
}

When code is a bytecode instruction defined in the Java virtual machine specification, the set_entry_points() function is called. This function fetches the Template corresponding to the bytecode instruction, calls set_short_entry_points() to process it, and saves the entry addresses in the dispatch table (DispatchTable) _normal_table, or in _wentry_point for instructions used with the wide prefix. Template was introduced earlier: each bytecode instruction corresponds to one Template, which holds the information needed to generate the code routine for that bytecode instruction.

The implementation of the set_entry_points() function is as follows:

void TemplateInterpreterGenerator::set_entry_points(Bytecodes::Code code) {
  CodeletMark cm(_masm, Bytecodes::name(code), code);
 
  address bep = _illegal_bytecode_sequence;
  address cep = _illegal_bytecode_sequence;
  address sep = _illegal_bytecode_sequence;
  address aep = _illegal_bytecode_sequence;
  address iep = _illegal_bytecode_sequence;
  address lep = _illegal_bytecode_sequence;
  address fep = _illegal_bytecode_sequence;
  address dep = _illegal_bytecode_sequence;
  address vep = _unimplemented_bytecode;
  address wep = _unimplemented_bytecode;
 
  // Handle non-wide instructions; note this refers to bytecode instructions that cannot be prefixed with wide
  if (Bytecodes::is_defined(code)) {
     Template* t = TemplateTable::template_for(code);
     set_short_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep);
  }
 
  // Handle wide instructions; note this refers to bytecode instructions that can be prefixed with wide
  if (Bytecodes::wide_is_defined(code)) {
     Template* t = TemplateTable::template_for_wide(code);
     set_wide_entry_point(t, wep);
  }
 
  // A non-wide instruction has 9 entries in total; a wide instruction has only one entry
  EntryPoint  entry(bep, cep, sep, aep, iep, lep, fep, dep, vep);
  Interpreter::_normal_table.set_entry(code, entry);
  Interpreter::_wentry_point[code] = wep;
}

Note that a variable cm is declared at the beginning of the function. At that point the CodeletMark constructor is called, which creates in the StubQueue an InterpreterCodelet instance for storing machine instruction fragments, so the machine instructions generated by functions such as TemplateInterpreterGenerator::set_short_entry_points() are written into this instance. When the function returns, the CodeletMark destructor commits the memory actually used and resets the related attribute values.

The next step assigns initial values to the variables that represent the Top-of-Stack Caching state (TOSCA, or Tos for short). The _illegal_bytecode_sequence and _unimplemented_bytecode variables also point to the entry addresses of specific routines, which are generated in the generate_all() function. Interested readers can study how these routines handle illegal bytecodes.

Calling set_short_entry_points() requires the top-of-stack cache state to be passed in; that is, the result produced by the previous bytecode may still be held in a register when the current bytecode starts executing. The main purpose of top-of-stack caching is to make interpreted execution more efficient. The HotSpot VM defines 9 TosStates, represented by enumeration constants, as follows:

enum TosState {      // describes the tos cache contents
  btos = 0,          // byte, bool tos cached
  ctos = 1,          // char tos cached
  stos = 2,          // short tos cached
  itos = 3,          // int tos cached
  ltos = 4,          // long tos cached
  ftos = 5,          // float tos cached
  dtos = 6,          // double tos cached
  atos = 7,          // object cached
  vtos = 8,          // tos not cached
  number_of_states,
  ilgl               // illegal state: should not occur
};
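One way to picture how these states are used is a two-dimensional table with one row per TosState and one column per bytecode value, where every slot starts out pointing at a fallback routine. The sketch below is an illustration only: ToyDispatchTable and its member names are hypothetical, addresses are plain pointers, and the real DispatchTable is not reproduced here.

```cpp
#include <array>
#include <cassert>

// Same order as the enumeration above, without the ilgl marker.
enum TosState { btos = 0, ctos, stos, itos, ltos, ftos, dtos, atos, vtos, number_of_states };

using address = const void*;   // hypothetical stand-in for a code address

// One row per TosState, one column per bytecode value (0..255).
class ToyDispatchTable {
public:
    static constexpr int length = 256;

    // Every slot initially points at a fallback routine.
    explicit ToyDispatchTable(address fallback) {
        for (auto& row : _table) row.fill(fallback);
    }

    // Record the entry used when 'bytecode' is reached in state 'state'.
    void set_entry(int bytecode, TosState state, address entry) {
        assert(0 <= bytecode && bytecode < length && state < number_of_states);
        _table[state][bytecode] = entry;
    }

    address entry(int bytecode, TosState state) const {
        assert(0 <= bytecode && bytecode < length && state < number_of_states);
        return _table[state][bytecode];
    }

private:
    std::array<std::array<address, length>, number_of_states> _table;
};
```

Registering an entry for opcode 0x03 (iconst_0) only in the vtos row leaves the other eight rows pointing at the fallback, which mirrors how set_entry_points() starts every entry at an illegal or unimplemented routine and overwrites only the valid ones.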

Taking non-wide instructions as an example, bep (byte entry point), cep, sep, aep, iep, lep, fep, dep and vep are the entry addresses used when the state of the top-of-stack element before the instruction executes is byte/boolean, char, short, array/reference (object reference), int, long, float, double and void, respectively. For example, iconst_0 pushes the constant 0 onto the stack, and its bytecode instruction template is defined as follows:

def(Bytecodes::_iconst_0 , ____|____|____|____, vtos, itos, iconst,0);

The third parameter specifies tos_in, the fourth parameter is tos_out, tos_in and tos_out are the TosState before and after the instruction is executed. In other words, there is no n

