Go function calling convention

This article addresses one question: why can Go functions return multiple values, while functions in C/C++ and Java cannot? The answer lies in what is called the function calling convention.
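To make the question concrete, here is a minimal sketch of a Go function with two return values. The name divMod is hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// divMod is a hypothetical helper that returns both the quotient and the
// remainder of a / b in a single call.
func divMod(a, b int) (int, int) {
	return a / b, a % b
}

func main() {
	q, r := divMod(17, 5)
	fmt.Println(q, r) // prints "3 2"
}
```

In C, the same result would typically require a struct return type or output pointers.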

Calling convention

In program code, functions are the smallest functional unit, and program execution is essentially functions calling one another. For a call to work, the caller and the callee must follow an agreed set of rules and interpret them identically. This set of rules is called the function calling convention.

Function calling conventions are often specified by the compiler. This article mainly focuses on two points:

Are function parameters (inputs and outputs) passed through the stack or through registers?
If they are passed through the stack, are they pushed from left to right or from right to left?

Stack

The stack is one of the most important concepts in modern computer programs. Without a stack, there would be no functions and no local variables. The stack stores the bookkeeping information needed for a function call, which is commonly called a stack frame (Stack Frame) or an activation record (Activation Record). A stack frame generally contains the following:

The return address and parameters of the function.
Temporary variables: including non-static local variables of the function and other temporary variables automatically generated by the compiler.
Saved context information: including registers that need to be kept unchanged before and after the function call.

A stack frame is described by two registers: the stack pointer SP, which points to the top of the stack, and the base pointer BP, which holds the reference address of the current stack frame. A typical function activation record can therefore be depicted as follows

The data after the parameters is the activation record of the current function. BP stays fixed at the position shown in the figure for the duration of the call, which makes it a convenient base for indexing parameters and variables. SP always points to the top of the stack and keeps changing as the function executes. Just before BP is the function's return address, located at BP+4 on 32-bit machines and BP+8 on 64-bit machines, and before that are the parameters pushed onto the stack. The slot that BP itself points to holds the value BP had before the call, so when the function returns, BP can be restored by reading that slot.
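The layout described above can be modelled with a toy simulation. The sketch below is not how any real CPU or compiler is implemented; the machine type, its methods, and the sample addresses are all assumptions made up for illustration. It models a 64-bit, downward-growing stack and shows that after function entry the saved BP sits at BP, the return address at BP+8, and the first parameter at BP+16.

```go
package main

import "fmt"

const wordSize = 8 // bytes per stack slot on a 64-bit machine

// machine is a hypothetical toy model of a downward-growing stack.
type machine struct {
	mem []uint64 // one element per 8-byte slot
	sp  int      // byte address of the top of the stack
	bp  int      // byte address of the current frame base
}

func newMachine(slots int) *machine {
	return &machine{mem: make([]uint64, slots), sp: slots * wordSize}
}

// push moves SP down by one slot and stores v there.
func (m *machine) push(v uint64) {
	m.sp -= wordSize
	m.mem[m.sp/wordSize] = v
}

// load reads the slot at the given byte address.
func (m *machine) load(addr int) uint64 { return m.mem[addr/wordSize] }

// call models function entry: parameters are pushed right to left, then the
// return address, then the caller's BP; finally BP is set to SP.
func (m *machine) call(retAddr uint64, args ...uint64) {
	for i := len(args) - 1; i >= 0; i-- {
		m.push(args[i])
	}
	m.push(retAddr)
	m.push(uint64(m.bp))
	m.bp = m.sp
}

func main() {
	m := newMachine(16)
	m.bp = 112 // pretend value of the caller's BP
	m.call(0x401000, 10, 20)

	fmt.Printf("[BP]    saved BP       = %d\n", m.load(m.bp))
	fmt.Printf("[BP+8]  return address = %#x\n", m.load(m.bp+wordSize))
	fmt.Printf("[BP+16] first argument = %d\n", m.load(m.bp+2*wordSize))
}
```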

Assembly code analysis

Next, let's compare and analyze the differences between C and Go calling conventions.

C calling convention

Suppose we have a C source file main.c in which the main function calls the add function. The full code is as follows.

// main.c
int add(int arg1, int arg2, int arg3, int arg4,int arg5, int arg6,int arg7, int arg8) {
    return arg1 + arg2 + arg3 + arg4 + arg5 + arg6 + arg7 + arg8;
}

int main() {
    int i = add(10, 20, 30, 40, 50, 60, 70, 80);
}

We use the clang compiler to compile on the x86_64 platform.

$ clang -v
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin19.5.0

The assembly code obtained after main.c is compiled is as follows

 $ clang -S main.c
  ...
_main:                                
  ...
    subq    $32, %rsp      
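    movl    $10, %edi    // place argument 1 in the edi register
    movl    $20, %esi    // place argument 2 in the esi register
    movl    $30, %edx    // place argument 3 in the edx register
    movl    $40, %ecx    // place argument 4 in the ecx register
    movl    $50, %r8d    // place argument 5 in the r8d register
    movl    $60, %r9d    // place argument 6 in the r9d register
    movl    $70, (%rsp)  // place argument 7 on the stack
    movl    $80, 8(%rsp) // place argument 8 on the stack
    callq    _add         // call the add function
    xorl    %ecx, %ecx
    movl    %eax, -4(%rbp)
    movl    %ecx, %eax  // the return value (0 for main) is returned in eax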
    addq    $32, %rsp
    popq    %rbp
    retq
  ...  
_add:                                 
  ...
    movl    24(%rbp), %eax  
    movl    16(%rbp), %r10d 
    movl    %edi, -4(%rbp)  // copy the value in edi onto the stack
    movl    %esi, -8(%rbp)  // copy the value in esi onto the stack
    movl    %edx, -12(%rbp) // copy the value in edx onto the stack
    movl    %ecx, -16(%rbp) // copy the value in ecx onto the stack
    movl    %r8d, -20(%rbp) // copy the value in r8d onto the stack
    movl    %r9d, -24(%rbp) // copy the value in r9d onto the stack
    movl    -4(%rbp), %ecx  // load the value 10 from the stack into ecx
    addl    -8(%rbp), %ecx  // in effect: ecx = ecx + 20
    addl    -12(%rbp), %ecx // ecx = ecx + 30
    addl    -16(%rbp), %ecx // ecx = ecx + 40
    addl    -20(%rbp), %ecx // ecx = ecx + 50
    addl    -24(%rbp), %ecx // ecx = ecx + 60
    addl    16(%rbp), %ecx  // ecx = ecx + 70
    addl    24(%rbp), %ecx  // ecx = ecx + 80
    movl    %eax, -28(%rbp)
    movl    %ecx, %eax      // the return value is returned in eax
    popq    %rbp
    retq
  ...

Therefore, before the main function calls the add function, its parameters are stored as shown in the figure below

The data after calling the add function is stored as shown in the figure below

Therefore, for the default C calling convention on this platform (the System V AMD64 convention on x86_64, not the 32-bit cdecl convention), we can draw the following conclusions

When a function has no more than six integer parameters, they are passed in order in six registers: edi, esi, edx, ecx, r8d and r9d (the 32-bit halves of rdi, rsi, rdx, rcx, r8 and r9);
When there are more than six parameters, the excess parameters are passed on the stack, pushed in order from right to left
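The rule observed in the listing can be written down as a small lookup. This is only a sketch of the behaviour seen for this example, restricted to int parameters; the names intArgRegs and argLocation are hypothetical, and the stack offsets are relative to rsp in the caller at the moment of the call.

```go
package main

import "fmt"

// intArgRegs lists the registers seen in the clang output for the first six
// int arguments, in order.
var intArgRegs = []string{"edi", "esi", "edx", "ecx", "r8d", "r9d"}

// argLocation reports where the i-th (0-based) int argument is placed:
// a register for the first six, then 8-byte stack slots starting at (%rsp).
func argLocation(i int) string {
	if i < len(intArgRegs) {
		return "%" + intArgRegs[i]
	}
	offset := (i - len(intArgRegs)) * 8
	if offset == 0 {
		return "(%rsp)"
	}
	return fmt.Sprintf("%d(%%rsp)", offset)
}

func main() {
	// Mirrors add(10, 20, 30, 40, 50, 60, 70, 80) from main.c.
	for i := 0; i < 8; i++ {
		fmt.Printf("arg%d -> %s\n", i+1, argLocation(i))
	}
}
```

For the eight-argument add call, this places arg7 at (%rsp) and arg8 at 8(%rsp), matching the two movl instructions that precede callq.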

The return value is passed through registers. Depending on its size, there are three cases. The sizes below are those of the 32-bit x86 convention; on x86_64 the same scheme applies to integer data with the 64-bit registers rax and rdx and thresholds of 8 and 16 bytes.

4 bytes or fewer: the return value is stored in the eax register, and the caller reads the value from eax
5 to 8 bytes: the eax and edx registers are used together to return the value
More than 8 bytes: the caller first reserves extra space, temp, on the stack and passes the address of temp to the function as a hidden parameter. Before returning, the function copies the data into temp and passes the address of temp back in the eax register. The caller then copies the content from the temp object pointed to by eax.

As you can see, because the return value is passed through registers, a C function can have only one return value. This answers why C cannot return multiple values from a function.

Go function calling convention

Suppose we have a Go source file main.go that mirrors the C example: the main function calls the add function. The full code is as follows.

package main

func add(arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8 int) int {
    return arg1 + arg2 + arg3 + arg4 + arg5 + arg6 + arg7 + arg8
}

func main() {
    _ = add(10, 20, 30, 40, 50, 60, 70, 80)
}

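Compiling with the go tool compile -S -N -l main.go command yields the following assembly code. The listing comes from a toolchain that uses the stack-based calling convention; since Go 1.17 the compiler passes arguments in registers on amd64, so newer versions produce different output.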

"".main STEXT size=122 args=0x0 locals=0x50
        // 80 is the stack frame size in bytes; 0 is the total size of the input and output parameters
        0x0000 00000 (main.go:7)        TEXT    "".main(SB), ABIInternal, $80-0
        ...
        0x000f 00015 (main.go:7)        SUBQ    $80, SP
        0x0013 00019 (main.go:7)        MOVQ    BP, 72(SP)
        0x0018 00024 (main.go:7)        LEAQ    72(SP), BP
        ...
        0x001d 00029 (main.go:8)        MOVQ    $10, (SP)  // store the arguments on the stack
        0x0025 00037 (main.go:8)        MOVQ    $20, 8(SP)
        0x002e 00046 (main.go:8)        MOVQ    $30, 16(SP)
        0x0037 00055 (main.go:8)        MOVQ    $40, 24(SP)
        0x0040 00064 (main.go:8)        MOVQ    $50, 32(SP)
        0x0049 00073 (main.go:8)        MOVQ    $60, 40(SP)
        0x0052 00082 (main.go:8)        MOVQ    $70, 48(SP)
        0x005b 00091 (main.go:8)        MOVQ    $80, 56(SP)
        0x0064 00100 (main.go:8)        PCDATA  $1, $0
        0x0064 00100 (main.go:8)        CALL    "".add(SB) // call the add function
        0x0069 00105 (main.go:9)        MOVQ    72(SP), BP
        0x006e 00110 (main.go:9)        ADDQ    $80, SP
        0x0072 00114 (main.go:9)        RET
        ...

"".add STEXT nosplit size=55 args=0x48 locals=0x0
        // add's stack frame size is 0 bytes; 72 is the total size in bytes of 8 inputs + 1 output
        0x0000 00000 (main.go:3)        TEXT    "".add(SB), NOSPLIT|ABIInternal, $0-72
        ...
        0x0000 00000 (main.go:3)        MOVQ    $0, "".~r8+72(SP)  // initialize the return value to 0
        0x0009 00009 (main.go:4)        MOVQ    "".arg1+8(SP), AX  // load the first value from the stack into AX
        0x000e 00014 (main.go:4)        ADDQ    "".arg2+16(SP), AX // AX = AX + 20
        0x0013 00019 (main.go:4)        ADDQ    "".arg3+24(SP), AX
        0x0018 00024 (main.go:4)        ADDQ    "".arg4+32(SP), AX
        0x001d 00029 (main.go:4)        ADDQ    "".arg5+40(SP), AX
        0x0022 00034 (main.go:4)        ADDQ    "".arg6+48(SP), AX
        0x0027 00039 (main.go:4)        ADDQ    "".arg7+56(SP), AX
        0x002c 00044 (main.go:4)        ADDQ    "".arg8+64(SP), AX
        0x0031 00049 (main.go:4)        MOVQ    AX, "".~r8+72(SP)  // store the result in AX into its slot on the stack
        0x0036 00054 (main.go:4)        RET
        ...

Similarly, when we call the add function from the main function, the parameters are stored and visualized as shown below

Here we can see that the arguments of add are placed on the stack in the same order as in C, from right to left: the last parameter sits at SP+56~SP+64, nearest the bottom of the stack, and the first parameter sits at the top of the stack, at SP~SP+8.
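The offsets in the listing follow a simple pattern, sketched below. The function frameLayout is hypothetical and assumes every argument and result is an 8-byte int, so no alignment padding is modelled. Offsets are relative to SP in the caller; inside the callee they appear 8 bytes higher because CALL pushes the return address.

```go
package main

import "fmt"

const slot = 8 // each int argument or result occupies one 8-byte slot on amd64

// frameLayout returns the caller-side SP offsets of each argument and result
// for a stack-based call, plus the total size of the argument area in bytes.
func frameLayout(nArgs, nResults int) (argOffs, resOffs []int, total int) {
	for i := 0; i < nArgs; i++ {
		argOffs = append(argOffs, i*slot)
	}
	for i := 0; i < nResults; i++ {
		resOffs = append(resOffs, (nArgs+i)*slot)
	}
	return argOffs, resOffs, (nArgs + nResults) * slot
}

func main() {
	// Mirrors add: 8 int inputs and 1 int output.
	args, res, total := frameLayout(8, 1)
	fmt.Println(args)  // [0 8 16 24 32 40 48 56]
	fmt.Println(res)   // [64]
	fmt.Println(total) // 72
}
```

The total of 72 bytes matches the $0-72 in the TEXT directive of add, and the result offset of 64 corresponds to ~r8+72(SP) as seen from inside add.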

The data after calling the add function is stored as shown in the figure below

Note the difference from the call in C. Because the parameters are passed on the stack, there is no need to copy parameters from registers onto the stack. In this example, the add frame reads the data directly from the main frame's stack for its calculation. It accumulates the result in the AX register and finally writes the return value back to the stack, in the slot directly above the last input parameter.

Therefore, we know that both the input and output parameters of a Go function are passed through the stack. To return multiple values, the caller only needs to reserve more memory on the stack. This answers the question posed at the beginning of the article.
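The idea can be imitated in ordinary Go code. The sketch below is an analogy, not the compiler's actual mechanism: a slice stands in for the caller's argument area, and the hypothetical functions callOnStack and sumAndProduct show that supporting a second result only requires reserving one more slot.

```go
package main

import "fmt"

// callOnStack models a stack-based call: the caller reserves one slot per
// argument and per result, copies the arguments in, and lets the callee read
// its inputs and write its outputs in that same region.
func callOnStack(fn func(frame []int64, nArgs int), args []int64, nResults int) []int64 {
	frame := make([]int64, len(args)+nResults) // caller-owned argument area
	copy(frame, args)
	fn(frame, len(args))
	return frame[len(args):] // the result slots sit above the inputs
}

// sumAndProduct is a hypothetical callee with two results.
func sumAndProduct(frame []int64, nArgs int) {
	sum, prod := int64(0), int64(1)
	for _, v := range frame[:nArgs] {
		sum += v
		prod *= v
	}
	frame[nArgs] = sum
	frame[nArgs+1] = prod
}

func main() {
	res := callOnStack(sumAndProduct, []int64{2, 3, 4}, 2)
	fmt.Println(res) // [9 24]
}
```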

Summary

C and Go implement their function calling conventions differently. C uses both registers and the stack to pass parameters. In Go, apart from the temporary use of accumulator registers such as AX during computation, all parameters are passed through the stack.

Every choice has its advantages and disadvantages. The C implementation gives more weight to performance, while the Go implementation gives more weight to keeping complexity low. Below, we compare the two calling conventions in detail.

C language way

The CPU accesses registers significantly faster than it accesses the stack;

Registers differ between platforms, so register-passing rules must be defined for each architecture;

When there are too many parameters, both registers and the stack must be used, which increases implementation complexity, and at that point function call performance no longer differs much from the Go approach;

Only one return value is supported.

Go language way

It follows Go's cross-platform compilation philosophy: everything is passed through the stack, so register differences between architectures are not a concern;

With few parameters, function call performance is lower than in C;

The compiler is easier to maintain;

Multiple return values are supported.


机器铃砍菜刀
