First, let's look at how the official Go team defines slices in A Tour of Go:
An array has a fixed size. A slice, on the other hand, is a dynamically-sized, flexible view into the elements of an array. In practice, slices are much more common than arrays.
Simply put, a slice is a dynamically sized, flexible "view" built on top of an array, and in real projects slices are used far more often than arrays. The code below shows how to explicitly create slices over an array and slices of other slices (in practice, slices are more commonly created with the make built-in function):
//
// main.go
//
func main() {
arr := [...]int{1, 3, 5, 7, 9}
s1 := arr[:3]
fmt.Printf("s1 = %v, len = %d, cap = %d\n", s1, len(s1), cap(s1))
// s1 = [1 3 5], len = 3, cap = 5
s2 := arr[2:]
fmt.Printf("s2 = %v, len = %d, cap = %d\n", s2, len(s2), cap(s2))
// s2 = [5 7 9], len = 3, cap = 3
s3 := s1[3:]
fmt.Printf("s3 = %v, len = %d, cap = %d\n", s3, len(s3), cap(s3))
// s3 = [], len = 0, cap = 2
s4 := s1[2:3:3]
fmt.Printf("s4 = %v, len = %d, cap = %d\n", s4, len(s4), cap(s4))
// s4 = [5], len = 1, cap = 1
s5 := []int{2, 4, 6, 8}
fmt.Printf("s5 = %v, len = %d, cap = %d\n", s5, len(s5), cap(s5))
// s5 = [2 4 6 8], len = 4, cap = 4
}
Note that slices taken from other slices, as well as slices created directly (for example with make), can also be seen this way: implicitly, each of them still has an underlying array (either a newly allocated anonymous array or the underlying array referenced by another slice), so this does not contradict the definition of a slice given above.
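To make this concrete, here is a small sketch of my own (not from the original article) showing that a slice created with make also sits on top of an (anonymous) underlying array, which re-slices of it share:
package main

import "fmt"

func main() {
	s := make([]int, 3, 5) // anonymous underlying array of 5 ints
	t := s[1:3]            // another view into the same underlying array
	t[0] = 42              // writes into the shared array
	fmt.Println(s)         // [0 42 0] -- the change is visible through s as well
}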
Convention
This article is based on the go1.16 source code. Since go1.17, Go has used a register-based calling convention instead of the stack-based one. That change makes compiled binaries smaller and brings a slight performance improvement, but it also makes the assembly a little harder to analyze. To keep the call analysis simple and to show some runtime details more clearly, go1.16 is used throughout this article.
For more information on "register-based calling conventions", please refer to the following:
- Proposal: Register-based Go calling convention
- Discussion: switch to a register-based calling convention for Go functions
- src/cmd/compile/ABI-internal.md
The underlying data structure of the slice
A slice in Go is actually a struct with three fields, defined in runtime/slice.go:
//
// runtime/slice.go
//
type slice struct {
	array unsafe.Pointer // pointer to the underlying array: the starting address of a block of memory
	len   int            // length of the slice
	cap   int            // capacity of the slice
}
When we need to manipulate this underlying data structure directly, we can use the equivalent definition exported by the reflect package:
//
// reflect/value.go
//
// SliceHeader is the runtime representation of a slice.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
Data uintptr
Len int
Cap int
}
If this still feels abstract, the following picture may help (Data points to the first element of the underlying array):
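If you would rather see those three fields directly, here is a minimal sketch of my own (not from the original article) that prints a slice's header via reflect.SliceHeader; the exact Data address will of course differ on every run:
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	s := make([]int, 2, 8)
	// Reinterpret the slice variable as its runtime header.
	hdr := (*reflect.SliceHeader)(unsafe.Pointer(&s))
	fmt.Printf("Data = %#x, Len = %d, Cap = %d\n", hdr.Data, hdr.Len, hdr.Cap)
	// e.g. Data = 0xc00001a100, Len = 2, Cap = 8
}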
Slicing Tips and Pitfalls
Looked at on its own, the data in memory is just a meaningless string of 0s and 1s; what gives it meaning is how our program interprets it. Since the Data field of a slice is simply an address pointing to a block of memory, we can reinterpret that memory in some "dangerous" ways to do things that Go's syntax normally does not allow. PLEASE NOTE: when using these tricks, make sure you understand exactly what you are doing and what the side effects are.
Zero-copy conversion of strings to byte slices
A string in Go is backed by a fixed-length, read-only byte array, and we know that a slice is a view built on top of an array, so we can do this:
func String(bs []byte) (s string) {
hdr := (*reflect.StringHeader)(unsafe.Pointer(&s))
hdr.Data = (*reflect.SliceHeader)(unsafe.Pointer(&bs)).Data
hdr.Len = (*reflect.SliceHeader)(unsafe.Pointer(&bs)).Len
return
}
func Bytes(s string) (bs []byte) {
hdr := (*reflect.SliceHeader)(unsafe.Pointer(&bs))
hdr.Data = (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
hdr.Len = (*reflect.StringHeader)(unsafe.Pointer(&s)).Len
hdr.Cap = hdr.Len
return
}
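As a usage sketch of my own (assuming the String and Bytes helpers above plus the fmt package are in scope): the conversions allocate and copy nothing, but the resulting []byte must be treated as read-only, because it aliases the string's immutable backing memory:
func main() {
	bs := Bytes("hello")        // no allocation, no copy
	fmt.Println(bs, String(bs)) // [104 101 108 108 111] hello
	// bs[0] = 'H'              // do NOT write through bs: the string's memory is read-only
}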
An interview pitfall question
First of all, we know that Go passes arguments by value (and a slice is internally just a small struct). With that in mind, let's look at a trap:
func AppendSlice(s []int) {
s = append(s, 1234)
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [1234], len = 1, cap = 8
}
func main() {
s := make([]int, 0, 8)
AppendSlice(s)
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [], len = 0, cap = 8
}
The slice s here has enough capacity to hold the element added by append, so why is the slice printed in main still empty? Think about it for a moment before checking the answer.
Answer: a slice is a struct at runtime, and Go passes arguments by value, so we can look at this code another way:
type Slice struct {
Data uintptr
Len int
Cap int
}
func AppendSlice(s Slice) {
if s.Len+1 > s.Cap {
// grow slice ...
}
	*(*int)(unsafe.Pointer(s.Data + uintptr(s.Len)*8)) = 1234
s.Len += 1
}
func main() {
s := Slice{Data: 0x12345, Len: 0, Cap: 8}
AppendSlice(s)
fmt.Printf("s = %+v, len = %d, cap = %d\n", s, s.Len, s.Cap)
}
By now you can probably see the point: the value passed to append really was written to memory, but because s.len in main is still 0, we simply cannot see that element. We can recover it like this:
func main() {
s := make([]int, 0, 8)
AppendSlice(s)
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [], len = 0, cap = 8
(*reflect.SliceHeader)(unsafe.Pointer(&s)).Len = 1
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [1234], len = 1, cap = 8
}
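For completeness, the usual way to avoid this pitfall (a sketch of my own, not from the original article) is to return the appended slice and reassign it in the caller, exactly as the built-in append expects:
package main

import "fmt"

// AppendSlice returns the updated slice header so the caller sees the new length.
func AppendSlice(s []int) []int {
	return append(s, 1234)
}

func main() {
	s := make([]int, 0, 8)
	s = AppendSlice(s) // reassign to pick up the new header
	fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
	// s = [1234], len = 1, cap = 8
}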
Slice expansion
In day-to-day development we usually add one or more values to the end of a slice with the append built-in. Since a slice is a "view" on top of an array, and an array's size is fixed, append has to grow the underlying array whenever it is not large enough to hold the new values. The following code shows what this growth looks like from the outside:
//
// main.go
//
func main() {
var s []int
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [], len = 0, cap = 0
s = append(s, 1)
fmt.Printf("append(1) => %v, len = %d, cap = %d\n", s, len(s), cap(s))
// append(1) => [1], len = 1, cap = 1
s = append(s, 2)
fmt.Printf("append(2) => %v, len = %d, cap = %d\n", s, len(s), cap(s))
// append(2) => [1 2], len = 2, cap = 2
s = append(s, 3)
fmt.Printf("append(3) => %v, len = %d, cap = %d\n", s, len(s), cap(s))
// append(3) => [1 2 3], len = 3, cap = 4
s = append(s, 4, 5)
fmt.Printf("append(4, 5) => %v, len = %d, cap = %d\n", s, len(s), cap(s))
// append(4, 5) => [1 2 3 4 5], len = 5, cap = 8
s = append(s, 6, 7, 8, 9)
fmt.Printf("append(6, 7, 8, 9) => %v, len = %d, cap = %d\n\n", s, len(s), cap(s))
// append(6, 7, 8, 9) => [1 2 3 4 5 6 7 8 9], len = 9, cap = 16
s1 := []int{1, 2, 3}
fmt.Printf("s1 = %v, len = %d, cap = %d\n", s1, len(s1), cap(s1))
// s1 = [1 2 3], len = 3, cap = 3
s1 = append(s1, 4)
fmt.Printf("append(4) => %v, len = %d, cap = %d\n", s1, len(s1), cap(s1))
// append(4) => [1 2 3 4], len = 4, cap = 6
s1 = append(s1, 5, 6, 7)
fmt.Printf("append(5, 6, 7) => %v, len = %d, cap = %d\n\n", s1, len(s1), cap(s1))
// append(5, 6, 7) => [1 2 3 4 5 6 7], len = 7, cap = 12
s2 := []int{0}
fmt.Printf("s2 => len = %d, cap = %d\n", len(s2), cap(s2))
// s2 => len = 1, cap = 1
for i := 0; i < 13; i++ {
for j, n := 0, 1<<i; j < n; j++ {
s2 = append(s2, j)
}
fmt.Printf("append(<%d>...) => len = %d, cap = %d\n", 1<<i, len(s2), cap(s2))
// append(<1>...) => len = 2, cap = 2
// append(<2>...) => len = 4, cap = 4
// append(<4>...) => len = 8, cap = 8
// append(<8>...) => len = 16, cap = 16
// append(<16>...) => len = 32, cap = 32
// append(<32>...) => len = 64, cap = 64
// append(<64>...) => len = 128, cap = 128
// append(<128>...) => len = 256, cap = 256
// append(<256>...) => len = 512, cap = 512
// append(<512>...) => len = 1024, cap = 1024
// append(<1024>...) => len = 2048, cap = 2304
// append(<2048>...) => len = 4096, cap = 4096
// append(<4096>...) => len = 8192, cap = 9216
}
}
Looking at the output, slice capacity roughly doubles as it grows. Let's verify that next.
Start with disassembly
The built-in append has the signature func append(slice []Type, elems ...Type) []Type. It has no concrete function body; instead, during intermediate code generation the compiler replaces calls to append with real runtime functions. Let's look at some sample code:
//
// main.go
//
func main() {
var s []int
s = append(s, 1234)
s = append(s, 5678)
}
// go tool compile -N -l -S main.go > main.S
//
// -N disable optimizations
// -l disable inlining
// -S print assembly listing
You can dump the assembly for the code above with the go tool compile command, as follows:
"".main STEXT size=270 args=0x0 locals=0x68 funcid=0x0
// func main()
0x0000 00000 (main.go:3) TEXT "".main(SB), ABIInternal, $104-0
// check whether the stack needs to grow
0x0000 00000 (main.go:3) MOVQ (TLS), CX
0x0009 00009 (main.go:3) CMPQ SP, 16(CX)
0x000d 00013 (main.go:3) JLS 260
// set up the stack frame for main
0x0013 00019 (main.go:3) SUBQ $104, SP
0x0017 00023 (main.go:3) MOVQ BP, 96(SP)
0x001c 00028 (main.go:3) LEAQ 96(SP), BP
// initialize slice s by zeroing its three fields
0x0021 00033 (main.go:4) MOVQ $0, "".s+72(SP)   // s.Data = null
0x002a 00042 (main.go:4) XORPS X0, X0           // a 128-bit XMM register initializes two fields at once
0x002d 00045 (main.go:4) MOVUPS X0, "".s+80(SP) // s.Len = s.Cap = 0
0x0032 00050 (main.go:5) JMP 52
// first append: the slice has to grow
// func growslice(et *_type, old slice, cap int) slice
0x0034 00052 (main.go:5) LEAQ type.int(SB), AX      // load the pointer to the slice's element type
0x003b 00059 (main.go:5) MOVQ AX, (SP)              // push the first argument et
0x003f 00063 (main.go:5) XORPS X0, X0               // a 128-bit XMM register keeps the instruction count down
0x0042 00066 (main.go:5) MOVUPS X0, 8(SP)           // initialize .Data and .Len of the second argument old
0x0047 00071 (main.go:5) MOVQ $0, 24(SP)            // initialize .Cap of the second argument old
0x0050 00080 (main.go:5) MOVQ $1, 32(SP)            // push the third argument cap with value 1
0x0059 00089 (main.go:5) CALL runtime.growslice(SB) // call runtime.growslice to grow the slice
0x005e 00094 (main.go:5) MOVQ 40(SP), AX            // return value r.Data
0x0063 00099 (main.go:5) MOVQ 56(SP), CX            // return value r.Cap
0x0068 00104 (main.go:5) MOVQ 48(SP), DX            // return value r.Len = 0
0x006d 00109 (main.go:5) LEAQ 1(DX), BX             // BX = r.Len (0) + 1
0x0071 00113 (main.go:5) JMP 115
0x0073 00115 (main.go:5) MOVQ $1234, (AX)           // store the appended element where .Data points
0x007a 00122 (main.go:5) MOVQ AX, "".s+72(SP)       // s.Data = r.Data
0x007f 00127 (main.go:5) MOVQ BX, "".s+80(SP)       // s.Len = r.Len + 1
0x0084 00132 (main.go:5) MOVQ CX, "".s+88(SP)       // s.Cap = r.Cap
// check whether the slice needs to grow again before the second insert
0x0089 00137 (main.go:6) LEAQ 2(DX), SI             // SI = r.Len (0) + 2
0x008d 00141 (main.go:6) CMPQ CX, SI                // compare r.Cap with the new Len (r.Cap - Len)
0x0090 00144 (main.go:6) JCC 148                    // >= unsigned
0x0092 00146 (main.go:6) JMP 190                    // if r.Cap < Len, jump to 190 to grow the slice
0x0094 00148 (main.go:6) JMP 150                    // if r.Cap >= Len, append the element directly
0x0096 00150 (main.go:6) LEAQ (AX)(DX*8), DX        // element address: DX = r.Data + r.Len(0) * 8
0x009a 00154 (main.go:6) LEAQ 8(DX), DX             // r.Len is 0 but one element is already stored, so add 8
0x009e 00158 (main.go:6) MOVQ $5678, (DX)           // write the data
0x00a5 00165 (main.go:6) MOVQ AX, "".s+72(SP)       // s.Data = r.Data
0x00aa 00170 (main.go:6) MOVQ SI, "".s+80(SP)       // s.Len = SI (the new length)
0x00af 00175 (main.go:6) MOVQ CX, "".s+88(SP)       // s.Cap = r.Cap
// tear down main's stack frame and return
0x00b4 00180 (main.go:7) MOVQ 96(SP), BP
0x00b9 00185 (main.go:7) ADDQ $104, SP
0x00bd 00189 (main.go:7) RET
// the second insert does not have enough capacity, so the slice has to grow again
// func growslice(et *_type, old slice, cap int) slice
0x00be 00190 (main.go:5) MOVQ DX, ""..autotmp_1+64(SP) // back up DX (r.Len = 0) into a temporary variable
0x00c3 00195 (main.go:6) LEAQ type.int(SB), DX      // load the pointer to the slice's element type
0x00ca 00202 (main.go:6) MOVQ DX, (SP)              // push the first argument et
0x00ce 00206 (main.go:6) MOVQ AX, 8(SP)             // push old.Data of the second argument
0x00d3 00211 (main.go:6) MOVQ BX, 16(SP)            // push old.Len = 1
0x00d8 00216 (main.go:6) MOVQ CX, 24(SP)            // push old.Cap
0x00dd 00221 (main.go:6) MOVQ SI, 32(SP)            // push the third argument cap = 2
0x00e2 00226 (main.go:6) CALL runtime.growslice(SB) // grow the slice; afterwards only Data and Cap change
0x00e7 00231 (main.go:6) MOVQ 40(SP), AX            // return value r.Data
0x00ec 00236 (main.go:6) MOVQ 48(SP), CX            // return value r.Len
0x00f1 00241 (main.go:6) MOVQ 56(SP), DX            // return value r.Cap
0x00f6 00246 (main.go:6) LEAQ 1(CX), SI             // SI = r.Len + 1 so the registers match the path above
0x00fa 00250 (main.go:6) MOVQ DX, CX                // keep the registers consistent with the path above
0x00fd 00253 (main.go:6) MOVQ ""..autotmp_1+64(SP), DX // restore the value of DX
0x0102 00258 (main.go:6) JMP 150
// grow the stack
0x0104 00260 (main.go:6) NOP
0x0104 00260 (main.go:3) CALL runtime.morestack_noctxt(SB)
0x0109 00265 (main.go:3) JMP 0
There are a couple of interesting points in the assembly above:
- The XORPS and MOVUPS instructions operate on a 128-bit XMM register and initialize 16 bytes at a time; MOVQ can only clear 8 bytes at a time, so it would take twice as many instructions to achieve the same result.
- LEAQ is used for the arithmetic instead of ADDQ or MULQ. LEAQ is a short instruction that can still perform simple arithmetic, and it does not occupy the ALU, which is friendly to instruction-level parallelism.
Back to the topic: from the assembly we can see that the runtime.growslice function (located in runtime/slice.go) performs the growth at runtime. Next, let's discuss in detail how it implements slice growth.
A related proposal for slice growth:
- proposal: Go 2: allow cap(make([]T, m, n)) > n
A deep dive into the growslice implementation
First, let's look at the signature and doc comment of runtime.growslice:
//
// runtime/slice.go
//
// growslice handles slice growth during append.
// It is passed the slice element type, the old slice, and the desired new minimum capacity,
// and it returns a new slice with at least that capacity, with the old data
// copied into it.
// The new slice's length is set to the old slice's length,
// NOT to the new requested capacity.
// This is for codegen convenience. The old slice's length is used immediately
// to calculate where to write new values during an append.
// TODO: When the old backend is gone, reconsider this decision.
// The SSA backend might prefer the new length or to return only ptr/cap and save stack space.
func growslice(et *_type, old slice, cap int) slice {}
From the signature and the comment we can tell that the function grows the slice based on its element type, the current slice, and the minimum capacity required by the new slice (that is, the new slice's length); it returns a new slice with at least that capacity and copies the old slice's data into it.
The slice type in the signature is the struct we saw above, while the _type type is less familiar. Let's look at how it is defined:
//
// runtime/type.go
//
// Needs to be in sync with ../cmd/link/internal/ld/decodesym.go:/^func.commonsize,
// ../cmd/compile/internal/reflectdata/reflect.go:/^func.dcommontype and
// ../reflect/type.go:/^type.rtype.
// ../internal/reflectlite/type.go:/^type.rtype.
type _type struct {
	size       uintptr // size in bytes of a value of this type
ptrdata uintptr // size of memory prefix holding all pointers
	hash       uint32  // hash of the type; used in interface assertions and interface queries
	tflag      tflag   // extra type flags
	align      uint8   // alignment of a value of this type
	fieldAlign uint8   // alignment of this type when used as a struct field
	kind       uint8   // enumeration of the basic kind, same values as reflect.Kind; determines how the type is interpreted
// function for comparing objects of this type
// (ptr to object A, ptr to object B) -> ==?
equal func(unsafe.Pointer, unsafe.Pointer) bool
// gcdata stores the GC type data for the garbage collector.
// If the KindGCProg bit is set in kind, gcdata is a GC program.
// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	gcdata    *byte   // GC-related metadata
	str       nameOff // offset of the type's name string in the binary; filled in by the linker
	ptrToThis typeOff // offset of the pointer to this type's metadata in the compiled binary; filled in by the linker
}
In the assembly above, the symbol passed in is type.int(SB); it is resolved and filled in during symbol relocation at link time. For our purposes we only need the size field of _type, which is used to calculate how much memory the new slice occupies.
Part 1: Calculate the capacity of the new slice
Back to the code. Let's first look at the first half of the implementation, which calculates the capacity of the new slice so that memory can be requested later:
//
// runtime/slice.go
//
// -- trimmed for readability --
//
// et:  element type of the slice
// old: the current slice
// cap: minimum capacity required by the new slice
func growslice(et *_type, old slice, cap int) slice {
newcap := old.cap
doublecap := newcap + newcap
if cap > doublecap {
newcap = cap
} else {
if old.cap < 1024 {
newcap = doublecap
} else {
// Check 0 < newcap to detect overflow
// and prevent an infinite loop.
for 0 < newcap && newcap < cap {
newcap += newcap / 4
}
// Set newcap to the requested cap when
// the newcap calculation overflowed.
if newcap <= 0 {
newcap = cap
}
}
}
// ...
}
From the code above we can work out the following growth strategy:
- If the minimum capacity required by the new slice is greater than twice the current capacity, the required minimum capacity is used directly.
- If the minimum capacity required by the new slice is less than or equal to twice the current capacity:
  - If the current capacity is less than 1024, the new capacity is simply double the current capacity.
  - If the current capacity is 1024 or more, the capacity is grown by a quarter repeatedly until it is at least the required minimum capacity.
To summarize, this strategy can be expressed with the following pseudocode:
if NewCap > CurrCap * 2
return NewCap
if CurrCap < 1024
return CurrCap * 2
while CurrCap < NewCap
CurrCap += CurrCap / 4
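To make the rule easier to play with, here is a small runnable transcription of my own of the go1.16 capacity calculation above; note that it deliberately ignores the memory-size rounding discussed in the next part:
package main

import "fmt"

// newCap mirrors the first half of go1.16's growslice: the minimum new
// capacity before memory size-class rounding is applied.
func newCap(oldCap, needed int) int {
	if needed > 2*oldCap {
		return needed
	}
	if oldCap < 1024 {
		return 2 * oldCap
	}
	n := oldCap
	for 0 < n && n < needed {
		n += n / 4 // grow by 25% until the requirement is met
	}
	if n <= 0 { // overflow: fall back to the requested capacity
		n = needed
	}
	return n
}

func main() {
	fmt.Println(newCap(0, 5))       // 5
	fmt.Println(newCap(6, 9))       // 12
	fmt.Println(newCap(1024, 1025)) // 1280
}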
Part 2: Doing Memory Allocation
After the new slice's capacity has been calculated, the simplest approach would be to request ElementTypeSize * NewSliceCap bytes directly from the operating system. But doing so inevitably produces a lot of memory fragmentation and is bad for performance (when heap memory runs out, the heap has to be grown via system calls such as brk).
Go memory management
So how can we achieve high-performance memory management while reducing fragmentation? We need the following capabilities:
- Memory pooling: request a large block of memory from the operating system at once, avoiding frequent switches between user mode and kernel mode
- Garbage collection: a dynamic, automatic garbage collector lets memory be reused
- A memory allocation algorithm: allocate memory efficiently while avoiding contention and fragmentation
The Go runtime's memory allocator is based on TCMalloc. Its core idea is to manage memory in several tiers in order to reduce lock granularity. For the details of how this memory management is implemented, and why, see the following articles:
- A visual guide to Go Memory Allocator from scratch (Golang)
- Go: Memory Management and Allocation
- Memory Management in Golang
- Golang's memory allocation
- Github: TCMalloc design
- TCMalloc : Thread-Caching Malloc
- Analysis of Go memory management architecture
- Graphical Golang memory allocation and garbage collection
Simply put, Go divides allocations smaller than 32 KB into roughly 70 size classes ranging from 8 B to 32 KB. If the requested memory is smaller than 32 KB, an allocation of the matching size class is handed out directly (which may waste some space). If the requested memory is larger than 32 KB, the request is rounded up to whole pages (one page is 8 KB) and served from the heap held by Go.
Application in slice expansion
Now, back to the topic: let's see how this memory management is used during slice growth:
//
// runtime/slice.go
//
// -- trimmed for readability --
func growslice(et *_type, old slice, cap int) slice {
// ...
var overflow bool
	// lenmem    -> memory currently occupied by the slice's elements
	// newlenmem -> memory occupied by the elements after the append
	// capmem    -> memory size actually allocated
var lenmem, newlenmem, capmem uintptr
// Specialize for common values of et.size.
// For 1 we don't need any division/multiplication.
// For sys.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
// For powers of 2, use a variable shift.
switch {
case et.size == 1:
lenmem = uintptr(old.len)
newlenmem = uintptr(cap)
capmem = roundupsize(uintptr(newcap))
overflow = uintptr(newcap) > maxAlloc
newcap = int(capmem)
case et.size == sys.PtrSize:
lenmem = uintptr(old.len) * sys.PtrSize
newlenmem = uintptr(cap) * sys.PtrSize
capmem = roundupsize(uintptr(newcap) * sys.PtrSize)
overflow = uintptr(newcap) > maxAlloc/sys.PtrSize
newcap = int(capmem / sys.PtrSize)
case isPowerOfTwo(et.size):
var shift uintptr
if sys.PtrSize == 8 {
// Mask shift for better code generation.
shift = uintptr(sys.Ctz64(uint64(et.size))) & 63
} else {
shift = uintptr(sys.Ctz32(uint32(et.size))) & 31
}
lenmem = uintptr(old.len) << shift
newlenmem = uintptr(cap) << shift
capmem = roundupsize(uintptr(newcap) << shift)
overflow = uintptr(newcap) > (maxAlloc >> shift)
newcap = int(capmem >> shift)
default:
lenmem = uintptr(old.len) * et.size
newlenmem = uintptr(cap) * et.size
capmem, overflow = math.MulUintptr(et.size, uintptr(newcap))
capmem = roundupsize(capmem)
newcap = int(capmem / et.size)
}
}
In fact, you only need to read the default branch; the other branches are just optimizations for particular element sizes (using shifts and similar tricks to execute fewer instructions). In this code it is the roundupsize function that decides how much memory is finally allocated:
//
// runtime/msize.go
//
// Malloc small size classes.
//
// See malloc.go for overview.
// See also mksizeclasses.go for how we decide what size classes to use.
// Returns size of the memory block that mallocgc will allocate if you ask for the size.
func roundupsize(size uintptr) uintptr {
if size < _MaxSmallSize { // _MaxSmallSize = 32768
if size <= smallSizeMax-8 { // smallSizeMax = 1024
// smallSizeDiv = 8
return uintptr(class_to_size[size_to_class8[divRoundUp(size, smallSizeDiv)]])
} else {
// smallSizeMax = 1024, largeSizeDiv = 128
return uintptr(class_to_size[size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]])
}
}
if size+_PageSize < size { // _PageSize = 8192
return size
}
return alignUp(size, _PageSize)
}
//
// runtime/stubs.go
//
// divRoundUp returns ceil(n / a).
func divRoundUp(n, a uintptr) uintptr {
// a is generally a power of two. This will get inlined and
// the compiler will optimize the division.
return (n + a - 1) / a
}
// alignUp rounds n up to a multiple of a. a must be a power of 2.
func alignUp(n, a uintptr) uintptr {
return (n + a - 1) &^ (a - 1)
}
The size parameter passed to this function is computed as NewSliceCap * ElementType.Size. If the incoming size is smaller than 32 KB, the function picks the smallest size class that satisfies the request from the table below and returns it:
// 0 means a large object
var class_to_size = [_NumSizeClasses]uint16{ // _NumSizeClasses = 68
0, 8, 16, 24, 32, 48, 64, 80, 96, 112,
128, 144, 160, 176, 192, 208, 224, 240, 256, 288,
320, 352, 384, 416, 448, 480, 512, 576, 640, 704,
768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304,
2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784,
6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384,
18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768,
}
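To get a feel for how a request maps onto these classes, here is a simplified sketch of my own: it does a linear scan over a truncated copy of the table, whereas the real runtime uses the size_to_class8 and size_to_class128 lookup tables shown above:
package main

import "fmt"

// A truncated copy of the small size classes, just for illustration.
var classToSize = []uintptr{8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// roundUpSmall returns the smallest size class that fits size; it is a
// linear-scan stand-in for runtime.roundupsize on small objects.
func roundUpSmall(size uintptr) uintptr {
	for _, c := range classToSize {
		if c >= size {
			return c
		}
	}
	return size // larger than this truncated table covers
}

func main() {
	fmt.Println(roundUpSmall(40)) // 48 -- e.g. 5 ints (40 bytes) round up to 48 bytes
	fmt.Println(roundUpSmall(96)) // 96 -- 12 ints fit a size class exactly
}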
If the requested memory is larger than 32 KB, the smallest number of whole pages that satisfies the request is allocated directly from the heap (the page size is 8 KB); a quick sanity check follows the list below. For more details, see the following source files:
- runtime/malloc.go: the memory management implementation
- runtime/mksizeclasses.go: the logic that generates the size classes above
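And for the large-object path, a small check of my own using the alignUp helper shown earlier: a 33 KB request is rounded up to a whole number of 8 KB pages:
package main

import "fmt"

// alignUp rounds n up to a multiple of a; a must be a power of 2
// (the same shape as the runtime helper shown above).
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

func main() {
	const pageSize = 8192
	fmt.Println(alignUp(33792, pageSize)) // 40960 -- 33 * 1024 bytes rounded up to 5 pages
}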
Part 3: Copying Objects and Returning
Since the memory actually obtained may be larger than what was requested, once the memory size class is known we have to recompute how many elements the allocated block can hold, as shown in the following code:
//
// runtime/slice.go
//
// -- trimmed for readability --
func growslice(et *_type, old slice, cap int) slice {
// ...
capmem = roundupsize(capmem)
newcap = int(capmem / et.size)
// ...
}
To keep things straight, here is a summary of the memory-size variables used during growth:
- lenmem: the memory occupied by all elements of the slice before the append (OldSlice.len * ElementTypeSize)
- newlenmem: the memory occupied by all elements of the slice after the append ((OldSlice.len + AppendSize) * ElementTypeSize)
- size: the memory required for the new slice's capacity computed in part 1 (NewSliceCap * ElementTypeSize)
- capmem: the memory actually allocated after matching a size class (computed in part 2)
Next, the data needs to be copied from the old slice into the new one; here is the code:
//
// runtime/slice.go
//
// -- trimmed for readability --
func growslice(et *_type, old slice, cap int) slice {
// ...
	// p points to the start of the newly allocated memory
	var p unsafe.Pointer
	if et.ptrdata == 0 { // the element type contains no pointers
		// allocate capmem bytes
		p = mallocgc(capmem, nil, false)
		// the allocation may be larger than the requested capacity, so zero the part that is not needed yet;
		// the memory within the requested capacity will be filled in by the caller afterwards
		memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
	} else {
		// allocate the requested amount of memory and zero all of it
		p = mallocgc(capmem, et, true)
		// set up write barriers for the copied pointers
		if lenmem > 0 && writeBarrier.enabled {
			// Only shade the pointers in old.array since we know the destination slice p
			// only contains nil pointers because it has been cleared during alloc.
			bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(old.array), lenmem-et.size+et.ptrdata)
		}
	}
	// copy the data from the old slice: lenmem bytes from old.array to p
	memmove(p, old.array, lenmem)
	// return the new slice
return slice{p, old.len, newcap}
}
//
// runtime/stubs.go
//
// memmove copies n bytes from "from" to "to".
func memmove(to, from unsafe.Pointer, n uintptr)
At this point all of the growth-related work is done, and the rest of the program fills the new data into the freshly grown slice. Let's wrap up everything above with a small piece of code plus a picture; see if you can match the logic of each part in the picture. The code is as follows:
type Triangle [3]byte
func main() {
s := []Triangle{{1, 1, 1}, {2, 2, 2}}
s = append(s, Triangle{3, 3, 3}, Triangle{4, 4, 4})
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [[1 1 1] [2 2 2] [3 3 3] [4 4 4]], len = 4, cap = 5
}
For this append, growslice is called with cap = 4; the default branch computes capmem = roundupsize(4 * 3) = 16 bytes and newcap = 16 / 3 = 5, which is why the printed capacity is 5. Based on the code above we can draw the following structure diagram (each cuboid represents one Triangle value, occupying 3 bytes):
Verify slice scaling
As the saying goes, what you learn on paper always feels shallow; to truly understand something you have to try it yourself. Everything analyzed above is still theory and deduction, so let's verify the rules we summarized in practice.
Slice expansion smaller than 32K
First, let's grow slices whose memory stays below 32 KB, using the following code:
func main() {
var s []int
s = append(s, 1, 2, 3, 4, 5)
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [1 2 3 4 5], len = 5, cap = 6
s = append(s, 6, 7, 8, 9)
fmt.Printf("s = %v, len = %d, cap = %d\n", s, len(s), cap(s))
// s = [1 2 3 4 5 6 7 8 9], len = 9, cap = 12
}
Following the rules summarized above, let's verify step by step:
var s []int // s.data = null, s.len = 0, s.cap = 0
s = append(s, 1, 2, 3, 4, 5)
// The old slice's capacity is not enough, so it has to grow.
// Growth call: growslice(et = type.int, old = {null, 0, 0}, cap = 5)
// 1. Since cap > 2 * old.cap, the new slice's capacity must be at least 5 (NewSliceCap)
// 2. Memory size-class matching: 5 * 8 (int) = 40 bytes are needed; the table gives an actual size class of 48 bytes
// 3. So the actual slice capacity is 48 (capmem) / 8 (et.size) = 6
// Verified.
s = append(s, 6, 7, 8, 9)
// The old capacity is 6 and the required capacity is 9, so the slice has to grow.
// Growth call: growslice(et = type.int, old = {<addr>, 5, 6}, cap = 9)
// 1. Since cap < 2 * old.cap = 12 and old.cap < 1024, the new slice's capacity must be at least 2 * old.cap = 12
// 2. Memory size-class matching: 12 * 8 (int) = 96 bytes are needed; the table gives an actual size class of 96 bytes
// 3. So the actual slice capacity is 96 (capmem) / 8 (et.size) = 12
// Verified.
Slice expansion larger than 32K
Next, let's verify growth past 32 KB step by step, using an array of 128 ints (1024 bytes per element):
type Array [128]int // 128 * 8 = 1024 Bytes
func main() {
var s []Array
s = append(s, Array{}, Array{}, Array{}, Array{}, Array{}, Array{}, Array{}) // 7K
fmt.Printf("s, len = %d, cap = %d\n", len(s), cap(s))
// s, len = 7, cap = 8
s = append(s, make([]Array, 26)...) // 33K
fmt.Printf("s, len = %d, cap = %d\n", len(s), cap(s))
// s, len = 33, cap = 40
}
Again, applying the summarized rules, the calculation goes as follows:
var s []Array // s.data = null, s.len = 0, s.cap = 0
s = append(s, Array{}, Array{}, Array{}, Array{}, Array{}, Array{}, Array{})
// The old slice's capacity is not enough, so it has to grow.
// Growth call: growslice(et = type.Array, old = {null, 0, 0}, cap = 7)
// 1. Since cap > 2 * old.cap, the new slice's capacity must be at least 7 (NewSliceCap)
// 2. Memory size-class matching: 7 * 1024 (Array) = 7168 bytes are needed; the table gives an actual size class of 8192 bytes
// 3. So the actual slice capacity is 8192 (capmem) / 1024 (et.size) = 8
// Verified.
s = append(s, make([]Array, 26)...)
// The old capacity is 8 and the required capacity is 33, so the slice has to grow.
// Growth call: growslice(et = type.Array, old = {<addr>, 7, 8}, cap = 33)
// 1. Since cap > 2 * old.cap = 16, the new slice's capacity must be at least cap = 33
// 2. Memory size-class matching: 33 * 1024 (Array) = 33792 bytes are needed; since this exceeds 32768, it is rounded up to whole pages, giving 40960 bytes
// 3. So the actual slice capacity is 40960 (capmem) / 1024 (et.size) = 40
// Verified.
That's all for this article; I hope you got something out of it. If you spot any mistakes, please let me know!
Other references
Here are some references on Go assembly