
Arrays and slices are two basic data structures provided by the Go language. Arrays should be familiar to everyone: an array is a collection of elements of the same type stored contiguously in memory, which makes it very convenient to access elements by index. So what is a slice? A slice can be understood as a dynamic array, meaning that its capacity (the maximum number of elements it can store) can be adjusted dynamically. Slices are one of the most commonly used data structures in day-to-day development and deserve careful study.

array

The definition and use of an array is very simple, as shown in the following example:

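package main

import "fmt"

func main() {
  var arr [3]int
  // Access array elements by index
  arr[0] = 100
  arr[1] = 200
  arr[2] = 300

  // arr holds at most three integers; indexes start at 0 and end at 2.
  // Out-of-bounds access with a constant index does not compile:
  // Invalid array index 3 (out of bounds for 3-element array)
  //arr[3] = 400

  fmt.Println(len(arr), arr) // len returns the array length

  var arr1 [5]int
  // An array type is the element type plus the length; if either differs,
  // the types differ and the arrays cannot be assigned to each other:
  // Cannot use 'arr' (type [3]int) as type [5]int
  //arr1 = arr
  fmt.Println(arr1)
}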

When using an array, remember that the largest valid index is len - 1, and never access it out of bounds. Go arrays are very similar to C arrays, but they behave differently when passed as function parameters.

The first thing to be clear about is that Go function parameters are passed by value (the argument is copied), not by reference (the address of the argument is passed). So even if you modify the parameter inside the function, the caller's variable does not change, as in the following example:

 package main

import "fmt"

func main() {
  arr := [6]int{1,2,3,4,5,6}
  testArray(arr)
  fmt.Println(arr)  // the original array is not modified: [1 2 3 4 5 6]
}

func testArray(arr [6]int) {
  arr[0] = 0
  arr[5] = 500
  fmt.Println(arr) // the local copy is modified: [0 2 3 4 5 500]
}

Readers who have learned C arrays may be puzzled: in C, the elements of the caller's array would change in this situation. What does Go do differently? As mentioned above, Go function parameters are passed by value, so Go copies every element of the array, and the array modified inside the function has nothing to do with the original array.
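
If the C-like behaviour is what you want, one common option is to pass a pointer to the array instead of the array itself. The following is only a minimal sketch; the function names modifyCopy and modifyInPlace are made up for this illustration and do not come from the example above:

```go
package main

import "fmt"

// modifyCopy receives a copy of the array; the caller's array is untouched.
func modifyCopy(arr [3]int) {
	arr[0] = 100
}

// modifyInPlace receives a pointer to the array, so writes reach the caller.
func modifyInPlace(arr *[3]int) {
	arr[0] = 100 // indexing through an array pointer dereferences it automatically
}

func main() {
	arr := [3]int{1, 2, 3}

	modifyCopy(arr)
	fmt.Println(arr) // [1 2 3]

	modifyInPlace(&arr)
	fmt.Println(arr) // [100 2 3]
}
```

Only the pointer (8 bytes on a 64-bit platform) is copied in the second call, rather than every element of the array.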

We can take a quick look at the assembly code. The Go toolchain itself provides a tool for this:

//-N disables optimizations, -l disables inlining, -S prints the assembly
go tool compile -S -N -l test.go

"".main STEXT size=125 
  //MOVQ copies 8 bytes of data
  0x0026 00038 (test.go:6)  MOVQ  $1, "".arr+48(SP)
  0x002f 00047 (test.go:6)  MOVQ  $2, "".arr+56(SP)
  0x0038 00056 (test.go:6)  MOVQ  $3, "".arr+64(SP)
  0x0041 00065 (test.go:6)  MOVQ  $4, "".arr+72(SP)
  0x004a 00074 (test.go:6)  MOVQ  $5, "".arr+80(SP)
  0x0053 00083 (test.go:6)  MOVQ  $6, "".arr+88(SP)
  //MOVUPS copies 16 bytes at a time, so the 6-element array is copied in three steps
  0x005c 00092 (test.go:7)  MOVUPS  "".arr+48(SP), X0
  0x0061 00097 (test.go:7)  MOVUPS  X0, (SP)
  0x0065 00101 (test.go:7)  MOVUPS  "".arr+64(SP), X0
  0x006a 00106 (test.go:7)  MOVUPS  X0, 16(SP)
  0x006f 00111 (test.go:7)  MOVUPS  "".arr+80(SP), X0
  0x0074 00116 (test.go:7)  MOVUPS  X0, 32(SP)
  0x0079 00121 (test.go:7)  CALL  "".testArray(SB)
  ……

"".testArray STEXT nosplit 
  0x000f 00015 (test.go:11) SUBQ  $136, SP

  0x0026 00038 (test.go:12) MOVQ  $0, "".arr+144(SP)
  0x0032 00050 (test.go:13) MOVQ  $500, "".arr+184(SP)

Don't be frightened by the word assembly. As long as you understand the virtual memory layout (above all the function stack frame), the concept of registers, and the meaning of a few common instructions, the logic above is quite clear. "CALL testArray" is the function call, and the instructions before it prepare the parameters. It is obvious that the parameter is a copy of the original array. The stack frame structure of the above case is shown in the following figure:

slice

Slices can be understood as dynamic arrays. Their basic usage is similar to that of arrays: elements are stored contiguously and can be accessed by index. Dynamic means that the capacity of a slice can be adjusted. When elements are appended to a slice, Go checks whether the capacity of the underlying array is sufficient, and if not, triggers an expansion.

Basic operation

Let's first look at a small case to understand the basic operations of slice initialization, access, and appending elements, as well as the length and capacity of slices:

package main

import "fmt"

func main() {
  // Declare and initialize a slice
  slice := []int{1,2,3}
  slice[0] = 100
  // len: slice length, the number of elements stored; cap: slice capacity, the maximum number of elements the underlying array can hold
  fmt.Println(len(slice), cap(slice), slice) // declared this way, length and capacity are both 3: 3 3 [100 2 3]

  // Append an element; the capacity of slice is 3, so this append triggers expansion
  slice = append(slice, 4)
  fmt.Println(len(slice), cap(slice), slice) // the slice has grown and its capacity is now 6 (small slices usually double): 4 6 [100 2 3 4]

  // The capacity is 6 but the length is 4, so index 5 is out of range
  //slice[5] = 5 //panic: runtime error: index out of range [5] with length 4

  // A slice can also be created with make; the second argument is the length and the third is the capacity (optional, defaults to the length)
  slice1 := make([]int, 4, 8)
  slice1[1] = 1
  slice1[2] = 2
  fmt.Println(len(slice1), cap(slice1), slice1) //4 8 [0 1 2 0]

  // Iterate over the slice
  for idx, v := range slice {
    fmt.Println(idx, v)
  }
}

The function len returns the length of a slice, and cap returns its capacity. The length is the number of elements in the slice, so the largest valid index is len - 1; the capacity is the maximum number of elements that the slice's underlying array can hold. The append function adds elements to a slice. It checks the capacity first, and if the capacity is insufficient it triggers an expansion, which for small slices generally doubles the capacity. make is a built-in initialization function provided by Go that can be used to initialize some built-in types, such as slices, maps, and channels (chan).

We can traverse a slice with for range, which yields the index and value of each element. So the question is: if the element value is modified during the traversal, will the element of the slice be modified? Consider the following case:

 package main

import "fmt"

func main() {
  slice := make([]int, 10, 10)
  for i := 0; i < 10; i++ {
    slice[i] = i
  }

  for idx, v := range slice {
    v += 100
    printSliceValue(idx, v)
  }

  fmt.Println(slice)   // prints [0 1 2 3 4 5 6 7 8 9]
}

func printSliceValue(idx, val int) {
  fmt.Println(idx, val)
}

Obviously, modifying the value v in this way does not change the elements of the slice. Why? Because the loop variable v is just a copy of the slice element, and modifying a copy cannot change the original. So what if you want to modify the slice while traversing it? Use slice[idx], which accesses the original element of the slice.

There is another common slice operation: slicing (interception), that is, taking a part of a slice to produce a new slice. The syntax is "slice[start:end]", where start and end are both indexes and the range is closed on the left and open on the right (the new slice includes the element at index start but not the element at index end), so the length of the new slice is end - start.

package main

import "fmt"

func main() {
  slice := []int{1,2,3,4,5,6,7,8,9,10}
  // Take a sub-slice
  slice1 := slice[2:5]
  // Modify an element of the new slice slice1: does slice change?
  slice1[0] = 100
  fmt.Println(len(slice), cap(slice), slice)
  fmt.Println(len(slice1), cap(slice1), slice1)

  // Append more elements to slice1 than its cap allows, triggering expansion
  slice1 = append(slice1, 11,12,13,14,15,16,17,18,19,20,21,22)
  // Modify an element of slice1 again: does slice change?
  slice1[0] = 200
  fmt.Println(len(slice), cap(slice), slice)
  fmt.Println(len(slice1), cap(slice1), slice1)
}

/**
Output:
10 10 [1 2 100 4 5 6 7 8 9 10]
3 8 [100 4 5]

10 10 [1 2 100 4 5 6 7 8 9 10]
15 16 [200 4 5 11 12 13 14 15 16 17 18 19 20 21 22]
**/

Looking at the output: after creating the new slice slice1 by slicing, modifying an element of slice1 also changes the corresponding element of slice! Why? Because a slice is implemented on top of an array, and after slicing the two slices share the same underlying array, so modifications made through one are visible through the other. Then why do they no longer affect each other after append triggers an expansion? Because expansion allocates a new array. In other words, the underlying array of slice1 has changed and is now separate from the underlying array of slice, so modifications no longer affect each other.

Also note that after slice1 := slice[2:5], the length of slice1 is 3 but its capacity is 8. slice1 and slice share the underlying array, whose capacity is 10, but slice1 starts at index 2 of that array, so the capacity of slice1 is 10 - 2 = 8.

Finally, let's think about another question. Earlier we saw that an array is passed by value, so modifying its elements inside a function leaves the caller's array unchanged. What about slices? Keeping in mind that Go parameters are passed by value, we might guess that slices, like arrays, do not change. Is that so? Let's verify it with a small example:

 package main

import "fmt"

func main() {
  slice := make([]int, 2, 10)
  slice[0] = 1
  slice[1] = 2
  fmt.Println(len(slice), cap(slice), slice)   // initial slice has length 2 and capacity 10: 2 10 [1 2]

  testSlice(slice)
  fmt.Println(len(slice), cap(slice), slice)   // length and capacity are unchanged, but the elements have changed: 2 10 [100 200]
}

func testSlice(slice []int) {
  slice[0] = 100
  slice[1] = 200
  slice = append(slice, 300)
  fmt.Println(len(slice), cap(slice), slice) // after modifying the elements and appending one, length is 3 and capacity is 10: 3 10 [100 200 300]
}

The result differs from the guess. The slice elements are modified in testSlice, and the elements seen in main change as well; testSlice also appends an element and changes the slice length, yet the length of slice in main does not change. Why? Are Go parameters passed by value or by reference? Go does pass parameters by value. The length and capacity are values stored in the slice itself, so changes made to them in testSlice are not seen in main; the underlying array, however, is shared, so element changes made in testSlice are visible in main.

You may still be a little confused at this point. Don't worry: after learning how slices are implemented in the next section, it should all become clear.

Implementation principle

We keep saying that slices are dynamic arrays. How is this dynamism achieved? Arrays are stored in contiguous memory, so adding elements is troublesome: you need to allocate a larger contiguous block of memory and copy all the array elements, which is expensive. Slices are also implemented on top of arrays, but they adopt a pre-allocation strategy. The capacity of a slice is generally larger than its length, so that appending elements can often avoid memory allocation and data copying. To make this work, a slice needs to record more information: the address of the first element of the array that stores the elements; the capacity, which is the maximum number of elements the underlying array can store; and the length, which is the number of elements already stored. The capacity minus the length is the remaining space in the array, that is, the number of elements that can be appended before an expansion is triggered.

Slices are defined in the runtime/slice.go file as follows:

 type slice struct {
  array unsafe.Pointer
  len   int
  cap   int
}

As we guessed, the slice contains three fields; array is a pointer to the underlying array. The file also defines some commonly used slice operation functions:

//Underlying implementation of creating a slice with make
func makeslice(et *_type, len, cap int) unsafe.Pointer
//Grows the slice when elements are appended and the capacity is insufficient
func growslice(et *_type, old slice, cap int) slice
//Copies slice data
func slicecopy(toPtr unsafe.Pointer, toLen int, fromPtr unsafe.Pointer, fromLen int, width uintptr) int

When we use make to create a slice, the underlying implementation calls makeslice to allocate the array. Its first parameter, et, describes the type of the elements stored in the slice, so the memory required for the array is the element size multiplied by the slice capacity. The implementation of makeslice is very simple:

func makeslice(et *_type, len, cap int) unsafe.Pointer {
  //math.MulUintptr returns a * b and reports whether the multiplication overflowed
  mem, overflow := math.MulUintptr(et.size, uintptr(cap))

  //some parameter validation is omitted here

  return mallocgc(mem, et, true) //mallocgc allocates memory; the third argument says whether to zero it
}

makeslice seems to only allocate the memory of the slice's underlying array. What about the other fields of the slice struct? How are they maintained? And when a slice is passed as a function parameter, what exactly is passed? To answer these questions we need to analyze the assembly code. The Go program is as follows:

 package main

import "fmt"

func main() {
  slice := make([]int, 4, 10)
  slice[0] = 100
  printInt(len(slice))
  printInt(cap(slice))

  testSlice(slice)
}

func printInt(a int) {
  fmt.Println(a)
}

func testSlice(slice []int) {
  fmt.Println(slice)
}

The compiled assembly code is as follows:

 "".main STEXT size=153
  //the first parameter of makeslice is a type pointer, here type.int
  0x0018 00024 (test.go:6)  LEAQ  type.int(SB), AX
  //prepare the second parameter
  0x001f 00031 (test.go:6)  MOVL  $4, BX
  //prepare the third parameter
  0x0024 00036 (test.go:6)  MOVL  $10, CX
  //function call; the return value, the address of the array, is in the AX register
  0x0029 00041 (test.go:6)  CALL  runtime.makeslice(SB)
  //the next three instructions build the slice struct: array address + len + cap
  0x002e 00046 (test.go:6)  MOVQ  AX, "".slice+32(SP)
  0x0033 00051 (test.go:6)  MOVQ  $4, "".slice+40(SP)
  0x003c 00060 (test.go:6)  MOVQ  $10, "".slice+48(SP)

  //AX holds the address of the array, so this is the assignment slice[0] = 100
  0x0047 00071 (test.go:7)  MOVQ  $100, (AX)

  //+40(SP) is the slice's len, copied into AX to be passed as the argument
  0x004e 00078 (test.go:8)  MOVQ  "".slice+40(SP), AX
  0x0053 00083 (test.go:8)  MOVQ  AX, ""..autotmp_1+24(SP)
  0x0058 00088 (test.go:8)  CALL  "".printInt(SB)

  //+48(SP) is the slice's cap, copied into AX to be passed as the argument
  0x005d 00093 (test.go:9)  MOVQ  "".slice+48(SP), AX
  0x0062 00098 (test.go:9)  MOVQ  AX, ""..autotmp_1+24(SP)
  0x0067 00103 (test.go:9)  CALL  "".printInt(SB)

  //copy the slice struct (array address + len + cap) to build the argument of testSlice
  0x006c 00108 (test.go:11) MOVQ  "".slice+32(SP), AX
  0x0071 00113 (test.go:11) MOVQ  "".slice+40(SP), BX
  0x0076 00118 (test.go:11) MOVQ  "".slice+48(SP), CX
  0x0080 00128 (test.go:11) CALL  "".testSlice(SB)

Function arguments can be passed on the stack or in registers. In the code above, for example, AX carries the first argument, and BX and CX carry the second and third; return values can likewise be placed on the stack or in registers, and the code above uses AX for the first return value.

After all, the slice struct is simple and clear: three 8-byte words, the array address + len + cap, so it is easy to construct in assembly. Getting len(slice) and cap(slice) is even simpler: they are read from offsets of 8 and 16 bytes from the slice's address.

Also pay attention to the testSlice call, which copies the slice struct as the function argument. What about the underlying array? It is still shared, so if slice elements are modified inside testSlice, the caller sees the change; but an append inside testSlice does not affect the len and cap of the caller's slice. This resolves the doubts left over from the previous section.

The schematic diagram of the above case is as follows:

Expansion

append is used to append elements to a slice. The underlying implementation checks the slice capacity and triggers an expansion if it is insufficient. There are usually two ways to call append: 1) append one slice to another slice; 2) append individual elements to a slice. As shown in the following example:

 package main

import "fmt"

func main() {
  slice := make([]int, 0, 100)
  slice = append(slice, 10, 20, 30)

  slice1 := []int{1, 2, 3}
  slice = append(slice, slice1...)
  
  fmt.Println(slice,slice1) //[10 20 30 1 2 3] [1 2 3]
}

Where is the append function implemented? If you look at the runtime/slice.go file, you will find that there is no appendslice function, only growslice, which implements slice expansion. The code for append is actually generated at compile time, so there is no ordinary source code for it. Here is the core logic of the two forms:

//Reference 1: cmd/compile/internal/walk/assign.go:appendSlice
//Reference 2: cmd/compile/internal/walk/builtin.go:walkAppend

// expand append(l1, l2...) to
//   init {
//     s := l1
//     n := len(s) + len(l2)
//     // Compare as uint so growslice can panic on overflow.
//     if uint(n) > uint(cap(s)) {
//       s = growslice(s, n)
//     }
//     s = s[:n]
//     memmove(&s[len(l1)], &l2[0], len(l2)*sizeof(T))
//   }


// Rewrite append(src, x, y, z) 
//   init {
//     s := src
//     const argc = len(args) - 1
//     if cap(s) - len(s) < argc {
//      s = growslice(s, len(s)+argc)
//     }
//     n := len(s)
//     s = s[:n+argc]
//     s[n] = a
//     s[n+1] = b
//     ...
//   }

As you can see, growslice is called to expand the slice when the capacity is insufficient. When the slice capacity is small, growslice doubles it; when the capacity is large, it grows by about 25% (the exact threshold and growth formula depend on the Go version). Once the new capacity is determined, growslice allocates memory and copies the slice data into the new array. Interested readers can study the source code of the growslice function.
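
One way to watch this growth policy without reading the runtime source is to append elements one at a time and record every capacity change. This is a rough sketch; the helper name capChanges is invented here, and the numbers it prints depend on the Go version and on the allocator's size classes, so no expected output is shown:

```go
package main

import "fmt"

// capChanges appends n integers one at a time to an empty slice and
// returns the capacity observed after each reallocation.
func capChanges(n int) []int {
	var s []int
	var caps []int
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			caps = append(caps, cap(s))
		}
	}
	return caps
}

func main() {
	// Early values roughly double; later values grow by a smaller factor.
	fmt.Println(capChanges(2000))
}
```

The list is far shorter than 2000 entries, which is the point of pre-allocation: most appends reuse spare capacity and do not copy anything.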

copy

Finally, let's explore one more question: whether through slicing, parameter passing, or other operations, the underlying array is initially shared, and modifying the elements of one slice affects the other. Is there a way to make a complete copy of a slice, so that afterwards the two slices have separate arrays and do not affect each other? Such a copy can be made with Go's built-in function copy:

 package main

import "fmt"

func main() {
  slice := []int{1,2,3,4,5}
  slice1 := make([]int, len(slice), 10)
  copy(slice1, slice)

  slice1[0] = 100
  fmt.Println(slice, slice1)
}

/**
  [1 2 3 4 5] [100 2 3 4 5]
**/

As you can see, after modifying an element of slice1, the elements of slice have not changed. This raises another question: how is copy implemented? Is it the slicecopy function in runtime/slice.go? Not entirely. The Go compiler decides at compile time: if the slice element type contains pointers, copy becomes a call to typedslicecopy; if a runtime call is required, copy becomes a call to slicecopy; otherwise the compiler generates the code inline. The core logic of that generated code is as follows:

//Reference: cmd/compile/internal/walk/builtin.go:walkCopy

// Lower copy(a, b) to a memmove call or a runtime call.
//
// init {
//   n := len(a)
//   if n > len(b) { n = len(b) }
//   if a.ptr != b.ptr { memmove(a.ptr, b.ptr, n*sizeof(elem(a))) }
// }

Summary

At this point, the explanation of arrays and slices is basically complete. Did you expect there to be so many details to pay attention to? Remember that arrays are passed by value as parameters, and make sure the definition of the slice struct is clear in your mind. With that struct in mind, slicing, parameter passing, and expansion should all be easier to understand.


李烁