Go's SliceHeader and StringHeader: do you know them?

Hello everyone, I am fried fish.

The Go language always has a few odd-looking things in it. At first glance they feel very familiar, yet you may not understand what they actually mean in Go code. Interviewers love to ask about them...

Today I want to introduce the SliceHeader and StringHeader structures, so that you understand exactly what they are and what they are for. At the end, I will also cover zero-copy conversion.

Let's happily get started.

SliceHeader

SliceHeader is what its name suggests: Slice + Header. It looks intuitive, and it is in fact the runtime representation of a Go slice.

The definition of SliceHeader is as follows:

type SliceHeader struct {
 Data uintptr
 Len  int
 Cap  int
}

Data: Points to the underlying array.
Len: The length of the slice.
Cap: The capacity of the slice.

Now that we know the runtime representation of a slice, does that mean we can look inside one ourselves?

In everyday programs, you can use reflect.SliceHeader to view a slice's header:

package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	// Initialize the underlying array.
	s := [4]string{"脑子", "进", "煎鱼", "了"}
	s1 := s[0:1]
	s2 := s[:]

	// Reinterpret each slice as a SliceHeader to inspect its fields.
	sh1 := (*reflect.SliceHeader)(unsafe.Pointer(&s1))
	sh2 := (*reflect.SliceHeader)(unsafe.Pointer(&s2))
	fmt.Println(sh1.Len, sh1.Cap, sh1.Data)
	fmt.Println(sh2.Len, sh2.Cap, sh2.Data)
}

What do you think the output is? Do these two slices point to the memory address of the same underlying array?

Output result:

1 4 824634330936
4 4 824634330936

The Data fields of both slices point to the same underlying array, while their Len values differ. sh1 and sh2 are the headers of two distinct slices.
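Because both headers carry the same Data address, a write made through one slice should be visible through the other. The following is a minimal sketch of that; the function name sharedWrite and the sample values are illustrative and not from the original example:

```go
package main

import "fmt"

// sharedWrite builds two slices over one array, writes through the first,
// and reports what the second slice and the array itself now hold.
func sharedWrite() (viaS2, viaArray string, sameAddr bool) {
	arr := [4]string{"a", "b", "c", "d"}
	s1 := arr[0:1]
	s2 := arr[:]

	s1[0] = "z" // writes into arr[0], the shared backing storage

	// Comparing element addresses needs no unsafe: both point at arr[0].
	return s2[0], arr[0], &s1[0] == &s2[0]
}

func main() {
	fmt.Println(sharedWrite()) // z z true
}
```

Comparing &s1[0] with &s2[0] is a safe way to check for a shared backing array when you do not need the raw header.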

Question

Why do the Data fields of the two slices hold the same address?

This is a deliberate design choice in Go, made to reduce memory usage and improve overall performance.

When a slice is passed to a function, the size of the underlying array does not matter. Only the slice header itself is copied (pass by value); the underlying array is not touched.

In other words, passing a slice between functions copies only 24 bytes on a 64-bit platform (8 bytes each for the pointer, the length, and the capacity), which is very efficient.

Pitfall

This design also leads to a new problem. With an ordinary s[i:j] expression, both slices point to the same underlying array.

As long as the capacity (cap) is not exceeded, writing or appending through the second slice also changes what the first slice sees.

This is a big pitfall that many Go developers run into, and it can take a long time to track down.
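The pitfall can be reproduced in a few lines. The sketch below uses illustrative names (appendShared, appendIsolated) that are not from the original. The second function uses Go's full slice expression s[i:j:k] to cap the capacity, which is one common way to avoid the problem; the original text does not prescribe a fix:

```go
package main

import "fmt"

// appendShared appends to a sub-slice that still has spare capacity,
// so the new element lands in the parent's backing array.
func appendShared() (base, sub []int) {
	base = []int{1, 2, 3, 4}
	sub = base[0:2]       // len 2, cap 4
	sub = append(sub, 99) // fits in cap: overwrites base[2]
	return base, sub
}

// appendIsolated limits the sub-slice's capacity with a full slice
// expression, so append has to allocate a new array.
func appendIsolated() (base, sub []int) {
	base = []int{1, 2, 3, 4}
	sub = base[0:2:2]     // len 2, cap 2
	sub = append(sub, 99) // exceeds cap: copies to a new array
	return base, sub
}

func main() {
	fmt.Println(appendShared())   // [1 2 99 4] [1 2 99]
	fmt.Println(appendIsolated()) // [1 2 3 4] [1 2 99]
}
```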

StringHeader

Besides SliceHeader, Go has another typical representative: StringHeader, the runtime representation of a string.

The definition of StringHeader is as follows:

type StringHeader struct {
   Data uintptr
   Len  int
}

Data: A pointer to the memory area that stores the string's bytes.
Len: The length of the string in bytes.

For example, the underlying data of the string "hello" looks like this:

var data = [...]byte{
    'h', 'e', 'l', 'l', 'o',
}

[Figure: schematic diagram of the string's underlying storage. The picture comes from the internet.]

A concrete demonstration:

package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	s := "脑子进煎鱼了"
	s1 := "脑子进煎鱼了"
	s2 := "脑子进煎鱼了"[7:] // starts at byte index 7 of the same literal

	fmt.Printf("%d \n", (*reflect.StringHeader)(unsafe.Pointer(&s)).Data)
	fmt.Printf("%d \n", (*reflect.StringHeader)(unsafe.Pointer(&s1)).Data)
	fmt.Printf("%d \n", (*reflect.StringHeader)(unsafe.Pointer(&s2)).Data)
}

What do you think the output is? Do the variables s, s1, and s2 point to the same underlying memory?

Output result:

17608227 
17608227 
17608234

From the output, the variables s and s1 point to the same memory address. The address of s2 is offset slightly, but it still points into the same block.

Because s2 is the result of a string slicing operation that starts at byte index 7, the difference is exactly 17608234-17608227 = 7. In other words, all three variables point into the same memory. Why?

This is because strings in Go are read-only. To save memory, identical string literals usually map to the same string constant, so they point to the same underlying array.
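The offset arithmetic can be wrapped in a small helper. This is a sketch that relies on the behavior of the standard gc toolchain, where a non-empty substring shares its parent's bytes; dataAddr and substringOffset are hypothetical names introduced here:

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// dataAddr returns the address stored in the string's Data field.
func dataAddr(s string) uintptr {
	return (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
}

// substringOffset slices s at byte index i and returns how far the
// substring's Data address is from the original's. It is meant for
// indexes that leave a non-empty substring.
func substringOffset(s string, i int) uintptr {
	return dataAddr(s[i:]) - dataAddr(s)
}

func main() {
	s := "脑子进煎鱼了"
	fmt.Println(len(s))                // 18: length in bytes, 3 per character here
	fmt.Println(substringOffset(s, 7)) // 7
	fmt.Println(substringOffset(s, 3)) // 3
}
```

Note that the index is a byte offset, not a character offset, so slicing at 7 lands in the middle of a three-byte character.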

Zero-copy conversion

Why would anyone care about the details of SliceHeader and StringHeader? A large part of the reason is that developers want to use them to implement zero-copy conversion between strings and byte slices.

A common conversion looks like this:

func string2bytes(s string) []byte {
 stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

 bh := reflect.SliceHeader{
  Data: stringHeader.Data,
  Len:  stringHeader.Len,
  Cap:  stringHeader.Len,
 }

 return *(*[]byte)(unsafe.Pointer(&bh))
}

But this is actually wrong. The official documentation clearly states:

the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data.

The Data field of SliceHeader and StringHeader is of type uintptr, and Go only passes by value.

Therefore, in the code above, Data is copied as a plain integer value. The garbage collector does not treat a uintptr as a reference, so there is no guarantee that the data it refers to will not be garbage collected (GC).

The conversion should instead be written like this:

package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	s := "脑子进煎鱼了"
	v := string2bytes1(s)
	fmt.Println(v)
}

func string2bytes1(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	// b is a real []byte, so its header holds a correctly typed pointer.
	var b []byte
	pbytes := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	pbytes.Data = stringHeader.Data
	pbytes.Len = stringHeader.Len
	pbytes.Cap = stringHeader.Len

	return b
}

The program must keep a separate, correctly typed pointer to the underlying data.

If performance is the priority, you only need a plain conversion, and you do not care about fields such as capacity (cap), you can also use the following approach:

func string2bytes2(s string) []byte {
 return *(*[]byte)(unsafe.Pointer(&s))
}

Performance comparison:

string2bytes1-1000-4   3.746 ns/op  0 allocs/op
string2bytes1-1000-4   3.713 ns/op  0 allocs/op
string2bytes1-1000-4   3.969 ns/op  0 allocs/op

string2bytes2-1000-4   2.445 ns/op  0 allocs/op
string2bytes2-1000-4   2.451 ns/op  0 allocs/op
string2bytes2-1000-4   2.455 ns/op  0 allocs/op

This conversion is slightly faster than the standard approach, but the forced cast also causes a small problem.

The code is as follows:

package main

import "unsafe"

func main() {
	s := "脑子进煎鱼了"
	v := string2bytes2(s)
	println(len(v), cap(v))
}

func string2bytes2(s string) []byte {
	// Reinterprets the 2-word string header as a 3-word slice header.
	return *(*[]byte)(unsafe.Pointer(&s))
}

Output result:

18 824633927632

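This forced conversion produces a byte slice with a very large, meaningless capacity. A string header has no Cap field, so the slice's capacity is read from whatever memory happens to follow the string header. This requires special attention. It is generally recommended to use the standard SliceHeader and StringHeader approach, which is also easier for future maintainers to understand.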
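As a side note that goes beyond the original text: since Go 1.20, reflect.SliceHeader and reflect.StringHeader are deprecated, and the unsafe package offers unsafe.StringData, unsafe.Slice, unsafe.SliceData, and unsafe.String for the same job. The sketch below assumes Go 1.20 or later; the function names stringToBytes and bytesToString are illustrative:

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s (no copy).
// Assumes Go 1.20+. The result must never be written to, because
// string data is read-only.
func stringToBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString returns a string that shares memory with b (no copy).
// The caller must not modify b afterwards.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	v := stringToBytes("脑子进煎鱼了")
	fmt.Println(len(v), cap(v)) // 18 18

	fmt.Println(bytesToString([]byte("abc"))) // abc
}
```

Here the capacity equals the length, so the garbage capacity seen with the forced cast does not occur.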

Summary

In this article, we introduced the runtime representations of string and slice, namely StringHeader and SliceHeader.

With those representations understood, we also explained where their addresses point and the common pitfalls of both.

Finally, we went a step further and analyzed zero-copy conversion and its performance.

Have you ever run into doubts or problems in this area? Everyone is welcome to discuss!

If you have any questions, feel free to leave feedback and discuss in the comment area. Your likes are the greatest motivation for fried fish to keep writing.

Articles are continually updated. You can search for [fried] on WeChat to read them. This article is already included in the GitHub repository github.com/eddycjy/blog. For learning Go, see the Go Learning Map and Directions. Stars are welcome.

