
Original link: How to concatenate strings efficiently in Go (a comparative analysis of 6 methods)

Preface

Hello, everyone, my name is asong

String concatenation is unavoidable in day-to-day business development, and different languages implement strings in different ways. Go provides 6 ways to concatenate strings. How do you choose between them, and which one is the most efficient? Let's analyze that together in this article.

This article uses Go version 1.17.1.

string type

Let's first look at how Go defines the string type, starting with the official definition:

// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string

A string is a sequence of 8-bit bytes, usually but not necessarily representing UTF-8 encoded text. A string can be empty but cannot be nil, and its value cannot be changed.

Under the hood, the runtime represents a string as a structure, defined as follows:
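To make the "sequence of bytes" point concrete, here is a minimal sketch. The helper name byteAndRuneCount is made up for illustration; it only uses the built-in len and a []rune conversion to contrast the byte length with the number of characters.

```go
package main

import "fmt"

// byteAndRuneCount reports how many bytes and how many Unicode code
// points (runes) the string s contains. (Hypothetical helper.)
func byteAndRuneCount(s string) (int, int) {
	return len(s), len([]rune(s))
}

func main() {
	for _, s := range []string{"asong", "真帅"} {
		b, r := byteAndRuneCount(s)
		// len counts bytes, so multi-byte UTF-8 characters count more than once.
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

For ASCII text the two counts match; for the Chinese literal each character occupies three bytes in UTF-8.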

type stringStruct struct {
    str unsafe.Pointer
    len int
}

stringStruct is very similar to a slice. The str field points to the first element of an array, and len is the length of that array. Since it resembles a slice so closely, it must also point to an underlying array, but what kind of array? Let's look at a function the runtime calls when creating a string:

//go:nosplit
func gostringnocopy(str *byte) string {
    ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
    s := *(*string)(unsafe.Pointer(&ss))
    return s
}

The parameter is a pointer to a byte, which tells us that the underlying storage of a string is an array of bytes.

A string is essentially a byte array. In Go, as in many other languages, the string type is designed to be immutable. Immutability also helps under concurrency: the same string can be used many times without locks, so it can be shared efficiently without safety concerns.

Although a string cannot be modified, it can be replaced: the str pointer in stringStruct can be pointed at different memory, but the content it points to cannot change. In other words, every change to a string requires allocating new memory, and the previously allocated space is later reclaimed by the GC.

That covers what we need to know about the string type for the analysis of string concatenation below.
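A small sketch of what "replace, not modify" means in practice. The function replaceFirstByte is a hypothetical helper, not something from the standard library; it shows that changing a string's content means building a new string while the original stays intact.

```go
package main

import "fmt"

// replaceFirstByte returns a copy of s whose first byte is replaced by c.
// The input string is never modified. (Hypothetical helper.)
func replaceFirstByte(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // allocates a new byte slice and copies s into it
	b[0] = c
	return string(b) // copies again to produce a new, immutable string
}

func main() {
	original := "asong"
	changed := replaceFirstByte(original, 'A')
	// Writing original[0] = 'A' would not compile: strings cannot be
	// modified in place.
	fmt.Println(original, changed)
}
```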

6 ways to concatenate strings and how they work

Native concatenation with "+"

Go language natively supports the use of the + operator to directly concatenate two strings. The usage example is as follows:

var s string
s += "asong"
s += "真帅"

This method is the easiest to use, and practically every language provides it. When + is used, the runtime adds up the lengths of the operands, allocates new memory for the result, and copies both original strings into it.
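As an illustration of why this becomes costly in a loop, consider the following sketch (concatPlus is a made-up name). Each += produces a new string, so the bytes accumulated so far are copied again on every iteration.

```go
package main

import "fmt"

// concatPlus joins parts with the + operator. (Hypothetical helper.)
// Every += builds a brand-new string, so the work grows with the
// length of everything concatenated so far.
func concatPlus(parts []string) string {
	var s string
	for _, p := range parts {
		s += p
	}
	return s
}

func main() {
	fmt.Println(concatPlus([]string{"as", "ong", "!"}))
}
```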

String formatting function fmt.Sprintf

fmt.Sprintf is Go's standard string formatting function, so it can also be used for concatenation:

str := "asong"
str = fmt.Sprintf("%s%s", str, str)

fmt.Sprintf parses the format string at run time and takes its arguments as interface values, relying on reflection for types it cannot handle directly. The source code is not analyzed in detail here for reasons of space, but that machinery costs performance, as you might expect!
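A short sketch of where fmt.Sprintf fits: it works for plain concatenation, but it earns its cost mainly when values of different types need to be formatted into one string. The helper name concatSprintf is an assumption for this example.

```go
package main

import "fmt"

// concatSprintf joins two strings with fmt.Sprintf. (Hypothetical helper.)
func concatSprintf(a, b string) string {
	return fmt.Sprintf("%s%s", a, b)
}

func main() {
	fmt.Println(concatSprintf("asong", "asong"))

	// Formatting mixed types is the case Sprintf is really meant for.
	label := fmt.Sprintf("%s-%d", "asong", 2020)
	fmt.Println(label)
}
```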

strings.Builder

Go's strings package specializes in manipulating strings. It offers strings.Builder for concatenation, which provides the WriteString method for appending strings. The usage is as follows:

var builder strings.Builder
builder.WriteString("asong")
builder.String()

The structure of strings.Builder is very simple:

type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte // 1
}

The addr field is mainly used by copyCheck, and the buf field is a byte slice that stores the string content. The WriteString() method simply appends data to buf:

func (b *Builder) WriteString(s string) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, s...)
    return len(s), nil
}

The String method converts the []byte to a string. To avoid a memory copy, it uses an unsafe cast instead of a regular conversion:

func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))
}
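Putting the pieces together, a typical use of strings.Builder might look like the sketch below. The function concatBuilder is a hypothetical helper; it calls Grow with the total length first so that the internal buffer needs to be allocated only once.

```go
package main

import (
	"fmt"
	"strings"
)

// concatBuilder joins parts using strings.Builder. (Hypothetical helper.)
// The final size is reserved up front so append never has to regrow buf.
func concatBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}

	var b strings.Builder
	b.Grow(n)
	for _, p := range parts {
		b.WriteString(p) // the returned error is always nil
	}
	return b.String()
}

func main() {
	fmt.Println(concatBuilder([]string{"asong", "真帅"}))
}
```

Note that a Builder must not be copied after first use; copyCheck panics if it detects that.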

bytes.Buffer

Because a string is backed by a byte array, we can also concatenate strings with bytes.Buffer from Go's bytes package. bytes.Buffer is a buffer of bytes. The usage is as follows:

buf := new(bytes.Buffer)
buf.WriteString("asong")
buf.String()

Under the hood, bytes.Buffer is also a []byte slice. The structure is as follows:

type Buffer struct {
    buf      []byte // contents are the bytes buf[off : len(buf)]
    off      int    // read at &buf[off], write at &buf[len(buf)]
    lastRead readOp // last read operation, so that Unread* can work correctly.
}

bytes.Buffer lets you keep writing data at the tail and reading data from the head, so the off field records the read position, while the write position is known from the slice itself. That is not the focus here; what matters is how the WriteString method concatenates strings:

func (b *Buffer) WriteString(s string) (n int, err error) {
    b.lastRead = opInvalid
    m, ok := b.tryGrowByReslice(len(s))
    if !ok {
        m = b.grow(len(s))
    }
    return copy(b.buf[m:], s), nil
}

The buffer does not allocate memory when it is created; it allocates only when data is first written. The first allocation is sized to the data being written, with a minimum of 64 bytes if the data is smaller than that. After that, the buffer grows dynamically like a slice, and appended strings are copied to the end with copy, the built-in copy function, which helps reduce memory allocations.

However, when the []byte is converted to a string, a standard conversion is used, so a memory allocation occurs:

func (b *Buffer) String() string {
    if b == nil {
        // Special case, useful in debugging.
        return "<nil>"
    }
    return string(b.buf[b.off:])
}
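For comparison with strings.Builder, here is a minimal sketch of the same job done with bytes.Buffer (concatBuffer is a made-up name). The writes behave similarly, but the final String() call copies the bytes into a newly allocated string.

```go
package main

import (
	"bytes"
	"fmt"
)

// concatBuffer joins parts using bytes.Buffer. (Hypothetical helper.)
func concatBuffer(parts []string) string {
	var buf bytes.Buffer
	for _, p := range parts {
		buf.WriteString(p) // the returned error is always nil
	}
	// string(b.buf[b.off:]) inside String() allocates and copies.
	return buf.String()
}

func main() {
	fmt.Println(concatBuffer([]string{"asong", "真帅"}))
}
```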

strings.Join

strings.Join concatenates a slice of strings into a single string, with a separator of your choice. It is used as follows:

baseSlice := []string{"asong", "真帅"}
strings.Join(baseSlice, "")

strings.Join is also implemented on top of strings.Builder. The code is as follows:

func Join(elems []string, sep string) string {
    switch len(elems) {
    case 0:
        return ""
    case 1:
        return elems[0]
    }
    n := len(sep) * (len(elems) - 1)
    for i := 0; i < len(elems); i++ {
        n += len(elems[i])
    }

    var b Builder
    b.Grow(n)
    b.WriteString(elems[0])
    for _, s := range elems[1:] {
        b.WriteString(sep)
        b.WriteString(s)
    }
    return b.String()
}

The only difference is that Join calls b.Grow(n) first to allocate capacity up front. The value n computed above is the total length of the result; because the input slice is fixed, the required capacity is known in advance, which reduces memory allocations and makes Join very efficient.
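The separator argument is what distinguishes Join from the other methods in everyday use. A small sketch, where joinComma is a hypothetical wrapper added only for this example:

```go
package main

import (
	"fmt"
	"strings"
)

// joinComma joins parts with ", " between elements. (Hypothetical helper.)
func joinComma(parts []string) string {
	return strings.Join(parts, ", ")
}

func main() {
	parts := []string{"asong", "golang", "dream"}

	// An empty separator gives plain concatenation.
	fmt.Println(strings.Join(parts, ""))

	// A non-empty separator is inserted between elements only.
	fmt.Println(joinComma(parts))
}
```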

Slice append

Because a string is backed by a byte array, we can also declare a byte slice and use append for concatenation. The usage is as follows:

buf := make([]byte, 0)
base := "asong"
buf = append(buf, base...)
s := string(buf) // convert the accumulated bytes, not base

If you want to avoid the memory allocation when converting the []byte to a string, consider using an unsafe cast.
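One way such a cast could look is sketched below. It borrows the same pointer trick that strings.Builder.String uses in Go 1.17; the names bytesToString and concatAppend are assumptions for this example, and the technique is only safe if the byte slice is never modified afterwards.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying.
// The caller must not modify b afterwards, otherwise the "immutable"
// string would change. (Hypothetical helper.)
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// concatAppend joins parts by appending to a pre-sized byte slice.
// (Hypothetical helper.)
func concatAppend(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	buf := make([]byte, 0, n) // reserve the final size up front
	for _, p := range parts {
		buf = append(buf, p...)
	}
	return bytesToString(buf)
}

func main() {
	fmt.Println(concatAppend([]string{"asong", "真帅"}))
}
```

Since buf never escapes concatAppend in any other form, nothing can modify the bytes behind the returned string.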

Benchmark comparison

We have covered 6 methods above and now broadly understand how they work, so let's use Go's benchmark tooling to see which one is the most efficient. We analyze two situations:

  • Concatenation of a small number of strings
  • Concatenation of a large number of strings

Because there is a lot of code, only the results are posted below. The full code has been uploaded to GitHub: https://github.com/asong2020/Golang_Dream/tree/master/code_demo/string_join

We first define a basic string:

var base  = "123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASFGHJKLZXCVBNM"

To test a small number of concatenations, we concatenate once, joining base with base, and get the following benchmark result:

goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/once
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16           21338802                49.19 ns/op          128 B/op          1 allocs/op
BenchmarkSprintfString-16        7887808               140.5 ns/op           160 B/op          3 allocs/op
BenchmarkBuilderString-16       27084855                41.39 ns/op          128 B/op          1 allocs/op
BenchmarkBytesBuffString-16      9546277               126.0 ns/op           384 B/op          3 allocs/op
BenchmarkJoinstring-16          24617538                48.21 ns/op          128 B/op          1 allocs/op
BenchmarkByteSliceString-16     10347416               112.7 ns/op           320 B/op          3 allocs/op
PASS
ok      asong.cloud/Golang_Dream/code_demo/string_join/once     8.412s
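The author's benchmark code lives in the repository linked above. As a rough idea of the shape of such a test, here is a sketch for two of the six methods; it is not the author's code, and the names plusOnce, builderOnce, benchmarkPlus and benchmarkBuilder are made up. It uses testing.Benchmark so it can run as an ordinary program instead of through go test.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var base = "123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASFGHJKLZXCVBNM"

// sink keeps results alive so the compiler cannot discard the work.
var sink string

// plusOnce concatenates base with itself using +.
func plusOnce() string {
	return base + base
}

// builderOnce does the same with strings.Builder and a Grow call.
func builderOnce() string {
	var b strings.Builder
	b.Grow(2 * len(base))
	b.WriteString(base)
	b.WriteString(base)
	return b.String()
}

func benchmarkPlus(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = plusOnce()
	}
}

func benchmarkBuilder(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = builderOnce()
	}
}

func main() {
	for name, fn := range map[string]func(*testing.B){
		"plus":    benchmarkPlus,
		"builder": benchmarkBuilder,
	} {
		r := testing.Benchmark(fn)
		fmt.Println(name, r.String(), r.MemString())
	}
}
```

The numbers it prints depend on the machine and Go version, so they will not match the figures above exactly.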

To test a large number of concatenations, we first construct a string slice of length 200:

var baseSlice []string
for i := 0; i < 200; i++ {
        baseSlice = append(baseSlice, base)
}

Then we iterate over this slice and keep concatenating, which gives the following benchmark result:

goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/muliti
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16                       7396            163612 ns/op         1277713 B/op        199 allocs/op
BenchmarkSprintfString-16                   5946            202230 ns/op         1288552 B/op        600 allocs/op
BenchmarkBuilderString-16                 262525              4638 ns/op           40960 B/op          1 allocs/op
BenchmarkBytesBufferString-16             183492              6568 ns/op           44736 B/op          9 allocs/op
BenchmarkJoinstring-16                    398923              3035 ns/op           12288 B/op          1 allocs/op
BenchmarkByteSliceString-16               144554              8205 ns/op           60736 B/op         15 allocs/op
PASS
ok      asong.cloud/Golang_Dream/code_demo/string_join/muliti   10.699s

Conclusion

The two benchmarks show that when only a few strings are concatenated, using the + operator directly is still quite efficient, but as the number of strings grows, the performance of + becomes relatively poor. fmt.Sprintf is not well suited to concatenation: no matter how many strings are involved, its overhead is large.

strings.Builder performs consistently well for both small and large numbers of strings, which is why it is the officially recommended way to concatenate strings in Go. When using strings.Builder, it is best to call the Grow method to allocate capacity up front. The strings.Join benchmark shows the effect: because Grow is called, memory is allocated in advance and no reallocation is needed while the strings are being written. Used this way, strings.Builder has the best performance and the lowest memory consumption.

bytes.Buffer is slower than strings.Builder. When a bytes.Buffer is converted to a string, new memory is allocated to hold the resulting string, whereas strings.Builder converts its underlying []byte directly to a string, so bytes.Buffer uses more space.

Summary of the analysis:

strings.Builder is efficient in every situation, but to get the most out of it, remember to call Grow to allocate capacity up front. strings.Join performs about the same as strings.Builder; use it when the strings are already in a slice, but not otherwise, since constructing the slice has its own cost. For a small number of concatenations, the + operator is the most convenient and its performance is close to the best, so strings.Builder can be skipped.

Comprehensive comparison performance ranking:

strings.Join ≈ strings.Builder > bytes.Buffer > []byte converted to string > "+" > fmt.Sprintf

Summary

This article introduced 6 ways to concatenate strings and compared their efficiency with benchmarks. Using strings.Builder is never a bad choice, but for a small number of concatenations, plain + is the better way. Analyze your specific business scenario and do not generalize.

The code in this article has been uploaded to GitHub: https://github.com/asong2020/Golang_Dream/tree/master/code_demo/string_join

That is the end of this article. I am asong, see you in the next issue.

You are welcome to follow the public account: Golang DreamWorks

