Original link: How to concatenate strings efficiently in Go (a comparative analysis of 6 ways)
Preface
Hello everyone, I'm asong. String concatenation is everywhere in day-to-day development, and different languages implement strings in different ways. Go offers six ways to concatenate strings. How do we choose among them, and which one is more efficient? This article works through those questions. The Go version used in this article is 1.17.1.
string type
Let's first look at how the string type is defined in Go, starting with the official definition:
// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string
A string is a sequence of 8-bit bytes, usually but not necessarily representing UTF-8 encoded text. A string can be empty, but it cannot be nil, and its value cannot be changed.
In the runtime, the string type is essentially a structure, defined as follows:
type stringStruct struct {
	str unsafe.Pointer
	len int
}
stringStruct is very similar to a slice: str points to the first element of an array, and len is the length of that array. If it resembles a slice so closely and also points to an underlying array, what kind of array is it? Let's look at a function the runtime calls when it constructs a string:
//go:nosplit
func gostringnocopy(str *byte) string {
	ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
	s := *(*string)(unsafe.Pointer(&ss))
	return s
}
The parameter is a pointer to a byte, which shows that a string is backed by an array of bytes. In short, the string type is essentially a byte array.
In Go the string type is designed to be immutable, as it is in many other languages. Immutability pays off under concurrency: the same string can be used in many places without a lock, with no safety concerns and with efficient sharing.
Although a string cannot be modified, it can be replaced: the str pointer in stringStruct can be pointed somewhere else, but the content it points to cannot be changed. In other words, every change to a string requires allocating new memory, and the previously allocated space is later reclaimed by the GC.
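To make the "replace, not modify" point concrete, here is a minimal sketch. The helper name replaceFirstByte is hypothetical and used only for illustration. It shows that changing a string means copying its bytes into new memory while the original value stays untouched.

```go
package main

import "fmt"

// replaceFirstByte (hypothetical helper) returns a new string whose first
// byte is c. The input is never modified: []byte(s) copies the bytes.
func replaceFirstByte(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // a copy of the string's bytes in new memory
	b[0] = c       // mutate the copy only
	return string(b)
}

func main() {
	original := "asong"
	changed := replaceFirstByte(original, 'A')
	fmt.Println(original) // asong
	fmt.Println(changed)  // Asong
}
```

Writing original[0] = 'A' directly does not compile, which is the language-level side of the same immutability rule.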
That is all we need to know about the string type for the analysis of string concatenation that follows.
6 ways to concatenate strings and how they work
Native concatenation with "+"
Go natively supports using the + operator to concatenate two strings directly. For example:
var s string
s += "asong"
s += "真帅"
This is the easiest method to use, and virtually every language provides it. When + is used, the runtime goes over the operands, computes the total length, and allocates new space to hold the contents of the two original strings.
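As a small illustration of what this means in a loop, the sketch below uses a hypothetical helper, concatPlus, that joins a slice with +=. Each iteration generally produces a brand-new string containing everything accumulated so far, which is why this approach gets slower as the number of pieces grows.

```go
package main

import "fmt"

// concatPlus (hypothetical helper) joins parts with the + operator.
// Each += generally allocates a new string and copies the accumulated
// result plus the new part into it.
func concatPlus(parts []string) string {
	var s string
	for _, p := range parts {
		s += p
	}
	return s
}

func main() {
	fmt.Println(concatPlus([]string{"a", "b", "c"})) // abc
}
```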
String formatting function fmt.Sprintf
Go uses the function fmt.Sprintf for string formatting by default, so it can also be used to concatenate strings:
str := "asong"
str = fmt.Sprintf("%s%s", str, str)
fmt.Sprintf parses the format string at run time and receives its arguments as interface values, relying on reflection for types it cannot handle directly. For reasons of space the source code is not analyzed in detail here, but this extra work comes with a performance cost.
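For comparison, here is a minimal sketch of the same result built two ways. The helper names labelSprintf and labelPlus are hypothetical. fmt.Sprintf is convenient when values of different types are mixed, while plain concatenation with strconv avoids the formatting machinery.

```go
package main

import (
	"fmt"
	"strconv"
)

// labelSprintf (hypothetical helper) builds "name-id" with fmt.Sprintf.
func labelSprintf(name string, id int) string {
	return fmt.Sprintf("%s-%d", name, id)
}

// labelPlus (hypothetical helper) builds the same string with + and
// strconv.Itoa, without going through format-string parsing.
func labelPlus(name string, id int) string {
	return name + "-" + strconv.Itoa(id)
}

func main() {
	fmt.Println(labelSprintf("asong", 7)) // asong-7
	fmt.Println(labelPlus("asong", 7))    // asong-7
}
```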
strings.Builder
Go's standard library has a strings package dedicated to string manipulation. Its strings.Builder type can be used for concatenation, and it provides the WriteString method for appending strings. The usage is as follows:
var builder strings.Builder
builder.WriteString("asong")
builder.String()
The implementation of strings.Builder is very simple. Its structure is as follows:
type Builder struct {
	addr *Builder // of receiver, to detect copies by value
	buf  []byte   // holds the accumulated bytes
}
The addr field is mainly used by copyCheck, and the buf field is a byte slice that stores the string content. The WriteString() method simply appends data to buf:
func (b *Builder) WriteString(s string) (int, error) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
	return len(s), nil
}
The String method converts the []byte to the string type. To avoid a memory copy, it uses an unsafe cast instead of the standard conversion:
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}
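Putting the pieces together, a typical use of strings.Builder in a loop might look like the sketch below. The helper name concatBuilder is hypothetical. The builder appends into one growing byte slice and converts it to a string only once, at the end.

```go
package main

import (
	"fmt"
	"strings"
)

// concatBuilder (hypothetical helper) appends every part to a single
// strings.Builder and converts the buffer to a string once at the end.
func concatBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p) // the returned error is always nil
	}
	return b.String()
}

func main() {
	fmt.Println(concatBuilder([]string{"asong", "-", "go"})) // asong-go
}
```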
bytes.Buffer
Because a string is backed by a byte array, we can also concatenate strings with bytes.Buffer. In Go, bytes.Buffer is a buffer of bytes that stores everything written to it. The usage is as follows:
buf := new(bytes.Buffer)
buf.WriteString("asong")
buf.String()
Underneath, bytes.Buffer is also a []byte slice. Its structure is as follows:
type Buffer struct {
	buf      []byte // contents are the bytes buf[off : len(buf)]
	off      int    // read at &buf[off], write at &buf[len(buf)]
	lastRead readOp // last read operation, so that Unread* can work correctly.
}
Because bytes.Buffer lets you keep writing data at the tail of the buffer while reading data from its head, the off field records the read position, and the write position is the end of the slice, len(buf), with spare capacity reused by reslicing. That is not the focus here. What matters is how the WriteString method concatenates strings:
func (b *Buffer) WriteString(s string) (n int, err error) {
	b.lastRead = opInvalid
	m, ok := b.tryGrowByReslice(len(s))
	if !ok {
		m = b.grow(len(s))
	}
	return copy(b.buf[m:], s), nil
}
The underlying slice is not allocated when the Buffer is created. Memory is allocated only when data is first written, and the first allocation is sized to the data being written. If that data is smaller than 64 bytes, 64 bytes are allocated. After that the buffer grows dynamically like a slice, and each appended string is copied to the end with copy, a built-in function that does not allocate on its own.
But when the []byte is converted to the string type, the standard conversion is still used, so a memory allocation occurs:
func (b *Buffer) String() string {
	if b == nil {
		// Special case, useful in debugging.
		return "<nil>"
	}
	return string(b.buf[b.off:])
}
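The following sketch shows both behaviours described above. The helper names concatBuffer and readThenRest are hypothetical. The first helper concatenates with WriteString, and the second reads a few bytes from the head so that String() returns only the unread part, which is the effect of the off field.

```go
package main

import (
	"bytes"
	"fmt"
)

// concatBuffer (hypothetical helper) writes every part into a bytes.Buffer.
// String() copies the unread bytes into a newly allocated string.
func concatBuffer(parts []string) string {
	var buf bytes.Buffer
	for _, p := range parts {
		buf.WriteString(p)
	}
	return buf.String()
}

// readThenRest (hypothetical helper) reads n bytes from the head of the
// buffer and returns them together with what is still unread.
func readThenRest(s string, n int) (head, rest string) {
	var buf bytes.Buffer
	buf.WriteString(s)
	head = string(buf.Next(n)) // advances the read offset by up to n bytes
	return head, buf.String()
}

func main() {
	fmt.Println(concatBuffer([]string{"asong", "-", "go"})) // asong-go
	head, rest := readThenRest("asong", 2)
	fmt.Println(head, rest) // as ong
}
```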
strings.Join
The strings.Join function concatenates a slice of strings into a single string, with a separator of your choice placed between the elements. It is used as follows:
baseSlice := []string{"asong", "真帅"}
strings.Join(baseSlice, "")
strings.Join is itself implemented on top of strings.Builder. The code is as follows:
func Join(elems []string, sep string) string {
	switch len(elems) {
	case 0:
		return ""
	case 1:
		return elems[0]
	}
	n := len(sep) * (len(elems) - 1)
	for i := 0; i < len(elems); i++ {
		n += len(elems[i])
	}
	var b Builder
	b.Grow(n)
	b.WriteString(elems[0])
	for _, s := range elems[1:] {
		b.WriteString(sep)
		b.WriteString(s)
	}
	return b.String()
}
The only difference is that Join calls b.Grow(n) before writing. This pre-allocates capacity: n, computed above, is the total length of the final string. Because the incoming slice is fixed, the capacity can be allocated in advance, which reduces memory allocations and is very efficient.
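As a usage sketch, the example below calls strings.Join with an empty separator and with a comma. The helper name csvLine is hypothetical. In both cases the result length is known before any byte is written, which is what makes the single up-front allocation possible.

```go
package main

import (
	"fmt"
	"strings"
)

// csvLine (hypothetical helper) joins fields with a comma. The result
// length is the sum of the field lengths plus one separator per gap.
func csvLine(fields []string) string {
	return strings.Join(fields, ",")
}

func main() {
	parts := []string{"asong", "go", "dream"}
	fmt.Println(strings.Join(parts, "")) // asonggodream
	fmt.Println(csvLine(parts))          // asong,go,dream
}
```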
Slice append
Because a string is backed by a byte array, we can declare a byte slice and use append to concatenate strings. The usage is as follows:
buf := make([]byte, 0)
base := "asong"
buf = append(buf, base...)
string(buf)
If you want to avoid the memory allocation that happens when the []byte is converted to a string, consider using an unsafe cast.
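A minimal sketch of both conversions follows. The helper names concatBytes and concatBytesNoCopy are hypothetical, and pre-sizing the slice is an addition for illustration, not something the original snippet does. The unsafe variant reinterprets the slice header as a string header, in the same way strings.Builder does in Go 1.17, so it is only safe if the byte slice is never modified afterwards.

```go
package main

import (
	"fmt"
	"unsafe"
)

// appendAll (hypothetical helper) copies every part into one byte slice
// whose capacity is sized up front.
func appendAll(parts []string) []byte {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	buf := make([]byte, 0, n)
	for _, p := range parts {
		buf = append(buf, p...)
	}
	return buf
}

// concatBytes uses the standard conversion, which copies the bytes.
func concatBytes(parts []string) string {
	return string(appendAll(parts))
}

// concatBytesNoCopy reinterprets the slice as a string without copying.
// Assumption: nothing writes to the byte slice after this point.
func concatBytesNoCopy(parts []string) string {
	buf := appendAll(parts)
	return *(*string)(unsafe.Pointer(&buf))
}

func main() {
	parts := []string{"asong", "-", "go"}
	fmt.Println(concatBytes(parts))       // asong-go
	fmt.Println(concatBytesNoCopy(parts)) // asong-go
}
```

The cast trades safety for one fewer allocation, so it is best kept inside a small helper where the byte slice cannot escape.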
Benchmark comparison
We have now covered all six methods and how they basically work. Next we use Go's built-in benchmark support to find out which one is more efficient, looking at two situations:
- Concatenation of a small number of strings
- Concatenation of a large number of strings
Because there is a lot of code, only the results are shown below. The full code has been uploaded to GitHub: https://github.com/asong2020/Golang_Dream/tree/master/code_demo/string_join
We first define a basic string:
var base = "123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASFGHJKLZXCVBNM"
To test a small number of strings, we concatenate just once, appending base to base, which gives the following benchmark results:
goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/once
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16 21338802 49.19 ns/op 128 B/op 1 allocs/op
BenchmarkSprintfString-16 7887808 140.5 ns/op 160 B/op 3 allocs/op
BenchmarkBuilderString-16 27084855 41.39 ns/op 128 B/op 1 allocs/op
BenchmarkBytesBuffString-16 9546277 126.0 ns/op 384 B/op 3 allocs/op
BenchmarkJoinstring-16 24617538 48.21 ns/op 128 B/op 1 allocs/op
BenchmarkByteSliceString-16 10347416 112.7 ns/op 320 B/op 3 allocs/op
PASS
ok asong.cloud/Golang_Dream/code_demo/string_join/once 8.412s
To test concatenation of a large number of strings, we first build a string slice with a length of 200:
var baseSlice []string
for i := 0; i < 200; i++ {
	baseSlice = append(baseSlice, base)
}
Then we iterate over this slice and concatenate each element in turn, which gives the following benchmark results:
goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/muliti
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16 7396 163612 ns/op 1277713 B/op 199 allocs/op
BenchmarkSprintfString-16 5946 202230 ns/op 1288552 B/op 600 allocs/op
BenchmarkBuilderString-16 262525 4638 ns/op 40960 B/op 1 allocs/op
BenchmarkBytesBufferString-16 183492 6568 ns/op 44736 B/op 9 allocs/op
BenchmarkJoinstring-16 398923 3035 ns/op 12288 B/op 1 allocs/op
BenchmarkByteSliceString-16 144554 8205 ns/op 60736 B/op 15 allocs/op
PASS
ok asong.cloud/Golang_Dream/code_demo/string_join/muliti 10.699s
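The actual benchmark functions live in the repository linked above. As a rough idea of how such a measurement can be set up, here is a self-contained sketch that uses testing.Benchmark from a normal program. The helper names makeSlice, sumString and builderString are hypothetical and are not the author's code, and the numbers it prints will differ from machine to machine.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var base = "123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASFGHJKLZXCVBNM"

// makeSlice (hypothetical helper) builds a slice holding n copies of base.
func makeSlice(n int) []string {
	s := make([]string, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, base)
	}
	return s
}

// sumString concatenates with the + operator.
func sumString(parts []string) string {
	var s string
	for _, p := range parts {
		s += p
	}
	return s
}

// builderString concatenates with strings.Builder, without Grow.
func builderString(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := makeSlice(200)
	cases := []struct {
		name string
		fn   func([]string) string
	}{
		{"sumString", sumString},
		{"builderString", builderString},
	}
	for _, c := range cases {
		fn := c.fn
		res := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				_ = fn(parts)
			}
		})
		// Prints iterations, ns/op, B/op and allocs/op for each method.
		fmt.Println(c.name, res.String(), res.MemString())
	}
}
```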
Conclusion
The two benchmarks show that when only a few strings are concatenated, using the + operator directly is still quite efficient, but as the number of strings grows, the performance of + becomes relatively poor. fmt.Sprintf is not well suited to concatenation: no matter how many strings are involved, its performance cost is large. strings.Builder performs consistently whether there are few strings or many, which is why it is the generally recommended way to concatenate strings in Go. When using strings.Builder, it is best to call the Grow method to pre-allocate capacity. Looking at the strings.Join benchmark, because Grow is used and the memory is allocated in advance, the buffer does not have to be copied or reallocated during concatenation, so strings.Builder used this way gives the best performance and the lowest memory consumption. bytes.Buffer is slower than strings.Builder: when a bytes.Buffer is converted to a string, new space is allocated to hold the resulting string, whereas strings.Builder returns the underlying []byte directly as a string, so bytes.Buffer takes up more space.
To sum up the analysis:
For concatenating many strings, strings.Builder is the most efficient choice, but to get that efficiency, remember to call Grow to pre-allocate capacity. strings.Join performs about the same as strings.Builder. Use it when the strings are already in a slice, and avoid it otherwise, because building the slice has its own performance cost. For a small number of strings, the + operator is the most convenient and its performance is close to the best, so strings.Builder is not needed.
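Here is a minimal sketch of the Grow advice, using a hypothetical helper named concatBuilderGrow. It computes the final length first and reserves that capacity, so the loop of WriteString calls does not need to reallocate the buffer.

```go
package main

import (
	"fmt"
	"strings"
)

// concatBuilderGrow (hypothetical helper) sums the lengths of all parts,
// reserves that much capacity with Grow, and then appends each part.
func concatBuilderGrow(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n) // reserve room for the whole result up front
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(concatBuilderGrow([]string{"asong", "-", "go"})) // asong-go
}
```

When the total length is not known in advance, a rough upper bound passed to Grow can still reduce the number of reallocations.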
Overall performance ranking:
strings.Join ≈ strings.Builder > bytes.Buffer > []byte converted to string > "+" > fmt.Sprintf
Summary
This article introduced six ways to concatenate strings and compared their efficiency with benchmarks. Using strings.Builder is never a bad choice, but for a small number of strings, plain + is the better option. Analyze each business scenario on its own terms and do not generalize.
The code in this article has been uploaded to GitHub: https://github.com/asong2020/Golang_Dream/tree/master/code_demo/string_join
That is the end of this article. I'm asong, see you next time.
You are welcome to follow the official account: Golang DreamWorks