Features

You can find the definition and description of the built-in string type in the standard library file src/builtin/builtin.go:

// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string

From this we can see that a string is a sequence of 8-bit bytes that usually, but not necessarily, holds UTF-8 encoded text. A string can be empty (length 0) but can never be nil, and a string value cannot be modified once it has been created.
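The properties above can be checked with a small program. This is only a sketch: upperFirst is a hypothetical helper written for this illustration, not part of the standard library, and it handles ASCII letters only.

```go
package main

import "fmt"

// upperFirst returns a copy of s with its first byte upper-cased (ASCII only).
// Hypothetical helper: writing s[0] = 'H' would not compile, because string
// values are immutable, so the change is made on a []byte copy instead.
func upperFirst(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // copies the bytes of s
	if b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b)
}

func main() {
	var empty string // the zero value is "", never nil
	fmt.Println(len(empty), empty == "")

	// len counts bytes, not characters: é occupies two bytes in UTF-8.
	s := "héllo"
	fmt.Println(len(s), len([]rune(s)))

	// A string may also hold bytes that are not valid UTF-8.
	raw := "\xff\xfe"
	fmt.Println(len(raw))

	// The original string is left untouched by upperFirst.
	orig := "gopher"
	fmt.Println(upperFirst(orig), orig)
}
```

Because a string cannot be changed in place, any "modification" has to produce a new string, which is why the helper goes through a []byte copy.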

String literals can be written with double quotation marks or with backticks. A string declared with double quotation marks behaves much like strings in other languages: the literal must fit on a single line, and special characters such as line breaks or double quotation marks have to be escaped with the \ symbol. A string declared with backticks is free of the single-line restriction, and special characters can be written directly inside it, which is very convenient when hand-writing JSON or other complex data formats.
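As a minimal sketch of the difference, the two constants below (names chosen for this example) hold the same JSON text, once as a double-quoted literal with escapes and once as a backtick literal:

```go
package main

import "fmt"

// Double-quoted literal: a single line, special characters are escaped.
const escaped = "{\n  \"name\": \"gopher\"\n}"

// Backtick literal: may span several lines, content is taken verbatim.
const raw = `{
  "name": "gopher"
}`

func main() {
	// Both literals describe the same sequence of bytes.
	fmt.Println(escaped == raw)

	// Inside backticks a backslash is just a backslash, so `\n` is two
	// bytes, while "\n" is a single newline byte.
	fmt.Println(len(`\n`), len("\n"))
}
```

One limitation worth noting: a backtick literal cannot itself contain a backtick.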

Implementation Principle

Data structure

The stringStruct type in src/runtime/string.go defines the data structure of a string:

type stringStruct struct {
    str unsafe.Pointer
    len int
}

The structure is very simple: the two fields hold the address of the first byte and the length of the string in bytes.

When the runtime builds a string, for example in gostringnocopy, it first fills in a stringStruct and then reinterprets it as a string. The code is as follows:

func gostringnocopy(str *byte) string {
    ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
    s := *(*string)(unsafe.Pointer(&ss))
    return s
}
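The same reinterpretation can be done in the opposite direction to look at the two fields of an existing string. The sketch below assumes the two-word layout shown above; header, headerOf and dataOffset are hypothetical names for this illustration, and real programs should not depend on this layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// header mirrors the layout of runtime.stringStruct. It is a hypothetical
// type used for inspection only.
type header struct {
	str unsafe.Pointer
	len int
}

// headerOf reinterprets a string variable as its two-word header.
func headerOf(s *string) *header {
	return (*header)(unsafe.Pointer(s))
}

// dataOffset reports how far sub's data pointer lies past base's.
// It is only meaningful when sub was sliced from base.
func dataOffset(base, sub string) uintptr {
	return uintptr(headerOf(&sub).str) - uintptr(headerOf(&base).str)
}

func main() {
	s := "hello, world"
	sub := s[7:] // slicing builds a new header, it does not copy the bytes

	fmt.Println(headerOf(&s).len, headerOf(&sub).len)
	fmt.Println(dataOffset(s, sub))

	// A string value is two machine words: pointer plus length.
	fmt.Println(unsafe.Sizeof(s) == 2*unsafe.Sizeof(uintptr(0)))
}
```

Since a substring only needs a new pointer and length, slicing a string is cheap no matter how long the string is.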

Related operations

String concatenation

In the runtime package, the concatstrings function is used to concatenate strings. All strings to be concatenated are organized into a slice and passed in. The core source code is as follows:

func concatstrings(buf *tmpBuf, a []string) string {
    // Compute the total length and the number of non-empty strings, to size the allocation
    idx := 0
    l := 0
    count := 0
    for i, x := range a {
        n := len(x)
        if n == 0 {
            continue
        }
        if l+n < l {
            throw("string concatenation too long")
        }
        l += n
        count++
        idx = i
    }
    if count == 0 {
        return ""
    }

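    // If there is only one non-empty string and either it is not on the stack
    // or the result does not escape the calling frame (buf != nil), return it directly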
    if count == 1 && (buf != nil || !stringDataOnStack(a[idx])) {
        return a[idx]
    }
    // Allocate memory and build a string and a slice that share it
    s, b := rawstringtmp(buf, l)
    // Copy the strings to be concatenated into the slice
    for _, x := range a {
        copy(b, x)
        b = b[len(x):]
    }
    // Return the concatenated string
    return s
}

Note that in the general case the runtime calls copy to copy every input string into the memory of the target string. When the strings being concatenated are very large, the cost of these copies cannot be ignored.
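One common way to limit this copying in user code is to size the destination once and write each piece a single time. The following is a rough sketch rather than a benchmark; joinNaive and joinBuilder are hypothetical helpers for this illustration, while strings.Builder is the standard library type.

```go
package main

import (
	"fmt"
	"strings"
)

// joinNaive concatenates in a loop. Each += builds a new string, so the
// bytes accumulated so far are copied again on every iteration.
func joinNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder computes the total length first, reserves it once, and then
// copies each part into the buffer a single time.
func joinBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var sb strings.Builder
	sb.Grow(n)
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	parts := []string{"go", "", "lang", "!"}
	// Both helpers produce the same result; they differ only in how much
	// copying happens along the way.
	fmt.Println(joinNaive(parts) == joinBuilder(parts))
}
```

The builder version follows the same idea as concatstrings: measure first, allocate once, then copy.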

Type conversion

When we use Go to parse and serialize data formats such as JSON, we often need to convert data back and forth between string and []byte.

Converting a byte slice to a string goes through the slicebytetostring function. The core source code is as follows:

func slicebytetostring(buf *tmpBuf, ptr *byte, n int) (str string) {
    // Special-case byte slices of length 0 or 1
    if n == 0 {
        return ""
    }
    if n == 1 {
        p := unsafe.Pointer(&staticuint64s[*ptr])
        if sys.BigEndian {
            p = add(p, 7)
        }
        stringStructOf(&str).str = p
        stringStructOf(&str).len = 1
        return
    }

    var p unsafe.Pointer
    // Use the caller's buffer if it is large enough; otherwise allocate memory for the new string
    if buf != nil && n <= len(buf) {
        p = unsafe.Pointer(buf)
    } else {
        p = mallocgc(uintptr(n), nil, false)
    }
    stringStructOf(&str).str = p
    stringStructOf(&str).len = n
    // Copy all the bytes of the original []byte into the new memory
    memmove(p, unsafe.Pointer(ptr), uintptr(n))
    return
}

Converting a string to []byte goes through the stringtoslicebyte function, whose implementation is easy to follow:

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
    var b []byte
    // If a buffer is passed in and is large enough, slice len(s) bytes from it; otherwise allocate a new slice
    if buf != nil && len(s) <= len(buf) {
        *buf = tmpBuf{}
        b = buf[:len(s)]
    } else {
        b = rawbyteslice(len(s))
    }
    // Copy the string into the slice
    copy(b, s)
    return b
}

Converting []byte to string happens in many places. For performance reasons, when the string is only needed temporarily, no copy is made: a string is returned directly whose pointer refers to the memory of the []byte. Even so, keep in mind that type conversion is not as cheap as it may seem, and it often becomes a performance hotspot in a program.


与昊