
Introduction

In development, we often need to create and destroy objects of the same type frequently, and such operations are likely to hurt performance. A common optimization in this situation is the object pool. When we need an object, we first look in the pool: if a free object is available, it is taken out of the pool and returned to the caller; a new object is actually created only when the pool has no free objects. Conversely, when an object is no longer needed, we do not destroy it but return it to the pool for later reuse. An object pool can greatly improve performance when objects are created and destroyed frequently. At the same time, to keep pooled objects from occupying too much memory, an object pool usually comes with a cleanup strategy. sync.Pool in the Go standard library is one such example: objects held in a sync.Pool are cleaned up by the garbage collector.
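As a minimal illustration of this pattern (not bytebufferpool code), here is a sketch that reuses byte slices through the standard library's sync.Pool; note that objects held in a sync.Pool may be reclaimed by the garbage collector at any time:

package main

import (
  "fmt"
  "sync"
)

// bufPool hands out reusable byte slices; New is only called
// when the pool has no free object.
var bufPool = sync.Pool{
  New: func() interface{} {
    return make([]byte, 0, 64)
  },
}

func main() {
  buf := bufPool.Get().([]byte)
  buf = append(buf, "hello"...)
  fmt.Println(string(buf))

  // reset the length and return the slice to the pool for later reuse
  bufPool.Put(buf[:0])
}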

Among such objects, a special case is the byte buffer (the underlying storage is usually a byte slice). When concatenating strings, we usually store the intermediate result in a byte buffer for efficiency, and generate the resulting string from the buffer once concatenation is complete. When sending and receiving network packets, incomplete packets also need to be staged temporarily in a byte buffer.

bytes.Buffer in the Go standard library wraps a byte slice and provides a convenient interface on top of it. We know that a slice's capacity is limited and it must grow when the capacity runs out; frequent growth easily causes performance jitter. bytebufferpool implements its own Buffer type and uses a simple algorithm to reduce the performance loss caused by growth. bytebufferpool is used in the well-known web framework fasthttp and the flexible Go templating library quicktemplate. In fact, all three libraries share the same author: valyala 😀.
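To see why growth matters, here is a small standalone illustration (plain append, nothing from bytebufferpool) that reports every time the underlying slice has to be reallocated:

package main

import "fmt"

func main() {
  b := make([]byte, 0, 64)
  prevCap := cap(b)
  // append one byte at a time and print whenever the slice grows
  for i := 0; i < 4096; i++ {
    b = append(b, 'x')
    if cap(b) != prevCap {
      fmt.Printf("reallocated: len=%d cap=%d -> %d\n", len(b), prevCap, cap(b))
      prevCap = cap(b)
    }
  }
}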

Quick start

The code in this article uses Go Modules.

Create a directory and initialize:

$ mkdir bytebufferpool && cd bytebufferpool
$ go mod init github.com/darjun/go-daily-lib/bytebufferpool

Install the bytebufferpool library:

$ go get -u github.com/valyala/bytebufferpool

Typical usage: first obtain a bytebufferpool.ByteBuffer object via the Get() function provided by bytebufferpool, then call the object's write methods to write data, and finally put the object back into the pool with bytebufferpool.Put() when you are done. Example:

package main

import (
  "fmt"

  "github.com/valyala/bytebufferpool"
)

func main() {
  b := bytebufferpool.Get()
  b.WriteString("hello")
  b.WriteByte(',')
  b.WriteString(" world!")

  fmt.Println(b.String())

  bytebufferpool.Put(b)
}

Get() and Put() can be called directly as functions of the bytebufferpool package; under the hood they operate on the package's default object pool:

// bytebufferpool/pool.go
var defaultPool Pool

func Get() *ByteBuffer { return defaultPool.Get() }
func Put(b *ByteBuffer) { defaultPool.Put(b) }

Of course, we can also create separate object pools as needed, grouping objects with the same purpose together (for example, one pool to assist in receiving network packets and another to assist in string concatenation):

func main() {
  joinPool := new(bytebufferpool.Pool)
  b := joinPool.Get()
  b.WriteString("hello")
  b.WriteByte(',')
  b.WriteString(" world!")

  fmt.Println(b.String())

  joinPool.Put(b)
}

bytebufferpool does not provide a dedicated constructor for Pool; a new pool can simply be created with new(bytebufferpool.Pool) (or by declaring a zero-value Pool), as shown above.

Optimization details

When an object is returned to the pool, it is handled according to the capacity of its current slice. bytebufferpool divides sizes into 20 intervals:

| < 2^6 | 2^6 ~ 2^7-1 | ... | ≥ 2^24 |

If the capacity is less than 2^6, the object belongs to the first interval; if it is between 2^6 and 2^7-1, it falls into the second interval; and so on (for example, a buffer with capacity 100 falls into the second interval). After enough objects have been returned, bytebufferpool recalibrates: it determines which interval has received the most objects and sets defaultSize to the upper-limit capacity of that interval. The upper-limit capacity of the first interval is 2^6, of the second 2^7, and of the last 2^25. On subsequent Get() calls, if there is no free object in the pool, the newly created object's capacity is set directly to defaultSize. This largely avoids slice growth during use and thus improves performance. Let's look at the code:

// bytebufferpool/pool.go
const (
  minBitSize = 6 // 2**6=64 is a CPU cache line size
  steps      = 20

  minSize = 1 << minBitSize
  maxSize = 1 << (minBitSize + steps - 1)

  calibrateCallsThreshold = 42000
  maxPercentile           = 0.95
)

type Pool struct {
  calls       [steps]uint64
  calibrating uint64

  defaultSize uint64
  maxSize     uint64

  pool sync.Pool
}

We can see that bytebufferpool internally uses sync.Pool from the standard library as the underlying object pool.

Here steps is the number of intervals mentioned above, 20 in total. The calls array records how many times returned objects fall into each interval.
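To make the intervals concrete, the following small snippet (using only the constants quoted above) prints the upper-limit capacity minSize << i of each interval; the first is 64 bytes and the last is 2^25 bytes:

package main

import "fmt"

func main() {
  const (
    minBitSize = 6
    steps      = 20
    minSize    = 1 << minBitSize
  )
  // interval i has upper-limit capacity minSize << i
  for i := 0; i < steps; i++ {
    fmt.Printf("interval %2d: up to %d bytes\n", i, minSize<<i)
  }
}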

When Pool.Put() is called to return an object, it first computes the interval that the buffer's data length (len(b.B)) falls into and increments the corresponding element of the calls array:

// bytebufferpool/pool.go
func (p *Pool) Put(b *ByteBuffer) {
  idx := index(len(b.B))

  if atomic.AddUint64(&p.calls[idx], 1) > calibrateCallsThreshold {
    p.calibrate()
  }

  maxSize := int(atomic.LoadUint64(&p.maxSize))
  if maxSize == 0 || cap(b.B) <= maxSize {
    b.Reset()
    p.pool.Put(b)
  }
}
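The index function used above maps a buffer's data length to one of the 20 intervals. Here is a sketch consistent with the interval scheme described earlier (see pool.go in the repository for the exact implementation):

// index returns the interval n falls into: 0 for n <= 64,
// 1 for 65..128, and so on, capped at steps-1.
func index(n int) int {
  n--
  n >>= minBitSize
  idx := 0
  for n > 0 {
    n >>= 1
    idx++
  }
  if idx >= steps {
    idx = steps - 1
  }
  return idx
}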

If an element of the calls array exceeds calibrateCallsThreshold = 42000 (meaning that since the last calibration, the number of objects returned into that interval has reached the threshold; 42000 is presumably an empirical value), Pool.calibrate() is called to perform the calibration:

// bytebufferpool/pool.go
func (p *Pool) calibrate() {
  // avoid concurrent Put calls triggering `calibrate` at the same time
  if !atomic.CompareAndSwapUint64(&p.calibrating, 0, 1) {
    return
  }

  // step 1. count and sort
  a := make(callSizes, 0, steps)
  var callsSum uint64
  for i := uint64(0); i < steps; i++ {
    calls := atomic.SwapUint64(&p.calls[i], 0)
    callsSum += calls
    a = append(a, callSize{
      calls: calls,
      size:  minSize << i,
    })
  }
  sort.Sort(a)

  // step 2. compute defaultSize and maxSize
  defaultSize := a[0].size
  maxSize := defaultSize

  maxSum := uint64(float64(callsSum) * maxPercentile)
  callsSum = 0
  for i := 0; i < steps; i++ {
    if callsSum > maxSum {
      break
    }
    callsSum += a[i].calls
    size := a[i].size
    if size > maxSize {
      maxSize = size
    }
  }

  // step 3. save the corresponding values
  atomic.StoreUint64(&p.defaultSize, defaultSize)
  atomic.StoreUint64(&p.maxSize, maxSize)

  atomic.StoreUint64(&p.calibrating, 0)
}

step 1. Count and sort

The calls array records how many times objects were returned into each interval. The entries are sorted by this count from largest to smallest. Note: minSize << i is the upper-limit capacity of the i-th interval.
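The sort in step 1 uses sort.Sort(a), so the callSizes type must implement sort.Interface, ordering entries by call count from largest to smallest. Roughly (a sketch; see pool.go for the exact code):

// callSize pairs a call count with the interval's upper-limit size.
type callSize struct {
  calls uint64
  size  uint64
}

type callSizes []callSize

func (ci callSizes) Len() int           { return len(ci) }
func (ci callSizes) Less(i, j int) bool { return ci[i].calls > ci[j].calls } // descending by calls
func (ci callSizes) Swap(i, j int)      { ci[i], ci[j] = ci[j], ci[i] }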

step 2. Calculate defaultSize and maxSize

defaultSize is easy to understand: it is simply the first size after sorting, i.e. the size of the interval that received the most returns. maxSize records the largest size among the intervals that together account for 95% of all returns. Its purpose is to prevent rarely used large-capacity objects from being put back into the pool and occupying too much memory. For example, if 95% of returns involve buffers of at most 128 bytes and only a handful involve multi-megabyte buffers, maxSize will be 128 and the large buffers will simply not be pooled. This explains the second half of Pool.Put():

// if the capacity of the object being returned exceeds maxSize, do not put it back
maxSize := int(atomic.LoadUint64(&p.maxSize))
if maxSize == 0 || cap(b.B) <= maxSize {
  b.Reset()
  p.pool.Put(b)
}

step 3. Save the corresponding value

When Pool.Get() is called later and there is no free object in the pool, the newly created object's capacity defaults to defaultSize. Such a capacity is sufficient in most cases and avoids slice growth during use.

// bytebufferpool/pool.go
func (p *Pool) Get() *ByteBuffer {
  v := p.pool.Get()
  if v != nil {
    return v.(*ByteBuffer)
  }
  return &ByteBuffer{
    B: make([]byte, 0, atomic.LoadUint64(&p.defaultSize)),
  }
}
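Putting it all together, a common usage pattern is to pair Get with a deferred Put. Below is a small sketch; the helper joinStrings and the package-level joinPool are just illustrative names, not part of the library:

package main

import (
  "fmt"

  "github.com/valyala/bytebufferpool"
)

// joinPool is a pool dedicated to string concatenation.
var joinPool bytebufferpool.Pool

// joinStrings borrows a buffer from the pool, builds the result,
// and returns the buffer to the pool when done.
func joinStrings(parts ...string) string {
  b := joinPool.Get()
  defer joinPool.Put(b)
  for _, s := range parts {
    b.WriteString(s)
  }
  return b.String()
}

func main() {
  fmt.Println(joinStrings("hello", ", ", "world!"))
}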

Some other details:

  • The minimum capacity is 2^6 = 64 because this is the CPU cache line size on typical 64-bit machines: data of this size can be loaded into a cache line in one go, so allocating anything smaller brings no benefit.
  • Atomic operations from the atomic package are used throughout the code to avoid the performance cost of locking.

Of course, this library also has an obvious drawback: since most buffers actually use less than defaultSize bytes, some memory is wasted.

Summary

Excluding comments and blank lines, bytebufferpool implements a high-performance Buffer object pool in only about 150 lines of code, and the details are worth savoring. Reading high-quality code like this helps improve your own coding skills and attention to detail. It is strongly recommended to take the time to read it carefully!

If you find a fun or useful Go library, feel free to submit an issue on the Go Daily Library GitHub 😄

References

  1. bytebufferpool GitHub: https://github.com/valyala/bytebufferpool
  2. Go Daily Library GitHub: https://github.com/darjun/go-daily-lib

About me

My blog: https://darjun.github.io

Welcome to follow my WeChat official account [GoUpUp]; let's learn and improve together~

