In the previous article, we introduced a message queue to shave the traffic peaks of the seckill (flash-sale) scenario. We use Kafka, and it seems to work well, but there are still a number of hidden risks. If they are not addressed, then once the seckill activity starts we may run into message backlog, consumption delays, data inconsistency, or even service crashes, and the consequences are easy to imagine. In this article we will deal with these risks one by one.
Bulk Data Aggregation
In the SeckillOrder method, a message is sent to Kafka for every seckill request. If 10 million users place orders at the same time, then even with various rate-limiting strategies in place, millions of messages may still be sent to Kafka in an instant, which produces a large amount of network IO and disk IO. As we know, Kafka is a log-based messaging system; although writes are mostly sequential IO, it can still struggle when a huge number of messages are written at the same time.
So how do we solve this problem? The answer is message aggregation. Previously, sending one message cost one network IO and one disk IO. After aggregation, for example batching 100 messages together and sending them to Kafka at once, those 100 messages cost only one network IO and one disk IO, which is a big improvement in throughput and performance. This is essentially the idea of small-packet aggregation, or batching, and it shows up everywhere: for instance, when inserting multiple rows into MySQL we can execute a single multi-value SQL statement instead of inserting rows one by one in a loop, and Redis offers the Pipeline operation for the same reason.
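As a side illustration of the batching idea only (not part of the seckill code), the sketch below uses go-redis's Pipeline to send 100 commands in a single round trip instead of 100; the client address and key names are made up for the example:

```go
package main

import (
    "context"
    "fmt"

    "github.com/redis/go-redis/v9"
)

func main() {
    ctx := context.Background()
    // Example client; the address is an assumption for this illustration.
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

    // Without a pipeline, each Set would be one network round trip.
    // With a pipeline, all 100 commands are buffered and flushed together.
    pipe := rdb.Pipeline()
    for i := 0; i < 100; i++ {
        pipe.Set(ctx, fmt.Sprintf("demo:key:%d", i), i, 0)
    }
    if _, err := pipe.Exec(ctx); err != nil {
        fmt.Println("pipeline exec error:", err)
    }
}
```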
So how do we aggregate, and what is the aggregation strategy? The strategy has two dimensions: the number of aggregated messages and the aggregation time. For example, once 100 messages have been aggregated we send them to Kafka in one batch; this number is configurable. What if the count never reaches 100? The aggregation time is also configurable: if it is set to 1 second, then no matter how many messages are currently buffered, they are sent to Kafka once that second elapses. The count and the time are combined with OR semantics, that is, whichever condition is met first triggers the send.
Here we provide Batcher, a tool for batch aggregation of data, defined as follows:
type Batcher struct {
    opts options

    Do       func(ctx context.Context, val map[string][]interface{})
    Sharding func(key string) int
    chans    []chan *msg
    wait     sync.WaitGroup
}
- Do method: executed once the aggregation condition is met; its val parameter holds the aggregated data, grouped by key
- Sharding method: shards by key, so that messages with the same key are written to the same channel and handled by the same goroutine

Inside the merge method there are two conditions that trigger Do: either the number of aggregated messages reaches the configured size, or the configured timer fires.
The code implementation is relatively simple, as follows:
type msg struct {
    key string
    val interface{}
}

type Batcher struct {
    opts options

    Do       func(ctx context.Context, val map[string][]interface{})
    Sharding func(key string) int
    chans    []chan *msg
    wait     sync.WaitGroup
}

// New creates a Batcher with the given options and one buffered channel per worker.
func New(opts ...Option) *Batcher {
    b := &Batcher{}
    for _, opt := range opts {
        opt.apply(&b.opts)
    }
    b.opts.check()
    b.chans = make([]chan *msg, b.opts.worker)
    for i := 0; i < b.opts.worker; i++ {
        b.chans[i] = make(chan *msg, b.opts.buffer)
    }
    return b
}

// Start validates the Do and Sharding callbacks and starts one merge goroutine per channel.
func (b *Batcher) Start() {
    if b.Do == nil {
        log.Fatal("Batcher: Do func is nil")
    }
    if b.Sharding == nil {
        log.Fatal("Batcher: Sharding func is nil")
    }
    b.wait.Add(len(b.chans))
    for i, ch := range b.chans {
        go b.merge(i, ch)
    }
}

// Add shards the message by key and writes it to the corresponding channel.
// It returns ErrFull if that channel's buffer is full.
func (b *Batcher) Add(key string, val interface{}) error {
    ch, msg := b.add(key, val)
    select {
    case ch <- msg:
    default:
        return ErrFull
    }
    return nil
}

func (b *Batcher) add(key string, val interface{}) (chan *msg, *msg) {
    sharding := b.Sharding(key) % b.opts.worker
    ch := b.chans[sharding]
    msg := &msg{key: key, val: val}
    return ch, msg
}

// merge aggregates messages from one channel and calls Do when either the
// size threshold is reached or the ticker fires.
func (b *Batcher) merge(idx int, ch <-chan *msg) {
    defer b.wait.Done()

    var (
        msg        *msg
        count      int
        closed     bool
        lastTicker = true
        interval   = b.opts.interval
        vals       = make(map[string][]interface{}, b.opts.size)
    )
    // Stagger each worker's first tick so all workers do not flush at the same moment.
    if idx > 0 {
        interval = time.Duration(int64(idx) * (int64(b.opts.interval) / int64(b.opts.worker)))
    }
    ticker := time.NewTicker(interval)
    for {
        select {
        case msg = <-ch:
            // A nil message is the close sentinel sent by Close.
            if msg == nil {
                closed = true
                break
            }
            count++
            vals[msg.key] = append(vals[msg.key], msg.val)
            if count >= b.opts.size {
                break
            }
            continue
        case <-ticker.C:
            // After the first tick, reset the ticker to the configured interval.
            if lastTicker {
                ticker.Stop()
                ticker = time.NewTicker(b.opts.interval)
                lastTicker = false
            }
        }
        if len(vals) > 0 {
            ctx := context.Background()
            b.Do(ctx, vals)
            vals = make(map[string][]interface{}, b.opts.size)
            count = 0
        }
        if closed {
            ticker.Stop()
            return
        }
    }
}

// Close sends the nil sentinel to every worker and waits until each has
// flushed its remaining messages.
func (b *Batcher) Close() {
    for _, ch := range b.chans {
        ch <- nil
    }
    b.wait.Wait()
}
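The options type, its check method, ErrFull, and the With... helpers (WithSize, WithBuffer, WithWorker, WithInterval) are not shown in this excerpt. A minimal sketch of what they might look like, assuming the functional-options pattern; the field names are inferred from the code above, and the defaults and error message are made up:

```go
package batcher

import (
    "errors"
    "time"
)

// ErrFull is returned by Add when a worker channel's buffer is full.
var ErrFull = errors.New("batcher: channel is full")

// options is a sketch of the configuration the Batcher relies on.
type options struct {
    size     int           // flush once this many messages have been aggregated
    buffer   int           // capacity of each worker channel
    worker   int           // number of worker goroutines / channels
    interval time.Duration // flush at least this often
}

func (o *options) check() {
    // Fall back to illustrative defaults when a value is missing or invalid.
    if o.size <= 0 {
        o.size = 100
    }
    if o.buffer <= 0 {
        o.buffer = 100
    }
    if o.worker <= 0 {
        o.worker = 5
    }
    if o.interval <= 0 {
        o.interval = time.Second
    }
}

type Option interface {
    apply(*options)
}

type funcOption func(*options)

func (f funcOption) apply(o *options) { f(o) }

func WithSize(size int) Option            { return funcOption(func(o *options) { o.size = size }) }
func WithBuffer(buffer int) Option        { return funcOption(func(o *options) { o.buffer = buffer }) }
func WithWorker(worker int) Option        { return funcOption(func(o *options) { o.worker = worker }) }
func WithInterval(d time.Duration) Option { return funcOption(func(o *options) { o.interval = d }) }
```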
To use it, first create a Batcher, then define its Sharding and Do methods. In the Sharding method, messages for different products are routed to different goroutines by ProductID. In the Do method, the aggregated data is sent to Kafka in a single batch. The definitions are as follows:
b := batcher.New(
    batcher.WithSize(batcherSize),
    batcher.WithBuffer(batcherBuffer),
    batcher.WithWorker(batcherWorker),
    batcher.WithInterval(batcherInterval),
)
b.Sharding = func(key string) int {
    pid, _ := strconv.ParseInt(key, 10, 64)
    return int(pid) % batcherWorker
}
b.Do = func(ctx context.Context, val map[string][]interface{}) {
    var msgs []*KafkaData
    for _, vs := range val {
        for _, v := range vs {
            msgs = append(msgs, v.(*KafkaData))
        }
    }
    kd, err := json.Marshal(msgs)
    if err != nil {
        logx.Errorf("Batcher.Do json.Marshal msgs: %v error: %v", msgs, err)
    }
    if err = s.svcCtx.KafkaPusher.Push(string(kd)); err != nil {
        logx.Errorf("KafkaPusher.Push kd: %s error: %v", string(kd), err)
    }
}
s.batcher = b
s.batcher.Start()
In the SeckillOrder method, the message is no longer delivered to Kafka on every request. Instead, it is added to the Batcher through its Add method, and only delivered to Kafka once the aggregation conditions are met.
err = l.batcher.Add(strconv.FormatInt(in.ProductId, 10), &KafkaData{Uid: in.UserId, Pid: in.ProductId})
if err != nil {
    logx.Errorf("l.batcher.Add uid: %d pid: %d error: %v", in.UserId, in.ProductId, err)
}
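One point worth noting: messages sit in the Batcher until either the size or the interval condition fires, so the service should call Close on shutdown to flush whatever is still buffered. A minimal sketch, assuming the service exposes a Stop hook invoked on shutdown (the hook name is an assumption, not from the original code):

```go
// Stop is a hypothetical shutdown hook for the seckill service.
// Batcher.Close sends the nil sentinel to every worker and blocks until
// each worker has flushed its remaining messages to Kafka via Do.
func (s *Service) Stop() {
    s.batcher.Close()
}
```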
Reduce Message Consumption Delay
With the idea of batch processing, the Batcher tool improves performance, but that mainly helps the producer side. When we consume a batch of data, we still process it serially, one item at a time. Is there a way to speed up consumption and reduce the consumption delay? There are two options:
- Increase the number of consumers
- Increase the parallelism of message processing within a single consumer
In Kafka, a Topic can be configured with multiple Partitions, and data is written across the partitions either evenly or in a way specified by the producer. When consuming, Kafka stipulates that within a consumer group, one partition can be consumed by only one consumer. Why is it designed this way? My understanding is that if multiple consumers consumed the same partition simultaneously, updating the consumption offset would require locking, which would hurt performance significantly. So when the number of consumers is smaller than the number of partitions, we can add consumers to increase message-processing capacity; but once the number of consumers exceeds the number of partitions, adding more consumers is pointless.
When we cannot add more consumers, we can increase the parallelism of message processing within a single consumer, that is, consume data in parallel with multiple goroutines. Let's take a look at how to do that.
Define msgsChan in Service as a slice of channels; the length of the slice determines how many goroutines process data in parallel. It is initialized as follows:
func NewService(c config.Config) *Service {
    s := &Service{
        c:          c,
        ProductRPC: product.NewProduct(zrpc.MustNewClient(c.ProductRPC)),
        OrderRPC:   order.NewOrder(zrpc.MustNewClient(c.OrderRPC)),
        msgsChan:   make([]chan *KafkaData, chanCount),
    }
    for i := 0; i < chanCount; i++ {
        ch := make(chan *KafkaData, bufferCount)
        s.msgsChan[i] = ch
        s.waiter.Add(1)
        go s.consume(ch)
    }
    return s
}
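For reference, the Service struct and the KafkaData message are not shown in this excerpt; based on how their fields are used in the surrounding code, and assuming the project's generated RPC client packages, they might look roughly like this (a sketch, not copied from the repository):

```go
// KafkaData is the message body aggregated by the Batcher and consumed here.
// The json tags are an assumption.
type KafkaData struct {
    Uid int64 `json:"uid"` // user placing the seckill order
    Pid int64 `json:"pid"` // product being bought
}

type Service struct {
    c          config.Config
    ProductRPC product.Product
    OrderRPC   order.Order
    msgsChan   []chan *KafkaData // one channel per consuming goroutine
    waiter     sync.WaitGroup    // waits for the consume goroutines on shutdown
}
```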
After consuming data from Kafka, deliver it to these channels. Note that messages are sharded by product ID when they are delivered, which guarantees that messages for the same product are processed serially by the same goroutine within a consumer; serial processing avoids the data races that concurrency would cause.
func (s *Service) Consume(_ string, value string) error {
    logx.Infof("Consume value: %s\n", value)
    var data []*KafkaData
    if err := json.Unmarshal([]byte(value), &data); err != nil {
        return err
    }
    for _, d := range data {
        s.msgsChan[d.Pid%chanCount] <- d
    }
    return nil
}
We start chanCount goroutines to process data concurrently, and each channel has a buffer of length bufferCount. The method that processes data in parallel is consume, as follows:
func (s *Service) consume(ch chan *KafkaData) {
    defer s.waiter.Done()

    for {
        m, ok := <-ch
        if !ok {
            log.Fatal("seckill rmq exit")
        }
        fmt.Printf("consume msg: %+v\n", m)
        p, err := s.ProductRPC.Product(context.Background(), &product.ProductItemRequest{ProductId: m.Pid})
        if err != nil {
            logx.Errorf("s.ProductRPC.Product pid: %d error: %v", m.Pid, err)
            return
        }
        if p.Stock <= 0 {
            logx.Errorf("stock is zero pid: %d", m.Pid)
            return
        }
        _, err = s.OrderRPC.CreateOrder(context.Background(), &order.CreateOrderRequest{Uid: m.Uid, Pid: m.Pid})
        if err != nil {
            logx.Errorf("CreateOrder uid: %d pid: %d error: %v", m.Uid, m.Pid, err)
            return
        }
        _, err = s.ProductRPC.UpdateProductStock(context.Background(), &product.UpdateProductStockRequest{ProductId: m.Pid, Num: 1})
        if err != nil {
            logx.Errorf("UpdateProductStock uid: %d pid: %d error: %v", m.Uid, m.Pid, err)
        }
    }
}
How can you be sure it won't be oversold?
When the seckill activity starts, a large number of users click the seckill button on the product detail page, which generates a flood of concurrent requests to query the inventory. Once a request finds remaining inventory, the system immediately deducts the stock, then generates the actual order and performs subsequent processing. If a request finds no inventory left, it returns, and the user usually keeps clicking the seckill button to query the inventory again. In short, there are three operations at this stage: inventory check, inventory deduction, and order processing. Since every seckill request queries the inventory, and deduction and order processing only happen for requests that find remaining stock, the greatest concurrent pressure at this stage falls on the inventory check.
In order to support this huge volume of highly concurrent inventory-check requests, we need to store the inventory separately in Redis. So can inventory deduction and order processing both be handed over to MySQL? In fact, order processing can be done in the database, but inventory deduction cannot be handed directly to MySQL. By the time we reach the actual order-processing stage the request pressure is no longer that high, and the database can easily handle those requests. So why can't inventory deduction be done directly in the database? Because once a request finds remaining stock, it is eligible to buy: an order will be placed and the stock reduced by one. If we deduct stock directly in the database at that point, it may lead to overselling.
Why does deducting stock directly in the database lead to overselling? Because the database processes updates slowly, the remaining stock cannot be updated in time, so a large number of inventory-check requests read a stale stock value and place orders; the number of orders then exceeds the actual stock, which is overselling. Therefore, the deduction needs to happen directly in Redis: after the inventory check, as soon as there is stock left, we immediately deduct it in Redis. And to prevent other requests from reading a stale stock value, the inventory check and the inventory deduction must be atomic as a pair.
We use a Redis Hash to store the inventory: total is the total stock and seckill is the number already sold in the seckill. To guarantee that the check and the deduction are atomic, we use a Lua script: it returns 1 when the sold count plus one does not exceed the total, meaning the seckill succeeds; otherwise it returns 0, meaning the seckill fails. The code is as follows:
const (
    luaCheckAndUpdateScript = `
local counts = redis.call("HMGET", KEYS[1], "total", "seckill")
local total = tonumber(counts[1])
local seckill = tonumber(counts[2])
if seckill + 1 <= total then
  redis.call("HINCRBY", KEYS[1], "seckill", 1)
  return 1
end
return 0
`
)

func (l *CheckAndUpdateStockLogic) CheckAndUpdateStock(in *product.CheckAndUpdateStockRequest) (*product.CheckAndUpdateStockResponse, error) {
    val, err := l.svcCtx.BizRedis.EvalCtx(l.ctx, luaCheckAndUpdateScript, []string{stockKey(in.ProductId)})
    if err != nil {
        return nil, err
    }
    if val.(int64) == 0 {
        return nil, status.Errorf(codes.ResourceExhausted, fmt.Sprintf("insufficient stock: %d", in.ProductId))
    }
    return &product.CheckAndUpdateStockResponse{}, nil
}

func stockKey(pid int64) string {
    return fmt.Sprintf("stock:%d", pid)
}
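The stock hash needs to be seeded before the activity starts. As a hedged illustration of the data layout only (the project itself talks to Redis through go-zero's client; here go-redis and a made-up product ID are used just to show the fields read by the Lua script above):

```go
package main

import (
    "context"
    "fmt"

    "github.com/redis/go-redis/v9"
)

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

    // stock:1001 -> {total: 1000, seckill: 0}
    // "total" is the total stock, "seckill" the number already sold,
    // matching the fields read by luaCheckAndUpdateScript.
    if err := rdb.HSet(ctx, "stock:1001", "total", 1000, "seckill", 0).Err(); err != nil {
        fmt.Println("seed stock error:", err)
    }
}
```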
The corresponding seckill-rmq code is modified as follows:
func (s *Service) consume(ch chan *KafkaData) {
    defer s.waiter.Done()

    for {
        m, ok := <-ch
        if !ok {
            log.Fatal("seckill rmq exit")
        }
        fmt.Printf("consume msg: %+v\n", m)
        _, err := s.ProductRPC.CheckAndUpdateStock(context.Background(), &product.CheckAndUpdateStockRequest{ProductId: m.Pid})
        if err != nil {
            logx.Errorf("s.ProductRPC.CheckAndUpdateStock pid: %d error: %v", m.Pid, err)
            return
        }
        _, err = s.OrderRPC.CreateOrder(context.Background(), &order.CreateOrderRequest{Uid: m.Uid, Pid: m.Pid})
        if err != nil {
            logx.Errorf("CreateOrder uid: %d pid: %d error: %v", m.Uid, m.Pid, err)
            return
        }
        _, err = s.ProductRPC.UpdateProductStock(context.Background(), &product.UpdateProductStockRequest{ProductId: m.Pid, Num: 1})
        if err != nil {
            logx.Errorf("UpdateProductStock uid: %d pid: %d error: %v", m.Uid, m.Pid, err)
        }
    }
}
So far, we have seen how to use an atomic Lua script to implement the inventory check and deduction. In fact, there is another way to guarantee their atomicity: a distributed lock.
There are many ways to implement a distributed lock, for example based on Redis or Etcd. There are plenty of articles about Redis-based distributed locks, and interested readers can look them up; here is a brief introduction to a distributed lock based on Etcd. To simplify distributed locks, leader election, and distributed transactions, the etcd community provides a package called concurrency that helps us use distributed locks more easily and correctly. Its usage is very simple; the main flow is as follows:
- First create a Session through concurrency.NewSession, which essentially creates a Lease, here with a TTL of 10 seconds
- With the Session object, create a mutex through concurrency.NewMutex, which records the Lease, the key prefix and other information
- Then call the Lock method of the mutex to try to acquire the lock
- Finally, release the lock through the Unlock method of the mutex
cli, err := clientv3.New(clientv3.Config{Endpoints: endpoints})
if err != nil {
    log.Fatal(err)
}
defer cli.Close()

session, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
if err != nil {
    log.Fatal(err)
}
defer session.Close()

mux := concurrency.NewMutex(session, "lock")
if err := mux.Lock(context.Background()); err != nil {
    log.Fatal(err)
}

if err := mux.Unlock(context.Background()); err != nil {
    log.Fatal(err)
}
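For completeness, here is a sketch of how such a lock could stand in for the Lua script: the stock check and deduction run inside a per-product critical section. getStock and decrStock are hypothetical stubs, not functions from the original code, and the lock key prefix is made up:

```go
package seckill

import (
    "context"
    "errors"
    "fmt"

    "go.etcd.io/etcd/client/v3/concurrency"
)

// getStock and decrStock are hypothetical stubs standing in for the real
// Redis/DB access in the project; they exist only to make the sketch compile.
func getStock(ctx context.Context, pid int64) (total, sold int64, err error) { return 1000, 0, nil }
func decrStock(ctx context.Context, pid int64) error                         { return nil }

// checkAndDeductWithLock sketches the lock-based alternative to the Lua script:
// the per-product etcd mutex makes "read stock, then deduct" atomic across service instances.
func checkAndDeductWithLock(ctx context.Context, session *concurrency.Session, pid int64) error {
    mux := concurrency.NewMutex(session, fmt.Sprintf("/locks/stock/%d", pid))
    if err := mux.Lock(ctx); err != nil {
        return err
    }
    // Unlock can itself fail; ignoring the error keeps the sketch short.
    defer mux.Unlock(ctx)

    total, sold, err := getStock(ctx, pid)
    if err != nil {
        return err
    }
    if sold+1 > total {
        return errors.New("insufficient stock")
    }
    return decrStock(ctx, pid)
}
```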
Concluding Remarks
This article continues the optimization of the seckill feature. On the Kafka producer side, we batch and aggregate messages before sending them; the batching idea is used widely in real production systems, and I hope everyone can apply it flexibly. On the consumer side, we increase parallelism to improve throughput, which is also a common way to improve performance. Finally, we looked at the causes of overselling, gave the corresponding solutions, and introduced the Etcd-based distributed lock. Data races are common in distributed services and can generally be solved with distributed locks, but introducing a distributed lock inevitably costs performance, so whether to introduce one should be decided based on the actual situation.
Hope this article is helpful to you, thank you.
Updated every Monday and Thursday
Code repository: https://github.com/zhoushuguang/lebron
Project address: https://github.com/zeromicro/go-zero
Welcome to use go-zero and star it to support us!
WeChat exchange group
Follow the official account "Microservice Practice" and click "exchange group" to get the QR code of the community group.