
Hello everyone, I am fried fish.

In the previous article "Why are Go maps and slices not thread-safe?", we discussed the non-thread-safety of map and slice in Go. Building on that, this article looks at the two most widely used ways of getting a concurrency-safe map in the industry.

They are:

  • Native map + mutex (sync.Mutex) or read-write lock (sync.RWMutex).
  • Standard library sync.Map (Go 1.9 and later).

With choice comes the problem of choosing. Of the two approaches, which performs better? One friend says the standard library sync.Map has excellent performance; another says don't use it. Who should I listen to...

Today, Fried Fish will take you through the secrets of Go's sync.Map. We will first look at which scenarios suit each kind of Go map, how to use them, and which performs best!

Then, based on the performance results for each kind of map, we will dissect the source code of sync.Map to understand the WHY.

Let's happily set off on this fishy road together.

Advantages of sync.Map

Some suggestions for the Map type are clearly pointed out in the official Go documentation:

  • It is safe for concurrent use by multiple goroutines without additional locking or coordination.
  • Most code should use a plain map instead, with separate locking or coordination, for better type safety and maintainability.

At the same time, the Map type has also been optimized for the following scenarios:

  • When the entry for a given key is written only once but read many times, as in caches that only grow; such business scenarios do exist.
  • When multiple goroutines read, write, and overwrite entries for disjoint sets of keys.

In these two cases, compared with Go map with a separate Mutex or RWMutex, using the Map type can greatly reduce lock contention.
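
To make that concrete, here is a minimal, self-contained sketch of concurrent sync.Map usage without any explicit locking (the key/value scheme is made up purely for illustration):

package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map
	var wg sync.WaitGroup

	// Many goroutines store disjoint keys concurrently; no extra locking is needed.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Store(i, i*i)
		}(i)
	}
	wg.Wait()

	// Load reads a single key; Range iterates over all entries.
	if v, ok := m.Load(3); ok {
		fmt.Println("key 3 =>", v)
	}
	m.Range(func(key, value interface{}) bool {
		fmt.Println(key, value)
		return true // returning false would stop the iteration
	})
}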

Performance Testing

The official documentation lists a bunch of benefits without mentioning any downsides, so are the claimed advantages of its performance optimizations actually true and credible? Let's verify them together.

First we define the basic data structure:

// native map guarded by a mutex
type FooMap struct {
 sync.Mutex
 data map[int]int
}

// native map guarded by a read-write lock
type BarRwMap struct {
 sync.RWMutex
 data map[int]int
}

var fooMap *FooMap
var barRwMap *BarRwMap
var syncMap *sync.Map

// initialize the basic data structures
func init() {
 fooMap = &FooMap{data: make(map[int]int, 100)}
 barRwMap = &BarRwMap{data: make(map[int]int, 100)}
 syncMap = &sync.Map{}
}

As supporting code, we wrote helper methods for the usual store, lookup, update, and delete operations, to be driven by the benchmarks later (only part of the code is shown):

func builtinRwMapStore(k, v int) {
 barRwMap.Lock()
 defer barRwMap.Unlock()
 barRwMap.data[k] = v
}

func builtinRwMapLookup(k int) int {
 barRwMap.RLock()
 defer barRwMap.RUnlock()
 if v, ok := barRwMap.data[k]; !ok {
  return -1
 } else {
  return v
 }
}

func builtinRwMapDelete(k int) {
 barRwMap.Lock()
 defer barRwMap.Unlock()
 if _, ok := barRwMap.data[k]; !ok {
  return
 } else {
  delete(barRwMap.data, k)
 }
}

The remaining types and methods follow basically the same pattern, so they are not shown here to avoid repetition.
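
For reference, the omitted sync.Map helpers are just thin wrappers around its Store/Load/Delete methods; roughly like this (my own sketch, not the project's exact code):

func syncMapStore(k, v int) {
	syncMap.Store(k, v)
}

func syncMapLookup(k int) int {
	// Load returns interface{}, so the value is type-asserted back to int.
	if v, ok := syncMap.Load(k); ok {
		return v.(int)
	}
	return -1
}

func syncMapDelete(k int) {
	syncMap.Delete(k)
}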

The basic shape of the benchmark methods is as follows:

func BenchmarkBuiltinRwMapDeleteParalell(b *testing.B) {
 b.RunParallel(func(pb *testing.PB) {
  r := rand.New(rand.NewSource(time.Now().Unix()))
  for pb.Next() {
   k := r.Intn(100000000)
   builtinRwMapDelete(k)
  }
 })
}

This part just prepares the store/lookup/delete helpers and the benchmark methods. The benchmark code is reused directly from Dabai's go19-examples/benchmark-for-map project.
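
The other benchmarks follow the same RunParallel pattern; for example, a sync.Map store benchmark would look roughly like this (a sketch along the lines of the project's code, not a verbatim copy):

func BenchmarkSyncMapStoreParalell(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		// Each parallel worker gets its own random source and hammers the map.
		r := rand.New(rand.NewSource(time.Now().Unix()))
		for pb.Next() {
			k := r.Intn(100000000)
			syncMapStore(k, k)
		}
	})
}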

You can also use the map_bench_test.go provided with Go itself; interested readers can pull it down and give it a run.

Benchmark results

1) Write:

| Method name | Meaning | Result |
| --- | --- | --- |
| BenchmarkBuiltinMapStoreParalell-4 | map + Mutex, store element | 237.1 ns/op |
| BenchmarkSyncMapStoreParalell-4 | sync.Map, store element | 509.3 ns/op |
| BenchmarkBuiltinRwMapStoreParalell-4 | map + RWMutex, store element | 207.8 ns/op |

For storing elements, the slowest is sync.Map, followed by native map + mutex (Mutex), and the fastest is native map + read-write lock (RWMutex).

The overall order (from slow to fast) is: SyncMapStore < MapStore < RwMapStore.

2) Lookup:

| Method name | Meaning | Result |
| --- | --- | --- |
| BenchmarkBuiltinMapLookupParalell-4 | map + Mutex, look up element | 166.7 ns/op |
| BenchmarkBuiltinRwMapLookupParalell-4 | map + RWMutex, look up element | 60.49 ns/op |
| BenchmarkSyncMapLookupParalell-4 | sync.Map, look up element | 53.39 ns/op |

For looking up elements, the slowest is native map + mutex, followed by native map + read-write lock, and the fastest is sync.Map.

The overall order (from slow to fast) is: MapLookup < RwMapLookup < SyncMapLookup.

3) Delete:

| Method name | Meaning | Result |
| --- | --- | --- |
| BenchmarkBuiltinMapDeleteParalell-4 | map + Mutex, delete element | 168.3 ns/op |
| BenchmarkBuiltinRwMapDeleteParalell-4 | map + RWMutex, delete element | 188.5 ns/op |
| BenchmarkSyncMapDeleteParalell-4 | sync.Map, delete element | 41.54 ns/op |

For deleting elements, the slowest is native map + read-write lock, followed by native map + mutex, and the fastest is sync.Map.

The overall order (from slow to fast) is: RwMapDelete < MapDelete < SyncMapDelete.

Scenario analysis

From the benchmark results above, we can see that the sync.Map type:

  • Performs best in the lookup and delete scenarios, leading by more than a factor of two.
  • Performs poorly in the store scenario, trailing native map + lock by roughly a factor of two.

So, for real business scenarios: if reads greatly outnumber writes, sync.Map is the recommended choice.

However, if the workload is write-heavy, such as many goroutines writing in batch loops, it is better to look elsewhere; the write performance is hard to look at (unless performance simply doesn't matter to you).

Anatomy of sync.Map

Now that we know how to test and have the results, we need to dig deeper and understand why.

Why are the sync.Map results so lopsided? Why is its read performance so high while its write performance is terribly low? How is it designed?

Data structure

The underlying data structure of sync.Map is:

type Map struct {
 mu Mutex
 read atomic.Value // readOnly
 dirty map[interface{}]*entry
 misses int
}

// Map.read actually stores a readOnly value.
type readOnly struct {
 m       map[interface{}]*entry
 amended bool
}

  • mu: a Mutex used to protect read and dirty.
  • read: read-only data (stored in an atomic.Value), safe for concurrent reads; updating it requires holding the lock to keep the data safe.
  • What read actually stores is a readOnly struct, which wraps a native map; its amended field marks whether read and dirty are out of sync (amended is true when dirty holds keys that read does not).
  • dirty: read-write data, a plain native map that is not thread-safe; operating on dirty requires holding the lock.
  • misses: counts how many times read has missed. Each time a read misses, misses is incremented by one; once there are enough misses, dirty is promoted to read.

Both read and dirty involve the following structure:

type entry struct {
 p unsafe.Pointer // *interface{}
}

It contains a single pointer p, which points to the value stored for the key that the user saved.
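
The pointer p can be in one of three states. The standard library source defines an expunged sentinel roughly like the following (shown here for orientation; the comments summarize the semantics):

// expunged is an arbitrary sentinel pointer that marks entries which have been
// deleted and are known NOT to be present in m.dirty.
var expunged = unsafe.Pointer(new(interface{}))

// The three states of entry.p:
//   p == nil      -> the entry has been deleted, and either dirty is nil or dirty holds the same entry
//   p == expunged -> the entry has been deleted, dirty is not nil, and the key is missing from dirty
//   otherwise     -> p points to the live value (*interface{})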

I suggest you make sure you understand read, dirty, and entry before reading on; it will digest much better, since the rest of the article revolves around these concepts.

Lookup process

To recap the key point: the Map type essentially holds two "maps", one called read and the other called dirty, roughly as pictured below:

[Figure: the two maps of sync.Map]

When we read data from a sync.Map, it first checks whether the required key is present in read:

  • If it is, the value is read with an atomic operation and returned.
  • If it is not, it checks read.readOnly.amended, which tells the program whether dirty contains keys that read.readOnly.m does not; if amended is true, it goes on to look the key up in dirty.

The reason sync.Map's read performance is so high lies in the ingenious design of read, which acts as a cache layer and provides a lock-free fast path for lookups.

Combined with the amended flag, this avoids taking a lock on every read, which is what gives the read scenario its high performance.
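
Putting those two steps together, the lookup path looks roughly like the following (an annotated sketch of the standard library's Load logic, not a verbatim copy):

func (m *Map) Load(key interface{}) (value interface{}, ok bool) {
	// Fast path: a lock-free read from the read-only map.
	read, _ := m.read.Load().(readOnly)
	e, ok := read.m[key]
	if !ok && read.amended {
		// Slow path: the key may only exist in dirty, so take the mutex.
		m.mu.Lock()
		// Double check: dirty may have been promoted to read while we were waiting.
		read, _ = m.read.Load().(readOnly)
		e, ok = read.m[key]
		if !ok && read.amended {
			e, ok = m.dirty[key]
			// Record the miss; enough misses will promote dirty to read.
			m.missLocked()
		}
		m.mu.Unlock()
	}
	if !ok {
		return nil, false
	}
	return e.load()
}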

Writing process

Let's go straight to sync.Map's Store method, which is used to add or update an element.

The source code is as follows:

func (m *Map) Store(key, value interface{}) {
 read, _ := m.read.Load().(readOnly)
 if e, ok := read.m[key]; ok && e.tryStore(&value) {
  return
 }
  ...
}

It calls the Load method to check whether the element already exists in m.read. If it exists and is not marked as deleted, it attempts to store the value directly via tryStore.
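
tryStore is the lock-free fast path of that first step: it spins on a compare-and-swap and refuses to store if the entry has been expunged (a sketch of its behavior):

func (e *entry) tryStore(i *interface{}) bool {
	for {
		p := atomic.LoadPointer(&e.p)
		if p == expunged {
			// The entry was expunged from dirty; the caller must fall back to the locked path.
			return false
		}
		if atomic.CompareAndSwapPointer(&e.p, p, unsafe.Pointer(i)) {
			return true
		}
	}
}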

If the element does not exist or has been marked as deleted, continue to the following process:

func (m *Map) Store(key, value interface{}) {
 ...
 m.mu.Lock()
 read, _ = m.read.Load().(readOnly)
 if e, ok := read.m[key]; ok {
  if e.unexpungeLocked() {
   m.dirty[key] = e
  }
  e.storeLocked(&value)
 } else if e, ok := m.dirty[key]; ok {
  e.storeLocked(&value)
 } else {
  if !read.amended {
   m.dirtyLocked()
   m.read.Store(readOnly{m: read.m, amended: true})
  }
  m.dirty[key] = newEntry(value)
 }
 m.mu.Unlock()
}

Since we have entered the dirty-handling flow, the mutex Lock is taken right at the start to keep the data safe, and this is the first culprit behind the degraded write performance.

It is divided into the following three processing branches:

  • If the element exists in read but has been marked as expunged, that implies dirty is not nil and the element is definitely not in dirty. In that case it changes the element's state from expunged to nil and inserts the element into dirty.
  • If the element does not exist in read but does exist in dirty, it simply updates the value the entry points to.
  • If the element exists in neither read nor dirty, it copies all non-deleted data from read into dirty (see the dirtyLocked sketch after this list), then inserts the element into dirty and points its entry at the new value.
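
dirtyLocked performs that copy when dirty is nil, expunging deleted entries as it goes; a sketch of its logic (close to the standard library source, shown for illustration):

func (m *Map) dirtyLocked() {
	if m.dirty != nil {
		return
	}
	// Rebuild dirty from read, skipping entries that have already been deleted;
	// those entries are marked as expunged so Store knows they are not in dirty.
	read, _ := m.read.Load().(readOnly)
	m.dirty = make(map[interface{}]*entry, len(read.m))
	for k, e := range read.m {
		if !e.tryExpungeLocked() {
			m.dirty[k] = e
		}
	}
}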

To pull it together, the overall flow of a write is:

  • Check read; the key is not in read, or it has been marked as deleted.
  • Take the mutex (Mutex).
  • Operate on dirty, handling the various data conditions and states.

Back to the original question: why is its write performance so poor? The reasons:

  • Every write must go through read first, so no matter what, it does one more layer of work than the alternatives, and it then has to check the data's condition and state, which adds overhead.
  • (The third branch) On the first write, or right after dirty has been promoted, the full set of live data is copied from read; if read holds a lot of data, performance suffers.

So it can be seen that the sync.Map type is not suited to write-heavy scenarios; it works best with many reads and few writes.

And if the data set is large, you need to consider whether the occasional performance jitter caused by copying the data is acceptable.

Delete process

At this point some readers may be wondering: in theory the delete path is not that different from the write path, so why do sync.Map's delete numbers look fine? What's the trick?

The source code is as follows:

func (m *Map) LoadAndDelete(key interface{}) (value interface{}, loaded bool) {
 read, _ := m.read.Load().(readOnly)
 e, ok := read.m[key]
 ...
  if ok {
  return e.delete()
 }
}

Deletion starts with the standard opening: first check read to see whether the element exists.

If it exists, the entry's delete method is called to mark it as deleted (its pointer is set to nil atomically), which is very efficient. That is why deleting elements present in read performs so well.

If it does not exist in read, we enter the dirty flow:

func (m *Map) LoadAndDelete(key interface{}) (value interface{}, loaded bool) {
 ...
 if !ok && read.amended {
  m.mu.Lock()
  read, _ = m.read.Load().(readOnly)
  e, ok = read.m[key]
  if !ok && read.amended {
   e, ok = m.dirty[key]
   delete(m.dirty, key)
   m.missLocked()
  }
  m.mu.Unlock()
 }
 ...
 return nil, false
}

If the element is not in read and read and dirty are out of sync (determined via amended, meaning dirty holds keys that read does not), dirty has to be operated on, so the mutex is taken.

The double check is repeated under the lock: if the element is still not in read, it is looked up in dirty, removed from dirty with the built-in delete, and missLocked records the miss; if an entry was found, its delete method then marks it as deleted.
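
The missLocked call in that branch is also what eventually promotes dirty: once the miss count reaches the size of dirty, dirty wholesale replaces read (a sketch of the mechanism):

func (m *Map) missLocked() {
	m.misses++
	if m.misses < len(m.dirty) {
		return
	}
	// Promotion: dirty becomes the new read map and is reset, so subsequent
	// lookups of these keys go back to the lock-free fast path.
	m.read.Store(readOnly{m: m.dirty})
	m.dirty = nil
	m.misses = 0
}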

It is worth a closer look at the entry's delete method, which sits on the hot path:

func (e *entry) delete() (value interface{}, ok bool) {
 for {
  p := atomic.LoadPointer(&e.p)
  if p == nil || p == expunged {
   return nil, false
  }
  if atomic.CompareAndSwapPointer(&e.p, p, nil) {
   return *(*interface{})(p), true
  }
 }
}

This method just compare-and-swaps entry.p to nil to mark the entry as deleted; it is not a true deletion from the underlying maps. (Such nil entries are only cleaned up later, when read is copied into dirty and they get marked as expunged and skipped.)

Note: don't misuse sync.Map. In a case shared by an engineer at ByteDance a while back, they used a connection as a key, so everything associated with that connection, for example its buffer memory, could never be released...

Summary

Through this article we have clarified the differences between sync.Map and native map + Mutex/RWMutex.

Although the standard library's sync.Map supports concurrent reads and writes, it is better suited to read-heavy, write-light scenarios, because its write performance is relatively poor; keep this in mind when deciding to use it.

In addition, by dissecting sync.Map we learned the reasons behind what is fast and what is slow, so we now know not only the what but also the why.

Seeing concurrent map reads and writes cause fatal errors is always distressing. If you found this article useful, feel free to share it with more Go lovers :)

If you have any questions, please leave a comment in the feedback area; the best relationship is mutual achievement. Your likes are Fried Fish's greatest motivation to keep creating. Thanks for your support.

The article is continuously updated; you can find it by searching for [the brain is fried fish]. It has also been included in the GitHub repo github.com/eddycjy/blog, which you are welcome to Star for update reminders.

References

  • Package sync
  • Stepped on a pit in Golang sync.Map
  • go19-examples/benchmark-for-map
  • In-depth understanding of the working principle of sync.Map through examples
