
Original link: Hands-on Implementation of a Local Cache (Implementation)

Foreword

Hello everyone, I'm asong. After the previous two articles, we have a good idea of how to design a local cache. This article wraps up the series by implementing a local cache ourselves. Let's get started!

The code for this article has been uploaded to GitHub: https://github.com/asong2020/go-localcache

This is currently version 1.0, and I will keep optimizing and iterating on it.

Step 1: Abstract Interface

This first step is very important. Following the principle of interface-oriented programming, we start by abstracting the methods exposed to users, keeping them simple and easy to understand. The interface I arrived at is as follows:

// ICache abstract interface
type ICache interface {
    // Set value use default expire time. default does not expire.
    Set(key string, value []byte) error
    // Get value if find it. if value already expire will delete.
    Get(key string) ([]byte, error)
    // SetWithTime set value with expire time
    SetWithTime(key string, value []byte, expired time.Duration) error
    // Delete manual removes the key
    Delete(key string) error
    // Len computes number of entries in cache
    Len() int
    // Capacity returns amount of bytes store in the cache.
    Capacity() int
    // Close is used to signal a shutdown of the cache when you are done with it.
    // This allows the cleaning goroutines to exit and ensures references are not
    // kept to the cache preventing GC of the entire cache.
    Close() error
    // Stats returns cache's statistics
    Stats() Stats
    // GetKeyHit returns key hit
    GetKeyHit(key string) int64
}
  • Set(key string, value []byte) : stores data with the default expiration time. If the background cleanup task is disabled, entries never expire; otherwise the default expiration time is 10 minutes.
  • Get(key string) ([]byte, error) : retrieves the value for key ; if the entry has expired, it is deleted in this step.
  • SetWithTime(key string, value []byte, expired time.Duration) : stores data with a custom expiration time.
  • Delete(key string) error : deletes the cached entry for key .
  • Len() int : returns the number of cached entries.
  • Capacity() int : returns the number of bytes currently stored in the cache.
  • Close() error : shuts down the cache so the cleanup goroutine can exit.
  • Stats() Stats : returns the cache's monitoring statistics.
  • GetKeyHit(key string) int64 : returns the hit count for key .
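Before implementing these methods, here is a quick usage sketch showing how a caller interacts with the interface (it relies on the NewCache constructor introduced in Step 2; error handling is elided and the "fmt" / "time" imports are assumed):

// Inside some function:
c, _ := NewCache()
defer c.Close()

_ = c.Set("name", []byte("asong"))                    // default expiration
_ = c.SetWithTime("tmp", []byte("1"), 30*time.Second) // custom expiration
if v, err := c.Get("name"); err == nil {
    fmt.Println(string(v))
}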

Step 2: Define the cache object

The first step abstracted the interface; next we need to define a cache instance that implements it. Let's look at the structure definition first:

type cache struct {
    // hashFunc represents used hash func
    hashFunc HashFunc
    // bucketCount represents the number of segments within a cache instance. value must be a power of two.
    bucketCount uint64
    // bucketMask is bitwise AND applied to the hashVal to find the segment id.
    bucketMask uint64
    // segment is shard
    segments []*segment
    // segment lock
    locks    []sync.RWMutex
    // close cache
    close chan struct{}
}
  • hashFunc : the hash function to use. Users can supply their own by implementing the HashFunc interface; the fnv algorithm is used by default.
  • bucketCount : the number of shards, which must be a power of two. The default is 256.
  • bucketMask : because the shard count is a power of two, hashValue % bucketCount == hashValue & (bucketCount - 1) , so a bitwise AND can replace the modulo operation for better performance (see the sketch after this list).
  • segments : the shard objects; each shard's structure is introduced later.
  • locks : a read-write lock per shard.
  • close : a channel used to stop the cleanup goroutine when the cache is closed.
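To make the bucketMask trick concrete, here is a small, runnable demonstration of why the bitwise AND can replace the modulo when the shard count is a power of two:

package main

import "fmt"

func main() {
    const bucketCount = 256 // must be a power of two
    const bucketMask = bucketCount - 1

    hashVal := uint64(0xdeadbeef12345678)
    // For power-of-two counts, masking with bucketCount-1 keeps exactly
    // the low bits, which is what the modulo computes:
    fmt.Println(hashVal%bucketCount == hashVal&bucketMask) // true
}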

Next, let's write the cache constructor:

// NewCache constructor cache instance
func NewCache(opts ...Opt) (ICache, error) {
    options := &options{
        hashFunc: NewDefaultHashFunc(),
        bucketCount: defaultBucketCount,
        maxBytes: defaultMaxBytes,
        cleanTime: defaultCleanTIme,
        statsEnabled: defaultStatsEnabled,
        cleanupEnabled: defaultCleanupEnabled,
    }
    for _, each := range opts{
        each(options)
    }

    if !isPowerOfTwo(options.bucketCount){
        return nil, errShardCount
    }

    if options.maxBytes <= 0 {
        return nil, ErrBytes
    }

    segments := make([]*segment, options.bucketCount)
    locks := make([]sync.RWMutex, options.bucketCount)

    maxSegmentBytes := (options.maxBytes + options.bucketCount - 1) / options.bucketCount
    for index := range segments{
        segments[index] = newSegment(maxSegmentBytes, options.statsEnabled)
    }

    c := &cache{
        hashFunc: options.hashFunc,
        bucketCount: options.bucketCount,
        bucketMask: options.bucketCount - 1,
        segments: segments,
        locks: locks,
        close: make(chan struct{}),
    }
    if options.cleanupEnabled {
        go c.cleanup(options.cleanTime)
    }
    
    return c, nil
}

Here, for better extensibility, we use the functional options pattern. Our constructor mainly does three things:

  • Parameter checking: basic validation of the parameters passed in from outside
  • Shard initialization
  • Cache object construction

When constructing the cache object, we first compute each shard's capacity: by default 256MB of data is cached locally, divided evenly among the shards, and users can configure the total cache size themselves.
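The post doesn't show how Opt is defined, so here is a minimal sketch consistent with how NewCache applies the options; the helper names WithMaxBytes and WithBucketCount are hypothetical and may not match the repo:

// Opt mutates the private options struct before the cache is built.
type Opt func(*options)

// WithMaxBytes sets the total cache size in bytes (hypothetical helper).
func WithMaxBytes(n uint64) Opt {
    return func(o *options) { o.maxBytes = n }
}

// WithBucketCount sets the shard count; it must be a power of two
// (hypothetical helper).
func WithBucketCount(n uint64) Opt {
    return func(o *options) { o.bucketCount = n }
}

// Usage: a 512MB cache split across 512 shards.
// c, err := NewCache(WithMaxBytes(512*1024*1024), WithBucketCount(512))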

Step 3: Define the sharding structure

The structure of each shard is as follows:

type segment struct {
    hashmap map[uint64]uint32
    entries buffer.IBuffer
    clock   clock
    evictList  *list.List
    stats IStats
}
  • hashmap : maps each key's hash to its storage index.
  • entries : the key/value storage, introduced in step 4; this is the core of the code.
  • clock : the time abstraction (a sketch follows this list).
  • evictList : a queue recording insertion order, used to evict the oldest entries when capacity runs out (a temporary solution; the current storage structure is not well suited to an LRU eviction algorithm).
  • stats : the cache's monitoring data.
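The clock type isn't shown in the post, so here is a minimal sketch inferred from how the segment code calls it (Epoch and TimeStamp); the repo's actual definition may differ:

import "time"

// clock abstracts time so it can be faked in tests.
type clock interface {
    // TimeStamp returns the current Unix timestamp in seconds.
    TimeStamp() int64
    // Epoch returns the Unix timestamp d from now, i.e. the expire-at time.
    Epoch(d time.Duration) int64
}

type systemClock struct{}

func (*systemClock) TimeStamp() int64 { return time.Now().Unix() }

func (*systemClock) Epoch(d time.Duration) int64 { return time.Now().Add(d).Unix() }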

Next, let's take a look at the constructor of each shard:

func newSegment(bytes uint64, statsEnabled bool) *segment {
    if bytes == 0 {
        panic(fmt.Errorf("bytes cannot be zero"))
    }
    if bytes >= maxSegmentSize{
        panic(fmt.Errorf("too big bytes=%d; should be smaller than %d", bytes, maxSegmentSize))
    }
    capacity := (bytes + segmentSize - 1) / segmentSize
    entries := buffer.NewBuffer(int(capacity))
    entries.Reset()
    return &segment{
        entries: entries,
        hashmap: make(map[uint64]uint32),
        clock:   &systemClock{},
        evictList: list.New(),
        stats: newStats(statsEnabled),
    }
}

The main thing to note here:

We need to compute each shard's capacity from the amount of data it can cache, which corresponds to the cache initialization step above. For example, with the default 256MB total and 256 shards, each shard is allotted 1MB.

Step 4: Define the cache structure

The cache object is now constructed, and the next step is the core of the local cache: defining the cache structure.

bigcache , fastcache and freecache all use byte arrays instead of map to store cached data, thereby reducing GC pressure, so we can borrow that idea and use byte arrays too. Here we use a two-dimensional byte slice to store the key/value data (see the diagram in the original post).

Unlike bigcache , storing data in a two-dimensional array lets us delete an entry directly by its index. Although this leaves "wormholes" (holes), we can record the holes' indexes and keep filling them.

The encapsulation structure of each cached entry is shown in the diagram in the original post. With the basic idea clarified, let's look at our encapsulation of the storage layer:

type Buffer struct {
    array [][]byte
    capacity int
    index int
    // maxCount = capacity - 1
    count int
    // availableSpace If any objects are removed after the buffer is full, the idle index is logged.
    // Avoid array "wormhole"
    availableSpace map[int]struct{}
    // placeholder record the index that buffer has stored.
    placeholder map[int]struct{}
}
  • array [][]byte : the two-dimensional slice that stores the cached entries
  • capacity : the maximum capacity of the buffer
  • index : the running index of the next write position
  • count : the number of cached entries
  • availableSpace : records the "wormholes"; when a cached entry is deleted, its free index is recorded so the hole can be reused once the buffer is full
  • placeholder : records the indexes currently in use, which can be iterated to clear expired entries

The process of writing data to the buffer is as follows (the full code is not posted here; a sketch follows the diagram):

(Diagram of the buffer write flow: https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/4e339104a34d4fabb45bc45a4a830a3a~tplv-k3u1fbpfcp-zoom-1.image )
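As a rough illustration of that flow, here is a minimal sketch of what Push could look like given the field semantics above; this is an assumption-laden sketch, not the repo's actual code:

import "errors"

// Push stores an entry and returns the index it was written to.
func (b *Buffer) Push(data []byte) (int, error) {
    // Reuse a "wormhole" slot first if any deletion left one behind.
    for idx := range b.availableSpace {
        b.array[idx] = data
        delete(b.availableSpace, idx)
        b.placeholder[idx] = struct{}{}
        b.count++
        return idx, nil
    }
    // Otherwise append at the running index, if capacity allows.
    if b.index >= b.capacity {
        return 0, errors.New("buffer is full")
    }
    idx := b.index
    b.array[idx] = data
    b.placeholder[idx] = struct{}{}
    b.index++
    b.count++
    return idx, nil
}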

Step 5: Improve the method of writing data to the cache

Above we defined all the required structures; the next step is to fill in our write method:

func (c *cache) Set(key string, value []byte) error  {
    hashKey := c.hashFunc.Sum64(key)
    bucketIndex := hashKey&c.bucketMask
    c.locks[bucketIndex].Lock()
    defer c.locks[bucketIndex].Unlock()
    err := c.segments[bucketIndex].set(key, hashKey, value, defaultExpireTime)
    return err
}

func (s *segment) set(key string, hashKey uint64, value []byte, expireTime time.Duration) error {
    if expireTime <= 0{
        return ErrExpireTimeInvalid
    }
    expireAt := uint64(s.clock.Epoch(expireTime))

    if previousIndex, ok := s.hashmap[hashKey]; ok {
        if err := s.entries.Remove(int(previousIndex)); err != nil{
            return err
        }
        delete(s.hashmap, hashKey)
    }

    entry := wrapEntry(expireAt, key, hashKey, value)
    for {
        index, err := s.entries.Push(entry)
        if err == nil {
            s.hashmap[hashKey] = uint32(index)
            s.evictList.PushFront(index)
            return nil
        }
        ele := s.evictList.Back()
        if err := s.entries.Remove(ele.Value.(int)); err != nil{
            return err
        }
        s.evictList.Remove(ele)
    }
}

The process analysis is as follows:

  • Compute the hash of key , then locate the corresponding shard using the shard mask
  • If key already exists in the cache, delete it first and re-insert it, which refreshes the expiration time
  • Wrap the entry in the storage format: expiration timestamp, key length, hash, and the cached value (see the layout sketch after this list)
  • Push the data into the buffer; if the push fails because the buffer is full, remove the oldest entry and retry
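The wrap/read helpers aren't shown in the post; here is a minimal sketch of one plausible entry layout, assuming little-endian encoding (the repo's actual format may differ):

import "encoding/binary"

// Layout: | expireAt (8B) | hashKey (8B) | keyLen (2B) | key | value |
func wrapEntry(expireAt uint64, key string, hashKey uint64, value []byte) []byte {
    buf := make([]byte, 8+8+2+len(key)+len(value))
    binary.LittleEndian.PutUint64(buf[0:8], expireAt)
    binary.LittleEndian.PutUint64(buf[8:16], hashKey)
    binary.LittleEndian.PutUint16(buf[16:18], uint16(len(key)))
    copy(buf[18:], key)
    copy(buf[18+len(key):], value)
    return buf
}

func readExpireAtFromEntry(entry []byte) uint64 {
    return binary.LittleEndian.Uint64(entry[0:8])
}

func readKeyFromEntry(entry []byte) string {
    keyLen := binary.LittleEndian.Uint16(entry[16:18])
    return string(entry[18 : 18+int(keyLen)])
}

func readEntry(entry []byte) []byte {
    keyLen := binary.LittleEndian.Uint16(entry[16:18])
    // Copy the value out so callers don't hold a reference into the buffer.
    value := make([]byte, len(entry)-18-int(keyLen))
    copy(value, entry[18+int(keyLen):])
    return value
}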

Step 6: Improve the method of reading data from the cache

The first step is to compute the hash of key and locate the corresponding shard:

func (c *cache) Get(key string) ([]byte, error)  {
    hashKey := c.hashFunc.Sum64(key)
    bucketIndex := hashKey&c.bucketMask
    c.locks[bucketIndex].RLock()
    defer c.locks[bucketIndex].RUnlock()
    entry, err := c.segments[bucketIndex].get(key, hashKey)
    if err != nil{
        return nil, err
    }
    return entry, nil
}

The second step is to execute the shard's get method to obtain the cached data:

  • Look up the hash in the shard's hashmap; if key is not found, return a miss
  • Read the entry from the buffer and compare the stored key with the requested key to detect hash collisions
  • Check whether the entry has expired; expired entries are deleted (depending on business needs, you could choose to return the expired data here instead)
  • Record monitoring data at each step
func (s *segment) getWarpEntry(key string, hashKey uint64) ([]byte,error) {
    index, ok := s.hashmap[hashKey]
    if !ok {
        s.stats.miss()
        return nil, ErrEntryNotFound
    }
    entry, err := s.entries.Get(int(index))
    if err != nil{
        s.stats.miss()
        return nil, err
    }
    if entry == nil{
        s.stats.miss()
        return nil, ErrEntryNotFound
    }

    if entryKey := readKeyFromEntry(entry); key != entryKey {
        s.stats.collision()
        return nil, ErrEntryNotFound
    }
    return entry, nil
}

func (s *segment) get(key string, hashKey uint64) ([]byte, error) {
    currentTimestamp := s.clock.TimeStamp()
    entry, err := s.getWarpEntry(key, hashKey)
    if err != nil{
        return nil, err
    }
    res := readEntry(entry)

    expireAt := int64(readExpireAtFromEntry(entry))
    if currentTimestamp - expireAt >= 0{
        _ = s.entries.Remove(int(s.hashmap[hashKey]))
        delete(s.hashmap, hashKey)
        return nil, ErrEntryNotFound
    }
    s.stats.hit(key)

    return res, nil
}

Step 7: Try it out with a test case

Let's test it with a simple test case:

func (h *cacheTestSuite) TestSetAndGet() {
    cache, err := NewCache()
    assert.Equal(h.T(), nil, err)
    key := "asong"
    value := []byte("公众号:Golang梦工厂")

    err = cache.Set(key, value)
    assert.Equal(h.T(), nil, err)

    res, err := cache.Get(key)
    assert.Equal(h.T(), nil, err)
    assert.Equal(h.T(), value, res)
    h.T().Logf("get value is %s", string(res))
}

The output:

=== RUN   TestCacheTestSuite
=== RUN   TestCacheTestSuite/TestSetAndGet
    cache_test.go:33: get value is 公众号:Golang梦工厂
--- PASS: TestCacheTestSuite (0.00s)
    --- PASS: TestCacheTestSuite/TestSetAndGet (0.00s)
PASS

And that's it: the basic functionality is in place. What remains is to run benchmarks, optimize, and iterate (I won't go into the details in this article; you can follow the latest developments on GitHub).

Summary

The implementation part ends here, but the coding on this project is not finished; I will keep iterating and optimizing on top of this version. The advantages of this local cache:

  • Simple to implement and easy to understand
  • Using a two-dimensional slice as the storage structure avoids the drawback that entries in the underlying storage cannot be deleted individually, and mitigates the "wormhole" problem to a certain extent
  • Complete test cases, making it a suitable entry-level project for beginners

Points to be optimized:

  • Without an efficient cache eviction algorithm, hot data may be evicted frequently
  • Periodically deleting expired data holds the lock for too long and needs optimization
  • Closing a cache instance needs better handling
  • Optimizations for specific business scenarios

Iteration point:

  • Add an asynchronous cache-loading feature
  • ...... (thinking)

The code for this article has been uploaded to GitHub: https://github.com/asong2020/go-localcache

Well, that's all for this article. I'm asong , and see you next time.

Welcome to follow my WeChat public account: Golang梦工厂 (Golang DreamWorks).

