Original link: Hands-on implementation of a local cache - implementation
Foreword
Hello everyone, my name is asong. After the previous two articles, we have a basic understanding of how to design a local cache. This article is the last in the series: implementing a local cache ourselves. Follow along with me!
The code for this article has been uploaded to GitHub: https://github.com/asong2020/go-localcache
This is a 1.0 version and will continue to be optimized and iterated on.
Step 1: Abstract Interface
This first step is very important. Following the principle of interface-oriented programming, we first abstract the methods exposed to users, keeping them simple and easy to understand. The interface I abstracted is as follows:
// ICache abstract interface
type ICache interface {
	// Set stores a value using the default expire time (never expires by default).
	Set(key string, value []byte) error
	// Get returns the value if found; an expired value is deleted on read.
	Get(key string) ([]byte, error)
	// SetWithTime stores a value with a custom expire time.
	SetWithTime(key string, value []byte, expired time.Duration) error
	// Delete manually removes the key.
	Delete(key string) error
	// Len returns the number of entries in the cache.
	Len() int
	// Capacity returns the number of bytes stored in the cache.
	Capacity() int
	// Close signals a shutdown of the cache when you are done with it.
	// This allows the cleaning goroutines to exit and ensures references are not
	// kept to the cache, which would prevent GC of the entire cache.
	Close() error
	// Stats returns the cache's statistics.
	Stats() Stats
	// GetKeyHit returns the hit count for a key.
	GetKeyHit(key string) int64
}
- `Set(key string, value []byte) error`: stores data with the default expiration time. If the background expiration task is not enabled, entries never expire; otherwise the default expiration time is 10 minutes.
- `Get(key string) ([]byte, error)`: fetches the value for a `key`; if the data has expired, it is deleted in this step.
- `SetWithTime(key string, value []byte, expired time.Duration) error`: stores an object with a custom expiration time.
- `Delete(key string) error`: deletes the cached data for the given key.
- `Len() int`: returns the number of cached objects.
- `Capacity() int`: returns the current cache capacity.
- `Close() error`: shuts down the cache.
- `Stats() Stats`: returns cache monitoring data.
- `GetKeyHit(key string) int64`: returns the hit count for a `key`.
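To make the interface concrete, here is a minimal usage sketch. It assumes the package can be imported under the alias `cache` from the repo above, and uses the `NewCache` constructor introduced in step 2:

package main

import (
	"fmt"
	"time"

	cache "github.com/asong2020/go-localcache"
)

func main() {
	c, err := cache.NewCache()
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// Store with the default expiration time.
	_ = c.Set("asong", []byte("hello"))

	// Store with a custom expiration time.
	_ = c.SetWithTime("temp", []byte("short-lived"), time.Minute)

	if v, err := c.Get("asong"); err == nil {
		fmt.Printf("value=%s entries=%d\n", v, c.Len())
	}
}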
Step 2: Define the cache object
With the interface abstracted in the first step, we next define a cache instance that implements it. Let's look at the structure first:
type cache struct {
	// hashFunc represents the hash function used
	hashFunc HashFunc
	// bucketCount represents the number of segments within a cache instance. value must be a power of two.
	bucketCount uint64
	// bucketMask is bitwise ANDed with the hashVal to find the segment id.
	bucketMask uint64
	// segments are the shards
	segments []*segment
	// locks is one read-write lock per segment
	locks []sync.RWMutex
	// close signals shutdown of the cache
	close chan struct{}
}
- `hashFunc`: the hash function used; users can supply their own by implementing the `HashFunc` interface. The default is `fnv`.
- `bucketCount`: the number of shards, which must be a power of two. The default is `256`.
- `bucketMask`: because the shard count is a power of two, a bitwise AND can replace the modulo to improve performance: `hashValue % bucketCount == hashValue & (bucketCount - 1)`.
- `segments`: the shard objects; the structure of each shard is introduced later.
- `locks`: a read-write lock per shard.
- `close`: a channel used to stop the cleanup `goroutine` when the cache object is closed.
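For illustration, a replacement hash could be plugged in like the sketch below. It builds on the standard library's `hash/fnv`, and infers the `HashFunc` shape from the `Sum64` call used later in the article:

import "hash/fnv"

// fnvHash sketches a user-supplied hash; judging from the calls
// later in the article, the HashFunc interface only needs Sum64.
type fnvHash struct{}

func (f fnvHash) Sum64(key string) uint64 {
	h := fnv.New64a()
	_, _ = h.Write([]byte(key)) // Write on an in-memory hash never fails
	return h.Sum64()
}

// Because bucketCount is a power of two, the shard index is a mask,
// not a modulo: hashVal % bucketCount == hashVal & (bucketCount - 1).
func shardIndex(hashVal, bucketMask uint64) uint64 {
	return hashVal & bucketMask
}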
Next, let's write the cache constructor:
// NewCache constructs a cache instance
func NewCache(opts ...Opt) (ICache, error) {
	options := &options{
		hashFunc:       NewDefaultHashFunc(),
		bucketCount:    defaultBucketCount,
		maxBytes:       defaultMaxBytes,
		cleanTime:      defaultCleanTIme,
		statsEnabled:   defaultStatsEnabled,
		cleanupEnabled: defaultCleanupEnabled,
	}
	for _, each := range opts {
		each(options)
	}
	if !isPowerOfTwo(options.bucketCount) {
		return nil, errShardCount
	}
	if options.maxBytes <= 0 {
		return nil, ErrBytes
	}
	segments := make([]*segment, options.bucketCount)
	locks := make([]sync.RWMutex, options.bucketCount)
	maxSegmentBytes := (options.maxBytes + options.bucketCount - 1) / options.bucketCount
	for index := range segments {
		segments[index] = newSegment(maxSegmentBytes, options.statsEnabled)
	}
	c := &cache{
		hashFunc:    options.hashFunc,
		bucketCount: options.bucketCount,
		bucketMask:  options.bucketCount - 1,
		segments:    segments,
		locks:       locks,
		close:       make(chan struct{}),
	}
	if options.cleanupEnabled {
		go c.cleanup(options.cleanTime)
	}
	return c, nil
}
For better extensibility, we use the `Options` programming pattern here (a sketch of an option function follows the list below). Our constructor mainly does three things:
- Parameter checks: the parameters passed in from outside still need basic validation
- Shard object initialization
- Constructing the cache object
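The option functions themselves are not shown in this article; under the `Options` pattern they would look roughly like this sketch (the name `WithBucketCount` is illustrative, not necessarily the repo's actual API):

// Opt mutates the private options struct before construction.
type Opt func(*options)

// WithBucketCount is a hypothetical option constructor; the actual
// option names in the repo may differ.
func WithBucketCount(count uint64) Opt {
	return func(o *options) {
		o.bucketCount = count
	}
}

// usage: c, err := NewCache(WithBucketCount(512))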
When constructing the cache object, we first need to calculate the capacity of each shard. By default, 256M of data is cached locally and divided evenly across the shards; the user can choose the total amount of data to cache.
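Note that the constructor starts `go c.cleanup(options.cleanTime)` when cleanup is enabled. The body of `cleanup` is not shown in this article, but a ticker-driven loop like the following sketch fits the struct's design; the per-shard `removeExpired` method here is an assumption:

func (c *cache) cleanup(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// Sweep every shard under its lock and drop expired entries.
			for i := range c.segments {
				c.locks[i].Lock()
				c.segments[i].removeExpired() // hypothetical per-shard sweep
				c.locks[i].Unlock()
			}
		case <-c.close:
			// Close() was called; let the goroutine exit.
			return
		}
	}
}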
Step 3: Define the sharding structure
The structure of each shard is as follows:
type segment struct {
	hashmap   map[uint64]uint32
	entries   buffer.IBuffer
	clock     clock
	evictList *list.List
	stats     IStats
}
- `hashmap`: maps a `key`'s hash to the index where it is stored.
- `entries`: the underlying `key/value` storage, introduced in step 4; this is the core of the code.
- `clock`: abstracts the time methods.
- `evictList`: a queue recording insertion order; the oldest index is removed when capacity is insufficient (a temporary solution, since the current storage structure is not well suited to an `LRU` eviction algorithm).
- `stats`: cache monitoring data.
Next, let's take a look at the constructor of each shard:
func newSegment(bytes uint64, statsEnabled bool) *segment {
	if bytes == 0 {
		panic(fmt.Errorf("bytes cannot be zero"))
	}
	if bytes >= maxSegmentSize {
		panic(fmt.Errorf("too big bytes=%d; should be smaller than %d", bytes, maxSegmentSize))
	}
	capacity := (bytes + segmentSize - 1) / segmentSize
	entries := buffer.NewBuffer(int(capacity))
	entries.Reset()
	return &segment{
		entries:   entries,
		hashmap:   make(map[uint64]uint32),
		clock:     &systemClock{},
		evictList: list.New(),
		stats:     newStats(statsEnabled),
	}
}
The main thing to note here: each shard's buffer capacity is calculated from its byte budget, which corresponds to the cache initialization step above. A small worked example follows.
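A quick worked example of the two ceiling divisions with the default values (the concrete `segmentSize` of 1024 bytes is an assumption for illustration only):

package main

import "fmt"

func main() {
	const maxBytes = 256 << 20 // default: 256M for the whole cache
	const bucketCount = 256    // default shard count

	// Ceiling division, as in NewCache.
	maxSegmentBytes := (maxBytes + bucketCount - 1) / bucketCount
	fmt.Println(maxSegmentBytes) // 1048576, i.e. 1MB per shard

	const segmentSize = 1024 // assumed slot size, for illustration
	capacity := (maxSegmentBytes + segmentSize - 1) / segmentSize
	fmt.Println(capacity) // 1024 buffer slots per shard
}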
Step 4: Define the cache structure
The cache object has been constructed; the next step is the core of the local cache: defining the storage structure.
`bigcache`, `fastcache`, and `freecache` all use byte arrays instead of a `map` to store cached data, which reduces `GC` overhead, so we can borrow the same idea. Here we use a two-dimensional byte slice to store the cached `key/value` data.
Compared with `bigcache`, the advantage of using a two-dimensional slice is that the corresponding data can be deleted directly by index. Although "wormhole" (hole) problems will arise, we can record the wormhole indexes and keep refilling them.
Each cached entry is wrapped into a single byte slice before being stored. With the basic idea clarified, let's look at our encapsulation of the storage layer:
type Buffer struct {
	array    [][]byte
	capacity int
	index    int
	// maxCount = capacity - 1
	count int
	// availableSpace records idle indexes when objects are removed
	// after the buffer is full, to avoid array "wormholes".
	availableSpace map[int]struct{}
	// placeholder records the indexes the buffer has stored.
	placeholder map[int]struct{}
}
- `array`: the two-dimensional slice storing the cached objects.
- `capacity`: the maximum capacity of the storage structure.
- `index`: the index of the next write position.
- `count`: the number of cached entries.
- `availableSpace`: records the "wormholes": when a cached object is deleted, its free index is recorded so the hole can be reused once the buffer is full.
- `placeholder`: records the indexes currently in use; it can be iterated to purge expired entries.
The process of writing data to the buffer is as follows (the full code is not posted here; the original post illustrates it with a flow diagram).
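As a rough illustration of that flow, a simplified `Push` might look like the sketch below, reusing a free slot from `availableSpace` before appending at the tail. This is a sketch, not the repo's exact implementation:

import "errors"

// Push stores an entry and returns the index it was written to.
func (b *Buffer) Push(data []byte) (int, error) {
	// Prefer a previously freed slot (a "wormhole") if one exists.
	for idx := range b.availableSpace {
		b.array[idx] = data
		delete(b.availableSpace, idx)
		b.placeholder[idx] = struct{}{}
		b.count++
		return idx, nil
	}
	// Otherwise append at the tail, unless the buffer is full.
	if b.index >= b.capacity {
		return 0, errors.New("buffer is full") // caller evicts and retries
	}
	idx := b.index
	b.array[idx] = data
	b.placeholder[idx] = struct{}{}
	b.index++
	b.count++
	return idx, nil
}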
Step 5: Implement the method for writing data to the cache
All the required structures are now defined; the next step is to fill in our write method:
func (c *cache) Set(key string, value []byte) error {
	hashKey := c.hashFunc.Sum64(key)
	bucketIndex := hashKey & c.bucketMask
	c.locks[bucketIndex].Lock()
	defer c.locks[bucketIndex].Unlock()
	err := c.segments[bucketIndex].set(key, hashKey, value, defaultExpireTime)
	return err
}
func (s *segment) set(key string, hashKey uint64, value []byte, expireTime time.Duration) error {
	if expireTime <= 0 {
		return ErrExpireTimeInvalid
	}
	expireAt := uint64(s.clock.Epoch(expireTime))
	// If the key already exists, remove the old entry so the insert refreshes it.
	if previousIndex, ok := s.hashmap[hashKey]; ok {
		if err := s.entries.Remove(int(previousIndex)); err != nil {
			return err
		}
		delete(s.hashmap, hashKey)
	}
	entry := wrapEntry(expireAt, key, hashKey, value)
	for {
		index, err := s.entries.Push(entry)
		if err == nil {
			s.hashmap[hashKey] = uint32(index)
			s.evictList.PushFront(index)
			return nil
		}
		// The buffer is full: evict the oldest entry and retry.
		ele := s.evictList.Back()
		if err := s.entries.Remove(ele.Value.(int)); err != nil {
			return err
		}
		s.evictList.Remove(ele)
	}
}
The process is as follows:
- Compute the hash of the `key`, then locate the corresponding shard with the bucket mask.
- If the `key` already exists in the cache, delete it first and then re-insert it, which refreshes the expiration time.
- Wrap the entry for storage: the expiration timestamp, the `key` length, the hash, and the cached value are packed together (a sketch of one possible encoding follows this list).
- Push the entry into the buffer; if the push fails, evict the oldest entry and retry.
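The encoding produced by `wrapEntry` is not spelled out in the article; one plausible layout consistent with the fields listed above is sketched here (the field order, widths, and byte order are assumptions):

import "encoding/binary"

// wrapEntry packs expireAt(8B) | hash(8B) | keyLen(2B) | key | value
// into one byte slice. Only the field set comes from the article;
// this concrete layout is illustrative.
func wrapEntry(expireAt uint64, key string, hash uint64, value []byte) []byte {
	buf := make([]byte, 8+8+2+len(key)+len(value))
	binary.LittleEndian.PutUint64(buf[0:8], expireAt)
	binary.LittleEndian.PutUint64(buf[8:16], hash)
	binary.LittleEndian.PutUint16(buf[16:18], uint16(len(key)))
	copy(buf[18:], key)
	copy(buf[18+len(key):], value)
	return buf
}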
Step 6: Implement the method for reading data from the cache
The first step is to compute the hash of the `key` and locate the corresponding shard:
func (c *cache) Get(key string) ([]byte, error) {
	hashKey := c.hashFunc.Sum64(key)
	bucketIndex := hashKey & c.bucketMask
	c.locks[bucketIndex].RLock()
	defer c.locks[bucketIndex].RUnlock()
	entry, err := c.segments[bucketIndex].get(key, hashKey)
	if err != nil {
		return nil, err
	}
	return entry, nil
}
The second step is the shard method that fetches the cached data:
- Look up the index by the hash value; if the `key` is not found, return `ErrEntryNotFound`.
- Read the entry from the buffer and compare the `key` stored in it to detect hash collisions.
- Check whether the cached object has expired, and delete it on read if so (whether expired data may still be returned can be optimized per business needs).
- Record monitoring data at each step.
func (s *segment) getWarpEntry(key string, hashKey uint64) ([]byte, error) {
	index, ok := s.hashmap[hashKey]
	if !ok {
		s.stats.miss()
		return nil, ErrEntryNotFound
	}
	entry, err := s.entries.Get(int(index))
	if err != nil {
		s.stats.miss()
		return nil, err
	}
	if entry == nil {
		s.stats.miss()
		return nil, ErrEntryNotFound
	}
	// Same hash but a different key means a hash collision.
	if entryKey := readKeyFromEntry(entry); key != entryKey {
		s.stats.collision()
		return nil, ErrEntryNotFound
	}
	return entry, nil
}

func (s *segment) get(key string, hashKey uint64) ([]byte, error) {
	currentTimestamp := s.clock.TimeStamp()
	entry, err := s.getWarpEntry(key, hashKey)
	if err != nil {
		return nil, err
	}
	res := readEntry(entry)
	expireAt := int64(readExpireAtFromEntry(entry))
	// Lazily delete the entry if it has expired.
	if currentTimestamp-expireAt >= 0 {
		_ = s.entries.Remove(int(s.hashmap[hashKey]))
		delete(s.hashmap, hashKey)
		return nil, ErrEntryNotFound
	}
	s.stats.hit(key)
	return res, nil
}
Step 7: Verify with a test case
Let's test it with a simple test case:
func (h *cacheTestSuite) TestSetAndGet() {
	cache, err := NewCache()
	assert.Equal(h.T(), nil, err)

	key := "asong"
	value := []byte("公众号:Golang梦工厂")

	err = cache.Set(key, value)
	assert.Equal(h.T(), nil, err)

	res, err := cache.Get(key)
	assert.Equal(h.T(), nil, err)
	assert.Equal(h.T(), value, res)
	h.T().Logf("get value is %s", string(res))
}
Run result:
=== RUN TestCacheTestSuite
=== RUN TestCacheTestSuite/TestSetAndGet
cache_test.go:33: get value is 公众号:Golang梦工厂
--- PASS: TestCacheTestSuite (0.00s)
--- PASS: TestCacheTestSuite/TestSetAndGet (0.00s)
PASS
And that's it: the basic functionality passes. What remains is to run benchmarks, optimize, and iterate (not covered in this article; you can follow the latest developments on GitHub).
References
- https://github.com/allegro/bigcache
- https://github.com/VictoriaMetrics/fastcache
- https://github.com/coocood/freecache
- https://github.com/patrickmn/go-cache
Summary
The implementation article ends here, but the coding of this project is not over; I will continue to iterate and optimize on this version. The advantages of this local cache:
- Simple to implement and easy to understand
- Using two-dimensional slices as the storage structure avoids the drawback that the underlying data cannot be deleted, and also mitigates the "wormhole" problem to a certain extent
- Complete test cases, suitable as an entry-level project for beginners
Points to be optimized:
- Without an efficient cache eviction algorithm, hot data may be deleted frequently
- Periodically deleting expired data holds the lock for too long and needs optimization
- Closing the cache instance needs more graceful handling
- Optimizations for specific business scenarios
Iteration points:
- Add an asynchronous cache-loading feature
- ...... (still thinking)
The code for this article has been uploaded to GitHub: https://github.com/asong2020/go-localcache
Well, that's all for this article. My name is asong; see you in the next issue.
Welcome to follow my public account: [Golang DreamWorks]