
Foreword

The article Redis - Object Type examined the hash object in Redis and showed that it is backed by the dictionary dict data structure. In fact, all data in Redis is stored as key-value pairs in a dict. A schematic diagram of the dict data structure can be represented as follows.

That is, a dict holds two hash tables of type dictht, and each dictht holds an array of hash buckets. Each key-value pair is wrapped in a dictEntry node and added to that array. When a hash collision occurs, Redis resolves it with separate chaining: the colliding nodes form a linked list in the same bucket. However, the default capacity of the dictEntry array is only 4, so collisions become very likely as entries accumulate. If the array were never expanded, the chains would keep growing and the time complexity of the hash table would degrade toward O(N). Therefore, when certain conditions are met, the dictEntry array must be expanded, and this is what expansion means in Redis. This article studies the expansion mechanism of Redis.
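The layout described above can be sketched in C roughly as follows. This is a simplified illustration modeled on the description, not a verbatim copy of dict.h: every `sketch_` name is hypothetical, the value is reduced to a single pointer, and the type-specific callback table is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One node in a bucket's linked list (hypothetical, simplified). */
typedef struct sketch_entry {
    void *key;
    void *val;
    struct sketch_entry *next;   /* next node in the same bucket (chaining) */
} sketch_entry;

/* One hash table: an array of bucket heads plus bookkeeping. */
typedef struct sketch_ht {
    sketch_entry **table;        /* bucket array */
    unsigned long size;          /* number of buckets, a power of 2 */
    unsigned long sizemask;      /* size - 1, turns a hash into an index */
    unsigned long used;          /* number of entries currently stored */
} sketch_ht;

/* The dictionary: two tables and the rehash cursor. */
typedef struct sketch_dict {
    sketch_ht ht[2];             /* ht[1] is only used while rehashing */
    long rehashidx;              /* -1 when no rehash is in progress */
} sketch_dict;

/* Return an empty dictionary with both tables unallocated. */
static sketch_dict sketch_dict_init(void) {
    sketch_dict d = { { { NULL, 0, 0, 0 }, { NULL, 0, 0, 0 } }, -1 };
    return d;
}

/* A rehash is in progress exactly when rehashidx is not -1. */
static int sketch_is_rehashing(const sketch_dict *d) {
    return d->rehashidx != -1;
}
```

A freshly initialized `sketch_dict` has two empty tables and `rehashidx == -1`; setting `rehashidx` to 0 is what marks the start of a progressive rehash later in this article.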

Redis source version: 6.0.16

Main text

1. Expansion timing of Redis

Redis will trigger expansion in the following two situations.

  • If no forked child process is performing RDB or AOF persistence, expansion is triggered as soon as ht[0].used >= ht[0].size;
  • If a forked child process is performing RDB or AOF persistence, expansion is triggered only when ht[0].used / ht[0].size > 5 (the division is an integer division).

The following combines the source code to examine when Redis expands. Adding or updating data in a dict goes through the dictReplace() method in the dict.c file, as shown below.

int dictReplace(dict *d, void *key, void *val) {
    dictEntry *entry, *existing, auxentry;

    // If the add succeeds, dictAddRaw() returns the newly added dictEntry.
    // Only the key is set on the returned dictEntry; the value is set here via dictSetVal().
    entry = dictAddRaw(d,key,&existing);
    if (entry) {
        dictSetVal(d, entry, val);
        return 1;
    }

    // Reaching here means the hash table already holds a dictEntry whose key equals the key being added.
    // The value must be updated instead, and existing now points to that dictEntry.
    auxentry = *existing;
    // Update the value: set the new value on the dictEntry that existing points to.
    dictSetVal(d, existing, val);
    // Free the old value.
    dictFreeVal(d, &auxentry);
    return 0;
}

The dictReplace() method adds or updates a key-value pair. If no dictEntry in the hash table has a key equal to the key being added, a new dictEntry is created from the key-value pair and inserted at the head of its bucket's linked list, and 1 is returned. If a dictEntry with an equal key already exists, the new value is set on that dictEntry, the old value is released, and 0 is returned. Expansion is normally triggered while a key-value pair is being added, so the analysis continues with the dictAddRaw() method, whose source code is as follows.

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing) {
    long index;
    dictEntry *entry;
    dictht *ht;

    // Check whether a rehash is in progress; if so, perform one step of progressive rehash.
    // dictIsRehashing() is defined in dict.h: a rehashidx other than -1 means a rehash is in progress.
    if (dictIsRehashing(d)) _dictRehashStep(d);

    // Get the index in the hash table for the key-value pair being added.
    // If a dictEntry with an equal key already exists, _dictKeyIndex() returns -1.
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    // While rehashing, the new key-value pair is stored in ht[1]; otherwise in ht[0].
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    // Allocate memory for the new dictEntry; its key and value are not set yet.
    entry = zmalloc(sizeof(*entry));
    // Insert at the head of the bucket at position index.
    entry->next = ht->table[index];
    ht->table[index] = entry;
    // Increment the current size of the hash table.
    ht->used++;

    // Set the key on the new dictEntry.
    dictSetKey(d, entry, key);
    return entry;
}

The dictAddRaw() method first checks whether the dict is currently in the rehash stage (that is, whether an expansion is in progress). If it is, one hash bucket migration step is triggered (this is analyzed in detail later). It then calls _dictKeyIndex() to obtain the index in the hash table for the key-value pair being added. If the index is -1, which in the normal case means a dictEntry with an equal key already exists, dictAddRaw() returns NULL to tell the caller that an update is needed instead. If the index is not -1, a new dictEntry is created and inserted at the head of the bucket at that index, the current size of the hash table is incremented, and the key is set on the new dictEntry. The expansion itself is triggered inside the _dictKeyIndex() method, whose source code is as follows.

static long _dictKeyIndex(dict *d, const void *key, uint64_t hash, dictEntry **existing) {
    unsigned long idx, table;
    dictEntry *he;
    if (existing) *existing = NULL;

    // _dictExpandIfNeeded() decides whether expansion is needed and, if so, performs it.
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    for (table = 0; table <= 1; table++) {
        // AND the hash of the key with the hash table mask to get the bucket index.
        idx = hash & d->ht[table].sizemask;
        // Walk the linked list of the bucket at position idx.
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                // The list holds a dictEntry whose key equals the key being added.
                // Point existing at that dictEntry and return -1 to signal
                // that the key is already present.
                if (existing) *existing = he;
                return -1;
            }
            // Move on to the next node in the list.
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    // Reaching here means no dictEntry with an equal key exists; return the index idx.
    return idx;
}
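The index computation `hash & sizemask` in _dictKeyIndex() works because the capacity is always a power of 2, so `size - 1` is an all-ones bit mask and the AND is equivalent to `hash % size`. A minimal sketch of that step follows; `sketch_bucket_index` is a hypothetical helper, not a function from dict.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bucket index of `hash` in a table with `size` buckets.
 * `size` must be a nonzero power of 2 so that size - 1 is an all-ones mask. */
static unsigned long sketch_bucket_index(uint64_t hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long sizemask = size - 1;
    return (unsigned long)(hash & sizemask);
}
```

For example, a hash of 13 maps to bucket 1 in a table of 4 buckets (13 & 3) and to bucket 5 in a table of 8 buckets (13 & 7). After the capacity doubles, an entry either keeps its index or moves up by exactly the old capacity, which is why every node's index has to be recomputed during rehash.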

At the beginning of the _dictKeyIndex() method, _dictExpandIfNeeded() is called to determine whether expansion is required and, if so, to execute the expansion logic. If an error occurs in the process, it returns status code 1, which is the value of the DICT_ERR macro, and _dictKeyIndex() returns -1 directly. If expansion is not required or succeeds, the hash value of the key is ANDed with the hash table mask to obtain the index for the key-value pair, and the linked list of the bucket at that index is traversed to look for a dictEntry whose key equals the key being added. If one is found, -1 is returned and existing is set to point to it; otherwise the computed index is returned. While a rehash is in progress, both ht[0] and ht[1] are searched, and the returned index refers to ht[1]. The logic for deciding on and executing expansion is therefore in the _dictExpandIfNeeded() method, whose source code is as follows.

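static int _dictExpandIfNeeded(dict *d) {
    // If an expansion (rehash) is already in progress, return status code 0.
    if (dictIsRehashing(d)) return DICT_OK;

    // If the capacity of the hash table is 0, initialize it to a capacity of 4.
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    // If the current size is greater than or equal to the capacity, and either
    // dict_can_resize is 1 or used/size exceeds dict_force_resize_ratio (5),
    // expansion is needed: call dictExpand() and request a capacity of at least twice the current size.
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio)) {
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}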

According to the source code of _dictExpandIfNeeded(), the first condition for triggering expansion is that the current size of the hash table (used) is greater than or equal to its capacity (size). Redis then checks whether expansion is currently allowed; if it is, the expansion logic is executed. If expansion is not allowed, Redis checks whether the ratio of current size to capacity exceeds dict_force_resize_ratio (5); if it does, expansion is forced anyway. Because the ratio is computed with integer division, a forced expansion effectively requires used to be at least 6 times size. The method depends on two important variables, dict_can_resize and dict_force_resize_ratio, which are defined in the dict.c file with the following initial values.

static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;
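The decision made by _dictExpandIfNeeded() can be restated as a small pure function, which makes the effect of the integer division easy to check. This is a sketch under assumptions: `sketch_needs_expand` is a hypothetical helper, the two globals are passed in as parameters, and the "already rehashing" early return is left out.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when a table holding `used` entries in
 * `size` buckets would be expanded (or initially allocated), 0 otherwise. */
static int sketch_needs_expand(unsigned long used, unsigned long size,
                               int can_resize, unsigned long force_ratio) {
    if (size == 0) return 1;            /* empty table: initial allocation */
    if (used < size) return 0;          /* fewer entries than buckets */
    if (can_resize) return 1;           /* normal case: resizing is allowed */
    return used / size > force_ratio;   /* forced case, integer division */
}
```

With 4 buckets and resizing disabled, 20 or even 23 entries do not force an expansion because 23 / 4 is still 5; the 24th entry does, since 24 / 4 is 6.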

Finally, we need to find out where the value of dict_can_resize is changed. The following two methods in the dict.c file set dict_can_resize to 1 or 0, as shown below.

void dictEnableResize(void) {
    dict_can_resize = 1;
}

void dictDisableResize(void) {
    dict_can_resize = 0;
}

These two methods will be called by the updateDictResizePolicy() method in the server.c file, as shown below.

void updateDictResizePolicy(void) {
    // hasActiveChildProcess() returns true if a child process is performing RDB or AOF persistence.
    if (!hasActiveChildProcess())
        // No such child process: set dict_can_resize to 1, expansion is allowed.
        dictEnableResize();
    else
        // Such a child process exists: set dict_can_resize to 0, expansion is not allowed.
        dictDisableResize();
}

This concludes the source code analysis of when Redis expands. To summarize this section: when data is added or updated, Redis checks whether the current size of the hash table that stores the data is greater than or equal to its capacity. If so, it checks whether expansion is currently allowed, which depends on whether a child process is executing RDB or AOF persistence: if such a child process exists, expansion is not allowed; otherwise it is allowed and is performed. When expansion is not allowed, Redis checks whether it must be forced, based on whether the ratio of the current size of the hash table to its capacity exceeds 5.

The flow chart is given below to illustrate the whole source code process of triggering expansion.

2. Redis expansion steps

When expansion is required, ht[1] of the dict comes into use. The expansion steps of Redis are as follows.

  • Calculate the capacity of ht[1], that is, the capacity after expansion: the smallest power of 2 that is greater than or equal to ht[0].used * 2;
  • Set the size and sizemask fields of ht[1], initialize its used field to 0, and allocate space for its dictEntry array;
  • Set the rehashidx field of the dict to 0, which starts progressive rehash; Redis then gradually migrates the dictEntry nodes of ht[0] to ht[1];
  • Once all key-value pairs of ht[0] have been moved to ht[1], the memory of ht[0] is released and ht[1] becomes ht[0].
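The capacity calculation in the first step can be sketched as follows. `sketch_next_power` and `SKETCH_INITIAL_SIZE` are hypothetical names; the sketch assumes the search starts from the initial capacity of 4 mentioned earlier and doubles until the requested size is covered.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_INITIAL_SIZE 4UL   /* assumed: the initial capacity of 4 */

/* Hypothetical helper: smallest power of 2 that is >= size, never below 4. */
static unsigned long sketch_next_power(unsigned long size) {
    unsigned long i = SKETCH_INITIAL_SIZE;
    /* Guard so that the doubling below cannot overflow for huge requests. */
    if (size >= LONG_MAX) return (unsigned long)LONG_MAX + 1UL;
    while (i < size) i *= 2;
    return i;
}
```

For instance, a table holding 5 entries requests a capacity of at least 10, which rounds up to 16, while a request for exactly 8 stays at 8.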

Note that the above steps apply only to a normal expansion. Initializing ht[0] is slightly different and is not covered again here. When a dict holds a large number of key-value pairs, rehashing them all at once would be very time-consuming, so Redis completes the expansion with progressive rehash. The rehashidx field of the dict records the index of the next hash bucket in ht[0] to be migrated; every bucket before it has already been rehashed. Progressive rehash means that Redis does not migrate all key-value pairs from ht[0] to ht[1] in one go, but migrates a part of them at certain points in time, as shown below.

  • When data is added, deleted, updated or looked up, one hash bucket is migrated from ht[0] to ht[1];
  • Redis also periodically migrates a batch of hash buckets from ht[0] to ht[1].

Note in particular that if a new key-value pair is added during the progressive rehash process, it will be added directly to ht[1] .

The following combines the Redis source code to examine the expansion steps. As seen in the first section, the expansion logic is executed by the dictExpand() method in the dict.c file, whose source code is as follows.

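int dictExpand(dict *d, unsigned long size) {
    // If a rehash is in progress, or ht[0] already holds more entries than the requested size,
    // return status code 1 to signal that the expansion failed.
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    // n is the hash table after expansion.
    dictht n;
    // Use the smallest power of 2 that is greater than or equal to size as the capacity of n.
    unsigned long realsize = _dictNextPower(size);

    // If the new capacity equals the capacity of the old hash table,
    // return status code 1 to signal that the expansion failed.
    if (realsize == d->ht[0].size) return DICT_ERR;

    // Set the capacity (size) of n.
    n.size = realsize;
    // Set the mask (sizemask) of n.
    n.sizemask = realsize-1;
    // Allocate the bucket array of n.
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    // Initialize the current size (used) of n to 0.
    n.used = 0;

    // If this is the initialization of the hash table, simply set ht[0] to n.
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    // Reaching here means this is an expansion of an initialized table; set ht[1] to n.
    d->ht[1] = n;
    // Set the rehashidx field of the dict to 0 to start progressive rehash.
    d->rehashidx = 0;
    return DICT_OK;
}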

The main logic of the dictExpand() method is to set the size and sizemask fields of ht[1], initialize its used field to 0, allocate space for its dictEntry array, and finally set the rehashidx field of the dict to 0 to start progressive rehash. Next, let's look at when key-value pairs are migrated. As mentioned in the first section, dictAddRaw() checks whether the dict is in the rehash stage and, if so, triggers one hash bucket migration step. That step is the _dictRehashStep() method in the dict.c file, whose source code is as follows.

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

Continue to see the implementation of the dictRehash() method.

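// The parameter n is the number of hash buckets to migrate in this call.
int dictRehash(dict *d, int n) {
    // Maximum number of empty buckets that may be visited.
    int empty_visits = n*10;
    // Return if no progressive rehash is in progress.
    if (!dictIsRehashing(d)) return 0;

    // As long as ht[0] still holds entries, loop once per bucket to migrate;
    // each iteration migrates one hash bucket.
    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        // rehashidx must stay below the capacity of ht[0].
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        // If the bucket at position rehashidx of ht[0] is empty, move on to the next bucket.
        // If the number of empty buckets visited reaches empty_visits, stop this call and return.
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        // Get the bucket at position rehashidx of ht[0].
        de = d->ht[0].table[d->rehashidx];
        // Walk the bucket's linked list and migrate every node to ht[1].
        while(de) {
            uint64_t h;

            nextde = de->next;
            // AND the hash of the node's key with the mask of ht[1] to get its index in ht[1].
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            // Insert at the head of the bucket in ht[1].
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            // Decrement the current size of ht[0].
            d->ht[0].used--;
            // Increment the current size of ht[1].
            d->ht[1].used++;
            // Continue with the next node of the list.
            de = nextde;
        }
        // Once the whole bucket has been migrated, clear position rehashidx of ht[0].
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    // Check whether all key-value pairs of ht[0] have been migrated to ht[1].
    if (d->ht[0].used == 0) {
        // Everything has been migrated:
        // first free the bucket array of ht[0],
        zfree(d->ht[0].table);
        // then set ht[0] to ht[1],
        d->ht[0] = d->ht[1];
        // then reset ht[1]: its array becomes NULL and its
        // capacity, mask and current size are all set to 0.
        _dictReset(&d->ht[1]);
        // Set the rehashidx field of the dict to -1 to mark the end of the rehash.
        d->rehashidx = -1;
        return 0;
    }

    return 1;
}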

The dictRehash() method takes two parameters: the dict to be rehashed and the number of hash buckets to migrate. When it is called from _dictRehashStep(), the number of buckets to migrate is 1. At the start, dictRehash() defines the maximum number of empty buckets it may visit, which is 10 times the number of buckets to migrate. Many empty buckets may be encountered while scanning the hash table, so to bound the time spent on them, Redis stops the current migration and returns once the number of empty buckets visited reaches that limit. Migrating a hash bucket means traversing its linked list, recalculating each node's index against ht[1], and moving the node to ht[1]. At the end, dictRehash() checks whether all data in ht[0] has been migrated to ht[1]. If so, it frees the array of the old ht[0], sets ht[0] to ht[1], resets ht[1] (its array is set to NULL and its capacity, mask and current size are set to 0), and finally sets the rehashidx field of the dict to -1 to mark the end of the rehash.
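To see the moving parts together, here is a toy model of progressive rehash. It is a sketch under stated assumptions rather than Redis code: all `toy_` names are hypothetical, keys are unsigned integers that serve as their own hash, there are no values, and allocation failures are not handled. It mirrors three behaviors described in this article: new keys go to ht[1] during a rehash, lookups consult both tables, and each step migrates whole buckets with a cap on empty-bucket visits.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_entry {
    unsigned long key;
    struct toy_entry *next;
} toy_entry;

typedef struct toy_ht {
    toy_entry **table;
    unsigned long size, used;
} toy_ht;

typedef struct toy_dict {
    toy_ht ht[2];
    long rehashidx;   /* -1: not rehashing; otherwise next ht[0] bucket to move */
} toy_dict;

/* Create a dict whose ht[0] has `size` buckets (size must be a power of 2). */
static toy_dict toy_create(unsigned long size) {
    toy_dict d = { { { calloc(size, sizeof(toy_entry *)), size, 0 },
                     { NULL, 0, 0 } }, -1 };
    return d;
}

/* Allocate ht[1] with `size` buckets and start the progressive rehash. */
static void toy_start_rehash(toy_dict *d, unsigned long size) {
    d->ht[1].table = calloc(size, sizeof(toy_entry *));
    d->ht[1].size = size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
}

/* Head-insert `key`; new keys go to ht[1] while a rehash is in progress. */
static void toy_add(toy_dict *d, unsigned long key) {
    toy_ht *ht = d->rehashidx != -1 ? &d->ht[1] : &d->ht[0];
    toy_entry *e = malloc(sizeof(*e));
    unsigned long idx = key & (ht->size - 1);
    e->key = key;
    e->next = ht->table[idx];
    ht->table[idx] = e;
    ht->used++;
}

/* Search ht[0], and also ht[1] while a rehash is in progress. */
static int toy_contains(const toy_dict *d, unsigned long key) {
    for (int t = 0; t <= 1; t++) {
        const toy_ht *ht = &d->ht[t];
        if (ht->size != 0) {
            for (toy_entry *e = ht->table[key & (ht->size - 1)]; e; e = e->next)
                if (e->key == key) return 1;
        }
        if (d->rehashidx == -1) break;
    }
    return 0;
}

/* Move up to n buckets; returns 1 if work remains, 0 when the rehash is done. */
static int toy_rehash(toy_dict *d, int n) {
    int empty_visits = n * 10;          /* cap on empty buckets visited */
    if (d->rehashidx == -1) return 0;
    while (n-- && d->ht[0].used != 0) {
        while (d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        toy_entry *e = d->ht[0].table[d->rehashidx];
        while (e) {                     /* move every node of this bucket */
            toy_entry *next = e->next;
            unsigned long idx = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].table[idx];
            d->ht[1].table[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].table[d->rehashidx++] = NULL;
    }
    if (d->ht[0].used == 0) {           /* finished: ht[1] replaces ht[0] */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1] = (toy_ht){ NULL, 0, 0 };
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

A typical run fills a 4-bucket table with keys 0 to 3, calls `toy_start_rehash(&d, 8)`, and then repeatedly calls `toy_rehash(&d, 1)`. Each call moves one bucket, `toy_contains()` keeps finding every key throughout, and after the fourth call ht[0] is the 8-bucket table and `rehashidx` is back to -1.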

Besides the migration triggered when data is added, deleted, updated or looked up, Redis also calls dictRehash() periodically. The timed task is the serverCron() method in the server.c file. It calls the databasesCron() method, also in server.c, which incrementally performs background work on the Redis databases, including progressive rehash. databasesCron() performs the rehash by calling the incrementallyRehash() method in server.c, which calls the dictRehashMilliseconds() method in dict.c, which in turn calls dictRehash() to migrate hash buckets. The source code of dictRehashMilliseconds() is as follows.

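int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    // Keep migrating in a loop until the time budget of ms milliseconds is exceeded
    // (the budget is 1 millisecond in the periodic task described above).
    // Each iteration migrates up to 100 hash buckets.
    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}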
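The pattern used by dictRehashMilliseconds(), doing work in fixed-size batches and checking the clock only between batches, can be sketched generically. Everything below is hypothetical and for illustration only: the callback types, `sketch_run_for_ms`, and the fake workload whose clock advances by 1 ms per batch so that the behavior is deterministic.

```c
#include <assert.h>

/* Hypothetical callbacks: `step` performs one batch of work and returns
 * nonzero while work remains; `now_ms` returns the current time in ms. */
typedef int (*sketch_step_fn)(void *ctx, int batch);
typedef long long (*sketch_clock_fn)(void *ctx);

/* Run batches until the work is done or the time budget `ms` is exceeded.
 * Returns the number of units counted, in multiples of `batch`. */
static int sketch_run_for_ms(sketch_step_fn step, sketch_clock_fn now_ms,
                             void *ctx, int batch, int ms) {
    long long start = now_ms(ctx);
    int done = 0;
    while (step(ctx, batch)) {
        done += batch;
        if (now_ms(ctx) - start > ms) break;
    }
    return done;
}

/* Fake workload: `remaining` units of work and a clock driven by the steps. */
typedef struct sketch_work { int remaining; long long clock; } sketch_work;

static int sketch_fake_step(void *ctx, int batch) {
    sketch_work *w = ctx;
    w->remaining = w->remaining > batch ? w->remaining - batch : 0;
    w->clock += 1;                  /* pretend each batch takes 1 ms */
    return w->remaining > 0;
}

static long long sketch_fake_clock(void *ctx) {
    return ((sketch_work *)ctx)->clock;
}
```

Two properties carry over from the original loop: the batch that finishes the work is not added to the returned count, and the budget can be overshot by up to one batch because the clock is only read after a batch completes.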

This completes the source code analysis of Redis's expansion steps.

Summary

Data in Redis is stored in the dictionary dict data structure. A dict holds two hash tables of type dictht, each dictht holds a dictEntry array for storing data, and each key-value pair is wrapped in a dictEntry node and added to that array. When a hash collision occurs, Redis resolves it with separate chaining. However, the default capacity of the dictEntry array is only 4, so collisions become very likely as entries accumulate, and without expansion the time complexity of the hash table would degrade toward O(N). Therefore, when certain conditions are met, the dictEntry array must be expanded, and this is what expansion means in Redis.

The timing of the expansion of Redis is summarized as follows.

  • If no forked child process is performing RDB or AOF persistence, expansion is triggered as soon as ht[0].used >= ht[0].size;
  • If a forked child process is performing RDB or AOF persistence, expansion is triggered only when ht[0].used / ht[0].size > 5 (the division is an integer division).

A Redis dict normally uses only one of its two hash tables, ht[0]. When expansion is required, the other hash table, ht[1], comes into use. The expansion steps of Redis are as follows.

  • Calculate the capacity of ht[1], that is, the capacity after expansion: the smallest power of 2 that is greater than or equal to ht[0].used * 2;
  • Set the size and sizemask fields of ht[1], initialize its used field to 0, and allocate space for its dictEntry array;
  • Set the rehashidx field of the dict to 0, which starts progressive rehash; Redis then gradually migrates the dictEntry nodes of ht[0] to ht[1], one hash bucket at a time;
  • Once all key-value pairs of ht[0] have been moved to ht[1], the memory of ht[0] is released and ht[1] becomes ht[0].

When a dict holds a large number of key-value pairs, rehashing them all at once would be very time-consuming, so Redis completes the expansion with progressive rehash. The rehashidx field of the dict records the index of the next hash bucket in ht[0] to be migrated; every bucket before it has already been rehashed. Progressive rehash means that Redis does not migrate all key-value pairs from ht[0] to ht[1] in one go, but migrates a part of them at certain points in time, as shown below.

  • When data is added, deleted, updated or looked up, one hash bucket is migrated from ht[0] to ht[1];
  • Redis also periodically migrates a batch of hash buckets from ht[0] to ht[1].

Note in particular that if a new key-value pair is added during the progressive rehash process, it will be added directly to ht[1] .


半夏之沫