Continuing from the previous part: the redis source code can be downloaded from the following address:
I downloaded redis-6.2.5 here; you can download redis-6.2.6 directly, as shown in the picture above.
The data structure of the hash table in redis
The data structure of the redis hash table is defined in:
redis-6.2.5\src\dict.h
This is the hash table structure. Every dictionary has two of them, because redis implements incremental rehashing from the old table to the new one.
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;
table:
table is a pointer to a pointer, that is, an array in which each element is a pointer to a dictEntry structure; each dictEntry stores one key-value pair
The dictEntry data structure itself is shown in the dictEntry part of this article
size:
The size attribute records the size of the entire hash table, which can also be understood as the size of the above table array
sizemask:
The sizemask attribute, together with the specific hash value, determines where the key should be placed in the table array
The value of sizemask is always size - 1, which we can demonstrate
Computing the index with the remainder operator (hash % size) works, but integer division is one of the slower CPU operations. Because size is always a power of two, the same result can be obtained much more cheaply with a bit operation.
This is where the sizemask field comes in: it is combined with the hash value in a bitwise AND, so the index is hash & sizemask.
As shown in the figure, with size = 4 and sizemask = 3, a hash value of 7 gives hash & sizemask = 0111 & 0011 = 0011, which is 3, so the entry is inserted at index 3, the position shown in the above figure
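The equivalence between the remainder and the bitwise AND can be checked with a small sketch. The helper names index_by_mask and index_by_mod are hypothetical and do not come from the redis source; the sketch only assumes that size is a power of two.

```c
#include <stdint.h>

/* Hypothetical helper: bucket index via bitwise AND.
 * Only valid when size is a power of two. */
static unsigned long index_by_mask(uint64_t hash, unsigned long size) {
    unsigned long sizemask = size - 1;      /* e.g. size 4 -> mask 0011 */
    return (unsigned long)(hash & sizemask);
}

/* The same index computed with the remainder operator, for comparison. */
static unsigned long index_by_mod(uint64_t hash, unsigned long size) {
    return (unsigned long)(hash % size);
}
```

With size = 4 and a hash of 7, both helpers return 3, matching the example above.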
used:
The used attribute indicates the number of key-value pairs already in the hash table
For the above case, a simple diagram can be used to represent the hash table structure dictht
The meaning of each attribute of the dictEntry structure
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;
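Above we saw that each node in the array is a dictEntry structure. Its attributes have these meanings:
key
The redis key itself
union v
val
A void * pointer that can point to data of any type. Because v is a union, all of its members share the same memory, which saves space
u64
An unsigned 64-bit integer value, used for example in Sentinel mode and in redis cluster elections
s64
A signed 64-bit integer value, used to record expiration times
d
A double-precision floating-point value
next
Pointer to the next node in the same bucket, forming a linked list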
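To illustrate why the value is declared as a union, here is a minimal sketch. The type demo_value and the helper demo_make_s64 are hypothetical names for illustration, not part of redis. All members share the same storage, so an integer value can be kept inside the entry itself instead of in a separately allocated object.

```c
#include <stdint.h>

/* Hypothetical stand-in for the anonymous union v inside dictEntry. */
typedef union demo_value {
    void *val;
    uint64_t u64;
    int64_t s64;
    double d;
} demo_value;

/* Store a signed 64-bit integer (for example a timestamp) directly in
 * the union, with no separate allocation for the value. */
static demo_value demo_make_s64(int64_t n) {
    demo_value v;
    v.s64 = n;
    return v;
}
```

Only the member that was written last should be read back; the union does not record which member is active, so the caller has to know.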
dict structure
Further down in the src\dict.h file we find a very important structure, dict, which redis uses to organize the hash tables
typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
} dict;
- type
The set of operation functions for this dictionary. The specific functions are listed in typedef struct dictType
- privdata
Private data that the dictionary depends on, passed to the redis-specific operation functions
- ht[2]
The key-value pairs of the hash table are stored here, in one old table and one new table
ht[0]: the array before expansion
ht[1]: the expanded array
The second table is used for progressive rehash when the amount of data is large
- rehashidx
Marks the current position of the rehash, as an index into ht[0]. rehashidx == -1 means no rehash is in progress; rehashidx != -1 means a rehash is in progress
Remember we said earlier that redis has 16 databases?
We can also see the data structure of redisDb in the redis source code, in src\server.h
We can see that the dict dictionary is used very frequently in redis and plays a critical role
It was mentioned above that ht[2] is used for progressive rehash. So why use progressive rehash, and how is it done?
An expansion triggers a rehash
When the amount of data is large, the table has to be expanded. Copying everything from ht[0] to ht[1] in one go is relatively slow and would block other operations, leaving redis unable to handle other requests, because redis processes commands on a single thread
The first way to copy ht[0] data to ht[1]
When expanding, rehash does this:
- First, allocate memory for the ht[1] mentioned above, sized at ht[0].size * 2
- Then copy the data of ht[0], from ht[0][0] ... ht[0][size-1], to ht[1]
How to do it incrementally?
The idea is divide and conquer. Regardless of whether redis is currently doing persistence, every time we add, delete, update, or look up data, redis takes the opportunity to rehash a small part of ht[0], gradually moving ht[0][0] ... ht[0][size-1] into ht[1]
You can follow the code path yourself. Starting from the setCommand command registered in src\server.c, the key call chain is as follows
Tracing down to the dictAddRaw function, we can clearly see that when redis adds data, it uses head insertion.
- First allocate memory for the new node
- Point the next pointer of the newly created node to the current head of the linked list
- Then point the head of the linked list to the new node's address, which completes the head insertion
Here we can also see that a rehash step is actually performed while adding
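The three head-insertion steps above can be sketched as follows. This is a simplified illustration rather than the code of dictAddRaw: the type demo_entry and the function demo_head_insert are hypothetical, and plain malloc is assumed instead of the allocator redis uses.

```c
#include <stdlib.h>

/* Hypothetical simplified entry: one key, one value, one chain pointer. */
typedef struct demo_entry {
    void *key;
    void *val;
    struct demo_entry *next;
} demo_entry;

/* Insert a new entry at the head of a bucket's chain.
 * Returns the new entry, or NULL if allocation fails. */
static demo_entry *demo_head_insert(demo_entry **bucket, void *key, void *val) {
    demo_entry *e = malloc(sizeof(*e));   /* 1. allocate the new node */
    if (e == NULL) return NULL;
    e->key = key;
    e->val = val;
    e->next = *bucket;                    /* 2. new node points to old head */
    *bucket = e;                          /* 3. bucket head becomes new node */
    return e;
}
```

Head insertion takes constant time regardless of chain length, and the most recently added entry is the first one a later traversal of the chain reaches.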
Tracing into the dictRehash function, we can see that the rehash works like this:
- In the ht[0] array, find the bucket at rehashidx, that is, the corresponding array index
- Using that index, take the linked list at ht[0].table[d->rehashidx]
- Then rehash the entries in that linked list one by one
Here, the parameter n of dictRehash is the number of rehash steps to perform; one step rehashes every entry on the linked list of one bucket of the array.
This can be seen in the code:
/* Get the index in the new hash table */
h = dictHashKey(d, de->key) & d->ht[1].sizemask;
Here dictHashKey computes an integer hash, which is then combined in a bitwise AND with the sizemask of the new dictht to get the index position in the new hash table
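The bucket-by-bucket procedure described above can be sketched in a self-contained form. This is a simplified model, not the redis implementation: all mini_ names are hypothetical, keys are plain integers that serve as their own hash (redis calls dictHashKey instead), the sketch assumes no new entries are added to ht[0] while a rehash is in progress, and it omits the limit on empty buckets visited per call.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct mini_entry {
    uint64_t key;                 /* assumption: the key is its own hash */
    struct mini_entry *next;
} mini_entry;

typedef struct mini_table {
    mini_entry **table;
    unsigned long size, sizemask, used;
} mini_table;

typedef struct mini_dict {
    mini_table ht[2];
    long rehashidx;               /* -1 means no rehash in progress */
} mini_dict;

/* Allocate an empty table; size must be a power of two. 0 on success. */
static int mini_table_init(mini_table *t, unsigned long size) {
    t->table = calloc(size, sizeof(*t->table));
    if (t->table == NULL) return -1;
    t->size = size;
    t->sizemask = size - 1;
    t->used = 0;
    return 0;
}

/* Head-insert a key into one table. 0 on success. */
static int mini_insert(mini_table *t, uint64_t key) {
    mini_entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long h = (unsigned long)(key & t->sizemask);
    e->key = key;
    e->next = t->table[h];
    t->table[h] = e;
    t->used++;
    return 0;
}

/* Move at most n non-empty buckets from ht[0] to ht[1].
 * Returns 1 if entries remain in ht[0], 0 once the rehash is complete. */
static int mini_rehash(mini_dict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n-- > 0 && d->ht[0].used != 0) {
        /* Skip empty buckets; used != 0 means a non-empty one lies ahead. */
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        mini_entry *de = d->ht[0].table[d->rehashidx];
        while (de != NULL) {
            mini_entry *next = de->next;
            /* Index in the new table: hash & sizemask of ht[1]. */
            unsigned long h = (unsigned long)(de->key & d->ht[1].sizemask);
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->ht[0].used == 0) {
        /* Everything has moved: the new table becomes ht[0]. */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1].table = NULL;
        d->ht[1].size = d->ht[1].sizemask = d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Calling mini_rehash(&d, 1) once per operation moves one bucket at a time, which is the progressive behaviour; a larger n moves more buckets per call.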
Where redis calls _dictRehashStep
Searching the code for callers of _dictRehashStep turns up only a few places. Checking each caller in turn confirms that a progressive rehash step really does happen each time we add, delete, update, or look up data in redis. This is how, after an expansion, the data of ht[0] is gradually copied to ht[1]
In the actual redis code, the following functions call _dictRehashStep:
- dictAddRaw
- dictGenericDelete
- dictFind
- dictGetRandomKey
- dictGetSomeKeys
The second way to copy ht[0] data to ht[1]
The logic of the timer calling dictRehash
When redis has no persistence operation in progress, its periodic timer task also performs rehashing. The logic is as follows
We can search the redis source for the places where the dictRehash function is used
There are not many of them, and we can easily find the one that does its work under a millisecond-level time limit
dictRehashMilliseconds
The logic here is that the while loop performs rehashing in rounds of 100 steps, with a time limit of 1 ms. The elapsed time is only checked after each round, so at least one round of 100 is always attempted; as long as the time has not exceeded 1 ms, another round follows, and once it exceeds 1 ms the timed operation ends immediately.
Here we can see that the parameter passed in dictRehash(d,100) is 100, which means 100 rehash steps per call. Remember that the progressive rehash described earlier passes in 1? You can refer back to the section above.
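The time-boxed loop described above can be sketched like this. It is an illustration of the pattern rather than the body of dictRehashMilliseconds: the names demo_step_fn, demo_run_for_ms, and demo_countdown_step are hypothetical, and the standard clock() function (CPU time) is assumed as a stand-in for a millisecond wall-clock timer.

```c
#include <time.h>

/* Hypothetical work callback: process up to n buckets,
 * return 1 if more work remains and 0 when finished. */
typedef int (*demo_step_fn)(void *ctx, int n);

/* Call step in batches of 100 until it reports completion or more than
 * ms milliseconds have elapsed. The time is checked only after a batch,
 * so the first batch always runs. Returns 100 times the number of
 * batches that reported remaining work. */
static int demo_run_for_ms(demo_step_fn step, void *ctx, int ms) {
    clock_t start = clock();
    int done = 0;
    while (step(ctx, 100)) {
        done += 100;
        long long elapsed_ms =
            (long long)(clock() - start) * 1000 / CLOCKS_PER_SEC;
        if (elapsed_ms > ms) break;
    }
    return done;
}

/* Toy step for demonstration: ctx points to an int that counts the
 * buckets still waiting to be processed. */
static int demo_countdown_step(void *ctx, int n) {
    int *remaining = ctx;
    *remaining -= n;
    if (*remaining < 0) *remaining = 0;
    return *remaining > 0;
}
```

Checking the clock once per batch of 100, rather than once per bucket, keeps the timing overhead small while still bounding how long a single timer tick can spend on rehashing.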
That's all for today. This is what I have learned; if anything here is off, please correct me
Welcome to like, follow, favorite
Friends, your support and encouragement are the motivation for me to persist in sharing and improve quality
Okay, that's it for this time
Technology is open, and our mentality should be open. Embrace change, live in the sun, and move forward.
I am the native of Abingyun , welcome to like, follow and collect, see you next time~