Redis is written in C, so how does Redis represent a string? Does it use the same representation as a plain C string?
The Redis source tree contains the sds library, which is implemented in the following two files:
- sds.h
- sds.c
The specific paths are deps/hiredis/sds.h and deps/hiredis/sds.c.
sds.h defines the following data structure:
SDS
SDS stands for simple dynamic string, the string type used in Redis.
In C, a string is represented as an array of characters, for example:
char data[] = "xiaomotong";
When a C string has to grow, a larger block of memory must be allocated to hold the new string. If every append triggers a reallocation, performance drops sharply. There is a second problem: suppose the string is "xiaomo\0tong". C treats '\0' as the end of the string, so reading "xiaomo\0tong" only yields xiaomo, and the reported string length is 6.
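The truncation described above is easy to reproduce. The following is a minimal sketch, not code from Redis; the names toy_raw, toy_c_len and toy_raw_len are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* 11 payload bytes: "xiaomo", an embedded '\0', then "tong". */
static const char toy_raw[] = "xiaomo\0tong";

/* Length as the C library sees it: counting stops at the first '\0'. */
size_t toy_c_len(const char *s) {
    assert(s != NULL);
    return strlen(s);
}

/* Length when tracked explicitly: array size minus the implicit final '\0'. */
size_t toy_raw_len(void) {
    return sizeof(toy_raw) - 1;
}
```

Calling toy_c_len(toy_raw) gives 6, while toy_raw_len() gives 11: the bytes after the embedded '\0' are still in memory, but the C library has no way to know about them.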
Redis therefore designs sds with a dedicated member that records the length of the string:
SDS:
free:0
len:6
char buf[]="xiaomo"
If we now append to the string, sds expands its buffer. For example, after appending "tong", the values in the sds data structure become:
free:10
len:10
char buf[]="xiaomotong"
The final "xiaomotong" is still followed by a '\0', which keeps it compatible with the C string convention. When sds expands, it doubles the required length, but only up to a threshold: once the length reaches 1M, it no longer doubles and instead adds a fixed 1M of spare space each time, for example 1M becomes 2M, 2M becomes 3M, and so on.
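The free/len/buf picture and the growth rule can be sketched in a few lines of C. This is an illustrative toy under stated assumptions, not the Redis implementation: toy_sds, toy_grow_target, toy_new, toy_cat and TOY_MAX_PREALLOC are hypothetical names, and the 1M threshold is simply taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_PREALLOC (1024 * 1024) /* 1M threshold from the text (assumed) */

/* Hypothetical header modelled on the free/len/buf picture. */
typedef struct {
    size_t len;  /* bytes in use */
    size_t free; /* spare bytes after len */
    char buf[];  /* payload plus trailing '\0' */
} toy_sds;

/* Capacity to reserve for a string that must hold `needed` bytes:
 * double below the threshold, add a fixed 1M above it. */
size_t toy_grow_target(size_t needed) {
    return needed < TOY_MAX_PREALLOC ? needed * 2 : needed + TOY_MAX_PREALLOC;
}

/* Create a string with no spare space; returns NULL if allocation fails. */
toy_sds *toy_new(const char *init) {
    assert(init != NULL);
    size_t len = strlen(init);
    toy_sds *s = malloc(sizeof(*s) + len + 1);
    if (s == NULL) return NULL;
    s->len = len;
    s->free = 0;
    memcpy(s->buf, init, len + 1);
    return s;
}

/* Append n bytes. Returns the (possibly moved) string, or NULL on
 * allocation failure, in which case the original string is untouched. */
toy_sds *toy_cat(toy_sds *s, const void *data, size_t n) {
    assert(s != NULL && data != NULL);
    if (s->free < n) {
        size_t cap = toy_grow_target(s->len + n);
        toy_sds *grown = realloc(s, sizeof(*s) + cap + 1);
        if (grown == NULL) return NULL;
        s = grown;
        s->free = cap - s->len;
    }
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    s->free -= n;
    s->buf[s->len] = '\0'; /* keep the C-compatible terminator */
    return s;
}
```

Starting from toy_new("xiaomo") and appending "tong" with toy_cat reproduces the values shown above: len is 10 and free is 10, because the required 10 bytes were doubled to 20. A second short append would then fit into the spare space without another allocation.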
Advantages of SDS:
- Binary safe: the length is stored explicitly, so the data may contain '\0'
- Memory pre-allocation, which avoids frequent memory allocation
- Compatible with C library functions
SDS data structures in the Redis source
What we look at here is the sds data structure in redis-6.2.5. Earlier versions stored the length in an int, which is 32 bits (4 bytes) and can represent lengths up to about 4.2 billion. Most strings are far shorter than that, so always using a 32-bit field wastes memory.
The current code therefore picks one of the following header structures according to the length it needs to represent.
struct __attribute__ ((__packed__)) hisdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
- hisdshdr5: for lengths in the range 0 to 2^5 - 1
- hisdshdr8: for lengths in the range 2^5 to 2^8 - 1
- hisdshdr16: for lengths in the range 2^8 to 2^16 - 1
- hisdshdr32: for lengths in the range 2^16 to 2^32 - 1
- hisdshdr64: for lengths in the range 2^32 to 2^64 - 1
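The ranges in the list above translate directly into a selection function. The sketch below follows those ranges as written; it is not a copy of hi_sdsReqType or hi_sdsHdrSize, whose exact thresholds may differ, and the TOY_TYPE_* constants, toy_req_type and toy_hdr_bytes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the header types, numbered 0 to 4. */
enum { TOY_TYPE_5, TOY_TYPE_8, TOY_TYPE_16, TOY_TYPE_32, TOY_TYPE_64 };

/* Smallest header type whose range covers `len`, per the list above. */
int toy_req_type(uint64_t len) {
    if (len < (UINT64_C(1) << 5))  return TOY_TYPE_5;
    if (len < (UINT64_C(1) << 8))  return TOY_TYPE_8;
    if (len < (UINT64_C(1) << 16)) return TOY_TYPE_16;
    if (len < (UINT64_C(1) << 32)) return TOY_TYPE_32;
    return TOY_TYPE_64;
}

/* Header bytes for a packed layout: two length fields (len and alloc)
 * plus one flags byte. Type 5 has only the flags byte. */
size_t toy_hdr_bytes(int type) {
    assert(type >= TOY_TYPE_5 && type <= TOY_TYPE_64);
    static const size_t field[] = {0, 1, 2, 4, 8};
    return 2 * field[type] + 1;
}
```

With this layout a 6-byte string needs a 1-byte header, a 100-byte string a 3-byte header, and only strings of 2^32 bytes or more pay for the 17-byte header, which is the memory saving the variable-width headers are designed for.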
The unsigned char flags member above occupies 1 byte, that is 8 bits:
- the 3 least significant bits indicate the type
- the 5 most significant bits hold the string length in hisdshdr5, and are unused in the other types
The 3 type bits can represent the values 0 to 7, which map to the following macros (HI_SDS_TYPE_MASK is the bit mask that selects them):
#define HI_SDS_TYPE_5 0
#define HI_SDS_TYPE_8 1
#define HI_SDS_TYPE_16 2
#define HI_SDS_TYPE_32 3
#define HI_SDS_TYPE_64 4
#define HI_SDS_TYPE_MASK 7
The source code obtains the concrete header type by ANDing flags with HI_SDS_TYPE_MASK.
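As a small illustration of that AND operation, the sketch below packs a type-5 flags byte and takes it apart again. The TOY_* macros mirror the values of the HI_SDS_* macros quoted above (with a shift width of 3 bits), and the three helper functions are hypothetical names, not part of the library.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TYPE_5    0
#define TOY_TYPE_8    1
#define TOY_TYPE_MASK 7 /* binary 111: selects the 3 low bits */
#define TOY_TYPE_BITS 3 /* width of the type field */

/* Build a type-5 flags byte: type in the low 3 bits, length in the high 5. */
unsigned char toy_pack_type5(size_t len) {
    assert(len < 32); /* only 5 bits are available for the length */
    return (unsigned char)(TOY_TYPE_5 | (len << TOY_TYPE_BITS));
}

/* Recover the header type from any flags byte with a bitwise AND. */
int toy_flags_type(unsigned char flags) {
    return flags & TOY_TYPE_MASK;
}

/* Recover the length stored in a type-5 flags byte. */
size_t toy_type5_len(unsigned char flags) {
    return (size_t)(flags >> TOY_TYPE_BITS);
}
```

For a 6-byte string, toy_pack_type5(6) yields 48 (binary 00110000): the low three bits are 000 for type 5 and the high five bits are 00110 for the length 6.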
Taking the hisdshdr8 data structure as an example, its members are:
- len: the length currently in use
- alloc: the size of the allocated space, excluding the header and the null terminator
- flags: indicates which header structure is in use (the 3 least significant bits)
- buf: the actual stored string
From these we can calculate the remaining space in the data structure: free = alloc - len
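Because the header sits directly in front of buf, code that only holds a pointer to the string data can step backwards to reach len, alloc and flags. The following sketch shows that idea for an 8-bit header. It assumes a compiler that supports __attribute__((__packed__)), such as GCC or Clang, and toy_hdr8, toy_new8, toy_hdr and toy_avail are hypothetical names rather than hiredis functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same field layout as hisdshdr8; packed, so sizeof is exactly 3. */
struct __attribute__((__packed__)) toy_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Build a type-8 string with `extra` spare bytes. Returns a pointer to
 * buf (not to the header), or NULL if it does not fit or malloc fails. */
char *toy_new8(const char *init, uint8_t extra) {
    assert(init != NULL);
    size_t len = strlen(init);
    if (len + extra > UINT8_MAX) return NULL;
    struct toy_hdr8 *h = malloc(sizeof(*h) + len + extra + 1);
    if (h == NULL) return NULL;
    h->len = (uint8_t)len;
    h->alloc = (uint8_t)(len + extra);
    h->flags = 1; /* type 8 */
    memcpy(h->buf, init, len + 1);
    return h->buf;
}

/* Step back from the buffer pointer to the header that precedes it. */
const struct toy_hdr8 *toy_hdr(const char *s) {
    return (const struct toy_hdr8 *)(s - sizeof(struct toy_hdr8));
}

/* Remaining space: free = alloc - len. */
size_t toy_avail(const char *s) {
    const struct toy_hdr8 *h = toy_hdr(s);
    return (size_t)(h->alloc - h->len);
}
```

For toy_new8("xiaomo", 4), toy_avail returns 4, and the byte at s[-1] is the flags byte. Returning the buf pointer instead of the header pointer is what lets such a string be passed to ordinary C functions that expect a char pointer.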
sds.h declares the function hisds hi_sdsnewlen(const void *init, size_t initlen), which creates a string from the init pointer and a length of initlen:
hisds hi_sdsnewlen(const void *init, size_t initlen) {
    void *sh;
    hisds s;
    // Compute the type, i.e. which header structure to use
    char type = hi_sdsReqType(initlen);
    // Empty strings now default to HI_SDS_TYPE_8
    if (type == HI_SDS_TYPE_5 && initlen == 0) type = HI_SDS_TYPE_8;
    int hdrlen = hi_sdsHdrSize(type);
    unsigned char *fp; /* flags pointer. */
    // Allocate memory for header + string + terminator
    sh = hi_s_malloc(hdrlen+initlen+1);
    if (sh == NULL) return NULL;
    if (!init)
        memset(sh, 0, hdrlen+initlen+1);
    s = (char*)sh+hdrlen;
    fp = ((unsigned char*)s)-1;
    // Initialize the header according to its type
    switch(type) {
        case HI_SDS_TYPE_5: {
            *fp = type | (initlen << HI_SDS_TYPE_BITS);
            break;
        }
        case HI_SDS_TYPE_8: {
            HI_SDS_HDR_VAR(8,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case HI_SDS_TYPE_16: ...
        case HI_SDS_TYPE_32: ...
        case HI_SDS_TYPE_64: ...
    }
    if (initlen && init)
        memcpy(s, init, initlen);
    // For C library compatibility, append \0 after the string
    s[initlen] = '\0';
    return s;
}
- hi_sdsReqType: selects the header type based on the length of the string
- hi_sdsHdrSize: returns the header size for the given type, which is needed to compute how much memory to allocate
- hi_s_malloc: allocates memory; it calls hi_malloc from alloc.h, whose underlying implementation is not visible here
- switch(type) …: initializes the header structure that corresponds to the type
- s[initlen] = '\0': appends '\0' after the string for compatibility with the C library
The underlying design of the Redis key-value store
How does Redis store massive amounts of data?
Data in Redis is stored as key-value pairs. The keys are all strings, while the values differ according to their data structure.
The pairs are stored in an array plus linked lists:
- array: each element stores the address of a linked list
- linked list: stores the actual data
As mentioned above, the keys in Redis are all strings, so how are they combined with the array and the linked lists?
A hash function computes an integer from the string key, and that integer, taken modulo the array size, is the index into the array. The array element at that index points to a linked list, and the linked list stores the actual data. For example:
- dict[10] is an array in which each element points to a linked list
- Now we want to insert k1 - v1, k2 - v2, k3 - v3
The hash function gives:
hash(k1) % 10 = 0
hash(k2) % 10 = 1
The modulo is 10 here because the array has only 10 slots.
The result then looks like this:
dict[0] -> (k1,v1) -> null
dict[1] -> (k2,v2) -> null
What if the index calculated for (k3, v3) conflicts with data that is already stored?
hash(k3) % 10 = 1
This is a hash conflict. If k3 and k2 are equal, the value stored for k2 is simply updated.
If k3 is different from k2, the conflict is resolved by separate chaining: (k3, v3) is inserted at the head of the existing linked list, for example:
dict[0] -> (k1,v1) -> null
dict[1] -> (k3,v3) -> (k2,v2) -> null
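The insert-or-update logic with head insertion can be sketched as a tiny hash table. This is a simplified illustration of the scheme described above, not the Redis dict implementation: toy_dict, toy_entry, toy_hash, toy_set and toy_get are hypothetical names, and the byte-sum hash is deliberately weak so that collisions are easy to produce (for instance, "ab" and "ba" always land in the same bucket).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 10

typedef struct toy_entry {
    const char *key; /* borrowed pointers: the caller keeps them alive */
    const char *val;
    struct toy_entry *next;
} toy_entry;

typedef struct {
    toy_entry *table[TOY_BUCKETS]; /* each slot heads a linked list */
} toy_dict;

/* Toy hash: sum of the key's bytes. Same input, same output. */
unsigned toy_hash(const char *key) {
    unsigned h = 0;
    while (*key) h += (unsigned char)*key++;
    return h;
}

/* Insert or update. Returns 0 on success, -1 if allocation fails. */
int toy_set(toy_dict *d, const char *key, const char *val) {
    assert(d != NULL && key != NULL);
    unsigned idx = toy_hash(key) % TOY_BUCKETS;
    for (toy_entry *e = d->table[idx]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) { /* equal key: update in place */
            e->val = val;
            return 0;
        }
    }
    toy_entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    e->key = key;
    e->val = val;
    e->next = d->table[idx]; /* head insertion */
    d->table[idx] = e;
    return 0;
}

/* Look up a key; returns its value or NULL if the key is absent. */
const char *toy_get(const toy_dict *d, const char *key) {
    assert(d != NULL && key != NULL);
    for (const toy_entry *e = d->table[toy_hash(key) % TOY_BUCKETS];
         e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) return e->val;
    }
    return NULL;
}
```

Inserting "ab" and then "ba" leaves the bucket as ("ba") -> ("ab") -> null, matching the head-insertion picture above, while setting "ab" a second time only replaces its value.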
Summary:
- For the hash above, the same input always produces the same output
- Different inputs may also produce the same output. This is a hash conflict, and it has to be resolved.
References:
- redis_doc
- Redis source code, redis-6.2.5 (the latest stable version at the time of writing)
That's all for this time.
Technology is open, and our mentality should be open too. Embrace change, live in the sun, and move forward.
I'm the little devil boy Nezha. See you next time~