Simple Dynamic String
What is SDS
SDS stands for Simple Dynamic String.
Redis does not use C's traditional string representation (a null-terminated array of characters) directly. Instead, it builds its own abstract type called simple dynamic string (SDS) and uses SDS as its default string representation.
In Redis, C strings are used only as string literals in places where the string value never needs to be modified, such as when printing logs.
When Redis needs a modifiable string value rather than just a literal, it uses SDS to represent it. For example, in the Redis database, key-value pairs that contain string values are implemented with SDS under the hood.
Definition of SDS
The SDS structure contains:
- buf: a byte array that holds the string
- len: the number of bytes used in buf, which equals the length of the string stored in the SDS
- free: the number of unused bytes in buf
Example (an SDS holding the string "Redis"):
- The len attribute is 5, meaning the SDS stores a five-byte string
- The free attribute is 0, meaning this SDS has no allocated unused space
- The buf attribute is a char array: its first five bytes hold the five characters 'R', 'e', 'd', 'i', 's', and the last byte holds the null character '\0'
Note: SDS follows the C convention of null termination. The 1 byte that holds the null character is not counted in the SDS's len attribute, and allocating this extra byte and appending the null character to the end of the string are done automatically by the SDS functions.
The example above shows the case where free is 0. When an SDS has unused space, it looks as shown below; we still use the "Redis" string, so len stays 5 while free records the number of spare bytes.
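The len/free/buf layout can be sketched in C with a flexible array member. This is a minimal, hypothetical sketch: `struct sds_hdr` and `sds_new_len` are illustrative names rather than Redis's API, and the field types are assumptions (Redis's real headers have changed across versions). It builds the "Redis" string once with free = 0 and once with some unused space.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header following the len/free/buf layout described above. */
struct sds_hdr {
    size_t len;   /* bytes in use, not counting the terminating '\0' */
    size_t free;  /* unused bytes allocated beyond the string and its '\0' */
    char buf[];   /* flexible array member: len + free + 1 bytes */
};

/* Create an SDS holding the first `len` bytes of `init`, plus `extra`
 * bytes of unused space. Returns NULL if malloc fails. */
struct sds_hdr *sds_new_len(const char *init, size_t len, size_t extra) {
    struct sds_hdr *sh = malloc(sizeof *sh + len + extra + 1);
    if (sh == NULL) return NULL;
    sh->len = len;
    sh->free = extra;
    memcpy(sh->buf, init, len);
    sh->buf[len] = '\0';  /* the function, not the caller, adds the terminator */
    return sh;
}
```

Calling `sds_new_len("Redis", 5, 0)` gives len 5, free 0, and a '\0' at `buf[5]`, matching the example above.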
Differences between SDS and C strings
As mentioned above, C represents a string of length N with a character array of length N+1, whose last element is always the null character '\0'.
This simple representation does not meet Redis's requirements for safety, efficiency, and functionality. Let's compare SDS and C strings.
Getting the string length in constant time
First, a C string does not record its own length. To get the length of a C string, a program must traverse the whole string, counting each character until it reaches the terminating '\0'. This takes O(N) time.
Unlike C strings, SDS records its own length in the len attribute, so getting the length of an SDS takes only O(1) time.
Note that setting and updating the SDS length is done automatically by the SDS API as it executes.
Using SDS reduces the cost of getting a string's length from O(N) to O(1), ensuring that length lookups never become a performance bottleneck in Redis.
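The difference in cost can be shown side by side. In this minimal sketch, `c_strlen` does what `strlen` must do for a plain C string, while the hypothetical `sds_len` only reads the stored len field; the header type is an illustrative assumption with the len/free/buf fields described earlier.

```c
#include <stddef.h>

/* Hypothetical minimal header with the len/free/buf fields. */
struct sds_hdr {
    size_t len;
    size_t free;
    char buf[];
};

/* O(N): walk every byte until the terminating '\0'. */
size_t c_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}

/* O(1): the SDS already recorded its length, so just read it. */
size_t sds_len(const struct sds_hdr *sh) {
    return sh->len;
}
```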
Avoid buffer overflow
Besides the cost of computing the length, the fact that a C string does not record its own length has another consequence: it makes buffer overflows easy.
For example, the strcat function in <string.h> appends the contents of the src string to the end of the dest string:
char *strcat(char *dest, const char *src);
Because strcat assumes the caller has already allocated enough memory for dest, a buffer overflow occurs if dest does not have room for the entire src string plus the terminating null character.
📢 Note: if two strings s1 and s2 are adjacent in memory and not enough space is allocated before s1 is modified, the write may overflow into the memory occupied by s2, corrupting s2.
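The check that strcat leaves to the caller can be written out explicitly. This is a hypothetical helper, `checked_cat`, not a standard function: it appends only if the result and its terminator fit in a buffer of `cap` bytes. Note that the caller still has to track `cap` separately, which is exactly the bookkeeping a plain C string does not carry with it.

```c
#include <string.h>

/* Hypothetical helper: append src to dest only if the whole result,
 * including the terminating '\0', fits in a buffer of cap bytes.
 * Returns 0 on success, -1 (leaving dest unchanged) if it would overflow. */
int checked_cat(char *dest, size_t cap, const char *src) {
    size_t dlen = strlen(dest);  /* O(N): a C string does not store its length */
    size_t slen = strlen(src);
    if (dlen + slen + 1 > cap) {
        return -1;               /* strcat has no such check and would overflow */
    }
    memcpy(dest + dlen, src, slen + 1);  /* copy src together with its '\0' */
    return 0;
}
```

With `char buf[8] = "Redis";`, appending `" Cluster"` is refused, while appending `"!!"` fits exactly.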
SDS is different: its space allocation strategy completely eliminates the possibility of buffer overflow. Before modifying an SDS, the API first checks whether the SDS has enough space for the modification; if not, it automatically expands the SDS to the required size and then performs the modification.
Example:
Suppose s is the SDS holding "Redis" shown above, and we execute (sdscat may move the string in memory, so its return value is assigned back to s):
s = sdscat(s, " Cluster");
Before concatenating, sdscat checks whether s has enough space. Finding that it cannot hold " Cluster", it first expands s and then performs the concatenation, as shown in the following figure.
📢 Note: besides performing the concatenation, sdscat also allocates 13 bytes of unused space, the same as the new length of "Redis Cluster". The SDS space allocation strategy, described next, explains why.
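The check-then-expand behaviour can be sketched as follows. This is a hypothetical, simplified version: `sds_new`, `sds_cat`, and `SKETCH_MAX_PREALLOC` are illustrative names, not Redis's code. It grows the buffer only when free is too small, applies the pre-allocation rule described in the next section, and returns the possibly moved pointer, which the caller must use.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_PREALLOC ((size_t)1024 * 1024)  /* the 1MB threshold */

/* Hypothetical header with the len/free/buf fields. */
struct sds_hdr {
    size_t len;
    size_t free;
    char buf[];
};

struct sds_hdr *sds_new(const char *init) {
    size_t len = strlen(init);
    struct sds_hdr *sh = malloc(sizeof *sh + len + 1);
    if (sh == NULL) return NULL;
    sh->len = len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);  /* copy the '\0' too */
    return sh;
}

/* Append t, expanding first if there is not enough free space.
 * Returns the (possibly moved) SDS, or NULL if realloc fails,
 * in which case the original sh is still valid. */
struct sds_hdr *sds_cat(struct sds_hdr *sh, const char *t) {
    size_t addlen = strlen(t);
    if (sh->free < addlen) {
        size_t newlen = sh->len + addlen;
        /* cap = len + free after growth, excluding the '\0' byte */
        size_t cap = newlen < SKETCH_MAX_PREALLOC ? newlen * 2
                                                  : newlen + SKETCH_MAX_PREALLOC;
        struct sds_hdr *grown = realloc(sh, sizeof *grown + cap + 1);
        if (grown == NULL) return NULL;
        sh = grown;
        sh->free = cap - sh->len;
    }
    memcpy(sh->buf + sh->len, t, addlen + 1);  /* includes t's '\0' */
    sh->len += addlen;
    sh->free -= addlen;
    return sh;
}
```

Starting from "Redis" with free 0, `s = sds_cat(s, " Cluster")` leaves len at 13 and free at 13, the same numbers as in the note above.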
Reduce the number of memory reallocations when modifying strings
As mentioned above, a C string of length N is always stored in an array of N+1 characters. Because of this fixed relationship, every time a C string grows or shrinks, the program must reallocate the array that holds it.
- For a growing operation such as append, the program must first reallocate memory to expand the underlying array. Forgetting this step causes a buffer overflow.
- For a shrinking operation such as trim, the program must reallocate memory afterward to release the space that is no longer used. Forgetting this step causes a memory leak.
To avoid this defect of C strings, SDS uses unused space to decouple the length of the string from the length of the underlying array: in an SDS, the length of the buf array is not necessarily the number of characters plus one, and the array may also contain unused bytes, whose count is recorded in free.
Through unused space, SDS implements two optimization strategies: space pre-allocation and lazy space release.
1. Space pre-allocation
As the name implies, when an SDS is expanded, the API allocates not only the space the modification requires but also additional unused space.
The amount of unused space is determined as follows:
- If, after the modification, the SDS length len is less than 1MB, the program allocates as much unused space as len, i.e. free = len. For example, if len becomes 13 bytes, free is also 13 bytes, and the buf array is 13 + 13 + 1 = 27 bytes long.
- If, after the modification, len is greater than or equal to 1MB, the program allocates 1MB of unused space. For example, if len becomes 30MB, free is 1MB, and the buf array is 30MB + 1MB + 1 byte long.
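The two cases reduce to a small calculation. In this minimal sketch, `prealloc_free` and `buf_size` are hypothetical helper names, and 1MB is taken as 1024 × 1024 bytes; it reproduces the 13-byte and 30MB examples above.

```c
#include <stddef.h>

#define ONE_MB ((size_t)1024 * 1024)

/* Unused space granted when a modification leaves the SDS with length newlen. */
size_t prealloc_free(size_t newlen) {
    return newlen < ONE_MB ? newlen : ONE_MB;
}

/* Total length of the buf array: len + free + 1 byte for the '\0'. */
size_t buf_size(size_t newlen) {
    return newlen + prealloc_free(newlen) + 1;
}
```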
With this pre-allocation strategy, SDS reduces the number of memory reallocations needed to grow a string N times in a row from exactly N to at most N.
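To see the effect, this sketch simulates one-byte appends to an empty string and counts how often the pre-allocation rule would force a reallocation; no memory is actually allocated, and `reallocs_prealloc` is a hypothetical name. A C string that is always exactly len + 1 bytes would need one reallocation per append, i.e. N for N appends.

```c
#include <stddef.h>

#define ONE_MB ((size_t)1024 * 1024)

/* Simulate n one-byte appends to an empty SDS and count reallocations. */
size_t reallocs_prealloc(size_t n) {
    size_t len = 0, free_bytes = 0, count = 0;
    for (size_t i = 0; i < n; i++) {
        if (free_bytes == 0) {
            count++;                                   /* out of room: reallocate */
            len += 1;
            free_bytes = len < ONE_MB ? len : ONE_MB;  /* pre-allocate */
        } else {
            len += 1;                                  /* fits in existing free space */
            free_bytes -= 1;
        }
    }
    return count;
}
```

For 100 appends, reallocations happen only when the length reaches 1, 3, 7, 15, 31, and 63, so `reallocs_prealloc(100)` is 6 instead of 100.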
2. Lazy space release
When an SDS is shortened, it does not immediately reallocate memory to reclaim the excess bytes; instead, it records them in free. If the string grows again later, no expansion may be needed. SDS also provides an API to actually release the unused space of an SDS when needed.
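Both halves of this strategy can be sketched briefly. In this hypothetical sketch, `sds_shorten` only updates len and free, leaving the allocation alone, while `sds_release_free` shrinks the allocation on request (Redis's own function for the latter job is sdsRemoveFreeSpace; the code here is not taken from it).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header with the len/free/buf fields. */
struct sds_hdr {
    size_t len;
    size_t free;
    char buf[];
};

struct sds_hdr *sds_new(const char *init) {
    size_t len = strlen(init);
    struct sds_hdr *sh = malloc(sizeof *sh + len + 1);
    if (sh == NULL) return NULL;
    sh->len = len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh;
}

/* Lazy release: keep the first newlen bytes and only record the freed bytes. */
void sds_shorten(struct sds_hdr *sh, size_t newlen) {
    if (newlen >= sh->len) return;
    sh->free += sh->len - newlen;  /* no realloc; the space stays allocated */
    sh->len = newlen;
    sh->buf[newlen] = '\0';
}

/* Explicit release: shrink the allocation so that free becomes 0.
 * If realloc fails, the original block is returned unchanged. */
struct sds_hdr *sds_release_free(struct sds_hdr *sh) {
    struct sds_hdr *shrunk = realloc(sh, sizeof *shrunk + sh->len + 1);
    if (shrunk == NULL) return sh;
    shrunk->free = 0;
    return shrunk;
}
```

Shortening "Redis Cluster" to its first 5 bytes leaves len 5 and free 8; only the explicit call gives the 8 bytes back.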
Binary safety
The end of a C string is marked by a null character, so a C string cannot contain a null character anywhere else: the first one would be taken as the end of the string. C strings must also conform to some encoding (such as ASCII). Together, these restrictions mean C strings can hold only text data, not binary data such as images or videos.
The SDS API is binary safe: it does not filter or interpret the data in any way, so data is read back exactly as it was written. An SDS may contain null characters, because SDS uses len, not a null character, to determine where the string ends.
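The difference shows up as soon as the data contains a '\0'. In this minimal sketch (the header and the hypothetical `sds_new_len` are illustrative, not Redis's API), the SDS records all three bytes of {'a', '\0', 'b'}, while `strlen` on the same buffer stops after the first byte.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header with the len/free/buf fields. */
struct sds_hdr {
    size_t len;
    size_t free;
    char buf[];
};

/* Binary-safe constructor: copies exactly len bytes, embedded '\0' included.
 * The trailing '\0' is still added, but len decides where the data ends. */
struct sds_hdr *sds_new_len(const void *data, size_t len) {
    struct sds_hdr *sh = malloc(sizeof *sh + len + 1);
    if (sh == NULL) return NULL;
    sh->len = len;
    sh->free = 0;
    memcpy(sh->buf, data, len);
    sh->buf[len] = '\0';
    return sh;
}
```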
Why, then, does an SDS still end with a null character? So that an SDS holding text can reuse some functions from <string.h>, avoiding unnecessary code duplication.
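Because buf always ends in '\0', a text SDS can be passed straight to standard functions such as `strcmp` and `strstr`. This sketch uses hypothetical wrapper names; the wrappers are only correct when buf contains no embedded null characters, which is why only some <string.h> functions are safe to reuse.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header with the len/free/buf fields. */
struct sds_hdr {
    size_t len;
    size_t free;
    char buf[];
};

struct sds_hdr *sds_new(const char *init) {
    size_t len = strlen(init);
    struct sds_hdr *sh = malloc(sizeof *sh + len + 1);
    if (sh == NULL) return NULL;
    sh->len = len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);  /* keeps the trailing '\0' */
    return sh;
}

/* Reuse <string.h> directly on buf (valid only for text without '\0'). */
int sds_equals_cstr(const struct sds_hdr *sh, const char *t) {
    return strcmp(sh->buf, t) == 0;
}

const char *sds_find(const struct sds_hdr *sh, const char *needle) {
    return strstr(sh->buf, needle);
}
```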
Summary
C string | SDS |
---|---|
Getting string length is O(N) | Getting string length is O(1) |
API is unsafe and may cause buffer overflows | API is safe and does not cause buffer overflows |
Changing a string's length N times always requires N memory reallocations | Changing a string's length N times requires at most N memory reallocations |
Can save only text data | Can save both text and binary data |
All functions in the <string.h> library can be used | Some functions in the <string.h> library can be used |