A dictionary, also known as a symbol table, associative array, or map, is an abstract data structure used to hold key-value pairs. In a dictionary, a key is associated with a value, and each such association is called a key-value pair. Every key in a dictionary is unique, so we can find or update the corresponding value by looking up its key.

Many high-level languages have built-in support for the dictionary data structure, such as Map in Java. C has no built-in dictionary data structure, so Redis builds its own dictionary implementation.

Application

Dictionaries are widely used in Redis. The Redis database itself uses a dictionary as its underlying implementation, and the operations for adding, deleting, updating, and querying data are also built on the dictionary.

When executing the command:

 set msg "hello"

This creates a key-value pair in the database with the key msg and the value hello. This key-value pair is stored in a dictionary. The same applies when creating key-value pairs of other data types, such as list, hash, set, and sorted set.

Besides representing the key-value pairs in the database, the dictionary is also one of the underlying implementations of the hash data type. For example, take a hash key named website that contains 100 key-value pairs, where each key is a site name and each value is the site address:

 redis> HGETALL website
1)"Redis"
2)"Redis.io"
3)"nacos"
4)"nacos.io"
.....

The underlying implementation of the website key is a dictionary that holds 100 key-value pairs, for example:

  • The key in the key-value pair is "Redis" and the value is "Redis.io" .
  • The key in the key-value pair is "nacos" and the value is "nacos.io" .

Implementation of the dictionary

The Redis dictionary uses a hash table as its underlying implementation. A hash table contains multiple hash table nodes, and each hash table node stores one key-value pair of the dictionary.

Hash table

The hash table used by the Redis dictionary is represented by the dict.h/dictht structure:

 /* This is our hash table structure. Every dictionary has two of this as we
  * implement incremental rehashing, for the old to the new table. */
 typedef struct dictht {
     // hash table array
     dictEntry **table;
     // size of the hash table
     unsigned long size;
     // always equal to size-1, used to compute the index value
     unsigned long sizemask;
     // number of key-value pairs currently stored
     unsigned long used;
 } dictht;

Note: as the comment in the source says, every dictionary holds two of these hash tables, because Redis implements incremental rehashing from the old hash table to the new one.

  • The table attribute is an array. Each element of the array points to a hash table node (dictEntry), and each dictEntry structure holds one key-value pair.
  • The size attribute records the size of the hash table, that is, the size of the table array.
  • The used attribute records the number of key-value pairs the hash table currently holds.
  • The sizemask attribute is always equal to size-1. Together with the hash value, it determines the index of the table array at which a key is placed.

The following figure shows an empty hash table of size 4 (it contains no key-value pairs):

image.png
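To make the field relationships concrete, here is a minimal sketch of setting up an empty table like the one in the figure. The helper ht_init is hypothetical (it is not a Redis function), and dictEntry is left as an incomplete type because only pointers to it are needed here.

```c
#include <stdlib.h>

/* Only pointers to dictEntry are stored, so the node layout is not needed. */
typedef struct dictEntry dictEntry;

/* Simplified copy of the dictht structure shown in the article. */
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

/* Hypothetical helper (not part of Redis): initialize an empty hash table.
 * size must be a power of two so that size - 1 can serve as a bit mask.
 * Returns 0 on success, -1 on an invalid size or allocation failure. */
int ht_init(dictht *ht, unsigned long size) {
    if (size == 0 || (size & (size - 1)) != 0) return -1;
    dictEntry **buckets = malloc(size * sizeof(*buckets));
    if (buckets == NULL) return -1;
    for (unsigned long i = 0; i < size; i++) {
        buckets[i] = NULL;          /* every bucket starts out empty */
    }
    ht->table = buckets;
    ht->size = size;
    ht->sizemask = size - 1;        /* e.g. size 4 gives sizemask 3 */
    ht->used = 0;
    return 0;
}
```

With size 4, the sketch leaves the table with four NULL buckets, sizemask 3, and used 0. The caller is responsible for calling free on ht->table when the table is no longer needed.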

Hash table node

Hash table nodes are represented by dictEntry structures, and each dictEntry structure holds one key-value pair:

 typedef struct dictEntry {
     // key
     void *key;
     // value
     union {
         void *val;
         uint64_t u64;
         int64_t s64;
     } v;
     // points to the next hash table node, forming a linked list
     struct dictEntry *next;
 } dictEntry;

Where:

  • key holds the key.
  • v holds the value, which can be a pointer, a uint64_t integer, or an int64_t integer.
  • next is a pointer to another hash table node. It links together key-value pairs whose keys land on the same index, which is how hash conflicts are resolved.

The figure below shows two hash table nodes, k0 and k1, whose keys hash to the same position and which are connected by the next pointer.

image.png
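The linking shown in the figure can be sketched in a few lines of C. The helpers bucket_push and bucket_length are hypothetical names introduced for illustration. Inserting at the head of the list is a choice made for this sketch because it takes constant time.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy of the dictEntry structure shown in the article. */
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct dictEntry *next;
} dictEntry;

/* Hypothetical helper: put node in front of the list that starts at head
 * and return the new head of the list. */
dictEntry *bucket_push(dictEntry *head, dictEntry *node) {
    node->next = head;
    return node;
}

/* Hypothetical helper: count the nodes chained in one bucket. */
size_t bucket_length(const dictEntry *head) {
    size_t n = 0;
    while (head != NULL) {
        n++;
        head = head->next;
    }
    return n;
}
```

Pushing k0 and then k1 onto an empty bucket gives a list whose head is k1, with k1.next pointing at k0 and k0.next equal to NULL.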

Dictionary

The dictionary in Redis is represented by the dict.h/dict structure:

 typedef struct dict {
     // type-specific functions
     dictType *type;
     // private data passed to the type-specific functions
     void *privdata;
     // hash tables
     dictht ht[2];
     // rehash index
     int rehashidx; /* rehashing not in progress if rehashidx == -1 */
     int iterators; /* number of iterators currently running */
 } dict;
  • The type and privdata attributes exist to support different types of key-value pairs, so that polymorphic dictionaries can be created.
  • type is a pointer to a dictType structure. dictType contains a group of functions that operate on a specific type of key-value pair, and Redis sets different functions for dictionaries with different purposes. The dictType structure is defined as follows:

     typedef struct dictType {
         // compute the hash value
         unsigned int (*hashFunction)(const void *key);
         // copy a key
         void *(*keyDup)(void *privdata, const void *key);
         // copy a value
         void *(*valDup)(void *privdata, const void *obj);
         // compare two keys
         int (*keyCompare)(void *privdata, const void *key1, const void *key2);
         // destroy a key
         void (*keyDestructor)(void *privdata, void *key);
         // destroy a value
         void (*valDestructor)(void *privdata, void *obj);
     } dictType;
  • The privdata attribute holds optional parameters that are passed to the type-specific functions.
  • ht is an array of two elements, each of which is a dictht hash table. Normally the dictionary only uses the ht[0] hash table. ht[1] is used only while ht[0] is being rehashed.
  • rehashidx records the progress of the rehash. If no rehash is currently in progress, its value is -1.

The following figure shows a dictionary in a normal state (without rehash):

image.png
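As a rough illustration of the normal state in the figure, the sketch below creates a dictionary whose two hash tables are empty and whose rehashidx is -1. The names dict_create, str_hash, str_compare, and string_dict_type are hypothetical. The hash function is a simple djb2-style string hash used as a stand-in, and the sketch assumes that keyCompare returns nonzero when two keys are equal.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct dictEntry {
    void *key;
    union { void *val; uint64_t u64; int64_t s64; } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx;
    int iterators;
} dict;

/* Stand-in hash for C-string keys (djb2 style), chosen for this sketch only. */
unsigned int str_hash(const void *key) {
    const unsigned char *p = key;
    unsigned int h = 5381;
    while (*p) h = h * 33 + *p++;
    return h;
}

/* Assumption of this sketch: nonzero means the two keys are equal. */
int str_compare(void *privdata, const void *key1, const void *key2) {
    (void)privdata;                       /* unused in this sketch */
    return strcmp(key1, key2) == 0;
}

/* A dictType for C-string keys; the unused function slots stay NULL. */
dictType string_dict_type = { str_hash, NULL, NULL, str_compare, NULL, NULL };

/* Hypothetical helper: allocate a dictionary in the normal (non-rehash) state. */
dict *dict_create(dictType *type, void *privdata) {
    dict *d = malloc(sizeof(*d));
    if (d == NULL) return NULL;
    d->type = type;
    d->privdata = privdata;
    for (int i = 0; i < 2; i++) {         /* both tables start empty */
        d->ht[i].table = NULL;
        d->ht[i].size = 0;
        d->ht[i].sizemask = 0;
        d->ht[i].used = 0;
    }
    d->rehashidx = -1;                    /* -1 means no rehash in progress */
    d->iterators = 0;
    return d;
}
```

Swapping in a different dictType is what lets the same dict structure serve different kinds of keys and values.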

Hash algorithm

When adding a new key-value pair to the dictionary, the program first calculates the hash value and the index value from the key, and then places the hash table node containing the new key-value pair at that index of the hash table's table array.

Redis calculates the hash value and the index value as follows:

  1. Calculate the hash value of the key using the hash function set for the dictionary.

    hash = dict->type->hashFunction(key)
  2. Combine the sizemask attribute of the hash table with the hash value using a bitwise AND, which here acts as a remainder operation, to get the index value. Here ht[x] is either ht[0] or ht[1].

    index = hash & dict->ht[x].sizemask

Readers who understand the internals of Java's HashMap will notice that computing the index value this way is similar to how HashMap finds the array subscript.
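The two steps can be sketched as a single helper. dict_key_index and identity_hash are hypothetical names. The toy hash simply treats the key as a pointer to an unsigned int and returns that integer, which makes the result easy to follow by hand. dictType is trimmed to the one function this sketch needs.

```c
#include <stddef.h>

typedef struct dictEntry dictEntry;   /* node layout not needed here */

/* Trimmed for this sketch: only the hash function slot is kept. */
typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
} dictType;

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx;
    int iterators;
} dict;

/* Toy hash for illustration only: the key points to an unsigned int,
 * and the hash value is that integer itself. */
unsigned int identity_hash(const void *key) {
    return *(const unsigned int *)key;
}

/* Hypothetical helper: index of key in hash table d->ht[x]. */
unsigned long dict_key_index(const dict *d, int x, const void *key) {
    unsigned int hash = d->type->hashFunction(key);   /* step 1 */
    return hash & d->ht[x].sizemask;                  /* step 2 */
}
```

With a table of size 4 (sizemask 3), a key whose hash value is 8 lands at index 0, and a key whose hash value is 7 lands at index 3.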

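How does & act as a remainder operation?

A remainder is what is left over after dividing one number by another. Because sizemask is equal to size-1 and the table size is a power of two, hash & sizemask gives the same result as hash % size. For example, an array of length 4 has the index range 0~3, and placing the values 0, 1, and 7 into it gives the indexes 0, 1, and 3, as shown in the following figure: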

image.png
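A small sketch can confirm that masking and taking the remainder agree when the size is a power of two. The function names index_by_mod and index_by_mask are made up for this illustration.

```c
#include <assert.h>

/* Index by ordinary remainder; works for any nonzero size. */
unsigned long index_by_mod(unsigned long hash, unsigned long size) {
    return hash % size;
}

/* Index by bit mask; only valid when size is a power of two,
 * which the assert below checks. */
unsigned long index_by_mask(unsigned long hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return hash & (size - 1);
}
```

For size 4 the two functions agree: the values 0, 1, and 7 map to the indexes 0, 1, and 3 either way. The power-of-two requirement matters. With a size of 6, for instance, 7 % 6 is 1 but 7 & 5 is 5, so the mask would no longer behave like a remainder.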

For example, to add a key-value pair k0 and v0 to the empty dictionary table below:

image.png

First compute the hash of the key:

 hash = dict->type->hashFunction(key)

This calculates the hash value of the key k0.
Assuming the calculated hash value is 8, then calculate the index value:

 index = hash & dict->ht[0].sizemask = 8 & 3 = 0;

The calculated index value of the key k0 is 0, which means the node containing the key-value pair k0 and v0 is placed at index 0 of the hash table array, as shown in the following figure:

image.png

Key conflict

A key conflict (collision) happens when two or more keys are assigned to the same index of the hash table array.

Redis' hash table uses separate chaining to resolve key conflicts. Each hash table node has a next pointer, and the nodes assigned to the same array index are connected through their next pointers into a singly linked list, which solves the key conflict problem.

For example, suppose the program wants to add the key-value pair k2 and v2 to the hash table below, and the calculated index value of k2 is 2. The keys k1 and k2 then conflict:

image.png

The solution to the conflict is to use the next pointer to connect the nodes where k2 and k1 are located, as shown in the following figure:

image.png
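The conflict handling can be sketched as an insert routine that chains nodes in the same bucket. ht_add is a hypothetical helper, not the Redis function. It takes an already computed hash value, places the new node at the head of the bucket (a choice made for this sketch), and does not check for duplicate keys.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct dictEntry {
    void *key;
    union { void *val; uint64_t u64; int64_t s64; } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

/* Hypothetical helper: add a key-value pair given its hash value.
 * The new node becomes the head of its bucket's linked list, so any
 * nodes already in that bucket stay reachable through next.
 * Returns the new node, or NULL if allocation fails. */
dictEntry *ht_add(dictht *ht, unsigned int hash, void *key, void *val) {
    dictEntry *entry = malloc(sizeof(*entry));
    if (entry == NULL) return NULL;
    unsigned long index = hash & ht->sizemask;
    entry->key = key;
    entry->v.val = val;
    entry->next = ht->table[index];   /* chain to the existing nodes, if any */
    ht->table[index] = entry;
    ht->used++;
    return entry;
}
```

If k1 is added with a hash value of 2 and k2 with a hash value of 6 into a table of size 4, both land at index 2, and the bucket then holds k2 followed by k1.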

Summary

  • A dictionary is a data structure that maps keys to values. The keys are unique, and the corresponding value can be found quickly through its key.
  • The dictionary is widely used in the Redis database.

    • The key-value pairs of all data types in the database use a dictionary as the underlying implementation.
    • Key-value pairs of the hash type are also implemented on top of the dictionary.
  • The structure of the dictionary

    • The dictionary (dict) contains the type-specific functions dictType and two hash tables ht[2]. Normally only ht[0] is used. ht[1] is used when the table is resized.
    • The hash table dictht contains the table array. size records the size of the array. sizemask is equal to size-1 and, together with the hash value, determines where data is stored in table. used records the number of existing key-value pairs.
    • The hash table node dictEntry holds one key-value pair: key holds the key and v holds the value, which can be a pointer, a uint64_t integer, or an int64_t integer. next resolves key hash conflicts: when two keys map to the same index of the array, their nodes are linked together with next pointers.
  • To add a new key-value pair, first call dict->type->hashFunction(key) to calculate the hash value of the key, then combine it with the sizemask of dictht (a bitwise AND that acts as a remainder operation) to get the index of the table array at which the node is stored. If a key conflict occurs, separate chaining is used: the hash nodes are formed into a singly linked list through next pointers.
  • The Redis dictionary implementation and the Java HashMap data structure are similar in the following ways:

    • Determining the index position: the key is first run through the hash algorithm to get a hash value, which is then combined with the array length minus 1 (a bitwise AND that acts as a remainder operation) to determine the subscript in the storage array.
    • Resolving hash conflicts: when two keys produce the same index, separate chaining connects the nodes together through the next pointer.

References

Redis Design and Implementation


小码code