Teach you how to search a HASH table

Abstract: Based on a chosen hash function H(key) and a chosen method of handling conflicts, a set of keywords is mapped onto a limited, continuous address set (interval), and the "image" of each keyword in that address set is used as the storage location of the corresponding record in the table. A lookup table constructed in this way is called a "hash table".

This article is shared from the Huawei Cloud Community post "Search-HASH"; original author: ruochen.

For a frequently used lookup table, we would like the average search length (ASL) to be 0.
This requires a definite relationship between the position of a record in the table and its keyword.

HASH

definition

Based on a chosen hash function H(key) and a chosen method of handling conflicts, a set of keywords is mapped onto a limited, continuous address set (interval), and the "image" of each keyword in the address set is used as the storage location of the corresponding record in the table. A lookup table constructed in this way is called a "hash table"

The construction of HASH function

Construction principle
- The function itself is easy to calculate
- The calculated addresses are evenly distributed, that is, for any keyword k, f(k) is equally likely to be any of the addresses; the purpose is to minimize conflicts

Direct addressing

The hash function is a linear function of the keyword
- H(key) = key
- H(key) = a * key + b

This method is only suitable when:
the size of the address set equals the size of the keyword set
Advantages: the value of a linear function of the keyword is used as the hash address, so no conflicts occur
Disadvantages: it occupies a continuous address space, so space efficiency is low

Digit analysis

Assume that each keyword in the keyword set is composed of s digits (u1, u2, …, us). Analyze the entire keyword set and extract several digit positions whose values are evenly distributed, or a combination of them, as the address
This method is only suitable when:
the frequency with which each digit value appears in each position of all keywords can be estimated in advance
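
The selection of "evenly distributed" positions can be sketched as follows. This is only an illustration under stated assumptions: keys are equal-length decimal strings, unevenness of a position is measured as the sum of squared digit counts, and the function names `pick_even_positions` and `digit_analysis_hash` are hypothetical, not taken from the original article.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Pick the `take` digit positions whose digits are spread most evenly over
// the key set. Unevenness of a position = sum of squared digit counts
// (smaller is closer to uniform, because the counts always sum to keys.size()).
std::vector<std::size_t> pick_even_positions(const std::vector<std::string>& keys,
                                             std::size_t take) {
    assert(!keys.empty());
    const std::size_t s = keys.front().size();
    assert(take <= s);
    std::vector<long> score(s, 0);
    for (std::size_t pos = 0; pos < s; ++pos) {
        std::array<long, 10> count{};  // zero-initialized histogram of digits 0..9
        for (const std::string& k : keys) {
            assert(k.size() == s && k[pos] >= '0' && k[pos] <= '9');
            ++count[k[pos] - '0'];
        }
        for (long c : count) score[pos] += c * c;
    }
    std::vector<std::size_t> order(s);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return score[a] < score[b]; });
    order.resize(take);
    std::sort(order.begin(), order.end());  // keep the original digit order
    return order;
}

// Build the address from the digits found at the chosen positions.
int digit_analysis_hash(const std::string& key,
                        const std::vector<std::size_t>& positions) {
    int address = 0;
    for (std::size_t pos : positions) address = address * 10 + (key[pos] - '0');
    return address;
}
```

For the keys "110", "121", "132", "143" the first position always holds the digit 1, so the sketch selects the second and third positions, and "132" is mapped to address 32.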

Mid-square method

Use the middle digits of the square of the keyword as the storage address. The purpose of squaring the keyword is to "enlarge the difference" between keywords, and at the same time the middle digits of the square are affected by every digit of the keyword
This method is suitable when:
in every digit position of the keywords, certain digits repeat frequently
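
A minimal sketch of the idea, assuming decimal keys and that the address is cut from the decimal digits of the square. The function name `mid_square_hash` and the rule for which middle digits to keep when the count is uneven are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Mid-square hash for decimal keys: square the key, then keep `digits`
// decimal digits from the middle of the square.
// Assumption: key < 2^32, so that key * key fits in 64 bits.
std::uint64_t mid_square_hash(std::uint64_t key, int digits) {
    assert(digits > 0 && digits <= 18 && key <= 0xFFFFFFFFull);
    std::uint64_t square = key * key;
    int total = 1;  // number of decimal digits in the square
    for (std::uint64_t t = square; t >= 10; t /= 10) ++total;
    if (total <= digits) return square;   // nothing to trim
    int drop_low = (total - digits) / 2;  // digits discarded on the right
    for (int i = 0; i < drop_low; ++i) square /= 10;
    std::uint64_t mod = 1;
    for (int i = 0; i < digits; ++i) mod *= 10;
    return square % mod;                  // discard the remaining high digits
}
```

For example, 1234 squared is 1522756, and keeping the three middle digits gives the address 227.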

Folding method

Divide the keyword into several parts, and then take the sum of the parts as the hash address. There are two ways of adding the parts: shift folding (shift superposition) and boundary folding (boundary superposition)
This method is suitable when:
the keyword has a very large number of digits
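
Both kinds of superposition can be sketched in one function. The following is an illustration under assumptions: the key is a string of decimal digits, it is cut into parts of `width` digits starting from the right-hand end, and any carry beyond `width` digits is dropped. The name `folding_hash` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Cut a decimal key into `width`-digit parts, starting from the right-hand
// end, and add the parts. With boundary == true every second part is
// reversed first (folding back and forth along the part boundaries).
// Any carry beyond `width` digits is discarded.
int folding_hash(const std::string& key, int width, bool boundary) {
    assert(width > 0 && width <= 8);
    int modulus = 1;
    for (int i = 0; i < width; ++i) modulus *= 10;
    long long sum = 0;
    int part_index = 0;
    for (int end = static_cast<int>(key.size()); end > 0; end -= width, ++part_index) {
        int begin = std::max(0, end - width);
        std::string part = key.substr(begin, end - begin);
        assert(std::all_of(part.begin(), part.end(),
                           [](char ch) { return ch >= '0' && ch <= '9'; }));
        if (boundary && part_index % 2 == 1) std::reverse(part.begin(), part.end());
        sum += std::stoll(part);
    }
    return static_cast<int>(sum % modulus);
}
```

With the ten-digit key "0442205864" and four-digit parts, shift folding adds 5864 + 4220 + 04 = 10088 and yields 88 (that is, 0088), while boundary folding adds 5864 + 0224 + 04 and yields 6092.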

Division-remainder method

Hash(key) = key mod p (p is an integer)
p ≤ m (table length)
- p should be the largest prime number less than or equal to m
- Why impose this restriction on p?

Given the set of keywords 12, 39, 18, 24, 33, 21: if p = 9, the corresponding hash function values are:
3, 3, 0, 6, 6, 3

It can be seen that if p contains the prime factor 3, all keywords containing the prime factor 3 are mapped to addresses that are multiples of 3, which increases the possibility of "conflict"
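
The effect of the choice of p can be checked with a short sketch. The helper `distinct_addresses` is a hypothetical name introduced only to count how many different addresses a key set occupies, and p = 11 is an illustrative prime that assumes a table length of at least 11.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Division-remainder hash: H(key) = key mod p.
int division_hash(int key, int p) {
    assert(p > 0 && key >= 0);
    return key % p;
}

// Count how many different addresses a key set occupies for a given p.
int distinct_addresses(const std::vector<int>& keys, int p) {
    std::set<int> used;
    for (int key : keys) used.insert(division_hash(key, p));
    return static_cast<int>(used.size());
}
```

For the keys 12, 39, 18, 24, 33, 21, p = 9 uses only 3 different addresses (0, 3, 6), while the prime p = 11 gives the six different addresses 1, 6, 7, 2, 0, 10.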

Random number method

H(key) = Random(key) (Random is a pseudo-random function)
This method is usually used to construct a hash function when the keywords are of unequal length

Considerations

Execution speed (that is, the time required to calculate the hash function)
Keyword length
The size of the hash table
Keyword distribution
Lookup frequency

The method of constructing the hash function depends on the keyword set of the table.
The overall principle is to make the possibility of conflict as small as possible

Ways to deal with conflicts

1. Open addressing method

Basic idea

When there is a conflict, look for the next empty hash address. As long as the hash table is large enough, an empty hash address can always be found, and the data element is stored there

Linear detection method (linear probing)

Hi = (Hash(key) + di) mod m (1 ≤ i < m)
where m is the length of the hash table and
di is the increment sequence 1, 2, …, m-1, that is, di = i
Once there is a conflict, probe the following addresses in order and store the element in the first empty one

Advantages: as long as the hash table is not full, an empty address unit is guaranteed to be found
Disadvantages: a synonym of the i-th hash address may be stored at address i+1, so an element whose own hash address is i+1 then has to compete for address i+2 as if it were a synonym of it, and so on. This "gathering" (clustering) of records reduces search efficiency
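
Insertion with linear probing can be sketched as follows. The hash function H(key) = key mod m, the marker `kEmpty`, and the function name `linear_probe_insert` are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <vector>

const int kEmpty = -1;  // assumed marker for a free slot; keys must be non-negative

// Insert `key` with linear probing: Hi = (H(key) + i) mod m, i = 1, 2, ..., m-1.
// H(key) = key mod m is an assumption for this sketch.
// Returns the slot used, or -1 if the table is full.
int linear_probe_insert(std::vector<int>& table, int key) {
    const int m = static_cast<int>(table.size());
    assert(m > 0 && key >= 0);
    const int home = key % m;
    for (int i = 0; i < m; ++i) {  // i == 0 is the home address itself
        int slot = (home + i) % m;
        if (table[slot] == kEmpty || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;  // every address was probed and none was free
}
```

In a table of length 7, inserting 7 and then 14 fills addresses 0 and 1. Key 1 is not a synonym of either, yet it finds its home address 1 occupied and ends up at address 2, which is the gathering effect described above.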

Quadratic detection method (quadratic probing)

di = 1², -1², 2², -2², …, ±k²
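
The increment sequence can be generated as in the sketch below, which assumes the same formula Hi = (Hash(key) + di) mod m as the linear case. The function names are hypothetical.

```cpp
#include <cassert>

// The i-th quadratic increment (i = 1, 2, 3, ...): 1, -1, 4, -4, 9, -9, ...
int quadratic_increment(int i) {
    assert(i >= 1);
    int k = (i + 1) / 2;
    return (i % 2 == 1) ? k * k : -(k * k);
}

// Address probed at step i, starting from home address `home`
// in a table of length m.
int quadratic_probe(int home, int i, int m) {
    assert(m > 0 && home >= 0 && home < m);
    int address = (home + quadratic_increment(i)) % m;
    return address < 0 ? address + m : address;  // % can be negative in C++
}
```

Starting from home address 3 in a table of length 11, the first four probes visit addresses 4, 2, 7 and 10.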

Pseudo-random detection (pseudo-random probing)

Hi = (Hash(key) + di) mod m (1 ≤ i < m)
where m is the length of the hash table and
di is a sequence of pseudo-random numbers
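
One possible way to obtain such a sequence is sketched below. It assumes that di is a pseudo-random permutation of 1..m-1 produced from a fixed seed, which is a design choice of this sketch and not something the original text specifies. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Build the increments d1..d(m-1) as a pseudo-random permutation of 1..m-1.
// A fixed seed matters: insertion and lookup must replay the same sequence.
std::vector<int> pseudo_random_increments(int m, unsigned seed) {
    assert(m > 1);
    std::vector<int> d(m - 1);
    std::iota(d.begin(), d.end(), 1);
    std::mt19937 gen(seed);
    std::shuffle(d.begin(), d.end(), gen);
    return d;
}

// Address probed at step i (1 <= i < m): Hi = (home + di) mod m.
int pseudo_random_probe(int home, const std::vector<int>& d, int i, int m) {
    assert(i >= 1 && i <= static_cast<int>(d.size()) && home >= 0 && home < m);
    return (home + d[i - 1]) % m;
}
```

Because the increments are a permutation of 1..m-1, the probes eventually visit every other address in the table exactly once.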

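Steps for building a hash table with the open addressing method

- Step 1: take the keyword key of the data element and calculate its hash function value (address). If the storage space at that address is not yet occupied, store the element there; otherwise go to step 2 to resolve the conflict
- Step 2: according to the chosen conflict handling method, calculate the next storage address for the keyword key. If that address is also occupied, repeat step 2 until a usable storage address is found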

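#### Storage structure of an open-addressing hash table
/* ------------- Storage structure of an open-addressing hash table ------------- */

int hashsize[] = {997, ...};
typedef struct{
    ElemType* elem;
    int count;  // current number of data elements
    int sizeindex;  // hashsize[sizeindex] is the current capacity
} HashTable;

#define SUCCESS 1
#define UNSUCCESS 0
#define DUPLICATE -1

Status SearchHash(HashTable H, KeyType K, int &p, int &c){
    // Search the open-addressing hash table H for the record whose key is K.
    // c counts the conflicts encountered; the caller should initialize it to 0.
    p = Hash(K);  // compute the hash address
    while(H.elem[p].key != NULLKEY && !EQ(K, H.elem[p].key))
        collision(p, ++c);  // compute the next probe address p
    if(EQ(K, H.elem[p].key)) return SUCCESS;  // found: p is the position of the record
    else return UNSUCCESS;  // not found: the probe stopped at an empty slot p
}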

2. Re-HASH method

H2(key) is a second, separately chosen hash function, and its function value should be relatively prime to m
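
The original text gives no probe formula for this method. The sketch below assumes the common double-hashing form Hi = (H(key) + i * H2(key)) mod m, with H(key) = key mod m, a prime table length m, and H2(key) = 1 + key mod (m-1). All of these choices and the function names are assumptions for illustration.

```cpp
#include <cassert>

int gcd_int(int a, int b) { return b == 0 ? a : gcd_int(b, a % b); }

// Assumed second hash function: values lie in 1..m-1, never 0.
int second_hash(int key, int m) { return 1 + key % (m - 1); }

// Assumed probe formula for this sketch: Hi = (H(key) + i * H2(key)) mod m.
int double_hash_probe(int key, int i, int m) {
    assert(m > 2 && key >= 0 && i >= 0);
    int step = second_hash(key, m);
    assert(gcd_int(step, m) == 1);  // relatively prime: every address is reachable
    return static_cast<int>((key % m + static_cast<long long>(i) * step) % m);
}
```

With m = 11 and key 25, the home address is 3 and the step is 6, so the probes visit 3, 9, 4, 10, and so on. If the step shared a factor with m, the probes would cycle through only part of the table, which is why the value of H2(key) should be relatively prime to m.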

3. Chain address method

Basic idea

Records with the same hash address are chained into a singly linked list, so m hash addresses give m singly linked lists. An array stores the head pointers of the m singly linked lists, which makes the table a dynamic structure

Advantages:

Non-synonyms never conflict, so there is no "gathering" phenomenon
Node space on the linked lists is allocated dynamically, which suits situations where the length of the table is uncertain
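
A minimal sketch of this structure, assuming integer keys, H(key) = key mod m, and `std::forward_list` as the singly linked list. The type name `ChainedHashTable` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Chained hash table: an array of m list heads, one singly linked list
// per hash address. H(key) = key mod m is an assumption for this sketch.
struct ChainedHashTable {
    std::vector<std::forward_list<int>> heads;

    explicit ChainedHashTable(std::size_t m) : heads(m) { assert(m > 0); }

    std::size_t address(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % heads.size();
    }
    bool contains(int key) const {
        for (int k : heads[address(key)])
            if (k == key) return true;
        return false;
    }
    bool insert(int key) {  // returns false if the key is already stored
        if (contains(key)) return false;
        heads[address(key)].push_front(key);  // node is allocated on demand
        return true;
    }
    std::size_t chain_length(std::size_t addr) const {
        std::size_t n = 0;
        for (int k : heads[addr]) { (void)k; ++n; }
        return n;
    }
};
```

With m = 7, the synonyms 7 and 14 share the list at address 0, while key 1 sits alone in the list at address 1, so it is not displaced by records that hash elsewhere.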

Hash table lookup

For a given value K, calculate the hash address i = H(K)

If r[i] = NULL, the search is unsuccessful
If r[i].key = K, the search is successful; otherwise compute "the next address Hi" and repeat until r[Hi] = NULL (the search is unsuccessful) or r[Hi].key = K (the search is successful)

Case v01

Linear detection method to resolve conflicts

Case v02

Chain address method to handle conflicts

Analysis of hash table lookup

From the search process, it is clear that the average search length (ASL) of a hash table lookup is in fact not zero, because conflicts force extra comparisons

Factors that determine the ASL of a hash table lookup

The hash function selected
The method chosen to handle conflicts
How full the hash table is, measured by the load factor α = n/m (n is the number of records, m is the length of the table)

The larger α is, the more records the table holds and the fuller it is, so conflicts become more likely and more comparisons are needed when searching
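
The influence of α can be observed with a small experiment. The sketch below assumes H(key) = key mod m and linear probing, and computes the ASL of successful searches by counting the comparisons each key needs when it is stored. The function name `asl_linear_probing` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Average search length of successful searches when `keys` are stored in a
// table of length m with H(key) = key mod m and linear probing (both are
// assumptions for illustration). Requires distinct non-negative keys and
// keys.size() <= m.
double asl_linear_probing(const std::vector<int>& keys, int m) {
    assert(m > 0 && !keys.empty() && static_cast<int>(keys.size()) <= m);
    std::vector<int> table(m, -1);  // -1 marks an empty slot
    long total_comparisons = 0;
    for (int key : keys) {
        assert(key >= 0);
        int slot = key % m;
        int comparisons = 1;
        while (table[slot] != -1) {  // occupied: probe the next address
            slot = (slot + 1) % m;
            ++comparisons;
        }
        table[slot] = key;
        total_comparisons += comparisons;  // a later lookup repeats these probes
    }
    return static_cast<double>(total_comparisons) / keys.size();
}
```

For the keys 12, 39, 18, 24, 33, 21, a table of length 11 (α = 6/11) stores every key at its home address, so ASL = 1. A table of length 9 (α = 6/9) needs 1 + 2 + 1 + 1 + 2 + 3 = 10 comparisons in total, so ASL = 10/6, about 1.67.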

Hash tables have very good average performance, better than some traditional techniques
The chain address method performs better than the open addressing method
The division-remainder method generally makes a better hash function than the other types of function

Hash table application example

Compilers mostly use hash tables to manage identifiers

Constructing a hash function:
- Convert each character in the identifier to a non-negative integer
- Combine the obtained integers into a single integer (the values of the first, middle and last characters can be added together, or the values of all characters can be added together)
- Adjust the resulting number to the range 0 to M-1; the modulo method can be used: Ki % M (M is a prime number)
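
The three steps above can be sketched as follows, using the variant that adds the values of all characters. The function name `identifier_hash` and the use of the character code as the "non-negative integer" are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hash an identifier: convert each character to a non-negative integer,
// add the values of all characters, then reduce the sum to 0..M-1
// with the modulo method.
int identifier_hash(const std::string& identifier, int M) {
    assert(M > 0);
    unsigned long sum = 0;
    for (char ch : identifier) sum += static_cast<unsigned char>(ch);
    return static_cast<int>(sum % static_cast<unsigned long>(M));
}
```

With M = 13 and ASCII character codes, "abc" sums to 294 and is mapped to address 8. Identifiers made of the same characters in a different order, such as "cba", get the same address, so a conflict handling method is still required.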
