
Preface

In the previous articles, we implemented Map on top of arrays, linked lists, binary trees, and red-black trees. In this article, we implement a Map based on a hash table. This approach corresponds to HashMap in Java, and it is also the most commonly used one.

A hash-table-based Map involves two steps:

  1. Use a hash function to convert the search key into an array index
  2. Handle collisions, where different keys map to the same index; the two common strategies are the zipper method (separate chaining) and linear probing

Hash function

The first step in implementing a hash table is deciding how to convert a key into an array index. This is the job of the hash function: first convert the key into an integer, then use the division (remainder) method to compute the array index. Each key type needs its own hash function.

In Java, hashCode is already implemented for the commonly used data types, so we only need to take the remainder of the hashCode value to obtain the array index.

Java's convention: if two objects are equal according to equals, their hashCode values must be the same; the converse does not hold, since two objects with the same hashCode are not necessarily equal. For custom key types we usually need to implement both hashCode and equals ourselves. The default hashCode is derived from the object's identity (often described as its memory address) rather than its contents, so two logically equal objects would get different hash values, which makes it a poor choice for such keys.
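As a minimal sketch of this convention, the class below defines equals and hashCode over the same fields. Point is a hypothetical key type introduced only for illustration; it does not come from the series.

```java
import java.util.Objects;

// Hypothetical key type used only for illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    // Built from exactly the fields that equals compares,
    // so equal points always produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

With this in place, new Point(1, 2) and a second new Point(1, 2) are equal and land in the same bucket, which is what a hash-based Map relies on.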

Let's look at how hashCode is computed for some common Java types.

Integer

Since hashCode must return an int, Integer's implementation is trivial: it returns the wrapped value directly.

@Override
public int hashCode() {
    return Integer.hashCode(value);
}
public static int hashCode(int value) {
    return value;
}

Long

For the Long type, hashCode first performs an unsigned right shift of the value by 32 bits, then XORs the result with the original value so that every bit contributes, and finally casts the result to int.

@Override
public int hashCode() {
    return Long.hashCode(value);
}

public static int hashCode(long value) {
    return (int)(value ^ (value >>> 32));
}
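To see why the XOR step matters, here is a small sketch comparing a plain cast with the folding used by Long.hashCode. The class and method names are made up for this illustration.

```java
final class LongHashDemo {
    // Naive variant: a plain cast keeps only the low 32 bits.
    static int truncatingHash(long value) {
        return (int) value;
    }

    // Same folding as Long.hashCode: mix the high half into the low half.
    static int foldingHash(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        long a = 1L << 32; // a and b differ only in the high 32 bits
        long b = 2L << 32;
        System.out.println(truncatingHash(a) + " " + truncatingHash(b)); // 0 0
        System.out.println(foldingHash(a) + " " + foldingHash(b));       // 1 2
    }
}
```

Keys that differ only in their upper half would all collide under the plain cast, while the folded version keeps them apart.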

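Double, Float

For floating-point keys, hashCode is based on the binary (IEEE 754) representation of the value. For Float, shown here, the 32-bit pattern is returned directly, with every NaN normalized to a single pattern. Double folds its 64-bit pattern into 32 bits the same way Long does.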

public static int hashCode(float value) {
    return floatToIntBits(value);
}
public static int floatToIntBits(float value) {
    int result = floatToRawIntBits(value);
    // Check for NaN based on values of bit fields, maximum
    // exponent and nonzero significand.
    if ( ((result & FloatConsts.EXP_BIT_MASK) ==
          FloatConsts.EXP_BIT_MASK) &&
         (result & FloatConsts.SIGNIF_BIT_MASK) != 0)
        result = 0x7fc00000;
    return result;
}
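A short sketch of what hashing by bit pattern implies in practice. FloatHashDemo and doubleHash are names invented for this example; doubleHash follows the formula documented for Double.hashCode.

```java
final class FloatHashDemo {
    // Follows the documented behaviour of Double.hashCode: fold the 64-bit pattern.
    static int doubleHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        // 0.0f and -0.0f compare equal with ==, but their bit patterns differ.
        System.out.println(0.0f == -0.0f);                                   // true
        System.out.println(Float.hashCode(0.0f) == Float.hashCode(-0.0f));   // false
        // Every NaN is collapsed to the canonical pattern 0x7fc00000.
        System.out.println(Integer.toHexString(Float.hashCode(Float.NaN))); // 7fc00000
        System.out.println(doubleHash(1.5) == Double.hashCode(1.5));        // true
    }
}
```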

String

Each char in Java can be treated as an int value, so the hash of a string is computed by combining its characters into a single int, treating them as the digits of a base-31 number.

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
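The loop can be checked by hand on a short string. The sketch below recomputes the same polynomial without the cached field; polyHash is a hypothetical helper, not part of the JDK.

```java
final class StringHashDemo {
    // Recomputes the base-31 polynomial used by String.hashCode.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' is 97 and 'b' is 98, so "ab" hashes to 31 * 97 + 98 = 3105.
        System.out.println(polyHash("ab"));                                    // 3105
        System.out.println(polyHash("hash table") == "hash table".hashCode()); // true
    }
}
```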

Caching the hash value

If computing the hash value is expensive, we can cache it: store the computed value in a field and return it directly on subsequent calls. Java's String does exactly this, as the hash field in the code above shows.
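The same idea can be applied to a custom immutable key. This is only a sketch: CachedKey is a hypothetical class, and Arrays.hashCode stands in for whatever expensive computation a real key would need.

```java
import java.util.Arrays;

// Hypothetical immutable key whose hash is assumed to be costly to compute.
final class CachedKey {
    private final int[] parts;
    private int hash; // 0 means "not computed yet", as in String

    CachedKey(int... parts) {
        this.parts = parts.clone(); // defensive copy keeps the key immutable
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedKey && Arrays.equals(parts, ((CachedKey) o).parts);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Arrays.hashCode(parts); // the expensive step in this sketch
            hash = h;
        }
        return h;
    }
}
```

Caching is only safe because the key never changes after construction. A key whose real hash happens to be 0 is recomputed on every call, which is slower but still correct.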

Zippered hash table

The hash function converts a key into an array index; the second step is to handle collisions, that is, how to store two keys that map to the same index. One approach is to have each array element point to a linked list that stores all entries with the same index. This is called the zipper method, also known as separate chaining.

The buckets in the zipper method can be plain linked lists, or they can be the red-black trees we implemented earlier. Java 8's HashMap mixes the two: a bucket starts as a linked list and is converted to a red-black tree once it holds more than 8 nodes, provided the table has at least 64 slots.

Here we use a simple linked list to implement the zipper hash table. The buckets use the LinkedMap implemented earlier in this series; see the previous article "Achieving Map based on an Array or Linked List".

public class SeparateChainingHashMap<K, V> implements Map<K, V> {

    private int size;
    private LinkedMap<K, V>[] table;

    public SeparateChainingHashMap(int capacity) {
        this.table = (LinkedMap<K, V>[]) new LinkedMap[capacity];
        for (int i = 0; i < capacity; i++) {
            this.table[i] = new LinkedMap<>();
        }
    }

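    @Override
    public void put(K key, V value) {
        LinkedMap<K, V> bucket = this.table[hash(key)];
        int before = bucket.size();
        bucket.put(key, value);
        // Updating an existing key must not change the count;
        // assumes LinkedMap reports its own size() like every Map here
        size += bucket.size() - before;
    }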

    private int hash(K key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    @Override
    public V get(K key) {
        return this.table[hash(key)].get(key);
    }

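    @Override
    public void delete(K key) {
        LinkedMap<K, V> bucket = this.table[hash(key)];
        int before = bucket.size();
        bucket.delete(key);
        // Only count the removal if the key was actually present
        size -= before - bucket.size();
    }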

    @Override
    public int size() {
        return size;
    }

}

This hash function ANDs the key's hashCode with 0x7fffffff to obtain a non-negative integer, and then uses the remainder method to compute the array index. In binary, 0x7fffffff has a 0 in the highest bit and a 1 in every other bit. Since the highest bit is the sign bit and 0 means non-negative, this is the largest int value, so Integer.MAX_VALUE can be used instead.
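The effect of the mask is easiest to see with a negative hash code. The sketch below repeats the same expression in a standalone helper; IndexDemo and indexFor are names chosen for this example.

```java
final class IndexDemo {
    // Mask off the sign bit, then take the remainder.
    static int indexFor(Object key, int tableLength) {
        return (key.hashCode() & 0x7fffffff) % tableLength;
    }

    public static void main(String[] args) {
        int m = 16;
        // Integer's hashCode is the value itself, so this key hashes to -7.
        System.out.println(-7 % m);          // -7: unusable as an array index
        System.out.println(indexFor(-7, m)); // 9
        // Math.abs is not a safe replacement: |Integer.MIN_VALUE| does not fit in an int.
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0); // true
        System.out.println(indexFor(Integer.MIN_VALUE, m));  // 0
    }
}
```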

The main goal of a hash table is to distribute keys evenly across the array, so a hash table is unordered. If your Map needs to support operations such as retrieving the maximum or minimum key, a hash table is not a good fit.

The array in our zipper hash table has a fixed size. As the amount of data grows, however, collisions become more likely in a small array and the chains get longer, so the array should support dynamic expansion. After expanding, the existing entries must be re-inserted into the new array. For array expansion, refer to the previous article "It's time for me to review the data structure and algorithm".
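One possible shape for such an expansion is sketched below. It is a self-contained variant that uses its own Node chains rather than LinkedMap, and the growth policy (double once chains average more than two nodes) is an assumption made for this example, not something taken from the series.

```java
// Hypothetical, self-contained variant; not the SeparateChainingHashMap above.
final class ResizingChainMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    ResizingChainMap(int capacity) {
        table = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(K key, int length) {
        return (key.hashCode() & 0x7fffffff) % length;
    }

    void put(K key, V value) {
        int i = indexFor(key, table.length);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value; // update in place, size unchanged
                return;
            }
        }
        table[i] = new Node<>(key, value, table[i]);
        size++;
        // Assumed policy: double once chains average more than 2 nodes.
        if (size > 2 * table.length) {
            resize(2 * table.length);
        }
    }

    V get(K key) {
        for (Node<K, V> n = table[indexFor(key, table.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Node<K, V>[] old = table;
        table = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : old) {
            for (Node<K, V> n = head; n != null; n = n.next) {
                // The index depends on the array length, so it must be recomputed.
                int i = indexFor(n.key, newCapacity);
                table[i] = new Node<>(n.key, n.value, table[i]);
            }
        }
    }

    int size() { return size; }

    int capacity() { return table.length; }
}
```

Starting from a capacity of 2 and inserting 100 distinct keys, the table grows several times while every key stays reachable through get.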

Linear probing hash table

Another way to implement a hash table is to store N key-value pairs in an array of size M, where M > N, and to resolve collisions using the empty slots of the array itself. This is known as open addressing, and its simplest form is linear probing.

The main idea of linear probing: when inserting a key, start at the index given by the hash function; if that slot is occupied, move on to the next index. At each slot checked, there are three cases:

  1. The slot holds a key equal to the one being inserted: update its value
  2. The slot holds a different key: increment the index and keep searching
  3. The slot is empty: store the new entry there
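The three cases can be traced on a tiny table. The sketch below stores bare int keys and assumes a hash of key % slots.length, non-negative keys, and a table that is never full; ProbeTrace is a name made up for this example.

```java
final class ProbeTrace {
    // Returns the slot where key ends up; keys hash to key % slots.length (assumed).
    static int insert(Integer[] slots, int key) {
        int index = key % slots.length;
        while (slots[index] != null) {
            if (slots[index] == key) {
                return index;                   // case 1: same key, update in place
            }
            index = (index + 1) % slots.length; // case 2: occupied by another key
        }
        slots[index] = key;                     // case 3: empty slot found
        return index;
    }

    public static void main(String[] args) {
        Integer[] slots = new Integer[7];
        System.out.println(insert(slots, 5));  // 5
        System.out.println(insert(slots, 12)); // 6: slot 5 is taken
        System.out.println(insert(slots, 19)); // 0: wraps past the end
        System.out.println(insert(slots, 12)); // 6: already present
    }
}
```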

Initialization

The linear probing hash table uses two parallel arrays to store keys and values; capacity is the initial size of the arrays.

private int size;
private int capacity;
private K[] keys;
private V[] values;

public LinearProbingHashMap(int capacity) {
    this.capacity = capacity;
    this.keys = (K[]) new Object[capacity];
    this.values = (V[]) new Object[capacity];
}

Insert

  1. When the probe index reaches the end of the array, wrap around to the beginning and continue until a null slot is found: index = (index + 1) % capacity
  2. When the number of stored entries exceeds half the capacity of the array, expand the array to twice its capacity

@Override
public void put(K key, V value) {
    if (Objects.isNull(key)) {
        throw new IllegalArgumentException("Key must not be null");
    }
    if (this.size > this.capacity / 2) {
        resize(2 * this.capacity);
    }
    int index;
    for (index = hash(key); this.keys[index] != null; index = (index + 1) % capacity) {
        if (this.keys[index].equals(key)) {
            this.values[index] = value;
            return;
        }
    }
    this.keys[index] = key;
    this.values[index] = value;
    size++;
}

Dynamically adjust the size of the array

We can refer to the dynamic array resizing implemented in the previous article "It's time to review the data structure and algorithm". With linear probing, the existing entries must be re-inserted into the new array after expansion, because the array size has changed and every index has to be recalculated.

private void resize(int cap) {
    LinearProbingHashMap<K, V> linearProbingHashMap = new LinearProbingHashMap<>(cap);
    for (int i = 0; i < capacity; i++) {
        // Skip empty slots: put() rejects null keys
        if (keys[i] != null) {
            linearProbingHashMap.put(keys[i], values[i]);
        }
    }
    this.keys = linearProbingHashMap.keys;
    this.values = linearProbingHashMap.values;
    this.capacity = linearProbingHashMap.capacity;
}

Query

To look up a key in a linear probing hash table, compute the starting index from the key's hash, then compare the key stored at the current index with the key being queried. If they are equal, return the value directly; if not, move on to the next index. Reaching a null slot means the key does not exist.

@Override
public V get(K key) {
    if (Objects.isNull(key)) {
        throw new IllegalArgumentException("Key must not be null");
    }
    int index;
    for (index = hash(key); this.keys[index] != null; index = (index + 1) % capacity) {
        if (this.keys[index].equals(key)) {
            return this.values[index];
        }
    }
    return null;
}

Delete element

Deletion in a linear probing table is slightly more involved. First find the position of the element to delete. After removing it, every element in the contiguous run of occupied slots that follows must be re-inserted, because empty slots are what terminate a search in a linear probing table. For example, suppose 5, 7, and 9 occupy consecutive slots as one probe sequence (5->7->9): if 7 is simply removed and the elements after it are not re-inserted, get will stop at the empty slot left by 7 and never find 9.

After each deletion, check the number of elements actually stored in the array. If it is much smaller than the capacity of the array, the array can be shrunk.

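@Override
public void delete(K key) {
    if (Objects.isNull(key)) {
        throw new IllegalArgumentException("Key must not be null");
    }
    int index = hash(key);
    // Probe until the key or an empty slot is found
    while (this.keys[index] != null && !this.keys[index].equals(key)) {
        index = (index + 1) % capacity;
    }
    if (this.keys[index] == null) {
        return; // key not present: nothing to delete
    }
    this.keys[index] = null;
    this.values[index] = null;
    this.size--;

    // Re-insert the rest of the cluster so no key is stranded behind the gap
    for (index = (index + 1) % capacity; this.keys[index] != null; index = (index + 1) % capacity) {
        K movedKey = this.keys[index];
        V movedValue = this.values[index];
        // Clear the slot first; otherwise put() could find the key in place
        // and the entry would be lost when the slot is cleared afterwards
        this.keys[index] = null;
        this.values[index] = null;
        this.size--;
        this.put(movedKey, movedValue);
    }
    if (size > 0 && size < capacity / 4) {
        resize(capacity / 2);
    }
}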
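The effect of skipping the re-insertion can be reproduced with a stripped-down sketch. DeleteDemo stores bare int keys, hashes with key % capacity, and assumes non-negative keys and a table that is never full; none of it comes from the repository.

```java
final class DeleteDemo {
    final Integer[] slots;

    DeleteDemo(int capacity) {
        slots = new Integer[capacity];
    }

    int hash(int key) {
        return key % slots.length;
    }

    void add(int key) {
        int i = hash(key);
        while (slots[i] != null && slots[i] != key) {
            i = (i + 1) % slots.length;
        }
        slots[i] = key;
    }

    // Returns the index of key, or -1 when the probe hits an empty slot.
    private int find(int key) {
        for (int i = hash(key); slots[i] != null; i = (i + 1) % slots.length) {
            if (slots[i] == key) return i;
        }
        return -1;
    }

    boolean contains(int key) {
        return find(key) >= 0;
    }

    // Broken on purpose: clears the slot and leaves a gap in the cluster.
    void naiveDelete(int key) {
        int i = find(key);
        if (i >= 0) slots[i] = null;
    }

    // Clears the slot, then re-inserts the rest of the cluster.
    void rehashingDelete(int key) {
        int i = find(key);
        if (i < 0) return;
        slots[i] = null;
        for (i = (i + 1) % slots.length; slots[i] != null; i = (i + 1) % slots.length) {
            int moved = slots[i];
            slots[i] = null;
            add(moved);
        }
    }
}
```

With a capacity of 8, the keys 5, 13, and 21 all hash to slot 5 and occupy slots 5, 6, and 7. After naiveDelete(13), contains(21) stops at the empty slot 6 and returns false; after rehashingDelete(13), 21 moves into slot 6 and is found again.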

All the source code in this article is available in the GitHub repository:
https://github.com/silently9527/JavaCore
Reading list for programmers: https://github.com/silently9527/ProgrammerBooks


The article may contain shortcomings or mistakes. If you have suggestions or comments, you are welcome to leave a comment.



Herman