Abstract: This article analyzes the Map interface in detail and explains how HashMap is implemented under the hood.
This article is shared from the HUAWEI CLOUD community: "In-depth analysis of HashMap high-frequency interview questions and the underlying implementation structure! [Run it! JAVA]", original author: Little Grey Ape.
Everyone has heard of the Map interface, right? It is a common way to store key-value pairs in Java, and I believe HashMap is no stranger to you either. When HashMap comes up, most people who know a little can say this much: it stores key-value pairs, using a combination of an array and linked lists. But how does it really store data, and how is its underlying structure implemented? Do you understand these details?
Many readers may say: I only need to know how to use it; I don't need to understand its underlying implementation. In fact, it is quite the opposite: knowing only how to use it is not enough, and questions probing HashMap's underlying implementation are commonplace in Java development interviews. So today I will analyze with you the detailed use of the Map interface and how HashMap is implemented at the bottom.
Read on, friends, and you will definitely be rewarded by the end!
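Before diving in, here is a minimal sketch of storing and retrieving key-value pairs with a HashMap (the class and variable names are mine, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class MapBasics {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("Tom", 20);      // store a key-value pair
        ages.put("Jerry", 18);
        ages.put("Tom", 21);      // same key: the old value is replaced
        System.out.println(ages.get("Tom"));  // 21
        System.out.println(ages.size());      // 2 entries, not 3
    }
}
```

Note that putting the same key twice does not add a second entry; the old value is simply replaced.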
1. What is the relationship between the Map interface and the List interface?
For this question, if we must name a relationship between these two interfaces, there is only one: both are collections used to store data. Beyond that, the Map interface and the List interface are actually not closely related. Why do I say that?
First look at the List interface. As I mentioned before, List extends the Collection interface; it is a sub-interface of Collection and is used for single-column storage of data. Its inheritance relationship is as follows:
The Map interface, on the other hand, is a top-level interface with many different implementation classes. It is used to store key-value pairs (key:value). Its inheritance relationship is as follows:
So don't confuse the relationship and usage of Map interface and List interface!
2. What are the commonly used implementation classes for Map?
We have seen Map's inheritance structure above, along with many of its implementation classes, several of which are familiar, such as HashMap, TreeMap, and Hashtable. In interviews, the interviewer often asks which implementation classes under the Map interface are commonly used and what they are for, so let us briefly introduce and analyze them.
HashMap: as mentioned above, HashMap stores data as an array + linked lists + red-black trees (the tree part since JDK 1.8). The default initial capacity of its array is 16, the load factor is 0.75, and each resize doubles the capacity. In other words, whenever the number of stored entries reaches 75% of the array's capacity, the array is doubled in size.
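As a quick illustration of these defaults, the sketch below pre-sizes a HashMap when the expected entry count is known, so the table never has to resize mid-fill (the `expected / 0.75 + 1` sizing idiom and all names here are my additions, not from the article):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // Default constructor: capacity 16, load factor 0.75,
        // so the 13th entry (threshold 16 * 0.75 = 12) triggers a resize.
        Map<String, Integer> small = new HashMap<>();

        // Pre-sizing for a known element count avoids intermediate resizes:
        int expected = 1000;
        Map<String, Integer> big = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            big.put("key" + i, i);
        }
        System.out.println(big.size()); // 1000
    }
}
```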
Hashtable: Hashtable is thread-safe, but it dates from long ago and is now essentially a legacy class; it is not recommended in new development.
ConcurrentHashMap: this is the thread-safe Map implementation most commonly used today. Before JDK 1.8 it achieved thread safety with a segmented-lock mechanism; since JDK 1.8 it uses CAS operations together with the synchronized keyword instead.
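A short sketch of why ConcurrentHashMap matters in practice: per-key updates such as `merge` are atomic, so concurrent counters stay correct without external locking (the counter example itself is mine, not from the article):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                counts.merge("hits", 1, Integer::sum); // atomic per-key update
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("hits")); // 2000: no lost updates
    }
}
```

With a plain HashMap, the same two-thread test could lose updates or even corrupt internal state.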
Among these, HashMap is by far the most frequently examined in interviews, and also the one to understand and master most deeply for daily development. So next, I will analyze in detail HashMap's implementation principles and the questions interviewers most often ask about it.
3. Please explain the put process of HashMap?
We know that HashMap stores data through the put method, which takes two parameters, key and value. So how is this key-value pair actually stored? Let's analyze that next.
Inside HashMap, an array + linked lists are used: the array holds the buckets, and each bucket uses a linked list to chain together the "same" keys along with their corresponding values.
Note: "same" here does not necessarily mean the keys' values are equal, but that they share a certain computed feature. We will see exactly what that feature is as we read on.
HashMap computes the array index for the entry to be stored according to the key. If there is no element at that index, the new element is stored there directly. But if that position is already occupied, the linked-list storage mentioned above comes into play: the data is appended to the chain in storage order. This is the simple flow of put, and the storage result looks like this:
But sometimes we store a lot of data. If we always keep the data in linked-list form, a chain may grow very long, making both deletion and insertion troublesome. What should we do in that situation?
This is where "treeing" and "chaining" come in when storing data in a linked list. So what are "treeing" and "chaining"?
When we store key-value pairs, if too many entries accumulate under the same array index, the linked list becomes too long, which makes deletion and insertion operations expensive. Therefore Java stipulates that when the length of a chain exceeds 8, the linked list is "treed": converted into a red-black tree, a self-balancing binary search tree in which each node's left subtree holds smaller values and its right subtree holds larger ones. Searching such a tree works like binary search, so lookup efficiency increases greatly. (In JDK 8 there is one extra condition: the table capacity must be at least 64, otherwise the table is resized instead of treeing the bucket.)
But what if, after we delete some nodes, the chain length is no longer greater than 8? Do we immediately transform the red-black tree back into a linked list? Actually, no: only when the length drops below 6 is the red-black tree converted back into a linked list. This process is called "chaining".
The process diagram is as follows:
So why do we "tree" at length 8 but "chain" only below length 6? Why not simply "chain" as soon as the length falls below 8?
The main reason is to avoid thrashing: if both thresholds were 8, deleting one element from a chain of length 8 would immediately "chain" the bucket, and adding one element back would "tree" it again, over and over. These repeated "chaining" and "treeing" operations are particularly time-consuming and cumbersome. Therefore the rule is: tree only when the chain length reaches 8, and chain again only when it drops below 6. Please keep the two thresholds, 8 for treeing and 6 for chaining, firmly in mind!
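The thresholds above can be summarized in code. The constant values below are copied from the JDK 8 `java.util.HashMap` source (they are package-private there, so they are restated in a throwaway class of my own purely for illustration):

```java
public class TreeThresholds {
    // Values as defined in the JDK 8 java.util.HashMap source:
    static final int TREEIFY_THRESHOLD = 8;     // tree a bin once its chain reaches this length
    static final int UNTREEIFY_THRESHOLD = 6;   // convert a tree back to a chain at this length
    static final int MIN_TREEIFY_CAPACITY = 64; // below this table size, resize instead of treeing

    public static void main(String[] args) {
        // The gap between 8 and 6 prevents constant back-and-forth conversion.
        System.out.println(TREEIFY_THRESHOLD - UNTREEIFY_THRESHOLD); // 2
    }
}
```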
4. In what order is the data stored in the linked list?
We now know how elements in a HashMap are stored, but an interviewer may also ask: when an element is added to a bucket's linked list, is it inserted at the head node or at the tail node?
The answer depends on the JDK version:
In JDK 1.7 and earlier, new nodes were inserted at the head of the list; since JDK 1.8, they are inserted at the tail.
5. How is the Hash(key) method implemented?
We now know how elements in a HashMap are stored, so how do we compute the corresponding array index from the key?
We know the initial capacity of a HashMap is 16 slots. For these initial 16 slots, if we want to place data according to the key, the simplest method is to derive an int value from the key:
int hashCode = key.hashCode()
Then take the remainder of the obtained hashCode divided by 16:
hashCode % 16 = 0~15
The indices obtained this way always fall in the range 0-15. This is also the most primitive way to compute hash(key).
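A minimal sketch of this naive scheme (the class name and the `Math.abs` guard against negative hashCodes are my additions):

```java
public class NaiveIndex {
    public static void main(String[] args) {
        String key = "hello";
        int hashCode = key.hashCode();
        // Remainder of division by the table length; abs() guards
        // against keys whose hashCode happens to be negative.
        int index = Math.abs(hashCode) % 16;
        System.out.println(index); // always in [0, 15]
    }
}
```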
In practice, however, this method is not optimal: the resulting positions in the array are not spread out well enough, and the remainder operation is comparatively expensive for the computer.
Therefore, to make the results as discrete as possible, the common method for computing the array index is: first obtain a hashCode from the key, XOR the high 16 bits of that hashCode with its low 16 bits, and then AND the result with the current array length minus one. That finally yields an array index. The process is as follows:
int hashCode = key.hashCode()
int hash = hash(key) = (high 16 bits of key.hashCode() ^ low 16 bits) & (n - 1), where n is the current array length
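This computation can be sketched as plain Java. The body of `hash` and the masking step mirror the JDK 8 source; the wrapper class `HashDemo` and the sample key are mine:

```java
public class HashDemo {
    // JDK 8-style spread: XOR the high 16 bits of hashCode into the
    // low 16 bits, so the high-order bits also influence the index.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                            // current array length (a power of 2)
        int index = (n - 1) & hash("example"); // equivalent to mod n, but cheaper
        System.out.println(index);             // always in [0, 15]
    }
}
```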
At the same time, one reminder:
The calculation of hash(key) differs slightly between JDK 1.7 and JDK 1.8.
In JDK 1.8, computing hash(key) involves two perturbations.
In JDK 1.7, computing hash(key) involved nine perturbations: four bit-shift operations and five XOR operations.
Here a "perturbation" can be understood as one mixing operation applied to the hash.
The above is the implementation process of the Hash(key) method.
6. Why is the capacity of HashMap always a power of 2?
The reason HashMap's capacity is always a power of 2 is tied to the hash(key) algorithm described above.
Only when every binary digit of the (n - 1) used in the hash(key) calculation is 1 will the resulting indices be well spread out. If the current array length is 16, its binary representation is 10000 and n - 1 is 01111, so every bit of n - 1 is 1. The same holds for any other power of 2 after subtracting 1.
So only when the array capacity is a power of 2 can the computed hash(key) values be relatively discrete.
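A quick sketch makes this concrete. With n = 16 the mask n - 1 has all four low bits set, so every slot 0-15 is reachable; with a non-power-of-two size such as 10, some mask bits are 0 and entire slots can never be produced (the sample hash values below are arbitrary, chosen by me):

```java
public class MaskDemo {
    public static void main(String[] args) {
        int n = 16;       // power of two: binary 10000
        int mask = n - 1; // binary 01111: all low bits are 1
        for (int h : new int[]{17, 34, 250, 1023}) {
            System.out.println(h + " & " + mask + " = " + (h & mask));
        }
        // With n = 10, mask = 9 = binary 1001: bits 1 and 2 are 0,
        // so (h & 9) can only ever yield 0, 1, 8, or 9 -- the other
        // six slots are wasted.
    }
}
```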
7. How to resolve Hash conflicts?
What is a hash conflict? When we compute an array index for a key and that slot already holds an element, that is a hash conflict (collision). Obviously, if our index-calculation algorithm is not good enough, stored data easily piles up on the same index, causing too many hash conflicts.
So how to solve the hash conflict?
The key is to make the array indices computed from the stored keys as discrete as possible: optimize hash(key) as much as possible, and keep the array length a power of 2. These are the main means of reducing hash conflicts.
For details, you can look at the key parts of HashMap's underlying source code below (taken from JDK 1.7):
The underlying implementation of Hash(key)
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Low-level implementation of put(key, value) method
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
8. How does HashMap expand?
We mentioned above that the initial capacity of the HashMap array is 16, but obviously 16 slots are not always enough. So how does HashMap expand?
Here we need a parameter called the load factor. In HashMap the load factor is 0.75.
As mentioned above, for an array with an initial length of 16, when the number of stored entries reaches 16 * 0.75 = 12, the array is resized. The new capacity is twice the original, so the table expands from 16 to 32 slots.
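Sketched in code (the variable names are mine; the threshold arithmetic and doubling match the behavior described above):

```java
public class ResizeMath {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);   // 12: the 13th entry triggers a resize
        int newCapacity = capacity << 1; // doubling via a left shift
        System.out.println(newCapacity); // 32
    }
}
```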
9. How are the elements stored after expansion?
We know that after the HashMap array is expanded, its length increases, so the newly added portion is initially empty. Should we just leave those new slots empty? Obviously not; that would waste a lot of memory.
Therefore, after expansion, the elements stored in the old array are redistributed: each element is placed again into the new array, so that the whole array space is put to use.
10. Comparison of the implementation of HashMap between JDK1.7 and JDK1.8
The implementation of HashMap in JDK1.7 and JDK1.8 is slightly different. Finally, we analyze and compare the differences in the implementation of HashMap between JDK1.7 and JDK1.8 based on the above explanation.
(1) The underlying data structure is different
In the put process, JDK 1.7 had no concept of a red-black tree; buckets were stored purely as linked lists. JDK 1.8 introduced the red-black tree to optimize storage and lookup.
(2) The way of inserting the linked list is different
When inserting an element into a bucket's linked list, JDK 1.7 inserted at the head of the list, while JDK 1.8 and later insert at the tail.
(3) The calculation method of Hash(key) is different
In computing hash(key), JDK 1.7 performed nine perturbations (four bit-shift operations and five XOR operations); JDK 1.8 performs only two.
(4) The calculation method of the storage location of the expanded data is different
When redistributing stored data after expansion, JDK 1.7 recomputes hash(key) and the index for every element from scratch. JDK 1.8 exploits the fact that the capacity doubles: each element's new index can only be either its original index or its original index plus the old capacity.
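This "old index or old index + old capacity" property follows directly from the power-of-two masking, and can be checked with a few arbitrary hash values (the demo class and sample values are mine):

```java
public class RehashDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = 32;
        for (int h : new int[]{5, 21, 100, 777}) {
            int oldIdx = h & (oldCap - 1);
            int newIdx = h & (newCap - 1);
            // The extra mask bit is exactly (h & oldCap): if that bit is 0
            // the element stays at oldIdx, otherwise it moves to oldIdx + oldCap.
            System.out.println(h + ": " + oldIdx + " -> " + newIdx);
        }
    }
}
```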
Well, that covers the Map interface, the underlying implementation of HashMap, and the core questions asked in interviews. That's all for this analysis!