[Note] This article is translated from: Guide to hashCode() in Java | Baeldung

Java hashCode() guide

1. Overview

Hashing is a fundamental concept of computer science.
In Java, efficient hashing algorithms stand behind some of the most popular collections, such as HashMap (check out this in-depth article) and HashSet.
In this tutorial, we'll focus on how hashCode() works, how it plays into collections, and how to implement it correctly.

2. Using hashCode() in data structures

The simplest operations on collections can be inefficient in certain situations.
For example, this triggers a linear search, which is highly inefficient for large lists:

List<String> words = Arrays.asList("Welcome", "to", "Baeldung");
if (words.contains("Baeldung")) {
    System.out.println("Baeldung is in the list");
}

Java provides a number of data structures for dealing with this issue specifically. For example, several Map interface implementations are hash tables.
When using a hash table, these collections calculate the hash value for a given key using the hashCode() method. Then they use this value internally to store the data so that access operations are much more efficient.
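
As a rough illustration of the point above, the same membership check can be run against a HashSet instead of a List. This is only a sketch: the class name HashLookupSketch and the helper toHashSet are made up for this example and are not part of the original article.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, used only for this illustration.
class HashLookupSketch {

    // Copies the words into a HashSet, so that contains() can use hashCode()
    // to jump to a bucket instead of scanning every element.
    static Set<String> toHashSet(List<String> words) {
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Welcome", "to", "Baeldung");
        Set<String> wordSet = toHashSet(words);

        if (wordSet.contains("Baeldung")) {
            System.out.println("Baeldung is in the set");
        }
    }
}
```

The one-time cost of building the set only pays off when the collection is queried repeatedly.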

3. Understanding how hashCode() works

Simply put, hashCode() returns an integer value generated by a hashing algorithm.
Objects that are equal (according to their equals()) must return the same hash code. Different objects do not need to return different hash codes.
The general contract of hashCode() states:

  • During the execution of a Java application, as long as it is called multiple times on the same object, hashCode() must always return the same value, provided that the information used in the equals comparison on the object has not been modified. This value does not need to be consistent from one execution of the application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, calling the hashCode() method on each of the two objects must produce the same value.
  • If the two objects are not equal according to the equals(java.lang.Object) method, calling the hashCode method on each of the two objects does not need to produce different integer results. However, developers should be aware that generating different integer results for unequal objects can improve the performance of hash tables.
"As much as is reasonably practical, the hashCode() method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)"
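
The contract can be observed with plain String objects. The sketch below (the class name HashCodeContractSketch is invented for this example) shows two equal instances sharing a hash code, and two unequal strings that happen to collide, which the contract explicitly allows.

```java
// Hypothetical demo class, used only for this illustration.
class HashCodeContractSketch {

    // Returns false only if a and b are equal but report different hash codes,
    // i.e. only if the second rule of the contract is violated.
    static boolean equalObjectsShareHash(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Two distinct instances that are equal: their hash codes must match.
        String first = new String("Baeldung");
        String second = new String("Baeldung");
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true

        // Two unequal strings with the same hash code: permitted by the contract.
        System.out.println("Aa".equals("BB"));                     // false
        System.out.println("Aa".hashCode() == "BB".hashCode());    // true
    }
}
```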

4. A simple hashCode() implementation

A naive hashCode() implementation that fully adheres to the above contract is actually quite straightforward.
To demonstrate this, we'll define a sample User class that overrides the method's default implementation:

public class User {

    private long id;
    private String name;
    private String email;

    // standard getters/setters/constructors
    @Override
    public int hashCode() {
        return 1;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (o == null)
            return false;
        if (this.getClass() != o.getClass())
            return false;
        User user = (User) o;
        return id == user.id && (name.equals(user.name) && email.equals(user.email));
    }
    // getters and setters here
}

The User class provides custom implementations for both equals() and hashCode() that fully adhere to their respective contracts. What's more, there is nothing illegitimate about having hashCode() return a fixed value.
However, this implementation degrades the functionality of hash tables to basically zero, because every object is stored in the same single bucket.
In this case, a hash table lookup is performed linearly and gives us no real advantage. We will discuss this in detail in Section 7.
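
The effect of a constant hash code can be made visible by counting how often equals() is consulted during a single lookup. The following is a minimal sketch, assuming a made-up Key class with a fixed hashCode(); the exact number of equals() calls depends on the HashMap implementation, so no particular count is claimed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, used only for this illustration.
class ConstantHashSketch {

    // Made-up key type whose hashCode() is a constant, like the User class above.
    static final class Key {
        static int equalsCalls = 0; // counts how often equals() is consulted
        final int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 1; // legal, but every key lands in the same bucket
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Fills a map with `count` distinct keys and returns it.
    static Map<Key, Integer> fill(int count) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new Key(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = fill(1_000);
        Key.equalsCalls = 0;
        Integer value = map.get(new Key(999));
        System.out.println("value = " + value);
        System.out.println("equals() calls for one lookup: " + Key.equalsCalls);
    }
}
```

The map still returns correct results; it simply has to compare keys one by one within the single shared bucket.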

5. Improving the hashCode() implementation

Let's improve the current hashCode() implementation by including all the fields of the User class, so that it can produce different results for unequal objects:

@Override
public int hashCode() {
    return (int) id * name.hashCode() * email.hashCode();
}

This basic hashing algorithm is definitely much better than the previous one, because it computes the object's hash code by multiplying the hash codes of the name and email fields and the id.
Generally speaking, we can say that this is a reasonable hashCode() implementation, as long as we keep the equals() implementation consistent with it.

6. Standard hashCode() implementation

The better the hash algorithm we use to calculate the hash code, the better the performance of the hash table.
Let's look at a "standard" implementation that uses two prime numbers to add more uniqueness to the calculated hash code:

@Override
public int hashCode() {
    int hash = 7;
    hash = 31 * hash + (int) id;
    hash = 31 * hash + (name == null ? 0 : name.hashCode());
    hash = 31 * hash + (email == null ? 0 : email.hashCode());
    return hash;
}

Although we need to understand the roles that the hashCode() and equals() methods play, we don't have to implement them from scratch every time, because most IDEs can generate custom hashCode() and equals() implementations. In addition, since Java 7, we have the Objects.hash() utility method for convenient hashing:

Objects.hash(name, email)
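
To show Objects.hash() in context, here is a minimal sketch of a complete class. HashedUser is a hypothetical stand-in for the article's User class, and including id in the hash (alongside name and email) is an assumption made so that hashCode() uses the same fields as equals().

```java
import java.util.Objects;

// Hypothetical stand-in for the article's User class.
class HashedUser {
    private final long id;
    private final String name;
    private final String email;

    HashedUser(long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }

    @Override
    public int hashCode() {
        // Objects.hash accepts null arguments, so no explicit null checks are needed.
        return Objects.hash(id, name, email);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        HashedUser other = (HashedUser) o;
        // Objects.equals is null-safe, matching the null tolerance of hashCode().
        return id == other.id
            && Objects.equals(name, other.name)
            && Objects.equals(email, other.email);
    }

    public static void main(String[] args) {
        HashedUser a = new HashedUser(1L, "John", "john@domain.com");
        HashedUser b = new HashedUser(1L, "John", "john@domain.com");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Objects.hash() boxes its arguments and allocates a varargs array on each call, so a handwritten version may be preferable on very hot code paths.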

IntelliJ IDEA generates the following implementation:

@Override
public int hashCode() {
    int result = (int) (id ^ (id >>> 32));
    result = 31 * result + name.hashCode();
    result = 31 * result + email.hashCode();
    return result;
}

Eclipse produces this one:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((email == null) ? 0 : email.hashCode());
    result = prime * result + (int) (id ^ (id >>> 32));
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    return result;
}

In addition to the above IDE-based hashCode() implementations, it's also possible to generate an efficient implementation automatically, for example with Lombok.
In this case, we need to add the lombok-maven dependency to pom.xml:

<dependency>
  <groupId>org.projectlombok</groupId>
  <artifactId>lombok-maven</artifactId>
  <version>1.16.18.0</version>
  <type>pom</type>
</dependency>

Now it is enough to annotate the User class with @EqualsAndHashCode:

@EqualsAndHashCode
public class User {
    // fields and methods here
}

Similarly, if we want Apache Commons Lang's HashCodeBuilder class to generate a hashCode() implementation for us, we include the commons-lang Maven dependency in the pom file:

<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
</dependency>

hashCode() can then be implemented like this:

import org.apache.commons.lang.builder.HashCodeBuilder;

public class User {

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 37)
            .append(id)
            .append(name)
            .append(email)
            .toHashCode();
    }
}

Generally speaking, there's no universal recipe for implementing hashCode(). We highly recommend reading Effective Java by Joshua Bloch, which provides a list of thorough guidelines for implementing efficient hashing algorithms.
Note that the handwritten and IDE-generated implementations above all use the number 31 in some form. This is because 31 has a nice property: multiplying by it can be replaced by a bitwise shift and a subtraction, which can be faster than standard multiplication:

31 * i == (i << 5) - i
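
The identity holds for every int value, including those where the multiplication overflows, because both sides wrap around in the same way. A short sketch (the class and method names are made up for this example) that checks it on a few values:

```java
// Hypothetical demo class, used only for this illustration.
class ShiftIdentitySketch {

    // Computes 31 * i using a shift and a subtraction instead of a multiplication.
    static int timesThirtyOneByShift(int i) {
        return (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123_456_789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            boolean same = 31 * i == timesThirtyOneByShift(i);
            System.out.println(i + " -> " + same);
        }
    }
}
```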

7. Handling hash collisions

The intrinsic behavior of hash tables brings up a relevant aspect of these data structures: even with an efficient hashing algorithm, two or more objects might have the same hash code even if they're unequal. So their hash codes point to the same bucket even though they are different hash table keys.

This situation is commonly known as a hash collision, and there are various methods for handling it, each with its pros and cons. Java's HashMap uses the separate chaining method for handling collisions:
**"When two or more objects point to the same bucket, they are simply stored in a linked list. In such a case, the hash table is an array of linked lists, and each object with the same hash is appended to the linked list at the bucket index in the array.
In the worst case, several buckets would have a linked list bound to them, and the retrieval of an object in the list would be performed linearly."**
Hash collision handling shows, in a nutshell, why it's so important to implement hashCode() efficiently.
Java 8 brought an interesting enhancement to the HashMap implementation. If a bucket's size goes beyond a certain threshold, a balanced tree replaces the linked list. This allows O(log n) lookups instead of the pessimistic O(n).
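
Separate chaining can be sketched as a toy table made of an array of linked lists. This is a deliberately simplified illustration, not how HashMap is actually written: the class ChainedTableSketch and all of its members are invented for this example, and it omits resizing, hash spreading, and the tree conversion mentioned above.

```java
import java.util.LinkedList;
import java.util.Objects;

// Hypothetical toy hash table that uses separate chaining.
class ChainedTableSketch<K, V> {

    // One key/value pair stored in a bucket's list.
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTableSketch(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Maps a key's hash code to a bucket index (floorMod keeps it non-negative).
    private int indexFor(Object key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    // Replaces the value if the key is already present, otherwise appends to the chain.
    void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        for (Entry<K, V> entry : chain) {
            if (Objects.equals(entry.key, key)) {
                entry.value = value;
                return;
            }
        }
        chain.add(new Entry<>(key, value));
    }

    // Walks the chain linearly, which is why long chains make lookups slow.
    V get(Object key) {
        for (Entry<K, V> entry : buckets[indexFor(key)]) {
            if (Objects.equals(entry.key, key)) {
                return entry.value;
            }
        }
        return null;
    }

    // Number of entries sharing the bucket that the given key maps to.
    int bucketSize(Object key) {
        return buckets[indexFor(key)].size();
    }

    public static void main(String[] args) {
        ChainedTableSketch<String, Integer> table = new ChainedTableSketch<>(16);
        table.put("Aa", 1); // "Aa" and "BB" have the same hash code,
        table.put("BB", 2); // so they end up in the same bucket
        System.out.println(table.get("Aa") + ", " + table.get("BB"));
        System.out.println("entries in shared bucket: " + table.bucketSize("Aa"));
    }
}
```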

8. Create a simple application

Now we'll test the functionality of a standard hashCode() implementation.
Let's create a simple Java application that adds some User objects to a HashMap and uses SLF4J to log a message to the console each time the method is called.
Here's the sample application's entry point:

import java.util.HashMap;
import java.util.Map;

public class Application {

    public static void main(String[] args) {
        Map<User, User> users = new HashMap<>();
        User user1 = new User(1L, "John", "john@domain.com");
        User user2 = new User(2L, "Jennifer", "jennifer@domain.com");
        User user3 = new User(3L, "Mary", "mary@domain.com");

        users.put(user1, user1);
        users.put(user2, user2);
        users.put(user3, user3);
        if (users.containsKey(user1)) {
            System.out.print("User found in the collection");
        }
    }
}

This is the implementation of hashCode():

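import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class User {

    private static final Logger logger = LoggerFactory.getLogger(User.class);

    // ...

    @Override
    public int hashCode() {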
        int hash = 7;
        hash = 31 * hash + (int) id;
        hash = 31 * hash + (name == null ? 0 : name.hashCode());
        hash = 31 * hash + (email == null ? 0 : email.hashCode());
        logger.info("hashCode() called - Computed hash: " + hash);
        return hash;
    }
}

The only detail worth stressing here is that each time an object is stored in the hash map and checked with the containsKey() method, hashCode() is invoked and the computed hash code is printed to the console:

[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: 1255477819
[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: -282948472
[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: -1540702691
[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: 1255477819
User found in the collection

9. Conclusion

It's clear that producing an efficient hashCode() implementation often requires a mixture of a few mathematical concepts (i.e. prime and arbitrary numbers), logic, and basic mathematical operations.
Regardless, we can implement hashCode() effectively without resorting to these techniques at all. We just need to make sure the hashing algorithm produces different hash codes for unequal objects and that it's consistent with the implementation of equals().
As always, all the code examples shown in this article are available on GitHub.


信码由缰