How is the value of hashCode generated? Is it the object's memory address?


Let's start with the simplest possible print statement:

System.out.println(new Object());

This prints the fully qualified class name, followed by an @ sign and a string of characters:

java.lang.Object@6659c656

What comes after the @ sign? Is it the object's hashcode, its memory address, or some other value?

In fact, what follows the @ is just the object's hashcode, displayed in hexadecimal. Let's verify:

Object o = new Object();
int hashcode = o.hashCode();
// toString
System.out.println(o);
// hashcode in hexadecimal
System.out.println(Integer.toHexString(hashcode));
// hashcode
System.out.println(hashcode);
// This method also returns the object's hashcode; unlike Object.hashCode, it ignores any overridden hashCode
System.out.println(System.identityHashCode(o));

Output result:

java.lang.Object@6659c656
6659c656
1717159510
1717159510
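
To make the difference between hashCode() and System.identityHashCode() concrete, here is a minimal sketch. The class names IdentityHashDemo and Fixed are hypothetical and exist only for this illustration.

```java
final class IdentityHashDemo {
    // Hypothetical class that overrides hashCode() with a constant.
    static final class Fixed {
        @Override
        public int hashCode() {
            return 42; // overridden value, unrelated to the JVM-generated hash
        }
    }

    public static void main(String[] args) {
        Fixed f = new Fixed();
        // Calls the override, so this prints 42.
        System.out.println(f.hashCode());
        // Ignores the override and prints the JVM-generated value,
        // which varies from run to run.
        System.out.println(System.identityHashCode(f));

        // For a plain Object, both calls return the same value.
        Object o = new Object();
        System.out.println(o.hashCode() == System.identityHashCode(o));
    }
}
```

The rest of the article is about the value that identityHashCode returns, that is, the one the JVM generates itself.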

So how is an object's hashcode generated? Is it really the memory address?

This article is based on Java 8 HotSpot.

The generation logic of hashCode

The logic for generating a hashCode in the JVM is not that simple. It provides several strategies, and each one produces different results.

The core method that generates the hashCode in the OpenJDK source code:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

The source code shows that the generation strategy is selected by a global variable named hashCode, which defaults to 5. The variable is defined in another header file (globals.hpp):

  product(intx, hashCode, 5,                                            
         "(Unstable) select hashCode generation algorithm" )

The description is clear enough: "(Unstable) select hashCode generation algorithm". The value defined here can be controlled through a JVM startup parameter. First, confirm the default value:

java -XX:+PrintFlagsFinal -version | grep hashCode

intx hashCode                                  = 5                                   {product}
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)

So we can select a different hashcode generation algorithm through a JVM startup parameter and test what each algorithm produces:

-XX:hashCode=N
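
A small test program along these lines can be run under each setting to compare the results. This is only a sketch; the class name HashCodeProbe and the method sample are made up for this example.

```java
final class HashCodeProbe {
    // Returns the JVM-generated hash codes of n freshly created objects.
    static int[] sample(int n) {
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {
            hashes[i] = System.identityHashCode(new Object());
        }
        return hashes;
    }

    public static void main(String[] args) {
        // Print each hash in hexadecimal, the same form toString() uses after the @.
        for (int h : sample(5)) {
            System.out.println(Integer.toHexString(h));
        }
    }
}
```

After compiling it with javac, run it on a Java 8 HotSpot JVM as, for example, `java -XX:hashCode=3 HashCodeProbe`, and change the number to try another algorithm.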

Now let's look at how each hashcode generation algorithm behaves.

Algorithm 0

if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random();
  }

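This algorithm uses a Park-Miller random number generator. Note, however, that the generator keeps its state in a single global variable shared by all threads. As the source comment says, two threads can race and get the same value, and on multiprocessor systems the constant reads and writes to that global cause a lot of cache-coherency traffic, so it behaves poorly under high concurrency.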
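
For readers who have not seen a Park-Miller generator, here is a minimal sketch of the classic "minimal standard" variant, with multiplier 16807 and modulus 2^31 - 1. It illustrates the general technique and is not a copy of HotSpot's os::random(); the class name ParkMillerSketch is hypothetical, and the shared global state discussed above is deliberately left out.

```java
final class ParkMillerSketch {
    private static final long A = 16807L;       // multiplier, 7^5
    private static final long M = 2147483647L;  // modulus, 2^31 - 1

    // One step of the generator: next = (A * seed) mod M.
    // The seed is expected to be in the range 1 .. M - 1.
    static int next(int seed) {
        return (int) ((A * seed) % M);
    }

    public static void main(String[] args) {
        int seed = 1;
        for (int i = 0; i < 3; i++) {
            seed = next(seed);
            System.out.println(seed);
        }
    }
}
```

Each new value depends only on the previous one, which is why every thread has to read and update the same seed when the generator is global.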

Algorithm 1

if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addrBits = intptr_t(obj) >> 3 ;
    value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
}

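This algorithm really is based on the object's memory address. It takes the object pointer as an intptr_t, shifts it right by 3 bits, and then XORs the result with itself shifted right by 5 bits and with GVars.stwRandom, a random value that, according to the comment, stays stable between STW operations. So the result is derived from the address rather than being the raw address.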
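
The bit manipulation can be reproduced in Java to see what it does to an address. This is only a sketch: Java cannot read real object addresses, so the address and the stwRandom value below are made-up inputs, and the class name AddressHashSketch is hypothetical.

```java
final class AddressHashSketch {
    // Mirrors the two lines of algorithm 1:
    //   addrBits = address >> 3
    //   value    = addrBits ^ (addrBits >> 5) ^ stwRandom
    static long mix(long address, long stwRandom) {
        // With the default 8-byte object alignment the low 3 bits are always 0,
        // so shifting them away loses no information.
        long addrBits = address >> 3;
        return addrBits ^ (addrBits >> 5) ^ stwRandom;
    }

    public static void main(String[] args) {
        long address = 0x76ab12340L;   // made-up, 8-byte-aligned address
        long stwRandom = 0x5bd1e995L;  // made-up stand-in for GVars.stwRandom
        System.out.println(Long.toHexString(address));
        System.out.println(Long.toHexString(mix(address, stwRandom)));
    }
}
```

In the real function the result is additionally masked with markOopDesc::hash_mask before it is returned.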

Algorithm 2

if (hashCode == 2) {
    value = 1 ;            // for sensitivity testing
}

This one needs little explanation: it always returns 1, and the comment indicates that it exists for sensitivity testing.

If you are interested, try -XX:hashCode=2 to turn on this algorithm and check whether every hashCode comes out as 1.

Algorithm 3

if (hashCode == 3) {
    value = ++GVars.hcSequence ;
}

This algorithm is also very simple: it increments a global counter, and every object's hashCode is taken from that counter. Let's try it:

System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());

//output
java.lang.Object@144
java.lang.Object@145
java.lang.Object@146
java.lang.Object@147
java.lang.Object@148
java.lang.Object@149

Sure enough, the values increase by one each time (they are printed in hexadecimal). Interesting.

Algorithm 4

if (hashCode == 4) {
    value = intptr_t(obj) ;
}

This is not much different from algorithm 1. Both are based on the object address, but algorithm 1 mixes the address bits, while this one takes the address as it is (before the masking step that applies to every algorithm).

Algorithm 5

The last one is the default algorithm, used whenever the hashCode setting is not 0, 1, 2, 3, or 4:

else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }
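
The C++ fragment above can be ported to Java almost line by line, which makes it easy to experiment with. This is a sketch under a few assumptions: the class name XorShiftSketch is hypothetical, the seed values in main are arbitrary, the state lives in an ordinary object instead of a thread, and the final mask assumes a 31-bit hash_mask.

```java
final class XorShiftSketch {
    // Stand-ins for Self->_hashStateX / Y / Z / W.
    private int x, y, z, w;

    XorShiftSketch(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    int next() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        // >>> is the unsigned shift, matching the C++ code, which works on "unsigned".
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        // Assumption: hash_mask keeps the low 31 bits.
        int value = v & 0x7FFFFFFF;
        // Same fallback as the original function: 0 is never returned.
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        XorShiftSketch gen = new XorShiftSketch(123456789, 362436069, 521288629, 88675123);
        for (int i = 0; i < 5; i++) {
            System.out.println(Integer.toHexString(gen.next()));
        }
    }
}
```

Each call only reads and writes the four state fields, so when every thread has its own state there is nothing to synchronize.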

Here the hash value is produced by XOR and shift operations on state values held by the current thread. Because that state belongs to the thread, it avoids the shared global state that the random and auto-increment algorithms rely on, which makes it more efficient. The repetition rate is probably somewhat higher, but does it matter if a hashCode repeats?

The JVM never guaranteed that this value is unique in the first place. HashMap, for example, resolves hash collisions with separate chaining.
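
A quick way to convince yourself that repeated hash codes are harmless is to force a collision on purpose. In this sketch, the classes CollisionDemo and Key are hypothetical; every Key returns the same hashCode, and HashMap still keeps the entries apart by falling back on equals().

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // Hypothetical key type: all instances share the hash code 1.
    static final class Key {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 1; // deliberate collision
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).name.equals(name);
        }
    }

    static Map<Key, String> build() {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("a"), "first");
        map.put(new Key("b"), "second");
        return map;
    }

    public static void main(String[] args) {
        Map<Key, String> map = build();
        System.out.println(map.size());            // 2: both entries survive
        System.out.println(map.get(new Key("a"))); // first
        System.out.println(map.get(new Key("b"))); // second
    }
}
```

Collisions cost some lookup speed, because colliding keys land in the same bucket, but they do not affect correctness.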

Summary

A hashCode may or may not be a memory address. It can even be the constant 1 or an auto-incrementing number. Which algorithm is used depends on the -XX:hashCode setting, so you can pick whichever one you want.

