The Java String Constant Pool

1. Initializing String instances

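There are two ways to initialize a String in Java:

One is to write a string literal, a sequence of characters enclosed in double quotes;
the other is to use the new keyword and construct a String instance like any ordinary object.

The former places the string in the constant pool and returns a reference to that pooled constant, while the latter creates a new object on the heap and returns a reference to it. The two references are therefore always different: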

public static void main(String... args) {
    String s1 = "abcd";
    String s2 = new String("abcd");
    System.out.println(s1 == s2);   // false
}

Constants in the constant pool can be shared, which saves both memory and the cost of creating objects (this is why the constant pool exists). For example:

public static void main(String... args) {
    String s1 = "abcd";
    String s2 = "abcd";
    System.out.println(s1 == s2);   // true
}

Together, these two facts also answer another common interview question:

public static void main(String... args) {
    String s = new String("abcd");
}

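How many objects does this statement create?

First, the literal "abcd" is itself an object and lives in the constant pool. Because the statement also uses the new keyword, the object assigned to s must be created on the heap. So the statement creates two objects in total.

The tricky part is that if the literal "abcd" has already appeared somewhere else before this function is called, it already exists in the constant pool. In that case the statement creates only one object: the new String("abcd") instance on the heap.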
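The object counting above can be checked with a small sketch. The class name NewStringIdentity and the helper sameObject are made up for illustration; the sketch only assumes the standard behavior that every new expression yields a distinct object while equal literals share one pooled object.

```java
class NewStringIdentity {
    // Hypothetical helper: true when both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String... args) {
        String literal = "abcd";
        String first = new String("abcd");
        String second = new String("abcd");

        // Each new expression creates its own heap object.
        System.out.println(sameObject(first, second));    // false
        // The heap object is distinct from the pooled literal.
        System.out.println(sameObject(literal, first));   // false
        // The contents are still equal.
        System.out.println(first.equals(second));         // true
        // Every occurrence of the literal refers to the same pooled object.
        System.out.println(sameObject(literal, "abcd"));  // true
    }
}
```

Both new String("abcd") calls reuse the one pooled literal as their argument, so only the first occurrence of the literal can add an object to the pool.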

String memory allocation is not quite that simple, though. String concatenation needs a closer look.

2. String concatenation

Consider the following:

public static void main(String... args) {
    String s1 = "hell" + "o";
    String s2 = "h" + "ello";
    System.out.println(s1 == s2);   // true
    System.out.println(s1 == "hello");  // true
    System.out.println(s2 == "hello");  // true
    System.out.println("hello" == "hello"); // true
    
    // ------------------------
    
    String c1 = "ab";
    String c2 = c1 + "c";
    System.out.println(c2 == "abc");  // false
    
}

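The first four results are easy to understand: every operand ends up referring to the same constant "hello" in the constant pool. The last one is the surprise. The value of c2 is also "abc", and it is built from a variable that points to a pooled constant, yet the comparison yields false.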

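The reason is that Java string concatenation follows two rules:

If every operand is a string literal in double quotes, the compiler folds the expression into a single constant that lives in the constant pool; if the pool already holds that value, the existing constant is reused.
If any operand is not a string literal, which is the case whenever a String variable is involved, the concatenation is performed at run time and produces a new object on the heap.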
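The two rules can be seen side by side in a minimal sketch. The class ConcatRules and the method appendC are hypothetical names; the method exists only so that one operand is a parameter, which the compiler cannot treat as a constant.

```java
class ConcatRules {
    // Hypothetical helper: the parameter is not a compile-time constant,
    // so this concatenation happens at run time.
    static String appendC(String prefix) {
        return prefix + "c";
    }

    public static void main(String... args) {
        String folded = "ab" + "c";      // folded by the compiler into "abc"
        String runtime = appendC("ab");  // built at run time on the heap

        System.out.println(folded == "abc");        // true
        System.out.println(runtime == "abc");       // false
        System.out.println(runtime.equals("abc"));  // true
    }
}
```

The contents are equal in both cases; only the identity of the resulting object differs, which is why equals() rather than == is the right tool for comparing string values.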

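Within these general rules, String variables declared final need extra consideration:

If a final String variable is assigned a constant value in its declaration, the compiler treats it as a compile-time constant, so it behaves exactly like a literal from the constant pool.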

public class ConstantPool {
    public static final String s1 = "ab";
    public static final String s2 = "cd";
    
    public static void main(String... args) {
        String s = s1 + s2;
        String ss = "abcd";
        
        System.out.println(s == ss);  // true
    }
}

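If a final String is not assigned in its declaration, however, the compiler cannot determine its value in advance, so it still treats it as an ordinary variable.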

public class ConstantPool {
    public static final String s1;
    public static final String s2;
    
    static {
        s1 = "ab";
        s2 = "cd";
    }
    
    public static void main(String... args) {
        String s = s1 + s2;
        String ss = "abcd";
        
        System.out.println(s == ss);  // false
    }
}
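The same distinction applies to local variables, as the following sketch suggests. The class FinalLocals and the helper ab() are assumptions made for illustration: ab() returns the value through a method call, which is not a constant expression even though the variable receiving it is final.

```java
class FinalLocals {
    // Hypothetical helper: a method call is never a compile-time constant.
    static String ab() {
        return "ab";
    }

    static boolean constantFinal() {
        final String a = "ab";  // final and initialized with a literal
        String s = a + "cd";    // folded by the compiler into "abcd"
        return s == "abcd";
    }

    static boolean nonConstantFinal() {
        final String a = ab();  // final, but the value is only known at run time
        String s = a + "cd";    // concatenated at run time
        return s == "abcd";
    }

    static boolean nonFinal() {
        String a = "ab";        // not final, so not a constant
        String s = a + "cd";    // concatenated at run time
        return s == "abcd";
    }

    public static void main(String... args) {
        System.out.println(constantFinal());     // true
        System.out.println(nonConstantFinal());  // false
        System.out.println(nonFinal());          // false
    }
}
```

What matters is therefore not the final modifier alone but whether the compiler can see a constant value at the point of declaration.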

3. The intern() method

The String.intern() method adds a string to the constant pool at run time. It works as follows:

If the constant pool already contains a string equal to this one, intern() returns a reference to that pooled string.
If the constant pool contains no string equal to this one, the string is added to the pool, and intern() returns a reference to the newly pooled string.
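A short sketch of these two cases, assuming a hypothetical helper build() that assembles a string at run time so that each call returns a fresh heap object:

```java
class InternDemo {
    // Hypothetical helper: StringBuilder.toString() returns a new String each time.
    static String build(String left, String right) {
        return new StringBuilder(left).append(right).toString();
    }

    public static void main(String... args) {
        String a = build("ab", "cd");
        String b = build("ab", "cd");

        // Two separate heap objects with equal contents.
        System.out.println(a == b);                    // false
        // Equal strings always intern to the same pooled reference.
        System.out.println(a.intern() == b.intern());  // true
        // The literal "abcd" is that same pooled reference.
        System.out.println(a.intern() == "abcd");      // true
    }
}
```

Whichever of the two cases applies on the first call, every later intern() call on an equal string returns the same reference.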

Example:

public static void main(String... args) {
    String s1 = "abcd";
    String s2 = new String("abcd");

    /**
     * s2.intern() looks up the String constant pool for an entry
     * whose value equals that of s2.
     */
    String s3 = s2.intern();
    // s1 and s3 both come from the constant pool, so they are the same object.
    System.out.println(s1 == s3);
    // s2 comes from the heap but s3 comes from the constant pool, so they differ.
    System.out.println(s2 == s3);
}

/**
 * Output:
 *     true
 *     false
 */

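Going back to section 1, why is intern() needed at all? Although the literal "abcd" lives in the constant pool, every new String("abcd") creates another object with the value abcd on the heap. Imagine 100 such statements: they would create 100 objects with identical contents on the heap, wasting both time and space.

With intern(), the lookup goes straight to the constant pool for a String with the same value, so code that keeps only the interned reference shares a single object, which saves space and improves efficiency.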
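The 100-object scenario can be made concrete with a rough sketch. The class InternDedup and its methods copies() and distinctObjects() are hypothetical; the sketch counts objects by identity, using a set backed by IdentityHashMap, rather than by equals().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternDedup {
    // Counts distinct objects by identity (==), not by equals().
    static int distinctObjects(List<String> strings) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(strings);
        return identities.size();
    }

    // Creates n strings with the value "abcd", optionally interning each one.
    static List<String> copies(int n, boolean intern) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String s = new String("abcd");
            result.add(intern ? s.intern() : s);
        }
        return result;
    }

    public static void main(String... args) {
        System.out.println(distinctObjects(copies(100, false)));  // 100
        System.out.println(distinctObjects(copies(100, true)));   // 1
    }
}
```

In the interned case the 100 temporary heap objects are still allocated, but nothing keeps a reference to them, so they become eligible for garbage collection and only the pooled object stays in use.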


geekartt
