2

1 Introduction

The origin is an OOM of a certain customer's online environment, and it is said that this kind of strange OOM has been the second time. First describe the relevant information of OOM:

  1. Heap memory 13G;
  2. Use the G1 garbage collector;
  3. The dump file automatically generated after OOM is 7G;
  4. The general function of the application is to consume kafka data, and then send it back to kafka after some business processing [single message is said to be about 32M]

Customer doubts:

  1. The heap memory is 13G, and the dump file is 7G. Obviously, the memory is enough. Why is it OOM?

Due to online environment problems, you cannot replay and debug at will, and only a memory dump file can be analyzed.

2. Online Dump file analysis

2.1 Problem point 1 - kafka's RecordAccumulator takes up a lot of memory

There is no screenshot here (forgot), brain supplement 😅. It was once suspected that the OOM caused by the local cache of kafka. In general RecordAccumulator and KafkaProducer is a one-to-one correspondence [ KafkaProducer created in the number of construction lines at each initialization RecordAccumulator ] . But the customer said that KafkaProducer exists in ThreadLocal and will not be created unlimitedly, that is, RecordAccumulator will not be created unlimitedly.

Keep the possibility of this problem first (after all RecordAccumulator the number of instances is nearly 140, which is not normal).

After a rough search DefaultKafkaProducerFactory to see if there is any mystery in this factory class. I really found something: the general situation is that in the callback method of sending a message, if an exception is thrown when sending a message❌, the related KafkaProducer instance will be closed (physically closed), and it will be recreated when needed later.

This can explain why KafkaProducer does not match the number of threads. But RecordAccumulator the problem of too many instances is still unexplainable, and it may be necessary to dig through the source code 😤.

About DefaultKafkaProducerFactory refer to this blog: https://blog.csdn.net/PacosonSWJTU/article/details/121306370

2.2 Problem 1 - OOM information in the thread stack

It is found that the business thread stack of kafka producer has OOM content:
image.png

2.3 Problem 2 - There is information that kafka message failed to send in the thread stack

  • A RecordTooLargeException error occurs in the kafka producer callback function

image.png

  • kafka producer starts to record error logs

image.png

2.4 Problem point 3 - RecordTooLargeException

image.png

After analyzing a circle of memory dumps, I gave myself a bold guess, whether the actual heap memory was enough, but some other condition was triggered, causing the display to be thrown OutOfMemoryError error.

With this doubt, I still focused on the thread stack, and after a closer look, I suddenly realized something. In the thread stack are the OOM errors thrown during the process of printing the error log. Moreover, in the process of constructing the log message, string concatenation is performed, and then the array copy caused by the expansion of StringBuilder is performed, and then OOM. So far, I have shifted my attention to StringBuilder .

3. Expansion of StringBuilder

With the above conjectures, I carefully read the source code and expansion mechanism of StringBuilder in append() .

In StringBuilder , there is a character array called value [declared in the parent class AbstractStringBuilder], which is used to store the string content (also in the String class).
image.png

StringBuilder.append() method will eventually call the java.lang.AbstractStringBuilder#append(java.lang.String) method of the parent class, the source code is as follows:

 public AbstractStringBuilder append(String str) {
        if (str == null)
            return appendNull();
        // 获取要追加的字符串长度
        int len = str.length();
        // 检查StringBuilder中字符数组的容量,以确保可以追加成功
        ensureCapacityInternal(count + len);
        str.getChars(0, len, value, count);
        count += len;
        return this;
    }

Next look at ensureCapacityInternal() method:
image.png

As you can see, if the length of the new string is greater than the length of the original string, an array copy will be performed. And note the note above: if the value overflows, OutOfMemoryError will be thrown. I feel that my conjecture is about to be confirmed, and I am excited.

Looking closely at newCapacity() method,

 /**
     * The maximum size of array to allocate (unless necessary).
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Returns a capacity at least as large as the given minimum capacity.
     * Returns the current capacity increased by the same amount + 2 if
     * that suffices.
     * Will not return a capacity greater than {@code MAX_ARRAY_SIZE}
     * unless the given minimum capacity is greater than that.
     *
     * @param  minCapacity the desired minimum capacity
     * @throws OutOfMemoryError if minCapacity is less than zero or
     *         greater than Integer.MAX_VALUE
     */
    private int newCapacity(int minCapacity) {
        // overflow-conscious code
        // 新数组的长度为原来数组长度的2倍➕2
        int newCapacity = (value.length << 1) + 2;
        // 如果追加新字符所需最少的数组长度 大于 newCapacity,则将minCapacity的值赋给newCapacity
        // 这种情况只出现在追加的字符串长度是原字符串长度的2倍多的多
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        // 如果newCapacity大于Integer.MAX_VALUE - 8,则走hugeCapacity的逻辑
        // 否则直接返回newCapacity
        return (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0)
            ? hugeCapacity(minCapacity)
            : newCapacity;
    }

The value of MAX_ARRAY_SIZE is Integer.MAX_VALUE - 8. And the comment gives a hint: this is the maximum length of the array (unless necessary), allocating large arrays may cause OOM.
image.png

Look again hugeCapacity() method,

 private int hugeCapacity(int minCapacity) {
        // 如果追加新字符所需最少的数组长度 大于Integer.MAX_VALUE,直接OOM
        if (Integer.MAX_VALUE - minCapacity < 0) { // overflow
            throw new OutOfMemoryError();
        }
        /**
        * 如果追加新字符所需最少的数组长度 大于Integer.MAX_VALUE - 8,
        * 则新数组长度是minCapacity,否则数组长度就是MAX_ARRAY_SIZE的值
        * 在minCapacity大于Integer.MAX_VALUE一半的时候的最近一次扩容,
        * 新数组长度会直接使用MAX_ARRAY_SIZE
        **/
        return (minCapacity > MAX_ARRAY_SIZE)
            ? minCapacity : MAX_ARRAY_SIZE;
    }
  • The key points are summarized in one sentence: if the string length is greater than Integer.MAX_VALUE it will directly OOM .

Here comes the question again:

  1. The 32M message, no matter how the expansion, will not exceed the value of Integer.MAX_VALUE ;
  2. Look at the demo below, this Nima is too different from Integer.MAX_VALUE ;
    image.png

Don't panic, let's continue (hand cranking the wheelchair on the road) 🦽.

Let’s talk about the conclusion first, this is actually related to the currently available heap memory .

4. Array object allocation

  • Put a set of data first: Integer.MAX_VALUE A character array of size takes about 4G heap memory (2147483646*2/1024/1024/1024).

Let's take a look at the two experiments I did locally, and review the GC log by the way.

  1. Startup parameters (GC is not configured, jdk8 defaults to PS garbage collection): -Xmx8g -Xms8g -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log

image.png

✨GC logs are as follows:

 Java HotSpot(TM) 64-Bit Server VM (25.281-b09) for bsd-amd64 JRE (1.8.0_281-b09), built on Dec  9 2020 12:44:49 by "java_re" with gcc 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
Memory: 4k page, physical 33554432k(1613060k free)

/proc/meminfo:

CommandLine flags: -XX:InitialHeapSize=8589934592 -XX:MaxHeapSize=8589934592 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
{Heap before GC invocations=1 (full 0):
 PSYoungGen      total 2446848K, used 1436457K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 68% used [0x0000000715580000,0x000000076d04a5f8,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 3538944K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698000020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.302: [GC (Allocation Failure) [PSYoungGen: 1436457K->1444K(2446848K)] 4975401K->3540396K(8039424K), 0.0133225 secs] [Times: user=0.16 sys=0.00, real=0.01 secs] 
Heap after GC invocations=1 (full 0):
 PSYoungGen      total 2446848K, used 1444K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x00000007957691e0,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 3538952K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698002020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=2 (full 0):
 PSYoungGen      total 2446848K, used 1444K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x00000007957691e0,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 3538952K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698002020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.316: [GC (Allocation Failure) [PSYoungGen: 1444K->1491K(2446848K)] 3540396K->3540451K(8039424K), 0.0185590 secs] [Times: user=0.23 sys=0.00, real=0.02 secs] 
Heap after GC invocations=2 (full 0):
 PSYoungGen      total 2446848K, used 1491K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aac74db8,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 3538960K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698004020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=3 (full 1):
 PSYoungGen      total 2446848K, used 1491K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aac74db8,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 3538960K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698004020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.334: [Full GC (Allocation Failure) [PSYoungGen: 1491K->0K(2446848K)] [ParOldGen: 3538960K->2360557K(5592576K)] 3540451K->2360557K(8039424K), [Metaspace: 3918K->3918K(1056768K)], 0.2781291 secs] [Times: user=2.89 sys=0.16, real=0.28 secs] 
Heap after GC invocations=3 (full 1):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=4 (full 1):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.613: [GC (Allocation Failure) [PSYoungGen: 0K->0K(2446848K)] 2360557K->2360557K(8039424K), 0.0177683 secs] [Times: user=0.22 sys=0.00, real=0.01 secs] 
Heap after GC invocations=4 (full 1):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=5 (full 2):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.630: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(2446848K)] [ParOldGen: 2360557K->2360504K(5592576K)] 2360557K->2360504K(8039424K), [Metaspace: 3918K->3918K(1056768K)], 0.0095262 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
Heap after GC invocations=5 (full 2):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360504K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065012e1d8,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

Heap
 PSYoungGen      total 2446848K, used 52441K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 2% used [0x0000000715580000,0x00000007188b67e8,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360504K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065012e1d8,0x0000000715580000)
 Metaspace       used 3926K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 430K, capacity 460K, committed 512K, reserved 1048576K

It can be seen that after the last Full GC, the character array was moved to the old age. At this time, the old age occupies 2360504K, which is exactly the size of the character array with a length of 1207959551. At this time, if you want to expand the capacity again, the new array length is Integer.MAX_VALUE - 8 , that is, an array of nearly 4G needs to be re-allocated in the heap memory. However, whether it is the young generation or the old generation, the 4G size cannot be accommodated, so Also OOM.

OOM information: OutOfMemoryError: Java heap space , which is different from the OOM of 16g heap memory below, which means that there is a problem of insufficient memory when the object actually allocates memory.
image.png
  1. Startup parameters (GC is not configured, jdk8 default PS garbage collection): -Xmx16g -Xms16g -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log
    [eden: 5588992K ParOldGen: 11185152K]

image.png

✨GC logs are as follows:

 Java HotSpot(TM) 64-Bit Server VM (25.281-b09) for bsd-amd64 JRE (1.8.0_281-b09), built on Dec  9 2020 12:44:49 by "java_re" with gcc 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
Memory: 4k page, physical 33554432k(942232k free)

/proc/meminfo:

CommandLine flags: -XX:InitialHeapSize=17179869184 -XX:MaxHeapSize=17179869184 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
{Heap before GC invocations=1 (full 0):
 PSYoungGen      total 4893184K, used 2788885K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 66% used [0x000000066ab00000,0x0000000714e85418,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 10747904K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 96% used [0x00000003c0000000,0x0000000650000010,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
18.887: [GC (Allocation Failure) [PSYoungGen: 2788885K->1488K(4893184K)] 13536789K->10749400K(16078336K), 0.0556800 secs] [Times: user=0.39 sys=0.21, real=0.05 secs] 
Heap after GC invocations=1 (full 0):
 PSYoungGen      total 4893184K, used 1488K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ac74010,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 10747912K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 96% used [0x00000003c0000000,0x0000000650002010,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=2 (full 1):
 PSYoungGen      total 4893184K, used 1488K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ac74010,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 10747912K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 96% used [0x00000003c0000000,0x0000000650002010,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
18.943: [Full GC (Ergonomics) [PSYoungGen: 1488K->0K(4893184K)] [ParOldGen: 10747912K->4195565K(11185152K)] 10749400K->4195565K(16078336K), [Metaspace: 3914K->3914K(1056768K)], 0.9751963 secs] [Times: user=2.33 sys=4.18, real=0.98 secs] 
Heap after GC invocations=2 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 4195565K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c013b720,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=3 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
20.945: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389869K->8389869K(16078336K), 0.0296937 secs] [Times: user=0.38 sys=0.00, real=0.03 secs] 
Heap after GC invocations=3 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=4 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
20.975: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389869K->8389869K(16078336K), 0.0293211 secs] [Times: user=0.37 sys=0.00, real=0.04 secs] 
Heap after GC invocations=4 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=5 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.019: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] [ParOldGen: 8389869K->4195332K(11185152K)] 8389869K->4195332K(16078336K), [Metaspace: 3916K->3916K(1056768K)], 0.4611617 secs] [Times: user=5.40 sys=0.04, real=0.46 secs] 
Heap after GC invocations=5 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 4195332K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0101090,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=6 (full 2):
 PSYoungGen      total 4893184K, used 83886K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 2% used [0x000000066ab00000,0x000000066fceb868,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.963: [GC (Allocation Failure) [PSYoungGen: 83886K->0K(4893184K)] 8473522K->8389636K(16078336K), 0.0356100 secs] [Times: user=0.46 sys=0.00, real=0.04 secs] 
Heap after GC invocations=6 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=7 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.999: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389636K->8389636K(16078336K), 0.0234444 secs] [Times: user=0.30 sys=0.00, real=0.02 secs] 
Heap after GC invocations=7 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=8 (full 3):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
22.023: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] [ParOldGen: 8389636K->4195323K(11185152K)] 8389636K->4195323K(16078336K), [Metaspace: 3916K->3916K(1056768K)], 0.4557171 secs] [Times: user=5.56 sys=0.04, real=0.46 secs] 
Heap after GC invocations=8 (full 3):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 4195323K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c00fedf8,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=9 (full 3):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
22.976: [GC (Allocation Failure) [PSYoungGen: 0K->0K(5590528K)] 8389627K->8389627K(16775680K), 0.0401370 secs] [Times: user=0.52 sys=0.00, real=0.04 secs] 
Heap after GC invocations=9 (full 3):
 PSYoungGen      total 5590528K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 0% used [0x000000066ab00000,0x000000066ab00000,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=10 (full 3):
 PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
25.776: [GC (Allocation Failure) --[PSYoungGen: 4194304K->4194304K(5590528K)] 12583931K->12583931K(16775680K), 0.0263122 secs] [Times: user=0.32 sys=0.00, real=0.02 secs] 
Heap after GC invocations=10 (full 3):
 PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=11 (full 4):
 PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
25.802: [Full GC (Ergonomics) [PSYoungGen: 4194304K->0K(5590528K)] [ParOldGen: 8389627K->4195328K(11185152K)] 12583931K->4195328K(16775680K), [Metaspace: 3916K->3916K(1056768K)], 0.8243356 secs] [Times: user=1.74 sys=1.38, real=0.83 secs] 
Heap after GC invocations=11 (full 4):
 PSYoungGen      total 5590528K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 0% used [0x000000066ab00000,0x000000066ab00000,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 4195328K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0100008,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

Heap
 PSYoungGen      total 5590528K, used 4343344K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 77% used [0x000000066ab00000,0x0000000773c8c0c0,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 4195328K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0100008,0x000000066ab00000)
 Metaspace       used 3924K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 430K, capacity 460K, committed 512K, reserved 1048576K

It can be seen that the old generation occupies 4195328K, which is the size of the character array before expansion, and the young generation eden area is 4194304K, which is the size of the new array with a length of 2147483646 after expansion. Therefore, after several times of Full GC finishing on the 16G heap, the top grid expansion of StringBuilder can finally be achieved.

It should be noted that ⚠️: There is only one StringBuilder in the demo, and there are bound to be many other objects in the actual environment, so they basically do not exist
StringBuilder The case of top grid expansion. More often, when StringBuilder expands, it may be accompanied by the allocation of other objects, resulting in
Arrays#copyOf method throws OOM.

OOM information: Requested array size exceeds VM limit , this is actually an OOM exception thrown before the actual memory allocation:
image.png

Description Before actually allocating memory, the check found that the length of the array was out of bounds.
🚗About array out of bounds: I was lazy and debugged the hotspot source code directly. The maximum length of the array under 64-bit jdk8 is Integer.MAX_VALUE-2, which is 2147483645 [operating systems with different digits, different array element types, and whether to enable pointer compression, etc. , will result in a change in the maximum available length of the array (max_array_length).

in conclusion:

  1. StringBuilder During expansion, the character array will be copied. When the length of the array is less than half of Integer.MAX_VALUE , a new array will be created for copying, and the new array length is the same as the original array. 2 times ➕2 [memory occupation at this time: original array size ➕ 2* original array size];
  2. If the array is more than half of Integer.MAX_VALUE the latest expansion, the new array length directly uses MAX_ARRAY_SIZE (Integer.MAX_VALUE - 8);
  3. In the case of the default ratio of 16G heap memory, the character array can be successfully expanded to the length of Integer.MAX_VALUE (⚠️ Note that the demo here is only StringBuilder object, there must be many other objects in the actual situation object, so there are few StringBuilders in the actual environment that can achieve top-level expansion);
  4. In the process of expansion, if the size of the new array cannot be allocated in the eden area, a Young GC will be performed. If it still cannot be allocated, it will be allocated directly in the old age; if the old age cannot be allocated, a Full GC will be performed. Full GC], and then OOM is still not allocated (the place thrown is when java.util.Arrays#copyOf(char[], int) new char[newLength] creates a new array);

5. Final Analysis

Through the above experiments and analysis, the following conclusions can be drawn about the OOM problem on customer online:

  1. OOM is thrown when printing the error log build StringBuilder ;
  2. The expansion of StringBuilder did not reach the size of Integer.MAX_VALUE, because java.util.Arrays#copyOf(char[], int) throws OOM, the old original array memory size is only 73728K (the new array size is 147456K when expanding) , far less than the size of Integer.MAX_VALUE.
  3. If the message is too large, the kafka message fails to be sent, which will cause the KafkaProducer to be closed and then recreated [Logic of DefaultKafkaProducerFactory]. Continue to retain doubts about the excessive number of RecordAccumulator instances, and need to continue in-depth analysis through the source code;
  4. The reason for OOM is that, along with the memory occupation of other objects in the heap memory, it is caused by the inability to allocate new array objects when StringBuilder expands the array copy [need to combine GC log in-depth analysis];

开翻挖掘机
225 声望26 粉丝

不忘初心❤️,且行且思考