Out of Memory (OOM) occurs when an application holds on to memory it can never reclaim, or simply consumes too much of it, until the memory the running program needs exceeds the maximum the system can provide. At that point the program can no longer run: the system reports a memory overflow, and sometimes the application shuts down on its own; restarting the machine or freeing part of the memory may let it run normally again for a while. When the overflow is rooted in system configuration, data volume, or user code, the error cannot be avoided simply by re-running the task.
In the JVM, OOM exceptions can occur in several areas: Java heap overflow, virtual machine stack and native method stack overflow, method area and runtime constant pool overflow, and direct (native) memory overflow. Each of these has its own causes.
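As a minimal illustration of the most common case, the sketch below fills the Java heap until the JVM throws `java.lang.OutOfMemoryError: Java heap space`. The class name and heap size are purely illustrative and not taken from any of the cases below; run it with a small heap such as `-Xmx32m` to see the error quickly.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal heap-overflow demo: run with e.g. -Xmx32m -XX:+HeapDumpOnOutOfMemoryError
// so the JVM writes a heap dump you can inspect offline.
public class HeapOomDemo {
    static class Block {
        private final byte[] payload = new byte[1024 * 1024]; // 1 MB per object
    }

    public static void main(String[] args) {
        List<Block> retained = new ArrayList<>();
        while (true) {
            // Keeping every object reachable prevents GC from freeing anything,
            // so the heap eventually overflows with java.lang.OutOfMemoryError.
            retained.add(new Block());
        }
    }
}
```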
Real business scenarios are usually more complicated than textbook examples. In this post, Dui Dui walks you through several real OOM troubleshooting cases, as recorded by their original authors, to help you avoid the same pitfalls and to review the relevant knowledge along the way.
1. The troubleshooting and resolution process of an online 100% CPU and application OOM incident
Author: A Feiyun
https://heapdump.cn/article/1665572
Overview:
After receiving the application exception alarm, the author logged in to the problematic server to investigate. The service log showed the service had thrown an OOM, so he used the top command to check the resource usage of each process and found one process whose CPU usage had reached 300%. He then queried the CPU usage of every thread in that process and saved the stack data. Having collected the GC information, thread stacks, heap snapshot and other data from the affected service, he analyzed them with XElephant, a tool provided by the HeapDump community, and found that the OOM was caused by InMemoryReporterMetrics. Digging further, he discovered that the service was running an old version of zipkin, and upgrading it solved the problem.
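For reference, a typical command sequence for this kind of investigation might look like the sketch below. The PID and thread id are placeholders, and the exact commands the author ran may differ.

```bash
top                                         # find the process with abnormal CPU, note its PID
top -Hp <pid>                               # list per-thread CPU usage inside that process
printf '%x\n' <tid>                         # convert the busy thread id to hex
jstack <pid> > threads.txt                  # capture thread stacks, search for the hex thread id
jstat -gcutil <pid> 1000 10                 # sample GC behaviour every second, 10 times
jmap -dump:format=b,file=heap.hprof <pid>   # take a heap snapshot for offline analysis
```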
Highlights: Although the problem described and solved here is not a rare, intractable one, the reasoning is clear and the process is complete, and the article also recommends a troubleshooting tool, so it is well suited for beginners to read and learn from.
2. An exploration of an OOM problem in a containerized Spring Boot program
Author: Xia Meng
https://heapdump.cn/article/1588447
Overview: The author was told that a containerized Java program would hit OOM after running for a while. He first checked the logs and found nothing abnormal; he then examined GC behavior with jstat and found GC itself was normal, but ByteBuffer objects occupied the most memory (anomaly 1); next, a thread snapshot taken with jstack showed that far too many Kafka producers had been created (anomaly 2); finally he verified the conjecture with a demo program and confirmed that the problem was business code creating Producer objects in a loop.
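The underlying anti-pattern is easy to reproduce. The sketch below (hypothetical class, topic and broker names, assuming the standard kafka-clients API) contrasts per-message producer creation with reusing a single instance; each KafkaProducer allocates its own internal buffers, which is consistent with the ByteBuffer growth observed in the article.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerReuseDemo {
    private static Properties props() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }

    // Anti-pattern: a new KafkaProducer per message allocates fresh internal
    // buffers and background threads every time; without close() they pile up.
    static void sendLeaky(String msg) {
        KafkaProducer<String, String> producer = new KafkaProducer<>(props());
        producer.send(new ProducerRecord<>("demo-topic", msg));
        // producer.close() is missing here, mirroring the leak in the case study
    }

    // Fix: create one producer and reuse it; KafkaProducer is thread-safe.
    private static final KafkaProducer<String, String> SHARED = new KafkaProducer<>(props());

    static void sendShared(String msg) {
        SHARED.send(new ProducerRecord<>("demo-topic", msg));
    }
}
```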
Highlights: The investigation is clear and well organized, the tools are used skillfully, and the verification is fast and accurate.
3. Troubleshooting and analysis of an Nginx OOM during a million-long-connection stress test
Author: Master Zhang Digging
https://heapdump.cn/article/433792
Overview:
In a stress test with millions of long-lived connections, the author found that four Nginx instances (32C 128G) frequently ran OOM. He first checked the network connection state between Nginx and the clients, suspecting that the JMeter clients could not keep up and messages were piling up at the intermediate Nginx. He then dumped Nginx's memory and confirmed that the growth was caused by a large number of cached messages. Checking Nginx's configuration, he found that proxy_buffers was set to an extremely large value, and he then simulated how a mismatch between upstream and downstream send/receive speeds affects Nginx's memory usage. Finally, after setting proxy_buffering to off and reducing proxy_buffer_size, Nginx's memory stabilized.
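The directives involved look roughly like the snippet below; the sizes shown are illustrative defaults, not the values from the article.

```nginx
# Inside the relevant server/location block of nginx.conf (values are illustrative).
proxy_buffering off;     # stop Nginx from buffering the upstream response body in memory
proxy_buffer_size 4k;    # buffer used for the first part of the response (kept small)
# proxy_buffers 8 4k;    # when buffering is on, this controls per-connection buffer memory
```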
Highlights: The author's line of investigation is clear, the use of tools and the tuning of parameters are skillful, and he has a deep understanding of the underlying principles and source code. Both the experience and the attitude are worth learning from.