This article was first published on Juejin: https://juejin.cn/user/1468603264665335
Version | Date | Remark
---|---|---
1.0 | 2022.7.4 | First published
1.1 | 2022.7.4 | Revised content and title based on feedback
0. Preface
A while ago I came across Dr. Zhou Zhiming's talk on Bilibili, titled "Java in the Cloud Native Era", which covers the challenges Java faces in the cloud-native era and the Java community's responses. I first saw this talk two years ago and was impressed by it. Now I want to revisit the progress of the related projects, and some of the details, with you. This note draws heavily on the content of the talk; if you have already watched it, feel free to skip this note.
The talk identifies two root causes of the friction between Java and cloud native:
The first is Java's "Write Once, Run Anywhere". It was an excellent practice at the time and directly kicked off the prosperity of many managed languages. In the cloud-native era, however, everyone chooses to isolate environments through the immutable infrastructure that containers implement. Admittedly, the "Build Once, Run Anywhere" of containers and the "Write Once, Run Anywhere" of Java are not on the same level: containers provide only environment compatibility and limited platform independence (meaning ABI compatibility above the kernel's system-call layer). But since server-side applications almost all run on Linux, this does no harm to the business.
The second is that Java was largely designed for long-running, monolithic "tower-style" server-side applications:
- Its statically typed, dynamically linked language structure favors collaborative development by large teams, letting software grow to a larger scale;
- Java's most representative technical features, such as the just-in-time compiler, profile-guided optimization, and the garbage collection subsystem, all exist so that long-running programs can enjoy the dividends of growing hardware scale.
In the microservices era, however, the advice is to build services around business capabilities (different languages suit different business scenarios) rather than around technology, and to stop pursuing uniformity of implementation: a system composed of services written in different languages and different frameworks is entirely reasonable. After services are split, a single microservice probably no longer needs to handle tens or hundreds of GB, or even TB, of memory. With a highly available service cluster, there is no need for any single instance to run 24/7 without interruption; instances can be stopped and updated at any time. On top of that, microservices impose new requirements on image size, memory footprint, startup speed, and time to peak performance. In the past two years the buzzword Serverless (and the derived FaaS) has pushed these factors even harder, and they happen to be exactly Java's weaknesses: even a tiny Java program must carry a heavy runtime (the VM and the standard library). Under the JVM's execution model, every Java program has a fixed memory overhead and startup cost, and the dependency injection widely used in the Java ecosystem stretches startup time further, making container cold starts hard to shorten.
Two examples. The software industry has seen more than one failure caused by these weaknesses of Java. Logstash, written in JRuby, was originally responsible both for the collector (Shipper) deployed on each node and for the server (Master) that does the transformation work. Because of its resource footprint, Elastic.co later replaced the Shipper part with Filebeat, written in Go. Another example is Linkerd, a sidecar proxy written in Scala. Despite being the originator of the service mesh concept, it was eventually displaced by Envoy, and one of its main weaknesses was precisely the resource consumption of the Java virtual machine.
1. The fire of change
1.1 Compile Native Code
Obviously, most of these problems would be solved if bytecode could be compiled directly into native code that runs without a Java virtual machine.
If we could generate native programs that run without the JVM, the long startup time would disappear, because there would be no virtual machine initialization or class loading. The program could also reach peak performance immediately, because there would be no runtime compilation by the just-in-time compiler: all code would be compiled and optimized ahead of time. Likewise, the heavy runtime would no longer appear in the image.
It is not that Java has never tried to go down this road. From GCJ, to Excelsior JET, to the SubstrateVM module in GraalVM, to Project Leyden launched in mid-2020, all aim at Ahead-of-Time (AOT) compilation to native programs. The biggest obstacle to AOT compilation in Java is that it is a dynamically linked language: it assumes the program's code space is open (open world), allowing new classes to be loaded through class loaders at any time and to become part of the running program. To compile ahead of time, this dynamism must be given up, assuming instead that the code space is closed (closed world): all code to be run must be known at compile time.
This affects more than the normal operation of class loaders. Beyond losing dynamic loading, every feature that generates new code at runtime is gone: reflection (which can invoke methods unknown at compile time), dynamic proxies, bytecode generation libraries (such as CGLib), and so on. If these basic capabilities were simply removed, Hello World would still run, but most productivity frameworks would not, and much of the superstructure of the Java ecosystem would come crashing down. Just two cases: Flink's SQL API parses SQL into an execution plan, dynamically generating classes via JavaCC and loading them into the code space; Spring has a similar situation, since AOP generates the relevant logic through dynamic proxies, which in essence means generating and loading code at runtime.
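As a concrete illustration (the class and method names here are made up for the example), the snippet below does exactly what a closed-world AOT compiler cannot see on its own: the call target only becomes known at runtime. Under GraalVM's native-image, such a call works only if the target is registered in reflection metadata at build time.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {
    public static String greet() { return "hello"; }

    public static void main(String[] args) throws Exception {
        // The class name arrives at runtime (imagine it came from a config
        // file), so an ahead-of-time compiler cannot see the call target
        // when it builds the closed-world image.
        String className = "ReflectionDemo";
        Class<?> cls = Class.forName(className);
        Method m = cls.getMethod("greet");
        System.out.println(m.invoke(null)); // prints "hello" on a normal JVM
    }
}
```

On a regular JVM this just works; in a native image it throws unless `ReflectionDemo.greet` was declared reachable ahead of time, which is why frameworks must cooperate with the compiler.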
To obtain a practically useful ahead-of-time compilation capability, the AOT compiler, the component libraries, and the developers must all cooperate. Quarkus is a good reference.
Quarkus follows exactly this approach. Take dependency injection as an example: since all code to be run must be known at compile time, the relevant beans are resolved at build time, and the result is then handed to GraalVM to compile and run.
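As a rough sketch of the idea (not Quarkus's actual generated code; all names here are invented), build-time dependency injection boils down to replacing runtime classpath scanning and reflection with plain wiring code emitted at compile time, which an AOT compiler can then see in full:

```java
// Hand-written stand-in for what a build-time DI framework generates:
// the whole object graph is ordinary static code, with no scanning or
// reflection left for the runtime (or the AOT compiler) to worry about.
public class WiringDemo {
    static class Repo {
        String find() { return "data"; }
    }

    static class Service {
        final Repo repo;
        Service(Repo repo) { this.repo = repo; }
        String handle() { return repo.find().toUpperCase(); }
    }

    // "Generated" at build time: bean resolution already happened,
    // so this is just constructor calls in dependency order.
    static Service createService() {
        return new Service(new Repo());
    }

    public static void main(String[] args) {
        System.out.println(createService().handle()); // prints "DATA"
    }
}
```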
1.2 Memory Access Efficiency Improvement
The Java just-in-time compiler is excellent at optimization, but because of Java's "everything is an object" premise, it suffers from poor memory access performance when dealing with large numbers of small objects of various types. This is an important constraint that has kept Java from succeeding in fields such as games and graphics processing, and it is the original motivation for Project Valhalla.
Here is an example. Suppose I want to describe a collection of line segments in space; in Java the definition would look like this:
```java
public record Point(float x, float y, float z) {}
public record Line(Point start, Point end) {}

Line[] lines;
```
In the object-oriented memory layout, the purpose of object identity (Object Identity) is to let an object's properties and behaviors be referenced without exposing its structure, which is the basis of polymorphism in object-oriented programming. In Java, heap allocation and collection, null checks, reference comparison, synchronization locks, and so on all involve object identity, and memory access follows chains of object identities. Taking the "collection of line segments" above as an example, it forms the reference relationships shown in the following figure in heap memory:
After another 25 years of computer hardware development, although both memory and processors have kept improving, the von Neumann bottleneck between memory latency and processor execution speed has not shrunk but continued to widen. "RAM is the new disk" has gradually turned from a joke into reality.
A memory access (loading data from main memory into the processor cache) takes on the order of hundreds of clock cycles, while most simple instructions execute in a single cycle. For program performance, therefore, a compiler that eliminates one memory access may gain more than one that optimizes away dozens or hundreds of other instructions.
Additional knowledge: the von Neumann bottleneck. The memory latency seen by processors (modern processors integrate the memory controller; it used to live in the northbridge chip) is roughly 40-80 nanoseconds, while, depending on clock frequency, one clock cycle is about 0.2-0.4 nanoseconds. In such a short span, even light in a vacuum travels only about 10 centimeters.
The mismatch between data storage and processor execution speed is one of the main limitations of the von Neumann architecture. In 1977, Turing Award winner John Backus coined the term "von Neumann bottleneck" to describe it.
The Java compiler does work hard to reduce memory accesses. Since JDK 6, HotSpot's just-in-time compiler has attempted scalar replacement and stack allocation optimizations based on escape analysis. The basic idea: if analysis can prove that an object is never passed outside its method, there is no need to create the full object layout on the heap. The object identity can be bypassed and the object split into its primitive fields, which can even be allocated directly in stack memory (HotSpot does not literally do stack allocation, only scalar replacement) and destroyed along with the stack frame when the method returns.
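A minimal sketch of a scalar-replacement candidate: the Point below never escapes the method, so the JIT may elide the allocation entirely and keep the three floats in registers. Whether it actually does depends on inlining and the JIT's analysis; the code is just an illustration of the non-escaping pattern.

```java
public class EscapeDemo {
    record Point(float x, float y, float z) {}

    // The Point never leaves this method: it is not stored in a field,
    // not returned, and not passed to non-inlined code. After escape
    // analysis, the JIT can replace it with its three scalar fields,
    // skipping the heap allocation and the object header entirely.
    static float distanceFromOrigin(float x, float y, float z) {
        Point p = new Point(x, y, z);
        return (float) Math.sqrt(p.x() * p.x() + p.y() * p.y() + p.z() * p.z());
    }

    public static void main(String[] args) {
        System.out.println(distanceFromOrigin(3, 4, 0)); // prints 5.0
    }
}
```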
Escape analysis, however, is an interprocedural optimization: it is expensive, struggles with cases that are theoretically possible but rare in practice, and it happens at runtime. C and C++ do not have this problem: in the scenario above, the programmer simply defines both Point and Line as structs. C# likewise has struct, built on .NET's value types. These languages settle the question at compile time.
Valhalla's goal is to provide similar value type support: a new keyword (inline) lets the user mark a class as a value type when the object does not need to be exposed by reference outside a method, does not need polymorphism, and does not need to serve as a synchronization lock. The compiler can then bypass the object identity and lay the object out in memory in a flat, compact way.
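Until Valhalla lands, the flattening can be done by hand. The sketch below computes the same total segment length twice: once over the pointer-chasing record layout from the earlier example, and once over a manually flattened primitive array, which is roughly the layout a value type would give automatically.

```java
public class LayoutDemo {
    record Point(float x, float y, float z) {}
    record Line(Point start, Point end) {}

    // Pointer-chasing layout: each Line holds references to two Points,
    // so every field read may be a separate memory access.
    static double totalLength(Line[] lines) {
        double sum = 0;
        for (Line l : lines) {
            double dx = l.end().x() - l.start().x();
            double dy = l.end().y() - l.start().y();
            double dz = l.end().z() - l.start().z();
            sum += Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return sum;
    }

    // Manually flattened layout: six floats per line,
    // [x1, y1, z1, x2, y2, z2], contiguous and cache-friendly.
    static double totalLengthFlat(float[] coords) {
        double sum = 0;
        for (int i = 0; i < coords.length; i += 6) {
            double dx = coords[i + 3] - coords[i];
            double dy = coords[i + 4] - coords[i + 1];
            double dz = coords[i + 5] - coords[i + 2];
            sum += Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return sum;
    }

    public static void main(String[] args) {
        Line[] lines = { new Line(new Point(0, 0, 0), new Point(3, 4, 0)) };
        float[] flat = { 0, 0, 0, 3, 4, 0 };
        System.out.println(totalLength(lines));     // 5.0
        System.out.println(totalLengthFlat(flat));  // 5.0
    }
}
```

Both versions give the same answer; the difference is purely in memory layout, and Valhalla's point is that the programmer should not have to give up the typed `Line`/`Point` abstraction to get the flat layout.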
Valhalla is still in the preview stage; its progress can be followed here. Hopefully it will be usable in the next LTS release.
1.3 Coroutines
The Java language abstracts a unified threading interface that hides the differences between operating systems' threads, which used to be a major advantage over other programming languages. Now, however, the same design has become a constraint.
Java's current mainstream thread model maps threads 1:1 onto kernel threads. This suits compute-intensive tasks: the JVM does not need to do its own scheduling, and letting one thread occupy a whole processor core is a win. But for I/O-intensive tasks, such as disk or database access, this model is expensive, mainly in memory consumption and context-switch cost.
For example, HotSpot's default thread stack size on 64-bit Linux is 1 MB, and each thread's kernel metadata consumes another 2-16 KB, so the thread count of a single JVM is typically capped at only 200 to 400. When programmers pour millions of requests into such a thread pool, even if the system can cope, the switching overhead is considerable.
The goal of Project Loom is to give Java an additional N:M threading model, neither an outright replacement like the old transition from green threads to kernel threads, nor a user-selectable switch between two models like the HotSpot virtual machine once offered on Solaris.
What Loom builds is a stackful coroutine: many virtual threads can be multiplexed onto the same physical thread and scheduled in user space, and the stack capacity of each virtual thread can be decided by the user. Beyond that, two points are important:
- Stay compatible with all the original interfaces as far as possible. This means existing thread-based code can run on coroutines unchanged. I suspect this is very hard: if the code calls into a native method, the stack gets pinned to the underlying thread, since a coroutine is, after all, a user-space construct.
- Support structured concurrency: in short, asynchronous code written like synchronous code, something Go does very well. Nested callbacks are a real pain, after all.
Unpacking the points above, the work breaks down roughly into:
- coroutine scheduling;
- Synchronization, mutual exclusion and communication of coroutines;
- System call wrappers for coroutines, especially for network IO requests;
- Adaptation of the coroutine stack.
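On the API side, the compatibility goal means existing ExecutorService-style code runs unchanged on virtual threads. A minimal sketch (requires JDK 21, or JDK 19/20 with --enable-preview):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class LoomDemo {
    // Runs n blocking tasks, each on its own virtual thread,
    // and returns how many completed.
    static int runTasks(int n) {
        AtomicInteger done = new AtomicInteger();
        // Same ExecutorService interface as before, but each submitted
        // task gets a cheap virtual thread instead of a 1:1 kernel thread,
        // so tens of thousands of blocked tasks are fine.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(1); // blocking call: only parks the virtual thread
                    } catch (InterruptedException ignored) { }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000)); // 10000
    }
}
```

Nothing in the calling code mentions coroutines: the blocking `Thread.sleep` simply parks the virtual thread instead of a kernel thread, which is exactly the compatibility Loom is after.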
A bit of background: a mechanism in which each coroutine runs on its own dedicated auxiliary stack is called a stackful coroutine; a mechanism that runs coroutines on the main stack is called a stackless coroutine.
A stackless coroutine:
- While running: its activation record lives on the calling thread's stack
- When suspended: its activation record is saved on the heap
- Can call other functions
- Can only suspend at the top level, not inside nested functions or sub-coroutines
A stackful coroutine:
- While running: uses its own separate runtime stack
- Can suspend at any depth of the call stack
- Its lifetime can outlive its creator
- Can migrate from one thread to another
Therefore, a complete coroutine library is essentially the process-management part of an operating system, except that it lives in user space rather than kernel space.
The project can be followed here. You can try it out in JDK 19, where virtual threads ship as a preview feature.
2. Summary
At present, Java may not be a good choice in the cloud-native field: what this field tolerates least is precisely Java's huge runtime and long startup time. These used to be the source of Java's advantages, but in the cloud-native era they have become its obvious weakness. If Java wants to continue its momentum of the past decades, solving this problem is urgent. On this front, I am bullish on Quarkus.
The optimizations Valhalla brings apply to many scenarios, and long-running applications will also gain performance from them.
Coroutines target I/O-intensive scenarios, where NIO and AIO can already avoid much of the thread overhead, so Loom looks to me more like icing on the cake.