Exploring startup speed optimization for Java applications

Introduction: Behind Java's high performance, its poor startup performance is just as notable. Much of the impression that Java is bulky and slow comes from this. High performance and fast startup seem to be somewhat at odds. This article explores whether you can have both.

Author | Liang Xi

Can we have both high performance and fast startup?

As an object-oriented programming language, Java stands out for its performance.

The report "Energy Efficiency across Programming Languages: How Does Energy, Time, and Memory Relate?" investigates the execution efficiency of major programming languages. Although the range of scenarios it covers is limited, it still gives us a useful picture.

From the table, we can see that Java's execution efficiency is very high, about half that of C, the fastest language. Among mainstream programming languages, this is second only to C, Rust and C++.

Java's excellent performance comes from the excellent JIT compiler in HotSpot. Java's Server Compiler (C2) is the work of Dr. Cliff Click and uses the Sea-of-Nodes model. Time has shown that this technology represents the state of the art in the industry:

The famous TurboFan compiler of V8 (the JavaScript engine) uses the same design, implemented in a more modern way;
When HotSpot uses Graal (through JVMCI) as its JIT compiler, performance is basically the same as with C2;
Azul's commercial product replaces the C2 compiler in HotSpot with LLVM, and its peak performance is the same as C2.

Behind this high performance, Java's poor startup performance is just as notable, and much of the impression that Java is cumbersome and slow comes from it. High performance and fast startup seem to be somewhat at odds. This article explores whether you can have both.

The root causes of slow Java startup

1. Complex frameworks

Jakarta EE is the name J2EE took on after Oracle donated it to the Eclipse Foundation. The J2EE specification was first released in 1999. EJB (Enterprise JavaBeans) defines the security, IoC, AOP, transaction, and concurrency capabilities required for enterprise-level development. The design is extremely complex, and even the most basic application requires a large number of configuration files, which makes it very inconvenient to use.

With the rise of the Internet, EJB was gradually replaced by the more lightweight and freely available Spring framework, and Spring became the de facto standard for Java enterprise development. Although Spring is more lightweight, it is still deeply influenced by Jakarta EE at its core, for example in the heavy use of XML configuration in early versions and in its use of Jakarta EE-related annotations (such as JSR 330 dependency injection) and specifications (such as the JSR 340 Servlet API).

But Spring is still an enterprise-level framework. Let's look at a few design philosophies of Spring framework:

Options are provided at each layer, and Spring allows you to postpone selection as much as possible.
Adapting to different perspectives, Spring is flexible, and it will not force you to decide what to choose. It supports a wide range of application requirements from different perspectives.
Maintain strong backward compatibility.

Under the influence of this design philosophy, there is inevitably a large amount of configuration and initialization logic, as well as complex design patterns to support this flexibility. Let's look at an experiment:

We run a spring-boot-web hello world application and list the class files it depends on with -verbose:class:

$ java -verbose:class -jar myapp-1.0-SNAPSHOT.jar | grep spring | head -n 5
[Loaded org.springframework.boot.loader.Launcher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.ExecutableArchiveLauncher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.JarLauncher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.archive.Archive from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.LaunchedURLClassLoader from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
$ java -verbose:class -jar myapp-1.0-SNAPSHOT.jar | egrep '^\[Loaded' > classes
$ wc classes
    7404   29638 1175552 classes

The number of classes reached an astonishing 7404.
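The same number can also be read from inside a running JVM. The following is a minimal sketch, assuming the standard java.lang.management API; the class name LoadedClassCount is ours and purely illustrative:

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

class LoadedClassCount {
    /** Number of classes currently loaded in this JVM. */
    static int currentlyLoaded() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    public static void main(String[] args) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        // Classes loaded right now, and classes loaded since the JVM started.
        System.out.println("currently loaded: " + bean.getLoadedClassCount());
        System.out.println("total loaded:     " + bean.getTotalLoadedClassCount());
    }
}
```

Calling this at the end of application startup gives a figure comparable to the line count of the -verbose:class output, without having to parse logs.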

For comparison, let's look at the JavaScript ecosystem and write a basic application with the commonly used express:

const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`)
})

We use Node's NODE_DEBUG environment variable for the analysis:

$ NODE_DEBUG=module node app.js 2>&1  | head -n 5
MODULE 18614: looking for "/Users/yulei/tmp/myapp/app.js" in ["/Users/yulei/.node_modules","/Users/yulei/.node_libraries","/usr/local/Cellar/node/14.4.0/lib/node"]
MODULE 18614: load "/Users/yulei/tmp/myapp/app.js" for module "."
MODULE 18614: Module._load REQUEST express parent: .
MODULE 18614: looking for "express" in ["/Users/yulei/tmp/myapp/node_modules","/Users/yulei/tmp/node_modules","/Users/yulei/node_modules","/Users/node_modules","/node_modules","/Users/yulei/.node_modules","/Users/yulei/.node_libraries","/usr/local/Cellar/node/14.4.0/lib/node"]
MODULE 18614: load "/Users/yulei/tmp/myapp/node_modules/express/index.js" for module "/Users/yulei/tmp/myapp/node_modules/express/index.js"
$ NODE_DEBUG=module node app.js 2>&1  | grep ': load "' > js
$ wc js
      55     392    8192 js

Only 55 js files are loaded here.

Admittedly, comparing spring-boot with express is not entirely fair. In the Java world you can also build applications on more lightweight frameworks such as Vert.x and Netty, but in practice almost everyone chooses spring-boot without a second thought in order to enjoy the convenience of the Java open source ecosystem.

2. Compile once, run anywhere

Is Java startup slow only because of framework complexity? Framework complexity is just one of the reasons. With GraalVM's Native Image combined with spring-native, the startup time of spring-boot applications can be shortened by roughly a factor of ten.

Java's slogan is "Write once, run anywhere" (WORA), and Java achieves this through bytecode and virtual machine technology.

WORA lets developers quickly deploy applications developed and debugged on macOS to Linux servers. The cross-platform nature also makes the Maven Central repository easier to maintain, contributing to the prosperity of the Java open source ecosystem.

Let's take a look at the impact of WORA on Java:

Class Loading

Java uses classes to organize source code. Classes are packed into JAR packages, which are organized into modules and distributed. A JAR package is essentially a ZIP file:

$ jar tf slf4j-api-1.7.25.jar | head
META-INF/
META-INF/MANIFEST.MF
org/slf4j/
org/slf4j/event/EventConstants.class
org/slf4j/event/EventRecodingLogger.class
org/slf4j/event/Level.class

Each JAR package is a functionally independent module. Developers depend on the JARs that provide the functions they need, and the JVM learns about these JARs through the class path and loads them.

Under this model, executing a new or invokestatic bytecode triggers class loading. The JVM hands control to the ClassLoader. The most common implementation, URLClassLoader, traverses the JAR packages to find the corresponding class file:

for (int i = 0; (loader = getNextLoader(cache, i)) != null; i++) {
    Resource res = loader.getResource(name, check);
    if (res != null) {
        return res;
    }
}

Therefore, the cost of searching classes is usually proportional to the number of JAR packages. In large-scale application scenarios, the number will be thousands, resulting in high overall search time.

Once the class file is found, the JVM needs to verify that it is legal and parse it into an internal data structure, which is called InstanceKlass in the JVM. Using javap, let's take a look at the information contained in a class file:

$ javap -p SimpleMessage.class
public class org.apache.logging.log4j.message.SimpleMessage implements org.apache.logging.log4j.message.Message,org.apache.logging.log4j.util.StringBuilderFormattable,java.lang.CharSequence {
  private static final long serialVersionUID;
  private java.lang.String message;
  private transient java.lang.CharSequence charSequence;
  public org.apache.logging.log4j.message.SimpleMessage();
  public org.apache.logging.log4j.message.SimpleMessage(java.lang.String);

This structure contains interfaces, base classes, static data, object layouts, method bytecodes, constant pools, and so on. These data structures are all necessary for the interpreter to execute bytecode or JIT compilation.

Class initialization

After a class is loaded, it must be initialized before an object can actually be created or a static method called. Class initialization can be understood simply as running a static block:

import java.util.HashSet;
import java.util.Set;

public class A {
    private final static String JAVA_VERSION_STRING = System.getProperty("java.version");
    private final static Set<Integer> idBlackList = new HashSet<>();
    static {
        idBlackList.add(10);
        idBlackList.add(65538);
    }
}

The initialization of the first static variable above, JAVA_VERSION_STRING, also becomes part of the static block once the class is compiled into bytecode.

Class initialization has the following characteristics:

Execute only once

When multiple threads try to access a class, only one thread will perform class initialization, and the JVM guarantees that other threads will block waiting for the initialization to complete.
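These two properties can be observed with a small experiment. The following is a minimal sketch; the class names InitOnceDemo and Config and the counter are our own illustration, not part of any framework:

```java
import java.util.concurrent.atomic.AtomicInteger;

class InitOnceDemo {
    /** Counts how many times the static initializer of Config runs. */
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Config {
        static final long LOADED_AT;
        static {
            INIT_RUNS.incrementAndGet();
            LOADED_AT = System.nanoTime();
        }
        static long loadedAt() { return LOADED_AT; }
    }

    /** Touches Config from several threads and returns the number of initializer runs. */
    static int touchFromThreads(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            // The first call to a static method triggers class initialization.
            threads[i] = new Thread(Config::loadedAt);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return INIT_RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println("initializer runs: " + touchFromThreads(8));
    }
}
```

No matter how many threads race to use Config, the static block runs once, and the other threads wait until it has finished.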

These characteristics make class initialization well suited to reading configuration or building data structures and caches needed at runtime, so the initialization logic of many classes ends up being fairly complex.

Just-in-time compilation

After a Java class is initialized, objects can be instantiated and methods can be called on them. Interpreted execution resembles a big switch..case loop and performs poorly:

while (true) {
    switch (bytecode[pc]) {
        case AALOAD:
            ...
            break;
        case ATHROW:
            ...
            break;
    }
}

We use JMH to run a Hessian serialized Micro Benchmark test:

$ java -jar benchmarks.jar hessianIO
Benchmark                      Mode  Cnt       Score   Error  Units
SerializeBenchmark.hessianIO  thrpt       118194.452          ops/s
$ java -Xint -jar benchmarks.jar hessianIO
Benchmark                      Mode  Cnt     Score   Error  Units
SerializeBenchmark.hessianIO  thrpt       4535.820          ops/s

The -Xint flag in the second run restricts the JVM to the interpreter. The result is a 26-fold difference, caused by the gap between executing compiled machine code directly and interpreting bytecode. The size of this gap depends heavily on the workload; in our experience it is typically around 50 times.

Let's take a closer look at the behavior of JIT:

$ java -XX:+PrintFlagsFinal -version | grep CompileThreshold
     intx Tier3CompileThreshold                     = 2000                                {product}
     intx Tier4CompileThreshold                     = 15000                               {product}

These are the values of two JIT parameters inside the JDK. We will not go into the principles of tiered compilation here; you can refer to Stack Overflow. Tier3 can be roughly understood as C1 (the client compiler) and Tier4 as C2. After a method has been interpreted 2000 times it is compiled by C1, and after the C1-compiled code has been executed 15000 times it is compiled by C2. Only then does it truly reach the half-of-C performance mentioned at the beginning of the article.

In the initial stage of an application, most methods have not yet been JIT-compiled and are still executed by the interpreter, which slows down application startup.
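The warm-up effect can be observed by timing the same method in consecutive batches. The following is a minimal sketch, not a rigorous benchmark (JMH is the right tool for that); the class WarmupDemo and the batch sizes are arbitrary choices of ours:

```java
class WarmupDemo {
    /** A small hot method: sums i*i over [0, n). */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Runs the hot method `calls` times and returns the elapsed nanoseconds. */
    static long timeBatch(int calls, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int c = 0; c < calls; c++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println(); // keeps the result alive so the loop is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Each batch does identical work; only the compilation state of the method changes.
        for (int batch = 0; batch < 10; batch++) {
            System.out.printf("batch %d: %d us%n", batch, timeBatch(5_000, 100) / 1_000);
        }
    }
}
```

On a typical HotSpot JVM the first batches tend to be noticeably slower than the later ones, while running the same program with -Xint should keep all batches at roughly the slow, interpreted speed. The exact numbers depend on the machine and JDK version.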

How to optimize the startup speed of Java applications

Earlier, we spent a lot of space analyzing the main reasons for the slow startup of Java applications. The summary is:

Affected by JakartaEE, common frameworks consider reuse and flexibility, and are more complex in design;

In order to be cross-platform, the code is dynamically loaded and dynamically compiled, and it takes time to load and execute during the startup phase;

The combination of these two results in the slow start of Java applications.

Both Python and JavaScript parse and load modules dynamically, and CPython does not even have a JIT. In theory their startup should not be much faster than Java's, but they do not use very complex application frameworks, so overall their startup performance is not perceived as a problem.

Although we cannot easily change the user's habits of using the framework, it can be enhanced at the runtime level to make the startup performance as close to the native image as possible. The official OpenJDK community has also been working hard to solve the startup performance problem, so as ordinary Java developers, can we use the latest features of OpenJDK to help us improve the startup performance?

Class Loading

JarIndex solves the JAR package traversal problem, but the technology is too old to be used in modern projects, including those based on Tomcat and fat JARs

AppCDS solves the performance problem of parsing and processing class files

Class initialization: OpenJDK 9 introduced HeapArchive, which can persist some heap data related to class initialization, but only a few JDK-internal classes (such as IntegerCache) can be accelerated, and there is no public way to use it.

JIT warm-up: JEP 295 implements AOT compilation, but it has bugs, and improper use can affect the correctness of the program. Its performance is not well tuned: in most cases no benefit is seen, and performance regressions may even occur.

Faced with the problems of the above features of OpenJDK, Alibaba Dragonwell has carried out research and development optimization of the above technologies and integrated them with cloud products. Users can easily optimize the start-up time without investing too much effort.

1. AppCDS

CDS (Class Data Sharing) was first introduced in Oracle JDK 1.5. AppCDS, which supports classes outside the JDK, was introduced in Oracle JDK 8u40 but was provided as a commercial feature. Oracle later contributed AppCDS to the community. CDS was gradually improved in JDK 10 and now also supports user-defined class loaders (also known as AppCDS v2).

Object-oriented languages bind objects (data) and methods (operations on objects) together to provide stronger encapsulation and polymorphism. These features are implemented by relying on the type information in the object header, which is the case for Java and Python languages. The layout of Java objects in memory is as follows:

+-------------+
|  mark       |
+-------------+
|  Klass*     |
+-------------+
|  fields     |
|             |
+-------------+

Mark represents the state of the object, including whether it is locked, GC age, and so on. And Klass* points to the data structure InstanceKlass that describes the object type:

//  InstanceKlass layout:
//    [C++ vtbl pointer           ] Klass
//    [java mirror                ] Klass
//    [super                      ] Klass
//    [access_flags               ] Klass
//    [name                       ] Klass
//    [methods                    ]
//    [fields                     ]
...

Based on this structure, expressions such as o instanceof String can have enough information to determine. It should be noted that the InstanceKlass structure is more complicated, including all methods, fields, etc. of the class, and the methods include bytecode and other information. This data structure is obtained by parsing the class file at runtime. In order to ensure security, the legitimacy of the bytecode must be checked when parsing the class (the bytecode of the method not generated by Javac can easily cause JVM crash).

CDS can store (dump) the data structure generated by this analysis and verification to a file, and reuse it in the next run. The dump product is called Shared Archive, with the suffix jsa (Java shared archive).

To reduce the overhead of reading the jsa dump and to avoid deserializing data into InstanceKlass, the storage layout in the jsa file is exactly the same as that of the InstanceKlass object. When using jsa data, the JVM only needs to map the jsa file into memory and let the type pointer in the object header point to that memory address, which is very efficient.

Object:
+-------------+
|  mark       |         +-------------------------+
+-------------+         |classes.jsa file         |
|  Klass*     +--------->java_mirror|super|methods|
+-------------+         |java_mirror|super|methods|
|  fields     |         |java_mirror|super|methods|
|             |         +-------------------------+
+-------------+

1. AppCDS falls short with custom class loaders

The InstanceKlass stored in the jsa file is the product of class file parsing. For the boot class loader (which loads the classes under jre/lib/rt.jar) and the system (app) class loader (which loads the classes under -classpath), CDS has an internal mechanism that skips reading the class files and matches the corresponding data structure in the jsa file by class name alone.

Java also provides a mechanism for user-defined class loaders (custom class loaders). By overriding the ClassLoader.loadClass() method, users can heavily customize how classes are obtained, for example by fetching them over the network or generating them dynamically in code. To keep AppCDS secure and avoid getting an unexpected class by loading a class definition from CDS, AppCDS goes through the following steps for a custom class loader:

Call the user-defined Classloader.loadClass() to get the class byte stream

Calculate the checksum of the class byte stream and compare it with the checksum of the same name structure in jsa

If the match is successful, return InstanceKlass in jsa, otherwise continue to use slow path to parse the class file
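The three steps above can be sketched as follows. This is only an illustrative model, not HotSpot's implementation: the class ArchiveLookupSketch, its map standing in for the jsa file, and the choice of CRC32 as the checksum are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

class ArchiveLookupSketch {
    /** Hypothetical stand-in for the jsa archive: class name -> checksum recorded at dump time. */
    private final Map<String, Long> archivedChecksums = new HashMap<>();

    /** Checksum of a class byte stream (CRC32 is an assumption for this sketch). */
    static long checksum(byte[] classBytes) {
        CRC32 crc = new CRC32();
        crc.update(classBytes);
        return crc.getValue();
    }

    /** Dump time: remember the checksum of the bytes that were parsed. */
    void dump(String className, byte[] classBytes) {
        archivedChecksums.put(className, checksum(classBytes));
    }

    /**
     * Run time: classBytes has already been produced by the custom loader (step 1).
     * Returns "archive" when the archived structure can be reused, "parse" otherwise.
     */
    String resolve(String className, byte[] classBytes) {
        Long archived = archivedChecksums.get(className);          // step 2: look up by name
        if (archived != null && archived == checksum(classBytes)) { // step 2: compare checksums
            return "archive";                                       // step 3: fast path
        }
        return "parse";                                             // step 3: slow path
    }
}
```

Note that resolve() can only start once the byte stream is available, so whatever the custom loader does to obtain those bytes is paid in full even on the fast path.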

We have seen that in many scenarios the first step above accounts for the bulk of class loading time, and in those cases AppCDS is powerless. For example:

bar.jar
 +- com/bar/Bar.class
baz.jar
 +- com/baz/Baz.class
foo.jar
 +- com/foo/Foo.class

The class path contains the three JAR packages above. When loading the class com.foo.Foo, most ClassLoader implementations (including URLClassLoader, Tomcat, and spring-boot) choose the simplest strategy (premature optimization is the root of all evil): they try to extract the file com/foo/Foo.class from each JAR package in turn, in the order in which the JAR packages appear on disk.

A JAR package uses the zip format for storage. Each time a class is loaded, the loader has to traverse the JAR packages on the classpath and try to extract a single file from each zip until the class is found. Assuming there are N JAR packages, a class load needs to access N/2 zip files on average.

In one of our real scenarios, N reaches 2000. At this time, the JAR package search overhead is very large, and it is much larger than the InstanceKlass parsing overhead. In the face of such scenarios, AppCDS technology is not enough.
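The cost of this strategy can be illustrated with a toy model. The sketch below models each JAR as a set of entry names and counts how many JARs are probed; the class LinearJarScan is hypothetical and deliberately ignores the real cost of opening zip files:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LinearJarScan {
    /** Each "JAR" is modelled as the set of entry names it contains, in classpath order. */
    private final List<Set<String>> jars = new ArrayList<>();

    void addJar(String... entries) {
        jars.add(new HashSet<>(Arrays.asList(entries)));
    }

    /** Returns how many JARs had to be probed before the entry was found, or -1 if absent. */
    int probesFor(String entry) {
        for (int i = 0; i < jars.size(); i++) {
            if (jars.get(i).contains(entry)) {
                return i + 1;
            }
        }
        return -1;
    }

    /** The three-JAR classpath from the example above. */
    static LinearJarScan exampleClasspath() {
        LinearJarScan cp = new LinearJarScan();
        cp.addJar("com/bar/Bar.class"); // bar.jar
        cp.addJar("com/baz/Baz.class"); // baz.jar
        cp.addJar("com/foo/Foo.class"); // foo.jar
        return cp;
    }

    public static void main(String[] args) {
        LinearJarScan cp = exampleClasspath();
        System.out.println("probes for Foo: " + cp.probesFor("com/foo/Foo.class"));
    }
}
```

With three JARs, Foo is found on the third probe. With N = 2000 the same loop averages about a thousand probes per class, each of which is a zip lookup in a real class loader.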

2. JAR Index

According to the JAR file specification, a JAR file uses zip for packaging and stores meta-information as text in the META-INF directory. The format was designed with the search scenario above in mind, and the corresponding technology is called JAR Index.

Suppose we want to find a class in the above bar.jar, baz.jar, foo.jar, and if we can use the type com.foo.Foo to immediately infer which jar package it is in, we can avoid the above scanning overhead.

JarIndex-Version: 1.0
foo.jar
com/foo
bar.jar
com/bar
baz.jar
com/baz

Through JAR Index technology, the above-mentioned index file INDEX.LIST can be generated. It becomes a HashMap after being loaded into memory:

com/bar --> bar.jar
com/baz --> baz.jar
com/foo --> foo.jar

When we see the class name com.foo.Foo, we can learn the specific jar package foo.jar from the index according to the package name com.foo, and quickly extract the class file.
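The lookup just described can be sketched in a few lines. The parser below assumes the simplified INDEX.LIST layout shown in this article (a version header, then JAR names each followed by their package paths); the class JarIndexSketch is our own illustration and not the JDK's implementation:

```java
import java.util.HashMap;
import java.util.Map;

class JarIndexSketch {
    /** Package path -> JAR name, as in the mapping shown above. */
    private final Map<String, String> packageToJar = new HashMap<>();

    /** Parses INDEX.LIST-style text, assuming the simplified layout used in this article. */
    static JarIndexSketch parse(String indexList) {
        JarIndexSketch index = new JarIndexSketch();
        String currentJar = null;
        for (String raw : indexList.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("JarIndex-Version")) {
                continue; // header and separator lines carry no mapping
            }
            if (line.endsWith(".jar")) {
                currentJar = line;                        // start of a new JAR section
            } else if (currentJar != null) {
                index.packageToJar.put(line, currentJar); // package owned by the current JAR
            }
        }
        return index;
    }

    /** Maps a class name such as com.foo.Foo to the JAR holding its package, or null. */
    String jarFor(String className) {
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return null; // classes in the default package are not covered by this sketch
        }
        String packagePath = className.substring(0, lastDot).replace('.', '/');
        return packageToJar.get(packagePath);
    }
}
```

A single hash lookup replaces the scan over all JAR packages, so the cost no longer grows with the number of JARs on the classpath.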

JAR Index seems to solve our problem, but the technology is very old and hard to use in modern applications:

jar i generates the index file from the Class-Path attribute in META-INF/MANIFEST.MF, and modern projects hardly ever maintain this attribute

Only URLClassLoader supports JAR Index

The JAR containing the index has to appear as early as possible on the classpath

Dragonwell uses agent injection to enable INDEX.LIST to be correctly generated and appear in the appropriate position of the classpath to help applications improve startup performance.

2. Early class initialization

The code execution in the static block of the class is called class initialization. After the class is loaded, the initialization code must be executed before it can be used (create instance, call static method).

The initialization of many classes is essentially just to construct some static fields:

class IntegerCache {
    static final Integer cache[];
    static {
        Integer[] c = new Integer[size];
        int j = low;
        for(int k = 0; k < c.length; k++)
            c[k] = new Integer(j++);
        cache = c;
    }
}

We know that the JDK caches a commonly used range of boxed values to avoid creating them repeatedly, and this data has to be constructed in advance. Because such initializers run only once, they are executed purely in the interpreter. If these static fields can be persisted so that the class initializer does not need to be called, we get a pre-initialized class and reduce startup time.

The most efficient way to load persisted data into memory is memory mapping:

int fd = open("archive_file", O_RDONLY);
/* mmap also takes a flags argument; MAP_PRIVATE is enough for read-only access */
struct person *persons = mmap(NULL, 100 * sizeof(struct person),
                              PROT_READ, MAP_PRIVATE, fd, 0);
int age = persons[5].age;

C manipulates data almost directly in memory, whereas high-level languages such as Java abstract memory into objects that carry meta-information such as mark and Klass*. This meta-information can change from one run to the next, so a more sophisticated mechanism is needed to achieve efficient object persistence.
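For comparison, here is roughly what the C snippet above looks like when written against Java's own memory-mapping API. It is a minimal sketch: the record layout (an int id followed by an int age) and the class MappedRecords are assumptions made for the example. It also shows the limitation discussed above, since Java can map raw bytes this way but not ready-made objects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRecords {
    /** Assumed record layout: int id followed by int age. */
    static final int RECORD_SIZE = 2 * Integer.BYTES;

    /** Writes `count` fixed-size records (id, age = 20 + id) to a temporary file. */
    static Path writeArchive(int count) {
        ByteBuffer buf = ByteBuffer.allocate(count * RECORD_SIZE);
        for (int id = 0; id < count; id++) {
            buf.putInt(id).putInt(20 + id);
        }
        try {
            Path file = Files.createTempFile("archive", ".bin");
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the file read-only and reads the age field of one record. */
    static int ageOf(Path archive, int index) {
        try (FileChannel ch = FileChannel.open(archive, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Skip the id field of the record to reach its age field.
            return mapped.getInt(index * RECORD_SIZE + Integer.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path archive = writeArchive(100);
        System.out.println("age of record 5: " + ageOf(archive, 5));
    }
}
```

The bytes are available without any parsing, but turning them back into Java objects would still require allocating each object and filling in its header, which is exactly the step Heap Archive is designed to avoid.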

1. Introduction to Heap Archive

OpenJDK9 introduced the HeapArchive capability, and the heap archive was officially used in OpenJDK12. As the name suggests, Heap Archive technology can persist objects on the heap.

The object graph is constructed in advance and then put into the archive. We call this stage dump; and using the data in the archive is called runtime. dump and runtime are usually not the same process, but in some scenarios they can be the same process.

Recall the memory layout after using AppCDS, the Klass* pointer of the object points to the data in SharedArchive. AppCDS persists the meta-information of InstanceKlass. If you want to reuse persistent objects, the type pointer of the object header must also point to a piece of persistent meta-information. Therefore, HeapArchive technology relies on AppCDS.

To suit a variety of scenarios, OpenJDK's HeapArchive provides two levels, Open and Closed.

The allowed reference relationships are as follows:

Closed Archive

References to objects in Open Archive and Heap are not allowed
Can refer to objects inside Closed Archive
Read only, not writable

Open Archive

Can refer to any object
Writable

The reason for this design is that for some read-only structures, placing them in the Closed Archive can achieve completely no overhead for GC.

Why is it read-only? Imagine if the object A in the Closed Archive references the object B in the heap, then when the object B moves, the GC needs to modify the field that points to B in A, which will bring GC overhead.
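The reference rules listed above can be written down as a small predicate. This is only a model for reasoning about the rules: the enum Region and the method mayReference are hypothetical names, and treating ordinary heap objects as unrestricted is our assumption, since the rules above only constrain archived objects.

```java
class ArchiveRegions {
    enum Region { CLOSED_ARCHIVE, OPEN_ARCHIVE, HEAP }

    /** Whether an object in `from` may hold a reference to an object in `to`. */
    static boolean mayReference(Region from, Region to) {
        switch (from) {
            case CLOSED_ARCHIVE:
                // Closed archive objects may only point to other closed archive objects,
                // so the GC never has to update a field inside the closed archive.
                return to == Region.CLOSED_ARCHIVE;
            case OPEN_ARCHIVE:
                // Open archive objects may point anywhere.
                return true;
            default:
                // Ordinary heap objects: assumed unrestricted in this sketch.
                return true;
        }
    }

    public static void main(String[] args) {
        for (Region from : Region.values()) {
            for (Region to : Region.values()) {
                System.out.println(from + " -> " + to + ": " + mayReference(from, to));
            }
        }
    }
}
```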

2. Use Heap Archive to do class initialization in advance

With this structure in place, class initialization can be completed after the class is loaded simply by pointing its static variables at the archived objects:

class Foo {
  static Object data;
}                 +
                  |
        <---------+
Open Archive Object:
+-------------+
|  mark       |         +-------------------------+
+-------------+         |classes.jsa file         |
|  Klass*     +--------->java_mirror|super|methods|
+-------------+         |java_mirror|super|methods|
|  fields     |         |java_mirror|super|methods|
|             |         +-------------------------+
+-------------+

3. AOT compilation

Besides class loading, there is another cost: the first executions of a method have not yet been compiled by the JIT compiler, so the bytecode runs in interpreted mode. According to the analysis in the first half of this article, interpreted execution is tens of times slower than JIT-compiled code, which makes it a major culprit for slow startup.

Traditional languages such as C/C++ are compiled directly to native machine code for the target platform. As people have become aware of the startup and warm-up problems of interpreter-plus-JIT languages such as Java and JS, compiling bytecode directly into native code through AOT has gradually come into public view.

Wasm, GraalVM, and OpenJDK all support AOT compilation to varying degrees. We mainly optimize startup speed around the jaotc tool introduced by JEP 295.

A note on terminology:
JEP 295 uses AOT to compile the methods in class files one by one into native code fragments, and after the Java virtual machine loads a class, it replaces the method entry points with the AOT code. GraalVM's Native Image is a more thorough form of static compilation: with SubstrateVM, a small runtime written in Java, the runtime and the application code are statically compiled into a single executable file (similar to Go) that no longer depends on a JVM. This approach is also a kind of AOT, but to keep the terms distinct, AOT in this article refers only to the JEP 295 approach.

1. First experience of AOT features

Following the introduction in JEP 295, we can quickly try out AOT:

cat > HelloWorld.java <<EOF
public class HelloWorld {
    public static void main(String[] args) { System.out.println("Hello World!"); }
}
EOF
javac HelloWorld.java
jaotc --output libHelloWorld.so HelloWorld.class
java -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=./libHelloWorld.so HelloWorld

The jaotc command calls the Graal compiler to compile the bytecode and generates the libHelloWorld.so file. It is easy to assume that the generated so file is called directly as compiled library code, like JNI. However, the loading mechanism of ld is not fully used to run the code here; the so file is more like a container for native code. After loading the AOT so file, the HotSpot runtime has to perform further dynamic linking. Once a class is loaded, HotSpot automatically associates the AOT code entry points and uses the AOT version for subsequent method calls. The AOT-generated code also interacts actively with the HotSpot runtime, jumping between AOT, interpreter, and JIT code.

1) The twists and turns of AOT

JEP 295 appears to implement a complete AOT system, so why has this technology not been used on a large scale? Among OpenJDK's new features, AOT has had a particularly troubled history.

2) Multi-ClassLoader problem

JDK-8206963: bug with multiple class loaders

This bug arises because the design did not take Java's multi-ClassLoader scenarios into account. When classes with the same name loaded by different ClassLoaders use AOT, their static fields are shared, whereas by the design of the Java language this data should be kept separate.

Since there is no solution that can quickly fix this problem, OpenJDK just added the following code:

ClassLoaderData* cld = ik->class_loader_data();
if (!cld->is_builtin_class_loader_data()) {
  log_trace(aot, class, load)("skip class  %s  for custom classloader %s (%p) tid=" INTPTR_FORMAT,
                              ik->internal_name(), cld->loader_name(), cld, p2i(thread));
  return false;
}

With this change, AOT is not allowed for user-defined class loaders. This was an early sign that the feature was gradually losing maintenance at the community level.

In this case, although the class specified by the class-path can still use AOT, the commonly used frameworks such as spring-boot and Tomcat need to load the application code through the Custom Classloader. It can be said that this change cut a large part of the AOT scene.

3) Lack of tuning and maintenance, returning to experimental features

JDK-8227439: Turn off AOT by default

JEP 295 AOT is still experimental, and while it can be useful for startup/warmup when used with custom generated archives tailored for the application, experimental data suggests that generating shared libraries at a module level has overall negative impact to startup, dubious efficacy for warmup and severe static footprint implications.

From then on, enabling AOT requires the experimental options flag:

java -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=...

According to the issue description, this feature has an adverse effect on startup speed and memory usage when an entire module is compiled. Our analysis of the reasons is as follows:

The Java language itself is too complex, and runtime mechanisms such as dynamic class loading make the AOT code unable to run as fast as expected

As a phased project, AOT technology has not been maintained for a long time after entering Java 9, and lacks the necessary tuning (in contrast, AppCDS has been iteratively optimized)

4) Removed in JDK 16

JDK-8255616: Disable AOT and Graal in Oracle OpenJDK

On the eve of the OpenJDK 16 release, Oracle officially decided to stop maintaining this technology:

We haven't seen much use of these features, and the effort required to support and enhance them is significant.

The root cause is the lack of necessary optimization and maintenance. As for future AOT plans, we can only infer from a few brief remarks that there are two technical directions for Java AOT:

Do AOT based on OpenJDK C2

Supports complete Java language features on the native-image of GraalVM, and users who need AOT gradually transition from OpenJDK to native-image

Neither of the above two technical directions can see progress in the short term, so Dragonwell’s technical direction is to make the existing JEP295 work better and bring users the ultimate start-up performance.

5) Quick start on Dragonwell

Dragonwell's quick start feature overcomes the weaknesses of AppCDS and AOT compilation, and adds an early class initialization feature built on the HeapArchive mechanism. Together, these features eliminate almost all of the application startup time that is visible to the JVM.

In addition, because the above-mentioned technologies are in line with the use model of trace-dump-replay, Dragonwell unified the process of the above-mentioned startup acceleration technology and integrated it into SAE products.

SAE x Dragonwell: Serverless with Java startup acceleration best practices

Good ingredients still need the right seasoning and a skilled chef.

Dragonwell's startup acceleration technology and serverless technology, which is known for its elasticity, complement each other well. Applied together across the full life cycle management of microservice applications, they shorten the end-to-end startup time of the application. For this reason, Dragonwell chose SAE to put its startup acceleration technology into practice.

SAE (Serverless Application Engine) is the first Serverless PaaS platform. It offers:
Java software package deployment: gain microservice capabilities with zero code changes and lower R&D costs
Extreme serverless elasticity: no resource operations and maintenance, rapid scale-out of application instances, and lower operations and learning costs

1. Difficulty analysis

Through analysis, we found that users of microservices face some difficulties at the application startup level:

Large package: hundreds of MB or even GB level

Many dependent packages: hundreds of dependent packages, thousands of classes

Slow loading: reading dependency packages from disk and loading classes on demand can take up to half of the startup time
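
To see whether an application matches this profile, one can measure its dependency footprint. The following is a minimal sketch, assuming all dependencies sit as plain JAR files under one directory (the default name lib and the class name DependencyFootprint are hypothetical). JARs nested inside a fat JAR are not unpacked.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: size up a deployment by counting JARs, bytes, and class files. */
final class DependencyFootprint {
    final int jarCount;
    final long totalBytes;
    final long classCount;

    DependencyFootprint(int jarCount, long totalBytes, long classCount) {
        this.jarCount = jarCount;
        this.totalBytes = totalBytes;
        this.classCount = classCount;
    }

    /** Walks libDir recursively and inspects every regular file ending in .jar. */
    static DependencyFootprint scan(Path libDir) {
        List<Path> jars;
        try (Stream<Path> files = Files.walk(libDir)) {
            jars = files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long bytes = 0;
        long classes = 0;
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                bytes += Files.size(jar);
                // Count only .class entries; resources and metadata are ignored.
                classes += jarFile.stream()
                        .filter(entry -> entry.getName().endsWith(".class"))
                        .count();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new DependencyFootprint(jars.size(), bytes, classes);
    }

    public static void main(String[] args) {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        if (!Files.isDirectory(libDir)) {
            System.out.println("Not a directory: " + libDir);
            return;
        }
        DependencyFootprint f = scan(libDir);
        System.out.printf("%d JARs, %d bytes, %d class files%n",
                f.jarCount, f.totalBytes, f.classCount);
    }
}
```

The class count here is the number of class files shipped, which is an upper bound on what one start could load from these JARs, not the number actually loaded.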

With the help of Dragonwell's quick start-up capabilities, SAE provides a set of best practices for serverless Java applications to accelerate the startup of applications as much as possible, allowing developers to focus more on business development:

Java environment + JAR/WAR package deployment: integrate Dragonwell 11 to provide an accelerated startup environment

JVM quick settings: enable quick start with one click, simplifying operation

NAS network disk: supports cross-instance acceleration, speeding up the startup of newly launched instances and batch releases when a new package is deployed

2. Acceleration effect

We tested startup on several typical demos and internal applications that represent microservice and dependency-heavy business scenarios, and found that startup time generally drops by 5% to 45%. The acceleration is most significant when application startup has the following characteristics:

Many classes are loaded (spring-petclinic loads more than 12,000 classes at startup)

Little dependence on external data
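
Whether a given service falls into the first scenario can be checked from inside the JVM. The following is a minimal sketch using the standard java.lang.management MXBeans. The class name StartupProbe is only for illustration, and JVM uptime at the moment of the call is just a rough proxy for startup time.

```java
import java.lang.management.ManagementFactory;

/** Sketch: report how many classes the JVM has loaded and how long it has been up. */
final class StartupProbe {

    /** Total number of classes loaded since this JVM started. */
    static long loadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getTotalLoadedClassCount();
    }

    /** Milliseconds elapsed since JVM start, as reported by the runtime MXBean. */
    static long millisSinceJvmStart() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    static String report() {
        return loadedClasses() + " classes loaded, "
                + millisSinceJvmStart() + " ms since JVM start";
    }

    public static void main(String[] args) {
        // In a real service, call report() at the point where the
        // application is ready to serve traffic.
        System.out.println(report());
    }
}
```

Comparing the two numbers before and after enabling startup acceleration shows whether class loading dominates the start of a particular application.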

3. Customer cases

Alibaba search and recommendation Serverless platform

Alibaba's internal search and recommendation Serverless platform uses a class-loading isolation mechanism to co-deploy multiple businesses in the same Java virtual machine. The scheduling system deploys business code into idle containers as needed, so that multiple businesses share the same resource pool, which greatly improves deployment density and overall CPU usage.

To support the development and operation of a large number of different businesses, the platform itself has to provide a rich set of functions, such as caching and RPC calls. Each JVM of the search and recommendation Serverless platform therefore has to launch a middleware isolation container similar to Pandora Boot, which loads a large number of classes and slows down the startup of the platform itself. When demand surges, the scheduling system has to launch more containers for business code deployment, and the startup time of the container itself becomes particularly important.

Based on Dragonwell's quick start technology, the search and recommendation platform applies AppCDS, Jarindex, and other optimizations in the pre-release environment and builds the resulting archive files into the container image, so that every container benefits from the acceleration at startup. This reduces startup time by about 30%.

Flash sale for a street fashion brand: extreme elasticity on SAE

An external customer used the JAR package deployment and Dragonwell 11 provided by SAE to rapidly iterate on and launch a street fashion store app.

During a major promotional flash sale, the elasticity of SAE Serverless, driven by application metrics such as QPS and RT, easily met the need to scale out by more than 10 times. At the same time, enabling Dragonwell's enhanced AppCDS startup acceleration with one click cut the startup time of the Java application by more than 20%, which further sped up application startup and kept the business running smoothly.

Summary

The quick start technology in Dragonwell is built entirely on the work of the OpenJDK community. It adds detailed optimizations and bug fixes to the individual features and makes them easier to adopt. This ensures compatibility with the standard, avoids internal customization, and contributes back to the open source community.

As foundational software, Dragonwell by itself can only generate and use archive files on disk. With SAE's seamless integration of Dragonwell, JVM configuration and the distribution of archive files are fully automated, so customers can easily enjoy the benefits of application acceleration.

About the authors:

Liang Xi, from the Alibaba Cloud Java Virtual Machine team, is responsible for the Java Runtime direction and leads the development and large-scale adoption of technologies such as Java coroutines and startup optimization.

Dai Xu, from the Alibaba Cloud SAE team, is responsible for Runtime evolution and the elasticity and efficiency direction, and leads the development and adoption of technologies such as application elasticity, Java acceleration, and image acceleration.
