
Microservices are pleasant to work with for a while, but problems emerge over time: service boundaries that don't match the business, splitting granularity that is too coarse, and Spring Boot applications whose startup is painfully slow. Drawing on that experience, this article explores several ways to optimize Spring Boot startup speed.

Startup time analysis

IDEA integrates the async-profiler tool, so we can see problems in the startup process intuitively through a flame graph. In the example below, the flame graph shows that a lot of time is spent on bean loading and initialization.

The picture comes from IDEA's integrated async-profiler: search for "Java Profiler" in Preferences to customize the configuration, then start the application with "Run with xx Profiler".

The y-axis represents the call stack: each layer is a function, and the deeper the stack, the taller the flame. The function currently executing is at the top, with its parent functions below it.

The x-axis represents the number of samples: the wider a function is on the x-axis, the more often it was sampled, which means it took longer to execute.

Startup optimization

Reduce business initialization

Most of the startup time is usually consumed because the application is large or contains a lot of initialization logic, such as establishing database connections, Redis connections, and various connection pools. The advice for the business side is to minimize unnecessary dependencies and to make initialization asynchronous where possible.
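As an illustration of moving slow, non-critical initialization off the startup path, here is a minimal sketch using CompletableFuture; the warmUpCache method and its simulated 100 ms cost are hypothetical stand-ins for real work such as pre-filling a cache or opening a connection pool.

```java
import java.util.concurrent.CompletableFuture;

// Sketch: defer a slow, non-critical initialization step so it does not
// block the startup path. warmUpCache is a hypothetical example.
public class AsyncInit {

    static volatile boolean cacheReady = false;

    // Simulates an expensive initialization step (e.g. warming a cache).
    static void warmUpCache() {
        try {
            Thread.sleep(100); // stand-in for real work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        cacheReady = true;
    }

    public static void main(String[] args) {
        // Kick off the slow step in the background instead of blocking startup.
        CompletableFuture<Void> warmUp = CompletableFuture.runAsync(AsyncInit::warmUpCache);
        System.out.println("startup path continues immediately");
        // In a real app, the first use of the cache would wait instead.
        warmUp.join();
        System.out.println("cacheReady=" + cacheReady);
    }
}
```

The trade-off is that the first request needing the deferred resource may have to wait for it, so this suits warm-up work that is not on the critical request path.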

Lazy initialization

The spring.main.lazy-initialization property was introduced in Spring Boot 2.2; setting it to true makes all beans lazily initialized.

This can improve startup speed to a certain extent, at the cost of the first access to each bean being slower.

 spring.main.lazy-initialization=true

Spring Context Indexer

Spring 5 provides the spring-context-indexer module, whose main purpose is to solve slow component scanning caused by a large number of classes.

Usage is simple: import the dependency and mark the startup class with the @Indexed annotation. After the application is compiled and packaged, a META-INF/spring.components file is generated, and when ComponentScan runs, it reads this index file instead of scanning the classpath, improving scanning speed.

 <dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-context-indexer</artifactId>
  <optional>true</optional>
</dependency>
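For reference, the generated META-INF/spring.components file is a plain properties file mapping each candidate component's fully qualified class name to the stereotype annotation(s) it carries; the class names below are hypothetical examples:

```properties
com.example.demo.OrderController=org.springframework.stereotype.Component
com.example.demo.OrderService=org.springframework.stereotype.Component
com.example.demo.OrderRepository=org.springframework.stereotype.Component
```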

Close JMX

Spring Boot 2.2.x and below enable JMX by default (you can inspect it with jconsole). If we don't need this monitoring, we can turn it off manually.

 spring.jmx.enabled=false

Turn off tiered compilation

Tiered compilation is enabled by default since Java 8. You can inspect the relevant thresholds with the command java -XX:+PrintFlagsFinal -version | grep CompileThreshold.

Tier 3 corresponds to the C1 compiler and tier 4 to C2: roughly speaking, a method is compiled by C1 after about 2,000 interpreted invocations, and compiled by C2 after the C1-compiled version has executed about 15,000 times.

We can stop compilation at C1 with the flag below, skipping the C2 optimization stage, which improves startup speed. Combining it with -Xverify:none / -noverify disables bytecode verification as well, but try not to use that in production.

 -XX:TieredStopAtLevel=1 -noverify

Another way of thinking

The above introduced optimizations at the business level and via startup parameters. Now let's look at ways to optimize based on the Java application itself.

Before that, recall how Java creates an object: first the class is loaded, then the object is instantiated, and only then can its methods be called. Method execution also involves the JIT, which compiles bytecode into native machine code at runtime to improve the performance of Java programs.

The techniques below each target one of the steps above.

JAR Index

A jar is really just a ZIP file. When loading a class, the class loader traverses the jars on the classpath, finds the matching class file, loads it, and then performs verification, preparation, resolution, and initialization before the object can be instantiated.

Jar Index is a very old technique, introduced as early as JDK 1.3, designed to solve the performance problem of traversing jars during class loading.

Suppose we want to find a class among three jars, A, B, and C. If we could infer directly from the package com/C which jar contains the class, we could avoid traversing the other jars.

 A.jar
com/A

B.jar
com/B

C.jar
com/C

With the Jar Index technique, a corresponding index file, INDEX.LIST, can be generated:

 com/A --> A.jar
com/B --> B.jar
com/C --> C.jar

However, Jar Index is difficult to apply to current projects:

  1. The index file generated by jar -i is based on the Class-Path entry in META-INF/MANIFEST.MF. Most current projects do not use Class-Path, so generating the index file requires extra processing on our side.
  2. Only URLClassLoader reads INDEX.LIST, so we would need to write custom class-loading logic.
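To make the idea concrete, here is a minimal sketch (not the actual JDK implementation) of what an INDEX.LIST lookup buys: resolving a class's package prefix straight to a jar, instead of scanning every jar on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the idea behind INDEX.LIST: map a package prefix to the jar
// that contains it, so lookups avoid traversing every jar.
public class JarIndex {

    private final Map<String, String> packageToJar = new HashMap<>();

    void add(String packagePrefix, String jarName) {
        packageToJar.put(packagePrefix, jarName);
    }

    // Returns the jar for a fully qualified class name, or null if unindexed.
    String findJar(String className) {
        int lastDot = className.lastIndexOf('.');
        String pkg = lastDot > 0 ? className.substring(0, lastDot).replace('.', '/') : "";
        return packageToJar.get(pkg);
    }

    public static void main(String[] args) {
        JarIndex index = new JarIndex();
        index.add("com/A", "A.jar");
        index.add("com/B", "B.jar");
        index.add("com/C", "C.jar");
        // A class in com.C resolves directly to C.jar, no traversal needed.
        System.out.println(index.findJar("com.C.Main"));
    }
}
```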

AppCDS

AppCDS stands for Application Class Data Sharing; it is mainly used to speed up startup and save memory. CDS itself was introduced as early as JDK 1.5 and has been continuously optimized and upgraded in later releases, and recent JDKs enable it by default. Early CDS only supported the BootClassLoader; AppCDS, introduced in JDK 8, supports the AppClassLoader and custom class loaders.

We all know that class loading involves parsing and verification. CDS stores the data structures produced by this process in an archive file and reuses them on the next run. This archive is called a Shared Archive and uses the jsa file suffix.

At run time, the jsa file is mapped into memory, and the class pointer in each object header points into that mapped memory.

Let's see how to use it.

First, we need to generate the list of classes to share between runs, the lst file. For Oracle JDK the -XX:+UnlockCommercialFeatures flag must be added to enable commercial features; OpenJDK does not need it. JDK 13 merges steps 1 and 2 into one, but lower versions still require both.

 java -XX:DumpLoadedClassList=test.lst

Then, with the lst class list in hand, dump it into a jsa archive file suitable for memory mapping.

 java -Xshare:dump -XX:SharedClassListFile=test.lst -XX:SharedArchiveFile=test.jsa

Finally, add the startup parameters to point at the archive file.

 -Xshare:on -XX:SharedArchiveFile=test.jsa
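Putting the three steps together, the whole flow looks roughly like this; app.jar is a placeholder for your application jar, and on newer JDKs the first two steps can be replaced by dynamic archiving (-XX:ArchiveClassesAtExit):

```shell
# 1. Trial run: record every class the application loads.
java -XX:DumpLoadedClassList=test.lst -jar app.jar

# 2. Dump the recorded classes into a memory-mappable archive.
#    The classes must be resolvable, hence the -cp.
java -Xshare:dump -XX:SharedClassListFile=test.lst \
     -XX:SharedArchiveFile=test.jsa -cp app.jar

# 3. Start the application against the archive.
java -Xshare:on -XX:SharedArchiveFile=test.jsa -jar app.jar
```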

Note that AppCDS only takes effect with a fat jar that contains all class files directly; Spring Boot's nested-jar structure does not work. You need the maven-shade-plugin to build a shaded jar.

 <build>
  <finalName>helloworld</finalName>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <keepDependenciesWithProvidedScope>true</keepDependenciesWithProvidedScope>
        <createDependencyReducedPom>false</createDependencyReducedPom>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/spring.handlers</resource>
              </transformer>
              <transformer implementation="org.springframework.boot.maven.PropertiesMergingResourceTransformer">
                <resource>META-INF/spring.factories</resource>
              </transformer>
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/spring.schemas</resource>
              </transformer>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>${mainClass}</mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Then use it following the steps above. But if the project is large and the number of archive entries exceeds 65535, an error is reported:

Caused by: java.lang.IllegalStateException: Zip64 archives are not supported

The source code is as follows:

 public int getNumberOfRecords() {
  long numberOfRecords = Bytes.littleEndianValue(this.block, this.offset + 10, 2);
  if (numberOfRecords == 0xFFFF) {
    throw new IllegalStateException("Zip64 archives are not supported");
  }
  return (int) numberOfRecords;
}

This problem is fixed in Spring Boot 2.2 and above, so use a newer version to avoid it.

Heap Archive

Heap Archive was introduced in JDK 9 and put to formal use in JDK 12. We can regard Heap Archive as an extension of AppCDS.

AppCDS persists the data produced by verification and parsing during class loading; Heap Archive persists heap data related to class initialization (running the static initializer, <clinit>).

In short, Heap Archive memory-maps certain static fields at class initialization time, avoiding a call to the class initializer: classes are obtained already initialized, which improves startup speed.

AOT compilation

As mentioned above, the JIT compiles bytecode into native machine code at runtime so that it can be executed directly when needed, reducing interpretation time and improving the running speed of the program.

The three techniques above all target the class-loading phase. When objects are actually created and methods executed, code that has not yet been JIT-compiled runs in interpreted mode, which is slow. This is where AOT compilation comes in.

AOT (Ahead-Of-Time) compilation happens before the program runs. It works like warm-up: code is compiled into machine code in advance, reducing interpretation time.

Spring Native, for example, works this way: the application is statically compiled ahead of time into a native executable that does not depend on a JVM, so it starts very fast.

However, AOT in the JDK itself never matured. It remained an experimental feature, disabled by default, that had to be enabled manually:

 java -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=

Due to long-term lack of maintenance and tuning, this technology was removed in JDK 16, so we won't go into detail here.

Offline time optimization

Graceful shutdown

Spring Boot added a new feature in version 2.3, graceful shutdown, which supports Jetty, Reactor Netty, Tomcat, and Undertow. Usage:

 server:
  shutdown: graceful

# maximum wait time
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

For versions below 2.3, the Spring Boot team also provided a workaround, and the built-in implementation in the new version follows essentially the same logic: first pause external requests, then shut down the thread pool after processing the remaining tasks.

 import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.catalina.connector.Connector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.embedded.ConfigurableEmbeddedServletContainer;
import org.springframework.boot.context.embedded.EmbeddedServletContainerCustomizer;
import org.springframework.boot.context.embedded.tomcat.TomcatConnectorCustomizer;
import org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainerFactory;
import org.springframework.context.ApplicationListener;
import org.springframework.context.annotation.Bean;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class Gh4657Application {

    public static void main(String[] args) {
        SpringApplication.run(Gh4657Application.class, args);
    }

    @RequestMapping("/pause")
    public String pause() throws InterruptedException {
        Thread.sleep(10000);
        return "Pause complete";
    }

    @Bean
    public GracefulShutdown gracefulShutdown() {
        return new GracefulShutdown();
    }

    @Bean
    public EmbeddedServletContainerCustomizer tomcatCustomizer() {
        return new EmbeddedServletContainerCustomizer() {

            @Override
            public void customize(ConfigurableEmbeddedServletContainer container) {
                if (container instanceof TomcatEmbeddedServletContainerFactory) {
                    ((TomcatEmbeddedServletContainerFactory) container)
                            .addConnectorCustomizers(gracefulShutdown());
                }

            }
        };
    }

    private static class GracefulShutdown implements TomcatConnectorCustomizer,
            ApplicationListener<ContextClosedEvent> {

        private static final Logger log = LoggerFactory.getLogger(GracefulShutdown.class);

        private volatile Connector connector;

        @Override
        public void customize(Connector connector) {
            this.connector = connector;
        }

        @Override
        public void onApplicationEvent(ContextClosedEvent event) {
            this.connector.pause();
            Executor executor = this.connector.getProtocolHandler().getExecutor();
            if (executor instanceof ThreadPoolExecutor) {
                try {
                    ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                    threadPoolExecutor.shutdown();
                    if (!threadPoolExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
                        log.warn("Tomcat thread pool did not shut down gracefully within "
                                + "30 seconds. Proceeding with forceful shutdown");
                    }
                }
                catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            }
        }

    }

}

Eureka service offline time

In addition, a previous article mentioned the problem of how long it takes clients to notice that a server instance has gone offline.

Eureka uses a three-level cache to store service instance information.

When a service registers, it keeps a heartbeat with the server at a 30-second interval. After registration, the client's instance information is stored in the Registry, and the registry's contents are synchronized to readWriteCacheMap in real time.

Clients, however, read from readOnlyCacheMap, a read-only cache that is synchronized from readWriteCacheMap every 30 seconds.

The client itself and Ribbon load balancing each maintain a local cache as well, synchronized every 30 seconds.
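The read path above can be sketched as follows. This is a simplified illustration of the lookup order only, not Eureka's actual implementation, which syncs the layers with timed tasks and a loading cache rather than plain maps:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of Eureka's three-level read path:
// readOnlyCacheMap -> readWriteCacheMap -> registry.
public class ThreeLevelCache {

    final Map<String, String> registry = new HashMap<>();
    final Map<String, String> readWriteCacheMap = new HashMap<>();
    final Map<String, String> readOnlyCacheMap = new HashMap<>();

    // Clients read from the read-only cache; a miss falls through level by
    // level and back-fills the read-only cache.
    String get(String app) {
        String value = readOnlyCacheMap.get(app);
        if (value == null) {
            value = readWriteCacheMap.get(app);
            if (value == null) {
                value = registry.get(app);
            }
            if (value != null) {
                readOnlyCacheMap.put(app, value);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        ThreeLevelCache cache = new ThreeLevelCache();
        cache.registry.put("service-a", "10.0.0.1:8080");
        // First read misses both caches and falls through to the registry.
        System.out.println(cache.get("service-a"));
        // The read-only cache has now been back-filled.
        System.out.println(cache.readOnlyCacheMap.containsKey("service-a"));
    }
}
```

The important point for latency is the layering itself: a change in the registry is only visible to clients after it has propagated through each cache in turn.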

Given the above, let's calculate how long, in the worst case, it takes a client to notice that a service has gone offline.

  1. The client sends a heartbeat to the server every 30 seconds.
  2. The registry holds the instance information of all registered services and stays in real-time sync with readWriteCacheMap, while readOnlyCacheMap is synchronized from readWriteCacheMap every 30 seconds.
  3. The client synchronizes the registered instance information from readOnlyCacheMap every 30 seconds.
  4. If Ribbon is used for load balancing, it adds another cache layer that is also synchronized every 30 seconds.

If a service goes offline normally, the worst case is 30 + 30 + 30 + 30, about 120 seconds.

If the service goes offline abnormally, detection relies on the eviction task, which runs every 60 seconds and removes instances that have had no heartbeat for more than 90 seconds. In the worst case this may take about three 60-second cycles, i.e. 180 seconds.

Accumulated, the longest possible perception time is 180 + 120 = 300 seconds, a full 5 minutes.
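The worst-case arithmetic above can be expressed as a tiny helper; this is just an illustration of the reasoning, not Eureka code, and worstCase is a made-up name:

```java
// Worst-case offline-detection arithmetic: each hop (heartbeat, read-only
// cache sync, client fetch, Ribbon refresh) can lag by one full interval,
// so the worst case is simply their sum (all values in seconds).
public class EurekaDelay {

    static int worstCase(int heartbeat, int readOnlySync, int clientFetch, int ribbonRefresh) {
        return heartbeat + readOnlySync + clientFetch + ribbonRefresh;
    }

    public static void main(String[] args) {
        // Defaults: every interval is 30 seconds.
        System.out.println("normal offline: " + worstCase(30, 30, 30, 30) + "s");
        // Abnormal offline adds up to ~180 seconds of eviction delay on top.
        System.out.println("abnormal offline: " + (worstCase(30, 30, 30, 30) + 180) + "s");
    }
}
```

With the tuned 3-second intervals suggested below, the same sum drops from 120 seconds to 12 seconds.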

The solution, of course, is to shorten these intervals.

Change the Ribbon cache refresh interval to 3 seconds: ribbon.ServerListRefreshInterval = 3000

Change the client cache sync interval to 3 seconds: eureka.client.registry-fetch-interval-seconds = 3

Change the heartbeat interval to 3 seconds: eureka.instance.lease-renewal-interval-in-seconds = 3

Change the lease expiration timeout to 9 seconds: eureka.instance.lease-expiration-duration-in-seconds = 9

Change the eviction task to run every 5 seconds: eureka.server.eviction-interval-timer-in-ms = 5000

Change the read-only cache sync interval to 3 seconds: eureka.server.response-cache-update-interval-ms = 3000

If we apply these settings, let's recalculate the maximum time it may take to notice that a service has gone offline:

Normal offline is 3 + 3 + 3 + 3 = 12 seconds; abnormal offline adds about 15 seconds, for 27 seconds.

Conclusion

OK, that covers optimizing Spring Boot startup time and offline time. That said, if services are split sensibly and the code is well written, these problems may never become problems in the first place.


艾小仙