Research on the optimization mechanism of Android client network pre-connection

1. Background

We usually send network requests through an encapsulated network framework and pay little attention to the underlying implementation; in most cases no special handling is needed. Thanks to the layering of Internet protocols, we only have to deal with business logic at the application layer. Still, understanding some of the underlying implementation helps us optimize network loading. This article covers the principle and details of speeding up network loading based on the HTTP connection reuse mechanism.

2. Connection reuse

For an ordinary interface request, capture the traffic with Charles and look at the Timing section of the request. The duration breakdown looks similar to the following:

Duration 175 ms
DNS 6 ms
Connect 50 ms
TLS Handshake 75 ms
Request 1 ms
Response 1 ms
Latency 42 ms

Sending the same request a second time gives the following durations:

Duration 39 ms
DNS -
Connect -
TLS Handshake -
Request 0 ms
Response 0 ms
Latency 39 ms

The overall request time dropped from 175 ms to 39 ms. In the second request, DNS, Connect, and TLS Handshake each show a dash, meaning those phases did not happen and took no time, which is why the total duration is so much lower. This is the effect of HTTP(S) connection reuse. So the question is: what is connection reuse, and why does it reduce the request time?
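As a quick check of these numbers, the small sketch below adds up the three setup phases from the first capture. The class and method names are made up for illustration; only the millisecond values come from the Charles output above.

```java
final class TimingBreakdown {
  /** Time spent before the request can be sent: DNS + TCP connect + TLS handshake. */
  static int setupMillis(int dnsMs, int connectMs, int tlsMs) {
    return dnsMs + connectMs + tlsMs;
  }

  /** Fraction of the total duration that went into connection setup. */
  static double setupShare(int setupMs, int totalMs) {
    return (double) setupMs / totalMs;
  }

  public static void main(String[] args) {
    int setup = setupMillis(6, 50, 75);          // values from the first capture
    System.out.println(setup + " ms of setup");  // prints "131 ms of setup"
    System.out.println("share of the 175 ms total: " + setupShare(setup, 175));
  }
}
```

Setup accounts for roughly three quarters of the first request. The remaining 175 - 131 = 44 ms is close to the 39 ms measured when the connection was reused.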

Before answering, let's look at what happens between initiating a network request and receiving the returned data:

1) The client initiates a network request
2) The domain name is resolved through the DNS service to obtain the server IP (DNS resolution is based on UDP by default)
3) A TCP connection is established (3-way handshake)
4) A TLS connection is established (HTTPS only)
5) The request is sent
6) The server receives the request, then constructs and returns the response
7) The TCP connection is closed (4-way teardown)

With connection reuse, steps 2, 3, and 4 above are skipped entirely. How much time does that save? Define one RTT (round-trip time) as the time from sending a message to receiving its reply.

1) DNS is based on UDP by default, and resolution requires at least 1-RTT;

2) Establishing a TCP connection requires a 3-way handshake. The client can send data together with the final ACK, so the handshake adds 1-RTT before the request can go out;

3) Establishing a TLS connection. The cost depends on the TLS version; the commonly used TLS 1.2 requires 2-RTT.

 Client                                               Server

ClientHello                  -------->
                                                ServerHello
                                               Certificate*
                                         ServerKeyExchange*
                                        CertificateRequest*
                             <--------      ServerHelloDone
Certificate*
ClientKeyExchange
CertificateVerify*
[ChangeCipherSpec]
Finished                     -------->
                                         [ChangeCipherSpec]
                             <--------             Finished
Application Data             <------->     Application Data

                   TLS 1.2 handshake flow (from RFC 5246)

Note: TLS 1.3 supports 0-RTT data transmission (optional; a full handshake generally takes 1-RTT), but its support rate is currently relatively low and it is rarely used.

In HTTP/1.0, a new TCP socket connection has to be established for every HTTP request, and the connection is closed once the request completes. Establishing the connection up front can cost an extra 4-RTT, so performance is poor.
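The 4-RTT figure can be reproduced with a small back-of-the-envelope model. The sketch below is only an estimate under stated assumptions: DNS counts as 1 RTT, the TCP handshake as 1 RTT (which is what the 4-RTT total implies), TLS 1.2 as 2 RTT, a full TLS 1.3 handshake as 1 RTT, and every round trip is assumed to take the same time. `SetupCost` and its members are illustrative names, not part of any library.

```java
final class SetupCost {
  /** Round trips added by the TLS handshake (assumed values, see the text above). */
  enum Tls {
    NONE(0), TLS_1_2(2), TLS_1_3(1);

    final int rtts;

    Tls(int rtts) {
      this.rtts = rtts;
    }
  }

  /** Round trips spent before the first request byte can be sent on a new connection. */
  static int setupRtts(boolean needsDnsLookup, Tls tls) {
    int dns = needsDnsLookup ? 1 : 0;
    int tcp = 1; // 3-way handshake: data can follow the final ACK
    return dns + tcp + tls.rtts;
  }

  /** Rough setup time, assuming every round trip costs the same rttMillis. */
  static int setupMillis(int rttMillis, boolean needsDnsLookup, Tls tls) {
    return rttMillis * setupRtts(needsDnsLookup, tls);
  }

  public static void main(String[] args) {
    System.out.println(setupRtts(true, Tls.TLS_1_2));       // 4
    System.out.println(setupMillis(40, true, Tls.TLS_1_2)); // 160
    System.out.println(setupRtts(true, Tls.TLS_1_3));       // 3
  }
}
```

With a round trip of about 40 ms, close to the Latency value in the capture above, the model gives 160 ms. That is the same order of magnitude as the 6 + 50 + 75 = 131 ms of DNS, Connect, and TLS time that Charles reported; a reused connection pays none of it.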

Starting with HTTP/1.1, connections are persistent by default and can be reused. A connection is closed by adding Connection: close to the message header. If several requests run in parallel, multiple connections may still be needed. Multiple requests can also be sent over the same TCP connection without waiting for each response (pipelining), but in that case the server must return the responses in the order the client sent the requests.

Note: HTTP/1.1 connections are reused by default, but an idle persistent connection can be closed by either the client or the server at any time. Not sending Connection: close does not mean the server promises to keep the connection open forever.
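To observe persistent connections without a packet capture tool, the following sketch starts a throwaway local server with the JDK's built-in `com.sun.net.httpserver.HttpServer` and sends sequential GET requests with `HttpURLConnection`. It records the client port the server sees for each request; requests arriving from the same port travelled over the same TCP connection. This is a desktop-JVM illustration rather than the OkHttp setup discussed in this article, and `KeepAliveDemo` is an illustrative name. Reuse is the usual default behaviour, not a guarantee, for the reason given in the note above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class KeepAliveDemo {
  /** Sends sequential GETs to a local server and returns the client port seen for each one. */
  static List<Integer> clientPortsSeenByServer(int requests) {
    List<Integer> ports = new CopyOnWriteArrayList<>();
    HttpServer server;
    try {
      server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); // port 0 = any free port
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    server.createContext("/", exchange -> {
      ports.add(exchange.getRemoteAddress().getPort());
      byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    try {
      URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
      for (int i = 0; i < requests; i++) {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
          in.readAllBytes(); // drain the body so the socket can be kept for the next request
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      server.stop(0);
    }
    return ports;
  }

  public static void main(String[] args) {
    // Equal port numbers mean both requests shared one TCP connection.
    System.out.println(clientPortsSeenByServer(2));
  }
}
```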

HTTP/2 goes a step further: it supports binary framing and multiplexes requests over a single TCP connection. It is no longer necessary to open multiple TCP connections to a server, and multiple requests to the same domain can run in parallel.

Another easily overlooked point is that TCP has congestion control, so a newly established connection goes through a slow start phase: depending on network conditions, the number of packets sent per round trip is increased step by step, growing exponentially at first and then linearly. A reused connection can avoid this slow start and send packets quickly right away.
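The effect of slow start can be illustrated with a deliberately simplified model. The sketch below assumes a congestion window that doubles every round trip until it reaches a threshold and then grows by one segment per round trip, with no packet loss. The initial window of 10 segments and the threshold of 64 are hypothetical values chosen for the example, and real TCP stacks are more involved (for instance, they may shrink the window again after a long idle period).

```java
final class SlowStartModel {
  /**
   * Counts the round trips needed to send the given number of segments when the window
   * starts at initialCwnd, doubles per round trip up to ssthresh, then grows by one.
   * Packet loss is not modelled.
   */
  static int roundTripsToSend(int segments, int initialCwnd, int ssthresh) {
    if (initialCwnd <= 0 || ssthresh <= 0) {
      throw new IllegalArgumentException("window sizes must be positive");
    }
    int cwnd = initialCwnd;
    int sent = 0;
    int rounds = 0;
    while (sent < segments) {
      sent += cwnd; // one full window of segments per round trip
      rounds++;
      cwnd = (cwnd < ssthresh) ? Math.min(cwnd * 2, ssthresh) : cwnd + 1;
    }
    return rounds;
  }

  public static void main(String[] args) {
    int ssthresh = 64; // hypothetical threshold
    System.out.println(roundTripsToSend(100, 10, ssthresh)); // new connection: 4
    System.out.println(roundTripsToSend(100, 64, ssthresh)); // already warmed-up window: 2
  }
}
```

In this toy model, a 100-segment response needs 4 round trips on a fresh connection but only 2 on a connection whose window has already grown.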

3. Pre-connection implementation

Network request frameworks commonly used on the client, such as OkHttp, fully support HTTP/1.1 and HTTP/2 and also support connection reuse. Knowing the benefits of connection reuse, we can put it to work. For example, while the app's splash screen is showing, we can establish connections in advance to the domains used by key pages such as the home page and the details page, so that those pages get their network responses faster and users get a better experience. In theory, this kind of pre-connection helps even more when network conditions are poor.

How to implement

The first idea is simply to send a HEAD request to each domain in advance (a HEAD response has no body, which saves traffic). That establishes the connection early so the next request to the same domain can reuse it directly, and it is simple to implement. So I wrote a demo and tried it on a simple interface. It worked perfectly, and rough statistics showed the first request getting more than 40% faster.
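A minimal sketch of this idea follows. Because a connection belongs to a scheme, host, and port rather than to a full URL, it first reduces the key pages' URLs to distinct origins and then sends one HEAD request per origin. The `Preconnector` class, the `headSender` callback, and the example URLs are all hypothetical. In a real app the callback would issue the HEAD request through the same shared HTTP client (for example the app's single OkHttpClient) that later requests use, because connections are pooled per client.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Consumer;

final class Preconnector {
  /** Reduces URLs to distinct origins of the form scheme://host:port, in first-seen order. */
  static List<String> distinctOrigins(List<String> urls) {
    Set<String> origins = new LinkedHashSet<>();
    for (String url : urls) {
      URI uri = URI.create(url);
      String scheme = uri.getScheme();
      String host = uri.getHost();
      if (scheme == null || host == null) {
        continue; // relative URL or no host: nothing to connect to
      }
      scheme = scheme.toLowerCase(Locale.ROOT);
      int port = uri.getPort() != -1 ? uri.getPort() : ("https".equals(scheme) ? 443 : 80);
      origins.add(scheme + "://" + host.toLowerCase(Locale.ROOT) + ":" + port);
    }
    return new ArrayList<>(origins);
  }

  /** Calls headSender once per distinct origin and returns how many origins were warmed. */
  static int preconnect(List<String> urls, Consumer<String> headSender) {
    List<String> origins = distinctOrigins(urls);
    for (String origin : origins) {
      try {
        headSender.accept(origin + "/");
      } catch (RuntimeException e) {
        // Pre-connection is best effort: a failure here must not break app startup.
      }
    }
    return origins.size();
  }

  public static void main(String[] args) {
    List<String> keyPages = List.of(
        "https://home.example.com/api/feed",
        "https://home.example.com/api/banner",
        "https://detail.example.com/api/game?id=1");
    // Stand-in sender that only logs; a real one would send a HEAD request.
    preconnect(keyPages, url -> System.out.println("HEAD " + url));
  }
}
```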

So the pre-connection logic was added to the launch Activity of the Game Center app. I ran it and tried it, but it didn't work...

Packet capture showed that the connection was not being reused: a new connection was created every time I entered the details page. The pre-connection probably saved only the DNS resolution time, and the result from the demo could not be reproduced. It seems there is no getting around an analysis of OkHttp's connection reuse source code.

4. Source code analysis

OkHttp processes network requests through a chain of default Interceptors, and the connection is established in the ConnectInterceptor class:

public final class ConnectInterceptor implements Interceptor {
  @Override public Response intercept(Chain chain) throws IOException {
    RealInterceptorChain realChain = (RealInterceptorChain) chain;
    Request request = realChain.request();
    StreamAllocation streamAllocation = realChain.streamAllocation();

    // We need the network to satisfy this request. Possibly for validating a conditional GET.
    boolean doExtensiveHealthChecks = !request.method().equals("GET");
    HttpCodec httpCodec = streamAllocation.newStream(client, chain, doExtensiveHealthChecks);
    RealConnection connection = streamAllocation.connection();

    return realChain.proceed(request, streamAllocation, httpCodec, connection);
  }
}

RealConnection is the connection used afterwards. The logic that produces it lives in the StreamAllocation class:

public HttpCodec newStream(
      OkHttpClient client, Interceptor.Chain chain, boolean doExtensiveHealthChecks) {
  ... 
    RealConnection resultConnection = findHealthyConnection(connectTimeout, readTimeout,
        writeTimeout, pingIntervalMillis, connectionRetryEnabled, doExtensiveHealthChecks);
    HttpCodec resultCodec = resultConnection.newCodec(client, chain, this);
  ...
}

private RealConnection findHealthyConnection(int connectTimeout, int readTimeout,
      int writeTimeout, int pingIntervalMillis, boolean connectionRetryEnabled,
      boolean doExtensiveHealthChecks) throws IOException {
    while (true) {
      RealConnection candidate = findConnection(connectTimeout, readTimeout, writeTimeout,
          pingIntervalMillis, connectionRetryEnabled);
    ...
      return candidate;
    }
}
  
  /**
   * Returns a connection to host a new stream. This prefers the existing connection if it exists,
   * then the pool, finally building a new connection.
   */
  private RealConnection findConnection(int connectTimeout, int readTimeout, int writeTimeout,
      int pingIntervalMillis, boolean connectionRetryEnabled) throws IOException {
    ...
    
    // Try to get a usable connection from the connectionPool
    Internal.instance.acquire(connectionPool, address, this, null);
    if (connection != null) {
      foundPooledConnection = true;
      result = connection;
    } else {
      selectedRoute = route;
    }
    
   ...
   
    if (!foundPooledConnection) {
      ... 
      // If no reusable connection was found in the end, create a new one
      result = new RealConnection(connectionPool, selectedRoute);
    }
  ...
}

This source code is from OkHttp 3.13; the logic was changed in version 3.14 and later.

In the StreamAllocation class, the connection is ultimately obtained in the findConnection method. An existing connection is reused first, and a new one is created only when none is available. The lookup for a reusable connection is in the ConnectionPool class:

/**
 * Manages reuse of HTTP and HTTP/2 connections for reduced network latency. HTTP requests that
 * share the same {@link Address} may share a {@link Connection}. This class implements the policy
 * of which connections to keep open for future use.
 */
public final class ConnectionPool {

  private final Runnable cleanupRunnable = () -> {
    while (true) {
      long waitNanos = cleanup(System.nanoTime());
      if (waitNanos == -1) return;
      if (waitNanos > 0) {
        long waitMillis = waitNanos / 1000000L;
        waitNanos -= (waitMillis * 1000000L);
        synchronized (ConnectionPool.this) {
          try {
            ConnectionPool.this.wait(waitMillis, (int) waitNanos);
          } catch (InterruptedException ignored) {
          }
        }
      }
    }
  };

  // A deque that holds the current connections
  private final Deque<RealConnection> connections = new ArrayDeque<>();
  
  
  /**
   * Create a new connection pool with tuning parameters appropriate for a single-user application.
   * The tuning parameters in this pool are subject to change in future OkHttp releases. Currently
   * this pool holds up to 5 idle connections which will be evicted after 5 minutes of inactivity.
   */
  public ConnectionPool() {
    this(5, 5, TimeUnit.MINUTES);
  }

  public ConnectionPool(int maxIdleConnections, long keepAliveDuration, TimeUnit timeUnit) {
  ...
  }
  
  void acquire(Address address, StreamAllocation streamAllocation, @Nullable Route route) {
    assert (Thread.holdsLock(this));
    for (RealConnection connection : connections) {
      if (connection.isEligible(address, route)) {
        streamAllocation.acquire(connection, true);
        return;
      }
    }
  }

  ...
}

As the source code above shows, ConnectionPool keeps at most 5 idle connections by default, and each idle connection is released automatically after 5 minutes of inactivity. If the number of idle connections exceeds 5, the connection that has been idle the longest is removed.

The final check of whether an idle connection matches is in the isEligible method of RealConnection:

/**
   * Returns true if this connection can carry a stream allocation to {@code address}. If non-null
   * {@code route} is the resolved route for a connection.
   */
  public boolean isEligible(Address address, @Nullable Route route) {
    // If this connection is not accepting new streams, we're done.
    if (allocations.size() >= allocationLimit || noNewStreams) return false;

    // If the non-host fields of the address don't overlap, we're done.
    if (!Internal.instance.equalsNonHost(this.route.address(), address)) return false;

    // If the host exactly matches, we're done: this connection can carry the address.
    if (address.url().host().equals(this.route().address().url().host())) {
      return true; // This connection is a perfect match.
    }

    // At this point we don't have a hostname match. But we still be able to carry the request if
    // our connection coalescing requirements are met. See also:
    // https://hpbn.co/optimizing-application-delivery/#eliminate-domain-sharding
    // https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/

    // 1. This connection must be HTTP/2.
    if (http2Connection == null) return false;

    // 2. The routes must share an IP address. This requires us to have a DNS address for both
    // hosts, which only happens after route planning. We can't coalesce connections that use a
    // proxy, since proxies don't tell us the origin server's IP address.
    if (route == null) return false;
    if (route.proxy().type() != Proxy.Type.DIRECT) return false;
    if (this.route.proxy().type() != Proxy.Type.DIRECT) return false;
    if (!this.route.socketAddress().equals(route.socketAddress())) return false;

    // 3. This connection's server certificate's must cover the new host.
    if (route.address().hostnameVerifier() != OkHostnameVerifier.INSTANCE) return false;
    if (!supportsUrl(address.url())) return false;

    // 4. Certificate pinning must match the host.
    try {
      address.certificatePinner().check(address.url().host(), handshake().peerCertificates());
    } catch (SSLPeerUnverifiedException e) {
      return false;
    }

    return true; // The caller's address can be carried by this connection.
  }

This code is fairly straightforward. Briefly, the conditions are:

If the connection has reached the maximum number of streams it can carry (that is, how many requests one connection can carry at once: 1 by default for HTTP/1, and the maximum value of Int by default for HTTP/2), it is not eligible;
If any attribute of the two Addresses other than the host differs, it is not eligible (for example, the two requests use different OkHttpClient instances with important attributes overridden, or the server port differs);
If the hosts are the same, it is eligible and true is returned directly (the other fields were already compared in the previous step);
Otherwise, for HTTP/2 only, it is eligible if neither route uses a proxy, the server IP is the same, and the certificate covers the new host and satisfies certificate pinning.

Taken together, the problem is most likely that ConnectionPool's capacity is too small. The Game Center's business logic is complex: entering the home page triggers many interface requests, which fill the connection pool, so the connections pre-established on the launch page get evicted. Debugging confirmed it: on entering the details page, the pre-established connection is indeed no longer in the ConnectionPool.
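The eviction can be reproduced with a toy model of an idle-connection pool. The sketch below is not OkHttp's implementation: it assumes one connection per host, ignores the keep-alive timeout, and simply drops the connection that has been idle longest once the limit is exceeded. `IdlePoolModel` and the host names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IdlePoolModel {
  private final int maxIdle;
  // Hosts with an idle connection; the head of the deque has been idle the longest.
  private final Deque<String> idleHosts = new ArrayDeque<>();

  IdlePoolModel(int maxIdle) {
    this.maxIdle = maxIdle;
  }

  /** A request to the host has finished and its connection is now idle. */
  void connectionBecameIdle(String host) {
    idleHosts.remove(host);      // a reused connection moves to the most recent position
    idleHosts.addLast(host);
    while (idleHosts.size() > maxIdle) {
      idleHosts.removeFirst();   // evict the connection that has been idle the longest
    }
  }

  boolean hasIdleConnection(String host) {
    return idleHosts.contains(host);
  }

  /** Pre-connects the details host, then lets the home page use homePageHosts other hosts. */
  static boolean preconnectSurvives(int maxIdle, int homePageHosts) {
    IdlePoolModel pool = new IdlePoolModel(maxIdle);
    pool.connectionBecameIdle("detail.example.com");
    for (int i = 0; i < homePageHosts; i++) {
      pool.connectionBecameIdle("home" + i + ".example.com");
    }
    return pool.hasIdleConnection("detail.example.com");
  }

  public static void main(String[] args) {
    System.out.println(preconnectSurvives(5, 5));  // false: the pre-connection was evicted
    System.out.println(preconnectSurvives(50, 5)); // true: a larger pool keeps it
  }
}
```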

5. Optimization

In HTTP/1.1, browsers generally limit each domain to roughly 5 idle connections. OkHttp's connection pool, however, does not distinguish between domains; it only caps the total at 5 idle connections by default. If the app's functional modules use several domains, the default of 5 idle connections is clearly not enough. There are two ways to change this:

1) Rewrite ConnectionPool so that the limit applies per domain, which would solve the problem cleanly. However, OkHttp's ConnectionPool is a final class, so its logic cannot be overridden directly, and the ConnectionPool logic also differs between OkHttp versions. Rewriting it at build time with a bytecode manipulation tool such as ASM would be costly and risky.
2) Directly increase the connection pool's size and timeout. This is simple and effective: raise the pool's maximum according to your business needs, and pass the custom ConnectionPool object in when building the OkHttpClient.

We directly selected option 2.
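Option 2 might look like the fragment below. It needs the OkHttp library on the classpath and is only a configuration sketch: the limit of 50 is the value used in the test described in this article, the 5-minute keep-alive is OkHttp's default, and `HttpClients` is a hypothetical holder class. `connectionCount()` and `idleConnectionCount()` are the pool's public counters and can be logged while browsing the app to tune the limit.

```java
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

final class HttpClients {
  // Example values only; tune them from your own measurements.
  private static final int MAX_IDLE_CONNECTIONS = 50;
  private static final long KEEP_ALIVE_MINUTES = 5;

  static final ConnectionPool POOL =
      new ConnectionPool(MAX_IDLE_CONNECTIONS, KEEP_ALIVE_MINUTES, TimeUnit.MINUTES);

  // Share this single client across the app so every request uses the same pool.
  static final OkHttpClient CLIENT = new OkHttpClient.Builder()
      .connectionPool(POOL)
      .build();

  /** Snapshot of the pool, useful for checking how many connections a page leaves behind. */
  static String poolStats() {
    return "connections=" + POOL.connectionCount() + ", idle=" + POOL.idleConnectionCount();
  }

  private HttpClients() {}
}
```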

6. Questions and Answers

1. How do we determine the maximum size of the connection pool?

Two numbers serve as a reference: the maximum number of simultaneous requests on a page, and the total number of domains used in the app. You can also simply set a large value, then open the app, visit each main page, check how many connections remain in the ConnectionPool, and adjust accordingly.

2. Will enlarging the connection pool cause excessive memory usage?

Test result: with the connectionPool maximum set to 50, one page used 13 domain links, each repeated 4 times, for 52 requests issued at once. Afterwards, an average of 22.5 idle connections remained in the ConnectionPool, occupying 97 KB of memory, so each additional connection in the ConnectionPool takes about 4.3 KB on average.
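The per-connection figure follows directly from the two measured numbers. The sketch below redoes that division and then extrapolates linearly to a completely full pool of 50 idle connections; the extrapolation is an assumption for illustration, not a measured result, and `PoolMemoryEstimate` is an illustrative name.

```java
final class PoolMemoryEstimate {
  /** Average memory per idle connection, from a measured total. */
  static double kbPerConnection(double totalKb, double idleConnections) {
    return totalKb / idleConnections;
  }

  /** Linear estimate for a pool holding the given number of idle connections. */
  static double poolKb(double kbPerConnection, int connections) {
    return kbPerConnection * connections;
  }

  public static void main(String[] args) {
    double perConnection = kbPerConnection(97, 22.5); // measured: 97 KB for 22.5 connections
    System.out.println(perConnection);                // about 4.3
    System.out.println(poolKb(perConnection, 50));    // about 216
  }
}
```

Under that linear assumption, even a full pool of 50 idle connections would hold only around 216 KB.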

3. Will the server be affected if the connection pool is increased?

In theory, no. A connection has two ends: even if the client keeps it open, the server will close connections on its own according to how many there are and how long they have been open. For example, nginx, which is commonly used on the server side, lets you configure the maximum number of connections it keeps and closes old connections when their timeout expires. Therefore, if the server's maximum connection count and timeout are relatively small, our pre-connection may be useless because the server has already closed the connection.

With Charles you can see the effect of a connection being closed by the server: the reuse information appears under Session Resumed in the TLS category.

In this case the client re-establishes the connection, and the TCP and TLS connection durations show up again.

4. Will pre-connection cause excessive server pressure?

Entering the launch page now triggers network requests for pre-connection, so the number of interface requests goes up and the server will certainly be affected. Whether to pre-connect should be decided according to your own business and server load.

5. How to maximize the pre-connection effect?

As the third question above shows, the effect depends heavily on the server configuration, so this becomes a matter of server tuning.

If the server sets a short connection timeout, or disables keep-alive, a connection may have to be re-established for every request. Under high concurrency, the server then spends a lot of resources continually creating and destroying TCP connections, which is wasteful.

If the server sets a long connection timeout, connections stay unreleased for a long time, which limits how many clients the server can serve concurrently. Once the maximum number of connections is exceeded, new requests may fail.

The timeout can be tuned according to the average time before users reach the pre-connected interface. For example, if the Game Center details page interface is pre-connected, measure how long users browse the home page on average before entering the details page, then adjust the timeout based on that duration and the server load.

Author: vivo internet client team-Cao Junlin
