
1. Introduction

Our goal is to give users a smoother app experience. Dewu is an e-commerce app whose community and transaction businesses rely heavily on static resources such as images, videos, and files. Because these static resources are deployed on a CDN, this article refers to them collectively as "CDN resources".

Deploying to a CDN service improves the request performance of CDN resources, but it is only a first step. Users still ran into slow resource loading and stutters in the app, which fell short of expectations and called for further optimization. In the second half of 2021 we therefore focused on optimizing the network request performance of CDN resources and achieved clear results. This article is an interim summary of that work.

2. Content introduction

This article describes the ideas and practical experience behind reducing the average time of Dewu's CDN resource requests by 18%+ on iOS and 10%+ on Android. The work followed an iterative loop: analyze monitoring data, research an optimization direction, design the optimization scheme, roll it out, and measure the effect. Four optimization directions are covered: CDN deployment adjustment, TLS1.3 upgrade, enabling OCSP Stapling, and Http2.0 upgrade. The sections below explain how each direction was identified, the optimization scheme, and the optimization effect.

3. Optimization case

3.1 CDN Deployment Adjustment

3.1.1 What is CDN service?

Some readers may not be familiar with CDNs, so here is a brief introduction to the concept and how it works. A CDN (Content Delivery Network) is a distributed network of edge server nodes located in different regions. It caches the origin site's static resources and serves them to users from a nearby node, which reduces the time static resource requests take and relieves the load on the origin servers.

Advantages of CDN services: nearby access, resource caching, reduced back-to-origin traffic, intelligent routing, efficient transmission, intelligent compression, security, and high performance (excerpted from the Alibaba Cloud CDN documentation). Put simply, there are three key features: nearby access, resource caching, and efficient back-to-origin fetching.

After static resources are deployed to the CDN service, requests go to the nearest CDN edge node instead of the origin site, which greatly shortens the network path and reduces the time CDN resource requests take. For example, Guangdong users originally had to reach the Hangzhou origin site to request CDN resources, but now they can fetch them directly from CDN edge nodes in Guangdong Province.

If the CDN edge node has the requested CDN resource in its cache, it returns the cached copy to the client directly. If not, the CDN service fetches the resource from the origin efficiently using its internal strategy, caches it on the edge node, and then returns it to the client.

The following figure can intuitively show the changes in user requests for static resources before and after deployment to the CDN service

(Note: The picture comes from the Internet)

3.1.2 Analyzing CDN resource request time by region

Having covered how a CDN works, we know that its effect is strongly tied to region. Whether CDN server nodes are deployed in a user's region, and how many nodes are there to carry traffic, directly affect how long CDN resource requests take for users in that region. We therefore analyzed the CDN resource request monitoring data by region, from the perspective of CDN deployment.

We focused on provinces with an average request time above 500ms. One of them was Guangdong, a first-tier province whose network infrastructure should be relatively complete. Why was its average request time above 500ms?
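The regional analysis described above can be sketched as a simple aggregation. This is only an illustration, not the monitoring platform's actual implementation: the sample records, the `(region, duration_ms)` layout, and the function name `slow_regions` are hypothetical, and only the 500ms threshold comes from the text.

```python
from collections import defaultdict

def slow_regions(samples, threshold_ms=500):
    """Return {region: average_ms} for regions whose average exceeds threshold_ms.

    `samples` is an iterable of (region, duration_ms) pairs. This shape is an
    assumption made for the sketch, not the real reporting format.
    """
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum of durations, count]
    for region, duration_ms in samples:
        totals[region][0] += duration_ms
        totals[region][1] += 1
    averages = {region: total / count for region, (total, count) in totals.items()}
    return {region: avg for region, avg in averages.items() if avg > threshold_ms}

# Hypothetical sampled request durations, for illustration only.
samples = [
    ("Guangdong", 620), ("Guangdong", 540),
    ("Zhejiang", 210), ("Zhejiang", 250),
]
flagged = slow_regions(samples)
print(flagged)
```

Grouping by region first and filtering afterwards keeps the threshold a tunable parameter rather than something baked into the aggregation.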

Preliminary hypothesis: the CDN server node deployment may be unreasonable. If no nodes, or too few nodes, are deployed in Guangdong Province, Guangdong users cannot reach the CDN service nearby, and cross-region access makes CDN resource requests slow.

3.1.3 Research on Alibaba Cloud CDN Deployment

We fed this speculation back to Alibaba Cloud customer service to verify the distribution of CDN edge nodes serving Dewu in various provinces and the deployment of CDN edge nodes in Guangdong Province.

Investigation results: no nodes were deployed in Guangdong or Beijing, and relatively few nodes were deployed in Hunan, Sichuan, Jiangsu, and Jilin.

Survey conclusion: users in these regions were requesting CDN resources across regions, which resulted in a long average request time.

3.1.4 CDN Deployment Optimization Solution

Once the cause was located, the solution was relatively simple. We adjusted the CDN edge node deployment as follows:

1. Added Guangdong and Beijing provincial nodes;

2. Added Hunan, Sichuan, Jilin, and Jiangsu provincial nodes to replace redundant nodes in other provinces.

3.1.5 CDN deployment optimization effect

Data for the CDN domain name cdn.poizon.com before and after optimization:

Average time

  1. iOS: 429ms -> 331ms, down 98ms
  2. Android: 386ms -> 348ms, down 38ms

3.2 TLS1.3 upgrade

A complete HTTPS network request includes 7 stages: request preparation, DNS resolution, TCP connection establishment, SSL handshake, Request stage, server processing stage, and Response stage.

To analyze the performance of CDN resource requests in more detail, the App samples and reports the time spent in each stage of each request, and the CDN network monitoring platform aggregates these reports into quantified performance indicators per network request stage. We therefore analyzed the monitoring data stage by stage to see which stages could be optimized.

3.2.1 Analyzing CDN resource request time by network request stage

The time spent in each network request stage is as follows:

As can be seen from the figure, the SSL stage takes 127ms+ on average, accounting for more than 25% of the cumulative time. Because the SSL stage has such a large impact on the overall request time, we considered optimizing its performance.
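The stage-level reasoning above amounts to computing each stage's share of the cumulative time and picking out the dominant ones. The sketch below assumes a hypothetical per-stage breakdown: only the 127ms SSL figure and the 25% cut-off come from the text, and every other number and name is made up for illustration.

```python
def stage_shares(stage_ms):
    """Map each stage to its fraction of the cumulative request time."""
    total = sum(stage_ms.values())
    return {stage: ms / total for stage, ms in stage_ms.items()}

def dominant_stages(stage_ms, min_share=0.25):
    """Stages whose share of the total is at least min_share, largest first."""
    shares = stage_shares(stage_ms)
    hits = [(stage, share) for stage, share in shares.items() if share >= min_share]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical averages in milliseconds; only the SSL value is from the article.
stage_ms = {
    "preparation": 5, "dns": 40, "tcp": 60, "ssl": 127,
    "request": 10, "server": 80, "response": 100,
}
for stage, share in dominant_stages(stage_ms):
    print(f"{stage}: {share:.1%}")
```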

For HTTPS requests, the SSL phase mainly performs key negotiation so that data can be transmitted encrypted. At the time, Dewu CDN resource requests used the TLS1.2 protocol, while the newer TLS1.3 protocol was already relatively mature. Could upgrading to TLS1.3 improve the performance of the SSL stage?

3.2.2 TLS1.3 Protocol Research

What are the advantages of the TLS1.3 protocol, and how does it differ from TLS1.2?

TLS (Transport Layer Security) is a protocol for securing data in transit. The TLS1.2 protocol was released in 2008 and the TLS1.3 protocol was released in 2018.

The TLS1.2 protocol is currently the most widely used, and some shortcomings have been found in the more than 13 years since its release in 2008:

1. Poor performance: 2 RTTs are required for the handshake process;

2. Low security: it permits insecure algorithms and modes, such as SHA-1, RC4, and CBC-mode ciphers.

The TLS1.3 protocol is based on TLS1.2 and has undergone a number of optimizations, including:

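1. Performance optimization: the key exchange is redesigned so that the client sends its key share in the first handshake message, and a full handshake needs only 1 RTT, half as many round trips as TLS1.2. A PSK (pre-shared key) mechanism is also introduced for session resumption;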

2. Security improvement: Many insecure and old encryption algorithms in the TLS1.2 protocol are abandoned, DSA certificates are no longer used, and the handshake information after ServerHello is encrypted.
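The round-trip counts listed above translate into a rough back-of-the-envelope estimate. The sketch assumes that handshake time is dominated by network round trips and ignores cryptographic computation and certificate checks. The 80ms RTT and the function name are hypothetical.

```python
# Round trips for a full handshake, as stated in the comparison above.
HANDSHAKE_ROUND_TRIPS = {"TLS1.2": 2, "TLS1.3": 1}

def handshake_ms(version, rtt_ms):
    """Estimated handshake time if only network round trips mattered."""
    return HANDSHAKE_ROUND_TRIPS[version] * rtt_ms

rtt_ms = 80  # hypothetical client-to-edge round-trip time
saving = 1 - handshake_ms("TLS1.3", rtt_ms) / handshake_ms("TLS1.2", rtt_ms)
print(f"estimated handshake saving: {saving:.0%}")
```

Real handshakes also spend time on computation and certificate validation, so measured savings can differ from this estimate.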

Survey of TLS1.3 usage by peer companies

We checked which TLS protocol peer companies' apps use for CDN resource requests. They already use TLS1.3, which confirms that the TLS1.3 protocol is feasible.

One way to check: load a peer company's image in the Chrome browser, and the Security panel of the DevTools window shows that the TLS1.3 protocol is in use.

TLS1.3 compatibility survey on both client platforms

Mainstream device models on both platforms already support TLS1.3. Models that do not support TLS1.3 are still covered, because the Alibaba Cloud CDN service automatically negotiates a matching TLS version based on the version the client uses.

Offline test: using a Debug package in the offline environment, we requested CDN resources over both TLS1.3 and TLS1.2, and all requests succeeded.
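A similar check can be scripted outside the app. The sketch below uses Python's standard `ssl` module, not the app's networking stack, so it only shows how a client can report the negotiated version or insist on TLS1.3. The helper names are hypothetical, and any host passed in (such as the placeholder cdn.xxx.com used later in this article) is the caller's choice.

```python
import socket
import ssl

def tls13_only_context():
    """Client context that refuses anything older than TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context

def negotiated_tls_version(host, port=443, context=None, timeout=5.0):
    """Connect to host and return the negotiated protocol, e.g. 'TLSv1.3'."""
    context = context or ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()

# Example call (needs network access, so it is left commented out):
# print(negotiated_tls_version("cdn.xxx.com"))
```

Passing `tls13_only_context()` as `context` makes the connection fail outright when the server cannot speak TLS1.3, which is a stricter check than reading the negotiated version.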

3.2.3 TLS1.3 upgrade optimization scheme

Compared with TLS1.2, TLS1.3 can reduce the time spent in the SSL stage by 50%+, so we optimized the SSL stage of CDN resource requests by upgrading to the TLS1.3 protocol. For the detailed optimization scheme, please refer to the earlier article "Dewu Network Optimization - TLS1.3 Upgrade Best Practices": https://mp.weixin.qq.com/s/C0dfQ52bWNOAWLkSC1f-4w

3.2.4 TLS1.3 upgrade optimization effect

After the CDN domain name cdn.poizon.com was upgraded to TLS1.3, both the average request time and the SSL phase time dropped significantly on both platforms. The data before and after optimization:

average time

  1. iOS: 281ms -> 237ms, down 44ms
  2. Android: 307ms -> 269ms, down 38ms

SSL phase time

  1. iOS: 210ms -> 137ms, down 73ms
  2. Android: 83ms -> 71ms, down 12ms

3.3 Enable OCSP Stapling

After the TLS1.3 upgrade, the performance of CDN resource requests improved considerably, but the SSL stage on both platforms still had room for optimization, so we kept investigating. We found that during the SSL certificate exchange, the OCSP protocol is used to verify the validity of the SSL certificate. So what is the OCSP protocol, and does it affect the performance of the SSL stage?

3.3.1 What is the OCSP protocol?

OCSP (Online Certificate Status Protocol) is used to verify the validity of an SSL certificate, that is, to confirm that the certificate has not been revoked. The CA server provides an interface for querying certificate status online. During the SSL phase, the client can send a certificate status query to the CA server in real time, and the CA server replies with the certificate status (such as "good", "revoked", or "unknown").

Because the client blocks until the query result arrives, the duration of the OCSP query adds to the time spent in the SSL phase.

A request process for executing the OCSP protocol is as follows

As can be seen from the figure, the client has to perform an extra OCSP query. The OCSP Stapling protocol was introduced to remove this performance cost and is described next.

3.3.2 What is the OCSP Stapling protocol? What optimizations have been made compared to OCSP?

OCSP Stapling moves the certificate status query from the client to the server. The server runs the OCSP protocol at a low frequency to query the CA server for the certificate status and caches the result. When a client reaches the SSL stage of a request, the server returns the cached certificate status to the client as part of the handshake.

A request process to execute the OCSP Stapling protocol is as follows

As can be seen from the figure, after executing the OCSP Stapling protocol, the client can save an OCSP query process.
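The server-side behaviour described above (query the CA rarely, cache the result, attach it to every handshake) can be modelled with a small cache. This is a toy model under stated assumptions, not how Alibaba Cloud CDN or any TLS library implements stapling: the class name, the one-hour TTL, the fake CA, and the fake clock are all hypothetical.

```python
import time

class StapledOcspCache:
    """Toy model of the server side of OCSP Stapling."""

    def __init__(self, fetch_from_ca, ttl_seconds, clock=time.monotonic):
        self._fetch_from_ca = fetch_from_ca  # callable that queries the CA's OCSP service
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = None
        self._fetched_at = None

    def response_for_handshake(self):
        """Return the cached status, refreshing from the CA only when it is stale."""
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = self._fetch_from_ca()
            self._fetched_at = now
        return self._cached

# Demo with a fake CA and a fake clock (both hypothetical).
ca_queries = []

def fake_ca():
    ca_queries.append(1)
    return "good"

fake_now = [0.0]
cache = StapledOcspCache(fake_ca, ttl_seconds=3600, clock=lambda: fake_now[0])
statuses = [cache.response_for_handshake() for _ in range(1000)]
print(f"{len(statuses)} handshakes served, CA queried {len(ca_queries)} time(s)")
```

The point of the model is that the cost of the CA query is paid by the server once per refresh interval instead of by every client during its handshake.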

3.3.3 Research on enabling OCSP Stapling

Survey of OCSP Stapling usage by peer companies

We checked whether peer companies use OCSP Stapling for CDN resource requests. Their CDN services already have OCSP Stapling enabled, which confirms that OCSP Stapling is feasible.

How to verify whether OCSP Stapling is in effect for a peer company's CDN service:

Step 1: Use the dig command to look up the IP of the peer company's CDN domain name (e.g. cdn.xxx.com)

dig cdn.xxx.com

Step 2: Use the openssl command to view the OCSP Stapling status of that CDN domain name

openssl s_client -connect <IP from step 1>:443 -servername cdn.xxx.com -status

Result 1: OCSP Stapling in effect, as shown in the following figure:

The output contains OCSP Response Status: successful (0x0), which means OCSP Stapling has taken effect

Result 2: OCSP Stapling not in effect, as shown in the following figure:

The output contains OCSP response: no response sent, which means OCSP Stapling has not taken effect
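When many domains need checking, the two results above can be told apart by a small parser over the captured openssl output. This is a minimal sketch: the function name is hypothetical, the text is assumed to have been captured from the command in step 2, and the matching relies only on the two marker strings quoted above.

```python
def ocsp_stapling_enabled(s_client_output):
    """Classify captured `openssl s_client ... -status` output.

    Returns True if a successful stapled OCSP response is present, False if
    the server sent none, and None if neither marker is found.
    """
    text = s_client_output.lower()
    if "ocsp response status: successful" in text:
        return True
    if "ocsp response: no response sent" in text:
        return False
    return None

# Shortened, hypothetical excerpts of captured output.
print(ocsp_stapling_enabled("OCSP Response Status: successful (0x0)"))
print(ocsp_stapling_enabled("OCSP response: no response sent"))
```

Matching case-insensitively keeps the check tolerant of capitalization differences between the two messages.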

OCSP Stapling compatibility survey on the client side

iOS: The system supports OCSP Stapling by default

Android: OCSP process is not supported yet

Alibaba Cloud CDN compatibility:

We confirmed with Alibaba Cloud customer service that the Alibaba Cloud CDN service supports the OCSP Stapling function. If the client system supports OCSP Stapling, it takes effect once enabled; if the client system does not support it, requests still proceed normally using the regular OCSP method.

Offline test: using a Debug package in the offline environment, we requested CDN resources with OCSP Stapling enabled and with it disabled, and all requests succeeded. The tests mainly covered:

1. Repeated verification: including repeated enabling/disabling, cold/warm starts of the app, switching the app between foreground and background, etc.;

2. Regression of the main business flows: covering core pages such as the community homepage, community details page, video, transaction homepage, product details page, and order details page;

3. Compatibility test: covering the latest version of the App, mainstream models, mainstream systems, etc.

3.3.4 Optimization scheme for enabling OCSP Stapling

Enabling the OCSP Stapling function in the Alibaba Cloud CDN console further reduces the time spent in the SSL phase.

Change execution: enable OCSP Stapling in the Alibaba Cloud CDN console at 2:00 a.m.

Verification scheme: as described above, use the openssl command to check whether OCSP Stapling has taken effect for the CDN domain name cdn.poizon.com

Monitoring plan:

1. Network monitoring platform: monitor whether there are abnormal fluctuations in indicators such as request exception rate and request time;

2. Alibaba Cloud CDN-real-time monitoring: Whether there are abnormal fluctuations in the minute-level 2xx, 3xx, 4xx, and 5xx status codes.
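The second monitoring item above could be automated along these lines. This is a hedged sketch only: the per-minute count dictionaries, the function names, and the 1% tolerance are all hypothetical, and it does not reflect the Alibaba Cloud console's data format or API.

```python
def error_ratio(status_counts):
    """Fraction of responses in one minute that are 4xx or 5xx."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    errors = status_counts.get("4xx", 0) + status_counts.get("5xx", 0)
    return errors / total

def abnormal_minutes(baseline, after_change, tolerance=0.01):
    """Indexes of post-change minutes whose error ratio exceeds the
    baseline average by more than `tolerance`."""
    baseline_avg = sum(error_ratio(minute) for minute in baseline) / len(baseline)
    return [index for index, minute in enumerate(after_change)
            if error_ratio(minute) - baseline_avg > tolerance]

# Hypothetical minute-level status code counts.
normal = {"2xx": 990, "3xx": 5, "4xx": 4, "5xx": 1}
spike = {"2xx": 900, "3xx": 5, "4xx": 5, "5xx": 90}
print(abnormal_minutes([normal, normal, normal], [normal, spike]))
```

Comparing against a pre-change baseline rather than a fixed limit keeps the check meaningful for domains whose normal error ratio is not zero.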

3.3.5 Effect of enabling OCSP Stapling

After OCSP Stapling was enabled on Alibaba Cloud for the CDN domain name cdn.poizon.com, request time on iOS dropped significantly, while Android changed little (Android does not support the OCSP process for now, so OCSP Stapling has no effect on Android for the time being). Data before and after optimization:

Average time (connection only)

iOS: 565ms -> 484ms, down 81ms

Android: 401ms -> 360ms, down 41ms

SSL phase time

iOS: 97ms -> 87ms, down 10ms

Android: 79ms -> 70ms, down 9ms

3.4 HTTP2.0 upgrade

3.4.1 Analyzing monitoring data by Http protocol version

The single-day monitoring data for the CDN domain name cdn.poizon.com on 2021.11.30 shows that both platforms send requests over two protocol versions, Http2.0 and Http1.1. We therefore analyzed the monitoring data by Http version.

Share of Http2.0 and Http1.1 on each platform

iOS: Http2.0 accounts for 51.98%, Http1.1 accounts for 48.01%

Android: Http2.0 accounts for 76.46%, Http1.1 accounts for 23.53%

Http1.1 still accounts for more than 20% of traffic on both platforms.

TCP multiplexing rate of Http2.0 and Http1.1 on each platform

iOS: Http2.0 TCP reuse rate of 9217%, Http1.1 TCP reuse rate of 396%

Android: Http2.0 TCP reuse rate of 2397%, Http1.1 TCP reuse rate of 681%

The TCP multiplexing rate of Http2.0 is several times that of Http1.1 on both platforms: about 3.5 times on Android and about 23 times on iOS, more than an order of magnitude.

Note: TCP multiplexing rate = TCP connection multiplexing times / TCP connection establishment successful times

Each new TCP connection has to go through DNS resolution, the TCP three-way handshake, and the SSL handshake, all of which take time, whereas a request on a reused connection skips these stages and takes less time. In other words, the higher the TCP multiplexing rate, the shorter the average time of CDN resource requests.

At this point many readers may wonder why the TCP multiplexing rate of Http2.0 is so high. A brief introduction to the Http2.0 protocol follows.
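The formula in the note above, and the reason a higher rate lowers the average request time, can be illustrated with a little arithmetic. The amortization model is an assumption made for this sketch (each connection carries its first request plus the reused ones), the 200ms setup cost is hypothetical, and the two rates are the iOS figures quoted earlier.

```python
def tcp_reuse_rate(reuse_count, established_count):
    """TCP multiplexing rate as defined in the note above (1.0 means 100%)."""
    return reuse_count / established_count

def amortized_setup_ms(setup_ms, reuse_rate):
    """Average connection-setup cost per request.

    Assumes every connection carries one initial request plus `reuse_rate`
    reused requests, so the setup cost is spread over (1 + reuse_rate) requests.
    """
    return setup_ms / (1 + reuse_rate)

setup_ms = 200  # hypothetical cost of DNS + TCP + SSL for one new connection
for label, rate in [("Http1.1", 3.96), ("Http2.0", 92.17)]:  # iOS rates from the text
    print(label, round(amortized_setup_ms(setup_ms, rate), 1), "ms per request")
```

Under this model the per-request share of connection setup shrinks quickly as the rate grows, which matches the direction of the conclusion drawn in the text.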

3.4.2 What is the Http2.0 protocol

What are the advantages of Http2.0, and how does it differ from Http1.1?

The Http1.1 protocol is still the most widely used Http protocol, but its pain points are well known, including:

1. Request congestion: each concurrent request needs its own TCP connection, and clients cap the number of concurrent connections (typically 6 per host), so requests beyond the cap queue and wait;

2. Header redundancy: header compression is not supported, and headers are retransmitted in full with every request, which wastes bandwidth and hurts performance;

3. One-way transmission: only the client can initiate sends to the server; the server cannot actively send to the client;

4. Plaintext transmission: plain Http requests are supported, which transmit data in plaintext and pose security risks;

5. Limited priority: requests on the same TCP connection can only be transmitted serially, and high-priority requests cannot be sent first, which hurts performance.

The Http2.0 protocol extends and optimizes the Http1.1 protocol. It introduces the concept of a frame, adds a binary framing layer between the application layer and the transport layer, and brings several new features:

1. Multiplexing: addresses Http1.1 request congestion. Multiple concurrent requests can be transmitted over the same TCP connection. Each request is split into frames, and each frame header records which request the frame belongs to; after the frames are transmitted in binary form, the peer reassembles the request data according to the frame header information;

2. Header compression: addresses Http1.1 header redundancy. On the one hand, the HPACK algorithm compresses headers to reduce the amount of data transmitted; on the other hand, both ends maintain a header table, so a header that has already been sent only needs to carry its corresponding key, which avoids repeated transmission;

3. Server push: addresses Http1.1 one-way transmission by allowing the server to actively push data to the client. For example, when an html file is requested, the server can proactively push the related css and js files, which avoids extra client requests and reduces the overall RT;

4. Binary transmission: addresses Http1.1 plaintext transmission. Data is encapsulated into frames, and the frames are transmitted in binary form;

5. Priority support: addresses the limited priority of Http1.1. A new stream sets its priority in the HEADERS frame, an existing stream can change its priority with a PRIORITY frame, and when resources are limited, streams are selected for transmission according to priority.
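The multiplexing feature in the list above is the one behind the high TCP multiplexing rate, and its core idea fits in a few lines. This is a toy model, not the real Http2.0 wire format (real frames carry a binary header with length, type, flags, and stream identifier): the function names, frame size, and payloads are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest

def to_frames(stream_id, payload, frame_size):
    """Split one request's payload into (stream_id, chunk) frames."""
    return [(stream_id, payload[i:i + frame_size])
            for i in range(0, len(payload), frame_size)]

def interleave(*frame_lists):
    """Round-robin frames from several streams onto one connection."""
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire

def reassemble(wire):
    """Rebuild each stream's payload from the stream id carried by every frame."""
    streams = defaultdict(bytes)
    for stream_id, chunk in wire:
        streams[stream_id] += chunk
    return dict(streams)

# Two hypothetical concurrent requests sharing one connection.
requests = {1: b"GET /a.png", 3: b"GET /b.mp4 with a longer body"}
wire = interleave(*(to_frames(sid, data, 8) for sid, data in requests.items()))
print([sid for sid, _ in wire])
print(reassemble(wire) == requests)
```

Because every frame names its stream, the receiver can rebuild both requests even though their frames arrive interleaved, so neither request needs a connection of its own.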

Survey of Http2.0 usage by peer companies

We checked which Http protocol version peer companies' apps use for CDN resource requests. They already use Http2.0, which confirms that the Http2.0 protocol is feasible.

As with the TLS1.3 survey, load a peer company's image in the Chrome browser, and the Network panel of the DevTools window shows that the h2 protocol is in use.

3.4.3 Http2.0 upgrade optimization scheme

We analyzed the IPs of the CDN edge nodes serving Http1.1 requests for the CDN domain name cdn.poizon.com and found that they did not belong to Alibaba Cloud CDN. After checking with our operations colleagues, we learned that 40% of the traffic for cdn.poizon.com had been switched to Qiniu Cloud CDN, which was serving it over the Http1.1 protocol.

Upgrading this Qiniu Cloud traffic to Http2.0 would improve the performance of CDN resource requests. We did this through an online change that switched the traffic back to Alibaba Cloud CDN (which has the Http2.0 protocol enabled).

3.4.4 Http2.0 upgrade optimization effect

After the traffic of the CDN domain name cdn.poizon.com was switched back to Alibaba Cloud, the Http1.1 traffic was converted to Http2.0, and Http2.0 came to account for 95%+ of traffic on both platforms. The data before and after optimization:

average time

  1. iOS: 248ms -> 221ms, down 27ms
  2. Android: 350ms -> 329ms, down 21ms

TCP multiplexing rate

  1. iOS: 841% -> 5127%, up 4286%
  2. Android: 1441% -> 2160%, up 719%

4. Summary

The four optimization directions above can deliver a relatively significant improvement in CDN resource request performance for a small investment of effort. However, because they involve online changes, pay close attention to offline regression testing, the online change itself, verification, monitoring, and the rollback plan.

Finally, thanks to the operations, testing, and other colleagues who took part for their help during the CDN resource request optimization.

Text/Aix


