Review & proofreading: Chang Shuai, Wang Chen

Background

In August 2020, Function Compute pioneered container images as a function deployment method. AWS Lambda followed at re:Invent in December 2020, and other domestic FaaS vendors announced container support for their offerings in June 2021. Cold start has always been a pain point of FaaS, and with container images dozens of times larger than code packages, the worsening of cold start became developers' biggest concern.

When designing container image support, Function Compute decided that developers should be able to use images the way they use code packages, with the same second-level elasticity. The feature had to be easy to use while preserving the extreme elasticity of FaaS itself, sparing users an agonizing trade-off. The ideal user experience is that a function call barely feels the extra latency of transferring image data over the network.

Optimizing images to accelerate cold start falls roughly into two approaches: reducing absolute latency and reducing the probability of a cold start. Since the launch of container image support, we have reduced absolute latency in stages with image acceleration technology. Building on that, this article describes how Function Compute's next-generation IaaS base, Shenlong bare metal plus secure containers (microVMs), further reduces absolute latency and greatly reduces cold start frequency.

Optimization process

(Using a single image as an example)

First-generation architecture: ECS virtual machine

Stage 1 (March 2021): load on demand to reduce data transmission

Previously, the entire image was pulled before the container started, so even unused image data was downloaded in full, consuming most of the preparation time. Our initial optimization was therefore to skip unused image data wherever possible through on-demand loading. Using image acceleration technology to avoid pulling unused data, we brought the cold start of Function Compute custom images from the minute level down to the second level.
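A minimal sketch of the idea behind on-demand loading (all names here are hypothetical; real accelerated-image implementations work at the block-device or filesystem layer, not in application code): instead of downloading the whole image blob up front, a reader fetches only the chunks the container actually touches, using HTTP Range requests against the registry.

```python
import urllib.request

CHUNK = 512 * 1024  # fetch granularity; a real system tunes this per workload

class LazyImageReader:
    """Sketch: fetch image chunks from the registry only on first read."""

    def __init__(self, blob_url: str):
        self.blob_url = blob_url
        self.chunks = {}  # chunk index -> bytes already fetched

    def _fetch(self, idx: int) -> bytes:
        # Pull exactly one chunk via an HTTP Range request (206 Partial Content).
        start = idx * CHUNK
        req = urllib.request.Request(
            self.blob_url,
            headers={"Range": f"bytes={start}-{start + CHUNK - 1}"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    def read(self, offset: int, length: int) -> bytes:
        # Only chunks overlapping [offset, offset+length) ever leave the registry.
        out = bytearray()
        first = offset // CHUNK
        for idx in range(first, (offset + length - 1) // CHUNK + 1):
            if idx not in self.chunks:  # miss: pull just this chunk
                self.chunks[idx] = self._fetch(idx)
            out += self.chunks[idx]
        lo = offset - first * CHUNK
        return bytes(out[lo:lo + length])
```

Only the bytes the startup path actually reads cost network time; layers the function never touches are never downloaded.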

Stage 2 (June 2021): record the container instance's startup I/O trace, and prefetch image data ahead of time on subsequent instance starts

We found that a function instance's I/O access pattern is highly consistent across the container startup and initialization phases. Because the FaaS platform schedules resources based on how an application runs, we record a desensitized I/O trace the first time a function instance starts; when subsequent instances start, the trace serves as a hint to prefetch image data locally ahead of the reads, further reducing cold start latency.
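A sketch of this record-and-replay idea, building on the hypothetical LazyImageReader above (again, names and granularity are assumptions, not FC's actual implementation): the first start records the ordered list of chunk indices it touched; later starts replay that list in the background so data is already local when the container asks for it.

```python
import threading

class TracedReader(LazyImageReader):
    """First start: also record the ordered chunk trace (desensitized:
    only chunk indices, never file contents)."""

    def __init__(self, blob_url: str):
        super().__init__(blob_url)
        self.trace = []  # ordered chunk indices touched during startup

    def read(self, offset: int, length: int) -> bytes:
        first = offset // CHUNK
        for idx in range(first, (offset + length - 1) // CHUNK + 1):
            if idx not in self.chunks:
                self.trace.append(idx)  # remember first-touch order
        return super().read(offset, length)


def prefetch_from_trace(reader: LazyImageReader, trace: list) -> None:
    """Subsequent starts: warm the cache in the background using the hint,
    racing ahead of the container's own reads."""
    def run():
        for idx in trace:
            if idx not in reader.chunks:
                reader.chunks[idx] = reader._fetch(idx)
    threading.Thread(target=run, daemon=True).start()
```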

Although these two optimizations greatly reduce the absolute latency of cold start, a traditional ECS VM is recycled after sitting idle for a while, and bringing up a fresh machine triggers a cold start all over again. Reducing the frequency of cold starts therefore became one of the key problems for the next stage.

Next-generation architecture: elastic bare-metal server (Shenlong) + microVM

When designing the next-generation architecture, we considered not only the cold start frequency problem but also the effect of caching on startup latency. We therefore built Serverless Caching: a data-driven, intelligent, and efficient caching system that exploits the characteristics of different storage services and co-optimizes software and hardware to further improve the Custom Container experience. In the Function Compute backend, a Shenlong machine's turnover period is far longer than an ECS VM's idle-recycle time, so from the user's perspective warm starts become much more frequent: after one cold start, the cache persists on the Shenlong machine, and the cache hit rate can exceed 90%.

Compared with ECS virtual machines, the Shenlong bare metal plus micro-VM architecture opens up more room for image acceleration:

  • Lower back-to-origin bandwidth pressure and less duplicate data storage. Compared with ECS VMs, when thousands of instances start simultaneously, read amplification on the image registry and write amplification on disk storage drop by at least two orders of magnitude.
  • VM-level security isolation lets Function Compute components safely form an availability-zone-level cache network whose transfer speed even beats that of cloud disks.

Landing Function Compute Custom Container on Shenlong also improves resource utilization and reduces cost, a win-win for users and for platform operations.

The Serverless Caching architecture offers further optimization potential without increasing resource cost.

Figure: L1~L4 are the different cache levels; distance and latency increase from L1 to L4
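A toy model of such a tiered lookup (the level names and the origin fetch are purely illustrative, not FC's real topology or numbers): each read tries the nearest level first, falls back to the origin on a full miss, and backfills every nearer level it missed so the next read stops earlier.

```python
class CacheLevel:
    """One tier of the hierarchy, e.g. L1 local memory ... L4 zone-level peers."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # key -> bytes


class TieredCache:
    def __init__(self, levels):
        self.levels = levels  # ordered nearest/fastest first

    def get(self, key, fetch_origin):
        missed = []
        for level in self.levels:
            if key in level.store:      # hit: stop descending the hierarchy
                data = level.store[key]
                break
            missed.append(level)
        else:
            data = fetch_origin(key)    # full miss: go back to the registry
        for level in missed:            # backfill the nearer tiers
            level.store[key] = data
        return data


# Illustrative wiring: four tiers, registry as origin.
cache = TieredCache([CacheLevel("L1 memory"), CacheLevel("L2 local disk"),
                     CacheLevel("L3 Shenlong machine"), CacheLevel("L4 zone peers")])
```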

Horizontal comparison

At this point we have taken image acceleration to a new level. We selected 4 typical images from the Function Compute public use cases, adapted them to several major cloud vendors at home and abroad (anonymized as Vendor A and Vendor B), and compared them side by side. Each image was invoked every 3 hours, repeated several times, with the following results:

1. AI online inference: cat and dog recognition

This image contains an image-recognition application built on the TensorFlow deep learning framework. Both Alibaba Cloud Function Compute and Vendor A run it, but Vendor A performs poorly; Vendor B cannot run it at all. In the figure below, the latency for Alibaba Cloud Function Compute and Vendor A is end-to-end, covering image pull, container startup, and inference execution, whereas Vendor B's number covers only the image pull and is already the slowest. FC is comparatively stable, and Function Compute clearly holds a larger advantage on CPU-intensive workloads such as AI inference.

Figure: cloud-disk warm start as the baseline (gray) versus each vendor's additional overhead (color)

2. Python Flask web service

This image is a common web service, built with Python and the Flask framework. Its purpose is to test whether different cloud products can perform efficient on-demand loading. Both FC and Vendor A fluctuate, but Vendor A fluctuates the most.

Figure: cloud-disk warm start as the baseline (gray) versus each vendor's additional overhead (color)
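For reference, a minimal sketch of the kind of service such an image contains (the port and route are assumptions, not the benchmark's actual code). FC custom images run a standard HTTP server like this one directly:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/invoke", methods=["GET", "POST"])
def invoke():
    # Trivial handler: the benchmark measures how fast the platform can
    # pull the image, start the container, and serve this first request.
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    # FC custom images allow a freely configurable listening port.
    app.run(host="0.0.0.0", port=9000)
```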

3. Python machine learning workload

The Python runtime environment is likewise packaged in the image. Each vendor again shows its own characteristics: Vendor B downloads the full image, while Vendor A does some on-demand optimization but is unstable.

Figure: cloud-disk warm start as the baseline (gray) versus each vendor's additional overhead (color)

4. Cypress Headless Chrome

This image contains a headless-browser test workflow. Vendor A cannot run it because of programming-model limitations and an incompatible runtime environment; Vendor B is too slow, managing only to finish application initialization (71.1 seconds) within the allotted time. It is easy to see that Function Compute also performs well on I/O-heavy images.

Figure: cloud-disk warm start as the baseline (gray) versus each vendor's additional overhead (color); the green portion marks end-to-end times better than the baseline

Recommended best practices

Support for container technology is an essential FaaS feature. Containers bring portability and delivery agility, while cloud services cut operations and idle costs and provide elastic scaling. Combining custom images with Function Compute directly solves the problem of migrating large bodies of customized business logic to a cloud vendor.

FaaS should eliminate as much container-runtime overhead as possible so that the experience approaches running locally; stable, fast execution is likewise the mark of a good FaaS. FC's image-loading optimizations, combined with a greatly reduced cold start frequency, underpin that stability and speed. Beyond performance, application migration must be smooth, lowering the barrier for users without constraining their development model. Function Compute custom images support standard HTTP services with freely configurable ports, allow simultaneous reads and writes, offer multiple tool chains and deployment options, impose no forced wait for image preparation, ship with HTTP triggers that depend on no other cloud service, and support custom domain names, among other conveniences.

Function Compute custom images suit, but are not limited to, AI inference, big data analysis, game settlement, online education, and audio/video processing. We recommend Alibaba Cloud Container Registry Enterprise Edition (ACR EE) instances, which have image acceleration built in, removing the manual steps of enabling accelerated pull and accelerated image preparation needed with standard ACR images.

AI/ML online inference

Inference depends on bulky underlying training frameworks and heavy data processing; images for common AI frameworks such as TensorFlow easily reach the gigabyte level. CPU demands are already high, and elastic scaling raises the bar further. Function Compute custom images handle such needs well: users simply take the framework image as the base, package their data-processing logic on top of it into a new image, and thereby avoid the migration cost of changing runtime environments, while elastic scaling delivers fast results. Song-preference inference, AI image recognition and analysis, and similar workloads plug seamlessly into Function Compute and scale elastically to large volumes of dynamic online inference requests.

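A sketch of the packaging pattern described above (the model path, input size, and route are hypothetical, and a two-class Keras SavedModel baked into the image is assumed): the TensorFlow base image is reused as-is, and only a thin HTTP layer plus the data-processing logic are added on top.

```python
# Hypothetical handler layered on top of a TensorFlow base image.
import tensorflow as tf
from flask import Flask, request, jsonify

app = Flask(__name__)
# Assumption: a SavedModel shipped inside the image at this path.
model = tf.keras.models.load_model("/opt/model")

@app.route("/classify", methods=["POST"])
def classify():
    # Decode the uploaded image bytes and normalize for the model.
    img = tf.io.decode_image(request.data, channels=3)
    img = tf.image.resize(img, (224, 224)) / 255.0
    probs = model.predict(tf.expand_dims(img, 0))[0]
    label = "cat" if probs[0] > probs[1] else "dog"  # two-class example
    return jsonify({"label": label, "probs": probs.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9000)
```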

Lightweight and flexible ETL

Services depend on data, and data processing often demands substantial resources to handle efficient, rapid data-change requests. Custom images, like Function Compute's other runtimes, provide security isolation during data processing while preserving the user's freedom to package the data-processing part of their business logic into an image. They offer smooth migration with minimal extra startup latency, meeting the need for safe, efficient, elastic data processing in scenarios such as database management and IoT.


Game battle settlement

Games commonly schedule daily quests and similar events that gather large numbers of players within a short window and require data processing such as battle settlement. To keep players from losing patience, battle-data validation usually has to finish within a few seconds, and per-player settlement time must not degrade as the player count grows. The business logic of this kind of processing is usually complex but highly repetitive; packaging it into a Function Compute custom image elastically absorbs large bursts of similar settlement requests in a short time, as the sketch below illustrates.
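A sketch of why this workload fits (the function name, scoring formula, and data shapes are hypothetical): each invocation settles one independent batch of players, so the platform can fan out as many instances as there are batches and per-player settlement time stays flat as the player count grows.

```python
def settle_battle(players: list) -> list:
    """Hypothetical settlement logic: repetitive and independent per batch,
    so thousands of instances can run it in parallel."""
    results = []
    for p in players:
        score = p["kills"] * 100 - p["deaths"] * 50 + p["assists"] * 25
        results.append({"id": p["id"], "score": score, "rank_delta": score // 10})
    return results

# Each FC instance handles one batch; scaling out keeps per-player
# settlement time constant regardless of how many players show up.
```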

Future Planning

The original intention behind optimizing custom images for Function Compute was to make the extra latency of container image transfer imperceptible and give cloud-native developers the best possible experience. The optimization will not stop here: our ultimate goal is to all but eliminate the overhead of container image pulls, keep the image registry from becoming a bottleneck during massive scale-out, and scale rapidly. Beyond further improving Serverless Caching, the Custom Container feature will in the future help web applications and job-like workloads on Kubernetes run seamlessly on Function Compute: Kubernetes handles resident, stable workloads, while Serverless services gradually become the cloud-native best practice for absorbing fluctuating compute.

1) Function Compute public use cases

https://github.com/awesome-fc

2) Community official website

http://www.serverless-devs.com/

3) Project warehouse

https://github.com/Serverless-Devs/Serverless-Devs

4) Serverless Desktop desktop client

https://serverlessdevs.resume.net.cn/zh-cn/desktop/index.html

5) Serverless application developer kit

http://serverless-dk.oss.devsapp.net/docs/tutorial-dk/intro/react

6) Serverless Devs CLI

https://serverlessdevs.resume.net.cn/zh-cn/cli/index.html

7) Serverless Hub Application Center

https://serverlesshub.resume.net.cn/#/hubs/special-view


