Introduction: Function Compute (FC) launched the industry's first GPU instances, the first instance-level observability and debugging, and the first end-cloud joint debugging and multi-environment deployment capabilities; GB-scale image startup time has been cut to seconds, and VPC network connection setup to 200 ms. The Serverless App Engine (SAE) supports seamless migration of microservice frameworks with no containerization rework, plus the industry's first hybrid elasticity strategy. These innovations address long-standing technical problems in the serverless field and remove the stumbling blocks that have held serverless back from enterprises' core production scenarios.
"Even if cloud computing has emerged, everyone's work still revolves around servers. However, this will not last long. Cloud applications are moving towards serverless."
This was Ken Fromm's view of the future of cloud computing in his 2012 article "Why The Future of Software and Apps is Serverless".
Serverless First: from cloud-vendor proposition to customer proposition
Serverless's inherent elasticity and fault tolerance fit enterprises' dual demands for flexibility and stability in online business, making it a new direction for the evolution of enterprise cloud architecture.
Today, more and more large and mid-sized enterprises have carved the execution units with elastic scaling requirements out of their traditional back ends and run them on serverless architectures, while startup teams that prioritize R&D and delivery efficiency have gone fully serverless. The concept of Serverless First has taken hold, and a growing share of cloud workloads now runs on serverless.
The changing numbers reflect the market maturity of the technology.
According to a Datadog report this year, half of the AWS customers on Datadog use Lambda, 80% of AWS container customers use Lambda, and these users invoke functions 3.5 times more often than two years ago, with run time reaching 900 hours per day. In the domestic market, according to the "2020 China Cloud Native Survey Report" released by CNCF this year, 31% of enterprises use serverless technology in production, 41% are evaluating options, and 12% plan to use it within the next 12 months.
On October 21, at the Cloud Native Summit of the Yunqi (Apsara) Conference, Alibaba Cloud Serverless released a series of technological breakthroughs aimed squarely at the difficulties and pain points facing the industry. Large-scale practice by major companies has followed: NetEase Cloud Music, for example, built an offline audio/video processing platform on serverless, and Pumpkin Movie went fully serverless in 7 days, building its monitoring, release, and elasticity systems on that foundation.
From First to Faster: FC's 7 technological breakthroughs remove the stumbling blocks to serverless adoption
The essence of serverless is to shield the underlying compute resources so that business-layer development can be focused and free. But the more abstract the interface, the more complex the cloud vendor's implementation underneath. Function Compute further splits services into function granularity, which inevitably brings new challenges to development, operations, and delivery: how do you do end-cloud joint debugging of functions? How do you observe and debug them? How do you optimize cold start for GB-scale images? None of these were problems at service granularity, but they became stumbling blocks to large-scale serverless adoption in enterprises' core production business.
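To make "function granularity" concrete: a deployable unit can be as small as a single handler function. The sketch below follows the `handler(event, context)` signature of FC's Python runtime; the JSON payload shape is invented here purely for illustration.

```python
import json

def handler(event, context):
    """Minimal function-granularity unit: one handler, no server to manage.

    In FC's Python runtime, `event` arrives as raw bytes for an ordinary
    invocation; a typical pattern is to parse it as JSON.
    """
    body = json.loads(event) if event else {}
    name = body.get("name", "world")
    # The return value becomes the invocation result.
    return json.dumps({"message": f"hello, {name}"})
```

Everything below the handler, instance scheduling, scaling, networking, is exactly what the seven breakthroughs in this section are about.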
Since entering the Forrester Leaders quadrant last year, the Alibaba Cloud Function Compute team has continued to tackle these industry-wide problems, releasing 7 technological innovations and breakthroughs at the Yunqi Conference.
1) Serverless Devs 2.0: the industry's first desktop client, with end-cloud joint debugging and multi-environment deployment
After nearly a year as an open-source project, version 2.0 of the Serverless Devs developer platform was officially released. Compared with 1.0, version 2.0 improves performance and user experience across the board. The industry-first desktop client, Serverless Desktop, pairs a carefully designed interface with pragmatism and offers stronger enterprise-grade service capabilities.
As the industry's first cloud-native full-lifecycle management platform supporting mainstream serverless services and frameworks, Serverless Devs aims to give developers a one-stop serverless application development experience. Serverless Devs 2.0 introduces a multi-mode debugging solution spanning online and offline environments: end-cloud joint debugging (connect to the cloud environment and debug locally), local debugging (develop and debug entirely on your machine), and online/remote debugging (debug in the cloud for operations). The new version also adds multi-environment deployment. Serverless Devs 2.0 already supports one-click deployment of more than 30 frameworks, including Django, Express, Koa, Egg, Flask, Zblog, and WordPress.
2) The industry's first instance-level observability and debugging
An instance is the smallest schedulable atomic unit of function resources, analogous to a pod in the container world. Because serverless highly abstracts heterogeneous underlying resources, the "black box problem" has been the core pain point of its large-scale adoption. Comparable products in the industry have never exposed the concept of an "instance", let alone observable metrics such as its CPU and memory. Yet observability is the developer's eyes. Without observability, how can anyone talk about high availability?
Function Compute has now released instance-level observability: it monitors function instances in real time, collects performance data, and visualizes it, giving developers an end-to-end path for monitoring and troubleshooting function instances. Through instance-level metrics you can view core information such as CPU and memory usage, instance network conditions, and the number of in-flight requests per instance, so the "black box" is no longer black. Function Compute will also open login access to some instances, making them directly observable and debuggable.
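To illustrate what instance-level metrics enable, here is a minimal sketch of the kind of check a developer could run once per-instance CPU and memory are visible. The `InstanceMetrics` shape and `find_hot_instances` helper are hypothetical, not part of any FC SDK:

```python
from dataclasses import dataclass

@dataclass
class InstanceMetrics:
    """Hypothetical snapshot of the per-instance metrics described above."""
    instance_id: str
    cpu_percent: float   # CPU usage, 0-100
    memory_mb: float     # resident memory
    requests: int        # in-flight request count

def find_hot_instances(samples, cpu_threshold=80.0):
    """Flag instances whose sampled CPU usage exceeds a threshold,
    the kind of troubleshooting that is impossible when the platform
    hides the instance concept entirely."""
    return [s.instance_id for s in samples if s.cpu_percent > cpu_threshold]
```

With only function-level aggregates, a single overloaded instance hides inside the average; per-instance visibility is what makes a check like this possible at all.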
3) The industry's first instance reservation strategy combining fixed count, scheduled scaling, and automatic water-level scaling
Cold start in Function Compute is affected by many factors: code and image size, container startup, language runtime initialization, process initialization, and execution logic. Optimizing it requires effort from both users and the cloud vendor. The vendor automatically allocates the most suitable number of instances for each function and optimizes cold start on the platform side. But some online businesses are extremely latency-sensitive, and the vendor cannot perform deeper business-level optimization on the user's behalf, such as trimming code and dependencies, choosing the programming language, tuning process initialization, or optimizing algorithms.
Comparable products in the industry generally reserve a fixed number of instances: the user configures a concurrency value N, and once N instances are allocated they are never scaled up or down unless adjusted manually. This only addresses cold-start latency at some business peaks, while greatly increasing operations and resource costs, and it is unfriendly to businesses with irregular peaks and valleys, such as red-envelope promotions.
Function Compute therefore takes the lead in granting users scheduling authority over part of the instance resources. Multi-dimensional reservation strategies, fixed count, scheduled scaling, water-level (metric-based) scaling, and hybrid scaling, let users reserve the right number of function instances to match their business curve. This covers relatively stable workloads (e.g., AI/ML), workloads with clear peak and valley periods (gaming and entertainment, online education, new retail), sudden traffic (e-commerce promotions, advertising), and mixed workloads (web back ends, data processing), reducing the impact of cold start on latency-sensitive services and truly delivering both elasticity and performance.
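The three strategies compose naturally: a fixed floor, a higher floor inside scheduled time windows, and a load-driven water level on top. The sketch below is a hypothetical policy function, not FC's actual scheduler; all parameter names and defaults are invented for illustration.

```python
import math
from datetime import time

def reserved_instances(now, concurrent_requests,
                       fixed=2,
                       schedule=((time(9, 0), time(21, 0), 10),),
                       target_utilization=0.7,
                       per_instance_concurrency=1):
    """Hybrid reservation sketch: combine fixed, scheduled, and
    water-level scaling into one target instance count."""
    # Fixed count: always keep at least this many warm instances.
    target = fixed
    # Scheduled scaling: raise the floor inside configured time windows.
    for start, end, count in schedule:
        if start <= now <= end:
            target = max(target, count)
    # Water-level scaling: size for observed load at the target utilization.
    needed = math.ceil(
        concurrent_requests / (per_instance_concurrency * target_utilization))
    return max(target, needed)
```

At 3 a.m. with no load the fixed floor holds; inside the 9:00-21:00 window the scheduled floor takes over; and a traffic burst at any hour pushes the count above both floors.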
4) The industry's first GPU instances
Function Compute provides two instance types: elastic instances and performance instances. Elastic instance specifications range from 128 MB to 3 GB; their isolation granularity is the smallest in the cloud ecosystem, enabling genuinely full resource utilization in general-purpose scenarios. Performance instance specifications span 4 GB, 8 GB, 16 GB, and 32 GB, with higher resource ceilings, mainly for compute-intensive scenarios such as audio/video processing, AI modeling, and enterprise-grade Java applications.
With the rapid development of domain-specific hardware acceleration, GPU vendors have introduced dedicated ASICs for video encoding and decoding: NVIDIA, for example, has integrated dedicated video encoding circuits since the Kepler architecture and dedicated video decoding circuits since the Fermi architecture.
Function Compute has officially launched GPU instances based on the Turing architecture, letting serverless developers push video encoding/decoding workloads down to GPU hardware acceleration, greatly improving the efficiency of video production and transcoding.
5) Delivery of up to 20,000 instances per minute
"Serverless" does not mean software runs without servers; it means users need not care about the state, resources (CPU, memory, disk, network), or quantity of the underlying servers involved while their applications run. The compute resources an application needs are provisioned dynamically by the cloud vendor. In practice, however, users still care about a vendor's resource delivery capability and about the access fluctuations that insufficient resources cause under burst traffic.
Backed by Alibaba Cloud's powerful cloud infrastructure, Function Compute draws on dual pools, the Shenlong (X-Dragon) bare-metal resource pool and the ECS resource pool, to deliver up to 20,000 instances per minute at business peaks, further strengthening its delivery capability for customers' core business.
6) VPC network connection optimization: from 10 s to 200 ms
When a function needs to access resources inside the user's VPC, such as RDS or NAS, the VPC network must first be connected. FaaS products in the industry generally do this by dynamically attaching an elastic network interface (ENI): an ENI is created in the user's VPC and attached to the machine executing the function. This lets users reach back-end cloud services very simply, but ENI attachment generally takes more than 10 seconds, a heavy performance overhead in latency-sensitive business scenarios.
Function Compute decouples compute from networking by turning the VPC gateway into a service, so compute-node scaling is no longer limited by ENI attachment capacity. In this design, the gateway service handles ENI attachment and the high availability and auto scaling of gateway nodes, while Function Compute focuses on scheduling compute nodes. With the VPC connection established this way, function cold start drops to 200 ms.
7) GB-scale image startup: from minutes to seconds
Function Compute was the first to release container-image deployment of functions, in August 2020; AWS Lambda followed at re:Invent in December 2020, and a domestic competitor announced FaaS container support in June 2021. Cold start has always been a FaaS pain point, and container images dozens of times larger than code packages only add to cold-start latency.
Function Compute invented Serverless Caching: based on the characteristics of different storage services, it builds a data-driven, intelligent, and efficient caching system that co-optimizes software and hardware and further improves the Custom Container experience. Image acceleration has now reached a new level. In a public test case ( https://github.com/awesome-fc ), we selected 4 typical images, adapted them to several major cloud vendors at home and abroad for horizontal comparison, invoked each image at 3-hour intervals, and repeated the runs several times.
The experiments show that for GB-scale image cold starts, Function Compute has achieved a leap from minutes to seconds.
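The comparison methodology described above can be sketched as a simple harness: invoke a container-image function after long idle gaps (so each call is a cold start) and record end-to-end latency. The `invoke` callable is a placeholder for an SDK call to whichever platform is being measured; the 3-hour interval mirrors the test described in the text.

```python
import time

def measure_cold_start(invoke, rounds=3, interval_seconds=3 * 3600):
    """Invoke a function repeatedly with long idle gaps between calls,
    returning the observed end-to-end latency of each invocation.

    `invoke` is a hypothetical zero-argument callable wrapping the
    platform SDK; sleeping between rounds lets the platform reclaim
    the warm instance so the next call is a cold start.
    """
    latencies = []
    for i in range(rounds):
        if i > 0:
            time.sleep(interval_seconds)
        start = time.perf_counter()
        invoke()
        latencies.append(time.perf_counter() - start)
    return latencies
```

The same harness run against several vendors with the same 4 images is what yields a fair horizontal comparison.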
From dedicated to general-purpose, from complex to simple: SAE makes All on Serverless possible
If FaaS needs technical breakthroughs to reach large-scale adoption in enterprises' core production business, SAE, the representative serverless PaaS, puts its breakthroughs into product ease of use and scenario coverage.
1) From dedicated to general-purpose: SAE is naturally suited to large-scale adoption in enterprises' core business
Unlike FaaS-form serverless, SAE is "application-centric": it provides application-oriented UIs and APIs while keeping the familiar experience of servers and classic PaaS, so the application remains visible and tangible. This avoids both the application rework FaaS demands and its comparatively weak observability and debuggability, enabling zero-code-change, smooth migration of online applications.
SAE breaks the boundaries of serverless adoption: serverless is no longer the preserve of front-end full-stack work and mini programs. Back-end microservices, SaaS services, and IoT applications can also be built on serverless, making it naturally suited to large-scale adoption in core business. In addition, SAE supports deploying source packages in multiple languages such as PHP and Python, supports multiple runtimes and custom extensions, and truly takes serverless from dedicated to general-purpose.
2) From complex to simple: SAE is naturally suited to rapid containerization of enterprise applications
Traditional PaaS has long been criticized as complex to use, hard to migrate to, and troublesome to scale. At the bottom layer, SAE replaces virtualization with container technology, making full use of container isolation to improve startup time and resource utilization and achieve rapid containerization of applications. At the application management layer, it retains the existing management paradigm of microservice applications such as Spring Cloud and Dubbo, with no need to manage applications through the large and complex K8s directly.
Moreover, once the underlying compute resources are pooled, their inherently serverless nature means users no longer need to purchase and maintain servers; they simply configure the CPU and memory they need. Combined with advanced microservice governance capabilities hardened by years of Double 11, SAE merges containers, serverless, and PaaS into one, integrating technological advancement, resource utilization optimization, and an efficient development and operations experience, and making new technology easier and more stable to adopt.
It can be said that SAE covers almost every scenario of moving applications to the cloud: it is not only the best choice for application migration but also a model of All on Serverless.
4 changes: serverless accelerates the innovation of modern enterprise application architecture
Technology leadership alone cannot advance an industry; serverless also brings tangible changes to enterprise customers and developers, and the two together form the twin engines of market maturity. Technology evolves on its own while customers feed back: that is how any new technology keeps developing.
1) Change 1: servers vs. code
A full-stack engineer at a startup: "My work no longer revolves around cold, boring servers. I've said goodbye to the dilemma of spending more time handling servers than writing code. It lets me spend more time on my business, and use the code I know best to keep the application running stably."
The daily life of a front-end-leaning full-stack engineer might look like this: master at least one stack such as Node.js or Puppeteer, write some API endpoints, fix some back-end bugs, and spend a great deal of energy on server operations; the larger the company's business, the more time operations consumes.
Function Compute lowers the server-maintenance threshold for front-end languages such as Node.js: as long as you can write JS code, you can run a Node service without learning DevOps.
2) Change 2: compute clusters vs. compute resource pools
A Java engineer in the algorithms field: "I no longer worry about server specs, complicated procurement, and difficult operations as our algorithms grow in number and complexity. Instead, a virtually unlimited resource pool, fast cold starts, and reserved instances give me elasticity and freedom."
NetEase Cloud Music has implemented 60+ audio/video algorithms across 100+ business scenarios, using 1,000+ cloud hosts and physical machines of varying specifications. Although many measures were taken to simplify the wiring between internal business scenarios and algorithms, the mix of stock and incremental processing, scenarios at different traffic scales, and the reuse of the same algorithms across scenarios left less and less time for the business itself.
NetEase Cloud Music therefore rebuilt its offline audio/video processing platform on Function Compute and applies it to business scenarios such as music listening, karaoke, and music recognition.
3) Change 3: load vs. scheduling
A lead game programmer: "I no longer worry about SLB's round-robin mechanism being unable to perceive actual pod load and causing load imbalance. Function Compute's scheduling system arranges every request sensibly, guaranteeing the high CPU consumption and high elasticity that battle verification requests demand."
Battle verification is a necessary scenario in many of Lilith's combat games: it checks whether a battle uploaded by the player client involved cheating. Verification generally replays the battle frame by frame, so CPU consumption is very high: if a 1v1 battle takes n milliseconds to verify, a 5v5 battle takes roughly 5n milliseconds, which demands high elasticity. Moreover, under the container architecture the attached SLB cannot sense actual pod load because of its round-robin mechanism, leading to load imbalance, infinite loops, and stability risks.
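The general shape of such a check is deterministic replay: re-simulate the battle from the uploaded inputs with the same step function the client uses, then compare the result with what the client reported. The sketch below is purely illustrative; all names are hypothetical and the real verification logic is Lilith's own.

```python
def verify_battle(initial_state, player_inputs, reported_outcome, step):
    """Frame-by-frame replay check (hypothetical sketch).

    `step` is the deterministic per-frame simulation function shared with
    the game client; `player_inputs` holds one entry per frame, which is
    why CPU cost grows linearly with battle size and length (1v1 -> n ms,
    5v5 -> ~5n ms).
    """
    state = initial_state
    for frame_inputs in player_inputs:
        state = step(state, frame_inputs)  # re-simulate each frame
    # A mismatch means the client's reported outcome was not reachable
    # from its own inputs, i.e. likely cheating.
    return state == reported_outcome
```

Each verification is an independent, CPU-heavy, short-lived task, exactly the profile that benefits from per-request scheduling instead of round-robin load balancing.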
Function Compute's scheduling system helps Lilith arrange every request rationally, and for infinite loops it provides a kill-on-timeout mechanism. The complexity of scheduling has been pushed down into the infrastructure, and after deep vendor optimization, cold-start latency has dropped significantly: from scheduling through acquiring compute resources to service startup now takes roughly 1 second.
4) Change 4: scripts vs. automation
An operations engineer in the entertainment industry: "I no longer worry about the slow, error-prone releases, hard-to-guarantee environment consistency, cumbersome permission assignment, and painful rollbacks of the traditional server model. SAE's full suite of service governance capabilities raised our development and operations efficiency by 70%, while its elastic resource pool cut business-side scale-out time by 70%."
During one hit show, Pumpkin Movie gained more than 800,000 registered users in a single day, overwhelming the API gateway at the traffic entrance, and every back-end service behind it faced severe stability challenges. The emergency response, scaling out, buying ECS, uploading scripts to servers, running them, and expanding the database, took 4 hours end to end. Traffic surges from hit shows like this are not uncommon, which accelerated Pumpkin Movie's thinking about a technology upgrade.
With the Serverless App Engine SAE, Pumpkin Movie went fully serverless within 7 days, embracing K8s with zero threshold and easily handling the burst traffic of hit movies. Compared with the traditional server operations model, development and operations efficiency rose 70%, costs fell 40%, and scale-out efficiency improved more than 10x.
Forging ahead, aiming for a thousand miles
In 2009, Berkeley made 6 predictions about the then-emerging cloud computing, including pay-as-you-go services and greatly increased utilization of physical hardware, all of which have come true over the past 12 years. In 2019, Berkeley predicted again that serverless computing will become the default computing paradigm of the cloud era, replacing the serverful (traditional cloud) computing model.
Measured against cloud computing's 12-year history, serverless is in the third year of validating Berkeley's prediction, just past the first quarter of the journey. Over these three years it has moved from a beautiful vision of the cloud's future, to the Serverless First advocated and heavily invested in by cloud vendors, to enterprises leveraging its advantages to optimize existing architectures while squarely facing the stumbling blocks to adoption in core business, and now to resolving the industry's common pain points through technological innovation and breakthroughs. This requires not only the courage to take the first step, but also the sense of mission and perseverance to go a thousand miles.
Author: Wang Chen, et al.
Special thanks to Mo Yan, Dai Xin, Xiao Jiang, and Liu Yu for their contributions to this article.