
Guide

AutoNavi's Serverless business has already exceeded 100,000 QPS at peak. The platform was built from zero to one and its traffic grew from zero to more than 100,000 QPS, making AutoNavi the BU with the largest Serverless application in Alibaba Group. How was this achieved, and what problems did we encounter along the way? This article shares why AutoNavi wanted to adopt Serverless/Faas, how we did it, what the technical solution is, and what our current progress and follow-up plans are. I hope it will be helpful to interested readers.

1. Why AutoNavi wants to engage in Serverless

The background is that AutoNavi launched a client-to-cloud project whose main purpose is to improve the efficiency of client development iteration. In the past, the client's business logic lived in the client itself, so any change in product requirements had to ship with a client release. Each client release, however, has to go through testing, gray (staged) release, and the handling of problems such as client crashes.

After the client goes to the cloud, the business logic that changes frequently is placed in the cloud. New product requirements are developed in the cloud instead of waiting for the monthly client release, which speeds up development and iteration and brings us another step closer to the ideal goal of product and R&D moving at the same pace. (We say "another" because AutoNavi had already made some efforts toward this goal; we hope that integrated client-cloud development will be one of the most effective technical enablers.)

1.1 Goal: a client development model that integrates client and cloud

Although the development model has changed from pure client development to cloud + client development, the developers should still be the same people who were already responsible for the corresponding business. As everyone knows, server-side development and client-side development are clearly different: client-side development usually targets a single device, whereas server-side development usually targets a cluster and must deal with complex issues such as distributed coordination, load balancing, and failover and degradation.

If the traditional server-side model were used, this transition would carry considerable risk. Faas solves this problem well. We built on xbus, the existing framework of the AutoNavi client (a framework for local service registration and invocation on the client), and extended it with the xbus-cloud component, so that developing in the cloud feels just like developing on the client. The goal is one set of code running in two places: the same business code can run on both the client and the server.
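The idea of registering a service by name and letting a cloud-side implementation take over the call can be sketched as follows. This is only an illustration in Go: xbus and xbus-cloud are internal AutoNavi frameworks whose real API is not shown in this article, so `Bus`, `RegisterLocal`, `RegisterCloud`, and `Call` are hypothetical names, and preferring the cloud implementation when both exist is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a hypothetical signature for one piece of business logic.
type Service func(args string) (string, error)

// ErrUnknownService is returned when no implementation is registered.
var ErrUnknownService = errors.New("unknown service")

// Bus is an illustrative stand-in for a name-based service registry.
type Bus struct {
	local map[string]Service // implementations bundled with the client
	cloud map[string]Service // implementations that have moved to the cloud
}

// NewBus returns an empty registry.
func NewBus() *Bus {
	return &Bus{local: map[string]Service{}, cloud: map[string]Service{}}
}

// RegisterLocal registers an on-device implementation.
func (b *Bus) RegisterLocal(name string, s Service) { b.local[name] = s }

// RegisterCloud registers a cloud-backed implementation of the same service.
func (b *Bus) RegisterCloud(name string, s Service) { b.cloud[name] = s }

// Call resolves the service by name. The caller's code is identical
// no matter where the business logic actually runs.
func (b *Bus) Call(name, args string) (string, error) {
	if s, ok := b.cloud[name]; ok {
		return s(args)
	}
	if s, ok := b.local[name]; ok {
		return s(args)
	}
	return "", fmt.Errorf("%w: %s", ErrUnknownService, name)
}

func main() {
	bus := NewBus()
	bus.RegisterLocal("trip.card", func(a string) (string, error) { return "local card for " + a, nil })
	bus.RegisterCloud("trip.card", func(a string) (string, error) { return "cloud card for " + a, nil })
	out, err := bus.Call("trip.card", "airport")
	fmt.Println(out, err)
}
```

Because callers depend only on the service name, moving a piece of logic to the cloud does not require touching the code that calls it.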

The AutoNavi client has three main targets: iOS, Android, and in-vehicle head units (running Linux-like operating systems). Two languages are mainly used, C++ and Node.js. Traditional map functions, such as map display, navigation route display, and navigation voice broadcast, are developed in C++ because they have to span all three targets. Application features built on top of map navigation, such as pre-trip and post-trip cards and recommended destinations, are mainly developed with Node.js.

Within Alibaba Group, the Taobao front-end team has developed a Node.js Faas Runtime. In the AutoNavi client-to-cloud project, the Node.js part adopts Taobao's existing Node.js Runtime and connects to the group's Faas platform to move some Node.js services to the cloud. During the 2020 National Day (October 1) holiday, this setup supported AutoNavi's holiday Travel Festival business well.

There was no existing solution for C++ Faas, so we decided to build on top of the group's infrastructure and create a new C++ Faas platform to help the AutoNavi client go to the cloud.

1.1.1 The key to client-cloud integration in practice: abstracting the interface between the client and Faas

Whether the original client logic is moved to the Faas server or part of the new requirements is developed there, the key lies in the definition of the interface protocol between the client and Faas, which is also the API definition of Faas. A good API definition not only benefits the maintainability of the system but is also important for the iterative development of the business it supports.

Ideally, the client becomes a browser that parses the result data returned by Faas. Once the browser's protocol is defined, it does not change frequently. Browsers such as IE and Chrome, for example, are updated far less often than the pages they render.

Of course, our browser is more complicated: it is a map browser. Whether the interface between the client and Faas is well defined can be judged from subsequent product requirement iterations. If a requirement can be completed on Faas alone, without any modification to the client, then the interface abstraction is successful.

1.2 BFF layer development to improve efficiency

When AutoNavi is mentioned, the first thing that comes to mind is probably its role as a tool: AutoNavi is a navigation tool. (This is no longer accurate, because in recent years AutoNavi has been transforming from a tool into a platform. Its transactional business is on the rise, and its ride-hailing, ticketing, hotel, and other businesses are growing very rapidly.) Compared with the group's other businesses, such as e-commerce, AutoNavi has a large number of read-only scenarios, which is a major technical characteristic of its business.

Among these read-only scenarios, a large share of the requirements are of the BFF (Backend For Frontend) type. Why? Because the core navigation functions, such as routing, traffic, and ETA, are relatively stable. The main work on them is continuous algorithm optimization, so that AutoNavi's navigation becomes more accurate and the computed routes better. These core functions are stable in terms of interfaces and features, whereas front-end requirements change constantly, for example adding a reminder about width restrictions along the route.

Faas is particularly suitable for BFF-layer development: a Faas function calls various relatively stable backend Baas services, then encapsulates the data and the calling logic, so that features can be developed and released quickly. In the industry, too, BFF is a typical Faas scenario (it is also called SFF, Service For Frontend).
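As a rough illustration of the BFF pattern described here, the following Go sketch has one function call two backend services concurrently and shape the result for the frontend. The `Backend` type, the `Card` struct, and the choice to tolerate a failed traffic call are all assumptions made for the example; they are not taken from AutoNavi's code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Backend is a hypothetical stand-in for one stable Baas call.
type Backend func(query string) (string, error)

// Card is the frontend-shaped payload returned by the BFF function.
type Card struct {
	Route   string
	Traffic string
}

// BuildCard calls both backends concurrently and merges the results.
// A failed traffic call degrades to an empty field (an assumption of
// this sketch); a failed route call is returned as an error.
func BuildCard(route, traffic Backend, query string) (Card, error) {
	var (
		wg       sync.WaitGroup
		card     Card
		routeErr error
	)
	wg.Add(2)
	go func() {
		defer wg.Done()
		card.Route, routeErr = route(query)
	}()
	go func() {
		defer wg.Done()
		if t, err := traffic(query); err == nil {
			card.Traffic = t
		}
	}()
	wg.Wait()
	if routeErr != nil {
		return Card{}, fmt.Errorf("route backend: %w", routeErr)
	}
	return card, nil
}

// stub builds a fake Backend that either echoes a canned reply or fails.
func stub(reply string, fail bool) Backend {
	return func(query string) (string, error) {
		if fail {
			return "", errors.New("backend unavailable")
		}
		return reply + " for " + query, nil
	}
}

func main() {
	card, err := BuildCard(stub("fastest route", false), stub("light traffic", true), "home->office")
	fmt.Printf("%+v %v\n", card, err)
}
```

When the frontend needs a new field, only this aggregation function changes; the backend services keep their existing interfaces.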

1.3 Serverless is a high-level language in the cloud era

We have moved fully to the cloud, containers have been standardized, and in terms of scale and resource utilization we can fully enjoy the benefits of the cloud. The business development model, however, is basically the same as before: we are still developing large-scale distributed systems.

In terms of the R&D model, we have not yet enjoyed the benefits of the cloud. Today's state can be compared to writing services on the cloud in assembly language. Serverless and cloud native can be understood as the high-level languages of the cloud era. With them, the cloud truly becomes the computer: developers only need to focus on business development and do not need to deal with the complexity of large-scale distributed systems.

1.4 Go-Faas supplements the Go language ecosystem

As mentioned earlier, because of the client-to-cloud project, we worked with the Alibaba Cloud FC (Function Compute) team and developed the C++ Faas Runtime together.

Beyond that, we have also developed Go-Faas. Why? Here is a brief introduction to the background. The peak QPS of the Go services on the AutoNavi server side has exceeded one million. AutoNavi has completed Go clients for Alibaba's middleware, building them together with the group's middleware department. The observability and automated testing systems are also largely in place, so the Go ecosystem is now basically complete. With Go-Faas, we can use Go to write both Baas services and Faas services, choosing the implementation that fits each business scenario. Go-Faas is mainly used in the BFF scenarios mentioned above.

2. Technical solution: building on the group's existing infrastructure

2.1 Overall technical architecture

The sections above explained why we are doing this. Now let's look at how we do it: how it is implemented and what the specific technical solution is.

Building on the group's existing infrastructure and middleware, we worked together with the Alibaba Cloud FC (Function Compute) team and developed the C++ Faas Runtime and the Go Faas Runtime. The overall technical architecture, aligned with the rest of the group, is shown in the figure below. It is divided into three parts: the R&D state, the running state, and the operation and maintenance state.

2.1.1 Running state

Let's start with the running state. Business traffic comes in through the gateway, is passed to the FC API Server, and is then forwarded to the C++/Go Faas Runtime, which executes the user function. The architecture of the Runtime is described in detail in a later section.

Deployed alongside the Runtime container are various sidecars, such as monitoring, logging, and Dapr. The monitoring and logging sidecars handle log collection and reporting, and the Dapr sidecar handles calls to the group's middleware.

Dapr is still in the pilot phase, however. For now, middleware is mainly invoked through the Broker and the various middleware Proxies, which in turn call HSF, Tair, Metaq, Diamond, and other middleware.

Finally, the Autoscaling module manages the scaling out and in of function instances, which makes functions elastic. Several scheduling strategies are available, such as scaling on the number of concurrent requests or on the CPU usage of function instances. A number of reserved instances can also be configured in advance to avoid cold starts after scaling in to zero.
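A concurrency-based strategy with reserved instances can be sketched as below. This is a minimal illustration, not the FC Autoscaling implementation: the `ScalePolicy` type, its field names, and the ceiling-division rule are assumptions chosen to show the idea.

```go
package main

import "fmt"

// ScalePolicy is a hypothetical policy; none of these names come from FC.
type ScalePolicy struct {
	TargetConcurrency int // concurrent requests one instance should handle
	Reserved          int // instances kept warm to avoid cold starts
	Max               int // upper bound on instances, 0 means unbounded
}

// Desired returns how many instances should run for the given number
// of in-flight requests.
func (p ScalePolicy) Desired(inflight int) int {
	n := 0
	if inflight > 0 && p.TargetConcurrency > 0 {
		// Ceiling division: 25 requests at 10 per instance need 3 instances.
		n = (inflight + p.TargetConcurrency - 1) / p.TargetConcurrency
	}
	if n < p.Reserved {
		n = p.Reserved // never drop below the reserved floor
	}
	if p.Max > 0 && n > p.Max {
		n = p.Max
	}
	return n
}

func main() {
	p := ScalePolicy{TargetConcurrency: 10, Reserved: 2, Max: 50}
	for _, inflight := range []int{0, 25, 1000} {
		fmt.Printf("inflight=%d -> instances=%d\n", inflight, p.Desired(inflight))
	}
}
```

With `Reserved` set to 0 the same policy scales to zero when idle, which is exactly the case where the next request pays the cold-start cost.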

The bottom layer relies on the group's ASI, which can be loosely understood as the group's K8S plus Sigma (the group's scheduling system). For the final deployment, FC calls ASI to deploy and elastically scale function instances. The smallest unit of deployment is the Pod in the figure above; a Pod contains the Runtime container and the sidecar set containers.

2.1.2 Research and development

Now let's look at the R&D state. The running state determines how a function runs, while the R&D state focuses on the function development experience: how to make it easy for developers to develop, debug, deploy, and test a function.

C++ Faas faces a difficult cross-platform problem. The C++ Faas Runtime depends on a number of libraries, and C++ dependency management is not as convenient as Java's, so installing them is troublesome. The Faas scaffolding solves this problem: invoking it generates a C++ Faas sample project with one click and installs the required dependency packages. To make local debugging easier, we also developed a C++ Faas Runtime Boot module. The entry point of the function Runtime is in the Boot module, which integrates the Runtime with the user's Faas function so that the Runtime can be debugged step by step.

We cooperated with the group's Aone team to integrate function release into the Aone environment, so publishing Go or C++ Faas on Aone is very convenient. Aone also supports one-click generation of an Example code repository.

Compiling C++ and Go Faas depends on the corresponding build environment. Aone supports custom compilation images, so we upload the compilation image to the group's public image registry and specify it in the function's code repository when the function is compiled. The Faas dependency libraries, SDK, and so on are preinstalled in the compilation image.

2.1.3 Operation and maintenance

Finally, let's look at the operation, maintenance, and monitoring of functions. The Runtime integrates log output for Eagle Eye and Sunfire. These logs are written by the Runtime and collected by the Agent in the Sidecar, which ships them to the Eagle Eye or Sunfire monitoring platform (on FC they are collected through SLS). Once collected, the logs let us use the group's existing monitoring platforms for Faas monitoring, and Faas can also be connected to the group's GOC alarm platform.

2.2 C++/Go Faas Runtime architecture

The above is the overall architecture integrated with Aone, FC/CSE, and ASI, and the Runtime is one part of it. The following describes the architecture of the Runtime itself and how it is designed and implemented.

The user's Faas code, in the top part of the figure, only needs to depend on the Faas SDK. To write a Faas function, the user only needs to implement the Function interface defined in the SDK.

If a function needs to call an external system, it can do so through the Http Client in the SDK. If it needs to call middleware, it can do so through the Diamond/Tair/HSF/Metaq Client in the SDK. These SDK interfaces hide the complexity of the underlying implementation: users do not need to care how the calls are ultimately carried out, nor about the internals of the Runtime.
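To make the shape of such an SDK concrete, here is a minimal Go sketch of a user function written against interface-only definitions. The real Faas SDK is not shown in this article, so `Request`, `Response`, `Function`, `KVClient`, and `TipsFunction` are hypothetical names, and the in-memory `mapKV` only stands in for whatever implementation the Runtime would supply.

```go
package main

import "fmt"

// Request and Response are hypothetical SDK types, not the real definitions.
type Request struct {
	Path string
	Body string
}

type Response struct {
	Status int
	Body   string
}

// Function is the single interface a user function implements.
type Function interface {
	Handle(req Request) (Response, error)
}

// KVClient mimics a thin SDK-side middleware interface, for example a
// cache client. It declares behavior only; the Runtime would implement it.
type KVClient interface {
	Get(key string) (string, bool)
}

// mapKV is an in-memory KVClient used here in place of a real one.
type mapKV map[string]string

func (m mapKV) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// TipsFunction is user code: it depends only on the SDK interfaces above.
type TipsFunction struct {
	KV KVClient
}

// Handle looks up a navigation tip for the scene named in the request body.
func (f TipsFunction) Handle(req Request) (Response, error) {
	tip, ok := f.KV.Get(req.Body)
	if !ok {
		return Response{Status: 404, Body: "no tip"}, nil
	}
	return Response{Status: 200, Body: tip}, nil
}

func main() {
	var fn Function = TipsFunction{KV: mapKV{"tunnel": "turn on headlights"}}
	resp, err := fn.Handle(Request{Path: "/tips", Body: "tunnel"})
	fmt.Println(resp.Status, resp.Body, err)
}
```

Because the function sees only `KVClient`, the transport behind it can change without any change to the user code.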

The SDK layer consists of the Function definition mentioned above and the interface definitions for the various middleware calls. The SDK code is what Faas developers program against. It is deliberately thin and light: it mainly contains interface definitions and no concrete implementations. The middleware calls are implemented in the Runtime, in two ways.

Now look at the blue part in the middle of the figure above, which is the overall architecture of the Runtime. Starter is the Runtime's startup module. The Runtime itself is a server: on startup, it is launched according to the configuration in the Function Config module, and once started it enters listening mode for incoming requests and management traffic.

Below that is the Service layer, which implements the middleware call interfaces defined in the SDK. There are two implementations: RSocket and Dapr. The RSocket implementation calls middleware through an RSocket Broker, and Dapr (Distributed Application Runtime) is also integrated into the Runtime, so middleware can be invoked through Dapr as well. During the early Dapr pilot phase, if a middleware call through Dapr fails, it is downgraded to RSocket.
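The downgrade from Dapr to RSocket can be sketched as a wrapper that tries a primary transport and falls back to a secondary one. The Go code below is only an illustration under stated assumptions: `Invoker`, `fallbackInvoker`, and `fakeTransport` are hypothetical, and no real Dapr or RSocket client API is used.

```go
package main

import (
	"errors"
	"fmt"
)

// Invoker is a hypothetical transport-level interface for one middleware call.
type Invoker interface {
	Invoke(service, method, payload string) (string, error)
}

// fallbackInvoker tries the primary transport first and downgrades to
// the secondary one when the primary call fails.
type fallbackInvoker struct {
	primary   Invoker
	secondary Invoker
}

func (f fallbackInvoker) Invoke(service, method, payload string) (string, error) {
	out, err := f.primary.Invoke(service, method, payload)
	if err == nil {
		return out, nil
	}
	out, err2 := f.secondary.Invoke(service, method, payload)
	if err2 != nil {
		return "", fmt.Errorf("primary: %v; secondary: %w", err, err2)
	}
	return out, nil
}

// fakeTransport stands in for a real client; it tags replies with its name.
type fakeTransport struct {
	name string
	down bool
}

func (t fakeTransport) Invoke(service, method, payload string) (string, error) {
	if t.down {
		return "", errors.New(t.name + " unavailable")
	}
	return t.name + ":" + service + "." + method + "(" + payload + ")", nil
}

func main() {
	inv := fallbackInvoker{
		primary:   fakeTransport{name: "dapr", down: true},
		secondary: fakeTransport{name: "rsocket"},
	}
	out, err := inv.Invoke("tair", "get", "user:42")
	fmt.Println(out, err)
}
```

Retrying on a second transport is only safe when the call is idempotent or when the first attempt is known not to have reached the middleware, so a real implementation would need to distinguish those cases.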

Further down is the RSocket protocol layer, which encapsulates the Metadata protocols used for RSocket calls. Dapr calls are made over gRPC. The bottom layer integrates RSocket and Dapr.

RSocket calls also involve Broker selection. The Upstream module manages the Broker cluster, including Broker registration and deregistration and keepalive checks, while the LoadBalance module implements load-balanced Broker selection, as well as event management, connection management, reconnection, and so on.
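A simplified picture of how registration, health state, and load-balanced selection fit together is sketched below in Go. The article does not say which balancing algorithm the LoadBalance module uses, so round-robin is an assumption here, and `Upstream`, `MarkHealth`, and `Pick` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoBroker is returned when no healthy broker is available.
var ErrNoBroker = errors.New("no healthy broker")

// Upstream is an illustrative broker registry with round-robin selection.
type Upstream struct {
	mu      sync.Mutex
	brokers []string        // addresses in registration order
	healthy map[string]bool // last known keepalive result per address
	next    int             // index where the next scan starts
}

// NewUpstream returns an empty registry.
func NewUpstream() *Upstream {
	return &Upstream{healthy: map[string]bool{}}
}

// Register adds a broker, or marks an already known one healthy again.
func (u *Upstream) Register(addr string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if _, ok := u.healthy[addr]; !ok {
		u.brokers = append(u.brokers, addr)
	}
	u.healthy[addr] = true
}

// Deregister removes a broker from the cluster view.
func (u *Upstream) Deregister(addr string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	delete(u.healthy, addr)
	for i, b := range u.brokers {
		if b == addr {
			u.brokers = append(u.brokers[:i], u.brokers[i+1:]...)
			break
		}
	}
}

// MarkHealth records the result of a keepalive check for a known broker.
func (u *Upstream) MarkHealth(addr string, ok bool) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if _, known := u.healthy[addr]; known {
		u.healthy[addr] = ok
	}
}

// Pick returns the next healthy broker in round-robin order.
func (u *Upstream) Pick() (string, error) {
	u.mu.Lock()
	defer u.mu.Unlock()
	for i := 0; i < len(u.brokers); i++ {
		idx := (u.next + i) % len(u.brokers)
		if b := u.brokers[idx]; u.healthy[b] {
			u.next = (idx + 1) % len(u.brokers)
			return b, nil
		}
	}
	return "", ErrNoBroker
}

func main() {
	u := NewUpstream()
	u.Register("broker-a:9000")
	u.Register("broker-b:9000")
	u.MarkHealth("broker-a:9000", false)
	fmt.Println(u.Pick())
}
```

Keeping health state separate from membership lets a broker that misses a keepalive be skipped without being forgotten, so it can rejoin the rotation as soon as a later check succeeds.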

Finally, the Metrics module in the Runtime handles Eagle Eye Trace integration: it uses Filters to measure the time spent on the Faas link and outputs Eagle Eye logs. It also prints Sunfire logs for the Sidecar to collect. The following figure shows the Sunfire monitoring interface of an actual business.
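The Filter approach to measuring time spent on a call can be sketched in Go as a wrapper around a handler. This is a generic illustration, not the Runtime's Metrics module: `Handler`, `Filter`, `Span`, `Timing`, and `Chain` are hypothetical names, and the report callback simply stands in for writing a trace or monitoring log line.

```go
package main

import (
	"fmt"
	"time"
)

// Handler processes one request; Filter wraps a Handler with extra behavior.
type Handler func(req string) (string, error)
type Filter func(next Handler) Handler

// Span is one recorded measurement.
type Span struct {
	Name    string
	Elapsed time.Duration
	Failed  bool
}

// Timing returns a Filter that measures how long the wrapped handler
// takes and passes the measurement to report.
func Timing(name string, report func(Span)) Filter {
	return func(next Handler) Handler {
		return func(req string) (string, error) {
			start := time.Now()
			resp, err := next(req)
			report(Span{Name: name, Elapsed: time.Since(start), Failed: err != nil})
			return resp, err
		}
	}
}

// Chain applies filters so that the first one listed is the outermost.
func Chain(h Handler, filters ...Filter) Handler {
	for i := len(filters) - 1; i >= 0; i-- {
		h = filters[i](h)
	}
	return h
}

func main() {
	userFunc := func(req string) (string, error) { return "ok:" + req, nil }
	h := Chain(userFunc, Timing("faas", func(s Span) {
		// A real runtime would write a structured log line here.
		fmt.Printf("span=%s elapsed=%v failed=%v\n", s.Name, s.Elapsed, s.Failed)
	}))
	resp, err := h("ping")
	fmt.Println(resp, err)
}
```

Since the measurement lives in a filter, the user function itself contains no tracing code.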

2.2.1 Dapr

The Dapr architecture is shown in the figure below; refer to the official documentation for details.

Originally, all middleware used in the Runtime was called through RSocket, and the RSocket Broker is a centralized component. To decentralize outgoing traffic, AutoNavi and the group's middleware team cooperated to introduce the Dapr architecture. Dapr is integrated at the Runtime level, so it is transparent to user Faas code, which does not need to care whether a given middleware call goes through RSocket or Dapr. When the Runtime switches its middleware calls to Dapr, user Faas code does not need to change at all.

3. How to connect the business to Serverless

As mentioned earlier, access is unified on Aone. We provide onboarding documentation for C++ Faas and Go Faas, along with an Example code repository of functions. The repository contains examples for various scenarios, including code that calls the group's various middleware.

Access to C++ Faas and Go Faas is open to the entire group. Some BUs other than AutoNavi have already adopted C++/Go Faas in their own businesses.

Node.js Faas is accessed using the Runtime and templates provided by Taobao, and Java Faas using the Runtime and templates provided by Alibaba Cloud FC.

3.1 Access specification: the three stability essentials (monitorable, gray-releasable, rollbackable)

To address the stability concerns that everyone has when adopting a new technology, our answer is Alibaba Group's three stability essentials: every change must be monitorable, gray-releasable, and rollbackable. We set up a Faas link guarantee group that brings together the upstream and downstream business parties and the underlying platforms. In line with the group's 1-5-10 requirement, online alarms are responded to within 1 minute, investigated and handled within 5 minutes, and recovered from within 10 minutes.

In order to standardize the access process and avoid online failures caused by mistakes, we have formulated Faas access specifications and CheckList to help business parties quickly use Faas.

Monitoring, gray release, and rollback are hard requirements. Beyond these, it is better still if the business itself can be downgraded. Our C++ client-to-cloud services were designed to be downgradable from the start of the pilot phase: if a call to the Faas side fails, it is automatically downgraded to a local call on the client. The client loses essentially no functionality, although response latency increases somewhat.
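The client-side downgrade described above can be sketched as follows. The real client is written in C++; this Go version is only an illustration, and `Impl`, `CallWithDowngrade`, and `defaultTimeout` are hypothetical names. Treating a timeout as a failure is an assumption of the sketch, since the article only mentions failed calls.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Impl is one implementation of a piece of business logic. The same
// signature is used for the cloud call and for the local copy.
type Impl func(input string) (string, error)

// defaultTimeout is an illustrative budget for the cloud call.
const defaultTimeout = 20 * time.Millisecond

// CallWithDowngrade tries the cloud implementation first. If it fails or
// does not answer within timeout, the local implementation is used instead.
// The boolean result reports whether the call was downgraded.
func CallWithDowngrade(cloud, local Impl, timeout time.Duration, input string) (string, bool, error) {
	type result struct {
		out string
		err error
	}
	// Buffered so that a late cloud reply does not block the goroutine forever.
	ch := make(chan result, 1)
	go func() {
		out, err := cloud(input)
		ch <- result{out, err}
	}()
	select {
	case r := <-ch:
		if r.err == nil {
			return r.out, false, nil
		}
	case <-time.After(timeout):
	}
	out, err := local(input)
	return out, true, err
}

// brokenCloud simulates a Faas endpoint that cannot be reached.
var brokenCloud Impl = func(string) (string, error) {
	return "", errors.New("faas unreachable")
}

// slowCloud simulates a cloud call that answers only after the given delay.
func slowCloud(delay time.Duration) Impl {
	return func(in string) (string, error) {
		time.Sleep(delay)
		return "cloud:" + in, nil
	}
}

func main() {
	local := func(in string) (string, error) { return "local:" + in, nil }
	out, downgraded, err := CallWithDowngrade(brokenCloud, local, defaultTimeout, "weather")
	fmt.Println(out, downgraded, err)
}
```

The caller receives a result either way; the `downgraded` flag can be used for monitoring how often the fallback path is taken.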

In addition, the version of a function on the client may be slightly older than the one on the server, but functions are forward compatible, so this basically does not affect client usage.

4. Our current situation

4.1 Basic platform construction

  • The Go/C++ Faas Runtime has been developed, integration with FC-Ginkgo/CSE and Aone is complete, and the stable version 1.0 has been released.
  • A lot of work has gone into stability, graceful shutdown, performance optimization, and compiler optimization. The compilation approach provided by the compiler optimization team of the Alibaba Cloud Basic Software Department is used to optimize C++ Faas builds, and performance has improved significantly.
  • C++/Go Faas integration with Eagle Eye and Sunfire monitoring is complete, so functions are observable.
  • The pooling feature is complete, giving second-level elasticity. With pooled Runtime images connected to CSE, the time to scale out a new instance has dropped from minutes to seconds.

4.2 AutoNavi's Serverless business

C++ Faas, Go Faas, and Node.js Faas are already widely used in AutoNavi. Here are a few examples:

In the figure above, the first two screenshots show businesses developed with C++ Faas: long-distance weather and search along the route. The last two show businesses developed with Go-Faas: navigation tips and the footprint map.

AutoNavi is the BU with the largest scale of serverless applications in the Alibaba Group. The daily peak value of the serverless applications that have been deployed has already exceeded 100,000 QPS.

4.3 Main benefits

Now that AutoNavi runs the largest Serverless deployment in the group, what are the benefits? The first and most important benefit is improved development efficiency. Our Serverless-based client-cloud integration components help the client go to the cloud, remove the dependency on client releases when implementing requirements, and improve the efficiency of client development iteration. The BFF layer built on Serverless likewise improves the iteration efficiency of BFF scenarios.

The second benefit is improved operation and maintenance efficiency. Thanks to Serverless automatic elastic scaling, for travel peaks such as the annual National Day (October 1), Labor Day (May 1), Qingming, Christmas and New Year's Day, and Spring Festival holidays, operations engineers or business developers no longer need to scale out before the holiday and scale back in afterwards.

The traffic peaks of AutoNavi's business also differ from e-commerce flash-sale scenarios: travel peak traffic does not surge within a single second. The second-level elasticity achieved with our current pooling technology fully meets the needs of this business scenario.

The third benefit is cost reduction. AutoNavi's traffic is high during the day and low at night, with a large gap between peak and trough and clearly distinct time periods. With Serverless, capacity is automatically scaled in during the low-traffic night hours, which greatly reduces the cost of server resources.

5. Follow-up plan

  • Optimize the use of Function Compute (FC) in the group's internal environment, and keep improving its performance, stability, and user experience there together with the FC team. Use the group's rich high-traffic business scenarios to keep polishing the C++/Go Faas Runtime, and eventually offer it on the public cloud so that more companies can benefit from it in the wave of digital transformation.
  • Roll out Dapr to decentralize outgoing traffic: gradually roll it out to some C++/Go Faas functions, which will then use Dapr to call the group's middleware.
  • Faas chaos engineering, fault drills, and escape capability. Faas will also take part in our BU's fault drills in the new fiscal year, and the problems found during the drills will be fixed one by one.
  • Connect to edge computing. In client-cloud integration scenarios, Faas plus edge computing can provide lower latency and a better user experience.

All of the above is a long road. Beyond that, we will carry out more cloud-native pilots and rollouts in the future. Engineers all know that there is still a long way to go from technology selection and prototyping to real business adoption.

Friends who are interested in Serverless, cloud native, or Go application development and who want to build something together are welcome to join us, whatever technology stack you have worked with before. Please send your resume to gdtech@alibaba-inc.com with the email subject: name-technical direction-from AutoNavi Technology. We offer large-scale production scenarios and a simple, open technical atmosphere. Self-recommendations and referrals are both welcome.


AutoNavi Technology (高德技术)