1688 Serverless Efficiency Improvement Practices in Complex Business Scenarios

Author | Yuan Yan (Head of Serverless & Engineering Effectiveness, Alibaba CBU Technology Department)

Foreword

First, a brief introduction to our business scenario. 1688 belongs to CBU, the domestic trade business unit of Alibaba Group. It is Alibaba's earliest business, with a history of more than ten years. We are mainly responsible for the 1688.com website on PC and the Alibaba app on mobile. It is currently the largest B2B e-commerce trading platform in China.

I am from the wireless server-side technical team of 1688. The team mainly provides business support for the app and is responsible for building the various scenarios within the 1688 mobile app, such as homepage recommendations and product details. It is a typical e-commerce business scenario.

The evolution of FaaS in 1688

1688's exploration of FaaS (Function as a Service) technology dates back to around 2015. At that time, the biggest business goal of the entire Alibaba Group was "ALL IN Wireless". The mobile Internet was just emerging, and businesses that existed on the PC side, whether Taobao or 1688, needed to be ported quickly to mobile as apps in order to capture mobile traffic.

In this business context, 1688's solution was to set up a wireless server layer. It called the PC-side business interfaces through the microservice system, then performed lightweight business logic orchestration and UI-layer mapping for the mobile business, and finally exposed service interfaces with the same business capabilities to the app through the mobile gateway.

Feature iteration on the mobile side is very fast, and in this mode we soon ran into a problem: a traditional microservice system takes a long time to build, package, deploy, and debug, while front-end business changes are frequent and must be delivered quickly. This mismatch between technical capabilities and business demands pushed us to explore better solutions.

Two Stages of FaaS Capability Landing

The implementation of FaaS within CBU has gone through two major stages. The first stage began in 2015, when the department developed a dynamic loading system based on the JVM to achieve rapid release, rapid launch, and hot deployment, which essentially delivered the effect of FaaS. The second stage started last year: we worked with the Alibaba Cloud FC team to replace the foundation of our FaaS capability with Alibaba Cloud Function Compute, in order to obtain better elastic scaling, container isolation, and other capabilities.

MBOX: FaaS system based on JVM dynamic loading capability

As mentioned above, with the entire wireless business iterating rapidly, we needed a way to publish and change server interfaces quickly. Around 2015, the mainstream idea in the Java engineering community was to make full use of the JVM's dynamic loading feature: without restarting the JVM, external code is compiled into class bytecode in real time through some mechanism and then dynamically loaded into the running JVM instance, achieving the effect of hot loading.

Based on this idea, we built MBOX, a JVM-based dynamic service loading system. In MBOX, we built a general-purpose lightweight service container that can receive a piece of code from outside (a Java class or a simple Groovy script) and compile it in real time into class bytecode. The container then applies certain security hardening to the generated bytecode (such as eliminating infinite loops) and finally loads it as a class into the running JVM through a custom class loader. Once an object instance has been created and the middleware proxies have been injected, it can serve external requests.
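
To make the compile-then-load flow concrete, here is a minimal sketch of the general technique using only the JDK's own `javax.tools.JavaCompiler` and `URLClassLoader`. It is not the MBOX implementation: the class name `HotLoadContainer`, the choice of `Function<String, String>` as the handler contract, and the temp-directory layout are all assumptions made for illustration, and the bytecode hardening and middleware injection steps are omitted.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical container: compiles Java source at runtime and hot-swaps the loaded handler. */
class HotLoadContainer {
    // Currently published handler per class name; replaced on every release.
    private final Map<String, Function<String, String>> handlers = new ConcurrentHashMap<>();

    /** Compiles a public class implementing Function<String, String> and publishes it. */
    @SuppressWarnings("unchecked")
    void publish(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a JRE-only runtime
        if (compiler == null) {
            throw new IllegalStateException("a JDK is required for runtime compilation");
        }
        try {
            // Each release compiles into its own directory so old and new bytecode never mix.
            Path dir = Files.createTempDirectory("hotload");
            Path file = dir.resolve(className + ".java");
            Files.writeString(file, source);
            int exit = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (exit != 0) {
                throw new IllegalArgumentException("compilation failed for " + className);
            }
            // A fresh class loader per release is what lets the same class name be reloaded.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] {dir.toUri().toURL()}, HotLoadContainer.class.getClassLoader());
            Object instance = loader.loadClass(className).getDeclaredConstructor().newInstance();
            handlers.put(className, (Function<String, String>) instance);
        } catch (Exception e) {
            throw new IllegalStateException("failed to publish " + className, e);
        }
    }

    /** Invokes the currently published version of a function. */
    String invoke(String className, String input) {
        return handlers.get(className).apply(input);
    }
}
```

Publishing a second version of the same class simply replaces the map entry, so the next call to `invoke` runs the new code without a JVM restart. Note that every handler still shares one heap and one CPU budget, which is exactly the isolation weakness discussed later.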


Based on MBOX, we achieved online coding, online preview, and release within seconds. From today's point of view, it was a very typical FaaS platform with the following characteristics:

1. Compared with traditional development, code can be used as soon as it is written and what you see is what you get, so R&D efficiency is very high.

2. A hot-loading update mechanism. Changes can be released in seconds, so online business iteration is very efficient.

3. For developers using the platform, it delivers a serverless experience: all operations, maintenance, and machine deployment are handled by the MBOX platform, and developers only need to care about implementing their business logic. (Here, we played a role similar to that of today's cloud service providers.)

For five years, the MBOX system carried more than 100,000 QPS of business calls across 1688. At its peak, there were more than 1,500 online functions, which saved a great deal of engineering effort and contributed greatly during the expansion stage of the wireless business. It also opened the door to our exploration of serverless technology.

After talking about the advantages, let's talk about the disadvantages and risks of this system:

The first is the isolation problem. MBOX is based on the JVM, and the JVM itself cannot provide an effective resource isolation mechanism (for CPU, memory, and so on), so there is a relatively large risk: multiple services loaded in the same business container affect each other. For example, if code written by one person in business cluster A has a memory leak, the performance of the entire cluster may degrade and every service on it will be affected. This is a very serious risk.

The second is that the development model is too lightweight: it is pure script development, where you can only write a code fragment. Although development is very fast, there is no engineering structure, and frameworks and design patterns cannot be used. As a result, the applicable scenarios are very limited, and the quality of the code itself is also very poor.

The third is that resource management is very difficult for MBOX maintainers. The load level of the entire cluster often soars, but there is no way to determine which service is consuming the resources. The system itself cannot scale elastically and has to be expanded manually. In the later stages of the platform, the operation and maintenance cost was very high.

Around 2019, some problems with MBOX became more prominent. At this time, under the influence of K8S, a wave of serverless and cloud-native technology swept the industry. We immediately started the corresponding technical research, and at the end of 2020 we began building jointly with the Alibaba Cloud Function Compute team, hoping to build a truly cloud-native FaaS platform.

Alibaba Cloud FC: FaaS system based on Serverless + Sidecar

Based on the automated operation and maintenance capabilities of K8S containers, Alibaba Cloud Function Compute (FC) has created a FaaS infrastructure that is highly elastic, strongly isolated, and easy to open up and customize. At present, it has essentially become the unified solution for FaaS capabilities within the entire Alibaba Group.


In addition to its mature and powerful underlying elasticity and automated operations, FC also provides a very open runtime design. Any language, or even any team, can customize its own runtime framework to best meet the demands of front-line business developers.

For middleware invocation, a long-standing cross-language problem, the FC team and the middleware team drew on Microsoft's open-source Dapr technology to implement a set of standardized sidecar capabilities covering common middleware such as RPC, cache, message queue, and configuration center. While smoothing out differences between languages, this further slims down the user's runtime container, so that function cold start and scaling speed can be further improved.

In the end, with the support of FC's powerful underlying technology, we jointly built a common runtime framework and R&D and operations facilities for Java developers in the group, replaced the original JVM-based MBOX system, and completed the technical upgrade of our FaaS capabilities.


Looking back on the evolution of serverless at CBU, from the earliest microservice architecture, to the self-developed JVM-based FaaS system, to the current FC-based platform, we have gradually found the technical solutions best suited to our business scenarios, and this can be regarded as one of the industry's first steps toward large-scale implementation of FaaS. As the first department in the Group to implement the serverless concept at scale, our FaaS system has a business penetration rate of over 80% within the department and has been in use for more than 5 years.

Three Soul-Searching Questions About Landing FaaS

After understanding the capabilities and implementation of FaaS, let's discuss the question that everyone cares about most: how to implement FaaS capabilities in business systems?

Based on our practical experience, landing FaaS in real business is not as "silky smooth" as most people think, and it is bound to run into problems. Here are the three "soul-searching questions" that we consider most central:

The first is the transformation of the existing business. Existing businesses (especially ones like ours that have been running for many years) carry a lot of historical baggage that cannot be thrown away. This raises a question: how do we convert a large number of traditional serverful apps running online into serverless functions? Direct transformation is very risky. Is there a way to obtain the advantages of FaaS at minimum cost while keeping the existing business running stably?

The second is the fragmentation problem of functions. Traditional serverful applications are built cohesively: the core capabilities of a business are basically aggregated in a few microservice applications, and their number is relatively small. Functions are different. Their lightweight nature causes their number to grow very rapidly. Without design and control, it is easy to end up with a fragmented situation in which dozens of functions sit behind a single business.

The third is the problem of improving R&D efficiency. The industry generally believes that serverless can greatly improve R&D efficiency, but after actually implementing it you will find that it is not that simple: the efficiency gain from merely replacing traditional serverful applications with serverless functions is very limited.


1. How to combine complex business with FaaS?

To answer the first question: through practical exploration of how to combine existing complex business with FaaS capabilities, we have arrived at two practical models, the BFF mode and the extension point mode.

BFF mode

The BFF mode is a mainstream practice that is common in FaaS implementations. We can abstract the logic in a traditional serverful app. Generally speaking, the logic in a business scenario can be divided into two layers according to how often it changes. Part of the code logic is relatively light and has no complex dependencies, but most product requirements are concentrated in this part; we call it the variable layer. The other part consists of the application framework, second-party middleware dependencies, and some core business logic. It changes relatively rarely, but transforming it is very risky and the benefits may not be ideal; we call it the stable layer.

If your business application can be split along these lines, it is very well suited to the BFF mode: abstract the variable layer and put it into serverless functions to achieve an effect similar to a BFF layer, so that front-end consumers consume the stable-layer APIs in the serverful app through the functions. This builds a business buffer that enables faster release and delivery with less operation and maintenance. Although a considerable part of the legacy code burden cannot be shed, 80% of our energy can be concentrated on improving efficiency.

This mode is better suited to front-end-facing business scenarios. For example, the controller layer in a traditional MVC application is a very good candidate to be replaced by FaaS.
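
As a rough illustration of the split between the variable layer and the stable layer, the sketch below shows what a BFF-style function might look like. All names here (`ProductService`, `Product`, `ProductDetailFunction`, the view-model keys) are hypothetical and not taken from the 1688 codebase; the stable-layer API is modeled as a plain interface, whereas in practice it would be a remote call into the serverful app. Records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Stable layer: a hypothetical API that stays inside the existing serverful app. */
interface ProductService {
    Product getProduct(long id);
}

/** Domain object returned by the stable layer (fields are illustrative). */
record Product(long id, String title, long priceInCents, int stock) {}

/** Variable layer: a lightweight function that only orchestrates calls and maps to the UI. */
class ProductDetailFunction {
    private final ProductService productService; // in production this would be an RPC proxy

    ProductDetailFunction(ProductService productService) {
        this.productService = productService;
    }

    /** Handles one request from the app and returns a display-ready view model. */
    Map<String, Object> handle(long productId) {
        Product p = productService.getProduct(productId);
        Map<String, Object> view = new LinkedHashMap<>();
        view.put("title", p.title());
        // UI-facing formatting changes often, so it lives in the function, not the stable layer.
        view.put("priceText", String.format(Locale.ROOT, "CNY %d.%02d",
                p.priceInCents() / 100, p.priceInCents() % 100));
        view.put("buyButtonEnabled", p.stock() > 0);
        return view;
    }
}
```

The point of the design is that a change to how the price is displayed, or to when the buy button is enabled, touches only the function, which can be released independently of the stable layer behind `ProductService`.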


Extension point mode

The second mode is the extension point mode. It is better suited to middle and back-end scenarios, or to middle-platform systems in the business; our commodity center, for example, uses this mode.

Middle and back-end applications are generally very complex, contain a lot of business logic, have a long history, and are not suited to drastic transformation. However, we can abstract the complex business logic layer to a certain extent, design key extension points for the future, and provide a FaaS adaptation scheme for them. New incremental business logic can then be provided through FaaS, while existing business logic stays basically unchanged: with slight adaptation of the code structure, it can also become a standard extension point implementation.

Another advantage of the extension point mode is that it makes an originally closed architecture more open. Once it is adopted, even for a middle-platform application, as long as the integration specification for the extension point is defined, any business party can achieve the extensibility it wants by providing custom FaaS functions. In 1688's commodity middle-platform system, open business customization of the commodity price calculation logic is implemented through this extension point mode.
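
The idea can be sketched with a small contract plus a registry. This is only an in-process illustration under stated assumptions: `PriceExtension`, `PriceExtensionRegistry`, the channel names, and the pricing rules are all hypothetical, and a real deployment would route the call to a remote FaaS function rather than to a local lambda.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Extension point contract published by the middle-platform application (hypothetical). */
interface PriceExtension {
    long calculate(long basePriceInCents, int quantity);
}

/** Host-side registry that routes each business channel to its registered extension. */
class PriceExtensionRegistry {
    private final Map<String, PriceExtension> extensions = new ConcurrentHashMap<>();

    // Existing logic, adapted to the same contract so it serves as the default implementation.
    private final PriceExtension defaultRule = (base, qty) -> base * qty;

    /** Registers a channel-specific rule; in a real system this would point at a FaaS function. */
    void register(String channel, PriceExtension extension) {
        extensions.put(channel, extension);
    }

    /** Computes the price using the channel's extension, or the default rule if none exists. */
    long price(String channel, long basePriceInCents, int quantity) {
        return extensions.getOrDefault(channel, defaultRule).calculate(basePriceInCents, quantity);
    }
}
```

A business party could then register, say, a wholesale rule that applies a volume discount, without the host application knowing anything about that rule beyond the `PriceExtension` contract.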

2. Avoid the fragmentation problem of FaaS

Programming Interface Tradeoffs

In the early days, when we defined the programming interface for functions in the MBOX system, we used script-style programming, and the user's unit of programming was a piece of code, a single Java class. Although this approach is very lightweight, it leads to a large number of functions (implementing even a slightly complex business may require many scripts), and because there is no engineering structure, code quality is very low (design patterns, for example, cannot be used). Therefore, when defining the FC-based function programming interface, we set a rule based on this experience: the runtime granularity of a user function should be a "Micro App" rather than a "Single Function", and the granularity of the programming interface should be a "Code Project" rather than a "Single Script".

Based on this principle, a function instance is closer to a micro-application for developers. It retains the leanest possible engineering structure overall, so a single function point can be implemented at low cost, while complex logic and the introduction of various second- and third-party libraries are also possible without causing serious function sprawl and fragmentation.

Building an internal service market

Even with Micro App-style function granularity, the number of functions used in the business is still several orders of magnitude greater than the number of traditional microservice applications. To solve this problem, we designed a four-level function classification, [Business Domain] - [Function Group] - [Function] - [Interface], and embedded a plug-in in the function project template so that classification information is automatically collected and reported when a function is built and released. On top of this we built a function service market for internal R&D staff, where everyone can see at a glance the current function classification and the ownership information of each interface API.
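
A minimal sketch of such a four-level catalog is shown below, assuming a simple in-memory structure. The names `InterfaceEntry` and `FunctionCatalog`, the `owner` field, and the query methods are illustrative assumptions; the actual reporting plug-in and market are not described in enough detail here to reproduce. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One reported entry: business domain, function group, function, interface, and owner. */
record InterfaceEntry(String domain, String group, String function, String api, String owner) {}

/** Hypothetical in-memory catalog fed by a release-time reporting plug-in. */
class FunctionCatalog {
    // domain -> group -> function -> entries, kept sorted for stable display.
    private final Map<String, Map<String, Map<String, List<InterfaceEntry>>>> tree = new TreeMap<>();
    private final Map<String, String> ownerByApi = new HashMap<>();

    /** Called once per interface when a function is built and released. */
    void report(InterfaceEntry e) {
        tree.computeIfAbsent(e.domain(), k -> new TreeMap<>())
            .computeIfAbsent(e.group(), k -> new TreeMap<>())
            .computeIfAbsent(e.function(), k -> new ArrayList<>())
            .add(e);
        ownerByApi.put(e.api(), e.owner());
    }

    /** Lists the functions registered under one business domain and function group. */
    List<String> functionsIn(String domain, String group) {
        return new ArrayList<>(
                tree.getOrDefault(domain, Map.of()).getOrDefault(group, Map.of()).keySet());
    }

    /** Returns the owner of an interface, or null if it was never reported. */
    String ownerOf(String api) {
        return ownerByApi.get(api);
    }
}
```

Because the entries are reported automatically at release time, the catalog stays in step with what is actually deployed, which is what makes it usable as a directory of functions and their owners.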

3. Where is the bottleneck of R&D efficiency?

Letting developers "only focus on business" is the core idea serverless has advertised since its inception. But if you simply switch the underlying operations infrastructure to serverless and introduce FaaS and related technical capabilities, R&D staff are still far from "only focus on business" in the true sense.

When we first implemented serverless, we did find that R&D engineers seemed to write code much more efficiently. However, from the perspective of the business team's overall delivery of requirements, R&D efficiency did not improve qualitatively: most requirement development processes were still lengthy, and the cost of communication and collaboration while driving a requirement forward was still very high. Looking carefully, we found that the key bottleneck of R&D efficiency may lie not in "R&D" itself but outside the code.

So is it a false proposition that serverless can improve R&D efficiency? Of course not. First, serverless and FaaS can indeed greatly reduce the cost of operations and coding and improve efficiency. Second, serverless technology lowers the technical threshold of the server side, enabling R&D staff who are not server-side specialists to develop some simple business logic, which makes full-stack development of a requirement possible. Consider the efficiency of overall requirement delivery: if one developer can complete all the development work for a requirement independently, without joint debugging and communication with others, efficiency is bound to be maximized. Serverless makes the implementation and popularization of this new R&D model possible, and this may be the real meaning of "only focus on business".

In short, serverless is not a silver bullet for improving efficiency. In fact, no technology is. When we set out to improve R&D efficiency, we should examine it from a global perspective rather than focusing only on the "R&D" stage.

Efficiency Improvement Practices in Complex Business Scenarios

Finally, we take a real business scenario as an example to show how 1688 combines FaaS capabilities to improve R&D efficiency in a complex business scenario.

Here is a brief introduction to 1688's product details scenario. The product details page is the final display page of a product for buyers and carries a great deal of product information. What distinguishes 1688's product details page from that of an ordinary consumer-facing (B2C) e-commerce site is that B2B trade has multiple transaction channels, such as spot wholesale, distribution, and custom processing, and the transaction method, price, and inventory logic of each channel are different. In addition, B2B e-commerce presents products very differently across industries such as consumer goods and industrial products. Customization by channel and by industry, with various e-commerce marketing campaigns layered on top, makes the business complexity of 1688's product details page very high.

The original technical architecture involved several teams: the underlying commodity foundation team (which connects to the group's middle-platform commodity capabilities and maintains the domain model and some core commodity logic services), the wireless server team (which wraps the underlying commodity services into dedicated product details interfaces for the client and front end), the page building and delivery team (which provides horizontal support for building and delivering pages), and finally the three presentation ends: iOS, Android, and the web front end.

In addition, business customization requirements (such as distribution) also require the corresponding business team to participate. In this model, up to 5 or 6 teams collaborate at the same time, and communication costs are very high. Moreover, the server side uses a relatively heavy microservice application model, so R&D and operations efficiency are relatively low.

Front-end business logic consolidation: BFF mode

We first introduced FaaS in the BFF mode at the front-end business logic layer, where the business changes the most and the fastest, and moved all UI-related logic up into FaaS functions. On the one hand, this improves the development and deployment efficiency of that logic on the server side. On the other hand, the front-end and client-side components only need to handle the simplest display logic, which smooths out technical differences across ends as much as possible and makes some cross-end capabilities achievable.

Commodity back end: customization through the extension point mode

The main problem on the commodity back end is that the cost of onboarding each piece of customized business logic is too high, so we use the FaaS extension point mode to optimize it: the core logic of commodity information (such as price and inventory) is abstracted into standard extension points connected to the function gateway, allowing any business party to write a function from the template to customize its business logic, thereby opening up the closed architecture.


After this architectural transformation, the mode and path of requirement development changed greatly. In the old model, customizing the product details for a business required multiple teams, from the business back end to the client, to participate, and many links had to change, so the cost was very high. In the new model, everything from customizing core business logic to implementing business logic on the front-end presentation side can be done by writing simple FaaS functions, and a single engineer can complete all changes to the back-end business logic. (If the front-end and client-side components can achieve low-code, cross-end development, then full-stack development of business requirements becomes possible!)

Finally, let's look at the results of transforming the overall business scenario. Under the FaaS-based R&D model, the waiting time for product details requirements fell by 80%, release frequency rose by more than 300%, and requirement throughput also improved. Most importantly, R&D staffing was reduced by 50%: the back-end headcount went from 2 regular employees to half a regular employee plus 1 outsourced employee supporting the related requirement development. The number of teams and people involved was greatly reduced, and the entire R&D delivery chain became concise and clear.

Summary and Outlook

Finally, standing at the present moment and speaking from the perspective of a business team, here is a brief summary of and outlook on serverless technology.

Based on our past experience in business scenarios, we summarize several key conclusions:

  1. Serverless is undoubtedly a new technological revolution. The productivity improvement brought by core technologies such as FaaS has been strongly proven in many scenarios.
  2. The implementation of serverless should be combined with the actual business scenario and have a definite purpose, rather than being pushed all the way for its own sake.
  3. No technology is a "silver bullet" for efficiency. Improving R&D efficiency is more about combining business and technology while thinking from the perspective of the team's organizational structure and its people.

As for the future development of serverless and FaaS, I also offer some personal opinions here, and welcome rational discussion:

  1. FaaS will continue to evolve: although the FaaS capability of serverless is relatively mature today, there is still a lot of room for improvement, such as better elasticity and faster cold starts. We can see that the industry is constantly exploring new directions such as WASM and eBPF, and there is reason to believe that we will see continuous breakthroughs in this field.
  2. FaaS will not completely replace traditional microservices: the two forms, traditional microservices (including those running on K8S) and FaaS, will coexist for a long time to come. Business teams should be prepared for this, combining the two technologies in their business scenarios according to local conditions and leveraging the advantages of each.
  3. Full-stack and low-code (or low-threshold) R&D will become a trend: it can dramatically reduce the cost of communication and collaboration on projects and increase the throughput of requirements. With the further maturity and promotion of serverless technology, this will likely rewrite the current form of the R&D team.
