
Author: Cao Shengli

What is Service Mesh?

Since around 2010, the SOA architecture has been popular among medium and large Internet companies, and Alibaba open-sourced Dubbo in 2012. The microservice architecture then took off, and a large number of Internet and traditional enterprises invested in building microservices, gradually forming two major microservice camps in China: Dubbo and Spring Cloud. In 2016, a more cutting-edge microservice solution, one better suited to containers and Kubernetes, began to incubate. This technology is called Service Mesh. Today the concept of Service Mesh is widely known, and many companies have put it into production.

Service Mesh definition

Service Mesh is an infrastructure layer that revolves around communication between services. The service topology of today's cloud-native applications is very complex, and Service Mesh achieves reliable request delivery within that complexity. Service Mesh runs in Sidecar mode: an independent Service Mesh process runs next to the application and takes over communication with remote services. A military motorcycle with a sidecar is a good analogy: one soldier drives, while the soldier in the sidecar does the shooting.

Pain points solved by Service Mesh


Traditional microservice architectures are mostly built on an RPC communication framework, whose SDK provides capabilities such as service registration/discovery, service routing, load balancing, and full-link tracing. The application's business logic and the RPC SDK live in the same process, which brings several challenges to the traditional microservice architecture: middleware-related code invades the business code, so the coupling is very high, and upgrading the RPC SDK is expensive, which leads to serious fragmentation of SDK versions. This approach also demands a lot from application developers: they need rich service-governance and operations skills plus background knowledge of middleware, so the barrier to using middleware is high.

By sinking some RPC capabilities into the Service Mesh, separation of concerns and clear boundaries of responsibility can be achieved. With the development of container and Kubernetes technology, Service Mesh has become part of the cloud-native infrastructure.

Introduction to Istio


In the Service Mesh field, Istio is undoubtedly the king. Istio consists of a control plane and a data plane; in a Service Mesh, services communicate with each other through proxy sidecars. Istio's core function is traffic management, accomplished through the coordination of the data plane and the control plane. Istio was initiated by Google, IBM, and Lyft. It is the most prominent Service Mesh project in the CNCF landscape and is expected to become the de facto standard for Service Mesh.

Istio's data plane uses Envoy by default, and Envoy is the community's default and most mature data plane. The protocol spoken between Istio's data plane and control plane is xDS.

Service Mesh summary

To summarize Service Mesh:

  • Service Mesh is positioned as the infrastructure for communication between services; the community mainly supports RPC and HTTP.
  • It is deployed in Sidecar mode and supports deployment on Kubernetes and on virtual machines.
  • Service Mesh forwards the original protocol, which is why it is also called a network proxy. It is precisely this approach that makes zero intrusion into the application possible.

What is Dapr?

Challenges of Service Mesh


Users deploy services on the cloud mainly as ordinary applications or as FaaS. In the FaaS scenario, cost and R&D efficiency are the main attractions for users. Cost savings come mainly from on-demand allocation and extreme elasticity. Application developers expect FaaS to provide a multi-language programming environment that improves R&D efficiency across startup time, release time, and development efficiency.

Service Mesh is essentially original-protocol forwarding, which gives it the advantage of zero intrusion into the application. But original-protocol forwarding also brings problems. The middleware SDK on the application side still has to implement serialization and encoding/decoding, so multi-language support carries a real cost. And as open-source technology keeps iterating, migrations happen: to move from Spring Cloud to Dubbo, either application developers must switch the SDK they depend on, or, if Service Mesh is to absorb the change, it must perform protocol conversion, which is costly.

Service Mesh focuses on communication between services and offers very little support for other forms of Mesh. Envoy, for example, has been quite successful for RPC, but its attempts in fields such as Redis and messaging have not borne fruit. Ant's MOSN supports the integration of RPC and messaging. The overall demand for multiple Mesh forms clearly exists, but each Mesh product develops independently, lacking abstraction and standards. With so many Mesh forms, should they share one process? If they share a process, should they share a port? Many such questions have no answers. As for the control plane, functionally most of it focuses on traffic: reading through the xDS protocol, its core is service discovery and routing. Other kinds of distributed capabilities are essentially absent from the Service Mesh control plane, let alone abstracted into xDS-like protocols that could support them.

Because of cost and R&D efficiency, more and more customers are choosing FaaS. FaaS puts stronger demands on multi-language support and on the friendliness of the programming API, and in these two areas Service Mesh brings customers no extra value.

The needs of distributed applications


Bilgin Ibryam is the author of Kubernetes Patterns and the chief middleware architect at Red Hat, and he is very active in the Apache community. He published an article that abstracts the current difficulties of distributed applications and divides their requirements into four categories: lifecycle, networking, state, and binding. Each category contains sub-capabilities, including classic middleware capabilities such as point-to-point messaging, pub/sub, and caching. Applications have this many demands for distributed capabilities, and Service Mesh obviously cannot meet them all. In the same article, Bilgin Ibryam proposed the concept of Multiple Runtime as a way out of Service Mesh's dilemma.

Multiple Runtime concept derivation


In the traditional middleware model, the application and its distributed capabilities are integrated in the same process via an SDK. As the infrastructure layer sinks, these distributed capabilities move out of the application: Kubernetes takes over lifecycle-related requirements, while Istio, Knative, and others take over various distributed capabilities. If each of these capabilities were moved into its own independent Runtime, the result would be unacceptable both operationally and in resource terms, so the Runtimes must be consolidated, ideally into one. This consolidated form is called Mecha, as in the mecha of Japanese anime: each part of the mecha is like a distributed capability, and the pilot inside corresponds to the main application, also called the Micrologic Runtime. The two Runtimes can run one-to-one in Sidecar mode, which suits traditional applications, or many-to-one in Node mode, which suits edge scenarios or network-management modes.

So the goal is clear: a Mecha Runtime that integrates various distributed capabilities. How should the integration work? What are the requirements for Mecha?

  1. Mecha's component capabilities are abstract, so that any open-source product can be integrated quickly as an extension.
  2. Mecha needs a degree of configurability: it can be configured and activated through YAML/JSON files, ideally aligned with mainstream cloud-native practice.
  3. Mecha provides a standard API, and network communication with the main application is based on this API rather than on original-protocol forwarding. This greatly simplifies component extension and SDK maintenance.
  4. Some lifecycle aspects of the distributed capabilities can be handed to the underlying infrastructure, such as Kubernetes; complex scenarios may require Kubernetes, the application, and the Mecha Runtime to cooperate.

Since only one Runtime remains, why is it called Multiple Runtime? Because the application itself is also a Runtime; together with the Mecha Runtime, there are at least two.

Dapr introduction

The previous section's introduction to Multiple Runtime is rather abstract, so let's revisit it through Dapr. Dapr is a good practitioner of Multiple Runtime, so it must coexist with the application, either in Sidecar mode or in Node mode. The name Dapr is not an invented word: it is the acronym of Distributed Application Runtime. Its icon can be read as a hat, specifically a waiter's hat, meaning Dapr is there to serve the application well.

Dapr was open-sourced by Microsoft, with Alibaba deeply involved in the collaboration. Dapr 1.1 has been released, and the project is now close to production readiness.


Since Dapr is the best practitioner of Multiple Runtime, its operating mechanism is built on the Multiple Runtime concept. Dapr abstracts distributed capabilities and defines a set of APIs for them, built on HTTP and gRPC; in Dapr, this abstraction of capabilities is called Building Blocks. So that different kinds of products, open source as well as commercial, can implement these distributed capabilities, Dapr has an internal SPI extension mechanism called Components. With Dapr, application developers simply program against the APIs of the various distributed capabilities without worrying about the concrete implementation; the corresponding Components can be freely activated through YAML files.
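As a concrete sketch of such activation, a Redis-backed state Component might be declared with a YAML file like the following (the component name `statestore` and the connection values are illustrative):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore        # the name the application refers to
spec:
  type: state.redis       # which Component implementation backs it
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
  - name: redisPassword
    value: ""
```

Swapping `state.redis` for another state Component type changes the backing store without touching application code.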

Dapr features


Using Dapr's multi-language SDKs, application developers get the various distributed capabilities directly; they can also call Dapr over raw HTTP or gRPC. Dapr runs in most environments: your own development machine, any Kubernetes cluster, edge-computing scenarios, or cloud vendors such as Alibaba Cloud, AWS, and GCP.
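Because the sidecar speaks plain HTTP, "calling Dapr" is just an HTTP request to localhost. The sketch below builds (without sending) a request against Dapr's v1.0 state API; the port 3500 is Dapr's conventional HTTP port, and the store name `statestore` is whatever your component YAML declares:

```python
# Minimal sketch of addressing the Dapr sidecar's HTTP state API.
import json
import urllib.request

DAPR_HTTP_PORT = 3500  # Dapr's conventional sidecar HTTP port

def save_state(store: str, key: str, value) -> urllib.request.Request:
    """Build the POST request that writes one key to a Dapr state store."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store}"
    body = json.dumps([{"key": key, "value": value}]).encode()
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST")

req = save_state("statestore", "order_1", {"qty": 2})
print(req.full_url)  # http://localhost:3500/v1.0/state/statestore
```

With a sidecar actually running, `urllib.request.urlopen(req)` would perform the write; any language that can issue HTTP requests gets the same capability, which is the point of the capability-oriented API.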

The Dapr community has integrated 70+ component implementations that application developers can quickly pick up and use. Swapping one component for another of the same capability type can be done through Dapr without the application noticing.

Dapr core module


Let's now analyze Dapr by product module and see why Dapr is a good practice of Multiple Runtime.

The Components mechanism ensures rapid extensibility. There are now more than 70 component implementations in the community, covering not only open-source products but also commercial cloud products.

The Building Blocks currently cover only seven distributed capabilities, and more will need to be brought in over time. Building Blocks are exposed over HTTP and gRPC, two open protocols that are already extremely widespread. Which concrete Component serves a given Building Block is activated through YAML files. And because the capabilities are exposed over HTTP and gRPC, it is easy to provide standard, multi-language API programming interfaces on the application side.

Dapr core: Component & Building Block

Dapr Components are the core of Dapr's plug-in extensibility; they are Dapr's SPI. The currently supported Component types are Bindings, Pub/Sub, Middleware, Service Discovery, Secret Stores, and State. Some extension points are functional, such as Bindings, Pub/Sub, and State, while others, such as Middleware, are horizontal. If, say, you want to integrate Redis into Dapr, you only need to implement a Dapr State Component. Dapr Building Blocks are the capabilities Dapr provides over gRPC and HTTP; the currently supported capabilities include Service Invocation, State, and Pub/Sub.
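To make the SPI idea concrete, here is a deliberately simplified sketch (in Python for illustration; real Dapr Components are written in Go against Dapr's own interfaces) of the shape a State Component contract takes: anything that can init/get/set/delete behind a uniform interface can be plugged in.

```python
# Hypothetical, simplified analogue of a state-store SPI.
# Real Dapr component interfaces differ; this only shows the shape.
from abc import ABC, abstractmethod

class StateStore(ABC):
    @abstractmethod
    def init(self, metadata: dict): ...   # configure from component YAML metadata
    @abstractmethod
    def get(self, key: str): ...
    @abstractmethod
    def set(self, key: str, value): ...
    @abstractmethod
    def delete(self, key: str): ...

class InMemoryStateStore(StateStore):
    """Toy implementation standing in for e.g. a Redis-backed Component."""
    def init(self, metadata: dict):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value
    def delete(self, key):
        self._data.pop(key, None)

store = InMemoryStateStore()
store.init({})
store.set("k", "v")
print(store.get("k"))  # v
```

The runtime programs only against the abstract interface, which is why integrating a new store means implementing one Component rather than touching every application.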


A Building Block is composed of one or more Components. For example, the Bindings Building Block comprises two Components: Bindings and Middleware.

Dapr overall architecture


Like Istio, Dapr has a data plane and a control plane. The control plane includes Actor Placement, Sidecar Injector, Sentry, and Operator. Actor Placement serves Actors, Sentry handles security and certificates, and the Sidecar Injector is responsible for injecting the Dapr sidecar. Components are activated in Dapr through YAML files, which can be supplied in two ways: specified locally as runtime parameters, or delivered through the control-plane Operator, which stores them as Kubernetes CRDs and pushes them to Dapr's sidecar. These control-plane components rely on Kubernetes to run. The current Dapr dashboard is still weak, with no improvement direction in the short term; after integrating a component, its operation and maintenance must still be done in that product's original console, since the Dapr control plane does not participate in operating the concrete components.

Dapr's standard deployment form puts it in the same Pod as the application, in two separate containers. The rest of Dapr has been covered above, so it will not be repeated here.

Dapr adoption scenarios at Microsoft

Dapr has been under development for about two years. What is the status of its adoption inside Microsoft?


There are two relevant projects on Dapr's GitHub: Workflows and the Azure Functions Dapr extension. Azure Logic Apps is Microsoft's cloud-based automated workflow platform, and Workflows integrates Azure Logic Apps with Dapr. Two key Logic Apps concepts, Triggers and Connectors, fit Dapr very well: a Trigger can be implemented with Dapr's Input Binding, drawing on its large number of components to widen the range of traffic entry points, and a Connector maps naturally onto Dapr's Output Binding or Service Invocation for quick access to external resources. The Azure Functions Dapr extension adds Dapr support via the Azure Functions extension mechanism, letting Azure Functions quickly use all of Dapr's Building Blocks while giving function developers a simple, consistent programming experience across languages.


Azure API Management Service takes a different perspective from the two scenarios above. It assumes applications are already reached through the Dapr sidecar, with their services exposed through Dapr. If a non-Kubernetes application, or an application in another cluster, wants to access services in the current cluster, a gateway is needed. That gateway exposes Dapr's capabilities directly and adds security and access control on top. Currently three Building Blocks are supported: Service Invocation, Pub/Sub, and Resource Bindings.

Dapr summary

The capability-oriented APIs provided by Dapr give developers a consistent, multi-language programming experience, and the SDKs for these APIs are fairly lightweight. These traits suit FaaS scenarios very well. As Dapr's integration ecosystem matures, the advantages of capability-oriented programming will only grow. Through Dapr, a component implementation can be replaced conveniently with no code changes by the developer, provided the old and new components implement the same type of distributed capability.

Compared with Service Mesh:

  • Capabilities provided: Service Mesh focuses on service invocation; Dapr provides a broader range of distributed capabilities, covering many distributed primitives.
  • Working principle: Service Mesh forwards the original protocol to achieve zero intrusion; Dapr uses multi-language SDKs plus a standard API plus various distributed capabilities.
  • Target domain: Service Mesh is very friendly to non-intrusive upgrades of traditional microservices; Dapr offers application developers a friendlier programming experience.

Alibaba's exploration of Dapr

Alibaba's development path with Dapr

Microsoft open-sourced Dapr in October 2019 with version 0.1.0. At the time, Alibaba happened to have started cooperating with Microsoft on OAM, learned of the Dapr project, and began evaluating it. In early 2020, Alibaba and Microsoft held a round of Dapr discussions at Alibaba, learning about Microsoft's views on Dapr, its investment, and its roadmap. By then Alibaba had concluded that Dapr was of significant value, but work around Dapr only began in mid-2020. By October, Dapr had begun the online gray release of some features in the Function Compute scenario; as of today, the gray release of all Dapr-related Function Compute features is basically complete, and the public beta has begun. In February 2021, Dapr finally released version 1.0.

Alibaba Cloud Function Compute integrates Dapr

Beyond extreme elasticity and other operational benefits, what distinguishes Function Compute from middle-platform applications is its focus on developer experience and overall R&D efficiency. The value Dapr brings to Function Compute is a unified, multi-language, capability-oriented programming interface: developers do not need to care about the concrete product. In Java, for example, using Alibaba Cloud's OSS service normally means adding a Maven dependency and writing OSS-specific code. With Dapr, you only call the Binding method of the Dapr SDK; no redundant product-specific dependencies are pulled in, and the dependency footprint stays controllable.
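The shape of such a binding call can be sketched as follows: instead of an OSS-specific SDK, the application posts a generic payload to Dapr's output-binding endpoint. The binding name `oss` and the metadata key are illustrative; they would come from your component YAML, not from Dapr itself.

```python
# Sketch of addressing Dapr's v1.0 output-binding API.
import json

DAPR_HTTP_PORT = 3500  # conventional Dapr sidecar HTTP port

def binding_request(name: str, operation: str, data, metadata=None):
    """Build the URL and JSON body for invoking an output binding."""
    url = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/bindings/{name}"
    body = {"operation": operation, "data": data, "metadata": metadata or {}}
    return url, json.dumps(body)

# Hypothetical binding named "oss"; the metadata key is illustrative.
url, body = binding_request("oss", "create", "hello",
                            metadata={"key": "greeting.txt"})
print(url)  # http://localhost:3500/v1.0/bindings/oss
```

The same generic `operation`/`data`/`metadata` payload works for any output binding, which is why swapping the backing product does not ripple into application code.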


Alibaba Cloud's function-computing product is called Function Compute, abbreviated FC. FC's architecture contains many subsystems; the ones developers mainly touch are the Function Compute Gateway and the environment in which functions run. The FC Gateway receives traffic and, based on the volume of traffic and current CPU and memory usage, scales the function instances up or down. The Function Compute runtime environment is deployed in a Pod, with the function instance in the main container and Dapr in a sidecar container. When external traffic reaches a Function Compute service, it first goes to the Gateway, which forwards it, based on the requested content, to a function instance providing that service. If the function instance then needs to access external resources, it can issue the call through Dapr's multi-language SDK. The SDK sends a gRPC request to the Dapr instance, which selects the corresponding capability and component implementation based on the request's type and body, and then calls out to the external resource.


In the Service Mesh scenario, the Mesh exists as a Sidecar deployed alongside the application in two containers of the same Pod, which serves Service Mesh needs well. In the Function Compute scenario, however, running Dapr as an independent container consumes too many resources, and multiple function instances are deployed in one Pod precisely to save resources and achieve second-level elasticity. Therefore, in Function Compute, the function instance and the Dapr process are deployed in the same container, but as two processes.

In Function Compute, you can set a number of reserved instances, indicating the minimum number of instances for the current function. Reserved instances that receive no traffic for a long time need to enter a suspend/sleep state (consistent with AWS's approach). To go dormant, the processes or threads in the instance must stop running. An Extension mechanism was added to the function runtime to support scheduling Dapr's lifecycle: when the function instance goes dormant, the Extension tells Dapr to suspend; when the instance resumes, the Extension tells Dapr to restore its previous running state. Component implementations inside Dapr must support this kind of lifecycle management. Take Dubbo as an example: Dubbo's registry, Nacos, requires regular heartbeats to the Nacos server to keep registrations alive, and the Dubbo Consumer integrated into Dapr must also heartbeat to the Dubbo Provider. On entering the dormant state, these heartbeats must stop; on resuming, the full running state must be restored.
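The pause/resume choreography just described can be sketched roughly as below. The `Extension` class and the component hooks are hypothetical simplifications for illustration, not Dapr's actual internal API:

```python
# Hypothetical sketch: a runtime extension fans lifecycle events out to
# components so background work (e.g. registry heartbeats) stops while
# the instance is dormant and restarts on resume.
class HeartbeatComponent:
    """Stands in for e.g. a Nacos registry client that must heartbeat."""
    def __init__(self):
        self.heartbeating = False
    def start(self):
        self.heartbeating = True   # begin periodic heartbeats
    def pause(self):
        self.heartbeating = False  # stop heartbeats while dormant
    def resume(self):
        self.heartbeating = True   # re-register / restart heartbeats

class Extension:
    """Runtime extension that relays instance lifecycle to components."""
    def __init__(self, components):
        self.components = components
    def on_instance_dormant(self):
        for c in self.components:
            c.pause()
    def on_instance_resume(self):
        for c in self.components:
            c.resume()

comp = HeartbeatComponent()
comp.start()
ext = Extension([comp])
ext.on_instance_dormant()
print(comp.heartbeating)  # False
ext.on_instance_resume()
print(comp.heartbeating)  # True
```

The key design point is that the lifecycle contract lives in the component interface, so every component (Dubbo, Nacos, messaging, etc.) must implement its own notion of "stop background work" and "restore state."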

The combination of Function Compute and Dapr described above covers outbound traffic. What about inbound traffic? Can message traffic flow directly into Dapr without passing through the Gateway? To achieve this, the Dapr sidecar must report performance data to the Gateway in time, so that the Gateway can still scale resources elastically.

SaaS businesses move to the cloud

As more and more SaaS businesses are incubated inside Alibaba, their demand to serve external customers has become very strong. SaaS businesses have a strong need for multi-cloud deployment: customers expect the SaaS product to run on Alibaba Cloud's public cloud or on, say, Huawei's private cloud, and they expect the underlying technology to be open source or a standard cloud vendor's commercial product.

Take an Alibaba SaaS business moving to the cloud as an example: the left side is the original Alibaba-internal system, the right side the transformed one. The goal of the transformation is to switch dependencies on Alibaba-internal systems to open-source software: Alibaba's internal RPC to Dubbo, and the internal Cache, Message, and Config systems to Redis, RocketMQ, and Nacos respectively. Dapr is expected to make this switch at minimal cost.

Since Dapr is to accomplish this mission, the simplest and crudest way would be to make the application depend on the Dapr SDK directly, but that would be far too expensive a change. Instead, we keep the original APIs unchanged and adapt their underlying implementations to the Dapr SDK. Applications keep using the original APIs and only need to upgrade the version of the dependent JAR. After the transformation, developers still program against the original SDKs, but the bottom layer has been replaced with Dapr's capability-oriented programming, so during migration the application can run one codebase without maintaining separate branches per cloud environment or technology stack. Inside the group, the Dapr sidecar activates components using rpc.yaml, cache.yaml, msg.yaml, and config.yaml; on the public cloud, it uses dubbo.yaml, redis.yaml, rocketmq.yaml, and nacos.yaml to activate the component implementations suited to the Alibaba Cloud environment. Activating different components through different YAML files, thereby shielding the component implementations, makes multi-cloud deployment of SaaS businesses vastly more convenient.
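The YAML-swap trick hinges on the two files declaring the same component name with different implementation types, roughly like this (the internal type name is hypothetical, and the Redis host is a placeholder):

```yaml
# cache.yaml -- inside the group (illustrative internal implementation)
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: cache              # same name the application programs against
spec:
  type: state.internal-cache   # hypothetical internal Component type
  version: v1
---
# redis.yaml -- on the public cloud: same component name, Redis-backed
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: cache
spec:
  type: state.redis
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
```

Because the application only ever references the component name `cache`, deploying with one file or the other selects the environment-appropriate implementation with zero code change.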


DingTalk is an important partner and promoter of Dapr, cooperating with the cloud-native team to land Dapr within DingTalk. By sinking some middleware capabilities into the Dapr sidecar, implementations of similar underlying capabilities are shielded. But DingTalk has its own pain point: its general-purpose business components are tightly bound to specific businesses and require per-business customization, which keeps reuse low. DingTalk therefore hopes to sink the capabilities of some business components into Dapr as well, giving different businesses the same programming experience while component maintainers need only maintain the Component implementations.

Dapr Outlook

The sinking of infrastructure has become a software development trend

The history of software architecture is fascinating, and reviewing the evolution of Alibaba's systems is a good way to understand the development of software architecture in China and even worldwide. When Taobao was founded it was a monolithic application; as the business grew, the system was first upgraded by scaling up the hardware, but that approach quickly ran into problems, so in 2008 microservice/SOA solutions were introduced. SOA systems are distributed, so for stability and observability, high-availability mechanisms such as circuit breaking, isolation, and full-link monitoring had to be introduced. The next problem was how to reach a 99.99%+ SLA at the machine-room and IDC level, which led to solutions such as same-city dual data centers and geo-distributed multi-active deployments. With the continuing development of cloud technology, Alibaba has embraced and helped guide cloud-native technology, actively upgrading its stack to be cloud native on the basis of Kubernetes.


From this history we can see that software architecture keeps generating new requirements that the original underlying infrastructure could not meet and that had to be fulfilled by fat SDKs on the application side. As Kubernetes and containers became the standard, microservices and some distributed capabilities returned to the infrastructure. The future trend is the continued sinking of distributed capabilities, represented by Service Mesh and Dapr, releasing the dividends of cloud and cloud-native technology.

Demands of application developers in the cloud-native scenario

Future application developers should expect a development experience that is capability-oriented and vendor-neutral, not tied to any specific cloud vendor or technology, and that, through the dividends of cloud technology, achieves the cost advantages of extreme elasticity. I believe this ideal will be realized one day. From today's vantage point, what would it take to get there?

  1. The Multiple Runtime concept must be genuinely implemented and keep developing;
  2. Taking Dapr as the example, Dapr's APIs for distributed capabilities should be promoted into an industry standard, and that standard must continue to evolve;
  3. As Kubernetes and Serverless technology continue to develop, elasticity can eventually be maximized.

Dapr community direction

Finally, a look at the Dapr community's direction:

1. Promote API standardization and integrate more distributed capabilities;
2. Integrate more Components and complete the Dapr ecosystem;
3. Land in more companies, expanding the product's boundaries and polishing Dapr to production readiness;
4. Enter the CNCF and become the de facto standard for the cloud-native Multiple Runtime.


