Author: Wang Xining
The content of this article is based on the author's speech at the 2022 Cloud Native Industry Conference.
Background
Looking back at the evolution of application service architectures, and viewed from how calls between service consumers and providers are handled, the evolution can be divided into three stages.
The first stage is centralized load balancing: the service caller is routed to the corresponding service provider through an external load balancer. The advantages are obvious: it is non-intrusive to the application itself, it supports applications developed in multiple languages and frameworks, load balancing is managed centrally, and deployment is simple. The disadvantages are equally significant: because it is centralized, scalability is limited, and its service governance capabilities are relatively weak.
The second stage is distributed governance with microservice frameworks: governance capabilities are built into the application on the caller side in the form of an SDK library. The advantages are better scalability and strong service governance capabilities; the disadvantages include intrusion into the application itself, difficulty in supporting multiple languages due to the dependence on the SDK, and more complicated distributed management and deployment.
The third stage is today's service mesh technology. By moving service governance capabilities into sidecars, they are decoupled from the application itself, multiple programming languages can be supported more easily, and these capabilities no longer depend on any specific technical framework. The sidecar proxies form a mesh-shaped data plane through which all inter-service traffic is processed and observed, while the control plane manages these sidecar proxies in a unified way. The trade-off is a certain amount of added complexity.
The following figure is an architectural diagram of a service mesh. As mentioned earlier, with service mesh technology each application service instance is accompanied by a sidecar proxy, and the business code is unaware of the sidecar's existence. The sidecar proxy intercepts application traffic and provides three categories of capabilities: traffic governance, security, and observability.
In the cloud-native application model, an application may contain several services, and each service may consist of several instances. The sidecar proxies of these hundreds of instances together form the data plane, shown as the data plane layer in the figure.
How to manage these sidecar proxies in a unified way is the problem that the control plane of the service mesh solves. The control plane is the brain of the service mesh: it distributes configuration to the sidecar proxies on the data plane, manages how the data plane components execute, and provides a unified API so that mesh users can easily operate the mesh's management capabilities.
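As a concrete illustration of how workloads join the data plane, the following is a minimal sketch using the Kubernetes Python client: it turns on automatic sidecar injection for a namespace via the standard istio-injection label and then checks that pods carry the injected proxy. It assumes an Istio-compatible mesh (such as ASM), a reachable cluster and kubeconfig; the demo namespace is a hypothetical example.

```python
# Minimal sketch: enable automatic sidecar injection for a namespace and
# verify that new pods receive the istio-proxy sidecar container.
# Assumes a reachable cluster, a valid kubeconfig, and an Istio-compatible
# mesh (e.g. ASM) that honours the standard istio-injection namespace label.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Label the namespace so the mesh's injection webhook adds a sidecar
# to every pod created in it from now on.
core.patch_namespace(
    "demo",
    {"metadata": {"labels": {"istio-injection": "enabled"}}},
)

# After (re)deploying workloads, each pod should carry two containers:
# the application container and the injected istio-proxy sidecar.
for pod in core.list_namespaced_pod("demo").items:
    names = [c.name for c in pod.spec.containers]
    print(pod.metadata.name, names, "istio-proxy" in names)
```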
Typically, with a service mesh enabled, developers, operators, and SRE teams address application service management in a unified, declarative manner.
Cloud-native application infrastructure powered by service mesh
As a core technology for managing communication between application services, a service mesh brings secure, reliable, fast, and application-transparent traffic routing, security, and observability capabilities to calls between application services.
A cloud-native application infrastructure backed by a service mesh therefore brings important advantages, which can be grouped into six aspects.
Advantage 1: Unified governance of heterogeneous services
• Interoperability and governance across multiple languages and frameworks; a dual-mode architecture that integrates with traditional microservice systems
• Fine-grained multi-protocol traffic control and unified management of east-west and north-south traffic
• Automatic service discovery that unifies heterogeneous computing infrastructures
Advantage 2: End-to-end observability
• Integrated, intelligent operations that combine logging, monitoring, and tracing
• An intuitive, easy-to-use visual mesh topology with a color-coded health identification system
• Built-in best practices and self-service mesh diagnostics
Advantage 3: Zero-trust security
• End-to-end mTLS encryption and attribute-based access control (ABAC)
• OPA declarative policy engine and globally unique workload identity
• Full audit history and insights with dashboards
Advantage 4: Performance optimization combining software and hardware
• The first service mesh platform to accelerate TLS encryption and decryption with Intel Multi-Buffer technology
• NFD automatically detects hardware features and adaptively enables capabilities such as the AVX instruction set and QAT acceleration
• Among the first to pass the Trusted Cloud advanced certification and performance evaluation for service mesh platforms
Advantage 5: SLO-driven application resiliency
• Service level objective (SLO) policies
• Automatic elastic scaling of application services based on observability data
• Automatic switchover and fault recovery during multi-cluster traffic bursts
Advantage 6: Out-of-the-box extensions and ecosystem compatibility
• Out-of-the-box EnvoyFilter plugin marketplace and WebAssembly plugin lifecycle management
• Unified integration with Proxyless mode, supporting both SDK and kernel eBPF approaches
• Compatible with the Istio ecosystem; supports Serverless/Knative and AI Serving/KServe
The following diagram shows the current architecture of the service mesh ASM product. As the industry's first fully managed Istio-compatible service mesh product, ASM has stayed consistent with the community and industry trends from the very beginning. Its control plane components are hosted on the Alibaba Cloud side and are independent of the user clusters on the data plane side. ASM is customized and implemented on top of the community's open source Istio and provides, on the managed control plane side, the component capabilities that support fine-grained traffic management and security management. Through this managed mode, the lifecycle management of the Istio components is decoupled from that of the managed Kubernetes clusters, making the architecture more flexible and improving the scalability of the system.
The managed service mesh ASM provides unified traffic management, unified service security, unified service observability, and unified WebAssembly-based proxy extensibility, forming a unified infrastructure for managing multiple heterogeneous compute services and for building enterprise-grade capabilities.
How service mesh technology will evolve in its next stage
The fusion of Sidecar Proxy and Proxyless modes can be summed up in one sentence: the same control plane supports different data plane forms. "The same control plane" means using the ASM managed-side components as the unified, standard control entry point; this control plane runs on the Alibaba Cloud side in a fully managed mode.
The data plane supports both Sidecar Proxy and Proxyless modes. Although the data plane components do not run on the managed side, they are still under managed control: their lifecycle, including distribution to the data plane, upgrade, and uninstallation, is managed by ASM.
Specifically, in Sidecar Proxy mode, in addition to the current standard Envoy proxy, our architecture can easily support other sidecars, for example the Dapr sidecar currently adopted by Microsoft's OSM + Dapr combination.
In Proxyless mode, an SDK can be used to improve QPS and reduce latency. For example, gRPC already supports xDS protocol clients, and our Dubbo team is moving in the same direction; I believe that this year, together with the North Latitude team, we can make some breakthroughs in this area.
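As an illustration of the Proxyless mode, the sketch below shows roughly what a proxyless gRPC client looks like in Python. It assumes a grpcio build with xDS support and a bootstrap file whose path is supplied via the GRPC_XDS_BOOTSTRAP environment variable; the bootstrap path and the "helloworld-svc:8000" service name are hypothetical examples.

```python
# Minimal sketch of a proxyless gRPC client: instead of sending traffic
# through an Envoy sidecar, the client itself speaks the xDS protocol to the
# mesh control plane to obtain routing, load-balancing and endpoint
# configuration. Assumes a grpcio build with xDS support; the bootstrap file
# path and the "helloworld-svc:8000" service name are hypothetical.
import os

import grpc

# The bootstrap file tells the gRPC runtime where the xDS server (the mesh
# control plane) is and which identity to present. In a real mesh it is
# usually generated by the mesh's node agent.
os.environ.setdefault("GRPC_XDS_BOOTSTRAP", "/etc/istio/grpc-bootstrap.json")

# The xds:/// scheme switches name resolution and load balancing from DNS to
# the configuration pushed by the control plane.
channel = grpc.insecure_channel("xds:///helloworld-svc:8000")

# From here the channel is used like any other gRPC channel, e.g. with a
# generated stub: stub = helloworld_pb2_grpc.GreeterStub(channel)
```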
Another Proxyless form is the kernel eBPF + node-level proxy mode. This mode is a fundamental change from the sidecar model: each node runs only one proxy, and capabilities are offloaded to the node. We will also launch some products in this area this year.
Around service mesh technology, the industry has built a series of application-centric ecosystems. Alibaba Cloud Service Mesh ASM supports the following ones, listed below:
Lifecycle management and DevOps innovation for modern software development
The core principles of a service mesh (security, reliability, and observability) underpin lifecycle management and DevOps innovation for modern software development, providing the flexibility, scalability, and testability needed to architect, develop, automate, deploy, and operate applications in a cloud computing environment. As such, a service mesh provides a solid foundation for modern software development, and any team building and deploying applications on Kubernetes should seriously consider adopting one.
One of the essential components of DevOps is continuous integration and deployment (CI/CD), which delivers containerized application code to production systems faster and more reliably. Enabling canary or blue-green deployments in the CI/CD pipeline allows new application versions to be tested more robustly in production with a safe rollback strategy, and a service mesh is what makes such canary deployments in production practical. Currently, Alibaba Cloud Service Mesh ASM supports integration with ArgoCD, Argo Rollouts, KubeVela, Yunxiao Flow, Flagger, and other systems to achieve blue-green or canary releases of applications, as described below; a sketch of the underlying traffic-shifting primitive follows the list.
The main responsibility of ArgoCD [1] is to monitor changes to application manifests in a Git repository, compare them with the actual state of the applications running in the cluster, and automatically or manually sync the changes to the deployment cluster. Integrating ArgoCD with Alibaba Cloud Service Mesh ASM for application releases and updates reduces operation and maintenance costs.
Argo Rollouts [2] provides more powerful blue-green and canary deployment capabilities. In practice the two can be combined to provide progressive delivery based on GitOps.
KubeVela [3] is an out-of-the-box, modern application delivery and management platform. Combining service mesh ASM with KubeVela enables progressive grayscale releases, so that applications can be upgraded smoothly.
Flow, the pipeline service of Alibaba Cloud's Yunxiao (Cloud Effect) DevOps platform [4], provides blue-green releases of Kubernetes applications based on Alibaba Cloud Service Mesh ASM.
Flagger [5] is another progressive delivery tool that automates the release process of applications running on Kubernetes. It reduces the risk of introducing new software versions in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests. Alibaba Cloud Service Mesh ASM already supports this progressive release capability through Flagger.
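The tools above differ in workflow, but the traffic-shifting primitive they drive on the mesh is essentially the same. The sketch below uses the Kubernetes Python client to apply an Istio VirtualService that splits traffic 90/10 between two subsets; the "reviews" service, its v1/v2 subsets, and the demo namespace are hypothetical, and the matching DestinationRule that defines the subsets is omitted.

```python
# Minimal sketch of the traffic-shifting primitive that canary/blue-green
# tools such as Argo Rollouts or Flagger drive under the hood: an Istio
# VirtualService that splits traffic between two subsets by weight.
# Assumes a cluster with Istio/ASM CRDs installed; names are hypothetical,
# and a matching DestinationRule defining the v1/v2 subsets is omitted.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

virtual_service = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "VirtualService",
    "metadata": {"name": "reviews", "namespace": "demo"},
    "spec": {
        "hosts": ["reviews"],
        "http": [{
            "route": [
                {"destination": {"host": "reviews", "subset": "v1"}, "weight": 90},
                {"destination": {"host": "reviews", "subset": "v2"}, "weight": 10},
            ]
        }],
    },
}

# A canary controller gradually raises the v2 weight (10 -> 50 -> 100)
# while watching metrics, and rolls back by restoring the v1 weight.
api.create_namespaced_custom_object(
    group="networking.istio.io",
    version="v1beta1",
    namespace="demo",
    plural="virtualservices",
    body=virtual_service,
)
```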
Compatibility with microservice frameworks [6]
Spring Boot/Cloud applications can be migrated seamlessly to the service mesh for unified management and governance. ASM also addresses typical problems encountered during integration, such as how services inside and outside the container cluster communicate with each other and how services written in different languages interoperate.
Serverless containers and autoscaling based on traffic patterns [7]
Serverless and service mesh are two popular cloud-native technologies that customers are exploring to create value from. As we dig deeper into these solutions with customers, questions often arise about where the two technologies intersect and how they complement each other: can a service mesh be used to secure, observe, and expose Knative serverless applications? The managed service mesh ASM platform supports Knative-based serverless containers and autoscaling based on traffic patterns; the managed mesh hides the complexity of maintaining the underlying infrastructure, so that users can easily build their own serverless platforms.
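As a sketch of what a serverless container on the mesh can look like, the example below creates a Knative Service whose replicas are scaled automatically with traffic, including scale-to-zero. It assumes the Knative Serving CRDs are installed alongside the mesh; the image, namespace, and concurrency target are hypothetical.

```python
# Minimal sketch of a Knative Service running as a serverless container on
# the mesh, scaled automatically on traffic (including scale-to-zero).
# Assumes Knative Serving CRDs are installed; image and names are
# hypothetical examples.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

knative_service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "hello", "namespace": "serverless"},
    "spec": {
        "template": {
            "metadata": {
                # Target ~10 concurrent requests per replica; the autoscaler
                # adds or removes replicas (down to zero) as traffic changes.
                "annotations": {"autoscaling.knative.dev/target": "10"}
            },
            "spec": {
                "containers": [{"image": "registry.example.com/hello:latest"}]
            },
        }
    },
}

api.create_namespaced_custom_object(
    group="serving.knative.dev", version="v1",
    namespace="serverless", plural="services", body=knative_service,
)
```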
AI Serving [8]
KFServing (Kubeflow Serving) is a community project initiated by Google to support machine learning on Kubernetes; its next generation has been renamed KServe. The goal of the project is to support different machine learning frameworks in a cloud-native way and, based on the service mesh, to implement traffic control as well as updates and rollbacks of model versions.
Zero-trust security and Policy as Code [9]
On top of Kubernetes NetworkPolicy for layer-3/4 network security control, service mesh ASM provides peer identity and request authentication capabilities, Istio authorization policies, and more fine-grained policy control based on OPA (Open Policy Agent).
Specifically, building a zero-trust security system based on the service mesh covers the following aspects (a minimal policy sketch follows the list):
- The foundation of zero trust: workload identity. How can cloud-native workloads be given a unified identity? ASM provides an easy-to-use identity definition for every workload in the mesh, offers extension mechanisms to build customized identity systems for specific scenarios, and is compatible with the community SPIFFE standard;
- The carrier of zero trust: security certificates. ASM provides mechanisms for issuing certificates and for managing their lifecycle and rotation. Identity is established through the X.509 TLS certificates used by each proxy, and certificates and private keys are rotated;
- The engine of zero trust: policy enforcement. A policy-based trust engine is the key to building zero trust. In addition to supporting Istio RBAC authorization policies, ASM provides more fine-grained authorization policies based on OPA;
- The insight of zero trust: visualization and analysis. ASM provides observability mechanisms to monitor the logs and metrics of policy enforcement, so that the execution of each policy can be assessed.
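The sketch below ties these aspects together with two concrete policies applied via the Kubernetes Python client: a PeerAuthentication resource that enforces strict mTLS (certificate-based workload identity) and an AuthorizationPolicy that only admits requests from a specific service account. It assumes Istio-compatible CRDs such as those provided by ASM; the namespace, labels, and service account are hypothetical.

```python
# Minimal sketch of the zero-trust building blocks described above, applied
# with the Kubernetes Python client: strict mTLS between workloads plus an
# identity-based authorization policy. Assumes Istio/ASM CRDs are installed;
# the namespace, labels and service account are hypothetical examples.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# 1) Require mutual TLS for all workloads in the namespace, so every call
#    is authenticated by the workload's X.509 certificate (its identity).
peer_authn = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "demo"},
    "spec": {"mtls": {"mode": "STRICT"}},
}

# 2) Only allow the "frontend" service account to call GET endpoints of the
#    workloads labelled app=backend; other identities are rejected.
authz = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "AuthorizationPolicy",
    "metadata": {"name": "backend-allow-frontend", "namespace": "demo"},
    "spec": {
        "selector": {"matchLabels": {"app": "backend"}},
        "action": "ALLOW",
        "rules": [{
            "from": [{"source": {"principals": ["cluster.local/ns/demo/sa/frontend"]}}],
            "to": [{"operation": {"methods": ["GET"]}}],
        }],
    },
}

for plural, body in [("peerauthentications", peer_authn),
                     ("authorizationpolicies", authz)]:
    api.create_namespaced_custom_object(
        group="security.istio.io", version="v1beta1",
        namespace="demo", plural=plural, body=body,
    )
```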
This transformation brings a lot of business value to cloud-native applications. One example is elastic scaling, which helps cope with traffic peaks and troughs and thereby reduces cost and improves efficiency. Service mesh ASM generates telemetry data for communication between application services in a non-intrusive way, so metric collection does not require modifying the application logic itself.
Based on the four golden signals of monitoring (latency, traffic, errors, and saturation), service mesh ASM generates a series of metrics for the services it manages, supporting multiple protocols including HTTP, HTTP/2, gRPC, and TCP.
In addition, the service mesh has more than 20 built-in monitoring labels, supports all Envoy proxy metric attribute definitions and the Common Expression Language (CEL), and supports customizing the metrics generated by Istio.
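As an example of customizing Istio-generated metrics, the sketch below uses the community Telemetry API, which an Istio-compatible mesh such as ASM is assumed to expose, to add a request_host tag (a CEL expression over request attributes) to the standard request-count metric. The resource name and namespace are hypothetical.

```python
# Minimal sketch of customizing Istio-generated metrics with a CEL
# expression via the community Telemetry API (assumed to be available on an
# Istio-compatible mesh such as ASM): add a "request_host" tag to the
# standard request-count metric. Names and namespace are hypothetical.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

telemetry = {
    "apiVersion": "telemetry.istio.io/v1alpha1",
    "kind": "Telemetry",
    "metadata": {"name": "custom-tags", "namespace": "demo"},
    "spec": {
        "metrics": [{
            "providers": [{"name": "prometheus"}],
            "overrides": [{
                "match": {"metric": "REQUEST_COUNT", "mode": "CLIENT_AND_SERVER"},
                # The tag value is a CEL expression over request attributes.
                "tagOverrides": {"request_host": {"value": "request.host"}},
            }],
        }],
    },
}

api.create_namespaced_custom_object(
    group="telemetry.istio.io", version="v1alpha1",
    namespace="demo", plural="telemetries", body=telemetry,
)
```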
At the same time, we are also exploring new scenarios driven by the service mesh. AI Serving [10] is one example.
This requirement comes from our actual customers, whose usage scenario is to run KServe on top of the service mesh to implement AI services. KServe runs smoothly on a service mesh, enabling blue-green and canary deployments of model services, traffic distribution between revisions, and more. It supports serverless inference workloads with automatic scaling, high scalability, and concurrency-based intelligent load routing.
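A minimal sketch of the canary pattern described above, assuming the KServe v1beta1 CRDs: the canaryTrafficPercent field shifts a fraction of traffic to the newly deployed model revision while the previous revision keeps serving the rest. The model name, namespace, and storage URI are hypothetical.

```python
# Minimal sketch of a KServe canary rollout on top of the mesh: the new
# model revision receives 10% of traffic via canaryTrafficPercent while the
# previous revision keeps serving the rest. Assumes the KServe v1beta1 CRDs
# are installed; the sklearn model and its storage URI are hypothetical.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris", "namespace": "models"},
    "spec": {
        "predictor": {
            # Shift 10% of requests to this new revision; raise the value
            # step by step (or remove it) to promote the canary.
            "canaryTrafficPercent": 10,
            "sklearn": {"storageUri": "gs://example-bucket/models/iris/v2"},
        },
    },
}

api.create_namespaced_custom_object(
    group="serving.kserve.io", version="v1beta1",
    namespace="models", plural="inferenceservices", body=inference_service,
)
```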
Summary
As the industry's first fully managed Istio-compatible service mesh product, Alibaba Cloud Service Mesh ASM has stayed consistent with the community and industry trends from the very beginning. Its control plane components are hosted on the Alibaba Cloud side, independent of the user clusters on the data plane side. ASM is customized and implemented on top of the community's open source Istio and provides, on the managed control plane side, the component capabilities that support fine-grained traffic management and security management. Through this managed mode, the lifecycle management of the Istio components is decoupled from that of the managed Kubernetes clusters, making the architecture more flexible and improving the scalability of the system.
Starting April 1, 2022, Alibaba Cloud Service Mesh ASM officially launches its commercial version, providing richer capabilities, larger-scale support, and better technical support to better meet customers' different scenarios.
Reference links:
[1] ArgoCD:
https://developer.aliyun.com/article/971976
[2] Argo Rollouts:
https://developer.aliyun.com/article/971975
[3] KubeVela:
https://help.aliyun.com/document_detail/337899.html
[4] Alibaba Cloud Yunxiao (Cloud Effect) pipeline Flow:
https://help.aliyun.com/document_detail/160071.html
[5] Flagger:
https://docs.flagger.app/install/flagger-install-on-alibaba-servicemesh
[6] Compatibility with microservice frameworks:
https://developer.aliyun.com/article/974941
[7] Serverless containers and autoscaling based on traffic patterns:
https://developer.aliyun.com/article/975639
[8] AI Serving:
https://developer.aliyun.com/article/971974
[9] Zero Trust Security and Policy As Code:
https://developer.aliyun.com/article/787187
[10] Examples of AI Serving: