For the service mesh, 2021 was defined by one word: "stable". Whether in the development of the open source communities or in industry adoption, stability came first. Gone were the leap-forward architectural overhauls and feature churn of earlier years; in their place came more pragmatic industry exploration and practice. In 2021 the service mesh grew from the brash "teenager" and "celebrity" of its early days into a genuine "powerhouse": it has gradually matured and been accepted by more industries, enterprises, and standards organizations. This article reviews service mesh in 2021 from four perspectives (community progress, practical adoption, industry standards, and the technology ecosystem) to help readers understand the overall progress of the past year and to offer some reference for enterprises selecting and adopting a service mesh.

Community Progress: Stable and Pragmatic

In 2021, the Istio community shipped a release every three months: 1.9, 1.10, 1.11, and 1.12. This steady cadence shows that Istio development has settled into a routine, and it makes it easier for enterprises to pick a suitable version. Overall, Istio introduced no major architectural changes or headline capabilities in 2021; instead, it improved native support for ease of access, operations, and APIs:

1.9: improved virtual machine integration (Beta), request classification (Beta), Kubernetes Service APIs support (Alpha), and integration with external authorization systems (Experimental). The VM integration work continues to refine the VM onboarding experience that followed the introduction of smart DNS proxying in 1.8 (which solved cross-environment service name resolution), further strengthening the mesh's ability to manage non-container environments.

1.10: Kubernetes discovery selectors, stable revision tags, sidecar networking changes, and more. Discovery selectors limit the set of Kubernetes resources that Istiod watches and processes; combined with the Sidecar CRD/API, they further reduce the amount of configuration Istiod pushes to Envoy.

1.11: CNI plugin (Beta), external control plane (Beta), gateway injection, updates to revision and tag deployment, and Kubernetes multi-cluster services support (Experimental). The CNI plugin replaces the istio-init container in Kubernetes environments, so no elevated Kubernetes permissions are required; the external control plane lets users run the mesh control plane in a separate management cluster; and the revision and tag deployment updates let users deploy and upgrade Istio itself in a canary (grayscale) fashion, reducing the operational risk of upgrading Istio.

1.12: WebAssembly API, Telemetry API, and Kubernetes Gateway API. In particular, the new WasmPlugin API improves the experience of extending Istio with WebAssembly plugins.
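Discovery selectors reuse ordinary Kubernetes label-selector semantics: a namespace is in scope if it matches any configured selector, and within one selector every `matchLabels` entry and every `matchExpressions` requirement must hold. The minimal sketch below models only that matching step so the effect of a selector list is easy to reason about; the class names, namespace names, and labels are hypothetical, and this is not Istiod's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One matchExpressions entry: key, operator, and values."""
    key: str
    operator: str  # In, NotIn, Exists, DoesNotExist
    values: tuple[str, ...] = ()

    def matches(self, labels: dict[str, str]) -> bool:
        if self.operator == "In":
            return labels.get(self.key) in self.values
        if self.operator == "NotIn":
            # Kubernetes semantics: NotIn also matches when the key is absent.
            return labels.get(self.key) not in self.values
        if self.operator == "Exists":
            return self.key in labels
        if self.operator == "DoesNotExist":
            return self.key not in labels
        raise ValueError(f"unknown operator {self.operator!r}")


@dataclass
class LabelSelector:
    """All matchLabels and matchExpressions terms are ANDed together."""
    match_labels: dict[str, str] = field(default_factory=dict)
    match_expressions: list[Requirement] = field(default_factory=list)

    def matches(self, labels: dict[str, str]) -> bool:
        return all(labels.get(k) == v for k, v in self.match_labels.items()) and all(
            r.matches(labels) for r in self.match_expressions
        )


def discovered_namespaces(namespaces: dict[str, dict[str, str]],
                          selectors: list[LabelSelector]) -> list[str]:
    """Namespaces in scope: all of them if no selectors are set, else those matching ANY selector."""
    if not selectors:
        return sorted(namespaces)
    return sorted(name for name, labels in namespaces.items()
                  if any(s.matches(labels) for s in selectors))
```

For example, given namespaces `payments` (`env=prod, mesh=on`), `sandbox` (`env=dev, mesh=on`), and `legacy-batch` (`env=prod`), a single selector requiring `mesh=on` and `env NotIn (dev)` keeps only `payments`; an empty selector list keeps every namespace.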

Looking at Istio's four 2021 releases, a few patterns are easy to see:

No major architectural changes or headline capabilities: enterprises face no special barrier in choosing an Istio version.

Easier access: support for virtual machines, the CNI plugin, WebAssembly, and more provides native capabilities for more complex deployment environments, more restrictive container environments, and extensions in more languages.

Better operations: stable revision tags, the external control plane, and similar features improve native support for operating Istio itself and for multi-cluster management and control.

API standardization: the WebAssembly API, Kubernetes Gateway API, Kubernetes Service APIs support, and so on. Whether standardizing its own APIs or supporting Kubernetes standard APIs, the Istio community is investing steadily in API standardization.
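Stable revision tags are what make canary upgrades of Istio itself practical: namespaces point at a stable tag, and operators later move that tag from the old control-plane revision to the new one instead of relabeling every namespace. The sketch below models only that lookup logic under simplified assumptions (a single `istio.io/rev` namespace label; tag and revision names such as `prod` and `1-11-0` are hypothetical); it is not istioctl or Istiod code.

```python
from typing import Optional

REV_LABEL = "istio.io/rev"


def resolve_revision(ns_labels: dict[str, str], tags: dict[str, str],
                     installed: set[str]) -> Optional[str]:
    """Return the control-plane revision that would inject sidecars into a namespace.

    The namespace label may name a tag (e.g. "prod") or a concrete revision
    (e.g. "1-11-0"); tags are resolved first. Returns None if the namespace is
    unlabeled or points at a revision that is not installed.
    """
    ref = ns_labels.get(REV_LABEL)
    if ref is None:
        return None
    revision = tags.get(ref, ref)
    return revision if revision in installed else None


def retarget_tag(tags: dict[str, str], tag: str, revision: str,
                 installed: set[str]) -> dict[str, str]:
    """Point a tag at another installed revision, returning a new mapping."""
    if revision not in installed:
        raise ValueError(f"revision {revision!r} is not installed")
    updated = dict(tags)
    updated[tag] = revision
    return updated
```

With both revisions installed, promoting the canary is a single tag change: namespaces labeled `istio.io/rev=prod` resolve to the new revision immediately, while existing pods keep their current sidecar until they are restarted.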

Practical Adoption: Expanding Across Industries

Service mesh technology originated at large Internet companies (Google, IBM, Twitter/Buoyant), and most early adopters were Internet companies. Drawing on deep expertise and sustained technical investment, they have over the past few years moved from initial exploration to large-scale production use. Small and medium-sized Internet companies followed the big players, rode the cloud-native wave, and got their "first taste" of service mesh. In 2021, companies in many more industries began trying to adopt it.

Enterprise Demands

Take the financial industry, known for its scale, high stability requirements, and strict security. In 2021, the infrastructure teams of many large state-owned banks, leading joint-stock banks, and leading securities firms in China began introducing service mesh technology for technical research, platform building, and business pilots. Drawing on the many leading financial companies we served in 2021 and on other public technical materials, we summarize their typical demands of service mesh below.

Zero Threshold

In our review of microservices in 2020, we argued that "smooth adoption support" is one of the two key elements for enterprises adopting a service mesh. This is especially true in finance, where a zero threshold for adoption is a core demand.

We summarize three elements a service mesh must support for enterprise adoption: communication protocols, registries, and deployment environments.

Communication protocols: the protocols the mesh can handle, such as HTTP, gRPC, and Dubbo, as well as industry-specific private RPC protocols;

Registries: the service registries the mesh can manage, including the common Eureka, Consul, Nacos, ZooKeeper, and Kubernetes (etcd);

Deployment environments: the business deployment environments the mesh supports. Beyond cloud-native Kubernetes + Docker, the virtual machines and physical machines that host legacy systems must be treated as equals.

Only a mesh that satisfies all three elements reaches the "passing line" for business adoption.
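To make the registry element concrete, here is a minimal sketch of how a mesh control plane might normalize instances from heterogeneous registries into one service catalog. The record shapes are simplified, hypothetical stand-ins (not the real Eureka or Kubernetes API payloads), and the adapter and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ServiceInstance:
    """Registry-neutral view of one service instance."""
    service: str
    host: str
    port: int
    healthy: bool
    source: str


# Each adapter converts one simplified, hypothetical registry record shape.
def from_eureka_like(rec: dict) -> ServiceInstance:
    # Eureka typically reports application names in upper case; normalize them.
    return ServiceInstance(rec["app"].lower(), rec["ipAddr"], int(rec["port"]),
                           rec["status"] == "UP", "eureka")


def from_k8s_like(rec: dict) -> ServiceInstance:
    return ServiceInstance(rec["service"], rec["ip"], int(rec["port"]),
                           bool(rec["ready"]), "kubernetes")


def from_static_vm(rec: dict) -> ServiceInstance:
    # Legacy VM with no registry: the address is configured by hand as "host:port".
    # Assumed healthy here; a real system would health-check it.
    host, port = rec["address"].rsplit(":", 1)
    return ServiceInstance(rec["name"], host, int(port), True, "static")


Adapter = Callable[[dict], ServiceInstance]


def build_catalog(sources: Iterable[tuple[Adapter, list[dict]]]) -> dict[str, list[ServiceInstance]]:
    """Merge instances from every source into one catalog keyed by service name."""
    catalog: dict[str, list[ServiceInstance]] = {}
    for adapter, records in sources:
        for rec in records:
            inst = adapter(rec)
            catalog.setdefault(inst.service, []).append(inst)
    return catalog


def healthy_endpoints(catalog: dict[str, list[ServiceInstance]], service: str) -> list[tuple[str, int]]:
    """Sorted (host, port) pairs of healthy instances, regardless of which registry they came from."""
    return sorted((i.host, i.port) for i in catalog.get(service, []) if i.healthy)
```

The design point is that routing and governance only ever see `ServiceInstance`, so adding another registry means writing one more adapter rather than touching the data plane.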

Beyond these, we found that the financial industry has additional "blockers":

Strict environment control: deployment platforms (containers, VMs, physical machines) and foundational platforms (microservices, middleware) are owned by different teams. Because of the division of responsibilities, financial compliance requirements, and similar factors, mesh adoption is subject to extra constraints from the network environment, management permissions, financial regulations, and more;

Complex existing systems: most leading financial companies already have a fairly complete distributed system, but they also run many complex, heterogeneous outsourced and legacy systems. Because of multiple languages and protocols, code that cannot be modified, the absence of a registration and discovery mechanism, and other factors, many of these systems cannot be brought into the existing system and become "islands" within the enterprise's distributed architecture.

Architecture Scenario Matching

Unlike traditional microservice frameworks, which focus on covering business scenarios with service governance capabilities, a service mesh focuses on enterprise architecture scenarios. Besides providing management and governance for microservices on the cloud-native stack, it must also cover architecture scenarios such as unified governance of heterogeneous applications and migration of legacy systems, so as to truly solve the overall problems enterprises face after moving to microservices.

We summarize the typical architecture-scenario demands of financial enterprises as follows:

Multi-cluster, multi-data-center business management: regular service discovery, invocation, and governance, plus cross-region disaster recovery and so on;

Long-term, smooth, and stable migration from existing monolithic and microservice architectures to a cloud-native service mesh architecture: evolving gradually, in a grayscale (canary) fashion and transparently to the business, while keeping services interoperable, governable, and observable with high SLAs throughout the migration.
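A grayscale migration usually means shifting a small, growing share of traffic from the existing stack to the mesh. As a minimal sketch (not Istio's mechanism, whose weighted routes split traffic per request), the function below hashes a stable request key such as a user ID so each user consistently lands on one side; the backend names `legacy` and `mesh` are hypothetical.

```python
import hashlib


def pick_backend(request_key: str, weights: dict[str, int]) -> str:
    """Map a stable request key (e.g. a user ID) to a backend in proportion to its weight.

    The same key always lands on the same backend for a given weight table,
    so a user is not bounced between the legacy stack and the mesh mid-session.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk the backends in a fixed (sorted) order, subtracting each weight.
    for backend, weight in sorted(weights.items()):
        if bucket < weight:
            return backend
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")
```

Because the bucket depends only on the key and the total weight, moving weight from `legacy` to `mesh` while keeping the total at 100 only moves users in one direction, which keeps each migration step easy to observe and roll back.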

Core Value

Once they have a basic understanding of service mesh, enterprise users often ask the soul-searching questions: why should we adopt a service mesh? What value does it bring?

In general, the "standard answer" about the core value of a service mesh is:

Business teams need not be aware of microservice components: microservice support, network communication, governance, and related capabilities sink into the infrastructure layer, so business departments no longer need dedicated development and maintenance effort, which effectively lowers R&D and operations costs under a microservice architecture;

Support for multiple languages and frameworks: a service mesh inherently imposes no restrictions on development languages or frameworks and provides multi-language service governance;

Zero-cost framework upgrades: hot upgrades of the framework reduce the cost of upgrading middleware, framework clients, and SDKs;

Unified management and evolution of the microservice system: existing microservice clusters, legacy systems, and outsourced systems are managed and evolved as one system.

Different teams within an enterprise weigh that value differently:

Infrastructure/platform R&D teams care more about the architecture scenarios the mesh covers:

Independence from languages and frameworks, so that all kinds of business applications can be brought under management;

Zero-cost framework upgrades, with no business restarts and nothing for the business to notice;

Unified management and evolution of the microservice system, covering existing microservice clusters, legacy systems, outsourced systems, and so on.

Business R&D teams care more about the business scenarios the mesh covers:

One-click access to a full set of microservice governance and monitoring capabilities, such as circuit breaking, rate limiting, degradation, fault tolerance, fault injection, metrics monitoring, and distributed tracing;

Legacy and outsourced systems can be brought under unified governance, gain the same governance and monitoring capabilities, and interoperate with other business microservices;

With no need to be aware of microservice components, business developers no longer have to learn, research, and maintain microservice technologies and frameworks.
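Circuit breaking is a good example of a capability the mesh adds without application changes. The sketch below illustrates the idea behind consecutive-error ejection (similar in spirit to Envoy's outlier detection, but not its implementation): after a run of failures, an endpoint stops receiving traffic for a fixed period. The class name and default values are illustrative assumptions, and the injectable clock exists only to make the behavior testable.

```python
import time
from typing import Callable, Optional


class ConsecutiveErrorBreaker:
    """Stop sending traffic to an endpoint after N consecutive failures, for a fixed ejection time."""

    def __init__(self, threshold: int = 5, ejection_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.ejection_seconds = ejection_seconds
        self.clock = clock
        self.consecutive_failures = 0
        self.ejected_until: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a request may be sent to this endpoint now."""
        if self.ejected_until is None:
            return True
        if self.clock() >= self.ejected_until:
            # Ejection period over: readmit the endpoint with a clean slate.
            self.ejected_until = None
            self.consecutive_failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        """Record the outcome of a request; any success resets the failure streak."""
        if success:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and self.ejected_until is None:
            self.ejected_until = self.clock() + self.ejection_seconds
```

In a mesh, the sidecar keeps one such state per upstream endpoint and consults `allow()` during load balancing, so the business code never sees the breaker at all.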

Challenges

Although Istio releases have stabilized and many Internet companies have successfully adopted service mesh, enterprises in other industries still face challenges in adopting it.

Technical Challenges: Zero-Threshold Access Is Not Easy

Technically, achieving a "zero threshold" faces three challenges:

Communication protocol extension: protocols are the first of the three elements of enterprise mesh adoption. Implementing the full set of proxying, parsing, governance, and observability for a protocol is a huge undertaking, especially for private RPC protocols (particularly common in finance) whose designs are far removed from general-purpose protocols such as HTTP and gRPC. This calls for a well-designed and complete extension mechanism.

Custom plugin extension: most developers cannot write Envoy extensions directly in C++, Envoy's native Lua extension is limited, and the performance of WASM (WebAssembly), on which the community pins high hopes, is still far from production-ready. A genuinely usable, production-ready mechanism for custom Envoy plugins is needed.

VM/physical machine management: although the Istio community keeps improving the mesh experience for VM and physical machine environments, and public cloud vendors offer corresponding "cut-down" capabilities, businesses deployed outside containers remain "second-class citizens" that struggle to get the same mesh capabilities as containerized workloads. More complete and compatible solutions for sidecar management, traffic interception, and related concerns in non-container environments are needed.
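What a protocol extension mechanism has to provide can be made concrete with a toy private protocol. The frame layout below is entirely hypothetical (2-byte magic, 1-byte version, 2-byte header length, 4-byte body length, then `key=value` header lines and an opaque body); the sketch shows the core job of a codec: cut complete frames out of a byte stream and expose just enough metadata (service, method) for routing and governance, while passing the body through untouched.

```python
import struct
from typing import Optional

MAGIC = 0xBEEF                 # hypothetical protocol marker
HEADER_FMT = ">HBHI"           # magic, version, header length, body length (big-endian)
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 9 bytes


def encode_frame(metadata: dict[str, str], body: bytes, version: int = 1) -> bytes:
    """Serialize metadata as newline-separated key=value pairs, followed by the opaque body."""
    header = "\n".join(f"{k}={v}" for k, v in metadata.items()).encode("utf-8")
    return struct.pack(HEADER_FMT, MAGIC, version, len(header), len(body)) + header + body


def decode_frame(buf: bytes) -> Optional[tuple[dict[str, str], bytes, int]]:
    """Parse one frame from the start of buf.

    Returns (metadata, body, bytes_consumed), or None if buf does not yet hold a
    complete frame (the proxy should wait for more bytes from the connection).
    """
    if len(buf) < HEADER_SIZE:
        return None
    magic, _version, header_len, body_len = struct.unpack_from(HEADER_FMT, buf)
    if magic != MAGIC:
        raise ValueError("not a frame of this protocol")
    total = HEADER_SIZE + header_len + body_len
    if len(buf) < total:
        return None
    raw_header = buf[HEADER_SIZE:HEADER_SIZE + header_len].decode("utf-8")
    metadata = dict(line.split("=", 1) for line in raw_header.split("\n") if line)
    return metadata, buf[HEADER_SIZE + header_len:total], total


def route_key(metadata: dict[str, str]) -> tuple[str, str]:
    """The only fields the mesh needs to route and govern a request."""
    return metadata["service"], metadata.get("method", "")
```

Once a codec like this exists, routing, retries, metrics, and tracing can all be driven from `route_key` and the header map, which is why a reusable codec interface is the natural extension point for private protocols.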

Scenario Challenges: Covering Complex Scenarios Is Not Easy

Financial enterprises often deploy and operate their businesses under a variety of environmental and regulatory constraints. On top of the complexity of the business systems themselves, existing, legacy, and outsourced systems all coexist. As a result, the financial industry naturally poses scenario-coverage challenges when adopting a service mesh:

Businesses deployed across multiple clusters and data centers: interconnection, unified governance, failure disaster recovery, and other high-availability guarantees for cross-cluster and cross-data-center calls all require the mesh to adapt;

Smooth evolution of the business architecture from existing monolithic and microservice architectures to a cloud-native mesh architecture: long-term coexistence, service discovery, mutual access, governance, and observability along the way all require adaptability and high SLA guarantees to truly enable architecture migration.
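Cross-data-center disaster recovery often comes down to locality-aware failover: prefer healthy endpoints in the caller's own data center, then elsewhere in the same region, and only then in other regions. The sketch below is loosely modeled on that idea (Istio's locality load balancing works along similar region/zone lines, but this is not its implementation); the addresses, region names, and zone names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    address: str
    region: str
    zone: str  # e.g. one data center within the region
    healthy: bool


def failover_order(endpoints: list[Endpoint], region: str, zone: str) -> list[list[str]]:
    """Group healthy endpoints into priority tiers: same zone, same region, then other regions."""
    tiers: list[list[str]] = [[], [], []]
    for ep in endpoints:
        if not ep.healthy:
            continue  # unhealthy endpoints never receive traffic
        if ep.region == region and ep.zone == zone:
            tiers[0].append(ep.address)
        elif ep.region == region:
            tiers[1].append(ep.address)
        else:
            tiers[2].append(ep.address)
    # Drop empty tiers so the first remaining tier is the one that serves traffic.
    return [sorted(t) for t in tiers if t]


def active_tier(endpoints: list[Endpoint], region: str, zone: str) -> list[str]:
    """Endpoints that should receive traffic right now (empty if nothing is healthy)."""
    order = failover_order(endpoints, region, zone)
    return order[0] if order else []
```

When every endpoint in the local data center fails its health checks, `active_tier` falls through to the next data center in the same region, and only then to another region, which keeps cross-region latency as a last resort.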

Industry Standards: Setting Sail

As service mesh technology stabilized in both community progress and practical adoption, corresponding industry standards and standard platforms were established and set sail.

CAICT Standard

In July 2021, at the 2021 Trusted Cloud Conference hosted by the China Academy of Information and Communications Technology (CAICT), the "Service Mesh Technical Capability Requirements" standard was officially released. Alibaba, NetEase, ByteDance, and Flomesh passed the first batch of evaluations and earned the highest service mesh assessment level. Interestingly, these four companies are typical representatives of large cloud computing companies, established Internet companies, newer Internet companies, and technology startups respectively, which reflects how broadly the industry is paying attention to service mesh standards and adoption.

Standard Platforms

In 2021, cloud vendors gradually refined and matured their standard service mesh platforms. Enterprises can choose a standard platform as needed or build a service mesh jointly with a vendor.

Vendors offer slightly different types of standard platforms:

Native Istio resources + public cloud infrastructure + ecosystem integration: focuses on compatibility with native Istio and integration with the public cloud's existing ecosystem;

Platformized native Istio + private deployment + third-party integration: extends and enhances Istio and hides its complexity, focusing on unified control and governance of the microservice system and on adaptation, compatibility, and integration with enterprise private environments;

Partially or fully self-developed service mesh: not bound to open source communities such as Istio, and specifically targets the weaknesses of open source meshes.

Each type of platform has its own applicable scenarios, strengths, and weaknesses; enterprises can choose based on their own circumstances.

Technology Ecosystem: A Hundred Schools of Thought Contend

The service mesh entered a period of stability in 2021, and its technology ecosystem flourished over the same year.

Open Source Projects

In 2021, many excellent Istio-related projects were open sourced, enhancing Istio's usability, extensibility, and operability:

Slime: an intelligent service mesh manager based on Istio that adds a non-intrusive management plane to Istio. Open sourced by NetEase in January 2021.

GetMesh: an Istio integration and command-line management tool for managing multiple Istio versions. Open sourced by Tetrate in February 2021.

Aeraki: manages any Layer-7 traffic in Istio, providing multi-protocol extension support for the service mesh. Open sourced by Tencent in March 2021.

Layotto: a cloud-native application runtime that serves as a data plane for Istio. Open sourced by Ant Group in June 2021.

Hango Gateway: an API gateway built on Envoy and Istio that is natively compatible with Istio and provides high performance and rich proxy capabilities. Open sourced by NetEase in August 2021.

The many open source projects emerging in the service mesh ecosystem attest to the vitality of the field.

Multi-Runtime

Much as a service mesh sinks microservice governance into the infrastructure layer (the sidecar), the Multi-Runtime concept proposed by Bilgin Ibryam in 2020 summarizes and generalizes the various forms of the sidecar pattern. Its characteristics can be summarized as follows:

Capabilities: provides broader distributed capabilities than a service mesh, such as middleware proxying and pub/sub messaging;

Deployment: can be deployed 1:1 with the business (per pod) or 1:N (per node), in a variety of environments as needed, with composable components;

Interaction: communicates with the application through standard APIs. It does not insist on being non-intrusive, and applications may embed SDKs that wrap those standard APIs.

A typical multi-runtime open source framework is Dapr (Distributed Application Runtime), open sourced by Microsoft. It reached its landmark 1.0 release in 2021, was accepted into the CNCF as an incubating project, and is still developing rapidly.
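Dapr illustrates the "standard API" interaction style: the application calls its local Dapr sidecar over HTTP (or gRPC) instead of linking middleware clients. The sketch below only builds requests for three Dapr HTTP endpoints (service invocation, pub/sub publish, and state retrieval) on the default sidecar port 3500, without sending them; the app ID, component names, and keys used with it are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def _seg(value: str, safe: str = "") -> str:
    """Percent-encode one URL path segment."""
    return urllib.parse.quote(value, safe=safe)


def _base(port: int) -> str:
    # The sidecar listens on localhost; Dapr exposes the port to apps as DAPR_HTTP_PORT.
    return f"http://localhost:{port}/v1.0"


def invoke_request(app_id: str, method: str, payload: dict, port: int = 3500) -> urllib.request.Request:
    """Service invocation: POST /v1.0/invoke/<app-id>/method/<method>."""
    url = f"{_base(port)}/invoke/{_seg(app_id)}/method/{_seg(method, safe='/')}"
    return urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                  headers={"Content-Type": "application/json"}, method="POST")


def publish_request(pubsub: str, topic: str, event: dict, port: int = 3500) -> urllib.request.Request:
    """Pub/sub: POST /v1.0/publish/<pubsub-name>/<topic>."""
    url = f"{_base(port)}/publish/{_seg(pubsub)}/{_seg(topic)}"
    return urllib.request.Request(url, data=json.dumps(event).encode("utf-8"),
                                  headers={"Content-Type": "application/json"}, method="POST")


def get_state_request(store: str, key: str, port: int = 3500) -> urllib.request.Request:
    """State management: GET /v1.0/state/<store-name>/<key>."""
    return urllib.request.Request(f"{_base(port)}/state/{_seg(store)}/{_seg(key)}", method="GET")
```

With a Dapr sidecar running, `urllib.request.urlopen(invoke_request("order-service", "orders/create", {"id": 1}))` would send the call; the application only knows the sidecar's API, not which registry, broker, or state store sits behind it.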

From an adoption perspective, multi-runtime showed good potential and momentum in 2021:

The concept is forward-looking and may represent the future direction of distributed architecture;

Large companies lead: the community is growing quickly, and many large companies have joined the exploration;

Overall maturity is still low, with gaps in point-to-point service communication governance, capability completeness, and API stability;

It can integrate with existing technologies such as service mesh to make up for its shortcomings.

eBPF

eBPF makes it possible to run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. Developers can use it to improve system observability and to optimize networking and security from within the kernel. In the service mesh field, eBPF can accelerate the sidecar's network path and can observe deeper, low-level information such as kernel message queues, task queues, network packets, and connections.

In 2021, Cilium (an eBPF-based open source project) proposed using eBPF instead of sidecars to implement a kernel-level service mesh data plane. The goal was to eliminate the resource consumption and latency overhead of standalone sidecars and to truly sink traffic governance and observability into the infrastructure layer. However, Cilium's bold idea soon drew a "counterattack" from the "traditional" service mesh camp, which cited eBPF's many limitations for service proxying, operational complexity, the difficulty of protocol processing, kernel version dependencies, and more.

Either way, integrating eBPF into the service mesh ecosystem has become a new trend. Even if it cannot truly replace the sidecar, eBPF can serve as a powerful complement to it, with the two working hand in hand on the traffic path.

Proxyless

From its birth, the service mesh used a standalone sidecar proxy to proxy, govern, and observe traffic. Mesh implementations organized data plane capabilities around a standalone proxy by default and drew a clear line between themselves and traditional in-process microservice frameworks; after weighing the pros and cons, the proxy mode seemed to be the standard mode of the mesh data plane. In 2021, the "dimensional wall" between in-process frameworks and standalone sidecar proxies was broken, and Proxyless came up more and more often.

WHY Proxyless (essentially, to address the drawbacks of the standalone sidecar proxy mode):

Performance: a standalone proxy adds deployment resource overhead and latency;

Traffic interception: interception for a standalone proxy mostly relies on iptables or similar techniques, which require elevated privileges, involve complex logic, and are hard to troubleshoot;

Governance granularity: a standalone proxy runs outside the application process with no view of its internal state, so it cannot govern or observe code and methods inside the process.

WHAT Proxyless (complementary capabilities for various distributed scenarios):

Service mesh optimization: fine-grained governance, monitoring, and traffic interception inside the application;

Multi-runtime: standard in-application SDKs give the business an interface to infrastructure resources;

Capabilities sink further: traffic processing, governance, and observability are implemented in the operating system kernel.

HOW Proxyless (several common implementations):

Framework/SDK: the classic approach, a return to the past;

Non-intrusive agent: enhances business code without modifying it. For how it works, see the section "Service Framework: Non-Intrusive Agent Service Governance" in our earlier article "From Service Framework to Service Mesh: NetEase Qingzhou Dual-Engine Multi-Mode Service Governance Evolution Practice";

Native RPC support: newer versions of gRPC provide governance features directly and support the standard xDS protocol for connecting straight to the control plane;

eBPF: processes, governs, and observes traffic in the Linux kernel.
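For the native-RPC flavor, a proxyless gRPC client learns where the control plane is from an xDS bootstrap file (gRPC reads its path from the `GRPC_XDS_BOOTSTRAP` environment variable) and then dials an `xds:///` target. The sketch below builds a minimal bootstrap document; the Istiod address and port, the Istio-style node ID format, and the pod and namespace names are assumptions for illustration, and in Istio's proxyless mode this file is normally generated by the agent rather than by hand.

```python
import json


def make_grpc_xds_bootstrap(pod_ip: str, pod_name: str, namespace: str,
                            xds_server: str = "istiod.istio-system.svc:15010") -> dict:
    """Build a minimal gRPC xDS bootstrap document for a proxyless workload (assumed values)."""
    return {
        "xds_servers": [{
            "server_uri": xds_server,
            # Plaintext only to keep the sketch short; a real deployment would use mTLS.
            "channel_creds": [{"type": "insecure"}],
            "server_features": ["xds_v3"],
        }],
        "node": {
            # Assumed Istio-style node ID: type~IP~pod.namespace~namespace DNS suffix.
            "id": f"sidecar~{pod_ip}~{pod_name}.{namespace}~{namespace}.svc.cluster.local",
        },
    }


def xds_target(service: str, namespace: str, port: int) -> str:
    """gRPC target URI that asks the client's xds resolver to resolve the service."""
    return f"xds:///{service}.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    # Print the document that would be written to the GRPC_XDS_BOOTSTRAP path.
    print(json.dumps(make_grpc_xds_bootstrap("10.0.0.7", "echo-abc", "default"), indent=2))
```

A gRPC client built with xDS support would then open a channel to a target such as `xds_target("echo", "default", 7070)`, receiving routing and load-balancing configuration from the control plane over xDS with no sidecar on the request path.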

From an architecture-evolution standpoint, Proxyless looks like a step "against the current". Pragmatically, however, the capabilities Proxyless adds alongside the proxy may better help enterprises migrate gradually from traditional architectures to cloud-native ones.

Future Outlook

This concludes our review of service mesh in 2021. We are confident in its future, and we close with our vision for where it is heading:

Zero Threshold

As service mesh technology matures and adoption experience accumulates across more and more industries, the technical and scenario challenges will eventually be overcome, and the threshold for adopting a service mesh will approach zero.

Standardization

The technical capabilities and scenario coverage of service mesh can be highly abstracted and generalized, so service mesh platforms and products will become highly standardized, making it easier for enterprises to choose among them.

Full Unification

Service mesh technologies represented by Envoy and Istio will help unify related software fields. For example, more L7 traffic proxies will be built around Envoy, with the data plane and control plane interacting over the xDS protocol. The global, unified governance of distributed systems that enterprise architects seek will no longer be a pipe dream.

Ecosystem Integration: Proxyless + Proxy + eBPF + Multi-Runtime

The different parts of the service mesh ecosystem will not be antagonistic; they will pragmatically join forces for mutual benefit. On the traffic path, Proxyless, proxy, and eBPF will complement one another's capabilities, and multi-runtime can make up for its shortcomings by integrating the mature capabilities of service mesh, accelerating its own development.

References (special thanks to the many practitioners and contributors in the service mesh field):

From Service Framework to Service Mesh: NetEase Qingzhou Dual-Engine Multi-Mode Service Governance Evolution Practice: https://www.infoq.cn/article/KNp1ibj40vS8IIZCizMW

Interpretation of Microservices in 2020: The Framework on the Left, the Mesh on the Right, Where Does the Cloud-Native Era Go: https://www.infoq.cn/article/4Zog2lMBqKjAeMTc8Add

The Traffic Entry Point in the Cloud-Native Era: Envoy Gateway: https://www.infoq.cn/article/SF5sl4IlUtUxuED3Musl

Istio 1.9 Release: Focus on Improving Istio's Day 2 Operations: https://mp.weixin.qq.com/s/E7iwBF6hhPm5aTukTlTCMg

Istio 1.10 Release and Official Website Redesign: https://mp.weixin.qq.com/s/Lq6zF90FR-ohT9ON-88Z_Q

Istio 1.11 Release: https://mp.weixin.qq.com/s/QkLUFOCQz2AWt2En-G-VQg

Istio 1.12 Release: https://mp.weixin.qq.com/s/Q52IQrXxxHEn2c8rkAVTgA

Sidecar-Free Service Mesh Based on gRPC and Istio: https://mp.weixin.qq.com/s/aYwo2criOotqNp8lD39QAA

It's 2021: What Is the Community Discussing About Service Mesh?: https://mp.weixin.qq.com/s/ZDDC4YAebbdws8Md9zCrqQ

Dapr v1.0 Outlook: From Service Mesh to Cloud Native: https://skyao.io/talk/202103-dapr-from-servicemesh-to-cloudnative

Farewell to the Sidecar: Unlocking Kernel-Level Service Mesh with eBPF: https://mp.weixin.qq.com/s/W9NySdKnxuQ6S917QQn3PA

Translation: Will Service Meshes Use eBPF? Yes, but the Envoy Proxy Will Live On: https://mp.weixin.qq.com/s/iZYXPec7Lh0fhflA42d8gA

About the author:

Pei Fei is a senior technical expert and senior architect at NetEase Shufan, with more than 10 years of experience in enterprise platform architecture and development. He currently leads the NetEase Shufan Qingzhou microservice team, focusing on the research and adoption of enterprise microservice architecture and cloud-native technology. He led the team in building the Qingzhou service mesh, microservice framework, API gateway, and other projects, both within NetEase Group and as commercial products, and led several cloud-native open source projects, including Slime and Hango.

