Abstract: As cloud native technology and the cloud market continue to mature, multi-cloud and multi-cluster deployment has become the norm, and the future will be an era of programmatic multi-cloud management services.
This article is shared from the HUAWEI CLOUD community post "HUAWEI CLOUD MCP multi-cloud container governance and practice", original author: technical torchbearer.
At Huawei Developer Conference (Cloud) 2021, HUAWEI CLOUD CTO Zhang Yuxin announced that the multi-cloud container orchestration project Karmada was officially open sourced. On April 26, Wang Zefeng, HUAWEI CLOUD's cloud native open source lead, and Xu Zhonghu, senior engineer at HUAWEI CLOUD, delivered a keynote speech titled "HUAWEI CLOUD MCP Multi-cloud and Cross-cloud Container Governance and Practice", sharing the development history of HUAWEI CLOUD MCP and the core technology behind the Karmada project.
The speech covered five topics:
1) Current status and challenges of cloud-native multi-cloud
2) HUAWEI CLOUD MCP history
3) Karmada project
4) Multi-cluster service governance
5) Summary and outlook
Current status and challenges of cloud-native multi-cloud
According to a recent survey, more than 93% of enterprises are using services from multiple cloud vendors at the same time. As cloud native technology and the cloud market continue to mature, multi-cloud and multi-cluster deployment has become the norm, and the future will be an era of programmatic multi-cloud management services. Deploying a business across clouds or clusters typically progresses through several stages:
Typical stage 1: multi-location, multi-cloud deployment with unified management and operation and maintenance to reduce repetitive work
In the first stage, deployments in multiple locations are managed and operated in a unified way, which can be understood as multiple interoperable islands. Interoperability means that the software technology stack used across the different environments and clouds is standardized: when switching between clouds (public cloud 1, public cloud 2, and so on), the commands you issue are the same. However, there is little or no business correlation between these environments, so unified application delivery at this stage, such as deployment and operation and maintenance, is done either by manually running repetitive commands and scripts, or, at its simplest, by reusing a single CI/CD stack. Most workloads at this stage are relatively fixed: which public cloud, which data center, and which equipment room an application is deployed in does not change much, so little dynamism is required.
Typical stage 2: a unified multi-cloud resource pool to cope with fluctuations in business load
The second stage is a unified resource pool, which places some demands on the dynamism of that pool. Application delivery here is no longer simple CI/CD, because once resources can move we also want traffic to migrate with them. Application delivery therefore needs automatic scheduling, and traffic should follow the distribution of instances on its own. Handling traffic with some simple scripting can also be considered as reaching the second stage, but in the ideal state these steps are fully automated.
Typical stage 3: multi-cloud collaboration, a unified application platform, and business deployed across clouds
The third stage is what we see as the final, and currently foreseeable ideal, form of multi-cloud and multi-cluster. Whether you look at clusters, Kubernetes, or the virtual machines before them, the history of cloud computing has been one of constantly breaking through, or redefining, boundaries. In the earliest days, installing a new application or deploying a new service required a physical server, a very inflexible boundary. Virtual machines and then containers made the granularity much smaller, but cross-machine and cross-environment access brought many new challenges, so Kubernetes emerged and redrew the boundary as a large cluster around all these fine-grained containers.
Multi-cloud builds on these constantly evolving boundaries. When a business is constrained by a single data center or a single cloud, multi-cloud technology can break through the boundaries of the cloud and of the cluster, so that applications can be freely deployed and migrated across clusters and clouds.
In practice, however, multi-cloud under the cloud native umbrella still faces many challenges:
- Numerous clusters: tedious and repetitive cluster configuration, differences in cluster management across cloud vendors, and fragmented API access
- Business decentralization: differentiated configuration of applications in each cluster, cross-cloud access to services, and application synchronization between clusters
- Cluster boundary restrictions: resource scheduling is limited by the cluster, application availability is limited by the cluster, and elastic scaling is limited by the cluster
- Vendor lock-in: "stickiness" of business deployment, lack of automatic failover, and lack of a neutral open source multi-cluster orchestration project
HUAWEI CLOUD MCP History
The concept of multi-cluster appeared very early in Kubernetes, and Huawei was one of the earliest initiators: in 2015 we proposed the Federation concept in the community. Federation v1 was developed in 2016 and spun off in 2017 as an independent Kubernetes sub-project. In mid-2017 we began developing Federation v2. On the commercial side, Huawei launched the full platform in mid-2018 and delivered commercial capabilities at the end of that year. Over a long period of serving customers we also found a number of problems, so in 2020 we launched a brand-new engine, Karmada.
Karmada is developed on the basis of Kubernetes Federation v1 and v2. It can run cloud native applications across multiple Kubernetes clusters and clouds without requiring any changes to the applications. By using the Kubernetes native API directly and adding advanced scheduling functions, Karmada enables a truly open multi-cloud Kubernetes.
Karmada project
The picture above shows the multi-cloud, multi-cluster technology stack that we believe the open source community should offer; the gray boxes are the capabilities Karmada intends to cover. From the perspectives of the data plane, storage, and operation and maintenance, we want to solve multi-cloud, multi-cluster container networking, service discovery and even traffic management, and data persistence. All of these fall within the scope the Karmada project will cover in the community.
In the initial stage we focus on a few things. The first is compatibility with the Kubernetes native API. This was a fairly large obstacle for the original Federation v2, because users are accustomed to the Kubernetes API rather than a new API, so in the new Karmada project we use the native API directly to provide multi-cluster application deployment.
For cluster synchronization we support a variety of network modes: the control plane in a public cloud with member clusters in private clouds, the reverse, and even edge scenarios can all be covered by Karmada. We also build in out-of-the-box capabilities so that adaptation comes at the lowest possible cost.
The Karmada architecture has a unified control plane: an independent API server that serves the Kubernetes native API plus the additional policy APIs provided by Karmada, which support the core advanced-scheduling functions. For cluster synchronization there are two modes, a central Controller and an Agent, corresponding to whether the control plane and the member clusters sit in public or private clouds, or the reverse.
The other scenario is the edge: managing an edge cluster across a cloud-edge network. Here we combine KubeEdge's optimizations for that network environment to provide edge-oriented cluster management capabilities.
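To make the two synchronization modes concrete: each member cluster is represented on the Karmada control plane by a Cluster object whose syncMode selects Push (driven by the central controller) or Pull (driven by a karmada-agent inside the member cluster). The following is only a minimal sketch; the cluster name, endpoint, and secret reference are illustrative, and in practice registration is normally done via karmadactl or the agent rather than by writing this object by hand:
apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: member1                          # illustrative member cluster name
spec:
  syncMode: Push                         # Push: central controller; Pull: in-cluster karmada-agent
  apiEndpoint: https://10.0.0.10:6443    # illustrative member API server address
  secretRef:                             # credentials the control plane uses to reach the member
    namespace: karmada-cluster
    name: member1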
Karmada project core value:
- K8s native API compatibility, rich cloud native ecology
- Built-in policies, out of the box
- Rich multi-cluster scheduling support
- Cluster resource space isolation
- Multi-mode cluster synchronization, shielding geographical and network restrictions
Multi-cluster application deployment
1) Zero modification: use the K8s native API to deploy a multi-cluster application
Example policy: configure a multi-AZ HA deployment scheme for all Deployments
Define and deploy the application with the standard K8s API:
kubectl create -f nginx-deployment.yaml
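The content of nginx-deployment.yaml is not shown in the original; it would simply be a standard Deployment with no Karmada-specific fields, for example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.19    # any image; nothing about the manifest changes for Karmada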
2) PropagationPolicy: a reusable multi-cluster scheduling policy for applications (a sample manifest follows the field list below)
resourceSelector
- Supports associating multiple resource types
- Supports object filtering by name or labelSelector
placement
clusterAffinity:
- Defines the target clusters that scheduling prefers
- Supports filtering by names or labelSelector
clusterTolerations:
- Similar to Pod tolerations and node taints in a single cluster
spreadConstraints:
- Defines the HA strategy for application distribution
- Supports dynamic grouping of clusters: grouping by Region, AZ, or feature label to achieve different levels of HA
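Putting the fields above together, a PropagationPolicy for the nginx Deployment might look roughly like the following sketch. It is based on the policy.karmada.io/v1alpha1 API; the member cluster names are placeholders:
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:            # which resources this policy applies to
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:            # preferred target clusters
      clusterNames:
        - member1
        - member2
    spreadConstraints:          # HA: spread replicas across at least two clusters
      - spreadByField: cluster
        minGroups: 2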
3) OverridePolicy: a reusable cross-cluster differentiated configuration policy (a sample manifest follows the field list below)
resourceSelector
- Supports object filtering using name or labelSelector
overriders
- Supports multiple types of override plugins
- plainTextOverrider: a basic plugin that performs plain-text replacement
- imageOverrider: a plugin for differentiated configuration of container images
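As a sketch, an OverridePolicy that swaps the image registry for one cluster could look roughly like this. The overrideRules/targetCluster layout follows recent policy.karmada.io/v1alpha1 versions and may differ in older releases; the registry value is made up:
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-override
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  overrideRules:
    - targetCluster:
        clusterNames:
          - member1
      overriders:
        imageOverrider:              # differentiated container image configuration
          - component: Registry      # Registry | Repository | Tag
            operator: replace
            value: registry.example.com   # hypothetical private registry used only by member1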
Multi-cluster service governance
Problems to be solved in multi-cluster service governance
- Service discovery
- DNS resolution
- Advanced traffic management such as load balancing, circuit breaking, fault injection, and traffic splitting
- Access security across clouds
Advantages of a service mesh
The figure above is a typical Istio service mesh architecture diagram. Istio is completely non-intrusive: it intercepts the traffic sent by the application by automatically injecting a sidecar proxy, and manages the application's traffic through that sidecar.
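For example, automatic sidecar injection is typically enabled per namespace; every Pod created in a labeled namespace then gets an Envoy proxy injected next to the application container:
kubectl label namespace default istio-injection=enabled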
Basic functions of Istio:
1) Traffic management (see the VirtualService sketch after this list)
- Resilience features (circuit breaking, timeout, retry, fault injection, etc.) improve the resilience of the whole system
- Grayscale release: makes it easier to roll out new versions faster
- Load balancing and route matching can largely replace the original microservice governance framework
2) Security: data encryption, authentication, and authentication
3) Observability: observability makes it easier for application operation and maintenance personnel to diagnose system faults. The three typical observability signals are Metrics, Traces (call chains), and Access Logs.
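As an illustration of the traffic management and grayscale release capabilities above, the following VirtualService sketch splits traffic 90/10 between two versions of a service. The host and the v1/v2 subsets are assumptions; the subsets would be defined in a matching DestinationRule:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: nginx
spec:
  hosts:
    - nginx.default.svc.cluster.local
  http:
    - route:
        - destination:
            host: nginx.default.svc.cluster.local
            subset: v1
          weight: 90     # 90% of traffic stays on the stable version
        - destination:
            host: nginx.default.svc.cluster.local
            subset: v2
          weight: 10     # 10% canary traffic to the new version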
How should service mesh technology be chosen in a multi-cluster, multi-cloud scenario?
The technical details are explained below along three dimensions:
- Flat network vs. non-flat network
- Single mesh vs. multiple meshes
- Single control plane vs. multiple control planes
1) Multi-cluster service mesh: flat network
Advantages:
- Low east-west service access latency
Disadvantages:
- Network complexity
- Security: all workloads are in the same network
- Scalability: Pod and Service IP addresses must not conflict across clusters
2) Multi-cluster service mesh: non-flat network
Advantages:
- Network isolation, relatively higher security
- Simple networking
- Scalability: Pod/service network addresses can overlap
Disadvantages:
- Cross-cluster service access must pass through an east-west gateway
- The gateway relies on TLS auto passthrough
3) Non-flat network: single control plane
- Single control plane (can be deployed in user clusters or fully managed)
- Service discovery
- Configuration discovery
- Split Horizon EDS
- East-west gateway
4) Non-flat network: multiple control planes
- The control plane is deployed in each cluster
- Service discovery
- Configuration discovery
- Sidecars connect to the Istio control plane in their own cluster, which gives better performance and availability than a single control plane
5) Non-flat network: east-west gateway
- Gateway address acquisition
- Split horizon EDS:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
    - hosts:
        - '*.local'
      port:
        name: tls
        number: 15443
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
Network filter: "envoy.filters.network.sni_cluster"
Appendix: Karmada community technical exchange addresses
Project address: https://github.com/karmada-io/karmada
Slack address: https://karmada-io.slack.com