Abstract: As cloud native technology and the cloud market continue to mature, multi-cloud and multi-cluster deployment has become the norm, and the future will be an era of programmatic multi-cloud management services.

This article is shared from the HUAWEI CLOUD community post "HUAWEI CLOUD MCP multi-cloud container governance and practice", original author: technical torchbearer.

At the Huawei Developer Conference (Cloud) 2021, Huawei Cloud CTO Zhang Yuxin announced that the multi-cloud container orchestration project Karmada was officially open sourced. On April 26, HUAWEI CLOUD native open source lead Wang Zefeng and HUAWEI CLOUD senior engineer Xu Zhonghu delivered a keynote speech, "Huawei Cloud MCP Multi-cloud and Cross-cloud Container Governance and Practice", sharing the development history of HUAWEI CLOUD MCP and the core technologies of the Karmada project.

The speech covered five aspects:

1) Current status and challenges of cloud-native multi-cloud
2) HUAWEI CLOUD MCP history
3) Karmada project
4) Multi-cluster service governance
5) Summary and outlook

Current status and challenges of cloud-native multi-cloud


According to the latest survey report, more than 93% of enterprises are using services from multiple cloud vendors at the same time. As cloud native technology and the cloud market continue to mature, multi-cloud and multi-cluster deployment has become the norm, and the future will be an era of programmatic multi-cloud management services. When a business is deployed across multiple clouds or clusters, it typically goes through several stages:

Typical stage 1: Multi-location, multi-cloud deployment with unified management and operations to reduce repetitive work


In the first stage, multi-location deployments are managed and operated in a unified way, which can be understood as multiple interoperable islands. Interoperability means that the software technology stack used across different environments and clouds is standardized: whether you switch between different clouds, such as public cloud 1 and public cloud 2, the operation commands you issue are the same. However, there is little or no business correlation between these environments, so unified application delivery at this stage, such as deployment and operations, is done either by manually executing repetitive commands and scripts, or, at its simplest, by using a single CI/CD system stack. Because most businesses at this stage are relatively static, which public cloud, data center, or equipment room an application is deployed in does not require much dynamism or variability.

Typical stage 2: A unified multi-cloud resource pool to handle business pressure fluctuations


The second stage is the unified resource pool, which places certain demands on the dynamism of the resource pool. Application delivery here is no longer simple CI/CD, because once instances can be placed dynamically we also want traffic to migrate with them. Application delivery therefore needs automatic scheduling, and traffic should follow the distribution of instances automatically. Of course, there are other situations, such as using simple scripts to steer traffic, which can also be considered as reaching the second stage; ideally, however, all of this should be fully automated.

Typical stage 3: Multi-cloud collaboration, a unified application platform, and business deployment across clouds


The third stage is what we consider the final, currently foreseeable form of multi-cloud and multi-cluster, and also the ideal form. Whether you look at clusters, Kubernetes, or the earlier virtual machines, the entire history of cloud computing has been one of constantly breaking through, or redefining, boundaries. In the earliest days, installing a new application or deploying a new service required a physical server, and that boundary was very inflexible. Later, virtual machines and containers made the granularity smaller, but cross-machine and cross-environment access brought many new challenges. Kubernetes then appeared and, after producing so many fine-grained containers, redrew the boundary as a large cluster.

Multi-cloud builds on these constantly evolving boundaries. When development is constrained by a single data center or cloud, multi-cloud technology can break through the boundaries of the cloud and of the cluster, so that business applications can be freely deployed and migrated between clusters and clouds.

In practice, however, cloud-native multi-cloud still faces many challenges:

  • Numerous clusters: tedious and repetitive cluster configuration, cluster management differences of cloud vendors, fragmented API access
  • Business decentralization: differentiated configuration of applications in each cluster, cross-cloud access to services, and application synchronization between clusters
  • Cluster boundary restrictions: resource scheduling is limited by the cluster, application availability is limited by the cluster, and elastic scaling is limited by the cluster
  • Vendor lock-in: "stickiness" of business deployment, lack of automatic failover, lack of a neutral open source multi-cluster orchestration project

HUAWEI CLOUD MCP History

The concept of multi-cluster appeared in Kubernetes very early, and Huawei was one of the earliest initiators. In 2015 we proposed the Federation concept in the community; Federation v1 was developed in 2016 and spun out in 2017 as an independent Kubernetes sub-project. In mid-2017 we started developing Federation v2. On the commercial side, Huawei launched the commercial platform in mid-2018 and provided commercial capabilities at the end of that year. However, after serving customers for a long time we found a number of problems, so in 2020 we launched a brand-new engine, Karmada.

Karmada is developed on the basis of Kubernetes Federation v1 and v2. It can run cloud-native applications across multiple Kubernetes clusters and clouds without requiring changes to the applications. By using the Kubernetes native API directly and providing advanced scheduling capabilities, Karmada enables truly open multi-cloud Kubernetes.

Karmada project


The picture above shows the multi-cloud, multi-cluster technology stack that we believe the open source community should present, and the gray boxes are the capabilities Karmada aims to cover. Across the data plane, storage, and operations dimensions, we intend to address multi-cloud, multi-cluster container networking, multi-cloud, multi-cluster service discovery and even traffic management, and data persistence. All of these fall within the scope that the Karmada project will cover in the community.

In the initial stage we mainly focus on a few aspects. The first is compatibility with the K8s native API. This was actually one of the bigger obstacles for the original Federation v2, because everyone is used to the K8s API rather than a new API, so in the new Karmada project we directly use the native API to provide multi-cluster application deployment capabilities.

In terms of cluster synchronization, we support a variety of network modes, including the control plane in a public cloud with member clusters in private clouds, or the reverse, and even edge scenarios. The Karmada project covers all of these, and we build in out-of-the-box capabilities to achieve adaptation at the lowest cost.

The Karmada architecture has a unified control plane: an independent API server that serves the Kubernetes native API plus the additional policy APIs provided by Karmada, which support the core advanced scheduling functions. For cluster synchronization there are two modes, a central Controller and an Agent, corresponding to whether the control plane and member clusters sit in public or private clouds, or the reverse.

The other case is the large-scale edge scenario, where an edge cluster must be managed across a cloud-edge network environment. Here we combine KubeEdge's optimizations for such network environments to provide edge-oriented cluster management capabilities.

Karmada project core value:

  • K8s native API compatibility, rich cloud native ecology
  • Built-in policies, out of the box
  • Rich multi-cluster scheduling support
  • Cluster resource space isolation
  • Multi-mode cluster synchronization, shielding geographical and network restrictions

Multi-cluster application deployment

1) Zero modification: use the K8s native API to deploy a multi-cluster application

Example strategy: Configure a multi-AZ HA deployment scheme for all deployments

Use the standard K8s API to define and deploy the application:
kubectl create -f nginx-deployment.yaml
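For reference, a minimal sketch of what the nginx-deployment.yaml referenced above might contain; the actual manifest is not given in the talk, so this is only an ordinary Kubernetes Deployment shown for illustration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx               # plain container spec, no Karmada-specific fields
        image: nginx:1.19.0
        ports:
        - containerPort: 80

Because Karmada is compatible with the Kubernetes native API, the manifest itself needs no modification; the PropagationPolicy described next decides which member clusters it is delivered to.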
2) PropagationPolicy: a reusable multi-cluster scheduling policy for applications (a sketch follows the field descriptions below)

resourceSelector

  • Supports associating multiple resource types
  • Supports object filtering using name or labelSelector

placement

clusterAffinity:

  • Defines the target clusters preferred for scheduling
  • Supports filtering by name or labelSelector

clusterTolerations:

  • Similar to Pod tolerations and node taints in a single cluster
spreadConstraints:

  • Define HA strategy for application distribution
  • Supports dynamic grouping of clusters: grouping by Region, AZ, and feature label to achieve different levels of HA
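Putting these fields together, below is a hedged sketch of a PropagationPolicy implementing the multi-AZ HA strategy for all Deployments mentioned above. Cluster names are illustrative, and the fields follow the policy.karmada.io/v1alpha1 API, whose exact shape may vary between Karmada versions:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: deployments-ha
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment          # no name given, so all Deployments are matched
  placement:
    clusterAffinity:
      clusterNames:           # candidate clusters; labelSelector is also supported
      - member1
      - member2
    spreadConstraints:
    - spreadByField: zone     # spread across at least two AZs
      minGroups: 2
    - spreadByField: cluster  # and across at least two clusters
      minGroups: 2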

3) OverridePolicy: a reusable cross-cluster differentiated configuration policy (a sketch follows the field descriptions below)

resourceSelector

  • Supports object filtering using name or labelSelector

overriders

  • Support multiple types of override plugins
  • plainTextOverrider: basic plugin for plain-text replacement operations
  • imageOverrider: A differentiated configuration plugin for container images
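For illustration, a hedged sketch of an OverridePolicy that uses imageOverrider to point one member cluster at a different image registry. The cluster and registry names are assumptions, and the overrideRules layout follows newer Karmada releases, so older versions may differ:

apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: nginx-registry-override
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: nginx
  overrideRules:
  - targetCluster:
      clusterNames:
      - member1
    overriders:
      imageOverrider:
      - component: Registry       # rewrite only the registry part of the image
        operator: replace
        value: registry-member1.example.com

The same workload is propagated everywhere, while each target cluster receives the differentiated configuration it needs.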

Multi-cluster service governance

Problems to be solved in multi-cluster service governance

  • Service discovery
  • DNS resolution
  • Advanced traffic management such as load balancing, circuit breaking, fault injection, and traffic splitting
  • Access security across clouds

Advantages of ServiceMesh


The figure above is a typical Istio service mesh architecture diagram. Istio is completely non-intrusive: it intercepts the traffic sent by the application by automatically injecting a sidecar proxy, and manages the application's traffic through that sidecar.

Basic functions of Istio:

1) Traffic management

  • Improves the resilience of the entire system through resilience features (circuit breaking, timeouts, retries, fault injection, etc.)
  • Grayscale release: makes it easier to roll out new versions faster (see the sketch after this list)
  • Load balancing and route matching, which can largely replace a traditional microservice governance framework
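As an illustration of grayscale release and traffic splitting, here is a hedged sketch of an Istio VirtualService that sends 90% of traffic to subset v1 and 10% to a canary subset v2; the service name and subsets are assumptions, and the subsets would need to be defined in a separate DestinationRule:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-canary
spec:
  hosts:
  - reviews                  # hypothetical in-mesh service
  http:
  - route:
    - destination:
        host: reviews
        subset: v1           # subsets v1/v2 come from a DestinationRule (not shown)
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10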

2) Security: data encryption, authentication, and authorization

3) Observability: observability makes it easier for application operations personnel to diagnose system faults. The three typical observability signals are metrics, traces (call chains), and access logs.

How should service mesh technology be chosen in a multi-cluster, multi-cloud scenario?

Next, the technical details will be explained in detail from the following three dimensions:

  • Flat network vs. non-flat network
  • Single mesh vs. multiple meshes
  • Single control plane vs. multiple control planes

1) Multi-cluster service mesh: flat network


Advantages:

  • Low east-west service access latency

Disadvantages:

  • Network complexity
  • Security: All workloads are in the same network.
  • Scalability: Pod and Service IP addresses must not overlap across clusters

2) Multi-cluster service mesh: non-flat network


Advantages:

  • Network isolation, relatively higher security
  • Simple networking
  • Scalability: Pod/service network addresses can overlap

Disadvantages:

  • Cross-cluster service access needs to pass through the east-west gateway
  • The gateway relies on TLS Auto Passthrough

3) Non-flat network: single control plane


  • Single control plane (can be deployed in user clusters or fully managed)
  • Service discovery
  • Configuration discovery
  • Split Horizon EDS
  • East-west gateway

4) Non-flat network: multiple control planes


  • The control plane is deployed in each cluster
  • Service discovery
  • Configuration discovery
  • Sidecars connect to the Istio control plane in their own cluster; compared with a single control plane, this gives better performance and availability

5) Non-flat network: east-west gateway

  • Gateway address acquisition
  • Split horizon EDS:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
  - hosts:
    - '*.local'
    port:
      name: tls
      number: 15443
      protocol: TLS
    tls:
      mode: AUTO_PASSTHROUGH

Network filter: "envoy.filters.network.sni_cluster"
Attachment: Karmada community technical exchange addresses

Project address: https://github.com/karmada-io/karmada

Slack address: https://karmada-io.slack.com
