Abstract: At Huawei Developer Conference (Cloud) 2021, ICBC PaaS cloud platform architect Shen Yifan delivered a keynote speech, "ICBC Multi-k8s Cluster Management and Disaster Recovery Practice," sharing ICBC's experience adopting the multi-cloud container orchestration engine Karmada in production.
This article is shared from the Huawei Cloud Community post "Karmada | Industrial and Commercial Bank of China Multi-k8s Cluster Management and Disaster Recovery Practice", original author: Technical Torchbearer.
At Huawei Developer Conference (Cloud) 2021, ICBC PaaS cloud platform architect Shen Yifan delivered a keynote speech, "ICBC Multi-k8s Cluster Management and Disaster Recovery Practice," sharing ICBC's experience adopting the multi-cloud container orchestration engine Karmada.
The speech covered four aspects:
1) Current status of ICBC cloud platform
2) Multi-cluster management solutions and selection in the industry
3) Why choose Karmada?
4) Implementation status and future outlook
The business background of ICBC cloud computing
The rise of the Internet in recent years has had a huge impact on the business and service models of the financial industry, forcing us to innovate. At the same time, moving banking business systems to the cloud has become the general trend. To date, ICBC has built four modules: infrastructure cloud, application platform cloud, financial ecosystem cloud, and a branch cloud with distinct ICBC characteristics. This is our overall cloud platform architecture.
The overall architecture of the ICBC cloud platform
ICBC Cloud Platform Technology Stack
The platform adopts industry-leading cloud products and mainstream open-source technologies, with deep customization on top of them for our financial business scenarios.
- Infrastructure cloud: a new-generation infrastructure cloud built on Huawei Cloud Stack 8.0, customized to our operations and maintenance requirements.
- Application platform cloud: an in-house application platform cloud built on open-source container technology (Docker) and container cluster scheduling technology (Kubernetes).
- Upper-layer application solutions: a supporting cloud ecosystem covering load balancing, microservices, full-link monitoring, log center, and so on, built on HAProxy, Dubbo, Elasticsearch, and other components.
ICBC Financial Cloud Achievements
On the container side, ICBC's financial cloud has also achieved a great deal. First is its scale: the application platform cloud now runs more than 200,000 containers, about 55,000 of which are business containers, and some core businesses have already entered the container cloud. Beyond having the largest cloud adoption in the industry, our business scenarios cover a wide range of core applications: core banking systems, including personal financial accounts, quick payment, online channels, and commemorative coin reservations, have been deployed in containers; core technical support applications such as MySQL, along with middleware and microservice frameworks, have also moved to the cloud; and so have new technology fields including the Internet of Things, artificial intelligence, and big data.
As more and more core business applications move to the cloud, the biggest challenge for us is disaster recovery and high availability. We have done a lot of work in this area:
1) The cloud platform supports a multi-level failure protection mechanism that evenly distributes different instances of the same business across resource domains in the "two sites, three data centers" architecture, so that the failure of a single storage device, a single cluster, or even an entire data center does not affect overall business availability.
2) In the event of a failure, the cloud platform recovers automatically by restarting containers and automatically migrating them.
In our container cloud practice we have also run into problems, the most prominent being the multi-cluster situation at the PaaS container layer. The total number of k8s clusters within ICBC has reached nearly one hundred, mainly for the following reasons:
1) Many cluster types: as mentioned, our business scenarios are very broad. For example, GPU workloads need GPU-capable devices, and middleware and databases have different requirements for the underlying network, container runtime, and storage, which inevitably leads to different solutions. We therefore customize different clusters for different business scenarios.
2) k8s performance limits: each cluster has an upper bound due to bottlenecks in the scheduler, etcd, the API server, and so on.
3) Business is expanding very fast.
4) Many fault-domain partitions: our two-site, three-data-center architecture has at least three DCs, and each DC contains multiple network zones isolated by firewalls. Multiplying these together means clusters end up distributed across many fault domains.
Given these four points, our existing solution still relies on the container cloud's cloud management platform to manage these k8s clusters. Upper-layer business applications have to choose their clusters on their own, selecting a specific k8s cluster according to their preferences, network, region, and so on. Once a cluster is selected, we automatically spread workloads across fault domains within it.
But the existing solution still exposes many problems to upper-layer applications:
1) Auto scaling, which upper-layer applications care about most in the container cloud, currently works only within a single cluster; there is no cross-cluster auto-scaling capability.
2) There is no cross-cluster automatic scheduling: scheduling happens only within a cluster, and applications must select specific clusters on their own.
3) Clusters are not transparent to upper-layer users.
4) There is no automatic cross-cluster failover. We still rely mainly on replica redundancy across the two-site, three-data-center architecture, so automatic recovery is missing from this part of our high-availability story.
Industry multi-cluster management solutions and selection
Based on this status quo, we set goals for selecting an industry solution, divided into five modules:
Why do we want an open-source project with solid community support? There are three main considerations:
- 1) We want the overall solution to be self-controlled within the enterprise, which is a major benefit of open source.
- 2) We don't want to invest extra effort building and maintaining these capabilities entirely on our own.
- 3) As for why we don't simply integrate scheduling and failure recovery into the cloud management platform mentioned earlier: we want to decouple multi-cluster management from the cloud management platform and sink it into a separate multi-cluster management module below it.
Based on these goals, we researched the solutions available in the community.
Kubefed
The first project we investigated was the previously popular cluster federation project. Federation is divided into v1 and v2; when we did our research, the focus was on v2, i.e. Kubefed.
Kubefed solves some of the problems: it provides cluster lifecycle management, overrides, and basic scheduling. But it has several flaws that are fatal for us, and it cannot currently meet our needs:
1) Its scheduling is very basic; the community does not plan to invest more effort in scheduling, so it supports neither custom scheduling nor scheduling based on remaining resources.
2) The second and most criticized point: it does not support native k8s objects. To use it, we would have to adopt its newly defined CRDs in the management cluster. For upper-layer applications that have long used native k8s resource objects, we would also have to redevelop the cloud management platform to integrate with the new API, which is very costly (a sketch of such a wrapper object follows this list).
3) It has essentially no automatic failover capability.
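To illustrate point 2, here is a minimal sketch of what wrapping a plain nginx Deployment into Kubefed's FederatedDeployment CRD roughly looks like (based on my reading of the KubeFed v1beta1 API; treat the exact field layout as an assumption, and the cluster names are placeholders):

```yaml
# Hedged sketch: a KubeFed-style FederatedDeployment.
# The native Deployment has to be embedded under spec.template, so any
# tooling that works directly with apps/v1 Deployments must be adapted.
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: nginx
  namespace: default
spec:
  template:                      # the original Deployment is nested here
    metadata:
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: nginx
  placement:
    clusters:                    # which member clusters receive the workload
      - name: cluster1
      - name: cluster2
  overrides:
    - clusterName: cluster2      # per-cluster override, e.g. a different replica count
      clusterOverrides:
        - path: "/spec/replicas"
          value: 3
```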
RHACM
The second project we researched was RHACM, led mainly by Red Hat and IBM. Our investigation found its functionality fairly complete, covering the capabilities mentioned above, and its user-facing layer is closer to a cloud management platform. However, it only supports OpenShift: for the many k8s clusters we already have, the migration cost would be very high and the solution is too heavyweight. At the time of our research it was not open source, and community support was insufficient.
Karmada
At that time we discussed the status quo and pain points of multi-cluster management with Huawei, including the community federation projects, and both sides were keen to innovate in this area. The following figure shows Karmada's functional view:
Karmada functional view
Its functional view and roadmap fit our goals very well: cluster lifecycle management, cluster registration, multi-cluster scaling, multi-cluster scheduling, a unified overall API, and support for standard underlying APIs such as CNI/CSI, plus roadmap consideration for upper-layer applications such as Istio, service mesh, and CI/CD. Since the overall design fits us so well, ICBC decided to invest in the project, build Karmada together with Huawei and many other partners, and contribute it back to the community.
Why choose Karmada
Technology Architecture
In my personal understanding, it has the following advantages:
1) Karmada is deployed much like k8s itself, with its own API server, controller manager, and so on. For an enterprise that already has a large stock of k8s clusters, the migration cost is small: we only need to deploy a management cluster on top.
2) The karmada-controller-manager manages CRDs such as Cluster, PropagationPolicy, ResourceBinding, and Work as management-plane resource objects, without intruding on the native k8s resource objects we want to deploy.
3) Karmada only manages resource scheduling across clusters; scheduling within each member cluster remains highly autonomous.
How does Karmada distribute resources overall?
- The first step is to register the member clusters with Karmada
- The second step is to define the Resource Template
- The third step is to define a distribution policy, the PropagationPolicy
- The fourth step is to define an OverridePolicy
- The fifth step is to watch Karmada do the work
The following figure shows the overall delivery process:
After we define a Deployment, it is matched by a PropagationPolicy, which generates a binding relationship, i.e. a ResourceBinding; then, once the override policy is applied, a Work object is generated for each target cluster. A Work is essentially an encapsulation of a resource object destined for a member cluster. The most important parts here are the propagation and work mechanisms.
Propagation mechanism
First of all, we define the PropagationPolicy. In its yaml we set a relatively simple strategy: we pick one cluster, called member1, and specify which k8s resource template the policy should match — here, an nginx Deployment in the default namespace. Besides cluster affinity, the policy also supports cluster tolerations and spreading by cluster label or fault domain.
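As an illustration, a minimal resource template plus PropagationPolicy along the lines described above might look like this (a sketch based on the Karmada v1alpha1 API; the cluster name member1 and the nginx Deployment are just the example values from the talk):

```yaml
# Resource template: a completely native k8s Deployment, nothing Karmada-specific.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
---
# PropagationPolicy: matches the Deployment above and propagates it to member1.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - member1
```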
Once the PropagationPolicy is defined, the previously defined k8s resource template to be distributed automatically matches it. After matching, the Deployment is bound to, say, three clusters A, B, and C; this binding relationship is the ResourceBinding.
In the ResourceBinding yaml you can see the chosen cluster, here member1. ResourceBinding currently comes in two scopes, namespace and cluster, which correspond to different scenarios: the namespace-scoped ResourceBinding is used when namespaces provide tenant isolation within a cluster, while the cluster-scoped ClusterResourceBinding is used when an entire member cluster is dedicated to a single tenant.
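A hedged sketch of what a namespace-scoped ResourceBinding for the nginx example might look like (field names follow my reading of the work.karmada.io/v1alpha1 API; in practice this object is generated by the Karmada controllers rather than written by hand):

```yaml
# Hedged sketch: a ResourceBinding generated by Karmada for the nginx Deployment.
# It records which resource template is bound to which member clusters.
apiVersion: work.karmada.io/v1alpha1
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
spec:
  resource:                 # reference to the matched resource template
    apiVersion: apps/v1
    kind: Deployment
    namespace: default
    name: nginx
  clusters:                 # the target clusters chosen by the scheduler
    - name: member1
```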
Work mechanism
Once the ResourceBinding is established, how are Work objects distributed?
When the ResourceBinding is generated for, say, clusters A, B, and C (a 1:m relationship), the binding controller takes over and generates concrete Work objects from the resource template and the binding relationship. A Work object is the package of the resource destined for a specific member cluster, and its status also feeds back the resource's state from that cluster. In the Work yaml, the manifest contains the full Deployment yaml to be delivered to the member cluster, with cluster-specific overrides already applied. So a Work is simply a wrapper around the resource.
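For reference, a Work object might look roughly like the following (a sketch assuming the work.karmada.io/v1alpha1 API; the execution namespace name is illustrative):

```yaml
# Hedged sketch: a Work object wrapping the (already overridden) Deployment
# for one member cluster. Work objects live in a per-cluster execution
# namespace on the Karmada control plane.
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: nginx-deployment
  namespace: karmada-es-member1   # execution space for cluster member1 (illustrative)
spec:
  workload:
    manifests:                    # the full resource to apply in the member cluster
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: nginx
          namespace: default
        spec:
          replicas: 2
          selector:
            matchLabels:
              app: nginx
          template:
            metadata:
              labels:
                app: nginx
            spec:
              containers:
                - name: nginx
                  image: nginx
```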
Karmada advantages
After specific use and verification, ICBC found that Karmada has the following advantages:
1) Resource scheduling
- Customize cross-cluster scheduling strategy
- Transparent to the upper application
- Supports two scopes of resource binding (namespace and cluster)
2) Disaster tolerance
- Dynamic binding adjustment
- Automatically distribute resource objects according to cluster labels or fault domains
3) Cluster management
- Support cluster registration
- Full life cycle management
- Unified standard API
4) Resource management
- Support k8s native objects
- Work objects report resource deployment status from member clusters
- Resource object distribution supports both push and pull modes (a push-mode registration sketch follows)
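As an illustration of the push mode, registering a member cluster on the Karmada control plane roughly amounts to a Cluster object like the following (a sketch assuming the cluster.karmada.io/v1alpha1 API; the endpoint and secret names are placeholders):

```yaml
# Hedged sketch: a Cluster object registered in push mode. In push mode the
# Karmada control plane pushes Work manifests to the member cluster's API
# server; in pull mode an agent running inside the member cluster pulls them instead.
apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: member1
spec:
  syncMode: Push                                   # or Pull, when karmada-agent runs in the member cluster
  apiEndpoint: https://member1.example.com:6443    # placeholder endpoint
  secretRef:                                       # credentials for accessing the member cluster
    namespace: karmada-cluster
    name: member1-secret
```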
Karmada's implementation in ICBC and its future outlook
First, let's look at how we use two Karmada functions: cluster management and resource distribution.
So far, Karmada already manages some existing clusters in ICBC's test environment. Looking ahead, a key question is how to integrate it with our overall cloud platform.
Cloud platform integration
Here we hope that the previously mentioned multi-cluster management, cross-cluster scheduling, cross-cluster scaling, failure recovery, and global resource view will all sink into a control plane such as Karmada.
The upper-layer cloud management platform would then focus on user and business management, including user management, application management, and so on, along with policies derived from Karmada, such as Policy definitions on the cloud platform. The picture above is simplified: in practice the cloud management platform may still need a direct connection to each k8s cluster. For example, the Karmada control plane does not care which node a pod runs on, but the cloud platform needs that pod-location detail and has to fetch it from the member k8s cluster; this is a problem we will need to solve during integration. It is also consistent with Karmada's design philosophy of not caring where a specific pod lives inside a member cluster.
Future Outlook 1-Cross-cluster scheduling
For cross-cluster scheduling, Karmada already supports the fault-domain spreading, application preferences, and weight-based comparison mentioned above. But we also hope it can schedule according to each cluster's resources and utilization, so that resources do not become unbalanced across member clusters. Although this is not implemented yet, one of Karmada's CRDs, Cluster, carries status information that collects node readiness and the remaining allocatable CPU and memory on the nodes. With this information available, custom scheduling based on resource margin is entirely a matter of planning.
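The kind of status information described above might look roughly like this on a Cluster object (a sketch; field names such as nodeSummary and resourceSummary follow my reading of the cluster.karmada.io/v1alpha1 API, and the values are invented):

```yaml
# Hedged sketch: status reported on a Cluster object, which a custom scheduler
# could use to balance workloads by remaining capacity. Values are illustrative.
apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: member1
status:
  kubernetesVersion: v1.20.4
  nodeSummary:
    totalNum: 10          # total nodes in the member cluster
    readyNum: 10          # nodes currently Ready
  resourceSummary:
    allocatable:          # total schedulable capacity across nodes
      cpu: "40"
      memory: 160Gi
    allocated:            # capacity already requested by pods
      cpu: "25"
      memory: 96Gi
```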
Once the overall scheduling design is complete, we hope to achieve the effect shown in the figure below at ICBC: Deployment A is scheduled to cluster 1 because of a preference setting; Deployment B is spread across clusters 1, 2, and 3 to break up a fault domain; Deployment C is also spread across fault domains, but because cluster 2 has spare resources, its extra pods land on cluster 2.
Future Outlook 2-Scaling across clusters
Cross-cluster scaling is also on Karmada's roadmap, but several problems still need to be solved:
1) The relationship between cross-cluster scaling and in-cluster scaling: our upper-layer business applications are often configured with single-cluster scaling policies, so when both a cross-cluster policy and an in-cluster policy are configured, how they relate — whether one manages the other overall or one takes priority — is something we need to consider later.
2) The relationship between cross-cluster scaling and cross-cluster scheduling: ideally there is a single overall scheduler, with the multi-cluster autoscaler responsible only for the scaling decision — for example, triggering at 70%-80% CPU or memory utilization and deciding how many replicas to add — while the actual placement is still handled by that scheduler.
3) Metrics from each cluster need to be aggregated, which brings performance bottlenecks; the overall working mode still needs further consideration.
Future Outlook 3-Cross-cluster failure recovery and high availability
1) A strategy for judging member-cluster health: a cluster may merely have lost its connection to the management cluster while its business containers are unharmed.
2) Customizable failure recovery policies, similar to RestartPolicy: Always, Never, OnFailure.
3) The relationship between rescheduling and cross-cluster scaling: we hope multi-cluster scheduling remains a single overall scheduler, while scaling controls its own scaling policy.
For ICBC's business scenarios, it is foreseeable that Karmada's current capabilities and roadmap can solve our pain points. I am very happy to have the opportunity to join the Karmada project, and I hope more developers will join us in building the community and this new multi-cloud management project.
Attachment: Karmada Community Technical Exchange Address
Project address: https://github.com/karmada-io/karmada
Slack address: https://karmada-io.slack.com