Background
Jobbang's server-side technology stack is evolving toward cloud native, and improving resource utilization is one of the core goals of a cloud-native stack: higher utilization means fewer compute nodes carrying more application instances, which greatly reduces resource overhead. Serverless offers elastic scaling, strong isolation, pay-as-you-go billing, and automated operations, bringing core advantages such as shorter delivery time, lower risk, lower infrastructure cost, and lower labor cost. Serverless has therefore long been a core direction of Jobbang's infrastructure exploration. In the long run there are two paths to going serverless: function computing, and Kubernetes serverless virtual nodes. For services already running on Kubernetes, serverless virtual nodes make no difference in actual use: the user experience is better, business services notice no change, and workloads can be scheduled and migrated by the infrastructure. Alibaba Cloud ECI is a typical Kubernetes virtual-node solution.
Accordingly, in 2020 we began scheduling some compute-intensive workloads to serverless virtual nodes, using the strong isolation of their underlying servers to keep services from affecting each other. In 2021 we moved scheduled tasks to serverless virtual nodes instead of expanding and shrinking cluster nodes, handling short-lived tasks while improving resource utilization and reducing cost. In 2022 we began scheduling core online services to serverless virtual nodes. Online services are the most sensitive workloads and the ones users perceive most easily, and they have pronounced peaks and troughs, so elastically scheduling them to serverless virtual nodes during peak hours brings large cost benefits. The accompanying requirements are correspondingly higher: performance and stability must match physical servers as closely as possible, and business-side observability must be identical, so that upper-layer business services cannot perceive any difference between a serverless virtual node and a physical server.
Kubernetes Serverless virtual nodes
Virtual nodes are not real nodes but a scheduling capability: pods in a standard Kubernetes cluster can be scheduled onto resources outside the cluster's own server nodes. Pods deployed on virtual nodes have the same security isolation, network isolation, and network connectivity as pods on bare-metal servers, and additionally require no resource reservation and are billed by usage.
Kubernetes Serverless Virtual Node Cost Advantage
Most of Jobbang's services have been containerized. Online business has a typical peak period that lasts only a short time (about 4 hours per day). Using bare-metal servers only, average server load is close to 60% at peak but only about 10% off-peak. This scenario is very well suited to serverless elastic scaling. A simple calculation: assume the hourly cost of using only our own servers is C, the daily peak period averages 4 hours, and the unit-time cost of serverless is 1.5C. Then:
- Using only owned servers, the total daily cost is: C × 24 = 24C
- Retaining 70% of the owned servers and covering the remaining 30% with serverless during the 4-hour peak, the total becomes: C × 24 × 0.7 + 1.5C × 4 × 0.3 = 16.8C + 1.8C = 18.6C
Theoretically, using serverless during the peak period reduces cost by (24C − 18.6C) / 24C = 22.5%. Elastically scheduling online services to serverless during peak hours can therefore save substantial resource cost.
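The arithmetic above can be written as a small cost model. This is a minimal sketch using the article's example figures (4-hour peak, serverless unit cost 1.5C, 30% of capacity shifted); the function name and parameters are illustrative:

```python
# Sketch of the article's cost model; all figures are the example numbers
# from the text, not measured data.

def daily_cost(own_hourly_cost: float,
               peak_hours: float = 4.0,
               serverless_share: float = 0.0,
               serverless_multiplier: float = 1.5) -> float:
    """Total daily cost when `serverless_share` of peak capacity is
    shifted from owned servers to serverless virtual nodes."""
    owned = own_hourly_cost * 24 * (1 - serverless_share)
    serverless = serverless_multiplier * own_hourly_cost * peak_hours * serverless_share
    return owned + serverless

C = 1.0
all_owned = daily_cost(C)                    # 24C
mixed = daily_cost(C, serverless_share=0.3)  # 18.6C
saving = (all_owned - mixed) / all_owned     # 0.225, i.e. 22.5%
```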
Problems and Solutions
Although Kubernetes serverless virtual nodes have many advantages, some problems remain. The main ones at present are as follows:
Scheduling and Control Issues
The scheduling level mainly has two problems to solve: when scaling out, by what policy newly created pods should be scheduled to virtual nodes; and when scaling in, how to ensure pods on virtual nodes are removed first. Both capabilities are insufficient in the Kubernetes version we currently use.
Expansion/Shrinkage Scheduling Policy
The scale-out scheduling policy should be controlled by infrastructure and operations rather than by the business, because the business side does not know how much server capacity is available in the underlying resource layer. Ideally, when the physical servers in the cluster pool reach their utilization bottleneck, capacity expands onto serverless virtual nodes. This captures the cost advantage without capacity risk.
The industry's use of virtual nodes has evolved through three stages:

- Virtual nodes and standard nodes are completely separate; workloads can only be scheduled to a specified pool.
- The user no longer needs to specify a selector: when a pod goes pending because fixed-node resources are insufficient, it is automatically scheduled to a virtual node, but the process incurs a delay.
- The cloud vendor's scheduler (such as Alibaba Cloud ACK Pro) automatically schedules to virtual nodes when resources are insufficient; the process is automatic and without delay, which is closer to ideal.
However, our business scenario requires finer-grained resource management and a scheduling strategy more tightly integrated with those resource-management requirements, so we developed our own solution on top of Alibaba Cloud ACK's capabilities:
Scale-out: based on the peaks and troughs of online services, we predict and recommend scheduling decisions. For each service we predict a threshold: the number of replicas it should run on the cluster's physical machines at peak. Our self-developed scheduler then schedules any pods beyond that threshold onto virtual nodes. The threshold is the optimal replica count on physical machines, balancing physical-cluster utilization against performance: too low a threshold wastes physical-machine resources; too high a threshold packs physical machines too densely, driving utilization so high that services suffer.
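The overflow decision described above can be sketched as a pure function. This is an illustrative sketch, not the real scheduler; in practice `peak_threshold` would come from the prediction/recommendation system:

```python
# Minimal sketch of threshold-based overflow scheduling. The pool names
# and the function itself are illustrative, not the production scheduler.

def target_node_pool(current_physical_replicas: int, peak_threshold: int) -> str:
    """Decide which pool a new replica should be scheduled to."""
    if current_physical_replicas < peak_threshold:
        return "physical"          # room left on the physical-machine pool
    return "serverless-virtual"    # overflow goes to virtual nodes (ECI)

# With a predicted threshold of 8 replicas on physical machines:
pools = [target_node_pool(n, peak_threshold=8) for n in range(6, 10)]
```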
Scale-in: pods on serverless virtual nodes should naturally be removed first, because the regular resource pool is cheaper under annual/monthly subscription pricing, so removing virtual-node pods first optimizes cost. Our self-developed scheduler injects a custom annotation into pods on virtual nodes, and we modified kube-controller-manager's scale-in logic to place pods carrying that annotation at the front of the deletion queue, so pods on virtual nodes are removed preferentially.
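The ordering logic can be sketched as follows. This is an illustration of the idea, not the actual kube-controller-manager patch, and the annotation key is a made-up placeholder:

```python
# Illustrative sketch of the scale-in ordering described above: pods
# carrying the custom virtual-node annotation go to the front of the
# deletion queue. The annotation key is hypothetical.

VIRTUAL_NODE_ANNOTATION = "scheduler.example.com/on-virtual-node"  # assumed key

def deletion_order(pods: list) -> list:
    """Sort pods so that virtual-node pods are deleted first."""
    def on_virtual_node(pod: dict) -> bool:
        return pod.get("annotations", {}).get(VIRTUAL_NODE_ANNOTATION) == "true"
    # Stable sort: annotated pods first, original order otherwise preserved.
    ranked = sorted(pods, key=lambda p: 0 if on_virtual_node(p) else 1)
    return [p["name"] for p in ranked]

pods = [
    {"name": "web-1", "annotations": {}},
    {"name": "web-2", "annotations": {VIRTUAL_NODE_ANNOTATION: "true"}},
    {"name": "web-3", "annotations": {}},
]
order = deletion_order(pods)  # web-2 is removed first
```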
On the control plane, the DevOps platform supports automatically computing the threshold for scheduling to virtual nodes, and also supports setting it manually for finer business control. Scheduling to virtual nodes can be combined with HPA and CronHPA to meet more flexible business needs. The control plane also supports one-click blocking of virtual nodes in failure scenarios, as well as multi-cloud scheduling for more extreme cases (such as a full data-center failure).
Observability Problems
Our observability services (log retrieval, monitoring and alerting, distributed tracing) are all self-built. On virtual nodes, the monitoring components and log collection that would normally run alongside the pod are built in by the cloud vendor. We need to ensure that, at the business-perception level, pods behave the same on serverless virtual nodes as on physical servers, which requires converging the vendor's data into our own observability services.
Monitoring: cloud-vendor virtual nodes are fully compatible with the kubelet monitoring interface and connect seamlessly to Prometheus, giving complete real-time CPU/memory/disk/network-traffic monitoring for pods, consistent with pods on ordinary nodes.
Logs: log collection is configured through a CRD, and logs are sent to a unified Kafka cluster. Our self-developed log consumption service then consumes logs from both cloud-vendor nodes and our own nodes.
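The unification step in that pipeline can be sketched as a normalizer in the consumer: records from the vendor's built-in collector and from our own agents arrive on the same Kafka topic in slightly different shapes and are mapped onto one internal schema. The field names below are assumptions for illustration:

```python
# Illustrative sketch of normalizing log records from different collectors
# into one internal schema. Field names are assumed, not the real formats.

def normalize_log(record: dict) -> dict:
    """Map a vendor-specific or self-hosted log record onto one schema."""
    if "content" in record:  # assumed shape of the cloud-vendor collector
        return {"message": record["content"],
                "pod": record.get("pod_name", "unknown"),
                "source": "vendor"}
    return {"message": record.get("log", ""),  # assumed self-hosted shape
            "pod": record.get("kubernetes", {}).get("pod", "unknown"),
            "source": "self-hosted"}
```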
Distributed tracing: since the Jaeger agent cannot be deployed as a DaemonSet on virtual nodes, we modified the Jaeger client to identify the pod's runtime environment through an environment variable. On a virtual node, the client skips the Jaeger agent and pushes tracing data directly to the Jaeger collector.
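The client-side routing tweak can be sketched as follows. The environment-variable name is an assumption; the ports are Jaeger's standard agent (UDP 6831) and collector HTTP (14268) ports:

```python
# Sketch of the Jaeger client modification described above: pick the span
# export target based on where the pod runs. The env var name is assumed.

import os

def tracing_endpoint(env=None) -> str:
    """Return the span export target for this pod's environment."""
    env = os.environ if env is None else env
    if env.get("ON_VIRTUAL_NODE") == "true":               # assumed env var
        return "http://jaeger-collector:14268/api/traces"  # straight to collector
    return "udp://localhost:6831"                          # node-local DaemonSet agent
```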
Performance, Stability, and Other Issues
Differences in underlying performance: because the underlying compute-resource specifications differ and the virtualization layer adds overhead, performance on serverless virtual nodes may differ from bare-metal servers. Latency-sensitive services therefore require adequate testing and evaluation before being moved to virtual nodes.
Cloud-server inventory risk: peak periods trigger large scale-outs, and if the cloud vendor's resource pool for a given specification is running low, resources of that exact specification may be unavailable. We therefore enable automatic specification upgrade: a request for 2c2G ideally matches a 2c2G ECI; if that is out of stock, it matches a 2c4G ECI instead, and so on.
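The fallback described above can be sketched as a simple search over specifications. This is an illustration of the idea, not ECI's actual matching logic; the inventory data is made up:

```python
# Illustrative sketch of "automatic specification upgrade": if the
# requested (vCPU, GiB) spec is out of stock, double the memory and retry.
# Inventory data below is invented for the example.

def pick_spec(requested, in_stock, max_memory_gib=16):
    """Return a (vCPU, GiB) spec to provision, or None if nothing fits."""
    vcpu, mem = requested
    while mem <= max_memory_gib:
        if (vcpu, mem) in in_stock:
            return (vcpu, mem)
        mem *= 2  # 2c2G -> 2c4G -> 2c8G ...
    return None   # no inventory at this vCPU count

# 2c2G is out of stock, so the request upgrades to 2c4G:
spec = pick_spec((2, 2), in_stock={(2, 4), (4, 8)})
```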
Troubleshooting: virtual nodes ultimately use cloud-vendor resource pools that are not under our own control. When a service misbehaves we can automatically pull it out of traffic, but we cannot log in to the machine, so operations such as viewing system logs or retrieving core dump files are harder. At our suggestion, the cloud service (Alibaba Cloud ECI) now supports automatically uploading core dumps to OSS.
Scale and Benefits
The solution is now mature: at peak, core-link online services totaling nearly 10,000 CPU cores run on Kubernetes serverless virtual nodes based on Alibaba Cloud ACK + ECI. As business volume grows, the scale of services running on serverless virtual nodes will expand further, saving substantial resource cost.
This article is reproduced from the "Jobbang Technical Team"; more related technical practice sharing can be found on that official account.