Author
Liu Xu, Senior Engineer at Tencent Cloud, focuses on container and cloud-native technologies. He has many years of experience managing large-scale Kubernetes clusters and is currently responsible for the research and development of Tencent Cloud GPU containers.
Background
TKE already provides a shared GPU scheduling and isolation solution based on qGPU, with strong isolation of both compute and GPU memory. However, some users have reported a lack of observability for GPU resources: for example, they cannot query the remaining resources on a single GPU device, which makes GPU operations and maintenance difficult. Against this background, we want to provide a solution that lets users intuitively view and query GPU resource usage in a Kubernetes cluster.
Goals
Building on the current TKE shared GPU scheduling scheme, we enhance the observability of GPU devices in the following ways:
- Support getting resource allocation information for a single GPU device.
- Support getting the health status of a single GPU device.
- Support getting information about every GPU device on a node.
- Support getting the association between GPU devices and Pods/containers.
Our Solution
We scan physical GPU information into a GPU CRD and update the consumed physical GPU resources throughout the qGPU lifecycle, solving the lack of visibility in shared GPU scenarios.
- Custom GPU CRD: each GPU device corresponds to one GPU object, through which the device's hardware information, health status, and resource allocation can be obtained.
- Elastic GPU Device Plugin: creates GPU objects from the hardware information of the GPU devices on a node and periodically updates their health status.
- Elastic GPU Scheduler: schedules Pods based on GPU resource usage and writes the scheduling results back to the GPU objects.
TKE GPU CRD Design
```yaml
apiVersion: elasticgpu.io/v1alpha1
kind: GPU
metadata:
  labels:
    elasticgpu.io/node: 10.0.0.2
  name: 192.168.2.5-00
spec:
  index: 0
  memory: 34089730048
  model: Tesla V100-SXM2-32GB
  nodeName: 10.0.0.2
  path: /dev/nvidia0
  uuid: GPU-cf0f5fe7-0e15-4915-be3c-a6d976d65ad4
status:
  state: Healthy
  allocatable:
    tke.cloud.tencent.com/qgpu-core: "50"
    tke.cloud.tencent.com/qgpu-memory: "23"
  allocated:
    0dc3c905-2955-4346-b74e-7e65e29368d2:
      containers:
      - container: test
        resource:
          tke.cloud.tencent.com/qgpu-core: "50"
          tke.cloud.tencent.com/qgpu-memory: "8"
      namespace: default
      pod: test
  capacity:
    tke.cloud.tencent.com/qgpu-core: "100"
    tke.cloud.tencent.com/qgpu-memory: "31"
```
Each physical GPU card corresponds to one GPU CRD object. Through it you can clearly see hardware information such as the model and GPU memory of each card, and through its status you can obtain each GPU device's health status and resource allocation.
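The relationship among the three status fields is simple arithmetic: per resource, allocatable equals capacity minus everything already allocated to containers (in the sample above, 100 − 50 = 50 qgpu-core and 31 − 8 = 23 qgpu-memory). A minimal Go sketch of that bookkeeping, using plain maps rather than the real CRD types (the function name `remaining` is ours, for illustration only):

```go
package main

import "fmt"

// remaining computes per-resource allocatable on one GPU as
// capacity minus everything already allocated to containers.
// Keys mirror the qGPU resource names in the GPU CRD above.
func remaining(capacity map[string]int64, allocated []map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(capacity))
	for res, c := range capacity {
		out[res] = c
	}
	for _, containerRes := range allocated {
		for res, used := range containerRes {
			out[res] -= used
		}
	}
	return out
}

func main() {
	capacity := map[string]int64{
		"tke.cloud.tencent.com/qgpu-core":   100,
		"tke.cloud.tencent.com/qgpu-memory": 31,
	}
	// One container (pod "test") currently holds 50 cores and 8 GiB.
	allocated := []map[string]int64{
		{"tke.cloud.tencent.com/qgpu-core": 50, "tke.cloud.tencent.com/qgpu-memory": 8},
	}
	fmt.Println(remaining(capacity, allocated))
}
```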
TKE GPU scheduling process
Kubernetes provides the Scheduler Extender mechanism to extend the scheduler for complex scheduling requirements. After running its built-in filtering (predicates) and scoring (priorities), the scheduler calls the extender over HTTP to filter and score again, and finally selects a suitable node for the Pod.
In the TKE Elastic GPU Scheduler (formerly TKE qGPU Scheduler), we build on the GPU CRD design. During scheduling, we first filter out abnormal GPU devices by status.state, then select GPU devices whose status.allocatable satisfies the Pod's request, and finally update status.allocatable and status.allocated once scheduling completes.
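The two-step device check described above — health first, then remaining capacity — can be sketched as follows. `GPUStatus` and `fits` are simplified illustrative names, not the scheduler's real types:

```go
package main

import "fmt"

// GPUStatus is a simplified view of the GPU CRD's status fields
// that the scheduler consults when filtering devices.
type GPUStatus struct {
	State       string
	Allocatable map[string]int64
}

// fits reports whether a GPU can satisfy a Pod's qGPU request:
// the device must be Healthy and have enough of every resource.
func fits(gpu GPUStatus, request map[string]int64) bool {
	if gpu.State != "Healthy" {
		return false
	}
	for res, want := range request {
		if gpu.Allocatable[res] < want {
			return false
		}
	}
	return true
}

func main() {
	gpu := GPUStatus{
		State: "Healthy",
		Allocatable: map[string]int64{
			"tke.cloud.tencent.com/qgpu-core":   50,
			"tke.cloud.tencent.com/qgpu-memory": 23,
		},
	}
	req := map[string]int64{
		"tke.cloud.tencent.com/qgpu-core":   30,
		"tke.cloud.tencent.com/qgpu-memory": 8,
	}
	fmt.Println(fits(gpu, req)) // true: healthy, with enough core and memory left
}
```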
TKE GPU allocation process
Kubernetes provides the Device Plugin mechanism to support hardware devices such as GPUs and FPGAs. Device vendors only need to implement the Device Plugin interface, without modifying the Kubernetes source code. A Device Plugin usually runs on each node as part of a DaemonSet.
When the TKE Elastic GPU Device Plugin (formerly TKE qGPU Device Plugin) starts, it creates a GPU object for each GPU device on the node based on its hardware information. It also periodically checks the health of the GPU devices and syncs the results to each GPU object's status.state.
Summary
To address the lack of GPU resource observability in TKE clusters, we introduced a GPU CRD that lets users intuitively view and query GPU resource usage in the cluster. This solution is already integrated with qGPU and surfaced in the TKE console; it can be enabled by selecting "Use CRD" when installing the qGPU plugin.
TKE qGPU is now generally available. For details, see: https://cloud.tencent.com/document/product/457/61448
About Us
For more cloud-native cases and knowledge, follow the WeChat official account of the same name, [Tencent Cloud Native].
Benefits:
① Reply [Manual] to the account to get the "Tencent Cloud Native Roadmap Manual" & "Tencent Cloud Native Best Practices".
② Reply [Series] to get the "100+ Super Practical Cloud Native Originals in 15 Series", covering Kubernetes cost reduction and efficiency improvement, K8s performance optimization practices, best practices, and more.
③ Reply [White Paper] to get the "Tencent Cloud Container Security White Paper" & "The Source of Cost Reduction - Cloud Native Cost Management White Paper v1.0".
④ Reply [Lightspeed Primer] to get a 50,000-word essentials tutorial on Prometheus and Grafana from Tencent Cloud experts.
[Tencent Cloud Native] New products, new techniques, new activities, and cloud insights. Scan the QR code to follow the official account of the same name and get more practical content in time!