1.png

Source | Alibaba Cloud Native official account

Alibaba's open source chaos engineering project ChaosBlade has passed the CNCF TOC vote and been accepted as a CNCF Sandbox project. CNCF stands for Cloud Native Computing Foundation; it aims to build a sustainable ecosystem for cloud native software and serves vendor-neutral, fast-growing open source projects such as Kubernetes, Prometheus, and Envoy.

ChaosBlade GitHub address:
https://github.com/chaosblade-io/chaosblade

Project Introduction

2.png

ChaosBlade is a chaos engineering project that Alibaba open-sourced in 2019. It consists of the chaos engineering experiment tool chaosblade and the chaos engineering platform chaosblade-box, and aims to help enterprises solve high-availability problems during cloud native adoption through chaos engineering. The experiment tool chaosblade supports 3 major system platforms and applications in 4 programming languages, covering more than 200 experiment scenarios and more than 3,000 experiment parameters, which allows fine-grained control over the scope of an experiment. The chaos engineering platform chaosblade-box hosts experiment tools: in addition to chaosblade, it also supports the LitmusChaos experiment tool. More than 40 companies have registered as users, among which ICBC, China Mobile, Xiaomi, JD.com, and others have put it into practice.

Core capabilities

ChaosBlade has the following features:

  • Rich experiment scenarios: Covers basic resources (CPU, memory, network, disk, process, kernel, files, etc.), multi-language application services (Java, C++, NodeJS, Golang, etc.), and the Kubernetes platform (Container, Pod, and Node resource scenarios, including the experiment scenarios above).
  • Diverse execution methods: In addition to the platform's graphical console, experiments can be run with the blade CLI that ships with the tool, with kubectl, or programmatically.
  • Easy scenario extension: All experiment scenarios follow the chaos experiment model, and scenarios at different levels map to different executors, which makes them simple to implement and easy to extend.
  • Automatic deployment of experiment tools: Experiment tools do not need to be deployed manually; they are deployed automatically on the host or cluster.
  • Hosting of open source experiment tools: The platform can host mainstream experiment tools in the industry, such as its own chaosblade and the external LitmusChaos.
  • Unified chaos experiment user interface: Users do not need to learn how each tool is used; they run chaos experiments through one unified user interface.
  • Multi-dimensional experiments: Supports experiment orchestration from the host, to Kubernetes resources, and on to the application dimension.
  • Cloud native ecosystem integration: Uses Helm for deployment management, integrates Prometheus monitoring, supports hosting of cloud native experiment tools, and more.
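To make the "chaos experiment model" and the blade/kubectl execution paths more concrete, here is a minimal Python sketch. It assumes the four-part model described in the ChaosBlade community documentation (scope, target, action, matchers) and the `ChaosBlade` custom resource layout used by chaosblade-operator. The `Experiment` class and its method names are hypothetical, and the `cpu-percent` and `names` parameters are illustrative; check `blade create cpu -h` and the operator documentation before relying on them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    """Hypothetical in-memory form of one chaos experiment:
    where it runs (scope), what it hits (target), what it does (action),
    and the parameters that narrow it down (matchers)."""
    scope: str
    target: str
    action: str
    matchers: Dict[str, str] = field(default_factory=dict)

    def to_blade_args(self) -> List[str]:
        """Render a host-scoped experiment as a `blade create` argument list."""
        if self.scope != "host":
            raise ValueError("this sketch only renders host-scoped experiments")
        args = ["blade", "create", self.target, self.action]
        for name, value in self.matchers.items():
            args += [f"--{name}", value]
        return args

    def to_manifest(self, name: str) -> dict:
        """Render the experiment as a ChaosBlade custom resource body
        (assumed layout; it could be serialized to YAML for kubectl apply)."""
        return {
            "apiVersion": "chaosblade.io/v1alpha1",
            "kind": "ChaosBlade",
            "metadata": {"name": name},
            "spec": {
                "experiments": [{
                    "scope": self.scope,
                    "target": self.target,
                    "action": self.action,
                    "matchers": [
                        {"name": key, "value": [value]}
                        for key, value in self.matchers.items()
                    ],
                }]
            },
        }


# Host-level CPU load through the blade CLI (parameter name is an assumption).
host_cpu = Experiment("host", "cpu", "fullload", {"cpu-percent": "60"})
print(" ".join(host_cpu.to_blade_args()))

# The same target and action expressed for a Kubernetes node.
node_cpu = Experiment("node", "cpu", "fullload", {"names": "node-1"})
manifest = node_cpu.to_manifest("cpu-load-demo")
print(manifest["spec"]["experiments"][0]["scope"])
```

Because every scenario reduces to the same four fields, adding a scenario in this sketch means adding data rather than a new code path, which is the property the extension bullet above describes.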

Architecture design

The Chaosblade-box architecture is as follows:

3.png

The console page automates the deployment of hosted tools such as chaosblade and LitmusChaos. Experiment scenarios are unified under the chaos experiment model established by the community, and target resources are divided into hosts, Kubernetes, and applications. Target resources are managed by the target manager, so they can be selected graphically on the experiment creation page. The platform runs the experiment scenarios of different tools by invoking chaos experiment execution. With Prometheus monitoring connected, experiment metrics can be observed, and richer experiment reports will be provided in the future.

Deploying chaosblade-box is also very simple. For details, see: https://github.com/chaosblade-io/chaosblade-box/releases
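The dispatch idea in this architecture (one platform, several hosted tools, targets grouped by dimension) can be sketched roughly as follows. This is an illustration only, not chaosblade-box's actual code: every class, method, and return value here is hypothetical, and the executors return a descriptive string instead of running anything.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict

# The three target dimensions named in the architecture description.
DIMENSIONS = ("host", "kubernetes", "application")


@dataclass(frozen=True)
class Target:
    dimension: str  # one of DIMENSIONS
    name: str       # e.g. a host name, a pod name, or an application name


class ToolExecutor(ABC):
    """Common interface that each hosted experiment tool implements."""

    @abstractmethod
    def run(self, scenario: str, target: Target) -> str:
        ...


class ChaosBladeExecutor(ToolExecutor):
    def run(self, scenario: str, target: Target) -> str:
        return f"chaosblade:{scenario}@{target.dimension}/{target.name}"


class LitmusChaosExecutor(ToolExecutor):
    def run(self, scenario: str, target: Target) -> str:
        return f"litmuschaos:{scenario}@{target.dimension}/{target.name}"


class Platform:
    """Routes an experiment to the executor of whichever tool hosts it."""

    def __init__(self) -> None:
        self._executors: Dict[str, ToolExecutor] = {}

    def register(self, tool: str, executor: ToolExecutor) -> None:
        self._executors[tool] = executor

    def execute(self, tool: str, scenario: str, target: Target) -> str:
        if target.dimension not in DIMENSIONS:
            raise ValueError(f"unknown target dimension: {target.dimension}")
        if tool not in self._executors:
            raise KeyError(f"tool not hosted on this platform: {tool}")
        return self._executors[tool].run(scenario, target)


platform = Platform()
platform.register("chaosblade", ChaosBladeExecutor())
platform.register("litmuschaos", LitmusChaosExecutor())

result = platform.execute("chaosblade", "cpu-fullload", Target("host", "web-01"))
print(result)
```

Keeping the tools behind one interface is what lets the user interface stay the same no matter which tool actually injects the fault.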

Customer cases

4.png

Future plans

ChaosBlade will continue to build on cloud native, providing a chaos engineering platform and chaos engineering experiment tools for multi-cluster, multi-environment, and multi-language use. The experiment tools will keep focusing on the richness and stability of experiment scenarios, supporting more Kubernetes resource scenarios, standardizing application service experiment scenarios, and providing standard implementations of experiment scenarios for multiple languages. The chaos engineering platform focuses on simplifying the deployment and adoption of chaos engineering. In the future it will host more chaos experiment tools, be compatible with mainstream platforms, recommend scenarios, integrate business and system monitoring, and output experiment reports, closing the chaos engineering operation loop while remaining easy to use. Everyone is welcome to join the community to advance the field of chaos engineering together, put it into practice in enterprises, and build highly available distributed systems.

