For cloud services, a system failure can cause significant losses. To minimize those losses, we can only keep probing when the system will fail, ideally narrowing it down to whether a specific parameter change will break it. However, as cloud native pushes microservices toward ever finer decoupling, and as massive data volumes and user scale drive the infrastructure toward large-scale distribution, failures in the system have become more and more unpredictable. We need to continuously run experiments against the system to proactively uncover its defects. This practice is called Chaos Engineering. Since practice is the ultimate test of theory, chaos engineering helps us understand more thoroughly how the system really behaves and improve its resilience.
Litmus is an open-source, cloud-native chaos engineering toolset. It focuses on simulated failure testing in Kubernetes clusters to help developers and SREs find defects in clusters and applications, thereby improving the robustness of the system.
Litmus architecture
The architecture of Litmus is shown in the figure:
The components of Litmus can be divided into two parts:
- Portal
- Agents
Portal is the set of Litmus components that acts as the control plane (Web UI) for cross-cloud chaos experiment management. It is used to orchestrate and observe the chaos experiment workflows running on Agents.
Agent is the set of Litmus components that run the chaos experiment workflows inside a K8s cluster.
Using the Portal, users can create and schedule new chaos experiment workflows on an Agent and observe the results in the Portal. Users can also connect more clusters to the Portal and use it as a single entry point for cross-cloud chaos engineering management.
Portal component
Litmus WebUI
Litmus WebUI provides the web user interface, where users can easily build and observe chaos experiment workflows. It also serves as the control plane for cross-cloud chaos experiments.
Litmus Server
Litmus Server is the middleware that handles API requests from the user interface and stores configurations and result details in the database. It also acts as the communication bridge between components and schedules workflows onto the Agents.
Litmus DB
Litmus DB serves as the storage system for chaos experiment workflows and their result details.
Agent component
Chaos Operator
Chaos Operator watches the ChaosEngine CR and runs the chaos experiments referenced in it. Chaos Operator is namespace-scoped and runs in the litmus namespace by default. After an experiment completes, Chaos Operator calls chaos-exporter to export the metrics of the chaos experiment to the Prometheus database.

CRDs

The following CRDs are generated during the Litmus installation:

- chaosexperiments.litmuschaos.io
- chaosengines.litmuschaos.io
- chaosresults.litmuschaos.io
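After installation, you can confirm that these CRDs are registered, for example:

```bash
# List the CRDs that belong to the litmuschaos.io API group
$ kubectl get crd | grep litmuschaos.io
```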
Chaos Experiment
Chaos Experiment is the basic unit in the LitmusChaos architecture. Users can pull chaos experiments from Chaos Hub or create new ones themselves to build the chaos experiment workflows they need. Simply put, a Chaos Experiment is a CRD resource that defines which operations the test supports, which parameters can be passed in, and which kinds of objects can be targeted. Experiments generally fall into three categories: generic tests (memory, disk, CPU, etc.), application tests (for example, tests targeting Nginx), and platform tests (tests for a specific cloud platform: AWS, Azure, GCP). For details, please refer to the Chaos Hub documentation.
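For example, once experiments have been synced from Chaos Hub into a namespace, you can inspect them with kubectl (the namespace and experiment name below are illustrative):

```bash
# List the ChaosExperiment CRs available in the litmus namespace
$ kubectl get chaosexperiments -n litmus

# Show what a specific experiment defines (image, env parameters, permissions, etc.)
$ kubectl describe chaosexperiment pod-delete -n litmus
```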
Chaos Engine
ChaosEngine binds a Chaos Experiment to an application in the namespace and triggers its execution. This CR is watched by the Chaos Operator.
Chaos Results
ChaosResult stores the results of a chaos experiment. It is created or updated while the experiment runs and contains various information, including the Chaos Engine configuration and the state of the experiment.
chaos-exporter will read the results and export them to the Prometheus database.
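You can also check the outcome of a run directly from the cluster, for example (the namespace and result name below are illustrative; ChaosResult names typically follow the pattern <engine-name>-<experiment-name>):

```bash
# List the ChaosResult CRs; one is created per experiment run
$ kubectl get chaosresults -n default

# The verdict and probe status are recorded in the status field
$ kubectl describe chaosresult nginx-chaos-pod-delete -n default
```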
Chaos Probes

Chaos Probes are pluggable checks that can be defined in the ChaosEngine of any chaos experiment. The experiment Pod performs the corresponding checks according to the mode defined for each probe, and their success (together with the standard "built-in" checks) is a necessary condition for declaring the experiment successful.
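As a sketch of where a probe sits in a ChaosEngine (the field names follow the Litmus 2.x probe schema and may differ in other versions; the names, namespace, and URL below are assumptions):

```bash
# A ChaosEngine carrying an httpProbe that checks Nginx while the chaos runs
$ kubectl apply -f - <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-probe-demo
  namespace: default
spec:
  engineState: active
  appinfo:
    appns: default
    applabel: app=nginx
    appkind: deployment
  chaosServiceAccount: pod-delete-sa   # a ServiceAccount with the RBAC the experiment needs
  experiments:
    - name: pod-delete
      spec:
        probe:
          - name: check-nginx-http
            type: httpProbe
            mode: Continuous           # run the probe continuously during the experiment
            httpProbe/inputs:
              url: http://nginx.default.svc.cluster.local:80
              method:
                get:
                  criteria: ==
                  responseCode: "200"
            runProperties:
              probeTimeout: 5          # timeout/interval units vary between Litmus versions
              interval: 2
              retry: 1
EOF
```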
Chaos Exporter
Chaos Exporter optionally exports the experiment metrics to the Prometheus database by exposing a Prometheus metrics endpoint.
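If you want to see the raw metrics, one quick way is to port-forward the exporter and query its endpoint (the service name, namespace, and port below are assumptions; adjust them to match your installation):

```bash
# Forward the chaos-exporter service locally and fetch its Prometheus metrics
$ kubectl port-forward svc/chaos-exporter 8080:8080 -n litmus &
$ curl -s http://localhost:8080/metrics | grep -i chaos
```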
Subscriber
Subscriber runs on the Agent and interacts with the Litmus Server: it collects the detailed results of the chaos experiment workflows running in the cluster and sends them back to the Portal.
Prepare KubeSphere application template
KubeSphere integrates OpenPitrix to provide application lifecycle management. OpenPitrix is a multi-cloud application management platform; KubeSphere uses it to implement the App Store and application templates so that applications can be deployed and managed visually. For applications that are not in the App Store, users can deliver Helm charts to KubeSphere's public repository, or import a private application repository to provide application templates.
This tutorial will use KubeSphere's application template to deploy Litmus.
To deploy an application from an application template, you need to create a workspace, a project, and two user accounts (ws-admin and project-regular). ws-admin must be granted the workspace-admin role in the workspace, and project-regular must be granted the operator role in the project. Before creating them, let's review KubeSphere's multi-tenant architecture.
Multi-tenant architecture
KubeSphere's multi-tenant system has three levels: cluster, workspace, and project. A project in KubeSphere is equivalent to a Kubernetes namespace.
You need to create a new workspace to work in rather than using the system workspace, which runs system resources and is mostly read-only. For security reasons, it is also strongly recommended to grant different tenants different permissions so that they can collaborate within the workspace.
You can create multiple workspaces in a KubeSphere cluster, and multiple projects in each workspace. KubeSphere provides several built-in roles at each level by default, and you can also create roles with custom permissions. This multi-level structure is suitable for enterprise users with different teams or organizations, where each team requires different roles.
Create accounts
After installing KubeSphere, you need to add users with different roles to the platform so that they can work on their authorized resources at different levels. Initially, the system has only one default account, admin, with the platform-admin role. In this step, you will create an account named user-manager and then use user-manager to create the other accounts.
- Log in to the web console as admin using the default account and password (admin/P@88w0rd).
For security reasons, it is strongly recommended that you change your password the first time you log in to the console: open Personal Settings from the drop-down menu in the upper-right corner and set a new password under Password Settings; you can also change the console language in Personal Settings.
After logging in to the console, click Platform Management in the upper-left corner, and then select Access Control.
Under Account Roles, there are four built-in roles, described below. The first account to be created next will be assigned the users-manager role.

| Built-in role | Description |
| --- | --- |
| workspaces-manager | Workspace administrator, manages all workspaces on the platform. |
| users-manager | User administrator, manages all users on the platform. |
| platform-regular | Regular platform user, with no resource operation permissions until invited to a workspace or cluster. |
| platform-admin | Platform administrator, can manage all resources on the platform. |

In Account Management, click Create. In the pop-up window, provide all the required information (marked with *), and then select users-manager in the Role field. Refer to the example below. When finished, click OK. The newly created account will be displayed in the account list under Account Management.
Switch accounts and log in again as user-manager, then create the following three new accounts.

| Account | Role | Description |
| --- | --- | --- |
| ws-manager | workspaces-manager | Creates and manages all workspaces. |
| ws-admin | platform-regular | Manages all resources in a specified workspace (this account is used to invite project-regular to join the workspace). |
| project-regular | platform-regular | This account will be used to create workloads, pipelines, and other resources in a specified project. |

View the three accounts you created.
Create a workspace
In this step, you need to create a workspace using the ws-manager account. As the basic logical unit for managing projects, creating workloads, and organizing members, the workspace is the foundation of KubeSphere's multi-tenant system.
Log in as ws-manager, which has permission to manage all workspaces on the platform. Click Platform Management in the upper-left corner and select Access Control. Under Workspaces, you can see that only one default workspace, system-workspace, is listed; this is the system workspace, which runs system-related components and services and cannot be deleted.

Click Create on the right, name the new workspace demo-workspace, and set the user ws-admin as the workspace administrator, as shown in the following figure. When finished, click Create.
Log out of the console and log in again as ws-admin. In Workspace Settings, select Workspace Members, and then click Invite Member. Invite project-regular into the workspace and grant it the workspace-viewer role.

The actual role name follows the format <workspace name>-<role name>. For example, in the workspace demo-workspace, the actual name of the viewer role is demo-workspace-viewer.

After adding project-regular to the workspace, click OK. Under Workspace Members, you can see the two members listed.

| Account | Role | Description |
| --- | --- | --- |
| ws-admin | workspace-admin | Manages all resources in the specified workspace (in this example, this account is used to invite new members into the workspace and to create projects). |
| project-regular | workspace-viewer | This account will be used to create workloads and other resources in the specified project. |
Create project
In this step, you will use the ws-admin account to create a project. A project in KubeSphere is the same as a namespace in Kubernetes, providing virtual isolation for resources. For more information, see Namespaces.
Log in as ws-admin. In Project Management, click Create. Enter the project name (for example, litmus), and then click OK to finish. You can also add an alias and description for the project. In Project Management, click the project you just created to view its details.
Invite project-regular to the project and grant the user the operator role. Refer to the figure below for the specific steps. Users with the operator role are project maintainers and can manage all resources in the project other than users and roles.
Add application repository
Log in to the KubeSphere web console as ws-admin. In your workspace, go to App Repositories under App Management, and click Add Repository.

In the pop-up dialog box, set the repository name to litmus and the repository URL to https://litmuschaos.github.io/litmus-helm/, click Validate to verify the URL, and then click OK to proceed. After the application repository is imported successfully, it will be displayed in the list as shown in the figure below.
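This repository is an ordinary Helm repository, so if you prefer the command line you can also add it with Helm directly (the KubeSphere app template flow above does not require this step):

```bash
# Add the Litmus Helm repository and refresh the local index
$ helm repo add litmus https://litmuschaos.github.io/litmus-helm/
$ helm repo update
```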
Deploy the Litmus control plane
After importing the Litmus application repository, you can deploy Litmus through the application template.
Log out of KubeSphere and log in again as project-regular. In your project, go to Apps under Application Workloads, and then click Deploy New App. In the pop-up dialog box, select From App Template.

- From App Store: choose from the built-in apps and apps uploaded separately as Helm charts.
- From App Template: choose apps from private app repositories and the workspace app pool.

Select the previously added private repository litmus from the drop-down list, and select litmus-2-0-0-beta to deploy.
You can view the app information and configuration files, select a version from the version drop-down list, and then click Deploy.
Set the application name, confirm the application version and deployment location, and click Next.
On the application configuration page, you can manually edit the manifest file or click Deploy directly.
Wait for Litmus to be created and run.
Access Portal service
The Portal's frontend service is named litmusportal-frontend-service. You can first check its NodePort on the service's detail page:
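You can also look it up with kubectl (assuming Litmus was deployed into the litmus project/namespace created earlier):

```bash
# Show the frontend service and its NodePort
$ kubectl get svc litmusportal-frontend-service -n litmus
```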
Then use ${Node IP}:${NodePort} to access the Portal:
The default username and password:
Username: admin
Password: litmus
Deploy Agent (optional)
Litmus contains two types of agents:
- Self Agent
- External Agent
By default, the cluster where Litmus is installed is automatically registered as the Self Agent, and the Portal runs chaos experiments in the Self Agent by default.

As mentioned earlier, the Portal is a cross-cloud control plane for chaos experiments. In other words, users can connect multiple External Agents deployed in external K8s clusters to the current Portal, so that chaos experiments can be dispatched to those Agents and their results observed in the Portal.
For how to deploy an External Agent, please refer to the official Litmus documentation.
Create Chaos Experiment
After the Portal installation is complete, you can create a chaos experiment through the Portal interface. You need to create an application for testing first:
$ kubectl create deployment nginx --image=nginx --replicas=2 --namespace=default
Let’s start creating an experiment.
Log in to the Portal.
Enter the Workflows page and click [Schedule a workflow]
Select an Agent, such as Self-Agent:
Choose to add chaos experiments from Chaos Hub:
Set the name of the Workflow:
Click [Add a new experiment] to add a chaos experiment to the Workflow:
Select the pod-delete experiment:
Start scheduling immediately:
In KubeSphere, you can see that the Pod has been deleted and rebuilt:
You can also see that the experiment was successful in the Portal interface:
Click a specific Workflow node to see its detailed log:
Repeat the above steps to create a pod-cpu-hog chaos experiment:
In KubeSphere, you can see that the Pod's CPU usage is close to 1 core:
The next experiment simulates Pod network packet loss. Before starting it, scale the Nginx Deployment down to 1 replica:
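For example, you can do this with kubectl instead of the KubeSphere UI:

```bash
# Scale the test Deployment created earlier down to a single replica
$ kubectl scale deployment nginx --replicas=1 -n default
```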
Now there is only one Pod, with IP 10.233.71.170:

Repeat the above steps to create a pod-network-loss chaos experiment and set the packet loss rate to 50%:

In the KubeSphere interface, open the Toolbox and select Kubectl from the pop-up menu.
Test the packet loss rate by pinging the Pod's IP. You can see that the packet loss rate is close to 50%, so the experiment is successful:
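For example (the Pod IP is the one shown above; the measured loss will fluctuate around 50%):

```bash
# Send 20 ICMP requests to the target Pod and check the reported packet loss
$ ping -c 20 10.233.71.170
```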
All the experiments above target Pods. Besides Pods, you can also run experiments against Nodes, K8s components, and various services. Interested readers can try them out themselves.
Workflow in detail

A Workflow is simply a workflow of chaos experiments. Although each Workflow in the previous section's demonstration contained only one experiment, a Workflow can actually contain multiple experiments and execute them in order.

Workflows are implemented as CRDs. You can view them in the KubeSphere console, where all the previously created Workflows are listed:
Take pod-network-loss as an example to see which parameters are available:
Each experiment in a Workflow is also defined by a CRD, whose kind is ChaosEngine.
The meaning of each field and environment variable is explained below (see the annotated manifest sketch after this list):
- appns: The namespace of the target application.
- experiments: The name of the experiment to run (such as a network delay test or Pod deletion test); you can use kubectl get chaosexperiments -n test to view the supported experiments.
- chaosServiceAccount: The ServiceAccount to use.
- jobCleanUpPolicy: Whether to keep the Job that ran the experiment; the value can be delete or retain.
- annotationCheck: Whether to perform an annotation check; if not, all Pods may be targeted; the value can be true or false.
- engineState: The state of the experiment; can be set to active or stop.
- TOTAL_CHAOS_DURATION: The duration of the chaos test; the default is 15s.
- CHAOS_INTERVAL: The interval between chaos injections; the default is 5s.
- FORCE: Whether to delete the Pod with the --force option.
- TARGET_CONTAINER: The container in the Pod to act on (the first container by default).
- PODS_AFFECTED_PERC: The percentage of total Pods to be affected; the default is 0 (equivalent to 1 replica).
- RAMP_TIME: The wait time before and after injecting chaos.
- SEQUENCE: The execution strategy; the default is parallel execution, and it can be set to serial or parallel.
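To make the mapping concrete, here is a hand-written sketch of a pod-delete ChaosEngine that uses the fields above. In practice, the Portal generates this CR for you as part of the Workflow; the names and values here are illustrative:

```bash
$ kubectl apply -f - <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  engineState: active                  # active/stop
  annotationCheck: "false"             # "false": no chaos annotation required on the app
  appinfo:
    appns: default                     # appns: namespace of the target application
    applabel: app=nginx                # label selector of the target application
    appkind: deployment
  chaosServiceAccount: pod-delete-sa   # a ServiceAccount with the RBAC the experiment needs
  jobCleanUpPolicy: delete             # delete/retain the experiment Job afterwards
  experiments:
    - name: pod-delete                 # the experiment pulled from Chaos Hub
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "30"              # run chaos for 30s
            - name: CHAOS_INTERVAL
              value: "10"              # delete a Pod every 10s
            - name: FORCE
              value: "false"           # graceful deletion instead of --force
            - name: PODS_AFFECTED_PERC
              value: "50"              # affect 50% of the matching Pods
EOF
```

Once such a ChaosEngine exists, the Chaos Operator picks it up and runs the pod-delete experiment against the matching Pods.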
The detailed parameters of the other experiments will not be covered here; interested readers can refer to the relevant documentation.
Summary
This article introduced the architecture of the chaos engineering framework Litmus and how to deploy it on KubeSphere, and verified the ability of the infrastructure and services to withstand failures through a series of chaos experiments. Litmus is an excellent chaos engineering framework with strong community support behind it. More and more experiments will be built into its experiment store (i.e., Chaos Hub); you can deploy these chaos experiments to the cluster with one click and visualize the results through the web interface to verify the resilience of the cluster. With Litmus, we can not only face failures head-on, but also proactively create failures to find system defects and avoid black swan events.