Original: https://dzone.com/articles/top-open-source-projects-for-sres-and-devops
Translation: Zhu Kunrong

Building scalable, highly reliable software systems is the ultimate goal of every SRE. The open source projects below, covering monitoring, deployment, and operations, are a good place to continue learning.

Becoming a successful SRE requires continuous learning. Many open source projects are available to SREs and DevOps engineers. Each is a new and interesting implementation, and most address challenges in a specific area. These projects take on much of the heavy lifting, which makes your own work easier. In addition to these open source projects, there is also a continuous learning platform that you can try for free.

1. Cloudprober

Cloudprober is an application that focuses on proactively tracking and detecting failures before your customers discover them. It uses an "active" monitoring model to check whether your components are working as expected. For example, it actively runs probes to make sure that your front end can reach your back end. Similarly, a probe can verify that your systems can reach the virtual machines (VMs) in your cloud. This approach makes it easy to track the relevant configuration of your applications independently of their implementation, so that you can quickly find out what is broken in the system.

Features:

  • Native integration with the open source monitoring stack of Prometheus and Grafana. Cloudprober can also export probe results.
  • Targets in the cloud are discovered automatically. GCE and Kubernetes are supported out of the box; other cloud providers can be added with simple configuration.
  • Very simple to deploy. Cloudprober is written in Go and compiles to a static binary, and it can be deployed quickly in a Docker container. Thanks to automatic target discovery, most updates require no redeployment or reconfiguration of Cloudprober.
  • The Cloudprober Docker image is small, containing only a statically compiled binary, and it needs very little CPU and RAM to run a large number of probes.
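
To make the "active" monitoring model concrete, here is a minimal sketch in Python of what a single probe does: it performs a check on purpose, rather than waiting for an error report, and records success and latency. This is not Cloudprober's code or configuration format; the names `http_check` and `run_probe` and the result fields are assumptions made for illustration.

```python
import time
import urllib.request


def http_check(url, timeout=2.0):
    """Return True if an HTTP GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def run_probe(name, check):
    """Run one probe and record whether it succeeded and how long it took.

    `check` is any zero-argument callable; a raised exception counts as failure.
    """
    start = time.monotonic()
    try:
        success = bool(check())
    except Exception:
        success = False
    latency = time.monotonic() - start
    return {"probe": name, "success": success, "latency_s": latency}
```

A scheduler would call something like `run_probe("frontend_to_backend", lambda: http_check(backend_url))` at a fixed interval and export the results, where `backend_url` is a placeholder for your own health endpoint.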

2. Cloud Operations Sandbox (Alpha)

Cloud Operations Sandbox (https://github.com/GoogleCloudPlatform/cloud-ops-sandbox) is an open source platform that lets practitioners learn Google's SRE practices and adapt them to their own cloud systems. It is based on Hipster Shop, a cloud-native, microservices-based demo platform. Note that it requires a Google Cloud account.

Features:

  • Demo service: an application designed around a modern, cloud-native, microservice architecture.
  • One-click deployment: a script that handles deploying the services to Google Cloud Platform.
  • Load generator: a component that creates simulated traffic against the demo service.

3. Version Checker for Kubernetes

version-checker (https://github.com/jetstack/version-checker) is a Kubernetes tool that lets you observe the image versions currently running in a cluster. It also lets you view the current image versions in tabular form on a Grafana dashboard.

Features:

  • Multiple self-hosted registries can be configured at once.
  • The tool exposes version information as Prometheus metrics.
  • Supports registries such as ACR, Docker Hub, and ECR.

4. Istio

Istio (https://istio.io/) is an open source framework for monitoring traffic between microservices, enforcing policies, and aggregating telemetry data in a standardized way. Istio's control plane provides an abstraction layer over the underlying cluster management platform, such as Kubernetes.

Features:

  • Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
  • Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection.
  • A pluggable policy layer and configuration API that support access control, rate limiting, and quotas.
  • Automatic metrics, logs, and traces for all traffic within a cluster, including traffic entering and leaving the cluster.
  • Secure service-to-service communication within a cluster through strong identity-based authentication and authorization.

5. Checkov

Checkov (https://www.checkov.io/) is a static code analysis tool for infrastructure as code. It scans cloud infrastructure defined with Terraform, CloudFormation, Kubernetes, Serverless, or ARM templates to detect security and compliance misconfigurations.

Features:

  • More than 400 built-in rules cover security and compliance best practices for AWS, Azure, and Google Cloud.
  • Terraform provider settings can be monitored for the deployment, maintenance, and updating of IaaS, PaaS, or SaaS managed through Terraform.
  • Detects AWS credentials in EC2 user data, Lambda environment variables, and Terraform providers.
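
The idea of statically checking infrastructure definitions against rules can be sketched in a few lines of Python. This is only an illustration of the technique, not Checkov's policy API: the resource dictionaries, the rule IDs `EX_ENCRYPTION` and `EX_PUBLIC_ACL`, and the function names are all hypothetical.

```python
def bucket_is_encrypted(resource):
    """Hypothetical rule: a storage bucket must declare server-side encryption."""
    return bool(resource.get("server_side_encryption"))


def bucket_is_not_public(resource):
    """Hypothetical rule: a storage bucket must not use a public ACL."""
    return resource.get("acl", "private") not in ("public-read", "public-read-write")


# Rule IDs are made up for this sketch; real tools ship their own catalogs.
RULES = {
    "EX_ENCRYPTION": bucket_is_encrypted,
    "EX_PUBLIC_ACL": bucket_is_not_public,
}


def scan(resources):
    """Check every parsed resource against every rule; return the failures."""
    findings = []
    for name, resource in resources.items():
        for rule_id, rule in RULES.items():
            if not rule(resource):
                findings.append((name, rule_id))
    return findings


if __name__ == "__main__":
    parsed = {
        "logs": {"acl": "public-read"},
        "data": {"acl": "private", "server_side_encryption": {"algorithm": "AES256"}},
    }
    for name, rule_id in scan(parsed):
        print(f"FAILED {rule_id} on resource {name}")
```

Because the scan only reads the definitions, it can run before anything is deployed, for example as a step in a CI pipeline.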

6. Litmus

Litmus (https://github.com/litmuschaos/litmus) is a cloud-native chaos engineering toolkit. It provides tools for orchestrating chaos on Kubernetes to help SREs find weaknesses in their deployments. SREs run chaos experiments first in a staging environment and eventually in production to find faults and vulnerabilities. Fixing these weaknesses improves the resilience of the system.

Features:

  • Developers can run chaos experiments during application development, as an extension of unit or integration testing.
  • For CI pipeline builders: chaos can run as a pipeline stage to find bugs when the application is subjected to failure paths.

7. Locust

Locust (https://github.com/locustio/locust) is an easy-to-use, scriptable, and flexible performance testing tool. You define the behavior of your users in regular Python code instead of using a clunky UI or a domain-specific language. This makes Locust highly extensible and developer friendly.

Features:

  • Locust is distributed and scalable: it can easily support hundreds of thousands of users.
  • Its web-based UI shows progress in real time.
  • With a little custom code, it can test almost any system.

8. Prometheus

Prometheus (https://github.com/prometheus/prometheus), a Cloud Native Computing Foundation project, is a monitoring system for systems and services. It collects metrics from configured targets at given intervals, evaluates rule expressions, and displays the results. When a specified condition is observed, it can trigger an alert.

Features:

  • A multi-dimensional data model (time series identified by a metric name and a set of key-value pairs).
  • Targets are discovered through service discovery or static configuration.
  • No dependency on distributed storage; single server nodes are autonomous.
  • PromQL, a powerful and flexible query language.
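
The multi-dimensional data model is easier to picture with a small sketch. The Python below is not Prometheus code; `series_key` and `format_sample` are hypothetical helpers that show how a metric name plus a set of key-value labels identifies one time series, and how a sample looks when rendered in the text exposition style.

```python
def series_key(name, labels):
    """Identify a time series by metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def _escape(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value):
    """Render one sample as `name{key="value",...} value`."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    get_ok = {"method": "GET", "code": "200"}
    post_ok = {"method": "POST", "code": "200"}
    # Same metric name, different labels: two distinct time series.
    print(series_key("http_requests_total", get_ok) != series_key("http_requests_total", post_ok))
    print(format_sample("http_requests_total", get_ok, 1027))
```

Each distinct combination of label values is its own series, which is what lets a query language such as PromQL filter and aggregate along any label dimension.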

9. Kube-monkey

Kube-monkey (https://github.com/asobti/kube-monkey) is an implementation of Netflix's Chaos Monkey for Kubernetes clusters. It randomly deletes Kubernetes pods in the cluster, which encourages and validates the development of failure-resilient services.

Features:

  • Kube-monkey operates on an opt-in model: it only terminates pods of Kubernetes applications that have explicitly agreed to be terminated by kube-monkey.
  • The schedule is highly customizable to suit your needs.
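
The opt-in model can be illustrated with a short simulation in Python. It works on plain dictionaries instead of a real cluster and is not kube-monkey's implementation; the label key and value used as the opt-in marker, and the function names, are assumptions for this sketch.

```python
import random

# Assumed opt-in marker for this sketch; check the project's docs for the real labels.
OPT_IN_LABEL = "kube-monkey/enabled"
OPT_IN_VALUE = "enabled"


def eligible_pods(pods):
    """Keep only pods whose labels explicitly opt in to termination."""
    return [p for p in pods if p.get("labels", {}).get(OPT_IN_LABEL) == OPT_IN_VALUE]


def pick_victim(pods, rng=random):
    """Randomly choose one opted-in pod name, or None if nobody opted in."""
    candidates = eligible_pods(pods)
    if not candidates:
        return None
    return rng.choice(candidates)["name"]


if __name__ == "__main__":
    cluster = [
        {"name": "cart-1", "labels": {OPT_IN_LABEL: OPT_IN_VALUE}},
        {"name": "cart-2", "labels": {OPT_IN_LABEL: OPT_IN_VALUE}},
        {"name": "billing-1", "labels": {}},  # never selected: it did not opt in
    ]
    print("would delete:", pick_victim(cluster))
```

Pods without the marker are filtered out before the random choice, so a workload that has not opted in can never be selected.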

10. PowerfulSeal

PowerfulSeal (https://github.com/powerfulseal/powerfulseal) injects failures into Kubernetes clusters to help you identify problems as early as possible. It lets you create scenarios that describe complete chaos experiments.

Features:

  • Compatible with Kubernetes, OpenStack, AWS, Azure, GCP, and local machines.
  • Integrates with Prometheus and Datadog to collect metrics.
  • Supports multiple modes to fit custom use cases.

Conclusion

Open source technology has the advantage of being extensible: you can add features that suit your own architecture. These projects are documented and supported by the open source community. As microservice architectures come to dominate cloud computing, reliable tools for monitoring and troubleshooting these services will certainly become part of every developer's toolkit.


This article is from Zhu Kunrong's WeChat public account "Malt Bread", the public account id "darkjune_think"

Developer / Science Fiction Enthusiast / Hardcore Console Gamer / Amateur Translator
Please credit the source when reprinting.

Weibo: Zhu Kunrong
Bilibili: https://space.bilibili.com/23185593/

Communication Email: zhukunrong@yeah.net

