[Article source] https://faun.pub/devops-roadmap-2022-340934d360f9

DevOps skills are in high demand, and staying relevant means continuously learning what the market needs. This post shares notes that can help you.

Strong networking fundamentals
Understand concepts such as HTTP/2, QUIC or HTTP/3, Layer 4 and Layer 7 protocols, mTLS, proxies, DNS, BGP, how load balancing works, iptables, how the Internet works, IP addresses and schemes, and finally network design.
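
As a small illustration of IP addressing and network design, the sketch below uses Python's standard `ipaddress` module to split an address block into subnets and find which subnet a host falls into. The 10.0.0.0/16 range, the four-way split, and the host address are assumptions chosen for the example, not something the original article specifies.

```python
import ipaddress

# Hypothetical address block for a private network; chosen only for illustration.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into four /18 subnets, for example one per availability zone.
subnets = list(vpc.subnets(new_prefix=18))

for net in subnets:
    # num_addresses counts every address in the block, including
    # the network and broadcast addresses.
    print(net, net.num_addresses)

# Find which subnet a given host address belongs to.
host = ipaddress.ip_address("10.0.70.5")
owner = next(net for net in subnets if host in net)
print(f"{host} belongs to {owner}")
```

Working through a split like this by hand first, then checking it with the module, is a quick way to build intuition for prefix lengths.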

Master operating system basics, especially Linux
Since most systems (VMs, containers, etc.) run Linux, it's important to understand it from top to bottom. Learn process scheduling, systemd and init systems, cgroups and namespaces, and performance tuning, and master command-line utilities such as awk, sed, jq, yq, curl, ssh, and openssl.

CI/CD
If you still like Jenkins, that's fine. However, the world has moved to cloud-native pipelines. Conceptually, the space hasn't changed much, but take a look at GitHub Actions, Tekton, and similar tools. How do you release better? Learn about deployment strategies such as blue-green and canary.
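
To make the canary idea concrete, here is a minimal sketch of percentage-based routing. The function name `pick_version`, the use of a hashed user ID, and the 10% figure are all assumptions for illustration; real traffic splitting usually happens in a load balancer or service mesh rather than in application code.

```python
import hashlib

def pick_version(user_id: str, canary_percent: int) -> str:
    """Route a stable slice of users to the canary release."""
    # Hash the user ID so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"

# Simulate 10,000 hypothetical users with a 10% canary.
users = [f"user-{i}" for i in range(10_000)]
canary_share = sum(pick_version(u, 10) == "canary" for u in users) / len(users)
print(f"share routed to canary: {canary_share:.1%}")
```

In this model a blue-green switch is the special case where the percentage jumps from 0 to 100 in one step, while a canary raises it gradually as confidence grows.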

Containerization and Virtualization
In addition to the popular Docker runtime, try containerd, podman, and others. Learn how to containerize applications, how to implement container security, and how to run and orchestrate VMs in Kubernetes (see the KubeVirt project).

Container Orchestration
Kubernetes is now the de facto standard for running containers, and there is a lot of material about it online. Focus on configuration best practices, application design, security, and scheduling. Setting up a cluster is now trivial, but day-2 operational concerns such as setting up monitoring, logging, and CI/CD, scaling a cluster, cost optimization, and security are some of the questions you can expect to be asked.

Large-scale observability
Most engineers are aware of the Prometheus and Grafana stack or something similar. Trends show that many organizations are consolidating their Kubernetes clusters and observability, which helps from a performance and cost perspective. Learn about the advanced configuration and architecture of Prometheus, and how to scale it. Research technologies like Thanos, Cortex, VictoriaMetrics, Datadog, and Loki, continuous profiling tools such as Parca and Pyroscope, and Hypertrace and distributed tracing with OpenTelemetry. Service meshes like Istio are a popular ingredient in cloud-native recipes.

Platform Teams as Product Teams
Platform teams function more and more like a centralized product team focused on their internal customers, such as developers and testers. The goal is to improve the way work is done and bring some order to how teams operate. Look for ways to solve the problems developers and QA teams face. You are the facilitator for other teams: instead of doing all the work in one central team, you guide development teams through typical DevOps responsibilities. This way you can scale up without burning yourself out.

Security
In many small organizations, security is a second-class citizen and product features get more priority. However, companies are adopting a shift-left security strategy due to increasingly sophisticated attacks and stringent compliance requirements. Practices such as end-to-end encryption, strong RBAC, IAM policies, governance, and auditing are common, as are benchmarks and standards such as NIST, CIS, and ISO 27001. Container security, policy as code, cloud governance, and supply chain security are hot topics.
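
Policy as code means expressing rules like these as something a machine can evaluate before a change ships. The sketch below is only an illustration of the idea: the `violations` function, the two rules, and the simplified pod-like dictionary are hypothetical and do not reflect any particular policy engine's format.

```python
def violations(pod: dict) -> list[str]:
    """Return the policy rules that a simplified pod-like spec breaks."""
    found = []
    for container in pod.get("containers", []):
        name = container.get("name", "<unnamed>")
        # Rule 1 (assumed for this example): no privileged containers.
        if container.get("privileged", False):
            found.append(f"{name}: privileged containers are not allowed")
        # Rule 2 (assumed for this example): do not deploy the mutable "latest" tag.
        if container.get("image", "").endswith(":latest"):
            found.append(f"{name}: image must not use the 'latest' tag")
    return found

# Hypothetical spec with one compliant and one non-compliant container.
pod = {
    "containers": [
        {"name": "api", "image": "example/api:1.4.2"},
        {"name": "debug", "image": "example/tools:latest", "privileged": True},
    ]
}
for problem in violations(pod):
    print(problem)
```

A check like this can run in a CI pipeline so that a non-compliant change fails early, which is the shift-left idea in practice.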

Programming
The DevOps or SRE role is now taking developers' cross-cutting concerns into account and creating tools that help increase productivity while enforcing standards. Producing high-quality platform components requires good software engineering practices and skills.

I cannot stress this enough. Great organizations are looking for platform engineers with good programming experience. This also matters in site reliability engineering, where you need to be proficient enough to read, understand, and debug code written by others, and fix it if necessary.

Python and Golang are the most popular choices. My recommendation is Golang: given its strong concurrency support, strict type checking, toolchain, adoption across organizations, and the fact that many major projects are built with it, it makes sense to learn it over Python.

You can try some simple things:

Write a CLI in your chosen language.

Learn to write REST APIs and interact with databases.

Practice parallelism and concurrency.
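
As one possible starting exercise that combines a CLI with concurrency, the sketch below hashes files in parallel using only the Python standard library. The tool itself, the `--workers` flag, and the function names are hypothetical, chosen just to show the shape of such a program; the same exercise translates directly to Go.

```python
import argparse
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def checksum_all(paths, workers=4):
    """Hash files concurrently; results keep the input order."""
    # Threads are a reasonable fit here because the work is mostly file I/O.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(zip(paths, pool.map(sha256_of, paths)))

def main(argv=None):
    """Parse arguments (sys.argv by default) and print one checksum per file."""
    parser = argparse.ArgumentParser(description="Print SHA-256 checksums.")
    parser.add_argument("files", nargs="+", type=Path, help="files to hash")
    parser.add_argument("--workers", type=int, default=4, help="thread count")
    args = parser.parse_args(argv)
    for path, digest in checksum_all(args.files, args.workers):
        print(f"{digest}  {path}")
```

Calling `main(["a.txt", "b.txt", "--workers", "2"])` runs it from another script; to use it from a shell, add the usual `if __name__ == "__main__": main()` guard at the bottom of the file.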

Infrastructure as Code
Terraform is the de facto standard in this space. Once you understand the concept, it's easy to adapt to any other tool, as most of them are DSL-based.
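
One reading of "the concept" here is the declarative model: you describe the desired state, and the tool works out what has to change. The sketch below is a deliberately simplified illustration of that idea, not how Terraform is actually implemented; the `plan` function and the resource dictionaries are hypothetical.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare desired and actual state and list the changes needed."""
    return {
        # Declared but not yet existing.
        "create": sorted(desired.keys() - actual.keys()),
        # Existing but no longer declared.
        "delete": sorted(actual.keys() - desired.keys()),
        # Present in both, but with different settings.
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
    }

# Hypothetical resources, keyed by name.
desired = {
    "web": {"size": "small", "count": 3},
    "db": {"size": "large", "count": 1},
}
actual = {
    "web": {"size": "small", "count": 2},
    "cache": {"size": "small", "count": 1},
}
print(plan(desired, actual))
# prints {'create': ['db'], 'delete': ['cache'], 'update': ['web']}
```

Real tools add dependency ordering, state storage, and provider APIs on top, but the desired-versus-actual comparison is the part that carries over from one tool to the next.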

Clouds
Most clouds work the same way, so if you are familiar with one, you can easily work with other cloud providers. Focus on how to design applications using cloud-native components in a highly available, resilient, secure, and cost-effective manner.

Technical Writing
You might be wondering why I'm talking about technical writing when I'm discussing DevOps. A lot of people don't pay enough attention to it, but it matters a great deal for how you communicate and collaborate with other teams. The future of work is remote, with email, Slack/Teams, and chat being the primary channels for talking to others and communicating ideas.

You may regularly create documents such as runbooks, postmortems, RFCs, architecture decision records, and software design documents. Clear, easy-to-understand documentation can do wonders. It saves time for both you and your readers and increases overall productivity. I suggest you read this article.

Site Reliability Engineering
The line between DevOps and SRE is getting thinner. In some organizations, the same person may serve both roles. Learn the concepts behind SLIs, SLOs, and error budgets, as well as SRE practices. Every organization does it differently, so I don't recommend copy-pasting someone else's culture into your team. Use Google's SRE culture as a reference.
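
The relationship between an SLI, an SLO, and an error budget comes down to a little arithmetic. The sketch below works through it for a request-based SLO; the 99.9% target and the request counts are made-up numbers for illustration, and `error_budget` is a hypothetical helper.

```python
def error_budget(slo: float, total: int, failed: int) -> dict:
    """Summarise error-budget usage for a request-based SLO.

    Assumes total > 0 and an SLO expressed as a fraction, e.g. 0.999.
    """
    allowed = (1 - slo) * total      # failures the SLO tolerates
    sli = (total - failed) / total   # measured success ratio
    return {
        "sli": sli,
        "allowed_failures": allowed,
        "budget_consumed": failed / allowed if allowed else float("inf"),
    }

# Hypothetical month: one million requests, 400 of them failed.
report = error_budget(slo=0.999, total=1_000_000, failed=400)
print(f"SLI: {report['sli']:.4%}")
print(f"Allowed failures: {report['allowed_failures']:.0f}")
print(f"Budget consumed: {report['budget_consumed']:.0%}")
```

With these numbers the service may fail about 1,000 requests in the period and has used roughly 40% of that budget, which is the kind of figure teams use to decide whether to keep shipping or slow down.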

Conclusion
This is not a definitive list, and it will change over time. Personally, these are the things I'm excited to follow this year:

Service meshes - Istio, Cilium's sidecarless mesh, and offerings from Tetrate and Solo (Gloo Mesh).

How to improve developer productivity? It's a mix of culture, automation, and tools.

SRE Platform - Honeycomb, Last9.

DevPortals - again tied to the drive to increase productivity and bridge knowledge gaps.

Observability - Technologies such as OpenTelemetry, Hypertrace, Thanos, VictoriaMetrics, Vector, etc.

Security - Supply chain security, code signing, enhanced cloud security.

Golang - Improve current skills.

Serverless Computing and Event-Driven Architecture

Web3 - Understanding the DevOps and infrastructure landscape around it.

Be curious and keep learning. Bite-sized study sessions are easy to sustain while working full-time. If you still have any questions, please feel free to book some time with me. I'd be happy to help.


观测云