In the era of microservices, cloud computing, and serverless architecture, it is very useful to understand Kubernetes and know how to use it. However, the official Kubernetes documentation can be difficult to follow for users who are new to cloud computing. In this article, we will go through the important concepts in Kubernetes. In the following articles of this series, we will also learn how to write configuration files, use Helm as a package manager, create a cloud infrastructure, use Kubernetes to orchestrate services, and create a CI/CD pipeline to automate the entire workflow. With this knowledge, you can start almost any kind of project and build a powerful infrastructure.
First of all, containers have many benefits, from fast deployment to consistent delivery at scale. Even so, containers are not the solution to every problem, because they bring their own overhead, such as maintaining a container orchestration layer. You should therefore weigh the costs and benefits at the beginning of a project.
Now, let us start our journey into the Kubernetes world!
Kubernetes hardware structure
Node
A node is a worker machine in Kubernetes. It can be any device with a CPU and RAM: a smart watch, a smartphone, a laptop, or even a Raspberry Pi can become a node. In the cloud, a node is usually a virtual machine (VM). In simple terms, a node is an abstraction of a single machine. The advantage of this abstraction is that we don't need to know the underlying hardware. We only work with nodes, so our infrastructure is independent of the platform.
Cluster
A cluster is a group of nodes. When you deploy a program to the cluster, Kubernetes automatically distributes the work across the nodes. If more resources are needed (in short, if we are ready to spend more money), new nodes can be added to the cluster and the work will be redistributed automatically.
We run our code on the cluster, but we don't need to care which part of the code runs on which node. The assignment of work is automatic.
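To make the idea of automatic work assignment more concrete, here is a toy sketch in Python. It is not the algorithm the real Kubernetes scheduler uses. It only assumes that each pod asks for some memory and is placed on the node with the most free memory. All node names, pod names, and sizes are made up for illustration.

```python
# Toy illustration only: the real Kubernetes scheduler uses far richer rules.
def schedule(pods, nodes):
    """Place each pod on the node with the most free memory that can fit it.

    pods:  dict of pod name -> memory request in MiB
    nodes: dict of node name -> memory capacity in MiB
    Returns a dict of pod name -> node name (None if no node can fit the pod).
    """
    free = dict(nodes)  # remaining memory per node
    placement = {}
    for pod, request in pods.items():
        candidates = [name for name in free if free[name] >= request]
        if not candidates:
            placement[pod] = None  # the pod stays unscheduled
            continue
        best = max(candidates, key=lambda name: free[name])
        free[best] -= request
        placement[pod] = best
    return placement


# Hypothetical cluster and workload.
nodes = {"node-a": 1024, "node-b": 2048}
pods = {"web": 512, "api": 1024, "worker": 1024, "batch": 2048}

before = schedule(pods, nodes)

# Adding a node gives the scheduler more room, so "batch" can now be placed.
nodes["node-c"] = 4096
after = schedule(pods, nodes)

print("before:", before)
print("after: ", after)
```

In the first run there is no room left for the batch pod. After the third node joins, every pod finds a place without us naming a node for any of them.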
Persistent volumes
Because our code can move from one node to another (for example, when a node does not have enough memory, the work is rescheduled to a node with sufficient memory), data saved on a node is easily lost. If we want to keep our data permanently, we should use persistent volumes. A persistent volume is similar to an external hard disk: you can plug it in and save your data on it.
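In practice, a pod usually asks for persistent storage through a PersistentVolumeClaim. The following minimal sketch builds such a claim as a Python dictionary and prints it as JSON. The claim name and size are assumptions chosen for illustration, and a real cluster may also require a storage class.

```python
import json


def make_pvc(name, size="1Gi"):
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume can be mounted read-write by one node.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# "example-data" and "5Gi" are hypothetical values.
pvc = make_pvc("example-data", "5Gi")
print(json.dumps(pvc, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output could be saved to a file and applied with kubectl apply -f.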
Kubernetes, originally developed by Google, was designed as a platform for stateless applications, with persistent data stored elsewhere. As the project matured, many companies wanted to use it for stateful applications as well, so the developers added persistent volume management. As with early virtualization technology, database servers are usually not the first servers to migrate to the new architecture. This is because the database is the core of many applications and may contain a lot of important information, so on-premises database systems often still run on virtual or physical machines.
So, when should we start using persistent volumes? To answer this question, we first need to understand the different types of database applications.
We divide data management solutions into the following two categories:
- Vertical scaling: traditional RDBMS solutions, such as MySQL, PostgreSQL, and SQL Server
- Horizontal scaling: "NoSQL" solutions, such as Elasticsearch or Hadoop-based solutions
Vertically scaling solutions (such as MySQL, PostgreSQL, and Microsoft SQL Server) should generally not be run in containers. These database platforms require high I/O, shared disks, and block storage, and they cannot gracefully handle the loss of a node in the cluster, which happens often in a container-based ecosystem.
Containers work well for horizontally scalable applications (such as Elasticsearch, Cassandra, Kafka, etc.). These can withstand the loss of a node in the database cluster, and the database application can rebalance on its own.
As a rule of thumb, containerize distributed databases that use redundant storage techniques and can handle the loss of a node in the database cluster (Elasticsearch is a good example).
Kubernetes software components
Container
One of the goals of modern software development is to keep applications isolated from each other on the same host or cluster. Virtual machines are one solution to this problem, but each virtual machine needs its own operating system, so its size is usually measured in gigabytes.
Containers take the opposite approach: they isolate the execution environment of an application but share the underlying operating system. A container is therefore like a box in which we store everything needed to run the application: code, runtime, system tools, system libraries, settings, and so on. Containers usually need only a few megabytes, far less than a virtual machine, and can start almost immediately.
Pods
A Pod is a group of one or more containers, and it is the smallest deployable unit in Kubernetes. A pod can contain multiple containers, but usually we run only one container per pod, because the pod is also the smallest unit of replication in Kubernetes: all containers in a pod are scaled together. If we want to scale each container independently, we should put each container in its own pod.
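As an illustration, a pod with a single container can be described with a small manifest. The sketch below builds one as a Python dictionary. The pod name, image, and port are assumptions picked for the example.

```python
import json


def make_pod(name, image, port):
    """Build a single-container Pod manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # The label lets other objects (for example a Service) select this pod.
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }
            ]
        },
    }


# Hypothetical pod running an NGINX image on port 80.
pod = make_pod("example", "nginx:1.25", 80)
print(json.dumps(pod, indent=2))
```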
Deployments
The main function of a Deployment is to provide declarative updates for pods and ReplicaSets (a ReplicaSet keeps a given number of copies of the same pod running). Using a Deployment, we can specify how many replicas of the same pod should be running at any time. A Deployment works like a pod manager: it automatically starts the required number of pods, monitors them, and recreates them when they fail. Deployments are extremely useful because you don't need to create and manage each pod separately.
We usually use Deployments for stateless applications. However, you can persist the state of a Deployment and make it stateful by attaching a persistent volume to it.
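A minimal sketch of a Deployment manifest, again built as a Python dictionary, may help here. The application name, image, and replica count are illustrative assumptions. The important detail is that the selector labels must match the labels in the pod template.

```python
import json


def make_deployment(name, image, replicas):
    """Build a Deployment manifest that keeps `replicas` copies of a pod running."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector tells the Deployment which pods belong to it.
            "selector": {"matchLabels": dict(labels)},
            # The template describes the pods that will be created.
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical deployment with three replicas.
deployment = make_deployment("example", "nginx:1.25", 3)
print(json.dumps(deployment, indent=2))
```

Changing the replicas value and applying the manifest again is all it takes to scale the application up or down.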
Stateful Sets
StatefulSet is a relatively new concept in Kubernetes, used to manage stateful applications. It manages the deployment and scaling of a set of pods and guarantees the ordering and uniqueness of these pods. It is similar to a Deployment. The main difference is that a Deployment creates pods with arbitrary names and does not care about their order, while pods created by a StatefulSet have unique names and a fixed order. So, if you want to create 3 replicas of a pod named example, the StatefulSet creates them as example-0, example-1, and example-2. The most important advantage of this approach is that each pod has a stable, predictable identity that you can read directly from its name.
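The naming rule described above is simple enough to sketch in a few lines of Python. The function name is hypothetical. It only mirrors the pattern of the set name followed by an ordinal index.

```python
def statefulset_pod_names(name, replicas):
    """Return the pod names a StatefulSet called `name` would use, in order."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


names = statefulset_pod_names("example", 3)
print(names)  # ['example-0', 'example-1', 'example-2']
```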
DaemonSets
A DaemonSet ensures that a copy of a pod runs on every node in the cluster. When a node is added to or removed from the cluster, the DaemonSet automatically adds or removes the pod. This is very useful for monitoring and logging, because every node is covered automatically and you do not need to set up monitoring by hand as the cluster changes.
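The behavior can be pictured as a small reconciliation step: compare the nodes that exist with the nodes that already run the pod. The sketch below is a simplified model written for this article, not the actual controller code. The node and pod names are made up.

```python
def reconcile_daemon_pods(nodes, pods_by_node):
    """Compare existing nodes with the nodes that already run the daemon pod.

    nodes:        iterable of node names currently in the cluster
    pods_by_node: dict of node name -> name of the daemon pod running there
    Returns (nodes that need a new pod, nodes whose pod should be removed).
    """
    current_nodes = set(nodes)
    covered_nodes = set(pods_by_node)
    to_add = sorted(current_nodes - covered_nodes)
    to_remove = sorted(covered_nodes - current_nodes)
    return to_add, to_remove


# Hypothetical state: node-b and node-c just joined, node-d has left.
nodes = ["node-a", "node-b", "node-c"]
running = {"node-a": "log-agent-1", "node-d": "log-agent-2"}

to_add, to_remove = reconcile_daemon_pods(nodes, running)
print("start pods on:", to_add)
print("remove pods from:", to_remove)
```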
Services
While a Deployment is responsible for keeping a group of pods running, a Service is responsible for enabling network access to a group of pods. Services provide standardized features across the cluster: load balancing, service discovery between applications, and support for zero-downtime application deployments. Each Service has a unique IP address and DNS host name. You configure the applications that need to use the Service with that IP address or host name, and the traffic is then load balanced to the correct pods. In the external traffic section, we will learn more about the types of Services and how internal services communicate with the outside world.
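The following sketch shows what a basic Service manifest and its DNS name could look like. It assumes the pods carry an app label equal to the service name and that the cluster uses the default cluster.local domain. The names and ports are illustrative.

```python
import json


def make_service(name, port, target_port):
    """Build a ClusterIP Service that forwards `port` to `target_port` on the pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            # Traffic goes to every pod whose labels match this selector.
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def service_dns_name(name, namespace="default", cluster_domain="cluster.local"):
    """Return the in-cluster DNS name of a Service (default domain assumed)."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


service = make_service("example", 80, 8080)
dns_name = service_dns_name("example")
print(json.dumps(service, indent=2))
print(dns_name)
```

Other applications in the cluster can then reach the pods through the DNS name instead of tracking individual pod IP addresses.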
ConfigMaps
If you deploy to multiple environments, such as development, staging, and production, baking configuration into the application is not good practice because the environments differ. Ideally, each deployment environment gets its own configuration. This is what ConfigMaps are for. ConfigMaps allow you to decouple configuration artifacts from images, which keeps containerized applications portable.
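As a rough sketch, the same image can be paired with a different ConfigMap per environment. The application name, the naming scheme, and the settings below are assumptions made up for illustration.

```python
import json


def make_configmap(app, environment, settings):
    """Build a ConfigMap manifest for one environment of an application."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{app}-{environment}-config"},
        # ConfigMap values must be strings, so convert everything explicitly.
        "data": {key: str(value) for key, value in settings.items()},
    }


# Hypothetical per-environment settings for the same container image.
environments = {
    "development": {"LOG_LEVEL": "debug", "MAX_CONNECTIONS": 10},
    "production": {"LOG_LEVEL": "warning", "MAX_CONNECTIONS": 500},
}

configmaps = {
    env: make_configmap("example", env, settings)
    for env, settings in environments.items()
}
print(json.dumps(configmaps["production"], indent=2))
```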
External traffic
Now that you know about the services running in the cluster, how do you get external traffic into your cluster? There are three types of Services that can handle external traffic: ClusterIP, NodePort, and LoadBalancer. There is also a fourth option: adding another abstraction layer called an Ingress controller.
ClusterIP
ClusterIP is the default Service type in Kubernetes. It allows services to communicate with each other within the cluster. Although ClusterIP is not designed for external access, external traffic can still reach the service through a proxy (for example, kubectl proxy). Do not use this approach in a production environment, but it is handy for debugging. Services declared as ClusterIP should not be directly visible from outside the cluster.
NodePort
As we saw in the first part of this article, pods run on nodes. Nodes can be a variety of devices, such as laptops or, when running in the cloud, virtual machines. Each node has its own IP address. When you declare a Service as NodePort, the Service is exposed on a fixed port of every node's IP address, so you can reach it from outside the cluster. You can use NodePort in a production environment, but for large applications with many services, manually managing all the different IP addresses is cumbersome.
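A NodePort Service differs from a ClusterIP Service mainly in its type and in the extra port opened on each node. The sketch below assumes the default node port range of 30000 to 32767. The service name and port numbers are illustrative.

```python
import json

# Default range for node ports; clusters can be configured differently.
NODE_PORT_RANGE = range(30000, 32768)


def make_nodeport_service(name, port, target_port, node_port):
    """Build a NodePort Service exposed on `node_port` of every node."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"node port {node_port} is outside 30000-32767")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": name},
            "ports": [
                {"port": port, "targetPort": target_port, "nodePort": node_port}
            ],
        },
    }


nodeport_service = make_nodeport_service("example", 80, 8080, 30080)
print(json.dumps(nodeport_service, indent=2))
```

With this manifest, the service would be reachable at port 30080 on the IP address of any node in the cluster.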
LoadBalancer
By declaring a Service of type LoadBalancer, you can expose it to the outside world through the cloud provider's load balancer. How the external load balancer routes traffic to the Service's pods depends on the cluster provider. With this solution, you don't have to manage the IP addresses of the nodes in the cluster. The disadvantage is that each Service gets its own load balancer, and you are charged per load balancer instance.
This solution is suitable for production environments, but it can be expensive. Next, let's look at a cheaper solution.
Ingress
Ingress is not a Service type but an API object that manages external access to the services in a cluster. It acts as a reverse proxy and a single entry point into your cluster, routing requests to different services. I usually use the NGINX Ingress Controller, which acts as the reverse proxy and also handles SSL termination. The best way to expose the Ingress in production is through a load balancer.
With this solution, you can expose any number of services with a single load balancer, so you can keep costs to a minimum.
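To illustrate how one entry point can serve several services, here is a minimal sketch of an Ingress manifest built in Python. The host name, paths, service names, and the nginx ingress class are assumptions chosen for the example.

```python
import json


def make_ingress(name, host, routes, ingress_class="nginx"):
    """Build an Ingress that routes path prefixes on `host` to services.

    routes: dict of path prefix -> (service name, service port)
    """
    paths = [
        {
            "path": path,
            "pathType": "Prefix",
            "backend": {
                "service": {"name": service, "port": {"number": port}}
            },
        }
        for path, (service, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # Assumes an ingress controller registered under this class name.
            "ingressClassName": ingress_class,
            "rules": [{"host": host, "http": {"paths": paths}}],
        },
    }


# Hypothetical routing: two services behind one host and one load balancer.
ingress = make_ingress(
    "example-ingress",
    "example.com",
    {"/api": ("api", 8080), "/": ("web", 80)},
)
print(json.dumps(ingress, indent=2))
```

Both services stay of type ClusterIP in this setup. Only the ingress controller itself needs to be exposed through a load balancer.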
Summary
In this article, we learned about the basic concepts of Kubernetes and its hardware structure. We also discussed the different software components, such as Pods, Deployments, StatefulSets, and Services, and learned how services communicate with the outside world. I hope it helps you make sense of the intricate component architecture of Kubernetes.