
This article is based on a talk I shared online in [deeplus live issue 247].

Today's content is divided into the following four parts:

  • 1. Origin
  • 2. An overview of containerization technologies
  • 3. An introduction to Redis
  • 4. A comparison of Redis containerization schemes

1. Origin

First, let me explain why I am sharing this topic today. A friend and I run a Redis technology exchange group that has been active for about six years. One day, a member of the group raised a question:

He asked whether anyone ran their production Redis in Docker, and what people thought of a scheme where Docker uses the host's network mode and the disks use local mounts. Let's set this scheme aside for now; after today's talk, I believe everyone will have a clearer understanding of it and a better basis for evaluating it.

2. An overview of containerization technologies

1. chroot and jails

Containerization actually has a long history. Although the container technology we use today, Kubernetes, and the concept of cloud native have only become popular in recent years, containerization itself started very early. The earliest ancestor is chroot, which many of you may have used or at least heard of: it appeared in Unix in 1979, and its main function is to change the root directory seen by a process and its children.

What can chroot achieve? Point chroot at a directory and then start a process, and what that process sees as / (what we usually call the root directory) is actually the folder, or path, we just specified. In this way, files on the operating system, and things related to permissions and security, can be effectively protected.
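As a minimal sketch (assuming a statically linked busybox binary is available on the host), a chroot session looks like this:

```bash
# Build a tiny root tree and switch into it.
mkdir -p /tmp/newroot/bin
cp /bin/busybox /tmp/newroot/bin/        # assumes busybox is statically linked
ln -s busybox /tmp/newroot/bin/sh

sudo chroot /tmp/newroot /bin/sh
# Inside this shell, "/" is /tmp/newroot; the host's real / is invisible.
```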

In 2000, a new technology called jails appeared (in FreeBSD). It was already a sandbox, the embryonic form of the sandbox environment. With jails, a process, or the environment created for it, can have an independent network interface and IP address. Mentioning jails immediately brings up one issue: with an independent network interface and IP address, raw sockets are restricted. The ping command is the everyday tool that relies on raw sockets most. By default a jail is not allowed to use raw sockets, but it does have its own network interface and IP address, which makes it feel like the virtual machines we commonly use.

2. Linux VServer and OpenVZ

Then in 2001, a new technology called Linux VServer appeared in the Linux community. Linux VServer is sometimes abbreviated lvs, but it is different from the layer-4 proxy LVS we usually use. It is a patch set for the Linux kernel: once the kernel is modified, it supports system-level virtualization. With Linux VServer, system calls are shared and there is no emulation overhead, meaning the usual system calls and their functionality can be used directly.

In 2005, a new technology appeared: OpenVZ. OpenVZ is very similar to Linux VServer; it is also a patch set for the kernel. It patched Linux heavily and added many new features, but as of 2005 none of this had been merged into the Linux mainline. With OpenVZ, each container can have its own /proc and its own /sys.

We all know that in Linux, when a process starts it can see its own information under /proc/self. If each environment has its own independent /proc, it can effectively be isolated from other processes.

Another notable feature is independent users and groups: an environment can have users and groups of its own, independent of the other users and groups on the system.

Also, OpenVZ saw wide commercial use: many hosting providers and VPS products abroad were built on OpenVZ.

3. namespace and cgroups

In 2002 came namespaces: Linux gained a technology called namespace, which isolates specific resources within a process group. There are many kinds of namespaces that we commonly use, such as PID, net, and so on, and processes that are not in the same namespace cannot see each other's namespaced resources.

In 2013, a new namespace was added: the user namespace. With user namespaces we get roughly the same independent-users-and-groups capability that OpenVZ implemented, as mentioned above.

There are three main ways to operate on namespaces; a small sketch follows the list.

1) Clone

When creating a child process with clone(2), you can specify which new namespaces the child should be placed in.

2) Unshare

unshare stops sharing namespaces with other processes. For example, running unshare with --net detaches the process from the others: it no longer shares its network namespace with them.

3) Setns

setns sets the namespace of a process, i.e. moves it into an existing namespace.
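A hedged sketch using the util-linux command-line wrappers (clone(2) itself is used programmatically; the target PID below is illustrative):

```bash
# unshare: start a shell in its own network and PID namespaces.
sudo unshare --net --pid --fork --mount-proc /bin/bash
ip addr   # inside: only a loopback interface, none of the host's interfaces

# setns: enter the network namespace of an existing process.
sudo nsenter --target 12345 --net ip addr
```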

In 2008, cgroups were merged into the Linux kernel. They isolate the resource usage of process groups, for example CPU, memory, disk, and network. Especially after cgroups were redesigned in 2013 to work together with user namespaces, they became much more modern; the familiar Docker features we use today essentially come from that point. So cgroups and namespaces form the basis of modern container technology.
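A minimal sketch, assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup with the memory and cpu controllers enabled:

```bash
# Create a group and cap memory at 256 MiB and CPU at half a core.
sudo mkdir /sys/fs/cgroup/redis-demo
echo 256M           | sudo tee /sys/fs/cgroup/redis-demo/memory.max
echo "50000 100000" | sudo tee /sys/fs/cgroup/redis-demo/cpu.max
# Move the current shell into the group; its children inherit the limits.
echo $$ | sudo tee /sys/fs/cgroup/redis-demo/cgroup.procs
```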

4. LXC and CloudFoundry

In 2008, a new technology appeared: LXC, also called Linux Containers (hereinafter LXC). We mentioned many containerization technologies above, such as Linux VServer and OpenVZ, but those were implemented with out-of-tree patches, while LXC was the first to work directly on the upstream Linux kernel.

LXC supports unprivileged containers: it can do uid and gid mapping on the machine, so containers do not need to be started by the root user, which is very convenient and greatly reduces the attack surface. The most common LXC operations are lxc-start, which starts a container, and lxc-attach, which enters one.
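The everyday LXC workflow looks roughly like this (the distribution and release passed to the download template are illustrative):

```bash
lxc-create -n demo -t download -- -d ubuntu -r focal -a amd64  # create from an image
lxc-start  -n demo     # start the container
lxc-attach -n demo     # get a shell inside it
lxc-stop   -n demo     # stop it again
```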

By 2011, CloudFoundry appeared. It used a combination of LXC and Warden. Worth mentioning here is that its architecture is the client-server (CS) model: there is a client and a server. A Warden container usually has two layers: one is a read-only layer holding the operating system's file system, and the other is a non-persistent read-write layer for the application and its dependencies; the container is the combination of the two.

Most of the technologies mentioned earlier target a single machine. The biggest difference with CloudFoundry is that it can manage container clusters across machines, which already shows the characteristics of modern container technology.

5. LMCTFY and systemd-nspawn

In 2013, Google open sourced its own containerization solution, LMCTFY. It supports isolation of CPU, memory, and devices, and it supports sub-containers, allowing an application to be aware that it is running inside a container and even to create sub-containers for itself. After 2013, however, Google gradually realized that developing all of this alone would always be limited, so it shifted its focus to abstraction and porting, and ported LMCTFY's core features to libcontainer. libcontainer later became the core of Docker's runtime, was donated by Docker to the OCI, and from there evolved into runC. We will come back to this in detail later.

Everyone knows that a server must have a process with PID 1: the initial process, the daemon of daemons. Most modern systems use systemd, and systemd also provides a containerization solution, systemd-nspawn, which integrates with the systemd tool chain.

Besides systemctl and the like, systemd provides machinectl for managing machines. machinectl supports two main kinds of interfaces: one for managing containers and one for managing virtual machines.

The container solution provided by systemd lets us interact with containers through machinectl: for example, you can start a systemd-supported container with machinectl start and power it off with machinectl poweroff. This technology supports isolation of both resources and networks. At its core, systemd-nspawn uses namespaces for isolation; for resources it relies on systemd itself, which uses cgroups. So once again it is a combination of those two technologies.
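A hedged sketch, assuming a bootable root filesystem has been placed under /var/lib/machines/demo:

```bash
sudo systemd-nspawn -D /var/lib/machines/demo --boot   # boot it directly, or:
machinectl start demo      # start it as a registered machine
machinectl list            # list running containers and VMs
machinectl poweroff demo   # shut it down cleanly
```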

6. Docker

Docker also appeared in 2013. Generally speaking, Docker is the leader of the container era. Why? Because when Docker appeared in 2013, it was the first to propose a standardized deployment unit, the Docker image. At the same time it launched Docker Hub, the central image registry, letting everyone download pre-built images and start a container with a single line of docker run.

Where so many technologies had been cumbersome and complicated to use, Docker proposed that one line of docker run is all it takes to start a container. This greatly reduced the complexity of starting containers and improved convenience.
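For example (the image tag is illustrative):

```bash
# One line pulls the image from Docker Hub if needed and starts the container.
docker run -d --name redis-demo -p 6379:6379 redis:6.0
docker exec -it redis-demo redis-cli ping   # -> PONG
```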

So Docker technology became popular all over the world. And what functions does Docker provide? For example, isolation and management of resources. Before version 0.9, Docker's container runtime was LXC; from 0.9 on it replaced LXC with libcontainer, which, as mentioned above, is where the core of Google's LMCTFY ended up. libcontainer was then donated to the OCI. So what is Docker's container runtime today? It is containerd; below containerd sits runc, and the core of runc is libcontainer.

By 2014, Google found that most containerization solutions only addressed a single machine. At the same time, because Docker also has a CS architecture, it requires a daemon, the Docker daemon, and that daemon must be started by the root user. A daemon running as root enlarges the attack surface, so Docker's security has been criticized by many people.

Seeing this, Google open sourced Kubernetes, based on its internal Borg system, and united with several companies to found the Cloud Native Computing Foundation (CNCF).

7. Kubernetes

Generally speaking, Kubernetes is the cornerstone of cloud-native applications: after Kubernetes appeared, cloud-native technology gradually developed and came to lead the trend. Kubernetes provides several main features.

It supports flexible scheduling, control, and management. Besides the default scheduler, scheduling is easy to extend: we can write our own scheduler, or use affinity and anti-affinity, features we use quite frequently.

It also provides built-in services such as DNS (kube-dns, or CoreDNS nowadays) for service discovery through domain names, and Kubernetes has many controllers that reconcile the cluster toward the state we declare. For example, if a pod goes down, a controller can automatically restore it to the desired state.

It also supports a rich set of resource types, at several main levels: the smallest is the pod, and above it Deployments, StatefulSets, and similar resources.

The last point, and the one we like most, is its rich extension mechanism through CRDs: you can write your own custom resources and extend Kubernetes with them.

8. More containerization technologies

Besides the main technologies just mentioned, there are many we have not covered, such as runc, introduced only briefly above, and containerd. containerd is the core that Docker open sourced from its own codebase; its goal is to be a standardized, industry-usable container runtime. CoreOS's open source solution was called rkt, which targeted exactly the Docker security issues mentioned above, but the rkt project has since been terminated.

There is also podman, open sourced by Red Hat. Podman can start and manage containers, and it has no daemon, so in terms of security podman can intuitively be considered better than Docker. Its convenience is somewhat reduced, for example around restarting containers and starting them at boot, but there are various solutions for those.

In 2017 came Kata Containers, which had its own period of development. At first there was Intel, working on its own secure container runtime, and there was also a startup called hyper.sh working on its own container runtime. Both companies aimed at more secure containers, and the underlying technology of both was virtualization (KVM). hyper.sh's open source solution was runV; Intel took a liking to it, the two efforts merged, and Kata Containers was born. In 2018, AWS open sourced its own Firecracker.

These two technologies are quite different from the machine-level containerization technologies mentioned above, because underneath they are virtual machines; we usually regard this as lightweight-virtual-machine containerization. That concludes the introduction to the various containerization technologies.

3. Redis introduction

Next, let's move on to Redis. The following introduction is excerpted from the official Redis website.

1. The main scenarios where Redis is used

Redis is now the most widely used KV database. The main usage scenarios are roughly as follows:

  • As a cache, placed in front of the database;
  • As a DB, when you actually need it to store and persist data;
  • As a message queue: it supports many data types, which I won't go into here.

2. Features of Redis

  • It uses a single-threaded model. Redis can actually have multiple threads, but only one worker thread. Starting with Redis 6.0 it supports I/O multithreading, but the extra threads only handle the network-related parts; the data itself is still processed by a single thread, so overall we still call it a single-threaded model.
  • Redis keeps its data in memory; it is an in-memory database.
  • Regarding HA: we used to rely mainly on Redis Sentinel for Redis HA, and since Redis Cluster came out we mainly rely on Redis Cluster. These are the two main HA solutions; a minimal Sentinel configuration sketch follows this list.
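For reference, a minimal sentinel.conf sketch (addresses and quorum are illustrative): Sentinel monitors a master and triggers failover to a replica once a quorum of sentinels agree the master is down.

```
# Monitor the master named "mymaster" at 127.0.0.1:6379 with a quorum of 2.
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```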

4. Comparison of Redis containerization schemes

When we talk about Redis operation and maintenance, we need to consider:

  • Deployment: how to deploy quickly, how to manage the listening ports so they do not conflict, and how to handle logs and persistence files;
  • Expansion/shrinking, a problem we often encounter;
  • Monitoring and alerting;
  • Failure and recovery.

These are the aspects we care about most; I will introduce each of them next.

1. Deployment

For single-machine, multi-instance Redis deployments, the first thing we want is process-level resource isolation: every Redis instance deployed on a node should have its own resources, unaffected by the other instances.

Process-level resource isolation actually has two aspects: CPU and memory. Secondly, on a single machine we also want port management, or independent network resource isolation.

For process-level resource isolation, given all the containerization technologies introduced above, we already know the simplest solution is cgroups; and for network resource isolation we have namespaces. So any solution that supports cgroups and namespaces can satisfy the needs here.

The other option is virtualization, as mentioned above, for example Kata Containers. Kata Containers is virtualization-based, and as everyone who has touched virtualization knows, such technologies are fully isolated by default from the start.

So for deployment, whether you use Docker or systemd-nspawn (both use cgroups and namespaces), either will work; you just need to weigh convenience. With Docker, for example, you run one docker command per instance, map each to a different port, and you are done.

With systemd-nspawn you need to write some configuration files, and with a virtualization solution you also need to prepare images. A sketch of the Docker approach for two instances follows.
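A minimal sketch; the names, ports, paths, and limits are illustrative:

```bash
# Two isolated Redis instances on one host, each with its own port,
# data directory, and cgroups limits.
docker run -d --name redis-0 -p 6379:6379 -v /data/redis-0:/data \
  --memory 2g --cpus 1 redis:6.0 redis-server --appendonly yes
docker run -d --name redis-1 -p 6380:6379 -v /data/redis-1:/data \
  --memory 2g --cpus 1 redis:6.0 redis-server --appendonly yes
```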

2. Expansion/shrinking

There are two main scaling scenarios. One is adjusting maxmemory on a single instance, that is, adjusting its maximum memory. The other concerns our clustering solution, Redis Cluster: when the cluster is scaled, two kinds of change are involved.

One is Node changes: new nodes may be added, and nodes may also be removed.

The other is Slot changes, that is, migrating slots. These are of course related to node changes: after we scale out a Redis Cluster by adding nodes, we need to assign slots to them, or we may want to concentrate certain slots on certain nodes. Such requirements do exist.

Let's look at maxmemory adjustment first. If, on the premise that we have already containerized, we restrict resources through cgroups, then we need a solution that supports dynamically adjusting cgroups quotas.

For example, docker update can directly modify the cgroups resource limits of a given container: we can docker update the container to give it more memory, raising the maximum memory it can use, and once the available memory has been increased we can then raise the instance's maxmemory. In other words, for single-instance maxmemory the key is having cgroups and support for adjusting them; a sketch follows.
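A hedged sketch (sizes are illustrative; keep maxmemory below the container limit):

```bash
# Grow the cgroups limit first, then grow maxmemory inside Redis.
docker update --memory 4g --memory-swap 4g redis-0
docker exec redis-0 redis-cli config set maxmemory 3gb
```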

Changes to cluster nodes will be covered in detail later.

3. Monitoring and alerting

The third point is monitoring and alerting. Whether you use physical machines, a cloud environment, or any other solution, what we most want from monitoring and alerting is automatic discovery.

We hope that when an instance starts, we immediately know it is up and what its state is. This part actually does not depend on a specific containerization technology: even a deployment on pure physical machines can be discovered automatically and registered into the monitoring system. So monitoring and alerting does not depend on any particular container technology. The only nuance is that with a containerized solution, the commonly used redis_exporter, paired with Prometheus, can be placed in the same network namespace as the redis-server.

If the two are in the same network namespace, the exporter can reach Redis directly at 127.0.0.1:6379; with k8s we can simply put them in the same pod. Either way it does not matter, because this does not rely on specific containerization technology: you can monitor and alert with whatever you like. A sketch follows.
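A hedged sketch using Docker (the exporter image and flag are as published by the oliver006/redis_exporter project; the container name is illustrative):

```bash
# Run redis_exporter inside the Redis container's network namespace,
# so it scrapes via 127.0.0.1.
docker run -d --name redis-exporter --network container:redis-0 \
  oliver006/redis_exporter --redis.addr=redis://127.0.0.1:6379
```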

4. Failure recovery

The last part is failure and recovery. Here I think there are three main aspects:

  • The first is restarting the instance.

In some scenarios, your instance may crash while running. If you want it to restart automatically, you need a process management tool. We mentioned systemd above: systemd can start a given process automatically, and it can also restart the process after it dies. If you use Docker, it has restart policies: you can set one to always or on-failure and have the container pulled back up when it exits.

If you are using k8s, it is even simpler: pods are pulled back up automatically. A sketch of the Docker and systemd options follows.
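A minimal sketch of both options (names illustrative):

```bash
# Docker: restart the container automatically whenever the process fails.
docker run -d --restart=on-failure --name redis-0 redis:6.0
# systemd: the equivalent lines in a unit file would be
#   [Service]
#   Restart=on-failure
```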

  • The second is master-slave switchover.

Master-slave switchover is relatively routine, but I list it under failure recovery because it may happen while your cluster state is already unhealthy; that situation does exist. What do we need for a master-slave switchover? First, data persistence. Second, the switchover may fail due to insufficient resources, which relates to the scaling discussed above; in that case you must add resources, and the best way is to add them automatically.

In k8s, if we want resources added automatically, we usually set the pod's request and limit, that is, the resource quota, and hope it can be raised automatically; this is what we usually call VPA (Vertical Pod Autoscaler). A hedged sketch follows.
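A sketch of a VerticalPodAutoscaler object (this assumes the VPA components are installed in the cluster; the target names are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: redis-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: redis
  updatePolicy:
    updateMode: "Auto"   # let VPA apply its recommendations
```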

  • The third is data recovery.

Generally speaking, if we have enabled RDB and AOF and our data is preserved, the instance can load the data directly when it is restored. So data persistence is also one aspect.

What do we need to consider when containerizing? If you are using Docker, you need to mount a volume so the data can be persisted; likewise with systemd-nspawn, you need a file directory to persist the data.

If you are using k8s, you have a wide choice when mounting volumes: for example a ceph RBD volume, S3, or a local file directory. For higher reliability, distributed storage with more replicas may be used. A PVC sketch follows.
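A hedged sketch of a PVC for one instance's /data directory (the storage class name is an assumption, here a ceph RBD class as provided by rook):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rook-ceph-block   # assumed rook/ceph storage class
  resources:
    requests:
      storage: 10Gi
```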

5. Node changes

Next, the node changes mentioned above when we discussed scaling. Say I want one of my Redis clusters to expand with some new nodes. An added node must be able to join the cluster and communicate with it normally; only then has it truly joined.

We also know that when you build a Redis Cluster, the biggest problem is on k8s: in k8s, service discovery is done through domain names, but the node addresses (NodeIP) in Redis Cluster only support IPs, not domain names.

So when nodes change, what we need is the ability to dynamically write the node IPs into the cluster configuration.

If you want a complete life cycle, the following structure is taken from an operator called KubeDB. As you can see, for Redis it provides three main parts (a minimal sketch combining them follows the list):

  • PVCs: PVCs handle data persistence.
  • Services: Services handle service discovery.
  • StatefulSets: StatefulSets are a kind of k8s resource, one that is friendlier to stateful applications.
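A hedged sketch of how these three pieces fit together: a headless Service for discovery, a StatefulSet for the pods, and volumeClaimTemplates generating one PVC per pod (names, sizes, and the image tag are illustrative).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  clusterIP: None        # headless: gives each pod a stable DNS name
  selector:
    app: redis
  ports:
  - port: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.0
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:   # one PVC per pod: the persistence piece
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```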

What is still missing from this picture? The company behind Redis, Redis Labs, provides a commercial solution called Redis Enterprise. It is also based on k8s and also uses a Redis operator. Its solution is basically similar to KubeDB's, because the core is still StatefulSets, plus its own controller to finish the job.

5. Summary

Let's look at today's summary. For stand-alone use, we can hand Redis to Docker or other tools that support cgroups and namespace resource management: with cgroups and namespaces you can isolate resources and avoid things like network port conflicts.

And if the situation is like the question from my group member above, where he intends to use the host network and only wants Docker for process management, without introducing anything new, then systemd or systemd-nspawn can be considered, because that is also a containerized solution.

For scheduling and management in complex scenarios, for example at large scale with more flexible scheduling and management, I recommend using a Kubernetes operator, such as KubeDB, which is a workable solution. If your scenario is not that complicated and is relatively simple, some adjustments and modifications on top of native Kubernetes StatefulSets can also meet your needs.

That is the main content of today's sharing. Thank you all for your participation.


Q&A

Q1: If the Redis cluster is built with three physical machines, each running two instances, how do we ensure that the two instances on each physical machine are not master and slave of each other?

A1: This problem is one everyone encounters under normal circumstances. First, if you build it with physical machines and run two instances per machine, with three machines that is six instances in total. Among those six, can you guarantee that no two on the same machine are master and slave of each other? When you create the cluster, you can arrange that directly.

The only problem is that if the cluster then has a failover, an active switch of nodes, it is very likely that your master-slave layout has changed. My suggestion here is that when you notice this situation, switch back manually; for doing this in a physical-machine environment, I have not yet seen a particularly good solution.

Q2: How do we add new nodes when expanding capacity in k8s? How can the steps of capacity expansion and slot allocation be automated?

A2: Let's take it in two steps. The first part is adding new nodes. As I mentioned in the talk, after a new node is added it must first communicate with the cluster, and that requires modifying the cluster's configuration files: communication is done via IP, so you need to write the node's IP (NodeIP) into the configuration so that it can communicate with the other nodes in the cluster. For this part, however, I recommend using an operator. A manual sketch with redis-cli follows.
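For reference, a hedged manual sketch (addresses and node IDs are illustrative placeholders):

```bash
# Join a new node via an existing one, then move slots onto it.
redis-cli --cluster add-node 10.0.0.7:6379 10.0.0.1:6379
redis-cli --cluster reshard 10.0.0.1:6379 \
  --cluster-from <source-node-id> --cluster-to <new-node-id> \
  --cluster-slots 1024 --cluster-yes
```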

Q3: Without the Redis operator or distributed storage, how would one deploy a Redis cluster on k8s?

A3: It is actually possible without the Redis operator. As I introduced just now, there are two modes, and the StatefulSets mode is relatively safer. The most important part is still modifying the configuration: you add an init container before your Redis container, modify the configuration there first, and then bring Redis up so it can join the cluster. That works.

Q4: What is the difference in network latency under different network models?

A4: If we use the physical machine's network directly, the host network, we generally consider it to add no latency; under normal circumstances we ignore it. But if you use an overlay network model, then because it is an overlay, packets must be encapsulated and decapsulated, which of course brings its own resource and performance costs.

Q5: Is it generally recommended that a company's systems share one Redis cluster, or that each system has its own cluster?

A5: I definitely suggest a separate cluster per system. A simple example: suppose one system uses a list. We all know Redis has ziplist configuration options, and they affect your storage and your performance. If everything in your company shares the same cluster, then modifying the Redis cluster's configuration is likely to affect all services. But if each system has its own Redis cluster, they do not affect each other, and you avoid the chain reaction where some application accidentally takes the cluster down.

Q6: How do you approach Redis persistence in production?

A6: Here is how I see it. If you really need persistence, Redis provides two core mechanisms: RDB and AOF. If you have a lot of data, your RDB files may become very large. If you are going to do persistence, I usually suggest making a trade-off between the two. Generally speaking, even in a physical-machine environment, I suggest putting the storage directory on a separate disk, or on distributed storage, provided its write performance is guaranteed so that it cannot drag down your cluster. The stronger recommendation is to enable both; but if your data is not that important, you can enable just AOF.

Q7: Is production-level ceph reliable?

A7: Many people have discussed ceph's reliability. Personally, I believe ceph's reliability can be trusted. I use a lot of ceph here and keep a lot of core data on it, and its reliability has been fine.

The real question is whether you can operate it. There is a company everyone may know called SUSE, which makes a Linux distribution. It provides an enterprise-level storage solution, and underneath it still uses ceph. That is normal: as long as dedicated people run it and fix problems, I think ceph is stable enough.

By the way, if you are using k8s, there is now a project called Rook, which provides a ceph operator. That solution is fairly stable now, and I recommend you try it.

Q8: How should the memory request and limit be set, and how should Redis memory be configured?

A8: There are a few issues to consider here. First, the memory of Redis itself depends on your actual business scenario, on the actual needs of your business; you definitely cannot let the memory of your Redis instance or Redis cluster fill up.

If it fills up, you need to enable an LRU policy to do things like eviction. That is one aspect. The other is requesting memory. I understand the question here to be about the k8s environment, where there is request and limit. Limit is definitely the ceiling on usable memory, and when setting the limit you must account for the memory Redis itself uses.


Welcome to subscribe to my article public account【MoeLove】
