The Feud Between Docker and k8s (3): Docker Arrives in Force

When reprinting, please credit the source: the GrapeCity official website. GrapeCity provides developers with professional development tools, solutions, and services.

In the previous section, we introduced how early PaaS platforms such as Cloud Foundry approached the container problem. This article shows how Docker solved the two problems Cloud Foundry ran into, consistency and reusability, and then compares Docker with traditional virtual machines.

Docker's improvements compared to Cloud Foundry

Use "Mount Namespace" to solve consistency problems

In the first section of this series, we mentioned that Docker quickly displaced Cloud Foundry thanks to its Docker image feature. What is a Docker image, and how does giving each container its own file system solve the consistency problem? First, let's revisit the isolation feature and the Namespace mechanism mentioned in the previous section.

As the "Mount" in its name suggests, Mount Namespace is about file system mounts: it isolates the mount points that a process can see. A "simple" example shows how it works.


(A container written in C that does not implement file isolation)

The above is a short C program containing only two pieces of logic:
1. In the main function, create a child process with clone(), passing the CLONE_NEWNS flag, which enables the Mount Namespace;
2. In the child process, execute /bin/bash so that a shell runs inside the child process.
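The original listing was a figure and is not reproduced here. The following is a minimal sketch of a program matching the two steps described, not the author's exact code. The names container_clone_flags, container_main, and run_in_new_mount_ns are hypothetical, chosen only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* clone() does not allocate a stack for the child; the caller supplies it. */
static char container_stack[STACK_SIZE];

/* CLONE_NEWNS puts the child in a new Mount Namespace; SIGCHLD lets the
 * parent wait for it like an ordinary child process. */
int container_clone_flags(void)
{
    return CLONE_NEWNS | SIGCHLD;
}

/* Child entry point (hypothetical name): become an interactive shell. */
int container_main(void *arg)
{
    (void)arg;
    char *const argv[] = { "/bin/bash", NULL };
    printf("Container - inside the container!\n");
    fflush(stdout);            /* exec would discard buffered output */
    execv(argv[0], argv);
    perror("execv");           /* reached only if execv() failed */
    return 1;
}

/* Run child_fn in a new Mount Namespace and wait for it to exit.
 * Returns 0 on success, -1 if clone() or waitpid() failed. */
int run_in_new_mount_ns(int (*child_fn)(void *))
{
    assert(child_fn != NULL);
    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(child_fn, container_stack + STACK_SIZE,
                      container_clone_flags(), NULL);
    if (pid == -1) {
        perror("clone");       /* typically EPERM when not run as root */
        return -1;
    }
    if (waitpid(pid, NULL, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return 0;
}
```

A main function that simply returns run_in_new_mount_ns(container_main) completes the program. Creating a Mount Namespace normally requires root privileges, so the binary would usually be run as root.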

Let's compile and execute this program:

gcc -o ns ns.c
./ns

This drops us into the shell of the child process. From here we can run ls /tmp to view the directory's contents and compare them with the host's:


(The /tmp directory inside and outside the container)

We will find that the data displayed on both sides is exactly the same. According to the conclusion about Namespaces in the previous part, we should see two different directory listings. Why don't we?

The contents inside and outside the container are the same because of how Mount Namespace works. It changes only the process's view of the file system's mount points, so only mounts performed after the namespace is created differ from the host. Until a mount operation is performed, the child process sees exactly what the host sees.

How do we solve this and achieve file isolation? Besides declaring the Mount Namespace when creating the process, we also need to tell the process to perform a mount operation. Make a small change to the code of the new process, then run it and look again:


(Code and execution effect of file isolation)

This time file isolation succeeds: the child process's /tmp has been mounted as tmpfs (an in-memory file system), which gives it a completely new /tmp. Directories created there inside the child process are no longer visible on the host.
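The modified listing was also a figure. As a hedged sketch of the change described, the child entry point of a clone(CLONE_NEWNS)-based launcher could remount /tmp before starting the shell. The function names are hypothetical. The extra MS_PRIVATE step is an assumption on my part rather than something the article states: on hosts where "/" is a shared mount, a tmpfs mounted in the child could otherwise propagate back to the host.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Stop mount events in this namespace from propagating back to the host.
 * On many systemd-based hosts "/" is a shared mount, so without this step
 * the tmpfs mounted below could still show up outside the container. */
int make_mounts_private(void)
{
    return mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);
}

/* Mount a fresh, empty tmpfs on target. Returns 0, or -1 with errno set. */
int mount_tmpfs(const char *target)
{
    assert(target != NULL);
    return mount("none", target, "tmpfs", 0, "");
}

/* Child entry point for clone(..., CLONE_NEWNS | SIGCHLD, ...):
 * give the child its own /tmp, then hand over to a shell. */
int container_main(void *arg)
{
    (void)arg;
    if (make_mounts_private() == -1 || mount_tmpfs("/tmp") == -1) {
        perror("mount");
        return 1;
    }
    char *const argv[] = { "/bin/bash", NULL };
    execv(argv[0], argv);
    perror("execv");           /* reached only if execv() failed */
    return 1;
}
```

Because the mount happens after the child has entered its own Mount Namespace, running ls /tmp in the resulting shell should show an empty directory while the host's /tmp is unchanged.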

The simple code above reflects how Docker images are implemented. A Docker image is essentially a packaged rootfs. Through Mount Namespace, Docker packages the operating system rootfs that an application needs, which changes the dependency relationship between the application and the operating system: originally the application ran inside the operating system, whereas Docker turns the "operating system" into a library that the application depends on. This solves the consistency problem for the application's runtime environment: wherever the application runs, the system underneath it travels with it as a "dependency library", so consistency is guaranteed.

Use "layers" to solve reusability problems

After achieving file system isolation and solving the consistency problem, we still have to face the problem of reusability. In practice, it is unrealistic to build a new rootfs every time we build an image: that is time-consuming and laborious, and even a blank "CD" with no applications on it takes up a lot of disk space.

Therefore, Docker images use another technology, UnionFS, together with a brand-new concept, the layer, to reduce the disk space each image occupies and to improve image reusability.

Let's take a brief look at what UnionFS does. UnionFS provides union mounting: it can mount the contents of multiple directories onto the same directory. For example, suppose we have the following directory structure:


(The tree command showing the two directories A and B)

Directory A contains two files, a and x, and directory B contains two files, b and x. With UnionFS, we can mount both directories onto directory C. The command and its effect are shown below:

mount -t aufs -o dirs=./A:./B none ./C


(The tree command showing the effect of the union mount)

In the end, directory C contains a, b, and only one copy of x. Changes made to these files through C are written back to the underlying branches. Note that with aufs defaults only the first branch (A here) is writable, so a change to a file that comes from a read-only branch is copied up into A rather than altering B. Docker uses this technology to union-mount the files in its images. For example, directories such as /sys, /etc, and /tmp can be mounted together to form what looks to the child process like a complete rootfs, without occupying additional disk space.

On top of this, Docker introduced the concept of layers. First, it mounts the rootfs files that the system itself requires into a "read-only layer", and mounts modifiable files, such as user applications and system configuration files, into a "read-write layer". When the container starts, initialization parameters can also be mounted into a special "init layer". In the final stage of container startup, these three layers are union-mounted to form the rootfs inside the container.
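Why only one x appears in C can be illustrated with a toy model of union lookup. This is a simplified sketch and not how aufs is implemented: it assumes branches are searched in mount order and that the first branch containing a name supplies the file. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One branch (directory) taking part in the union mount. */
struct branch {
    const char *name;           /* e.g. "A" */
    const char *const *files;   /* NULL-terminated list of file names */
};

/* Return the name of the branch that supplies `file` in the merged view:
 * the first branch, in mount order, that contains it. NULL if absent. */
const char *union_lookup(const struct branch *branches, size_t n,
                         const char *file)
{
    assert(branches != NULL && file != NULL);
    for (size_t i = 0; i < n; i++) {
        for (const char *const *f = branches[i].files; *f != NULL; f++) {
            if (strcmp(*f, file) == 0)
                return branches[i].name;
        }
    }
    return NULL;
}
```

With branches A = {a, x} and B = {b, x} in that order, a lookup of x resolves to A, a lookup of b resolves to B, and B's copy of x is simply hidden in the merged view.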


(Docker's read-only layer, read-write layer and init layer)

From the above description, we can see that the read-only layer is best suited to fixed versions of files that rarely change, which maximizes reuse. For example, the Huozige ("movable type grid") public cloud is developed on .NET Core, and the base environment we use is placed in the read-only layer. Every time we pull the latest image, the read-only layers are exactly the same, so they do not need to be downloaded again.

Docker's "layers" explain why a Docker image is slow only the first time it is downloaded while subsequent images download quickly, and why, although each image appears to be hundreds of megabytes, far less disk space is actually used on the machine. Smaller disk usage and faster loading have greatly improved the reusability of Docker images.
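The disk-space effect can be checked with a little arithmetic. The sketch below assumes each layer is identified by an id and stored once no matter how many images reference it. The struct, the function name, and any sizes used with it are hypothetical illustrations, not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A layer referenced by some image: its id and its size in MB. */
struct layer {
    const char *id;
    unsigned long size_mb;
};

/* Sum the sizes of the distinct layer ids in `layers`; a layer that is
 * referenced by several images is counted only once. */
unsigned long unique_layers_mb(const struct layer *layers, size_t n)
{
    assert(layers != NULL || n == 0);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(layers[i].id, layers[j].id) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            total += layers[i].size_mb;
    }
    return total;
}
```

For instance, two images that each report a 200 MB base layer plus a 30 MB or 40 MB application layer appear to total 470 MB, but with the shared base stored once the function returns 270.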

Docker container creation process

That covers the basic principles of Docker containers. Combined with the previous article, we can summarize the process by which Docker creates a container as:

  • Enable the Linux Namespace configuration;
  • Set the specified Cgroups parameters;
  • Change the root directory of the process;
  • Union-mount the files of each layer.

Off-topic: the difference between Docker and traditional virtual machines

In fact, Docker does much more than this, such as permission configuration and DeviceMapper. What is described here is only a high-level conceptual explanation; the underlying implementations involve far more complicated concepts. So what, specifically, is the difference between a container and a traditional virtual machine?

Container technology and virtual machines are both means of virtualization. A virtual machine uses a Hypervisor to control the hardware and emulate a GuestOS; inside is an almost real virtual operating system, completely isolated from the outside. Container technology instead relies on software such as Docker Engine to isolate and allocate system resources through the Linux operating system. The comparison between them is roughly as follows:


(Docker vs virtual machine)

Virtual machines are isolated at the hardware level, which makes them more secure than Docker containers, but this comes at a cost: without optimization, a KVM virtual machine running CentOS occupies 100~200 MB of memory once started. In addition, user applications run inside the virtual machine, so their calls into the host operating system must be intercepted and handled by the virtualization software. This causes performance loss, which is especially large for computing resources, network, and disk I/O.

Containers are the opposite. A containerized application is still an ordinary process on the host, so the overhead introduced by virtualization does not exist. Moreover, a container that uses Namespace for isolation does not need a separate GuestOS, so the additional resources a container occupies are almost negligible.

Therefore, for PaaS platforms that need finer-grained resource management, this "agile" and "efficient" container became one of the best choices. But if containers seem to solve every problem, do they have no shortcomings?

In fact, the drawbacks of containers are quite obvious. First, because a container's isolation is simulated, resources that Namespace cannot cover are not isolated at all. The operating system kernel is one example: programs inside the container share the kernel with the host, so a host running an older Linux kernel may be unable to run containers that need a newer one. Another typical example is time: if the system time is changed by some means inside the container, the host's time changes too.

Another drawback is security. Companies generally do not expose containers directly to external users, because processes in the container operate directly on the host's kernel. If an attacker manages to tamper with the kernel by some means, the entire host can be compromised. This is the direct reason our own project went from writing our own Docker setup at the start to abandoning it in the end. There are generally two ways to address security. One is to restrict the permissions of processes running in Docker so that they can operate only the system devices we want them to, but this requires a lot of custom code, because we may not know in advance what a process needs to operate. The other is to wrap the container in a virtual machine layer that acts as a sandbox, which is the main approach taken by many leading vendors.


Docker defeated its predecessor Cloud Foundry on the strength of consistency and reusability. This article covered the specific changes Docker made to containers, as well as the obvious shortcomings of containers. In the next article, we will look at how Docker went into decline and who the new star of the post-Docker era is. Stay tuned.
