This 'container' is not that 'container'

Many people may already know this: there is no real Linux "container" — the kernel has no object called a "container". Containers are ordinary processes that use two Linux kernel features, namespaces and cgroups. A namespace gives a process a "view" that hides everything outside the namespace, providing it with an isolated operating environment in which it can neither see nor interfere with other processes.

Namespace types include the following:
• Hostname
• Process IDs
• File System
• Network interfaces
• Inter-Process Communication (IPC)
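On Linux you can see this directly: every process's namespace memberships are exposed as symlinks under /proc/<pid>/ns, one per namespace type. A quick sketch:

```shell
# List the namespaces the current process belongs to: one symlink per type
# (cgroup, ipc, mnt, net, pid, user, uts).
ls -l /proc/self/ns

# Each link resolves to a namespace identifier; two processes with the same
# identifier share that namespace. Example form: net:[4026531992]
readlink /proc/self/ns/net
```

Two processes are in the same namespace exactly when these links resolve to the same identifier, which is how tools like `lsns` enumerate them.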

Although I said above that processes running in a namespace do not interfere with other processes, a process can in theory consume all the resources of the physical machine it runs on, starving other processes. To limit this, Linux introduced a feature called cgroups. Like a namespace, a cgroup is something a process runs inside, but the cgroup limits the resources the process may use: CPU, RAM, block device I/O, network I/O, and so on. CPU is usually limited in millicores (thousandths of a core), and memory in bytes of RAM. The process itself runs normally, but can use at most the CPU its cgroup allows; if it exceeds the memory limit set on the cgroup, it is killed with an out-of-memory error.
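Docker exposes these cgroup limits as flags on `docker run`. A minimal sketch (the cgroup file paths differ between cgroup v1 and v2, hence the fallback):

```shell
# Cap a container at half a CPU core (500 millicores) and 256 MiB of RAM.
# Exceeding the memory limit gets the process killed by the OOM killer.
docker run --rm --cpus="0.5" --memory="256m" alpine sh -c \
  'cat /sys/fs/cgroup/memory.max 2>/dev/null \
   || cat /sys/fs/cgroup/memory/memory.limit_in_bytes'
```

The command prints the memory limit as the container itself sees it through the cgroup filesystem (268435456 bytes here), which is exactly the mechanism described above.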

For more on namespaces and cgroups, see the earlier article What is a container: namespaces and cgroups.

What I want to point out here is that cgroups and each namespace type are independent features. You can use any subset of the namespaces listed above, or none at all; you can use only cgroups, or any other combination of the two (strictly speaking you are always in a namespace and a cgroup, just the root ones). Namespaces and cgroups can also be applied to groups of processes: multiple processes can run in the same namespace, able to see and interact with one another, or run in a single cgroup, so that together they are limited to a set amount of CPU and RAM.

A combination of combinations

When you run containers on Docker in the usual way, Docker creates a set of namespaces and a cgroup for each container, mapped one-to-one. This is what developers usually think of as a container.
Containers are essentially independent islands, though they may have storage volumes or ports mapped to the host so that they can communicate. With a few extra command-line flags, however, you can run multiple Docker containers in a single set of namespaces. First, create an nginx container.

$ cat <<EOF > nginx.conf
> error_log stderr;
> events { worker_connections  1024; }
> http {
>     access_log /dev/stdout combined;
>     server {
>         listen 80 default_server;
>         server_name example.com www.example.com;
>         location / {
>             proxy_pass http://127.0.0.1:2368;
>         }
>     }
> }
> EOF
$ docker run -d --name nginx -v `pwd`/nginx.conf:/etc/nginx/nginx.conf -p 8080:80 nginx

Then run a ghost container, with extra flags that join it to the nginx container's namespaces.

$ docker run -d --name ghost --net=container:nginx --ipc=container:nginx --pid=container:nginx ghost

Now the nginx container can proxy requests on localhost straight to our ghost container. Visit http://localhost:8080/ and you will see Ghost served through the nginx proxy. These commands create a group of containers running in the same namespaces, and Docker containers running in such shared namespaces can discover and communicate with each other.
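You can verify the shared PID namespace from either container. A sketch, assuming the two containers created above are still running (the official nginx image is Debian-based, so `sh` and `cat` are available even if `ps` is not):

```shell
# Because ghost joined nginx's PID namespace, processes from both containers
# are visible from inside either one. Reading /proc avoids depending on `ps`:
docker exec nginx sh -c 'cat /proc/[0-9]*/comm'
# Expect to see both nginx worker processes and ghost's node process listed.
```

The same check from the ghost side (`docker exec ghost ...`) shows the identical process list, since there is only one PID namespace between them.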

Is a Pod some kind of container?

Now we can see that namespaces and cgroups can be applied to multiple processes together, and this is the essence of Kubernetes Pods. A Pod lets you specify the containers to run, and Kubernetes sets up the namespaces and cgroups in the correct way automatically. It is somewhat more involved than the method above, because Kubernetes does not use Docker networking (it uses CNI).
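The nginx + ghost pairing above maps naturally onto a Pod. A minimal sketch (the Pod name is illustrative): both containers share the network namespace, so nginx reaches ghost on 127.0.0.1:2368 just as in the Docker example.

```shell
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ghost
spec:
  # Optional: also share the PID namespace between the containers,
  # so they can see each other's processes and send signals.
  shareProcessNamespace: true
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  - name: ghost
    image: ghost
EOF
```

Note that unlike the Docker flags, no container here is singled out as the namespace "owner"; Kubernetes holds the shared namespaces itself, so either container can be restarted without tearing down the other.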
Once containers are set up this way, each process "feels" as if it is running on the same machine. They can communicate with each other over localhost, use shared volumes, and even use IPC or send each other signals such as HUP or TERM (with a shared PID namespace; Kubernetes >= 1.7, Docker >= 1.13).

Suppose you want to run nginx together with confd, which will update the nginx configuration and reload nginx every time an application server is added or removed. An etcd server holds the IP addresses of the backend application servers; when the list changes, confd is notified, writes a new nginx configuration, and sends nginx a HUP signal so that it reloads the configuration.
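What confd does on each change can be sketched in two commands (the etcd address is illustrative; in a Pod, the rendered config would live on a volume shared with the nginx container):

```shell
# Render nginx.conf once from its template, pulling backend IPs from etcd.
confd -onetime -backend etcd -node http://127.0.0.1:2379

# Tell the nginx master process to reload its configuration;
# equivalent to sending it SIGHUP.
nginx -s reload
```

In steady-state operation confd runs in watch mode instead of `-onetime`, re-rendering and triggering the reload command whenever the watched etcd keys change.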
With Docker alone, you would have to put both nginx and confd into one container. Because Docker has only one entrypoint, both processes would need to run under something like supervisord. This is not ideal, because you need a supervisord for every copy of nginx. More importantly, Docker only "understands" supervisord, since that is the entrypoint; it knows nothing about the individual processes, which means other tools cannot obtain that information through the Docker API. nginx may crash without Docker ever knowing.
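For concreteness, the single-container workaround looks roughly like this (a sketch; the confd flags and etcd address are illustrative). Docker's PID 1 is supervisord, and everything below it is invisible to the Docker API:

```shell
# Write a supervisord config that runs both processes in one container.
cat > supervisord.conf <<EOF
[supervisord]
nodaemon=true

[program:nginx]
command=nginx -g 'daemon off;'

[program:confd]
command=confd -backend etcd -node http://127.0.0.1:2379 -watch
EOF
```

An image built around this file would use `supervisord -c supervisord.conf` as its entrypoint, which is exactly the arrangement the paragraph above argues against.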
Kubernetes manages each process through the Pod, giving it insight into the status of every process. It can report that status to users through its API, and it can restart a crashed service and collect its logs automatically.
Pod is a container masquerading as an API
By combining containers into Pods this way, we essentially create containers that others can extend by adding their own containers to the Pod — a kind of "API". It is not an API in the ordinary web sense, but an abstraction that other Pods can build on.

Take the nginx + confd example above. confd knows nothing about the nginx process; all it does is watch values in etcd and send a HUP signal to a process, or run a command. The application need not be nginx at all — it can be anything — so you can reuse the same confd container image and configuration across any number of different Pods. A container that operates this way is usually called a "sidecar container"; picture the sidecar attached to a motorcycle.

You can imagine other kinds of abstractions too. Service meshes such as Istio can be attached as sidecars to provide service routing, telemetry, and policy enforcement without changing the main application. You can also use several sidecars at once; nothing stops you from running both a confd sidecar and an Istio sidecar. Composing applications this way lets you build more complex and reliable systems while keeping each individual app simple.


EngineerLeo

Focused on cloud native, AI, and related technologies