Introduction
This is the fourth article in a series on how to make minimal Docker images, and its topic is static binaries. In the first article, I talked about how to create smaller images by writing better Dockerfiles; in the second, I discussed how to use docker-squash to squash image layers into a smaller image; in the third, I introduced Alpine Linux as a smaller base image.
In this article, I will explore the final way to make a minimal image: static binaries. What if the application had no dependencies at all and needed nothing except the application itself? That is what a static binary gives you: a statically compiled program that carries all of its dependencies inside the binary file itself. To understand what that means, let us take a step back.
Dynamic linking
Most applications are built using a process called dynamic linking. A dynamically linked application records, at build time, which libraries it needs in order to run, but it does not actually include those libraries. This matters a great deal for operating system distributions, because libraries can be updated independently of the applications that use them, but it matters much less when the application runs in a container. Each container image contains all the files it will use, so the libraries are not shared with anything else anyway.
Let's look at an example. Create a simple C++ program and compile it as shown below; the result is a dynamically linked executable.
ianlewis@test:~$ cat hello.cpp
#include <iostream>
int main() {
    std::cout << "Hello World!\n";
    return 0;
}
ianlewis@test:~$ g++ -o hello hello.cpp
ianlewis@test:~$ ls -lh hello
-rwxrwxr-x 1 ianlewis ianlewis 8.9K Jul 6 07:31 hello
g++ is actually performing two steps here: compiling and linking the program. The compile step only produces an ordinary C++ object file; the link step then resolves the dependencies the application needs in order to run. Most compilers do both for you in one go, but the two steps can also be run separately, like this:
ianlewis@test:~$ g++ -c hello.cpp -o hello.o
ianlewis@test:~$ g++ -o hello hello.o
ianlewis@test:~$ ls -lh
total 20K
-rwxrwxr-x 1 ianlewis ianlewis 8.9K Jul 6 07:41 hello
-rw-rw-r-- 1 ianlewis ianlewis 85 Jul 6 07:31 hello.cpp
-rw-rw-r-- 1 ianlewis ianlewis 2.5K Jul 6 07:41 hello.o
On Linux, the ldd command prints the shared objects (shared libraries) required by each program or shared object given on the command line. On macOS, you can get the same information by running otool -L.
ianlewis@test:~$ ldd hello
linux-vdso.so.1 => (0x00007ffc0075c000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f88c92d0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f88c8f06000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f88c8bfc000)
/lib64/ld-linux-x86-64.so.2 (0x0000558132cbf000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f88c89e6000)
As you can see, my program depends on the C and C++ standard libraries, libc and libstdc++. When the program is run, the dynamic linker finds the libraries it needs and links them in at runtime. On Linux, the dynamic linker's search paths are usually configured in /etc/ld.so.conf and the files under /etc/ld.so.conf.d/.
So, what happens if one of the libraries is deleted or moved to a location the dynamic linker does not know about? (Warning: moving library files around can break your system. Do not try this on a machine you care about!)
ianlewis@test:~$ sudo mv /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.bk
ianlewis@test:~$ ldd ./hello
linux-vdso.so.1 => (0x00007ffd511c6000)
libstdc++.so.6 => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdace840000)
/lib64/ld-linux-x86-64.so.2 (0x0000560da65aa000)
You can see that the dynamic linker did not find the library. What will happen if we try to run the program?
ianlewis@test:~$ ./hello
./hello: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
As expected, the libstdc++ library could not be loaded and the application failed to start. This gives us an idea of why dynamic linking can be bad for containers.
Why is dynamic linking bad for containers?
The main reason dynamic linking is bad for containers is that the system on which the application is compiled may be completely different from the system on which it runs. Linux distributions can package applications as dynamically linked executables because they know how their own dynamic linker is set up. But even between similar distributions (such as Ubuntu and Debian), libraries may have different names or live in different locations, so copying a binary from one system to another may cause problems.
This is why most Dockerfiles build the application inside the same container image that runs it. Docker's multi-stage builds improve on this, but they are not yet widely adopted (as of the time of writing). And given all the issues with copying files between systems, even with a multi-stage build you will probably still want to run the application on the same Linux distribution you built it on.
Let's try compiling the hello program on Ubuntu and running it in an Alpine Linux image.
ianlewis@test:~$ g++ -o hello hello.cpp
ianlewis@test:~$ cat << EOF > Dockerfile
FROM alpine
COPY hello /hello
ENTRYPOINT [ "/hello" ]
EOF
ianlewis@test:~$ docker build -t hello .
Sending build context to Docker daemon 29.18kB
Step 1/3 : FROM alpine
latest: Pulling from library/alpine
88286f41530e: Pull complete
Digest: sha256:1072e499f3f655a032e88542330cf75b02e7bdf673278f701d7ba61629ee3ebe
Status: Downloaded newer image for alpine:latest
---> 7328f6f8b418
Step 2/3 : COPY hello /hello
---> 6f5aca4d2acb
Removing intermediate container 904f7c441936
Step 3/3 : ENTRYPOINT /hello
---> Running in 635f6cbde8d6
---> bbcaa65bf2e5
Removing intermediate container 635f6cbde8d6
Successfully built bbcaa65bf2e5
Successfully tagged hello:latest
ianlewis@test:~$ docker run hello
standard_init_linux.go:187: exec user process caused "no such file or directory"
The error "no such file or directory" is not very descriptive, but it is the same problem we saw before: the program cannot find one of its dynamically linked dependencies.
For containers, we want the image to be as small as possible. Managing the dependencies of dynamically linked applications is heavy work and requires a lot of tooling, such as a package manager, which brings a large number of dependencies of its own. When all we want is to run a single application, that is a lot of baggage to carry in the runtime environment. How can we solve this problem?
Static linking allows us to bundle all the libraries the application depends on into one binary file. This lets us copy the application code and all of its dependencies as a single binary file to wherever we want to run it.
ianlewis@test:~$ g++ -o hello -static hello.cpp
ianlewis@test:~$ ls -lh
total 2.1M
-rwxrwxr-x 1 ianlewis ianlewis 2.1M Jul 6 08:08 hello
-rw-rw-r-- 1 ianlewis ianlewis 85 Jul 6 07:31 hello.cpp
ianlewis@test:~$ ./hello
Hello World!
ianlewis@test:~$ ldd hello
not a dynamic executable
Great! This means we now have a binary executable that can be copied into any container image and will just work!
ianlewis@test:~$ cat << EOF > Dockerfile
> FROM scratch
> COPY hello /hello
> ENTRYPOINT [ "/hello" ]
> EOF
ianlewis@test:~$ docker build -t hello .
Sending build context to Docker daemon 2.202MB
Step 1/3 : FROM scratch
--->
Step 2/3 : COPY hello /hello
---> d3b2040b4df0
Removing intermediate container 78e434104023
Step 3/3 : ENTRYPOINT /hello
---> Running in b6340a5907f5
---> 88af34342471
Removing intermediate container b6340a5907f5
Successfully built 88af34342471
Successfully tagged hello:latest
ianlewis@test:~$ docker run hello
Hello World!
As mentioned earlier, the program now includes all of its dependencies, so it can run on practically any other Linux server. There are some caveats (for example, the server must have the same CPU architecture), but in most cases the binary can simply be copied over and will work.
Image size
An image built around a compiled static binary can be much smaller than an image for an application written in a language such as Python or Java, which needs an interpreter or VM to run. In the previous article, we looked at the Alpine Linux based Python image for deploying Python applications.
ianlewis@test:~$ docker images python:2.7.13-alpine
REPOSITORY TAG IMAGE ID CREATED SIZE
python 2.7.13-alpine 3dd614730c9c 4 days ago 72.02 MB
The python image is only 72MB, and you only need to add your application code on top of it. An image that contains just a static binary can be much smaller still: it only needs to be as large as the binary itself.
ianlewis@test:~$ ls -lh hello
-rwxrwxr-x 1 ianlewis ianlewis 2.1M Jul 6 08:41 hello
ianlewis@test:~$ docker images hello
REPOSITORY TAG IMAGE ID CREATED SIZE
hello latest 88af34342471 5 minutes ago 2.18MB
Now we have finally reached the point where there is almost no overhead left in the image size.
In practice, though, you may want to include other programs in the image to help with troubleshooting and debugging. In that case, you may want to combine Alpine Linux with a static binary build of your application. Tools such as a shell and a tracing tool can be very helpful later on.
Writing containerized applications in Go
I cannot write an article about statically linked applications without mentioning Go. For reasons outside the scope of this article, compiling a large C++ application into a static binary can be impractical without a lot of dedication and willpower. Many third-party or open source programs do not even provide a way to build a static binary, so they end up being deployed on images based on large Linux distributions.
Go makes building statically linked binaries very easy as part of its standard tooling. Arguably Go was designed this way because Google deploys statically linked binaries in containers in its production systems, and Go was created specifically to make that easy, even for large applications like Kubernetes.
ianlewis@test:~$ git clone https://github.com/kubernetes/kubernetes
Cloning into 'kubernetes'...
...
ianlewis@test:~$ cd kubernetes/
ianlewis@test:~/kubernetes$ make quick-release
+++ [0711 06:33:32] Verifying Prerequisites....
+++ [0711 06:33:32] Building Docker image kube-build:build-36cca30eef-5-v1.8.3-1
+++ [0711 06:34:18] Creating data container kube-build-data-36cca30eef-5-v1.8.3-1
+++ [0711 06:34:19] Syncing sources to container
+++ [0711 06:34:22] Running build command...
...
ianlewis@test:~/kubernetes$ ldd _output/dockerized/bin/linux/amd64/kube-apiserver
not a dynamic executable
In summary, a static binary gives you the smallest image while still containing every dependency the application needs to run. It runs easily in a container and is easy to build with modern languages such as Go. What's not to like?