Introduction

This is the second article in a series on building minimal Docker images. In the previous article, I covered how to create smaller Docker images, but there is a limit to how small an image can get that way. I outlined ways to keep each layer added to an image small, but sometimes that is not possible, because you need to run some extra steps in a specific order.
For example, in the following Dockerfile fragment, a file has to be added between two steps:

RUN ...
ADD some_file /
RUN ...

What if the first RUN command does some processing before the file is added, and the second RUN command does more processing and then cleans up? Unfortunately, Docker creates a layer for each command, so anything the second RUN deletes still takes up space in the earlier layer. You can also run into a related problem: the base image you use inherits from many other images, and each of them has added its own large layers.

Docker Squash

Docker does not provide a way to decouple running a command from the layer cache. In theory this could be done, but as things stand every command creates a layer, which can make the image too large. To reduce the number of layers and their total size, layers can be squashed together, much like squashing git commits. A handy tool called docker-squash does exactly this; you can learn more about it in the original article.

docker-squash squashes multiple image layers into one, discarding any data that was stored in intermediate steps and later deleted. This is very handy in situations like the ones above, or when you want to keep your Dockerfile simpler. Give it a try.
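To make the idea concrete, here is a minimal Python sketch of why squashing saves space. It is a toy model, not docker-squash's actual implementation: each layer is a mapping from file paths to sizes, and it assumes the whiteout convention used in Docker image layers, where an entry named `.wh.<name>` deletes `<name>` (and anything under it) from the layers below. Opaque-directory whiteouts are not modeled, and the file names and sizes are hypothetical, loosely mirroring the Debian Python case discussed next.

```python
from typing import Dict, List

# Toy model: a layer maps file paths to sizes (arbitrary units).
Layer = Dict[str, int]

WHITEOUT_PREFIX = ".wh."  # ".wh.<name>" in a layer deletes <name> from lower layers


def is_whiteout(path: str) -> bool:
    return path.rpartition("/")[2].startswith(WHITEOUT_PREFIX)


def whiteout_target(path: str) -> str:
    """Map a whiteout entry to the path it deletes, e.g. 'a/.wh.b' -> 'a/b'."""
    head, _, name = path.rpartition("/")
    target = name[len(WHITEOUT_PREFIX):]
    return f"{head}/{target}" if head else target


def stored_size(layers: List[Layer]) -> int:
    """Space used when every layer is kept: files deleted later still count."""
    return sum(size for layer in layers
               for path, size in layer.items() if not is_whiteout(path))


def squash(layers: List[Layer]) -> Layer:
    """Apply layers bottom to top and return one layer with the final contents."""
    merged: Layer = {}
    for layer in layers:
        # Apply deletions first; a whiteout also removes everything under a directory.
        for path in layer:
            if is_whiteout(path):
                target = whiteout_target(path)
                for existing in list(merged):
                    if existing == target or existing.startswith(target + "/"):
                        del merged[existing]
        # Then add or overwrite the files this layer provides.
        for path, size in layer.items():
            if not is_whiteout(path):
                merged[path] = size
    return merged


# Hypothetical layers: the base ships a distro Python, the next layer removes it
# and adds a self-compiled one.
BASE: Layer = {"usr/lib/python2.7/os.py": 30, "usr/lib/python2.7/site.py": 10,
               "etc/os-release": 1}
REBUILD: Layer = {"usr/lib/.wh.python2.7": 0, "usr/local/lib/python2.7/os.py": 35}

if __name__ == "__main__":
    layers = [BASE, REBUILD]
    print("stored as separate layers:", stored_size(layers))  # 76
    print("after squashing:", sum(squash(layers).values()))  # 36
```

The deleted distro Python still accounts for 40 of the 76 units while the layers are kept separately; only squashing them into one layer actually drops it.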

Squashing Python

I wanted to see whether I could shrink the standard python:2.7.11 image on Docker Hub. From its Dockerfile, I noticed that it first removes the Debian Python that is already installed, then downloads and compiles its own version. However, because the Debian Python was included in an earlier layer, it still takes up space in the image. The image also depends on several other Dockerfiles, each of which adds its own layers. Let's see how much space squashing can save.

First, pull the python:2.7.11 image locally:

$ docker pull python:2.7.11
2.7.11: Pulling from library/python
7a01cc5f27b1: Pull complete 
3842411e5c4c: Pull complete 
...
127e6c8b9452: Pull complete 
88690041a8a3: Pull complete 
Digest: sha256:590ee32a8cab49d2e7aaa92513e40a61abc46a81e5fdce678ea74e6d26e574b9
Status: Downloaded newer image for python:2.7.11

As the output shows, the image has many layers. Its size is about 676 MB:

$ docker images python:2.7.11
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
python              2.7.11              88690041a8a3        2 weeks ago         676.1 MB

Somewhat annoyingly, docker-squash cannot squash images in the local image store directly. Instead, you have to export the image to a tar file; docker-squash then reads that file and writes out a new, squashed image.

$ docker save python:2.7.11 > python-2.7.11.tar
$ sudo bin/docker-squash -i python-2.7.11.tar -o python-squashed-2.7.11.tar
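If you are curious what docker-squash is working with, the exported tarball can be inspected directly. Below is a minimal Python sketch, assuming the `docker save` layout in which a top-level `manifest.json` lists each image's `Layers` as paths inside the archive (the exact paths vary between Docker versions). It reports the first image's layers with their sizes and assumes each layer entry is a regular file rather than a link; `layer_sizes` is a hypothetical helper, not part of docker-squash.

```python
import json
import tarfile
from typing import List, Tuple


def layer_sizes(image_tar: str) -> List[Tuple[str, int]]:
    """Return (layer path, size in bytes) for the first image in a `docker save` tarball."""
    with tarfile.open(image_tar) as tar:
        manifest_file = tar.extractfile("manifest.json")
        if manifest_file is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(manifest_file)
        # Only the tar headers are read, so this is cheap even for large images.
        return [(path, tar.getmember(path).size) for path in manifest[0]["Layers"]]
```

For example, calling `layer_sizes("python-2.7.11.tar")` on the file saved above should list each layer with its size, which shows which layers are contributing most to the image.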

The new file is roughly 75 MB smaller:

~$ ls -lh python-*.tar
-rw-rw-r-- 1 ian  ian  666M Feb 15 16:32 python-2.7.11.tar
-rw-r--r-- 1 root root 590M Feb 15 16:33 python-squashed-2.7.11.tar

I loaded it back into the local image store and checked the image size again; it is much smaller:

$ cat python-squashed-2.7.11.tar | docker load
$ docker images python-squashed
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
python-squashed     latest              18d8ebf067fd        11 days ago         599.9 MB

Virtual Size

You may notice that Docker reports a "virtual size" for images. This is because images built on the same layers share them, so each shared layer is stored only once. But just as amending or squashing git commits produces brand-new commits, docker-squash creates a brand-new, standalone layer that contains everything, and that layer is not shared with any other image, even where its contents overlap with layers other images use.

docker-squash lets you deal with this through its -from parameter, which sets the layer to start squashing from. It defaults to the first FROM layer, which is what happened above: because the image is built from a chain of FROM images, squashing removed some unnecessary data while still keeping the layer from the base image. By specifying this parameter yourself, you choose which base image layers are left intact, so hosts that already have that base image don't have to download it again.

$ docker-squash -from 18d8ebf067fd -i ... -o ...
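To see why the choice of starting layer matters, here is a toy Python model of Docker's layer sharing, not of how docker-squash works internally. The layer IDs and sizes are hypothetical, and `keep` plays roughly the role of the -from layer: everything below it is left as-is and shared, everything above it becomes one new layer.

```python
from typing import Dict, List, Tuple

# Toy model: an image is a list of (layer ID, size in MB), bottom layer first.
Image = List[Tuple[str, int]]


def virtual_size(image: Image) -> int:
    """Size of one image on its own, including layers it shares with others."""
    return sum(size for _, size in image)


def disk_usage(images: List[Image]) -> int:
    """Space needed to store several images: each distinct layer ID counts once."""
    unique: Dict[str, int] = {}
    for image in images:
        for layer_id, size in image:
            unique[layer_id] = size
    return sum(unique.values())


def squash_from(image: Image, keep: int, new_id: str, new_size: int) -> Image:
    """Keep the bottom `keep` layers and replace the rest with one new layer."""
    return image[:keep] + [(new_id, new_size)]


# Hypothetical base image plus an app image built on top of it.
BASE: Image = [("base-1", 100), ("base-2", 300)]
APP: Image = BASE + [("app-1", 50), ("app-2", 200)]
# Assume app-2 deletes 70 MB of what app-1 added, so squashed app content is 180 MB.
FULL = squash_from(APP, 0, "squashed-all", 580)               # base content included
PARTIAL = squash_from(APP, len(BASE), "squashed-top", 180)    # base layers kept

if __name__ == "__main__":
    for name, image in [("original", APP), ("full squash", FULL),
                        ("partial squash", PARTIAL)]:
        print(f"{name}: virtual {virtual_size(image)} MB, "
              f"with base on disk {disk_usage([BASE, image])} MB")
```

With these made-up numbers, both squashed images report the same 580 MB virtual size, but on a host that already has the base image the full squash needs 980 MB on disk, while the partial squash needs only 580 MB because the base layers are still shared.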

docker-squash is not a silver bullet, but it is a useful addition to your toolbox for managing Docker image size. In the next two articles, I will discuss some other tools and techniques for reducing Docker image size.


EngineerLeo

Focused on cloud-native, AI, and related technologies.