
Notes on a real Kafka server outage!

冰河

Hello everyone, I am Glacier~~

Perhaps we forgot to pay our respects to the servers before the holiday, because every year after the New Year they develop some problem or other. I can't tell whether it's a people problem or a feng shui problem. Yesterday, before leaving work, I told our ops colleague several times: if you install the Kafka cluster with Docker, give the servers in the Kafka cluster plenty of disk space, because log collection, transmission, and so on all go through the Kafka message bus.

Unexpectedly, when I got to the office this morning, I had barely sat down and turned on my computer when a flood of server alert emails arrived. Then I saw on the big monitoring screen that several test servers on the intranet were down. You can imagine the look on my face.

Damn, what's going on? Work to do the moment I walk in? Which servers are in trouble? I looked at the big screen again and, good grief, weren't these the very Kafka cluster servers I had talked to the ops colleague about yesterday?

Down as soon as testing started? It can't be that bad, can it?

So I hurried over to the ops colleague and asked: How did you configure the servers yesterday?

He said: I didn't configure anything special. It's a test environment, isn't it? I gave each server 120G of disk space and installed the Kafka cluster with the default settings!

Me: Didn't I tell you to give the servers more disk space?...

However speechless I was, the problem still had to be solved! So I quickly logged in to a server and, on the command line, tried to change into Docker's default data directory.

[root@localhost ~]# cd /var/lib/docker

The command failed, with the following error messages.

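[root@localhost ~]# ls -bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device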

I couldn't even change directories. What now? Instinctively I checked the server's disk usage, and something stood out.

[root@localhost ~]# df -lh
Filesystem                    Size  Used Avail Use% Mounted on
devtmpfs                      3.8G     0  3.8G    0% /dev
tmpfs                         3.9G     0  3.9G    0% /dev/shm
tmpfs                         3.9G   82M  3.8G    3% /run
tmpfs                         3.9G     0  3.9G    0% /sys/fs/cgroup
/dev/mapper/localhost-root   50G   50G   0G   100% /
/dev/sda1                     976M  144M  766M   16% /boot
/dev/mapper/localhost-home   53G   5G   48G   91% /home
tmpfs                         779M     0  779M    0% /run/user/0
overlay                        50G   50G   0G   100% /var/lib/docker/overlay2/d51b7c0afcc29c49b8b322d1822a961e6a86401f0c6d1c29c42033efe8e9f070/merged
overlay                        50G   50G   0G   100% /var/lib/docker/overlay2/0e52ccd3ee566cc16ce4568eda40d0364049e804c36328bcfb5fdb92339724d5/merged
overlay                        50G   50G   0G   100% /var/lib/docker/overlay2/16fb25124e9b85c7c91f271887d9ae578bf8df058ecdfece24297967075cf829/merged

There it was: the root filesystem was 100% full, exactly as I had suspected. The output also contains several important lines, shown below.

overlay                        50G   50G   0G   100% /var/lib/docker/overlay2/d51b7c0afcc29c49b8b322d1822a961e6a86401f0c6d1c29c42033efe8e9f070/merged
overlay                        50G   50G   0G   100% /var/lib/docker/overlay2/0e52ccd3ee566cc16ce4568eda40d0364049e804c36328bcfb5fdb92339724d5/merged
overlay                        50G   50G   0G   100% /var/lib/docker/overlay2/16fb25124e9b85c7c91f271887d9ae578bf8df058ecdfece24297967075cf829/merged

Isn't this Docker's default image storage directory?
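Before moving anything, it can help to confirm which directory is actually eating the space. The following is only a minimal sketch: `use_percent` and `largest_subdirs` are hypothetical helper names introduced for illustration, not commands from the original troubleshooting session.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Docker or Kafka.

# Print the Use% figure (digits only) of the filesystem that holds a path.
use_percent() {
    # -P forces one output line per filesystem, so long device names don't wrap.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List the N largest immediate subdirectories of a directory (sizes in KiB).
largest_subdirs() {
    local dir="$1" count="${2:-5}"
    # -x keeps du on one filesystem, so other mounts are not counted.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -exec du -x -s -k {} + \
        | sort -rn | head -n "$count"
}
```

For example, `use_percent /` would report the figure from the Use% column for the root filesystem, and `largest_subdirs /var/lib/docker 5` would list the five biggest directories under Docker's data directory.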

What next? The /home filesystem still has plenty of free space. We can move Docker's default image directory from /var/lib/docker to /home/docker to relieve the pressure on the server for now and carry on testing. The permanent fix can wait until the servers have been re-allocated.

I got to work right away and started migrating Docker's default image directory.

There are two ways to migrate Docker's default image directory: the soft link method and the configuration method. We'll look at each in turn.

1. Soft link method

(1) By default, Docker stores its data in /var/lib/docker. We can check the current location with the following command.

[root@localhost ~]# docker info | grep "Docker Root Dir"
Docker Root Dir: /var/lib/docker

(2) Next, we run the following command to stop the Docker service.

systemctl stop docker

or

service docker stop

(3) Then move the entire /var/lib/docker directory into /home.

mv /var/lib/docker /home

This process may take a long time.

(4) Next, create a soft link, as shown below.

ln -s /home/docker /var/lib/docker

(5) Finally, we start the Docker service.

systemctl start docker

or

service docker start

(6) Check Docker's root directory again, as shown below.

[root@localhost ~]# docker info | grep "Docker Root Dir"
Docker Root Dir: /home/docker

At this point, the migration of the Docker image directory is successful.
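Steps (3) and (4) can be wrapped in a small function with a few safety checks. This is a sketch under stated assumptions rather than a finished tool: `relocate_with_symlink` is a hypothetical name, and the function deliberately leaves stopping and starting Docker to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a directory under a new parent and leave a
# symlink at the old location. Pass absolute paths so the link stays valid.
relocate_with_symlink() {
    local src="${1%/}" dest_parent="${2%/}"
    local dest
    dest="$dest_parent/$(basename "$src")"

    # A symlink at the source means the move has already been done.
    if [ -L "$src" ]; then
        echo "error: $src is already a symlink" >&2
        return 1
    fi
    if [ ! -d "$src" ]; then
        echo "error: $src is not a directory" >&2
        return 1
    fi
    # Refuse to overwrite anything that already exists at the destination.
    if [ -e "$dest" ]; then
        echo "error: $dest already exists" >&2
        return 1
    fi

    # Only create the link if the move succeeded.
    mv "$src" "$dest" && ln -s "$dest" "$src"
}
```

With Docker stopped, `relocate_with_symlink /var/lib/docker /home` would perform the same move and link as the commands above, and a second call would be refused instead of nesting directories.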

Next, let's look at the configuration method.

2. Configuration method

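The startup parameter that specifies where images and containers are stored is --graph=/var/lib/docker (from Docker 17.05 onward it is deprecated in favor of --data-root, though the 19.03 release used here still accepts it). We only need to change this startup parameter in the service file.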

The server operating system I use here is CentOS, so the Docker configuration can be modified as follows.

(1) Stop the Docker service

systemctl stop docker

or

service docker stop

(2) Modify the docker service startup file.

vim /etc/systemd/system/multi-user.target.wants/docker.service

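Change the ExecStart line in the startup file so that it includes the --graph parameter, as follows. Edit the existing line rather than adding a second one, since systemd rejects more than one ExecStart for this type of service.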

ExecStart=/usr/bin/dockerd --graph=/home/docker

(3) Reload the configuration and start

systemctl daemon-reload
systemctl start docker

(4) Check Docker's root directory again, as shown below.

[root@localhost ~]# docker info | grep "Docker Root Dir"
Docker Root Dir: /home/docker

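At this point, Docker is using the new directory. Note that changing the parameter does not move existing data: anything already under /var/lib/docker stays there until you move or delete it.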
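As an alternative to editing the service file, Docker can also read the same setting from /etc/docker/daemon.json through the `data-root` key. This is not what was done above, only a sketch of that option; `write_data_root_config` is a hypothetical helper name, and it writes a file containing nothing but this one setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal daemon.json that sets only "data-root".
# Assumes the path contains no double quotes or backslashes.
write_data_root_config() {
    local config_file="$1" data_root="$2"

    # Never overwrite an existing config; merge the key in by hand instead.
    if [ -e "$config_file" ]; then
        echo "error: $config_file exists, edit it manually" >&2
        return 1
    fi
    mkdir -p "$(dirname "$config_file")"
    printf '{\n  "data-root": "%s"\n}\n' "$data_root" > "$config_file"
}

# Intended usage, with the Docker service stopped:
#   write_data_root_config /etc/docker/daemon.json /home/docker
```

If this route is taken, the directory should not also be set with --graph or --data-root on the ExecStart line, because the daemon refuses to start when the same option is given both as a flag and in the configuration file.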

With that, the Kafka cluster was usable again for the time being, so we let the data flow first. Then I had new servers allocated, set up a new Kafka cluster, and migrated the test environment to it at noon. Testing is still in progress...

Did you pick all that up, friends?

PS: The server operating system version I use is as follows.

[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core) 

The Docker version used is as follows.

[root@localhost ~]# docker info
Client:
 Debug Mode: false
Server:
 Containers: 4
  Running: 3
  Paused: 0
  Stopped: 1
 Images: 33
 Server Version: 19.03.8
############ remaining output omitted ############

Finally, a few words on why I asked the ops colleague to give the Kafka cluster servers larger disks.

The traffic in our production environment is fairly heavy, usually between 50,000 and 80,000 QPS, and considerably higher at peak times. I had diverted part of the production traffic to the test environment. If the Kafka cluster's disks are not large enough, then when consumer performance drops, or messages pile up in Kafka for some other reason, Kafka ends up occupying a large amount of disk space. Once the disk is full, the server Kafka runs on crashes and goes down.
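To get a feel for the numbers, here is a rough back-of-the-envelope estimate. It is only a sketch: `estimate_disk_gib` is a hypothetical helper, and the 1 KiB average message size, one-hour window, and single replica are assumptions chosen for illustration, not measurements from our environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the disk space (in whole GiB) needed to hold
# the messages that arrive within a given number of hours.
estimate_disk_gib() {
    local msgs_per_sec="$1" bytes_per_msg="$2" hours="$3" replicas="$4"
    local total_bytes=$(( msgs_per_sec * bytes_per_msg * 3600 * hours * replicas ))
    # Integer division, so the result is rounded down.
    echo $(( total_bytes / 1024 / 1024 / 1024 ))
}

# 50,000 messages/s at an assumed 1 KiB each, for one hour, one replica.
estimate_disk_gib 50000 1024 1 1
```

Under these assumptions the full production stream would write roughly 171 GiB per hour across the cluster, so even a fraction of that traffic can fill a 120G server quickly once messages start to accumulate.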

That's all for today. I'm Glacier. If you have any questions, feel free to leave a comment below.
