GitLab suddenly fails to deploy
One day about three months ago, my apprentice came to me with a frown: Master, something is wrong. Lately, whenever I use our GitLab to build and release, it always fails.
I said: Wasn't it working fine before?
He said: Yes, it used to work really well, but I don't know what happened. It hasn't worked properly for a while now. It's not that it can't be used at all; it works occasionally, but you have to retry several times, sometimes even a dozen times.
I thought it over carefully. I hadn't changed anything, except that, for the sake of code security, I had switched GitLab's IP address from the public network to the internal network. But I had updated DNS as well, and no one had reported any problems.
I said: Try this. First change the build script to run ping gitlab.mydomain.com and see whether the ping gets through.
The ping turned out to be unstable: sometimes it got through and sometimes it didn't.
This was very strange. The ping resolved the name to the IP address, which meant DNS was fine, yet the IP address was unreachable. (At this point I was still stuck and hadn't thought of it as a routing problem; more on that later.)
Three months misled by MTU
The apprentice started narrowing the problem down by elimination and soon made a major discovery. As long as we didn't add the docker:dind service to the job, the network had no problems, which meant the problem lay in dind.
dind stands for docker in docker. Because our gitlab runner builds inside a docker container, running docker commands in that container for packaging depends on this dind service. The dind service is itself a small container that starts another docker daemon, so that the outer container can run docker commands.
Out of habit, I opened Google and started searching for a solution.
Many web pages pointed to a mysterious setting called MTU, saying that the MTU of the dind container is 1500, which in some cases causes unstable network transmission, similar to the symptoms we encountered.
MTU (Maximum Transmission Unit): the MTU is the largest packet that can be carried across the data link layer in one piece. It is usually tied to the communication interface. The Internet Protocol allows IP fragmentation, so a datagram can be broken into pieces small enough to pass over a link whose MTU is smaller than the datagram's original size. Fragmentation happens at the IP layer, which uses the MTU of the outgoing network interface to decide how large each piece sent on that link may be.
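To make the definition concrete, here is a minimal sketch of the fragmentation arithmetic. It assumes IPv4 with a 20-byte header and no options; the function name and the sample payload size are only illustrative.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options


def fragment_count(payload_bytes: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry payload_bytes over a link with this MTU."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER_BYTES) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    return max(1, math.ceil(payload_bytes / per_fragment))


# A payload that exactly fills a 1500-byte MTU no longer fits in one piece at 1400.
for mtu in (1500, 1400):
    print(mtu, fragment_count(1480, mtu))
```

A sender that believes the MTU is 1500 while a link on the path only carries 1400 produces packets that must be fragmented or dropped, which is why a mismatched MTU is a common suspect for flaky connections.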
The question became: how do we set this MTU? We needed to understand how GitLab configures the service containers of a pipeline job. After reading a lot of material, we found people saying that adding the parameter mtu=1400 to the service's command is enough, but our experiment showed that it still did not work. We changed the command line to ifconfig to inspect the network interface parameters directly and found that the MTU was still 1500. So we checked the docker:dind source code and found that dind reads an environment variable, DOCKERD_ROOTLESS_ROOTLESSKIT_MTU.
So the question became: how do we set this environment variable so that dind reads it? We tried various methods, but dind still couldn't read it.
I calmed down and re-read the documentation on GitLab's official website, which describes a service parameter called variables:
Additional environment variables that are passed exclusively to the service.
This looked like exactly what we wanted, but it comes with a condition: the setting is only available in GitLab 14.5 and above, and our GitLab was still at 13.12.3.
A GitLab upgrade scare
I picked a quiet weekend and resolutely started upgrading GitLab, which nearly ended in disaster.
I thought upgrading would be simple: for people like us who strictly follow the rules, an upgrade is just a matter of bumping a version number.
docker-compose.yml already contained this line, set when GitLab was first installed about a year ago: image: 'gitlab/gitlab-ee:latest'. That is already the newest version, so a plain docker-compose down followed by docker-compose up -d should be enough, right?
No, it was still 13.12.3.
After some reading, I learned that you first have to run docker-compose pull to fetch the latest image.
Fine, I followed the steps.
Uh-oh, why does the gitlab container keep restarting? I quickly ran docker logs to check the container and found one prominent line: To upgrade the major version, you must first upgrade to version 14.0!
Dejected, I went to read the GitLab upgrade manual first, which lists the upgrade path:
8.11.Z -> 8.12.0 -> 8.17.7 -> 9.5.10 -> 10.8.7 -> 11.11.8 -> 12.0.12 -> 12.1.17 -> 12.10.14 -> 13.0.14 -> 13.1.11 -> 13.8.8 -> 13.12.15 -> 14.0.12 -> 14.9.0 -> latest 14.Y.Z
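Reading the path by eye is error-prone, so here is a minimal sketch that works out which stops remain for a given starting version. The list is copied from the path quoted above; the function names are hypothetical, and the code is not part of any GitLab tooling.

```python
# Required stops, copied from the upgrade path quoted above.
UPGRADE_PATH = [
    "8.12.0", "8.17.7", "9.5.10", "10.8.7", "11.11.8", "12.0.12", "12.1.17",
    "12.10.14", "13.0.14", "13.1.11", "13.8.8", "13.12.15", "14.0.12", "14.9.0",
]


def parse(version: str) -> tuple:
    """Turn '13.12.3' into (13, 12, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def remaining_stops(current: str, target: str) -> list:
    """List the versions to install, in order, to get from current to target."""
    cur, tgt = parse(current), parse(target)
    if cur >= tgt:
        return []
    stops = [v for v in UPGRADE_PATH if cur < parse(v) <= tgt]
    if not stops or parse(stops[-1]) != tgt:
        stops.append(target)  # the target itself may not be a required stop
    return stops


print(remaining_stops("13.12.3", "14.9.0"))
# ['13.12.15', '14.0.12', '14.9.0']
```

The list only says which versions must not be skipped. It says nothing about how long to wait at each stop, which turned out to matter just as much.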
So I changed docker-compose.yml straight to image: 'gitlab/gitlab-ee:14.0.12', but it reported that the image could not be found. Checking GitLab's image tags, I found that the version number is followed by an extra -ee.0. The image name already contains ee, so why write it again in the tag? (ee stands for Enterprise Edition. GitLab recommends that everyone install the Enterprise Edition, even without paying; without a license, the features that belong to the Enterprise Edition simply cannot be used.)
I changed it to image: 'gitlab/gitlab-ee:14.0.12-ee.0' and tried again, and this time it finally succeeded!
Good, keep going and move up to 14.9.0. But the gitlab container wouldn't come up again!
I opened docker logs again to check. This time the screen was full of scrolling text; it seemed to be upgrading the database, but why did it keep restarting?
Another Google search revealed that, starting with version 14.0, GitLab introduced a database migration mechanism: each upgrade must wait until the migrations of the previous version have finished before moving on to the next version, and this migration process can take hours or even days!
It's broken. I must have upgraded too fast: I started upgrading to the next version before the previous one had finished migrating 😭.
What now? My mind was racing. Report a disaster to the boss? Admit the mistake to my colleagues? Tell them I lost all their code?
Calm down. I thought for five minutes and decided to see whether I could downgrade back to 14.0.
So I changed docker-compose.yml back to 14.0, restarted, and prayed that the data was not lost.
14.0 finally started, but when I visited the page, it returned a 500 error. Big trouble now!
Holding back the grief, I opened docker logs and found no clue at all; the output said everything was normal.
But the page was a 500!
Advice online says that if you hit a 500, don't panic; go into the container and take a look. So I used docker exec -it to enter the container and ran gitlab-ctl tail to watch the output. When I refreshed the page, the log reported that a database table called services was missing!
Isn't that ruin all the same? The database was stuck halfway through the upgrade while the code was the old version. What now? I couldn't go up, and I couldn't go back down. This was big trouble.
With no other option, I Googled again and finally found a fellow sufferer. Here is his tearful account:
I upgraded from 13.12 to 14.0.7. I thought the migration was over and everything was fine, so I stopped the container and upgraded to 14.2, but it couldn't start, so I went back to 14.0.7. This time it produced a 500 error, and the log details are as follows:
ActionView::Template::Error (PG::UndefinedTable: ERROR: relation "services" does not exist
And I don't have a backup.
Exactly the same situation I was in. Someone replied that GitLab automatically backs up the database into the backup folder before each upgrade. He said there was nothing there; I went and looked too, and there was nothing.
Fortunately, in the bug report he filed, a fellow Chinese user had solved the problem.
The method is very simple: go up slowly!
Since upgrading quickly causes problems, upgrade slowly: first upgrade to 14.1.1, and only upgrade to 14.2.1 after the migration has finished.
With a last glimmer of hope, I changed docker-compose.yml to 14.1.1 and started it again.
After five minutes of waiting, the container finally started.
I opened a browser, visited the page, and prayed for no more 500.
Phew. I let out a long sigh of relief when I finally saw the familiar page.
But I didn't dare touch anything. Following the prescribed steps, I went to the administrator monitoring page to check the migration progress. Sure enough, 14 tasks were being migrated. Only after these 14 migrations had completed did I start thinking about the next steps.
Finally found the source of the problem
But I still wanted to get to GitLab 14.5. I figured that since I was already standing on 14.1.1 with the migration complete, the rest should not be too difficult. To be on the safe side, I upgraded to 14.2.1 first, and that succeeded too.
I quietly waited for the migration to complete before going up to 14.9.0. It could actually have gone up to 14.10.0, but there was no need for now; 14.9.0 is enough.
So we went back to GitLab, set the environment variable for dind, and built again. No luck: the network MTU was still 1500.
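I searched again for dind MTU problems. The way to set it turned out to be the same as before: the command setting is enough. Then I compared the MTU reported by docker network inspect bridge with the MTU reported by ifconfig. docker network already showed 1400, but ifconfig still showed 1500. What did this mean?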
I vaguely felt that the MTU of docker's bridge network might not need to match the outside. In any case, the MTU inside the container was now 1400 and could go even lower, yet I still couldn't reach our server. I also tried ping www.baidu.com and it got through; only the ping to our intranet server failed.
I was too tired, so I took a nap first.
After waking up, lying in bed, I started thinking it through: if I don't add the dind service, the ping works; if I add it, the ping fails. That means the dind service has modified my network configuration.
I noticed that without dind, my container has two network interfaces, eth0 and the loopback interface (localhost), and everything is normal. But once I add the dind service, the container has three interfaces, with one more called docker0. Could it be that my ping requests only work over eth0, not over docker0, and that once the dind service is added, all ping requests automatically go out through docker0? Could I force ping to use eth0?
I tried again with ping -I eth0 -c 10 gitlab.mydomain.com, and this time it worked!
That showed the problem lay in the docker0 interface. Because of it, requests were being sent out through this interface, causing the network failure.
So why would requests go to this interface? Looking closely at its IP settings, I suddenly saw the problem. docker0 is docker's bridge network.
By default, Docker uses 172.17.0.0/16 subnet range.
And our intranet address, 172.17.111.27, happens to fall right inside this network segment!
Then it all made sense. We had no problems back when GitLab used the public IP, and our container had no problem reaching www.baidu.com; only access to our intranet server was unreliable. Because the intranet server's address falls inside docker's default subnet, requests to it were routed to docker's bridge network and could not reach the intranet server!
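The routing decision behind this can be sketched with Python's standard ipaddress module. The two-entry routing table below is an assumption about what the job container looks like once dind has added docker0, not a dump from our runner, and 203.0.113.10 is just a documentation address standing in for a public host.

```python
import ipaddress

# Hypothetical routing table for the job container after dind adds docker0.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),         # default route
    (ipaddress.ip_network("172.17.0.0/16"), "docker0"),  # Docker's default bridge subnet
]


def pick_interface(destination: str, routes=ROUTES) -> str:
    """Choose the outgoing interface by longest-prefix match, as IP routing does."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, dev) for net, dev in routes if addr in net]
    _, dev = max(matches, key=lambda match: match[0].prefixlen)
    return dev


print(pick_interface("172.17.111.27"))  # docker0: the intranet server looks local to the bridge
print(pick_interface("203.0.113.10"))   # eth0: public addresses still use the default route
```

The /16 route is more specific than the default route, so it wins for every address in 172.17.0.0/16, including our server. Forcing the interface with ping -I eth0 bypasses that choice, which is why it worked.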
The final battle
It is neither feasible nor necessary for us to change the intranet server's address. What we needed to work out was how to change docker's default subnet.
Every post online says to edit the file /etc/docker/daemon.json, but our container has no such file at all, because we start the dind service inside the container. We have to get the modified settings into dind, and GitLab offers very few settings for a service container, so a service's contents cannot easily be modified.
After another intense search, I finally found the answer from another fellow:
variables:
  DAEMON_CONFIG: '{"bip": "192.168.123.1/24"}'
services:
  - name: docker:dind
    entrypoint: ["/bin/sh", "-c", "mkdir -p /etc/docker && echo \"${DAEMON_CONFIG}\" > /etc/docker/daemon.json && exec dockerd-entrypoint.sh"]
The principle is simple: forcibly override the entrypoint of the dind service and write the content we want into daemon.json before the daemon starts. This way the network segment of docker's bridge network docker0 no longer overlaps with our intranet segment, so it should take effect.
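Before committing a new bip value, it is cheap to check that it really avoids the addresses the job must reach. Below is a minimal sketch using only the standard library; the function name daemon_config is hypothetical, and the string it returns is meant to be pasted into the DAEMON_CONFIG variable.

```python
import ipaddress
import json


def daemon_config(bip: str, must_stay_reachable: list) -> str:
    """Build the daemon.json content, refusing a bridge subnet that hides a needed host."""
    bridge = ipaddress.ip_interface(bip)
    for host in must_stay_reachable:
        if ipaddress.ip_address(host) in bridge.network:
            raise ValueError(f"{host} falls inside bridge subnet {bridge.network}")
    return json.dumps({"bip": bip})


# The value from the snippet above, checked against our intranet server.
print(daemon_config("192.168.123.1/24", ["172.17.111.27"]))
# {"bip": "192.168.123.1/24"}
```

Passing Docker's default, 172.17.0.1/16, with the same server address raises ValueError, which is exactly the collision that cost us three months.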
After making this change, we ran the build again. Now ping gitlab.mydomain.com gets through directly, with no need to specify the interface, which shows that the whole network is working normally.
At this point, the problem that had bothered us for three months was finally solved completely: installing a gitlab runner in a kubernetes network and running a docker packaging job.
Looking back on the whole process, we overlooked the biggest variable, the change in the network environment, right at the start, and took many detours as a result. Along the way we learned what MTU is, how to upgrade GitLab, and how Docker's bridge network is configured. We paid a price, but the gain is huge, and from now on we can use GitLab for continuous deployment without any obstacles. 😄