Dockerfile
FROM python:3.10-buster
# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
# To use the Tsinghua mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list)
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*)
WORKDIR /code
RUN mkdir /code/depends
# Download and install Chrome. Tip: dpkg does not resolve dependencies, so install the .deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
COPY install.py /code/
RUN python install.py
RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/
Let's go line by line:
- RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
  This line points apt at Aliyun's Debian mirror. The reason, of course, is the Great Firewall.
- RUN (apt update) && (apt upgrade -y)
  Refreshes the apt package index and upgrades the installed packages. You can keep only apt-get update and delete apt-get upgrade; the latter is not required.
- RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*)
  What are these packages for? They install Chinese fonts; why that matters is described below.
- RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
  Remember to install Chrome with apt, not dpkg.
Fixing Chinese characters that are displayed as squares:
Guides on the Internet will tell you to download a ttf file by hand, copy it into place, and then go through a pile of further steps. I am speechless. Do they really know nothing about Linux?
None of that trouble is needed. When you install a Linux desktop, does it not come with its own Chinese fonts? Did you have to download font files from the Internet yourself?
It's very simple: the apt repository already has packaged fonts, so a single apt command installs them:
apt-get install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*
How to install Chrome in Docker?
People on the Internet like to install Chrome with dpkg, which is a poor choice. They may not know Linux or apt very well.
The right way: install Chrome with apt, because apt automatically resolves the dependencies for you!
Fixing zombie processes in Docker + selenium + chromedriver + chrome:
1 18042 18041 18041 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18046 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18047 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18060 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18062 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18095 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18116 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18117 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18119 18118 18118 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18123 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18124 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18140 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18141 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18171 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18193 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18194 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18196 18195 18195 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18200 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18201 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18216 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18218 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18248 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18271 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18272 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18274 18273 18273 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18278 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18279 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18293 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18295 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18328 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
1 18350 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18351 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18353 18352 18352 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18357 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18358 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18373 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18375 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18406 1 1 ? -1 Z 0 0:01 [chrome] <defunct>
1 18428 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18429 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18431 18430 18430 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18435 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18436 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18450 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18451 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18483 1 1 ? -1 Z 0 0:03 [chrome] <defunct>
1 18507 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18508 1 1 ? -1 Z 0 0:00 [cat] <defunct>
1 18510 18509 18509 ? -1 Z 0 0:00 [chrome_crashpad] <defunct>
1 18514 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18515 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18530 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18532 1 1 ? -1 Z 0 0:00 [chrome] <defunct>
1 18562 1 1 ? -1 Z 0 0:02 [chrome] <defunct>
What is defunct? It marks a zombie process: a process that has exited but has not yet been reaped by its parent.
Too many zombie processes exhaust the PID table, and Chrome then fails with "Chrome failed to start: exited abnormally":
snapshot-consumer | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
snapshot-consumer | (unknown error: DevToolsActivePort file doesn't exist)
snapshot-consumer | (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
snapshot-consumer | Stacktrace:
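To see how many zombies have piled up inside a container before Chrome starts failing, something like the following can be run in the container. This is only a minimal sketch, assuming a Linux procfs mounted at /proc; the function names parse_state and list_zombies are made up for illustration and are not part of the original project.

```python
import os


def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so the state is the first field after the LAST ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def list_zombies(proc_root: str = "/proc") -> list[int]:
    """Return the PIDs of all processes currently in state Z (zombie)."""
    if not os.path.isdir(proc_root):
        return []  # no procfs here (for example, not running on Linux)
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if parse_state(f.read()) == "Z":
                    zombies.append(int(entry))
        except OSError:
            continue  # the process vanished between listdir() and open()
    return zombies


if __name__ == "__main__":
    pids = list_zombies()
    print(f"{len(pids)} zombie processes: {pids}")
```

If the count keeps growing after every browser session, nothing in the container is reaping the orphaned chrome and cat processes.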
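Solution reference:
- An analysis of a large number of zombie processes in a Docker container
- Kubernetes equivalent of docker run --init
Both come down to running an init process as PID 1 that reaps orphaned children.
If you use docker or docker-compose directly, use the first one: docker run --init, or the init: true service option in docker-compose (file format 3.7 and later).
If you are on k8s, use the second one!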
Fixing "session deleted because of page crash", caused by too little shm (shared memory) space
The combination of selenium + chrome + chromedriver needs a lot of shm space, and by default Docker gives a container a /dev/shm of only 64 MB.
A single selenium + chrome + chromedriver instance needs around 20 MB of shm space, so a few instances, or one heavy page, can use it all up.
If you leave the default alone, you will get the following error:
snapshot-consumer | File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
snapshot-consumer | raise exception_class(message, screen, stacktrace)
snapshot-consumer | │ │ │ └ ['#0 0x556b82b0db13 <unknown>', '#1 0x556b8291451f <unknown>', '#2 0x556b8290193d <unknown>', '#3 0x556b82901355 <unknown>', ...
snapshot-consumer | │ │ └ None
snapshot-consumer | │ └ 'unknown error: session deleted because of page crash\nfrom tab crashed\n (Session info: headless chrome=103.0.5060.114)'
snapshot-consumer | └ <class 'selenium.common.exceptions.WebDriverException'>
snapshot-consumer |
snapshot-consumer | selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
snapshot-consumer | from tab crashed
snapshot-consumer | (Session info: headless chrome=103.0.5060.114)
snapshot-consumer | Stacktrace:
snapshot-consumer | #0 0x556b82b0db13 <unknown>
snapshot-consumer | #1 0x556b8291451f <unknown>
snapshot-consumer | #2 0x556b8290193d <unknown>
How to solve it?
version: "3"
services:
  snapshot:
    container_name: snapshot
    image: ponponon/snapshot
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "30m"
        max-file: "1"
    shm_size: "2048M"
    command: python main.py
What is the appropriate size for shm_size? From watching it by eye, usage generally sits around 50 MB, so 512M is already more than enough; the 2048M in the example above leaves far more headroom than needed.
Solution reference: https://developer.aliyun.com/article/833847
How to set shm_size in docker-compose: see https://stackoverflow.com/questions/30210362/how-to-increase-the-size-of-the-dev-shm-in-docker-container
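Instead of judging shm usage by eye, it can be measured from inside the container. The following is a minimal sketch using only the standard library, assuming shared memory is mounted at /dev/shm as it is in a normal Docker container; the helper name shm_usage_mb is made up for illustration.

```python
import os
import shutil


def shm_usage_mb(path: str = "/dev/shm") -> tuple[float, float]:
    """Return (used, total) in MB for the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    mb = 1024 * 1024
    return usage.used / mb, usage.total / mb


if __name__ == "__main__":
    # Only report when /dev/shm actually exists (Linux containers).
    if os.path.isdir("/dev/shm"):
        used, total = shm_usage_mb()
        print(f"/dev/shm: {used:.1f} MB used of {total:.1f} MB")
```

Running it while Chrome is rendering pages shows how close usage gets to the limit, and the total should match the shm_size configured in docker-compose.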
How to get a jpg screenshot
I wrote an open-source tutorial and put it on GitHub: ponponon/snapshot