Dockerfile

 FROM python:3.10-buster

# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list) 
# To use the Tsinghua mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list) 
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y  lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*) 

WORKDIR /code
RUN mkdir /code/depends
# Download and install chrome. TIP: dpkg does not resolve dependencies, so install the deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)


COPY install.py /code/
RUN python install.py

RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/

Let's go through it line by line:

  • RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list) This line points apt at Aliyun's Debian mirror (the Dockerfile above enables the Tsinghua mirror instead; pick either one). The reason, of course, is the Great Firewall
  • RUN (apt update) && (apt upgrade -y) Refreshes the apt package index and upgrades the installed packages. You can keep just apt-get update and delete apt-get upgrade; the latter is not required
  • RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*) What are these packages for? They install Chinese fonts; their role is described below
  • RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb) Downloads and installs chrome. Remember to install the deb with apt, not dpkg

Solve the problem of Chinese characters being displayed as squares:

Many tutorials on the Internet tell you to download a ttf file by hand, copy it into place, and then run a whole series of extra steps. I am speechless; do they really know nothing about Linux?

It does not need to be that troublesome. When you install a Linux desktop with Chinese support, do you download font files from the Internet yourself? Of course not.

It's very simple: the apt repository already has packaged Chinese fonts, so a single apt command installs them!

 apt-get install -y  lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*

How to install chrome in Docker?

Tutorials on the Internet like to install chrome with dpkg, but that is a poor choice: dpkg does not resolve dependencies. Their authors may not know Linux or apt very well.

The right way: use apt to install the chrome deb, because apt will automatically handle the dependencies for you!

Solve the problem of zombie process in Docker + selenium + chromedriver + chrome:

 1   18042   18041   18041 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18046       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18047       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18060       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18062       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18095       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18116       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18117       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18119   18118   18118 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18123       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18124       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18140       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18141       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18171       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18193       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18194       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18196   18195   18195 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18200       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18201       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18216       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18218       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18248       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18271       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18272       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18274   18273   18273 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18278       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18279       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18293       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18295       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18328       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>
      1   18350       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18351       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18353   18352   18352 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18357       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18358       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18373       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18375       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18406       1       1 ?             -1 Z        0   0:01 [chrome] <defunct>
      1   18428       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18429       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18431   18430   18430 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18435       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18436       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18450       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18451       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18483       1       1 ?             -1 Z        0   0:03 [chrome] <defunct>
      1   18507       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18508       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
      1   18510   18509   18509 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
      1   18514       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18515       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18530       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18532       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
      1   18562       1       1 ?             -1 Z        0   0:02 [chrome] <defunct>

What is defunct? It's a zombie process!

Too many zombie processes will exhaust the pid table, and chrome then fails to launch with "Chrome failed to start: exited abnormally":

 snapshot-consumer    | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
snapshot-consumer    |   (unknown error: DevToolsActivePort file doesn't exist)
snapshot-consumer    |   (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
snapshot-consumer    | Stacktrace:
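
To see how quickly zombies are piling up before the pid table runs out, the ps listing can be tallied per command. The following is only a minimal sketch: it assumes ps-style lines shaped like the listing above (the last two fields being "[command] <defunct>"), and count_zombies is a hypothetical helper name, not part of the snapshot project.

```python
from collections import Counter


def count_zombies(ps_output: str) -> Counter:
    """Count defunct (zombie) processes per command name in ps-style output."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # A zombie line ends with "<defunct>", preceded by "[command]".
        if len(fields) >= 2 and fields[-1] == "<defunct>":
            counts[fields[-2].strip("[]")] += 1
    return counts


# Three lines copied from the listing above, used as sample input.
sample = """\
1   18042   18041   18041 ?             -1 Z        0   0:00 [chrome_crashpad] <defunct>
1   18046       1       1 ?             -1 Z        0   0:00 [chrome] <defunct>
1   18116       1       1 ?             -1 Z        0   0:00 [cat] <defunct>
"""

if __name__ == "__main__":
    for command, number in count_zombies(sample).items():
        print(f"{command}: {number}")
```

In a running container, the same function could be fed the captured output of ps instead of the sample string.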

Solution references:

If you run the container directly with docker or docker-compose, use the first one.

If you run it on k8s, use the second one!
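
Whatever the exact fix, the underlying idea is the same. The listing above shows the defunct processes with parent PID 1, which means the process running as PID 1 in the container inherited them and never called wait() on them. A common remedy is to run a small init process as PID 1 (for example, docker run --init). The Python sketch below only illustrates what such an init does; reap_children and install_reaper are hypothetical helper names, and this is not necessarily what either reference recommends.

```python
import os
import signal


def reap_children() -> list:
    """Collect every child that has already exited, without blocking.

    Returns the list of reaped pids.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def install_reaper() -> None:
    """Reap exited children whenever SIGCHLD arrives (call from the main thread)."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
```

One caveat with this approach: waitpid(-1, ...) also collects children started through the subprocess module, so their exit codes may no longer be available to the code that launched them. A dedicated init process avoids that problem, which is one reason it is usually preferred over reaping inside the application.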

Solve "session deleted because of page crash", caused by too little shm (shared memory) space:

The combination of selenium + chrome + chromedriver requires a lot of shm space, and by default Docker gives a container a /dev/shm of only 64 MB.

A single selenium + chrome + chromedriver instance requires around 20 MB of shm space.

If you leave it alone, you will get the following error:

 snapshot-consumer    |   File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
snapshot-consumer    |     raise exception_class(message, screen, stacktrace)
snapshot-consumer    |           │               │        │       └ ['#0 0x556b82b0db13 <unknown>', '#1 0x556b8291451f <unknown>', '#2 0x556b8290193d <unknown>', '#3 0x556b82901355 <unknown>', ...
snapshot-consumer    |           │               │        └ None
snapshot-consumer    |           │               └ 'unknown error: session deleted because of page crash\nfrom tab crashed\n  (Session info: headless chrome=103.0.5060.114)'
snapshot-consumer    |           └ <class 'selenium.common.exceptions.WebDriverException'>
snapshot-consumer    | 
snapshot-consumer    | selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
snapshot-consumer    | from tab crashed
snapshot-consumer    |   (Session info: headless chrome=103.0.5060.114)
snapshot-consumer    | Stacktrace:
snapshot-consumer    | #0 0x556b82b0db13 <unknown>
snapshot-consumer    | #1 0x556b8291451f <unknown>
snapshot-consumer    | #2 0x556b8290193d <unknown>

How to solve it? Increase shm_size in the docker-compose file:

version: "3"
services:
  snapshot:
    container_name: snapshot
    image: ponponon/snapshot
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "30m"
        max-file: "1"
    shm_size: "2048M"
    command: python main.py

What is the appropriate value for shm_size? From watching the actual usage, it is generally around 50 MB, so 512M is already more than enough (the example above sets 2048M, which leaves even more headroom).
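
Rather than watching by eye, the usage can be sampled from inside the container. This is a minimal sketch, assuming shared memory is mounted at /dev/shm (the usual location on Linux); shm_usage_mb is a hypothetical helper name, not part of the snapshot project.

```python
import os
import shutil


def shm_usage_mb(path: str = "/dev/shm") -> tuple:
    """Return (used_mb, total_mb) for the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    mb = 1024 * 1024
    return usage.used / mb, usage.total / mb


if __name__ == "__main__":
    # Only report when the mount point exists (it may not outside Linux).
    if os.path.isdir("/dev/shm"):
        used, total = shm_usage_mb()
        print(f"/dev/shm: {used:.1f} MB used of {total:.1f} MB")
```

Calling this periodically while chrome is rendering pages gives a rough peak figure to size shm_size against.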

Solution: https://developer.aliyun.com/article/833847

How to set shm-size in docker-compose: see https://stackoverflow.com/questions/30210362/how-to-increase-the-size-of-the-dev-shm-in-docker-container

How to get a jpg screenshot

Reference: Does JPG or PNG have anything to do with memory structure? Or does it only make a difference when saving to hard disk?


I made an open-source tutorial project and put it on GitHub: ponponon/snapshot


universe_king