
1. Review & Preface

After the previous articles, you should have a solid impression of Docker, so this one gets straight to the point:

  • How to stuff a headless browser like Puppeteer into Docker

2. Puppeteer directory structure

Following on from before, our service is still adapted from the earlier Node.js service, so jsliang took a basic Node.js + TypeScript service he had written himself.

Its directory structure is as follows:

docker-puppeteer

Docker-demo-26.jpg

It only takes 2 steps to start this demo:

  • Install dependencies: npm i
  • Start the service: npm run robot-test

At second 0 of every minute, the service launches Puppeteer and saves a screenshot to src/source.

The key code is:

src/index.ts
// ... code omitted
console.log('Hello, the program has started');
let time = 0;
await schedule.scheduleJob('0 * * * * *', async () => {
  const browser = process.env.NODE_ENV === 'production' ?
    // Production (inside Docker): Chromium's sandbox must be disabled
    await puppeteer.launch({
      args: ['--no-sandbox', '--disable-setuid-sandbox'],
    }) :
    // Non-production: anything goes
    await puppeteer.launch({
      headless: false, // non-headless mode
      devtools: true, // debug mode, console output is visible in DevTools
    });
  
  // Open a new tab and navigate to the page
  const page = await browser.newPage();
  await page.goto('https://www.baidu.com/s?wd=jsliang');

  // Wait 5 seconds for the page to load
  await page.waitForTimeout(5 * 1000);

  // Take a screenshot and save it locally
  await page.screenshot({
    path: `./src/source/baidu_${++time}.png`,
  });

  // Close the browser
  await browser.close();
});
// ... code omitted
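
Assuming schedule is the node-schedule package (as the scheduleJob call suggests), the rule '0 * * * * *' is a six-field cron expression whose first field is the second, so the job fires whenever the second is 0. As a rough illustration of that timing only, here is a small sketch; the helper nextMinuteBoundary is hypothetical and is not part of the demo or of node-schedule:

```typescript
// Hypothetical helper, for illustration only: returns the next instant
// strictly after `now` at which the second (and millisecond) is 0.
function nextMinuteBoundary(now: Date): Date {
  const MINUTE_MS = 60 * 1000;
  // Round down to the start of the current minute, then step one minute ahead.
  const startOfMinute = Math.floor(now.getTime() / MINUTE_MS) * MINUTE_MS;
  return new Date(startOfMinute + MINUTE_MS);
}

// A service started at 10:15:30 would take its first screenshot at 10:16:00.
const started = new Date('2021-03-01T10:15:30.000Z');
console.log(nextMinuteBoundary(started).toISOString()); // 2021-03-01T10:16:00.000Z
```

So after npm run robot-test you may wait up to a minute before the first browser window appears.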

If you are interested, pause here, open the repository, and try the demo first. If not, just keep reading.

Now let's look at the most important part:

src/index.ts
const browser = process.env.NODE_ENV === 'production' ?
  // Production (inside Docker): Chromium's sandbox must be disabled
  await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  }) :
  // Non-production: anything goes
  await puppeteer.launch({
    headless: false, // non-headless mode
    devtools: true, // debug mode, console output is visible in DevTools
  });

If we build the Node.js service with Docker as-is, Puppeteer cannot start normally, so we need to:

  1. Set up the Dockerfile
  2. Adjust how puppeteer.launch is called
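
The environment switch can also be pulled out into a small function, which keeps the branching in one place and makes it easy to check. This is only a sketch: getLaunchOptions and LaunchOptionsSketch are hypothetical names, and the interface is a local stand-in for the handful of Puppeteer launch options the demo uses, not Puppeteer's own type.

```typescript
// Local stand-in for the subset of Puppeteer launch options used in the demo,
// so the sketch type-checks without importing puppeteer.
interface LaunchOptionsSketch {
  args?: string[];
  headless?: boolean;
  devtools?: boolean;
}

// Hypothetical helper: picks launch options from the value of NODE_ENV.
function getLaunchOptions(nodeEnv: string | undefined): LaunchOptionsSketch {
  if (nodeEnv === 'production') {
    // Inside the container, start Chromium with its sandbox disabled.
    return { args: ['--no-sandbox', '--disable-setuid-sandbox'] };
  }
  // Anywhere else, show the browser window and open DevTools for debugging.
  return { headless: false, devtools: true };
}

console.log(getLaunchOptions('production'));
console.log(getLaunchOptions('test'));
```

In the demo this would be used as puppeteer.launch(getLaunchOptions(process.env.NODE_ENV)).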

Here are the relevant scripts in our package.json:

package.json
"scripts": {
 "robot": "cross-env NODE_ENV=production ts-node ./src/index.ts robot",
 "robot-test": "cross-env NODE_ENV=test ts-node ./src/index.ts robot"
},

Therefore, with launch configured this way, we only need to run npm run robot in the production environment, and Puppeteer starts with the sandbox-disabling flags (--no-sandbox and --disable-setuid-sandbox).
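
Both scripts also pass a trailing robot argument to src/index.ts. The omitted code does not show how that argument is consumed, so the following is only a guess at the general shape: getCommand is a hypothetical helper that reads the first user argument from an array laid out like process.argv (runtime, script path, then user arguments).

```typescript
// Hypothetical helper: extracts the command name from an argv array shaped
// like process.argv, i.e. [runtime, script, ...userArgs].
function getCommand(argv: string[]): string | undefined {
  // The first two entries are the runtime and the script path.
  return argv.slice(2)[0];
}

console.log(getCommand(['node', './src/index.ts', 'robot'])); // robot
```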

3. Write Dockerfile

Without further ado, let's write the Dockerfile directly:

Dockerfile
# Official docs: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-puppeteer-in-docker
# A minimal Docker image based on Alpine Linux, with a complete package index and only 5 MB in size!
FROM alpine:edge

# Set the directory in which CMD runs, i.e. cd into it first
WORKDIR /home/docker/we_render

# Install the latest Chromium (89) package and its dependencies
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      nodejs \
      yarn

# Skip Puppeteer's automatic Chromium download and use the Chromium installed above
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Puppeteer v6.0.0 pairs with Chromium 89
RUN yarn add puppeteer@6.0.0

# Copy the host's files into the we_render directory in the container
COPY . /home/docker/we_render

# Use yarn to set the Taobao registry, install packages, and clean the cache
RUN yarn config set registry 'https://registry.npm.taobao.org' && \
    yarn global add pm2 && \
    yarn install && \
    yarn cache clean

# Declare the service port the container exposes
EXPOSE 9527

# Startup command for the container's main process
CMD ["yarn", "run", "robot"]

After that, you only need to build the image and create the container step by step.
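
The Dockerfile above points Puppeteer at the system Chromium through PUPPETEER_EXECUTABLE_PATH. If that path is wrong for the Chromium package that actually gets installed, the failure only shows up when the first job runs. One possible safeguard, sketched here under assumptions, is to check the path at startup; resolveChromiumPath is a hypothetical helper, and the file-existence check is passed in as a function so the sketch does not depend on Node.js typings.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the configured Chromium path, or throws a
// descriptive error if the variable is unset or the file is missing.
function resolveChromiumPath(env: Env, exists: (path: string) => boolean): string {
  const path = env['PUPPETEER_EXECUTABLE_PATH'];
  if (!path) {
    throw new Error('PUPPETEER_EXECUTABLE_PATH is not set');
  }
  if (!exists(path)) {
    throw new Error(`Chromium not found at ${path}`);
  }
  return path;
}

// Demo with a fake file system that only contains the path from the Dockerfile.
const fakeExists = (p: string): boolean => p === '/usr/bin/chromium-browser';
console.log(
  resolveChromiumPath({ PUPPETEER_EXECUTABLE_PATH: '/usr/bin/chromium-browser' }, fakeExists),
);
```

In a real Node.js process this could be called as resolveChromiumPath(process.env, fs.existsSync), with the result passed to puppeteer.launch as executablePath.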

4. Start the service

Note: It is strongly recommended to switch to a registry mirror first, otherwise downloads will be very slow. At the office it was fine, but on my own network the build took a very long time (3000s).

How to change the mirror: see 03 - Getting Started & Concept

  • Build the image: docker image build ./ -t docker-node:1.0.0

Docker-demo-27.jpg

  • Create the container: docker container create -p 3333:80 docker-node:1.0.0
  • Start the container: docker restart dd420fc4267ad3bdb9eadfdbf37d89e2592dbc9d030a501b96fe10b07ac565ff
  • Check the container's running status: docker ps -a
  • View the container's logs: docker logs -f dd420fc4267a
  • Enter the container: docker exec -it dd420fc4267a bash
  • Go to the screenshot directory: cd src/source
  • List the directory contents: ls

As you can see, we already have a few screenshots:

Docker-demo-28.jpg

Copy the files out of the container using the method from the earlier article and take a look: docker cp f5000c4a530b:/home/docker/we_render/src/source E:\MyWeb\all-for-one

Docker-demo-29.png

I have no idea what the "???" in the screenshot is about, but at any rate it works!

With that, we have successfully packed Puppeteer into Docker. All that remains is setting the time zone and hosts, which I won't go into here.

That brings our Docker journey to a close. Interested readers are welcome to nudge me for updates. The follow-up workflow (write code in a local Git repository, push it to GitHub, run CI/CD, and deploy to the server...) will have to wait until jsliang has time to write more!

I am jsliang, a lifelong-learning "slash" programmer who is full of curiosity, loves to tinker, and is keen to broaden his knowledge. Let's tinker and explore together!


A front-end developer who doesn't tinker is no different from a salted fish!

If you found the article useful, a like or a star is welcome.

If you need to contact jsliang:

Contact details are on the GitHub profile page. You are welcome to tinker along!


jsliang's documentation repository is licensed by Liang under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Based on works at https://github.com/LiangJunrong/document-library. Permissions beyond the scope of this license may be obtained from https://creativecommons.org/licenses/by-nc-sa/2.5/cn/1622e8c9262d41.
