puppeteer

What is Puppeteer

Puppeteer is a Node library that provides a high-level API for controlling Chrome or Chromium over the DevTools Protocol. By default, Puppeteer runs in headless mode, but it can also be configured to run Chrome or Chromium with a GUI.

Readers who are familiar with crawlers or UI automation may think of PhantomJS, CasperJS, or Selenium. Since Puppeteer is developed and maintained by the Chrome DevTools team itself, it is well placed to outdo those tools in functional completeness, stability, security, and performance.

The role of Puppeteer

In theory, anything we can do manually in Chrome can also be done with Puppeteer. For example:

  • Take screenshots of pages and elements
  • Save the page as PDF
  • Crawl the content of SPA (Single-Page Application) websites and generate pre-rendered content, i.e. SSR (Server-Side Rendering)
  • Automate UI testing: fill in and submit forms, simulate UI input
  • Test the latest JavaScript and Chrome features
  • Run performance tests: generate a timeline trace to locate website performance problems
  • Test Chrome extensions
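
As one illustration of the performance-testing item above, here is a minimal sketch that records a timeline trace around a single navigation using page.tracing. The helper name traceNavigation and the default file name trace.json are assumptions made for this example; page is expected to be a Puppeteer page obtained from browser.newPage().

```javascript
// Sketch: record a timeline trace around one navigation.
// `traceNavigation` and the default file name are hypothetical names.
async function traceNavigation(page, url, tracePath = "trace.json") {
  // Start recording; the trace is written to `tracePath` when tracing stops.
  await page.tracing.start({ path: tracePath, screenshots: true });
  try {
    await page.goto(url, { waitUntil: "load" });
  } finally {
    // Always stop tracing so the trace file is finalized even if navigation fails.
    await page.tracing.stop();
  }
  return tracePath;
}
```

The resulting file can be loaded into the Performance panel of Chrome DevTools for inspection.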

Of course, Puppeteer is not omnipotent. For example, it is weak in cross-browser coverage: besides Chromium, currently only Firefox is supported, and only experimentally. If you need to test a website for browser compatibility, Selenium/WebDriver is still the tool to choose. Puppeteer focuses on tight integration with Chromium in order to provide richer and more reliable functionality.

Install Puppeteer

npm i puppeteer

or

yarn add puppeteer

During installation, Puppeteer downloads the latest version of Chromium (~170MB Mac, ~282MB Linux, ~280MB Win) so that the installed Puppeteer version is fully compatible with the browser it drives. The Chromium download can also be skipped, or a different Chromium version can be downloaded to a specific path. Both are configured through environment variables; refer to Environment variables.

puppeteer-core

puppeteer-core is a lightweight version of puppeteer. Chromium will not be downloaded by default. Instead, you need to choose to use local or remote Chrome.

npm i puppeteer-core

or

yarn add puppeteer-core

When using puppeteer-core, you need to make sure that its version is compatible with the version of Chrome it connects to.

puppeteer-core ignores all PUPPETEER_* environment variables.
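
Because puppeteer-core does not ship a browser, the script has to say where Chrome comes from. The following is a minimal sketch of one way to do that: connect to an already running Chrome if a WebSocket endpoint is given, otherwise launch a local binary. The helper names and the CHROME_WS_ENDPOINT and CHROME_PATH variables are assumptions made for this example and are not read by puppeteer-core itself; only the executablePath and browserWSEndpoint options belong to the Puppeteer API.

```javascript
// Sketch: decide how puppeteer-core should obtain a browser.
// CHROME_WS_ENDPOINT and CHROME_PATH are hypothetical variable names chosen
// for this example; puppeteer-core itself does not read them.
function resolveBrowserTarget(env = process.env) {
  if (env.CHROME_WS_ENDPOINT) {
    // Attach to a Chrome instance that is already running (possibly remote).
    return {
      method: "connect",
      options: { browserWSEndpoint: env.CHROME_WS_ENDPOINT },
    };
  }
  if (env.CHROME_PATH) {
    // Start a locally installed Chrome or Chromium binary.
    return { method: "launch", options: { executablePath: env.CHROME_PATH } };
  }
  throw new Error("Set CHROME_WS_ENDPOINT or CHROME_PATH");
}

// `puppeteer` is the module returned by require("puppeteer-core").
async function openBrowser(puppeteer, env = process.env) {
  const { method, options } = resolveBrowserTarget(env);
  return method === "connect"
    ? puppeteer.connect(options)
    : puppeteer.launch(options);
}
```

A script would then obtain its browser with openBrowser(require("puppeteer-core")) and continue exactly as with the full puppeteer package.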

For a detailed comparison of puppeteer and puppeteer-core, please refer to: puppeteer vs puppeteer-core .

Usage example

Example 1 - Visit https://example.com and take a screenshot of the web page

Create screenshot.js

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");
  await page.screenshot({ path: "example.png" });

  await browser.close();
})();

Execute screenshot.js

node screenshot.js

Preview of the generated image:

screenshot

Puppeteer's initial viewport size is 800x600px, which also makes the screenshot 800x600px. We can use page.setViewport() to change the viewport size, for example to 1080p:

await page.setViewport({
  width: 1920,
  height: 1080,
});

To capture the entire scrollable page rather than only the visible viewport, use:

await page.screenshot({ fullPage: true });
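
Screenshots can also be limited to a single element, as mentioned in the feature list. The following is a minimal sketch, assuming page is an open Puppeteer page; the helper name screenshotElement is hypothetical, while page.$() and the element handle's screenshot() method are part of the Puppeteer API.

```javascript
// Sketch: screenshot one element instead of the whole viewport.
// `screenshotElement` is a hypothetical helper name.
async function screenshotElement(page, selector, path) {
  // page.$() resolves to null when nothing matches the selector.
  const element = await page.$(selector);
  if (!element) {
    throw new Error(`No element matches selector: ${selector}`);
  }
  // The image is cropped to the element's bounding box.
  await element.screenshot({ path });
  return path;
}
```

For instance, calling screenshotElement(page, "h1", "heading.png") after page.goto() would save only the first h1 of the page.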

Example 2 - Visit https://github.com/puppeteer/puppeteer and save the web page as a PDF file

Create savePDF.js

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({
    width: 1920,
    height: 1080,
  });
  await page.goto("https://github.com/puppeteer/puppeteer", {
    waitUntil: "networkidle2",
  });
  await page.pdf({
    path: "puppeteer.pdf",
    format: "a2",
  });

  await browser.close();
})();

Execute savePDF.js

node savePDF.js

Preview of the generated PDF:

savePDF

For more PDF generation options, please refer to Page.pdf().
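
As a small illustration of those options, the sketch below collects a few commonly used ones in one place. The helper names buildPdfOptions and savePdf and the chosen default values are assumptions for this example; format, printBackground, margin, and landscape are options of page.pdf().

```javascript
// Sketch: a reusable set of page.pdf() options.
// `buildPdfOptions`, `savePdf`, and the defaults below are hypothetical choices.
function buildPdfOptions(path, overrides = {}) {
  return {
    path,
    format: "a4",
    // CSS backgrounds are omitted from the PDF unless this is enabled.
    printBackground: true,
    margin: { top: "1cm", right: "1cm", bottom: "1cm", left: "1cm" },
    // Caller-supplied values win over the defaults above.
    ...overrides,
  };
}

async function savePdf(page, path, overrides) {
  await page.pdf(buildPdfOptions(path, overrides));
}
```

Calling savePdf(page, "puppeteer.pdf", { landscape: true }) in place of the page.pdf() call above would produce a landscape A4 document with backgrounds included.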

Example 3 - Execute JS code in the context of the browser

Create get-dimensions.js

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  // Get the "viewport" of the page, as reported by the page.
  const dimensions = await page.evaluate(() => {
    return {
      width: document.documentElement.clientWidth,
      height: document.documentElement.clientHeight,
      deviceScaleFactor: window.devicePixelRatio,
    };
  });

  console.log("Dimensions:", dimensions);

  await browser.close();
})();

Execute get-dimensions.js

node get-dimensions.js

Execution result:

evaluate

For more on evaluate, please refer to Page.evaluate().
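
One detail worth keeping in mind: the function passed to page.evaluate() is serialized and executed inside the page, so it cannot see variables from the surrounding Node.js scope. Values have to be passed in as extra arguments instead. The following is a minimal sketch of that pattern; the helper name countMatches is hypothetical.

```javascript
// Sketch: pass a Node.js value into the browser context as an argument.
// `countMatches` is a hypothetical helper name.
async function countMatches(page, selector) {
  // `selector` is forwarded as the argument `sel`; referring to the outer
  // `selector` variable inside the callback would fail in the browser.
  return page.evaluate(
    (sel) => document.querySelectorAll(sel).length,
    selector
  );
}
```

The return value travels the other way under the same constraint: it must be serializable, which is why the sketch returns a number rather than the matched DOM nodes.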

Example 4 - Automatically fill in a form and submit it (enter the keyword Headless Chrome and search on https://developers.google.com)

Create search.js

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // run with a GUI
  });
  const page = await browser.newPage();
  await page.goto("https://developers.google.com/web/");
  // Type the keyword into the search box
  await page.type(".devsite-search-field", "Headless Chrome");
  // Press the Enter key
  await page.keyboard.press("Enter");
  // Wait for the results to appear
  const resultsSelector = ".gsc-result .gs-title";
  await page.waitForSelector(resultsSelector);
  // Extract the results from the page
  const links = await page.evaluate((resultsSelector) => {
    const anchors = Array.from(document.querySelectorAll(resultsSelector));
    return anchors.map((anchor) => {
      const title = anchor.textContent.split("|")[0].trim();
      return `${title} - ${anchor.href}`;
    });
  }, resultsSelector);
  // Print the results
  console.log(links.join("\n"));

  await browser.close();
})();

Execute search.js

node search.js

Result display:

search
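
In scripts like the ones above, an exception (for instance, page.waitForSelector() timing out because the page markup has changed) would skip the call to browser.close(). The following is a minimal sketch of a wrapper that closes the browser in a finally block; the helper name withPage is hypothetical, and puppeteer stands for the module returned by require("puppeteer").

```javascript
// Sketch: run a task against a fresh page and always close the browser.
// `withPage` is a hypothetical helper name.
async function withPage(puppeteer, launchOptions, task) {
  const browser = await puppeteer.launch(launchOptions);
  try {
    const page = await browser.newPage();
    // Whatever the task returns becomes the result of withPage().
    return await task(page);
  } finally {
    // Runs on success and on failure alike.
    await browser.close();
  }
}
```

The body of Example 4 could then be passed as the task function, with errors still propagating to the caller after the browser has been closed.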

Debugging skills

Puppeteer offers strong debugging support. Some commonly used techniques are listed below.

1. Turn off "headless" mode: seeing what the browser displays is very helpful for debugging

const browser = await puppeteer.launch({ headless: false });

2. Turn on "slow motion" mode: slowing the browser down makes each operation easier to follow

const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // slow down each Puppeteer operation by 250ms
});
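
To avoid editing the script every time these two switches are needed, they can be derived from the environment. The following is a minimal sketch; the helper name debugLaunchOptions and the PPTR_DEBUG and PPTR_SLOWMO variables are assumptions made for this example and are not recognized by Puppeteer itself.

```javascript
// Sketch: turn debugging-friendly launch options on via the environment.
// PPTR_DEBUG and PPTR_SLOWMO are hypothetical variable names.
function debugLaunchOptions(env = process.env) {
  if (!env.PPTR_DEBUG) {
    // Nothing set: keep Puppeteer's defaults (headless, full speed).
    return {};
  }
  return {
    headless: false,
    // Fall back to 250ms when PPTR_SLOWMO is missing or not a number.
    slowMo: Number(env.PPTR_SLOWMO) || 250,
  };
}
```

The result is passed straight to puppeteer.launch(debugLaunchOptions()), so a normal run stays headless and a run with PPTR_DEBUG=1 opens a visible, slowed-down browser.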

3. Monitor the output in the browser console

page.on("console", (msg) => console.log("PAGE LOG:", msg.text()));

await page.evaluate(() => console.log(`url is ${location.href}`));
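
Console messages are not the only useful signal. The page also emits pageerror for uncaught exceptions and requestfailed for failed requests. The following is a minimal sketch that forwards all three to a logger; the helper name attachDebugListeners and the output format are assumptions made for this example.

```javascript
// Sketch: forward page-side diagnostics to the Node.js console.
// `attachDebugListeners` and the line format are hypothetical choices.
function attachDebugListeners(page, log = console.log) {
  // Messages written with console.* inside the page.
  page.on("console", (msg) => log(`[console.${msg.type()}] ${msg.text()}`));
  // Uncaught exceptions thrown by scripts running in the page.
  page.on("pageerror", (err) => log(`[pageerror] ${err.message}`));
  // Requests that failed, e.g. because of DNS errors or aborted connections.
  page.on("requestfailed", (req) => {
    const failure = req.failure(); // null if no failure details are available
    log(`[requestfailed] ${req.url()} ${failure ? failure.errorText : ""}`);
  });
}
```

Calling attachDebugListeners(page) right after browser.newPage() ensures that messages emitted during the first navigation are captured as well.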

4. Use debugger in code executed in the browser

There are two execution contexts: the Node.js context that runs the test code and the browser context that runs the code under test. We can use page.evaluate() to insert a debugger statement into the browser context:

  • First, set {devtools: true} when launching:

    const browser = await puppeteer.launch({ devtools: true });
  • Then insert a debugger statement through evaluate(), so that Chromium pauses when it reaches this step:

    await page.evaluate(() => {
      debugger;
    });

5. Enable verbose logging: internal DevTools protocol traffic is logged through the debug module

Basic usage:

DEBUG=puppeteer:* node screenshot.js

On Windows, cross-env can be used:

npx cross-env DEBUG=puppeteer:* node screenshot.js

Protocol traffic can be quite noisy, so we can filter out all Network domain messages:

env DEBUG=puppeteer:\* env DEBUG_COLORS=true node ./examples/screenshot.js 2>&1 | grep -v '"Network'

6. Debug with the ndb tool; for details, please refer to ndb

Resource links

  1. Puppeteer official website
  2. API documentation
  3. Usage examples
  4. GitHub - Awesome Puppeteer
  5. Troubleshooting

Demo link of this article: https://github.com/MudOnTire/puppeteer-tutorial


CodeSteppe