What is Puppeteer
Puppeteer is a Node library that provides a high-level API for controlling Chrome or Chromium over the DevTools Protocol. By default, Puppeteer runs headless, but it can also be configured to run Chrome or Chromium with a full GUI.
Readers familiar with crawlers or UI automation may think of PhantomJS, CasperJS, or Selenium. Because Puppeteer is built and maintained by the Chrome DevTools team itself, it tends to outdo those tools in feature completeness, stability, security, and performance.
The role of Puppeteer
In theory, anything we can do manually in Chrome can also be done with Puppeteer. For example:
- Take screenshots of pages and elements
- Save the page as PDF
- Crawl the content of an SPA (Single-Page Application) and generate pre-rendered content, i.e. SSR (Server-Side Rendering)
- UI automation testing, auto-filling/submitting forms, simulating UI input
- Test the latest JavaScript and Chrome features
- Performance testing: capture a timeline trace to locate website performance problems
- Test Chrome extensions
Of course, Puppeteer is not omnipotent. It is weak on cross-browser coverage: beyond Chrome and Chromium, currently only Firefox is supported, and only experimentally. If you need to test a website for browser compatibility, you still have to choose a tool such as Selenium/WebDriver. Puppeteer focuses on tight integration with Chromium in order to provide richer and more reliable functionality.
Install Puppeteer
npm i puppeteer
or
yarn add puppeteer
Installing puppeteer also downloads the latest version of Chromium (~170MB on Mac, ~282MB on Linux, ~280MB on Windows), so that the installed Puppeteer version is fully compatible with the bundled browser. You can skip the Chromium download, or download a different Chromium version to a specific path. Both are configured through environment variables; refer to Environment variables.
puppeteer-core
puppeteer-core is a lightweight version of puppeteer. It does not download Chromium by default; instead, you point it at a local or remote Chrome.
npm i puppeteer-core
or
yarn add puppeteer-core
When using puppeteer-core, make sure its version is compatible with the Chrome version it connects to. puppeteer-core also ignores all PUPPETEER_* environment variables.
For a detailed comparison of puppeteer and puppeteer-core, refer to: puppeteer vs puppeteer-core.
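Since puppeteer-core ships without a browser, the launch call has to be told which binary to start. The following is a minimal sketch of that idea, not a definitive setup: CHROME_PATH is a hypothetical environment variable invented for this example, and the helper names are illustrative.

```javascript
// Sketch: launch options for puppeteer-core, which does not bundle a browser.
// CHROME_PATH is a hypothetical environment variable chosen for this example.
function resolveLaunchOptions(env = process.env) {
  const executablePath = env.CHROME_PATH;
  if (!executablePath) {
    throw new Error("Set CHROME_PATH to a local Chrome or Chromium binary");
  }
  // executablePath tells launch() which browser binary to start.
  return { executablePath, headless: true };
}

async function launchLocalChrome() {
  // Required lazily so the helper above works without the package installed.
  const puppeteer = require("puppeteer-core");
  return puppeteer.launch(resolveLaunchOptions());
}
```

To attach to a remote browser instead, puppeteer.connect({ browserWSEndpoint }) takes the place of launch().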
Usage example
Example 1 - Visit https://example.com and take a screenshot of the web page
Create screenshot.js
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");
  await page.screenshot({ path: "example.png" });
  await browser.close();
})();
Execute screenshot.js
node screenshot.js
Preview of the generated image:
Puppeteer's initial viewport is 800x600px, which is also the size of the resulting screenshot. We can use page.setViewport() to change the viewport size, for example to 1080p:
await page.setViewport({
  width: 1920,
  height: 1080,
});
To capture the full scrollable page rather than just the viewport, use:
await page.screenshot({ fullPage: true });
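Besides viewport and full-page captures, the capability list earlier mentions screenshots of individual elements. A minimal sketch of that, assuming `page` is an already opened Puppeteer page; the function name, selector, and output path are illustrative:

```javascript
// Sketch: screenshot a single element instead of the whole page.
// `page` is a Puppeteer Page; selector and outputPath are example values.
async function captureElement(page, selector, outputPath) {
  // page.$() resolves to an ElementHandle, or null when nothing matches.
  const element = await page.$(selector);
  if (!element) {
    return false;
  }
  // ElementHandle.screenshot() crops the capture to the element's box.
  await element.screenshot({ path: outputPath });
  return true;
}
```

Inside Example 1 this could be called as `await captureElement(page, "h1", "heading.png")` after page.goto().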
Example 2 - Visit https://github.com/puppeteer/puppeteer and save the web page as a PDF file
Create savePDF.js
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // setViewport() returns a promise, so wait for it before navigating
  await page.setViewport({
    width: 1920,
    height: 1080,
  });
  await page.goto("https://github.com/puppeteer/puppeteer", {
    waitUntil: "networkidle2",
  });
  await page.pdf({
    path: "puppeteer.pdf",
    format: "a2",
  });
  await browser.close();
})();
Execute savePDF.js
node savePDF.js
Preview of the generated PDF:
For more options of generating PDF, please refer to: Page.pdf() .
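As an illustration of a few of those options, here is a small sketch of a helper that assembles a page.pdf() options object. The helper name and the chosen defaults (A4, 1cm margins) are assumptions for this example, not recommendations from the Puppeteer documentation.

```javascript
// Sketch: build an options object for page.pdf().
// The defaults below are example choices; callers can override any of them.
function buildPdfOptions(path, overrides = {}) {
  return {
    path,
    format: "A4",
    // CSS backgrounds are left out of the PDF unless this is enabled.
    printBackground: true,
    margin: { top: "1cm", right: "1cm", bottom: "1cm", left: "1cm" },
    ...overrides,
  };
}
```

In Example 2 the call would become `await page.pdf(buildPdfOptions("puppeteer.pdf", { landscape: true }))`.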
Example 3 - Execute JS code in the browser context
Create get-dimensions.js
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");
  // Get the "viewport" of the page, as reported by the page.
  const dimensions = await page.evaluate(() => {
    return {
      width: document.documentElement.clientWidth,
      height: document.documentElement.clientHeight,
      deviceScaleFactor: window.devicePixelRatio,
    };
  });
  console.log("Dimensions:", dimensions);
  await browser.close();
})();
Execute get-dimensions.js
node get-dimensions.js
Execution result:
For more on evaluate, refer to Page.evaluate().
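One detail worth illustrating: the function passed to page.evaluate() runs inside the browser, so it cannot see variables from the Node.js script. Values have to be passed as extra arguments, and the return value has to be serializable. A minimal sketch, with an illustrative helper name:

```javascript
// Sketch: pass a Node.js value into the browser context via page.evaluate().
// `page` is a Puppeteer Page; countElements is a name chosen for this example.
async function countElements(page, selector) {
  // `sel` receives a serialized copy of `selector`; the arrow function
  // itself executes in the page, where `document` is available.
  return page.evaluate((sel) => document.querySelectorAll(sel).length, selector);
}
```

With the page from Example 3, `await countElements(page, "a")` would resolve to the number of links on the page.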
Example 4 - Automatically fill in and submit a form (search for the keyword Headless Chrome on https://developers.google.com)
Create search.js
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({
    headless: false, // GUI mode
  });
  const page = await browser.newPage();
  await page.goto("https://developers.google.com/web/");
  // Type the keyword into the search box
  await page.type(".devsite-search-field", "Headless Chrome");
  // Press Enter
  await page.keyboard.press("Enter");
  // Wait for the results to load
  const resultsSelector = ".gsc-result .gs-title";
  await page.waitForSelector(resultsSelector);
  // Extract the results from the page
  const links = await page.evaluate((resultsSelector) => {
    const anchors = Array.from(document.querySelectorAll(resultsSelector));
    return anchors.map((anchor) => {
      const title = anchor.textContent.split("|")[0].trim();
      return `${title} - ${anchor.href}`;
    });
  }, resultsSelector);
  // Print the results
  console.log(links.join("\n"));
  await browser.close();
})();
Execute search.js
node search.js
Result display:
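The waitForSelector() step in this example rejects if the results never appear (the default timeout is 30 seconds), which would abort the script before browser.close() runs. The sketch below shows one way to handle that case. It assumes Puppeteer's timeout errors carry the name "TimeoutError"; the helper name and the 10-second default are illustrative.

```javascript
// Sketch: wait for a selector, reporting a timeout as `false` instead of throwing.
// `page` is a Puppeteer Page; waitForResults is a name chosen for this example.
async function waitForResults(page, selector, timeoutMs = 10000) {
  try {
    await page.waitForSelector(selector, { timeout: timeoutMs });
    return true;
  } catch (err) {
    // Assumption: timeout errors are identified by err.name === "TimeoutError".
    if (err.name === "TimeoutError") {
      return false;
    }
    // Anything else (for example a closed page) is a real failure.
    throw err;
  }
}
```

The caller can then decide whether to retry, log, or close the browser cleanly when the function resolves to false.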
Debugging skills
Puppeteer offers strong debugging support. Some commonly used techniques are listed below.
1. Turn off headless mode: seeing what the browser displays is very helpful for debugging
const browser = await puppeteer.launch({ headless: false });
2. Turn on "slow motion" mode to follow the browser's operations more easily
const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // slow down Puppeteer operations by 250ms
});
3. Monitor the output in the browser console
page.on("console", (msg) => console.log("PAGE LOG:", msg.text()));
await page.evaluate(() => console.log(`url is ${location.href}`));
4. Use debugger in code executed in the browser
There are two execution contexts: the Node.js context that runs the test code and the browser context that runs the code under test. We can use page.evaluate() to insert a debugger statement in the browser context:
First, set {devtools: true} when launching:
const browser = await puppeteer.launch({ devtools: true });
Then insert a debugger statement inside evaluate(), so that Chromium pauses when it reaches this step:
await page.evaluate(() => {
  debugger;
});
5. Enable verbose logging: internal DevTools protocol traffic is logged through the debug module
Basic usage:
DEBUG=puppeteer:* node screenshot.js
On Windows, cross-env can be used:
npx cross-env DEBUG=puppeteer:* node screenshot.js
Protocol traffic can be quite noisy, so we can filter out all Network domain messages:
env DEBUG="puppeteer:*" env DEBUG_COLORS=true node ./examples/screenshot.js 2>&1 | grep -v '"Network'
6. Use the ndb tool for debugging; for details, refer to ndb
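Building on tip 3, console output is not the only page signal worth forwarding to the terminal. The sketch below also listens for uncaught page errors and failed requests. The helper name and log prefixes are illustrative, and `page` is assumed to be a Puppeteer page.

```javascript
// Sketch: forward console output, page errors, and failed requests to a logger.
// attachDebugLogging and the log prefixes are names chosen for this example.
function attachDebugLogging(page, log = console.log) {
  // Messages written with console.* inside the page.
  page.on("console", (msg) => log(`PAGE LOG: ${msg.text()}`));
  // Uncaught exceptions thrown inside the page.
  page.on("pageerror", (err) => log(`PAGE ERROR: ${err.message}`));
  // Requests that failed at the network level.
  page.on("requestfailed", (request) => {
    const failure = request.failure();
    const reason = failure ? failure.errorText : "unknown";
    log(`REQUEST FAILED: ${request.url()} (${reason})`);
  });
}
```

Calling `attachDebugLogging(page)` right after browser.newPage() ensures the listeners are in place before the first navigation.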
Resource link
Demo link of this article: https://github.com/MudOnTire/puppeteer-tutorial