
With the previous five articles as a foundation, it is time to hook the tool library up to some down-to-earth business needs.

This article starts with front-end multi-language support.

I don't know whether you have worked with it before. This time jsliang will share a desensitized version of the multi-language handling from a real project (yes, only the handling, not how to configure multi-language support).

The data used here is fictitious and only loosely modeled on real projects. After all, this series is about a [tool library] rather than building a multi-language project, but with some modification this set of tools can be reused elsewhere, so it still has reference value.

In this article, we will use Puppeteer to drive Chrome/Chromium and download a file.

I. Introduction

Puppeteer is a Node library that provides a high-level API to control Chromium or Chrome through the DevTools protocol.

As the GitHub README puts it: most things you can do manually in the browser can be done with Puppeteer!

  • Take a snapshot of the page
  • Generate page PDF
  • Automatically manipulate page DOM
  • ……

For detailed examples, see the GitHub repository or the Chinese documentation listed in the references at the bottom of this article. I will not go through them one by one here (so nobody can complain that I am just copying the README.md).

II. Puppeteer

  • Installation: npm i puppeteer

jsliang hit an error during installation:

  • (node:7584) ExperimentalWarning: The fs.promises API is experimental

My Node.js version was node@10.16.0, so I needed to upgrade Node.js.

After some research, there are two ways to upgrade: download the latest version and install it over the old one, or manage versions with nvm/nvmw.

jsliang's network is decent, so I simply downloaded the latest version from the Node official website.

Check the version after installation:

  • node -v: v14.17.1

Installing Puppeteer again now succeeds, and package.json shows: "puppeteer": "^10.0.0"
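
A tool library can guard against this kind of surprise by checking the Node.js version at startup. The following is only a minimal sketch: `parseVersion`, `isAtLeast` and `MIN_NODE_VERSION` are hypothetical names, and the minimum of 10.18.1 is an assumption based on the Puppeteer README of that period, so adjust it to whatever your Puppeteer version documents.

```typescript
// Parse "v14.17.1" or "14.17.1" into [14, 17, 1].
const parseVersion = (version: string): number[] =>
  version
    .replace(/^v/, '')
    .split('.')
    .map((part) => parseInt(part, 10) || 0);

// Return true when `current` is greater than or equal to `minimum`.
const isAtLeast = (current: string, minimum: string): boolean => {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const left = a[i] || 0;
    const right = b[i] || 0;
    if (left !== right) {
      return left > right;
    }
  }
  return true;
};

// Assumed minimum; not taken from this article.
const MIN_NODE_VERSION = '10.18.1';

if (!isAtLeast(process.version, MIN_NODE_VERSION)) {
  console.warn(
    `Node ${process.version} is older than ${MIN_NODE_VERSION}; Puppeteer may not install or run correctly.`
  );
}
```

With the versions mentioned above, `isAtLeast('v10.16.0', MIN_NODE_VERSION)` is false and `isAtLeast('v14.17.1', MIN_NODE_VERSION)` is true.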

Various errors can occur while installing Puppeteer. This is where your network speed gets put to the test.

Once the installation is complete, let's start tinkering~

2.1 Grab a snapshot

Let's take a snapshot of the page as a simple example:

src/index.ts
import program from 'commander';
import common from './common';
import './base/console';
import puppeteer from 'puppeteer';

program
  .version('0.0.1')
  .description('工具库')

program
  .command('jsliang')
  .description('jsliang 帮助指令')
  .action(() => {
    common();
  });

program
  .command('test')
  .description('测试频道')
  .action(async () => {
    // Launch the browser
    const browser = await puppeteer.launch({
      headless: false, // Open a real (headed) browser window
    });

    // Create a new tab and open the page
    const page = await browser.newPage();
    await page.goto('https://www.baidu.com/s?wd=jsliang');

    // Take a screenshot and save it locally
    await page.screenshot({
      path: './src/baidu.png',
    });

    // Close the browser
    await browser.close();
  });

program.parse(process.argv);

After running npm run test, baidu.png appears in the src folder and looks like this:

puppeteer-01.png

In my testing, VPN/proxy tools and 360 Security Guard both interfere with this operation. To keep your blood pressure down, make sure such software is closed.

That gives us a first taste of Puppeteer. It can also export PDFs and more; see the [References] at the end of this article to learn more about Puppeteer.

2.2 Download files

Now that we can take screenshots, it is no surprise that we can manipulate the DOM as well. Let's download a file!

Taking Kingsoft Docs as an example, let's create an Excel file first:

puppeteer-02.png

Kingsoft Docs address: https://www.kdocs.cn/. Sign up and log in yourself; I won't walk through that.

Our next task is to download this Excel file (assuming someone has already been hired to do the translation work). It is an Excel file like this:

puppeteer-03.png

The picture comes from the Internet and is used here for reference only; it will be removed if it infringes.

Then let's make a simple one:

puppeteer-04.png

The multi-language content itself is not important; our goal is to drive Puppeteer to fetch this Excel file.

OK, the file is there. How do we download it? The current situation is:

  • If we open it through Puppeteer, the browser behaves almost like an incognito window. With a normal login flow we would have to log in again, enter the link, and then click the button to download.

Therefore, we use a login-free link from Kingsoft Docs here:

puppeteer-05.png

As everyone knows, login-free means you don't need to log in. The explanation sounds silly, but I feel it is necessary...

The demo link above is provided here for practice, but I can't promise I won't delete it someday, so follow the steps above and set up one of your own!

  • Kingsoft Docs "Excel trial": https://www.kdocs.cn/l/sdwvJUKBzkK2

OK, after rambling through so many preconditions, let's get to the point: how to download the file:

  1. Drive the browser to open https://www.kdocs.cn/l/sdwvJUKBzkK2
  2. Sleep 6.66s (make sure the browser has opened the link and loaded the page)
  3. Trigger a click on the [More Menu] button
  4. Sleep 2s (make sure the [More Menu] click has registered)
  5. Set the download path (fix the download location; otherwise the save dialog is hard to handle)
  6. Trigger a click on the [Download] button
  7. Sleep 10s (make sure the file has been downloaded)
  8. Close the window
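
The sleep steps above (2, 4 and 7) boil down to waiting a fixed number of milliseconds. Puppeteer 10 offers page.waitForTimeout for this, but later Puppeteer releases deprecated that method, so a plain Promise-based helper is a safer bet across versions. The `sleep` name below is a hypothetical helper for illustration, not part of Puppeteer or of the original project.

```typescript
// Resolve after `ms` milliseconds; a plain Node.js stand-in for page.waitForTimeout.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(() => resolve(), ms);
  });

// Usage inside an async function, mirroring steps 2, 4 and 7:
//   await sleep(6666);  // wait for the page to load
//   await sleep(2000);  // wait for the menu to open
//   await sleep(10000); // wait for the download to finish
```

Fixed sleeps are simple but fragile: too short and the step fails, too long and every run wastes time, so treat the durations as values to tune for your own network.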

The only step that needs special attention is step 5. When we click download on Windows, a save dialog pops up (instead of the file going straight to the default download folder), so we need to set the download path in advance (you will see this in the code).

puppeteer-06.png

So, code!

src/common/index.ts
import { inquirer } from '../base/inquirer';
import { Result } from '../base/interface';
import { sortCatalog } from './sortCatalog';
import { downLoadExcel } from './downLoadExcel';

const common = (): void => {
  // Question route: see questionList.ts
  const questionList = [
    // q0
    {
      type: 'list',
      message: '请问需要什么服务?',
      choices: ['公共服务', '文件管理']
    },
    // q1
    {
      type: 'list',
      message: '当前公共服务有:',
      choices: ['文件排序']
    },
    // q2
    {
      type: 'input',
      message: '需要排序的文件夹为?(绝对路径)',
    },
    // q3
    {
      type: 'list',
      message: '请问需要什么支持?',
      choices: ['多语言', 'Markdown 转 Word'],
    },
    // q4
    {
      type: 'list',
      message: '请问需要什么支持?',
      choices: [
        '下载多语言资源',
        '导入多语言资源',
        '导出多语言资源',
      ],
    },
    // q5
    {
      type: 'input',
      message: '资源下载地址(HTTP)?',
      default: 'https://www.kdocs.cn/l/sdwvJUKBzkK2',
    }
  ];

  const answerList = [
    // q0
    async (result: Result, questions: any) => {
      if (result.answer === '公共服务') {
        questions[1]();
      } else if (result.answer === '文件管理') {
        questions[3]();
      }
    },
    // q1
    async (result: Result, questions: any) => {
      if (result.answer === '文件排序') {
        questions[2]();
      }
    },
    // q2
    async (result: Result, _questions: any, prompts: any) => {
      const sortResult = await sortCatalog(result.answer);
      if (sortResult) {
        console.log('排序成功!');
        prompts.complete();
      }
    },
    // q3
    async (result: Result, questions: any) => {
      if (result.answer === '多语言') {
        questions[4]();
      }
    },
    // q4
    async (result: Result, questions: any) => {
      if (result.answer === '下载多语言资源') {
        questions[5]();
      }
    },
    // q5
    async (result: Result, _questions: any, prompts: any) => {
      if (result.answer) {
        const downloadResult = await downLoadExcel(result.answer);
        if (downloadResult) {
          console.log('下载成功!');
          prompts.complete();
        }
      }
    },
  ];

  inquirer(questionList, answerList);
};

export default common;

Looking at the code above, I regret it: why did I rework Inquirer.ts into something so ugly? As a result, jsliang also has to write a dedicated file that records the question order and keeps it straight:

src/common/questionList.ts
// Question route for the common module
export const questionList = {
  '公共服务': { // q0
    '文件排序': { // q1
      '需要排序的文件夹': 'Work 工作', // q2
    },
  },
  '文件管理': { // q0
    '多语言': { // q3
      '下载多语言资源': { // q4
        '下载地址': 'Work 工作', // q5
      },
      '导入多语言资源': { // q4
        '导入地址': 'Work 工作',
      },
      '导出多语言资源': { // q4
        '导出全量资源': 'Work 工作',
        '导出单门资源': 'Work 工作',
      }
    },
    'Markdown 转 Word': '暂未支持', // q3
  },
};
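
To sanity-check the route against the q0 to q5 indices, a small helper can flatten the nested object into readable paths. This is only a sketch: `RouteTree`, `listRoutes` and the trimmed `sampleRoutes` are hypothetical names for illustration and are not part of the original tool library.

```typescript
// A route tree: each key is a question or choice, each leaf is a status string.
type RouteTree = { [key: string]: RouteTree | string };

// A trimmed copy of the structure used in questionList.ts.
const sampleRoutes: RouteTree = {
  '公共服务': {
    '文件排序': {
      '需要排序的文件夹': 'Work 工作',
    },
  },
  '文件管理': {
    '多语言': {
      '下载多语言资源': {
        '下载地址': 'Work 工作',
      },
    },
    'Markdown 转 Word': '暂未支持',
  },
};

// Walk the tree depth-first and return one "a > b > c => status" line per leaf.
const listRoutes = (tree: RouteTree, prefix: string[] = []): string[] => {
  const routes: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const path = prefix.concat(key);
    if (typeof value === 'string') {
      routes.push(path.join(' > ') + ' => ' + value);
    } else {
      routes.push(...listRoutes(value, path));
    }
  }
  return routes;
};

console.log(listRoutes(sampleRoutes).join('\n'));
```

Each printed line corresponds to one complete path through the prompts, which makes it easier to spot a branch in answerList that has no follow-up question.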

With that written, let's move on to the download function itself:

src/common/downLoadExcel.ts
import puppeteer from 'puppeteer';
import path from 'path';
import fs from 'fs';

export const downLoadExcel = async (link: string): Promise<boolean> => {
  // Launch the browser
  const browser = await puppeteer.launch({
    headless: false, // Open a real (headed) browser window
    devtools: true, // Open DevTools as well
  });

  // 1. Create a new tab and open the link
  const page = await browser.newPage();
  await page.goto(link);

  // 2. Sleep 6.66s - make sure the page has loaded
  await page.waitForTimeout(6666);

  // 3. Click the [More Menu] button
  const moreBtn = await page.$('.header-more-btn');
  await moreBtn?.click();

  // 4. Sleep 2s - make sure the click has registered
  await page.waitForTimeout(2000);

  // 5. Set the download path
  const dist = path.join(__dirname, './dist');
  if (!fs.existsSync(dist)) {
    fs.mkdirSync(dist);
  }
  // _client is a private Puppeteer property, hence the cast to any
  await (page as any)._client?.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: dist,
  });

  // 6. Click the [Download] button (hard-coded as the 9th menu item)
  const elements = await page.$$('.header-menu-item');
  const downloadBtn = elements[8];
  if (!downloadBtn) {
    console.error('Download button not found');
    await browser.close();
    return false; // Stop here instead of using a closed browser
  }
  await downloadBtn.click();

  // 7. Sleep 10s - make sure the file has been downloaded
  await page.waitForTimeout(10000);

  // 8. Close the browser
  await browser.close();

  return true;
};
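
Step 7 waits a fixed 10 seconds, which is either wasteful or too short depending on the network. One alternative, sketched below, is to poll the download directory until it holds at least one file and none of them still carries the `.crdownload` suffix that Chrome gives to in-progress downloads. `isDownloadFinished` and `waitForDownload` are hypothetical helpers, and the sketch assumes the directory is empty before the download starts.

```typescript
import * as fs from 'fs';

// True when the directory listing has at least one file and no partial downloads.
const isDownloadFinished = (fileNames: string[]): boolean =>
  fileNames.length > 0 &&
  fileNames.every((name) => !/\.crdownload$/.test(name));

// Poll `dir` every `intervalMs` until the download finishes or `timeoutMs` elapses.
const waitForDownload = async (
  dir: string,
  timeoutMs = 10000,
  intervalMs = 500
): Promise<boolean> => {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const names = fs.existsSync(dir) ? fs.readdirSync(dir) : [];
    if (isDownloadFinished(names)) {
      return true;
    }
    await new Promise<void>((resolve) => {
      setTimeout(() => resolve(), intervalMs);
    });
  }
  return false;
};

// Possible usage in place of step 7:
//   const ok = await waitForDownload(dist);
```

If the directory may already contain older files, compare the listing before and after the click instead of only checking that it is non-empty.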

After running this, if the console reports no errors, VS Code shows the following:

puppeteer-07.png

You can see that the common directory now indeed contains dist/Excel trial, so next we can bring in the node-xlsx library to work with the Excel file~

See you next time!

III. References


jsliang's document library is licensed by Liang Junrong under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Based on the works at https://github.com/LiangJunrong/document-library. Permissions beyond the scope of this license may be available at https://creativecommons.org/licenses/by-nc-sa/2.5/cn/.

jsliang