Use puppeteer + nodejs to crawl your favorite anime resources

Background

Recently I suddenly wanted to try video editing, and I decided to start with anime. Editing anime clips requires the original video, so where can these resources be found?
I often watch anime on Six DMs, and the video quality there is decent, so I decided to write a crawler to download the anime I like directly. After all, I work in front-end development, and downloading everything by hand would be a bit embarrassing😄😄😄

Final result

[Screenshot: final result]

What to do when the downloaded file is not .mp4

For example, the Chinchilla I downloaded came in yum format; I simply changed the file extension to .mp4 and it worked. If that still doesn't work, convert the file with Format Factory.
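
The rename can also be done from nodejs. This is only a minimal sketch: `toMp4Name` is a hypothetical helper name, and changing the extension does not convert the video data, it merely renames the file, which according to the text above was enough in this case.

```javascript
const path = require('path');

// Return the same file path with its extension replaced by ".mp4".
// Only the name changes; the file contents are not converted.
function toMp4Name(filePath) {
  const { dir, name } = path.parse(filePath);
  return path.format({ dir, name, ext: '.mp4' });
}

// Possible usage (assumes the file exists in the working directory):
// require('fs').renameSync('chinchilla.yum', toMp4Name('chinchilla.yum'));
```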

The author's skills and environment when writing this article

Front end: knows a little
nodejs: knows nothing
System: Windows
puppeteer version: 14.3.0
node version: 16.1.0

Start analyzing the website: search for a favorite anime

Introduction page
[Screenshot: introduction page]

Click any playback link, press F12, and start analyzing the page

Sure enough, this website plays tricks: as soon as I open the debugging tools, it throws a debugger at me.

How to skip the infinite debugger loop

With the infinite loop skipped this way, we can continue debugging

Find the page playback address

This website is actually quite simple: it drops the resource address straight into an iframe. That lowers the difficulty, and I end up doing their debugging for them 😄😄

Analyze the origin of the playback address

Idea 1: Analyze the interface requests and check whether they have anything in common

After capturing the requests, I found that the playback resource paths of the first and second episodes have nothing in common at all, so I gave up o(╥﹏╥)o

Idea 2: Read the player's source code directly and find the logic that assembles the url

To find the player's source code, list all the js files in the debugging tools and read through them

[Screenshot]

Analyze the player source code

Reading the player source code shows that this website stores a global variable in the page

Global variable: player_aaaa

After the assignment is stored, the page loads a js file named /static/player/parse.js

Open the debugging tools and find the /static/player/parse.js file

Content of the /static/player/parse.js file

Browser console input: MacPlayer.Parse + MacPlayer.PlayUrl
The player source code is a self-executing function, and the parse.js file shows how the resource address is assembled, so we can build the address directly in the browser console
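
The console expression above can be sketched in code as follows. This is an illustration under assumptions: `buildParseUrl` and `readParseUrl` are hypothetical helper names, and `page` is assumed to be a puppeteer page that has already navigated to a playback page where the site's global `MacPlayer` object exists.

```javascript
// Join the two player fields the same way the console expression does.
// `player` stands in for the page's global MacPlayer object.
function buildParseUrl(player) {
  if (!player || typeof player.Parse !== 'string' || typeof player.PlayUrl !== 'string') {
    throw new TypeError('expected an object with string Parse and PlayUrl fields');
  }
  return player.Parse + player.PlayUrl;
}

// Evaluate the same expression inside the page through puppeteer.
// The callback runs in the browser, where MacPlayer is a global set by the site.
async function readParseUrl(page) {
  return page.evaluate(() => MacPlayer.Parse + MacPlayer.PlayUrl);
}
```

The first function is handy for trying the concatenation with values copied from the console; the second is the form that would run in an automated crawl.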

Right click to save?

Open the address above in the browser and it turns out to be the resource we want to download. At this point we could simply right-click and save it. But as a self-respecting front-end developer, how could we right-click and save? Next we bring out the big gun: puppeteer working with nodejs to download automatically
resource demo

How to automate it?

The link above leads to a parsing page. To download automatically we must find the link to the actual video source, otherwise it won't work o(╥﹏╥)o
Inspect the page elements to find the final resource address
final resource address

Use puppeteer to parse the page and get the video resource address, then use nodejs to download the video automatically

Idea 1: Traverse the playlist, start a task that opens the pages in turn and finds each resource address, collect all the playback resource addresses, then download them locally with nodejs

[Screenshot]

Why not use this idea? I wrote the code and tested it, and the site's server couldn't handle it, so it is safer to do one operation at a time

Idea 2: Use puppeteer to trigger the right-click download automatically and save the file where we want it (I haven't tried this method yet)

Idea 3: Traverse the playlist and start a task at the first episode: open the page, find the resource address, and download it locally with nodejs; once the download completes, move on to the next episode
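
Idea 3 boils down to a plain sequential loop. Below is a minimal sketch of that control flow, not the article's full code: `downloadInSequence`, `resolveVideoUrl` and `download` are hypothetical names, and the two callbacks are assumed to return promises.

```javascript
// Process episodes strictly one at a time:
// resolve the video address, download it, and only then move to the next one.
async function downloadInSequence(episodes, resolveVideoUrl, download) {
  const finished = [];
  for (const episode of episodes) {
    // Waiting here keeps only one request in flight against the site.
    const videoUrl = await resolveVideoUrl(episode);
    await download(videoUrl, episode);
    finished.push(episode);
  }
  return finished;
}
```

Using `for...of` with `await` instead of `Promise.all` is what keeps the load on the server to a single operation at a time.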

Idea 3: analysis of the difficult parts

How does puppeteer get the attributes of an element? Don't ask me, I don't understand it either; an expert on stack overflow told me

stack overflow answer

// Get the attribute of a single element
await page.evaluate('document.querySelector("span.styleNumber").getAttribute("data-Color")')
// Get the attribute of multiple elements
const attr = await page.$$eval("span.styleNumber", el => el.map(x => x.getAttribute("data-Color")));
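
The same technique can be wrapped in a small reusable helper. This is a sketch under assumptions: `readAttribute` is a hypothetical name, and the `'video'` selector in the usage line is a guess about the parsing page rather than something confirmed above.

```javascript
// Read one attribute from the first element matching `selector`.
// page.$eval rejects if nothing matches; the result is null when the
// element exists but does not carry the attribute.
async function readAttribute(page, selector, attribute) {
  return page.$eval(selector, (el, name) => el.getAttribute(name), attribute);
}

// Possible usage inside an async function, with a puppeteer page:
// const videoSrc = await readAttribute(page, 'video', 'src');
```

The attribute name is passed as an extra argument because the callback runs in the browser and cannot see variables from the nodejs side.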

Download a remote video with nodejs and display the progress

const fs = require('fs');
// My demo uses axios for the download; this is an axios instance
const axiosRequest = require('./utils/request');

axiosRequest.get('https://media.w3.org/2010/05/sintel/trailer.mp4', {
  responseType: 'stream'
}).then(response => {
  // The content-length response header tells us the video size in bytes
  const totalLength = Number(response.headers['content-length'])
  // Number of bytes received so far
  let totalChunkLength = 0
  // The stream being read
  const readStream = response.data
  // Fired for every chunk read from the stream
  readStream.on('data', (chunk) => {
    totalChunkLength += chunk.length
    console.log('Transferring data, current progress ==>', ((totalChunkLength / totalLength) * 100).toFixed(2) + '%')
  });
  // Fired when reading is complete
  readStream.on('end', () => {
    console.log('Finished fetching remote data')
  });
  // Fired on a read error
  readStream.on('error', (err) => {
    console.log('Failed to fetch remote data, error ==>', err)
  });
  // Local file name to write to
  const fileName = '67.mp4'
  // Pipe the read stream into a nodejs file write stream
  const writeFile = readStream.pipe(fs.createWriteStream(fileName))
  // Fired when writing is complete
  writeFile.on("finish", () => {
    writeFile.close();
    console.log("Congratulations boss, the local file has been written");
  });
  // Fired on a write error
  writeFile.on("error", (err) => {
    console.log("Sorry, writing the local file failed, error ==>", err);
  });
});

// The axios instance (./utils/request) is as follows
const axios = require('axios')

// Create the axios instance
const service = axios.create({
  baseURL: '', // base_url of the api
  // A very generous timeout so that long downloads are not cut off 😄😄
  timeout: 90000000 // request timeout in milliseconds
})

// Request interceptor
service.interceptors.request.use(
  config => {
    return config
  },
  error => {
    // Do something with request error
    console.log(error) // for debug
    return Promise.reject(error)
  }
)

// Response interceptor
service.interceptors.response.use(
  response => {
    return response
  },
  error => {
    return Promise.reject(error)
  }
)

module.exports = service
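
To download episodes one after another, it helps if the write step returns a promise that settles only when the file is complete. The following is a sketch of one way to do that, not the code from the demo: `saveStream` and `onProgress` are hypothetical names, and `pipeline` from `stream/promises` is swapped in for the manual `pipe` plus event handlers.

```javascript
const fs = require('fs');
const { Transform } = require('stream');
const { pipeline } = require('stream/promises');

// Write `readStream` to `filePath` and report progress as a percentage.
// Resolves with the number of bytes written; rejects on a read or write error.
async function saveStream(readStream, filePath, totalLength, onProgress) {
  let received = 0;
  // A pass-through stage that counts bytes without altering them.
  const counter = new Transform({
    transform(chunk, _encoding, callback) {
      received += chunk.length;
      // Skip the percentage when content-length was missing or invalid.
      if (totalLength > 0) onProgress((received / totalLength) * 100);
      callback(null, chunk);
    },
  });
  await pipeline(readStream, counter, fs.createWriteStream(filePath));
  return received;
}
```

With an axios response requested as `responseType: 'stream'`, a call could look like `await saveStream(response.data, '67.mp4', Number(response.headers['content-length']), p => console.log(p.toFixed(2) + '%'))`.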

Full code

Disclaimer

First of all, many thanks to this website for letting me, a longtime anime fan, find my favorite sources😄😄😄
This project is for learning purposes only; there is no intention of crawling this website or doing anything similar to it.
I hope anyone who wants to use it will just experiment on their own; a tall tree catches the wind.
If there is any infringement, please contact me and I will delete it immediately
That's all

References

Another idea for cracking video
puppeteer: get element attributes
puppeteer: download file


qinyuanqiblog
