
Background

  • Recently I got the urge to try video editing, and I wanted to start with anime. As an anime fan, the first thing I need is the original footage to cut from. So where do I find it?
  • I often watch anime on Six DMs, and the video quality there is decent, so I decided to write a crawler to download the shows I like. After all, I work in front-end, and downloading by hand would be a bit embarrassing 😄😄😄

Final result

[screenshot]
[screenshot]
[screenshot]

What to do when the downloaded file is not an .mp4

For example, the copy of "Chinchilla" I downloaded came in yum format, and simply changing the file extension to .mp4 did the trick. If that still does not work, convert the file with Format Factory.
[screenshot]
[screenshot]
[screenshot]
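
Changing the suffix can also be scripted. This is a minimal sketch, assuming the file only needs a new extension and no re-encoding. The function names and the sample file name are made up for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Return the same path with its extension swapped for .mp4.
function toMp4Name(filePath) {
  const { root, dir, name } = path.parse(filePath);
  return path.format({ root, dir, name, ext: '.mp4' });
}

// Rename the file on disk and return the new path.
function renameToMp4(filePath) {
  const target = toMp4Name(filePath);
  fs.renameSync(filePath, target);
  return target;
}

// Example with a hypothetical file name:
// renameToMp4('episode-01.yum');
```

If the renamed file still does not play, the container or codec really is different and a converter is needed, as described above.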

My skills and environment when writing this article

  • I know a little front-end
  • I know nothing about Node.js
  • Windows
  • puppeteer version: 14.3.0
  • node version: 16.1.0

Analyzing the website: search for an anime you like

Introduction page
[screenshot]

Click any episode link, press F12, and start analyzing the page

Sure enough, the site plays tricks: as soon as I open the debugging tools, it hits me with a debugger statement.
[screenshot]

How to skip the infinite debugger loop

Doing it as shown in the screenshots skips the infinite loop, and we can carry on debugging.
[screenshot]
[screenshot]

Finding the playback address on the page

The site is actually quite simple: it drops the resource address straight into an iframe. That lowers the difficulty so much that it almost feels like I am debugging the site for them 😄😄
[screenshot]

Figuring out where the playback address comes from

Idea 1: Look for a pattern in the network requests

After capturing the requests, I found that the resource paths of the first and second episodes have nothing in common, so I gave up o(╥﹏╥)o
[screenshot]

Idea 2: Read the player's source code and find the logic that assembles the URL

To find the player's source, list all the JS files in the debugging tools and go through them

[screenshot]

Analyzing the player source code
Reading the player source shows that the site stores a global variable in the page
[screenshot]
The global variable player_aaaa
  • [screenshot]
  • [screenshot]
The player assigns and stores this value, and then loads a JS file: /static/player/parse.js
  • [screenshot]
Open the debugging tools and find the /static/player/parse.js file
  • [screenshot]
Contents of /static/player/parse.js
  • [screenshot]

    In the browser console, enter: MacPlayer.Parse + MacPlayer.PlayUrl
    The player source is a self-executing function, and parse.js shows how the resource URL is concatenated, so we can assemble the URL ourselves directly in the browser console
  • [screenshot]
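
The same concatenation can be done from a puppeteer script instead of the console. This is a minimal sketch, assuming the page exposes a global MacPlayer object with string properties Parse and PlayUrl, as the console expression suggests. The helper names buildParseUrl and getParseUrl are hypothetical.

```javascript
// Join the parser prefix and the play URL the same way the console expression does.
function buildParseUrl(macPlayer) {
  if (!macPlayer || typeof macPlayer.Parse !== 'string' || typeof macPlayer.PlayUrl !== 'string') {
    throw new Error('MacPlayer.Parse / MacPlayer.PlayUrl not found on the page');
  }
  return macPlayer.Parse + macPlayer.PlayUrl;
}

// Read the two fields inside the page, then build the URL in Node.js.
// `page` is a puppeteer Page that has already navigated to an episode page.
async function getParseUrl(page) {
  const macPlayer = await page.evaluate(() => {
    // Runs in the browser: MacPlayer is the page's own global (assumed to exist).
    const player = typeof MacPlayer !== 'undefined' ? MacPlayer : {};
    return { Parse: player.Parse, PlayUrl: player.PlayUrl };
  });
  return buildParseUrl(macPlayer);
}
```

Keeping the concatenation in a separate function makes it easy to check without launching a browser.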
Right-click and save?
Open the address above in the browser and it turns out to be exactly the resource we want to download. At this point we could simply right-click and save it. But as a self-respecting front-end developer, how could we right-click and save? Time to bring out the big guns: puppeteer together with Node.js, to download automatically
Resource demo
  • [screenshot]
How to automate it?
  • The link above leads to a parsing page. To download automatically we must find the actual video source URL, otherwise it will not work o(╥﹏╥)o
  • Inspect the page elements to find the final resource address
  • The final resource address
    [screenshot]
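
Locating the final address can be scripted with puppeteer. This is a minimal sketch, assuming the parsing page renders a video element whose src attribute holds the final address. The selector and the function name getVideoSrc are assumptions, so adjust them to whatever the element panel actually shows.

```javascript
// Open the parsing page and return the src attribute of the first matching element.
// `page` is a puppeteer Page; `selector` is an assumption about the page markup.
async function getVideoSrc(page, parseUrl, selector = 'video') {
  await page.goto(parseUrl, { waitUntil: 'networkidle2' });
  // Wait until the player has inserted the element into the DOM.
  await page.waitForSelector(selector);
  return page.$eval(selector, (el) => el.getAttribute('src'));
}

// Usage sketch, inside an async function (requires `npm install puppeteer`):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   const src = await getVideoSrc(page, parseUrl);
//   await browser.close();
```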

Use puppeteer to parse the page and get the video resource address, then use Node.js to download the video automatically

Idea 1: Walk the playlist, start a task that opens the pages in turn and finds each resource address, collect all the playback addresses, and then download them locally with Node.js

[screenshot]

Why not use the idea above? I wrote the code and tested it, and the site's server could not cope. Playing it safe, one operation at a time, is the better choice

Idea 2: Use puppeteer to trigger the right-click download automatically and save the file wherever we want (I have not tried this method yet)

Idea 3: Walk the playlist and start from the first episode: open the page, find the resource address, download it locally with Node.js, and only when that download is complete move on to the next one
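
The one-at-a-time flow of Idea 3 comes down to awaiting each episode before starting the next. This is a minimal sketch, assuming the caller passes in two hypothetical async helpers: resolveSrc (find the resource address for an episode page) and download (save it locally). The numbered file names are also just an illustration.

```javascript
// Process episodes strictly in order: the next one starts only after the
// previous download has finished, so the server sees one request at a time.
async function downloadInOrder(episodeUrls, resolveSrc, download) {
  const saved = [];
  for (const [index, episodeUrl] of episodeUrls.entries()) {
    const src = await resolveSrc(episodeUrl);
    const fileName = `${index + 1}.mp4`;
    await download(src, fileName);
    saved.push(fileName);
  }
  return saved;
}
```

By contrast, starting every download at once (for example with Promise.all) puts the whole playlist on the server at the same time.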

Idea 3: the tricky parts

How does puppeteer read an element's attributes? Don't ask me, I don't really understand it either. An expert on Stack Overflow told me:
// Get the attribute from a single element
await page.evaluate('document.querySelector("span.styleNumber").getAttribute("data-Color")')
// Get the attribute from multiple elements
const attr = await page.$$eval("span.styleNumber", els => els.map(x => x.getAttribute("data-Color")));
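
The same $$eval call can collect the playlist for Idea 3. This is a minimal sketch, assuming the episode links are anchor elements matched by a selector such as '.playlist a'. That selector and the function name getEpisodeLinks are hypothetical and need to be adapted to the real markup.

```javascript
// Collect the href of every episode link in the playlist.
// `page` is a puppeteer Page; the selector is an assumption about the markup.
async function getEpisodeLinks(page, selector = '.playlist a') {
  const hrefs = await page.$$eval(selector, (links) =>
    links.map((a) => a.getAttribute('href'))
  );
  // Drop empty values and resolve relative links against the current page URL.
  return hrefs.filter(Boolean).map((href) => new URL(href, page.url()).href);
}
```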
Downloading a remote video with Node.js and showing progress
const fs = require('fs');
// My demo uses axios for the download; this is an axios instance
const axiosRequest = require('./utils/request');

axiosRequest.get('https://media.w3.org/2010/05/sintel/trailer.mp4', {
  responseType: 'stream'
}).then(response => {
  // The content-length response header tells us how big the video is, in bytes
  const totalLength = Number(response.headers['content-length']);
  // Number of bytes received so far
  let totalChunkLength = 0;
  // The readable stream carrying the response body
  const readStream = response.data;
  // Fired for every chunk that is read
  readStream.on('data', (chunk) => {
    totalChunkLength += chunk.length;
    console.log('Transferring data, current progress ==>', ((totalChunkLength / totalLength) * 100).toFixed(2) + '%');
  });
  // Fired when reading is complete
  readStream.on('end', () => {
    console.log('Finished fetching the remote data');
  });
  // Fired when reading fails
  readStream.on('error', (err) => {
    console.log('Fetching the remote data failed, error ==>', err);
  });
  // Name of the local file to write
  const fileName = '67.mp4';
  // Pipe the response into a Node.js file write stream
  const writeFile = readStream.pipe(fs.createWriteStream(fileName));
  // Fired when writing is complete
  writeFile.on('finish', () => {
    writeFile.close();
    console.log('Congratulations, boss: the local file has been written');
  });
  // Fired when writing fails
  writeFile.on('error', (err) => {
    console.log('Sorry, writing the local file failed, error ==>', err);
  });
});
// The axios code (./utils/request) is as follows
const axios = require('axios')

// Create the axios instance
const service = axios.create({
  baseURL: '', // base_url of the api
  // Never fades: a real man lasts this long 😄😄
  timeout: 90000000 // request timeout in milliseconds
})

// Request interceptor
service.interceptors.request.use(
  config => {
    return config
  },
  error => {
    // Do something with request error
    console.log(error) // for debug
    return Promise.reject(error)
  }
)

// Response interceptor
service.interceptors.response.use(
  response => {
    return response
  },
  error => {
    return Promise.reject(error)
  }
)

module.exports = service
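
To download episodes one after another, it helps to know exactly when a file has finished writing. This is a minimal sketch, assuming Node.js 15 or later (for stream/promises). saveStream and its parameters are names made up here, and the readable stream passed in would be response.data from the axios call above.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Write a readable stream to disk and resolve once the file is fully written.
// onProgress receives the percentage received so far when totalLength is known,
// otherwise the number of bytes received.
async function saveStream(readStream, fileName, totalLength, onProgress = () => {}) {
  let received = 0;
  readStream.on('data', (chunk) => {
    received += chunk.length;
    onProgress(totalLength > 0 ? (received / totalLength) * 100 : received);
  });
  // pipeline rejects if either the read side or the write side fails.
  await pipeline(readStream, fs.createWriteStream(fileName));
  return received;
}
```

Because the function returns a promise, a caller can simply await it before moving on to the next episode.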

full code

full code

Disclaimer

  • First of all, many thanks to this website for letting me, an old anime fan, find my favorite shows 😄😄😄
  • This project is for learning only, and there is no intent to scrape or otherwise exploit this website.
  • If you want to use it, please just play with it quietly on your own. A tall tree catches the wind.
  • If anything here infringes, please contact me and I will delete it immediately
  • That's all

References

Another idea for cracking video
puppeteer: get element attributes
puppeteer: download file


qinyuanqiblog