Origin
- I recently got the urge to try video editing and decided to start with anime. Editing anime clips requires the original video, so where do you find the source files?
- I often watch anime on Six DMs, and the video quality there is decent, so I decided to write a crawler to download the shows I like directly. After all, I'm a front-end developer, and downloading by hand would be a bit embarrassing 😄😄😄
Final result
What to do if the downloaded file is not an .mp4
For example, the copy of Chinchilla I downloaded came in yum format. I simply changed the file extension to .mp4 and that was enough. If that still doesn't work, fall back on Format Factory to convert it.
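The rename step can also be scripted. This is only a minimal sketch: `renameToMp4` is a hypothetical helper name, and it changes nothing but the extension, so it only helps when the file already contains a playable video.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: rename a downloaded file so that it ends in .mp4.
// Only the extension changes; the file contents are left untouched.
function renameToMp4(filePath) {
  const { dir, name } = path.parse(filePath);
  const target = path.join(dir, `${name}.mp4`);
  if (target !== filePath) {
    fs.renameSync(filePath, target);
  }
  return target;
}
```

Calling `renameToMp4` with the path of the downloaded file returns the new path.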
The author's skills and environment when writing this article
- Knows a little front-end
- Knows nothing about Node.js
- Windows
- puppeteer version: 14.3.0
- node version: 16.1.0
Start analyzing the website: search for an anime you like
Introduction page
Click any episode link, press F12, and start analyzing the page
Sure enough, this website has a trick up its sleeve: as soon as the debugging tools open, it drops me into a debugger statement.
How to skip the infinite debugger loop
This gets us past the infinite debugger loop so that we can keep debugging
Find the page's playback address
This website is quite simple: it puts the resource address directly into an iframe. That lowers the difficulty and spares me some debugging 😄😄
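Reading that iframe address can be automated with puppeteer. This is a rough sketch, not the site's confirmed markup: `page` is assumed to be a Puppeteer `Page` that has already navigated to an episode's play page, `getIframeSrc` is a hypothetical helper name, and the bare `iframe` selector is a guess to adjust.

```javascript
// `page` is assumed to be a Puppeteer Page already on the play page.
// The default selector is an assumption; adjust it to the real markup.
async function getIframeSrc(page, selector = 'iframe') {
  // page.$eval runs the callback in the browser against the first match
  // and rejects if no element matches the selector.
  return page.$eval(selector, (el) => el.getAttribute('src'));
}
```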
Analyze where the playback address comes from
Idea 1: Analyze the network requests and look for a common pattern
After capturing the requests, I found that the playback resource paths of the first and second episodes have nothing in common, so I gave up o(╥﹏╥)o
Idea 2: Read the player's source code directly and find the logic that builds the URL
To find the player's source code, list all the JS files in the debugging tools and read through them
Analyze the player source code
Reading the player source code shows that the website stores a global variable in the page
Global variable: player_aaaa
After assigning it, the page loads a JS file named /static/player/parse.js
Open the debugging tools and find the /static/player/parse.js file
Contents of /static/player/parse.js
Browser console input: MacPlayer.Parse + MacPlayer.PlayUrl
The player source code is a self-executing function, and parse.js shows how the resource URL is concatenated, so we can build the resource address directly in the browser console
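The same console expression can be evaluated from puppeteer. This is a sketch under assumptions: `page` is a Puppeteer `Page` on the play page, the player scripts have already run so the `MacPlayer` global exists, and `getParsePageUrl` is a hypothetical helper name.

```javascript
// `page` is assumed to be a Puppeteer Page on which the player scripts
// have already executed, so the MacPlayer global is defined.
async function getParsePageUrl(page) {
  // page.evaluate runs the function inside the browser page and
  // returns its (serializable) result to Node.js.
  return page.evaluate(() => MacPlayer.Parse + MacPlayer.PlayUrl);
}
```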
Right-click and save?
Open the address above in the browser and you will find the resource we want to download. At this point we could simply right-click and save it. But as a self-respecting front-end developer, how could we settle for that? Next we bring out the big gun: puppeteer working with Node.js to download automatically
Resource demo
How to automate it?
- The link above leads to a parsing page. For automatic downloads we must find the link to the actual video source, otherwise it won't work o(╥﹏╥)o
- Inspect the page elements to find the final resource address
- Final resource address
Use puppeteer to parse the page and get the video resource address, then use Node.js to download the video automatically
Idea 1: Walk the playlist, start a task that opens each page in turn and finds its resource address, collect all the playback resource addresses, then download them all locally with Node.js
Why not use this idea? I wrote the code and tested it, and found that the site's server couldn't handle it, so it is safer to do one operation at a time
Idea 2: Use puppeteer to trigger the right-click download automatically and save the file wherever we want (I haven't tried this method yet)
Idea 3: Walk the playlist and start a task that begins with the first episode: open the page, find the resource address, and download it locally with Node.js. When the download completes, move on to the next episode, and so on
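The one-at-a-time flow of Idea 3 can be sketched as a plain `for...of` loop with `await`. This is not the author's full code: `downloadAllInOrder` is a hypothetical name, and `resolveUrl` and `download` are hypothetical callbacks standing in for the puppeteer lookup and the Node.js download.

```javascript
// Process episodes strictly one after another so the remote server
// never sees more than one request from us at a time.
// resolveUrl(episode) -> Promise<string>   (hypothetical: find the video URL)
// download(url, episode) -> Promise<void>  (hypothetical: save it locally)
async function downloadAllInOrder(episodes, resolveUrl, download) {
  const results = [];
  for (const episode of episodes) {
    try {
      const url = await resolveUrl(episode);
      // Wait for this download to finish before starting the next one.
      await download(url, episode);
      results.push({ episode, ok: true });
    } catch (error) {
      // Record the failure and keep going with the remaining episodes.
      results.push({ episode, ok: false, error });
    }
  }
  return results;
}
```

Recording failures instead of throwing is a design choice in this sketch: one broken episode does not abort the rest of the playlist.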
Idea 3: the tricky parts
How does puppeteer get an element's attributes? Don't ask me, I don't understand it either; a kind soul on Stack Overflow told me
// Get the attribute of a single element
await page.evaluate('document.querySelector("span.styleNumber").getAttribute("data-Color")')
// Get the attribute of every matching element
const attr = await page.$$eval("span.styleNumber", el => el.map(x => x.getAttribute("data-Color")));
Download a remote video with Node.js and display progress
const fs = require('fs');
// My demo downloads with axios; this is an axios instance
const axiosRequest = require('./utils/request');
axiosRequest.get('https://media.w3.org/2010/05/sintel/trailer.mp4', {
  responseType: 'stream'
}).then(response => {
  // The content-length response header tells us how large the video is, in bytes
  const totalLength = Number(response.headers['content-length'])
  // Number of bytes received so far
  let totalChunkLength = 0
  // The readable stream of the response body
  const readStream = response.data
  // Fired each time a chunk is read
  readStream.on('data', (chunk) => {
    totalChunkLength += chunk.length
    console.log('Transferring data, current progress ==>', ((totalChunkLength / totalLength) * 100).toFixed(2) + '%')
  });
  // Fired when reading is complete
  readStream.on('end', () => {
    console.log('Finished fetching remote data')
  });
  // Fired when a read error occurs
  readStream.on('error', (err) => {
    console.log('Fetching remote data failed, error ==>', err)
  });
  // Name of the local file to write
  const fileName = '67.mp4'
  // Pipe the response into a Node.js file write stream
  const writeFile = readStream.pipe(fs.createWriteStream(fileName))
  // Fired when writing is complete
  writeFile.on("finish", () => {
    writeFile.close();
    console.log("Congrats, boss: the local file has been written");
  });
  // Fired when a write error occurs
  writeFile.on("error", (err) => {
    console.log("Sorry, writing the local file failed, error ==>", err);
  });
});
// The axios instance (./utils/request) is as follows
const axios = require('axios')
// Create the axios instance
const service = axios.create({
  baseURL: '', // base URL of the API
  // Practically never times out; built to last 😄😄
  timeout: 90000000 // request timeout in milliseconds
})
// Request interceptor
service.interceptors.request.use(
  config => {
    return config
  },
  error => {
    // Do something with request error
    console.log(error) // for debug
    // Return the rejection so the caller actually receives the error
    return Promise.reject(error)
  }
)
// Response interceptor
service.interceptors.response.use(
  response => {
    return response
  },
  error => {
    return Promise.reject(error)
  }
)
module.exports = service
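To download one episode at a time, the caller needs to know when a file has been fully written. The download snippet above relies on events. Here is a hedged alternative, not the author's code, using `pipeline` from Node's built-in `stream/promises` module (available in the Node 16 used here). `saveStream` and its parameters are hypothetical names; `readStream` would be the `response.data` stream from axios.

```javascript
const fs = require('fs');
const { Transform } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper: write a readable stream to filePath and resolve
// with the number of bytes written once the file is complete.
// onProgress(received, totalLength) is an optional callback.
async function saveStream(readStream, filePath, totalLength, onProgress) {
  let received = 0;
  // Pass-through transform that only counts the bytes flowing by.
  const counter = new Transform({
    transform(chunk, _encoding, callback) {
      received += chunk.length;
      if (onProgress) onProgress(received, totalLength);
      callback(null, chunk);
    },
  });
  // pipeline rejects if any stream in the chain fails.
  await pipeline(readStream, counter, fs.createWriteStream(filePath));
  return received;
}
```

Because the function returns a promise, a sequential loop can simply `await saveStream(...)` before requesting the next episode.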
Full code
Disclaimer
- First of all, many thanks to this website for letting an old anime fan like me find sources for my favorite shows 😄😄😄
- This project is for learning purposes only, with no intention of running crawlers or similar operations against this website.
- Anyone who wants to use it should keep it to personal tinkering; a tall tree catches the wind.
- If this infringes on any rights, please contact me and I will delete it immediately
- That's all
References
Another idea for cracking video
puppeteer: get element attributes
puppeteer: download files