Follow the WeChat official account "K Brother Crawler" and join QQ exchange group 808574309 for more advanced crawler, JS/Android reverse-engineering and other hands-on technical content!
Disclaimer
All content in this article is for learning and exchange purposes only. The captured content, sensitive URLs, and data interfaces have been desensitized. Using them for commercial or illegal purposes is strictly forbidden, and the author bears no responsibility for any consequences. In case of infringement, please contact me and it will be removed immediately!
Reverse-engineering target
- Target: AirAsia flight status query, the Authorization parameter in the request headers
- Homepage:
aHR0cHM6Ly93d3cuYWlyYXNpYS5jb20vZmxpZ2h0c3RhdHVzLw==
- Interface:
aHR0cHM6Ly9rLmFwaWFpcmFzaWEuY29tL2ZsaWdodHN0YXR1cy9zdGF0dXMvb2QvdjMv
Parameter to reverse:
- Request Headers:
authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI......
Reverse-engineering process
Packet capture analysis
Open the flight status query page, enter any departure and destination, and click to find flights. For example, query flights from Macau to Kuala Lumpur: MFM and KUL are the codes of Macau International Airport and Kuala Lumpur International Airport respectively. The query URL is composed of a base URL + airport codes + date, similar to https://xxxxxxxxxx/MFM/KUL/28/09/2021, and the request headers carry an authorization parameter. Observation shows that the value of this parameter stays the same whether you clear the cookies or switch browsers. Testing confirms that copying the value directly into the code works, but this time our goal is to write a browser plug-in that hooks this parameter so that we can find where it is generated.
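The URL composition described above can be sketched in Python. This is a minimal sketch that assumes the date segment is DD/MM/YYYY (inferred from the 28/09/2021 example) and that departure and destination are three-letter airport codes. build_status_url is a hypothetical helper, and https://xxxxxxxxxx/ is the article's placeholder, not the real interface.

```python
from datetime import date


def build_status_url(base_url: str, departure: str, destination: str, day: date) -> str:
    """Join base URL + departure code + destination code + DD/MM/YYYY date.

    base_url is a placeholder: the real interface is desensitized in this
    article, so pass in the base URL you captured yourself.
    """
    for code in (departure, destination):
        # Airport codes such as MFM and KUL are three letters
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"not a 3-letter airport code: {code!r}")
    path = f"{departure.upper()}/{destination.upper()}/{day.strftime('%d/%m/%Y')}"
    return base_url.rstrip("/") + "/" + path


print(build_status_url("https://xxxxxxxxxx/", "MFM", "KUL", date(2021, 9, 28)))
# https://xxxxxxxxxx/MFM/KUL/28/09/2021
```

Passing a `date` object instead of a raw string ensures that the day and month can never be swapped by accident.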
For a detailed introduction to hooking, see K Brother's previous article: JS Reverse Hook, eating hot pot and singing songs, suddenly he was robbed by gangsters!
Browser plug-in Hook
Browser plug-ins, more properly called browser extensions, enhance the browser's functionality: for example, they can block ads, manage proxies, or change the browser's appearance.
Since we will hook by writing a browser plug-in, we first need a brief understanding of how to write one. Browser plug-ins follow certain specifications. In the past, each browser's plug-ins were written differently, but today they are basically all written the same way as Google Chrome plug-ins. Besides Chrome itself, Chrome plug-ins run on all Chromium-based (WebKit/Blink core) domestic browsers, such as 360 Speed Browser, 360 Secure Browser, Sogou Browser, and QQ Browser. Firefox is also widely used. The way Firefox plug-ins are developed has changed many times, but since the end of November 2017, plug-ins must be built with the WebExtensions APIs so that they are unified with other browsers. Ordinary Chrome plug-ins can generally run on Firefox directly, but Firefox plug-ins must be signed by Mozilla before they can be installed. Otherwise they can only be loaded temporarily for debugging and disappear once the browser restarts, which is inconvenient.
Developing a browser plug-in can be as simple or as complex as you make it. Crawler reverse engineers mainly use a plug-in to hook code, so it is enough to know that a plug-in consists of a manifest.json file and a JavaScript file. Next, using the authorization request header in this case as an example, K Brother will walk you through developing a Hook plug-in. If you want to dig deeper into browser plug-in development, refer to the Google Chrome extension documentation and the Firefox browser extension documentation.
Following the Google Chrome plug-in development specification, first create a new folder containing a manifest.json file and a JS Hook script. If you want an icon for your plug-in, put it in this folder as well. The officially recommended icon format is PNG, but any format supported by WebKit works, including BMP, GIF, ICO, and JPEG. Note: the manifest.json file name cannot be changed! A typical plug-in directory looks like this:
JavaScript Hook
├─ manifest.json // configuration file; the file name must not be changed
├─ icons.png // icon
└─ javascript_hook.js // Hook script; any file name will do
manifest.json
manifest.json is the most important and indispensable file of a Chrome plug-in. It holds all of the plug-in's configuration and must be placed in the root directory. Of its fields, manifest_version, name, and version are required. In this case, manifest.json is configured as follows (for the complete set of options, see the Chrome manifest file format documentation):
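{
    "name": "JavaScript Hook", // plug-in name
    "version": "1.0", // plug-in version
    "description": "JavaScript Hook", // plug-in description
    "manifest_version": 2, // manifest version, must be 2 or 3
    "icons": { // plug-in icons
        "16": "/icons.png", // icon path; the different sizes can share one image
        "48": "/icons.png",
        "128": "/icons.png"
    },
    "permissions": ["tabs"], // permission request (top-level key); tabs grants access to browser tabs
    "content_scripts": [{
        "matches": ["<all_urls>"], // match all URLs
        "js": ["javascript_hook.js"], // file name and path of the injected code; multiple files are injected in order
        "all_frames": true, // inject the content script into every frame of the page
        "run_at": "document_start" // when to inject the code
    }]
}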
The following points need to be noted here:
- manifest_version: the manifest version. Versions 2 and 3 are currently supported. Version 2 will be phased out, and version 4 or higher may be released in the future (see Manifest V2 and Manifest V3 on the official website). Version 3 has stricter privacy and security requirements, so 2 is recommended here. Note that Chrome has since started disabling Manifest V2 extensions, so recent Chrome versions may require 3.
- content_scripts: the Chrome plug-in mechanism for injecting scripts into pages. It specifies which addresses to match (using match patterns such as <all_urls>, not regular expressions), the JS and CSS files to inject, and when to inject them (document_start is recommended, which injects as soon as the page starts loading), among other settings.
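Strict JSON does not allow // comments. If you want a manifest.json that any JSON tool can parse, you can generate a comment-free file instead. The following is a minimal Python sketch. build_manifest_json and write_manifest are hypothetical helpers, and the values simply mirror the manifest above.

```python
import json
from pathlib import Path

# Same settings as the manifest above, expressed as a dict so json.dumps
# produces strict JSON (no // comments). "permissions" is a top-level key.
MANIFEST = {
    "name": "JavaScript Hook",
    "version": "1.0",
    "description": "JavaScript Hook",
    "manifest_version": 2,
    "icons": {"16": "/icons.png", "48": "/icons.png", "128": "/icons.png"},
    "permissions": ["tabs"],
    "content_scripts": [{
        "matches": ["<all_urls>"],
        "js": ["javascript_hook.js"],
        "all_frames": True,
        "run_at": "document_start",
    }],
}

REQUIRED_KEYS = ("manifest_version", "name", "version")


def build_manifest_json(manifest: dict) -> str:
    """Check the required keys and return the manifest as JSON text."""
    missing = [key for key in REQUIRED_KEYS if key not in manifest]
    if missing:
        raise ValueError(f"manifest is missing required keys: {missing}")
    if manifest["manifest_version"] not in (2, 3):
        raise ValueError("manifest_version must be 2 or 3")
    return json.dumps(manifest, indent=2, ensure_ascii=False)


def write_manifest(plugin_dir: Path, manifest: dict = MANIFEST) -> Path:
    """Write manifest.json (this file name must not change) into plugin_dir."""
    plugin_dir.mkdir(parents=True, exist_ok=True)
    path = plugin_dir / "manifest.json"
    path.write_text(build_manifest_json(manifest), encoding="utf-8")
    return path
```

For example, `write_manifest(Path("JavaScript Hook"))` creates the plug-in folder if needed and writes the manifest into it.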
javascript_hook.js
The Hook code is in the javascript_hook.js file:
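// This function runs in the page context (see the script-tag injection below)
var hook = function () {
    // Save the original setRequestHeader so requests still work after hooking
    var org = window.XMLHttpRequest.prototype.setRequestHeader;
    window.XMLHttpRequest.prototype.setRequestHeader = function (key, value) {
        // Header names are case-insensitive, so compare in lower case
        if (key.toLowerCase() === 'authorization') {
            debugger;  // pause here whenever Authorization is being set
        }
        // Hand all arguments back to the original method
        return org.apply(this, arguments);
    };
};
// Content scripts run in an isolated world, so inject the hook into the page through a <script> tag
var script = document.createElement('script');
script.textContent = '(' + hook + ')()';  // wrap the function source in an IIFE
(document.head || document.documentElement).appendChild(script);
script.parentNode.removeChild(script);  // the code has already run; removing the tag does not undo the hook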
XMLHttpRequest.setRequestHeader() is the method that sets an HTTP request header. We define a variable org to save the original method, window.XMLHttpRequest.prototype.setRequestHeader. Here prototype is the prototype object: every JavaScript object inherits properties and methods from its prototype object (for details, see the Runoob tutorial "Introduction to JavaScript prototype"). Whenever the program sets Authorization in a request header, execution enters our Hook code and pauses at the debugger statement. Finally, all arguments are passed on to org, the original XMLHttpRequest.setRequestHeader(), so the request is still sent normally. We then create a script tag whose content wraps the Hook function in an IIFE (immediately invoked function expression) and insert it into the web page. This detour is needed because a content script runs in an isolated world: overriding window.XMLHttpRequest from the content script itself would not affect the XMLHttpRequest the page uses.
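The save, wrap and delegate steps above are not specific to the browser. The following sketch shows the same pattern in Python. It is only a language-neutral illustration, not the plug-in code, and FakeXHR is a hypothetical stand-in for XMLHttpRequest.

```python
from typing import Callable


class FakeXHR:
    """Hypothetical stand-in for XMLHttpRequest, for illustration only."""

    def __init__(self) -> None:
        self.headers = {}  # header name -> value

    def setRequestHeader(self, key: str, value: str) -> None:
        self.headers[key] = value


def hook_method(cls: type, name: str, on_call: Callable[[str, str], None]) -> Callable:
    """Replace cls.<name> with a wrapper and return the original for restoring."""
    org = getattr(cls, name)  # 1. keep a reference to the original method

    def wrapper(self, key, value):
        if key.lower() == "authorization":  # header names are case-insensitive
            on_call(key, value)  # 2. our code runs here (the JS version uses debugger)
        return org(self, key, value)  # 3. delegate so the request still works

    setattr(cls, name, wrapper)
    return org


seen = []
original = hook_method(FakeXHR, "setRequestHeader", lambda k, v: seen.append(v))
xhr = FakeXHR()
xhr.setRequestHeader("Authorization", "Bearer demo-token")
xhr.setRequestHeader("Accept", "application/json")
print(seen)  # ['Bearer demo-token']
setattr(FakeXHR, "setRequestHeader", original)  # undo the hook
```

Only the Authorization call is recorded, yet both headers still reach the original method. This mirrors how `org.apply(this, arguments)` keeps the page working.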
At this point, our browser plug-in is complete. Next, we will show how to use it in Google Chrome and in Firefox.
Google Chrome
Enter chrome://extensions in the address bar, or click [Customize and control Google Chrome] in the upper right corner -> [More tools] -> [Extensions] to open the extensions page. Turn on [Developer mode], click [Load unpacked], and select the whole Hook plug-in folder (which should contain manifest.json, javascript_hook.js and the icon file), as shown in the figure below:
Firefox Browser
Firefox cannot directly install plug-ins that have not been signed by Mozilla; they can only be loaded as temporary add-ons for debugging. The plug-in must be packaged as .xpi, .jar or .zip, so we need to compress manifest.json, javascript_hook.js and the icon file together. Be careful not to include the top-level directory when packaging: select all the files themselves and compress them. Otherwise, installation will fail with a "does not contain a valid manifest" error.
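The packaging step can also be scripted. Below is a minimal sketch using Python's standard zipfile module. It assumes the plug-in folder is laid out as shown earlier, and pack_extension is a hypothetical helper. Writing each file with an arcname relative to the folder keeps manifest.json at the root of the archive instead of under a top-level directory.

```python
import zipfile
from pathlib import Path


def pack_extension(plugin_dir: Path, out_file: Path) -> list:
    """Zip the files inside plugin_dir so that manifest.json is at the archive root.

    Returns the names stored in the archive.
    """
    plugin_dir = Path(plugin_dir)
    out_file = Path(out_file).resolve()
    if not (plugin_dir / "manifest.json").is_file():
        raise FileNotFoundError(f"{plugin_dir} has no manifest.json")
    with zipfile.ZipFile(out_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(plugin_dir.rglob("*")):
            if not path.is_file() or path.resolve() == out_file:
                continue  # skip folders and the archive itself
            # arcname relative to plugin_dir gives "manifest.json",
            # not "JavaScript Hook/manifest.json"
            zf.write(path, arcname=path.relative_to(plugin_dir).as_posix())
    with zipfile.ZipFile(out_file) as zf:
        return zf.namelist()


# Example: pack_extension(Path("JavaScript Hook"), Path("javascript_hook.zip"))
```

The returned name list gives a quick check: if any entry starts with the folder name, the archive would trigger the "does not contain a valid manifest" error.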
Enter about:addons in the address bar, or click [Open application menu] in the upper right corner -> [Add-ons and themes], or press Ctrl + Shift + A to open the add-ons page. Choose [Debug Add-ons] there, then under Temporary Extensions click [Load Temporary Add-on] and select the compressed package of the Hook plug-in. You can also enter about:debugging#/runtime/this-firefox in the address bar to go straight to the temporary extensions page, as shown in the following figure:
With that, we have finished developing and installing the browser Hook plug-in. Return to the flight query page, enter a departure and destination, and click to find flights. Execution now pauses at our breakpoint:
TamperMonkey plug-in Hook
We have covered how to write a browser plug-in, but plug-in formats still differ between browsers, and a plug-in you write may not run on other browsers. TamperMonkey solves this problem. TamperMonkey, commonly known in China as the "oil monkey" plug-in, is a browser extension and the most popular user script manager. It supports essentially every browser that has an extension mechanism, so a script written once runs on all platforms. Scripts published by others can be obtained directly from sites such as GreasyFork and OpenUserJS, which makes it feature-rich and powerful. We can also use TamperMonkey to implement hooks.
TamperMonkey can be installed directly from the major browsers' extension stores or from the TamperMonkey official website; the installation process is not repeated here.
After installation, click the TamperMonkey icon and choose to add a new script, or open the dashboard and click the plus sign to create a new script. Then write the following Hook code:
// ==UserScript==
// @name JavaScript Hook
// @namespace http://tampermonkey.net/
// @version 0.1
// @description JavaScript Hook script
// @author K哥爬虫
// @include *://*airasia.com/*
// @icon https://profile.csdnimg.cn/1/B/8/3_kdl_csdn
// @grant none
// @run-at document-start
// ==/UserScript==
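(function () {
    'use strict';
    // Save the original method so requests keep working after hooking
    var org = window.XMLHttpRequest.prototype.setRequestHeader;
    window.XMLHttpRequest.prototype.setRequestHeader = function (key, value) {
        // Header names are case-insensitive, so compare in lower case
        if (key.toLowerCase() === 'authorization') {
            debugger;  // pause here whenever Authorization is being set
        }
        return org.apply(this, arguments);  // hand all arguments back to the original
    };
})();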
The JavaScript part of this code is an IIFE (immediately invoked function expression), which was already explained in the browser plug-in section, so it is not repeated here. What matters are the comment lines at the top. Don't mistake them for ordinary, optional comments: in TamperMonkey they form the script's basic configuration, and each one has a specific meaning. For the full list of options, see the TamperMonkey official documentation. Common options are shown in the table below; pay special attention to @match, @include and @run-at:
Option | Meaning |
---|---|
@name | Name of the script |
@namespace | Namespace, used to distinguish scripts with the same name; usually the author's name or URL |
@version | Script version; TamperMonkey reads this version number when updating the script |
@description | Describes what the script does |
@author | Name of the script's author |
@match | Only matching URLs run the script. Uses match-pattern syntax (scheme://host/path) with stricter rules for *; for example, *://*/* matches all pages and https://www.baidu.com/* matches Baidu. Multiple instances are allowed |
@include | Similar to @match: only matching URLs run the script, but the rules are looser. * is a wildcard that can appear anywhere, for example *://*baidu.com/* matches Baidu, and a pattern written as /.../ is treated as a regular expression. Multiple instances are allowed; for details, see the TamperMonkey official documentation |
@icon | Script icon |
@grant | Permissions the script needs. With the corresponding permissions, the script can call the APIs provided by TamperMonkey to interact with the browser. If set to none, no sandbox is used and the script runs directly in the web page's environment; most TamperMonkey APIs are then unavailable. If not specified, TamperMonkey adds a few of the most commonly used APIs by default |
@require | If the script depends on other JS libraries, import them with @require; they are loaded before the script runs |
@run-at | Script injection timing; this option is key to whether the hook succeeds. Five values are available. document-start: inject as early as possible, when the page starts loading; document-body: when the body element exists; document-end: when or after the DOMContentLoaded event is dispatched; document-idle: after DOMContentLoaded has been dispatched (the default); context-menu: when the script is clicked in the browser context menu. For hooking, it is generally set to document-start |
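The following sketch shows how the header comments map onto configuration. It reads the // ==UserScript== block into key/value lists and checks a URL against an @include pattern. parse_userscript_meta and include_matches are hypothetical helpers. The wildcard check uses Python's fnmatch as an approximation, not TamperMonkey's own matching code.

```python
import fnmatch
import re

# "// @key value" inside the metadata block
META_LINE = re.compile(r"^//\s*@(\S+)\s*(.*?)\s*$")


def parse_userscript_meta(source: str) -> dict:
    """Collect @key value pairs between // ==UserScript== and // ==/UserScript==.

    Keys may repeat (e.g. several @include lines), so values are lists.
    """
    meta = {}
    inside = False
    for line in source.splitlines():
        line = line.strip()
        if line == "// ==UserScript==":
            inside = True
        elif line == "// ==/UserScript==":
            break
        elif inside:
            m = META_LINE.match(line)
            if m:
                meta.setdefault(m.group(1), []).append(m.group(2))
    return meta


def include_matches(pattern: str, url: str) -> bool:
    """Rough approximation of @include: '*' matches any run of characters.

    fnmatch also treats '?' and '[...]' specially, so this is only an
    illustration, not TamperMonkey's exact algorithm.
    """
    return fnmatch.fnmatchcase(url, pattern)
```

With this approximation, *://*airasia.com/* also matches a host such as notairasia.com. This hints at why the stricter @match syntax can be preferable when you want precise targeting.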
Return to the flight query page and enable the TamperMonkey script. If everything is configured correctly, the TamperMonkey icon shows that our Hook script is active. Enter any departure and destination and click Find Flight. Execution now pauses at the breakpoint:
Reversing the parameter
Whether you hook with a browser plug-in or with TamperMonkey, the hook fires when the Authorization request header is being set. This means the Authorization value must have passed through some function or method before that point, and by following the Call Stack panel in the developer tools we are bound to find it. Walking the call stack is a test of patience and takes a lot of time.
Normally, we check whether the arguments passed to each function contain our target parameter. If the value is absent from one function's arguments but appears in the next function's, the encryption most likely happens between those two functions, and single-step debugging into the upper function will generally reveal the encryption code. In this case, we set a breakpoint on the t.getData function and stepped through it, and we can see that it actually calls t.subscribe and t.call repeatedly. We did not set breakpoints in those two functions because they loop too many times to debug conveniently, and t.getData also looks suspicious judging by its name.
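The "absent in one frame, present in the next" rule above can be written down as a tiny search. This sketch assumes you have copied each call-stack frame's function name and arguments by hand. The frames must be ordered from outermost caller to innermost callee, which is the reverse of how DevTools lists them. The frames in the example are hypothetical. find_suspect returns the pair of functions where the target value first appears.

```python
from typing import Any, Optional, Sequence, Tuple

Frame = Tuple[str, Sequence[Any]]  # (function name, arguments it received)


def contains(value: Any, target: str) -> bool:
    """True if target appears anywhere inside value (strings, dicts, lists)."""
    if isinstance(value, str):
        return target in value
    if isinstance(value, dict):
        return any(contains(v, target) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains(v, target) for v in value)
    return False


def find_suspect(frames: Sequence[Frame], target: str) -> Optional[Tuple[str, str]]:
    """Return (caller, callee) for the first callee whose arguments contain
    target while its caller's arguments do not: the value was most likely
    produced inside the caller, so that is the function to step into.
    """
    for (caller, caller_args), (callee, callee_args) in zip(frames, frames[1:]):
        if not contains(list(caller_args), target) and contains(list(callee_args), target):
            return caller, callee
    return None


frames = [
    ("t.getData", ["MFM", "KUL"]),
    ("t.getHttpHeader", [{}]),
    ("setRequestHeader", ["Authorization", "Bearer demo-token"]),
]
print(find_suspect(frames, "Bearer demo-token"))  # ('t.getHttpHeader', 'setRequestHeader')
```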
Run the flight query again, and execution stops at the breakpoint we just set. Press F11 or click the down-arrow button to step into functions one at a time. After about 7 steps, we reach t.getHttpHeader, where the value of Authorization is "Bearer " + r.accessToken. Printing r.accessToken in the console shows that it is exactly the value we want, as shown in the following figure:
The key, then, is r.accessToken. If you try to trace it back through the code, you will read a lot of lines without finding it. Instead, search directly for the keyword accessToken, and you will find that it is defined directly in the zUnb object and can simply be used as-is, as shown below:
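The authorization value at the top of the article starts with eyJ, which is how base64url-encoded JSON beginning with {" starts. Its visible part decodes to {"typ":"JWT","alg":"HS2 before the truncation, so the hardcoded accessToken appears to be a JWT. If so, the following minimal Python sketch can decode its header and payload to inspect the claims. It does not verify the signature, and inspect_bearer is a hypothetical helper.

```python
import base64
import json


def decode_jwt_part(segment: str) -> dict:
    """Decode one base64url JWT segment (header or payload) without verifying it."""
    padded = segment + "=" * (-len(segment) % 4)  # base64url drops '=' padding
    return json.loads(base64.urlsafe_b64decode(padded))


def inspect_bearer(auth_header: str) -> tuple:
    """Split 'Bearer <jwt>' and return (header, payload) as dicts."""
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        raise ValueError("expected 'Bearer <header>.<payload>.<signature>'")
    header_b64, payload_b64, _signature = token.split(".")
    return decode_jwt_part(header_b64), decode_jwt_part(payload_b64)
```

Pass in the full authorization value you captured, e.g. `inspect_bearer("Bearer eyJ...")`. The payload may contain claims such as an expiry time, if the token has one.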
The departure and destination codes for every location are delivered as JSON, which is easy to find and can be processed flexibly according to your needs, as shown in the following figure:
This case itself is not difficult, and a direct search can locate the parameter even faster. The focus of this case, however, is how to hook with a browser plug-in. This is a good solution for parameters that cannot be found by searching, or whose search results are too numerous to pinpoint.
Complete code
Follow K Brother Crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/
Only part of the key code is shown here, and it cannot be run directly! Complete code repository: https://github.com/kgepachong/crawler/
Python sample code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import requests

# Base URL of the status interface (desensitized here)
status_url = 'Desensitized; for the complete code, follow GitHub: https://github.com/kgepachong/crawler'


def get_flight_status(departure, destination, date):
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36',
        # Fixed token found in the zUnb object (desensitized here)
        'authorization': 'Desensitized; for the complete code, follow GitHub: https://github.com/kgepachong/crawler'
    }
    # URL = base URL + departure code + '/' + destination code + '/' + date (DD/MM/YYYY)
    complete_url = status_url + departure + '/' + destination + '/' + date
    response = requests.get(url=complete_url, headers=headers, timeout=10)
    print(response.text)


if __name__ == '__main__':
    departure = input('Enter the departure airport code: ')
    destination = input('Enter the destination airport code: ')
    date = input('Enter the date (e.g. 29/09/2021): ')
    # departure = 'MFM'
    # destination = 'KUL'
    # date = '29/09/2021'
    get_flight_status(departure, destination, date)