Follow the WeChat public account K哥爬虫 (Brother K Crawler) for more technical write-ups on advanced crawling and JS/Android reverse engineering!
Disclaimer
All content in this article is for learning and exchange only. The captured content, sensitive URLs, and data interfaces have been desensitized. Using them for commercial or illegal purposes is strictly forbidden; the author bears no responsibility for any consequences that result. If this article infringes on your rights, please contact me and it will be deleted immediately!
Reverse-engineering target
- Goal: analyze how the cookie parameter acw_sc__v2 of the X ball investor community is generated
- Homepage: aHR0cHM6Ly94dWVxaXUuY29tL3RvZGF5
- Reverse parameter: Cookie: acw_tc=27608267164066250867189...
Packet capture analysis
Our crawling target is the hot posts of X ball, reached through Essence -> Today's Topic -> Hot posts. The hot posts are loaded via Ajax, so the data interface is easy to find. The interface has no other encrypted parameters; the obstacle is that the request must carry several cookie values and cannot be accessed without them. Among these, acw_sc__v2 is generated by JS, while the other values are obtained when the homepage is first visited. The capture is as follows:
Locating the encryption
Clear the cookies, open the F12 developer tools, and refresh the page. The page enters anti-debugging and an infinite debugger appears. Following the call stack upward, you can see a large block of obfuscated code in this method, which is what actually triggers the debugger. As shown below:
Getting past the debugger is easy, but note that this site is rather tricky: the first visit to the homepage returns only the obfuscated JS code, and only afterwards does it jump to the normal HTML page. If you override the JS locally, the debugger is bypassed, but you may no longer be able to debug what follows. Try it yourself if you are interested. Here, Brother K simply right-clicks the line and chooses "Never pause here":
Looking at this obfuscated code and searching directly for acw_sc__v2, we can see an operation at the end that sets a cookie, where x is the value of acw_sc__v2.
Reversing the parameter
Follow the call stack to see how x is obtained. Here, setTimeout executes the string '\x72\x65\x6c\x6f\x61\x64\x28\x61\x72\x67\x32\x29' when the timer fires. Printing that string in the console shows that it is a call to the reload method. The argument passed in is arg2, and the value of arg2 is the value of acw_sc__v2, as the screenshot of the console output shows.
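The hex-escaped string passed to setTimeout can also be decoded outside the browser. The snippet below is a minimal Python sketch of that step; the helper name decode_hex_escapes is hypothetical and is not part of the site's code or of the author's repository.

```python
import re


def decode_hex_escapes(obfuscated):
    # Replace every literal \xNN sequence with the character it encodes.
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda match: chr(int(match.group(1), 16)),
        obfuscated,
    )


# The string from the obfuscated script, kept as a raw string so the
# backslashes are not interpreted by Python itself.
literal = r"\x72\x65\x6c\x6f\x61\x64\x28\x61\x72\x67\x32\x29"
print(decode_hex_escapes(literal))  # reload(arg2)
```

The same helper works for the second argument of the _0x55f3() calls, which is hex-escaped in the same way.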
arg1 is defined near the top of the code. Note that arg1 changes on every refresh, so it has to be fetched dynamically when we compute the value later. Let's take out the key code and analyze it separately:
var arg1 = '6A6BE0CAF2D2305297951C9A2ADBC2E8D21D48FD';
var _0x5e8b26 = _0x55f3('0x3', '\x6a\x53\x31\x59');
var _0x23a392 = arg1[_0x55f3('0x19', '\x50\x67\x35\x34')]();
arg2 = _0x23a392[_0x55f3('0x1b', '\x7a\x35\x4f\x26')](_0x5e8b26);
You can see that almost everything goes through the _0x55f3() method. If you extract this method as-is and run it locally, it falls straight into an infinite loop. After debugging a few more times, you will find that the call that produces _0x5e8b26 receives the same arguments every time and returns the same result every time, so it can be hard-coded as a fixed value. The call _0x23a392[_0x55f3('0x1b', '\x7a\x35\x4f\x26')] that produces arg2 actually invokes an anonymous function, as shown in the following figure:
Stepping into this anonymous function, we can see that it also calls _0x55f3() many times. We evaluate each call in the console and copy the results into the local code:
After all the results have been replaced, you will find that the function depends on one more anonymous function. In the end, it is enough to extract these two anonymous functions:
Of course, if there are a great many calls to _0x55f3(), replacing them one by one is impractical. In that case you need to analyze the logic inside the function further and single-step through it locally to see why it enters an infinite loop. The function contains many if-else statements, so the loop is most likely caused by a missing piece of the environment that sends execution into an else branch. You can either delete the else branch directly or fill in the missing environment so that execution takes the if branch.
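When the calls have already been resolved in the console, the manual replacement described above can be partly scripted. The following is a rough Python sketch under that assumption. The RESOLVED table holds the three values that can be read off from the snippets in this article (the key, unsbox, and hexXor); the function name inline_resolved_calls and the regular expression are illustrative only, and the pattern assumes every call has exactly the two single-quoted arguments seen here.

```python
import re

# Values obtained by evaluating each call in the browser console.
# Keys are the (index, key) argument pairs exactly as written in the script.
RESOLVED = {
    ("0x3", r"\x6a\x53\x31\x59"): "3000176000856006061501533003690027800375",
    ("0x19", r"\x50\x67\x35\x34"): "unsbox",
    ("0x1b", r"\x7a\x35\x4f\x26"): "hexXor",
}

# Assumes calls of the form _0x55f3('<index>', '<key>').
CALL_RE = re.compile(r"_0x55f3\('([^']*)',\s*'([^']*)'\)")


def inline_resolved_calls(js_source, resolved=RESOLVED):
    def substitute(match):
        key = (match.group(1), match.group(2))
        if key not in resolved:
            return match.group(0)  # leave calls we have not resolved untouched
        return "'" + resolved[key] + "'"

    return CALL_RE.sub(substitute, js_source)


snippet = r"""var _0x5e8b26 = _0x55f3('0x3', '\x6a\x53\x31\x59');
var _0x23a392 = arg1[_0x55f3('0x19', '\x50\x67\x35\x34')]();
arg2 = _0x23a392[_0x55f3('0x1b', '\x7a\x35\x4f\x26')](_0x5e8b26);"""

print(inline_resolved_calls(snippet))
```

Calls that are missing from the table are left in place, so the output makes it easy to see which ones still need to be evaluated in the console.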
Complete code
Follow K哥爬虫 on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/
Only part of the key code is shown below, and it cannot be run directly. Complete code repository: https://github.com/kgepachong/crawler/
JavaScript encryption code
/* ==================================
# @Time : 2021-12-29
# @Author : WeChat public account: K哥爬虫
# @FileName: get_acw_sc_v2.js
# @Software: PyCharm
# ================================== */
var _0x5e8b26 = '3000176000856006061501533003690027800375';

var getAcwScV2 = function (arg1) {
    // XOR two hex strings one byte (two hex digits) at a time
    String['prototype']['hexXor'] = function (_0x4e08d8) {
        var _0x5a5d3b = '';
        for (var _0xe89588 = 0x0; _0xe89588 < this['length'] && _0xe89588 < _0x4e08d8['length']; _0xe89588 += 0x2) {
            var _0x401af1 = parseInt(this['slice'](_0xe89588, _0xe89588 + 0x2), 0x10);
            var _0x105f59 = parseInt(_0x4e08d8['slice'](_0xe89588, _0xe89588 + 0x2), 0x10);
            var _0x189e2c = (_0x401af1 ^ _0x105f59)['toString'](0x10);
            // Left-pad single-digit results so every byte is two hex digits
            if (_0x189e2c['length'] == 0x1) {
                _0x189e2c = '0' + _0x189e2c;
            }
            _0x5a5d3b += _0x189e2c;
        }
        return _0x5a5d3b;
    };
    // Reorder the characters according to a fixed permutation table
    String['prototype']['unsbox'] = function () {
        var _0x4b082b = [0xf, 0x23, 0x1d, 0x18, 0x21, 0x10, 0x1, 0x26, 0xa, 0x9, 0x13, 0x1f, 0x28, 0x1b, 0x16, 0x17, 0x19, 0xd, 0x6, 0xb, 0x27, 0x12, 0x14, 0x8, 0xe, 0x15, 0x20, 0x1a, 0x2, 0x1e, 0x7, 0x4, 0x11, 0x5, 0x3, 0x1c, 0x22, 0x25, 0xc, 0x24];
        var _0x4da0dc = [];
        var _0x12605e = '';
        for (var _0x20a7bf = 0x0; _0x20a7bf < this['length']; _0x20a7bf++) {
            var _0x385ee3 = this[_0x20a7bf];
            for (var _0x217721 = 0x0; _0x217721 < _0x4b082b['length']; _0x217721++) {
                if (_0x4b082b[_0x217721] == _0x20a7bf + 0x1) {
                    _0x4da0dc[_0x217721] = _0x385ee3;
                }
            }
        }
        _0x12605e = _0x4da0dc['join']('');
        return _0x12605e;
    };
    var _0x23a392 = arg1['unsbox']();
    // Declared with var so that arg2 does not leak into the global scope
    var arg2 = _0x23a392['hexXor'](_0x5e8b26);
    return arg2;
};

// Test output
// var arg1 = '2410463826D86A52A5BB43A13A80BAE6C4122A73';
// console.log(getAcwScV2(arg1));
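Since the extracted logic is only a fixed permutation followed by a byte-wise XOR, it can also be expressed without a JavaScript runtime. The block below is a minimal Python sketch of the same two steps, not the author's implementation. The table and key are copied from the JavaScript above; the names UNSBOX_TABLE, XOR_KEY, unsbox, hex_xor, and get_acw_sc_v2 are hypothetical, and the key is assumed to stay fixed, as the article observed.

```python
# Fixed XOR key from the article (the resolved value of _0x5e8b26).
XOR_KEY = "3000176000856006061501533003690027800375"

# Permutation table copied from the unsbox method (1-based positions).
UNSBOX_TABLE = [
    0xf, 0x23, 0x1d, 0x18, 0x21, 0x10, 0x1, 0x26, 0xa, 0x9,
    0x13, 0x1f, 0x28, 0x1b, 0x16, 0x17, 0x19, 0xd, 0x6, 0xb,
    0x27, 0x12, 0x14, 0x8, 0xe, 0x15, 0x20, 0x1a, 0x2, 0x1e,
    0x7, 0x4, 0x11, 0x5, 0x3, 0x1c, 0x22, 0x25, 0xc, 0x24,
]


def unsbox(arg1):
    # Output position j receives the character at 1-based index UNSBOX_TABLE[j];
    # positions beyond the input length are skipped, as in the JS join('').
    return "".join(arg1[pos - 1] for pos in UNSBOX_TABLE if pos <= len(arg1))


def hex_xor(left, right):
    # XOR the two hex strings one byte (two hex digits) at a time.
    pieces = []
    for i in range(0, min(len(left), len(right)), 2):
        a = int(left[i:i + 2], 16)
        b = int(right[i:i + 2], 16)
        pieces.append(format(a ^ b, "02x"))
    return "".join(pieces)


def get_acw_sc_v2(arg1, key=XOR_KEY):
    return hex_xor(unsbox(arg1), key)


if __name__ == "__main__":
    # Sample arg1 from the commented-out test in the JavaScript above.
    print(get_acw_sc_v2("2410463826D86A52A5BB43A13A80BAE6C4122A73"))
```

A port like this removes the dependency on a JavaScript engine, but it has to be updated by hand if the site ever changes the table or the key.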
Python test code
# ==================================
# --*-- coding: utf-8 --*--
# @Time : 2021-12-29
# @Author : WeChat public account: K哥爬虫
# @FileName: main.py
# @Software: PyCharm
# ==================================

import re

import execjs
import requests

index_url = "Desensitized; see GitHub for the complete code: https://github.com/kgepachong/crawler"
news_test_url = "Desensitized; see GitHub for the complete code: https://github.com/kgepachong/crawler"
headers = {
    "Host": "Desensitized; see GitHub for the complete code: https://github.com/kgepachong/crawler",
    "Referer": "Desensitized; see GitHub for the complete code: https://github.com/kgepachong/crawler",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
}


def get_complete_cookie():
    complete_cookie = {}
    # First visit to the homepage without parameters, to obtain acw_tc and acw_sc__v2
    response = requests.get(url=index_url, headers=headers)
    complete_cookie.update(response.cookies.get_dict())
    # Tolerate optional whitespace around "=" in the arg1 assignment
    arg1 = re.findall(r"arg1\s*=\s*'(.*?)'", response.text)[0]
    with open('get_acw_sc_v2.js', 'r', encoding='utf-8') as f:
        acw_sc_v2_js = f.read()
    acw_sc__v2 = execjs.compile(acw_sc_v2_js).call('getAcwScV2', arg1)
    complete_cookie.update({"acw_sc__v2": acw_sc__v2})
    # Second visit to the homepage, to obtain the other cookies
    response2 = requests.get(url=index_url, headers=headers, cookies=complete_cookie)
    complete_cookie.update(response2.cookies.get_dict())
    return complete_cookie


def news_test(cookies):
    response = requests.get(url=news_test_url, headers=headers, cookies=cookies)
    print(response.json())


if __name__ == '__main__':
    complete_cookie = get_complete_cookie()
    news_test(complete_cookie)
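Because arg1 changes on every refresh and is only present in the obfuscated first response, indexing the regex result directly raises an IndexError whenever the page comes back in another form. The following is a small defensive sketch of the extraction step. The names ARG1_RE and extract_arg1 are hypothetical, and the pattern assumes arg1 is a single-quoted hexadecimal string, as in the samples shown in this article.

```python
import re

# Assumes an assignment such as: var arg1='6A6B...'; with optional spaces.
ARG1_RE = re.compile(r"arg1\s*=\s*'([0-9A-Fa-f]+)'")


def extract_arg1(html):
    # Return the arg1 value, or None when the response carries no challenge script.
    match = ARG1_RE.search(html)
    return match.group(1) if match else None


sample = "<script>var arg1 = '6A6BE0CAF2D2305297951C9A2ADBC2E8D21D48FD';</script>"
print(extract_arg1(sample))
print(extract_arg1("<html>normal page</html>"))
```

Returning None lets the caller decide whether to retry the request or to proceed with the cookies it already has.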