Brother K previously wrote an article on reverse-engineering Baidu Translate and published a matching video on bilibili. Recently, members of Brother K's crawler discussion group reported that Baidu Translate has added a new request header, Acs-Token. If you omit it and build requests the old way, the server returns error 1022. If you hardcode Acs-Token to a fixed value, the first few requests may succeed, but after several queries the same error comes back. This article reverse-engineers the parameter and refactors the previous code. Brother K also found that some Baidu Index endpoints send a Cipher-Text header that is encrypted much like Baidu Translate's Acs-Token, so we analyze both together.
Disclaimer
All content in this article is for learning and exchange only and must not be used for any other purpose. Complete code is not provided. Captured packet contents, sensitive URLs, data endpoints, etc. have been desensitized. Commercial and illegal use is strictly prohibited, and the author bears no responsibility for any consequences!
The author is not responsible for any incident caused by unauthorized use of the techniques described in this article. In case of infringement, please contact the author via the official account [K Brother Crawler] and the content will be removed immediately!
Reverse-engineering target
- Target: the new Acs-Token request header in Baidu Translate, and the Cipher-Text request header in Baidu Index
- Homepage: https://fanyi.baidu.com/
- API endpoint: https://fanyi.baidu.com/v2transapi
- How to reverse the sign and token parameters is not repeated here. For details, see Brother K's earlier articles on reversing Baidu Translate.
Reverse-engineering process
Packet capture analysis
Take Baidu Translate as an example. When you enter text, the translation appears without the page refreshing, which suggests it is loaded via Ajax. Open the developer tools, filter by XHR to see the Ajax requests, and locate the translation endpoint (Brother K's earlier Baidu Translate articles cover this step in detail). As shown in the figure below, a new Acs-Token parameter has been added to the request header. The first two groups of digits look like timestamps; the exact encryption method needs further analysis.
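To check the guess that the first two digit groups are timestamps, they can be read as Unix time in milliseconds. The following is a minimal Python sketch, assuming the header has the form `<digits>_<digits>_<ciphertext>` that the breakdown of a8() later in this article describes; `split_acs_token` and `ms_to_utc` are illustrative names, not part of Baidu's code:

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class AcsTokenParts(NamedTuple):
    prefix_ms: int   # first group of digits
    client_ms: int   # second group of digits
    ciphertext: str  # everything after the second underscore


def split_acs_token(token: str) -> AcsTokenParts:
    """Split '<digits>_<digits>_<rest>' into its three parts."""
    prefix, client_ts, ciphertext = token.split("_", 2)
    return AcsTokenParts(int(prefix), int(client_ts), ciphertext)


def ms_to_utc(ms: int) -> datetime:
    """Interpret an integer as Unix time in milliseconds (exact integer arithmetic)."""
    return EPOCH + timedelta(milliseconds=ms)
```

For example, `ms_to_utc(1660546809505)` gives 2022-08-15 07:00:09.505 UTC, a plausible date for this article, which supports reading the digits as millisecond timestamps.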
Here, a hook injected through a Fiddler plugin is used to locate where Acs-Token is set. For how to set up the hook, see Brother K's earlier articles; it is not repeated here:
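(function () {
    // Keep a reference to the original setRequestHeader
    var org = window.XMLHttpRequest.prototype.setRequestHeader;
    window.XMLHttpRequest.prototype.setRequestHeader = function (key, value) {
        // Log every header the page sets
        console.log(key, ':', value);
        // Pause in the debugger when Acs-Token is being set
        if (key === 'Acs-Token') {
            debugger;
        }
        // Call the original method so the request still goes out normally
        return org.apply(this, arguments);
    };
})();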
Clear the cache and click Translate: the hook successfully catches the Acs-Token header. Follow the call stack downward to find where its value is generated:
Reverse analysis
Following the call stack, the Acs-Token value is set at line 187 of translate.js, where it comes from the sign parameter, which is defined at line 180. Set a breakpoint at line 195 and click Translate; execution pauses at the breakpoint:
Step into the getAcsSign() function (select the whole call and click through) to open paris.js. The function body creates a Promise object to perform an asynchronous operation:
The Promise constructor takes a single function argument (the executor), which itself receives two parameters:
- resolve: called when the asynchronous operation succeeds, fulfilling the Promise with its result;
- reject: called when the asynchronous operation fails, rejecting the Promise.
So when the asynchronous operation succeeds, the value of the sign parameter is passed out through resolve:
With sign obtained, follow the stack upward again: the Acs-Token value is generated at line 805 of acs-2060.js, and it is clearly built by string concatenation:
The screenshot above shows where execution paused during the analysis a few days ago. When analyzing it again today, we found that the code structure had changed, as shown below:
Where does acs-2060.js come from? In paris.js, init initializes some configuration: acsUrl is the address of acs-2060.js, and 2060 is the channel number assigned by the administrator. According to the code comments, this component is called "Yumenguan" (Yumen Pass).
Continuing as before, set a breakpoint at line 805 of acs-2060.js and work out what each concatenated part in a8() means. The results are:
- b('0x78') or '\x31\x36\x36\x30\x35\x34\x36\x38\x30\x39\x35\x30\x35\x5f': a fixed string, such as 1660287615129_ or 1660546809505_. It stays fixed for a while but changes periodically; the exact period can only be found by continued observation.
- ae: the current timestamp
- '\x5f': an underscore (_)
- eg(a2, a0, a1): a long encrypted string; the values of a2, a0, and a1 can be inspected in the console.
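The concatenation in a8() can be mirrored outside the browser. This Python sketch assumes the prefix literal is written with `\xNN` escapes, as in the first item above; `decode_js_hex_escapes` and `assemble_acs_token` are hypothetical helpers, and the ciphertext is treated as an opaque string produced by eg():

```python
import re

# Matches a JavaScript \xNN escape (two hex digits) in raw source text
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_js_hex_escapes(text: str) -> str:
    """Replace each \\xNN escape in JS source text with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def assemble_acs_token(prefix: str, ae: int, ciphertext: str) -> str:
    """Mirror a8(): prefix (already ending in '_') + ae + '_' + eg(a2, a0, a1)."""
    return prefix + str(ae) + "_" + ciphertext
```

Decoding the escaped literal from acs-2060.js this way yields `1660546809505_`, the same value as the second prefix listed above.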
a0 and a1 are fixed values. The fields of the a2 object mean the following:
- ua: the browser User-Agent string
- url: the translation page URL; for example, if you enter spider, the url is https://fanyi.baidu.com/#zh/en/spider
- platform: the OS platform (for example, Win32)
- clientTs: the current timestamp
- version: the version number
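The a2 object ends up as a compact JSON string, the same way JavaScript's JSON.stringify serializes a flat object: keys in insertion order, no whitespace. A Python sketch, assuming that behavior carries over for plain ASCII values like these; the default platform and version values come from the article's baidufanyi_encrypt.js, `build_a2` is a hypothetical name, and clientTs should be the same millisecond timestamp used as ae in the token:

```python
import json
import time
from typing import Optional


def build_a2(url: str, ua: str, platform: str = "Win32", version: str = "2.2.0",
             client_ts_ms: Optional[int] = None) -> str:
    """Serialize the a2 fields in the same key order as the page, without spaces."""
    payload = {
        "ua": ua,
        "url": url,
        "platform": platform,
        # Default to the current time in milliseconds, like (new Date).getTime()
        "clientTs": client_ts_ms if client_ts_ms is not None else int(time.time() * 1000),
        "version": version,
    }
    # separators=(",", ":") drops the spaces json.dumps adds by default
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
```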
Select eg and jump to where the eg function is defined, at line 537 of acs-2060.js. The details are as follows:
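function eg(a2, a8, a9) {
    // Normalize a2: objects are serialized with JSON (b('0x4d') is presumably 'object'),
    // undefined becomes '', anything else is converted to a string
    return a2 = b('0x4d') == typeof a2 ? JSON[b('0xc')](a2) : void 0x0 === a2 ? '' : '' + a2,
        dD[b('0x37')](a2, ad[b('0x29')](a8), {
            '\x69\x76': ad[b('0x29')](a9),       // 'iv'
            '\x6d\x6f\x64\x65': cc,              // 'mode'
            '\x70\x61\x64\x64\x69\x6e\x67': cz   // 'padding'
        })[b('0x27')][b('0xa')](ag);
}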
You can set a breakpoint at line 538, or print the obfuscated expressions directly in the console, and you will find three classic encryption parameters:
- '\x69\x76': iv, the initialization vector
- '\x6d\x6f\x64\x65': mode, the cipher mode
- '\x70\x61\x64\x64\x69\x6e\x67': padding, the padding scheme
Line 548 also assigns eg to window.aes_encrypt, so this is clearly AES encryption. You can either call a standard crypto library directly or copy the relevant code out of the site's JS; we will not dig further here.
Baidu Index Cipher-Text
Baidu Index's Cipher-Text has the same structure as Baidu Translate's Acs-Token. From the Baidu Translate analysis, we know the core encryption code lives in "Yumenguan", and each site is assigned its own channel number. Search globally for acsUrl, or look for a JS file whose name starts with acs, and you will find acs-2057.js:
As before, set a breakpoint at a8() and refresh the page; execution pauses at the breakpoint:
Baidu Index differs from Baidu Translate only in the leading timestamp and the value of the variable a0; the rest of the logic is identical. Note that the leading timestamp changes after a while, so updating it by hand in project code is impractical. One approach is to keep a local copy of the algorithm, fetch the JS file whose name starts with acs on each request, extract the timestamp from its content with a regular expression, and pass it to the local algorithm to generate the final value. This keeps the solution flexible.
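The regex step could look like the Python sketch below, run on the text of the acs JS file after downloading it (its address is the acsUrl configured in paris.js). It assumes the prefix appears in the source as a 13-digit millisecond timestamp followed by `_`, either literally or as `\xNN` escapes; if the value only exists inside an obfuscated string array (the b('0x78') form), it may be further encoded and this pattern will not find it, and other 13-digit numbers in the file could also produce false matches. `extract_acs_prefix` is a hypothetical helper name:

```python
import re
from typing import Optional

# Assumption: prefix = 13-digit millisecond timestamp followed by "_",
# written in the acs-*.js source either literally or as \xNN escapes.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")
_PREFIX = re.compile(r"(?<!\d)(\d{13})_")


def extract_acs_prefix(js_source: str) -> Optional[str]:
    """Return the digits of the first '<13 digits>_' found in the JS text, or None."""
    decoded = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), js_source)
    match = _PREFIX.search(decoded)
    return match.group(1) if match else None
```

The lookbehind `(?<!\d)` keeps the pattern from matching the tail of a longer digit run, so only a standalone 13-digit group is accepted.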
This concludes the analysis of Cipher-Text and Acs-Token. The encryption algorithm itself is not hard to reverse, but locating the encryption code takes some skill. Also, while writing this article, I found that Baidu Translate requests without Acs-Token were working again. At present, requests without it sometimes succeed and sometimes fail. If a request returns the error {"errno":1022,"errmsg":"访问出现异常,请刷新后重试!","error":1022,"errShowMsg":"访问出现异常,请刷新后重试!"} ("An access exception occurred, please refresh and try again!"), try adding this parameter.
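Since the header is only sometimes required, one option is to try without it and retry with a freshly generated token when errno 1022 comes back. A minimal Python sketch of that fallback; `needs_acs_token` and `post_with_acs_fallback` are hypothetical names, and `send` / `make_token` are placeholders for your own request function and token generator:

```python
from typing import Any, Callable, Dict, Mapping

ACS_ERRNO = 1022  # errno in the error response quoted above


def needs_acs_token(payload: Mapping[str, Any]) -> bool:
    """True if the JSON response carries the 1022 'access exception' error."""
    return payload.get("errno") == ACS_ERRNO or payload.get("error") == ACS_ERRNO


def post_with_acs_fallback(
    send: Callable[[Dict[str, str]], Mapping[str, Any]],
    make_token: Callable[[], str],
    headers: Mapping[str, str],
) -> Mapping[str, Any]:
    """Try without Acs-Token first; on errno 1022, retry once with a fresh token."""
    payload = send(dict(headers))
    if not needs_acs_token(payload):
        return payload
    return send({**headers, "Acs-Token": make_token()})
```

With the code from baidufanyi.py, `send` could be something like `lambda h: session.post(url=translate_api, headers=h, data=data).json()`.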
Full code
On bilibili, follow Brother K's crawler channel, where the assistant posts video tutorials: https://space.bilibili.com/1622879192
On GitHub, follow Brother K's crawler, which keeps sharing crawler-related code. Stars are welcome! https://github.com/kgepachong/
The following shows only the key parts of the code; it cannot be run directly!
baidufanyi_encrypt.js
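var window = global;
// The following part is too long and is omitted here
// For the full code, see GitHub: https://github.com/kgepachong/crawler
(function(){...
})()
function ascToken(translate_url){
    // Some parameters are hardcoded; values differ between sites, so adapt them if you use this in a project
    var a0 = 'uyaqcsmsseqyosiy';    // passed to eg() as the key argument
    var a1 = '1234567887654321';    // passed to eg() as the iv argument
    var ae = (new Date).getTime();  // current timestamp in ms, also used as clientTs
    // JSON.stringify keeps the key order and quotes the url correctly
    var a2 = JSON.stringify({
        "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
        "url": translate_url,
        "platform": "Win32",
        "clientTs": ae,
        "version": "2.2.0"
    });
    // The leading timestamp is hardcoded here; update this value if requests fail
    return '1660546809505_' + ae + '_' + window.aes_encrypt(a2, a0, a1);
}
// console.log(ascToken("https://fanyi.baidu.com/#zh/en/%E6%B5%8B%E8%AF%95"))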
baidufanyi.py
# ==================================
# --*-- coding: utf-8 --*--
# @Time : 2021-08-12
# @Author : WeChat official account: K哥爬虫
# @FileName: baidufanyi.py
# @Software: PyCharm
# ==================================
import re
import execjs
import requests
from urllib import parse
session = requests.session()
index_url = 'https://fanyi.baidu.com/'
lang_url = 'https://fanyi.baidu.com/langdetect'
translate_api = 'https://fanyi.baidu.com/v2transapi'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
# cookies = {
# "BAIDUID": "624363427DBD2BFCDF0C3D6E129F5C65:FG=1"
# }
def get_params(query):
    # Get token and gtk
    # The index page is requested twice: the first request obtains the session cookies,
    # and token/gtk are read from the second response
    session.get(url=index_url, headers=headers)
    # print(session.cookies.get_dict())
    response_index = session.get(url=index_url, headers=headers)
    token = re.findall(r"token: '([0-9a-z]+)'", response_index.text)[0]
    gtk = re.findall(r'gtk = "(.*?)"', response_index.text)[0]
    # Auto-detect the source language
    response_lang = session.post(url=lang_url, headers=headers, data={'query': query})
    lang = response_lang.json()['lan']
    return token, gtk, lang
def get_sign_and_token(query, gtk, lang):
    with open('baidufanyi_encrypt.js', 'r', encoding='utf-8') as f:
        baidu_js = f.read()
    # Compile the JS once and reuse the context for both calls
    ctx = execjs.compile(baidu_js)
    sign = ctx.call('e', query, gtk)
    translate_url = 'https://fanyi.baidu.com/#%s/en/%s' % (lang, parse.quote(query))
    acs_token = ctx.call('ascToken', translate_url)
    return sign, acs_token
def get_result(query, lang, sign, token, acs_token):
data = {
'from': lang,
'to': 'en',
'query': query,
'transtype': 'realtime',
'simple_means_flag': '3',
'sign': sign,
'token': token,
}
headers["Acs-Token"] = acs_token
response = session.post(url=translate_api, headers=headers, data=data)
result = response.json()['trans_result']['data'][0]['dst']
return result
def main():
    query = input('Enter the text to translate: ')
    token, gtk, lang = get_params(query)
    sign, acs_token = get_sign_and_token(query, gtk, lang)
    result = get_result(query, lang, sign, token, acs_token)
    print('English translation:', result)
if __name__ == '__main__':
main()