
Brother K has previously written about reverse engineering Baidu Translate and published a companion video on bilibili. Recently, members of Brother K's crawler discussion group reported that Baidu Translate has added a request header parameter, Acs-Token. Without this parameter, requests built the old way fail with error 1022. Hard-coding Acs-Token to a fixed value may work for the first few requests, but the same error comes back after several queries. This article reverse engineers the parameter and refactors the earlier code. Brother K also found that some Baidu Index interfaces carry a Cipher-Text parameter that is encrypted in much the same way as Baidu Translate's Acs-Token, so the two are analyzed together here.

Disclaimer

Everything in this article is for learning and discussion only and must not be used for any other purpose. Complete code is not provided, and captured packets, sensitive URLs, data interfaces, and similar content have been desensitized. Commercial and illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use!

The author is not responsible for any incident caused by unauthorized use of the techniques explained in this article. In case of infringement, please contact the author through the official account [K Brother Crawler] and the content will be removed immediately!

Reverse target

  • Target: the new Acs-Token request header parameter of Baidu Translate, and the Cipher-Text request header of Baidu Index
  • Homepage: https://fanyi.baidu.com/
  • Interface: https://fanyi.baidu.com/v2transapi
  • Reversing the sign and token parameters is not repeated in this article. For details, see Brother K's earlier articles on reversing Baidu Translate.

Reverse process

Packet capture analysis

Take Baidu Translate as an example. Enter some text and the translation appears without a page refresh, which suggests the result is loaded via Ajax. Open the developer tools, filter by XHR to isolate the Ajax requests, and locate the translation interface. For a detailed walkthrough, see Brother K's earlier articles on reversing Baidu Translate. The request headers now include an Acs-Token parameter. Its first two runs of digits look like timestamps, while the exact encryption method requires further analysis.
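To check the timestamp guess quickly, a captured header value can be split on its first two underscores and the numeric parts converted to dates. The following is only a minimal sketch: the helper name split_acs_token and the sample value are hypothetical, and it assumes the digits are Unix timestamps in milliseconds.

```python
from datetime import datetime, timezone


def split_acs_token(token):
    """Split 'digits_digits_rest' into two datetimes and the remaining string."""
    first, second, rest = token.split("_", 2)

    def to_utc(ms_text):
        # Assumption: the digits are a Unix timestamp in milliseconds
        return datetime.fromtimestamp(int(ms_text) / 1000, tz=timezone.utc)

    return to_utc(first), to_utc(second), rest


# Hypothetical sample value; "PLACEHOLDER" stands in for the encrypted part
prefix_time, request_time, encrypted = split_acs_token(
    "1660546809505_1660550000000_PLACEHOLDER"
)
print(prefix_time.isoformat(), request_time.isoformat(), encrypted)
```

If both parts convert to plausible recent dates, the timestamp reading is likely correct.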


Here a hook injected through a Fiddler plug-in is used to locate the Acs-Token parameter. The hooking technique itself is covered in Brother K's earlier articles and is not repeated here:


(function () {
    // Keep a reference to the native setRequestHeader
    var org = window.XMLHttpRequest.prototype.setRequestHeader;
    window.XMLHttpRequest.prototype.setRequestHeader = function (key, value) {
        console.log(key, ':', value);
        // Pause as soon as the Acs-Token header is being set
        if (key == 'Acs-Token') {
            debugger;
        }
        return org.apply(this, arguments);
    };
})();

Clear the cache and click translate. The hook fires on the Acs-Token parameter. Follow the call stack down to find where its value is generated:


Reverse analysis

Following the call stack down, the value of the Acs-Token parameter is generated at line 187 of the translate.js file and is passed in through the sign parameter, which is defined at line 180. Set a breakpoint at line 195 and click translate, and execution pauses at the breakpoint:


Step into the getAcsSign() function by selecting it and clicking through to the paris.js file. The function body creates a Promise object to carry out an asynchronous operation.

The Promise constructor takes a function as its argument, and that function receives two parameters:

  • resolve : the callback invoked when the asynchronous operation succeeds;
  • reject : the callback invoked when the asynchronous operation fails.

So when the asynchronous operation succeeds, it returns the value of the sign parameter:


With sign obtained, follow the call stack upward again. The value of the Acs-Token parameter is generated at line 805 of the acs-2060.js file, clearly by string concatenation:


The screenshot above was taken at a breakpoint during analysis a few days earlier. When the code was analyzed again today, its structure had changed, as the next screenshot shows:


Where does acs-2060.js come from? In paris.js, init loads some configuration: acsUrl is the address of acs-2060.js, and 2060 is a channel number assigned by an administrator. According to the code comments, this component is called "Yumenguan".


Continuing from the previous step, set a breakpoint at line 805 of acs-2060.js and work out what each concatenated part in a8() means. The results are:

  • b('0x78') or '\x31\x36\x36\x30\x35\x34\x36\x38\x30\x39\x35\x30\x35\x5f' : a string that is fixed inside the script, such as 1660287615129_ or 1660546809505_. It changes from time to time, and how often can only be determined by continued observation.
  • ae : current timestamp
  • '\x5f' : underscore _
  • eg(a2, a0, a1) : a long encrypted string. The meaning of a2, a0, and a1 can be checked by printing them in the console.
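The four parts listed above can be joined as in the following sketch. The function name build_acs_token and the encrypt callable are hypothetical stand-ins; the real eg() encryption is not reproduced here, and the dummy used in the example only reverses the string.

```python
import time


def build_acs_token(prefix, payload, encrypt, now_ms=None):
    """Join prefix, current timestamp, and encrypted payload with underscores.

    prefix  -- the leading value from the acs JS, without the trailing underscore
    encrypt -- callable standing in for eg(a2, a0, a1); hypothetical here
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return "%s_%d_%s" % (prefix, now_ms, encrypt(payload))


# Usage with a dummy "encryption" that only reverses the string
token = build_acs_token("1660546809505", "abc", lambda s: s[::-1], now_ms=1660550000000)
print(token)
```

Passing the encryption step in as a callable keeps the concatenation logic separate from whichever AES implementation is eventually used.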


a0 and a1 are fixed values. The fields of the a2 object are:

  • ua : the browser User-Agent
  • url : the translation page URL. For example, when translating spider, the url is https://fanyi.baidu.com/#zh/en/spider
  • platform : the operating system platform
  • clientTs : current timestamp
  • version : version number
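A sketch of how these fields could be serialized in Python before encryption. It assumes the object should be serialized the way JavaScript's JSON.stringify would do it (no spaces, non-ASCII characters left unescaped). The function name build_payload is hypothetical, and the platform and version values are copied from the article's own sample code.

```python
import json
import time


def build_payload(page_url, user_agent, now_ms=None):
    """Serialize the a2 fields compactly, in the order used in the article."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    fields = {
        "ua": user_agent,
        "url": page_url,
        "platform": "Win32",   # value taken from the article's sample code
        "clientTs": now_ms,
        "version": "2.2.0",    # value taken from the article's sample code
    }
    # separators=(",", ":") drops the spaces that json.dumps adds by default
    return json.dumps(fields, separators=(",", ":"), ensure_ascii=False)


payload = build_payload(
    "https://fanyi.baidu.com/#zh/en/spider", "Mozilla/5.0", now_ms=1660550000000
)
print(payload)
```

Building the string with json.dumps rather than manual concatenation avoids quoting mistakes when the URL or User-Agent contains special characters.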

Select eg and jump to its definition, at line 537 of the acs-2060.js file:


The details are as follows:

 function eg(a2, a8, a9) {
    return a2 = b('0x4d') == typeof a2 ? JSON[b('0xc')](a2) : void 0x0 === a2 ? '' : '' + a2,
        dD[b('0x37')](a2, ad[b('0x29')](a8), {
        '\x69\x76': ad[b('0x29')](a9),
        '\x6d\x6f\x64\x65': cc,
        '\x70\x61\x64\x64\x69\x6e\x67': cz
    })[b('0x27')][b('0xa')](ag);
}

Set a breakpoint at line 538, or print the obfuscated parts directly in the console, and three classic encryption parameters appear:

  • '\x69\x76' : iv, the initialization vector
  • '\x6d\x6f\x64\x65' : mode, the cipher mode of operation
  • '\x70\x61\x64\x64\x69\x6e\x67' : padding, the padding scheme
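The hex-escaped keys above can also be decoded outside the browser. The sketch below replaces each \xNN escape in a piece of obfuscated source text with its character; the function name decode_hex_escapes is hypothetical.

```python
import re


def decode_hex_escapes(text):
    """Replace each \\xNN escape in obfuscated JS source text with its character."""
    return re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


for literal in (r"\x69\x76", r"\x6d\x6f\x64\x65", r"\x70\x61\x64\x64\x69\x6e\x67"):
    print(literal, "->", decode_hex_escapes(literal))
```

The same helper turns the escaped prefix seen in a8() back into its readable form.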


At line 548, eg is assigned to window.aes_encrypt, so this is clearly AES encryption. You can either import a library or extract the code as-is. This is not explored further here:


Baidu Index Cipher-Text

The Cipher-Text of Baidu Index has the same structure as the Acs-Token of Baidu Translate. From the Baidu Translate analysis, we know the core encryption code should live in "Yumenguan", with a different channel number assigned to each site. Search globally for acsUrl, or look directly for a JS file whose name starts with acs, and you will find acs-2057.js:


As before, set a breakpoint at a8(), refresh the page, and execution pauses there:


Baidu Index differs from Baidu Translate only in the leading timestamp and in the variable a0. The rest of the logic is the same. Note that the leading timestamp changes after a while, so updating it by hand is clearly impractical in project code. One approach is to keep a fixed copy of the algorithm locally, fetch the JS file whose name starts with acs before each request, extract the timestamp from its content with a regular expression, and pass it to the local algorithm to generate the final value. This can be adapted as needed.
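The regular-expression step of that idea might look like the sketch below. It assumes the leading value appears in the acs JS as a quoted literal, either plain or \xNN-escaped, made of 13 digits followed by an underscore. If the value is stored some other way, for example only inside the obfuscated string array behind b('0x78'), this will not find it. The function name extract_prefix is hypothetical, and js_text is whatever text was downloaded from the acs file; the download itself is left out because the article does not give the exact address.

```python
import re

# A quoted run of 13 digits followed by "_", plain or \xNN-escaped
PLAIN = re.compile(r"['\"](\d{13})_['\"]")
ESCAPED = re.compile(r"['\"]((?:\\x3[0-9]){13})\\x5f['\"]")


def extract_prefix(js_text):
    """Return the 13-digit leading value found in the acs JS text, or None."""
    m = PLAIN.search(js_text)
    if m:
        return m.group(1)
    m = ESCAPED.search(js_text)
    if m:
        # Each \x3N escape encodes the digit N
        codes = re.findall(r"\\x([0-9a-fA-F]{2})", m.group(1))
        return "".join(chr(int(code, 16)) for code in codes)
    return None


# Sample line modeled on the concatenation analyzed in a8()
sample = r"return '\x31\x36\x36\x30\x35\x34\x36\x38\x30\x39\x35\x30\x35\x5f' + ae + '\x5f' + eg(a2, a0, a1);"
print(extract_prefix(sample))
```

Returning None when nothing matches lets the caller fall back to a previously known value instead of sending a malformed header.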

That concludes the analysis of Cipher-Text and Acs-Token. The encryption algorithm itself is not hard to reverse, but locating the encryption code takes some skill. In addition, while writing this article, I found that Baidu Translate requests without Acs-Token were working again. At present, requests without it sometimes succeed and sometimes fail. If a request returns the error {"errno":1022,"errmsg":"访问出现异常,请刷新后重试!","error":1022,"errShowMsg":"访问出现异常,请刷新后重试!"} (roughly: "Access exception, please refresh and try again!"), try adding this parameter.
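Since the header is only sometimes required, one possible flow is to inspect the response body and add Acs-Token only when the 1022 error comes back. The check could be sketched as follows; the function name needs_acs_token is hypothetical, and the retry logic around it is left to the caller.

```python
import json

ACS_ERROR_CODE = 1022  # error code quoted in the article


def needs_acs_token(response_text):
    """Return True when the response body is the 1022 error described above."""
    try:
        body = json.loads(response_text)
    except ValueError:
        # Not JSON at all, so it is not the 1022 error object
        return False
    return isinstance(body, dict) and body.get("errno") == ACS_ERROR_CODE


print(needs_acs_token('{"errno":1022,"errmsg":"x","error":1022,"errShowMsg":"x"}'))
print(needs_acs_token('{"trans_result":{"data":[{"dst":"test"}]}}'))
```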

Full code

Follow Brother K's crawler channel on bilibili, where the assistant posts video tutorials: https://space.bilibili.com/1622879192

Follow Brother K's crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

Only some key code is shown below, and it cannot be run directly!

baidufanyi_encrypt.js

var window = global;

// The following part is too long and is omitted here
// For the complete code, see GitHub: https://github.com/kgepachong/crawler
(function(){...
})()
function ascToken(translate_url){
    // Some parameters are hard-coded. Their values differ between sites, so adapt them as needed in a real project
    var a0 = 'uyaqcsmsseqyosiy';
    var a1 = '1234567887654321';
    var ae = (new Date).getTime();
    // Note the opening quote before translate_url, so that the url value is a valid JSON string
    var a2 = '{"ua":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36","url":"' + translate_url + '","platform":"Win32","clientTs":' + ae + ',"version":"2.2.0"}';
    // The leading timestamp is hard-coded here. Update this value if requests fail
    return '1660546809505_' + ae + '_' + window.aes_encrypt(a2, a0, a1);
}

// console.log(ascToken("https://fanyi.baidu.com/#zh/en/%E6%B5%8B%E8%AF%95"))

baidufanyi.py

# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-08-12
# @Author  : WeChat official account: K哥爬虫
# @FileName: baidufanyi.py
# @Software: PyCharm
# ==================================


import re
import execjs
import requests
from urllib import parse


session = requests.session()
index_url = 'https://fanyi.baidu.com/'
lang_url = 'https://fanyi.baidu.com/langdetect'
translate_api = 'https://fanyi.baidu.com/v2transapi'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
# cookies = {
#     "BAIDUID": "624363427DBD2BFCDF0C3D6E129F5C65:FG=1"
# }


def get_params(query):
    # Get token and gtk. The first request lets the session pick up cookies,
    # the second returns the page from which token and gtk are extracted
    session.get(url=index_url, headers=headers)
    # print(session.cookies.get_dict())
    response_index = session.get(url=index_url, headers=headers)
    token = re.findall(r"token: '([0-9a-z]+)'", response_index.text)[0]
    gtk = re.findall(r'gtk = "(.*?)"', response_index.text)[0]
    # Detect the source language automatically
    response_lang = session.post(url=lang_url, headers=headers, data={'query': query})
    lang = response_lang.json()['lan']
    return token, gtk, lang


def get_sign_and_token(query, gtk, lang):
    with open('baidufanyi_encrypt.js', 'r', encoding='utf-8') as f:
        baidu_js = f.read()
    # Compile the JS once and reuse the context for both calls
    ctx = execjs.compile(baidu_js)
    sign = ctx.call('e', query, gtk)
    translate_url = 'https://fanyi.baidu.com/#%s/en/%s' % (lang, parse.quote(query))
    acs_token = ctx.call('ascToken', translate_url)
    return sign, acs_token


def get_result(query, lang, sign, token, acs_token):
    data = {
        'from': lang,
        'to': 'en',
        'query': query,
        'transtype': 'realtime',
        'simple_means_flag': '3',
        'sign': sign,
        'token': token,
    }
    headers["Acs-Token"] = acs_token
    response = session.post(url=translate_api, headers=headers, data=data)
    result = response.json()['trans_result']['data'][0]['dst']
    return result


def main():
    query = input('Enter the text to translate: ')
    token, gtk, lang = get_params(query)
    sign, acs_token = get_sign_and_token(query, gtk, lang)
    result = get_result(query, lang, sign, token, acs_token)
    print('Translation into English:', result)


if __name__ == '__main__':
    main()
