Follow the WeChat official account K哥爬虫 (K Brother Crawler) or join the QQ group 808574309 for more practical write-ups on advanced crawling and JS/Android reverse engineering.

Disclaimer

Everything in this article is for learning and discussion only. Captured content, sensitive URLs, and data interfaces have been desensitized. Commercial or illegal use is strictly forbidden, and the author accepts no responsibility for any consequences of such use. If anything here infringes your rights, contact me and I will remove it immediately.

Reverse-engineering target

  • Target: the public query service of the National Healthcare Security Administration (NHSA)
  • Homepage: aHR0cHM6Ly9mdXd1Lm5oc2EuZ292LmNuL25hdGlvbmFsSGFsbFN0LyMvc2VhcmNoL21lZGljYWw=
  • Interface: aHR0cHM6Ly9mdXd1Lm5oc2EuZ292LmNuL2VidXMvZnV3dS9hcGkvbnRobC9hcGkvZml4ZWQvcXVlcnlGaXhlZEhvc3BpdGFs
  • Parameters to reverse: encData and signData in the Request Payload; x-tif-nonce and x-tif-signature in the Request Headers

Reverse-engineering process

Packet capture analysis

Open the public query page and turn the page, and a POST request appears. Its Request Payload is encrypted, with appCode, encData, and signData as the main fields. The response carries the same fields and is encrypted and decrypted the same way. Because encType and signType are SM4 and SM2 respectively, these are very likely the Chinese national commercial cryptography (SM) algorithms, which K introduced in an earlier article, "Crawler reverse basics, understanding SM1-SM9, ZUC national secret algorithm". The request headers also carry x-tif-nonce and x-tif-signature, as shown in the figure below:

01.png
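
When replaying captured traffic it helps to confirm quickly that a JSON body really is in this encrypted form. The following is a minimal sketch; it assumes the nesting used by getEncryptedData and getDecryptedData in the complete code later in this article (an outer "data" object holding encType, signType, signData, and an inner "data" object holding encData). The function name and the sample values are made up for illustration.

```python
def looks_like_encrypted_envelope(body: dict) -> bool:
    """Return True if a captured JSON body has the encrypted-envelope fields."""
    data = body.get("data")
    if not isinstance(data, dict):
        return False
    inner = data.get("data")
    return (
        data.get("encType") == "SM4"
        and data.get("signType") == "SM2"
        and isinstance(data.get("signData"), str)
        and isinstance(inner, dict)
        and isinstance(inner.get("encData"), str)
    )


# Placeholder values only; real captures carry hex/base64 strings.
sample = {
    "data": {
        "appCode": "PLACEHOLDER",
        "encType": "SM4",
        "signType": "SM2",
        "signData": "PLACEHOLDER",
        "data": {"encData": "PLACEHOLDER"},
    }
}
print(looks_like_encrypted_envelope(sample))
```

A body that fails this check is probably a plain, unencrypted interface and can be posted as is.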

Reversing the parameters

Search globally for encData or signData. The only hits are in app.1634197175801.js, and just above them is the code that sets the headers, so all the parameters are generated here. Set a breakpoint and you can confirm that this is where the encryption happens, as shown in the figure below:

02.png

The encryption function takes one argument, e. Look at e first; its fields mean the following:

  • addr: detailed address of the medical institution, empty by default;
  • medinsLvCode: medical institution level code, empty by default;
  • medinsName: medical institution name, empty by default;
  • medinsTypeCode: medical institution type code, empty by default;
  • pageNum: page number, 1 by default;
  • pageSize: number of records per page, 10 by default;
  • regnCode: location code of the medical institution, 110000 (Beijing) by default;
  • sprtEcFlag: meaning unknown for now, empty by default.

The level codes, type codes, and location codes are all fetched by requesting other interfaces that use the same encryption and decryption. The final complete code covers them, so they are not repeated here. Other values, such as appCode, are hard-coded in the JS.
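
The defaults listed above can be collected into a small helper that builds the plaintext query before it is encrypted. This is only a sketch: the function name build_query and its keyword interface are assumptions for illustration, while the field names and default values come from the list above.

```python
def build_query(page_num: int = 1, page_size: int = 10,
                regn_code: str = "110000", **overrides: str) -> dict:
    """Build the plaintext query object with the defaults described above."""
    query = {
        "addr": "",
        "medinsLvCode": "",
        "medinsName": "",
        "medinsTypeCode": "",
        "pageNum": page_num,
        "pageSize": page_size,
        "regnCode": regn_code,
        "sprtEcFlag": "",
    }
    # Reject misspelled field names instead of silently sending them.
    unknown = set(overrides) - set(query)
    if unknown:
        raise KeyError(f"unknown query fields: {sorted(unknown)}")
    query.update(overrides)
    return query


print(build_query(page_num=2, medinsName="example"))
```

Rejecting unknown keys is a design choice of this sketch; it catches typos such as medinsname early, before the payload is encrypted and the mistake becomes hard to spot.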

03.png

Now look at the JS file as a whole. At the top you can see a .call statement and the exports keyword, so this is clearly a webpack bundle.

04.png

Back at the encryption code. Reading from top to bottom, the function references many other modules, and extracting every one of them by hand would take an enormous amount of time. Taking the entire JS file and exporting the needed values is a workable brute-force approach, but the file has more than 70,000 lines and that would hurt runtime efficiency. It is better to study the function, drop what is unused, and keep what is needed. Start with function d. Its first line is var t = n("6c27").sha256; step into it and you land in the createOutputMethod method, which is part of a SHA256 implementation. Copy everything from this method downward, as shown in the following figures:

05.png

06.png

Note that the sha256 this module exports is really just the result of calling createMethod, so the copied code can call createMethod directly, that is, var t = createMethod(), and the exports are not needed.

07.png
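
To check that the locally copied createMethod still behaves like SHA-256, its output can be compared with a reference digest. The sketch below uses Python's hashlib; it assumes the copied function returns a lowercase hex string for a UTF-8 string input, which this article does not state explicitly. sha256_hex is a name made up for this illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string, for comparison with the JS output."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Standard SHA-256 test vector for the input "abc":
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(sha256_hex("abc"))
```

If createMethod()("abc") in the extracted JS prints a different value, something was lost while copying, most likely one of the variables the code depends on.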

Some variables also need to be defined. The copied code is structured as follows:

08.png

Continuing down, the next statement is o = Object(i.a)(). Step into it and copy it as well; nothing here needs special attention.

09.png

Further down is e.data.signData = p(e). Step into function p and copy the whole function. Debugging locally now produces no error, but only because the code wraps its logic in a try-catch statement whose catch block does nothing. Add console.log(e) in the catch block to print the exception, and it turns out that o.doSignature and e.from are both undefined. You could step in and extract those functions too, but they in turn reference further functions. For convenience, load them in webpack form instead; the same applies to e.from.

10.png

11.png

Write the modules in webpack form, call them inside the self-executing function, assign the results to global variables, and then replace the original o and e with those globals. One more thing to note: the h passed to o.doSignature is a fixed value and must be defined, otherwise decryption will fail. As shown below:

12.png

13.png
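
The loader pattern used here can be hard to picture from screenshots alone. The following Python sketch is an analogy, not the site's code: it mimics how a webpack-style loader keeps a table of module functions, runs each one on first use with (module, exports, require), and caches the result. All names and the two toy modules are made up for illustration.

```python
def make_require(modules: dict):
    """Return a require() that loads modules from a table and caches them."""
    cache = {}

    def require(module_id: str):
        if module_id in cache:
            return cache[module_id]["exports"]
        module = {"id": module_id, "loaded": False, "exports": {}}
        # Cache before executing so circular references see partial exports.
        cache[module_id] = module
        modules[module_id](module, module["exports"], require)
        module["loaded"] = True
        return module["exports"]

    return require


def util_module(module, exports, require):
    exports["double"] = lambda x: x * 2


def main_module(module, exports, require):
    util = require("util")  # a module pulls in its dependencies on demand
    exports["run"] = lambda x: util["double"](x) + 1


require = make_require({"util": util_module, "main": main_module})
main = require("main")  # comparable to assigning sm2 = o('4d09') to a global
print(main["run"](3))
```

Keeping the loader and assigning its return values to globals is what lets the extracted functions keep calling o.doSignature and e.from without every dependency being rewritten by hand.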

Take care when extracting the webpack modules. Do not extract every module referenced in the original method; some are never used and can simply be commented out. This step takes patience. If you extract everything the work never ends, and you would be better off using the whole JS file. The modules actually needed are as follows (keeping extra ones is harmless, but none can be missing):

14.png
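
Finding the required modules by hand is tedious, and part of it can be scripted. The sketch below is a hedged illustration: it assumes each module's source is available as a string, that the require parameter is named n, and that module ids look like the four-character "6c27" seen earlier. It collects every module reachable from the entry modules. The module ids and sources in the example are invented, and because it follows every n("...") call it also includes modules that are referenced but never executed, so manual pruning may still be needed.

```python
import re
from collections import deque

# Matches calls such as n("6c27"); assumes the require parameter is named n.
REQUIRE_CALL = re.compile(r"""\bn\(\s*["']([0-9a-zA-Z]{4})["']\s*\)""")


def direct_deps(module_source: str) -> set:
    """Module ids required directly by one module's source text."""
    return set(REQUIRE_CALL.findall(module_source))


def needed_modules(entry_ids, sources: dict) -> set:
    """All module ids reachable from entry_ids by following n("...") calls."""
    seen = set()
    queue = deque(entry_ids)
    while queue:
        module_id = queue.popleft()
        if module_id in seen:
            continue
        seen.add(module_id)
        queue.extend(direct_deps(sources.get(module_id, "")))
    return seen


# Invented sources, for illustration only.
sources = {
    "aaaa": 'var x = n("bbbb"), y = n("cccc");',
    "bbbb": 'var z = n("cccc");',
    "cccc": "",
    "dddd": 'var w = n("aaaa");',
}
print(sorted(needed_modules(["aaaa"], sources)))
```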

Next in the original code is encData: v("SM4", e). It uses function v, which in turn uses A, g, and other functions; all of them can be extracted. Note that function A also uses the e mentioned above (the one that provides e.from), and it must be replaced with the global variable we defined, as shown in the following figures:

15.png

16.png

All the functions used for encryption have now been extracted. We can wrap the encryption flow in a single method that only needs a parameter object like the following:

{
    "addr": "", 
    "regnCode": "110000", 
    "medinsName": "", 
    "sprtEcFlag": "", 
    "medinsLvCode": "", 
    "medinsTypeCode": "", 
    "pageNum": 1, 
    "pageSize": 10
}

As shown in the figure below, getEncryptedData is the encryption method:

17.png

What about decryption? The returned data is also in encData, and a direct search for encData gives only three hits, so function y is easy to find. Again, change e.from to our own e_.Buffer.from. The header generation can also be wrapped in a function so that it is easy to call.

18.png

19.png
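
In the decryption function shown in the complete code below, the last byte of the SM4 output is read as a padding length and that many bytes are dropped before the result is decoded as UTF-8. This is the usual PKCS#7-style unpadding. A minimal Python sketch of just that step follows; the SM4 decryption itself is not shown, the range checks are an addition of this sketch (the JS performs none), and strip_padding is a made-up name.

```python
def strip_padding(decrypted: bytes, block_size: int = 16) -> bytes:
    """Drop PKCS#7-style padding: the last byte says how many bytes to remove."""
    if not decrypted:
        raise ValueError("empty input")
    pad_len = decrypted[-1]
    # Extra validation not present in the JS version.
    if pad_len < 1 or pad_len > block_size or pad_len > len(decrypted):
        raise ValueError(f"invalid padding length: {pad_len}")
    return decrypted[:-pad_len]


# An 8-byte JSON string padded to one 16-byte block with eight 0x08 bytes.
padded = b'{"a": 1}' + bytes([8]) * 8
print(strip_padding(padded).decode("utf-8"))
```

An invalid padding length usually means the key or the input encoding (for example, hex versus base64) is wrong, so failing loudly here makes such mistakes easier to find than a JSON parse error later.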

Complete code

Follow K哥爬虫 on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

Only part of the key code is shown below, and it cannot be run as is. Full code repository: https://github.com/kgepachong/crawler/

Key JavaScript encryption code (skeleton)

var sm2, sm4, e_;
!function (e) {
    var n = {},
        i = {app: 0},
        r = {app: 0};

    function o(t) {}

    o.e = function (e) {}
    o.m = e
    o.c = n
    o.d = function (e, t, n) {}
    o.r = function (e) {}
    o.n = function (e) {}
    o.o = function (e, t) {}

    sm2 = o('4d09')
    e_ = o('b639')
    sm4 = o('e04e')

}({
    "4d09": function (e, t, n) {},
    'f33e': function (e, t, n) {},
    "4d2d": function (e, t, n) {},
    'b381': function (e, t, n) {},
    // N modules omitted here
})

// N variables omitted here

var createOutputMethod = function (e, t) {},
    createMethod = function (e) {},
    nodeWrap = function (method, is224) {},
    createHmacOutputMethod = function (e, t) {},
    createHmacMethod = function (e) {};

function Sha256(e, t) {}

function HmacSha256(e, t, n) {}

// N functions omitted here (including v, A, and g, which are used below)

function i() {}

function p(t) {}

function m(e) {}

var c = {
    paasId: undefined,
    appCode: "T98HPCGN5ZVVQBS8LZQNOAEXVI9GYHKQ",
    version: "1.0.0",
    appSecret: "NMVFVILMKT13GEMD3BKPKCTBOQBPZR2P",
    publicKey: "BEKaw3Qtc31LG/hTPHFPlriKuAn/nzTWl8LiRxLw4iQiSUIyuglptFxNkdCiNXcXvkqTH79Rh/A2sEFU6hjeK3k=",
    privateKey: "AJxKNdmspMaPGj+onJNoQ0cgWk2E3CYFWKBJhpcJrAtC",
    publicKeyType: "base64",
    privateKeyType: "base64"
    },
    l = c.appCode,
    u = c.appSecret,
    f = c.publicKey,
    h = c.privateKey,
    t = createMethod(),
    // t = n("6c27").sha256,
    r = Math.ceil((new Date).getTime() / 1e3),
    o = i(),
    a = r + o + r;

function getEncryptedData(data) {
    var e = {"data": data}
    return e.data = {
            data: e.data || {}
        },
        e.data.appCode = c.appCode,
        e.data.version = c.version,
        e.data.encType = "SM4",
        e.data.signType = "SM2",
        e.data.timestamp = r,
        e.data.signData = p(e),
        e.data.data = {
            encData: v("SM4", e)
        },
        // e.data = JSON.stringify({
        //     data: e.data
        // }),
        e
}

function getDecryptedData(t) {
    if (!t)
        return null;
    var n = e_.Buffer.from(t.data.data.encData, "hex")
      , i = function(t, n) {
        var i = sm4.decrypt(n, t)
          , r = i[i.length - 1];
        return i = i.slice(0, i.length - r),
        e_.Buffer.from(i).toString("utf-8")
    }(g(l, u), n);
    return JSON.parse(i)
}

function getHeaders(){
    var headers = {}
    return headers["x-tif-paasid"] = c.paasId,
        headers["x-tif-signature"] = t(a),
        headers["x-tif-timestamp"] = r.toString(),
        headers["x-tif-nonce"] = o,
        headers["Accept"] = "application/json",
        headers["contentType"] = "application/x-www-form-urlencoded",
        headers
}
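
For comparison, the header logic in getHeaders above is simple enough to reproduce without the JS. The sketch below mirrors what the skeleton shows: the timestamp is the current time in seconds rounded up, and the signature is the SHA-256 of timestamp + nonce + timestamp. It assumes t(a) returns a lowercase hex digest. The nonce is taken as an argument because the body of i(), which generates it, is omitted in this article. build_tif_headers is a made-up name.

```python
import hashlib
import math
import time
from typing import Optional


def build_tif_headers(nonce: str, timestamp: Optional[int] = None) -> dict:
    """Rebuild the x-tif-* headers from a nonce and a Unix timestamp in seconds."""
    if timestamp is None:
        # Same rounding as Math.ceil((new Date).getTime() / 1e3) in the JS.
        timestamp = math.ceil(time.time())
    ts = str(timestamp)
    # a = r + o + r in the JS: timestamp, nonce, timestamp joined as strings.
    signature = hashlib.sha256((ts + nonce + ts).encode("utf-8")).hexdigest()
    return {
        "x-tif-signature": signature,
        "x-tif-timestamp": ts,
        "x-tif-nonce": nonce,
    }


print(build_tif_headers("example-nonce", timestamp=1635900000))
```

Whether the server accepts an arbitrary nonce format is not covered in this article, so calling the extracted JS remains the safer route for real requests.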

Key Python code for fetching the data

# ==================================
# --*-- coding: utf-8 --*--
# @Time    : 2021-11-03
# @Author  : WeChat official account: K哥爬虫
# @FileName: nhsa.py
# @Software: PyCharm
# ==================================


import execjs
import requests


regn_code_url = "Desensitized; see the full code on GitHub: https://github.com/kgepachong/crawler"
lv_and_type_url = "Desensitized; see the full code on GitHub: https://github.com/kgepachong/crawler"
result_url = "Desensitized; see the full code on GitHub: https://github.com/kgepachong/crawler"
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"

with open('nhsa.js', 'r', encoding='utf-8') as f:
    nhsa_js = execjs.compile(f.read())


def get_headers():
    """Build the header parameters; they change on every request."""
    headers = nhsa_js.call("getHeaders")
    headers["User-Agent"] = UA
    headers["Content-Type"] = "application/json"
    headers["Host"] = "Desensitized; see the full code on GitHub: https://github.com/kgepachong/crawler"
    headers["Origin"] = "Desensitized; see the full code on GitHub: https://github.com/kgepachong/crawler"
    headers["Referer"] = "Desensitized; see the full code on GitHub: https://github.com/kgepachong/crawler"
    # print(headers)
    return headers


def get_regn_code():
    """Fetch the region codes; the response is not encrypted."""
    payload = {"data": {"transferFlag": ""}}
    response = requests.post(url=regn_code_url, json=payload, headers=get_headers())
    print(response.text)


def get_medins_lv_or_type_code(key):
    """Fetch the medical institution level (LV) or type (TYPE) codes."""
    if key == "LV":
        payload = {"type": "MEDINSLV"}
    elif key == "TYPE":
        payload = {"type": "MEDINS_TYPE"}
    else:
        print("Invalid input!")
        return
    encrypted_payload = nhsa_js.call("getEncryptedData", payload)
    encrypted_data = requests.post(url=lv_and_type_url, json=encrypted_payload, headers=get_headers()).json()
    decrypted_data = nhsa_js.call("getDecryptedData", encrypted_data)
    print(decrypted_data)


def get_result():
    """Prompt for search conditions, then fetch and decrypt each result page."""
    addr = input("Detailed address of the medical institution (default: none): ") or ""
    medins_lv_code = input("Medical institution level code (default: none): ") or ""
    medins_name = input("Medical institution name (default: none): ") or ""
    medins_type_code = input("Medical institution type code (default: none): ") or ""
    regn_code = input("Location code of the medical institution (default: Beijing): ") or "110000"
    page_num = input("Number of pages to crawl (default: 1): ") or 1

    for page in range(1, int(page_num)+1):
        payload = {
            "addr": addr,
            "medinsLvCode": medins_lv_code,
            "medinsName": medins_name,
            "medinsTypeCode": medins_type_code,
            "pageNum": page,
            "pageSize": 10,
            "regnCode": regn_code,
            "sprtEcFlag": ""
        }
        # The for loop already advances page, so no manual increment is needed.
        encrypted_payload = nhsa_js.call("getEncryptedData", payload)
        encrypted_data = requests.post(url=result_url, json=encrypted_payload, headers=get_headers()).json()
        decrypted_data = nhsa_js.call("getDecryptedData", encrypted_data)
        print(decrypted_data)


def main():
    # Fetch the region codes
    # get_regn_code()
    # Fetch the medical institution level codes
    # get_medins_lv_or_type_code("LV")
    # Fetch the medical institution type codes
    # get_medins_lv_or_type_code("TYPE")
    # Fetch the search results
    get_result()


if __name__ == "__main__":
    main()

