Follow the WeChat official account "K brother crawler" (QQ exchange group: 808574309) for ongoing posts on advanced crawlers, JS/Android reverse engineering, and other practical technical content!

Disclaimer

All content in this article is for learning and exchange only. The captured content, sensitive URLs, and data interfaces have been desensitized. Commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences arising from such use. If there is any infringement, please contact me and I will delete it immediately!

Reverse-engineering target

The target this time is the WB login. There are not many encrypted parameters, but the login flow is fairly involved: it passes through several redirects, and a successful login takes about nine steps.

Only one encrypted parameter appears during login: the password. The encrypted password is used when obtaining the token, which is a POST request. The sp field in the Form Data is the encrypted password, similar to: e23c5d62dbf9f8364005f331e487873c70d7ab0e8dd2057c3e66d1ae5d2837ef1dcf86......

Login process

First, let's walk through the login process. Only the notable parameters of each step are explained; parameters that are not mentioned are fixed values and can be copied directly.

The general process is as follows:

  1. Pre-login
  2. Get the encrypted password
  3. Get the token
  4. Get the encrypted account
  5. Send verification code
  6. Verify the verification code
  7. Visit the redirect url
  8. Visit crossdomain2 url
  9. Login via passport url

1. Pre-login

01.png

Pre-login is a GET request. The Query String Parameters contain two important parameters: su, the username after base64 encoding, and _, a 13-digit timestamp. The returned data contains a JSON object, which can be extracted with a regular expression. The JSON contains seven values: retcode, servertime, pcid, nonce, pubkey, rsakv, and exectime. Most of them are used in subsequent requests, and some are used when encrypting the password. Example of the returned data:

xxxxSSOController.preloginCallBack({
    "retcode": 0,
    "servertime": 1627461942,
    "pcid": "gz-1cd535198c0efe850b96944c7945e8fd514b",
    "nonce": "GWBOCL",
    "pubkey": "EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245......",
    "rsakv": 1330428213,
    "exectime": 16
})
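
As a rough sketch of this step (not the author's code), the snippet below builds the su and _ query values and pulls the JSON out of the callback wrapper. The helper names build_prelogin_params and parse_prelogin are hypothetical, and the sample response is a shortened copy of the example above.

```python
import base64
import json
import re
import time


def build_prelogin_params(username: str) -> dict:
    """Build the two dynamic query parameters of the pre-login GET request."""
    return {
        # su: the username, base64-encoded (an encoding, not encryption)
        "su": base64.b64encode(username.encode()).decode(),
        # _: the current time as a 13-digit millisecond timestamp
        "_": str(int(time.time() * 1000)),
    }


def parse_prelogin(body: str) -> dict:
    """Extract the JSON object wrapped in callback(...)."""
    match = re.search(r"\((\{.*\})\)", body, re.S)
    if match is None:
        raise ValueError("no JSON payload found in the pre-login response")
    return json.loads(match.group(1))


# Shortened copy of the example response (pubkey and other fields omitted)
sample = ('xxxxSSOController.preloginCallBack({"retcode": 0, '
          '"servertime": 1627461942, "nonce": "GWBOCL", "rsakv": 1330428213})')
info = parse_prelogin(sample)
print(info["servertime"], info["nonce"])
print(build_prelogin_params("user@example.com"))
```

The remaining query parameters are fixed values, so only su and _ need to be computed per request.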

2. Get the encrypted password

The password is encrypted with RSA. The encrypted password can be computed in either Python or JS; the reverse analysis of the JS encryption is covered separately later.

3. Get the token

02.png

The token is used in the subsequent steps: obtaining the encrypted phone number, sending the verification code, and verifying the verification code. Obtaining the token is a POST request. The Query String Parameters value is fixed: client: ssologin.js(v1.4.19). The Form Data has more fields, but apart from the encrypted password, the other parameters can mostly be found in the data returned by the pre-login in step 1. The main parameters are as follows:

  • su : The username after base64 encoding
  • servertime : Obtained from the JSON returned by the pre-login in step 1
  • nonce : Obtained from the JSON returned by the pre-login in step 1
  • rsakv : Obtained from the JSON returned by the pre-login in step 1
  • sp : The encrypted password
  • prelt : A random value

The returned data is HTML source code, from which the token can be extracted. The token looks similar to: 2NGFhARzFAFAIp_QwX70Npj8gw4lgj7RbCnByb3RlY3Rpb24. If the returned value does not look like this, the account or password is wrong.
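
A minimal sketch of the token extraction, assuming the page carries the token at the end of a URL inside a replace("...") call, after an encoded token%3D marker (this matches the regex and split used in the full code later in this article). The sample HTML and the function name extract_token are hypothetical, and treating a missing marker as a failed login is a simplification.

```python
import re
from typing import Optional


def extract_token(html: str) -> Optional[str]:
    """Return the token embedded in replace("..."), or None if absent."""
    match = re.search(r'replace\("([^"]*)"\)', html)
    if match is None:
        return None
    url = match.group(1)
    if "token%3D" not in url:
        # No token marker in the redirect URL: treat it as a failed login
        return None
    return url.split("token%3D")[-1]


# Hypothetical page shape, for illustration only
sample_html = ('<script>location.replace("https://example.invalid/protection'
               '?callback_url=x%3Ftoken%3D2NGFhARzFAFAIp_QwX70");</script>')
print(extract_token(sample_html))
```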

4. Get the encrypted account

03.png

The su parameter seen earlier is the base64-encoded username. This step encrypts the username further, and the encrypted username is used when sending and verifying the verification code. This is a GET request, and its Query String Parameters are simple: token is the token value obtained in step 3, and callback_url is the homepage of the website. The returned data is HTML source code. You can use the XPath expression //input[@name='encrypt_mobile']/@value to extract the encrypted account, whose value is similar to: f2de0b5e333a. Note that even for the same account, the result of each encryption is different.
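
If lxml is not available, the same input field can be read with the standard library. The sketch below swaps the XPath query for html.parser; the class name, function name, and sample fragment are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class EncryptMobileParser(HTMLParser):
    """Remember the value attribute of <input name="encrypt_mobile">."""

    def __init__(self) -> None:
        super().__init__()
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == "encrypt_mobile":
            self.value = attr_map.get("value")


def extract_encrypt_mobile(html: str) -> Optional[str]:
    """Return the encrypted account from the page, or None if not present."""
    parser = EncryptMobileParser()
    parser.feed(html)
    return parser.value


# Hypothetical fragment, for illustration only
sample_html = '<form><input type="hidden" name="encrypt_mobile" value="f2de0b5e333a"></form>'
print(extract_encrypt_mobile(sample_html))
```

Because the value changes on every request, it should be fetched fresh each time rather than cached.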

5. Send verification code

04.png

Sending the verification code is a POST request with simple parameters. The token in Query String Parameters is the token obtained in step 3, and encrypt_mobile in Form Data is the encrypted account obtained in step 4. The returned data is the sending status of the verification code, for example: {'retcode': 20000000, 'msg': 'succ', 'data': []} .

6. Verify the verification code

05.png

Verifying the verification code is a POST request with very simple parameters. The token in Query String Parameters is the token obtained in step 3. In Form Data, encrypt_mobile is the encrypted account obtained in step 4, and code is the verification code received in step 5. The returned data is JSON: retcode and msg represent the status, and redirect_url is the page to visit once verification is complete, which is used in the next step. Example of the returned data:

{
  "retcode": 20000000,
  "msg": "succ",
  "data": {
    "redirect_url": "https://login.xxxx.com.cn/sso/login.php?entry=xxxxx&returntype=META&crossdomain=1&cdult=3&alt=ALT-NTcxNjMyMTA2OA==-1630292617-yf-78B1DDE6833847576B0DC4B77A6C77C4-1&savestate=30&url=https://xxxxx.com"
  }
}
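
A small sketch of how this response could be handled: check msg, take data.redirect_url, and inspect its query string. The function name extract_redirect_url is hypothetical, and the sample body is a shortened version of the example above.

```python
import json
from urllib.parse import parse_qs, urlsplit


def extract_redirect_url(body: str) -> str:
    """Return data.redirect_url when the verification succeeded."""
    result = json.loads(body)
    if result.get("msg") != "succ":
        raise RuntimeError("verification failed: %r" % result)
    return result["data"]["redirect_url"]


# Response body modelled on the example above (URL shortened)
sample_body = json.dumps({
    "retcode": 20000000,
    "msg": "succ",
    "data": {
        "redirect_url": "https://login.xxxx.com.cn/sso/login.php"
                        "?entry=xxxxx&returntype=META&crossdomain=1&savestate=30"
    },
})

redirect_url = extract_redirect_url(sample_body)
query = parse_qs(urlsplit(redirect_url).query)
print(redirect_url)
print(query["returntype"], query["savestate"])
```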

7. Visit the redirect url

06.png

The request URL in this step is the redirect URL returned in step 6. It is a GET request, similar to: https://login.xxxx.com.cn/sso/login.php?entry=xxxxx&returntype=META......

The returned data is the HTML source code. We need to extract the URL of crossdomain2 from it. The extracted result is similar to: https://login.xxxx.com.cn/crossdomain2.php?action=login&entry=xxxxx...... . Similarly, this URL is also the page that needs to be visited next.

8. Visit crossdomain2 url

07.png

The request URL in this step is the crossdomain2 URL extracted in step 7. It is a GET request, similar to: https://login.xxxx.com.cn/crossdomain2.php?action=login&entry=xxxxx......

The returned data is also the HTML source code. We need to extract the real login URL from it. The extracted result is similar to: https://passport.xxxxx.com/wbsso/login?ssosavestate=1661828618&url=https...... . The last step only needs to access the real login URL to realize the login operation.
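
The two extractions in steps 7 and 8 can be sketched as follows. The page shapes are assumptions inferred from the regular expressions in the full code later in this article (a replace("...") call in step 7, and a setCrossDomainUrlList({...}) call whose JSON has an arrURL list in step 8). The sample pages and function names are hypothetical.

```python
import json
import re


def extract_crossdomain2_url(html: str) -> str:
    """Step 7: take the URL passed to replace("...")."""
    match = re.search(r'replace\("([^"]*)"\)', html)
    if match is None:
        raise ValueError("crossdomain2 URL not found")
    return match.group(1)


def extract_passport_url(html: str) -> str:
    """Step 8: take the first entry of arrURL in setCrossDomainUrlList({...})."""
    match = re.search(r"setCrossDomainUrlList\((\{.*?\})\)", html, re.S)
    if match is None:
        raise ValueError("setCrossDomainUrlList payload not found")
    return json.loads(match.group(1))["arrURL"][0]


# Hypothetical page shapes, for illustration only
step7_html = ('<script>location.replace("https://login.xxxx.com.cn/'
              'crossdomain2.php?action=login&entry=xxxxx");</script>')
step8_html = ('xxxxSSOController.setCrossDomainUrlList({"retcode":0,"arrURL":'
              '["https://passport.xxxxx.com/wbsso/login?ssosavestate=1661828618"]});')

print(extract_crossdomain2_url(step7_html))
print(extract_passport_url(step8_html))
```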

9. Login via passport url

08.png

This is the last step and the actual login operation. It is a GET request to the passport URL extracted in step 8, similar to: https://passport.xxxxx.com/wbsso/login?ssosavestate=1661828618&url=https......

The returned data contains the login result, user ID and user name, similar to:

({"result":true,"userinfo":{"uniqueid":"5712321368","displayname":"tomb"}});
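
One possible way to decode this response is to strip the outer parentheses with an anchored regular expression rather than deleting every "(" in the text, which would corrupt a display name that itself contains parentheses. The function name parse_login_result is hypothetical.

```python
import json
import re


def parse_login_result(body: str) -> dict:
    """Strip the surrounding '(' ... ');' wrapper and decode the JSON payload."""
    match = re.fullmatch(r"\s*\((.*)\);?\s*", body, re.S)
    if match is None:
        raise ValueError("unexpected login response format")
    return json.loads(match.group(1))


sample = '({"result":true,"userinfo":{"uniqueid":"5712321368","displayname":"tomb"}});'
result = parse_login_result(sample)
if result["result"]:
    print(result["userinfo"]["uniqueid"], result["userinfo"]["displayname"])
```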

At this point, the complete WB login process is finished, and you can use the cookies from the successful login for other operations.

Reversing the password encryption

In the login process, step 2 obtains the encrypted password. In step 3, when obtaining the token, the request's Form Data contains an encrypted parameter sp, which is the encrypted password. Next, we reverse-engineer the password encryption.

Searching globally for the keyword sp directly returns many matches. A technique used in earlier articles helps here: try searching for sp=, sp:, or var sp to narrow the scope. In this case we search for sp= and find only one match, in index.js, so we can set a breakpoint and debug. You can see that sp is the value of b:

PS: Do not search on the page after a successful login. By then the resources have been refreshed and reloaded, and the encryption JS file is no longer available. You need to enter a wrong account or password on the login page in order to capture packets, search, and set breakpoints.

09.png

Continue tracing b upward. The key code contains an if-else statement; set a breakpoint in each branch. After debugging, you can see that the value of b is generated in the if branch:

10.png

Analyze two key lines of code:

f.setPublic(me.rsaPubkey, "10001");
b = f.encrypt([me.servertime, me.nonce].join("\t") + "\n" + b)

me.rsaPubkey , me.servertime , me.nonce are all the data returned from the first step of pre-login.

Hover the mouse over f.setPublic and f.encrypt, and you can see that they are the br and bt functions respectively:

11.png

12.png

Follow these two functions separately, and you can see that both are defined inside one anonymous function:

13.png

Copy the entire anonymous function, remove the outermost anonymous wrapper, and debug locally. During debugging it will report that navigator is undefined. Checking the copied source shows that navigator.appName and navigator.appVersion are used inside. You can define them directly, or leave them blank.

navigator = {
    appName: "Netscape",
    appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

Continuing to debug, the line var c = this.doPublic(b); reports that the object does not support this property or method. Searching for doPublic finds the line bq.prototype.doPublic = bs; which can simply be replaced with doPublic = bs;

After analyzing the whole RSA encryption logic, it turns out it can also be implemented in Python. Code example (the pubkey value needs to be filled in completely):

import rsa
import binascii


pre_parameter = {
        "retcode": 0,
        "servertime": 1627461942,
        "pcid": "gz-1cd535198c0efe850b96944c7945e8fd514b",
        "nonce": "GWBOCL",
        "pubkey": "EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245......",
        "rsakv": 1330428213,
        "exectime": 16
}

password = '12345678'

public_key = rsa.PublicKey(int(pre_parameter['pubkey'], 16), int('10001', 16))
text = '%s\t%s\n%s' % (pre_parameter['servertime'], pre_parameter['nonce'], password)
encrypted_str = rsa.encrypt(text.encode(), public_key)
encrypted_password = binascii.b2a_hex(encrypted_str).decode()

print(encrypted_password)
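
To make explicit what rsa.encrypt does in the example above, here is a standard-library-only sketch. It assumes PKCS#1 v1.5 (type 2) padding, which is the scheme the rsa package's encrypt applies. The function names and the demo modulus are hypothetical: the modulus is simply the product of two Mersenne primes (useless for real security), chosen so the snippet runs without the real, truncated pubkey.

```python
import os


def pkcs1_v15_pad(message: bytes, key_len: int) -> bytes:
    """Build 0x00 0x02 <non-zero random bytes> 0x00 <message>."""
    if len(message) > key_len - 11:
        raise ValueError("message too long for this key size")
    pad_len = key_len - len(message) - 3
    padding = b""
    while len(padding) < pad_len:
        # Draw random bytes and drop zeros, since 0x00 marks the end of padding
        chunk = os.urandom(pad_len - len(padding))
        padding += chunk.replace(b"\x00", b"")
    return b"\x00\x02" + padding + b"\x00" + message


def rsa_encrypt_hex(message: bytes, n: int, e: int = 0x10001) -> str:
    """Pad the message, compute m^e mod n, and return it as a hex string."""
    key_len = (n.bit_length() + 7) // 8
    padded = pkcs1_v15_pad(message, key_len)
    cipher_int = pow(int.from_bytes(padded, "big"), e, n)
    return cipher_int.to_bytes(key_len, "big").hex()


# Demo modulus only (assumption): NOT the site's key and not secure
demo_n = (2**521 - 1) * (2**607 - 1)
# Same plaintext layout as above: servertime + "\t" + nonce + "\n" + password
plaintext = "%s\t%s\n%s" % (1627461942, "GWBOCL", "12345678")
encrypted_hex = rsa_encrypt_hex(plaintext.encode(), demo_n)
print(len(encrypted_hex), encrypted_hex[:32])
```

Because the padding is random, the ciphertext differs on every run even for the same password, servertime, and nonce.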

Complete code

Follow K brother crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/

The following only demonstrates part of the key code and cannot be run directly! Complete code repository: https://github.com/kgepachong/crawler/

Key JS encryption code structure

navigator = {
    appName: "Netscape",
    appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

function bt(a) {}

function bs(a) {}

function br(a, b) {}

// N functions omitted here

bl.prototype.nextBytes = bk;
doPublic = bs;
bq.prototype.setPublic = br;
bq.prototype.encrypt = bt;
this.RSAKey = bq


function getEncryptedPassword(me, b) {
    br(me.pubkey, "10001");
    b = bt([me.servertime, me.nonce].join("\t") + "\n" + b);
    return b
}

// Test example
// var me = {
//     "retcode": 0,
//     "servertime": 1627283238,
//     "pcid": "gz-a9243276722ed6d4671f21310e2665c92ba4",
//     "nonce": "N0Y3SZ",
//     "pubkey": "EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245A87AC253062882729293E5506350508E7F9AA3BB77F4333231490F915F6D63C55FE2F08A49B353F444AD3993CACC02DB784ABBB8E42A9B1BBFFFB38BE18D78E87A0E41B9B8F73A928EE0CCEE1F6739884B9777E4FE9E88A1BBE495927AC4A799B3181D6442443",
//     "rsakv": "1330428213",
//     "exectime": 13
// }
// var b = '12312312312'  // password
// console.log(getEncryptedPassword(me, b))

Key Python login code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-


import re
import json
import time
import base64
import binascii

import rsa
import execjs
import requests
from lxml import etree


# Marker used to decide whether certain requests succeeded
response_success_str = 'succ'

pre_login_url = 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler'
get_token_url = 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler'
protection_url = 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler'
send_code_url = 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler'
confirm_url = 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler'

headers = {
    'Host': 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler',
    'Referer': 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
session = requests.session()


def get_pre_parameter(username: str) -> dict:
    su = base64.b64encode(username.encode())
    time_now = str(int(time.time() * 1000))
    params = {
        'entry': 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler',
        'callback': 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler',
        'su': su,
        'rsakt': 'mod',
        'checkpin': 1,
        'client': 'ssologin.js(v1.4.19)',
        '_': time_now,
    }
    response = session.get(url=pre_login_url, params=params, headers=headers).text
    parameter_dict = json.loads(re.findall(r'\((.*)\)', response)[0])
    # print('1.【pre parameter】: %s' % parameter_dict)
    return parameter_dict


def get_encrypted_password(pre_parameter: dict, password: str) -> str:
    # Option 1: get the encrypted password through JS
    # with open('encrypt.js', 'r', encoding='utf-8') as f:
    #     js = f.read()
    # encrypted_password = execjs.compile(js).call('getEncryptedPassword', pre_parameter, password)
    # # print('2.【encrypted password】: %s' % encrypted_password)
    # return encrypted_password

    # Option 2: get the encrypted password with Python's rsa and binascii modules
    public_key = rsa.PublicKey(int(pre_parameter['pubkey'], 16), int('10001', 16))
    text = '%s\t%s\n%s' % (pre_parameter['servertime'], pre_parameter['nonce'], password)
    encrypted_str = rsa.encrypt(text.encode(), public_key)
    encrypted_password = binascii.b2a_hex(encrypted_str).decode()
    # print('2.【encrypted password】: %s' % encrypted_password)
    return encrypted_password


def get_token(encrypted_password: str, pre_parameter: dict, username: str) -> str:
    su = base64.b64encode(username.encode())
    data = {
        'entry': 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler',
        'gateway': 1,
        'from': '',
        'savestate': 7,
        'qrcode_flag': False,
        'useticket': 1,
        'pagerefer': '',
        'vsnf': 1,
        'su': su,
        'service': 'miniblog',
        'servertime': pre_parameter['servertime'],
        'nonce': pre_parameter['nonce'],
        'pwencode': 'rsa2',
        'rsakv': pre_parameter['rsakv'],
        'sp': encrypted_password,
        'sr': '1920*1080',
        'encoding': 'UTF-8',
        'prelt': 38,
        'url': 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler',
        'returntype': 'META'
    }
    response = session.post(url=get_token_url, headers=headers, data=data)
    # response.encoding = 'gbk'
    ajax_login_url = re.findall(r'replace\("(.*)"\)', response.text)[0]
    token = ajax_login_url.split('token%3D')[-1]
    if 'weibo' not in token:
        # print('3.【token】: %s' % token)
        return token
    else:
        raise Exception('Login failed! Wrong username or password!')


def get_encrypted_mobile(token: str) -> str:
    params = {
        'token': token,
        'callback_url': 'Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler'
    }
    response = session.get(url=protection_url, params=params, headers=headers)
    tree = etree.HTML(response.text)
    encrypted_mobile = tree.xpath("//input[@name='encrypt_mobile']/@value")[0]
    # print('4.【encrypted mobile】: %s' % encrypted_mobile)
    return encrypted_mobile


def send_code(token: str, encrypt_mobile: str) -> str:
    params = {'token': token}
    data = {'encrypt_mobile': encrypt_mobile}
    response = session.post(url=send_code_url, params=params, data=data, headers=headers).json()
    if response['msg'] == response_success_str:
        code = input('Enter the verification code: ')
        # print('5.【code】: %s' % code)
        return code
    else:
        # print('5.【failed to send verification code】: %s' % response)
        raise Exception('Failed to send verification code: %s' % response)


def confirm_code(encrypted_mobile: str, code: str, token: str) -> str:
    params = {'token': token}
    data = {
        'encrypt_mobile': encrypted_mobile,
        'code': code
    }
    response = session.post(url=confirm_url, params=params, data=data, headers=headers).json()
    if response['msg'] == response_success_str:
        redirect_url = response['data']['redirect_url']
        # print('6.【redirect url】: %s' % redirect_url)
        return redirect_url
    else:
        # print('6.【verification code check failed】: %s' % response)
        raise Exception('Verification code check failed: %s' % response)


def get_cross_domain2_url(redirect_url: str) -> str:
    response = session.get(url=redirect_url, headers=headers).text
    cross_domain2_url = re.findall(r'replace\("(.*)"\)', response)[0]
    # print('7.【cross domain2 url】: %s' % cross_domain2_url)
    return cross_domain2_url


def get_passport_url(cross_domain2_url: str) -> str:
    response = session.get(url=cross_domain2_url, headers=headers).text
    passport_url_str = re.findall(r'setCrossDomainUrlList\((.*)\)', response)[0]
    passport_url = json.loads(passport_url_str)['arrURL'][0]
    # print('8.【passport url】: %s' % passport_url)
    return passport_url


def login(passport_url: str) -> None:
    response = session.get(url=passport_url, headers=headers).text
    login_result = json.loads(response.replace('(', '').replace(');', ''))
    if login_result['result']:
        user_unique_id = login_result['userinfo']['uniqueid']
        user_display_name = login_result['userinfo']['displayname']
        print('Login succeeded! User ID: %s, username: %s' % (user_unique_id, user_display_name))
    else:
        raise Exception('Login failed: %s' % login_result)


def main():
    username = input('Enter the login account: ')
    password = input('Enter the login password: ')

    # 1. Pre-login: get a dict containing servertime, nonce, pubkey, and rsakv for later use
    pre_parameter = get_pre_parameter(username)

    # 2. Get the encrypted password through JS or Python
    encrypted_password = get_encrypted_password(pre_parameter, password)

    # 3. Get the token
    token = get_token(encrypted_password, pre_parameter, username)

    # 4. Get the encrypted phone number through the protection URL
    encrypted_mobile = get_encrypted_mobile(token)

    # 5. Send the SMS verification code
    code = send_code(token, encrypted_mobile)

    # 6. Verify the code; on success a redirect URL is returned
    redirect_url = confirm_code(encrypted_mobile, code, token)

    # 7. Visit the redirect URL and extract the crossdomain2 URL
    cross_domain2_url = get_cross_domain2_url(redirect_url)

    # 8. Visit the crossdomain2 URL and extract the passport URL
    passport_url = get_passport_url(cross_domain2_url)

    # 9. Visit the passport URL to perform the login
    login(passport_url)


if __name__ == '__main__':
    main()

