Disclaimer

All content in this article is for learning and discussion only. Captured content, sensitive URLs, and data interfaces have been desensitized. Commercial or illegal use is strictly prohibited; the author bears no responsibility for any consequences of such use. If anything here infringes your rights, please contact me and I will delete it immediately.

Reverse-engineering target

  • Target : Xiaomi account login
  • Homepage : aHR0cHM6Ly9hY2NvdW50LnhpYW9taS5jb20v
  • Interface : aHR0cHM6Ly9hY2NvdW50LnhpYW9taS5jb20vcGFzcy9zZXJ2aWNlTG9naW5BdXRoMg==
  • Parameter to reverse : Form Data: hash: FCEA920F7412B5DA7BE0CF42B8C93759

Reverse-engineering process

Packet capture analysis

Open the Xiaomi login page, enter any account and password to log in, capture the traffic, and locate the login interface: aHR0cHM6Ly9hY2NvdW50LnhpYW9taS5jb20vcGFzcy9zZXJ2aWNlTG9naW5BdXRoMg==

01.png

It is a POST request with many parameters in Form Data. The main ones are:

  • serviceParam : {"checkSafePhone":false,"checkSafeAddress":false,"lsrp_score":0.0} , judging from the names, this appears to check whether the phone and address are safe. Its exact meaning and where it is set are not yet known.
  • callback : http://order.xxx.com/login/callback?followup=https%3A%2F%2Fwww.xx...... , the callback link, which is generally fixed and carries the followup and sid parameters.
  • qs : %3Fcallback%3Dhttp%253A%252F%252Forder.xxx.com%252Flogin%252Fcallback%2...... , after decoding the value of qs, you can see that it is the four values callback, sign, sid, and _qrsize combined and URL-encoded.
  • _sign : w1RBM6cG8q2xj5JzBPPa65QKs9w= , this string looks like the output of some kind of encryption, or it may be a value taken from the page source.
  • user : 15555555555 , the user name in plain text.
  • hash : FCEA920F7412B5DA7BE0CF42B8C93759 , the encrypted password.
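
The structure of qs can be checked by URL-decoding it. The sketch below does not reconstruct the truncated value above; it uses made-up placeholders (order.example.com, sign-value, sid-value, and the _qrsize of 180 are all assumptions) to show how a query string that is encoded a second time produces the %3F...%253A pattern, and how it decodes back into the four fields:

```python
import urllib.parse

# Hypothetical inner parameters; the real values are truncated in the capture.
inner = {
    'callback': 'http://order.example.com/login/callback',
    'sign': 'sign-value',
    'sid': 'sid-value',
    '_qrsize': '180',
}

# Build "?callback=...&sign=..." and encode it once more,
# which is the shape qs has in Form Data.
inner_query = '?' + urllib.parse.urlencode(inner)
qs = urllib.parse.quote(inner_query, safe='')

# Reverse direction: one unquote restores the query string,
# then parse_qs splits it into individual fields.
decoded = urllib.parse.unquote(qs)
fields = urllib.parse.parse_qs(decoded.lstrip('?'))

print(qs[:50])
print(sorted(fields))
```

The doubly encoded %253A in qs is simply a %3A (an encoded colon) whose percent sign was itself encoded as %25.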

Reversing the parameters

Basic parameters

Start with serviceParam. The usual first step is a global search to see whether the value appears anywhere. The search shows that the serviceParam keyword appears in a 302 redirect request:

02.png

Note that requesting just the login homepage aHR0cHM6Ly9hY2NvdW50LnhpYW9taS5jb20v triggers two consecutive 302 redirects. Let's analyze these two redirects.

In the first redirect, the new URL carries followup , callback , sign , and sid , which are needed in the later login request.

03.png

04.png

In the second redirect, the new URL again carries followup , callback , sign , and sid , plus the parameters serviceParam and qs , which the later login request also requires.

05.png

06.png

Having found the source of the parameters, we extract them directly from the second redirect link. response.history[1].headers['Location'] reads the target address from the response header of the second redirect, urllib.parse.urlparse parses the structure of that URL, and urllib.parse.parse_qs extracts the query parameters and returns a dictionary. Code sample:

import requests
import urllib.parse


headers = {
    'Host': 'Desensitized; for the complete code see GitHub: https://github.com/kgepachong/crawler',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
index_url = 'Desensitized; for the complete code see GitHub: https://github.com/kgepachong/crawler'
# requests follows the 302 redirects and records each hop in response.history
response = requests.get(url=index_url, headers=headers)
# Location header returned by the second redirect
location_url = response.history[1].headers['Location']
parsed_url = urllib.parse.urlparse(location_url)
# parse_qs maps each parameter name to a list of values
query_dict = urllib.parse.parse_qs(parsed_url.query)
print(query_dict)

need_theme = query_dict['needTheme'][0]
show_active_x = query_dict['showActiveX'][0]
service_param = query_dict['serviceParam'][0]
callback = query_dict['callback'][0]
qs = query_dict['qs'][0]
sid = query_dict['sid'][0]
_sign = query_dict['_sign'][0]

print(need_theme, show_active_x, service_param, callback, qs, sid, _sign)
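
Indexing response.history[1] assumes the site always issues exactly two redirects. A slightly more defensive sketch is to scan every Location header for the one whose query string carries serviceParam. The helper name find_login_location and the sample URLs below are hypothetical and only illustrate the idea:

```python
import urllib.parse


def find_login_location(locations, required='serviceParam'):
    """Return the query parameters of the first redirect URL containing `required`."""
    for location in locations:
        query = urllib.parse.urlparse(location).query
        params = urllib.parse.parse_qs(query)
        if required in params:
            # parse_qs maps each name to a list; keep the first value of each
            return {name: values[0] for name, values in params.items()}
    raise LookupError(f'no redirect carries {required!r}')


# Made-up Location values standing in for the two 302 hops.
sample_locations = [
    'https://account.example.com/pass/serviceLogin?sid=demo&_sign=abc',
    'https://account.example.com/fe/service/login?sid=demo&_sign=abc'
    '&serviceParam=%7B%22checkSafePhone%22%3Afalse%7D',
]
params = find_login_location(sample_locations)
print(params['serviceParam'])
```

With requests, the list of locations could be built as [r.headers['Location'] for r in response.history].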

hash

With every other parameter in place, only the encrypted password hash remains. A value like this is usually produced by JS, and the usual approach is to search for hash or hash: . The file 78.4da22c55.chunk.js contains the statement hash: S()(r.password).toUpperCase() , which clearly encrypts the plaintext password and then converts the result to uppercase:

07.png

The key is S(). Hovering over it shows that it actually calls an anonymous function in 78.4da22c55.chunk.js. Set a breakpoint at the return statement of that anonymous function and debug:

08.png

e.exports = function(e, n) {
    if (void 0 === e || null === e)
        throw new Error("Illegal argument " + e);
    var r = t.wordsToBytes(u(e, n));
    return n && n.asBytes ? r : n && n.asString ? s.bytesToString(r) : t.bytesToHex(r)
}

The argument e is the plaintext password, and the final return statement is a nested ternary expression. Because n is undefined, it actually returns t.bytesToHex(r) , whose value is the encrypted password with all letters in lowercase. The usual next step would be to extract the JS. The value r comes from var r = t.wordsToBytes(u(e, n)); so first follow the u function:
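
The branch selection in that return statement can be modelled in a few lines. This is only a sketch of the ternary logic: format_result is a hypothetical name, r is taken as already computed bytes, and the behaviour assumed for s.bytesToString (one character per byte) is a guess, since its source is not shown in the article:

```python
def format_result(r, options=None):
    """Mimic: n && n.asBytes ? r : n && n.asString ? bytesToString(r) : bytesToHex(r)."""
    if options and options.get('asBytes'):
        return list(r)                 # raw byte values
    if options and options.get('asString'):
        return r.decode('latin-1')     # assumption: one character per byte
    return r.hex()                     # default branch, taken when n is undefined


raw = bytes([0x0f, 0xa0])
print(format_result(raw))
print(format_result(raw, {'asBytes': True}))
```

Because the login code calls the function with a single argument, only the last branch matters here.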

09.png

10.png

The u function actually comes from the object method 567, and that method in turn uses many others such as 129, 211, and 22. Extracting all of them is error-prone, and with so much code it is hard to locate a mistake. So let's change approach: first see what t.bytesToHex(r) does by stepping into that function:

11.png

bytesToHex: function(e) {
    for (var t = [], n = 0; n < e.length; n++)
        t.push((e[n] >>> 4).toString(16)),
        t.push((15 & e[n]).toString(16));
    return t.join("")
}

Reading this code: the incoming e is an array of 16 elements, and t starts as an empty array. The loop takes each value in turn. It first applies an unsigned right shift by 4 (>>>), which keeps the high 4 bits, converts the result to a hexadecimal string, and appends it to t. It then applies a bitwise AND with 15 (&), which keeps the low 4 bits, converts that to a hexadecimal string as well, and appends it to t. Each of the 16 values therefore yields two hexadecimal digits, so t ends up with 32 values, which are finally joined into one string and returned.
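
To make the two bit operations concrete, here is a direct Python port of the loop as a sketch. The function name bytes_to_hex and the sample input are illustrative; for values in the range 0 to 255, Python's >> behaves the same as the unsigned >>> in the JS:

```python
def bytes_to_hex(data):
    """Python port of the bytesToHex loop shown above."""
    digits = []
    for value in data:
        digits.append(format(value >> 4, 'x'))   # high 4 bits, like e[n] >>> 4
        digits.append(format(value & 15, 'x'))   # low 4 bits, like 15 & e[n]
    return ''.join(digits)


# 16 sample bytes: 0x00, 0x11, 0x22, ... 0xff
sample = bytes(range(0, 256, 17))
result = bytes_to_hex(sample)
print(result)
print(len(sample), len(result))
```

The result is the same string that Python's built-in bytes.hex() returns, which is why the JS helper does not need to be ported in practice.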

Judging from the function names, the whole process is as follows. First, wordsToBytes() converts the result of u(e, n), which is computed from the plaintext password, into a byte array. Whatever the password length, this array has 16 elements. Then bytesToHex() loops over the byte array and produces a 32-character string.

Whatever the password length, the final ciphertext is 32 characters long and consists of letters and digits. These characteristics immediately suggest MD5: the plaintext is converted to a byte array, the bytes are hashed into a 16-byte digest, and looping over the digest bytes yields a fixed-length string. Isn't this exactly the MD5 process?

Hashing the password directly with MD5 and comparing the result with the website's value shows that they are identical, which confirms the guess:

12.png

In that case, Python's hashlib module is enough, and there is no need to extract any JS code at all. Code sample:

import hashlib

password = "1234567"
encrypted_password = hashlib.md5(password.encode(encoding='utf-8')).hexdigest().upper()
print(encrypted_password)
# FCEA920F7412B5DA7BE0CF42B8C93759

Summary

Sometimes it pays to change approach: extracting the JS code is not always necessary. Relatively easy sites use only a handful of encryption methods. Some are slightly rewritten, some hide parameters such as the key and the offset, and some obscure the encryption and decryption process to make it hard to follow. If you are familiar with the common encryption methods and their principles, it is often enough to work out which method is in use, or to obtain key parameters such as the key and the offset, and you can then reproduce the entire encryption process yourself.

Complete code

Follow K哥爬虫 on GitHub for more crawler-related code. Stars are welcome!

https://github.com/kgepachong/

The following shows only part of the key code. The complete code repository is at:

https://github.com/kgepachong/crawler/

Key Python login code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-


import json
import hashlib
import urllib.parse

import requests


index_url = 'Desensitized; for the complete code see GitHub: https://github.com/kgepachong/crawler'
login_url = 'Desensitized; for the complete code see GitHub: https://github.com/kgepachong/crawler'
headers = {
    'Host': 'Desensitized; for the complete code see GitHub: https://github.com/kgepachong/crawler',
    'Origin': 'Desensitized; for the complete code see GitHub: https://github.com/kgepachong/crawler',
    'Referer': 'Desensitized; for the complete code see GitHub: https://github.com/kgepachong/crawler',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
session = requests.session()


def get_encrypted_password(password):
    encrypted_password = hashlib.md5(password.encode(encoding='utf-8')).hexdigest().upper()
    return encrypted_password


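def get_parameter():
    # Use the shared session so cookies set during the redirects are kept for the login request
    response = session.get(url=index_url, headers=headers)
    # Location header returned by the second redirect
    location_url = response.history[1].headers['Location']
    parsed_url = urllib.parse.urlparse(location_url)
    query_dict = urllib.parse.parse_qs(parsed_url.query)
    # print(query_dict)
    return query_dict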


def login(username, encrypted_password, query_dict):
    data = {
        'bizDeviceType': '',
        'needTheme': query_dict['needTheme'][0],
        'theme': '',
        'showActiveX': query_dict['showActiveX'][0],
        'serviceParam': query_dict['serviceParam'][0],
        'callback': query_dict['callback'][0],
        'qs': query_dict['qs'][0],
        'sid': query_dict['sid'][0],
        '_sign': query_dict['_sign'][0],
        'user': username,
        'cc': '+86',
        'hash': encrypted_password,
        '_json': True
    }
    response = session.post(url=login_url, data=data, headers=headers)
    response_json = json.loads(response.text.replace('&&&START&&&', ''))
    print(response_json)
    return response_json


def main():
    username = input('Enter the login account: ')
    password = input('Enter the login password: ')
    encrypted_password = get_encrypted_password(password)
    parameter = get_parameter()
    login(username, encrypted_password, parameter)


if __name__ == '__main__':
    main()
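
The login function above removes the &&&START&&& marker with str.replace, which would also alter the marker if it ever appeared inside a value. A minimal alternative, assuming the marker only ever occurs at the start of the body, strips it as a prefix. The name parse_login_response and the sample fields code and desc are hypothetical, since the real response is not shown in the article:

```python
import json

PREFIX = '&&&START&&&'


def parse_login_response(text):
    """Strip the leading marker, if present, and decode the JSON body."""
    body = text[len(PREFIX):] if text.startswith(PREFIX) else text
    return json.loads(body)


# Hypothetical response text used only to exercise the helper.
sample_text = '&&&START&&&{"code": 0, "desc": "ok"}'
result = parse_login_response(sample_text)
print(result['code'])
```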


K哥爬虫

Research and sharing on Python web crawlers, JS reverse engineering, and related techniques.