Follow the WeChat official account "Brother K Crawler" for ongoing posts on advanced crawling, JS/Android reverse engineering, and other technical content!
Disclaimer
Everything in this article is for learning and discussion only. Captured content, sensitive URLs, and data interfaces have been desensitized. Commercial or illegal use is strictly forbidden, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, please contact me and it will be deleted immediately!
Reverse engineering target
- Target: Wisdom Tree QR code login, whose interface communicates over the WebSocket protocol
- Homepage:
aHR0cHM6Ly9wYXNzcG9ydC56aGlodWlzaHUuY29tL2xvZ2luI3FyQ29kZUxvZ2lu
Introduction to WebSocket
WebSocket is a protocol for full-duplex communication over a single TCP connection. It simplifies data exchange between client and server: with the WebSocket API, the browser and the server complete a single handshake, after which a persistent connection exists between them and data can flow in both directions.
The protocol is abbreviated WS, or WSS (WebSocket Secure) for the encrypted variant. Request URLs start with ws:// or wss://. WSS relates to WS the way HTTPS relates to HTTP.
The defining feature of WebSocket is that the server can push information to the client on its own initiative, and the client can likewise send information to the server at any time. It is a true two-way dialogue between equals and one form of server push technology. The comparison with HTTP is shown in the figure below:
Packet capture analysis
Open the Wisdom Tree QR code login page and select the WS filter in the packet capture view so that only WebSocket requests are shown, as in the figure below:
The request carries some special headers that ordinary HTTP/HTTPS requests do not have:
- Upgrade: websocket: indicates that this is a WebSocket request;
- Sec-WebSocket-Version: tells the server which WebSocket draft (protocol version) is in use; it must be 13;
- Sec-WebSocket-Extensions: protocol extensions; a protocol may support several extensions, which are used to enhance it;
- Sec-WebSocket-Key: a base64-encoded value sent by the WebSocket client and randomly generated by the browser. The server must return a matching Sec-WebSocket-Accept header derived from it; otherwise the client throws an "Error during WebSocket handshake" error and closes the connection.
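To make the relationship between Sec-WebSocket-Key and Sec-WebSocket-Accept concrete, here is a minimal sketch of the derivation as defined in RFC 6455: the server appends a fixed GUID to the key, takes the SHA-1 digest, and base64-encodes it. The function name compute_accept is made up for this illustration, and the sample key is the one used in the RFC, not a value captured from the target site.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value matching a Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from RFC 6455 (hypothetical as far as this site is concerned)
sample_key = "dGhlIHNhbXBsZSBub25jZQ=="
print(compute_accept(sample_key))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A WebSocket library performs this check automatically, so the sketch is only useful for understanding what shows up in the capture.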
Scan the code and log in again, then select the Messages tab to see the data being exchanged. Green arrows mark data sent by the client to the server, and red arrows mark data returned by the server, as shown in the following figure:
Let's walk through the whole interaction. The WebSocket connection is established when the QR code page opens, that is, when the QR code loads. About every 8 seconds the client sends a string to the server, and the server returns the same string, only wrapped in dictionary format. When the code is scanned, the server returns a scan-succeeded message. When we tap to confirm the login, the scan result is returned again; on success it includes a one-time password oncePassword and a uuid, both of which will presumably be needed in subsequent requests. If the code is not scanned for a long time, the server eventually reports that the QR code has expired. The message sent every 8 seconds serves only to keep the connection alive and fetch the status of the QR code.
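The message flow described above can be sketched as a small parser that ignores heartbeat echoes and only returns a result once the one-time password arrives. This is a sketch under assumptions: the keys msg, oncePassword, and uuid and the status text "扫码成功" ("scan succeeded") are the ones used by the code later in this article, while the function name extract_credentials and both sample payloads are invented for illustration.

```python
import json
from typing import Optional, Tuple

SCAN_SUCCESS = "扫码成功"  # "scan succeeded", the marker the article's code checks for


def extract_credentials(raw: str) -> Optional[Tuple[str, str]]:
    """Return (oncePassword, uuid) if the message carries them, else None."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all
    if not isinstance(payload, dict):
        return None
    if SCAN_SUCCESS not in str(payload.get("msg", "")):
        return None  # heartbeat echo, expiry notice, or some other status
    once_password = payload.get("oncePassword")
    uuid = payload.get("uuid")
    if once_password is None or uuid is None:
        return None  # success text without credentials yet
    return once_password, uuid


# Hypothetical payloads, made up for illustration only
heartbeat = json.dumps({"msg": "waiting"})
confirmed = json.dumps({"msg": "扫码成功", "oncePassword": "abc123", "uuid": "u-1"})
print(extract_credentials(heartbeat))
print(extract_credentials(confirmed))
```

Using .get() instead of direct indexing keeps the handler from raising KeyError on messages that lack one of the fields.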
That leaves three questions:
- Where does the string that is sent back and forth come from?
- How do we implement WebSocket requests in Python?
- How can the client send data every 8 seconds while also receiving messages from the server in real time? (The scan result is returned immediately, so reading only once every 8 seconds is not an option.)
Parameter acquisition
Start with the first question: where does the string sent by the client come from? Locating it works the same way as locating parameters in HTTP/HTTPS requests. In this case a direct search for the string shows that it is returned by an interface, where img is the base64 value of the QR code image and qrToken is the string the client sends, as shown in the following figure:
Note that not every WebSocket request is this simple. Some clients send Binary Messages (binary data) or more heavily encrypted parameters that a direct search cannot find. For those cases there are other ways to locate the code:
- The statement that creates a WebSocket object is var Socket = new WebSocket(url, [protocol]); so we can search for new WebSocket to locate where the request is made.
- A WebSocket object has the following events, so we can search for the corresponding event handler code:
Event | Event handler | Description |
---|---|---|
open | Socket.onopen | Triggered when the connection is established |
message | Socket.onmessage | Triggered when the client receives data from the server |
error | Socket.onerror | Triggered when a communication error occurs |
close | Socket.onclose | Triggered when the connection is closed |
- A WebSocket object has the following methods, so we can search for the corresponding method calls:
Method | Description |
---|---|
Socket.send() | Send data over the connection |
Socket.close() | Close the connection |
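On the Python side, the text/binary distinction mentioned above matters once messages are being received. The helper below is a rough sketch, assuming that the WebSocket library hands text frames to the callback as str and binary frames as bytes (which is my understanding of how websocket-client behaves; check this against the version in use). The name normalize_message is hypothetical.

```python
import json
from typing import Any, Union


def normalize_message(message: Union[str, bytes]) -> Any:
    """Turn an incoming frame payload into something easier to inspect."""
    if isinstance(message, bytes):
        try:
            message = message.decode("utf-8")
        except UnicodeDecodeError:
            # Genuinely binary data: return a hex dump for further analysis
            return message.hex()
    try:
        return json.loads(message)
    except json.JSONDecodeError:
        return message  # plain text that is not JSON


print(normalize_message('{"msg": "ok"}'))
print(normalize_message(b"\xff\xfe"))
```

Payloads that come back as a hex dump are the ones that usually require locating the serialization or encryption code through the search techniques listed above.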
Implementing WebSocket requests in Python
Now for the second question: how do we implement WebSocket requests in Python? There are many Python libraries for connecting to WebSocket servers; the more commonly used and stable ones are websocket-client (non-asynchronous), websockets (asynchronous), and aiowebsocket (asynchronous). This case uses websocket-client. That brings us to the third question: the client has to send data every 8 seconds, yet we also need to receive messages from the server in real time, since the capture shows that the scan result comes back immediately. If we only read data once every 8 seconds, messages could be lost, and the program would respond late and inefficiently.
The official websocket-client documentation provides a long-connection demo that sends data three times in a row while listening for data returned by the server in real time. In it, websocket.enableTrace(True) controls whether connection details are displayed:
    import websocket
    import _thread
    import time

    def on_message(ws, message):
        print(message)

    def on_error(ws, error):
        print(error)

    def on_close(ws, close_status_code, close_msg):
        print("### closed ###")

    def on_open(ws):
        def run(*args):
            for i in range(3):
                time.sleep(1)
                ws.send("Hello %d" % i)
            time.sleep(1)
            ws.close()
            print("thread terminating...")
        _thread.start_new_thread(run, ())

    if __name__ == "__main__":
        websocket.enableTrace(True)
        ws = websocket.WebSocketApp(
            "ws://echo.websocket.org/", on_open=on_open,
            on_message=on_message, on_error=on_error, on_close=on_close
        )
        ws.run_forever()
Let's adapt it. In the run method the client still sends qr_token every 8 seconds, while messages from the server are received in real time. When the text "扫码成功" (scan succeeded) appears in a message, oncePassword and uuid are saved and the connection is closed. The logic is shown below; later it only needs to be hooked up to the QR code acquisition logic. (The code has been desensitized and cannot be run directly.)
    import json
    import time
    import _thread
    import websocket

    # The host is redacted ("脱敏处理" means "desensitized"), so this URL is a placeholder
    web_socket_url = "wss://appcomm-user.脱敏处理.com/app-commserv-user/websocket?qrToken=%s"
    qr_token = "ca6e6cfb70de4f2f915b968aefcad404"
    once_password = ""
    uuid = ""

    def wss_on_message(ws, message):
        global once_password, uuid
        print("=============== [message] ===============")
        message = json.loads(message)
        print(message)
        # "扫码成功" is the server's "scan succeeded" status text
        if "扫码成功" in message["msg"]:
            once_password = message["oncePassword"]
            uuid = message["uuid"]
            ws.close()

    def wss_on_error(ws, error):
        print("=============== [error] ===============")
        print(error)
        ws.close()

    def wss_on_close(ws, close_status_code, close_msg):
        print("=============== [closed] ===============")
        print(close_status_code)
        print(close_msg)

    def wss_on_open(ws):
        def run(*args):
            # Heartbeat: resend qr_token every 8 seconds until the connection closes
            while True:
                try:
                    ws.send(qr_token)
                except websocket.WebSocketConnectionClosedException:
                    break
                time.sleep(8)
        _thread.start_new_thread(run, (qr_token,))

    def wss():
        # websocket.enableTrace(True)  # whether to display connection details
        ws = websocket.WebSocketApp(
            web_socket_url % qr_token, on_open=wss_on_open,
            on_message=wss_on_message, on_error=wss_on_error,
            on_close=wss_on_close
        )
        ws.run_forever()
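One possible refinement of the send-every-8-seconds-while-listening pattern is to drive the heartbeat with threading.Event, so the sender thread can be stopped promptly instead of sleeping out its full interval. This is a sketch under assumptions, not the author's implementation: the Heartbeat class is hypothetical, and a plain list stands in for ws.send so the example runs without any network connection.

```python
import threading


class Heartbeat:
    """Call send(payload) every `interval` seconds until stop() is called."""

    def __init__(self, send, payload, interval=8.0):
        self._send = send
        self._payload = payload
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            try:
                self._send(self._payload)
            except Exception:
                break  # the connection is gone, so stop quietly
            # wait() returns early as soon as stop() sets the event
            self._stop.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join(timeout=1.0)


# Stand-in for ws.send so the sketch runs offline
sent = []
beat = Heartbeat(sent.append, "demo-token", interval=0.05)
beat.start()
threading.Event().wait(0.3)  # let a few heartbeats go out
beat.stop()
print(len(sent) >= 2)
```

With websocket-client, one would presumably create Heartbeat(ws.send, qr_token) in the on_open callback and call stop() from on_close, while run_forever() keeps delivering server messages on its own thread.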
Implementing QR code login
The most important part, the WebSocket request, is now solved. Once we have oncePassword and uuid from the scan, the remaining steps are relatively simple. In summary:
- Request the homepage to get the first batch of cookies: INGRESSCOOKIE, JSESSIONID, SERVERID, and acw_tc;
- Request the QR code interface to get the base64 value of the QR code and qrToken;
- Establish the WebSocket connection, scan the QR code, and get the one-time password oncePassword and uuid (uuid appears to be unused);
- Request the login interface with the one-time password; it responds with a 302 redirect and the second batch of cookies: CASLOGC and CASTGC, plus an updated SERVERID;
- Request the 302 redirect address returned by the previous step to get the third batch of cookies: SESSION;
- Carry the complete cookie set, request the user information interface, and obtain the real user name and other information.
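The way cookies accumulate across these requests can be sketched without any network access. The example below parses Set-Cookie header values with the standard library and merges them into one dictionary, so that later responses overwrite earlier values of the same name (as happens with SERVERID). The helper name merge_set_cookie and every cookie value are invented for illustration; only the cookie names come from the steps above.

```python
from http.cookies import SimpleCookie


def merge_set_cookie(jar: dict, set_cookie_headers: list) -> dict:
    """Merge the name/value pairs of Set-Cookie header values into `jar`."""
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            jar[name] = morsel.value  # newer values replace older ones
    return jar


jar = {}
# Homepage request: first batch (hypothetical values)
merge_set_cookie(jar, ["JSESSIONID=aaa; Path=/", "SERVERID=s1; Path=/"])
# Login request answered with a 302: second batch, SERVERID is updated
merge_set_cookie(jar, ["CASTGC=tgt-1; Path=/; HttpOnly", "SERVERID=s2; Path=/"])
# Redirect target: third batch
merge_set_cookie(jar, ["SESSION=zzz; Path=/; HttpOnly"])
print(jar)
```

In practice a requests.Session object keeps a cookie jar like this automatically; the explicit dictionary just makes each stage visible.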
In fact, many requests follow the end of the WebSocket connection and they all look relevant, but in Brother K's testing only the two redirects actually matter. The packet capture is as follows:
Complete code
Follow Brother K Crawler on GitHub for more crawler-related code. Stars are welcome! https://github.com/kgepachong/
Only part of the key code is shown below, and it cannot be run directly. Full code repository: https://github.com/kgepachong/crawler/
Python login code
    import time
    import json
    import base64
    import _thread
    import requests
    import websocket
    from PIL import Image

    # All URLs and hosts below are redacted placeholders
    web_socket_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    get_login_qr_img_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    login_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    user_info_url = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
    headers = {
        "Host": "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler",
        "Pragma": "no-cache",
        "Referer": "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
    }
    qr_token = ""
    once_password = ""
    uuid = ""
    cookie = {}

    def get_cookies_first():
        global cookie
        response = requests.get(url=login_url, headers=headers)
        cookie = response.cookies.get_dict()

    def get_login_qr_img():
        global qr_token
        response = requests.get(url=get_login_qr_img_url, headers=headers, cookies=cookie).json()
        qr_img = response["img"]
        qr_token = response["qrToken"]
        # Save the base64-encoded QR code to disk and open it for scanning
        with open('code.png', 'wb') as f:
            f.write(base64.b64decode(qr_img))
        image = Image.open('code.png')
        image.show()
        print("Please scan the QR code!")

    def wss_on_message(ws, message):
        global once_password, uuid
        print("=============== [message] ===============")
        message = json.loads(message)
        print(message)
        # "扫码成功" is the server's "scan succeeded" status text
        if "扫码成功" in message["msg"]:
            once_password = message["oncePassword"]
            uuid = message["uuid"]
            ws.close()

    def wss_on_error(ws, error):
        print("=============== [error] ===============")
        print(error)
        ws.close()

    def wss_on_close(ws, close_status_code, close_msg):
        print("=============== [closed] ===============")
        print(close_status_code)
        print(close_msg)

    def wss_on_open(ws):
        def run(*args):
            # Heartbeat: resend qr_token every 8 seconds until the connection closes
            while True:
                try:
                    ws.send(qr_token)
                except websocket.WebSocketConnectionClosedException:
                    break
                time.sleep(8)
        _thread.start_new_thread(run, (qr_token,))

    def wss():
        # websocket.enableTrace(True)  # whether to display connection details
        ws = websocket.WebSocketApp(
            web_socket_url % qr_token, on_open=wss_on_open,
            on_message=wss_on_message, on_error=wss_on_error,
            on_close=wss_on_close
        )
        ws.run_forever()

    def get_cookie_second():
        global cookie
        params = {
            "pwd": once_password,
            "service": "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        }
        headers["Host"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        headers["Referer"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        # Do not follow the 302 automatically; we need its cookies and Location header
        response = requests.get(url=login_url, params=params, headers=headers, cookies=cookie, allow_redirects=False)
        cookie.update(response.cookies.get_dict())
        location = response.headers.get("Location")
        return location

    def get_cookie_third(location):
        global cookie
        headers["Host"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        headers["Referer"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        response = requests.get(url=location, headers=headers, cookies=cookie, allow_redirects=False)
        cookie.update(response.cookies.get_dict())
        location = response.headers.get("Location")
        return location

    def get_login_user_info():
        headers["Host"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        headers["Origin"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        headers["Referer"] = "Redacted; see the full code on GitHub: https://github.com/kgepachong/crawler"
        # Millisecond timestamp sent as the "time" query parameter
        params = {"time": str(int(time.time() * 1000))}
        response = requests.get(url=user_info_url, headers=headers, cookies=cookie, params=params)
        print(response.text)

    def main():
        # First cookie fetch: INGRESSCOOKIE, JSESSIONID, SERVERID, acw_tc
        get_cookies_first()
        # Fetch the QR code
        get_login_qr_img()
        # Scan-to-login over WebSocket; yields the one-time password
        wss()
        # Second cookie fetch: updates SERVERID, adds CASLOGC and CASTGC
        location1 = get_cookie_second()
        # Third cookie fetch: adds SESSION
        get_cookie_third(location1)
        # Fetch the logged-in user's information
        get_login_user_info()

    if __name__ == '__main__':
        main()