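The information in this document is intended solely for academic research and learning exchange, and involves no commercial or illegal use. Potentially sensitive information, such as captured traffic, specific URLs, and data interfaces, has been desensitized as needed to ensure information security. Unauthorized reposting, or redistribution after modification, is strictly prohibited.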

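The author accepts no responsibility for any adverse consequences or incidents arising from unauthorized use of the techniques described here. If you believe this article infringes your rights, please contact us through the official account 【小马哥逆向】 and we will handle it promptly. Thank you for your understanding and cooperation.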

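Captchas are a common anti-scraping measure. They come in several forms, such as text captchas, slider captchas, and rotation captchas.
Today we look at a simple text captcha as a learning exercise.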

Site: aHR0cDovL29sZC5lZm5jaGluYS5jb20vaW5kZXgucGhwP209bWVtYmVyJmM9aW5kZXgmYT1sb2dpbiZmb3J3YXJkPWh0dHAlM0ElMkYlMkZvbGQuZWZuY2hpbmEuY29tJTJGJnNpdGVpZD0x
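The address above looks like a Base64 string, a common way of masking URLs in posts of this kind. As a minimal sketch, this is how such a string can be produced and reversed with Python's standard library. The helper names and the example URL are placeholders for illustration, not part of the original article.

```python
import base64


def mask_url(url: str) -> str:
    """Encode a URL as Base64 text."""
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def unmask_url(masked: str) -> str:
    """Decode Base64 text back into the original URL."""
    return base64.b64decode(masked).decode("utf-8")


# Placeholder URL used only for illustration
masked = mask_url("http://example.com/login?siteid=1")
print(masked)
print(unmask_url(masked))
```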

Captcha type: text captcha

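Open the site and press F12 to open the developer tools and start capturing traffic. Click the captcha several times to capture a few requests.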
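After several clicks, an extra value is appended to the URL each time (the decimal keys in the request parameters), but it has no effect on the response.
Next, simulate the request to download the image.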
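The appended values resemble the output of a random-number call used as a cache-buster, which is an assumption based on the captured parameters. Under that assumption, the query can be rebuilt for each request roughly as follows. `BASE_PARAMS` and `build_captcha_params` are names introduced here for illustration.

```python
import random

# Fixed query parameters copied from the captured request
BASE_PARAMS = {
    "op": "checkcode",
    "code_len": "5",
    "font_size": "14",
    "width": "120",
    "height": "26",
    "font_color": "",
    "background": "",
}


def build_captcha_params() -> dict:
    """Return the fixed parameters plus one random key with an empty value."""
    params = dict(BASE_PARAMS)
    # Assumption: the page appends a random float on each click as a cache-buster
    params[str(random.random())] = ""
    return params


print(build_captcha_params())
```

Since the extra value does not change the response, sending only the fixed parameters should work as well.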

import requests

# Use one session so cookies carry across requests
s = requests.Session()

headers = {
    "Accept": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    "Referer": "http://old.efnchina.com/index.php?m=member&c=index&a=login&forward=http%3A%2F%2Fold.efnchina.com%2F&siteid=1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}
cookies = {
    "PsAvI__onlineid": "2f20VAEBU1JWAAQICQVWUVMEUVUFVVcHBgUCB1EFAA5UVAIHAAJTBlAJDA8JAFFVAwdSV1IIUwEDAgRRAg",
    "PHPSESSID": "pq1imibu1jmh6aomqqpnni5av1"
}
url = "http://old.efnchina.com/api.php"
params = {
    "op": "checkcode",
    "code_len": "5",
    "font_size": "14",
    "width": "120",
    "height": "26",
    "font_color": "",
    "background": "",
    # Keys appended by the page on each click; they do not affect the response
    "0.05253885216200893": "",
    "0.25073488177350733": "",
    "0.35129750737816634": "",
    "0.6442330764602027": ""
}
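response = s.get(url, headers=headers, cookies=cookies, params=params, verify=False)

# The body is binary image data, so inspect the status and save the bytes
# instead of printing response.text
print(response.status_code, response.headers.get("Content-Type"))
with open("captcha.png", "wb") as f:  # file name and extension are assumptions
    f.write(response.content)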
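Before handing the downloaded bytes to a recognizer, it can help to confirm that the response really is an image and not an error page. One way, sketched here with a hypothetical helper `sniff_image_type`, is to check the leading signature bytes of the common image formats.

```python
def sniff_image_type(data: bytes):
    """Return 'png', 'jpeg' or 'gif' based on the leading bytes, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    return None


# An HTML error page is not recognized as an image
print(sniff_image_type(b"<html>error</html>"))
```

With the request from the script above, this would be called as `sniff_image_type(response.content)`.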

Next, add an OCR library (ddddocr) to recognize the image.

import requests
import ddddocr  # third-party OCR library for captchas

s = requests.Session()
ocr = ddddocr.DdddOcr()  # load the default recognition model

headers = {
    "Accept": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    "Referer": "http://old.efnchina.com/index.php?m=member&c=index&a=login&forward=http%3A%2F%2Fold.efnchina.com%2F&siteid=1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
}
cookies = {
    "PsAvI__onlineid": "2f20VAEBU1JWAAQICQVWUVMEUVUFVVcHBgUCB1EFAA5UVAIHAAJTBlAJDA8JAFFVAwdSV1IIUwEDAgRRAg",
    "PHPSESSID": "pq1imibu1jmh6aomqqpnni5av1"
}
url = "http://old.efnchina.com/api.php"
params = {
    "op": "checkcode",
    "code_len": "5",
    "font_size": "14",
    "width": "120",
    "height": "26",
    "font_color": "",
    "background": "",
    # Keys appended by the page on each click; they do not affect the response
    "0.05253885216200893": "",
    "0.25073488177350733": "",
    "0.35129750737816634": "",
    "0.6442330764602027": ""
}
response = s.get(url, headers=headers, cookies=cookies, params=params, verify=False)

# Pass the raw image bytes to ddddocr and print the recognized text
result = ocr.classification(response.content)

print(result)
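OCR output is not always correct, so a light sanity check on the result can be useful. The request asks for `code_len=5`, so a result of any other length is certainly wrong. The sketch below retries with a fresh image when the check fails. The allowed character set (ASCII letters and digits) is an assumption, and `looks_valid` and `recognize_with_retry` are hypothetical helpers, not part of ddddocr.

```python
import string

# Assumption: the captcha only uses ASCII letters and digits
ALLOWED = set(string.ascii_letters + string.digits)


def looks_valid(text: str, expected_len: int = 5) -> bool:
    """Check the length against code_len and the characters against ALLOWED."""
    return len(text) == expected_len and all(ch in ALLOWED for ch in text)


def recognize_with_retry(fetch_image, recognize, attempts: int = 3):
    """Fetch a new captcha and recognize it until the result looks valid.

    fetch_image: callable returning the image bytes of a fresh captcha
    recognize:   callable mapping image bytes to text
    Returns the recognized text, or None if every attempt fails the check.
    """
    for _ in range(attempts):
        text = recognize(fetch_image()).strip()
        if looks_valid(text):
            return text
    return None
```

In the script above, `fetch_image` could be a small function that repeats the `s.get(...)` call and returns `response.content`, and `recognize` could be `ocr.classification`.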

Done.

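Tip
Use requests.Session() to open one shared session for all requests. Captcha endpoints generally tie the generated code to the client through a cookie, and a Session stores and resends cookies automatically, so no extra handling is needed.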
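To illustrate why the shared session matters, here is a rough sketch of fetching the captcha and submitting a form through the same session object, so that whatever cookie the server bound to the captcha is sent back with the form. The function name, the login URL, and the `code` field name are all hypothetical, since the article does not show the submission request.

```python
def submit_with_captcha(session, captcha_url, login_url, recognize, form):
    """Fetch the captcha and post the form through one session.

    session:   an object with get/post methods, e.g. requests.Session()
    recognize: callable mapping image bytes to the captcha text
    form:      dict of the other form fields
    """
    # Cookies set by this response stay in the session
    image = session.get(captcha_url).content

    payload = dict(form)
    payload["code"] = recognize(image)  # hypothetical field name

    # The same session resends those cookies with the form
    return session.post(login_url, data=payload)
```

A call would pass `requests.Session()` as `session` and `ocr.classification` as `recognize`. Using two separate sessions for the two requests would lose the cookie that links the captcha to the submission.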

This article was published to multiple platforms via mdnice.

