import re

import requests

response = requests.get('https://36kr.com/newsflashes')
all_list = re.findall(
    '"title":"(.*?)","catch_title":"","description":".*?","cover":"","news_url_type":"news_url","news_url":"(.*?)","user_id":"344033181","published_at":"(.*?)",',
    response.text, re.S)
print(all_list)
print(len(all_list))

Why doesn't this regex match everything? It only matches 16 items, but there are 20 titles in total. The URL is https://36kr.com/newsflashes

Python crawler: regex does not match all items

2 answers


50Percent


Posted on
2018-10-10

✓ Accepted

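In cases like this it is usually better to grab the JSON first and then pull the data out of it. That is the sensible route, and you also get a lot of extra fields that may turn out to be useful.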

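# coding:utf-8
# Uses the print() function, so it runs on Python 3.

import json
import re

import requests

page = requests.get("https://36kr.com/newsflashes")

# The page data sits in a JS object literal assigned to `props`,
# which is followed by ",locationnal" in the page source.
pattern = re.compile(r"<script>var props=(.+?),locationnal")

# Take the captured object literal and parse it as JSON.
data = json.loads(pattern.search(page.text).group(1).strip())

for index, item in enumerate(data["newsflashList|newsflash"], start=1):
    print(index, item["title"], item["news_url"])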

1 央行等三部门联合发布《互联网金融从业机构反洗钱和反恐怖融资管理办法（试行）》 http://www.pbc.gov.cn/goutong...
2 乐视网：贾跃亭持股累计被司法处置约2202万股，用于偿还债务 http://www.szse.cn/disclosure...
3 滴滴：10月18日起试行黑名单，乘客司机均可拉黑对方 https://mp.weixin.qq.com/s/Zy...
4 优衣库投资1000亿日元升级物流自动化 https://www.iyiou.com/breakin...
5 九曳供应链获正大集团数亿元C轮战略投资 https://www.iyiou.com/p/83147
6 美媒：法拉第未来拟明年年中招工1300人，开始生产电动车 http://tech.qq.com/a/20181010...
7 数据显示：高盛在中国企业股票资本融资上领先同业 https://finance.qq.com/a/2018...
8 工业互联网成黑客攻击新目标，保障体系亟待加强 http://www.txxxb.com/cy/hlw/2...
9 中科院院士：人工智能发展需要理性务实 https://tech.sina.com.cn/d/i/...
10 Forerunner成立3.6亿美元新基金，专注DTC商业模式、加注供应链 http://www.lieyunwang.com/arc...
11 生鲜传奇完成B轮3亿元融资，估值达30亿元 https://mp.weixin.qq.com/s/Qa...
12 亚马逊AI招聘工具被曝性别歧视 http://finance.sina.com.cn/st...
13 福特打造全新电动运货汽车 http://tech.qq.com/a/20181010...
14 比亚迪：9月新能源汽车销量达2.79万辆，环比增长27.8% http://www.szse.cn/disclosure...
15 美银美林调查显示用户升级iPhone意愿增加，维持苹果股票买入评级 http://www.techweb.com.cn/wor...
16 百度联合中国软件行业协会发布AI深度学习工程师认证标准
17 苹果再次被告：双摄像头技术涉嫌侵犯专利 https://www.ithome.com/0/387/...
18 iPhone XR本月19日开启预购：6499元起 https://www.apple.com/cn/shop...
19 国家市场监管总局发布23项国家标准，涉及公共安全、食品等多领域 http://www.xinhuanet.com//201...
20 高德：十一期间导航服务超105亿次，同比增长98% https://tech.sina.com.cn/i/20...
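
A possible explanation for the 16-of-20 result, offered as an assumption and not checked against the live page: the pattern in the question hard-codes values such as "catch_title":"", "cover":"" and "user_id":"344033181", so any item whose fields differ cannot match. The sketch below reproduces that effect on made-up data (the sample records, the example.com URLs and the `id` field are all invented for illustration) and shows that parsing the JSON keeps every item.

```python
import json
import re

# Invented sample data; only the key names come from the question's regex.
records = [
    {"title": "A", "catch_title": "", "description": "d1", "cover": "",
     "news_url_type": "news_url", "news_url": "http://example.com/a",
     "user_id": "344033181", "published_at": "2018-10-10 10:00:00", "id": 1},
    {"title": "C", "catch_title": "", "description": "d3", "cover": "",
     "news_url_type": "news_url", "news_url": "",
     "user_id": "344033181", "published_at": "2018-10-10 10:05:00", "id": 2},
    # Same shape, but posted by a different (hypothetical) user_id.
    {"title": "B", "catch_title": "", "description": "d2", "cover": "",
     "news_url_type": "news_url", "news_url": "http://example.com/b",
     "user_id": "999", "published_at": "2018-10-10 10:10:00", "id": 3},
]
sample = json.dumps({"newsflashList|newsflash": records}, separators=(",", ":"))

# The pattern from the question, unchanged.
regex = ('"title":"(.*?)","catch_title":"","description":".*?","cover":"",'
         '"news_url_type":"news_url","news_url":"(.*?)",'
         '"user_id":"344033181","published_at":"(.*?)",')
matches = re.findall(regex, sample, re.S)

# Parsing the JSON does not depend on any particular field value.
items = json.loads(sample)["newsflashList|newsflash"]
rows = [(it.get("title"), it.get("news_url", ""), it.get("published_at"))
        for it in items]

print(len(matches), "matched by regex;", len(rows), "items in the JSON")
```

Using `.get()` also covers items with an empty or missing `news_url`, like item 16 in the output above.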

同意并接受


Posted on
2018-10-10

Use json to convert the JS variable into a dict; all of the content is inside that dict.

Python 3

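import requests as req
import json
import re

rsp = req.get('https://36kr.com/newsflashes')
# Capture the object literal assigned to `props`, up to the "}" that
# precedes the next ", name=" assignment in the same script tag.
p = r'<script>var props=(.*?}),\s*\w*?='
j = re.findall(p, rsp.text)
# j[0] is the first (and expected only) capture; parse it into a dict.
dic = json.loads(j[0])
print(dic)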
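
Both answers rely on a regex knowing what comes right after the object literal. As an alternative sketch (not what either answerer posted), the standard library's `json.JSONDecoder.raw_decode` can parse one JSON value starting at a given index and stop on its own at the end of that value. The helper name `extract_props` and the sample HTML string are made up for illustration, and this assumes the value assigned to `props` is valid JSON.

```python
import json


def extract_props(html, marker="var props="):
    """Return the JSON value that follows `marker` in `html`."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found: %r" % marker)
    start += len(marker)
    # raw_decode does not skip leading whitespace, so do it here.
    while start < len(html) and html[start].isspace():
        start += 1
    # Parses exactly one JSON value and ignores whatever follows it.
    obj, _end = json.JSONDecoder().raw_decode(html, start)
    return obj


# Invented sample that mimics the shape described in the answers.
sample_html = ('<script>var props={"newsflashList|newsflash":'
               '[{"title":"A","news_url":""}]},locationnal={"x":1}</script>')
props = extract_props(sample_html)
print(props["newsflashList|newsflash"][0]["title"])
```

With a real page you would pass `rsp.text` instead of `sample_html`. If the site changes how it embeds the data, the marker will not be found and the helper raises `ValueError` instead of failing on an index lookup.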
