When scraping a list, how do I page through it? I have all the links, so how do I pull them out and visit each one in turn?


看近行远


Posted on
2017-12-29

dokelung


Updated on
2017-12-29

#-*- coding:utf-8 -*-

import requests
import re

# Build the list-page URLs
for page in range(1, 67):
    url_list = 'http://top.chinaz.com/hangye/index_news_{}.html'.format(page)

content_list = requests.get(url_list)
content_list.encoding = 'utf-8'
content_list = content_list.text
contents_list = re.search('''<h3 class="rightTxtHead"><a href="(.*?)" title='.*?</a>''',content_list) # Extract the detail-page link

source_list = 'http://top.chinaz.com' + contents_list.group(1) # Build the full detail-page URL
source_contents = requests.get(source_list)
source_contents.encoding = 'utf-8'
source_contents = source_contents.text

website_name = re.search('<h2.*?>(.*?)</h2><p class="plink ml5 fl"><a href="(.*?)" target="_blank" >.*?</a></p>',source_contents,re.S)

print(source_list)
print(website_name.group(1),website_name.group(2))

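This is somewhat pseudocode. url_list generates the URLs of all the list pages, but what comes next? How do I take the first list page, take the first detail page linked from it, and fetch that page's content? And then loop on to the second detail page of the first list page and fetch its content, and so on?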

python


1 answer

dokelung

Consider using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'http://top.chinaz.com/hangye/index_news_3.html'  # one list page as an example
res = requests.get(url)
res.encoding = 'utf-8'
content = res.text
soup = BeautifulSoup(content, 'html5lib')
lst = soup.find_all('h3', class_='rightTxtHead')
for h3 in lst:
    print(h3.a['href'], h3.a['title'])

Output:

/Html/site_qlwb.com.cn.html 齐鲁晚报网
/Html/site_ynet.com.html 北青网
/site_news.cnhubei.com.html 荆楚网新闻频道
/Html/site_hunantv.com.html 芒果TV
/Html/site_henan100.com.html 河南一百度
/Html/site_pep.com.cn.html 人民教育出版社
/Html/site_cnbeta.com.html cnBeta.COM_中文业界资讯站
/Html/site_wenming.cn.html 中国文明网
/Html/site_ettoday.net.html ETtoday 东森新闻云
/Html/site_zjstv.com.html 浙江卫视官方网站
/Html/site_chengdu.cn.html 成都全搜索
...
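
To connect this back to the original question (paging through every list page and then visiting each detail link in turn), here is a minimal sketch of the nested loop. It is not the answer's own code: it reuses the regular expressions from the question, switching `re.search` to `re.findall` so that every link on a page is collected rather than only the first. The names `fetch_html`, `detail_links`, `site_info` and `crawl` are hypothetical, the patterns assume the page markup shown in the question, and `last_page=66` simply mirrors the question's `range(1, 67)`. The same loop works if the links are collected with `soup.find_all` instead.

```python
import re
from urllib.parse import urljoin

BASE_URL = 'http://top.chinaz.com'
LIST_URL = 'http://top.chinaz.com/hangye/index_news_{}.html'

# Patterns adapted from the question; they assume the markup shown there.
LINK_RE = re.compile(r'<h3 class="rightTxtHead"><a href="(.*?)"')
SITE_RE = re.compile(
    r'<h2.*?>(.*?)</h2><p class="plink ml5 fl"><a href="(.*?)" target="_blank" >',
    re.S,
)


def fetch_html(url):
    """Download a page and return its text (needs the requests package)."""
    import requests  # imported lazily so the parsing helpers work offline
    res = requests.get(url, timeout=10)
    res.encoding = 'utf-8'
    return res.text


def detail_links(list_html):
    """Return the absolute URL of every detail page linked from a list page."""
    # findall returns all matches; search would stop at the first one.
    return [urljoin(BASE_URL, href) for href in LINK_RE.findall(list_html)]


def site_info(detail_html):
    """Return (name, url) from a detail page, or None if the pattern fails."""
    match = SITE_RE.search(detail_html)
    return match.groups() if match else None


def crawl(first_page=1, last_page=66, fetch=fetch_html):
    """Yield (detail_url, name, site_url) for every site on every list page."""
    for page in range(first_page, last_page + 1):      # outer loop: list pages
        list_html = fetch(LIST_URL.format(page))
        for link in detail_links(list_html):           # inner loop: detail pages
            info = site_info(fetch(link))
            if info:                                   # skip pages that do not match
                yield (link, *info)
```

Iterating `for link, name, site_url in crawl(1, 2)` and printing each tuple would restrict the run to the first two list pages, which is a reasonable way to try it before crawling everything. The `fetch` parameter exists only so the loop can be exercised with canned HTML instead of live requests.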

Questions I have answered: Python-QA
