Scraping the Kugou Top 500 with Python: a pagination problem

Problem description

Following along with a book, I am trying to scrape the Kugou Music Top 500.
My plan is to first build the URLs of all the pages, then request each page, parse the responses with BeautifulSoup, and finally print the results. But it throws an error, and as far as I can tell the approach should work. Could someone help?
The error:
No connection adapters were found for '['http://www.kugou.com/yy/rank/...']'
My code is below:

Relevant code

import requests
from bs4 import BeautifulSoup
import time
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'
}  # request headers
def get_info(url):         # fetch one page and extract its info
    res = requests.get(url, headers=headers)  # request the page
    soup = BeautifulSoup(res.text, 'lxml')    # parse the HTML
    # rank:
    nums = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(3) > strong:nth-of-type(1)')
    # singer - song name:
    titles = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > a:nth-of-type(4)')
    # duration:
    times = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(5) > span:nth-of-type(4)')
    for num, title, time in zip(nums, titles, times):
        data = {
            '名次':num.get_text().strip(),
            '歌手':title.get("title").get_text().split('-')[0],
            '名字':prices.get("title").get_text().split('-')[1],
            '时间':address.get_text().strip(),
        }
        print(data)
        time.sleep(2)

    

Main program


# main program
urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(number) for number in range(1, 24)]  # collect pages 1-23
for single_url in urls:
    get_info(single_url)
    time.sleep(5)

Error message

The main program just sat there without printing anything, so I tried fetching only the first page with ['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank'], and it failed with an error that seems to say it could not connect, which is strange because the page opens fine in my browser.
The code:

url = ['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank']
get_info(url)

The error:

No connection adapters were found for '['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank']'

I searched Baidu for this error and tried what I found, but no luck, and there is not much written about it. Thanks in advance!

1 Answer

nums = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(3) > strong:nth-of-type(1)')
titles = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > a:nth-of-type(4)')
times = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(5) > span:nth-of-type(4)')

This data parsing is broken, which is why nothing gets printed: the selectors match nothing, and even if they did, title.get("title") returns a plain string (so calling .get_text() on it would raise AttributeError), prices and address are never defined, and the loop variable time shadows the time module.
The "hang" is an illusion: each loop iteration sleeps for several seconds while producing no output.
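As for the "No connection adapters were found" error: requests.get expects a URL string, but your test wrapped the URL in a one-element list, so requests stringified the whole list and could not match it against any adapter prefix such as http://. A minimal sketch of the fix:

# Pass the URL itself, not a list around it.
url = 'http://www.kugou.com/yy/rank/home/1-8888.html?from=rank'
get_info(url)  # get_info(['http://...']) is what triggered the error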
The following code is for reference:
import requests
from bs4 import BeautifulSoup

url = 'http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'

def get_info(url):
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'lxml')
    infoes = soup.select('div.pc_temp_songlist ul li')
    for info in infoes:
        nums = info.select('span.pc_temp_num')[0].text.strip()
        # each li carries a title attribute of the form "singer - song"
        singer, name = info['title'].split('-', 1)
        times = info.select('span.pc_temp_tips_r span.pc_temp_time')[0].text.strip()
        print({'名次': nums, '歌手': singer, '歌名': name, '时长': times})

if __name__ == '__main__':
    urls = [url.format(i) for i in range(1, 24)]
    for url in urls:
        get_info(url)
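The key design difference from the question's code is that this reads the whole "singer - song" string from each li's title attribute instead of chaining fragile nth-of-type selectors. A quick illustration of that split (the sample attribute value below is made up):

# Hypothetical example of an li's title attribute value.
title_attr = '测试歌手 - 一首歌 - Demo'
singer, name = title_attr.split('-', 1)  # maxsplit=1: split only at the first '-'
print(singer.strip(), name.strip())      # strip() trims the spaces around the dash;
                                         # a '-' inside the song name is preserved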
