Scraping the Kugou Top 500 with Python: a pagination problem

Problem description

Following along with a book, I am trying to scrape the Kugou Music Top 500.
My plan is to first build the URLs of all the pages, then request each page, parse the responses with BeautifulSoup, and finally print the results. But it throws an error, and as far as I can tell the approach should work. Could someone help?
The error:
No connection adapters were found for '['http://www.kugou.com/yy/rank/...']'
My code is below:

Relevant code

import requests
from bs4 import BeautifulSoup
import time
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'
}  # request headers
def get_info(url):         # fetch one page and extract its info
    res = requests.get(url, headers=headers)  # request the page
    soup = BeautifulSoup(res.text, 'lxml')    # parse the HTML
    # rank:
    nums = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(3) > strong:nth-of-type(1)')
    # singer - song name:
    titles = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > a:nth-of-type(4)')
    # duration:
    times = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(5) > span:nth-of-type(4)')
    for num, title, time in zip(nums, titles, times):
        data = {
            '名次':num.get_text().strip(),
            '歌手':title.get("title").get_text().split('-')[0],
            '名字':prices.get("title").get_text().split('-')[1],
            '时间':address.get_text().strip(),
        }
        print(data)
        time.sleep(2)

    

Main program


# main program
urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(number) for number in range(1, 24)]  # collect pages 1-23
for single_url in urls:
    get_info(single_url)
    time.sleep(5)

Error message

The main program just sat there without printing anything, so I tried fetching only the first page with ['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank'], and it failed with an error that seems to say it could not connect, which is strange because the page opens fine in my browser.
The code:

url = ['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank']
get_info(url)

The error:

No connection adapters were found for '['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank']'

I searched Baidu for this error and tried what I found, but no luck, and there is not much written about it. Thanks in advance!

1 Answer

nums = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(3) > strong:nth-of-type(1)')
titles = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > a:nth-of-type(4)')
times = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(5) > span:nth-of-type(4)')

This data parsing is broken, which is why nothing gets printed: the selectors match nothing, and even if they did, title.get("title") returns a plain string (so calling .get_text() on it would raise AttributeError), prices and address are never defined, and the loop variable time shadows the time module.
The "hang" is an illusion: each loop iteration sleeps for several seconds while producing no output.
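As for the "No connection adapters were found" error: requests.get expects a URL string, but your test wrapped the URL in a one-element list, so requests stringified the whole list and could not match it against any adapter prefix such as http://. A minimal sketch of the fix:

# Pass the URL itself, not a list around it.
url = 'http://www.kugou.com/yy/rank/home/1-8888.html?from=rank'
get_info(url)  # get_info(['http://...']) is what triggered the error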
The following code is for reference:
import requests
from bs4 import BeautifulSoup

url = 'http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'

def get_info(url):
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'lxml')
    infoes = soup.select('div.pc_temp_songlist ul li')
    for info in infoes:
        nums = info.select('span.pc_temp_num')[0].text.strip()
        # each li carries a title attribute of the form "singer - song"
        singer, name = info['title'].split('-', 1)
        times = info.select('span.pc_temp_tips_r span.pc_temp_time')[0].text.strip()
        print({'名次': nums, '歌手': singer, '歌名': name, '时长': times})

if __name__ == '__main__':
    urls = [url.format(i) for i in range(1, 24)]
    for url in urls:
        get_info(url)
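The key design difference from the question's code is that this reads the whole "singer - song" string from each li's title attribute instead of chaining fragile nth-of-type selectors. A quick illustration of that split (the sample attribute value below is made up):

# Hypothetical example of an li's title attribute value.
title_attr = '测试歌手 - 一首歌 - Demo'
singer, name = title_attr.split('-', 1)  # maxsplit=1: split only at the first '-'
print(singer.strip(), name.strip())      # strip() trims the spaces around the dash;
                                         # a '-' inside the song name is preserved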
