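Problem description
I am following a book and trying to scrape the Kugou Music top 500.
My approach is to build the list of all page URLs, request each page, parse its content with BeautifulSoup, and print the result. It raises an error, though, and I cannot see anything wrong with the approach. Any help would be appreciated.
Error: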
No connection adapters were found for '['http://www.kugou.com/yy/rank/...']'
My code is as follows:
Relevant code
import requests
from bs4 import BeautifulSoup
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'
}  # request headers
def get_info(url):  # fetch and parse one page
    res = requests.get(url, headers=headers)  # request the page
    soup = BeautifulSoup(res.text, 'lxml')  # parse the HTML
    # rank:
    nums = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(3) > strong:nth-of-type(1)')
    # singer - song title:
    titles = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > a:nth-of-type(4)')
    # duration:
    times = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(5) > span:nth-of-type(4)')
    for num, title, time in zip(nums, titles, times):
        data = {
            '名次': num.get_text().strip(),  # rank
            '歌手': title.get("title").get_text().split('-')[0],  # singer
            '名字': prices.get("title").get_text().split('-')[1],  # song title
            '时间': address.get_text().strip(),  # duration
        }
        print(data)
        time.sleep(2)
Main program
# main program
urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html?from=rank'.format(number) for number in range(1, 24)]  # collect pages 1-23
for single_url in urls:
    get_info(single_url)
    time.sleep(5)
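Error message
The main program just hangs without printing anything, so I tried scraping only the first page, ['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank'], and that raised an error. Oddly, the message seems to mean the connection failed, yet I can open the page directly in a browser.
The code is as follows: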
url = ['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank']
get_info(url)
The error is as follows:
No connection adapters were found for '['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank']'
I searched Baidu for this error and tried a few things without success, and Baidu has very little on it. Any help is appreciated!
nums = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(3) > strong:nth-of-type(1)')
titles = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > a:nth-of-type(4)')
times = soup.select('.pc_temp_songlist > ul:nth-of-type(1) > li > span:nth-of-type(5) > span:nth-of-type(4)')
These selectors do not match the page, so of course nothing is printed.
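To see why an empty selector result also hides the other problems in the loop, here is a minimal sketch that uses plain strings as stand-ins for the parsed tags. `build_rows` and the sample values are hypothetical and are not taken from the Kugou page. When the three lists are empty, `zip` yields nothing, so the loop body never runs: nothing is printed, and the undefined names `prices` and `address` never get a chance to raise. The sketch also calls the loop variable `duration` so that it does not shadow the `time` module.

```python
def build_rows(nums, titles, times):
    """Combine three parallel lists into one dict per song."""
    rows = []
    # `duration` instead of `time`, so the `time` module stays usable.
    for num, title, duration in zip(nums, titles, times):
        # Titles are assumed to look like "singer - song".
        singer, _, song = title.partition('-')
        rows.append({
            'rank': num.strip(),
            'singer': singer.strip(),
            'song': song.strip(),
            'duration': duration.strip(),
        })
    return rows


# Selectors that match nothing give empty lists, so no rows are built.
print(build_rows([], [], []))

# Hypothetical sample values standing in for one matched song.
print(build_rows(['1'], ['Singer A - Song A'], [' 3:45 ']))
```

With real tags, `title.get("title")` already returns a string, so calling `.get_text()` on it would fail once the selectors do match; splitting the string directly should be enough.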
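It only looks stuck: each loop iteration sleeps for up to 7 seconds (2 + 5), and since the output is empty, it gives the impression of hanging.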
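As for the "No connection adapters" error in the single-page test, a likely cause is that `url` there is a list, not a string. requests chooses a connection adapter by URL prefix (`http://` or `https://`), and a list converted to text starts with `[`, which matches neither. The sketch below only imitates that prefix check with the standard library; `looks_like_http_url` is a hypothetical helper and not part of requests.

```python
def looks_like_http_url(url):
    """Rough stand-in for the prefix match used to pick an adapter."""
    text = str(url)  # a list turns into "['http://...']"
    return text.lower().startswith(('http://', 'https://'))


url_as_list = ['http://www.kugou.com/yy/rank/home/1-8888.html?from=rank']

print(str(url_as_list))                      # text begins with "['"
print(looks_like_http_url(url_as_list))      # the list itself
print(looks_like_http_url(url_as_list[0]))   # the string inside the list
```

Passing `url[0]`, or assigning the URL as a plain string without the brackets, should avoid the error. In the main loop each `single_url` is already a string, so this only affects the single-page test.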
The following code is for reference:
import requests
from bs4 import BeautifulSoup
url='http://www.kugou.com/yy/rank/...{}-8888.html?from=rank'
def get_info(url):
if __name__=='__main__':