Python web scraper: why do I occasionally get "list index out of range" and no data is scraped?


Posted on 2024-05-08, Shandong


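My Python scraper parses the page with BeautifulSoup. Sometimes it fails with "list index out of range", yet with the code unchanged it sometimes runs fine. When it fails, all the output lists are empty.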


import requests
from bs4 import BeautifulSoup
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0'}
response=requests.get("https://www.iqiyi.com/ranks1/3/0",headers=headers)
print(response.status_code)
response=response.text
soup= BeautifulSoup(response,"html.parser")
print(soup)
all_titles = soup.find_all("div", attrs={"class": "rvi__tit1"})
print("jil")
def get_title():
    try:

        all_titles = soup.find_all("div", attrs={"class": "rvi__tit1"})

        title = []
        for titles in all_titles:
            title2 = titles.get_text().strip()

            title.append(title2)

        return title[0:10]
    except IndexError:

        pass
def get_heat():
    try:
        soup = BeautifulSoup(response, "html.parser")
        all_heat = soup.find_all("span", attrs={"class": "rvi__index__num"})
        heat = []
        for heat1 in all_heat:
            heat.append(heat1.get_text().strip())
        return heat[0:10]

    except IndexError:

        pass


def get_introduction():

        all_introduction = soup.find_all("p", attrs={"class": "rvi__des2"})
        introduction = []
        for introduction1 in all_introduction:
            introduction.append(introduction1.get_text().strip())
        return introduction[0:10]  # keep the first 10

def list_show():
    # Format spec {1:{3}^10}: 1 is the argument position, {3} pads with argument 3, ^ centers, 10 is the field width
    tplt = "{0:{3}^10}\t{1:{3}^6}\t{2:{3}^20\t{3:{3}^60}}"
    #print(tplt.format('Rank', 'Title', 'Heat', chr(12288)))
    lt = []
    for i in range(10):
        # Collect the first 10 rows
        tplt1 = "{0:{3}^10}\t{1:{3}^6}\t{2:{3}^20\t{3:{3}^60}}"

        lt.append([i +1, get_title()[i], get_heat()[i],get_introduction()[i]])

    return lt
if __name__ == '__main__':
    print(get_title())
    print(get_heat())
    print(get_introduction())
    list_show()

The printed status_code is 200 and soup is printed as well, but the lists come back without any data.

web scraping

python


AI Bot (BETA)

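In a Python scraper, a "list index out of range" error means the code tried to access a list index that does not exist. In your code, this may have the following causes:

Page structure changes: the site you are scraping may update or change its page structure from time to time. If the HTML structure changes, the scraper can no longer locate the elements it extracts, so find_all returns an empty list even though the same code worked before.
Network latency or errors: because of network delays or problems on the target site, you may not receive the complete page content, so BeautifulSoup cannot find the elements you expect.
Request rate too high: if you send many requests to the same site in a short time, the server may treat them as abusive and reject them or return incomplete content.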
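One way to tell these causes apart is to check whether the class names the scraper relies on appear in the raw response at all. The following is a minimal sketch, not a definitive diagnosis tool: `missing_markers` is a hypothetical helper, and both sample HTML strings are made up for illustration.

```python
def missing_markers(html, markers):
    """Return the markers (e.g. CSS class names) that do not occur in html."""
    return [m for m in markers if m not in html]


# Class names taken from the selectors in the question.
MARKERS = ["rvi__tit1", "rvi__index__num", "rvi__des2"]

# Made-up sample responses: one complete, one without the ranking markup.
complete_html = (
    '<div class="rvi__tit1">A</div>'
    '<span class="rvi__index__num">9000</span>'
    '<p class="rvi__des2">intro</p>'
)
partial_html = "<html><body><div id='app'></div></body></html>"

print(missing_markers(complete_html, MARKERS))  # no markers missing
print(missing_markers(partial_html, MARKERS))   # all three markers missing
```

In the real scraper you would pass response.text. A non-empty result on a failing run suggests the server returned a page without the ranking markup, rather than a parsing problem.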

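To address this, you can try the following:

Check the page structure: make sure the elements you are scraping still exist in the HTML and that their class names or IDs have not changed.
Guard against empty results: before accessing a list element, check whether the list is empty or shorter than the index. If it is, return a default value or skip the current iteration.
Lower the request rate: add a suitable delay between requests so the server does not reject them for being too frequent.
Use a proxy: send requests through a proxy server so the server does not see frequent requests from a single IP address.
Use a cache: for data that is requested often, cache what you have already fetched to reduce the number of requests sent to the server.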
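The "guard against empty results" and "lower the request rate" suggestions can be combined into a small retry loop. This is a sketch under stated assumptions: `fetch_until_complete` is a hypothetical helper, `fetch` stands in for something like `lambda: requests.get(url, headers=headers).text`, and the two sample pages are invented so the example runs without network access.

```python
import time


def fetch_until_complete(fetch, is_complete, retries=3, delay=1.0):
    """Call fetch() until is_complete(result) is true or retries run out.

    Returns the last result, complete or not, so the caller can decide
    how to handle an incomplete page.
    """
    result = None
    for attempt in range(retries):
        result = fetch()
        if is_complete(result):
            break
        if attempt < retries - 1:
            time.sleep(delay)  # wait before asking the server again
    return result


# Stand-in for a real HTTP call: the first response lacks the ranking markup.
pages = iter(["<html></html>", '<div class="rvi__tit1">A</div>'])
html = fetch_until_complete(
    lambda: next(pages),
    lambda h: "rvi__tit1" in h,
    delay=0,
)
print("rvi__tit1" in html)  # True
```

Passing the fetch step in as a callable keeps the retry logic independent of requests, which also makes it easy to try out offline.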

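In your code, get_title(), get_heat(), and get_introduction() return an empty list when find_all finds nothing: slicing with [0:10] never raises IndexError, so the except IndexError branches in those functions never run. The "list index out of range" error is raised in list_show(), which indexes get_title()[i] for every i in range(10) even when the list holds fewer than 10 items. The helper functions can be simplified so that they always return a list, for example:

def get_title():
    all_titles = soup.find_all("div", attrs={"class": "rvi__tit1"})
    # find_all returns an empty result when nothing matches, so this yields []
    return [tag.get_text().strip() for tag in all_titles][:10]

# Apply the same change to get_heat() and get_introduction()

This way each function always returns a list, possibly an empty one. list_show() still has to check the list lengths before indexing; otherwise an empty or short list raises the "list index out of range" error.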

1 Answer


ezmo


Posted on 2024-05-08, Shanghai

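Whenever get_title() returns a list with fewer than 10 items, your program is guaranteed to fail with an index-out-of-range error.
Look at the logic below: if get_title() returns only 5 items, the loop raises IndexError as soon as i reaches 5, and any later index such as get_title()[7] would fail in the same way. The same applies to get_heat() and get_introduction().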

for i in range(10):
   ......
   lt.append([i +1, get_title()[i], get_heat()[i],get_introduction()[i]])
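A minimal sketch of one way to avoid the out-of-range access: fetch each list once and combine them with zip(), which stops at the shortest input. `build_rows` is a hypothetical replacement for list_show(), and the sample values are made up.

```python
def build_rows(titles, heats, introductions, limit=10):
    """Combine the three lists into [rank, title, heat, introduction] rows.

    zip() stops at the shortest list, so short or empty inputs produce
    fewer rows instead of raising IndexError.
    """
    rows = []
    combined = zip(titles[:limit], heats[:limit], introductions[:limit])
    for rank, (title, heat, intro) in enumerate(combined, start=1):
        rows.append([rank, title, heat, intro])
    return rows


print(build_rows(["A", "B"], ["10", "9"], ["first", "second"]))
# [[1, 'A', '10', 'first'], [2, 'B', '9', 'second']]
print(build_rows([], [], []))
# []
```

In the original script this would be called as build_rows(get_title(), get_heat(), get_introduction()), which also avoids re-running each getter on every loop iteration.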
