BeautifulSoup ResultSet: how do I iterate over all of its contents?

薄墨无痕

Posted on 2018-10-24

Updated on 2018-10-24

Target page: https://www.w3cschool.cn/code...

This fetches the HTML:

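import requests

def getHtml(url):
    # Fetch the page and return its body as text
    response = requests.get(url)  # renamed from re to avoid shadowing the re module
    return response.text
index = getHtml(url)  # url is the target page above
index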

This is the function that parses the HTML:

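from bs4 import BeautifulSoup

def parseHtml(html):
    # Parse the html argument rather than the global index
    soup = BeautifulSoup(html, 'html.parser')
    # Collect every <a> inside the catalog div
    lessonList = soup.find('div', class_='codecamplist-catalog').find_all('a')
    return lessonList
lessonList = parseHtml(index)
lessonList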

The resulting lessonList is a bs4.element.ResultSet:

[<a href="//www.w3cschool.cn/codecamp/say-hello-to-html-element.html" title="Say Hello to HTML Element">
 <i class="icon-codecamp-list icon-codecamp-option"></i>
 开始学习HTML标签</a>,
 <a href="//www.w3cschool.cn/codecamp/headline-with-the-h2-element.html" title="Headline with the h2 Element">
 <i class="icon-codecamp-list icon-codecamp-option"></i>
 HTML 学习h2标签</a>,
 <a href="//www.w3cschool.cn/codecamp/inform-with-the-paragraph-element.html" title="Inform with the Paragraph Element">
 <i class="icon-codecamp-list icon-codecamp-option"></i>
 HTML 学习p标签</a>,
 <a href="//www.w3cschool.cn/codecamp/uncomment-html.html" title="Uncomment HTML">
 <i class="icon-codecamp-list icon-codecamp-option"></i>
 删除HTML的注释</a>]

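How do I parse data in this format?
The goal is to save the links and titles it contains to a CSV file.

When I treat it like a Tag, I can only get the first item, and calling find_all on it raises an error.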

def getLesson(lessonList):
    for i in lessonList:
        lesson={}
        try:
            lesson['title'] = i.find('a')['href'].lstrip('//')
            lesson['name']= i.find('a')['title']
        except:
            print('error')
    return lesson
getLesson(lessonList)
#  When the line above is lessonList = soup.find_all('div', class_='codecamplist-catalog')
#  .find_all('a'), why is only one record output?

Result:

{'name': 'Say Hello to HTML Element',
 'title': 'www.w3cschool.cn/codecamp/say-hello-to-html-element.html'}
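One plausible reading of the single result, sketched against a small inline snippet rather than the real page (the `SAMPLE` markup and its `example.com` links are made up for illustration): `find_all` on the div returns a list-like ResultSet, `find('a')` on each div returns only its first link, and a ResultSet itself has no `find_all` method.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, trimmed to two links.
SAMPLE = """
<div class="codecamplist-catalog">
  <a href="//example.com/a.html" title="A">first</a>
  <a href="//example.com/b.html" title="B">second</a>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Case 1: a ResultSet of <div> tags. find('a') returns only the first
# matching <a> inside each div, so one div yields one record.
divs = soup.find_all("div", class_="codecamplist-catalog")
first_only = [d.find("a")["title"] for d in divs]

# Case 2: a ResultSet is a list, not a Tag, so it has no find_all().
try:
    divs.find_all("a")
    chained_error = None
except AttributeError as exc:
    chained_error = exc

# Case 3: a ResultSet of <a> tags. Each item is already the link, so
# find('a') searches for a nested <a> and returns None.
links = soup.find("div", class_="codecamplist-catalog").find_all("a")
nested = [a.find("a") for a in links]
titles = [a["title"] for a in links]
```

Under these assumptions, the loop in getLesson would need to read `i['href']` and `i['title']` directly when lessonList already holds the `<a>` tags.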

python

1 Answer

✓ Accepted

from bs4 import BeautifulSoup

def parseHtml(html):
    soup = BeautifulSoup(html,'lxml')  # the 'lxml' parser requires the lxml package
    # lessonList = soup.find('div',class_='codecamplist-catalog').find('ul').find_all('a')
    # lessonList = soup.find_all('div',class_='codecamplist-catalog')[0].find_all('ul')[0].find_all('a')
    lessonList= soup.select('div.codecamplist-catalog a')
    for item in lessonList:
        # Each item is already an <a> Tag, so read its attributes directly
        yield {item['href']:item['title']}

All three approaches work; the first two are simply commented out.
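Since the question also asks about saving the links and titles as CSV, here is a minimal sketch of one way to do it with the standard `csv` module. The names `extract_lessons`, `write_csv` and `SAMPLE` are made up for this example, and the inline HTML is a trimmed copy of the output shown in the question rather than the live page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, used instead of a live request.
SAMPLE = """
<div class="codecamplist-catalog">
  <a href="//www.w3cschool.cn/codecamp/say-hello-to-html-element.html"
     title="Say Hello to HTML Element">link 1</a>
  <a href="//www.w3cschool.cn/codecamp/uncomment-html.html"
     title="Uncomment HTML">link 2</a>
</div>
"""


def extract_lessons(html):
    """Return one dict per <a> found inside the catalog div."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for a in soup.select("div.codecamplist-catalog a"):
        rows.append({
            "href": a["href"].lstrip("/"),  # drop the leading // of the URL
            "title": a["title"],
        })
    return rows


def write_csv(rows, fileobj):
    """Write the rows to any text file object, with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=["href", "title"])
    writer.writeheader()
    writer.writerows(rows)


rows = extract_lessons(SAMPLE)
buffer = io.StringIO()  # in-memory file, so the sketch touches no disk
write_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write a real file instead, something like `with open("lessons.csv", "w", newline="", encoding="utf-8") as f: write_csv(rows, f)` should work; the filename is only a placeholder.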
