Web scraping: how do I get the symbols of the VN30 index constituent companies?

Question


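Posted 2024-02-13, Beijing

The page "https://cn.investing.com/indices/hnx-30-components" lists the constituent companies of the HNX30 index. My plan is to scrape it into a dictionary whose keys are company names and whose values are links. Following a link opens that company's page, and the company's symbol is on that page.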

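from urllib.parse import urljoin

import lxml.html
from playwright.sync_api import sync_playwright

root = 'https://cn.investing.com/'
url = 'https://cn.investing.com/indices/hnx-30-components'

# Start a visible Chromium instance and open the index components page.
pw = sync_playwright().start()
browser = pw.chromium.launch(headless=False)
context = browser.new_context()
page = context.new_page()
page.goto(url, wait_until='domcontentloaded')

# Parse the rendered HTML and collect every link in the components table.
doc = lxml.html.fromstring(page.content())
elements = doc.xpath('//table[@id="cr1"]//tr//a')

# Map company name -> absolute URL of the company page.
company = {}
for e in elements:
    key = e.text_content().strip()
    value = urljoin(root, e.attrib['href'])  # avoids a double slash in the URL
    company[key] = value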

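What I want to do next is get each company's symbol. To my surprise, I can't get Playwright to simulate the navigation and retrieve the page it lands on. Any guidance would be appreciated.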

python3.x web-scraping

1.5k views

AI Bot (BETA)

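First, I see that you are trying to use Playwright to simulate a click and read the page you land on. Playwright is primarily a browser-automation tool. It can be used for scraping, but it is heavier than necessary when a page can be fetched with a plain HTTP request.

For your use case, you could consider a lighter combination such as requests and BeautifulSoup to collect the company links, as a first step toward the symbols. Here is a simple example: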

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

root = 'https://cn.investing.com/'
url = 'https://cn.investing.com/indices/hnx-30-components'

response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on a non-2xx response
soup = BeautifulSoup(response.text, 'lxml')

company_dict = {}
table = soup.find('table', id='cr1')
if table is None:
    raise RuntimeError('table with id "cr1" not found in the response')
for row in table.find_all('tr'):
    a_tag = row.find('a')
    if a_tag is None:  # skip rows without a link, such as the header row
        continue
    company_name = a_tag.get_text(strip=True)
    company_dict[company_name] = urljoin(root, a_tag.get('href'))

# Print the result
for company, href in company_dict.items():
    print(f"{company}: {href}")

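In this example, we first send a GET request to the given URL. We then parse the HTML with BeautifulSoup and locate the table that contains the company names and links. For each link found, we add an entry to the dictionary, with the company name as the key and the link as the value. Finally, we print the contents of the dictionary.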
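A small aside on building the absolute links. If the href values in the table are site-relative (starting with "/", which is an assumption here, not something confirmed from the page), then concatenating them onto a root that ends in "/" produces a double slash. The standard-library function urllib.parse.urljoin avoids that. The href below is a made-up placeholder, not a real company path.

```python
from urllib.parse import urljoin

root = "https://cn.investing.com/"
href = "/equities/example-company"  # hypothetical site-relative href

naive = root + href           # plain concatenation keeps both slashes
joined = urljoin(root, href)  # resolves href against the root URL

print(naive)   # https://cn.investing.com//equities/example-company
print(joined)  # https://cn.investing.com/equities/example-company
```

urljoin also leaves an already-absolute href untouched, so the same call works whichever form the page uses.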

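Note that getting the symbol takes one more step: you have to request each company link and parse the page it returns. If that page only fills in its content through JavaScript, a plain HTTP request may not be enough and you may need a browser-automation tool to load it. The code above should still give you a basic starting point.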
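As a rough sketch of that extra step, the loop below visits each collected URL directly with Playwright's page.goto() instead of simulating a click, then reads the symbol from the page heading. Two things are assumptions rather than facts about the site: that the heading is an h1 element, and that it ends with the symbol in parentheses, as in "Example Corp (ABC)". The names symbol_from_heading and fetch_symbols are hypothetical helpers, so inspect a real company page and adjust the selector and pattern before relying on this.

```python
import re
from typing import Optional

# Assumption: the heading ends with the symbol in parentheses, e.g. "Example Corp (ABC)".
SYMBOL_RE = re.compile(r"\(([A-Z0-9.]+)\)\s*$")


def symbol_from_heading(heading: str) -> Optional[str]:
    """Return the trailing parenthesised symbol, or None if there is none."""
    match = SYMBOL_RE.search(heading.strip())
    return match.group(1) if match else None


def fetch_symbols(company: dict) -> dict:
    """Open each company URL and map company name -> symbol (or None)."""
    # Imported here so the pure helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    symbols = {}
    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        for name, url in company.items():
            # Navigate straight to the link; no click simulation is needed.
            page.goto(url, wait_until="domcontentloaded")
            # "h1" is a hypothetical selector; adjust after inspecting the page.
            heading = page.locator("h1").first.inner_text()
            symbols[name] = symbol_from_heading(heading)
        browser.close()
    return symbols
```

Calling fetch_symbols(company) with the name-to-URL dictionary built earlier would return a name-to-symbol dictionary. Keeping the text parsing in its own function means it can be checked on sample strings without opening a browser.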
