"https://cn.investing.com/indices/hnx-30-components",这个网页包含了hnx30公司的构成,我只要爬取下来,用一个字典来容纳结果,键是公司名,值是一个链接,点击这个链接,可以跳转到公司名的网页,这个公司名对应的symbol就在里面。
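from urllib.parse import urljoin

import lxml.html
from playwright.sync_api import sync_playwright

root = 'https://cn.investing.com/'
url = 'https://cn.investing.com/indices/hnx-30-components'

# Start Playwright and open the components page in a visible Chromium window.
pw = sync_playwright().start()
browser = pw.chromium.launch(headless=False)
context = browser.new_context()
page = context.new_page()
page.goto(url, wait_until="domcontentloaded")

# Parse the rendered HTML and collect every link inside the components table.
doc = lxml.html.fromstring(page.content())
elements = doc.xpath('//table[@id="cr1"]//tr//a')

# Map each company name to the absolute URL of its page.
company = {}
for e in elements:
    key = e.text_content().strip()
    # urljoin avoids a double slash when href already starts with '/'.
    value = urljoin(root, e.attrib['href'])
    company[key] = value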
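My next step is to get the symbol for each company, but I found that I cannot get Playwright to simulate following the link and retrieve the page it leads to. Any guidance would be appreciated.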
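
One direction that might work, as a rough sketch rather than a tested solution: instead of simulating a click, call `page.goto()` on each stored link and read the symbol from the loaded page. The helper names `symbol_from_title` and `collect_symbols` are made up for illustration, and the idea that the page title contains the symbol in parentheses is only an assumption about how the site formats its titles. If the symbol lives elsewhere, the extraction step would need a different selector.

```python
import re


def symbol_from_title(title):
    """Return the first parenthesised ticker-like token in a page title.

    Assumption: the title looks like 'Company Name (ABC) ...'. Both ASCII
    and full-width parentheses are accepted. Returns None if nothing matches.
    """
    match = re.search(r"[(（]([A-Z0-9.]+)[)）]", title)
    return match.group(1) if match else None


def collect_symbols(page, company):
    """Visit every company link with an existing Playwright page.

    `page` is a Playwright sync Page; `company` maps name -> absolute URL.
    Returns a dict mapping name -> symbol (or None when none was found).
    """
    symbols = {}
    for name, link in company.items():
        # Navigate directly to the company page instead of clicking the link.
        page.goto(link, wait_until="domcontentloaded")
        symbols[name] = symbol_from_title(page.title())
    return symbols
```

With the `page` and `company` objects from the script above, this would be called as `symbols = collect_symbols(page, company)`. Reusing one page keeps a single browser session open for all 30 companies, at the cost of visiting them one after another.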