Python Selenium: scraping an entire table

2 Answers

Posted on
2022-11-17

✓ Accepted

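Check out the script below to get the entire table from that page. I used a hard-coded delay in my script, which is not good practice. However, you can always define an Explicit Wait to make the code more robust: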

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)

# Click the "show more" link once via JavaScript.
item = driver.find_element(By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')
driver.execute_script("arguments[0].click();", item)
time.sleep(2)  # hard-coded delay while the extra rows load

# Print the text of every header and data cell, one list per table row.
for row in driver.find_elements(By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'):
    data = [cell.text for cell in row.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    print(data)

driver.quit()
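
The fixed `time.sleep(2)` above could instead be a wait on the number of table rows. What follows is only a minimal sketch, not part of the original answer: `row_count_exceeds` and `FakeDriver` are hypothetical names, and the locator tuple uses the plain string `"xpath"`, which is the value of Selenium's `By.XPATH` constant. The fake driver is there so the sketch runs without a browser.

```python
def row_count_exceeds(locator, previous_count):
    """Build a wait condition that succeeds once more than
    `previous_count` elements match `locator`.

    `locator` is a (strategy, value) tuple such as
    ("xpath", '//*[contains(@id,"eventHistoryTable")]//tr').
    This helper is hypothetical; it is not part of Selenium.
    """
    def _condition(driver):
        rows = driver.find_elements(*locator)
        # Return the rows (truthy) on success, False to keep polling.
        return rows if len(rows) > previous_count else False
    return _condition


class FakeDriver:
    """Stand-in for a WebDriver so the sketch can run without a browser."""
    def __init__(self, batches):
        self._batches = iter(batches)

    def find_elements(self, by, value):
        # Each call returns the next prepared batch of "rows".
        return next(self._batches)


# The first poll sees 2 rows (not enough), the second sees 4.
driver = FakeDriver([["r1", "r2"], ["r1", "r2", "r3", "r4"]])
condition = row_count_exceeds(("xpath", "//tr"), previous_count=2)
print(condition(driver))       # False
print(len(condition(driver)))  # 4
```

With a real driver, the condition would presumably be passed to an explicit wait, for example `WebDriverWait(driver, 10).until(row_count_exceeds(locator, old_count))`, since `until` accepts any callable that takes the driver and returns a truthy value once the condition holds.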

To get all of the data by exhausting the show more button, while also using an Explicit Wait, you can try the following script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)

# Keep clicking "show more" until the link stops being visible.
while True:
    try:
        item = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except Exception:
        # The wait timed out (or the click failed): nothing more to load.
        break

# Print the text of every header and data cell, one list per table row.
for row in wait.until(EC.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [cell.text for cell in row.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    print(data)

driver.quit()
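
If the rows need to be kept rather than just printed, the per-row lists built in the loop above could be written out as CSV. This is a hedged sketch only: `rows_to_csv` is a hypothetical helper, and the sample cells are illustrative values, since how the page splits a row into cells is an assumption here.

```python
import csv
import io


def rows_to_csv(rows, stream):
    """Write an iterable of cell-text lists to `stream` as CSV.

    Rows with no cells at all are skipped. Returns the number of
    rows written. (Hypothetical helper, not from the original answer.)
    """
    writer = csv.writer(stream)
    written = 0
    for cells in rows:
        if not cells:
            continue
        writer.writerow([cell.strip() for cell in cells])
        written += 1
    return written


# Sample rows shaped like the `data` lists printed by the loop above;
# the cell split shown here is illustrative only.
sample = [
    ["Sep 17, 2018", "01:30", "53.1%", "55.3%"],
    [],
    ["Sep 10, 2018", "01:30", "55.3%", "49.0%"],
]
buffer = io.StringIO()
count = rows_to_csv(sample, buffer)
print(count)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the stream would typically come from `open(path, "w", newline="")`, which is the form the `csv` module documentation recommends.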

Originally posted by SIM; translation licensed under CC BY-SA 4.0

Community wiki

Posted on
2022-11-17

Based on your question and the URL https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155, you can use the following solution to scrape the entire table:

Code block:

# -*- coding: UTF-8 -*-
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

TABLE = "table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155"
ROWS = TABLE + " tr[event_attr_id='1155']"

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
driver.get("https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155")

# Scroll the table header into view before working with the table.
table_header = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, TABLE + " tr>th.left.symbol")))
driver.execute_script("arguments[0].scrollIntoView(true);", table_header)

# Start from the rows already visible, so they are printed even if the first click times out.
table_rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ROWS)))
row_count = len(table_rows)
while True:
    try:
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#showMoreHistory1155>a"))).click()
        # Wait until the click has actually added rows to the table.
        WebDriverWait(driver, 20).until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ROWS)) > row_count)
        table_rows = driver.find_elements(By.CSS_SELECTOR, ROWS)
        row_count = len(table_rows)
    except TimeoutException:
        # No clickable link or no new rows within 20 seconds: everything is loaded.
        break

for row in table_rows:
    print(row.text)
driver.quit()

Console output:

  Sep 24, 2018 01:30
  Sep 17, 2018 01:30 53.1%   55.3%
  Sep 10, 2018 01:30 55.3%   49.0%
  Sep 03, 2018 01:30 49.0%   43.3%
  Aug 27, 2018 01:30 43.3%   49.7%
  Aug 20, 2018 01:30 49.7%   52.5%
  Aug 13, 2018 01:30 52.5%   59.9%
  Aug 06, 2018 01:30 59.9%   62.6%
  Jul 30, 2018 01:30 62.6%   52.8%
  Jul 23, 2018 01:30 52.8%   52.7%
  Jul 16, 2018 01:30 52.7%   46.2%
  Jul 10, 2018 01:30 46.2%   55.3%
  Jul 02, 2018 01:30 55.3%   53.1%
  Jun 25, 2018 01:30 53.1%   66.2%
  Jun 18, 2018 01:30 66.2%   65.2%
  Jun 11, 2018 01:30 65.2%   61.2%
  Jun 04, 2018 01:30 61.2%   63.9%
  May 28, 2018 01:30 63.9%   67.0%
  May 21, 2018 01:30 67.0%   63.2%
  May 14, 2018 01:30 63.2%   61.3%
  May 07, 2018 01:30 61.3%   57.6%
  Apr 30, 2018 01:30 57.6%   64.8%
  Apr 23, 2018 01:30 64.8%   65.2%
  Apr 16, 2018 01:30 65.2%   60.4%
  Apr 09, 2018 01:30 60.4%   63.3%
  Apr 02, 2018 01:30 63.3%   62.1%
  Mar 26, 2018 01:30 62.1%   65.7%
  Mar 19, 2018 02:30 65.7%   56.0%
  Mar 12, 2018 02:30 56.0%   62.3%
  Mar 05, 2018 02:30 62.3%   59.1%
  Feb 26, 2018 02:30 59.1%   52.8%
  Feb 19, 2018 02:30 52.8%   55.8%
  Feb 12, 2018 02:30 55.8%   51.7%
  Feb 05, 2018 02:30 51.7%   56.8%
  Jan 29, 2018 02:30 56.8%   52.2%
  Jan 22, 2018 02:30 52.2%   56.1%
  Jan 15, 2018 02:30 56.1%   60.2%
  Jan 08, 2018 02:30 60.2%   54.6%
  Jan 01, 2018 02:30 54.6%   48.4%
  Dec 25, 2017 02:30 48.4%   66.4%
  Dec 18, 2017 02:30 66.4%   58.9%
  Dec 11, 2017 02:30 58.9%   53.8%
  Dec 04, 2017 02:30 53.8%   55.9%
  Nov 28, 2017 02:30 55.9%   53.7%
  Nov 20, 2017 02:30 53.7%   58.6%
  Nov 14, 2017 02:30 58.6%   52.8%
  Nov 06, 2017 02:30 52.8%   57.6%
  Oct 30, 2017 01:30 57.6%   54.7%
  Oct 23, 2017 01:30 54.7%   58.9%
  Oct 16, 2017 01:30 58.9%   57.3%
  Oct 09, 2017 01:30 57.3%   64.0%
  Oct 02, 2017 01:30 64.0%   47.5%
  Sep 25, 2017 01:30 47.5%   52.2%
  Sep 18, 2017 01:30 52.2%   55.5%
  Sep 11, 2017 01:30 55.5%   54.3%
  Sep 04, 2017 01:30 54.3%   54.2%
  Aug 28, 2017 01:30 54.2%   51.4%
  Aug 21, 2017 01:30 51.4%   57.4%
  Aug 14, 2017 01:30 57.4%   51.2%
  Aug 07, 2017 01:30 51.2%   51.3%
  Jul 31, 2017 01:30 51.3%   52.8%
  Jul 24, 2017 01:30 52.8%   53.3%
  Jul 17, 2017 01:30 53.3%   54.1%
  Jul 10, 2017 01:30 54.1%   51.9%
  Jul 03, 2017 01:30 51.9%   40.6%
  Jun 26, 2017 01:30 40.6%   52.6%
  Jun 19, 2017 01:30 52.6%   51.0%
  Jun 12, 2017 01:30 51.0%   52.1%
  Jun 05, 2017 01:30 52.1%   59.1%
  May 29, 2017 01:30 59.1%   46.9%
  May 22, 2017 01:30 46.9%   53.0%
  May 15, 2017 01:30 53.0%   44.9%
  May 08, 2017 01:30 44.9%   37.0%
  May 01, 2017 01:30 37.0%   43.0%
  Apr 24, 2017 01:30 43.0%   52.4%
  Apr 10, 2017 01:30 52.4%   55.1%
  Apr 03, 2017 01:30 55.1%   43.5%
  Mar 27, 2017 02:30 43.5%   36.0%
  Mar 20, 2017 02:30 36.0%   32.3%
  Mar 13, 2017 02:30 32.3%   42.8%
  Mar 06, 2017 02:30 42.8%   39.1%
  Feb 27, 2017 02:30 39.1%   41.7%
  Feb 20, 2017 02:30 41.7%   43.2%
  Feb 13, 2017 02:30 43.2%   36.6%
  Feb 06, 2017 02:30 36.6%   39.7%
  Jan 30, 2017 02:30 39.7%   33.5%
  Jan 23, 2017 02:30 33.5%   36.8%
  Jan 16, 2017 03:30 36.8%   37.0%
  Jan 09, 2017 02:30 37.0%   41.6%
  Jan 02, 2017 02:30 41.6%   35.8%
  Dec 26, 2016 02:30 35.8%   42.3%
  Dec 19, 2016 02:30 42.3%   39.7%
  Dec 12, 2016 04:15 39.7%   33.8%
  Dec 05, 2016 02:30 33.8%   37.1%
  Nov 29, 2016 02:30 37.1%   41.9%
  Nov 21, 2016 02:30 41.9%   39.1%
  Nov 15, 2016 02:00 39.1%   20.5%
  Nov 07, 2016 02:30 20.5%   27.4%
  Oct 31, 2016 02:30 27.4%   33.4%
  Oct 25, 2016 02:30 33.4%   30.8%
  Oct 18, 2016 02:30 30.8%   26.6%
  Oct 10, 2016 02:30 26.6%   28.6%
  Oct 05, 2016 02:00 28.6%   26.2%
  Sep 26, 2016 02:30 26.2%   34.8%
  Sep 19, 2016 02:30 34.8%   21.2%
  Sep 13, 2016 02:30 21.2%   27.0%
  Sep 05, 2016 02:30 27.0%   32.7%
  Aug 29, 2016 02:30 32.7%   23.9%
  Aug 22, 2016 02:30 23.9%   28.8%
  Aug 15, 2016 02:30 28.8%   30.8%
  Aug 08, 2016 02:30 30.8%   20.3%
  Aug 01, 2016 02:30 20.3%   30.2%
  Jul 25, 2016 02:30 30.2%   29.5%
  Jul 18, 2016 02:30 29.5%   26.2%
  Jul 11, 2016 02:30 26.2%   27.5%
  Jul 04, 2016 02:30 27.5%   26.8%
  Jun 27, 2016 02:30 26.8%   35.1%
  Jun 20, 2016 02:30 35.1%   22.8%
  Jun 13, 2016 02:30 22.8%   32.5%
  Jun 06, 2016 02:30 32.5%   35.6%
  May 30, 2016 02:30 35.6%   39.5%
  May 23, 2016 02:30 39.5%   37.8%
  May 16, 2016 03:30 37.8%   39.5%
  May 09, 2016 02:30 39.5%   30.3%
  May 02, 2016 02:30 30.3%   32.9%
  Apr 25, 2016 02:30 32.9%   29.6%
  Apr 18, 2016 06:00 29.6%   30.5%
  Apr 11, 2016 02:30 30.5%   22.7%
  Apr 04, 2016 03:30 22.7%   32.1%
  Mar 28, 2016 03:30 32.1%   23.2%
  Mar 21, 2016 03:30 23.2%   26.7%
  Mar 14, 2016 03:30 26.7%   22.6%
  Mar 07, 2016 03:30 22.6%   33.7%
  Feb 29, 2016 03:30 33.7%   34.8%
  Feb 22, 2016 03:30 34.8%   33.3%
  Feb 15, 2016 03:30 33.3%   33.3%
  Feb 08, 2016 03:30 33.3%   34.3%
  Feb 01, 2016 03:30 34.3%   33.2%
  Jan 25, 2016 03:30 33.2%   27.0%
  Jan 18, 2016 03:30 27.0%   27.2%
  Jan 11, 2016 03:30 27.2%   30.0%
  Jan 05, 2016 03:30 30.0%   24.0%
  Dec 29, 2015 03:30 24.0%   33.3%
  Dec 21, 2015 03:30 33.3%   31.2%
  Dec 14, 2015 04:30 31.2%   27.1%
  Dec 07, 2015 03:00 27.1%   29.8%
  Dec 01, 2015 03:00 29.8%   27.5%
  Nov 23, 2015 03:00 27.5%   33.1%
  Nov 17, 2015 04:00 33.1%   26.8%
  Nov 09, 2015 02:30 26.8%   24.3%
  Nov 02, 2015 01:30 24.3%   36.4%
  Oct 26, 2015 01:30 36.4%   28.6%
  Oct 19, 2015 01:30 28.6%   25.5%
  Oct 11, 2015 04:30 25.5%   29.6%
  Oct 06, 2015 01:00 29.6%   28.5%
  Sep 28, 2015 01:30 28.5%   29.1%
  Sep 21, 2015 01:30 29.1%   21.2%
  Sep 14, 2015 01:30 21.2%   29.8%
  Sep 07, 2015 01:30 29.8%   36.3%
  Aug 31, 2015 01:30 36.3%   35.6%
  Aug 24, 2015 01:30 35.6%   26.4%
  Aug 17, 2015 01:30 26.4%   24.8%
  Aug 10, 2015 01:30 24.8%   29.7%
  Aug 03, 2015 01:30 29.7%   24.8%
  Jul 27, 2015 01:30 24.8%   30.7%
  Jul 20, 2015 01:30 30.7%   27.9%
  Jul 13, 2015 01:30 27.9%   27.4%
  Jul 07, 2015 01:30 27.4%   26.8%
  Jun 29, 2015 01:30 26.8%   33.1%
  Jun 22, 2015 01:30 33.1%   33.6%
  Jun 15, 2015 03:30 33.6%   28.9%
  Jun 08, 2015 01:30 28.9%   23.0%
  Jun 01, 2015 01:30 23.0%   34.0%
  May 25, 2015 04:00 34.0%   28.9%
  May 18, 2015 01:30 28.9%   28.8%
  May 11, 2015 01:30 28.8%   28.3%
  May 04, 2015 02:00 28.3%   23.7%
  Apr 27, 2015 01:30 23.7%   27.2%
  Apr 20, 2015 01:30 27.2%   33.7%
  Apr 13, 2015 02:00 33.7%   23.2%
  Apr 06, 2015 02:00 23.2%   19.8%
  Mar 30, 2015 02:30 19.8%   24.1%
  Mar 23, 2015 02:30 24.1%   27.2%
  Mar 16, 2015 03:00 27.2%   35.6%
  Mar 09, 2015 02:30 35.6%   34.4%
  Mar 02, 2015 02:30 34.4%   30.2%
  Feb 23, 2015 02:30 30.2%   26.6%
  Feb 16, 2015 03:30 26.6%   23.8%
  Feb 09, 2015 02:30 23.8%   26.4%
  Feb 02, 2015 02:30 26.4%   23.9%
  Jan 26, 2015 02:30 23.9%   28.9%
  Jan 19, 2015 02:30 28.9%   35.5%
  Jan 12, 2015 02:30 35.5%   38.1%
  Jan 06, 2015 03:30 38.1%   40.6%
  Jan 01, 2015 02:30 40.6%   45.2%
  Dec 22, 2014 02:00 45.2%   39.8%
  Dec 15, 2014 02:00 39.8%   41.7%
  Dec 07, 2014 21:00 41.7%   33.8%
  Dec 02, 2014 03:00 33.8%   38.6%
  Nov 24, 2014 01:30 38.6%   39.2%
  Nov 17, 2014 01:00 39.2%   33.1%
  Nov 10, 2014 01:00 33.1%   35.4%
  Nov 04, 2014 03:00 35.4%   37.3%
  Oct 27, 2014 02:00 37.3%   33.7%
  Oct 19, 2014 22:00 33.7%   36.2%
  Oct 13, 2014 01:00 36.2%   44.5%
  Oct 06, 2014 01:00 44.5%   41.3%
  Sep 29, 2014 01:00 41.3%   50.3%
  Sep 21, 2014 22:35 50.3%   39.5%
  Sep 15, 2014 00:45 39.5%   39.9%
  Sep 08, 2014 01:00 39.9%   42.8%
  Sep 01, 2014 02:35 42.8%   41.9%
  Aug 25, 2014 01:00 41.9%   38.9%
  Aug 18, 2014 01:00 38.9%   34.0%
  Aug 11, 2014 01:00 34.0%   38.2%
  Aug 04, 2014 01:00 38.2%   38.4%
  Jul 28, 2014 01:00 38.4%   42.3%
  Jul 21, 2014 01:00 42.3%   37.2%
  Jul 14, 2014 01:00 37.2%   39.6%
  Jul 07, 2014 01:00 39.6%   39.8%
  Jun 30, 2014 01:00 39.8%   36.1%
  Jun 23, 2014 00:30 36.1%   37.6%
  Jun 16, 2014 00:30 37.6%   36.5%
  Jun 09, 2014 00:30 36.5%   44.1%
  Jun 01, 2014 22:00 44.1%   49.4%
  May 26, 2014 00:30 49.4%   41.0%
  May 19, 2014 00:00 41.0%   55.0%
  May 12, 2014 00:00 55.0%   41.1%
  May 04, 2014 06:00 41.1%   43.5%
  Apr 27, 2014 06:00 43.5%   40.3%
  Apr 06, 2014 06:00 40.3%
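
Lines shaped like the console output above could be turned into typed values for further analysis. The sketch below is an illustration under assumptions, not part of the original answer: `parse_line` is a hypothetical helper, the meaning of each percentage column is not assumed, and `%b` expects English month abbreviations (the default C locale).

```python
import re
from datetime import datetime

# A timestamp such as "Sep 17, 2018 01:30", followed by anything else.
STAMP_RE = re.compile(r"^\s*([A-Z][a-z]{2} \d{2}, \d{4} \d{2}:\d{2})(.*)$")
PERCENT_RE = re.compile(r"(-?\d+(?:\.\d+)?)%")


def parse_line(line):
    """Parse one output line into (timestamp, [percentages]).

    The list holds zero, one, or two floats, because the first and
    last lines of the output above carry fewer values than the rest.
    """
    match = STAMP_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    stamp_text, rest = match.groups()
    stamp = datetime.strptime(stamp_text, "%b %d, %Y %H:%M")
    values = [float(v) for v in PERCENT_RE.findall(rest)]
    return stamp, values


for line in ["  Sep 24, 2018 01:30", "  Sep 17, 2018 01:30 53.1%   55.3%"]:
    stamp, values = parse_line(line)
    print(stamp.isoformat(), values)
# 2018-09-24T01:30:00 []
# 2018-09-17T01:30:00 [53.1, 55.3]
```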

Originally posted by undetected Selenium; translation licensed under CC BY-SA 4.0
