I've seen several solutions for scraping multiple pages from a website, but I can't get any of them to work with my code.
Right now I have the following code, which scrapes the first page. I'd like to create a loop that scrapes all of the site's pages (from page 1 to page 5):
import time

import pandas as pd
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Configure Chrome with a fixed window size and a random user agent.
options = Options()
options.add_argument("window-size=1400,600")
ua = UserAgent()
user_agent = ua.random
print(user_agent)
options.add_argument(f'user-agent={user_agent}')

driver = webdriver.Chrome('/Users/raduulea/Documents/chromedriver', options=options)
driver.get('https://www.immoweb.be/fr/recherche/immeuble-de-rapport/a-vendre/liege/4000?page=1')
time.sleep(10)  # give the page time to finish loading

# Parse the rendered page and collect one list per field.
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
results = soup.find_all("div", {"class": "result-xl"})

title = []
address = []
price = []
surface = []
desc = []

for result in results:
    title.append(result.find("div", {"class": "title-bar-left"}).get_text().strip())
    address.append(result.find("span", {"class": "result-adress"}).get_text().strip())
    price.append(result.find("div", {"class": "xl-price rangePrice"}).get_text().strip())
    surface.append(result.find("div", {"class": "xl-surface-ch"}).get_text().strip())
    desc.append(result.find("div", {"class": "xl-desc"}).get_text().strip())

df = pd.DataFrame({"Title": title, "Address": address, "Price": price, "Surface": surface, "Description": desc})
df.to_csv("output.csv")
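For the page-1-to-5 case specifically, one straightforward pattern (a sketch continuing from the script above, assuming each results page is reachable by changing the page query parameter, as the URL suggests, and that every page uses the same markup) is to move the fetch-and-parse step into a loop:

# Sketch: replaces the single driver.get(...) call and parsing block above.
# Assumes pages 1-5 all exist and share the same result markup (untested).
base_url = 'https://www.immoweb.be/fr/recherche/immeuble-de-rapport/a-vendre/liege/4000?page={}'

for page in range(1, 6):  # pages 1 through 5
    driver.get(base_url.format(page))
    time.sleep(10)  # wait for the current page to render
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for result in soup.find_all("div", {"class": "result-xl"}):
        title.append(result.find("div", {"class": "title-bar-left"}).get_text().strip())
        address.append(result.find("span", {"class": "result-adress"}).get_text().strip())
        price.append(result.find("div", {"class": "xl-price rangePrice"}).get_text().strip())
        surface.append(result.find("div", {"class": "xl-surface-ch"}).get_text().strip())
        desc.append(result.find("div", {"class": "xl-desc"}).get_text().strip())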
Try the code below. It will loop through all of the pages, not just the first 5: it checks whether the Next button is available and clicks it if so, and otherwise breaks out of the while loop.
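A minimal sketch of that approach (not the answer's exact code; the Next-button selector "a.pagination-next" is an assumption, so inspect the site's pagination markup for the real one):

import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome('/Users/raduulea/Documents/chromedriver')
driver.get('https://www.immoweb.be/fr/recherche/immeuble-de-rapport/a-vendre/liege/4000?page=1')

rows = []
while True:
    time.sleep(10)  # wait for the current page to render
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for result in soup.find_all("div", {"class": "result-xl"}):
        rows.append({
            "Title": result.find("div", {"class": "title-bar-left"}).get_text().strip(),
            "Address": result.find("span", {"class": "result-adress"}).get_text().strip(),
            "Price": result.find("div", {"class": "xl-price rangePrice"}).get_text().strip(),
            "Surface": result.find("div", {"class": "xl-surface-ch"}).get_text().strip(),
            "Description": result.find("div", {"class": "xl-desc"}).get_text().strip(),
        })
    try:
        # Click the Next button if it exists; on the last page it is missing,
        # which raises NoSuchElementException and ends the loop.
        driver.find_element(By.CSS_SELECTOR, "a.pagination-next").click()
    except NoSuchElementException:
        break

driver.quit()
pd.DataFrame(rows).to_csv("output.csv")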