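Looking for some guidance from the experts here. I've just started with Python web scraping and have run into a problem. Many thanks in advance!
I want to scrape some English news headlines and store them in a CSV file.
My code is as follows: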
import csv, requests, re
from bs4 import BeautifulSoup

urls = ['https://www.defense.gov/News/Archive/?Page={}'.format(str(i)) for i in range(1,10)]

def get_titles(urls,data = None):
    html = requests.get(urls).text
    soup = BeautifulSoup(html, 'html.parser')
    articles = []
    for article in soup.find_all(class_='info'):
        Label = 'Archive'
        News = article.find(class_='title').get_text()
        articles.append([Label,News])
        with open(r'1.csv','a', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['Label','News'])
            for row in articles:
                writer.writerow(row)

for titles in urls:
    get_titles(titles)
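I wanted to scrape the news headlines from pages 1 to 9 this way, but the final output looks like this:
Every time a new headline is added, all the earlier headlines are written to the CSV again.
Any guidance would be appreciated!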
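The cause is that the articles list built up earlier is never cleared, so each write outputs the previous data again. Resetting the list variable after writing, with articles = [], is enough. Hope this helps everyone.
Also, move writer.writerow(['Label','News']) outside the loop so that the Label and News header is not written every time.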
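For anyone who wants to see the whole thing put together, here is a minimal sketch of one way to restructure the script so the duplication cannot happen. It is not the original poster's final code. The function names fetch_titles and save_titles are made up for illustration, the timeout value and the switch from append mode 'a' to write mode 'w' are my own assumptions (with 'a', rerunning the script would add a second header and a second copy of every row), and the selectors class_='info' and class_='title' are taken from the question and may not match the site today.

```python
import csv


def fetch_titles(url):
    """Download one archive page and return its headlines as a list of strings.

    Hypothetical helper: the CSS classes come from the question above.
    """
    # Imported here so save_titles() can be used without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    html = requests.get(url, timeout=10).text  # timeout value is an assumption
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for article in soup.find_all(class_='info'):
        title = article.find(class_='title')
        if title is not None:  # skip blocks that have no title element
            titles.append(title.get_text(strip=True))
    return titles


def save_titles(path, pages, label='Archive'):
    """Write the header once, then one row per headline; return the row count.

    `pages` is any iterable of lists of headline strings, one list per page.
    """
    count = 0
    # 'w' starts a fresh file on every run, so reruns do not pile up old rows.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Label', 'News'])  # header is outside every loop
        for titles in pages:
            for title in titles:
                writer.writerow([label, title])  # each headline is written exactly once
                count += 1
    return count


# Example usage (needs network access, so it is left commented out):
# urls = ['https://www.defense.gov/News/Archive/?Page={}'.format(i) for i in range(1, 10)]
# save_titles('1.csv', (fetch_titles(url) for url in urls))
```

The point of the design is that the file is opened once and each row is written at the moment it is produced, so there is no accumulated list that can be written out twice.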