shot-scraper 1.6 with support for HTTP Archives

Published on February 14

H1: shot-scraper 1.6 released, with HTTP Archive support

Version 1.6 of shot-scraper, a CLI tool for taking screenshots and scraping web pages, has been released.
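H2: New feature - HTTP Archive (HAR) support
The new shot-scraper har command creates an archive of a page and all of its dependencies. For example, shot-scraper har https://datasette.io/ produces a datasette-io.har file (about 163KB). The file is JSON that represents every request used to render the page. It includes a full copy of each response, with binary files such as images encoded as base64.
Adding the --zip flag produces a datasette-io.har.zip file instead, which contains the JSON data in har.har along with the response bodies saved as separate files.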
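To make the structure described above concrete, here is a minimal Python sketch that loads a HAR file and lists each recorded request with the size of its embedded response body. It assumes the standard HAR layout (log.entries, with each response body under response.content.text and an optional encoding of "base64"). The function names load_har, response_body and summarize are hypothetical helpers, not part of shot-scraper.

```python
import base64
import json
import zipfile


def load_har(path):
    """Load a HAR file, or the har.har member of a .har.zip archive."""
    if zipfile.is_zipfile(path):
        # Assumption taken from the text above: the zip holds the JSON in har.har.
        with zipfile.ZipFile(path) as archive:
            return json.loads(archive.read("har.har"))
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def response_body(entry):
    """Return the embedded response body of one HAR entry as bytes, or None."""
    content = entry["response"].get("content", {})
    text = content.get("text")
    if text is None:
        return None
    if content.get("encoding") == "base64":
        # Binary responses such as images are stored base64-encoded.
        return base64.b64decode(text)
    return text.encode("utf-8")


def summarize(har):
    """Yield (url, status, body size in bytes) for each recorded request."""
    for entry in har["log"]["entries"]:
        body = response_body(entry)
        yield entry["request"]["url"], entry["response"]["status"], len(body or b"")


if __name__ == "__main__":
    for url, status, size in summarize(load_har("datasette-io.har")):
        print(status, size, url)
```

For a .har.zip archive the response bodies live in separate files rather than in the JSON, so this sketch would report a size of 0 for those entries.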
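H2: The shot-scraper multi command
The shot-scraper multi command runs shot-scraper against multiple URLs in sequence, as specified in a YAML file. It now accepts a --har option (or --har-zip, or --har-file name-of-file), described in the documentation, which generates a HAR at the same time as taking the screenshots.
Shots are usually defined in YAML like this:

    - output: example.com.png
      url: http://www.example.com/
    - output: w3c.org.png
      url: https://www.w3.org/

You can now omit the output: key to generate a HAR file without taking any screenshots:

    - url: http://www.example.com/
    - url: https://www.w3.org/

Running shot-scraper multi shots.yml --har then outputs:

    Skipping screenshot...
    Wrote to HAR file: trace.har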
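The HAR-only workflow above could be scripted along these lines. This is a rough sketch, assuming shot-scraper is installed and on the PATH. The helper names shots_yaml and run_multi_har are hypothetical, and the YAML is built by simple string formatting, which only works for URLs that need no YAML quoting.

```python
import subprocess
from pathlib import Path


def shots_yaml(urls):
    """Render a shots file with one url-only entry per URL (no output: key)."""
    return "".join(f"- url: {url}\n" for url in urls)


def run_multi_har(urls, shots_path="shots.yml"):
    """Write the shots file, then run shot-scraper multi with --har."""
    Path(shots_path).write_text(shots_yaml(urls), encoding="utf-8")
    # Same invocation as in the text: shot-scraper multi shots.yml --har
    subprocess.run(["shot-scraper", "multi", shots_path, "--har"], check=True)


if __name__ == "__main__":
    run_multi_har(["http://www.example.com/", "https://www.w3.org/"])
```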
H3: Technical implementation
shot-scraper is built on Playwright, and the new feature uses the record_har_path parameter, as in browser.new_context(record_har_path=...).
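For readers who want to see the underlying mechanism, the following is a minimal sketch of recording a HAR directly with Playwright's Python sync API. It is not shot-scraper's actual code. The function name record_har, the choice of Chromium, and the file names are assumptions made for illustration.

```python
def record_har(url, har_path="trace.har"):
    """Load url in headless Chromium and record its requests to har_path."""
    # Imported here so the module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # record_har_path turns on HAR recording for every page in this context.
        context = browser.new_context(record_har_path=har_path)
        page = context.new_page()
        page.goto(url)
        # Playwright writes the HAR file when the context is closed.
        context.close()
        browser.close()


if __name__ == "__main__":
    record_har("https://datasette.io/", "datasette-io.har")
```

Closing the context before the browser matters here, because the HAR is only saved to disk once the context is closed.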
