scrapyd
Installation:
sudo pip install scrapyd
Configuration:
# File: ~/.scrapyd.conf
# Contents:
[scrapyd]
eggs_dir = /home/sirius/scrapyd/eggs
logs_dir = /home/sirius/scrapyd/logs
items_dir = /home/sirius/scrapyd/items
jobs_to_keep = 5
dbs_dir = /home/sirius/scrapyd/dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 50
poll_interval = 5
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
#daemonstatus.json = scrapyd.webservice.DaemonStatus
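A quick way to sanity-check the configuration above is to read it back and make sure the directories it names exist. The following is a minimal sketch, not part of scrapyd itself: the helper names `scrapyd_dirs` and `ensure_dirs` are made up for illustration, and it assumes the file lives at ~/.scrapyd.conf as described. Whether scrapyd creates these directories on its own may depend on the version, so treat this as an optional precaution.

```python
import configparser
from pathlib import Path

# Keys in the [scrapyd] section that name directories (taken from the config above).
DIR_KEYS = ("eggs_dir", "logs_dir", "items_dir", "dbs_dir")


def scrapyd_dirs(conf_text):
    """Return the directory settings found in the [scrapyd] section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["scrapyd"]
    return {key: Path(section[key]) for key in DIR_KEYS if key in section}


def ensure_dirs(dirs):
    """Create each directory (and its parents) if it does not exist yet."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # Assumed location, as stated in the notes above.
    conf_path = Path.home() / ".scrapyd.conf"
    dirs = scrapyd_dirs(conf_path.read_text())
    ensure_dirs(dirs)
    for key, path in dirs.items():
        print(f"{key}: {path}")
```

Run it once as the user that will own the scrapyd process, so the directories end up with the right permissions.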
supervisor
A process supervisor. It is used here because scrapyd is quite fragile: left unwatched, it tends to crash.
Installation:
sudo pip install supervisor
Configuration:
sudo mkdir -p /etc/supervisor/
# Generate the default configuration
sudo su - root -c "echo_supervisord_conf > /etc/supervisor/supervisord.conf"
# Connection management
[inet_http_server] ; inet (TCP) server disabled by default
port=127.0.0.1:9001 ; (ip_address:port specifier, *:port for all iface)
;username=user ; (default is no username (open server))
;password=123 ; (default is no password (open server))
[supervisorctl]
;serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket
;username=chris ; should be same as http_username if set
;password=123 ; should be same as http_password if set
;prompt=mysupervisor ; cmd line prompt (default "supervisor")
;history_file=~/.sc_history ; use readline history if available
# Define the managed process
[program:scrapyd]
command=scrapyd
autostart=true
autorestart=unexpected
Startup
Create the file /usr/lib/systemd/system/supervisord.service with the following contents:
[Unit]
Description=supervisord - Supervisor process control system for UNIX
Documentation=http://supervisord.org
After=network.target
[Service]
Type=forking
ExecStart=/usr/bin/supervisord -c /etc/supervisor/supervisord.conf
ExecReload=/usr/bin/supervisorctl reload
ExecStop=/usr/bin/supervisorctl shutdown
User=<user>
[Install]
WantedBy=multi-user.target
# Start
sudo systemctl enable supervisord
sudo systemctl start supervisord
# Check
supervisorctl
# If everything is working, the output looks like:
|>$ scrapyd RUNNING pid 8059, uptime 0:02:02
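# Common commands
status # show process status
reload # restart supervisord and reload its configuration
restart scrapyd # restart the scrapyd process
update # apply configuration changes
tail -f scrapyd stderr # follow scrapyd's stderr log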
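The same status check can be scripted over supervisor's XML-RPC interface, which is served by the [inet_http_server] section configured earlier. This is a rough sketch that assumes the 127.0.0.1:9001 address from that section and no username or password; `process_state` is a hypothetical helper name, not something supervisor provides.

```python
import xmlrpc.client


def process_state(server, name="scrapyd"):
    """Return the supervisor state name (e.g. RUNNING) for one process."""
    info = server.supervisor.getProcessInfo(name)
    return info["statename"]


if __name__ == "__main__":
    # Matches port=127.0.0.1:9001 in [inet_http_server]; /RPC2 is the XML-RPC path.
    server = xmlrpc.client.ServerProxy("http://127.0.0.1:9001/RPC2")
    print(process_state(server))
```

A script like this could be called from cron or a monitoring check to raise an alert when the state is anything other than RUNNING.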
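Spider deployment:
Deploy:
# scrapyd-deploy comes from the scrapyd-client package in recent releases
cd <project directory>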
scrapyd-deploy
API control:
curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
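For scripting, the curl call above can be reproduced with the Python standard library. The sketch below assumes scrapyd is listening on localhost:6800 as configured earlier and reuses the placeholder names myproject and somespider from the curl example; `build_schedule_request` and `schedule` are illustrative helper names, not part of scrapyd.

```python
import json
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://localhost:6800"  # matches http_port = 6800 in the config


def build_schedule_request(project, spider, base_url=SCRAPYD_URL, **spider_args):
    """Build a POST request for schedule.json without sending it."""
    # Extra keyword arguments are sent as additional form fields.
    fields = {"project": project, "spider": spider, **spider_args}
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")


def schedule(project, spider, **spider_args):
    """Send the request and return scrapyd's decoded JSON reply."""
    request = build_schedule_request(project, spider, **spider_args)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    print(schedule("myproject", "somespider"))
```

Splitting request construction from sending keeps the first function easy to inspect or test without a running scrapyd instance.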