ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)
Problem
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)
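This error occurs when training code runs inside a Docker container on a server with the batch size set too large, so shared memory runs out (Docker limits a container's shm, 64 MB by default).
According to the PyTorch README: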
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
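To confirm that shared memory is the bottleneck, it can help to check how large the shm mount inside the container actually is. The following is a minimal sketch using only the Python standard library. It assumes a Linux container where shared memory is mounted at /dev/shm; the helper name shm_usage is made up for illustration and is not part of PyTorch.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for the given mount, or None if it is missing.

    `path` defaults to /dev/shm, the usual shared-memory mount on Linux (an assumption).
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


if __name__ == "__main__":
    info = shm_usage()
    if info is None:
        print("/dev/shm not found; this check only applies to Linux-style containers")
    else:
        total, used, free = info
        mib = 1024 ** 2
        print(f"shm total: {total / mib:.0f} MiB, used: {used / mib:.0f} MiB, free: {free / mib:.0f} MiB")
```

If the reported total is only a few tens of MiB, the container is most likely running with Docker's default shm size.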
Solution
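1. As the README explains, PyTorch uses shared memory for IPC, so the shared memory segment must be large enough. Its size can be changed with docker run --shm-size.
2. Run the container with --ipc=host.
3. Set the DataLoader's num_workers to 0. Training will be slower, however.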
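When the container cannot be restarted with different flags, option 3 can be applied conditionally. The sketch below falls back to zero workers only when free shared memory is low. The helper names and the 1 GiB threshold are assumptions chosen for illustration, not values recommended by PyTorch.

```python
import os
import shutil

# Hypothetical threshold: 1 GiB is an arbitrary illustration, not a PyTorch recommendation.
MIN_FREE_SHM_BYTES = 1 << 30


def free_shm_bytes(path="/dev/shm"):
    """Return free bytes on the shm mount, or None if the mount does not exist."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free


def choose_num_workers(requested, shm_free, min_free=MIN_FREE_SHM_BYTES):
    """Return 0 when free shared memory is known to be below min_free, else `requested`."""
    if shm_free is not None and shm_free < min_free:
        return 0
    return requested


if __name__ == "__main__":
    # Pass the result as the num_workers argument when constructing the DataLoader.
    print("num_workers =", choose_num_workers(4, free_shm_bytes()))
```

The result would then be passed as num_workers when building torch.utils.data.DataLoader. With zero workers, data loading happens in the main process, so no shared memory is needed to pass batches between processes, at the cost of slower training.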