Problem:
GPU memory is not released, even after the process that used it has been killed.
Solution:
1. Reset the GPU (here, the GPU with index 5):
sudo nvidia-smi --gpu-reset -i 5
If the following error appears:
GPU 00000000:B2:00.0 is currently in use by another process.
1 device is currently being used by one or more other processes (e.g., Fabric Manager, CUDA application, graphics application such as an X server, or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using this device and all compute applications running in the system.
2. Find and kill the processes that are still holding the GPU:
ls /proc/*/fd/* -l | grep /dev/nvidia5
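The same search can be sketched as a small shell function that prints only the PIDs, which is easier to pass on to `kill`. This is a minimal sketch, not a definitive tool: the name `find_gpu_pids` and its optional second argument (the proc root, defaulting to `/proc`) are hypothetical, and it matches the device path exactly, whereas the `grep` above matches any path containing `/dev/nvidia5`.

```shell
#!/usr/bin/env bash
# Print the PIDs that hold an open file descriptor on a device node.
# find_gpu_pids is a hypothetical helper name, not part of any NVIDIA tool.
#   $1: device path, e.g. /dev/nvidia5
#   $2: proc root (assumption: defaults to /proc)
find_gpu_pids() {
    local device="$1"
    local proc_root="${2:-/proc}"
    local fd target pid
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Skip descriptors that cannot be read or that have already closed.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "$device" ]; then
            # Extract <PID> from <proc_root>/<PID>/fd/<N>.
            pid=${fd#"$proc_root"/}
            pid=${pid%%/*}
            echo "$pid"
        fi
    done | sort -un
}

# Example: find_gpu_pids /dev/nvidia5
```

Descriptors of processes owned by other users are only readable by root, so the function may need to run in a root shell to see every holder. Review the printed PIDs before killing anything.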
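After killing those processes (the PID is the number in each matching /proc/<PID>/fd/... path), the problem is solved.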