我正在尝试在我拥有的笔记本电脑上运行 Pytorch。这是一个较旧的型号,但它确实有一个 Nvidia 显卡。我意识到这可能不足以进行真正的机器学习,但我正在尝试这样做,以便我可以了解安装 CUDA 的过程。
我已按照 Ubuntu 18.04 安装指南 中的步骤进行操作(我的特定发行版是 Xubuntu)。
我的显卡是 GeForce 845M,通过 lspci | grep nvidia
验证:
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce 845M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
我也安装了 gcc 7.5,由 gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
我安装了正确的标头,通过尝试使用 sudo apt-get install linux-headers-$(uname -r)
安装它们来验证:
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-4.15.0-106-generic is already the newest version (4.15.0-106.107).
然后我按照安装说明使用 10.1 版的本地 .deb。
Npw,当我运行 nvidia-smi
时,我得到:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 845M On | 00000000:01:00.0 Off | N/A |
| N/A 40C P0 N/A / N/A | 88MiB / 2004MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 982 G /usr/lib/xorg/Xorg 87MiB |
+-----------------------------------------------------------------------------+
我运行 nvcc -V
我得到:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
然后我执行了 第 6.1 节 中的安装后说明,因此, echo $PATH
看起来像这样:
/home/isaek/anaconda3/envs/stylegan2_pytorch/bin:/home/isaek/anaconda3/bin:/home/isaek/anaconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $LD_LIBRARY_PATH
看起来像这样:
/usr/local/cuda-10.1/lib64
而我的 /etc/udev/rules.d/40-vm-hotadd.rules
文件如下所示:
# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"
LABEL="vm_hotadd_apply"
# Memory hotadd request
# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"
LABEL="vm_hotadd_end"
毕竟,我什至编译并运行了示例。 ./deviceQuery
返回:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 845M"
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2004 MBytes (2101870592 bytes)
( 4) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
GPU Max Clock rate: 863 MHz (0.86 GHz)
Memory Clock rate: 1001 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS
和 ./bandwidthTest
返回:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce 845M
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.7
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.8
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 14.5
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
但毕竟,这个 Python 片段(在安装了所有依赖项的 conda 环境中):
import torch
torch.cuda.is_available()
返回 False
有人知道如何解决这个问题吗?我尝试将 /usr/local/cuda-10.1/bin
添加到 etc/environment
如下所示:
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
PATH=$PATH:/usr/local/cuda-10.1/bin
并重新启动终端,但这并没有解决它。我真的不知道还能尝试什么。
编辑 - @kHarshit 的 collect_env 的结果
Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: No
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce 845M
Nvidia driver version: 418.87.00
cuDNN version: Could not collect
Versions of relevant libraries:
[pip] numpy==1.18.5
[pip] pytorch-ranger==0.1.1
[pip] stylegan2-pytorch==0.12.0
[pip] torch==1.5.0
[pip] torch-optimizer==0.0.1a12
[pip] torchvision==0.6.0
[pip] vector-quantize-pytorch==0.0.2
[conda] numpy 1.18.5 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] stylegan2-pytorch 0.12.0 pypi_0 pypi
[conda] torch 1.5.0 pypi_0 pypi
[conda] torch-optimizer 0.0.1a12 pypi_0 pypi
[conda] torchvision 0.6.0 pypi_0 pypi
[conda] vector-quantize-pytorch 0.0.2 pypi_0 pypi
原文由 wfgeo 发布,翻译遵循 CC BY-SA 4.0 许可协议
PyTorch 不使用系统的 CUDA 库。当您使用预编译的二进制文件安装 PyTorch 时,使用
pip
或conda
它附带了本地安装的指定版本的 CUDA 库的副本。事实上,您甚至不需要在系统上安装 CUDA 即可使用支持 CUDA 的 PyTorch。有两种情况可能导致您的问题。
您安装了仅 CPU 版本的 PyTorch。在这种情况下,PyTorch 没有使用 CUDA 支持进行编译,因此它不支持 CUDA。
您安装了 PyTorch 的 CUDA 10.2 版本。在这种情况下,问题在于您的显卡当前使用的是 418.87 驱动程序,该驱动程序仅支持 CUDA 10.1。在这种情况下,两个潜在的修复方法是安装更新的驱动程序(根据 表 2 的版本 >= 440.33)或安装针对 CUDA 10.1 编译的 PyTorch 版本。
要确定安装 PyTorch 时要使用的适当命令,您可以使用 pytorch.org 上“安装 PyTorch ”部分中的方便小部件。只需选择适当的操作系统、包管理器和 CUDA 版本,然后运行推荐的命令。
在您的情况下,一种解决方案是使用
它向 conda 明确指定您要安装针对 CUDA 10.1 编译的 PyTorch 版本。
有关 PyTorch CUDA 与驱动程序和硬件的兼容性的更多信息,请参阅 此答案。
编辑 添加
collect_env
的输出后,我们可以看到问题在于您安装了 CUDA 10.2 版本的 PyTorch。基于此,另一种解决方案是更新图形驱动程序,如第 2 项和链接答案中所述。