Using CUDA 11.x but getting error: Unknown CUDA arch (8.6) or GPU not supported - pytorch

I'm setting up a conda environment to use pytorch 1.4.0 (on Ubuntu 20.04.2), but getting the error message:
ValueError: Unknown CUDA arch (8.6) or GPU not supported
I know this has been asked before, but no answer fits my case. This answer suggests that the CUDA version is too old. However, I updated my CUDA version to the most recent, and get the same error message.
nvcc -V says I have CUDA 11 installed, and when I run nvidia-smi I get this info:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84 Driver Version: 460.84 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
which, according to the NVIDIA docs, should work be compatible:
Another auxilliary question: What does the "8.6" in CUDA arch (8.6) represent?

Specific versions of PyTorch work only with specific versions of CUDA.
If you are using CUDA-11.1, you'll need a fairly recent version of PyTorch. You need to either upgrade your PyTorch, or downgrade your CUDA.

It seems you can grab PyTorch v1.4 for CUDA 10.0 from here:
pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

Related

THC/THC.h: No such file or directory

I am trying to compile this with cuda support: https://github.com/CharlesShang/DCNv2 for a Project. But everytime I try it gives me this error message:
/THC.h: No such file or directory
9 | #include <THC/THC.h>
I am using:
Arch Linux with kernel version 6.1.4
GTX 1080
python 3.6
pytorch 1.2.0
torchvision 0.4.0
cudatoolkit 10.0
gcc 7.5
I thought it might be incompatible cuda and gcc versions, but I tried multiple combinations and none of them worked. At the moment I am using cuda 10.0 with gcc version 7.5 as it should be compatible.
Any help is greatly appreciated.

Tensorflow 2.2 and cudnn 8.0.3 not working together as they should. It still looks for cudnn 7.6.5 dll files

I have Tensorflow 2.2 and Cuda 10.1 with cuDnn 8.0.3
I am unable to run my scripts because it keeps looking for cuDnn 7 dll file: cudnn64_7.dll
I get the following:
Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
Even though I installed the newly published cuDnn 8.0.3 for Cuda 10.1 (see cuDNN 8.x support matrix)
I went back to cuDNN 7.6.5 but I was hoping to get the "5 times faster" cuDNN v8.0 as NVIDIA claims.
Any help or workarounds on how to get this done? Googling gets me literally less than 5 results! as it seems not many got to try the new 8.0.3 (the one for 10.1)
Had the same issue. 8.0.3 version is the current and latest supported version of the library for CUDA 10.1. However, tensorflow is build for the earlier version, so you have to use that instead.
To elaborate, if you check this page: https://www.tensorflow.org/install/source_windows#tested_build_configurations
+----------------------+----------------+-----------+-------------+-------+------+
| Version | Python version | Compiler | Build tools | cuDNN | CUDA |
+----------------------+----------------+-----------+-------------+-------+------+
| tensorflow_gpu-2.3.0 | 3.5-3.8 | MSVC 2019 | Bazel 3.1.0 | 7.6 | 10.1 |
+----------------------+----------------+-----------+-------------+-------+------+
so, unless you build the TF locally - you have to use the supported version of cudnn.
That being said, however, if you check latest TF releases:
https://github.com/tensorflow/tensorflow/releases
you will then see the following TensorFlow 2.4.0-rc1 note:
TensorFlow pip packages are now built with CUDA11 and cuDNN 8.0.2.
You can use the release candidate version of TF, but then, you also have to upgrade CUDA to 11 (I am guessing version 11.0 since no postfix is mentioned) and use the cuDNN v8.0.2 (July 24th, 2020), for CUDA 11.0.
Just tested - this setup works. You just have to make sure to install numpy version 1.19.3 in order to avoid the problem mentioned in these threads
RuntimeError: The current Numpy installation fails to pass a sanity check due to a bug in the windows runtime
https://developercommunity.visualstudio.com/content/problem/1207405/fmod-after-an-update-to-windows-2004-is-causing-a.html

Pytorch says that CUDA is not available (on Ubuntu)

I'm trying to run Pytorch on a laptop that I have. It's an older model but it does have an Nvidia graphics card. I realize it is probably not going to be sufficient for real machine learning but I am trying to do it so I can learn the process of getting CUDA installed.
I have followed the steps on the installation guide for Ubuntu 18.04 (my specific distribution is Xubuntu).
My graphics card is a GeForce 845M, verified by lspci | grep nvidia:
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce 845M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
I also have gcc 7.5 installed, verified by gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And I have the correct headers installed, verified by trying to install them with sudo apt-get install linux-headers-$(uname -r):
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-4.15.0-106-generic is already the newest version (4.15.0-106.107).
I then followed the installation instructions using a local .deb for version 10.1.
Now, when I run nvidia-smi, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 845M On | 00000000:01:00.0 Off | N/A |
| N/A 40C P0 N/A / N/A | 88MiB / 2004MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 982 G /usr/lib/xorg/Xorg 87MiB |
+-----------------------------------------------------------------------------+
and I run nvcc -V I get:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
I then performed the post-installation instructions from section 6.1, and so as a result, echo $PATH looks like this:
/home/isaek/anaconda3/envs/stylegan2_pytorch/bin:/home/isaek/anaconda3/bin:/home/isaek/anaconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $LD_LIBRARY_PATH looks like this:
/usr/local/cuda-10.1/lib64
and my /etc/udev/rules.d/40-vm-hotadd.rules file looks like this:
# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"
LABEL="vm_hotadd_apply"
# Memory hotadd request
# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"
LABEL="vm_hotadd_end"
After all of this, I even compiled and ran the samples. ./deviceQuery returns:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 845M"
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2004 MBytes (2101870592 bytes)
( 4) Multiprocessors, (128) CUDA Cores/MP: 512 CUDA Cores
GPU Max Clock rate: 863 MHz (0.86 GHz)
Memory Clock rate: 1001 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS
and ./bandwidthTest returns:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce 845M
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.7
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.8
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 14.5
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
But after all of this, this Python snippet (in a conda environment with all dependencies installed):
import torch
torch.cuda.is_available()
returns False
Does anybody have any idea about how to resolve this? I've tried to add /usr/local/cuda-10.1/bin to etc/environment like this:
PATH=$PATH:/usr/local/cuda-10.1/bin
And restarting the terminal, but that didn't fix it. I really don't know what else to try.
EDIT - Results of collect_env for #kHarshit
Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: No
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce 845M
Nvidia driver version: 418.87.00
cuDNN version: Could not collect
Versions of relevant libraries:
[pip] numpy==1.18.5
[pip] pytorch-ranger==0.1.1
[pip] stylegan2-pytorch==0.12.0
[pip] torch==1.5.0
[pip] torch-optimizer==0.0.1a12
[pip] torchvision==0.6.0
[pip] vector-quantize-pytorch==0.0.2
[conda] numpy 1.18.5 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] stylegan2-pytorch 0.12.0 pypi_0 pypi
[conda] torch 1.5.0 pypi_0 pypi
[conda] torch-optimizer 0.0.1a12 pypi_0 pypi
[conda] torchvision 0.6.0 pypi_0 pypi
[conda] vector-quantize-pytorch 0.0.2 pypi_0 pypi
PyTorch doesn't use the system's CUDA library. When you install PyTorch using the precompiled binaries using either pip or conda it is shipped with a copy of the specified version of the CUDA library which is installed locally. In fact, you don't even need to install CUDA on your system to use PyTorch with CUDA support.
There are two scenarios which could have caused your issue.
You installed the CPU only version of PyTorch. In this case PyTorch wasn't compiled with CUDA support so it didn't support CUDA.
You installed the CUDA 10.2 version of PyTorch. In this case the problem is that your graphics card currently uses the 418.87 drivers, which only support up to CUDA 10.1. The two potential fixes in this case would be to either install updated drivers (version >= 440.33 according to Table 2) or to install a version of PyTorch compiled against CUDA 10.1.
To determine the appropriate command to use when installing PyTorch you can use the handy widget in the "Install PyTorch" section at pytorch.org. Just select the appropriate operating system, package manager, and CUDA version then run the recommended command.
In your case one solution was to use
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
which explicitly specifies to conda that you want to install the version of PyTorch compiled against CUDA 10.1.
For more information about PyTorch CUDA compatibility with respect drivers and hardware see this answer.
Edit After you added the output of collect_env we can see that the problem was that you had the CUDA 10.2 version of PyTorch installed. Based on that an alternative solution would have been to update the graphics driver as elaborated in item 2 and the linked answer.
TL; DR
Install NVIDIA Toolkit provided by Canonical or NVIDIA third-party PPA.
Reboot your workstation.
Create a clean Python virtual environment (or reinstall all CUDA dependent packages).
Description
First install NVIDIA CUDA Toolkit provided by Canonical:
sudo apt install -y nvidia-cuda-toolkit
or follow NVIDIA developers instructions:
# ENVARS ADDED **ONLY FOR READABILITY**
NVIDIA_CUDA_PPA=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/
NVIDIA_CUDA_PREFERENCES=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
NVIDIA_CUDA_PUBKEY=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
# Add NVIDIA Developers 3rd-Party PPA
sudo wget ${NVIDIA_CUDA_PREFERENCES} -O /etc/apt/preferences.d/nvidia-cuda
sudo apt-key adv --fetch-keys ${NVIDIA_CUDA_PUBKEY}
echo "deb ${NVIDIA_CUDA_PPA} /" | sudo tee /etc/apt/sources.list.d/nvidia-cuda.list
# Install development tools
sudo apt update
sudo apt install -y cuda
then reboot the OS load the kernel with the NVIDIA drivers
Create an environment using your favorite manager (conda, venv, etc)
conda create -n stack-overflow pytorch torchvision
conda activate stack-overflow
or reinstall pytorch and torchvision into the existing one:
conda activate stack-overflow
conda install --force-reinstall pytorch torchvision
otherwise NVIDIA CUDA C/C++ bindings may not be correctly detected.
Finally ensure CUDA is correctly detected:
(stack-overflow)$ python3 -c 'import torch; print(torch.cuda.is_available())'
True
Versions
NVIDIA CUDA Toolkit v11.6
Ubuntu LTS 20.04.x
Ubuntu LTS 22.04 (prior official release)
In my case, just restarting my machine made the GPU active again. The initial message I got was that the GPU is currently in use by another application. But when I looked at nvidia-smi, there was nothing that I saw. So, no changes to dependencies, and it just started working again.
Another possible scenario is that environment variable CUDA_VISIBLE_DEVICES is not set correctly before installing PyTorch.
In my case it worked to do as follows:
remove the CUDA drivers
sudo apt-get remove --purge nvidia*
Then get the exact installation script of the drivers based on your distro and system from the link: https://developer.nvidia.com/cuda-downloads?target_os=Linux
In my case it was dabian on x64 so I did:
wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda
And now nvidia-smi works as intended!
I hope that helps
If your CUDA version does not match what PyTorch expects, you will see this issue.
On Arch / Manjaro:
Get Pytorch from here: https://pytorch.org/get-started/locally/
Note what CUDA version you are getting PyTorch for
Get the same CUDA version from here: https://archive.archlinux.org/packages/c/cuda/
Install CUDA using (e.g.) sudo pacman -U --noconfirm cuda-11.6.2-1-x86_64.pkg.tar.zst
Do not update to a newer version of CUDA than PyTorch expects. If PyTorch wants 11.6 and you have updated to 11.7, you will get the error message.
Make sure that os.environ['CUDA_VISIBLE_DEVICES'] = '0' is set after if __name__ == "__main__":. So your code should look like this:
import torch
import os
if __name__ == "__main__":
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
print(torch.cuda.is_available()) // true
...

how to check if i have cuda installed, i came across three methods but they give me different results [duplicate]

I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So when I run
$ which nvcc
/usr/local/cuda-9.2/bin/nvcc
However, when I run
$ nvidia-smi
Wed Nov 21 19:41:32 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72 Driver Version: 410.72 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 Off | N/A |
| N/A 53C P0 26W / N/A | 379MiB / 6078MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1324 G /usr/lib/xorg/Xorg 225MiB |
| 0 2844 G compiz 146MiB |
| 0 15550 G /usr/lib/firefox/firefox 1MiB |
| 0 19992 G /usr/lib/firefox/firefox 1MiB |
| 0 23605 G /usr/lib/firefox/firefox 1MiB |
So am I using cuda9.2 as which nvcc suggests, or am I using cuda10 as nvidia-smi suggests? I saw this answer but it does not provide direct answer to the confusion, it just asks us to reinstall the CUDA Toolkit, which I already did.
CUDA has 2 primary APIs, the runtime and the driver API. Both have a corresponding version (e.g. 8.0, 9.0, etc.)
The necessary support for the driver API (e.g. libcuda.so on linux) is installed by the GPU driver installer.
The necessary support for the runtime API (e.g. libcudart.so on linux, and also nvcc) is installed by the CUDA toolkit installer (which may also have a GPU driver installer bundled in it).
In any event, the (installed) driver API version may not always match the (installed) runtime API version, especially if you install a GPU driver independently from installing CUDA (i.e. the CUDA toolkit).
The nvidia-smi tool gets installed by the GPU driver installer, and generally has the GPU driver in view, not anything installed by the CUDA toolkit installer.
Recently (somewhere between 410.48 and 410.73 driver version on linux) the powers-that-be at NVIDIA decided to add reporting of the CUDA Driver API version installed by the driver, in the output from nvidia-smi.
This has no connection to the installed CUDA runtime version.
nvcc, the CUDA compiler-driver tool that is installed with the CUDA toolkit, will always report the CUDA runtime version that it was built to recognize. It doesn't know anything about what driver version is installed, or even if a GPU driver is installed.
Therefore, by design, these two numbers don't necessarily match, as they are reflective of two different things.
If you are wondering why nvcc -V displays a version of CUDA you weren't expecting (e.g. it displays a version other than the one you think you installed) or doesn't display anything at all, version wise, it may be because you haven't followed the mandatory instructions in step 7 (prior to CUDA 11) (or step 6 in the CUDA 11 linux install guide) of the cuda linux install guide
Note that although this question mostly has linux in view, the same concepts apply to windows CUDA installs. The driver has a CUDA driver version associated with it (which can be queried with nvidia-smi, for example). The CUDA runtime also has a CUDA runtime version associated with it. The two will not necessarily match in all cases.
In most cases, if nvidia-smi reports a CUDA version that is numerically equal to or higher than the one reported by nvcc -V, this is not a cause for concern. That is a defined compatibility path in CUDA (newer drivers/driver API support "older" CUDA toolkits/runtime API). For example if nvidia-smi reports CUDA 10.2, and nvcc -V reports CUDA 10.1, that is generally not cause for concern. It should just work, and it does not necessarily mean that you "actually installed CUDA 10.2 when you meant to install CUDA 10.1"
If nvcc command doesn't report anything at all (e.g. Command 'nvcc' not found...) or if it reports an unexpected CUDA version, this may also be due to an incorrect CUDA install, i.e the mandatory steps mentioned above were not performed correctly. You can start to figure this out by using a linux utility like find or locate (use man pages to learn how, please) to find your nvcc executable. Assuming there is only one, the path to it can then be used to fix your PATH environment variable. The CUDA linux install guide also explains how to set this. You may need to adjust the CUDA version in the PATH variable to match your actual CUDA version desired/installed.
Similarly, when using docker, the nvidia-smi command will generally report the driver version installed on the base machine, whereas other version methods like nvcc --version will report the CUDA version installed inside the docker container.
Similarly, if you have used another installation method for the CUDA "toolkit" such as Anaconda, you may discover that the version indicated by Anaconda does not "match" the version indicated by nvidia-smi. However, the above comments still apply. Older CUDA toolkits installed by Anaconda can be used with newer versions reported by nvidia-smi, and the fact that nvidia-smi reports a newer/higher CUDA version than the one installed by Anaconda does not mean you have an installation problem.
Here is another question that covers similar ground. The above treatment does not in any way indicate that this answer is only applicable if you have installed multiple CUDA versions intentionally or unintentionally. The situation presents itself any time you install CUDA. The version reported by nvcc and nvidia-smi may not match, and that is expected behavior and in most cases quite normal.
nvcc is in the CUDA bin folder - as such check if the CUDA bin folder has been added to your $PATH.
Specifically, ensure that you have carried out the CUDA Post-Installation actions (e.g. from here):
Add the CUDA Bin to $PATH (i.e. add the following line to your ~/.bashrc)
export PATH=/usr/local/cuda-10.1/bin:/usr/local/cuda-10.1/NsightCompute-2019.1${PATH:+:${PATH}}
PS. Ensure the following two paths above, exist first: /usr/local/cuda-10.1/bin and /usr/local/cuda-10.1/NsightCompute-2019.1 (the NsightCompute path could have a slightly different ending depending on the version of Nsight compute installed...
Update $LD_LIBRARY_PATH (i.e. add the following line to your ~/bashrc).
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
After this, both nvcc and nvidia-smi (or nvtop) report the same version of CUDA...
If you are using cuda 10.2 :
export PATH=/usr/local/cuda-10.2/bin:/opt/nvidia/nsight-compute/2019.5.0${PATH:+:${PATH}}
might help because when I checked, there was no directory for nsight-compute in cuda-10.2.
I am not sure if this was just the problem with me or else why wouldn't they mention it in the official documentation.
Adding onto Robert Crovella's answer...
The difference between the device driver and the runtime driver is that, with device driver you will be able to run compiled CUDA C code. That is, you can download CUDA powered applications and they will be able to successfully execute their code on your GPU.
Whereas, with the runtime driver you will be able to able to compile the CUDA C code, which then will be executed with the help of the device driver on your GPU.
Section 2.2.3 - Cuda Development Toolkit
nvidia-smi can show a “different CUDA version” from the one that is reported by nvcc. Because they are reporting two different things:
nvidia-smi shows that maximum available CUDA version support for a given GPU driver.
And the 2nd thing which nvcc -V reports is the CUDA version that is currently being used by the system.
In short
nvidia-smi shows the highest version of CUDA supported by your driver. nvcc -V shows the version of the current CUDA installation. As long as your driver-supported version is higher than your installed version, it's fine. You can even have several versions of CUDA installed at the same time.

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

I have installed cuda-8.0 and cudnn5.1 on CentOS. Then, when importing tensorflow (python 3.6), it gives the error as above.
I have already set symbol link as below in /etc/profile. Are there any guys who occurred this kind of problem?
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Also, what makes me confused is that, I run nvcc -V, it shows
Cuda compilation tools, release 8.0, V8.0.61
However, when I run ./deviceQuery in folder /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery, on device 0: "Tesla M40", it shows
CUDA Driver Version / Runtime Version 9.1 / 8.0
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla M40
Check your version of tensorflow using "pip3 list | grep tensorflow" If it is of version tensorflow-gpu (1.5.0) then the required cuda version is 9.0 and cuDNN v7.
Look into the following link for more details:
https://github.com/tensorflow/tensorflow/releases
Tensorflow installation guide needs to be updated.
I had the same problem. Tensorflow 1.5.0 is precompiled to CUDA 9.0 (which is outdated; Sept 2017).
The newest CUDA version is CUDA 9.1 (Dec. 2017) and sudo pip install tensorflow-gpu will not work with the newest CUDA 9.1. There are two solutions to the problem:
1.) Install CUDA 9.0 next to CUDA 9.1 (this worked for me)
2.) Build Tensorflow by yourself from the git source code
Either way do not forget to add the PATH variables to your operating system, otherwise you receive the error message stated in the question from your python interpreter.

Resources