torch.cuda.is_available() keeps switching to False - pytorch

I have tried several solutions that address the case where a CUDA GPU is present and CUDA is installed, yet torch.cuda.is_available() returns False. They did help, but only temporarily: torch.cuda.is_available() reported True, but after some time it switched back to False. I use CUDA 9.0.176 and a GTX 1080. What should I do to make the fix permanent?
I tried the following methods:
https://forums.fast.ai/t/torch-cuda-is-available-returns-false/16721/5
https://github.com/pytorch/pytorch/issues/15612
Note: when torch.cuda.is_available() stops working and switches to False, I have to restart the computer, after which it works again (for some time).

The reason torch.cuda.is_available() returns False is an incompatibility between the versions of PyTorch and cudatoolkit.
As of June 2022, the current version of PyTorch is compatible with cudatoolkit=11.3, whereas the current CUDA toolkit version is 11.7. Source
Solution:
Uninstall PyTorch for a fresh installation. You cannot install an old version on top of a new one without forcing a reinstall (using pip install --upgrade --force-reinstall <package_name>).
Run conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch to install pytorch.
Install CUDA 11.3 version from https://developer.nvidia.com/cuda-11.3.0-download-archive.
You are good to go.
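As a quick sanity check (a minimal sketch; the exact versions printed will differ on your machine), you can confirm that the freshly installed PyTorch was built against the cudatoolkit you expect:

import torch
print(torch.__version__)          # the PyTorch build you just installed
print(torch.version.cuda)         # should report 11.3 for the cudatoolkit=11.3 build
print(torch.cuda.is_available())  # should be True if the driver is recent enough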

I also had torch.cuda.is_available() returning False.
But after updating the Nvidia driver to the latest version, 436.48, it returned True. I had previously updated PyTorch to 1.2.0.
I have Windows 10 and Anaconda.

Install CUDA 9.1 using apt-get, following the instructions in this link:
https://cryptoandcoffee.com/mining-gems/cuda-9-0-install-ubuntu-16-04-apt-get/
Install PyTorch using pip:
pip install torchvision (this will install both torch and torchvision)
Reboot.
Now try it:
~$ python -c 'import torch; print(torch.cuda.is_available())'

I saw this issue as well. The reason was that the CUDA version used by PyTorch was out of sync with the installed Nvidia driver. As in Joe's answer, the solution was to update the Nvidia driver. Some other important background information to be aware of:
Each release of CUDA requires a minimum Nvidia driver version (see here for a compatibility table).
You can check your Nvidia driver version with nvidia-smi.
Pytorch comes pre-packaged with a version of CUDA that may be different from the version you installed on your computer.
The CUDA version that you installed manually is the one that shows up when you run nvidia-smi. Even if your driver version is compatible with this CUDA version, it may be incompatible with the PyTorch CUDA version.
You can get the PyTorch CUDA version by printing torch.version.cuda in IPython or in a Python program. This is the version that determines the required Nvidia driver version.
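For example, a small sketch that puts the two versions side by side (this assumes nvidia-smi is on your PATH; the query options are standard nvidia-smi flags):

import subprocess
import torch

# CUDA version PyTorch was built with -- this is what sets the minimum driver requirement
print("PyTorch CUDA:", torch.version.cuda)

# Installed Nvidia driver version, as reported by nvidia-smi
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print("Driver:", driver)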

Related

PyTorch with GPU is not working but working fine with CPU

My code was working fine before, but it suddenly stopped working, without any error or warning.
This is the setup it was working with.
After that, I tried multiple options, listed below.
My current settings, all in one env, are:
torch 1.10.0+cu113
torch-cluster 1.5.9
torch-geometric 2.0.1
torch-scatter 2.0.9
torch-sparse 0.6.12
torch-spline-conv 1.2.1
torchaudio 0.10.0+cu113
torchvision 0.11.1+cu113
but nothing worked. I even asked my server admin to create a new account for me. He created a new account, and I installed only the packages below. These are all the packages installed in my conda env.
Installed with:
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
Package Version
----------------- ------------
certifi 2021.10.8
numpy 1.21.4
Pillow 8.4.0
pip 21.2.2
setuptools 58.0.4
torch 1.9.0+cu111
torchaudio 0.9.0
torchvision 0.10.0+cu111
typing_extensions 4.0.1
wheel 0.37.0
Here are the results: my code always gets stuck at this point.
What could be a possible reason? My labmate, with the same settings on the same server, ran the same code under his user profile and it worked fine.
Further Details:
torch.cuda.is_available()
>>> True
torch.cuda.current_device()
>>> 0
torch.cuda.device(0)
>>> <torch.cuda.device at 0x7fb4e8baa650>
torch.cuda.device_count()
>>> 4
torch.cuda.get_device_name(0)
>>> 'GeForce RTX 3090'
I have the same type of problem, but with an RTX 3060. I think the problem is the torch version.
Using torch == 1.11.0 I can move tensors to the GPU, but with older versions I can't.
PyTorch Geometric does not support torch 1.11.0 at the time I'm writing.
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
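To see the mismatch this warning is describing, you can compare the GPU's compute capability against the architectures the installed PyTorch build was compiled for (a minimal check; these calls are available in reasonably recent PyTorch versions):

import torch

print(torch.cuda.get_device_capability(0))  # e.g. (8, 6) for an RTX 3060, i.e. sm_86
print(torch.cuda.get_arch_list())           # architectures compiled into this build, e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70']

If your card's capability is missing from that list, the fix is the one the message suggests: install a build made for a newer CUDA (the +cu113 wheels mentioned above, for example).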

Appropriate CUDA version for installing PyTorch using Anaconda?

I have Anaconda3 installed on my Windows 10 system, and I want to install PyTorch using Anaconda.
I was looking at the official website
https://pytorch.org/get-started/locally/
where I see only CUDA versions 10.2 and 11.1 in the selector. Now, I know I have a CUDA-capable GPU in my computer; the model is an NVIDIA GeForce GTX 1660 Ti.
I was also checking the Nvidia website to find out the compute capability of my GPU, so that I can select the appropriate version during installation. Here's the link:
https://developer.nvidia.com/cuda-gpus#compute
As you can see, the compute capability of the listed GPUs is at most 8.7 (mine is 7.5).
Does this mean I can't use this command for installing pytorch?
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
Or can I use this command without it causing any problems?

How to tell PyTorch which CUDA version to take?

I have two version of CUDA installed on my Ubuntu 16.04 machine: 9.0 and 10.1.
They are located in /usr/local/cuda-9.0 and /usr/local/cuda-10.1 respectively.
If I install PyTorch 1.6.0 (which needs CUDA 10.1) via pip (pip install torch==1.6.0), it uses version 9.0 and thus detects no GPUs. I already changed my LD_LIBRARY_PATH to "/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/cuda/extras/CUPTI/lib64" but PyTorch is still using CUDA 9.0.
How do I tell PyTorch to use CUDA 10.1?
Prebuilt wheels for torch built with different versions of CUDA are available at the torch stable releases page. For example, you can install torch v1.9.0 built with CUDA v11.1 like this:
pip install --upgrade torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
But not all the combinations are available.
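After installing one of these wheels, a quick way to confirm which CUDA build is actually active, independent of whatever nvcc or LD_LIBRARY_PATH point at (output will of course vary with your install):

import torch
print(torch.version.cuda)   # '11.1' for a +cu111 wheel; this is the CUDA version bundled with the wheel

For a fuller report (driver, CUDA, cuDNN, and wheel versions in one place) you can also run python -m torch.utils.collect_env from the command line.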

Can I train in tensorflow with separate CUDA version in anaconda environment

I need to train a model with tensorflow-gpu==2.3.0, which needs CUDA version 10.1. But when I type nvidia-smi, it shows the CUDA version as 10.0.
I created a conda environment using, "conda create -n tf2-gpu tensorflow-gpu cudatoolkit=10.1"
After initiating training, it throws this error: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
How can I train using tensorflow-gpu in a conda environment with another version of CUDA? And I still need CUDA 10.0 to be there, as my other training setup depends on it.
Yes, you can create two virtual environments in Anaconda with different TensorFlow versions, and CUDA and cuDNN will be installed compatible with the specified tensorflow-gpu.
You can find the tensorflow-gpu build configuration details here to check the supported CUDA and cuDNN versions.
Please check this similar issue link for creating a virtual environment in Anaconda and installing a specific tensorflow-gpu.
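Once the environment is created, a minimal check to run inside it (a sketch; the exact version strings depend on what conda resolved):

import tensorflow as tf

print(tf.__version__)                          # e.g. 2.3.0
print(tf.test.is_built_with_cuda())            # True for tensorflow-gpu builds
print(tf.config.list_physical_devices("GPU"))  # an empty list means the runtime cannot see a usable GPU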

PyTorch having trouble detecting CUDA

I am running a CNN on PyTorch. The torch.cuda.is_available() function returned False and no GPU was detected. However, I can run Keras models with the GPU. Here is my system information:
OS: Ubuntu 18.04.3
Python 3.7.3 (Conda)
GPU: GTX1080Ti
Nvidia driver: 430.50
When I check nvidia-smi, the output says that the CUDA version is 10.1. However, nvcc -V tells me that it is CUDA 9.1.
I downloaded NVIDIA-Linux-x86_64-430.50.run from the official site and installed it from the command line. I installed CUDA 10.1 using the following commands recommended by the official site:
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run
I installed PyTorch through pip install. What is wrong? Thanks in advance!
The default PyTorch 1.2 package depends on CUDA 10.0, but you have CUDA 9.1. The output of nvidia-smi just tells you the maximum CUDA version your driver supports; nvcc shows the CUDA version actually installed on your system. It seems that your installation of CUDA 10.1 was unsuccessful.
In addition to CUDA 10.0, Pytorch also supports CUDA 9.2 and I've found that the Pytorch package compiled for CUDA 10.0 also works with CUDA 10.1. So you can either upgrade your CUDA installation to 9.2 and install the Pytorch CUDA 9.2 package with
pip3 install torch==1.2.0+cu92 torchvision==0.4.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
Or get a working installation of CUDA 10.1. There are detailed Linux instructions here. (Note that you may have to remove previous installations of CUDA before installing a new one.)
FYI, this answer is a hack that could mess up your conda env, but it may work more easily than installing a fresh env. A consistency-checking tool would be really helpful, given how many people hit exactly this problem. Matching Anaconda's CUDA version with the system driver, the actual hardware, and the rest of the system environment is challenging to say the least, and almost an art.
I found that Anaconda frequently guesses the wrong CUDA version to use. So the best way I have found to fix this is to surgically uninstall and reinstall just PyTorch with pip:
pip uninstall torch
pip install torch
Note that pip calls the PyTorch package torch while conda calls it pytorch.
However, I also found that pip sometimes refuses to reinstall torch because it didn't get rid of the Anaconda site-packages files. If that is the case, you can very carefully remove them manually:
rm -fr $HOME/miniconda3/envs/<ENV>/lib/python3.9/site-packages/torch/
rm -fr $HOME/miniconda3/envs/<ENV>/lib/python3.9/site-packages/torch-*.dist-info/
where <ENV> should be replaced with your environment name, and miniconda3 might be anaconda3 or something else depending on your installation.
Be very careful not to delete anything other than the torch files, or you may mess something else up. In that case you would be best served by installing yet another fresh environment.
After this pip install torch should work and torch.cuda.is_available() should return True. Unless there is another problem... YMMV.
Note that I recommend using Miniconda, because the full Anaconda comes overloaded with packages and I find it quickly gets clogged and broken.
