PyTorch GPU install with CUDA 11.2

How do I install PyTorch if my CUDA version is 11.2 and my cuDNN version is 8.1.0?
The official documentation at https://pytorch.org/get-started/locally/ lists only CUDA 10.2 and 11.3.
Should I try installing it for CUDA 10.2 with the following command?
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
Won't that cause version incompatibility problems in the future?

As far as I can tell[1], PyTorch does not provide precompiled binaries for CUDA 11.2, so you would have to compile it yourself. For that, read this section of the PyTorch GitHub README.
[1]: I think so because on a similar question posed on PyTorch forums, one of the moderators said the same thing - https://discuss.pytorch.org/t/want-to-install-pytorch-for-custom-cuda-version-cuda-11-2/141159/2
Edit: You can install PyTorch compiled for CUDA 11.1. In my case, it seems to work without any problems.
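To check which build was actually installed, you can ask PyTorch directly. The sketch below assumes a standard PyTorch install. It uses torch.version.cuda (the CUDA version the binary was compiled against), torch.backends.cudnn.version() and torch.cuda.is_available(). The function name report_torch_cuda is hypothetical. On a CUDA 11.2 machine running the CUDA 11.1 build, built_with_cuda should read 11.1, and cuda_available should be True if the driver is working.

```python
import importlib.util


def report_torch_cuda() -> dict:
    """Summarise which CUDA build of PyTorch is installed and whether a GPU is usable."""
    # Avoid an ImportError when torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch
    import torch.backends.cudnn

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # cuDNN version bundled with the build (None if cuDNN is unavailable).
        "cudnn_version": torch.backends.cudnn.version(),
        # True only if a working driver and a supported GPU are present.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in report_torch_cuda().items():
        print(f"{key}: {value}")
```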

Related

CUDA and cudatoolkit for TensorFlow in WSL2 Ubuntu 20.04 with GPU enabled

Are CUDA and cudatoolkit different?
I installed CUDA using this tutorial
I ran the sample too, and it worked. To run TensorFlow on the GPU, do I also need to install cudatoolkit and cuDNN alongside the existing CUDA installation?
While running the training job, I get this message:
2023-01-01 00:03:41.818376: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
TensorFlow is very particular about GPU requirements and configuration. You do need to install cuDNN, and I believe the CUDA Toolkit might be a prerequisite for installing cuDNN.
I'd recommend following the official documentation to install TensorFlow with GPU support. You can find all of it here: https://www.tensorflow.org/install/pip
You can also look into Lambda Labs, which offers a fairly streamlined way of installing TensorFlow GPU on Ubuntu. It adds the requirements to your apt repositories so they update automatically. https://lambdalabs.com/lambda-stack-deep-learning-software
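The "Cannot dlopen some GPU libraries" warning means the dynamic loader could not find one or more CUDA or cuDNN shared libraries. Below is a minimal sketch that checks this by hand, using only ctypes from the standard library. The sonames in CANDIDATE_LIBS are assumptions for a CUDA 11.x / cuDNN 8 setup on Linux. Adjust them to the versions your TensorFlow release lists as tested. check_gpu_libs is a hypothetical helper name.

```python
import ctypes

# Assumed sonames for CUDA 11.x and cuDNN 8; adjust to your TensorFlow version.
CANDIDATE_LIBS = (
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcudnn.so.8",
)


def check_gpu_libs(names=CANDIDATE_LIBS) -> dict:
    """Try to dlopen each shared library and report which ones load."""
    results = {}
    for name in names:
        try:
            # Uses the normal dynamic-linker search path (e.g. LD_LIBRARY_PATH on Linux).
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_gpu_libs().items():
        print(("found  " if ok else "MISSING"), name)
```

Any library reported as MISSING is a likely cause of the warning. In that case, the matching CUDA or cuDNN package is either not installed or not on the loader's search path.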

Can I install pytorch cpu + any specified version of cudatoolkit?

My remote machine has CUDA 11.0, and I want to install PyTorch on it.
I use the command conda install pytorch cudatoolkit=11.0 -c pytorch -c conda-forge, but the installation list shows:
cudatoolkit conda-forge/linux-64::cudatoolkit-11.0.3-h15472ef_8
pytorch pytorch/linux-64::pytorch-1.10.0-py3.8_cpu_0
This pytorch package is a CPU-only build.
If I instead substitute 11.0 with 11.1, the installation list is:
cudatoolkit conda-forge/linux-64::cudatoolkit-11.1.1-h6406543_8
pytorch pytorch/linux-64::pytorch-1.10.0-py3.8_cuda11.1_cudnn8.0.5_0
Here, pytorch is a GPU build.
My question is: are the two installations above essentially the same? If not, how can I install pytorch=1.10.0 with CUDA 11.0?
I'd also like to know how CUDA compatibility works. Is cudatoolkit==11.1 compatible with programs compiled with cudatoolkit==11.0?
It all depends on whether the pytorch channel has built a version against the particular cudatoolkit version. I don't know a specific way to search this, but one can browse what builds are available on the pytorch channel. For PyTorch 1.10 on linux-64 platform it appears only CUDA versions 10.2, 11.1, and 11.3 are available.
As mentioned in the comments, one can try forcing a CUDA build of PyTorch with
conda create -n foo -c pytorch -c conda-forge cudatoolkit=11.0 'pytorch=*=*cuda*'
which would fail in this combination.
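You can tell whether conda picked a CPU or CUDA build from the build string in the installation list. The CUDA build in the question carries a cudaX.Y tag (py3.8_cuda11.1_cudnn8.0.5_0), while the CPU build does not (py3.8_cpu_0). The sketch below assumes this naming pattern holds for the pytorch channel in general. That is an observation from these two lines, not a documented guarantee. cuda_version_of_build is a hypothetical name.

```python
import re
from typing import Optional

# CUDA builds on the pytorch channel carry a tag such as "cuda11.1" in the build string.
_CUDA_TAG = re.compile(r"_cuda(\d+\.\d+)_")


def cuda_version_of_build(spec: str) -> Optional[str]:
    """Return the CUDA version in a conda pytorch spec, or None for a CPU-only build."""
    # A spec looks like "channel/platform::name-version-build"; the build follows the last "-".
    build = spec.rsplit("-", 1)[-1]
    # Pad with underscores so a tag at either end of the build string still matches.
    match = _CUDA_TAG.search(f"_{build}_")
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in (
        "pytorch/linux-64::pytorch-1.10.0-py3.8_cpu_0",
        "pytorch/linux-64::pytorch-1.10.0-py3.8_cuda11.1_cudnn8.0.5_0",
    ):
        version = cuda_version_of_build(line)
        print(line, "->", f"CUDA {version}" if version else "CPU-only")
```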
As for compatibility, no, the pytorch package builds lock in the minor version of cudatoolkit. For example,

Appropriate CUDA version for installing PyTorch using Anaconda?

I have Anaconda3 installed on my Windows 10 system, and I want to install PyTorch using Anaconda.
I was looking at the official website
https://pytorch.org/get-started/locally/
There, the selector lists only CUDA versions 10.2 and 11.1. I know my computer has a CUDA-capable GPU: an NVIDIA GeForce GTX 1660 Ti.
I also checked the NVIDIA website for my GPU's compute capability, so that I can choose the appropriate installation option. Here's the link:
https://developer.nvidia.com/cuda-gpus#compute
As you can see, the highest compute capability listed is 8.7 (mine is 7.5).
Does this mean I can't use this command to install PyTorch?
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
Or can I use this command without causing any problems?

How to tell PyTorch which CUDA version to take?

I have two versions of CUDA installed on my Ubuntu 16.04 machine: 9.0 and 10.1.
They are located in /usr/local/cuda-9.0 and /usr/local/cuda-10.1, respectively.
If I install PyTorch 1.6.0 (which needs CUDA 10.1) via pip (pip install torch==1.6.0), it uses version 9.0 and thus detects no GPUs. I already changed my LD_LIBRARY_PATH to "/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/cuda/extras/CUPTI/lib64" but PyTorch is still using CUDA 9.0.
How do I tell PyTorch to use CUDA 10.1?
Prebuilt wheels for torch built with different versions of CUDA are available at torch stable releases page. For example you can install torch v1.9.0 built with CUDA v11.1 like this:
pip install --upgrade torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
But not all the combinations are available.
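The wheel names in this thread follow a visible pattern. The local version tag is "cu" followed by the CUDA major and minor digits with the dot removed: +cu111 for CUDA 11.1 here, and +cu92 for CUDA 9.2 in a later answer. The sketch below builds such a pip command, assuming that naming pattern. It does not check whether the combination actually exists on the releases page, so pip may still report that no matching distribution was found. The function names are hypothetical.

```python
import shlex
from typing import List

# Find-links page used by the pip commands in this thread.
FIND_LINKS = "https://download.pytorch.org/whl/torch_stable.html"


def wheel_spec(torch_version: str, cuda_version: str) -> str:
    """Turn ("1.9.0", "11.1") into "torch==1.9.0+cu111" (assumed +cuXY naming pattern)."""
    major, minor = cuda_version.split(".")[:2]
    return f"torch=={torch_version}+cu{major}{minor}"


def pip_command(torch_version: str, cuda_version: str) -> List[str]:
    """Argument list for pip; whether this combination was ever published is NOT checked."""
    return [
        "pip", "install", "--upgrade",
        wheel_spec(torch_version, cuda_version),
        "-f", FIND_LINKS,
    ]


if __name__ == "__main__":
    print(shlex.join(pip_command("1.9.0", "11.1")))
```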

PyTorch having trouble detecting CUDA

I am running a CNN in PyTorch. The torch.cuda.is_available() function returns False, and no GPU is detected. However, I can run a Keras model on the GPU. Here is my system information:
OS: Ubuntu 18.04.3
Python 3.7.3 (Conda)
GPU: GTX1080Ti
Nvidia driver: 430.50
When I check nvidia-smi, the output said that the CUDA version is 10.1. However, the nvcc -V command tells me that it is CUDA 9.1.
I downloaded NVIDIA-Linux-x86_64-430.50.run from the official site and installed it from the command line. I installed CUDA 10.1 using the following commands recommended by the official site:
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run
I installed PyTorch through pip install. What is wrong? Thanks in advance!
The default PyTorch 1.2 package depends on CUDA 10.0, but you have CUDA 9.1. The output of nvidia-smi only tells you the maximum CUDA version your installed driver supports. nvcc reports the version of the CUDA toolkit installed on your system. It seems that your installation of CUDA 10.1 was unsuccessful.
Besides CUDA 10.0, PyTorch also supports CUDA 9.2. I've also found that the PyTorch package compiled for CUDA 10.0 works with CUDA 10.1. So you can either upgrade your CUDA installation to 9.2 and install the PyTorch CUDA 9.2 package with
pip3 install torch==1.2.0+cu92 torchvision==0.4.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html
Or get a working installation of CUDA 10.1. There are detailed Linux instructions here. (Note that you may have to remove previous installations of CUDA before installing a new one.)
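You can check the nvidia-smi versus nvcc distinction from the answer above by parsing the two outputs. The sketch below assumes the usual formats: "CUDA Version: 10.1" in the nvidia-smi header and "release 9.1" in nvcc -V. The sample strings mirror the versions in the question, and the V9.1.85 patch number is illustrative. The function names are hypothetical. driver_can_run applies only the conservative rule that a build's CUDA version should not exceed the driver's maximum. It ignores finer points such as CUDA 11 minor-version compatibility.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]

# Illustrative samples modelled on the versions in the question.
SAMPLE_SMI = "| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |"
SAMPLE_NVCC = "Cuda compilation tools, release 9.1, V9.1.85"


def _parse(pattern: str, text: str) -> Optional[Version]:
    match = re.search(pattern, text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_max_cuda(nvidia_smi_output: str) -> Optional[Version]:
    """Highest CUDA version the installed driver supports, from the nvidia-smi header."""
    return _parse(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)


def toolkit_cuda(nvcc_output: str) -> Optional[Version]:
    """Version of the CUDA toolkit that nvcc belongs to, from `nvcc -V` output."""
    return _parse(r"release (\d+)\.(\d+)", nvcc_output)


def driver_can_run(build: Version, driver_max: Version) -> bool:
    """Conservative rule of thumb: the build's CUDA version must not exceed the driver's maximum."""
    return build <= driver_max


if __name__ == "__main__":
    driver = driver_max_cuda(SAMPLE_SMI)
    toolkit = toolkit_cuda(SAMPLE_NVCC)
    print("driver supports up to CUDA", driver)
    print("nvcc toolkit is CUDA", toolkit)
    print("CUDA 10.0 build OK for this driver:", driver_can_run((10, 0), driver))
```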
FYI, this answer is a hack which could mess up your conda env, but may work more easily than installing a fresh env. A consistency-checking tool would be really helpful because of all the people having exactly this problem. Matching anaconda's CUDA version with the system driver and the actual hardware and the other system environment settings is challenging to say the least and almost an art.
I have found that Anaconda frequently guesses the wrong CUDA version to use. The best fix I have found is to surgically uninstall and reinstall just PyTorch with pip:
pip uninstall torch
pip install torch
Note that pip calls the package torch, while conda calls it pytorch.
However, I also found that pip sometimes refuses to reinstall torch because the Anaconda site-packages files were not removed. If that is the case, you can very carefully remove them manually:
rm -fr $HOME/miniconda3/envs/<ENV>/lib/python3.9/site-packages/torch/
rm -fr $HOME/miniconda3/envs/<ENV>/lib/python3.9/site-packages/torch-*.dist-info/
Replace <ENV> with your environment name. Depending on your installation, miniconda3 might be anaconda3 or something else.
Be very careful not to delete anything other than the torch files or you may mess something else up. Then you would be best served by installing yet another fresh environment.
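Before running rm -fr, it may be safer to list exactly what would be removed. The sketch below is a dry run that deletes nothing. It reports only the torch package directory and any torch-*.dist-info folders under a given site-packages directory, mirroring the two paths in the rm commands above. The helper name torch_install_paths is hypothetical. Run it with the Python interpreter of the conda environment in question, so that site.getsitepackages() points at that environment.

```python
import site
from pathlib import Path
from typing import List


def torch_install_paths(site_packages: Path) -> List[Path]:
    """List the torch package directory and torch-*.dist-info folders (dry run, deletes nothing)."""
    candidates = [site_packages / "torch"]
    # "torch-*" does not match torchvision-* or torchaudio-*, because "-" must follow "torch".
    candidates += sorted(site_packages.glob("torch-*.dist-info"))
    return [p for p in candidates if p.exists()]


if __name__ == "__main__":
    # site.getsitepackages() lists the site-packages directories of the running interpreter.
    for sp in site.getsitepackages():
        for path in torch_install_paths(Path(sp)):
            print(path)
```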
After this pip install torch should work and torch.cuda.is_available() should return True. Unless there is another problem... YMMV.
Note that I recommend using miniconda because the full anaconda comes overloaded with packages and I find it quickly gets clogged and broken.
