After installing PyTorch as per the official command:
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
conda list shows
pytorch 1.7.1 py3.8_cuda10.2.89_cudnn7.6.5_0 pytorch
i.e. cuDNN 7.6.5, whereas my system has cuDNN 8.5.0 installed.
Does this have an effect on how we train models?
TL;DR: Probably not, but it depends on how far apart the versions are.
Explanation
In practice, differences like yours (conda's cudnn7.6.5_0 vs. the system's cuDNN 8.5.0) usually don't harm training, because versions stay backward compatible for a while. Things do get deprecated eventually (on a timescale of years, probably), so you should avoid letting the gap grow so large that your CUDA version relies on operations that are no longer implemented. You should be more interested in the minimum version your GPU requires (Nvidia's RTX 3090, for example, requires CUDA 11.1 and above).
In practice, torch relies on cuDNN's primitive operation implementations, and those usually don't change much between versions. As Nvidia describes cuDNN:
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
If you want to check the actual support matrix from Nvidia, have a look at the cuDNN documentation on their website and open the support matrix for the specific version. There you can see that CUDA 10.2 is still supported by cuDNN 8.5.0. If you are wondering what py3.8_cuda10.2.89_cudnn7.6.5_0 means, it says that your build targets CUDA 10.2.89 and was compiled against the primitives available in cuDNN 7.6.5.
Also, keep in mind that you need an Nvidia driver version that is new enough for your CUDA version.
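If you want to verify at runtime which versions your PyTorch build actually uses, a quick check looks like this (the printed values in the comments are only illustrative):

import torch

print(torch.version.cuda)               # CUDA version this PyTorch build was compiled against, e.g. '10.2'
print(torch.backends.cudnn.version())   # bundled cuDNN version, e.g. 7605 for cuDNN 7.6.5
print(torch.cuda.is_available())        # whether a usable GPU + driver is visible

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))         # GPU model
    print(torch.cuda.get_device_capability(0))   # compute capability, e.g. (8, 6) for an RTX 3090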
Related
I have a simulation written in Python which uses the GPU mainly through PyTorch operations, but in a few places I had to write some (relatively simple) custom kernels via Numba's cuda library (using as_cuda_array() on the torch tensors to get a DeviceNDArray handle).
I've now moved to an Apple machine with an M1 processor. It seems the torch code can easily be adapted to run on the Apple GPU, but Numba has no such option.
What would be the easiest way to rewrite the code to work on Apple silicon?
Introducing Accelerated PyTorch Training on Mac
https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac
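The gist is that tensor work can be moved to the Apple GPU through the MPS backend. A minimal sketch (assuming a PyTorch build with MPS support, i.e. 1.12 or later) looks like this; the Numba kernels would still have to be rewritten, e.g. as plain PyTorch ops, since Numba has no Apple-GPU target:

import torch

# Fall back to CPU if the MPS backend is not available in this build
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

x = torch.randn(1024, 1024, device=device)
y = x @ x.T   # runs on the Apple GPU when "mps" was selected
print(y.device)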
Does anyone know whether the PyTorch v1.12 release will support AMD GPUs with Metal?
Does it have to be built from source?
Short answer: Yes.
Long answer:
https://github.com/pytorch/pytorch/issues/77867
I've run into a strange problem. I trained my model on a single GPU (RTX Titan) and it doesn't converge, yet it works well on two of the same GPUs with identical settings. It has nothing to do with the batch size. I use torch.fft and the torch.Transformer layer, with Python 3.8, PyTorch 1.7.1 and CUDA 10.1.
I need to use the TensorFlow Object Detection API to do some classification connected with recognition.
My problem is that using the API for detection with a pretrained COCO model takes too much time and definitely does not use the GPU. I checked my tensorflow-gpu installation with different scripts and it works fine, but when I use this model for detection I only see an increase in CPU usage.
I checked different versions of TensorFlow (1.12, 1.14) and different combinations of the CUDA Toolkit (9.0, 10.0) and cuDNN (7.4.2, 7.5.1, 7.6.1), but it is all the same. I also tried it on both Windows 7 and Ubuntu 16.04 with no difference. My project, however, requires a much faster detection time.
System information:
System: Windows 7, Ubuntu 16.04
Tensorflow: 1.12, 1.14
GPU: GTX 970
Run the following Python code. If it detects the GPU, then you can use the GPU for training; otherwise there is some problem:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
One more thing: just because your CPU is busy does not mean the GPU is not at work. The CPU will always be busy, but the GPU should also spike when you are training.
Paste the output of the above code in the comments if you are not sure what it means.
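As an additional quick check (a sketch using the TF 1.x API matching the versions in the question), you can also ask TensorFlow directly whether it sees a usable GPU:

import tensorflow as tf

# True if TensorFlow can see a CUDA-capable GPU (TF 1.x API)
print(tf.test.is_gpu_available())

# Optionally require a minimum compute capability (a GTX 970 is 5.2)
print(tf.test.is_gpu_available(cuda_only=True, min_cuda_compute_capability=(3, 5)))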
Edit: After chatting with the OP in the comments, I saw the suggested code, and it is using a pretrained model, so no training is happening here. You are running the model, not training a new one, so no GPU is being used.
Has anyone had any luck using PyTorch on AWS Lambda for feature extraction from images, or just using the framework at all? I finally got PyTorch, numpy, and pillow zipped in a folder under the uncompressed size limit (which is actually around 262 MB), but I had to build PyTorch from source to do this. The problem I am having now is that Lambda runs a very old version of gcc (4.8.3), which is very buggy and missing whole header files altogether. I believe the PyTorch docs state you should be using at least gcc 7 or later, but I'm hoping someone may have found a way around this. I built from source using gcc 7.5, but when I tried to import torch, Lambda obviously used its installed version 4.8.3, causing an error on import: Floating point exception (core dumped), which stems from the old version of gcc. Is there a possible solution around this? I've been at this for a day and a half now, so any help would be great. I think the bottom line is that I am facing this similar issue. Better yet, does anyone have a PyTorch Lambda layer I could use?
I was able to use the layers below for PyTorch on AWS Lambda:
arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:1 PyTorch 1.0.1
arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:2 PyTorch 1.1.0
I found these on the fastai production deployment page, thanks to Matt McClean.
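For reference, a minimal handler sketch that just proves the import works inside Lambda might look like the following. It assumes the attached layer puts torch on the Python path; some of these layers ship an unzip_requirements helper that has to be imported before torch, so check the layer's docs:

try:
    import unzip_requirements   # layer-specific helper; only present in some layers
except ImportError:
    pass

import torch

def handler(event, context):
    # Tiny forward pass to confirm torch loads and runs inside Lambda
    x = torch.randn(1, 3)
    w = torch.randn(3, 1)
    y = x @ w
    return {"statusCode": 200, "body": str(y.item())}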