pytorch unable to run inference with GPU - pytorch

I'm developing a project based on yolov7, but I started facing this error where torch recognizes my GPU but torchvision throws a NotImplementedError.
This is the error:
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
I tried installing torchvision with CUDA built in, but that gave me the same error. I also tried reinstalling PyTorch, and that didn't work either.

The torchvision build installed in my env had no CUDA support, because it came from a plain pip install torchvision. For torchvision ops to run on an Nvidia GPU, torchvision itself has to be a CUDA-enabled build. To get one, install PyTorch and torchvision together with the following command:
conda install pytorch torchvision torchaudio pytorch-cuda={CUDA version} -c pytorch -c nvidia
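To confirm that a reinstall actually fixed the problem, it can help to call the failing operator directly. The following is a minimal sketch, not part of the original answer: the helper name cuda_nms_report is hypothetical, and the sketch assumes that torchvision.ops.nms on two small CUDA tensors reproduces the same dispatch path as the yolov7 code.

```python
def cuda_nms_report():
    """Return a dict describing whether torchvision's NMS runs on CUDA."""
    report = {"torch": None, "torchvision": None,
              "cuda_available": False, "nms_on_cuda": None}
    try:
        import torch
        import torchvision
        from torchvision.ops import nms
    except (ImportError, OSError, RuntimeError):
        return report  # packages missing or broken in this environment

    report["torch"] = torch.__version__
    report["torchvision"] = torchvision.__version__
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report  # nothing to test without a visible GPU

    # Two overlapping boxes in (x1, y1, x2, y2) format, placed on the GPU.
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                          [1.0, 1.0, 11.0, 11.0]], device="cuda")
    scores = torch.tensor([0.9, 0.8], device="cuda")
    try:
        nms(boxes, scores, 0.5)
        report["nms_on_cuda"] = True
    except (NotImplementedError, RuntimeError) as exc:
        # A CPU-only torchvision build fails here with the error quoted above.
        report["nms_on_cuda"] = False
        report["error"] = str(exc).splitlines()[0]
    return report


print(cuda_nms_report())
```

If nms_on_cuda comes back False while cuda_available is True, torch sees the GPU but the installed torchvision build has no CUDA kernels, which matches the situation described in the question.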

Related

RuntimeError: cuDNN version incompatibility

I wrote an LSTM NLP classifier with PyTorch in Google Colab and it worked well. Now I run it on Google Colab Pro, but I get this error:
RuntimeError: cuDNN version incompatibility: PyTorch was compiled against (8, 3, 2) but found runtime version (8, 0, 5). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN.one possibility is that there is a conflicting cuDNN in LD_LIBRARY_PATH.
I have no idea how to fix this. I'm using GPU on colab pro.
I've tried this link and it didn't work.
How I declared device:
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Fixed by upgrading cuDNN to 8.4.
Reference: https://github.com/JaidedAI/EasyOCR/issues/716
If you are using Google Colab, use this command:
!pip install --upgrade torch torchvision
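The error message itself points at a conflicting cuDNN in LD_LIBRARY_PATH, so one way to investigate is to list any cuDNN libraries found there and compare against the version PyTorch reports. This is a rough sketch under assumptions: the helper name find_cudnn_on_library_path is made up for illustration, and matching on the file name prefix libcudnn is a guess at how the conflicting library is named on the machine.

```python
import glob
import os


def find_cudnn_on_library_path(ld_library_path=None):
    """List files named libcudnn* in each LD_LIBRARY_PATH directory."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if directory:  # skip empty entries such as a trailing separator
            hits.extend(sorted(glob.glob(os.path.join(directory, "libcudnn*"))))
    return hits


print("cuDNN libraries on LD_LIBRARY_PATH:", find_cudnn_on_library_path())

try:
    import torch
    # Returns an integer such as 8302, or None if cuDNN is not usable.
    print("cuDNN version seen by PyTorch:", torch.backends.cudnn.version())
except ImportError:
    print("torch is not installed in this environment")
except RuntimeError as exc:
    # The incompatibility error from the question can surface here.
    print("cuDNN check failed:", exc)
```

An empty list suggests the mismatch comes from somewhere other than LD_LIBRARY_PATH.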

Segfault in pytorch on M1: torch.from_numpy(X).float()

I'm using an M1.
I'm trying to use pytorch for a conv net.
I have a numpy array that I'm trying to turn into a torch tensor.
When I call
torch.from_numpy(X)
pytorch throws an error that it got a double when it expected a float.
When I call
torch.from_numpy(X).float() on a friend's computer, everything is fine.
But when I call this command on my computer, I get a segfault.
Has anyone seen this / know what might be happening / know how to fix?
What's your PyTorch version? I encountered the same problem on my MacBook Pro M1, where my PyTorch version was 1.12.0 at first. Then I downgraded to version 1.10.0 and the problem was solved. I suspect this has something to do with M1 compatibility in newer torch versions.
Specifically, I first uninstalled torch with pip3 uninstall torch and then reinstalled it with pip3 install torch==1.10.0
If you are using torchvision or other affiliated packages, you may need to downgrade them too.
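When downgrading torch, the matching torchvision release line can be looked up from the compatibility matrix in the torchvision README. The sketch below hard-codes a few pairings from that matrix; the table is deliberately partial, and the names TORCHVISION_FOR_TORCH and matching_torchvision are made up for this example.

```python
# Partial torch -> torchvision pairing (major.minor only), as listed in the
# torchvision README compatibility matrix. Extend as needed.
TORCHVISION_FOR_TORCH = {
    "1.10": "0.11",
    "1.11": "0.12",
    "1.12": "0.13",
    "1.13": "0.14",
    "2.0": "0.15",
}


def matching_torchvision(torch_version):
    """Return the torchvision release line for a torch version, or None."""
    base = torch_version.split("+")[0]           # drop a tag such as +cu116
    major_minor = ".".join(base.split(".")[:2])  # "1.10.0" -> "1.10"
    return TORCHVISION_FOR_TORCH.get(major_minor)


print(matching_torchvision("1.10.0"))
```

For the downgrade described above, this points at the torchvision 0.11 line; the exact patch release to pin is best taken from the README itself.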

How to run Caffe2 on Macbook Pro M1 GPU

I was able to run PyTorch with Macbook Pro M1 Max GPU. However Caffe2 does not use the GPUs.
import torch
torch.device("mps")
from caffe2.python import core
WARNING:root:This caffe2 python run failed to load cuda module:No module named 'caffe2.python.caffe2_pybind11_state_gpu',and AMD hip module:No module named 'caffe2.python.caffe2_pybind11_state_hip'.Will run in CPU only mode.
I built PyTorch and Caffe2 from the nightly code using
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
BUILD_CAFFE2=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
Any suggestions on how to solve this?
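As a side note on the PyTorch half of this setup: constructing torch.device("mps") does not by itself verify that the MPS backend works. A hedged sketch of a more direct check follows; the helper name mps_status is hypothetical, and it says nothing about Caffe2, which prints the CPU-only warning above regardless.

```python
def mps_status():
    """Report whether this PyTorch build has MPS and can use it."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "mps_built": False, "mps_available": False}
    # torch.backends.mps is absent in older releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    return {
        "torch": torch.__version__,
        "mps_built": bool(mps and mps.is_built()),
        "mps_available": bool(mps and mps.is_available()),
    }


print(mps_status())
```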

Pytorch Training; "Runtime Error:PyTorch and torchvision versions are incompatible ..."

SOLUTION at the bottom!
I want to do Object Detection with this tutorial:
https://towardsdatascience.com/building-your-own-object-detector-pytorch-vs-tensorflow-and-how-to-even-get-started-1d314691d4ae
Although I have compatible versions of Pytorch, Torchvision and Cuda:
conda list torch gives me:
I get the following RunTime Error at the bottom:
RuntimeError: Couldn't load custom C++ ops. This can happen if your
PyTorch and torchvision versions are incompatible, or if you had
errors while compiling torchvision from source. For further
information on the compatible versions, check
https://github.com/pytorch/vision#installation for the compatibility
matrix. Please check your PyTorch version with torch.__version__ and
your torchvision version with torchvision.__version__ and verify if
they are compatible, and if not please reinstall torchvision so that
it matches your PyTorch install.
when running:
num_epochs = 10
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)#.to_fp16()
    lr_scheduler.step()
    evaluate(model, data_loader_test, device=device)
Is it really an error resulting from incompatibility of pytorch and torchvision?
Thank you very much.
SOLUTION:
I imported torchvision from the wrong directory. I found this out with the following:
import torchvision
print(torchvision.__path__)
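If importing torchvision itself raises the error, the same check can be done without importing it. This is a small sketch using only the standard library; the helper name module_location is an assumption for illustration.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would load from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current sys.path
    return spec.origin


for name in ("torch", "torchvision"):
    print(name, "->", module_location(name))
```

If the two paths point into different environments, or torchvision resolves to a local source checkout, that would explain a mismatch even when conda list shows compatible versions.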

How do I use a previous version of Keras (0.3.1) on Colaboratory?

I tried pip installing 0.3.1, but when I print the version it outputs 2.1.4.
!pip install keras==0.3.1
import keras
print(keras.__version__)
I am trying to train deepmask (https://github.com/abbypa/NNProject_DeepMask/) for which I specifically need 0.3.1.
Note that if you've already loaded keras, then the second import statement has no effect.
So first !pip install keras==0.3.1, then restart your kernel (ctrl-m . or Runtime -> Restart runtime) and then things should work as expected.
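To see whether the kernel is still holding on to the old module, the installed version can be compared with the version of the module already loaded in the process. The following is a sketch using only the standard library (Python 3.8 or newer); the helper name installed_vs_loaded is made up for this example.

```python
import sys
from importlib import metadata


def installed_vs_loaded(package):
    """Return (version installed on disk, version of the module loaded in memory)."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None
    module = sys.modules.get(package)  # None if not imported in this process
    loaded = getattr(module, "__version__", None) if module is not None else None
    return installed, loaded


print(installed_vs_loaded("keras"))
```

If the two values differ, the pip install succeeded but the running kernel still has the previous import cached, and a restart is needed.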
