RuntimeError: cuDNN version incompatibility - pytorch

I wrote an LSTM NLP classifier with PyTorch in Google Colab and it worked well. Now I run it on Google Colab Pro, but I get this error:
RuntimeError: cuDNN version incompatibility: PyTorch was compiled against (8, 3, 2) but found runtime version (8, 0, 5). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN.one possibility is that there is a conflicting cuDNN in LD_LIBRARY_PATH.
I have no idea how to fix this. I'm using GPU on colab pro.
I've tried this link and it didn't work.
How I declared device:
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Fixed by upgrading cuDNN to 8.4.
Reference: https://github.com/JaidedAI/EasyOCR/issues/716
If you are using Google Colab, use this command:
!pip install --upgrade torch torchvision
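The error message points at a possibly conflicting cuDNN in LD_LIBRARY_PATH. A minimal sketch for checking that on Linux, using only the standard library: it lists any libcudnn* files found in the directories on that path. The helper name find_cudnn_libraries is made up for illustration, and an empty result only means nothing was found on that path, not that no other cuDNN exists on the system.

```python
import glob
import os


def find_cudnn_libraries(ld_library_path=None):
    """Return libcudnn* files found in each directory of LD_LIBRARY_PATH."""
    if ld_library_path is None:
        # Fall back to the current environment; empty if the variable is unset.
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        hits.extend(sorted(glob.glob(os.path.join(directory, "libcudnn*"))))
    return hits


if __name__ == "__main__":
    for path in find_cudnn_libraries():
        print(path)
```

If this prints a cuDNN whose version differs from the one PyTorch was compiled against, that copy is a plausible candidate for the conflict.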


pytorch unable to run inference with GPU

I'm developing a project based on YOLOv7, but I started hitting an error where torch recognizes my GPU but torchvision throws a NotImplementedError.
This is the error:
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
I tried installing torchvision with CUDA built in, but that gave me the same error. I also tried reinstalling PyTorch, and that didn't work either.
The torchvision installed in my environment had no CUDA support, because it came from a plain pip install torchvision. For torchvision to run its operators on an Nvidia GPU, it has to be a CUDA-enabled build. To get one, install torch and torchvision together with the following command:
conda install pytorch torchvision torchaudio pytorch-cuda={CUDA version} -c pytorch -c nvidia
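One rough way to see which kind of build is installed is to look at the version string. This sketch assumes the wheel carries a local tag after a "+", such as "+cu117" or "+cpu"; builds from other channels often have no tag, so a result of None is inconclusive. The helper names local_build_tag and report are hypothetical.

```python
from importlib import metadata


def local_build_tag(version_string):
    """Return the part after '+' in a version such as '0.14.0+cu117', or None."""
    _, separator, tag = version_string.partition("+")
    return tag if separator else None


def report(packages=("torch", "torchvision")):
    """Print the installed version and build tag of each package, if present."""
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        print(f"{name}: {version} (build tag: {local_build_tag(version)})")


if __name__ == "__main__":
    report()
```

A torch tagged with a CUDA build next to a torchvision tagged "cpu" would match the situation described above.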

Segfault in pytorch on M1: torch.from_numpy(X).float()

I'm using an M1.
I'm trying to use pytorch for a conv net.
I have a numpy array that I'm trying to turn into a torch tensor.
When I call
torch.from_numpy(X)
pytorch throws an error that it got a double when it expected a float.
When I call
torch.from_numpy(X).float() on a friend's computer, everything is fine.
But when I call this command on my computer, I get a segfault.
Has anyone seen this / know what might be happening / know how to fix?
What's your PyTorch version? I ran into the same problem on my MacBook Pro M1, where my PyTorch version was 1.12.0 at first. Then I downgraded it to version 1.10.0 and the problem was solved. I suspect this has something to do with M1 compatibility in newer torch versions.
To be precise, I first uninstalled torch using pip3 uninstall torch and then reinstalled it with pip3 install torch==1.10.0
If you are using torchvision or other affiliated packages, you may need to downgrade them as well.

PyTorch training: "RuntimeError: PyTorch and torchvision versions are incompatible ..."

SOLUTION at the bottom!
I want to do Object Detection with this tutorial:
https://towardsdatascience.com/building-your-own-object-detector-pytorch-vs-tensorflow-and-how-to-even-get-started-1d314691d4ae
Although I have compatible versions of PyTorch, torchvision and CUDA:
conda list torch gives me:
I get the following RuntimeError at the bottom:
RuntimeError: Couldn't load custom C++ ops. This can happen if your
PyTorch and torchvision versions are incompatible, or if you had
errors while compiling torchvision from source. For further
information on the compatible versions, check
https://github.com/pytorch/vision#installation for the compatibility
matrix. Please check your PyTorch version with torch.__version__ and
your torchvision version with torchvision.__version__ and verify if
they are compatible, and if not please reinstall torchvision so that
it matches your PyTorch install.
when running:
num_epochs = 10
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)  # .to_fp16()
    lr_scheduler.step()
    evaluate(model, data_loader_test, device=device)
Is it really an error resulting from incompatibility of pytorch and torchvision?
Thank you very much.
SOLUTION:
I imported torchvision from the wrong directory. I found this out with the following:
import torchvision
print(torchvision.__path__)
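The check above has to import torchvision first, which can itself fail when the wrong copy is picked up. As a sketch, importlib.util.find_spec from the standard library reports where a top-level module would be loaded from without importing it. The helper name module_location is made up for illustration.

```python
import importlib.util


def module_location(name):
    """Return the path a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing by that name is importable
    if spec.submodule_search_locations:
        # Packages expose their directory here.
        return list(spec.submodule_search_locations)[0]
    return spec.origin  # plain modules expose their file


if __name__ == "__main__":
    for package in ("torch", "torchvision"):
        print(package, "->", module_location(package))
```

If the printed directory is a local source checkout or another environment's site-packages, that would explain the "Couldn't load custom C++ ops" message even when the installed versions match.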

RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

I am running code that I downloaded from GitHub. It is supposed to work (I saw that other people managed to run it). When I try to run it, I get the following error message:
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS
The code uses pytorch 0.4.1. I have cuda installed.
When I run the command cat /usr/local/cuda/version.txt
I get the answer:
CUDA Version 10.0.130
When I run the command conda list -n <my env name>
I see:
cudatoolkit ver 9.0
cudnn ver 7.6.5
And now, my question:
What should I do to avoid this error?
Do I need to use pip install for a more recent version of cudnn? If so, which one?
I faced the same issue. In my case, the PyTorch version was 0.4.1 and the CUDA version was 9.0. I solved the issue by adding this line of code:
torch.backends.cudnn.benchmark = True
Try this:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("working on gpu")
else:
    device = torch.device("cpu")
    print("working on cpu")

module 'tensorflow_hub' has no attribute 'KerasLayer'

When I try to retrain the model with TensorFlow, it shows an error:
module 'tensorflow_hub' has no attribute 'KerasLayer'
The code is:
print("Building model with", MODULE_HANDLE)
model = tf.keras.Sequential([
    hub.KerasLayer(MODULE_HANDLE, output_shape=[FV_SIZE],
                   trainable=do_fine_tuning),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(train_generator.num_classes,
                          activation='softmax',
                          kernel_regularizer=tf.keras.regularizers.l2(0.0001))
])
model.build((None,)+IMAGE_SIZE+(3,))
model.summary()
The error is like:
1 print("Building model with", MODULE_HANDLE)
2 model = tf.keras.Sequential([
----> 3 hub.KerasLayer(MODULE_HANDLE, output_shape=[FV_SIZE],
4 trainable=do_fine_tuning),
5 tf.keras.layers.Dropout(rate=0.2),
AttributeError: module 'tensorflow_hub' has no attribute 'KerasLayer'
I am using TensorFlow Hub to retrain an existing hub model by adding new dense, fully connected layers. When I run the code, it shows the error above. Does anyone have an idea what causes this? Please help.
Please check the tensorflow version. It should be a recent nightly version.
When I use a version like 1.13.1, I see the following warning before the error, no attribute 'KerasLayer':
W0423 20:04:16.453974 139707130586880 __init__.py:56] Some hub symbols are not available because TensorFlow version is less than 1.14
After running pip install tf-nightly, everything works fine.
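The warning quoted above names 1.14 as the minimum TensorFlow version for the missing hub symbols. A small sketch of that comparison on a version string follows. The threshold is taken from the warning text only, the function names are hypothetical, and this is not how tensorflow_hub itself performs the check.

```python
import re


def version_tuple(version_string):
    """Return the leading major.minor of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def hub_symbols_available(tf_version, minimum=(1, 14)):
    """True if tf_version is at least the minimum named in the warning."""
    return version_tuple(tf_version) >= minimum


if __name__ == "__main__":
    for candidate in ("1.13.1", "1.14.0", "2.0.0-alpha0"):
        print(candidate, hub_symbols_available(candidate))
```

In a real session the string to pass in would be tf.__version__.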
https://www.tensorflow.org/hub
For the BatchNormalizationv1 issue, you can use the TF 2.0 nightly, which should also take care of the original issue.
pip install -U tf-nightly-2.0-preview
https://github.com/tensorflow/tfjs/issues/1255
hub.KerasLayer works with TF2 pre-releases:
pip install tf-nightly-2.0-preview --quiet
pip install tensorflow==2.0.0-alpha
Pre-release candidate for GPU:
pip install -U --pre tensorflow-gpu
