cuDNN version incompatibility - pytorch

Exception has occurred: RuntimeError
cuDNN version incompatibility: PyTorch was compiled against (8, 5, 0) but found runtime version (8, 3, 3). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN.
File "/data/home/hemantmishra/Co-GAT/main.py", line 56, in <module>
model = model.cuda()
How to solve this?
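A small diagnostic sketch, based only on the hint in the error message itself (not an answer given in this thread): print the cuDNN version the running PyTorch actually loads and check whether LD_LIBRARY_PATH points at another cuDNN install that could shadow the bundled one.
import os
import torch
# cuDNN version the running PyTorch has actually loaded (e.g. 8303 for 8.3.3)
print("runtime cuDNN:", torch.backends.cudnn.version())
# CUDA version this PyTorch build was compiled with
print("compiled CUDA:", torch.version.cuda)
# a system-wide cuDNN on this path can shadow the one bundled with the pip wheel
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", ""))
If a stray cuDNN shows up there, launching with that entry removed (for example LD_LIBRARY_PATH= python main.py) may let PyTorch fall back to its bundled library.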

Related

RuntimeError: cuDNN version incompatibility

I wrote an LSTM NLP classifier with PyTorch in Google Colab and it worked well. Now I run it on Google Colab Pro, but I get this error:
RuntimeError: cuDNN version incompatibility: PyTorch was compiled against (8, 3, 2) but found runtime version (8, 0, 5). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN. One possibility is that there is a conflicting cuDNN in LD_LIBRARY_PATH.
I have no idea how to fix this. I'm using a GPU on Colab Pro.
I've tried this link and it didn't work.
How I declared device:
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Fixed via upgrading cuDNN to 8.4.
Reference: https://github.com/JaidedAI/EasyOCR/issues/716
If you are using Google Colab, use this command:
!pip install --upgrade torch torchvision
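A quick check after the upgrade (my own suggestion, not part of the answer above) to confirm the versions now line up:
import torch
print(torch.__version__)                  # upgraded torch build
print(torch.backends.cudnn.version())     # cuDNN in use, an integer like 8400 for 8.4.x
print(torch.cuda.is_available())          # True if the Colab GPU runtime is visible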

Error in loading ONNX model with ONNXRuntime

I'm converting a customized PyTorch model to ONNX. However, when loading it with ONNXRuntime, I've encountered an error as follows:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: ...onnxruntime/core/providers/cpu/tensor/transpose.h:46 onnxruntime::TransposeBase::TransposeBase(const onnxruntime::OpKernelInfo &) v >= 0 && static_cast<uint64_t>(v) <= std::numeric_limits<size_t>::max() was false.
I've checked with onnx.checker.check_model() and it's totally fine.
I've also tried replacing transpose() with permute() in the forward() function, but the error still remains.
Is anyone familiar with this error?
Environments:
Python 3.7
Pytorch 1.9.0
CUDA 10.2
ONNX 1.10.1
ONNXRuntime 1.8.1
OS Ubuntu 18.04
The perm attribute of node Transpose_52 is [-1, 0, 1], although ONNX Runtime requires all of its entries to be non-negative: onnxruntime/core/providers/cpu/tensor/transpose.h#L46
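One possible workaround, sketched here as an assumption rather than taken from the answer above: rewrite the negative perm entries of the exported Transpose nodes into their non-negative equivalents before handing the model to ONNX Runtime. The file names are placeholders.
import onnx

model = onnx.load("model.onnx")
for node in model.graph.node:
    if node.op_type == "Transpose":
        for attr in node.attribute:
            if attr.name == "perm":
                rank = len(attr.ints)
                # e.g. [-1, 0, 1] becomes [2, 0, 1]
                fixed = [axis % rank for axis in attr.ints]
                del attr.ints[:]
                attr.ints.extend(fixed)

onnx.checker.check_model(model)
onnx.save(model, "model_fixed.onnx")
If you can, using explicit non-negative dimensions in the permute/transpose calls before exporting may avoid the issue at the source.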

OSError: loading an h5 saved model in tensorflow keras after updating the environment in anaconda on windows with python 3.7

I am receiving an OSError (without any other text) from h5py when loading an h5 model created with keras-tensorflow after updating my environment, or when working with an up-to-date environment.
I trained some models with Keras and TF in the older versions, and also with keras-tf v1.15, and saved them using model.save('filename.h5'). Afterwards I am able to load them and work with them further, first using keras.load_model and now tensorflow.keras.models.load_model, without any problems, only receiving some warnings that my TF version was not compiled to use the AVX2 instructions and so on.
The version installed is tensorflow 1.15 using pip install tensorflow-cpu and it seems to work well; my environment is Anaconda3-2020.02-Windows-x86_64, installed from the Anaconda binaries on Windows.
After trying to change the packages to tensorflow-mkl, and needing to update my environment because of environment conflicts (which show up even with a fresh install of Anaconda), the OSError raised by h5py appears.
Using the default environment packages from the Anaconda binary with tf-cpu seems to work fine, also when cloning the environment. When updating the environment with conda update --all, the error is raised with either tf-cpu or tf-mkl.
The version of h5py in both cases is: '2.10.0' and the error is the following:
Traceback (most recent call last):
File "C:\Users\Oscar\bwSyncAndShare\OPT_PV22WP_intern\pv2wp_control\SIM\Sim_future.py", line 88, in <module>
model = load_model(pathfile_model)
File "C:\Users\Oscar\anaconda3\envs\optimizer2\lib\site-packages\tensorflow_core\python\keras\saving\save.py", line 142, in load_model
isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):
File "C:\Users\Oscar\anaconda3\envs\optimizer2\lib\site-packages\h5py\_hl\base.py", line 44, in is_hdf5
return h5f.is_hdf5(filename_encode(fname))
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 156, in h5py.h5f.is_hdf5
OSError
Has anyone had this problem?
I have tried training a model with the updated environment and saving it; when loading I get the same error.
Updating to tf-cpu v2.3.1 with the base environment and then loading works as well.
Creating a new env with conda create -n name python==3.7.x anaconda and then installing tf doesn't work.
I think some other library is causing the problem, but I cannot figure out which one.
I used hd5 instead of h5 as the file extension, and that solved the problem.
I can load my deep model in Colab, but when I want to load that model on my PC, I can't.
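A small isolation sketch (my own assumption about where to look, not from this thread): the traceback fails inside h5py's is_hdf5 before Keras even parses the file, so it can help to test the path and h5py directly. The path below is a placeholder.
import os
import h5py

path = r"C:\path\to\filename.h5"   # placeholder

print(os.path.exists(path))        # is the file actually at this path?
print(h5py.version.version)        # h5py build in this environment
print(h5py.version.hdf5_version)   # HDF5 library it was built against

print(h5py.is_hdf5(path))          # the call that raises in the traceback above
with h5py.File(path, "r") as f:    # if this opens, the file itself is readable
    print(list(f.keys()))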

Running and building Pytorch on Google Colab

I am trying to run a Python package that requires pytorch-gpu. I have changed the runtime type of my Colab notebook to GPU. When I run the command, I am facing the following error. Not sure if I am able to build PyTorch on Colab myself?
Traceback (most recent call last):
File "inference_unet.py", line 9, in <module>
import torchvision.transforms as transforms
File "/usr/local/lib/python3.6/dist-packages/torchvision/__init__.py", line 10, in <module>
from .extension import _HAS_OPS
File "/usr/local/lib/python3.6/dist-packages/torchvision/extension.py", line 58, in <module>
_check_cuda_version()
File "/usr/local/lib/python3.6/dist-packages/torchvision/extension.py", line 54, in _check_cuda_version
.format(t_major, t_minor, tv_major, tv_minor))
RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=10.2 and torchvision has CUDA Version=10.1. Please reinstall the torchvision that matches your PyTorch install.
Now you can directly use pytorch-gpu on Google Colab; no installation is needed.
Just change your runtime to GPU, import torch and torchvision, and you are done.
I have attached a screenshot showing just that.
Hope you find the answer helpful.
But in case you want to install a different version of pytorch or any other package, you can install it using pip; just add ! before your pip command and run the cell.
For example:
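(The command below is only a hypothetical illustration; the screenshot with the original poster's exact command is not reproduced here. Package names and versions are placeholders for whatever you need.)
# run inside a Colab cell; the "!" hands the line to the shell
!pip install --upgrade torch torchvision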

Lasagne vs Theano possible version mismatch (Windows)

So I finally managed to get Theano up and running on the GPU using this guide (the test code runs fine, telling me it used the GPU, yay!).
I then wanted to try it out and followed this guide for training a CNN on digit recognition.
The problem is: I get errors from the way Lasagne calls Theano (I guess there is a version mismatch here):
Using gpu device 0: GeForce GT 730M (CNMeM is disabled, cuDNN not available)
Traceback (most recent call last):
File "C:\Users\Soren Jensen\Desktop\CNN-test\CNNTest-one.py", line 7, in <module>
import lasagne
File "C:\Users\Soren Jensen\Anaconda3\lib\site-packages\lasagne\__init__.py", line 19, in <module>
from . import layers
File "C:\Users\Soren Jensen\Anaconda3\lib\site-packages\lasagne\layers\__init__.py", line 7, in <module>
from .pool import *
File "C:\Users\Soren Jensen\Anaconda3\lib\site-packages\lasagne\layers\pool.py", line 6, in <module>
from theano.tensor.signal import downsample
ImportError: cannot import name 'downsample'
Press any key to continue . . .
From reading about the error message, it seems that 'downsample' was changed, so why is my Lasagne still calling it?
Trying to update my Lasagne version gives:
C:\WINDOWS\system32>pip3.5 install Lasagne==0.1
Collecting Lasagne==0.1
Requirement already satisfied: numpy in c:\users\soren jensen\anaconda3\lib\site-packages (from Lasagne==0.1)
and running this code sample:
import theano
import os
print(theano.config.compiledir)
print("Theano version %s" % theano.__version__)
theano_dir = os.path.dirname(theano.__file__)
print("theano is installed in %s" % theano_dir)
reveals that Python 3.5 uses Theano v0.9:
Using gpu device 0: GeForce GT 730M (CNMeM is disabled, cuDNN not available)
C:\theano_compiledir\compiledir_Windows-10-10.0.14393-SP0-Intel64_Family_6_Model_58_Stepping_9_GenuineIntel-3.5.2-64
Theano version 0.9.0.dev-e5bedc0de240eca42433c34c05fc00f4a5ef6cbe
theano is installed in C:\Users\Soren Jensen\Anaconda3\lib\site-packages\theano\theano
Press any key to continue . . .
Sorry for the long post, but I'm going a little crazy over this not working. Maybe I am wrong about the version mismatch and the error is something else?
Try to reinstall Theano and Lasagne like this:
pip install --upgrade https://github.com/Theano/Theano/archive/master.zip
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
Because: "An even more recent version of Theano will often work as well, but at the time of writing, a simple pip install Theano will give you a version that is too old."
Read more: lasagne.readthedocs.io/en/latest/user/installation.html
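After reinstalling, a quick sanity check (my own sketch, not part of the answer above): the pool module is what superseded the removed downsample import, so if everything below runs, Lasagne and Theano should be compatible again.
import theano
import lasagne
from theano.tensor.signal import pool  # module that superseded 'downsample'

print("Theano", theano.__version__)
print("Lasagne", lasagne.__version__)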
