cuDNN error while running PyTorch code on GPU - pytorch

I have the following error:
Traceback (most recent call last):
File "odenet_mnist.py", line 343, in <module>
logits = model(x)
File "/home/subhashnerella/.conda/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/subhashnerella/.conda/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/subhashnerella/.conda/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/subhashnerella/.conda/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
This is not my code. I am just trying to run the code of a recent paper, "Neural Ordinary Differential Equations" by Chen et al.
Here is the link to the code.
gpu: nvidia 2080Ti
pytorch version:'1.0.1.post2'
cuda 9.0
python 3.7.2
cudnn:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
I am new to PyTorch. Why am I getting this error, and how do I fix it?

The RTX 2080 Ti (Turing, compute capability 7.5) needs CUDA 10 or newer to work properly. Install the PyTorch binaries built with CUDA 10.
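One way to check whether an installed build can drive a given card is to compare the CUDA version PyTorch was compiled with (torch.version.cuda) against the card's compute capability (torch.cuda.get_device_capability()). The helper below is a hypothetical sketch, not part of PyTorch. Its table covers only the two GPUs that appear on this page, using NVIDIA's release history: Turing (sm_75) support arrived in CUDA 10.0 and GA10x Ampere (sm_86) in CUDA 11.1.

```python
# Hypothetical lookup table: minimum CUDA toolkit release able to target a
# compute capability. Only the GPUs discussed on this page are listed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
    (8, 6): (11, 1),  # Ampere GA10x, e.g. RTX 3090 Ti
}


def parse_version(text):
    """Turn a version string such as '9.0' or '10.0.130' into (major, minor)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_build_supports(cuda_version, capability):
    """Return True/False if a build against `cuda_version` can target `capability`.

    Returns None when `capability` is not in the table (unknown to this sketch).
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    # Tuples compare element-wise: (9, 0) < (10, 0) < (10, 2) < (11, 1)
    return parse_version(cuda_version) >= required
```

For the asker's setup, cuda_build_supports("9.0", (7, 5)) returns False, consistent with the answer. With PyTorch installed, the live check would be cuda_build_supports(torch.version.cuda, torch.cuda.get_device_capability(0)); torch.version.cuda is None on CPU-only builds, so test for that first. A new-enough CUDA version is necessary but not sufficient: the binary must also be compiled for the card's architecture, as the sm_86 warning further down this page shows.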

Related

RuntimeError: The 'data' object was created by an older version of PyG

Thanks for your great contribution to science.
I have installed the following pytorch and pytorch_geometric versions, as you mentioned in this link:
conda create -n tox-env python=3.6
conda install pytorch=1.6.0 torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install torch-scatter==2.0.6 torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install torch-geometric==2.0.0
The reason is that I am trying to run the code from a GitHub repository. When it reaches this line, it raises an error with the latest version of PyTorch, so I downgraded PyG and PyTorch. However, now I get the following error:
/home/es/anaconda3/envs/tox-env/bin/python /home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py
/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/cuda/__init__.py:125: UserWarning:
NVIDIA GeForce RTX 3090 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
tox21
Iteration: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py", line 131, in <module>
main("tox21", "model_gin/supervised_contextpred.pth", "gin", True, True, True, 0.1, 5)
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py", line 105, in main
support_grads = model(epoch)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/meta_model.py", line 183, in forward
for step, batch in enumerate(tqdm(support_loaders[task], desc="Iteration")):
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 198, in __getitem__
data = self.get(self.indices()[idx])
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/loader.py", line 142, in get
for key in self.data.keys:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 103, in keys
for store in self.stores:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 393, in stores
return [self._store]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 341, in __getattr__
"The 'data' object was created by an older version of PyG. "
RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.
Process finished with exit code 1
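The last line of the traceback names the fix itself: remove the processed/ directory in the dataset's root folder so the data is re-processed by the installed PyG version. A minimal sketch of that step, assuming the dataset root is the folder that directly contains processed/; clear_processed_dir is a hypothetical helper, and raw/ is left untouched.

```python
import os
import shutil


def clear_processed_dir(dataset_root):
    """Delete `<dataset_root>/processed` so the dataset is rebuilt on the next load.

    Returns True if a directory was removed, False if there was none.
    Other folders under `dataset_root` (e.g. `raw/`) are left untouched.
    """
    processed = os.path.join(dataset_root, "processed")
    if os.path.isdir(processed):
        shutil.rmtree(processed)
        return True
    return False
```

Call it once with the root folder passed to the dataset class in loader.py (for example clear_processed_dir("dataset/tox21"), a hypothetical path), then rerun main.py. This assumes the repository's dataset class regenerates processed/ from raw/ when the folder is missing, as PyG's Dataset does by default.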

ValueError: Dimension 0 in both shapes must be equal, but are 0 and 512 when using TensorFlow 2.2

I am trying to run the PFE model. Running eval_lfw works with TensorFlow 2.1 and TensorFlow 1.x, but with TensorFlow 2.2 and later I get this error:
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].
It happens while the model is loading, at the call saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope), which imports my model's meta file.
To reproduce the error, download https://drive.google.com/drive/folders/10RnChjxtSAUc1lv7jbm3xkkmhFYyZrHP?usp=sharing
and run eval_lfw with the parameters --model_dir pretrained/PFE_sphere64_msarcface_am --dataset_path data/Dataset --protocol_path ./proto/pairs_dataset.txt
Thank you for your help
Traceback (most recent call last):
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 78, in
main(args)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 51, in main
network.load_model(args.model_dir)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/network.py", line 169, in load_model
saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1462, in import_meta_graph
**kwargs)[0]
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 501, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].

RuntimeError: cuda runtime error (100). The GPU is enabled but it still gives an error

I am new to Google Colab and PyTorch. I am running a PyTorch model, but it gives me a CUDA runtime error in Google Colab. The GPU is enabled in Colab, but the error persists. The GPU description is shown in the image below. Can anyone please help me out?
[Image: torch GPU]
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
File "run.py", line 338, in <module>
main()
File "run.py", line 303, in main
model = model.cuda()
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in cuda
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 376, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 190, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:47
Read prediction from logs/logs_sparc_editsql/valid_use_predicted_queries_predictions.json
Never mind: I was setting CUDA_VISIBLE_DEVICES to 5.
It should list the indices of GPUs that actually exist on the machine (on a single-GPU Colab instance, just 0), not an arbitrary number.
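CUDA_VISIBLE_DEVICES takes a comma-separated list of GPU indices (or GPU UUIDs), not a count. NVIDIA's CUDA documentation states that when an index is invalid, only the devices listed before it are visible. The hypothetical helper below models that rule for plain integer indices only (UUIDs are not handled), which shows why "5" on a one-GPU machine leaves no device at all.

```python
def visible_gpu_indices(value, physical_gpu_count):
    """Return the physical GPU indices CUDA would expose for a CUDA_VISIBLE_DEVICES value.

    Only plain integer indices are modelled. Parsing stops at the first entry that
    is not a valid index on this machine, so everything after it is ignored too.
    """
    visible = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry.isdigit() or int(entry) >= physical_gpu_count:
            break  # invalid index: this and all later entries are hidden
        visible.append(int(entry))
    return visible
```

Here visible_gpu_indices("5", 1) returns [], matching the "no CUDA-capable device is detected" error. To expose the Colab GPU, leave the variable unset or set it to "0" before PyTorch first initializes CUDA, e.g. os.environ["CUDA_VISIBLE_DEVICES"] = "0" at the top of run.py. Inside the process, the visible GPUs are renumbered starting from cuda:0.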

How to fix AttributeError: module 'tensorflow' has no attribute 'space_to_depth'

I have Python 3.7.4 and TensorFlow 2.2.0 installed on Windows 10 x64.
I am trying to execute this:
yolo_model = load_model("model_data/yolo.h5")
and it raises the error in the title.
Here is the stack trace:
Traceback (most recent call last):
File "Object Detection.py", line 78, in <module>
yolo_model = load_model("model_data/yolo.h5")
File "E:\Python37\lib\site-packages\keras\engine\saving.py", line 492, in load_wrapper
return load_function(*args, **kwargs)
File "E:\Python37\lib\site-packages\keras\engine\saving.py", line 584, in load_model
model = _deserialize_model(h5dict, custom_objects, compile)
File "E:\Python37\lib\site-packages\keras\engine\saving.py", line 274, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "E:\Python37\lib\site-packages\keras\engine\saving.py", line 627, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "E:\Python37\lib\site-packages\keras\layers\__init__.py", line 168, in deserialize
printable_module_name='layer')
File "E:\Python37\lib\site-packages\keras\utils\generic_utils.py", line 147, in deserialize_keras_object
list(custom_objects.items())))
File "E:\Python37\lib\site-packages\keras\engine\network.py", line 1075, in from_config
process_node(layer, node_data)
File "E:\Python37\lib\site-packages\keras\engine\network.py", line 1025, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "E:\Python37\lib\site-packages\keras\backend\tensorflow_backend.py", line 75, in symbolic_fn_wrapper
return func(*args, **kwargs)
File "E:\Python37\lib\site-packages\keras\engine\base_layer.py", line 489, in __call__
output = self.call(inputs, **kwargs)
File "E:\Python37\lib\site-packages\keras\layers\core.py", line 716, in call
return self.function(inputs, **arguments)
File "/Users/kian/Desktop/floydhub/yolo-03-oct/YAD2K/yad2k/models/keras_yolo.py", line 32, in space_to_depth_x2
AttributeError: module 'tensorflow' has no attribute 'space_to_depth'
I think the location of the function is different. FYI:
tensorflow 1.x: tensorflow.space_to_depth
tensorflow 2.x: tensorflow.nn.space_to_depth
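Code that still calls tf.space_to_depth can be bridged on TensorFlow 2.x by aliasing the old name to tf.nn.space_to_depth before loading the model. This likely matters here: judging by the /Users/kian/... frame in the traceback, the failing space_to_depth_x2 is the Lambda function serialized inside yolo.h5, so editing a local keras_yolo.py may not be enough. The helpers below are hypothetical, and this is an untested workaround sketch; standalone Keras with TensorFlow 2.2 may still hit other incompatibilities.

```python
def resolve_first(module, candidates):
    """Return the first attribute that exists among dotted `candidates` on `module`."""
    for path in candidates:
        obj = module
        try:
            for part in path.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # this candidate is missing; try the next one
        return obj
    raise AttributeError(f"none of {candidates} found on {module!r}")


def alias_missing(module, name, candidates):
    """If `module` lacks `name`, bind it to the first existing candidate.

    Returns whatever `module.<name>` refers to afterwards.
    """
    if not hasattr(module, name):
        setattr(module, name, resolve_first(module, candidates))
    return getattr(module, name)
```

Usage, after import tensorflow as tf and before load_model("model_data/yolo.h5"): alias_missing(tf, "space_to_depth", ["nn.space_to_depth"]). On TensorFlow 1.x the attribute already exists, so the call leaves it unchanged.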
It seems to be a conflict between the TensorFlow and Keras versions. Downgrading TensorFlow to 1.14.0 and Keras to 2.3.1 fixed the problem.
I think the location of the function is different. FYI:
tensorflow 1.x: tensorflow.space_to_depth
tensorflow 2.x: tensorflow.nn.space_to_depth
Yes it worked.

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

I'm working with the project 'lda2vec-pytorch' on Google Colab,
running PyTorch 1.1.0:
https://github.com/TropComplique/lda2vec-pytorch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
cuda:0
I'm getting an exception in the forward method of the following class, at the line where 'noise' is passed through the embedding:
class negative_sampling_loss(nn.Module):
noise = self.multinomial.draw(batch_size*window_size*self.num_sampled)
noise = Variable(noise).view(batch_size, window_size*self.num_sampled)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
self.embedding = self.embedding.to(device)
#print("negative_sampling_loss::forward() self.embedding", self.embedding.is_cuda) This line gets an error.
# shape: [batch_size, window_size*num_sampled, embedding_dim]
noise = self.embedding(noise) # Exception HERE
Here's the stack trace:
Traceback (most recent call last):
File "train.py", line 36, in <module>
main()
File "train.py", line 32, in main
save_every=20, grad_clip=5.0
File "../utils/training.py", line 138, in train
neg_loss, dirichlet_loss = model(doc_indices, pivot_words, target_words)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "../utils/lda2vec_loss.py", line 82, in forward
neg_loss = self.neg(pivot_words, target_words, doc_vectors, w)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "../utils/lda2vec_loss.py", line 167, in forward
noise = self.embedding(noise)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/sparse.py", line 117, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1506, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'
Any ideas?
The tensor noise is on the CPU, while self.embedding is on the GPU. Send noise to the GPU as well:
noise = noise.to(device)
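A variant of the same fix that does not rely on a separately defined device variable is to take the target device from the embedding's own weight, so the lookup works whether the model sits on the CPU or the GPU. embed_on_same_device is a hypothetical helper; it only uses the standard nn.Embedding.weight attribute and Tensor.to.

```python
def embed_on_same_device(embedding, indices):
    """Look up `indices` in `embedding` after moving them to the embedding's device.

    `embedding` is expected to be a torch.nn.Embedding (anything with a `.weight`
    tensor works) and `indices` an integer (LongTensor) index tensor.
    """
    device = embedding.weight.device
    # .to() returns the same tensor when it is already on `device`
    return embedding(indices.to(device))
```

In negative_sampling_loss.forward, the failing line would become noise = embed_on_same_device(self.embedding, noise). Separately, nn.Module has no is_cuda attribute, which explains why the commented-out print fails; self.embedding.weight.is_cuda reports the device instead.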
