I've created a new environment using Anaconda and tried to run my model, which is built on PyTorch, but I am having trouble with an AssertionError. Here is my CUDA version from the nvcc --version command.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:24:09_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
I've installed PyTorch following the link https://pytorch.org/get-started/locally/, using the command below.
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
and I got:
>>> import torch
>>> print(torch.__version__)
1.11.0
>>>
It seems that I installed the right PyTorch version with the right CUDA version; however, I got the following error:
(torch) PS C:\Users\Administrator\Desktop> python .\torch_test.py
Traceback (most recent call last):
File ".\torch_test.py", line 136, in <module>
model = TFModel(24*7*2, 24*7, 512, 8, 4, 0.1).to(device)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 907, in to
return self._apply(convert)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
module._apply(fn)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
module._apply(fn)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
module._apply(fn)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 601, in _apply
param_applied = fn(param)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 905, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "C:\Users\Administrator\.conda\envs\torch\lib\site-packages\torch\cuda\__init__.py", line 210, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
How can I resolve the AssertionError?
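For reference, here is a minimal sketch of the checks I use to see whether the installed wheel is actually a CUDA build (these are all standard torch attributes, nothing specific to my script):

import torch

# A CPU-only wheel reports a version like "1.11.0+cpu" and torch.version.cuda is None.
print("torch version :", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("CUDA available :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device name   :", torch.cuda.get_device_name(0))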
Thanks for your great contribution to science!
I have installed the following pytorch and pytorch_geometric versions, as you mentioned in this link:
conda create -n tox-env python=3.6
conda install pytorch=1.6.0 torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install torch-scatter==2.0.6 torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install torch-geometric==2.0.0
The reason is that I am trying to run the code from a GitHub repository; when it reaches this line, it raises an error (in the latest version of pytorch). So I had to downgrade the PyG and pytorch versions; however, I am now getting the following error:
/home/es/anaconda3/envs/tox-env/bin/python /home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py
/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/cuda/__init__.py:125: UserWarning:
NVIDIA GeForce RTX 3090 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3090 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
tox21
Iteration: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py", line 131, in <module>
main("tox21", "model_gin/supervised_contextpred.pth", "gin", True, True, True, 0.1, 5)
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/main.py", line 105, in main
support_grads = model(epoch)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/meta_model.py", line 183, in forward
for step, batch in enumerate(tqdm(support_loaders[task], desc="Iteration")):
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
return self._process_data(data)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
data.reraise()
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
data = fetcher.fetch(index)
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 198, in __getitem__
data = self.get(self.indices()[idx])
File "/home/es/PycharmProjects/1-Meta-MGNN/Meta-MGNN/loader.py", line 142, in get
for key in self.data.keys:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 103, in keys
for store in self.stores:
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 393, in stores
return [self._store]
File "/home/es/anaconda3/envs/tox-env/lib/python3.6/site-packages/torch_geometric/data/data.py", line 341, in __getattr__
"The 'data' object was created by an older version of PyG. "
RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.
Process finished with exit code 1
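As the error message itself suggests, the stale processed/ cache (written by an older PyG version) has to be removed so the dataset is re-processed by the currently installed torch-geometric. A minimal sketch, assuming the dataset root that the repo's loader.py points at (the exact path below is a placeholder, not taken from the repo):

import shutil
from pathlib import Path

dataset_root = Path("path/to/tox21")        # placeholder: use the root loader.py actually uses
processed_dir = dataset_root / "processed"
if processed_dir.exists():
    shutil.rmtree(processed_dir)            # PyG rebuilds it on the next run
    print("Removed", processed_dir)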
I am new to Google Colab and PyTorch. I am running a PyTorch model, but it is giving me a CUDA runtime error in Google Colab. My GPU is enabled on Colab, but it still gives the error. The description of the GPU is available in the image below. Can anyone please help me out?
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
File "run.py", line 338, in <module>
main()
File "run.py", line 303, in main
model = model.cuda()
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in cuda
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 376, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 190, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:47
Read prediction from logs/logs_sparc_editsql/valid_use_predicted_queries_predictions.json
Never mind, I was setting CUDA_VISIBLE_DEVICES to 5.
It should be a valid device index for the GPUs you actually have (e.g., 0 on a single-GPU machine), as sketched below.
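A minimal sketch of the fix (the index 0 is just an example for a single-GPU machine; it must be set before CUDA is initialized, e.g. before the first .cuda() call):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"    # example: expose only the first GPU

import torch
print(torch.cuda.device_count())            # should now be >= 1
print(torch.cuda.is_available())            # should be True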
I am trying to run the code on GitHub (https://github.com/AayushKrChaudhary/RITnet).
I could not get the OpenEDS semantic segmentation dataset, so I downloaded a PNG image from the Internet and put it in Semantic_Segmentation_Dataset\test\ to run the test program.
That code gives the following error:
Traceback (most recent call last):
File "test.py", line 59, in <module>
for i, batchdata in tqdm(enumerate(testloader),total=len(testloader)):
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\site-packages\torch\utils\data\dataloader.py", line 291, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\site-packages\torch\utils\data\dataloader.py", line 737, in __init__
w.start()
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle cv2.CLAHE objects
(Machine_Learning) C:\Users\b0743\Downloads\RITnet-master\RITnet-master>Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\b0743\AppData\Local\Continuum\anaconda3\envs\Machine_Learning\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
and my environment is:
# Name Version
cudatoolkit 10.1.243
cudnn 7.6.5
keras-applications 1.0.8
keras-base 2.3.1
keras-gpu 2.3.1
keras-preprocessing 1.1.0
matplotlib 3.3.1
matplotlib-base 3.3.1
numpy 1.19.1
numpy-base 1.19.1
opencv 3.3.1
pillow 7.2.0
python 3.6.10
pytorch 1.6.0
scikit-learn 0.23.2
scipy 1.5.2
torchsummary 1.5.1
torchvision 0.7.0
tqdm 4.48.2
I don’t know if this is a stupid question, but I hope someone can try to answer it for me.
I literally just went into the dataset Python file and commented out all the parts that require OpenCV.
It turns out it works; you won't get that sweet CLAHE and the other preprocessing, but it works.
If you don't need the dataset machinery, just make a tensor out of the 640 by 400 image, put it in an empty array, and put that array in another array until you have a 4D tensor; pass it into the DNN, then put the output through the get_predictions function, and voilà, you have an array of eye features (see the sketch below).
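A rough sketch of that idea; model and get_predictions are the objects already used in the repo's test.py, and the grayscale scaling here is my assumption rather than the repo's CLAHE pipeline:

import numpy as np
import torch
from PIL import Image

# One 640x400 grayscale eye image -> (batch, channel, height, width) tensor.
img = Image.open("my_eye_image.png").convert("L").resize((640, 400))
x = torch.from_numpy(np.array(img, dtype=np.float32) / 255.0)   # (400, 640)
x = x.unsqueeze(0).unsqueeze(0)                                  # (1, 1, 400, 640)

with torch.no_grad():
    output = model(x)                 # model built exactly as in test.py
    pred = get_predictions(output)    # per-pixel eye-region labels
print(pred.shape)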
I have been using https://github.com/zalandoresearch/flair#example-usage and tried experimenting with flair, but I don't know why I am not able to use the GPU.
I tried the following:
>>> from flair.data import Sentence
>>> from flair.models import SequenceTagger
>>> sentence = Sentence('I love Berlin .')
>>> tagger = SequenceTagger.load('ner')
2019-07-20 17:52:15,062 loading file /home/vz/.flair/models/en-ner-conll03-v0.4.pt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/flair/nn.py", line 103, in load
model = cls._init_model_with_state_dict(state)
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py", line 205, in _init_model_with_state_dict
locked_dropout=use_locked_dropout,
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py", line 166, in __init__
self.to(flair.device)
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 386, in to
return self._apply(convert)
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 127, in _apply
self.flatten_parameters()
File "/home/vz/miniconda3/envs/gp/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Can anyone please tell me how to fix this error?
Thanks in advance.
The error was with my machine and its cuDNN requirement. I would suggest everyone install PyTorch with conda, so the install command should be something like this:
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
to eradicate any kind of issues with the installation.
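After reinstalling that way, a quick sanity check (just a sketch) that the GPU and cuDNN are actually usable:

import torch

print(torch.cuda.is_available())         # expect True
print(torch.backends.cudnn.is_available())
print(torch.backends.cudnn.version())    # e.g. 7605 for cuDNN 7.6.5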
I was using a pre-trained YOLO model for my object detection project. I downloaded the weights from someone else's Google Drive and am using the "YOLOv2" model from this GitHub repo.
My conda environment configuration:
Python 3.6.7 :: Anaconda, Inc.
keras 2.2.4
Tensorflow 1.13.1 backend
While running the program, I got the below error:
EDIT: Complete traceback
/home/anubh/anaconda3/envs/cMLdev/bin/python /snap/pycharm-professional/121/helpers/pydev/pydevconsole.py --mode=client --port=42727
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/home/anubh/PycharmProjects/add_projects/blendid_data_challenge'])
PyDev console: starting.
Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44)
[GCC 7.3.0] on linux
runfile('/home/anubh/PycharmProjects/codingforfun/machine_learning/deepLearningAI-ANG/week3/car_detection_for_autonomous_driving/pretrain_yolo_model_car_detection.py', wdir='/home/anubh/PycharmProjects/codingforfun/machine_learning/deepLearningAI-ANG/week3/car_detection_for_autonomous_driving')
Using TensorFlow backend.
2019-03-20 11:08:41.522694: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
XXX lineno: 31, opcode: 0
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/snap/pycharm-professional/121/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/snap/pycharm-professional/121/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/anubh/PycharmProjects/codingforfun/machine_learning/deepLearningAI-ANG/week3/car_detection_for_autonomous_driving/pretrain_yolo_model_car_detection.py", line 89, in <module>
main()
File "/home/anubh/PycharmProjects/codingforfun/machine_learning/deepLearningAI-ANG/week3/car_detection_for_autonomous_driving/pretrain_yolo_model_car_detection.py", line 86, in main
out_scores, out_boxes, out_classes = predict(sess, "test.jpg")
File "/home/anubh/PycharmProjects/codingforfun/machine_learning/deepLearningAI-ANG/week3/car_detection_for_autonomous_driving/pretrain_yolo_model_car_detection.py", line 66, in predict
yolo_model,class_names, scores, boxes,classes = build_graph(summary_needed=1)
File "/home/anubh/PycharmProjects/codingforfun/machine_learning/deepLearningAI-ANG/week3/car_detection_for_autonomous_driving/pretrain_yolo_model_car_detection.py", line 30, in build_graph
yolo_model = load_model("model_data/yolo.h5") # (m, 19, 19, 5, 85) tensor
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/engine/saving.py", line 458, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/layers/__init__.py", line 55, in deserialize
printable_module_name='layer')
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
list(custom_objects.items())))
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/engine/network.py", line 1032, in from_config
process_node(layer, node_data)
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/engine/network.py", line 991, in process_node
layer(unpack_singleton(input_tensors), **kwargs)
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/home/anubh/anaconda3/envs/cMLdev/lib/python3.6/site-packages/keras/layers/core.py", line 687, in call
return self.function(inputs, **arguments)
File "/home/don/tensorflow/yad2k/yad2k/models/keras_yolo.py", line 31, in space_to_depth_x2
SystemError: unknown opcode
I found two threads (1 & 2) that try to answer
how to get a TensorFlow binary compiled to support my CPU instructions.
I also found an easy workaround in someone's GitHub issue, but the reasoning wasn't clear at all; they were just using trial and error.
But my question is:
In the same environment configuration, I have used ResNet-50 and VGG-16 models for the image classification task, and much other Keras functionality with the TensorFlow backend as well as TensorFlow directly. All of it works with no such error!
So what is special about the incompatibility between TensorFlow and the YOLOv2 model? Could anyone also explain why this happens, which TensorFlow versions would work, and how to decide that before working with any model?