I saved the model parameters on my computer (which has a GPU and CUDA). The weights were saved in GPU mode. When I then try to load the weights on the same computer, I still get an error:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
The issue is related to the fact that torch.cuda.is_available() returns False.
The issue disappears after restarting the computer but then reappears after some time.
Try adding the map_location argument to torch.load, i.e., modify your code
from torch.load(model_weights)
to torch.load(model_weights, map_location=torch.device('cuda:0'))
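Here is a minimal sketch of a device-aware variant. It assumes model_weights is the path of a state_dict written with torch.save. Because map_location='cuda:0' still cannot work while torch.cuda.is_available() returns False, the sketch falls back to the CPU in that case. The tiny nn.Linear model and the temporary file name are hypothetical stand-ins.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_weights(path: str) -> dict:
    """Load a saved state_dict onto the GPU if one is usable, else onto the CPU."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # map_location remaps every storage in the file to `device`,
    # regardless of the device the tensors were saved from.
    return torch.load(path, map_location=device)


# Hypothetical round trip: save a small model's weights, then load them back.
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "model_weights.pth")
torch.save(model.state_dict(), path)

state = load_weights(path)
model.load_state_dict(state)
print(next(iter(state.values())).device)
```

If CUDA disappears again, as described above, the same call keeps working on the CPU instead of raising the deserialization error.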
I am using an A100-SXM4-40GB GPU, but training is terribly slow. I tried two models: a simple classifier on CIFAR and a U-Net on Cityscapes. The same code works totally fine on other GPUs, and I do not know why training on this high-end GPU is so slow.
I would appreciate any help.
Here are some other properties of the GPUs:
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB
Nvidia driver version: 460.32.03
cuDNN version: Could not collect
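The listing above resembles the report printed by python -m torch.utils.collect_env, where "cuDNN version: Could not collect" means cuDNN was not detected. The sketch below checks from inside Python which CUDA and cuDNN builds PyTorch itself is using. It only reports information and does not diagnose the slowdown by itself.

```python
import torch


def describe_cuda_setup() -> dict:
    """Collect the CUDA/cuDNN details PyTorch sees at runtime."""
    info = {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        # Integer-encoded cuDNN version, or None if cuDNN is unavailable.
        "cudnn_version": torch.backends.cudnn.version(),
        "gpus": [],
    }
    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            info["gpus"].append(f"{torch.cuda.get_device_name(i)} (sm_{major}{minor})")
    return info


for key, value in describe_cuda_setup().items():
    print(f"{key}: {value}")
```

An A100 should report sm_80. That architecture needs a PyTorch build made with CUDA 11.0 or newer, so cuda_build is worth comparing against that.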
Thank you for your answer. Before trying it, I uninstalled and reinstalled Anaconda, and this solved the problem.
Call .cuda() on the model during initialization.
According to your comments above, you have GPUs and CUDA installed, so there's no point in checking device availability with torch.cuda.is_available().
Additionally, you should wrap your model in nn.DataParallel to allow PyTorch to use every GPU you expose to it. You could also use DistributedDataParallel, but DataParallel is easier to grasp initially.
Example initialization:
model = UNet().cuda()
model = torch.nn.DataParallel(model)
You can also make sure the script sees all GPUs by running it with the following environment variable:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_unet.py
One last note: nn.DataParallel wraps the model itself, so to save the state_dict you need to access the module attribute inside DataParallel:
torch.save(model.module.state_dict(), 'unet.pth')
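The answer's initialization and saving steps can be put together in a short sketch. TinyNet is a hypothetical stand-in for UNet. Unlike the answer, the sketch guards the .cuda() call so that it also runs on a machine without a GPU.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the UNet in the answer above."""

    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)


model = TinyNet()
if torch.cuda.is_available():
    model = model.cuda()
# With several visible GPUs, DataParallel splits each batch across them.
# Without CUDA it simply calls the wrapped module, so this also runs on a CPU.
model = nn.DataParallel(model)

# The wrapper registers the real network as the submodule "module",
# so its own state_dict keys carry a "module." prefix.
print(list(model.state_dict())[:1])  # ['module.conv.weight']

# model.module.state_dict() yields the plain keys that TinyNet() expects.
path = os.path.join(tempfile.mkdtemp(), "unet.pth")
torch.save(model.module.state_dict(), path)

restored = TinyNet()
restored.load_state_dict(torch.load(path, map_location="cpu"))
```

Saving model.module.state_dict() means the checkpoint can later be loaded into an unwrapped model without stripping the "module." prefix.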
I traced my neural network using torch.jit.trace on a server with a CUDA-capable GPU. Reloading the trace on the same server works fine. However, after downloading it onto my laptop (for quick testing), I get the following error when I try to load the trace:
RuntimeError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. 'aten::empty_strided' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
Can I not switch between GPU and CPU on a trace? Or is there something else going on?
I had this exact same issue. In my model I had one line of code that was causing this:
if torch.cuda.is_available():
    weight = weight.cuda()
If you have a look at the official documentation for trace (https://pytorch.org/docs/stable/generated/torch.jit.trace.html) you will see that
the returned ScriptModule will always run the same traced graph on any input. This has some important implications when your module is expected to run different sets of operations, depending on the input and/or the module state
So, if the model was traced on a machine with a GPU, this operation is recorded in the graph, and you won't even be able to load the model on a CPU-only machine. To solve this, remove everything that makes your model CUDA-dependent. In my case, it was as easy as deleting the code block above.
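If the weight is still needed on the GPU, one alternative is to register it as a buffer instead of calling .cuda() inside the model. A buffer moves with model.to(device) and is saved together with the trace. The sketch below assumes the weight is a fixed tensor. The Scale module and the file name are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical module with a fixed weight tensor."""

    def __init__(self) -> None:
        super().__init__()
        # A buffer follows model.to(device) and is stored in the saved trace,
        # so forward() needs no explicit .cuda() call for tracing to record.
        self.register_buffer("weight", torch.tensor([1.0, 2.0, 3.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.weight


model = Scale().eval()
example = torch.ones(2, 3)
traced = torch.jit.trace(model, example)

path = os.path.join(tempfile.mkdtemp(), "scale_traced.pt")
traced.save(path)

# map_location remaps the saved tensors (here: the buffer) to the CPU.
# It cannot undo a .cuda() call that was recorded in the traced graph.
loaded = torch.jit.load(path, map_location="cpu")
print(loaded(example))
```

This is why removing the .cuda() call matters: map_location only moves stored tensors, while a device hard-coded in the traced operations stays in the graph.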
I am training a network on a GPU with PyTorch. However, after at most 3 epochs, the code stops with the message:
Killed
No other error message is given.
I monitored memory and GPU usage, and there was still free space during the run. I checked /var/sys/dmesg for a detailed message about this, but no message containing "kill" was logged. What might be the problem?
Cuda version: 9.0
Pytorch version: 1.1.0
If you have root access, you can check whether this is a memory issue with the dmesg command.
In my case, the process was killed by the kernel because the machine ran out of memory.
I found the cause: I was saving tensors that require grad to a list, and each of them keeps an entire computation graph alive, which consumes significant memory.
I fixed the issue by saving .detach()-ed tensors to the list instead of the tensors returned by the loss function.
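A minimal sketch of that fix in a training loop. The linear model and the random batches are hypothetical placeholders. Only the append line reflects the change described above.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

history = []  # per-step losses kept for logging or plotting
for step in range(5):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # hypothetical random batch
    loss = loss_fn(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # history.append(loss) would keep a reference to loss.grad_fn and thus
    # to the autograd graph behind it. detach() returns a tensor with the
    # same value but no grad_fn, so the graph can be released.
    history.append(loss.detach())
    # Alternative: history.append(loss.item()) stores a plain Python float.
```

Using loss.item() instead also copies the value off the GPU, which is usually what you want for logging.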
You can run "dmesg" in your terminal and scroll down to the bottom. It will show you a message explaining why the process was killed.
Since you mentioned PyTorch, chances are that your process was killed because it ran "Out of Memory". To resolve this, reduce your batch size until you no longer see the error.
Hope this helps! :)
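To avoid scrolling through the whole log, the dmesg output can be filtered for OOM-killer reports. This is a sketch that assumes the kernel logs messages containing phrases such as "Out of memory" or "Killed process". The exact wording varies between kernel versions, so the pattern is deliberately loose.

```python
import re
import subprocess

# Loose pattern for OOM-killer messages; wording differs across kernels.
OOM_PATTERN = re.compile(r"out of memory|oom-kill|killed process", re.IGNORECASE)


def oom_lines(log_text: str) -> list[str]:
    """Return the lines of a kernel log that look like OOM-killer reports."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def read_kernel_log() -> str:
    """Run `dmesg`; on many systems this needs root (or sudo)."""
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except FileNotFoundError:  # dmesg is not installed
        return ""
    return result.stdout


if __name__ == "__main__":
    for line in oom_lines(read_kernel_log()):
        print(line)
```

An empty result can also mean dmesg was not permitted to read the kernel log, so run it with sufficient privileges before concluding that memory was not the cause.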
For anyone who encounters this:
Apparently, Slurm was installed on the machine, so I needed to submit the jobs through Slurm.
As described in the issue I opened, I get the following error when running the PyTorch inverse-cooking model on CPU:
RuntimeError: expected device cpu and dtype Byte but got device cpu and dtype Bool
I have tried running the demo.ipynb file both on my laptop (Intel i7-4700HQ, 8 threads) and on my desktop (Ryzen 3700X). I was using Arch Linux on my laptop and Manjaro on my desktop.
The model works fine when I run it on Google Colab's GPU.
According to the demo.ipynb file the model should be able to run on CPU as well. Does anyone know if I have to tweak any parameters in order to make it work?
As stated by @iacolippo in the comment section and by myDennisCode, the problem really was the dependency versions. I had torchvision==0.4.0 (which confused me) and torch==1.2.0.
To fix the problem, simply install torch==0.4.1 and torchvision==0.2.1.
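Some context on the error message: starting with PyTorch 1.2, comparison operators return torch.bool tensors instead of torch.uint8 ("Byte") tensors. Code written for torch 0.4 can therefore end up mixing both kinds of masks. Pinning the old versions, as above, is the fix that worked here. The sketch below only illustrates the dtype change and one way of normalizing masks if you patch the code instead. That approach is an assumption and has not been verified against the inverse-cooking repository.

```python
import torch

x = torch.tensor([0, 3, 0, 5])

# In PyTorch >= 1.2 comparisons yield torch.bool; torch 0.4 returned torch.uint8.
mask = x > 0
print(mask.dtype)  # torch.bool

# A mask created the old way, as a uint8 ("Byte") tensor.
legacy_mask = torch.tensor([1, 0, 0, 1], dtype=torch.uint8)

# Converting both masks to bool before combining keeps the dtypes consistent.
combined = legacy_mask.bool() & mask
print(x[combined])  # tensor([5])
```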
I'm trying to train a neural network that I wrote, but it seems that Colab is not recognizing the GTX 1050 on my laptop. I can't use their cloud GPUs for this task because I run into memory constraints.
print(cuda.is_available())
is returning False
Indeed, you have to select the runtime accelerator to use GPUs or TPUs. Go to Runtime, then Change runtime type, as in the picture:
Then change it to GPU (this takes a few seconds):