RuntimeError: expected device cpu and dtype Byte but got device cpu and dtype Bool - pytorch

As described in the issue I opened, I get the following error when running the Pytorch inverse-cooking model on CPU:
RuntimeError: expected device cpu and dtype Byte but got device cpu and dtype Bool
I have tried running the demo.ipynb file on both my laptop (Intel i7-4700HQ, 8 threads) and my desktop (Ryzen 3700X). I was using Arch Linux on my laptop and Manjaro on my desktop.
The model works fine when I run it on Google Colab's GPU.
According to the demo.ipynb file, the model should be able to run on the CPU as well. Does anyone know if I have to tweak any parameters to make it work?

As stated by iacolippo in the comment section and by myDennisCode, the problem really was dependency versions. I had torchvision==0.4.0 (which confused me) and torch==1.2.0.
To fix the problem, simply install torch==0.4.1 and torchvision==0.2.1.
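For example, with pip (a minimal sketch; adjust for your own environment, or use conda instead):
pip install torch==0.4.1 torchvision==0.2.1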

Related

Code working on Windows but launch failures on Linux

First and foremost: I am completely unable to create an MCVE, as I can only reproduce this when running the full code; any attempt to measure or replicate the error in a simpler environment makes it disappear. TL;DR: I suspect it's not a code problem, but a configuration problem.
I have a piece of CUDA code that does some mathematics in its kernels. I have a Windows machine (Win10 x64, GTX 1050, CUDA 9.2) and a Linux machine (Ubuntu 17.04, 2x GTX 1080 Ti, CUDA 9.1).
My code runs fine on the Windows machine. It is long (~700 ms per kernel call for big samples), so I needed to increase the TDR value on Windows. The code also (for now) forces everything to run on a single GPU, the first one, selected with cudaSetDevice(0).
When I copy the same input data and code to the Linux machine (I am using git; it is the same code), I get either
an illegal memory access was encountered
or
unspecified launch failure
in my error checking after the GPU call.
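For context, that error checking is the usual post-launch pattern; here is a minimal sketch (the kernel and launch configuration are placeholders, not the real code):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report and abort on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err));     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void myKernel(float* out) { out[threadIdx.x] = 1.0f; }  // placeholder kernel

int main() {
    float* d_out;
    CUDA_CHECK(cudaMalloc(&d_out, 32 * sizeof(float)));
    myKernel<<<1, 32>>>(d_out);
    CUDA_CHECK(cudaGetLastError());       // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches asynchronous errors such as
                                          // "illegal memory access" and
                                          // "unspecified launch failure"
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}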
If I change the kernel so that, instead of doing the math, it just writes a number to the output, the kernel executes properly. Other CUDA code (different functions that I have) works fine too. All this leads me to think that there is a problem outside the code, not with the code itself, nor with the general configuration of the drivers/environment variables.
I read that xorg.conf can have an effect on kernel timeouts. I generated an xorg.conf (I had none) and removed the devices from it, as suggested here. I am connecting to the server remotely and have no monitor plugged in. This changed nothing in the behavior; my kernels still fail.
My question is: what else should I look at? What Linux-specific configuration should I check to pinpoint the cause of the kernel failures?
The error did indeed end up being an illegal memory access.
It was caused by the fact that sizeof(unsigned long) is machine specific: my Linux machine returns 8 while my Windows machine returns 4. Since this code is called from MATLAB, and MATLAB (like some other high-level languages, such as Python) defines variable sizes in bits (e.g. uint32(1)), there was a size mismatch on the Linux machine when doing memcpys. It turned out this happened in a variable used as an index, so the kernels were reading garbage (due to the bad memcpy) and then trying to access another array at that location, producing the illegal memory access.
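A minimal sketch of the fix (buffer contents and names are illustrative): switch from unsigned long to a fixed-width type such as uint32_t, so the host-side size matches MATLAB's uint32 on every platform:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // 8 on Linux x86-64 (LP64), but 4 on 64-bit Windows (LLP64).
    printf("sizeof(unsigned long) = %zu\n", sizeof(unsigned long));

    // Pretend this buffer came from MATLAB's uint32(42): 4 bytes, little-endian.
    unsigned char matlab_buffer[4] = {42, 0, 0, 0};

    // Bad: memcpy(&idx, matlab_buffer, sizeof(unsigned long)) would read
    // 8 bytes on Linux, 4 of them garbage. A fixed-width type avoids this:
    uint32_t idx;
    memcpy(&idx, matlab_buffer, sizeof(uint32_t));
    printf("idx = %u\n", idx);  // 42 on both Windows and Linux
    return 0;
}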
Too specific? yeah.

Theano falls back to CPU

I am training a model in Theano 0.9 and Lasagne 0.1 and want to run it on GPU. I've set THEANO_FLAGS as follows:
THEANO_FLAGS=device=gpu0,force_device=True,floatX=float64
Theano prints that it is using the GPU:
Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, cuDNN 4007)
However, I noticed it's not: profiling shows that it's using the CorrMM operation, which according to the docs is
CorrMM This is a CPU-only 2d correlation implementation taken from caffe’s cpp implementation and also used by Torch.
I have CUDA Toolkit 7.5 installed, and TensorFlow works perfectly on the GPU.
For some reason Theano is falling back to the CPU; the force_device flag is supposed to raise an error in that case, but it doesn't.
I am not sure where the problem is, as I'm new to Theano; I'd appreciate your help.
The issue is floatX=float64.
Use floatX=float32; the GPU backend currently supports only 32-bit floats.
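With that one change, the flags line from the question becomes:
THEANO_FLAGS=device=gpu0,force_device=True,floatX=float32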

Theano Installation on Windows 64

I'm new to Python and the Theano library. I want to install Theano on Windows 7 64-bit. My display adapter is:
Intel(R) HD Graphics 3000, which is not compatible with NVIDIA.
My questions:
1. Is it obligatory to install CUDA so that I can use Theano?
2. Even if I have an Ubuntu operating system with the same display adapter, is CUDA still mandatory?
Any help!
Thanks
You do not need CUDA to run Theano.
Theano can run on either the CPU or the GPU. If you want to run on a GPU you must (currently) use CUDA, which means you must be using an NVIDIA display adapter. Without CUDA/NVIDIA you must run on the CPU.
There is no disadvantage to running on the CPU other than speed: Theano can be much faster on a GPU, but everything that runs on a GPU will also run on a CPU, as long as it has been coded generically (the default and standard way to write Theano code, e.g. using theano.config.floatX rather than hard-coding a dtype).

CUDA device seems to be blocked

I'm running a small CUDA application: the QuickSort benchmark algorithm (see here). I have a dual system with a NVIDIA 660GTX (device 0) and 8600GTS (device 1).
Under Windows 8 and Visual Studio, the application compiles and runs flawlessly on device 0. Under Linux (Ubuntu 12.04 LTS), the app compiles with nvcc and gcc but suddenly stops in its tracks, returning an "unspecified launch failure".
I have two issues:
1. After this error, my GPU cannot perform some other operations: e.g., running the SDK example bandwidthTest blocks when it performs the first data transfer, although deviceQuery still runs fine. How can I reset my GPU? I've already tried the cudaDeviceReset() method, but it doesn't help.
2. How can I find out what is going wrong under Linux? Does anyone have a clue or has anyone seen this before?
Thanks in advance for your help!
Using the nvidia-smi utility, you can reset the GPU if it is compatible.
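For example (assuming device index 0; this requires root, a board that supports reset, and no other processes using the GPU):
sudo nvidia-smi --gpu-reset -i 0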
To my knowledge and experience, "unspecified launch failure" usually refers to a segmentation fault. Have you specified the right GPU to use? Try cuda-memcheck to see if there is any out-of-bounds memory access.
From my experience, XID 31 was always caused by accessing a bad pointer (i.e., a memory access violation).
I'd pursue this trail first. Run your application under cuda-memcheck, like this: cuda-memcheck your_app args, and see if it finds any bad memory accesses.
Also try stepping through the code with cuda-gdb or Nsight Eclipse Edition.
I've found that using
cuda-memcheck -b ...
prevents the device from locking up.

Error OpenCV with CUDA using TBB for multiple GPUs

My OpenCV CUDA program runs fine using a single NVIDIA 580GTX, but when using a second GPU it gives the following error:
OpenCV Error: Gpu API call (invalid device ordinal) in mallocPitch
I know I need TBB to assign each GPU its job, but even though I installed OpenCV with TBB support (I followed the willowgarage website), it says TBB support is required (CMake key 'WITH_TBB' must be true). Any help would really be appreciated, since I need this to complete my computer science Master's project.
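For reference, that CMake key is normally set when configuring the OpenCV build, along the lines of (all other options omitted here):
cmake -D WITH_TBB=ON ..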
Thanks!
OK, it's solved. It turns out it was build 7232 that was the problem, since it works with the latest OpenCV build (7292) with no problems. Thanks all for the support.
