Training a model on GPU is very slow - PyTorch

I am using an A100-SXM4-40GB GPU, but training is terribly slow. I tried two models: a simple classifier on CIFAR and a U-Net on Cityscapes. The same code runs fine on other GPUs, so I do not know why training on this high-capacity GPU is so slow.
I would appreciate any help.
Here are some properties of the GPUs:
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB
Nvidia driver version: 460.32.03
cuDNN version: Could not collect

Thank you for your answer. Before trying it, I uninstalled Anaconda and reinstalled it, and this solved the problem.

Call .cuda() on the model during initialization.
As per your comments above, you have GPUs and CUDA installed, so there's no point in checking device availability with torch.cuda.is_available().
Additionally, wrap your model in nn.DataParallel to let PyTorch use every GPU you expose to it. You could also use DistributedDataParallel, but DataParallel is easier to grasp initially.
Example initialization:
model = UNet().cuda()
model = torch.nn.DataParallel(model)
Also, you can make sure the script sees all the GPUs by launching it with the following environment variable:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_unet.py
One last thing to note: nn.DataParallel wraps the model, so to save the state_dict you need to reach the module inside DataParallel:
torch.save(model.module.state_dict(), 'unet.pth')
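To tie the pieces together, here is a minimal training-loop sketch; the optimizer, loss, and train_loader are placeholder assumptions for illustration, not the asker's actual setup:
import torch
import torch.nn as nn

model = UNet().cuda()            # move the model to the GPU first
model = nn.DataParallel(model)   # replicate it across all visible GPUs

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder optimizer
criterion = nn.CrossEntropyLoss()                          # placeholder loss

for images, targets in train_loader:  # train_loader assumed to be a torch DataLoader
    images = images.cuda(non_blocking=True)
    targets = targets.cuda(non_blocking=True)
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()

torch.save(model.module.state_dict(), 'unet.pth')  # save the wrapped module's weights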

Related

Running model in GPU using OpenVino

I can run the model on CPU successfully.
When I run it on GPU, I get the following error:
[ ERROR ] Check 'get_element_type().is_dynamic() || get_element_type() == element_type' failed at C:\j\workspace\private-ci\ie\build-windows-vs2019#2\b\repos\openvino\ngraph\core\src\runtime\host_tensor.cpp:174:
Can not change a static element type
How can I solve this?
You might need to complete the Additional Installation Steps for Intel® Processor Graphics (GPU), or your model may simply not be supported on GPU. Refer to the Public Pre-Trained Models and Intel's Pre-Trained Models pages to check each model's device support. Also, please ensure that you meet all the listed system requirements.
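For reference, a minimal sketch of loading a network on the GPU plugin with the 2021-era Inference Engine Python API; the IR file paths are placeholders, and load_network is typically where a missing GPU plugin or an unsupported model surfaces as an error:
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder IR paths
# Requires the Intel GPU plugin and the driver prerequisites from the additional installation steps
exec_net = ie.load_network(network=net, device_name="GPU")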

Colab not recognizing local GPU

I'm trying to train a neural network that I wrote, but it seems that Colab is not recognizing the GTX 1050 in my laptop. I can't use their cloud GPUs for this task because I run into memory constraints.
print(cuda.is_available())
is returning False
Indeed, you have to select the runtime accelerator in order to use GPUs or TPUs: go to Runtime, then Change runtime type, and set the hardware accelerator to GPU (the change takes a few seconds to apply).
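Once the runtime is switched, a quick sanity check (assuming the question's cuda comes from PyTorch; adjust the import if it comes from another library):
import torch

print(torch.cuda.is_available())          # should now print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the GPU that was attached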

Allocate only one GPU to a Keras (TF backend) script

I have a machine with 2 GPUs.
Quite often, one is used in production (i.e. doing predictions with the already trained model), while the other is used for training and experimenting with new models.
While I was using Theano, I had no problem running my scripts on a single GPU by specifying a flag as follows:
THEANO_FLAGS="device=cuda0" training_script.py
THEANO_FLAGS="device=cuda1" prediction_script.py
Is there a simple way to do the same in Keras with a TensorFlow backend? The default behavior seems to map all the memory of all the GPUs for one session.
(Please note that I don't really care if each script maps a whole GPU separately, even if they could work using less memory)
You can easily choose one GPU: just set CUDA_VISIBLE_DEVICES to 0 or 1.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # set this before TensorFlow initializes
Furthermore, if you want to use only a portion of the selected GPU's memory, add:
from keras import backend as K
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # what portion of GPU memory to use
session = tf.Session(config=config)
K.set_session(session)
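If you prefer the one-liner style from the Theano flags in the question, the same environment variable can also be set on the command line instead of inside the script (script names follow the question's examples):
CUDA_VISIBLE_DEVICES=0 python training_script.py
CUDA_VISIBLE_DEVICES=1 python prediction_script.py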

Theano falls back to CPU

I am training a model in Theano 0.9 and Lasagne 0.1 and want to run it on GPU. I've set THEANO_FLAGS as follows:
THEANO_FLAGS=device=gpu0,force_device=True,floatX=float64
Theano prints that it is using the GPU:
Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, cuDNN 4007)
However, I noticed it is not: profiling shows it is using the CorrMM operation, which according to the docs is
CorrMM This is a CPU-only 2d correlation implementation taken from caffe’s cpp implementation and also used by Torch.
I have CUDA Toolkit 7.5 installed, and TensorFlow works perfectly on the GPU.
For some reason Theano is falling back to the CPU; the force_device flag is supposed to raise an error in that case, but it doesn't.
I am not sure where the problem is, as I'm new to Theano. I'd appreciate your help.
The issue is floatX=float64.
Use floatX=float32 instead; the old Theano GPU backend only supports 32-bit floats.
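For example, the same launch with float32 (your_script.py is a placeholder for the actual script):
THEANO_FLAGS=device=gpu0,force_device=True,floatX=float32 python your_script.py
You can confirm the settings from inside the script:
import theano
print(theano.config.device)  # expected: gpu0
print(theano.config.floatX)  # expected: float32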

Theano installation on Windows 64-bit

I'm new to Python and the Theano library. I want to install Theano on Windows 7 64-bit. My display adapter is:
Intel(R) HD Graphics 3000, which is not an NVIDIA adapter.
My questions:
1. Is it mandatory to install CUDA in order to use Theano?
2. Even if I have an Ubuntu operating system with the same display adapter, is CUDA still mandatory?
Any help is appreciated. Thanks.
You do not need CUDA to run Theano.
Theano can run on either CPU or GPU. If you want to run on GPU you must (currently) use CUDA, which means you must have an NVIDIA display adapter. Without CUDA/NVIDIA you must run on the CPU.
There is no disadvantage to running on CPU other than speed: Theano can be much faster on GPU, but everything that runs on a GPU will also run on a CPU as long as it has been coded generically (the default and standard method for Theano code).
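As a small illustration of what "coded generically" means, here is a sketch of a Theano snippet that compiles for whichever device the flags select, because it builds its data with theano.config.floatX instead of hard-coding a dtype:
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')                                                # symbolic input
W = theano.shared(np.random.randn(3, 2).astype(theano.config.floatX), name='W')
y = T.dot(x, W)                                                  # simple linear map
f = theano.function([x], y)                                      # compiled for CPU or GPU per THEANO_FLAGS

print(f(np.ones((4, 3), dtype=theano.config.floatX)))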
