TensorFlow and Keras GPU usage issues - python-3.x

I am using an existing model to train a CRNN model based on TensorFlow and Keras. I am using Anaconda Navigator to train the model.
When the model is being trained, it does not seem to use the GPU: my GPU usage stays at 5-6%.
I am attaching screenshots of the top -i output and the NVIDIA X Server settings. My GPU is also not being picked up by TensorFlow, as tf.test.gpu_device_name() only reports the CPU, as shown in the screenshot below.
TensorFlow and Keras versions in Anaconda
top -i output

Looking at your TensorFlow and Keras versions in Anaconda, I found that the tensorflow-gpu package is missing. It looks like you have only installed the CPU version of TensorFlow, not the GPU version. If you had installed the GPU version, Anaconda Navigator would also show a TensorFlow GPU metapackage.
That is why you don't see the details of your GPU with the command tf.test.gpu_device_name().
TensorFlow GPU will automatically load the CUDA libraries corresponding to your GPU.
Since you are using an Anaconda environment, install the GPU version as follows.
Activate your conda environment, then run:
conda install -c anaconda tensorflow-gpu
This command installs TensorFlow v2.2.0 into your conda environment.
If you want the latest TF v2.4, try using pip to install that specific version inside your conda environment:
pip install tensorflow-gpu==2.4
Keras is installed automatically with TensorFlow 2.x, so you can use the Keras API with the TensorFlow backend. If you want a specific version of Keras, you can always install it with pip.
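A quick way to confirm that the new install actually sees the GPU (a minimal sketch, assuming TensorFlow 2.x) is to run the following in the activated environment:
import tensorflow as tf
print(tf.__version__)
# An empty list here means TensorFlow still only sees the CPU
print(tf.config.list_physical_devices('GPU'))
# The call from the question; it returns e.g. '/device:GPU:0' once a GPU is detected
print(tf.test.gpu_device_name())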

Related

PyTorch with GPU is not working but working fine with CPU

My code was working fine before, but it suddenly stopped working, without any error or warning.
This is the setup under which it was working fine.
After that, I tried the multiple options below.
My current settings (all in one env) are:
torch 1.10.0+cu113
torch-cluster 1.5.9
torch-geometric 2.0.1
torch-scatter 2.0.9
torch-sparse 0.6.12
torch-spline-conv 1.2.1
torchaudio 0.10.0+cu113
torchvision 0.11.1+cu113
but nothing worked. I even asked my server admin to create a new account for me. He created a new account, and I installed only the packages below; these are all the packages installed in my conda env.
Installed with:
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f download.pytorch.org/whl/torch_stable.html
Package Version
----------------- ------------
certifi 2021.10.8
numpy 1.21.4
Pillow 8.4.0
pip 21.2.2
setuptools 58.0.4
torch 1.9.0+cu111
torchaudio 0.9.0
torchvision 0.10.0+cu111
typing_extensions 4.0.1
wheel 0.37.0
Here are the results: my code always gets stuck here.
What could be a possible reason? My labmate ran the same code with the same setup on the same server under his user profile, and it worked fine.
Further Details:
torch.cuda.is_available()
>>> True
torch.cuda.current_device()
>>> 0
torch.cuda.device(0)
>>> <torch.cuda.device at 0x7fb4e8baa650>
torch.cuda.device_count()
>>> 4
torch.cuda.get_device_name(0)
>>> 'GeForce RTX 3090'
I have the same type of problem, but with an RTX 3060. I think the problem is the torch version.
Using torch==1.11.0 I can move tensors to the GPU, but with older versions I can't.
Torch Geometric doesn't support torch==1.11.0 at the time I'm writing.
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
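To see whether a given PyTorch build actually includes kernels for your GPU, a minimal check (a sketch, assuming a CUDA build of PyTorch is installed) is:
import torch
print(torch.__version__)              # e.g. 1.11.0+cu113
print(torch.version.cuda)             # CUDA version the wheel was built against
print(torch.cuda.get_arch_list())     # compute capabilities compiled in; RTX 30-series cards need sm_86
# torch.cuda.is_available() can be True even when the right kernels are missing,
# so also run a small op on the device:
x = torch.randn(1000, 1000, device='cuda')
print((x @ x).sum().item())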

No module named torch

I am using a Jetson Xavier NX kit with CUDA 10.2.89, OpenCV 4.1.1, and TensorRT 7.1.3, and I am trying to install PyTorch. I tried installing it with this line:
conda install pytorch torchvision cpuonly -c pytorch
but when I write this line:
import torch
it throws an error saying the module is not installed.
How can I verify whether PyTorch has been installed correctly?
Try this one:
conda install -c pytorch pytorch
After executing this command, enter yes (if prompted on the command line) to install all the related packages. If there are no conflicts while installing the libraries, PyTorch will be installed.
To check whether it is properly installed, run python on the command line and then type import torch.
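A minimal sanity check after installation (a sketch; the cpuonly build from the question would simply report that CUDA is unavailable):
import torch
print(torch.__version__)          # prints the installed version if the import succeeds
print(torch.cuda.is_available())  # False just means a CPU-only build is installed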

Can I train in TensorFlow with a separate CUDA version in an Anaconda environment?

I need to train a model with tensorflow-gpu==2.3.0, which requires CUDA 10.1. But when I run nvidia-smi, it shows the CUDA version as 10.0.
I created a conda environment using "conda create -n tf2-gpu tensorflow-gpu cudatoolkit=10.1".
After initiating training, it throws an error: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
How can I train using tensorflow-gpu in a conda environment with a different version of CUDA? I still need CUDA 10.0 to remain on the system, as my other training setup depends on it.
Yes, you can create two virtual environments in Anaconda with different TensorFlow versions, and a compatible CUDA toolkit and cuDNN will be installed in each environment alongside the specified tensorflow-gpu.
You can find the tensorflow-gpu build configuration details here to check the supported CUDA and cuDNN versions.
Please check this similar issue link for how to create a virtual environment in Anaconda and install a specific tensorflow-gpu version.
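Inside the activated environment you can confirm which CUDA and cuDNN the TensorFlow build expects, independently of what nvidia-smi reports for the driver (a minimal sketch, assuming TensorFlow 2.3+ where tf.sysconfig.get_build_info() is available):
import tensorflow as tf
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))  # CUDA/cuDNN the build was compiled against
print(tf.config.list_physical_devices('GPU'))               # whether the GPU is visible in this env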

Does torch.utils.tensorboard need TensorFlow to be installed?

I'm new to PyTorch and I wonder whether using TensorBoard with PyTorch requires TensorFlow as a dependency. Also, apart from TensorBoard, what other options are there for visualizing training curves?
TensorBoard is developed as part of the TensorFlow project, but it does not depend on TensorFlow itself.
You can safely install only tensorboard (e.g., pip install tensorboard or conda install -c conda-forge tensorboard) and use torch.utils.tensorboard.
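A minimal sketch of logging a training curve with torch.utils.tensorboard (the log directory name and the loss values are placeholders):
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs/demo')        # arbitrary log directory
for step in range(100):
    fake_loss = 1.0 / (step + 1)           # stand-in for a real training loss
    writer.add_scalar('train/loss', fake_loss, step)
writer.close()
# then view the curves with: tensorboard --logdir runs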

Keras - ImportError: cannot import name 'CuDNNLSTM'

I am trying to use the CuDNNLSTM Keras cell to improve training speed for a recurrent neural network (doc here).
When I run:
from keras.layers import Bidirectional, CuDNNLSTM
I get this error:
ImportError: cannot import name 'CuDNNLSTM'
My configuration is Keras 2.0.8, Python 3.5, and tensorflow-gpu 1.4.0 (all managed by Anaconda), and I have both CUDA 8.0 and cuDNN 6.0 installed, which should satisfy the NVIDIA dependencies of TensorFlow (here). My setup makes Keras use the TensorFlow backend, and every layer except the ones starting with CuDNN* works fine.
Does anyone have an idea about the source of this import error?
And for TensorFlow 2: you can just use the regular LSTM layer without specifying an activation function (i.e., keeping the defaults), and it will automatically use the cuDNN implementation when running on a GPU.
It turns out Keras 2.0.8 doesn't include these layers; they were added in more recent versions.
I used pip to upgrade to the latest version:
pip install --upgrade keras
and it all works now.
These layers have been deprecated in the latest versions.
For a detailed tutorial, see this Keras guide.
In conda it will be (as of Nov 2019):
conda config --add channels conda-forge
conda install keras==2.3.0
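Following up on the TensorFlow 2 note above, here is a minimal sketch (the layer sizes and input shape are arbitrary examples) of an LSTM that is dispatched to the cuDNN kernel automatically when its default arguments are kept and a GPU is available:
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    # Default activation/recurrent_activation keep the layer eligible for the cuDNN kernel
    layers.Bidirectional(layers.LSTM(128, return_sequences=True),
                         input_shape=(None, 64)),   # (timesteps, features)
    layers.Dense(10, activation='softmax'),
])
model.summary()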
