My GPU is an NVIDIA RTX 2080 Ti
Keras 2.2.4
Tensorflow-gpu 1.12.0
CUDA 10.0
As soon as I build a model (before compilation), I find that GPU memory is fully allocated:
[0] GeForce RTX 2080 Ti | 50'C, 15 % | 10759 / 10989 MB | issd/8067(10749M)
What could be the reason, and how can I debug it?
I don't have spare memory to load the data, even when loading via generators.
I have tried monitoring the GPU's memory usage and found that it is full just after building the layers (before compiling the model).
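For what it's worth, TensorFlow 1.x maps nearly all free GPU memory as soon as the first session is created, regardless of how small the model is, so a full memory bar right after building the layers is the default behaviour rather than a leak. A minimal sketch, assuming the Keras 2.2.4 / tensorflow-gpu 1.12 setup above, that switches to on-demand allocation instead:
import tensorflow as tf
import keras.backend as K
# Ask TF 1.x to grow GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap the fraction of GPU memory TF may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
K.set_session(tf.Session(config=config))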
I met a similar problem when loading a pre-trained ResNet50: GPU memory usage surged to 11 GB, while ResNet50 usually consumes less than 150 MB.
The problem in my case was that I also imported PyTorch without actually using it in my code. After commenting it out, everything worked fine.
But I had another PC with the same code that worked just fine, so I uninstalled and reinstalled TensorFlow and PyTorch with the correct versions. After that everything worked fine even with PyTorch imported.
Related
I used two servers, one with a 3070 GPU and the other with a 2080 Ti, both with CUDA 11.7, and still got stuck on this line:
model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=False, output_device=None, device_ids=None)
I'm trying to free up GPU memory after I finish using the model.
I checked nvidia-smi before creating and training the model: 402 MiB / 7973 MiB
After creating and training the model, I checked the GPU memory status again with nvidia-smi: 7801 MiB / 7973 MiB
Now I tried to free up GPU memory with:
del model
torch.cuda.empty_cache()
gc.collect()
and checked GPU memory again: 2361 MiB / 7973 MiB
As you can see, not all the GPU memory was released (I expected to get back to ~400 MiB / 7973 MiB).
I can only release the GPU memory from the terminal (sudo fuser -v /dev/nvidia* and then kill <pid>).
Is there a way to free up the GPU memory after I'm done using the model?
This happens because PyTorch reserves (caches) GPU memory for fast allocation. To learn more about it, see the PyTorch memory management documentation. To solve this issue, you can use the following code:
from numba import cuda
cuda.select_device(your_gpu_id)
cuda.close()
However, this comes with a catch: it closes the GPU context completely, so you can't start training again without restarting the whole process.
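Short of that, it can help to check how much of what nvidia-smi reports is PyTorch's allocator cache versus the CUDA context itself (a few hundred MiB that is only freed when the process exits). A minimal sketch using standard PyTorch calls; the model variable name is just illustrative:
import gc
import torch
print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes cached by PyTorch's allocator
del model                             # drop the last reference to the model
gc.collect()                          # collect any lingering reference cycles
torch.cuda.empty_cache()              # return cached blocks to the driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
# Whatever nvidia-smi still shows beyond this is mostly the CUDA context.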
I have installed tensorflow-gpu to train my models on the GPU and have confirmed the installation as shown below.
import tensorflow as tf
tf.config.list_physical_devices()
#[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
# PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
I started training an image classification model and expect it to run on the GPU automatically unless I explicitly place it on another device. But while training the model I could see in Task Manager that there were two GPUs: the Intel graphics card was GPU 0 and the NVIDIA GeForce GTX 1660 Ti was GPU 1. Does that mean TensorFlow didn't detect my NVIDIA card, or is the NVIDIA card the GPU that was detected?
While training, my NVIDIA GPU utilization was very low, so I'm not sure on which device my model was actually trained.
Can someone clarify, please?
Further version details: tf.__version__ 2.6.0, Python 3.7, CUDA 11.4, cuDNN 8.2
Try enabling device placement logging:
tf.debugging.set_log_device_placement(True)
I think your Intel GPU is simply not listed by tf.config.list_physical_devices(); TensorFlow only enumerates CUDA-capable devices, so GPU:0 there is the NVIDIA card, independent of Task Manager's GPU 0/GPU 1 numbering.
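A minimal sketch, assuming the TF 2.6 setup above, that shows which card TensorFlow's GPU:0 actually is and where ops run (get_device_details is experimental API, so treat its output as informational):
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
# TensorFlow only enumerates CUDA-capable GPUs here; the details should name the GTX 1660 Ti.
for gpu in tf.config.list_physical_devices('GPU'):
    print(tf.config.experimental.get_device_details(gpu))
# Run a small op; the placement log (and .device) shows where it executed.
a = tf.random.uniform((1000, 1000))
b = tf.matmul(a, a)
print(b.device)  # e.g. /job:localhost/replica:0/task:0/device:GPU:0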
I run the training phase of a TF2 model (based on pre-trained object detection models from the TF2 Model Zoo) on a GPU (NVIDIA 3070).
Is there a way to run the evaluation phase (for the checkpoints created by training) on the CPU?
Because the training phase allocates almost all of the GPU memory, I can't run both of them (train and eval) on the GPU.
OS - Ubuntu 20.04
GPU - Nvidia 3070 (driver 460)
TF - 2.4.1
Python - 3.8.5
Thank you.
In my case, the solution was to set the following inside the evaluation script:
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
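For context, a minimal sketch of how this is typically wired up; the environment variable has to be set before TensorFlow initializes CUDA, and the file name is just illustrative:
# eval.py (illustrative): run evaluation as a separate, CPU-only process
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'   # hide all GPUs from this process
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # -> [], so evaluation stays on the CPU
# ...load the checkpoint and run evaluation as usual...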
I am using Keras with tensorflow-gpu as the backend and don't have the CPU-only tensorflow package installed. All the outputs show the GPU being selected, but TF appears to be using the CPU and system memory.
When I run my code, the output is: output_code
I even ran device_lib.list_local_devices() and the output is: list_local_devices_output
After running the code I checked nvidia-smi to see the GPU usage, and the output is:
nvidia-smi output
Tensorflow-gpu = "1.12.0"
CUDA toolkit = "9.0"
cuDNN = "7.4.1.5"
Environment Variables contain:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;
C:\WINDOWS\system32;
C:\WINDOWS;
C:\WINDOWS\System32\Wbem;
C:\WINDOWS\System32\WindowsPowerShell\v1.0\;
C:\WINDOWS\System32\OpenSSH\;
C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;
D:\Anaconda3;D:\Anaconda3\Library\mingw-w64\bin
D:\Anaconda3\Library\usr\bin;
D:\Anaconda3\Library\bin;
D:\Anaconda3\Scripts;D:\ffmpeg\bin\;
But still, when I check memory usage in Task Manager, the output is:
CPU utilization 51%, RAM utilization 86%
GPU utilization 1%, GPU-RAM utilization 0%
Task_manager_Output
So I think it is still using the CPU instead of the GPU.
System Configuration:
Windows-10 64 bit; IDE: Liclipse; Python: 3.6.5
It is using the GPU, as you can see in the logs.
The problem is that a lot of work cannot be done on the GPU, and as long as your data is small and your model's complexity is low, you will end up with low GPU usage. Possible causes:
Maybe the batch_size is too low -> increase it until you run into OOM errors.
Your data loading is consuming a lot of time and your GPU has to wait (I/O reads).
Your RAM is too low and the application falls back to disk (swapping).
Preprocessing is too slow. If you are dealing with images, try to do it in a generator or on the GPU if possible.
You are using some operations that are not GPU-accelerated.
Here is some more detailed explanation.
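To confirm where individual ops actually run with the TF 1.12 setup described above, a minimal sketch using device placement logging (the tensors here are just illustrative):
import tensorflow as tf
# TF 1.x: log the device each op is placed on; GPU ops show up as /device:GPU:0.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.random_uniform((1000, 1000))
    b = tf.matmul(a, a)
    sess.run(b)  # the console log shows whether MatMul landed on the GPU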