I am using Keras with tensorflow-gpu as the backend. I don't have the CPU-only tensorflow package installed, and all the outputs show the GPU as selected, but TF appears to be using the CPU and system memory.
When I run my code, the output is: output_code
I even ran device_lib.list_local_devices() and the output is: list_local_devices_output
After running the code I checked nvidia-smi to see the GPU usage, and the output is:
nvidia-smi output
Tensorflow-gpu = "1.12.0"
CUDA toolkit = "9.0"
cuDNN = "7.4.1.5"
Environment Variables contain:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;
C:\WINDOWS\system32;
C:\WINDOWS;
C:\WINDOWS\System32\Wbem;
C:\WINDOWS\System32\WindowsPowerShell\v1.0\;
C:\WINDOWS\System32\OpenSSH\;
C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;
D:\Anaconda3;
D:\Anaconda3\Library\mingw-w64\bin;
D:\Anaconda3\Library\usr\bin;
D:\Anaconda3\Library\bin;
D:\Anaconda3\Scripts;
D:\ffmpeg\bin\;
But still, when I check the memory usage in Task Manager, the output is:
CPU utilization 51%, RAM utilization 86%
GPU utilization 1%, GPU-RAM utilization 0%
Task_manager_Output
So, I think it is still using CPU instead of GPU.
System Configuration:
Windows-10 64 bit; IDE: Liclipse; Python: 3.6.5
It is using the GPU, as you can see in the logs.
The problem is that many operations cannot run on the GPU, and as long as your data is small and your model's complexity is low, you will end up with low GPU usage. Common causes:
Maybe the batch_size is too low -> increase it until you run into OOM errors
Your data loading is consuming a lot of time and your GPU has to wait (IO reads)
Your RAM is too low and the application falls back to disk
Preprocessing is too slow. If you are dealing with images, try to compute everything in a generator or on the GPU if possible
You are using some operations which are not GPU-accelerated
Here is some more detailed explanation.
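To double-check which device each operation actually lands on, you can turn on TensorFlow's device placement logging before building the model. A minimal sketch for the Keras + TF 1.x setup described above (the session wiring here is an illustration, not something from the original post):

import tensorflow as tf
from keras import backend as K

# Log the device (CPU or GPU) chosen for every op when the graph runs
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))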
I'm trying to free up GPU memory after I'm done using the model.
I checked nvidia-smi before creating and training the model: 402MiB / 7973MiB
After creating and training the model, I checked the GPU memory status again with nvidia-smi: 7801MiB / 7973MiB
Now I tried to free up GPU memory with:
del model                  # drop the Python reference to the model
torch.cuda.empty_cache()   # release cached blocks back to the driver
gc.collect()               # run Python's garbage collector
and checked the GPU memory again: 2361MiB / 7973MiB
As you can see, not all the GPU memory was released (I expected to get back to ~400MiB / 7973MiB).
I can only release the GPU memory via the terminal (sudo fuser -v /dev/nvidia* and kill pid).
Is there a way to free up the GPU memory after I'm done using the model?
This happens because PyTorch's caching allocator reserves GPU memory for fast allocation. To learn more about it, see PyTorch memory management. To solve this issue, you can use the following code:
from numba import cuda

cuda.select_device(your_gpu_id)  # your_gpu_id is the integer index of the GPU, e.g. 0
cuda.close()                     # destroys the CUDA context, freeing all of its memory
However, this comes with a catch: it closes the GPU context completely, so you can't continue training without restarting everything.
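One more note on the snippet in the question (my observation, not part of the original answer): torch.cuda.empty_cache() can only return cached blocks that no live tensor occupies, so running gc.collect() first, before emptying the cache, tends to release more memory. A hedged re-ordering of the same three calls:

import gc
import torch

model = torch.nn.Linear(10, 10).cuda()  # stand-in for the trained model

del model                 # drop the last reference to the model
gc.collect()              # collect it (and any reference cycles) so its tensors are freed
torch.cuda.empty_cache()  # now the unoccupied cached blocks can go back to the driver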
I need to create variables directly on the GPU because I am very limited in my CPU ram.
I found the method to do this here
https://discuss.pytorch.org/t/how-to-create-a-tensor-on-gpu-as-default/2128/3
Which mentions using
torch.set_default_tensor_type('torch.cuda.FloatTensor')
However, when I tried
torch.set_default_tensor_type('torch.cuda.FloatTensor')
pytorchGPUDirectCreate = torch.FloatTensor(20000000, 128).uniform_(-1, 1).cuda()
It still seemed to take up mostly CPU RAM before being transferred to GPU RAM.
I am using Google Colab. To view RAM usage during the variable creation process, after running the cell, go to Runtime -> Manage Sessions
With and without torch.set_default_tensor_type('torch.cuda.FloatTensor'), the CPU RAM bumps up to 11.34 GB while GPU RAM stays low, and then GPU RAM goes to 9.85 GB while CPU RAM goes back down.
It seems that torch.set_default_tensor_type('torch.cuda.FloatTensor') didn't make a difference.
For convenience, here's a direct link to a notebook anyone can run:
https://colab.research.google.com/drive/1LxPMHl8yFAATH0i0PBYRURo5tqj0_An7
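One likely explanation for the behaviour above (my reading of the snippet, not something from the linked thread): torch.FloatTensor always names the CPU tensor type, so the 20000000 x 128 buffer is first allocated and filled in host RAM and only then copied over by .cuda(). Passing device='cuda' at construction allocates directly on the GPU instead; a minimal sketch:

import torch

# Allocate uninitialized storage directly on the GPU and fill it in place;
# no CPU-side staging copy of the ~10 GB tensor is created.
pytorchGPUDirectCreate = torch.empty(20000000, 128, device='cuda').uniform_(-1, 1)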
My GPU is an NVIDIA RTX 2080 Ti
Keras 2.2.4
Tensorflow-gpu 1.12.0
CUDA 10.0
Once I build a model (before compilation), I find that GPU memory is fully allocated:
[0] GeForce RTX 2080 Ti | 50'C, 15 % | 10759 / 10989 MB | issd/8067(10749M)
What could be the reason, and how can I debug it?
I don't have spare memory to load the data, even when loading via generators.
I have tried monitoring the GPU's memory usage and found that it is full just after building the layers (before compiling the model).
I met a similar problem when loading a pre-trained ResNet50. The GPU memory usage surged to 11GB, while ResNet50 usually consumes less than 150MB.
The problem in my case was that I also imported PyTorch without actually using it in my code. After commenting it out, everything worked fine.
But I had another PC with the same code that worked just fine, so I uninstalled and reinstalled Tensorflow and PyTorch with the correct versions. Then everything worked fine even with PyTorch imported.
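Separately from the PyTorch clash, keep in mind that TF 1.x sessions map nearly all free GPU memory by default, so nvidia-smi showing ~10.7 GB allocated right after building a model is expected behaviour unless you opt in to on-demand growth. A hedged sketch for the Keras 2.2.4 / TF 1.12 stack in the question:

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory on demand instead of reserving almost all of it up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))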
I have installed Tensorflow and Keras in an Anaconda installation on Windows 10, using an Intel i7 processor. It takes 40 minutes to train on 4000 data samples from a CSV file, on which I'm running an LSTM RNN for predictive analytics.
Is this an expected training time on a CPU? Can I make it faster on the CPU, or by switching to a GPU?
Yes, this does seem like a reasonable amount of time for your code to run when training on a CPU only. If you used an NVIDIA GPU, it would run much faster.
However, you might not be using every core on your CPU; if you use them all, training may run faster. You can change the number of threads that Tensorflow uses by running
sess = tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=NUM_THREADS))
If you set the number of threads equal to those provided by your CPU, it should run faster.
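For a self-contained version of that line (NUM_THREADS is a placeholder you should set to your core count, and inter_op_parallelism_threads is an extra knob not mentioned in the answer above):

import tensorflow as tf

NUM_THREADS = 8  # assumption: set this to the number of cores your CPU provides
config = tf.ConfigProto(
    intra_op_parallelism_threads=NUM_THREADS,  # threads used inside a single op (e.g. a matmul)
    inter_op_parallelism_threads=2,            # independent ops that may run concurrently
)
sess = tf.Session(config=config)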
I have a MacBook Pro with a 2 GB NVIDIA GPU, and I am trying to use all of my GPU memory for computations (Python code). How much would I save if I bypassed the GUI and only accessed the machine through the command line? In other words, would doing so free up a meaningful amount of GPU memory?
The difference probably won't be huge.
A GPU that is only hosting a console display will typically have only ~5-25 megabytes of memory reserved out of the total memory size. On the other hand, a GPU that is hosting a GUI display (using the NVIDIA GPU driver) might typically have ~50 megabytes or more reserved for display use (this may vary somewhat based on the size of the display attached).
So you can probably get a good "estimate" of the savings by running nvidia-smi and looking at the difference between total and available memory for your GPU with the GUI running. If that is, for example, 62MB, then you can probably "recover" around 40-50MB by shutting off the GUI, for example by switching to runlevel 3 on Linux.
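If you want to script that check rather than eyeball it, nvidia-smi has a CSV query interface; here is a small sketch (the parsing is mine, not part of the answer):

import subprocess

# Ask nvidia-smi for per-GPU total/used memory in MiB as bare CSV values
out = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used",
    "--format=csv,noheader,nounits",
]).decode()
for i, line in enumerate(out.strip().splitlines()):
    total, used = (int(x) for x in line.split(","))
    print("GPU %d: %d MiB used of %d MiB total" % (i, used, total))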
I just ran this experiment on a Linux laptop with a Quadro 3000M that happens to have 2GB of memory. With the X display up and the NVIDIA GPU driver loaded, the "used" memory was 62MB out of 2047MB (reported by nvidia-smi).
When I switched to runlevel 3 (X not started), the memory usage dropped to about 4MB. That means roughly 50MB of additional memory is available to CUDA.
A side benefit of shutting off the GUI might be the elimination of the display watchdog.