Why can't my GPU be used when it is largely available? - pytorch

As you can see, GPU 3 is at 0% utilization and only 11 MB of its memory is in use. But when I try to use this GPU with PyTorch or TensorFlow, I get an error:
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: out of memory
Why can't I use it?
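A common cause is that the allocation never reaches the free card: another process is holding the default device (GPU 0), so CUDA fails before GPU 3 is even touched. A minimal sketch, assuming GPU 3 is the free device reported by nvidia-smi, that exposes only that card to the process before CUDA is initialized:

import os

# Expose only GPU 3 to this process; must be set before the first CUDA call.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import torch

# Inside this process the single visible GPU is now indexed as cuda:0.
device = torch.device("cuda:0")
x = torch.rand(1000, 1000, device=device)
print(torch.cuda.memory_allocated(device), "bytes allocated by tensors")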

Related

How to clear GPU memory after using model?

I'm trying to free up GPU memory after I finish using the model.
I checked nvidia-smi before creating and training the model: 402MiB / 7973MiB
After creating and training the model, I checked the GPU memory status again with nvidia-smi: 7801MiB / 7973MiB
Now I tried to free up GPU memory with:
del model
torch.cuda.empty_cache()
gc.collect()
and checked again the GPU memory: 2361MiB / 7973MiB
As you can see, not all of the GPU memory was released (I expected to get back to ~400MiB / 7973MiB).
I can only release the GPU memory via the terminal (sudo fuser -v /dev/nvidia* and kill the PID).
Is there a way to free up the GPU memory after I'm done using the model?
This happens because PyTorch reserves GPU memory in its caching allocator so that future allocations are fast. To learn more about it, see the PyTorch memory management documentation. To solve this issue, you can use the following code:
from numba import cuda
cuda.select_device(your_gpu_id)  # select the GPU whose CUDA context should be released
cuda.close()                     # tear down the context, returning all of its memory to the driver
However, this comes with a catch: it closes the GPU's CUDA context completely, so you can't start training again without restarting the whole process.
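Before resorting to numba, it is worth checking how much of the remaining usage is live tensors versus blocks merely cached by PyTorch's allocator. A minimal sketch, assuming the model is referenced only by the variable model, that drops the reference first and then inspects both counters:

import gc
import torch

del model                    # drop the last Python reference to the model
gc.collect()                 # collect any lingering reference cycles before emptying the cache
torch.cuda.empty_cache()     # return cached, unused blocks to the driver

# Memory held by live tensors vs. memory still reserved by the caching allocator.
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved")

If the allocated figure is small but nvidia-smi still shows a large number, the remainder is the CUDA context plus the cache, not tensors that were leaked.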

How to increase gpu memory for Pytorch

I am using an NVIDIA GeForce RTX 2070 Super with 24GB of GPU memory in total. When I try to train a RES101 model using PyTorch, CUDA runs out of memory during training. I found that only 8GB of GPU memory is available to PyTorch. I wonder if I can increase the GPU memory allocated to it.
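There is no setting that gives PyTorch more memory than the card physically has, so the first step is to check how much memory the device actually reports to PyTorch. A minimal sketch (torch.cuda.mem_get_info requires a reasonably recent PyTorch release):

import torch

props = torch.cuda.get_device_properties(0)
print(props.name, props.total_memory / 1024**3, "GiB total")

# Free vs. total memory as reported by the CUDA driver for device 0.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(free_bytes / 1024**3, "GiB free of", total_bytes / 1024**3, "GiB")

If the reported total matches the card's specification, the out-of-memory errors come from the model and batch size rather than from any cap imposed by PyTorch.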

Why Does PyTorch Use So Much GPU Memory to Store Tensors?

I made a simple test using PyTorch which involves measuring the current GPU memory use, creating a tensor of a certain size, moving it to the GPU, and measuring the GPU memory again. According to my calculation, around 6.5k bytes were required to store each element of the tensor! Here is the breakdown:
GPU memory use before creating the tensor as shown by nvidia-smi: 384 MiB.
Create a tensor with 100,000 random elements:
a = torch.rand(100000)
Transfer the tensor to the GPU:
device = torch.device('cuda')
b = a.to(device)
GPU memory use after the transfer: 1020 MiB
Calculate memory change per one element of the tensor
(1020-384)*1024*1024/len(b)
# Answer is 6668.94336
This is weird, to say the least. Why would 6.5 KiB of GPU memory be required to store a single float32 element?
UPDATE: Following Robert Crovella's suggestion in the comments, I created another tensor c and then moved it to the CUDA device as d. The GPU memory usage didn't increase. So, it seems that PyTorch or CUDA requires some 636 MiB for bootstrapping. Why is that? What is this memory used for? It seems like a lot to me!
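The roughly 636 MiB jump is the one-time cost of creating the CUDA context on the first GPU operation (driver state, device kernel images, and PyTorch's allocator bookkeeping); it is paid once per process, not per tensor. A minimal sketch that separates this overhead from the tensor's own storage by reading PyTorch's allocator counter instead of nvidia-smi:

import torch

a = torch.rand(100000)            # 100,000 float32 values = 400,000 bytes on the CPU

before = torch.cuda.memory_allocated()
b = a.to('cuda')                  # the first CUDA call also creates the CUDA context
after = torch.cuda.memory_allocated()

# The allocator counter tracks only tensor storage, not context overhead,
# so this prints roughly 400 KB rather than hundreds of MiB.
print(after - before, "bytes allocated for the tensor")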

Specifying gpus in keras backend without CUDA_VISIBLE

I have a program that has to use both the Keras and PyTorch frameworks. The frozen feature extractor is in Keras. When I run the Keras feature extractor in multi-GPU mode (4 GPUs) it takes all the GPUs for its execution, leaving no room for the second half (the PyTorch model). I would like to allocate 2 GPUs to the Keras model and the other GPUs to the PyTorch model. If I set CUDA_VISIBLE_DEVICES='0,1', it does not allow PyTorch to use the remaining GPUs either, which is a sticky situation.
Another workaround is to set the keras backend session:
config = tf.ConfigProto(device_count={'GPU': 1})
sess = tf.Session(config=config)
keras.backend.set_session(sess)
But here, I want to specify which gpus can be used by the session (by ID).
The error I am getting is a CUDA out-of-memory error while pushing data onto the GPU for the second half (the PyTorch section), because Keras has already claimed the GPU memory:
RuntimeError: CUDA out of memory. Tried to allocate 204.00 MiB (GPU 2; 10.92 GiB total capacity; 14.54 MiB already allocated; 151.50 MiB free; 9.46 MiB cached)
I can specify GPUs with PyTorch; I just want to do the same with Keras.
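In TensorFlow 1.x the session config can name devices directly through gpu_options.visible_device_list, which restricts the Keras session to specific GPU IDs without touching CUDA_VISIBLE_DEVICES, so PyTorch can still see the remaining cards. A minimal sketch, assuming Keras should use GPUs 0 and 1 while PyTorch uses GPUs 2 and 3:

import tensorflow as tf
import keras.backend as K
import torch

# Restrict the Keras/TensorFlow session to GPUs 0 and 1 only.
gpu_options = tf.GPUOptions(visible_device_list='0,1', allow_growth=True)
K.set_session(tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)))

# PyTorch still sees all GPUs; place its model on the remaining devices explicitly.
pt_device = torch.device('cuda:2')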

Force GPU memory limit in PyTorch

Is there a way to force a maximum value for the amount of GPU memory that I want to be available to a particular PyTorch instance? For example, my GPU may have 12GB available, but I'd like to assign a maximum of 4GB to a particular process.
Update (04-MAR-2021): it is now available in the stable 1.8.0 release of PyTorch; see also the docs.
Original answer follows.
This feature request has been merged into the PyTorch master branch, but has not yet appeared in a stable release.
It was introduced as set_per_process_memory_fraction:
Set memory fraction for a process.
The fraction is used to limit the caching allocator to the allowed amount of memory on a CUDA device.
The allowed value equals the total visible memory multiplied by the fraction.
If a process tries to allocate more than the allowed value, an out-of-memory error is raised by the allocator.
You can check the tests as usage examples.
Update PyTorch to 1.8.0
(pip install --upgrade torch==1.8.0)
function: torch.cuda.set_per_process_memory_fraction(fraction, device=None)
params:
fraction (float) – Range: 0~1. Allowed memory equals total_memory * fraction.
device (torch.device or int, optional) – selected device. If it is None the default CUDA device is used.
For example:
import torch
torch.cuda.set_per_process_memory_fraction(0.5, 0)
torch.cuda.empty_cache()
total_memory = torch.cuda.get_device_properties(0).total_memory
# less than 0.5 will be ok:
tmp_tensor = torch.empty(int(total_memory * 0.499), dtype=torch.int8, device='cuda')
del tmp_tensor
torch.cuda.empty_cache()
# this allocation will raise an OOM:
torch.empty(total_memory // 2, dtype=torch.int8, device='cuda')
"""
It raises an error as follows:
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; 0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; 0 bytes reserved in total by PyTorch)
"""
In contrast to TensorFlow, which reserves all of the GPU's memory up front, PyTorch only allocates as much as it needs. However, you could:
Reduce the batch size.
Set CUDA_VISIBLE_DEVICES=<GPU id> (multiple comma-separated IDs are allowed) to limit which GPUs can be accessed.
To set this from within the program, try:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before the first CUDA call initializes the device
