Why does PyTorch occupy so much GPU memory when I use .cuda(), and how can I reduce it? My poor GTX 1660's GPU memory is so limited.
Execute:
torch.cuda.empty_cache()
This releases the cached GPU memory that PyTorch is holding but no longer using. If it does not reduce the memory, restarting the kernel should solve the problem.
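A minimal sketch of the usual pattern (the variable name here is just for illustration): drop the Python references to the large tensors first, then let the caching allocator hand the freed blocks back to the driver.

import gc
import torch

x = torch.randn(8192, 8192, device="cuda")  # example allocation, roughly 256 MiB
del x                       # drop the last reference so the block becomes cache, not live memory
gc.collect()                # clean up any lingering reference cycles
torch.cuda.empty_cache()    # return cached blocks to the driver so nvidia-smi reflects the release

Note that empty_cache() can only release memory that no live tensor is still using.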
I'm working on feature generation before I train a model in PyTorch. I wish to save my features as PyTorch tensors on disk for later use in training.
One of my features ("Feature A") is calculated on a CPU while another feature ("Feature B") must be calculated from Feature A on a GPU (some linear algebra stuff). I have an unusual limitation: on my university cluster, jobs which don't use GPUs have CPU memory limits of 1TB each, while jobs which do use GPUs have CPU memory limits of 4GB and GPU memory limits of 48GB. Feature A and Feature B are each approximately 10GB.
Naturally, I want to first calculate Feature A using CPUs only, then save Feature A to disk. In another job (this one with GPU access and thus the 4GB CPU memory limit), I want to load Feature A directly onto the GPU, compute Feature B, then save Feature B to disk.
With Feature A computed and saved to disk, I've tried:
feaB = torch.load(feaAfile, map_location=torch.device('cuda'))
And yet I max out my CPU memory. I've confirmed that CUDA is available.
In the PyTorch documentation I see that, when loading, tensors "are first deserialized on the CPU..."
I wonder if there is any way to avoid the CPU memory hit when I want to load only onto the GPU. If the tensor must first pass through the CPU, could I use some sort of 4GB buffer? Thanks so much in advance.
EDIT: per discussion in the comments, I no longer need to do this. But the question itself, of loading a tensor to the GPU without using CPU memory, remains unanswered so I'm leaving this question up.
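One possible workaround, sketched under the assumption that Feature A can be split along some dimension into pieces that each fit well under the 4GB CPU limit: save it as several chunk files from the CPU-only job, then in the GPU job stream one chunk at a time through CPU memory onto the GPU. The file names and chunk count below are made up for illustration.

import torch

# CPU-only job (1TB limit): split Feature A and save ~1GB pieces
# for i, chunk in enumerate(feaA.chunk(10, dim=0)):
#     torch.save(chunk, f"feaA_part{i}.pt")

# GPU job (4GB CPU limit): only one chunk lives in CPU memory at any time
parts = []
for i in range(10):
    cpu_chunk = torch.load(f"feaA_part{i}.pt")   # deserialized on the CPU, ~1GB
    parts.append(cpu_chunk.to("cuda"))           # copy the chunk to the GPU
    del cpu_chunk                                # free the CPU copy before loading the next piece
feaA_gpu = torch.cat(parts, dim=0)               # reassemble the full feature on the GPU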
I'm trying to free up GPU memory after finishing using the model.
I checked nvidia-smi before creating and training the model: 402MiB / 7973MiB
After creating and training the model, I checked again the GPU memory status with nvidia-smi: 7801MiB / 7973MiB
Now I tried to free up GPU memory with:
del model
torch.cuda.empty_cache()
gc.collect()
and checked again the GPU memory: 2361MiB / 7973MiB
As you can see not all the GPU memory was released (I expected to get 400~MiB / 7973MiB).
I can only release the GPU memory via the terminal (sudo fuser -v /dev/nvidia* and kill pid).
Is there a way to free up the GPU memory after I am done using the model?
This happens because PyTorch reserves GPU memory in its caching allocator for fast memory allocation. To learn more about it, see PyTorch's memory management documentation. To force the memory to be released, you can use the following code:
from numba import cuda
cuda.select_device(your_gpu_id)
cuda.close()
However, this comes with a catch: it closes the CUDA context on that GPU entirely, so you can't start training again without restarting everything.
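To see how much of the remaining nvidia-smi figure is PyTorch's cache rather than live tensors, here is a small sketch using the standard torch.cuda counters:

import torch

print(torch.cuda.memory_allocated() / 2**20, "MiB held by live tensors")
print(torch.cuda.memory_reserved() / 2**20, "MiB reserved by the caching allocator")
torch.cuda.empty_cache()   # hands unused reserved blocks back to the driver
print(torch.cuda.memory_reserved() / 2**20, "MiB reserved after empty_cache")

Whatever remains after that is mostly the CUDA context and loaded kernels, which stay allocated for the life of the process, so nvidia-smi will not fall back to the pre-training figure without killing the process.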
I am using an NVIDIA GeForce RTX 2070 Super with 24GB of GPU memory in total. When I try to train a ResNet-101 model using PyTorch, the CUDA memory runs out during training. I found that only 8GB of GPU memory is available to PyTorch. I wonder if I can increase the GPU memory allocated to it.
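PyTorch does not limit itself to a fraction of the card by default. A quick sanity check of how much memory the device actually reports to PyTorch (a minimal sketch):

import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB visible to PyTorch")  # what the driver reports for device 0

If this prints roughly 8 GiB, the limit is the physical memory on the card rather than anything PyTorch sets aside, and the usual remedy is a smaller batch size or input resolution.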
I am facing an issue with my Inception model during performance testing with Apache JMeter.
Error: OOM when allocating tensor with shape[800,1280,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: Cast = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
OOM stands for Out Of Memory. It means that your GPU has run out of space, presumably because you've allocated other tensors which are too large. You can fix this by making your model smaller or by reducing your batch size. By the looks of it, you're feeding in a large image (800x1280); you may want to consider downsampling it.
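As a rough illustration of the downsampling idea (the placeholder name and target size below are assumptions, not part of the original serving graph):

import tensorflow as tf

raw = tf.placeholder(tf.uint8, shape=[None, 800, 1280, 3], name="raw_images")  # hypothetical input
small = tf.image.resize_images(raw, [400, 640])    # bilinear resize; the result comes back as float32
inputs = small / 255.0                             # scale to [0, 1] before feeding the model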
If you have multiple GPUs at hand, select one that is less busy than this one (other processes may be running on it). Go to the terminal and type
export CUDA_VISIBLE_DEVICES=1
where 1 is the index of the other available GPU, then re-run the same code.
You can check the available GPUs using
nvidia-smi
This will show you which GPUs are available and how much memory is free on each of them.
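Equivalently, the variable can be set from inside the script, as long as it happens before the framework initializes CUDA (a small sketch):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # must be set before TensorFlow/PyTorch first touches the GPU

import tensorflow as tf                    # the framework now sees only physical GPU 1, exposed as device 0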
Hello Everyone,
I am working on an image classification problem using TensorFlow and a convolutional neural network.
My model has the following layers:
Input image of size 2456x2058
3 convolution layers {Conv1 shape (10,10,1,32); Conv2 shape (5,5,32,64); Conv3 shape (5,5,64,64)}
3 max-pool 2x2 layers
1 fully connected layer.
I have tried using the NVIDIA-SMI tool but it shows me the GPU memory consumption as the model runs.
I would like to know if there is any method or way to estimate the memory before running the model on the GPU, so that I can design models with the available memory in mind.
I have tried using this method for estimation, but my calculated memory and observed memory utilisation are nowhere near each other.
Thank you all for your time.
As far as I understand, when you open a session with tensorflow-gpu, it allocates all the memory on the GPUs that are available. So when you look at the nvidia-smi output, you will always see the same amount of used memory, even if the model actually uses only part of it. There are options when opening a session to force TensorFlow to allocate only part of the available memory (see How to prevent tensorflow from allocating the totality of a GPU memory? for instance).
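One such option, sketched here with the commonly used allow_growth flag, makes TensorFlow claim GPU memory on demand instead of grabbing it all up front:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True     # allocate GPU memory incrementally as the model needs it
session = tf.Session(config=config)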
You can control the GPU memory allocation in TensorFlow. Once you have calculated the memory requirements of your deep learning model, you can use tf.GPUOptions.
For example, if you want to allocate approximately 4 GB of GPU memory out of 8 GB:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
Once done, pass it to tf.Session via the config parameter, as shown above.
per_process_gpu_memory_fraction bounds the fraction of the GPU's total memory that the process may use.
Here's the link to the documentation:
https://www.tensorflow.org/tutorials/using_gpu
NVIDIA-SMI ... shows me the GPU memory consumption as the model runs
TF preallocates all available memory when you use it, so NVIDIA-SMI would show nearly 100% memory usage ...
but my calculated memory and observed memory utilisation are nowhere near each other.
... so this is unsurprising.
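To make the comparison concrete, here is a rough back-of-the-envelope sketch for the layers listed in the question, assuming float32, 'SAME' padding, stride-1 convolutions and a batch size of 1. Gradients, optimizer state, cuDNN workspaces and TensorFlow's own preallocation are not counted, which is a large part of why hand estimates rarely match what nvidia-smi reports.

# rough float32 estimate for the conv stack described in the question (batch size is an assumption)
batch = 1
h, w = 2456, 2058
convs = [(10, 10, 1, 32), (5, 5, 32, 64), (5, 5, 64, 64)]   # (kh, kw, in_channels, out_channels)

params = 0
activations = batch * h * w * 1              # input image, one channel implied by Conv1's shape
for kh, kw, cin, cout in convs:
    params += kh * kw * cin * cout + cout    # weights plus biases
    activations += batch * h * w * cout      # conv output with 'SAME' padding, stride 1
    h, w = h // 2, w // 2                    # 2x2 max pool after each conv
    activations += batch * h * w * cout      # pooled output

bytes_per_float = 4
print(f"conv parameters: {params * bytes_per_float / 2**20:.2f} MiB")
print(f"forward activations: {activations * bytes_per_float / 2**20:.1f} MiB")
print(f"fully connected input size: {h * w * 64} units")     # this term dominates the FC weight count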