Why does PyTorch fail to allocate exactly the amount of free GPU memory?

I am seeing this error pretty randomly. It happens after creating a checkpoint. PyTorch throws an OOM error while trying to allocate what seems to be a reasonable amount of memory:
RuntimeError: CUDA out of memory. Tried to allocate 8.26 GiB (GPU 2; 14.76 GiB total capacity; 3.50 GiB already allocated; 8.26 GiB free; 5.41 GiB reserved in total by PyTorch)
I wonder what could be going on here.
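For reference, the numbers in the message don't guarantee that a single contiguous block of that exact size is available; fragmentation of the cached pool and allocator rounding can make an allocation fail even when the reported "free" amount matches the request. A minimal diagnostic sketch (the device index 2 matches the error above and the max_split_size_mb value is illustrative, not a recommendation):

import os
import torch

# Setting this before the first CUDA allocation limits how large cached blocks
# can grow, which can reduce fragmentation (the value in MiB is illustrative).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

# The caching allocator's view of GPU 2: large gaps between "reserved" and
# "allocated" memory are where fragmentation shows up.
print(torch.cuda.memory_summary(device=2, abbreviated=True))

# Hand unused cached blocks back to the driver; a single large contiguous
# allocation then has a better chance of succeeding.
torch.cuda.empty_cache()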

Related

speechbrain & CUDA out of memory

I am trying to enhance an audio file (3:16 minutes long, available here) using Speechbrain. If I run the code below (from this tutorial), I get the following error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 39.59 GiB total capacity; 33.60 GiB already allocated; 3.19 MiB free; 38.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
What is the recommended way to fix this? Should I just cut the audio file into pieces? (A chunked-processing sketch follows the code below.)
from speechbrain.pretrained import SepformerSeparation as separator
import torchaudio

model = separator.from_hparams(
    source="speechbrain/sepformer-wham-enhancement",
    savedir="pretrained_models/sepformer-wham-enhancement",
    run_opts={"device": "cuda"},
)
# audio_file holds the path to the recording (not shown in the snippet)
est_sources = model.separate_file(path=audio_file)
torchaudio.save("enhanced_wham.wav", est_sources[:, :, 0].detach().cpu(), 8000)

How to force my program to use all the GPUs with a smaller batch size?

Recently I asked a question and answered it myself after finding the solution. Please read that question if possible.
To be concise: I have 8 GPUs, but I can only run on 7 of them with a batch size of 7.
I need to increase the batch size to 8 to use all 8 GPUs, but the program throws a CUDA out of memory error with a batch size of 8:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 9.71 GiB already allocated; 7.56 MiB free; 9.78 GiB reserved in total by PyTorch)
So I want to run my program on all the GPUs with a batch size of 7. Is that possible?
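Not the asker's code, but for what it's worth: nn.DataParallel splits one batch across the GPUs and concentrates extra memory on GPU 0, whereas DistributedDataParallel runs one process per GPU with its own batch, so the per-GPU batch size is chosen independently of the GPU count. A generic sketch with a placeholder model and dataset:

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # Launch with: torchrun --nproc_per_node=8 train_ddp.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Dummy model and data stand in for the real network and dataset.
    model = DDP(nn.Linear(128, 10).to(local_rank), device_ids=[local_rank])
    data = TensorDataset(torch.randn(56, 128), torch.randint(0, 10, (56,)))

    # Each process loads its own shard, so the per-GPU batch size (here 1)
    # does not depend on the number of GPUs.
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=1, sampler=sampler)

    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(2):
        sampler.set_epoch(epoch)
        for x, y in loader:
            x, y = x.to(local_rank), y.to(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()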

Pytorch not accessing the GPU's memory

I'm trying to run a reinforcement learning algorithm using PyTorch, but it keeps telling me that CUDA is out of memory. However, it seems that PyTorch is only accessing a tiny amount of my GPU's memory.
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 3.78 MiB already allocated; 0 bytes free; 4.00 MiB reserved in total by PyTorch)
It's not that PyTorch can only access a tiny amount of GPU memory; rather, your PyTorch program has accumulated tensors in GPU memory, and that 2 MiB tensor is the allocation that finally hits the limit. Try a lower batch size, or run the model in half precision to save GPU memory.
Alternatively, this should let you point PyTorch at a different GPU:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
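A minimal sketch of the half-precision suggestion, using autocast; the small model and batch below are placeholders, not the asker's RL code:

import torch
import torch.nn as nn

device = "cuda"
# Placeholder policy network and batch of observations.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 4)).to(device)
obs = torch.randn(512, 256, device=device)

# Eligible ops run in float16 inside the autocast region, roughly halving
# the memory used by activations.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    actions = model(obs)

print(actions.dtype)  # torch.float16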

Strange Cuda out of Memory behavior in Pytorch

Edit: SOLVED. The problem was the number of workers; lowering it fixed the issue.
I am using a 24 GB Titan RTX for an image segmentation U-Net in PyTorch. It keeps throwing CUDA out of memory at various batch sizes, even though I have more free memory than it says it needs, and lowering the batch size INCREASES the amount it tries to allocate, which doesn't make any sense.
Here is what I tried:
Image size = 448, batch size = 8
"RuntimeError: CUDA error: out of memory"
Image size = 448, batch size = 6
"RuntimeError: CUDA out of memory. Tried to allocate 3.12 GiB (GPU 0; 24.00 GiB total capacity; 2.06 GiB already allocated; 19.66 GiB free; 2.31 GiB reserved in total by PyTorch)"
It says it tried to allocate 3.12 GiB while I have 19 GiB free, and it still throws an error??
Image size = 224, batch size = 8
"RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 24.00 GiB total capacity; 2.78 GiB already allocated; 19.15 GiB free; 2.82 GiB reserved in total by PyTorch)"
Image size = 224, batch size = 6
"RuntimeError: CUDA out of memory. Tried to allocate 344.00 MiB (GPU 0; 24.00 GiB total capacity; 2.30 GiB already allocated; 19.38 GiB free; 2.59 GiB reserved in total by PyTorch)"
Reduced the batch size, but it tried to allocate more???
Image size = 224, batch size = 4
"RuntimeError: CUDA out of memory. Tried to allocate 482.00 MiB (GPU 0; 24.00 GiB total capacity; 2.21 GiB already allocated; 19.48 GiB free; 2.50 GiB reserved in total by PyTorch)"
Image size = 224, batch size = 2
"RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 24.00 GiB total capacity; 1.44 GiB already allocated; 19.88 GiB free; 2.10 GiB reserved in total by PyTorch)"
Image size = 224, batch size = 1
"RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 24.00 GiB total capacity; 894.36 MiB already allocated; 20.94 GiB free; 1.03 GiB reserved in total by PyTorch)"
Even with stupidly low image sizes and batch sizes...
SOLVED: the problem was the number of workers; lowering it fixed the issue.
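For context, the fix boils down to lowering num_workers on the DataLoader. A minimal sketch with placeholder data; the sizes and the value 2 are illustrative only, and the underlying cause may vary:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder segmentation-style dataset: images and masks.
data = TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 2, (64, 1, 224, 224)))

# Fewer worker processes means fewer batches prefetched and held in memory at once.
loader = DataLoader(data, batch_size=6, num_workers=2, pin_memory=True)

for images, masks in loader:
    images = images.to("cuda", non_blocking=True)
    masks = masks.to("cuda", non_blocking=True)
    # ... forward/backward pass would go here ...
    break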

CUDA out of memory even though there is free memory when using the simple-transformers pytorch library

I've been trying to follow this tutorial for the simple-transformers library, but I keep running into CUDA errors that I don't quite understand.
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 697.82 MiB already allocated; 3.99 GiB free; 744.00 MiB reserved in total by PyTorch)
I don't understand why it's failing to allocate memory when it says there is 4 GB free. I'm also seeing this error in the stack trace:
OSError: [WinError 1455] The paging file is too small for this operation to complete
Occasionally this one also pops up:
OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed
I don't really know where to go from here.
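No definitive answer here, but the usual first step with simple-transformers is to shrink the batch size (and optionally enable fp16) through the model args. A hedged sketch with illustrative values, assuming a classification model; the tutorial's actual task and model may differ:

from simpletransformers.classification import ClassificationArgs, ClassificationModel

# Illustrative values only: smaller batches and fp16 reduce per-step GPU memory.
model_args = ClassificationArgs(
    train_batch_size=8,
    eval_batch_size=8,
    fp16=True,
)

model = ClassificationModel("roberta", "roberta-base", args=model_args, use_cuda=True)
# model.train_model(train_df) would then run with the smaller batches.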
