Large datasets and CUDA memory issue - PyTorch

I was processing a large dataset and ran into this error: "RuntimeError: CUDA out of memory. Tried to allocate 1.35 GiB (GPU 0; 8.00 GiB total capacity; 3.45 GiB already allocated; 1.20 GiB free; 4.79 GiB reserved in total by PyTorch)."
Any thoughts on how to solve this?

I met the same problem before. It's not a bug; you simply ran out of memory on your GPU.
One way to solve it is to reduce the batch size until your code runs without this error.
If that doesn't work, take a closer look at your model: a single 8 GiB GPU may not be able to hold a large, deep model. Consider moving to a GPU with more memory, or find a lab that can help (Google Colab is one option).
If you are only running evaluation, forcing the tensors to run on the CPU is fine.
You can also try model compression algorithms.
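For the evaluate-on-CPU suggestion, here is a minimal sketch; the model and input are placeholders standing in for your own:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)          # stand-in for your trained model
model = model.to("cpu").eval()    # move the weights off the GPU

x = torch.randn(8, 16)            # stand-in input, also on the CPU
with torch.no_grad():             # no autograd graph -> far less memory held
    out = model(x)
print(out.shape)                  # torch.Size([8, 4])
```

`torch.no_grad()` matters even on the GPU: during pure evaluation it stops PyTorch from keeping activations around for a backward pass that will never happen.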

If you are using full-batch gradient descent (or similar), switch to mini-batch training with a smaller batch size and reflect that change in your DataLoaders.
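A minimal sketch of the mini-batch version; the dataset and batch size here are placeholders you would swap for your own:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for your real one
dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 4, (1000,)))

# Instead of one full-batch pass, iterate over small mini-batches;
# shrink batch_size further if you still hit CUDA OOM.
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for xb, yb in loader:
    # forward/backward on just this mini-batch;
    # peak GPU memory now scales with batch_size, not the whole dataset
    pass
```

Peak memory is dominated by the activations of one batch, so halving `batch_size` is usually the quickest way to get under the limit.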

Related

Trying to train a spacy model using GPU but PyTorch throws the "Out Of Memory Error"

I'm trying to build a parser to recognize entities in a given text. To train the spaCy model with a GPU, I installed all the necessary packages, but training runs the first iteration and then throws a "CUDA Out Of Memory" error.
I was using this command in my terminal to train the model: python -m spacy train C:\Users\User\PycharmProjects\pythonProject\pythonProject3\config.cfg --output ./output --paths.train ./train_data.spacy --paths.dev ./test_data.spacy --gpu-id 0
After the first iteration it gives me this error:
RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 6.00 GiB total capacity; 4.99 GiB already allocated; 0 bytes free; 5.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
What I don't seem to understand is that it mentions 0 bytes free. Is there anything I can do to free up some space and train my model?
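The error message itself points at one mitigation: when reserved memory is much larger than allocated memory, fragmentation is the problem, and `max_split_size_mb` can help. A minimal sketch (the value 128 is an assumption to tune, not a recommendation):

```python
import os

# Must be set before the first CUDA allocation (ideally before importing torch).
# max_split_size_mb caps the size of cached blocks the allocator will split,
# which can reduce fragmentation when reserved >> allocated.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Since this question uses the spacy CLI rather than a script, set the variable in the shell before running, e.g. on Windows: `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`.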

Detectron2 Segmentation training : out of memory while training the Detectron2 mask-rcnn model on GPU

I tried almost all the options to train the model, including reducing the batch size to 1, and some other steps as described here:
How do I select which GPU to run a job on?,
But I still get the error:
RuntimeError: CUDA out of memory. Tried to allocate 238.00 MiB (GPU 3; 15.90 GiB total capacity; 15.20 GiB already allocated; 1.88 MiB free; 9.25 MiB cached)
This is the notebook, configured in an Azure ML workspace with an N24 GPU.
Thank you
Check your memory usage before you start training; sometimes Detectron2 doesn't free VRAM after use, particularly if training crashes. If this is the case, the easiest short-term fix is a reboot.
As for a long-term fix, I can't give any advice other than making sure you're using the latest version of everything.
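Checking memory before training can be done with `nvidia-smi` in a terminal, or from Python with PyTorch's own counters; a minimal sketch, assuming a reasonably recent PyTorch (where `torch.cuda.mem_get_info` exists):

```python
import torch

# Query GPU memory before starting a training run
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes on the current device
    print(f"free: {free / 1024**3:.2f} GiB / total: {total / 1024**3:.2f} GiB")
    # Memory allocated by tensors in *this* process only
    print(f"allocated by this process: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
else:
    print("No CUDA device visible")
```

If `free` is already low before you create any tensors, another process (or a crashed previous run) is holding the memory, which matches the reboot advice above.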

Pytorch not accessing the GPU's memory

I'm trying to run a reinforcement learning algorithm using pytorch, but it keeps telling me that CUDA is out of memory. However, it seems that pytorch is only accessing a tiny amount of my GPU's memory.
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 3.78 MiB already allocated; 0 bytes free; 4.00 MiB reserved in total by PyTorch)
It's not that PyTorch is only accessing a tiny amount of GPU memory; your PyTorch program has cumulatively allocated tensors on the GPU, and that 2 MiB tensor is simply the one that hit the limit. Try a lower batch size, or run the model in half precision to save GPU memory.
Separately, if you want PyTorch to run on a different GPU, set this before any CUDA call:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
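On the half-precision suggestion: a float16 tensor takes half the bytes of a float32 one, which you can verify directly. A minimal sketch:

```python
import torch

# float32 uses 4 bytes per element, float16 uses 2
t32 = torch.zeros(1024, 1024)                  # default dtype: float32
t16 = t32.half()                               # convert to float16
print(t32.element_size(), t16.element_size())  # 4 2
```

In practice you would call `model.half()` (or use autocast) and feed half-precision inputs, roughly halving parameter and activation memory at some cost in numerical precision.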

CUDA out of memory while trying to train a model with spaCy's transformer model

I'm trying to train a custom NER model on top of the spaCy transformer model.
For faster training, I'm running it on a GPU with CUDA on Google Colab Pro with a High-RAM server.
After the first iteration, I get an error:
RuntimeError: CUDA out of memory. Tried to allocate 480.00 MiB (GPU 0; 11.17 GiB total capacity; 9.32 GiB already allocated; 193.31 MiB free; 10.32 GiB reserved in total by PyTorch)
For the above error I tried emptying the cache too, but I'm still getting it. It seems like it's not freeing up enough space:
import torch
torch.cuda.empty_cache()
Moreover, I also tried reducing the batch size, even down to 2, but the same error still occurs.
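One thing worth knowing about `torch.cuda.empty_cache()`: it only releases cached blocks that no live tensor still references, so memory held by variables you keep around is not returned. A minimal sketch of freeing memory more aggressively (the `del` names are placeholders for variables from your own training step):

```python
import gc
import torch

# Drop Python references to large tensors first, e.g.:
# del outputs, loss          # (names from your own training loop)
gc.collect()                 # collect now-unreachable objects
if torch.cuda.is_available():
    torch.cuda.empty_cache() # then return cached blocks to the driver
```

If memory usage still climbs each iteration, the usual culprit is accumulating a tensor that carries the autograd graph (e.g. summing `loss` instead of `loss.item()`).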

Memory leak with en_core_web_trf model, Spacy

There is a memory leak when using the pipe of the en_core_web_trf model. I run the model on a GPU with 16 GB of RAM; here is a sample of the code:
!python -m spacy download en_core_web_trf
import en_core_web_trf
nlp = en_core_web_trf.load()
# it's just an array of 100K sentences.
data = dataload()
for index, review in enumerate(nlp.pipe(data, batch_size=100)):
    # doing some processing here
    if index % 1000 == 0:
        print(index)  # progress every 1000 rows
This code crashes when it reaches 31K rows and raises an OOM error:
CUDA out of memory. Tried to allocate 46.00 MiB (GPU 0; 11.17 GiB total capacity; 10.44 GiB already allocated; 832.00 KiB free; 10.72 GiB reserved in total by PyTorch)
I only use the pipeline to predict, not to train anything, and I tried different batch sizes, but nothing changed; it still crashes.
Your Environment
spaCy version: 3.0.5
Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.7.10
Pipelines: en_core_web_trf (3.0.0)
Lucky you with a GPU - I am still trying to get through the (Torch GPU) DLL hell on Windows :-). But it looks like spaCy 3 uses more GPU memory than spaCy 2 did - my 6 GB GPU may have become useless.
That said, have you tried running your case without the GPU (and watching memory usage)?
The spaCy 2 'leak' on large datasets is (mainly) due to the growing vocabulary - each data row may add a couple more words, and the suggested 'solution' is reloading the model and/or just the vocabulary every nnn rows. The GPU usage may have the same issue...
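The reload-every-nnn-rows workaround can be sketched generically; `load_model` and `process` here are hypothetical stand-ins for `en_core_web_trf.load()` and whatever you do per row:

```python
def run_in_chunks(data, load_model, process, chunk_size=10_000):
    """Reload the model every chunk_size rows so per-model state
    (vocabulary, caches) cannot grow without bound."""
    results = []
    for start in range(0, len(data), chunk_size):
        nlp = load_model()                      # fresh model -> fresh vocab
        for row in data[start:start + chunk_size]:
            results.append(process(nlp, row))
        del nlp                                 # drop the old model before the next chunk
    return results
```

With spaCy this would be `load_model=en_core_web_trf.load` and a `process` that calls `nlp(row)` (or runs `nlp.pipe` over the chunk); the trade-off is paying the model load time once per chunk in exchange for bounded memory growth.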
