Memory leak with the en_core_web_trf model in spaCy

There is a memory leak when using nlp.pipe with the en_core_web_trf model. I run the model on a GPU with 16 GB of RAM; here is a sample of the code:
!python -m spacy download en_core_web_trf

import en_core_web_trf

nlp = en_core_web_trf.load()

# data is just an array of 100K sentences.
data = dataload()
for index, review in enumerate(nlp.pipe(data, batch_size=100)):
    # doing some processing here
    if index % 1000 == 0:
        print(index)
This code crashes when it reaches around 31K sentences and raises an OOM error:
CUDA out of memory. Tried to allocate 46.00 MiB (GPU 0; 11.17 GiB total capacity; 10.44 GiB already allocated; 832.00 KiB free; 10.72 GiB reserved in total by PyTorch)
I only use the pipeline to predict, not to train anything, and I have tried different batch sizes, but nothing changed; it still crashes.
Your Environment
spaCy version: 3.0.5
Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.7.10
Pipelines: en_core_web_trf (3.0.0)

Lucky you with a GPU - I am still trying to get through the (torch GPU) DLL hell on Windows :-). But it looks like spaCy 3 uses more GPU memory than spaCy 2 did - my 6 GB GPU may have become useless.
That said, have you tried running your case without the GPU (and watching memory usage)?
The spaCy 2 'leak' on large datasets is (mainly) due to the growing vocabulary - each data row may add a couple more words - and the suggested 'solution' is to reload the model and/or just the vocabulary every nnn rows (see the sketch below). The GPU usage may have the same issue...
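A minimal sketch of that reload workaround, assuming the texts fit in a plain Python list; the chunk size of 10,000 is only a starting point to tune for your GPU:

import spacy
import torch  # only needed when running on GPU

CHUNK_SIZE = 10_000  # assumption: tune this for your GPU and dataset

def process_in_chunks(texts):
    nlp = spacy.load("en_core_web_trf")
    for start in range(0, len(texts), CHUNK_SIZE):
        chunk = texts[start:start + CHUNK_SIZE]
        for doc in nlp.pipe(chunk, batch_size=100):
            pass  # doing some processing here
        # Drop the pipeline and reload it so accumulated state
        # (vocab strings, cached tensors) is released.
        del nlp
        torch.cuda.empty_cache()  # return cached GPU blocks to the driver
        nlp = spacy.load("en_core_web_trf")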

Related

Trying to train a spacy model using GPU but PyTorch throws the "Out Of Memory Error"

I'm trying to build a parser to recognize entities from given text and to train the spaCy model on a GPU. I installed all the necessary packages, but during training it runs the first iteration and then throws a "CUDA out of memory" error.
I was using this command in my terminal to train the model:
python -m spacy train C:\Users\User\PycharmProjects\pythonProject\pythonProject3\config.cfg --output ./output --paths.train ./train_data.spacy --paths.dev ./test_data.spacy --gpu-id 0
And after the first iteration it gives me this error:
RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 6.00 GiB total capacity; 4.99 GiB already allocated; 0 bytes free; 5.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
What I don't understand here is that it says 0 bytes free. Is there anything I can do to free up some space and train my model?
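The error message itself points at one knob: setting max_split_size_mb through the PYTORCH_CUDA_ALLOC_CONF environment variable to reduce fragmentation. A small sketch; the value 128 is only an illustrative guess, not a recommended setting:

import os

# Must be set before CUDA is initialized (i.e. before the first torch.cuda call).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Equivalent when launching training from the shell:
#   PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python -m spacy train config.cfg --gpu-id 0 ...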

Detectron2 Segmentation training : out of memory while training the Detectron2 mask-rcnn model on GPU

I tried almost all the options to train the model, including reducing the batch size to 1, and some other steps as described here:
How do I select which GPU to run a job on?,
But I still get the error:
RuntimeError: CUDA out of memory. Tried to allocate 238.00 MiB (GPU 3; 15.90 GiB total capacity; 15.20 GiB already allocated; 1.88 MiB free; 9.25 MiB cached)
This is in a notebook configured in an Azure ML workspace with an N24 GPU.
Thank you.
Check your memory usage before you start training; sometimes Detectron2 doesn't free VRAM after use, particularly if a training run crashes. If this is the case, the easiest way to fix the issue in the short term is a reboot.
As for a long-term fix, I can't give any advice other than making sure you're using the latest version of everything.
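A quick way to do that check from Python before launching training (a small sketch, assuming a recent PyTorch where torch.cuda.mem_get_info is available; nvidia-smi from the shell gives the same information):

import torch

free_bytes, total_bytes = torch.cuda.mem_get_info(0)  # (free, total) in bytes for GPU 0
print(f"GPU 0: {free_bytes / 1024**3:.2f} GiB free of {total_bytes / 1024**3:.2f} GiB")

# If a large share of the memory is already gone before training starts, another
# process (or a crashed run) is still holding it; kill that process or reboot.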

cuda out of memory while trying to train a model with Transformer model of spacy

I'm trying to train a custom NER model on top of the spaCy transformer model.
For faster training, I'm running it on a GPU with CUDA on Google Colab Pro with a high-RAM server.
After the first iteration, I get an error:
RuntimeError: CUDA out of memory. Tried to allocate 480.00 MiB (GPU 0;
11.17 GiB total capacity; 9.32 GiB already allocated; 193.31 MiB free; 10.32 GiB reserved in total by PyTorch)
For the above error I tried emptying the cache too, but I still get the error; it seems like it's not freeing up enough space:
import torch
torch.cuda.empty_cache()
Moreover, I also tried reducing the batch size, even down to 2, and still the same error occurs.
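One thing worth noting here: torch.cuda.empty_cache() only returns cached blocks that are no longer referenced by any tensor, so it cannot free memory still held by live model weights and activations, which is why it rarely helps with training OOMs. A small sketch that makes the distinction visible (assuming a CUDA-enabled PyTorch install):

import torch

print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")  # held by live tensors
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")   # held by the caching allocator

torch.cuda.empty_cache()  # hands unused cached blocks back to the driver

# "allocated" is unchanged; only the gap between reserved and allocated can shrink.
print(f"reserved after empty_cache: {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")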

Large datasets and Cuda memory Issue

I was processing a large dataset and ran into this error: "RuntimeError: CUDA out of memory. Tried to allocate 1.35 GiB (GPU 0; 8.00 GiB total capacity; 3.45 GiB already allocated; 1.20 GiB free; 4.79 GiB reserved in total by PyTorch).
Any thought on how to solve this?
I met the same problem before. It's not a bug; you just ran out of memory on your GPU.
One way to solve it is to reduce the batch size until your code runs without this error.
If that does not work, take a closer look at your model. A single 8 GiB GPU may not handle a large and deep model; consider a GPU with more memory or find a lab to help you (Google Colab can help).
If you are only doing evaluation, forcing the tensors to run on the CPU is fine (see the sketch below).
Try a model compression algorithm.
If you are using full-batch gradient descent (or similar), use mini-batches with a smaller batch size instead and reflect that in your dataloaders.
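A minimal sketch of the CPU-evaluation suggestion; the tiny linear model and random batch below are only stand-ins for a real model and data:

import torch
import torch.nn as nn

model = nn.Linear(512, 2)        # stand-in for your real model
batch = torch.randn(16, 512)     # stand-in for a batch of inputs

device = torch.device("cpu")     # evaluate on the CPU instead of the GPU
model = model.to(device).eval()

with torch.no_grad():            # no gradients needed for evaluation, which also saves memory
    output = model(batch.to(device))
print(output.shape)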

Force GPU memory limit in PyTorch

Is there a way to force a maximum value for the amount of GPU memory available to a particular PyTorch instance? For example, my GPU may have 12 GB available, but I'd like to assign at most 4 GB to a particular process.
Update (04-MAR-2021): it is now available in the stable 1.8.0 version of PyTorch. See also the docs.
Original answer follows.
This feature request has been merged into the PyTorch master branch, but has not yet made it into a stable release.
It is introduced as set_per_process_memory_fraction:
Set memory fraction for a process. The fraction is used to limit the caching allocator to a portion of the memory on a CUDA device. The allowed value equals the total visible memory multiplied by the fraction. If a process tries to allocate more than the allowed value, an out-of-memory error is raised in the allocator.
You can check the tests as usage examples.
Update PyTorch to 1.8.0 (pip install --upgrade torch==1.8.0).
function: torch.cuda.set_per_process_memory_fraction(fraction, device=None)
params:
fraction (float) – Range: 0~1. Allowed memory equals total_memory * fraction.
device (torch.device or int, optional) – selected device. If it is None the default CUDA device is used.
eg:
import torch
torch.cuda.set_per_process_memory_fraction(0.5, 0)
torch.cuda.empty_cache()
total_memory = torch.cuda.get_device_properties(0).total_memory
# less than 0.5 will be ok:
tmp_tensor = torch.empty(int(total_memory * 0.499), dtype=torch.int8, device='cuda')
del tmp_tensor
torch.cuda.empty_cache()
# this allocation will raise an OOM:
torch.empty(total_memory // 2, dtype=torch.int8, device='cuda')
"""
It raises an error as follows:
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; 0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; 0 bytes reserved in total by PyTorch)
"""
In contrast to TensorFlow, which reserves all of the GPU's memory by default, PyTorch only uses as much as it needs. However, you could:
Reduce the batch size.
Use CUDA_VISIBLE_DEVICES to limit which GPUs can be accessed (multiple IDs can be listed).
To set this from within the program, try:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
