Specifying GPUs in the Keras backend without CUDA_VISIBLE_DEVICES - python-3.x

I have a program that must use both the Keras and PyTorch frameworks. The frozen feature extractor is in Keras. When I run the Keras feature extractor in multi-GPU mode (4 GPUs), it takes all the GPUs for its execution, leaving no room for the second half (the PyTorch model). I would like to allocate 2 GPUs to the Keras model and the remaining GPUs to the PyTorch model. If I set CUDA_VISIBLE_DEVICES='0,1', it does not allow PyTorch to use the remaining GPUs either, which is a sticky situation.
Another workaround is to set the Keras backend session:
import tensorflow as tf
import keras
# limits the session to one GPU, but does not say which one
config = tf.ConfigProto(device_count={'GPU': 1})
sess = tf.Session(config=config)
keras.backend.set_session(sess)
But here, I want to specify which GPUs the session can use (by ID).
The error I am getting is a CUDA out-of-memory error while pushing the data onto the GPU for the second half (the PyTorch section), because Keras has already claimed the GPU memory.
RuntimeError: CUDA out of memory. Tried to allocate 204.00 MiB (GPU 2; 10.92 GiB total capacity; 14.54 MiB already allocated; 151.50 MiB free; 9.46 MiB cached)
I can specify GPUs with PyTorch; I just want to do the same with Keras.
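One possible sketch, assuming the same TF 1.x ConfigProto API as above: gpu_options.visible_device_list takes a comma-separated list of device IDs, so the Keras session can be pinned to GPUs 0 and 1 while PyTorch keeps the rest.
import tensorflow as tf
import keras

# Sketch: pin the Keras/TensorFlow session to GPUs 0 and 1 by ID
config = tf.ConfigProto()
config.gpu_options.visible_device_list = '0,1'  # comma-separated GPU IDs
config.gpu_options.allow_growth = True          # optional: allocate memory on demand
sess = tf.Session(config=config)
keras.backend.set_session(sess)

# PyTorch in the same process can still address the remaining physical devices,
# e.g. model.to('cuda:2')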

Related

Detectron2 Segmentation training: out of memory while training the Detectron2 mask-rcnn model on GPU

I tried almost all the options to train the model, including reducing the batch size to 1 and some other steps as described here:
How do I select which GPU to run a job on?,
But I still get the error:
RuntimeError: CUDA out of memory. Tried to allocate 238.00 MiB (GPU 3; 15.90 GiB total capacity; 15.20 GiB already allocated; 1.88 MiB free; 9.25 MiB cached)
This is the notebook, configured in an Azure ML workspace with N24-GPU.
Thank you.
Check your memory usage before you start training; sometimes Detectron2 doesn't free VRAM after use, particularly if training crashes. If this is the case, the easiest way to fix the issue in the short term is a reboot.
As for a long-term fix to this issue, I can't give any advice other than ensuring you're using the latest version of everything.
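A small diagnostic sketch for the check suggested above; nothing here is Detectron2-specific, it only assumes PyTorch and the nvidia-smi CLI are installed:
import subprocess
import torch

# memory tracked by this PyTorch process (MiB)
print('allocated:', torch.cuda.memory_allocated() / 1024**2)
print('reserved: ', torch.cuda.memory_reserved() / 1024**2)

# overall usage reported by the driver, including other processes and leftover allocations
print(subprocess.run(['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv'],
                     capture_output=True, text=True).stdout)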

Pytorch not accessing the GPU's memory

I'm trying to run a reinforcement learning algorithm using PyTorch, but it keeps telling me that CUDA is out of memory. However, it seems that PyTorch is only accessing a tiny amount of my GPU's memory.
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 3.78 MiB already allocated; 0 bytes free; 4.00 MiB reserved in total by PyTorch)
It's not that PyTorch is only accessing a tiny amount of GPU memory; rather, your PyTorch program has cumulatively allocated tensors to GPU memory, and that 2 MiB allocation is the one that hits the limit. Try a lower batch size, or run the model with half precision to save GPU memory.
This should work for controlling which GPU PyTorch accesses:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
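As a hedged illustration of the half-precision suggestion, here is a minimal inference sketch with PyTorch's autocast; the tiny network is just a stand-in for whatever model the question uses:
import torch
import torch.nn as nn

# toy stand-in for the RL network
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 4)).cuda()
batch = torch.randn(32, 128, device='cuda')

with torch.no_grad():
    # autocast runs eligible ops in float16, roughly halving activation memory
    with torch.cuda.amp.autocast():
        output = model(batch)

print(output.dtype)  # float16 inside the autocast region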

CUDA out of memory while trying to train a model with the spaCy Transformer model

I'm trying to train a custom NER model on top of the spaCy transformer model.
For faster training, I'm running it on a GPU with CUDA on Google Colab Pro with a high-RAM server.
After the first iteration, I get an error:
RuntimeError: CUDA out of memory. Tried to allocate 480.00 MiB (GPU 0; 11.17 GiB total capacity; 9.32 GiB already allocated; 193.31 MiB free; 10.32 GiB reserved in total by PyTorch)
For the above error I tried emptying the cache too, but I'm still getting this error. It seems like it's not freeing up enough space.
import torch
torch.cuda.empty_cache()
Moreover, I also tried reducing the batch size, even down to 2. Still the same error occurs.
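A small hedged sketch for checking whether empty_cache() can actually return anything: compare the memory held by live tensors against the memory reserved by PyTorch's caching allocator. If "allocated" is already close to the device capacity, the tensors themselves are the problem and only a smaller batch or model will help.
import torch

def report(label):
    # memory held by live tensors vs. memory reserved by the caching allocator
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f'{label}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB')

report('before empty_cache')
torch.cuda.empty_cache()
report('after empty_cache')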

Memory leak with en_core_web_trf model, Spacy

There is a memory leak when using pipe with the en_core_web_trf model. I run the model on a GPU with 16 GB of RAM; here is a sample of the code.
!python -m spacy download en_core_web_trf
import en_core_web_trf
nlp = en_core_web_trf.load()
# it's just an array of 100K sentences
data = dataload()
for index, review in enumerate(nlp.pipe(data, batch_size=100)):
    # doing some processing here
    if index % 1000 == 0:
        print(index)
This code crashes when it reaches around 31K sentences and raises an OOM error:
CUDA out of memory. Tried to allocate 46.00 MiB (GPU 0; 11.17 GiB total capacity; 10.44 GiB already allocated; 832.00 KiB free; 10.72 GiB reserved in total by PyTorch)
I just use the pipeline to predict, not to train anything, and I tried different batch sizes, but nothing changed; it still crashes.
Your Environment:
spaCy version: 3.0.5
Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.7.10
Pipelines: en_core_web_trf (3.0.0)
Lucky you with a GPU - I am still trying to get through the (torch GPU) DLL hell on Windows :-). But it looks like spaCy 3 uses more GPU memory than spaCy 2 did - my 6 GB GPU may have become useless.
That said, have you tried running your case without the GPU (and watching memory usage)?
The spaCy 2 'leak' on large datasets is (mainly) due to the growing vocabulary - each data row may add a couple more words, and the suggested 'solution' is reloading the model and/or just the vocabulary every nnn rows. The GPU usage may have the same issue...
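A hedged sketch of that periodic-reload workaround, applied to the pipe loop from the question; the chunk size is an arbitrary placeholder, and whether reloading also resets the GPU-side growth is exactly what remains to be tested:
import spacy

def process_in_chunks(data, chunk_size=10000, batch_size=100):
    nlp = spacy.load('en_core_web_trf')
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        for doc in nlp.pipe(chunk, batch_size=batch_size):
            pass  # doing some processing here
        # reload the pipeline to drop anything accumulated during this chunk
        nlp = spacy.load('en_core_web_trf')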

Why Does PyTorch Use So Much GPU Memory to Store Tensors?

I made a simple test using PyTorch which involves measuring the current GPU memory use, creating a tensor of a certain size, moving it to the GPU, and measuring the GPU memory again. According to my calculation, around 6.5 KiB was required to store each element of the tensor! Here is the breakdown:
GPU memory use before creating the tensor as shown by nvidia-smi: 384 MiB.
Create a tensor with 100,000 random elements:
a = torch.rand(100000)
Transfer the tensor to the GPU:
device = torch.device('cuda')
b = a.to(device)
GPU memory use after the transfer: 1020 MiB
Calculate the memory change per element of the tensor:
(1020-384)*1024*1024/len(b)
# Answer is 6668.94336
This is weird, to say the least. Why would 6.5 KiB of GPU memory be required to store a single float32 element?
UPDATE: Following Robert Crovella's suggestion in the comment, I created another tensor c and then moved it to the CUDA device as d. The GPU memory usage didn't increase. So it seems that PyTorch or CUDA requires some 636 MiB for bootstrapping. Why is that? What is this memory used for? It seems like a lot to me!
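A hedged sketch that separates the two quantities being mixed up here: torch.cuda.memory_allocated() reports only the bytes PyTorch's allocator holds for tensors (roughly 400 KB for 100,000 float32 values), while the ~636 MiB jump seen in nvidia-smi also includes the one-time CUDA context and kernel images loaded on first use of the device:
import torch

device = torch.device('cuda')
a = torch.rand(100000)
b = a.to(device)  # first CUDA use: the context and kernels are loaded here

# bytes actually held for tensors by PyTorch's allocator (~400 KB here)
print(torch.cuda.memory_allocated(device))
# bytes reserved by the caching allocator (still far below the nvidia-smi figure)
print(torch.cuda.memory_reserved(device))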
