I am trying to use 4 GPUs in my laptop, so I use the command
new_model = multi_gpu_model(model, gpus=4)
It works very well, but there is one problem: only the first GPU ends up using much more memory than the others.
Below is my code and the state of my GPU memory as checked with watch nvidia-smi.
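For reference, here is a minimal sketch of the setup (Xception stands in here purely for my actual model; only the wrapping pattern matters):

from keras.applications import Xception   # stand-in for my actual model
from keras.utils import multi_gpu_model

# Build the single-device template model; this is the step during which
# ~4000 MiB appears on gpu0 in watch nvidia-smi.
model = Xception(weights=None, input_shape=(299, 299, 3), classes=10)

# Wrap it so each of the 4 GPUs processes a slice of every batch.
new_model = multi_gpu_model(model, gpus=4)
new_model.compile(loss='categorical_crossentropy', optimizer='adam')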
The memory on gpu0 increases only while the model is being built, before the call new_model = multi_gpu_model(model, gpus=4).
At that point, about 4000 MiB is already allocated on gpu0.
Why does this uneven GPU usage occur, and how can I use the 4 GPUs equally?
Please give me a hint. Thanks a lot.
I am using torch NMS as part of Detectron2, yet for some reason I see that it runs on the CPU (as can be seen in the attached profiling image). When I debug it, both the inputs and the outputs of the NMS are on the GPU, so it is unclear why the profiling shows it on the CPU.
I'd appreciate any help in the matter.
Thanks!
[Profiling screenshot]
I made sure the tensors are on the GPU, but the profiler still shows the operation on the CPU.
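Here is a stripped-down version of the check I am doing, calling torchvision.ops.nms directly instead of going through the full Detectron2 pipeline (the random boxes and the profiler setup are just for this repro):

import torch
from torchvision.ops import nms
from torch.profiler import profile, ProfilerActivity

# Random boxes/scores on the GPU, standing in for Detectron2's proposals.
boxes = torch.rand(1000, 4, device="cuda")
boxes[:, 2:] += boxes[:, :2]              # make sure x2 > x1 and y2 > y1
scores = torch.rand(1000, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    keep = nms(boxes, scores, iou_threshold=0.5)

print(keep.device)  # confirms the output is on cuda
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))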
I'm trying to free up GPU memory after I'm done using the model.
I checked nvidia-smi before creating and training the model: 402MiB / 7973MiB
After creating and training the model, I checked the GPU memory status again with nvidia-smi: 7801MiB / 7973MiB
Now I tried to free up GPU memory with:
import gc
import torch

del model                   # drop the last Python reference to the model
torch.cuda.empty_cache()    # ask PyTorch to return cached blocks to the driver
gc.collect()                # make sure Python has actually collected the objects
and checked again the GPU memory: 2361MiB / 7973MiB
As you can see, not all of the GPU memory was released (I expected roughly 400 MiB / 7973MiB).
I can only release the GPU memory via the terminal (sudo fuser -v /dev/nvidia* and kill the PID).
Is there a way to free up the GPU memory after I'm done using the model?
This happens because PyTorch reserves GPU memory in its caching allocator for fast memory allocation. To learn more about it, see PyTorch memory management. To work around this, you can use the following code:
from numba import cuda

cuda.select_device(your_gpu_id)  # your_gpu_id is the integer index of the GPU
cuda.close()                     # tears down the CUDA context on that device
However, this comes with a catch: it closes the CUDA context on that GPU completely, so you can't resume training in the same process without restarting everything.
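To see the reserved-versus-allocated distinction directly, you can compare PyTorch's own counters before and after freeing a tensor (a minimal sketch; note that nvidia-smi additionally counts the CUDA context itself, which is never released while the process is alive):

import torch

x = torch.randn(1024, 1024, device="cuda")
print(torch.cuda.memory_allocated())   # bytes held by live tensors
print(torch.cuda.memory_reserved())    # bytes cached by PyTorch's allocator

del x
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())   # drops back toward zero
print(torch.cuda.memory_reserved())    # cached blocks returned to the driver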
I am training a fairly intensive ML model on a GPU. What often happens is that I start training, let it run for a couple of epochs, notice that my changes have not made a significant difference in the loss/accuracy, make edits, re-initialize the model, and restart training from epoch 0. In this case, I often get OOM errors.
My guess is that, despite my overriding all the model variables, something is still taking up GPU memory.
Is there a way to clear the GPU memory in TensorFlow 1.15 so that I don't have to restart the kernel each time I want to start training from scratch?
It depends on exactly which GPUs you're using. I'm assuming you're using NVIDIA, but even then, depending on the exact GPU, there are three ways to do this:
nvidia-smi -r works on Tesla and other modern variants.
nvidia-smi --gpu-reset works on a variety of older GPUs.
Rebooting is the only option for the rest, unfortunately.
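If a full device reset is too drastic, one in-process option worth trying first, not covered above but a common TF 1.x pattern, is clearing the accumulated graph and session state between runs:

import tensorflow as tf  # TF 1.15

# Drop the old graph and session state before rebuilding the model, so
# repeated re-initialisations don't pile up on the GPU.
tf.keras.backend.clear_session()
tf.reset_default_graph()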
I am facing an issue with my Inception model during performance testing with Apache JMeter.
Error: OOM when allocating tensor with shape[800,1280,3] and type
float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator
GPU_0_bfc [[Node: Cast = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8,
_device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens,
add report_tensor_allocations_upon_oom to RunOptions for current
allocation info.
OOM stands for Out Of Memory. It means that your GPU has run out of space, presumably because the tensors you've allocated are too large. You can fix this by making your model smaller or by reducing your batch size. By the looks of it, you're feeding in a large image (800x1280), so you may want to consider downsampling.
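For example, a sketch of that kind of downsampling in a TF 1.x graph (the placeholder shape follows the error message above; the target size is arbitrary):

import tensorflow as tf

# Downsample the 800x1280 input before it enters the expensive part of the
# graph; halving each spatial dimension cuts that tensor's memory by 4x.
images = tf.placeholder(tf.uint8, [None, 800, 1280, 3])
small = tf.image.resize_images(tf.cast(images, tf.float32), [400, 640])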
If you have multiple GPUs at hand, select a GPU which is not as busy as this one (other processes may be running on it). Go to the terminal and type
export CUDA_VISIBLE_DEVICES=1
where 1 is the index of another available GPU, then re-run the same code.
You can check the available GPUs using
nvidia-smi
which shows you which GPUs are available and how much memory is free on each of them.
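The same restriction can also be applied from inside the script, as long as it happens before TensorFlow initializes the GPU:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # expose only GPU 1 to this process

import tensorflow as tf                    # import after setting the variable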
Hello Everyone,
I am working on an image classification problem using TensorFlow and a convolutional neural network.
My model has the following layers:
Input image of size 2456x2058
3 convolution layers {Con1 shape (10,10,1,32); Con2 shape (5,5,32,64); Con3 shape (5,5,64,64)}
3 max-pool 2x2 layers
1 fully connected layer.
I have tried using the nvidia-smi tool, but it only shows me the GPU memory consumption while the model runs.
I would like to know whether there is a way to estimate the memory before running the model on the GPU, so that I can design models with the available memory in mind.
I have tried using this method for estimation, but my calculated memory and the observed memory utilisation are nowhere near each other.
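For concreteness, a parameters-only estimate for the three convolution layers above (float32, 4 bytes per value; activations, gradients, and cuDNN workspace are not counted):

# (filter_h, filter_w, in_channels, out_channels) for Con1, Con2, Con3
conv_shapes = [(10, 10, 1, 32), (5, 5, 32, 64), (5, 5, 64, 64)]
params = sum(h * w * c_in * c_out for h, w, c_in, c_out in conv_shapes)
print(f"{params:,} conv parameters -> {params * 4 / 1e6:.2f} MB in float32")
# ~156,800 parameters, well under 1 MB, so parameters alone clearly cannot
# explain what nvidia-smi reports.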
Thank you all for your time.
As far as I understand, when you open a session with tensorflow-gpu, it allocates all of the memory on the GPUs that are available. So when you look at the nvidia-smi output, you will always see the same amount of used memory, even if only part of it is actually in use. There are options when opening a session to force TensorFlow to allocate only a part of the available memory (see How to prevent tensorflow from allocating the totality of a GPU memory? for instance).
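For example, the allow_growth option (TF 1.x API) makes TensorFlow allocate GPU memory on demand instead of grabbing nearly all of it at startup:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # allocate incrementally, as needed
sess = tf.Session(config=config)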
You can control the GPU memory allocation in TensorFlow. Once you have calculated the memory requirements of your deep learning model, you can use tf.GPUOptions.
For example, if you want to cap TensorFlow at 40% of an 8 GB GPU (roughly 3.2 GB):
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # use at most 40% of each visible GPU
session = tf.Session(config=config, ...)
Pass this config object to tf.Session via the config parameter.
per_process_gpu_memory_fraction acts as an upper bound on the amount of GPU memory the process may use.
Here's the link to the documentation:
https://www.tensorflow.org/tutorials/using_gpu
"NVIDIA-SMI ... shows me the GPU memory consumption as the model runs"
TF preallocates all available memory when you use it, so nvidia-smi will show nearly 100% memory usage ...
"but my calculated memory and observed memory utilisation are nowhere near each other."
... so this is unsurprising.