How to monitor GPU memory usage when training a DNN? - pytorch

I have included an example result. How can I collect the data to produce a graph like this?

You can use PyTorch functions such as torch.cuda.memory_stats to query the current GPU memory usage and then build a temporal graph from those readings.
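For example, here is a minimal sketch (the toy model, step count, and matplotlib plotting are illustrative assumptions, not part of the original answer) that records torch.cuda.memory_allocated and torch.cuda.memory_reserved once per training step and plots them over time:
import torch
import matplotlib.pyplot as plt

device = torch.device("cuda")
model = torch.nn.Linear(1024, 1024).to(device)  # toy model, just for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

allocated, reserved = [], []
for step in range(100):
    x = torch.randn(256, 1024, device=device)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Memory currently held by tensors vs. by PyTorch's caching allocator, in MiB.
    allocated.append(torch.cuda.memory_allocated(device) / 2**20)
    reserved.append(torch.cuda.memory_reserved(device) / 2**20)

plt.plot(allocated, label="allocated (MiB)")
plt.plot(reserved, label="reserved (MiB)")
plt.xlabel("training step")
plt.ylabel("GPU memory")
plt.legend()
plt.savefig("gpu_memory.png")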

I think the most convenient option is
torch.cuda.mem_get_info
It returns the global free and total GPU memory for a given device, using cudaMemGetInfo.
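A short sketch of how it could be used (device index 0 and the MiB conversion are just for illustration):
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info(0)  # device 0
used_bytes = total_bytes - free_bytes  # includes memory used by other processes on the GPU
print(f"used: {used_bytes / 2**20:.0f} MiB / {total_bytes / 2**20:.0f} MiB "
      f"({100 * used_bytes / total_bytes:.1f}%)")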

Another method is nvidia-smi, whose output looks like this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:01:00.0 On | N/A |
| 0% 50C P8 12W / 215W | 1088MiB / 8113MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1091 G /usr/lib/xorg/Xorg 24MiB |
| 0 N/A N/A 1158 G /usr/bin/gnome-shell 48MiB |
+-----------------------------------------------------------------------------+
Then use subprocess to capture that output and parse it, for example:
import re
import subprocess
import time

command = 'nvidia-smi'
while True:
    output = subprocess.check_output(command).decode()
    # Matches e.g. "1088MiB /  8113MiB" in the memory-usage column of the first GPU.
    ram_using = int(re.findall(r'(\d+)MiB\s*/', output)[0])
    ram_total = int(re.findall(r'/\s*(\d+)MiB', output)[0])
    ram_percent = ram_using / ram_total
    print(f'{ram_using} MiB / {ram_total} MiB ({ram_percent:.1%})')
    time.sleep(1)
Alternatively, just split the decoded output with output.split('\n') and pick out the memory-usage line directly.
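A further option, not mentioned in the answer above, is nvidia-smi's machine-readable query mode, which avoids parsing the ASCII table altogether (the field names below are standard nvidia-smi query fields):
import subprocess

# Query only the memory fields, as plain CSV without the ASCII table.
output = subprocess.check_output([
    'nvidia-smi',
    '--query-gpu=memory.used,memory.total',
    '--format=csv,noheader,nounits',
]).decode()

for index, line in enumerate(output.strip().split('\n')):
    used, total = (int(value) for value in line.split(', '))
    print(f'GPU {index}: {used} MiB / {total} MiB ({used / total:.1%})')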

Related

How to set GPU count to 0 using os.environ['CUDA_VISIBLE_DEVICES'] =""?

So I have the following GPU configured in my system:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.33 Driver Version: 461.33 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100S-PCI... TCC | 00000000:3B:00.0 Off | 0 |
| N/A 30C P0 25W / 250W | 1MiB / 32642MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100S-PCI... TCC | 00000000:D8:00.0 Off | 0 |
| N/A 31C P0 25W / 250W | 1MiB / 32642MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Now, via Python, I have to set the environment such that the GPU count is 0.
I have tried the following, after learning from various sources:
import os
os.environ["CUDA_VISIBLE_DEVICES"]=""
import torch
torch.cuda.device_count()
But, it still gives me the output as "2" as in for 2 GPUs in the system.
How do I set the environment so that it outputs "0"?
Any other way to set the count to "0" is also appreciated, but it should be ML-library agnostic. (For example, I can't use device = torch.device("cpu") as this will work only for PyTorch and not for other libraries.)
To prevent your GPU from being used, set os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
The easiest way to do this is to run python with the correct environment set. For example, on Linux
CUDA_VISIBLE_DEVICES="" python ...
The following should also work:
os.environ["CUDA_VISIBLE_DEVICES"]=""
But this must be done before you first import torch.
What I think is happening in your case is you must be importing torch earlier, perhaps indirectly via some libraries that use torch.
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
should prevent any GPU from being used. From https://sodocumentation.net/tensorflow/topic/10621/tensorflow-gpu-setup#run-tensorflow-on-cpu-only---using-the--cuda-visible-devices--environment-variable-
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"
torch.cuda.device_count() # result is 2
os.environ["CUDA_VISIBLE_DEVICES"]="0"
torch.cuda.device_count() # result is 1, using first GPU
os.environ["CUDA_VISIBLE_DEVICES"]="1"
torch.cuda.device_count() # result is 1, using second GPU
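Putting the points above together, a small sketch of the full flow (the printed values assume CUDA has not been initialized anywhere else first):
import os

# Must run before torch (or any library that initializes CUDA) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

print(torch.cuda.is_available())   # False
print(torch.cuda.device_count())   # 0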

SLURM srun print log instance-wise

While using Slurm on a multi-node cluster,
I ran
srun -N 2 -C worker nvidia-smi
The output of this command is mangled/interleaved instead of in order.
Example output:
Tue Dec 15 22:37:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
Tue Dec 15 22:37:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:16.0 Off | 0 |
| N/A 46C P0 42W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 0 Tesla V100-SXM2... On | 00000000:00:16.0 Off | 0 |
| N/A 40C P0 44W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:17.0 Off | 0 |
| N/A 49C P0 46W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:17.0 Off | 0 |
| N/A 39C P0 43W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Expected output:
Instance 1
Tue Dec 15 22:37:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
+-------------------------------+----------------------+----------------------+
| 0 Tesla V100-SXM2... On | 00000000:00:16.0 Off | 0 |
| N/A 40C P0 44W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:17.0 Off | 0 |
| N/A 39C P0 43W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Instance 2
Tue Dec 15 22:37:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:16.0 Off | 0 |
| N/A 46C P0 42W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:17.0 Off | 0 |
| N/A 49C P0 46W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
You can use the --label option to prefix each output line with the task number, and then use sort to group lines together:
srun --label -N 2 -C worker nvidia-smi | sort -n

Tensorflow MirroredStrategy() looks like it may only be working on one GPU?

I finally got a computer with 2 GPUs, and tested out https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html and https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator and confirmed that both GPUs are being utilized in each (the wattage increases to 160-180 W on both, memory is almost maxed out on both, and GPU-Util increases to about 45% on both at the same time).
So I decided I would try out TensorFlow's MirroredStrategy() on an existing neural net I had trained with one GPU in the past.
What I don't understand is that the wattage increases on both, and the memory is pretty much maxed out on both, but only one GPU looks like it is being utilized, at 98%, while the other one just sits at 3%. Am I messing something up in my code? Or is this working as designed?
import tensorflow

strategy = tensorflow.distribute.MirroredStrategy()
with strategy.scope():
    model = tensorflow.keras.models.Sequential([
        tensorflow.keras.layers.Dense(units=427, kernel_initializer='uniform', activation='relu', input_dim=853),
        tensorflow.keras.layers.Dense(units=427, kernel_initializer='uniform', activation='relu'),
        tensorflow.keras.layers.Dense(units=1, kernel_initializer='uniform', activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=1000, epochs=100)
nvidia-smi:
Fri Nov 22 09:26:21 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp COLLEC... Off | 00000000:0A:00.0 Off | N/A |
| 24% 47C P2 81W / 250W | 11733MiB / 12196MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp COLLEC... Off | 00000000:41:00.0 On | N/A |
| 28% 51C P2 64W / 250W | 11736MiB / 12187MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2506 C python3 11721MiB |
| 1 1312 G /usr/lib/xorg/Xorg 18MiB |
| 1 1353 G /usr/bin/gnome-shell 51MiB |
| 1 1620 G /usr/lib/xorg/Xorg 108MiB |
| 1 1751 G /usr/bin/gnome-shell 72MiB |
| 1 2506 C python3 11473MiB |
+-----------------------------------------------------------------------------+

How to check if keras training is already running in a GPU?

Sometimes I make a mistake and try to run two simultaneous trainings with Keras on the same GPU (two different scripts), crashing my machine or breaking both trainings.
I would like to be able to test in my script whether a training is already running, and then either switch GPUs or stop the new training.
The only hint I found while searching for an answer is to use nvidia-smi to check the processes running on the GPUs.
An example of nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 411.63 Driver Version: 411.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp WDDM | 00000000:03:00.0 Off | N/A |
| 42% 67C P2 81W / 250W | 10114MiB / 12288MiB | 54% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp WDDM | 00000000:04:00.0 Off | N/A |
| 35% 58C P2 144W / 250W | 10315MiB / 12288MiB | 73% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 11660 C ...\conda\envs\tensorflow18-gpu\python.exe N/A |
| 1 1532 C+G Insufficient Permissions N/A |
| 1 5388 C+G C:\Windows\explorer.exe N/A |
| 1 6648 C+G Insufficient Permissions N/A |
| 1 7396 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 1 7688 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 1 9808 C ...\conda\envs\tensorflow18-gpu\python.exe N/A |
| 1 10820 C+G Insufficient Permissions N/A |
| 1 11232 C+G ...x64__8wekyb3d8bbwe\Microsoft.Photos.exe N/A |
+-----------------------------------------------------------------------------+
In this case there is python.exe running in GPU 0 and in GPU 1.
Is there a more direct solution? Thanks
You can try the Python package GPUtil.
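A small sketch of how it might be used to check for a free GPU before starting a training (the load and memory thresholds are assumptions; adjust them to your setup):
import GPUtil

# GPUs with load below 20% and memory usage below 5%, ordered by free memory.
available = GPUtil.getAvailable(order='memory', limit=4, maxLoad=0.2, maxMemory=0.05)

# Report the current state of every GPU for reference.
for gpu in GPUtil.getGPUs():
    print(f'GPU {gpu.id} ({gpu.name}): {gpu.memoryUsed:.0f}/{gpu.memoryTotal:.0f} MiB, load {gpu.load:.0%}')

if not available:
    raise RuntimeError('All GPUs appear to be busy; refusing to start a new training.')
print(f'Using GPU {available[0]}')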

Does Theano actually use 100% of the CPU when it is configured to use the GPU?

I noticed that when I configure Theano to use the GPU, and run some scripts, the CPU is ~100% used:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5927 jjjjjj 20 0 0.259t 0.025t 83228 R 100.2 20.2 33025:42 python
8259 jjjjjj 20 0 0.239t 5.303g 102876 R 100.2 4.2 8209:45 python
7791 jjjjjj 20 0 0.239t 5.086g 102872 R 99.8 4.0 8209:36 python
7761 jjjjjj 20 0 0.239t 5.193g 104604 R 99.5 4.1 7267:47 python
Does this mean that the CPU is the bottleneck? I.e., should I infer that if I replace the CPU with a CPU that has a higher frequency, the script will run faster? Or could it be that the bottleneck is somewhere else and the CPU is actively waiting? If both are a possibility, how do I know which one is the bottleneck?
Here is the output of nvidia-smi:
Tue Sep 27 13:55:13 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63 Driver Version: 352.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:02:00.0 Off | N/A |
| 32% 73C P2 95W / 250W | 207MiB / 12287MiB | 45% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 0000:03:00.0 Off | N/A |
| 32% 72C P2 94W / 250W | 182MiB / 12287MiB | 40% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... Off | 0000:82:00.0 Off | N/A |
| 33% 73C P2 93W / 250W | 207MiB / 12287MiB | 43% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... Off | 0000:83:00.0 Off | N/A |
| 42% 81C P2 148W / 250W | 11872MiB / 12287MiB | 79% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 8259 C python 157MiB |
| 1 7791 C python 157MiB |
| 2 7761 C python 157MiB |
| 3 5927 C python 11847MiB |
+-----------------------------------------------------------------------------+
Theano performs all GPU operations asynchronously, meaning that the function call will not wait for the operation to finish. Synchronization is needed before transferring data between the host and the GPU. By default, synchronization is performed by polling the GPU in a busy loop. Consequently, the operating system will always show 100% CPU usage, even if the program is GPU bound.
Setting the gpuarray.sched=multi flag causes the CPU thread to go to sleep while waiting for the GPU to finish computing. You can use this to check if your program is actually CPU or GPU bound. It takes a little bit more time to wake up, but then your CPU is available for other processes while waiting for the GPU.
Note that ps shows the CPU usage during the entire lifetime of a process. So if your program first does something on CPU before using Theano, the displayed CPU usage will be high. You can get the current value using top -p <PID>.
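If you want to try the flag mentioned above, one way (a sketch assuming the gpuarray backend; the value gpuarray.sched=multi is taken from the answer above) is to set THEANO_FLAGS before Theano is imported:
import os

# Must be set before `import theano`; gpuarray.sched=multi puts the CPU thread to sleep
# while it waits for the GPU, as described above.
os.environ['THEANO_FLAGS'] = 'device=cuda,floatX=float32,gpuarray.sched=multi'

import theano
print(theano.config.device)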
