Meaning of each entry in torch.autograd.profiler.profile - pytorch

Hi all,
I just tried using torch.autograd.profiler.profile.
The output table has several columns:
Name, Self CPU %, Self CPU, CPU total %, CPU total, CPU time avg, Self CUDA, Self CUDA %, CUDA total, CUDA time avg, # of Calls
Where can I find an explanation for these? If an operator involves GPU computation, does the time counted in CPU total also overlap with the time in CUDA total?
Thanks!
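For context, a minimal sketch of the kind of run that produces this table (assuming a CUDA-capable machine and the legacy torch.autograd.profiler API; the operator, tensor sizes, and sort key here are just illustrative):

import torch

x = torch.randn(1024, 1024, device="cuda")

# use_cuda=True records CUDA kernel times alongside the CPU times.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    y = x @ x                      # an operator that launches a GPU kernel
    torch.cuda.synchronize()

# key_averages() groups events by operator name; table() prints the columns
# listed above (Self CPU %, CPU total, CUDA total, CUDA time avg, # of Calls, ...).
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))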

Related

What's the difference between CPU time sum and CPU time avg? (Azure)

What's the difference between CPU time sum and CPU time avg?
Why is the CPU time avg larger than CPU time?
It seems you are asking about Azure's CPU metrics. Please have a look:
CPU Time: the amount of CPU consumed by each app, in seconds; it matters because one of the app quotas is defined in CPU minutes used by the app. It is accumulated per application.
CPU Percentage: a good indication of the overall usage across all instances. Say you have 5 applications; this metric reflects the average usage across all of them.
Why is CPU time avg larger than CPU time?
I think the metrics in your screenshot are fine: the total CPU time is 18.05, which means all of your apps together consume that amount, and each application consumes about 2.10 on average.
See the screenshot.
For details, take a look at the official docs.
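A tiny arithmetic sketch of the relationship I read into the answer above (an assumption on my part, not taken from the Azure docs): if the average metric is total CPU time divided by the number of application instances being averaged over, the two figures are consistent.

cpu_time_total = 18.05            # total CPU seconds from the answer's screenshot
cpu_time_avg = 2.10               # per-app average from the answer
implied_instances = cpu_time_total / cpu_time_avg
print(implied_instances)          # ~8.6 instances being averaged over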

tensorflow - Unexpected behavior of per_process_gpu_memory_fraction

I am trying to limit GPU memory usage to exactly 10% of the GPU's memory, but according to nvidia-smi the program below uses about 13% of the GPU. Is this expected behavior? If it is, where is the other approximately 3-4% coming from?
from time import sleep

import tensorflow as tf

i = tf.constant(0)
x = tf.constant(10)
r = tf.add(i, x)

# Use at most 10% of GPU memory; I expect this to set a hard limit.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.1)

# sleep is used to see what nvidia-smi reports for GPU memory usage.
# I expect at most 10% of GPU memory (1616.0 MiB on my GPU),
# but instead I see the process using up to 2120 MiB.
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(r)
    sleep(10)
See this github issue for more details about my environment and gpu: https://github.com/tensorflow/tensorflow/issues/22158
From my experimentation, it looks like cuDNN and cuBLAS context initialization take around 228 MB of memory. Also, the CUDA context can take from 50 to 118 MB.
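A rough accounting sketch of where the extra memory goes, using the figures above (my own arithmetic; exact context sizes vary by driver and library version):

total_mib = 16160                    # total GPU memory implied by the question (10% = 1616.0 MiB)
allocator_cap = 0.10 * total_mib     # the only thing per_process_gpu_memory_fraction limits
contexts = 228 + 118                 # ~cuDNN/cuBLAS init + CUDA context, upper estimates from above
print(allocator_cap + contexts)      # ~1962 MiB, accounting for most of the ~2120 MiB observed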

Force GPU memory limit in PyTorch

Is there a way to force a maximum value for the amount of GPU memory that I want to be available for a particular PyTorch instance? For example, my GPU may have 12 GB available, but I'd like to assign at most 4 GB to a particular process.
Update (04-MAR-2021): this is now available in the stable 1.8.0 release of PyTorch; see the docs.
Original answer follows.
This feature request has been merged into the PyTorch master branch, but has not yet reached a stable release.
It was introduced as set_per_process_memory_fraction:
Set memory fraction for a process.
The fraction is used to limit the caching allocator's allocated memory on a CUDA device.
The allowed value equals the total visible memory multiplied by the fraction.
Trying to allocate more than the allowed value in a process will raise an out-of-memory error in the allocator.
You can check the tests as usage examples.
Update PyTorch to 1.8.0
(pip install --upgrade torch==1.8.0)
function: torch.cuda.set_per_process_memory_fraction(fraction, device=None)
params:
fraction (float) – Range: 0~1. Allowed memory equals total_memory * fraction.
device (torch.device or int, optional) – selected device. If it is None the default CUDA device is used.
e.g.:
import torch
torch.cuda.set_per_process_memory_fraction(0.5, 0)
torch.cuda.empty_cache()
total_memory = torch.cuda.get_device_properties(0).total_memory
# allocating less than 0.5 of total memory will be OK:
tmp_tensor = torch.empty(int(total_memory * 0.499), dtype=torch.int8, device='cuda')
del tmp_tensor
torch.cuda.empty_cache()
# this allocation will raise an OOM:
torch.empty(total_memory // 2, dtype=torch.int8, device='cuda')
"""
It raises an error as follows:
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; 0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; 0 bytes reserved in total by PyTorch)
"""
In contrast to TensorFlow, which by default grabs all of the GPU's memory, PyTorch only allocates as much as it needs. However, you could:
Reduce the batch size
Use CUDA_VISIBLE_DEVICES=<GPU index> (multiple indices are allowed) to limit which GPUs can be accessed.
To make this run within the program try:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

Meaning of values in CPU tab of resource monitor on windows 8.1

(Sorry for the non-English characters in the picture. The columns are Thread, CPU, and Average CPU.)
When I open the CPU tab in Resource Monitor on Windows 8.1, I see the above values.
What's the difference between CPU and average CPU?
At first, I thought Average CPU meant average usage per core, but I have 4 cores, so the value should then be CPU = 4 * Average CPU, which it is not.
Please let me know the meaning of the CPU and Average CPU values.
CPU. Current percent of CPU consumption by the process, or how much of the system's processing power is being devoted to this specific process.
Average CPU. This is the average CPU consumption by the process over the past 60 seconds. Together these give you a real-time look at what's happening on the system right now and over the past minute.
http://www.techrepublic.com/blog/the-enterprise-cloud/use-resource-monitor-to-monitor-cpu-performance/

Calculating CPU usage % in Linux

When using mpstat to calculate average CPU usage, is the CPU utilization percentage computed as 100 - idle%? Is this correct, or do I have to compute the utilization some other way? Thanks!
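A minimal sketch of that calculation (my own example, not an authoritative answer): take two snapshots of /proc/stat and report 100 - idle%, which is roughly what mpstat does per sampling interval.

import time

def cpu_times():
    # First line of /proc/stat: aggregate jiffies for user, nice, system, idle, iowait, ...
    with open("/proc/stat") as f:
        values = list(map(int, f.readline().split()[1:]))
    idle = values[3] + values[4]      # treat idle + iowait as "idle" (a choice, not a rule)
    return idle, sum(values)

idle0, total0 = cpu_times()
time.sleep(1)
idle1, total1 = cpu_times()

idle_pct = 100.0 * (idle1 - idle0) / (total1 - total0)
print(f"CPU utilization ~ {100.0 - idle_pct:.1f}%")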
