How do I get the total CPU usage of an application from /proc/pid/stat? - linux

I was wondering how to calculate the total CPU usage of a process.
If I do cat /proc/pid/stat, I think the relevant fields are (taken from
CPU time spent in user code, measured in jiffies
CPU time spent in kernel code, measured in jiffies
CPU time spent in user code, including time from children
CPU time spent in kernel code, including time from children
So is the total time spent the sum of fields 14 to 17?

To calculate CPU usage for a specific process you'll need the following:
#1 uptime - uptime of the system in seconds (first field of /proc/uptime)
#14 utime - CPU time spent in user code, measured in clock ticks
#15 stime - CPU time spent in kernel code, measured in clock ticks
#16 cutime - Waited-for children's CPU time spent in user code (in clock ticks)
#17 cstime - Waited-for children's CPU time spent in kernel code (in clock ticks)
#22 starttime - Time when the process started, measured in clock ticks
Hertz (number of clock ticks per second) of your system.
In most cases, getconf CLK_TCK can be used to return the number of clock ticks.
The sysconf(_SC_CLK_TCK) C function call may also be used to return the hertz value.
First we determine the total time spent for the process:
total_time = utime + stime
We also have to decide whether we want to include the time from children processes. If we do, then we add those values to total_time:
total_time = total_time + cutime + cstime
Next we get the total elapsed time in seconds since the process started:
seconds = uptime - (starttime / Hertz)
Finally we calculate the CPU usage percentage:
cpu_usage = 100 * ((total_time / Hertz) / seconds)
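Yes, you can say so. You can convert those values into seconds using the formula:
sec = jiffies / HZ ; here HZ = number of ticks per second
The kernel's own HZ value is configurable (it is chosen at kernel configuration time). However, /proc/[pid]/stat reports these fields in USER_HZ clock ticks, so the HZ to divide by is the value returned by sysconf(_SC_CLK_TCK) or getconf CLK_TCK.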

Here is my simple solution written in Bash. It is a Linux/Unix system monitor and process manager built on procfs, like "top" or "ps". There are two versions: a simple monochrome one (fast) and a colored one (a little slower, but useful especially for monitoring the state of processes). I made it sort by CPU usage.
utime, stime, cutime, cstime and starttime are used for getting CPU usage and are obtained from the /proc/[pid]/stat file.
state, ppid, priority, nice and num_threads are also obtained from the /proc/[pid]/stat file.
resident and data_and_stack are used for getting memory usage and are obtained from the /proc/[pid]/statm file.
function my_ps
{
    pid_array=$(ls /proc | grep -E '^[0-9]+$')
    clock_ticks=$(getconf CLK_TCK)
    total_memory=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)   # in kB
    page_kb=$(( $(getconf PAGESIZE) / 1024 ))                      # /proc/[pid]/statm counts pages
    printf "\e[30;107m%-6s %-6s %-10s %-4s %-3s %-6s %-4s %-7s %-7s %-18s\e[0m\n" "PID" "PPID" "USER" "PR" "NI" "STATE" "THR" "%MEM" "%CPU" "COMMAND"
    for pid in $pid_array
    do
        if [ -r /proc/$pid/stat ]
        then
            stat=$(< /proc/$pid/stat) || continue
            # drop "pid (comm)" first: a comm containing spaces would shift the fields
            rest=${stat##*)}
            stat_array=( $rest )   # index 0 is field 3 (state), index n-3 is field n
            state=${stat_array[0]}; ppid=${stat_array[1]}
            utime=${stat_array[11]}; stime=${stat_array[12]}
            cutime=${stat_array[13]}; cstime=${stat_array[14]}
            priority=${stat_array[15]}; nice=${stat_array[16]}
            num_threads=${stat_array[17]}; starttime=${stat_array[19]}
            uptime_array=( $(< /proc/uptime) ); uptime=${uptime_array[0]}
            statm_array=( $(< /proc/$pid/statm) )
            resident=${statm_array[1]}; data_and_stack=${statm_array[5]}
            comm=( `grep -Po '^[^\s\/]+' /proc/$pid/comm` )
            user_id=$( grep -Po '(?<=Uid:\s)(\d+)' /proc/$pid/status )
            user=$( id -nu $user_id )
            total_time=$(( utime + stime ))
            # add $cstime - children's CPU time in kernel code (can also add $cutime - children's CPU time in user code)
            total_time=$(( total_time + cstime ))
            seconds=$( awk 'BEGIN {print ( '$uptime' - ('$starttime' / '$clock_ticks') )}' )
            cpu_usage=$( awk 'BEGIN {print ( 100 * (('$total_time' / '$clock_ticks') / '$seconds') )}' )
            memory_usage=$( awk 'BEGIN {print( (('$resident' + '$data_and_stack') * '$page_kb' * 100) / '$total_memory' )}' )
            printf "%-6d %-6d %-10s %-4d %-5d %-4s %-4u %-7.2f %-7.2f %-18s\n" $pid $ppid $user $priority $nice $state $num_threads $memory_usage $cpu_usage $comm
        fi
    done | sort -nr -k9 | head -$1
}

If I need to calculate how much CPU% a process used in the last 10 seconds:
total_time (utime + stime, fields 14 + 15) in jiffies => t1
starttime (22) in jiffies => s1
--delay of 10 secs
total_time (utime + stime, fields 14 + 15) in jiffies => t2
starttime (22) in jiffies => s2
(t2 - t1) * 100 / (s2 - s1)
Wouldn't that give the %?

Here is another way that I got my app's CPU usage. I did this on Android: it runs the top command and takes the CPU usage for your app's PID from what top returns.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
public void myWonderfulApp()
{
    // Some wonderfully written code here
    Integer lMyProcessID = android.os.Process.myPid();
    int lMyCPUUsage = getAppCPUUsage( lMyProcessID );
    // More magic
}
// Alternate way that I switched to. I found the first version was slower;
// this version only reads a single line, so there is far less parsing
// and processing.
// NOTE: the /proc/stat counters are cumulative since boot, so a single
// reading gives the average system-wide CPU usage since boot.
public static float getTotalCPUUsage2()
{
    try
    {
        // read global stats file for total CPU
        BufferedReader reader = new BufferedReader(new FileReader("/proc/stat"));
        String[] sa = reader.readLine().split("[ ]+", 9);
        reader.close();
        long work = Long.parseLong(sa[1]) + Long.parseLong(sa[2]) + Long.parseLong(sa[3]);
        long total = work + Long.parseLong(sa[4]) + Long.parseLong(sa[5]) + Long.parseLong(sa[6]) + Long.parseLong(sa[7]);
        // calculate and convert to percentage
        return restrictPercentage(work * 100 / (float) total);
    }
    catch (Exception ex)
    {
        Logger.e(Constants.TAG, "Unable to get Total CPU usage");
    }
    // if there was an issue, just return 0
    return 0;
}
// This is an alternate way, but it takes the entire output of
// top, so there is a fair bit of parsing.
public static int getAppCPUUsage( Integer aAppPID )
{
    int lReturn = 0;
    // make sure a valid pid was passed
    if ( null == aAppPID || aAppPID <= 0 )
        return lReturn;
    try
    {
        // Make a call to top so we have all the processes CPU
        Process lTopProcess = Runtime.getRuntime().exec("top");
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(lTopProcess.getInputStream()));
        String lLine;
        // While we have stuff to read and we have not found our PID, process the lines
        while ( (lLine = bufferedReader.readLine()) != null )
        {
            // Split on 4, the CPU % is the 3rd field.
            // NOTE: We trim because sometimes we had the first field in the split be a "".
            String[] lSplit = lLine.trim().split("[ ]+", 4);
            // Don't even bother if we don't have at least the 4
            if ( lSplit.length > 3 )
            {
                // Make sure we can handle if we can't parse the int
                try
                {
                    // On the line that is our process, field 0 is a PID
                    Integer lCurrentPID = Integer.parseInt(lSplit[0]);
                    // Did we find our process?
                    if (aAppPID.equals(lCurrentPID))
                    {
                        // This is us, strip off the % and return it
                        String lCPU = lSplit[2].replace("%", "");
                        lReturn = Integer.parseInt(lCPU);
                        break; // top keeps refreshing, so stop once we have our line
                    }
                }
                catch( NumberFormatException e )
                {
                    // No op. We expect this when it's not a PID line
                }
            }
        }
        lTopProcess.destroy(); // Cleanup the process, otherwise you make a nice hand warmer out of your device
    }
    catch( IOException ex )
    {
        // Log bad stuff happened
    }
    catch (Exception ex)
    {
        // Log bad stuff happened
    }
    // if there was an issue, just return 0
    return lReturn;
}

Here's what you're looking for:
//USER_HZ detection, from openssl code
#include <unistd.h>   /* sysconf(), _SC_CLK_TCK */
#ifndef HZ
# if defined(_SC_CLK_TCK) \
     && (!defined(OPENSSL_SYS_VMS) || __CTRL_VER >= 70000000)
#  define HZ ((double)sysconf(_SC_CLK_TCK))
# else
#  ifndef CLK_TCK
#   ifndef _BSD_CLK_TCK_ /* FreeBSD hack */
#    define HZ 100.0
#   else /* _BSD_CLK_TCK_ */
#    define HZ ((double)_BSD_CLK_TCK_)
#   endif
#  else /* CLK_TCK */
#   define HZ ((double)CLK_TCK)
#  endif
# endif
#endif /* HZ */
This code is actually from cpulimit, but uses openssl snippets.


capturing pid that is changing frequently

I want to know the CPU utilization of a process and all the child processes, for a fixed period of time, in Linux.
To be more specific, here is my use-case:
There is a process which waits for requests from the user to execute programs. To execute the programs, this process invokes child processes (a maximum of 5 at a time), and each of these child processes executes one of the submitted programs (let's say the user submitted 15 programs at once). So, if the user submits 15 programs, 3 batches of 5 child processes each will run. Child processes are killed as soon as they finish executing their program.
I want to know the % CPU utilization of the parent process and all its child processes during the execution of those 15 programs.
Is there any simple way to do this using top or another command? (Or any tool I should attach to the parent process?)
You can find this information in /proc/PID/stat, where PID is your parent process's process ID. Assuming that the parent process waits for its children, the total CPU usage can be calculated from utime, stime, cutime and cstime:
utime %lu
Amount of time that this process has been scheduled in user mode,
measured in clock ticks (divide by sysconf(_SC_CLK_TCK). This includes
guest time, guest_time (time spent running a virtual CPU, see below),
so that applications that are not aware of the guest time field do not
lose that time from their calculations.
stime %lu
Amount of time that this process has been scheduled in kernel mode,
measured in clock ticks (divide by sysconf(_SC_CLK_TCK).
cutime %ld
Amount of time that this process's waited-for children have been
scheduled in user mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK). (See also times(2).) This includes guest time,
cguest_time (time spent running a virtual CPU, see below).
cstime %ld
Amount of time that this process's waited-for children have been
scheduled in kernel mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)).
See the proc(5) man page for details.
And of course you can do it the hardcore way, using good old C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define MAX_CHILDREN 100
/*
 * System command execution output
 * @param <char> command - system command to execute
 * @return <char> execution output (first line only)
 */
char *system_output (const char *command)
{
    FILE *pipe;
    static char out[1000];
    out[0] = '\0';
    pipe = popen (command, "r");
    if (pipe == NULL)
        return out;
    if (fgets (out, sizeof(out), pipe) == NULL)
        out[0] = '\0';
    pclose (pipe);
    return out;
}
/*
 * Finding all process's children
 * @param <int> - process ID
 * @param <int> - array of children (slot 0 is left for the parent)
 */
void find_children (int pid, int children[])
{
    char command[64];
    // build "/bin/ps h -o pid --ppid <pid>"; the buffer fits any PID
    snprintf(command, sizeof(command), "/bin/ps h -o pid --ppid %d", pid);
    FILE *fp = popen(command, "r");
    if (fp == NULL)
        return;
    int child_pid, i = 1;
    // stop at MAX_CHILDREN or when the ps output no longer parses as a number
    while (i < MAX_CHILDREN && fscanf(fp, "%d", &child_pid) == 1)
        children[i++] = child_pid;
    pclose(fp);
}
/*
 * Parsing `ps` command output
 * @param <char> out - ps command output
 * @return <float> cpu utilization
 */
float parse_cpu_utilization (const char *out)
{
    float cpu = 0.0f;
    sscanf (out, "%f", &cpu);
    return cpu;
}
int main(void)
{
    unsigned pid = 1;
    // getting array with process children
    int process_children[MAX_CHILDREN] = { 0 };
    process_children[0] = pid; // parent PID as first element
    find_children(pid, process_children);
    // calculating summary processor utilization
    unsigned i;
    float common_cpu_usage = 0.0;
    char command[100];
    for (i = 0; i < sizeof(process_children)/sizeof(int); ++i)
    {
        if (process_children[i] > 0)
        {
            snprintf (command, sizeof(command), "/bin/ps -p %i -o 'pcpu' --no-headers", process_children[i]);
            common_cpu_usage += parse_cpu_utilization(system_output(command));
        }
    }
    printf("%f\n", common_cpu_usage);
    return 0;
}
gcc -Wall -pedantic --std=gnu99 find_cpu.c
It might not be the exact command, but you can do something like the following to get the CPU usage of several processes and add it up:
ps -C sendmail,firefox -o pcpu= | awk '{s+=$1} END {print s}'
/proc/[pid]/stat Status information about the process. This is used by ps and made into human readable form.
Another way is to use cgroups with the cpuacct controller.
Here's a one-liner to compute the total CPU for all processes. You can adjust it by filtering on other columns of the top output:
top -b -d 5 -n 2 | awk '$1 == "PID" {block_num++; next} block_num == 2 {sum += $9;} END {print sum}'

Linux clock_gettime() elapse spikes?

I'm trying to get a high-resolution timestamp on Linux. Using clock_gettime(), as below, I got "spike" elapses that look pretty horrible, up to almost 26 microseconds. Most of the "dt"s are around 30 ns. I was on Linux 2.6.32, Red Hat 4.4.6. lscpu shows CPU MHz=2666.121, which I thought means each clock tick takes well under a nanosecond (1 / 2.666 GHz is about 0.375 ns). So asking for ns resolution didn't seem too unreasonable here.
Output of the program (each line: previous and current timestamp as seconds,nanoseconds, then dt in ns):
1397534268,40823395 1397534268,40827950,dt=4555
1397534268,41233555 1397534268,41236716,dt=3161
1397534268,41389902 1397534268,41392922,dt=3020
1397534268,46488430 1397534268,46491674,dt=3244
1397534268,46531297 1397534268,46534279,dt=2982
1397534268,46823368 1397534268,46849336,dt=25968
1397534268,46915657 1397534268,46918663,dt=3006
1397534268,51488643 1397534268,51491791,dt=3148
1397534268,51530490 1397534268,51533496,dt=3006
1397534268,51823307 1397534268,51826904,dt=3597
1397534268,55823359 1397534268,55827826,dt=4467
1397534268,60531184 1397534268,60534183,dt=2999
1397534268,60823381 1397534268,60844866,dt=21485
1397534268,60913003 1397534268,60915998,dt=2995
1397534268,65823269 1397534268,65827742,dt=4473
1397534268,70823376 1397534268,70835280,dt=11904
1397534268,75823489 1397534268,75828872,dt=5383
1397534268,80823503 1397534268,80859500,dt=35997
1397534268,86823381 1397534268,86831907,dt=8526
Any ideas? thanks
#include <vector>
#include <iostream>
#include <time.h>
// Difference t2 - t1 in nanoseconds
long long elapse( const timespec& t1, const timespec& t2 )
{
    return ( t2.tv_sec * 1000000000LL + t2.tv_nsec ) -
           ( t1.tv_sec * 1000000000LL + t1.tv_nsec );
}
int main()
{
    const unsigned n=30000;
    timespec ts;
    std::vector<timespec> t( n );
    for( unsigned i=0; i < n; ++i )
    {
        clock_gettime( CLOCK_REALTIME, &ts );
        t[i] = ts;
    }
    std::vector<long long> dt( n );
    for( unsigned i=1; i < n; ++i )
    {
        dt[i] = elapse( t[i-1], t[i] );
        if( dt[i] > 1000 )
        {
            std::cerr <<
                t[i-1].tv_sec << ","
                << t[i-1].tv_nsec << " "
                << t[i].tv_sec << ","
                << t[i].tv_nsec
                << ",dt=" << dt[i] << std::endl;
        }
    }
    //normally I get dt[i] = approx 30-35 nano secs
    return 0;
}
The numbers you quoted are in the 3 to 30 microsecond range (3,000 to 30,000 nanoseconds). That is too short a time to be a context switch to another thread/process, let the other thread run, and context switch back to your thread. Most likely the core where your process was running was used by the kernel to service an external interrupt (e.g. network card, disk, timer), then returned to running your process.
You can watch the linux interrupt counters (per CPU core and per source) with this command
watch -d -n 0.2 cat /proc/interrupts
The -n 0.2 will cause the command to be issued at 5Hz, the -d flag will highlight what has changed.
The source of the interrupt could also be a TLB shootdown, which results in an IPI (Inter-Processor Interrupt). You can read more about TLB shootdowns here.
If you want to reduce the number of interrupts serviced by the core running your thread/process, you need to set the interrupt affinity. You can learn more about Red Hat Interrupts and IRQ (Interrupt requests) tuning here, and here.
Worth noting is that you are using CLOCK_REALTIME, which isn't guaranteed to be "smooth": it could jump around as the system clock is "disciplined" to keep accurate time by a service like NTP (Network Time Protocol) or PTP (Precision Time Protocol). For your purposes it is better to use CLOCK_MONOTONIC; you can read more about the difference here. When a clock is "disciplined" it can jump by a "step", but this is unusual and certainly not the cause of the many spikes you see.
Could you check the resolution with clock_getres()?
I suspect what you are measuring here is called "OS Noise". This is often caused by your program getting pre-empted by the operating system. The operating system then performs other work. There are numerous causes, but commonly it is: other runnable tasks, hardware interrupts, or timer events.
The FTQ/FWQ benchmarks were designed to measure this characteristic and the summary contains some further information:

Accurately Calculating CPU Utilization in Linux using /proc/stat

There are a number of posts and references on how to get CPU utilization using statistics in /proc/stat. However, most of them use only four of the 7+ CPU stats (user, nice, system, and idle), ignoring the remaining jiffy CPU counts present in Linux 2.6 (iowait, irq, softirq).
As an example, see Determining CPU utilization.
My question is this: are the iowait/irq/softirq numbers also counted in one of the first four numbers (user/nice/system/idle)? In other words, does the total jiffy count equal the sum of the first four stats? Or is the total jiffy count equal to the sum of all 7 stats? If the latter is true, then a CPU utilization formula should take all of the numbers into account, like this:
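#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
    long double a[7],b[7],loadavg;
    FILE *fp;
    // first sample of the aggregate "cpu" line: all 7 counters
    fp = fopen("/proc/stat","r");
    fscanf(fp,"%*s %Lf %Lf %Lf %Lf %Lf %Lf %Lf",&a[0],&a[1],&a[2],&a[3],&a[4],&a[5],&a[6]);
    fclose(fp);
    sleep(1); // let the counters advance, otherwise the denominator is 0
    fp = fopen("/proc/stat","r");
    fscanf(fp,"%*s %Lf %Lf %Lf %Lf %Lf %Lf %Lf",&b[0],&b[1],&b[2],&b[3],&b[4],&b[5],&b[6]);
    fclose(fp);
    // busy = every counter except idle (index 3), divided by the sum of all 7
    loadavg = ((b[0]+b[1]+b[2]+b[4]+b[5]+b[6]) - (a[0]+a[1]+a[2]+a[4]+a[5]+a[6]))
        / ((b[0]+b[1]+b[2]+b[3]+b[4]+b[5]+b[6]) - (a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]));
    printf("The current CPU utilization is : %Lf\n",loadavg);
    return 0;
}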
I think iowait/irq/softirq are not counted in one of the first 4 numbers. You can see the comment of irqtime_account_process_tick in kernel code for more detail:
(for Linux kernel 4.1.1)
/*
 * Tick demultiplexing follows the order
 * - pending hardirq update    <-- this is irq
 * - pending softirq update    <-- this is softirq
 * - user_time
 * - idle_time                 <-- iowait is included in here, discussed below
 * - system time
 *   - check for guest_time
 *   - else account as system_time
 */
For the idle time handling, see the account_idle_time function:
/*
 * Account for idle time.
 * @cputime: the cpu time spent in idle wait
 */
void account_idle_time(cputime_t cputime)
{
    u64 *cpustat = kcpustat_this_cpu->cpustat;
    struct rq *rq = this_rq();
    if (atomic_read(&rq->nr_iowait) > 0)
        cpustat[CPUTIME_IOWAIT] += (__force u64) cputime;
    else
        cpustat[CPUTIME_IDLE] += (__force u64) cputime;
}
If the CPU is idle AND there is some I/O pending, the time is counted in CPUTIME_IOWAIT; otherwise it is counted in CPUTIME_IDLE.
To conclude, I think the jiffies in irq/softirq should be counted as "busy" for the CPU, because it was actually handling an IRQ or soft IRQ. On the other hand, the jiffies in "iowait" should be counted as "idle", because the CPU was not doing anything, only waiting for pending I/O to complete.
From busybox, its top magic is:
static const char fmt[] ALIGN1 = "cp%*s %llu %llu %llu %llu %llu %llu %llu %llu";
int ret;
if (!fgets(line_buf, LINE_BUF_SIZE, fp) || line_buf[0] != 'c' /* not "cpu" */)
    return 0;
ret = sscanf(line_buf, fmt,
        &p_jif->usr, &p_jif->nic, &p_jif->sys, &p_jif->idle,
        &p_jif->iowait, &p_jif->irq, &p_jif->softirq,
        &p_jif->steal);
if (ret >= 4) {
    p_jif->total = p_jif->usr + p_jif->nic + p_jif->sys + p_jif->idle
        + p_jif->iowait + p_jif->irq + p_jif->softirq + p_jif->steal;
    /* procps 2.x does not count iowait as busy time */
    p_jif->busy = p_jif->total - p_jif->idle - p_jif->iowait;
}

CPU contention (wait time) for a process in Linux

How can I check how long a process spends waiting for the CPU in a Linux box?
For example, in a loaded system I want to check how long a SQL*Loader (sqlldr) process waits.
It would be useful if there is a command line tool to do this.
I've quickly slapped this together. It prints out the smallest and largest "interferences" from task switching...
#include <sys/time.h>
#include <stdio.h>
double seconds()
{
    timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec / 1000000.0;
}
int main()
{
    double min = 999999999, max = 0;
    while (true)
    {
        // two calls in a defined order (operand order in a - b is unspecified)
        double t1 = seconds();
        double t2 = seconds();
        double c = t2 - t1;
        if (c < min)
        {
            min = c;
            printf("%f\n", c);
        }
        if (c > max)
        {
            max = c;
            printf("%f\n", c);
        }
    }
    return 0;
}
Here's how you should go about measuring it. Have a number of processes (more than the number of processors * cores * hardware threads) wait (block) on an event that will wake them all up at the same time. One such event is a multicast network packet. Use an instrumentation library like PAPI (or one more suited to your needs) to measure the differences in real and virtual "wakeup" time between your processes. From several iterations of the experiment you can get an estimate of the CPU contention time for your processes. Obviously, it's not going to be very accurate on multicore processors, but maybe it'll help you.
I had this problem some time back. I ended up using getrusage:
You can get detailed help at:
getrusage populates the rusage struct.
Measuring Wait Time with getrusage
You can call getrusage at the beginning of your code and then again at the end, or at some appropriate point during execution. You then have initial_rusage and final_rusage. The user time spent by your process is given by ru_utime and the system time by ru_stime. Both are struct timeval, with whole seconds in tv_sec (e.g. rusage.ru_utime.tv_sec) and the remaining microseconds in tv_usec.
Thus the total user-time spent by the process will be:
user_time = final_rusage.ru_utime.tv_sec - initial_rusage.ru_utime.tv_sec
The total system-time spent by the process will be:
system_time = final_rusage.ru_stime.tv_sec - initial_rusage.ru_stime.tv_sec
If total_time is the time elapsed between the two calls of getrusage then the wait time will be
wait_time = total_time - (user_time + system_time)
Hope this helps
