I am trying to figure out how the stats in the taskstats struct add up. I wrote a simple C program that runs for some time doing I/O and then exits. I monitor the stats of this program using the taskstats struct, which I get from the taskstats netlink multicast group. When I sum the values of cpu_delay_total, blkio_delay_total, swapin_delay_total, freepages_delay_total, ac_utime and ac_stime, I get a value that is about 0.5 seconds larger than the elapsed time (ac_etime).
Here are the statistics for a 3.5-second run:
ac_etime: 3536036
ac_utime: 172000
ac_stime: 3032000
cpu_delay_total: 792528445
blkio_delay_total: 46320128
swapin_delay_total: 0
freepages_delay_total: 0
Summing the delays, utime and stime yields 4042848.573 microseconds (the delay fields are in nanoseconds, so divide them by 1000 to convert to microseconds), while etime is only 3536036!
Interestingly, the field documented as wall clock time is practically equal to utime + stime: cpu_run_real_total is 3204000129 (nanoseconds), while ac_utime + ac_stime is 3204000 (microseconds).
Does the cpu_run_real_total field give the CPU time, even though the comment in taskstats.h clearly states that it is a wall clock time? And what could be the reason that the sum of these fields is larger than the elapsed time?
My kernel version is 3.2.0-38.
(1) cpu_run_real_total = ac_utime + ac_stime. I checked the code in ./kernel/delayacct.c, function __delayacct_add_tsk():
tmp = (s64)d->cpu_run_real_total;
cputime_to_timespec(tsk->utime + tsk->stime, &ts);
tmp += timespec_to_ns(&ts);
d->cpu_run_real_total = (tmp < (s64)d->cpu_run_real_total) ? 0 : tmp;
From the code above, we can see that cpu_run_real_total is the sum of utime and stime.
(2) Why is the sum of cpu_delay_total, blkio_delay_total, swapin_delay_total, freepages_delay_total, ac_utime and ac_stime larger than ac_etime?
I have not figured out why, but I have a guess: stime may overlap to some extent with the various *_delay_total counters.
I'm implementing TCP-like RTT estimation in a custom protocol. When I look in the function
static void tcp_rtt_estimator(struct sock *sk, long mrtt){
long m = mrtt; /* RTT */
For the first iteration, when no previous RTT estimates have been done, the code snippet is
srtt = m << 3; /* take the measured time to be rtt */
Why isn't the value of m taken directly for srtt? As I understand it, the parameter mrtt_us is just the current round-trip time measurement in jiffies.
Is this assumption about mrtt_us incorrect? If so, what value should I pass to this function?
P.S. I have the measured RTT in jiffies, which I'm currently passing to this function. This looks incorrect, because the first srtt value becomes something other than the measured RTT as a result of srtt = m << 3.
I figured this out from one of the mail threads on LKML at https://lkml.org/lkml/1998/9/12/41.
It mentions that the stored SRTT is actually 8 times the real SRTT. I think this is done to provide higher precision in the calculations: the stored value is effectively a fixed-point number with 3 fractional bits.
So, to answer the question: the measured RTT should be passed to this function in jiffies (kernel version 3.13).
I am trying to understand the Linux kernel code for the task scheduler.
I am not able to figure out what the goodness value is and how it differs from niceness.
Also, how does each of them contribute to scheduling?
AFAIK: Each process in the run queue is assigned a "goodness" value which determines how good a process is for running. This value is calculated by the goodness() function.
The process with the highest goodness will be the next process to run. If no process is available for running, the operating system selects a special idle task.
A first approximation of "goodness" is calculated from the number of ticks left in the process's quantum. This means that the goodness of a process decreases with time, until it becomes zero when the time quantum of the task expires.
The final goodness is calculated as a function of the niceness of the process. So the goodness of a process is basically a combination of the time slice it has left and its nice value.
static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
{
        int weight;

        /*
         * select the current process after every other
         * runnable process, but before the idle thread.
         * Also, dont trigger a counter recalculation.
         */
        weight = -1;
        if (p->policy & SCHED_YIELD)
                goto out;

        if (p->policy == SCHED_OTHER) {
                /*
                 * Give the process a first-approximation goodness value
                 * according to the number of clock-ticks it has left.
                 *
                 * Don't do any other calculations if the time slice is
                 * over..
                 */
                weight = p->counter;
                if (!weight)
                        goto out;
                ...
                weight += 20 - p->nice;
                goto out;
        }

        /* code for real-time processes goes here */

out:
        return weight;
}
To understand and remember the nice value, keep its etymology in mind:
a "nicer" process leaves more time for other processes to run, hence a lower nice value translates to a higher priority.
So:
Higher goodness value -> more likely to run
Higher nice value -> less likely to run
The nice value enters the goodness calculation as
20 - p->nice
The nice value ranges from -20 (highest priority) to 19 (lowest priority).
Take a nice value of -20 (extremely not nice): 20 - (-20) = 40.
This raises the goodness value by 40, so the process is more likely to be chosen.
I was reading about calculating the cpu usage of a process.
seconds = utime / Hertz
total_time = utime + stime
IF include_dead_children
total_time = total_time + cutime + cstime
ENDIF
seconds = uptime - starttime / Hertz
pcpu = (total_time * 1000 / Hertz) / seconds
print: "%CPU" pcpu / 10 "." pcpu % 10
What I don't get is this: as I read it, 'seconds' is the time the computer spent doing things other than the process of interest, and before it, since uptime is the time the computer has been operational and starttime is the time the process of interest started.
So why do we divide total_time by seconds (the time the computer spent doing something else) to get pcpu? It doesn't make sense.
The standard meanings of the variables:
#   Name       Description
14  utime      CPU time spent in user code, measured in jiffies
15  stime      CPU time spent in kernel code, measured in jiffies
16  cutime     CPU time spent in user code, including time from children
17  cstime     CPU time spent in kernel code, including time from children
22  starttime  Time when the process started, measured in jiffies
/proc/uptime: The uptime of the system (seconds), and the amount of time spent in the idle process (seconds).
Hertz: Number of clock ticks per second
Now that you've provided what each of the variables represents, here are some comments on the pseudo-code:
seconds = utime / Hertz
The above line is pointless, as the new value of seconds is never used before it's overwritten a few lines later.
total_time = utime + stime
Total running time (user + system) of the process, in jiffies, since both utime and stime are.
IF include_dead_children
total_time = total_time + cutime + cstime
ENDIF
Going by the definitions given, this should probably just say total_time = cutime + cstime, since they indicate that, e.g., cutime already includes utime plus the time spent by children in user mode. As written, that would overstate the value by including the contribution from this process twice. Or the definition is wrong: proc(5) describes cutime and cstime as the time of waited-for children only, in which case the pseudo-code is correct as written. Regardless, total_time is still in jiffies.
seconds = uptime - starttime / Hertz
uptime is already in seconds; starttime / Hertz converts starttime from jiffies to seconds, so seconds becomes essentially "the time in seconds since this process was started".
pcpu = (total_time * 1000 / Hertz) / seconds
total_time is still in jiffies, so total_time / Hertz converts it to seconds, which is the number of CPU seconds consumed by the process. Dividing that by seconds would give the fraction of CPU used since process start if it were a floating point operation. Since it isn't, the value is scaled by 1000 so that the integer result is the percentage with a resolution of 1/10%. The parentheses force the scaling to be done early, before the divisions, to preserve accuracy.
print: "%CPU" pcpu / 10 "." pcpu % 10
And this undoes the scaling, by finding the quotient and the remainder when dividing pcpu by 10, and printing those values in a format that looks like a floating point value.
How can I calculate the amount of processing time used by a process in C on Linux? Specifically, I want to determine how much time elapses when encrypting a file using openssl.
The easiest way for you to do this is by using the clock() function from <time.h> to report the amount of CPU time used by the calling process.
From SUSv4:
The clock() function shall return the implementation's best
approximation to the processor time used by the process since the
beginning of an implementation-defined era related only to the process
invocation.
RETURN VALUE
To determine the time in seconds, the value returned by clock() should
be divided by the value of the macro CLOCKS_PER_SEC. If the processor
time used is not available or its value cannot be represented,
the function shall return the value (clock_t)-1.
Try the following:
#include <time.h>
clock_t start, end;   /* clock() returns clock_t, not time_t */
double cpu_time_used;
start = clock();
/* Do encrypting ... */
end = clock();
cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
How can I check how long a process spends waiting for the CPU in a Linux box?
For example, in a loaded system I want to check how long a SQL*Loader (sqlldr) process waits.
It would be useful if there is a command line tool to do this.
I've quickly slapped this together. It prints out the smallest and largest "interferences" from task switching...
#include <sys/time.h>
#include <stdio.h>

/* Current wall-clock time in seconds. */
double seconds(void)
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec / 1000000.0;
}

int main(void)
{
    double min = 999999999, max = 0;
    for (;;)
    {
        /* Two back-to-back readings, taken in a defined order;
           a large gap suggests the task was switched out in between. */
        double t0 = seconds();
        double c = seconds() - t0;
        if (c < min)
        {
            min = c;
            printf("%f\n", c);
            fflush(stdout);
        }
        if (c > max)
        {
            max = c;
            printf("%f\n", c);
            fflush(stdout);
        }
    }
    return 0;
}
Here's how you should go about measuring it. Have a number of processes, greater than the number of your processors * cores * hardware threads per core, wait (block) on an event that will wake them all up at the same time. One such event is a multicast network packet. Use an instrumentation library like PAPI (or one more suited to your needs) to measure the differences in real and virtual "wakeup" time between your processes. From several iterations of the experiment you can get an estimate of the CPU contention time for your processes. Obviously, it's not going to be at all accurate for multicore processors, but maybe it'll help you.
Cheers.
I had this problem some time back. I ended up using getrusage.
You can get detailed help at:
http://www.opengroup.org/onlinepubs/009695399/functions/getrusage.html
getrusage populates the rusage struct.
Measuring Wait Time with getrusage
You can call getrusage at the beginning of your code and call it again at the end, or at some appropriate point during execution. You then have initial_rusage and final_rusage. The user time spent by your process is given by ru_utime and the system time by ru_stime. Both are struct timeval values, so tv_sec alone only gives whole seconds; add tv_usec / 1000000.0 for sub-second resolution.
Thus the total user time spent by the process (in whole seconds) will be:
user_time = final_rusage.ru_utime.tv_sec - initial_rusage.ru_utime.tv_sec
The total system time spent by the process (in whole seconds) will be:
system_time = final_rusage.ru_stime.tv_sec - initial_rusage.ru_stime.tv_sec
If total_time is the wall-clock time elapsed between the two calls of getrusage, then the wait time will be the following. Note that it covers all time spent off the CPU, including blocking on I/O or sleeping, not only time waiting for a CPU.
wait_time = total_time - (user_time + system_time)
Hope this helps