CPU contention (wait time) for a process in Linux

How can I check how long a process spends waiting for the CPU in a Linux box?
For example, in a loaded system I want to check how long a SQL*Loader (sqlldr) process waits.
It would be useful if there is a command line tool to do this.

I've quickly slapped this together. It prints out the smallest and largest "interferences" from task switching...
#include <sys/time.h>
#include <stdio.h>

/* Wall-clock time in seconds, with microsecond resolution. */
double seconds()
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec / 1000000.0;
}

int main()
{
    double min = 999999999, max = 0;
    while (1)
    {
        /* Time two back-to-back calls; any gap beyond the bare call overhead
           is "interference" - the process was scheduled out in between. */
        double a = seconds();
        double b = seconds();
        double c = b - a;
        if (c < min)
        {
            min = c;
            printf("%f\n", c);
            fflush(stdout);
        }
        if (c > max)
        {
            max = c;
            printf("%f\n", c);
            fflush(stdout);
        }
    }
    return 0;
}

Here's how you could go about measuring it. Have a number of processes, greater than your number of processors * cores * hardware threads, wait (block) on an event that will wake them all up at the same time. One such event is a multicast network packet. Use an instrumentation library like PAPI (or one more suited to your needs) to measure the differences in real and virtual "wakeup" time between your processes. From several iterations of the experiment you can get an estimate of the CPU contention time for your processes. Obviously, it's not going to be very accurate on multicore processors, but maybe it'll help you. A rough sketch of such a waiter process follows.
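Here is a minimal sketch of one such waiter process, assuming an arbitrary multicast group 239.0.0.1 on port 12345 and using plain clock_gettime/getrusage in place of PAPI:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <time.h>

int main()
{
    /* Join the multicast group and block until a packet arrives. */
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12345);                        /* assumed port */
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    struct ip_mreq mreq;
    mreq.imr_multiaddr.s_addr = inet_addr("239.0.0.1");  /* assumed group */
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

    char buf[64];
    recv(s, buf, sizeof(buf), 0);                        /* all waiters unblock together */

    /* Record real and virtual (CPU) time immediately on wakeup. */
    struct timespec real;
    clock_gettime(CLOCK_MONOTONIC, &real);
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);

    printf("real %ld.%09ld user %ld.%06ld sys %ld.%06ld\n",
           (long)real.tv_sec, (long)real.tv_nsec,
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    return 0;
}
Run more of these than you have hardware threads, send one packet to the group, and compare the wakeup timestamps across the processes.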
Cheers.

I had this problem some time back. I ended up using getrusage:
You can get detailed help at:
http://www.opengroup.org/onlinepubs/009695399/functions/getrusage.html
getrusage populates the rusage struct.
Measuring Wait Time with getrusage
You can call getrusage at the beginning of your code and then call it again at the end, or at any appropriate point during execution. You then have initial_rusage and final_rusage. The user time spent by your process is indicated by rusage->ru_utime.tv_sec and the system time spent by the process is indicated by rusage->ru_stime.tv_sec (add the tv_usec fields if you need sub-second precision).
Thus the total user-time spent by the process will be:
user_time = final_rusage.ru_utime.tv_sec - initial_rusage.ru_utime.tv_sec
The total system-time spent by the process will be:
system_time = final_rusage.ru_stime.tv_sec - initial_rusage.ru_stime.tv_sec
If total_time is the time elapsed between the two calls of getrusage then the wait time will be
wait_time = total_time - (user_time + system_time)
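One way to put that together (a minimal sketch; it uses gettimeofday for the elapsed wall-clock time, and the workload is just a placeholder):
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds as a double. */
static double tv_to_sec(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}

int main()
{
    struct rusage initial_rusage, final_rusage;
    struct timeval wall_start, wall_end;

    getrusage(RUSAGE_SELF, &initial_rusage);
    gettimeofday(&wall_start, NULL);

    /* ... the work you want to measure goes here ... */

    gettimeofday(&wall_end, NULL);
    getrusage(RUSAGE_SELF, &final_rusage);

    double user_time   = tv_to_sec(final_rusage.ru_utime) - tv_to_sec(initial_rusage.ru_utime);
    double system_time = tv_to_sec(final_rusage.ru_stime) - tv_to_sec(initial_rusage.ru_stime);
    double total_time  = tv_to_sec(wall_end) - tv_to_sec(wall_start);
    double wait_time   = total_time - (user_time + system_time);

    printf("user %.6f sys %.6f wall %.6f wait %.6f\n",
           user_time, system_time, total_time, wait_time);
    return 0;
}
Keep in mind that wait_time computed this way includes any time the process spent blocked (on I/O, sleeps, etc.), not only time spent runnable but waiting for a CPU.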
Hope this helps

Related

How can I measure elapsed time when encrypting using OpenSSL on Linux in C

How can I calculate the amount of processing time used by a process in C on Linux? Specifically, I want to determine how much time elapses when encrypting a file using OpenSSL.
The easiest way for you to do this is by using the clock() function from <time.h> to report the amount of CPU time used by the calling process.
From SUSv4:
The clock() function shall return the implementation's best
approximation to the processor time used by the process since the
beginning of an implementation-defined era related only to the process
invocation.
RETURN VALUE
To determine the time in seconds, the value returned by clock() should
be divided by the value of the macro CLOCKS_PER_SEC. If the processor
time used is not available or its value cannot be represented,
the function shall return the value (clock_t)-1.
Try the following:
clock_t start, end;
double cpu_time_used;

start = clock();
/* Do encrypting ... */
end = clock();
cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
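If it helps, here is a complete, compilable version of that sketch; the loop is just a placeholder for the encryption work:
#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start, end;
    double cpu_time_used;

    start = clock();

    /* Do encrypting ... (placeholder busy work) */
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        x += i;

    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("CPU time used: %f s\n", cpu_time_used);
    return 0;
}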

New to OpenMP and parallel programming: need to partition a loop using scheduling

Ok so here's what the problem says.
Implement a simple loop that calls a function containing a delay. Partition this loop across four threads using static, dynamic and guided scheduling. Measure execution times for each type of scheduling with respect to both the size of the loop and the size of the delay.
This is what I've done so far; I have no idea if I'm on the right track.
#include <omp.h>
#include <stdio.h>

int main() {
    double start_time, run_time;
    omp_set_num_threads(4);
    start_time = omp_get_wtime();
    #pragma omp parallel
    #pragma omp for schedule(static)
    for (int n = 0; n < 100; n++) {
        printf("square of %d=%d\n", n, n*n);
        printf("cube of %d=%d\n", n, n*n*n);
        int ID = omp_get_thread_num();
        printf("Thread(%d) \n", ID);
    }
    run_time = omp_get_wtime() - start_time;
    printf("Time Elapsed (%f)", run_time);
    getchar();
}
First you need a loop where the distribution actually makes a difference. Your loop has only 100 iterations, so the OpenMP schedule only has to decide 100 times which iteration a thread gets next, which takes no measurable time. The printf output, on the other hand, takes very long, so in your code it makes no difference which schedule is used. It's better to use a loop without console output and with a very high iteration count, like
#pragma omp parallel
{
    #pragma omp for schedule(static) private(value)
    for (int i = 0; i < 100000000; i++) {
        value = ...
    }
}
Finally, the loop body has to produce a "result" that is used after the loop, with a printf for example; otherwise the compiler may delete the body entirely when optimizing (it is not used later, so it is not needed). That way the time measurement covers only the parallel region, without the cost of printing the results.
If your iterations all take nearly the same time, a static distribution should be fastest. If they differ a lot, the dynamic and guided schedules should come out ahead in your measurements. A sketch of the whole exercise is below.
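As a rough sketch of how the whole exercise could look - the delay() function and the loop/delay sizes here are placeholders you would vary:
#include <omp.h>
#include <stdio.h>

/* Stand-in for the "function containing a delay": busy work of tunable size. */
static double delay(int amount)
{
    double x = 0.0;
    for (int i = 0; i < amount; i++)
        x += i * 0.5;
    return x;
}

int main()
{
    const int n = 100000;   /* loop size  - vary this */
    const int d = 1000;     /* delay size - vary this */
    double sum = 0.0;

    omp_set_num_threads(4);
    double start = omp_get_wtime();

    /* schedule(runtime) lets you switch between static, dynamic and guided
       via the OMP_SCHEDULE environment variable without recompiling. */
    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += delay(d);

    double elapsed = omp_get_wtime() - start;
    printf("time %f s (sum=%f)\n", elapsed, sum);   /* print sum so it isn't optimized away */
    return 0;
}
Run it as OMP_SCHEDULE=static ./a.out, then again with dynamic and guided, varying n and d each time.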

Linux clock_gettime() elapse spikes?

I'm trying to get high-resolution timestamps on Linux. Using clock_gettime() as below, I get "spike" elapses that look pretty horrible, almost 26 microseconds in the worst case. Most of the "dt"s are around 30 ns. I'm on Linux 2.6.32, Red Hat 4.4.6. 'lscpu' shows CPU MHz=2666.121, so a clock cycle is well under a nanosecond; asking for ns resolution didn't seem too unreasonable here.
Output of the program:
1397534268,40823395 1397534268,40827950,dt=4555
1397534268,41233555 1397534268,41236716,dt=3161
1397534268,41389902 1397534268,41392922,dt=3020
1397534268,46488430 1397534268,46491674,dt=3244
1397534268,46531297 1397534268,46534279,dt=2982
1397534268,46823368 1397534268,46849336,dt=25968
1397534268,46915657 1397534268,46918663,dt=3006
1397534268,51488643 1397534268,51491791,dt=3148
1397534268,51530490 1397534268,51533496,dt=3006
1397534268,51823307 1397534268,51826904,dt=3597
1397534268,55823359 1397534268,55827826,dt=4467
1397534268,60531184 1397534268,60534183,dt=2999
1397534268,60823381 1397534268,60844866,dt=21485
1397534268,60913003 1397534268,60915998,dt=2995
1397534268,65823269 1397534268,65827742,dt=4473
1397534268,70823376 1397534268,70835280,dt=11904
1397534268,75823489 1397534268,75828872,dt=5383
1397534268,80823503 1397534268,80859500,dt=35997
1397534268,86823381 1397534268,86831907,dt=8526
Any ideas? Thanks.
#include <vector>
#include <iostream>
#include <time.h>
long long elapse( const timespec& t1, const timespec& t2 )
{
    return ( t2.tv_sec * 1000000000L + t2.tv_nsec ) -
           ( t1.tv_sec * 1000000000L + t1.tv_nsec );
}
int main()
{
    const unsigned n = 30000;
    timespec ts;
    std::vector<timespec> t( n );
    for( unsigned i = 0; i < n; ++i )
    {
        clock_gettime( CLOCK_REALTIME, &ts );
        t[i] = ts;
    }
    std::vector<long> dt( n );
    for( unsigned i = 1; i < n; ++i )
    {
        dt[i] = elapse( t[i-1], t[i] );
        if( dt[i] > 1000 )
        {
            std::cerr << t[i-1].tv_sec << ","
                      << t[i-1].tv_nsec << " "
                      << t[i].tv_sec << ","
                      << t[i].tv_nsec
                      << ",dt=" << dt[i] << std::endl;
        }
        else
        {
            // normally dt[i] is approx 30-35 nanoseconds
        }
    }
    return 0;
}
The numbers you quoted are in the 3 to 30 microsecond range (3,000 to 30,000 nanoseconds). That is too short a time to be a context switch to another thread/process, let the other thread run, and context switch back to your thread. Most likely the core where your process was running was used by the kernel to service an external interrupt (e.g. network card, disk, timer), then returned to running your process.
You can watch the linux interrupt counters (per CPU core and per source) with this command
watch -d -n 0.2 cat /proc/interrupts
The -n 0.2 causes the command to be issued at 5 Hz, and the -d flag highlights what has changed.
The source of the interrupt could also be a TLB shootdown, which results in an IPI (Inter-Processor Interrupt). You can read more about TLB shootdowns here.
If you want to reduce the number of interrupts serviced by the core running your thread/process, you need to set the interrupt affinity. You can learn more about Red Hat Interrupts and IRQ (Interrupt requests) tuning here, and here.
Worth noting is that you are using CLOCK_REALTIME which isn't guaranteed to be "smooth", it could jump around as the system clock is "disciplined" to keep accurate time by a service like NTP (Network Time Protocol) or PTP (Precision Time Protocol). For your purposes it is better to use CLOCK_MONOTONIC, you can read more about the difference here. When a clock is "disciplined" the clock can jump by a "step" - this is unusual and certainly not the cause of the many spikes you see.
Could you check the resolution with clock_getres()?
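For reference, a quick sketch that prints the reported resolution of both clocks:
#include <stdio.h>
#include <time.h>

int main()
{
    struct timespec res;
    clock_getres(CLOCK_REALTIME, &res);
    printf("CLOCK_REALTIME  resolution: %ld s %ld ns\n", (long)res.tv_sec, (long)res.tv_nsec);
    clock_getres(CLOCK_MONOTONIC, &res);
    printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n", (long)res.tv_sec, (long)res.tv_nsec);
    return 0;
}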
I suspect what you are measuring here is called "OS Noise". This is often caused by your program getting pre-empted by the operating system. The operating system then performs other work. There are numerous causes, but commonly it is: other runnable tasks, hardware interrupts, or timer events.
The FTQ/FWQ benchmarks were designed to measure this characteristic and the summary contains some further information:
https://asc.llnl.gov/sequoia/benchmarks/FTQ_summary_v1.1.pdf

How can I measure the overhead due to task migration/load balancing on Linux with the real-time patch?

I am trying to measure the overhead due to task migration. By overhead I mean the latency involved in such an activity. I know there are separate run queues for each core and that the kernel periodically checks them for an imbalance and wakes up a kernel thread (perhaps a higher-priority one) that does the migration.
Could anyone provide me with pointers to the kernel source code where I can insert timestamps to measure this value?
Is there any other performance metric I could investigate to get at such an overhead?
I remember there was an earlier post that discussed this topic, and someone posted some code for measuring the system overhead.
I see you want to add code to insert timestamps; I'm not sure how feasible that is, since task scheduling happens so frequently. I think you can follow the earlier topic.
I saved the source code from that post - thanks to the author!
#include <stdio.h>

/* Totals from the previous call, used to compute the delta. */
static unsigned long long lastTotalUser, lastTotalUserLow,
                          lastTotalSys, lastTotalIdle;

/* Returns overall CPU utilization (in percent) since the previous call. */
double getCurrentValue() {
    double percent;
    FILE* file;
    unsigned long long totalUser, totalUserLow, totalSys, totalIdle, total;

    file = fopen("/proc/stat", "r");
    fscanf(file, "cpu %llu %llu %llu %llu", &totalUser, &totalUserLow,
           &totalSys, &totalIdle);
    fclose(file);

    if (totalUser < lastTotalUser || totalUserLow < lastTotalUserLow ||
        totalSys < lastTotalSys || totalIdle < lastTotalIdle) {
        // Overflow detection. Just skip this value.
        percent = -1.0;
    }
    else {
        total = (totalUser - lastTotalUser) + (totalUserLow - lastTotalUserLow) +
                (totalSys - lastTotalSys);
        percent = total;
        total += (totalIdle - lastTotalIdle);
        percent /= total;
        percent *= 100;
    }

    lastTotalUser = totalUser;
    lastTotalUserLow = totalUserLow;
    lastTotalSys = totalSys;
    lastTotalIdle = totalIdle;

    return percent;
}
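A quick way to use it (a sketch; the one-second interval is arbitrary), assuming getCurrentValue() above is in the same file:
#include <unistd.h>

int main()
{
    getCurrentValue();      /* first call just primes the last* totals */
    sleep(1);               /* measurement interval (arbitrary) */
    printf("CPU utilization: %.1f%%\n", getCurrentValue());
    return 0;
}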

How to get the current time in native Android code?

I was wondering if there is an easy way to get the current time in native Android code. Optimally it would be something comparable to System.currentTimeMillis(). I will only be using it to see how long certain function calls take, so a long variable with the current time in milliseconds would be the optimal solution for me.
Thanks in advance!
For the lazy, add this to the top of your code:
#include <time.h>
// from android samples
/* return current time in milliseconds */
static double now_ms(void) {
    struct timespec res;
    clock_gettime(CLOCK_REALTIME, &res);
    return 1000.0 * res.tv_sec + (double) res.tv_nsec / 1e6;
}
Call it like this:
double start = now_ms(); // start time
// YOUR CODE HERE
double end = now_ms(); // finish time
double delta = end - start; // time your code took to exec in ms
For microsecond resolution you can use gettimeofday(). This uses "wall clock time", which continues to advance when the device is asleep, but is subject to sudden shifts forward or backward if the network updates the device's clock.
You can also use clock_gettime(CLOCK_MONOTONIC). This uses the monotonic clock, which never leaps forward or backward, but stops counting when the device sleeps.
The actual resolution of the timers is device-dependent.
Both of these are POSIX APIs, not Android-specific.
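A minimal sketch of the gettimeofday() variant, returning wall-clock time in microseconds:
#include <stdint.h>
#include <sys/time.h>

/* Wall-clock time in microseconds; subject to clock adjustments. */
static int64_t now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}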
Another one for the lazy, this function returns the current time in nanoseconds using CLOCK_MONOTONIC
#include <time.h>
#define NANOS_IN_SECOND 1000000000LL

/* Monotonic time in nanoseconds; returns long long so it doesn't overflow a 32-bit long. */
static long long currentTimeInNanos() {
    struct timespec res;
    clock_gettime(CLOCK_MONOTONIC, &res);
    return (res.tv_sec * NANOS_IN_SECOND) + res.tv_nsec;
}
