I am trying to understand the Linux kernel code for the task scheduler.
I am not able to figure out what the goodness value is and how it differs from niceness.
Also, how does each of them contribute to scheduling?
AFAIK: Each process in the run queue is assigned a "goodness" value which determines how good a process is for running. This value is calculated by the goodness() function.
The process with higher goodness will be the next process to run. If no process is available for running, then the operating system selects a special idle task.
A first approximation of "goodness" is calculated from the number of ticks left in the process's quantum. This means that the goodness of a process decreases over time, until it becomes zero when the task's time quantum expires.
The final goodness is then calculated as a function of the niceness of the process. So, basically, the goodness of a process is a combination of the time-slice-left logic and the nice value.
static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
{
    int weight;

    /*
     * select the current process after every other
     * runnable process, but before the idle thread.
     * Also, dont trigger a counter recalculation.
     */
    weight = -1;
    if (p->policy & SCHED_YIELD)
        goto out;

    if (p->policy == SCHED_OTHER) {
        /*
         * Give the process a first-approximation goodness value
         * according to the number of clock-ticks it has left.
         *
         * Don't do any other calculations if the time slice is
         * over..
         */
        weight = p->counter;
        if (!weight)
            goto out;

        ...

        weight += 20 - p->nice;
        goto out;
    }

    /* code for real-time processes goes here */

out:
    return weight;
}
TO UNDERSTAND and REMEMBER the NICE value, remember this:
The etymology of the nice value is that a "nicer" process allows more time for other processes to run; hence a lower nice value translates to a higher priority.
So,
Higher goodness value -> more likely to run.
Higher nice value -> less likely to run.
The goodness value also incorporates the nice value via the term
20 - p->nice
Since the nice value ranges from -20 (highest priority) to 19 (lowest priority),
let's assume a nice value of -20 (EXTREMELY NOT NICE). Then 20 - (-20) = 40,
which means the goodness value has increased, so this process is more likely to be chosen.
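To make the arithmetic concrete, here is a small userspace sketch of the SCHED_OTHER branch (a toy model, not kernel code; counter stands for the remaining ticks of the quantum):

#include <stdio.h>

/* Toy model of the SCHED_OTHER branch of goodness():
 * a higher return value means the task is more likely to be picked next. */
static int toy_goodness(int counter, int nice)
{
    if (counter == 0)            /* time slice used up */
        return 0;
    return counter + (20 - nice);
}

int main(void)
{
    /* Same remaining quantum, different niceness. */
    printf("nice -20 -> goodness %d\n", toy_goodness(6, -20)); /* 6 + 40 = 46 */
    printf("nice   0 -> goodness %d\n", toy_goodness(6,   0)); /* 6 + 20 = 26 */
    printf("nice  19 -> goodness %d\n", toy_goodness(6,  19)); /* 6 +  1 =  7 */
    return 0;
}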
I have a kernel which uses a global uint array, and I want to access and change entries in that array from all threads using the atom_or function in OpenCL.
The code:
void SetMove(int c, uint m, volatile __global uint *prevmove)
{
    uint idx = c >> 4;
    uint mask = m << (2 * (c & 15));
    atom_or(&prevmove[idx], mask);
}
I am implementing a BFS algorithm using OpenCL. This is done in 'waves': in every wave, an array in of states is analyzed. Each GPU thread evaluates a single state, finds its successors, and places them in a different array out. At the end of the wave the contents of out are placed in in.
The prevmove array is to keep track of what action you should do in a certain state to find the goal state. The SetMove function updates prevmove when a thread finds a new state.
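For reference, this is roughly how I read a move back out of the packed array (a sketch; GetMove is my own helper, and it assumes each move m fits in 2 bits, which is what the c >> 4 and 2 * (c & 15) indexing implies):

/* Companion to SetMove (my own sketch, not part of my kernel):
 * extract the 2-bit move stored for state c. */
uint GetMove(int c, volatile __global uint *prevmove)
{
    uint idx = c >> 4;             /* 16 moves packed per uint */
    uint shift = 2 * (c & 15);     /* 2 bits per move          */
    return (prevmove[idx] >> shift) & 3u;
}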
Now the problem is:
The results are always non-deterministic. If I change the atom_or operation to a normal, non-atomic |= operator, the kernel behaves exactly the same. I have tested this by checking how many entries in prevmove are nonzero when the algorithm terminates. This varies every time I run my program.
Is the cause of my problem that I am not using atom_or correctly, or is it something else?
(I have #pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable in my kernel).
How can I calculate the amount of processing time used by a process in C on Linux? Specifically, I want to determine how much time elapses when encrypting a file using openssl.
The easiest way for you to do this is by using the clock() function from <time.h> to report the amount of CPU time used by the calling process.
From SUSv4:
The clock() function shall return the implementation's best
approximation to the processor time used by the process since the
beginning of an implementation-defined era related only to the process
invocation.
RETURN VALUE
To determine the time in seconds, the value returned by clock() should
be divided by the value of the macro CLOCKS_PER_SEC. If the processor
time used is not available or its value cannot be represented,
the function shall return the value (clock_t)-1.
Try the following:

#include <time.h>    /* clock(), clock_t, CLOCKS_PER_SEC */

clock_t start, end;
double cpu_time_used;

start = clock();
/* Do encrypting ... */
end = clock();
cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
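For completeness, a minimal self-contained sketch of the same idea (encrypt_file here is just a placeholder for whatever your OpenSSL code does):

#include <stdio.h>
#include <time.h>

/* Placeholder for your actual OpenSSL encryption routine. */
static void encrypt_file(void)
{
    /* ... */
}

int main(void)
{
    clock_t start = clock();
    encrypt_file();
    clock_t end = clock();

    double cpu_time_used = (double)(end - start) / CLOCKS_PER_SEC;
    printf("CPU time used: %f seconds\n", cpu_time_used);
    return 0;
}

Note that on Linux clock() accounts for CPU time of the calling process itself, not of child processes, so if you invoke openssl as a separate process you would need something like times() or getrusage(RUSAGE_CHILDREN, ...) instead.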
I am trying to figure out how the stats in the taskstats struct add up. I wrote a simple C program that runs for some time doing IO and then exits. I monitor the stats of this program using the taskstats struct, which I get from the taskstats netlink multicast group. When I sum the values of cpu_delay_total, blkio_delay_total, swapin_delay_total, freepages_delay_total, ac_utime and ac_stime, I get a value that is about 0.5 seconds larger than the elapsed time (ac_etime).
Here are the statistics for a 3.5-second run:
ac_etime: 3536036
ac_utime: 172000
ac_stime: 3032000
cpu_delay_total: 792528445
blkio_delay_total: 46320128
swapin_delay_total: 0
freepages_delay_total: 0
Summing up the values for the delays, utime and stime yields 4042848.573 (dividing the delays by 1000 to convert nanoseconds to microseconds), while etime is only 3536036!
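For reference, this is roughly how I compute that sum (a sketch; struct taskstats comes from linux/taskstats.h, the *_delay_total fields are in nanoseconds and the ac_* fields in microseconds):

#include <stdio.h>
#include <linux/taskstats.h>

/* Sum the delay and CPU-time fields and compare with ac_etime,
 * everything converted to microseconds. */
static void compare_with_etime(const struct taskstats *t)
{
    double delays_us = (t->cpu_delay_total + t->blkio_delay_total +
                        t->swapin_delay_total + t->freepages_delay_total) / 1000.0;
    double sum_us = delays_us + t->ac_utime + t->ac_stime;

    printf("sum = %.3f us, ac_etime = %llu us, diff = %.3f us\n",
           sum_us, (unsigned long long)t->ac_etime,
           sum_us - (double)t->ac_etime);
}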
Interestingly, the wall clock time gives a value that is practically equal to utime + stime: cpu_run_real_total is 3204000129, while ac_utime + ac_stime is 3204000.
Does the cpu_run_real_total field give the CPU time, despite the fact that the comment in taskstats.h clearly states that this is wall clock time? And what could be the reason that the sum of these fields is larger than the elapsed time?
My kernel version is 3.2.0-38.
(1) cpu_run_real_total = ac_utime + ac_stime. I checked the code in ./kernel/delayacct.c, function __delayacct_add_tsk():
tmp = (s64)d->cpu_run_real_total;
cputime_to_timespec(tsk->utime + tsk->stime, &ts);
tmp += timespec_to_ns(&ts);
d->cpu_run_real_total = (tmp < (s64)d->cpu_run_real_total) ? 0 : tmp;
From the above code, we can see that cpu_run_real_total is just utime and stime summed up.
(2) Why is the sum of cpu_delay_total, blkio_delay_total, swapin_delay_total, freepages_delay_total, ac_utime and ac_stime larger than ac_etime?
I have not figured out why, but I have a guess: the stime may overlap somewhat with the various *_delay_total counters.
How can I check how long a process spends waiting for the CPU in a Linux box?
For example, in a loaded system I want to check how long a SQL*Loader (sqlldr) process waits.
It would be useful if there is a command line tool to do this.
I've quickly slapped this together. It prints out the smallest and largest "interferences" from task switching...
#include <sys/time.h>
#include <stdio.h>

double seconds()
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec / 1000000.0;
}

int main()
{
    double min = 999999999, max = 0;
    while (1)
    {
        double c = -(seconds() - seconds());
        if (c < min)
        {
            min = c;
            printf("%f\n", c);
            fflush(stdout);
        }
        if (c > max)
        {
            max = c;
            printf("%f\n", c);
            fflush(stdout);
        }
    }
    return 0;
}
Here's how you should go about measuring it. Have a number of processes, greater than the number of your processors * cores * threading capability, wait (block) on an event that will wake them all up at the same time. One such event is a multicast network packet. Use an instrumentation library like PAPI (or one more suited to your needs) to measure the differences in real and virtual "wakeup" time between your processes. From several iterations of the experiment you can get an estimate of the CPU contention time for your processes. Obviously, it's not going to be at all accurate for multicore processors, but maybe it'll help you.
Cheers.
I had this problem some time back. I ended up using getrusage:
You can get detailed help at :
http://www.opengroup.org/onlinepubs/009695399/functions/getrusage.html
getrusage populates the rusage struct.
Measuring Wait Time with getrusage
You can call getrusage at the beginning of your code and then call it again at the end, or at some appropriate point during execution. You then have initial_rusage and final_rusage. The user time spent by your process is indicated by rusage->ru_utime.tv_sec, and the system time spent by the process is indicated by rusage->ru_stime.tv_sec.
Thus the total user-time spent by the process will be:
user_time = final_rusage.ru_utime.tv_sec - initial_rusage.ru_utime.tv_sec
The total system-time spent by the process will be:
system_time = final_rusage.ru_stime.tv_sec - initial_rusage.ru_stime.tv_sec
If total_time is the time elapsed between the two calls of getrusage then the wait time will be
wait_time = total_time - (user_time + system_time)
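A minimal sketch of that approach, using clock_gettime for total_time and including the microsecond parts of the rusage timevals (second granularity alone is usually too coarse):

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <time.h>

/* Convert a struct timeval to seconds as a double. */
static double tv_to_sec(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    struct rusage r0, r1;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    getrusage(RUSAGE_SELF, &r0);

    /* ... workload to measure goes here ... */

    getrusage(RUSAGE_SELF, &r1);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double user_time   = tv_to_sec(r1.ru_utime) - tv_to_sec(r0.ru_utime);
    double system_time = tv_to_sec(r1.ru_stime) - tv_to_sec(r0.ru_stime);
    double total_time  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double wait_time   = total_time - (user_time + system_time);

    printf("user %.6f s, system %.6f s, elapsed %.6f s, wait %.6f s\n",
           user_time, system_time, total_time, wait_time);
    return 0;
}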
Hope this helps
The Perl module Proc::ProcessTable occasionally reports the pctcpu attribute as 'inf', 'nan', or a value greater than 100. Why does it do this? And are there any guidelines on how to deal with this kind of information?
We have observed this on various platforms, including Linux 2.4 running on 8 logical processors.
I would guess that 'inf' or 'nan' is the result of some impossibly large value or a division by zero.
For values greater than 100, could this possibly mean that more than one processor was used?
And for dealing with this information, is the best practice merely to mark the data point as untrustworthy and normalize it to 100%?
I do not know why that happens, and I cannot stress-test the module right now to try to generate such cases.
However, a principle I have followed throughout my research is not to replace data I know to be nonsense with something that looks reasonable. You basically have missing observations and you should treat them as such. I would not attach a numerical value at all, so as not to pretend I have information when in fact I do not.
Then your statistics for the non-missing points will be meaningful, and you can look at any patterns in the missing observations separately.
UPDATE: Looking at the calc_prec() function in the source code:
/* calc_prec()
 *
 * calculate the two cpu/memory precentage values
 */
static void calc_prec(char *format_str, struct procstat *prs, struct obstack *mem_pool)
{
    float pctcpu = 100.0f * (prs->utime / 1e6) / (time(NULL) - prs->start_time);

    /* calculate pctcpu - NOTE: This assumes the cpu time is in microsecond units! */
    sprintf(prs->pctcpu, "%3.2f", pctcpu);
    field_enable(format_str, F_PCTCPU);

    /* calculate pctmem */
    if (system_memory > 0) {
        sprintf(prs->pctmem, "%3.2f", (float) prs->rss / system_memory * 100.f);
        field_enable(format_str, F_PCTMEM);
    }
}
First, IMHO, it would be better to just divide by 1e4 rather than multiplying by 100.0f after the division. Second, it is possible (if polled immediately after process start) for the time delta to be 0. Third, I would have just done the whole thing in double.
As an aside, this function looks like a good example of why you should not have comments in code.
#include <stdio.h>
#include <time.h>

volatile float calc_percent(
    unsigned long utime,
    time_t now,
    time_t start
) {
    return 100.0f * (utime / 1e6) / (now - start);
}

int main(void) {
    printf("%3.2f\n", calc_percent(1e6, time(NULL), time(NULL)));
    printf("%3.2f\n", calc_percent(0, time(NULL), time(NULL)));
    return 0;
}
This outputs inf in the first case and nan in the second case when compiled with Cygwin gcc-4 on Windows. I do not know if this behavior is standard or just what happens with this particular combination of OS+compiler.
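If you want to guard against this on the C side, a minimal sketch (my own suggestion, not code from the module) is to compute in double and return a sentinel for impossible inputs, so the caller can treat the sample as missing:

#include <math.h>
#include <time.h>

/* My own sketch, not from Proc::ProcessTable: return a negative sentinel
 * instead of inf/nan so the caller can treat the observation as missing. */
static double safe_pctcpu(unsigned long utime_us, time_t now, time_t start)
{
    double elapsed = difftime(now, start);
    if (elapsed <= 0.0)
        return -1.0;                        /* polled too soon: missing */

    double pct = 100.0 * (utime_us / 1e6) / elapsed;
    return isfinite(pct) ? pct : -1.0;      /* inf/nan: missing */
}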