I was reading the book "Understanding the Linux Kernel", and it says that "the number of microseconds is calculated by do_fast_gettimeoffset( )". It also says that the purpose is "to count the number of microseconds that have elapsed within the current second."
I couldn't understand what the author means by the last sentence. Could anyone explain it further?
If you want to understand the Linux kernel, you should be aware that that book has been outdated for a long time and that do_fast_gettimeoffset no longer exists.
do_get_fast_time returns the number of seconds, and is always fast.
do_gettimeoffset returns the number of microseconds since the start of the second, and might be slow.
I am trying to get the node's CFS scheduler throttling as a percentage. To do that, I read two values twice (ignoring timeslices) from /proc/schedstat, which has the following format:
$ cat /proc/schedstat
version 15
timestamp 4297299139
cpu0 0 0 0 0 0 0 1145287047860 105917480368 8608857
(the last three fields of the cpu0 line being CpuTime, RunqTime and the number of timeslices)
So I read the file, sleep for some time, read it again, compute the elapsed time and the delta between the values, and then calculate the percentage using the following code:
cpuTime := float64(delta.CpuTime) / delta.TimeDelta / 10000000
runqTime := float64(delta.RunqTime) / delta.TimeDelta / 10000000
percent := runqTime
The catch is that the percentage can come out at something like 2000%.
I assumed that RunqTime is cumulative and expressed in nanoseconds, so I divided it by 10^7 (to bring it into the 0-100% range), and TimeDelta is the difference between the two measurements in seconds. What is wrong with this? How do I do it properly?
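For reference, the calculation described in the question can be sketched end to end. This is a minimal sketch assuming, as the question does, that the 7th and 8th numbers after the cpuN label are cumulative CpuTime and RunqTime in nanoseconds, and that the elapsed time is measured separately in seconds; cpuSample, parseCPULine and percentOf are hypothetical names. The 10^7 divisor comes from ns / 10^9 (to seconds) * 100 (to percent).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuSample holds the two counters read from a cpuN line of
// /proc/schedstat (assumed to be cumulative nanoseconds).
type cpuSample struct {
	CpuTime  uint64
	RunqTime uint64
}

// parseCPULine parses a line such as
// "cpu0 0 0 0 0 0 0 1145287047860 105917480368 8608857".
// It assumes the version-15 layout shown in the question, where the
// 7th and 8th numbers after the label are CpuTime and RunqTime.
func parseCPULine(line string) (cpuSample, error) {
	f := strings.Fields(line)
	if len(f) < 9 || !strings.HasPrefix(f[0], "cpu") {
		return cpuSample{}, fmt.Errorf("unexpected line: %q", line)
	}
	cpu, err := strconv.ParseUint(f[7], 10, 64)
	if err != nil {
		return cpuSample{}, err
	}
	runq, err := strconv.ParseUint(f[8], 10, 64)
	if err != nil {
		return cpuSample{}, err
	}
	return cpuSample{CpuTime: cpu, RunqTime: runq}, nil
}

// percentOf converts a nanosecond delta over elapsedSec seconds into a
// percentage: ns / 1e9 gives seconds, * 100 gives percent, i.e. ns / 1e7.
func percentOf(deltaNs uint64, elapsedSec float64) float64 {
	return float64(deltaNs) / elapsedSec / 1e7
}

func main() {
	a, _ := parseCPULine("cpu0 0 0 0 0 0 0 1000000000 0 0")
	b, _ := parseCPULine("cpu0 0 0 0 0 0 0 1500000000 200000000 0")
	elapsed := 1.0 // seconds between the two reads (assumed)
	fmt.Printf("cpu %.1f%% runq %.1f%%\n",
		percentOf(b.CpuTime-a.CpuTime, elapsed),
		percentOf(b.RunqTime-a.RunqTime, elapsed))
	// prints "cpu 50.0% runq 20.0%"
}
```

If the elapsed time is measured correctly, a result far above 100% means the counter itself advanced by more than 10^9 per elapsed second.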
I, for one, do not know how to interpret the output of /proc/schedstat.
You quote an answer to a unix.stackexchange question, which links to an LKML mail that mentions a possible patch to the documentation.
However, "schedstat" is a term that is suspiciously missing from my local man proc page and from the copies of man proc I could find on the internet. In fact, when I search for schedstat on Google, the results either do not mention the word "schedstat" (for example, I get links to copies of the man page, which mention "sched" and "stat"), or are non-authoritative comments (fun fact: some of them cite that answer on stackexchange as a reference ...).
So, for now: if I really had to understand what's in the output, I think I would try to read the code for my version of the kernel.
As for "how do you compute delta?": I understand what you intend to do, but I had in mind something more like "what code have you written to do it?".
By running cat /proc/schedstat; sleep 1 in a loop on my machine, I see that the "timestamp" entry is incremented by ~250 units on each iteration (so I honestly can't say what the underlying unit for that field is ...).
To compute delta.TimeDelta, do you use that field, or do you take two instances of time.Now()?
The other deltas are less ambiguous; I imagine you took the difference between the counters you see :)
Do note that, on my mostly idle machine, I sometimes see increments higher than 10^9 over one second on these counters. So again: I do not know how to interpret these numbers.
I am currently running into an issue where a process seems to be stuck: it just doesn't get scheduled, and its status is always 'S'. I have monitored the sched_switch_task trace via debugfs for a while and didn't see the process get scheduled. So I would like to know: when did the kernel last schedule this process?
Thanks a lot.
It might be possible using the info in /proc/pid#/sched file.
In there you can find these parameters (the exact set depends on the OS version; mine is openSUSE 3.16.7-21-desktop):
se.exec_start : 593336938.868448
...
se.statistics.wait_start : 0.000000
se.statistics.sleep_start : 593336938.868448
se.statistics.block_start : 0.000000
The values represent timestamps relative to the system boot time, in what appears to be milliseconds (in my example, 593336938.868448 ms comes to ~6 days 20 hours and change).
Of the last 3 parameters listed above, at most one appears to be non-zero at any time, and I suspect that the non-zero value represents the time when the process last entered the corresponding state (with the process actively running when all are zero).
So if your process is indeed stuck the non-zero value would have recorded when it got stuck.
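Following that reasoning, a small Go sketch could scan the text of /proc/[pid]/sched and report which of the three *_start fields is non-zero. It relies on the same undocumented, observation-based interpretation as this answer; the key names are the ones shown above and may differ on other kernel versions, and lastStateChange is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// lastStateChange scans /proc/[pid]/sched text for the three *_start
// fields and returns the first one found with a non-zero value. The
// interpretation (non-zero = time the task entered that state) is an
// observation, not documented behaviour.
func lastStateChange(text string) (field string, value float64, found bool) {
	keys := []string{
		"se.statistics.wait_start",
		"se.statistics.sleep_start",
		"se.statistics.block_start",
	}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		name, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		name = strings.TrimSpace(name)
		for _, k := range keys {
			if name != k {
				continue
			}
			v, err := strconv.ParseFloat(strings.TrimSpace(rest), 64)
			if err == nil && v != 0 {
				return name, v, true
			}
		}
	}
	return "", 0, false
}

func main() {
	sample := `se.exec_start : 593336938.868448
se.statistics.wait_start : 0.000000
se.statistics.sleep_start : 593336938.868448
se.statistics.block_start : 0.000000`
	if f, v, ok := lastStateChange(sample); ok {
		fmt.Printf("%s = %.6f\n", f, v) // prints "se.statistics.sleep_start = 593336938.868448"
	}
}
```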
Note: this is mostly based on observations and assumptions; I didn't find these parameters documented anywhere, so take them with a grain of salt.
There is plenty of other scheduling info in that file, but it is mostly statistics and, without documentation, difficult to use.
This is an interview question:
Say that you have an infinite stream of sorted data coming in. Implement a way to find a specific timestamp.
What I can think of is to keep the data in a log file or something similar and use the sed command to find the log entry for that specific timestamp.
I do not know if what I think is correct or not.
Other solutions?
This seems like an open-ended question. Your solution was to keep the data in a log file, but with an infinite amount of data, you would then need an impractical amount of disk space as well. The problem is probably supposed to be analyzed in the following manner.
The first thing to notice is that the rate at which the data arrives is not stated. Assume it arrives at r timestamps per second and that you can check only n timestamps per second. The case r <= n is not very interesting, since you can simply check every timestamp as it arrives.
If r > n, you can only check one in every r/n timestamps, which means you need to maintain a buffer of size r/n. Because the input is sorted, you can check the ends of the buffer to see whether the desired timestamp is in range. If it is, you search the buffer to find the required timestamp.
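A minimal Go sketch of that buffering idea, assuming the stream is exposed as a hypothetical next() function that yields timestamps in non-decreasing order: fill a buffer of size k (playing the role of r/n), compare the target against the buffer's first and last elements, and only binary-search the buffer when the target falls inside that range. Because the input is sorted, once a buffer starts above the target the search can stop.

```go
package main

import (
	"fmt"
	"sort"
)

// findInStream reads timestamps from next (which returns ok=false when
// the stream ends) in chunks of k and reports whether target appears.
// next and k are illustrative assumptions: k plays the role of r/n.
func findInStream(next func() (int64, bool), k int, target int64) bool {
	buf := make([]int64, 0, k)
	for {
		buf = buf[:0]
		for len(buf) < k {
			v, ok := next()
			if !ok {
				break
			}
			buf = append(buf, v)
		}
		if len(buf) == 0 {
			return false // stream ended
		}
		first, last := buf[0], buf[len(buf)-1]
		if target < first {
			return false // sorted input: target can no longer appear
		}
		if target <= last {
			// Target is within this chunk's range: binary-search it.
			i := sort.Search(len(buf), func(i int) bool { return buf[i] >= target })
			return i < len(buf) && buf[i] == target
		}
		// target > last: discard this chunk and read the next one.
	}
}

func main() {
	data := []int64{1, 3, 5, 7, 9, 11, 13}
	pos := 0
	next := func() (int64, bool) {
		if pos >= len(data) {
			return 0, false
		}
		pos++
		return data[pos-1], true
	}
	fmt.Println(findInStream(next, 3, 9)) // prints "true"
}
```

Only the ends of each chunk are compared against the target, so the per-timestamp cost stays low; the full binary search runs at most once.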
Interview tip: these questions are often open-ended in order to judge your thought process. If a question is underspecified (as it is here), you should ask for clarification.
In a kernel module I am accessing the task_struct and returning stime+utime.
I want to convert it to milliseconds. In what units are stime and utime stored in task_struct?
I can also access it from /proc//stat. Are the units different in the two cases?
Advanced UNIX Programming by Marc Rochkind covers this topic to some degree (around page 55, if I remember correctly). Pardon my paraphrasing; he states it better.
utime represents user time: the CPU time spent executing instructions in user mode. It is CPU time only and doesn't include time spent waiting to run.
stime is the CPU time spent executing system calls on behalf of the process.
The units are clock ticks.
The number of clock ticks per second can be determined with sysconf(_SC_CLK_TCK).
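The conversion to milliseconds is then one multiplication and one division. A minimal Go sketch, assuming the ticks-per-second value has already been obtained (in C, from sysconf(_SC_CLK_TCK); 100 is common but should not be hard-coded); ticksToMillis and nsToMillis are hypothetical names. The task->se.sum_exec_runtime value mentioned in the next answer is in nanoseconds, so it only needs dividing by 10^6.

```go
package main

import "fmt"

// ticksToMillis converts a clock-tick count (e.g. utime or stime from
// /proc/[pid]/stat) to milliseconds. clkTck is the ticks-per-second
// value, which in C would come from sysconf(_SC_CLK_TCK).
func ticksToMillis(ticks, clkTck uint64) uint64 {
	// Multiply first to keep precision; fine unless ticks*1000 overflows.
	return ticks * 1000 / clkTck
}

// nsToMillis converts a nanosecond count such as se.sum_exec_runtime.
func nsToMillis(ns uint64) uint64 {
	return ns / 1_000_000
}

func main() {
	fmt.Println(ticksToMillis(250, 100))   // prints 2500
	fmt.Println(nsToMillis(1_500_000_000)) // prints 1500
}
```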
I hope this helps.
Given a task_struct, the total on-CPU time in nanoseconds is stored in task->se.sum_exec_runtime.
This does not appear to add up to task->utime+task->stime unless first adjusted via task_cputime_adjusted(task, &utime, &stime).
For anyone else trying to implement this kind of functionality, I highly recommend reading proc(5). In this case, searching for utime leads you to /proc/[pid]/stat (or /proc/self/stat), which provides utime in its 14th column. The implementation can be found in fs/proc/array.c, which is where I found the call to task_cputime_adjusted. Verify the module output against, e.g., awk '{ print $14 }' /proc/[pid]/stat for utime.
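The awk check splits on whitespace, which shifts the column numbers if the command name (field 2, shown in parentheses) contains spaces. A slightly more robust Go sketch, offered as an alternative rather than the approach used above, splits the line after the last ')' and indexes from there; statTimes is a hypothetical name. Per proc(5), utime and stime are fields 14 and 15, in clock ticks.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// statTimes extracts utime and stime (fields 14 and 15 in proc(5),
// in clock ticks) from the contents of /proc/[pid]/stat.
func statTimes(stat string) (utime, stime uint64, err error) {
	// Field 2 (comm) is in parentheses and may contain spaces, so
	// split after the last ')' instead of counting from the start.
	end := strings.LastIndexByte(stat, ')')
	if end < 0 {
		return 0, 0, fmt.Errorf("malformed stat line")
	}
	rest := strings.Fields(stat[end+1:]) // rest[0] is field 3 (state)
	if len(rest) < 13 {
		return 0, 0, fmt.Errorf("too few fields")
	}
	if utime, err = strconv.ParseUint(rest[11], 10, 64); err != nil {
		return 0, 0, err
	}
	if stime, err = strconv.ParseUint(rest[12], 10, 64); err != nil {
		return 0, 0, err
	}
	return utime, stime, nil
}

func main() {
	line := "1234 (my prog) S 1 1234 1234 0 -1 4194560 100 0 0 0 42 7 0 0 20 0 1 0 5000"
	u, s, err := statTimes(line)
	fmt.Println(u, s, err) // prints "42 7 <nil>"
}
```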
Everyone knows that MRTG needs at least one value to be passed on its input.
In its per-target options, MRTG offers 'gauge', 'absolute' and a default (no option) behavior that determines what to do with incoming data, or how to count it.
Let's look at an elementary yet popular example:
We pass cumulative data from network interface statistics: 'how many packets were received by the interface'.
We take it from /proc/net/dev, or look at the ifconfig output for a certain network interface. The number of received bytes increases every time; it's cumulative.
As I see it, there could be two types of statistics:
1. How fast this value changes over the time interval. In other words, activity.
2. A simple, as-is growing graph that just draws every new value every minute (or any other time interval).
The first graph will jump up and down (activity). The second will just keep growing.
I have read the RRDtool and MRTG docs twice and can't understand which of the options mentioned above counts what.
I suppose (I am not sure) that 'gauge' draws values as-is, without any differentiation (good for measuring how much memory or CPU is used every 5 minutes), and that the default and 'absolute' behaviors try to calculate the rate between successive measurements. But what's the difference between the last two?
Can you explain in simple terms which behavior corresponds to each of the three possible options?
Thanks in advance.
MRTG assumes that everything is being measured as a rate (even if it isn't a rate).
Type 'gauge' assumes that you have already calculated the rate; thus, the provided value is stored as-is (after Data Normalisation). This is appropriate for things like CPU usage.
Type 'absolute' assumes the value passed is the count since the last update. Thus, the value is divided by the number of seconds since the last update to get a rate in thingies per second. This is rarely used, and only for certain unusual data sources that reset their value on being read - eg, a script that counts the number of lines in a log file, then truncates the log file.
Type 'counter' (the default) assumes the value passed is a constantly growing count, possibly one that wraps around at 32 or 64 bits. The difference between the value and its previous value is divided by the number of seconds since the last update to get a rate in thingies per second. If it sees the value decrease, it will assume a counter wraparound at 32 or 64 bits. This is appropriate for something like network traffic counters, which is why it is the default behaviour (MRTG was originally written for network traffic graphs).
Type 'derive' is like 'counter', but will allow the counter to decrease (resulting in a negative rate). This is not possible directly in MRTG but you can manually create the necessary RRD if you want.
All types subsequently perform Data Normalisation to adjust the timestamp to a multiple of the Interval. This will be more noticeable for Gauge types where the value is small than for counter types where the value is large.
For more information on this, see Alex van der Bogaerdt's excellent tutorial.