Linux function profiler output

I would like to profile a code flow in the kernel in order to understand where the bottleneck is. I found that the function profiler does just that for me:
https://lwn.net/Articles/370423/
Unfortunately the output I see doesn't make sense to me.
From the link above, the output of the function profiler is:
Function    Hit      Time            Avg
--------    ---      ----            ---
schedule    22943    1994458706 us   86931.03 us
Where "Time" is the total time spend inside this function during the run. So if I have function_A that calls function_B, if I understood the output correctly, the "Time" measured for function_A includes the duration of function_B as well.
When I actualy run this on my pc I see another new column displayed for the output:
Function           Hit     Time          Avg          s^2
--------           ---     ----          ---          ---
__do_page_fault    3077    477270.5 us   155.109 us   148746.9 us
(more functions..)
What does s^2 stand for? It can't be the standard deviation, because it's higher than the average...
I measured the total duration of this code flow from user space and got 400 ms. When I summed up the s^2 column, it came close to 400 ms, which makes me think that it is perhaps the "pure" time spent in __do_page_fault, i.e. the time that doesn't include the duration of the nested functions.
Is this correct? I didn't find any documentation of the s^2 column, so I'm hesitant about my conclusion.
Thank you!

You can see the code that calculates the s^2 column here. It looks like this is the variance (the standard deviation squared). If you take the square root of the number in your example, you get about 385 us, which is closer to the average in the example.
The standard deviation is still greater than the mean, but that is fine.
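As a quick sanity check, here is a minimal Go sketch (purely illustrative, using the numbers from the question) of how the columns relate if s^2 really is the variance:

package main

import (
	"fmt"
	"math"
)

func main() {
	// Values taken from the __do_page_fault row in the question.
	avg := 155.109 // "Avg" column, in us
	s2 := 148746.9 // "s^2" column, read as the variance, in us^2

	stddev := math.Sqrt(s2) // ~385.7 us
	fmt.Printf("avg = %.1f us, stddev = sqrt(s^2) = %.1f us\n", avg, stddev)

	// A standard deviation larger than the mean just means the per-call
	// durations are heavily skewed (a few very slow page faults), not that
	// the column is something other than the variance.
}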

Related

How to get delta percentage from /proc/schedstat

I am trying to get node CFS scheduler throttling as a percentage. For that I am reading two values twice (ignoring timeslices) from /proc/schedstat, which has the following format:
$ cat /proc/schedstat
version 15
timestamp 4297299139
cpu0 0 0 0 0 0 0 1145287047860 105917480368 8608857
(the two large values are CpuTime and RunqTime respectively; the final field is the timeslice count, which I ignore)
So I read the file, sleep for some time, read it again, calculate the time passed and the delta between the values, and then calculate the percentage using the following code:
cputTime := float64(delta.CpuTime) / delta.TimeDelta / 10000000
runqTime := float64(delta.RunqTime) / delta.TimeDelta / 10000000
percent := runqTime
The trick is that the percentage can come out at something like 2000%.
I assumed that runqtime is incremental and is expressed in nanoseconds, so I divided it by 10^7 (to get it into the 0-100% range), and timedelta is the difference between measurements in seconds. What is wrong with this? How do I do it properly?
I, for one, do not know how to interpret the output of /proc/schedstat.
You do quote an answer to a unix.stackexchange question, with a link to an email on LKML that mentions a possible patch to the documentation.
However, "schedstat" is a term that is suspiciously missing from my local man proc page, and from the copies of man proc I could find on the internet. Actually, when searching for schedstat on Google, the results I get either do not mention the word "schedstat" at all (for example: links to copies of the man page, which mention "sched" and "stat"), or are non-authoritative comments (fun fact: some of them quote that stackexchange answer as a reference ...).
So at the moment: if I really had to understand what's in the output, I think I would try to read the code for my version of the kernel.
As for "how do you compute delta?": I understand what you intend to do; I had in mind something more like "what code have you written to do it?".
By running cat /proc/schedstat; sleep 1 in a loop on my machine, I see that the "timestamp" entry is incremented by ~250 units on each iteration (so I honestly can't say what the underlying unit for that field is ...).
To compute delta.TimeDelta: do you use that field, or do you take two instances of time.Now()?
The other deltas are less ambiguous; I imagine you took the difference between the counters you see :)
Do note that, on my mainly idle machine, I sometimes see increments higher than 10^9 over one second on these counters. So again: I do not know how to interpret these numbers.
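For what it's worth, here is a minimal Go sketch of how I would compute the percentage: divide the counter delta by the elapsed wall-clock time in nanoseconds rather than by a fixed 10^7. The field layout (the second large value being cumulative run-queue wait time in nanoseconds) is an assumption carried over from the question, not something I can confirm:

package main

import (
	"fmt"
	"time"
)

// Snapshot holds a cumulative counter read from a cpuN line of
// /proc/schedstat, together with the time it was read. The counter is
// assumed (per the question) to be run-queue wait time in nanoseconds.
type Snapshot struct {
	RunqTime uint64    // cumulative run-queue wait time in ns (assumed)
	Taken    time.Time // when this snapshot was read
}

// runqPercent returns run-queue wait as a percentage of the wall-clock
// time elapsed between the two snapshots.
func runqPercent(prev, cur Snapshot) float64 {
	elapsedNs := float64(cur.Taken.Sub(prev.Taken).Nanoseconds())
	if elapsedNs <= 0 {
		return 0
	}
	deltaRunq := float64(cur.RunqTime - prev.RunqTime)
	return deltaRunq / elapsedNs * 100
}

func main() {
	// Made-up numbers: 50 ms of run-queue wait accumulated over 1 s of wall time.
	prev := Snapshot{RunqTime: 105917480368, Taken: time.Now()}
	cur := Snapshot{RunqTime: prev.RunqTime + 50_000_000, Taken: prev.Taken.Add(time.Second)}
	fmt.Printf("runq: %.1f%%\n", runqPercent(prev, cur)) // runq: 5.0%
}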

With Time Range

Suppose an employee's scheduled working hours are 9:30 AM - 5:30 PM, and they punch in at 9:25 AM or 9:45 AM. I'm trying to find out if there is logic, code, or any other kind of help to solve this problem. I want the bot to identify +30 or -30 minutes for the punch-in or punch-out time. Is there any VBO for time manipulation, or a way to give time ranges in a Collection or Data Item?
Why not just use a Decision stage with basic arithmetic to check the lower and upper bounds of your distance from the expected?
Load the expected punch-in time and the actual punch-in time into Time-typed Data Items, then set a third, TimeSpan-typed Data Item to the deviation you're willing to accept between the scheduled and actual times.
Since Blue Prism's internal expression evaluation engine doesn't support absolute value, you'll have to check that both permutations of the difference fall within the bound of the Acceptable Deviation. Thus, your Decision would look something like the following:
[Scheduled Start Time] - [Punch-in Time] < [Acceptable Deviation] AND [Punch-in Time] - [Scheduled Start Time] < [Acceptable Deviation]
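Outside Blue Prism, the same logic is just a comparison of durations. A small Go sketch of the check, for illustration only (the dates and the 30-minute bound below are made up):

package main

import (
	"fmt"
	"time"
)

// withinDeviation mirrors the two-sided Decision above: it reports whether
// actual falls within ±dev of scheduled without needing an absolute-value
// function.
func withinDeviation(scheduled, actual time.Time, dev time.Duration) bool {
	return scheduled.Sub(actual) < dev && actual.Sub(scheduled) < dev
}

func main() {
	scheduled := time.Date(2024, 1, 2, 9, 30, 0, 0, time.Local)
	punchIn := time.Date(2024, 1, 2, 9, 45, 0, 0, time.Local)
	fmt.Println(withinDeviation(scheduled, punchIn, 30*time.Minute)) // true
}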

Chrome Lighthouse comma and full stop in performance option

I have a few questions regarding the Lighthouse report (see screenshot below).
The first is a locale thing: I assume the value 11.930 ms stands for 11 seconds and 930 ms. Is this the case?
The second is about the delayed paint compared to the file size. The third entry (7.22 KB) delays the paint by 3,101 ms, while the fourth entry delays the paint by only 1,226 ms, although that JavaScript file is more than three times the size (24.03 KB versus 7.22 KB). Does anybody know what might be the cause?
Screenshot of Lighthouse
This is an extract of a Lighthouse report. In the screenshot you can see that a few metrics are written with a comma (11,222 ms) and others with a full stop (7.410 ms).
Thank you for discovering quite a bug! An issue has been filed in the Lighthouse GitHub repo.
To explain what's likely going on: it looks like this report was generated with the CLI (or at least in a locale different from the one in which it is being displayed). Some numbers (such as the ones in the table) are converted to strings ahead of time, while others are converted at display time in the browser. The browser numbers respect your OS/user-selected locale, while the pre-stringified numbers do not.
To answer your questions...
Yes, the value it's reporting is 11930 milliseconds or 11 seconds and 930 milliseconds (11,930 ms en-US or 11.930 ms de-DE).
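To illustrate the locale effect outside Lighthouse (this is not Lighthouse's own code, just a Go demonstration using golang.org/x/text), the same integer renders differently per locale:

package main

import (
	"golang.org/x/text/language"
	"golang.org/x/text/message"
)

func main() {
	ms := 11930
	// The same integer rendered under two locales:
	message.NewPrinter(language.AmericanEnglish).Printf("%d ms\n", ms) // 11,930 ms
	message.NewPrinter(language.German).Printf("%d ms\n", ms)          // 11.930 ms
}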
The delayed paint metric is reporting how many milliseconds after the load started the asset finished loading. There are multiple factors that influence this number, including when the asset was discovered by the browser, queuing time, server response time, network variability, and payload size. The small script that delayed paint longer likely had a lower priority or was added to your page later than the larger script was, and thus was pushed out farther.

Different number of cycles when running a benchmark more than once on C++ emulator

When running a benchmark, e.g. Dhrystone, on the C++ emulator with the command:
make output/dhrystone.riscv.out
as described at http://riscv.org/download.html#tab_rocket, I get the following output:
When running it for the first time:
Microseconds for one run through Dhrystone: 1064
Dhrystones per Second: 939
cycle = 533718
instret = 148672
and the second time:
Microseconds for one run through Dhrystone: 1064
Dhrystones per Second: 939
cycle = 533715
instret = 148672
Why do the cycle counts differ? Shouldn't they be exactly the same? I have tried this with other benchmarks too and saw even higher deviations. If this is normal, where do the deviations come from?
There are small amounts of nondeterminism from randomly initialized registers (e.g., the clock that is recovered by the HTIF is initialized to a random phase). It doesn't seem like these minor deviations would impact any performance benchmarking.
If you need identical results each time (e.g., for verification?), you could modify the emulator code to initialize registers to some known value each time.

Explain the difference in how MRTG measures incoming data

Everyone knows that MRTG needs at least one value to be passed on its input.
In its per-target options, MRTG has 'gauge', 'absolute' and a default (no option) behaviour for 'what to do with incoming data', or how to count it.
Let's look at an elementary yet popular example:
We pass cumulative data from the network interface statistics of 'how many packets were received by the interface'.
We take it from '/proc/net/dev' or look at the 'ifconfig' output for a certain network interface. The number of received bytes increases every time. It's cumulative.
So, as far as I can imagine, there could be two types of possible statistics:
1. How fast this value changes over the time interval. In other words: activity.
2. A simple, as-is growing graph that just draws every new value every minute (or any other time interval).
The first graph will be jumpy (activity). The second will just keep growing.
I have read rrdtool's and MRTG's docs twice and can't understand which of the options mentioned above counts what.
I suppose (I am not sure) that 'gauge' draws values as-is, without any differentiation calculations (good for measuring how much memory or CPU is used every 5 minutes), and that the default and 'absolute' behaviours try to calculate the rate between neighbouring measurements, but what's the difference between the last two?
Can you explain in a simple manner which behaviour stands behind each of the three possible options?
Thanks in advance.
MRTG assumes that everything is being measured as a rate (even if it isn't a rate).
Type 'gauge' assumes that you have already calculated the rate; thus, the provided value is stored as-is (after Data Normalisation). This is appropriate for things like CPU usage.
Type 'absolute' assumes the value passed is the count since the last update. Thus, the value is divided by the number of seconds since the last update to get a rate in thingies per second. This is rarely used, and only for certain unusual data sources that reset their value on being read - e.g., a script that counts the number of lines in a log file, then truncates the log file.
Type 'counter' (the default) assumes the value passed is a constantly growing count, possibly one that wraps around at 32 or 64 bits. The difference between the value and its previous value is divided by the number of seconds since the last update to get a rate in thingies per second. If it sees the value decrease, it will assume a counter wraparound at 32 or 64 bits. This is appropriate for something like network traffic counters, which is why it is the default behaviour (MRTG was originally written for network traffic graphs).
Type 'derive' is like 'counter', but it allows the counter to decrease (resulting in a negative rate). This is not possible directly in MRTG, but you can manually create the necessary RRD if you want.
All types subsequently perform Data Normalisation to adjust the timestamp to a multiple of the Interval. This will be more noticeable for Gauge types where the value is small than for counter types where the value is large.
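To make the arithmetic behind those types concrete, here is a rough Go sketch of the per-type conversion (illustrative only, not MRTG's or rrdtool's actual code; the 32-bit wrap assumption is mine):

package main

import "fmt"

// rate illustrates how each data type turns one new raw sample into a
// per-second rate. prev is the previously seen raw value and elapsed is
// the number of seconds since the last update.
func rate(dsType string, prev, value, elapsed float64) float64 {
	switch dsType {
	case "gauge":
		return value // already a rate; stored as-is (then normalised)
	case "absolute":
		return value / elapsed // value is the count since the last update
	case "counter":
		delta := value - prev
		if delta < 0 {
			delta += 1 << 32 // assume a 32-bit counter wrap for brevity
		}
		return delta / elapsed
	case "derive":
		return (value - prev) / elapsed // negative rates are allowed
	}
	return 0
}

func main() {
	// 3,000,000 bytes received over 300 s on a cumulative counter => 10000 B/s
	fmt.Println(rate("counter", 1_000_000, 4_000_000, 300))
}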
For information on this, see Alex van der Bogaerdt's excellent tutorial
