How latency is calculated in SimGrid

I have a simple system configuration with two hosts and a link between them:
<link id="1" bandwidth="1Bps" latency="0"/>
A task is sent from one host to the other:
msg_task_t task = MSG_task_create("name", 1, 1, NULL);
MSG_task_send(task, "worker");
The receiving host measures the time around the receive:
XBT_INFO("time %g", MSG_get_clock());
MSG_task_receive(&task, "worker");
XBT_INFO("time %g", MSG_get_clock());
I expect sending the task to take 1 second, but I get 1.08247:
[worker:worker:(2) 0.000000] [example/INFO] time 0
[worker:worker:(2) 1.082474] [example/INFO] time 1.08247
Why is that?

This is because the default networking model accounts for effects observed in reality by adjusting the bandwidth and latency values provided by the user.
Check http://hal.inria.fr/hal-00646896/PDF/rr-validity.pdf for the rationale (also published in TOMACS).
In the code, you want to read https://github.com/simgrid/simgrid/blob/master/src/surf/network_cm02.cpp#L23. You will see that if you want a model that is perhaps less representative of large-scale systems but easier to understand, you should switch to CM02 by adding --cfg=network/model:CM02 on the command line.
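For example, assuming the simulator binary is called masterworker and takes a platform file and a deployment file (these names are only placeholders for your own setup), the switch would look like this:
./masterworker platform.xml deployment.xml --cfg=network/model:CM02
With CM02, which does not apply these corrections, the 1-byte transfer over the 1 Bps, zero-latency link should then take (close to) the expected 1 second.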

Related

I need to measure the response time for a file export using the TruClient protocol in LoadRunner

I need to measure the response time for a file export using the TruClient protocol in LoadRunner. After I click the export button, the file is downloaded, but I am not able to measure the download time accurately.
Pull that data from the HTTP request log, which will show the download request, and, if the w3c time-taken value is included in the log, the time required to fulfill the download.
You can process the log at the end of the test for the response time data. If you need to, you can import a set of datapoints into Analysis for representation with the rest of your data. You might want to consider a normalized value for your download instead of a raw response time. I imagine the files are of different sizes, so naturally they will have different download times. However, if you divide the downloaded bytes by the time (in seconds), you will have a normalized measurement of bytes per second, which then allows you to compare one download to the next for consistent operation.
Also, keep in mind that since you are downloading a file and writing to a local disk for (presumably) multiple users on a host, you risk turning your local file system into a bottleneck. You can see the same effect if you turn logging on all users up to the highest level and run your test: the wait for lock and wait for write, plus the actual writing of data, become a drag anchor on the performance of your virtual users. This is why the recommended log level is "log on error", or to send the error to the output window of the controller via lr_output_message() or lr_vuser_status_message(). Consider a control load generator of the same hardware definition as the others with only a single virtual user of this type on it. If the control group and the global group degrade together, then you have an app issue. If your control user does not degrade but your other users do, then you have a test-bed-induced influence on your results.
These are all issues independent of the tool you are using for the test.
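Coming back to the normalized bytes-per-second idea, here is a minimal sketch of recording it as a custom datapoint. It assumes a classic Web (HTTP/HTML) vuser where web_get_int_property() is available; in a TruClient script you would instead pull the equivalent values from the request log as described above.
Action()
{
    int size_bytes, time_ms;

    /* ... the step that triggers the export/download goes here ... */

    /* Both properties refer to the last step executed. */
    size_bytes = web_get_int_property(HTTP_INFO_DOWNLOAD_SIZE);
    time_ms    = web_get_int_property(HTTP_INFO_DOWNLOAD_TIME);  /* milliseconds */

    if (time_ms > 0)
        /* Normalized throughput, comparable across different file sizes. */
        lr_user_data_point("export_bytes_per_sec",
                           (double)size_bytes * 1000.0 / (double)time_ms);
    else
        lr_output_message("Download time was 0 ms; datapoint skipped");

    return 0;
}
The datapoint then shows up in Analysis alongside your transaction response times.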

set network interface counters (rx, tx, packets, bytes)

Is it possible to SET the interface statistics in Linux after it's been brought up? I'm dealing with rrdtool (mrtg), which gets upset by a daily ifdown and ifup that brings the interface counters back to zero. Ideally I would like to continue counting from where I left off, and setting the interface values to what they were before the interface went down seems to be the easiest path.
I checked writing to /sys/class/net/ax0/statistics/rx_packets but that gives a Permission Denied error.
netstat, ifup, ifconfig and friends don't seem to support changing these values either.
Anything else I can try?
You can't set the kernel counters, no - but do you really need to?
MRTG usually graphs a rate, based on the difference between samples. So your MRTG/RRD will store packets-per-second values every cycle (usually 5 min, but maybe 1 min). When your device resets the counters, MRTG will see the value apparently go backwards, which will be discounted as out of range, so you get one failed sample. But the next sample will work, and a new rate will be given.
If you're getting a big spike in the MRTG graph at the point of the reset, this will be due to an incorrect 'counter rollover' detection. You can prevent this by either setting the MRTG AbsMax setting (to prevent this high value from being valid) or (better) by using SNMPv2 counters (where a reset is more obvious).
If you set your RRD file to have a large enough heartbeat and XFF, then this one missing sample will be interpolated, and so your graphs (which, remember, show the rate rather than the total) will continue to look fine.
Should you need the total, it can be derived by sum(rate x interval), which is done automatically by the Routers2 frontend for MRTG/RRD.
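To illustrate that last point, here is a minimal C sketch (not MRTG code; the rates and the 300-second step are made up) of reconstructing an approximate total from stored per-second rates:
#include <stdio.h>

int main(void)
{
    /* Per-second rates as MRTG/RRD would store them, one per 300 s step. */
    const double rates_bps[] = { 1200.0, 1350.0, 980.0, 1100.0 };
    const double step_s = 300.0;                    /* sampling interval */
    const int n = sizeof rates_bps / sizeof rates_bps[0];
    double total_bytes = 0.0;
    int i;

    for (i = 0; i < n; i++)
        total_bytes += rates_bps[i] * step_s;       /* sum(rate x interval) */

    printf("approximate total: %.0f bytes\n", total_bytes);
    return 0;
}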

PERF STAT does not count memory-loads but counts memory-stores

Linux Kernel : 4.10.0-20-generic (also tried this on 4.11.3)
Ubuntu : 17.04
I have been trying to collect stats of memory accesses using perf stat. I am able to collect stats for memory-stores, but the count for memory-loads returns a value of 0.
Below are the details for memory-stores:
perf stat -e cpu/mem-stores/u ./libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
Performance counter stats for './libquantum_base.arnab 100':
158,115,510 cpu/mem-stores/u
0.559922797 seconds time elapsed
For memory-loads, I get a count of 0, as can be seen below:
perf stat -e cpu/mem-loads/u ./libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
Performance counter stats for './libquantum_base.arnab 100':
0 cpu/mem-loads/u
0.563806170 seconds time elapsed
I cannot understand why this does not count properly. Should I use a different event to get proper data?
The mem-loads event is mapped to the MEM_TRANS_RETIRED.LOAD_LATENCY_GT_3 performance monitoring unit event on Intel processors. The events MEM_TRANS_RETIRED.LOAD_LATENCY_* are special and can only be counted by using the p modifier. That is, you have to specify mem-loads:p to perf to use the event correctly.
MEM_TRANS_RETIRED.LOAD_LATENCY_* is a precise event and it only makes sense to be counted at the precise level. According to this Intel article (emphasis mine):
When a user elects to sample one of these events, special hardware is used that can keep track of a data load from issue to completion. This is more complicated than simply counting instances of an event (as with normal event-based sampling), and so only some loads are tracked. Loads are randomly chosen, the latency determined for each, and the correct event(s) incremented (latency >4, >8, >16, etc). Due to the nature of the sampling for this event, only a small percentage of an application's data loads can be tracked at any one time.
As you can see, MEM_TRANS_RETIRED.LOAD_LATENCY_* by no means counts the total number of loads, and it is not designed for that purpose at all.
If you want to determine which instructions in your code are issuing load requests that take more than a specific number of cycles to complete, then MEM_TRANS_RETIRED.LOAD_LATENCY_* is the right performance event to use. In fact, that is exactly the purpose of perf-mem, and it achieves this purpose by using this event.
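If that is what you are after, a hedged example (assuming your CPU and kernel support precise memory sampling), using the binary from the question, would be:
perf mem record ./libquantum_base.arnab 100
perf mem report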
If you want to count the total number of load uops retired, then you should use L1-dcache-loads, which is mapped to the MEM_UOPS_RETIRED.ALL_LOADS performance event on Intel processors.
On the other hand, mem-stores and L1-dcache-stores are mapped to the exact same performance event on all current Intel processors, namely, MEM_UOPS_RETIRED.ALL_STORES, which does count all retired store uops.
So, in summary, if you are using perf-stat, you should (almost) always use L1-dcache-loads and L1-dcache-stores to count retired loads and stores, respectively. They are mapped to the same raw events you used in the answer you posted, but are more portable because they also work on AMD processors.
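For the program from the question, that would look something like this (the exact counts will of course differ between runs):
perf stat -e L1-dcache-loads:u,L1-dcache-stores:u ./libquantum_base.arnab 100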
I have used a Broadwell (CPU E5-2620) server machine to collect all of the events below.
To collect memory-load events, I had to use a numeric event value. I basically ran the command below:
./perf record -e "r81d0:u" -c 1 -d -m 128 ../../.././libquantum_base 20
Here r81d0 represents the raw event for counting "memory loads amongst all instructions retired", and the u modifier restricts counting to user space.
The following command, on the other hand,
./perf record -e "r82d0:u" -c 1 -d -m 128 ../../.././libquantum_base 20
has "r82d0:u" as a raw event representing "memory stores amongst all instructions retired in userspace".

Explain the difference in how MRTG measures incoming data

Everyone knows that MRTG needs at least one value to be passed on its input.
In its per-target options, MRTG has 'gauge', 'absolute', and a default (no option) behaviour for what to do with incoming data, or how to count it.
Let's look at an elementary yet popular example:
We pass cumulative data from the network interface statistics of how many packets were received by the interface.
We take it from /proc/net/dev or look at the ifconfig output for a certain network interface. The number of received bytes increases every time; it's cumulative.
So, as I imagine it, there could be two types of statistics:
1. How fast the value changes over the time interval. In other words, activity.
2. A simple, ever-growing graph that just plots every new value each minute (or any other time interval).
The first graph will be jumpy (activity). The second will just keep growing.
I have read rrdtool's and MRTG's docs twice and can't understand which of the options mentioned above counts what.
I suppose (I am not sure) that 'gauge' draws values as-is, without any differentiation (good for measuring how much memory or CPU is used every 5 minutes), and that the default and 'absolute' behaviours try to calculate the rate between neighbouring measurements, but what is the difference between the last two?
Can you explain, in a simple manner, which behaviour corresponds to which of the three possible options?
Thanks in advance.
MRTG assumes that everything is being measured as a rate (even if it isn't a rate).
Type 'gauge' assumes that you have already calculated the rate; thus, the provided value is stored as-is (after Data Normalisation). This is appropriate for things like CPU usage.
Type 'absolute' assumes the value passed is the count since the last update. Thus, the value is divided by the number of seconds since the last update to get a rate in thingies per second. This is rarely used, and only for certain unusual data sources that reset their value on being read - eg, a script that counts the number of lines in a log file, then truncates the log file.
Type 'counter' (the default) assumes the value passed is a constantly growing count, possibly one that wraps around at 32 or 64 bits. The difference between the value and its previous value is divided by the number of seconds since the last update to get a rate in thingies per second (see the sketch at the end of this answer). If it sees the value decrease, it will assume a counter wraparound at 32 or 64 bits. This is appropriate for something like network traffic counters, which is why it is the default behaviour (MRTG was originally written for network traffic graphs).
Type 'derive' is like 'counter', but will allow the counter to decrease (resulting in a negative rate). This is not possible directly in MRTG but you can manually create the necessary RRD if you want.
All types subsequently perform Data Normalisation to adjust the timestamp to a multiple of the Interval. This will be more noticeable for Gauge types where the value is small than for counter types where the value is large.
For information on this, see Alex van der Bogaerdt's excellent tutorial
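To make the 'counter' arithmetic concrete, here is a minimal C sketch (not MRTG source; a 32-bit counter and a 300-second step are assumed) of deriving a rate from two successive readings:
#include <stdint.h>
#include <stdio.h>

/* Derive a per-second rate from two successive counter readings,
 * handling a single 32-bit wraparound the way a 'counter' data
 * source would. */
static double counter_rate(uint32_t prev, uint32_t curr, double seconds)
{
    uint64_t delta;

    if (curr >= prev)
        delta = (uint64_t)curr - prev;
    else
        delta = ((uint64_t)1 << 32) - prev + curr;  /* assume one wrap at 2^32 */

    return (double)delta / seconds;
}

int main(void)
{
    /* Example: the counter wrapped between two 300-second samples. */
    printf("%.2f per second\n", counter_rate(4294960000u, 5000u, 300.0));
    return 0;
}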

Calculate latency for touch screen UI running on ARM controller board running Linux

I have an embedded board with an ARM controller that runs Linux as its OS and has a touch-based screen. The data for the screen is taken from the frame buffer (/dev/fb0). Is there any way to measure the response time of switching between two UI screens when an option is selected by touch?
There are 3 latencies involved in the above scenario:
1. Time taken for the touchscreen to register the finger and raise an input-event.
Usually a few milliseconds.
Enable FTRACE and log the following with timestamps
-- ISR
-- Entry of Bottom-half
-- Invoking of input_report()
2. Time taken by the app responsible for the GUI to update it.
Depending upon the app/framework, usually the most significant contributor to latency.
Add normal console logs with timestamps in the GUI app's code
-- upon receiving the input event
-- just before the command to modify the GUI (see the sketch after this list)
3. The time taken by the display to update.
Usually within 15-30 milliseconds
The final latency is the sum of the above 3 latencies.
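For item 2, a minimal sketch of such timestamped logging in the GUI application (plain C; the handler name and the places it is called from are placeholders for whatever your framework actually provides) could look like this:
#include <stdio.h>
#include <time.h>

/* Print a monotonic timestamp with a label so the delay between
 * "input event received" and "GUI update issued" can be read straight
 * from the console log. */
static void log_ts(const char *label)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    printf("[%ld.%06ld] %s\n", (long)ts.tv_sec, ts.tv_nsec / 1000, label);
}

/* Hypothetical touch handler in the GUI app. */
void on_touch_event(void)
{
    log_ts("input event received");

    /* ... decide which screen to switch to ... */

    log_ts("about to modify the GUI");
    /* ... write the new frame to /dev/fb0 or call the toolkit's redraw ... */
}

int main(void)
{
    on_touch_event();   /* simulate one touch for demonstration */
    return 0;
}
Correlating these timestamps with the FTRACE log from item 1 and the display's update time from item 3 gives the end-to-end figure.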
