Assume I have two variants of my compiled program, ./foo and ./bar, and I want to find out if bar is indeed faster.
I can compare runtimes by running time ./foo and time ./bar, but the numbers vary too much to get a meaningful result here.
What is the quickest way to get a statistically sound comparison of two command-line programs' execution times, e.g. one that also tells me about the variance of the measurements?
The Python module timeit also provides a simple command-line interface, which is already much more convenient than issuing time commands repeatedly:
$ python -m timeit -s 'import os' 'os.system("./IsSpace-before")'
10 loops, best of 3: 4.9 sec per loop
$ python -m timeit -s 'import os' 'os.system("./IsSpace-after")'
10 loops, best of 3: 4.9 sec per loop
The timeit module does not calculate averages and variances, but simply takes the minimum, on the basis that measurement errors only ever increase the measured time.
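If you do want the mean and variance rather than just the minimum, a small script along these lines will do (a minimal sketch; the binary names, run count, and output handling are placeholders to adapt):

import subprocess
import statistics
import time

def bench(cmd, runs=10):
    # Run the command several times and collect wall-clock samples in seconds.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples

for cmd in (["./foo"], ["./bar"]):
    s = bench(cmd)
    print("%s: mean %.3f s, stdev %.3f s, min %.3f s"
          % (cmd[0], statistics.mean(s), statistics.stdev(s), min(s)))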
Related
I'm trying to find out how I can get/calculate the CPU utilization of a specific process over X amount of time (I write my code in Python on a Linux-based system).
What I want to get, for example, is the average CPU of a process over the last hour/day/10 minutes...
Is there a command or a calculation I can run?
*I can't run a command like top in the background for X time and calculate the CPU afterwards; I need it to be a single set of commands or a calculation.
I researched the top command but didn't find useful info for my case.
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu gives the average consumption over the process lifetime.
Can there be a way to use uptime or /proc/[pid]/stat to calculate this?
Thanks,
What about using pidstat?
$ pidstat -p 12345 10
This will output the stats for PID 12345 every 10 seconds, including CPU%.
From there you can run it in the background and redirect the output to a file:
$ pidstat -p 12345 10 > my_pid_stats.txt &
Here is a link with several examples. There is a lot of flexibility in the output, so you can probably customize it to better suit your needs:
https://www.thegeekstuff.com/2014/11/pidstat-examples/
pidstat is part of the sysstat package on Ubuntu, in case you decide to install it.
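If you would rather avoid installing sysstat, the /proc/[pid]/stat route mentioned in the question also works: sample the process's utime + stime twice and divide the delta by the elapsed wall-clock time. A rough sketch (the PID and interval are placeholders, and a command name containing spaces would need more careful parsing):

import os
import time

def cpu_percent(pid, interval=10.0):
    # Approximate CPU% of a process over `interval` seconds using /proc/[pid]/stat.
    ticks_per_sec = float(os.sysconf('SC_CLK_TCK'))

    def cpu_seconds():
        with open('/proc/%d/stat' % pid) as f:
            fields = f.read().split()
        # utime and stime are the 14th and 15th fields (0-indexed 13 and 14).
        return (int(fields[13]) + int(fields[14])) / ticks_per_sec

    before = cpu_seconds()
    time.sleep(interval)
    after = cpu_seconds()
    return 100.0 * (after - before) / interval

print(cpu_percent(12345, interval=10))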
I am trying to run a MiniZinc model with an OSICBC solver via bash, with the following command-line arguments (subject to a time limit of 30000 ms, i.e. 30 s):
minizinc --solver osicbc model.mzn data.dzn --time-limit 30000 --output-time
But for this run, the entire process, from executing the command to getting the output, takes about a minute, and the output shows "Time Elapsed: 36.21s" at the end.
Is this the right approach to imposing a time limit on running this model, where the total time taken is measured from when the command is invoked to when the output appears in my terminal?
The --time-limit command line flag was introduced in MiniZinc 2.2.0 to allow the user to restrict the combined time that the compiler and the solver take. It also introduced --solver-time-limit to just limit the solver time.
Note that minizinc will allow the solver some extra time to output its final solution.
If you find that these flags do not limit the solver to the specified time, i.e. it is not stopped within a second or so of the given limit, then this would suggest a bug, and I would invite you to file a bug report: https://github.com/MiniZinc/libminizinc/issues
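As a sanity check that the limit is honoured, you can also wrap the call in an external wall-clock timeout; this is just an illustration (the 45-second outer timeout is an arbitrary margin on top of the 30-second limit):

import subprocess
import time

cmd = ["minizinc", "--solver", "osicbc", "model.mzn", "data.dzn",
       "--time-limit", "30000", "--output-time"]
start = time.perf_counter()
try:
    # Allow a generous margin over the 30 s limit for flattening and printing.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=45)
    print(result.stdout)
except subprocess.TimeoutExpired:
    print("minizinc ran well past its own time limit")
print("wall-clock: %.1f s" % (time.perf_counter() - start))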
I use a time.sleep(0.0000001) in loops in a multithreading application because otherwise it won't respond to a thread.interrupt_main() invoked in a backdoor server thread.
I wanted the sleep to be as short as possible so that the task runs as fast as possible, yet still stays responsive to the interrupt.
It worked fine with Python 2.7 (2.7.9 / Anaconda 2.2.0). With Python 3.5.1 (Anaconda 4.1.0) on the same machine, the sleep lasts much longer, slowing everything down significantly.
Further investigation in IPython showed the following in Python 2:
%timeit time.sleep(0.0000001)
resulted in: 1000000 loops, best of 3: 3.72 µs per loop
%timeit time.sleep(0.000000)
resulted in: 1000000 loops, best of 3: 3.86 µs per loop
In Python 3 this was different:
%timeit time.sleep(0.0000001)
resulted in: 100 loops, best of 3: 4 ms per loop
%timeit time.sleep(0.000000)
resulted in: 1000000 loops, best of 3: 7.87 µs per loop
I know about the system-dependent resolution of time.sleep(), which is definitely larger than 0.0000001. So what I'm doing is basically using time.sleep() as an interruption point.
What explains the difference of a factor of about 1000 between Python 2 and Python 3?
Can it be changed?
Is there an alternative way to make an application/thread more responsive to thread.interrupt_main()?
Edit:
My first approach of reducing the time parameter showed a time reduction in Python 2 for values less than 0.000001, which didn't work in Python 3.
A value of 0 seems to work now for both versions.
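For reference, the pattern described above boils down to something like the following (a minimal sketch; do_some_work_step and the 5-second trigger stand in for the real task and the backdoor server):

import threading
import time
import _thread   # the module is called `thread` in Python 2

def do_some_work_step():
    pass   # placeholder for one iteration of the real task

def backdoor():
    time.sleep(5)                # stands in for a command arriving on the backdoor server
    _thread.interrupt_main()     # raises KeyboardInterrupt in the main thread

threading.Thread(target=backdoor, daemon=True).start()
try:
    while True:
        do_some_work_step()
        time.sleep(0)            # yield so the interrupt is delivered promptly
except KeyboardInterrupt:
    print("main loop interrupted")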
When running a benchmark, e.g. Dhrystone, on the C++ emulator with the command:
make output/dhrystone.riscv.out
as described at http://riscv.org/download.html#tab_rocket, I get the following output.
When running it for the first time:
Microseconds for one run through Dhrystone: 1064
Dhrystones per Second: 939
cycle = 533718
instret = 148672
and the second time:
Microseconds for one run through Dhrystone: 1064
Dhrystones per Second: 939
cycle = 533715
instret = 148672
Why do the cycle counts differ? Shouldn't they be exactly the same? I have tried this with other benchmarks too and saw even larger deviations. If this is normal, where do the deviations come from?
There are small amounts of nondeterminism from randomly initialized registers (e.g., the clock that is recovered by the HTIF is initialized to a random phase). It doesn't seem like these minor deviations would impact any performance benchmarking.
If you need identical results each time (e.g., for verification?), you could modify the emulator code to initialize registers to some known value each time.
While benchmarking some Python code, I noticed something strange. I used the following function to measure how long it took to iterate through an empty for loop:
import time

def f(n):
    t1 = time.time()
    for i in range(n):
        pass
    print(time.time() - t1)
f(10**6) prints about 0.035, f(10**7) about 0.35, f(10**8) about 3.5, and f(10**9) about 35. But f(10**10)? Well over 2000. That's certainly unexpected: why would it take over 60 times as long to iterate through 10 times as many elements? What is it about Python's for loops that causes this? Is this Python-specific, or does it occur in a lot of languages?
When you get above 10^9 you go out of the 32-bit integer range. Python 3 then transparently moves you onto arbitrary-precision integers, which are much slower to allocate and use.
In general, working with such big numbers is one of the areas where Python 3 is a lot slower than Python 2 (which at least had fast 64-bit integers on many systems). On the plus side, it makes Python easier to use, with fewer overflow-type errors.
Some accurate timings using timeit show the times actually increase roughly in line with the input size, so your timings seem to be quite a way off:
In [2]: for n in [10**6,10**7,10**8,10**9,10**10]:
   ...:     %timeit f(n)
   ...:
10 loops, best of 3: 22.8 ms per loop
1 loops, best of 3: 226 ms per loop # roughly ten times previous
1 loops, best of 3: 2.26 s per loop # roughly ten times previous
1 loops, best of 3: 23.3 s per loop # roughly ten times previous
1 loops, best of 3: 4min 18s per loop # roughly ten times previous
Using xrange and Python 2 we see roughly the same ratio; obviously Python 2 is much faster overall, due to the fact that Python 3's int is what used to be Python 2's long:
In [5]: for n in [10**6,10**7,10**8,10**9,10**10]:
   ...:     %timeit f(n)
   ...:
100 loops, best of 3: 11.3 ms per loop
10 loops, best of 3: 113 ms per loop
1 loops, best of 3: 1.13 s per loop
1 loops, best of 3: 11.4 s per loop
1 loops, best of 3: 1min 56s per loop
The actual difference in run time seems to be related more to the size of Windows' long than directly to Python 3. The difference is marginal on Unix, which handles longs quite differently from Windows, so this is a platform-specific issue at least as much as a Python one.
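To see where the platform's native integer boundary sits relative to 10**10, a quick check like this illustrates the point (nothing here is specific to the question's code):

import sys

print(sys.maxsize)         # Py_ssize_t limit: 2**63 - 1 on a 64-bit build, 2**31 - 1 on a 32-bit one
print(2**31 - 1)           # a 32-bit signed C long tops out here; Windows uses a 32-bit long even on 64-bit builds
print(10**10 > 2**31 - 1)  # True: 10**10 no longer fits in a 32-bit long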