Benchmarking two binary files on Linux

I have two binary files. These files were not made by me.
I need to benchmark these files to see how fast and how well they work.
I tried to use the time command, but the big problems are:
running both files at the same time
stopping both running files at the same time
using the time command with these two files.
If I use this solution Benchmarking programs on Linux and change the order of the files in the time command, the output changes:
0m0.010s file1
0m0.017s file2
After changing the order in the time command:
0m0.002s file2
0m0.013s file1
Thank you. Regards.

There are many ways to do what you want, but probably the simplest is to run one program in a loop many times (say 1000 or more), so that the total execution time becomes something easy to measure - say, 50 seconds. Then repeat the same for the other one.
This gives you much more accurate measurements and also averages out the inherent jitter between runs.
Having said that, note that with run times as low as the ones you observe, process start-up time may not be a small fraction of the total measurement. So, if you run a loop, be sure to account for the cost of starting a new process 1000 times.
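A minimal sketch of that loop, using bash's time keyword; /bin/true stands in for the binaries under test (substitute your own file1 and file2):

```shell
#!/usr/bin/env bash
# Baseline: loop overhead plus the cost of starting a trivial process
# 1000 times. Subtract this from the real measurements to estimate
# how much of the total is process start-up cost.
time for i in $(seq 1 1000); do
    /bin/true
done

# Then substitute the binaries under test and compare against the baseline:
#   time for i in $(seq 1 1000); do ./file1; done
#   time for i in $(seq 1 1000); do ./file2; done
```

Run each loop a few times and take the best result; the difference between a binary's loop time and the /bin/true baseline approximates the work the binary itself does.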

Related

How to find which part of a program takes the most time (including kernel time and context-switching)

When doing performance analysis of CPU-intensive work, I generally use sudo perf record -a -s -e cpu-clock -g <executable>. But that doesn't show how much time is spent on context switches or in the network stack, so I think it is not very helpful for a network-IO-bound task.
My specific case:
I have a program which loads data from GCS and deserializes it.
I have verified that the peak throughput can be >200MBps, but initialization takes 4.5 seconds, of which 1 second is spent on authentication and the other 3.5 seconds on actually loading the data.
The data initialization reads metadata for ~300 files, but reads only 1.25MB from each file.
That is the abstract context for my specific case. I want to know if there are any tools available in Linux (Ubuntu) to see which part of my program takes most of the time, so that I can make changes to it.
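Not the asker's perf workflow, but one cheap first check for how context-switch-heavy a process is: the kernel exposes per-process switch counts in /proc, which you can sample before and after a phase (here `sleep` stands in for the real program):

```shell
# Start the workload in the background (sleep is a stand-in for the
# real program) so we can inspect it while it runs.
sleep 1 &
pid=$!

# voluntary_ctxt_switches rises when the process blocks (e.g. waiting
# on network IO); nonvoluntary_ctxt_switches rises when the scheduler
# preempts it. Sample these at phase boundaries and diff the counts.
grep ctxt_switches "/proc/$pid/status"

wait "$pid"
```

For a timeline rather than raw counts, `sudo perf record -e sched:sched_switch -a -g` records every switch via the scheduler tracepoint, assuming perf is installed and you can run it as root.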

Running Pandoc command through subprocess.run()

I am running a Pandoc command through python's subprocess.run() and find the command runs very slowly compared to running it on the terminal. It is instantaneous on the terminal and takes 16 seconds in python.
The larger the files the more time it takes. My test file is only 4K and it takes 2 seconds to process it.
I am sending the command as a list like so:
['pandoc', '--defaults=defaults.yaml', '--bibliography', 'bibliography.bib']
In addition, without the --bibliography argument the command runs faster, probably due to less processing.
The other thing I find puzzling is that when I tried shell=True just for kicks, the running time that took 16 seconds on my files came down to 2 seconds. I know shell=True is a big no-no; what I am looking for is an explanation of why this command runs so slowly, and what I can do to remedy it.
Thanks in advance.
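One way to narrow this down (a sketch; `echo hi` stands in for the actual pandoc invocation) is to time the same command run three ways - directly in the shell, via subprocess.run with an argument list, and via shell=True - so the overhead of each path can be compared:

```shell
# Direct in the shell (the "instantaneous" baseline)
time echo hi

# Via Python's subprocess.run with an argument list (shell=False)
time python3 -c "import subprocess; subprocess.run(['echo', 'hi'])"

# Via shell=True, which hands the string to /bin/sh
time python3 -c "import subprocess; subprocess.run('echo hi', shell=True)"
```

Substitute the real `pandoc --defaults=defaults.yaml --bibliography bibliography.bib` invocation for `echo hi`. If only the list-form call is slow, the difference is likely in how the arguments or working directory reach pandoc (e.g. where defaults.yaml and bibliography.bib are resolved from), not in Python itself.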

What may slow down an ATA read-verify command sent to HDD on linux?

I am writing a C program to scan hard drives using the ATA read-verify (0x40) command on Linux, like what MHDD's scan does on DOS.
I issue the command using HDIO_DRIVE_TASK, and measure the ioctl's block time using CLOCK_MONOTONIC.
I run the program as root and have its ionice set to real-time, but the readouts are always larger than what MHDD shows. Also, MHDD's results don't change much, but my program's results often vary a lot.
I tried issuing the command twice for each block and measuring the block time of the second run.
This fixed part of the problem, but my results still vary a lot.
What factors may slow down my command? How should I avoid them?
P.S. I have some spare drives in different states of health for testing.

Serial Code Experiences a Big Difference in Running Time on a GPFS FS

I need to measure the wall time of a serial code running on our cluster. In exclusive mode, i.e., when no other user is using my node, the wall time of the code varies quite a lot, ranging from 2:30m to 3:20m. The code does the same thing in every run. I am wondering if the big variance in wall time is caused by the GPFS file system, since the code reads and writes files stored on a GPFS file system. My question is: is there a tool with which I can view the GPFS I/O performance and relate it to the performance of my code?
Thanks.
This is a very big question...we need to narrow it down a bit. So, let me ask some questions.
Let us see the time command output for a simple ls command.
$ time ls
real 0m0.003s
user 0m0.001s
sys 0m0.001s
Wall-clock time == real time, which in your case is varying. The next step of debugging is to ask: do user time and system time also vary? If the GPFS file system is inside the kernel and consumes varying time, you should see the sys time vary. If the sys time remains the same but the real time varies, then the program is spending its time sleeping on something. There are deeper ways to find the problem... but can you please clarify your question some more?
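A quick way to collect that real/user/sys breakdown over several runs (a sketch; the `mycode` function stands in for the real program) is to loop with bash's TIMEFORMAT set so each run prints one line:

```shell
#!/usr/bin/env bash
# Stand-in for the real program; replace the body with the actual binary.
mycode() { sleep 0.1; }

for i in 1 2 3; do
    # TIMEFORMAT controls the output of bash's `time` keyword:
    # %R = real, %U = user, %S = sys (all in seconds)
    TIMEFORMAT="run $i: real %R  user %U  sys %S"
    time mycode
done
```

If real varies across runs while user+sys stay flat, the variance is time spent waiting (IO, scheduling) rather than computing, which would point toward GPFS.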

Benchmark a linux Bash script

Is there a way to benchmark a bash script's performance? The script downloads a remote file and then calls multiple command-line programs to manipulate it. I would like to know (as much as possible):
Total time
Time spent downloading
Time spent on each command called
-=[ I think these could be wrapped in "time" calls right? ]=-
Average download speed
uses wget
Total Memory used
Total CPU usage
CPU usage per command called
I'm able to edit the bash script to insert any benchmark commands needed at specific points (i.e., between app calls). Not sure if some "top" ninja-ry could solve this or not. I was not able to find anything useful (at least to my limited understanding) in the man file.
I will be running the benchmarks on the OSX Terminal as well as Ubuntu (if either matters).
strace -o trace -c -Ttt ./scrip
-c counts the time, calls, and errors for each system call and prints a summary.
-T shows the time spent in each system call; -tt prefixes each line with the time of day, with microseconds.
-o saves the output to the file "trace".
You should be able to achieve this in a number of ways. One way is to use the time built-in for each command of interest and capture the results. You may have to be careful about any pipes and redirects.
You may also consider trapping the SIGCHLD, DEBUG, RETURN, ERR and EXIT signals and putting timing information in there, though you may not capture every result that way.
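As a sketch of the time-wrapping approach (sleep stands in for the script's real download and processing commands; adapt to your own script):

```shell
#!/usr/bin/env bash
# Phase timing with bash's SECONDS counter: measure the download phase
# by diffing SECONDS around it (replace sleep with the real wget call).
start=$SECONDS
sleep 0.2                      # e.g. wget "$URL" -O data.file
echo "download took $((SECONDS - start))s"

# Per-command timing with the time built-in: real/user/sys go to stderr.
time sleep 0.1                 # e.g. time some_tool data.file out.file
```

Bash's time output goes to stderr, so collect it with something like `{ time cmd; } 2>> timings.log` if the script's own output should stay clean.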
The concept of CPU usage per command won't give you anything useful; all commands use 100% of the CPU. Memory usage is something you can pull out, but you should look at
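One Linux-specific way to pull memory numbers out (a sketch, not from the answer above; sleep stands in for one of the script's real commands) is to read the kernel's peak-RSS counter from /proc while the command runs:

```shell
# Launch the command in the background so we can inspect it while it runs.
sleep 1 &                      # stand-in for a real command from the script
pid=$!

# VmHWM is the process's peak resident set size ("high water mark"),
# reported by the kernel in kB.
awk '/VmHWM/ {print "peak RSS:", $2, $3}' "/proc/$pid/status"

wait "$pid"
```

Note that /proc is Linux-only; on OSX, `/usr/bin/time -l` reports the maximum resident set size instead.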
If you want deep process statistics then you would want to use strace... See the strace(1) man page for details. I doubt that -Ttt, as suggested elsewhere, is useful: all that tells you is system-call times, and you want other process trace info.
You may also want to see ltrace and dstat tools.
A similar question is answered here: Linux benchmarking tools
