I ran a script on Ubuntu and timed it:
$ time ./merger
./merger 0.02s user 0.03s system 99% cpu 0.050 total
It took less than a second.
But if I run it under Cygwin:
$ time ./merger
real 3m22.407s
user 0m0.367s
sys 0m0.354s
It took more than 3 minutes.
Why does this happen? What can I do to speed up execution under Cygwin?
As others have already mentioned, Cygwin's implementation of fork and process spawning on Windows in general are slow.
Using this fork() benchmark, I get the following results:
rr-@cygwin:~$ ./test 1000
Forked, executed and destroyed 1000 processes in 5.660011 seconds.
rr-@arch:~$ ./test 1000
Forked, executed and destroyed 1000 processes in 0.142595 seconds.
rr-@debian:~$ ./test 1000
Forked, executed and destroyed 1000 processes in 1.141982 seconds.
Using time (for i in {1..10000};do cat /dev/null;done) to benchmark process-spawning performance, I get the following results:
rr-@work:~$ time (for i in {1..10000};do cat /dev/null;done)
(...) 19.11s user 38.13s system 87% cpu 1:05.48 total
rr-@arch:~$ time (for i in {1..10000};do cat /dev/null;done)
(...) 0.06s user 0.56s system 18% cpu 3.407 total
rr-@debian:~$ time (for i in {1..10000};do cat /dev/null;done)
(...) 0.51s user 4.98s system 21% cpu 25.354 total
Hardware specifications:
cygwin: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
arch: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
debian: Intel(R) Core(TM)2 Duo CPU T5270 @ 1.40GHz
So, as you can see, no matter what you run, Cygwin performs worse. It loses hands down even to weaker hardware (Cygwin vs. Debian in this benchmark, as per this comparison).
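Since the overhead is per process rather than per byte of work, the usual mitigation (assuming ./merger is a shell script rather than a compiled binary) is to avoid launching external commands inside loops and lean on shell builtins instead. A minimal sketch of the difference, reusing the loop from the benchmark above:

# Fork-heavy: spawns an external 'cat' process 10000 times, paying the
# full Cygwin process-creation cost on every iteration.
for i in {1..10000}; do cat /dev/null; done

# Fork-free: the ':' builtin plus a redirection does the equivalent "work"
# without creating a single new process.
for i in {1..10000}; do : </dev/null; done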
I want to figure out how much memory a specific command uses, but I'm not sure how to check its peak memory. Is there anything like time [command] but for memory?
Basically, I'm going to start an interactive job using SLURM, test the command for a program I need on a single sample, see how much memory was used, and then submit a bunch of jobs using that info.
Yes: GNU time is a program that monitors other programs and reports the Maximum resident set size. It is not to be confused with the time Bash keyword, which only shows real/user/sys times. On my Arch Linux you have to install time with pacman -S time; it's a separate package.
$ command time -v echo 1
1
Command being timed: "echo 1"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1968
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 90
Voluntary context switches: 1
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Note:
$ type time
time is a shell keyword
$ time -V
bash: -V: command not found
real 0m0.002s
user 0m0.000s
sys 0m0.002s
$ command time -V
time (GNU Time) 1.9
$ /bin/time -V
time (GNU Time) 1.9
$ /usr/bin/time -V
time (GNU Time) 1.9
I am trying to profile my userspace program on an Arria 10 FPGA board (with 2 ARM Cortex-A9 CPUs), which has PMU support. I am running Wind River Linux version 9.x. I built my kernel with almost all of the CONFIG_ options people suggested on the internet. My program is also compiled with the -fno-omit-frame-pointer and -g options.
What I see is that 'perf record' doesn't generate any samples at all. The 'perf stat true' output looks valid, though (not sure what to make of it). Does anyone have suggestions or ideas as to why no samples are being generated?
~: perf record --call-graph dwarf -- my_app
^C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data ]
~: perf report -g graph --no-children
Error:
The perf.data file has no samples!
To display the perf.data header info, please use --header/--header-only options.
~: perf stat true
Performance counter stats for 'true':
1.095300 task-clock (msec) # 0.526 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
22 page-faults # 0.020 M/sec
1088056 cycles # 0.993 GHz
312708 instructions # 0.29 insn per cycle
29159 branches # 26.622 M/sec
16386 branch-misses # 56.20% of all branches
0.002082030 seconds time elapsed
I don't use a VM in this setup. Arria 10 is an Intel FPGA with 2 ARM CPUs that support a PMU.
Edit:
1. I realize now that the ARM CPU does have HW PMU support (contrary to what I mentioned earlier). Even with HW PMU support, I am not able to run 'perf record' successfully.
This is an old question, but for people who find this via search:
perf record -e cpu-clock <command>
works for me. The problem seems to be that the default event (cycles) is not available.
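For completeness, a sketch of the whole workflow with the software event substituted in (my_app stands for your binary, as in the question):

~: perf list    # check which events this kernel/PMU combination actually exposes
~: perf record -e cpu-clock --call-graph dwarf -- my_app
~: perf report -g graph --no-children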
I'm working on a performance tool, and I'm interested in the total disk I/O a single process has done since it started. I have the process PID, and I can easily get the current I/O rate with tools like iotop or sar, but not the total I/O.
Is this even logged in Linux and is there a way to get it?
/Mpresmann
You can read the /proc/<PID>/io file for a specific process:
$ sudo cat /proc/1/io
rchar: 144440702940
wchar: 4615239440674
syscr: 156954128
syscw: 173077623
read_bytes: 113700176646
write_bytes: 100325525146
cancelled_write_bytes: 2596581376
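Note that read_bytes and write_bytes are the counters for I/O that actually hit the storage layer, while rchar and wchar count all bytes passed through read()/write(), including data served from the page cache. A small sketch that pulls just the storage counters for a given PID (reading another user's /proc/<PID>/io generally requires root):

$ awk '/^read_bytes:/ {r=$2} /^write_bytes:/ {w=$2} END {print "read_bytes=" r, "write_bytes=" w}' /proc/<PID>/io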
I am running a script that loads big files. I ran the same script on a single-core OpenSuSE server and on a quad-core PC. As expected, it is much faster on my PC. But the script slows the server down and makes it impossible to do anything else on it.
My script is
for 100 iterations
Load saved data (about 10 MB)
time myscript (on the PC)
real 0m52.564s
user 0m51.768s
sys 0m0.524s
time myscript (on the server)
real 32m32.810s
user 4m37.677s
sys 12m51.524s
I wonder why "sys" is so high when I run the code on the server. I used the top command to check the memory and CPU usage.
It seems there is still free memory, so swapping is not the reason. %sy is very high, which is probably why the server is slow, but I don't know what is causing %sy to be so high. The process using the highest percentage of CPU (99%) is "myscript". %wa is zero in the screenshot, but sometimes it gets very high (50%).
When the script is running, the load average is greater than 1, but I have never seen it as high as 2.
I also checked my disk:
strt:~ # hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 16480 MB in 2.00 seconds = 8247.94 MB/sec
Timing buffered disk reads: 20 MB in 3.44 seconds = 5.81 MB/sec
john@strt:~> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 245G 102G 131G 44% /
udev 4.0G 152K 4.0G 1% /dev
tmpfs 4.0G 76K 4.0G 1% /dev/shm
I have checked these things, but I am still not sure what the real problem on my server is or how to fix it. Can anyone identify a probable reason for the slowness? What could be the solution?
Or is there anything else I should check?
Thanks!
You're getting high sys activity because loading the data requires system calls, which execute in the kernel. It may be possible to resolve your slowness problem without upgrading the server: you can modify the scheduling priority. See the man pages for nice and renice. See here, and especially:
Niceness values range from -20 (the highest priority, lowest niceness) to 19 (the lowest priority, highest niceness).
$ ps -lp 941
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 941 1 0 70 -10 - 1713 poll_s ? 00:00:00 sshd
$ nice -n 19 ./test.sh
My niceness value is 19
$ renice -n 10 -p 941
941 (process ID) old priority -10, new priority 10
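If you want to confirm where that sys time is actually going before renicing anything, two quick checks (myscript stands for the script from the question):

$ strace -c -f ./myscript    # per-syscall time/count summary, following forked children
$ vmstat 1                   # watch the system-wide us/sy/wa split while the script runs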
How can I get the CPU user time and system time for each CPU on AIX?
I know I can get these values from cat /proc/stat on a Linux machine, and from pstat_getprocessor() on an HP-UX machine. Is there a way to get the same metric on an AIX machine?
$ cat /proc/stat
...
cpu 23697394 7969 2744135 4505191649 2958605 190 17883 0 0
cpu0 12511394 4575 1520243 2251753159 1480624 137 10580 0 0
cpu1 11186000 3394 1223891 2253438490 1477980 53 7302 0 0
...
mpstat provides these metrics; either parse its output or figure out how/where it finds them.
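If you just need the numbers rather than a parser, sar on AIX can also report per-processor figures. A rough sketch (the exact column layout can vary between AIX releases, so check the header line on your system before scripting around it):

$ sar -P ALL 1 1    # one 1-second sample; %usr and %sys per logical CPU
$ mpstat 1 1        # one interval, one report, covering all logical CPUs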