Is there an equivalent for time([some command]) for checking peak memory usage of a bash command? - linux

I want to figure out how much memory a specific command uses but I'm not sure how to check for the peak memory of the command. Is there anything like the time([command]) usage but for memory?
Basically, I'm going to have to run an interactive queue using SLURM, then test a command for a program I need to use for a single sample, see how much memory was used, then submit a bunch of jobs using that info.

Yes, time is the program that monitors programs and shows the Maximum resident set size. Not to be confused with time Bash builtin that only shows real/user/sys times. On my Arch Linux you have to install time with pacman -S time, it's a separate package.
$ command time -v echo 1
1
Command being timed: "echo 1"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1968
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 90
Voluntary context switches: 1
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Note:
$ type time
time is a shell keyword
$ time -V
bash: -V: command not found
real 0m0.002s
user 0m0.000s
sys 0m0.002s
$ command time -V
time (GNU Time) 1.9
$ /bin/time -V
time (GNU Time) 1.9
$ /usr/bin/time -V
time (GNU Time) 1.9

Related

.NET Core application on Linux slow to crash the first time

I'm running a .NET Core 2.0 console EXE on Ubuntu 16.04. The application terminates with an unhandled exception - which can be anything, e.g. this is enough:
static void Main(string[] args)
{
throw new ApplicationException("Test .NET core exception.");
}
The first time this happens on a given machine there is noticeable delay (2 seconds or so) between the time it prints
Unhandled Exception: System.ApplicationException: Test .NET core exception.
and the time it prints
Aborted (core dumped)
and the process dies. During this time the process uses all of one CPU. (No core files is actually created.)
If I run the application again there is no such delay any more. What is it doing during that first crash and how do I prevent this delay to let it crash quickly? The delay is only a second for a trivial application that just throws an exception, but can be 20-30 seconds for one that uses a lot of RAM.
Running with /usr/bin/time -v dotnet CoreConsoleApp1.dll the first time:
Unhandled Exception: System.ApplicationException: Test .NET core exception.
at CoreConsoleApp1.Program.Main(String[] args) in CoreConsoleApp1\Program.cs:line 20
Command terminated by signal 6
Command being timed: "dotnet CoreConsoleApp1.dll"
User time (seconds): 0.04
System time (seconds): 0.02
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.92
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 32388
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 193
Minor (reclaiming a frame) page faults: 2708
Voluntary context switches: 1358
Involuntary context switches: 29
Swaps: 0
File system inputs: 57736
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The second time:
Unhandled Exception: System.ApplicationException: Test .NET core exception.
at CoreConsoleApp1.Program.Main(String[] args) in CoreConsoleApp1\Program.cs:line 20
Command terminated by signal 6
Command being timed: "dotnet CoreConsoleApp1.dll"
User time (seconds): 0.03
System time (seconds): 0.00
Percent of CPU this job got: 26%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.16
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 29416
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2413
Voluntary context switches: 26
Involuntary context switches: 33
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The project was built in Visual Studio 2017 on Windows.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
$ uname -a
Linux evgeny-linux 4.4.0-1052-aws #61-Ubuntu SMP Mon Feb 12 23:05:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P

Why using pipe for sort (linux command) is slow?

I have a large text file of ~8GB which I need to do some simple filtering and then sort all the rows. I am on a 28-core machine with SSD and 128GB RAM. I have tried
Method 1
awk '...' myBigFile | sort --parallel = 56 > myBigFile.sorted
Method 2
awk '...' myBigFile > myBigFile.tmp
sort --parallel 56 myBigFile.tmp > myBigFile.sorted
Surprisingly, method1 takes 11.5 min while method2 only takes (0.75 + 1 < 2) min. Why is sorting so slow when piped? Is it not paralleled?
EDIT
awk and myBigFile is not important, this experiment is repeatable by simply using seq 1 10000000 | sort --parallel 56 (thanks to #Sergei Kurenkov), and I also observed a six-fold speed improvement using un-piped version on my machine.
When reading from a pipe, sort assumes that the file is small, and for small files parallelism isn't helpful. To get sort to utilize parallelism you need to tell it to allocate a large main memory buffer using -S. In this case the data file is about 8GB, so you can use -S8G. However, at least on your system with 128GB of main memory, method 2 may still be faster.
This is because sort in method 2 can know from the size of the file that it is huge, and it can seek in the file (neither of which is possible for a pipe). Further, since you have so much memory compared to these file sizes, the data for myBigFile.tmp need not be written to disc before awk exits, and sort will be able to read the file from cache rather than disc. So the principle difference between method 1 and method 2 (on a machine like yours with lots of memory) is that sort in method 2 knows the file is huge and can easily divide up the work (possibly using seek, but I haven't looked at the implementation), whereas in method 1 sort has to discover the data is huge, and it can not use any parallelism in reading the input since it can't seek the pipe.
I think sort does not use threads when read from pipe.
I have used this command for your first case. And it shows that sort uses only 1 CPU even though it is told to use 4. atop actually also shows that there is only one thread in sort:
/usr/bin/time -v bash -c "seq 1 1000000 | sort --parallel 4 > bf.txt"
I have used this command for your second case. And it shows that sort uses 2 CPU. atop actually also shows that there are four thread in sort:
/usr/bin/time -v bash -c "seq 1 1000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
In you first scenario sort is an I/O bound task, it does lots of read syscalls from stdin. In your second scenario sort uses mmap syscalls to read file and it avoids being an I/O bound task.
Below are results for the first and second scenarios:
$ /usr/bin/time -v bash -c "seq 1 10000000 | sort --parallel 4 > bf.txt"
Command being timed: "bash -c seq 1 10000000 | sort --parallel 4 > bf.txt"
User time (seconds): 35.85
System time (seconds): 0.84
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.43
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 9320
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2899
Voluntary context switches: 1920
Involuntary context switches: 1323
Swaps: 0
File system inputs: 0
File system outputs: 459136
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$ /usr/bin/time -v bash -c "seq 1 10000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
Command being timed: "bash -c seq 1 10000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
User time (seconds): 43.03
System time (seconds): 0.85
Percent of CPU this job got: 175%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1018004
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2445
Voluntary context switches: 299
Involuntary context switches: 4387
Swaps: 0
File system inputs: 0
File system outputs: 308160
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
You have more system calls, if you use the pipe.
seq 1000000 | strace sort --parallel=56 2>&1 >/dev/null | grep read | wc -l
2059
Without the pipe the file is mapped into memory.
seq 1000000 > input
strace sort --parallel=56 input 2>&1 >/dev/null | grep read | wc -l
33
Kernel calls are in most cases the bottle neck. That is the reason why sendfile has been invented.

calculating correct memory utilization from invalid output of pidstat

I used, pidstat -r -p <pid> <interval> >> log_Path/MemStat.CSV & command to to collect the memory stat.
after running this command I found that RSS VSZ %MEM values are increased continuously, which is not expected, as pidstat provides the the values considering the interval.
After searching on net I found that there is bug in pidstat and I need to update the syssat package.
(please refer last few statements by pidstat author on this link : http://sebastien.godard.pagesperso-orange.fr/tutorial.html)
Now, my question is, how do I calculate the correct %MEM utilization from the current output as we cannot run the test again.
sample output :
Time PID minflt/s majflt/s VSZ RSS %MEM
9:55:22 AM 18236 1280 0 26071488 119136 0.36
9:55:23 AM 18236 4273 0 27402768 126276 0.38
9:55:24 AM 18236 9831 0 27402800 162468 0.49
9:55:25 AM 18236 161 0 27402800 169092 0.51
9:55:26 AM 18236 51 0 27402800 175416 0.53
9:55:27 AM 18236 6859 0 27402800 198340 0.6
9:55:28 AM 18236 1440 0 27402800 203608 0.62
On the Sysstat tutorial page you refer to it is said:
I noticed that pidstat had a memory footprint (VSZ and RSS fields)
that was constantly increasing as the time went by. I quickly found
that I had forgotten to close a file descriptor in a function of my
code and that was responsible for the memory leak...!
So, the output of pidstat has never been invalid; on the contrary, the author wrote
pidstat helped me to detect a memory leak in the pidstat command
itself.

Why is the system CPU time (% sy) high?

I am running a script that loads big files. I ran the same script in a single core OpenSuSe server and quad core PC. As expected in my PC it is much more faster than in the server. But, the script slows down the server and makes it impossible to do anything else.
My script is
for 100 iterations
Load saved data (about 10 mb)
time myscript (in PC)
real 0m52.564s
user 0m51.768s
sys 0m0.524s
time myscript (in server)
real 32m32.810s
user 4m37.677s
sys 12m51.524s
I wonder why "sys" is so high when i run the code in server. I used top command to check the memory and cpu usage.
It seems there is still free memory, so swapping is not the reason. % sy is so high, its probably the reason for the speed of server but I dont know what is causing % sy so high. The process that is using highest percent of CPU (99%) is "myscript". %wa is zero in the screenshot but sometimes it gets very high (50 %).
When the script is running, load average is greater than 1 but have never seen to be as high as 2.
I also checked my disc:
strt:~ # hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 16480 MB in 2.00 seconds = 8247.94 MB/sec
Timing buffered disk reads: 20 MB in 3.44 seconds = 5.81 MB/sec
john#strt:~> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 245G 102G 131G 44% /
udev 4.0G 152K 4.0G 1% /dev
tmpfs 4.0G 76K 4.0G 1% /dev/shm
I have checked these things but I am still not sure what is the real problem in my server and how to fix it. Can anyone identify a probable reason for the slowness? What could be the solution?
Or is there anything else I should check?
Thanks!
You're getting a high sys activity because the load of the data you're doing takes system calls that happen in kernel. To resolve your slowness problems without upgrading the server might be possible. You can modify scheduling priority. See the man pages for nice and renice. See here and especially:
Niceness values range from -20 (the highest priority, lowest niceness) and 19 (the lowest priority, highest niceness).
$ ps -lp 941
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 941 1 0 70 -10 - 1713 poll_s ? 00:00:00 sshd
$ nice -n 19 ./test.sh
My niceness value is 19
$ renice -n 10 -p 941
941 (process ID) old priority -10, new priority 10

Getting CPU utilization information

How could I get the CPU utilization with time info of a process in linux? Basically I want to let my application run overnight. At the same time, I would like to monitor the CPU utilization during the period the application is run.
I tried top | grep appName >& log, it does not seem to return me anything in the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature; I would suggest either setting up the number of times manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4230 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours.
install sysstat package and run sar
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
use the top or watch command
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
Alternatively, you can launch your application with the time command, which will print the total amount of CPU time used by your program at the end of its execution. Just type time ./my_app instead of just ./my_app
For more info, man 1 time

Resources