How to measure CPU usage

How to measure CPU usage - linux

I would like to log CPU usage at a frequency of 1 second.
One possible way to do it is via vmstat 1 command.
The problem is that the time between each output is not always exactly one second, especially on a busy server. I would like to be able to output the timestamp along with the CPU usage every second. What would be a simple way to accomplish this, without installing special tools?

There are many ways to do that. Except top another way is to you the "sar" utility. So something like
sar -u 1 10
will give you the cpu utilization for 10 times every 1 second. At the end it will print averages for each one of the sys, user, iowait, idle
Another utility is the "mpstat", that gives you similar things with sar

Use the well-known UNIX tool top that is normally available on Linux systems:
top -b -d 1 > /tmp/top.log
The first line of each output block from top contains a timestamp.
I see no command line option to limit the number of rows that top displays.
Section 5a. SYSTEM Configuration File and 5b. PERSONAL Configuration File of the top man page describes pressing W when running top in interactive mode to create a $HOME/.toprc configuration file.
I did this, then edited my .toprc file and changed all maxtasks values so that they are maxtasks=4. Then top only displays 4 rows of output.
For completeness, the alternative way to do this using pipes is:
top -b -d 1 | awk '/load average/ {n=10} {if (n-- > 0) {print}}' > /tmp/top.log

You might want to try htop and atop. htop is beautifully interactive while atop gathers information and can report CPU usage even for terminated processes.

I found a neat way to get the timestamp information to be displayed along with the output of vmstat.
Sample command:
vmstat -n 1 3 | while read line; do echo "$(date --iso-8601=seconds) $line"; done
Output:
2013-09-13T14:01:31-0700 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
2013-09-13T14:01:31-0700 r b swpd free buff cache si so bi bo in cs us sy id wa
2013-09-13T14:01:31-0700 1 1 4197640 29952 124584 12477708 12 5 449 147 2 0 7 4 82 7
2013-09-13T14:01:32-0700 3 0 4197780 28232 124504 12480324 392 180 15984 180 1792 1301 31 15 38 16
2013-09-13T14:01:33-0700 0 1 4197656 30464 124504 12477492 344 0 2008 0 1892 1929 32 14 43 10

To monitor the disk usage, cpu and load i created a small bash scripts that writes the values to a log file every 10 seconds.
This logfile is processed by logstash kibana and riemann.
# #!/usr/bin/env bash
# Define a timestamp function
LOGPATH="/var/log/systemstatus.log"
timestamp() {
date +"%Y-%m-%dT%T.%N"
}
#server load
while ( sleep 10 ) ; do
echo -n "$(timestamp) linux::systemstatus::load " >> $LOGPATH
cat /proc/loadavg >> $LOGPATH
#cpu usage
echo -n "$(timestamp) linux::systemstatus::cpu " >> $LOGPATH
top -bn 1 | sed -n 3p >> $LOGPAT
#disk usage
echo -n "$(timestamp) linux::systemstatus::storage " >> $LOGPATH
df --total|grep total|sed "s/total//g"| sed 's/^ *//' >> $LOGPATH
done

Related

slurm: Is there any way to return unused core number?

As we know squeue returns the status of the running jobs.
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
130 debug run.sh user PD 0:00 1 (Resources)
131 debug run.sh user PD 0:00 1 (Resources)
128 debug 52546914 user R 7:28 1 node1
129 debug run.sh user R 0:02 1 node1
For example my core number is 2.
[Q] Is there any way to return only the unused core number? In the example, unused core number should return 0.
Should I write a parser for this in order to retrieve core number next to each R, add them, and subtract it from total core number as follows:
squeue | grep -P ' R ' | awk '{print $7}' | paste -sd+ - | bc

To know the number of core (CPUs) that are available in your cluster, you can use the sinfo command:
$ sinfo -o%C
CPUS(A/I/O/T)
0/1920/0/1920
You can retrieve the numbers into Bash variables easily with
IFS=/ read A I O T <<<$(sinfo -h -o%C)
After running the above command, A will contain the number of allocated cores, I will be the number of idle cores, O will hold the number of 'other' cores, i.e. drained, down, etc. and T will be the total number of cores in the system.
Note that in your question you talk about cores but actually compute the number of nodes. If what you want is the number of nodes, you can use:
$ sinfo -o%A
NODES(A/I)
0/80
See the sinfo man page for more details.

Filter Column Linux

this question will be easy for you I guess - but I'm a Linux "noob".
Given is the output from a Juniper-Router:
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
2434 root 96 0 96476K 14180K select 0:05 0.10% jdhcpd
Is it possible (I have a shell on the Device) to somehow filter the output to only show the WCPU Percentage? I want to make a script that restarts a service once a threshold is exceeded - but i need only the value itself - not all the "Username, PID" and stuff.

Simply with awk:
<Juniper-call> | awk 'NR==2{ print $9 }'
The above should print a value like:
0.10%

Why using pipe for sort (linux command) is slow?

I have a large text file of ~8GB which I need to do some simple filtering and then sort all the rows. I am on a 28-core machine with SSD and 128GB RAM. I have tried
Method 1
awk '...' myBigFile | sort --parallel = 56 > myBigFile.sorted
Method 2
awk '...' myBigFile > myBigFile.tmp
sort --parallel 56 myBigFile.tmp > myBigFile.sorted
Surprisingly, method1 takes 11.5 min while method2 only takes (0.75 + 1 < 2) min. Why is sorting so slow when piped? Is it not paralleled?
EDIT
awk and myBigFile is not important, this experiment is repeatable by simply using seq 1 10000000 | sort --parallel 56 (thanks to #Sergei Kurenkov), and I also observed a six-fold speed improvement using un-piped version on my machine.

When reading from a pipe, sort assumes that the file is small, and for small files parallelism isn't helpful. To get sort to utilize parallelism you need to tell it to allocate a large main memory buffer using -S. In this case the data file is about 8GB, so you can use -S8G. However, at least on your system with 128GB of main memory, method 2 may still be faster.
This is because sort in method 2 can know from the size of the file that it is huge, and it can seek in the file (neither of which is possible for a pipe). Further, since you have so much memory compared to these file sizes, the data for myBigFile.tmp need not be written to disc before awk exits, and sort will be able to read the file from cache rather than disc. So the principle difference between method 1 and method 2 (on a machine like yours with lots of memory) is that sort in method 2 knows the file is huge and can easily divide up the work (possibly using seek, but I haven't looked at the implementation), whereas in method 1 sort has to discover the data is huge, and it can not use any parallelism in reading the input since it can't seek the pipe.

I think sort does not use threads when read from pipe.
I have used this command for your first case. And it shows that sort uses only 1 CPU even though it is told to use 4. atop actually also shows that there is only one thread in sort:
/usr/bin/time -v bash -c "seq 1 1000000 | sort --parallel 4 > bf.txt"
I have used this command for your second case. And it shows that sort uses 2 CPU. atop actually also shows that there are four thread in sort:
/usr/bin/time -v bash -c "seq 1 1000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
In you first scenario sort is an I/O bound task, it does lots of read syscalls from stdin. In your second scenario sort uses mmap syscalls to read file and it avoids being an I/O bound task.
Below are results for the first and second scenarios:
$ /usr/bin/time -v bash -c "seq 1 10000000 | sort --parallel 4 > bf.txt"
Command being timed: "bash -c seq 1 10000000 | sort --parallel 4 > bf.txt"
User time (seconds): 35.85
System time (seconds): 0.84
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.43
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 9320
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2899
Voluntary context switches: 1920
Involuntary context switches: 1323
Swaps: 0
File system inputs: 0
File system outputs: 459136
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$ /usr/bin/time -v bash -c "seq 1 10000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
Command being timed: "bash -c seq 1 10000000 > tmp.bf.txt && sort --parallel 4 tmp.bf.txt > bf.txt"
User time (seconds): 43.03
System time (seconds): 0.85
Percent of CPU this job got: 175%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.97
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1018004
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2445
Voluntary context switches: 299
Involuntary context switches: 4387
Swaps: 0
File system inputs: 0
File system outputs: 308160
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

You have more system calls, if you use the pipe.
seq 1000000 | strace sort --parallel=56 2>&1 >/dev/null | grep read | wc -l
2059
Without the pipe the file is mapped into memory.
seq 1000000 > input
strace sort --parallel=56 input 2>&1 >/dev/null | grep read | wc -l
33
Kernel calls are in most cases the bottle neck. That is the reason why sendfile has been invented.

background process log shows big gaps in message timestamps

I started a long running job under nohup in the background over a weekend. When looking at the output after it finished, I noticed that there were large gaps between the timestamps of some log messages. Some gaps were as long as 10 hrs. I had no way of finding out what was going on with my job at that time.
I ran it on a standard Red hat linux server machine at work.
Is this behavior caused by nohup command ? If not what could be possible causes ?
One such long running job was as the script below -
#!/bin/bash
while true
do
echo "`date` `top -n 1 -b | grep progname`"
done
And here one such gap from the log -
Mon May 26 04:29:42 PDT 2014 27685 user 18 0 2883m 2.8g 1732 S 0.0 3.9 29:05.54 progname
Tue May 27 03:20:35 PDT 2014 27685 user 18 0 3371m 3.3g 1732 S 0.0 4.6 34:23.21 progname

Ok.
pid is a variable that is the pid of the process you want to monitor--
Try this for starters (Courtesy of S Chazelas):
export pid=$(ps -ef | grep progname | awk '{$print $2}')
while rss=$(ps -o rss= -p "${pid}")
do
printf '%d %s\n' "$rss" "$(date)"
sleep 60;
done > t.lis
#
echo "Done $(date)" >> t.lis
rss is the resident set size (memory allocated to the process) in pages.
getconf PAGESIZE
will show how many bytes are in a page of memory.

How to see top processes sorted by actual memory usage?

I have a server with 12G of memory. A fragment of top is shown below:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12979 frank 20 0 206m 21m 12m S 11 0.2 26667:24 krfb
13 root 15 -5 0 0 0 S 1 0.0 36:25.04 ksoftirqd/3
59 root 15 -5 0 0 0 S 0 0.0 4:53.00 ata/2
2155 root 20 0 662m 37m 8364 S 0 0.3 338:10.25 Xorg
4560 frank 20 0 8672 1300 852 R 0 0.0 0:00.03 top
12981 frank 20 0 987m 27m 15m S 0 0.2 45:10.82 amarok
24908 frank 20 0 16648 708 548 S 0 0.0 2:08.84 wrapper
1 root 20 0 8072 608 572 S 0 0.0 0:47.36 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
The free -m shows the following:
total used free shared buffers cached
Mem: 12038 11676 362 0 599 9745
-/+ buffers/cache: 1331 10706
Swap: 2204 257 1946
If I understand correctly, the system has only 362 MB of available memory. My question is: How can I find out which process is consuming most of the memory?
Just as background info, the system is running 64bit OpenSuse 12.

use quick tip using top command in linux/unix
$ top
and then hit Shift+m (i.e. write a capital M).
From man top
SORTING of task window
For compatibility, this top supports most of the former top sort keys.
Since this is primarily a service to former top users, these commands do
not appear on any help screen.
command sorted-field supported
A start time (non-display) No
M %MEM Yes
N PID Yes
P %CPU Yes
T TIME+ Yes
Or alternatively: hit Shift + f , then choose the display to order by memory usage by hitting key n then press Enter. You will see active process ordered by memory usage

First, repeat this mantra for a little while: "unused memory is wasted memory". The Linux kernel keeps around huge amounts of file metadata and files that were requested, until something that looks more important pushes that data out. It's why you can run:
find /home -type f -name '*.mp3'
find /home -type f -name '*.aac'
and have the second find instance run at ridiculous speed.
Linux only leaves a little bit of memory 'free' to handle spikes in memory usage without too much effort.
Second, you want to find the processes that are eating all your memory; in top use the M command to sort by memory use. Feel free to ignore the VIRT column, that just tells you how much virtual memory has been allocated, not how much memory the process is using. RES reports how much memory is resident, or currently in ram (as opposed to swapped to disk or never actually allocated in the first place, despite being requested).
But, since RES will count e.g. /lib/libc.so.6 memory once for nearly every process, it isn't exactly an awesome measure of how much memory a process is using. The SHR column reports how much memory is shared with other processes, but there is no guarantee that another process is actually sharing -- it could be sharable, just no one else wants to share.
The smem tool is designed to help users better gage just how much memory should really be blamed on each individual process. It does some clever work to figure out what is really unique, what is shared, and proportionally tallies the shared memory to the processes sharing it. smem may help you understand where your memory is going better than top will, but top is an excellent first tool.

ps aux | awk '{print $2, $4, $11}' | sort -k2rn | head -n 10
(Adding -n numeric flag to sort command.)

First you should read an explanation on the output of free. Bottom line: you have at least 10.7 GB of memory readily usable by processes.
Then you should define what "memory usage" is for a process (it's not easy or unambiguous, trust me).
Then we might be able to help more :-)

List and Sort Processes by Memory Usage:
ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS

ps aux --sort '%mem'
from procps' ps (default on Ubuntu 12.04) generates output like:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
tomcat7 3658 0.1 3.3 1782792 124692 ? Sl 10:12 0:25 /usr/lib/jvm/java-7-oracle/bin/java -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties -D
root 1284 1.5 3.7 452692 142796 tty7 Ssl+ 10:11 3:19 /usr/bin/X -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
ciro 2286 0.3 3.8 1316000 143312 ? Sl 10:11 0:49 compiz
ciro 5150 0.0 4.4 660620 168488 pts/0 Sl+ 11:01 0:08 unicorn_rails worker[1] -p 3000 -E development -c config/unicorn.rb
ciro 5147 0.0 4.5 660556 170920 pts/0 Sl+ 11:01 0:08 unicorn_rails worker[0] -p 3000 -E development -c config/unicorn.rb
ciro 5142 0.1 6.3 2581944 239408 pts/0 Sl+ 11:01 0:17 sidekiq 2.17.8 gitlab [0 of 25 busy]
ciro 2386 3.6 16.0 1752740 605372 ? Sl 10:11 7:38 /usr/lib/firefox/firefox
So here Firefox is the top consumer with 16% of my memory.
You may also be interested in:
ps aux --sort '%cpu'

Building on gaoithe's answer, I attempted to make the memory units display in megabytes, and sorted by memory descending limited to 15 entries:
ps -e -orss=,args= |awk '{print $1 " " $2 }'| awk '{tot[$2]+=$1;count[$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n | tail -n 15 | sort -nr | awk '{ hr=$1/1024; printf("%13.2fM", hr); print "\t" $2 }'
588.03M /usr/sbin/apache2
275.64M /usr/sbin/mysqld
138.23M vim
97.04M -bash
40.96M ssh
34.28M tmux
17.48M /opt/digitalocean/bin/do-agent
13.42M /lib/systemd/systemd-journald
10.68M /lib/systemd/systemd
10.62M /usr/bin/redis-server
8.75M awk
7.89M sshd:
4.63M /usr/sbin/sshd
4.56M /lib/systemd/systemd-logind
4.01M /usr/sbin/rsyslogd
Here's an example alias to use it in a bash config file:
alias topmem="ps -e -orss=,args= |awk '{print \$1 \" \" \$2 }'| awk '{tot[\$2]+=\$1;count[\$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n | tail -n 15 | sort -nr | awk '{ hr=\$1/1024; printf(\"%13.2fM\", hr); print \"\t\" \$2 }'"
Then you can just type topmem on the command line.

How to total up used memory by process name:
Sometimes even looking at the biggest single processes there is still a lot of used memory unaccounted for. To check if there are a lot of the same smaller processes using the memory you can use a command like the following which uses awk to sum up the total memory used by processes of the same name:
ps -e -orss=,args= |awk '{print $1 " " $2 }'| awk '{tot[$2]+=$1;count[$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n
e.g. output
9344 docker 1
9948 nginx: 4
22500 /usr/sbin/NetworkManager 1
24704 sleep 69
26436 /usr/sbin/sshd 15
34828 -bash 19
39268 sshd: 10
58384 /bin/su 28
59876 /bin/ksh 29
73408 /usr/bin/python 2
78176 /usr/bin/dockerd 1
134396 /bin/sh 84
5407132 bin/naughty_small_proc 1432
28061916 /usr/local/jdk/bin/java 7

you can specify which column to sort by, with following steps:
steps:
* top
* shift + F
* select a column from the list
e.g. n means sort by memory,
* press enter
* ok

You can see memory usage by executing this code in your terminal:
$ watch -n2 free -m
$ htop

This very second in time
ps -U $(whoami) -eom pid,pmem,pcpu,comm | head -n4
Continuously updating
watch -n 1 'ps -U $(whoami) -eom pid,pmem,pcpu,comm | head -n4'
I also added a few goodies here you might appreciate (or you might ignore)
-n 1 watch and update every second
-U $(whoami) To show only your processes. $(some command) evaluates now
| head -n4 To only show the header and 3 processes at a time bc often you just need high usage line items
${1-4} says my first argument $1 I want to default to 4, unless I provide it
If you are using a mac you may need to install watch first
brew install watch
Alternatively you might use a function
psm(){
watch -n 1 "ps -eom pid,pmem,pcpu,comm | head -n ${1-4}"
# EXAMPLES:
# psm
# psm 10
}

You have this simple command:
$ free -h

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string