slurm: Is there any way to return unused core number? - slurm

As we know squeue returns the status of the running jobs.
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
130 debug run.sh user PD 0:00 1 (Resources)
131 debug run.sh user PD 0:00 1 (Resources)
128 debug 52546914 user R 7:28 1 node1
129 debug run.sh user R 0:02 1 node1
For example my core number is 2.
[Q] Is there any way to return only the unused core number? In the example, unused core number should return 0.
Should I write a parser for this in order to retrieve core number next to each R, add them, and subtract it from total core number as follows:
squeue | grep -P ' R ' | awk '{print $7}' | paste -sd+ - | bc

To know the number of core (CPUs) that are available in your cluster, you can use the sinfo command:
$ sinfo -o%C
CPUS(A/I/O/T)
0/1920/0/1920
You can retrieve the numbers into Bash variables easily with
IFS=/ read A I O T <<<$(sinfo -h -o%C)
After running the above command, A will contain the number of allocated cores, I will be the number of idle cores, O will hold the number of 'other' cores, i.e. drained, down, etc. and T will be the total number of cores in the system.
Note that in your question you talk about cores but actually compute the number of nodes. If what you want is the number of nodes, you can use:
$ sinfo -o%A
NODES(A/I)
0/80
See the sinfo man page for more details.

Related

Script to get user that has process with most memory usage?

How can I write a script that gives an output of the user that has the process with the most memory usage in the system. The script is sh. I tried to use top command as the starting point but it seems it does not work with pipes because it continues running until it is quit.
If you just want the user name of the process using the most memory, try something like:
$ ps axho user --sort -rss | head -1
This checks the resident memory size rss of the processes. If you'd rather check the whole virtual size, use vsz instead of rss. If you want percentage of resident memory used, use pmem (but this could change from moment to moment due to the scheduler, and may not pull out the biggest memory hog). If you'd rather have the user ID instead of user name, use uid instead of user.
The ps options are:
ax for "all processes" (everybody)
h for "no header" in output
o to specify the output format: user (user name)
--sort -rss sort by rss (descending order)
The head -1 strips out all but the first line (which has the largest rss since it's in descending order).
It might be useful to get not just the user name, but some more information about the process, like:
$ ps axho user,pid,rss --sort -rss | head -1
This gives the user name, process ID, and resident memory usage of the top process, all on one line. You could pull out the values individually in whatever script you use it in.
this works in centos: list most memory cost process
[root#182 ~] # ps aux | sort -k 4 -r | head -n2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 7048 0.2 9.6 8060236 1573612 ? Ssl Dec14 8:23 java -Djava.security.e
sort -k 4 : sort by the forth column, my pc column4 = %MEM
in other linux/unix, you may find the right column number for memory
if you're allowed to use only top, this can be a solution:
top -o VIRT -n 1 | head -8 | tail -1 | cut -d ' ' -f 5
top -o VIRT allows you to override sorting by default column and sort it by column VIRT
top -n 1 allows you to limit iterations that top will make before exiting. We need only one iteration - it's like when you take a photo while recording a video - you saved info at a particular moment
| head -8 allows you to output only first 8 lines of previous output/ Why 8? It's because top is displaying a lot of info before displaying the table with processes:
we don't need all these lines
Tasks: 247 total, 1 running, 245 sleeping, 0 stopped, 1 zombie
%Cpu(s): 5,9 us, 1,5 sy, 0,0 ni, 92,6 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Mem : 15698,4 total, 7256,4 free, 3087,8 used, 5354,2 buff/cache
MiB Swap: 18222,0 total, 18222,0 free, 0,0 used. 11441,1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6249 mneznaev 20 0 46,5g 287484 110232 S 0,0 1,8 5:40.11 code
| tail -1 allows you to get the last line of previous 8 lines that we got by previous step with head and now we have:
6249 mneznaev 20 0 46,5g 290332 110232 S 0,0 1,8 5:40.27 code
cut -d ' ' -f 5 allows you to separate file by columns, where -d ' ' is an option that defines space as a separator and -f 5 allows you to get the 5th column/ Why 5th? That's because there are some spaces before the actual value of PID (before 6249 in my case) and first 3 columns after cut by space are empty. And the 5th column is a username:
mneznaev#mneznaev-desktop:~$ top -o VIRT -n 1 | head -8 | tail -1 | cut -d ' ' -f 5
mneznaev
That's it. Hope it was helpful

lisiting all PIDs

ps -ax
PID
1
2
.
.
.
1549
1564
1569
.
.
1716
1730
1759
Is there any way that I can generate the process PIDS in decending order ie userlevel process ,system level process and then strace them
PID
1759
1730
1716
.
.
2
1
You can use the k modifier in order to specify the sorting order. Saying:
ps axk-pid
would sort the output by decreasing PIDs.
From man ps:
k spec Specify sorting order. Sorting syntax is
[+|-]key[,[+|-]key[,...]]. Choose a multi-letter key
from the STANDARD FORMAT SPECIFIERS section. The "+"
is optional since default direction is increasing
numerical or lexicographic order. Identical to --sort.
Examples:
ps jaxkuid,-ppid,+pid
ps axk comm o comm,args
ps kstart_time -ef
Moreover, you can use the -o option in order to control which column is displayed. Saying:
ps axk-pid -o pid
would display only the process ID.

How to measure CPU usage

I would like to log CPU usage at a frequency of 1 second.
One possible way to do it is via vmstat 1 command.
The problem is that the time between each output is not always exactly one second, especially on a busy server. I would like to be able to output the timestamp along with the CPU usage every second. What would be a simple way to accomplish this, without installing special tools?
There are many ways to do that. Except top another way is to you the "sar" utility. So something like
sar -u 1 10
will give you the cpu utilization for 10 times every 1 second. At the end it will print averages for each one of the sys, user, iowait, idle
Another utility is the "mpstat", that gives you similar things with sar
Use the well-known UNIX tool top that is normally available on Linux systems:
top -b -d 1 > /tmp/top.log
The first line of each output block from top contains a timestamp.
I see no command line option to limit the number of rows that top displays.
Section 5a. SYSTEM Configuration File and 5b. PERSONAL Configuration File of the top man page describes pressing W when running top in interactive mode to create a $HOME/.toprc configuration file.
I did this, then edited my .toprc file and changed all maxtasks values so that they are maxtasks=4. Then top only displays 4 rows of output.
For completeness, the alternative way to do this using pipes is:
top -b -d 1 | awk '/load average/ {n=10} {if (n-- > 0) {print}}' > /tmp/top.log
You might want to try htop and atop. htop is beautifully interactive while atop gathers information and can report CPU usage even for terminated processes.
I found a neat way to get the timestamp information to be displayed along with the output of vmstat.
Sample command:
vmstat -n 1 3 | while read line; do echo "$(date --iso-8601=seconds) $line"; done
Output:
2013-09-13T14:01:31-0700 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
2013-09-13T14:01:31-0700 r b swpd free buff cache si so bi bo in cs us sy id wa
2013-09-13T14:01:31-0700 1 1 4197640 29952 124584 12477708 12 5 449 147 2 0 7 4 82 7
2013-09-13T14:01:32-0700 3 0 4197780 28232 124504 12480324 392 180 15984 180 1792 1301 31 15 38 16
2013-09-13T14:01:33-0700 0 1 4197656 30464 124504 12477492 344 0 2008 0 1892 1929 32 14 43 10
To monitor the disk usage, cpu and load i created a small bash scripts that writes the values to a log file every 10 seconds.
This logfile is processed by logstash kibana and riemann.
# #!/usr/bin/env bash
# Define a timestamp function
LOGPATH="/var/log/systemstatus.log"
timestamp() {
date +"%Y-%m-%dT%T.%N"
}
#server load
while ( sleep 10 ) ; do
echo -n "$(timestamp) linux::systemstatus::load " >> $LOGPATH
cat /proc/loadavg >> $LOGPATH
#cpu usage
echo -n "$(timestamp) linux::systemstatus::cpu " >> $LOGPATH
top -bn 1 | sed -n 3p >> $LOGPAT
#disk usage
echo -n "$(timestamp) linux::systemstatus::storage " >> $LOGPATH
df --total|grep total|sed "s/total//g"| sed 's/^ *//' >> $LOGPATH
done

Getting CPU utilization information

How could I get the CPU utilization with time info of a process in linux? Basically I want to let my application run overnight. At the same time, I would like to monitor the CPU utilization during the period the application is run.
I tried top | grep appName >& log, it does not seem to return me anything in the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature; I would suggest either setting up the number of times manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4230 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours.
install sysstat package and run sar
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
use the top or watch command
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
Alternatively, you can launch your application with the time command, which will print the total amount of CPU time used by your program at the end of its execution. Just type time ./my_app instead of just ./my_app
For more info, man 1 time

How to see top processes sorted by actual memory usage?

I have a server with 12G of memory. A fragment of top is shown below:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12979 frank 20 0 206m 21m 12m S 11 0.2 26667:24 krfb
13 root 15 -5 0 0 0 S 1 0.0 36:25.04 ksoftirqd/3
59 root 15 -5 0 0 0 S 0 0.0 4:53.00 ata/2
2155 root 20 0 662m 37m 8364 S 0 0.3 338:10.25 Xorg
4560 frank 20 0 8672 1300 852 R 0 0.0 0:00.03 top
12981 frank 20 0 987m 27m 15m S 0 0.2 45:10.82 amarok
24908 frank 20 0 16648 708 548 S 0 0.0 2:08.84 wrapper
1 root 20 0 8072 608 572 S 0 0.0 0:47.36 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
The free -m shows the following:
total used free shared buffers cached
Mem: 12038 11676 362 0 599 9745
-/+ buffers/cache: 1331 10706
Swap: 2204 257 1946
If I understand correctly, the system has only 362 MB of available memory. My question is: How can I find out which process is consuming most of the memory?
Just as background info, the system is running 64bit OpenSuse 12.
use quick tip using top command in linux/unix
$ top
and then hit Shift+m (i.e. write a capital M).
From man top
SORTING of task window
For compatibility, this top supports most of the former top sort keys.
Since this is primarily a service to former top users, these commands do
not appear on any help screen.
command sorted-field supported
A start time (non-display) No
M %MEM Yes
N PID Yes
P %CPU Yes
T TIME+ Yes
Or alternatively: hit Shift + f , then choose the display to order by memory usage by hitting key n then press Enter. You will see active process ordered by memory usage
First, repeat this mantra for a little while: "unused memory is wasted memory". The Linux kernel keeps around huge amounts of file metadata and files that were requested, until something that looks more important pushes that data out. It's why you can run:
find /home -type f -name '*.mp3'
find /home -type f -name '*.aac'
and have the second find instance run at ridiculous speed.
Linux only leaves a little bit of memory 'free' to handle spikes in memory usage without too much effort.
Second, you want to find the processes that are eating all your memory; in top use the M command to sort by memory use. Feel free to ignore the VIRT column, that just tells you how much virtual memory has been allocated, not how much memory the process is using. RES reports how much memory is resident, or currently in ram (as opposed to swapped to disk or never actually allocated in the first place, despite being requested).
But, since RES will count e.g. /lib/libc.so.6 memory once for nearly every process, it isn't exactly an awesome measure of how much memory a process is using. The SHR column reports how much memory is shared with other processes, but there is no guarantee that another process is actually sharing -- it could be sharable, just no one else wants to share.
The smem tool is designed to help users better gage just how much memory should really be blamed on each individual process. It does some clever work to figure out what is really unique, what is shared, and proportionally tallies the shared memory to the processes sharing it. smem may help you understand where your memory is going better than top will, but top is an excellent first tool.
ps aux | awk '{print $2, $4, $11}' | sort -k2rn | head -n 10
(Adding -n numeric flag to sort command.)
First you should read an explanation on the output of free. Bottom line: you have at least 10.7 GB of memory readily usable by processes.
Then you should define what "memory usage" is for a process (it's not easy or unambiguous, trust me).
Then we might be able to help more :-)
List and Sort Processes by Memory Usage:
ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS
ps aux --sort '%mem'
from procps' ps (default on Ubuntu 12.04) generates output like:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
tomcat7 3658 0.1 3.3 1782792 124692 ? Sl 10:12 0:25 /usr/lib/jvm/java-7-oracle/bin/java -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties -D
root 1284 1.5 3.7 452692 142796 tty7 Ssl+ 10:11 3:19 /usr/bin/X -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
ciro 2286 0.3 3.8 1316000 143312 ? Sl 10:11 0:49 compiz
ciro 5150 0.0 4.4 660620 168488 pts/0 Sl+ 11:01 0:08 unicorn_rails worker[1] -p 3000 -E development -c config/unicorn.rb
ciro 5147 0.0 4.5 660556 170920 pts/0 Sl+ 11:01 0:08 unicorn_rails worker[0] -p 3000 -E development -c config/unicorn.rb
ciro 5142 0.1 6.3 2581944 239408 pts/0 Sl+ 11:01 0:17 sidekiq 2.17.8 gitlab [0 of 25 busy]
ciro 2386 3.6 16.0 1752740 605372 ? Sl 10:11 7:38 /usr/lib/firefox/firefox
So here Firefox is the top consumer with 16% of my memory.
You may also be interested in:
ps aux --sort '%cpu'
Building on gaoithe's answer, I attempted to make the memory units display in megabytes, and sorted by memory descending limited to 15 entries:
ps -e -orss=,args= |awk '{print $1 " " $2 }'| awk '{tot[$2]+=$1;count[$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n | tail -n 15 | sort -nr | awk '{ hr=$1/1024; printf("%13.2fM", hr); print "\t" $2 }'
588.03M /usr/sbin/apache2
275.64M /usr/sbin/mysqld
138.23M vim
97.04M -bash
40.96M ssh
34.28M tmux
17.48M /opt/digitalocean/bin/do-agent
13.42M /lib/systemd/systemd-journald
10.68M /lib/systemd/systemd
10.62M /usr/bin/redis-server
8.75M awk
7.89M sshd:
4.63M /usr/sbin/sshd
4.56M /lib/systemd/systemd-logind
4.01M /usr/sbin/rsyslogd
Here's an example alias to use it in a bash config file:
alias topmem="ps -e -orss=,args= |awk '{print \$1 \" \" \$2 }'| awk '{tot[\$2]+=\$1;count[\$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n | tail -n 15 | sort -nr | awk '{ hr=\$1/1024; printf(\"%13.2fM\", hr); print \"\t\" \$2 }'"
Then you can just type topmem on the command line.
How to total up used memory by process name:
Sometimes even looking at the biggest single processes there is still a lot of used memory unaccounted for. To check if there are a lot of the same smaller processes using the memory you can use a command like the following which uses awk to sum up the total memory used by processes of the same name:
ps -e -orss=,args= |awk '{print $1 " " $2 }'| awk '{tot[$2]+=$1;count[$2]++} END {for (i in tot) {print tot[i],i,count[i]}}' | sort -n
e.g. output
9344 docker 1
9948 nginx: 4
22500 /usr/sbin/NetworkManager 1
24704 sleep 69
26436 /usr/sbin/sshd 15
34828 -bash 19
39268 sshd: 10
58384 /bin/su 28
59876 /bin/ksh 29
73408 /usr/bin/python 2
78176 /usr/bin/dockerd 1
134396 /bin/sh 84
5407132 bin/naughty_small_proc 1432
28061916 /usr/local/jdk/bin/java 7
you can specify which column to sort by, with following steps:
steps:
* top
* shift + F
* select a column from the list
e.g. n means sort by memory,
* press enter
* ok
You can see memory usage by executing this code in your terminal:
$ watch -n2 free -m
$ htop
This very second in time
ps -U $(whoami) -eom pid,pmem,pcpu,comm | head -n4
Continuously updating
watch -n 1 'ps -U $(whoami) -eom pid,pmem,pcpu,comm | head -n4'
I also added a few goodies here you might appreciate (or you might ignore)
-n 1 watch and update every second
-U $(whoami) To show only your processes. $(some command) evaluates now
| head -n4 To only show the header and 3 processes at a time bc often you just need high usage line items
${1-4} says my first argument $1 I want to default to 4, unless I provide it
If you are using a mac you may need to install watch first
brew install watch
Alternatively you might use a function
psm(){
watch -n 1 "ps -eom pid,pmem,pcpu,comm | head -n ${1-4}"
# EXAMPLES:
# psm
# psm 10
}
You have this simple command:
$ free -h

Resources