How to set the tmp disk variable in Nextflow (SLURM)

I want to add a tmp disk value to my Nextflow process.
The CPU and memory requirements are set up, but how can I add the tmp disk value?
This information is important for the scheduler (SLURM) to select a suitable node.
The Nextflow process header could be:
process TEST {
    echo true

    cpus '8'
    memory '40 GB'

    script:
    """
    """
}
In SLURM this value is called tmp disk; it is viewable with squeue -o "%C %m %d" in the MIN_TMP_DISK column.
If any information is missing, please let me know.
Thanks

You can use the clusterOptions process directive with the SLURM executor. From the sbatch docs, it looks like you are looking for the --tmp option:
--tmp=<size>[units]
    Specify a minimum amount of temporary disk space per node. Default units are megabytes.
    Different units can be specified using the suffix [K|M|G|T].
For example:
process TEST {
    debug true

    clusterOptions '--tmp=1T'
    cpus 8
    memory 40.GB

    """
    echo "Hello world"
    """
}
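If several processes need the same scratch-space request, the directive can also be set once in nextflow.config instead of being repeated in every process body. A minimal sketch, assuming a hypothetical label name big_tmp:

process {
    executor = 'slurm'
    // Every process that declares `label 'big_tmp'` picks up this option.
    withLabel: 'big_tmp' {
        clusterOptions = '--tmp=1T'
    }
}

After submission, the request should then show up in the MIN_TMP_DISK column of squeue -o "%C %m %d" from the question.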

Related

Docker container memory usage seems incorrect

I have a container started with docker-compose (file format version 2) which has a memory limit of 32MB.
Whenever I run the container I can monitor the used resources like so:
docker stats 02bbab9ae853
It shows the following:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
02bbab9ae853 client-web_postgres-client-web_1_e4513764c3e7 0.07% 8.078MiB / 32MiB 25.24% 5.59MB / 4.4MB 135GB / 23.7MB 0
What looks really weird to me is the memory part:
8.078MiB / 32MiB 25.24%
If, outside the container, I list the Postgres PIDs, I get:
$ pgrep postgres
23051, 24744, 24745, 24746, 24747, 24748, 24749, 24753, 24761
If I stop the container and re-run the above command, I get no PIDs.
That is clear proof that all those PIDs were created by the now-stopped container.
Now, if I re-run the container, take every PID, calculate its RSS memory usage, and sum the values with a Python method, I don't get the ~8MB Docker is telling me but a much higher value, not even close to it (~100MB or so).
This is the Python method I'm using to calculate the RSS memory:
import psutil
from subprocess import check_output

def get_process_memory(name):
    # Sum the RSS of every process whose name matches `name`.
    total = 0.0
    try:
        for pid in map(int, check_output(["pgrep", name]).split()):
            total += psutil.Process(pid).memory_info().rss
    except Exception:
        pass
    return total
Does anybody know why the memory reported by Docker is so different?
This is of course a problem for me, because the memory limit doesn't appear to be respected.
I'm using a Raspberry Pi.
That's because Docker is reporting only the RSS from the cgroup's memory.stat, but you actually need to sum up cache, rss and swap (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt). More about that in https://sysrq.tech/posts/docker-misleading-containers-memory-usage/
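A minimal sketch of that accounting, assuming the cgroup v1 layout with the cgroupfs driver (the mount point and the docker/<container-id> path are assumptions, not something the answer specifies):

import sys

def container_memory_bytes(container_id):
    # Parse the key/value pairs in the container's memory.stat file.
    path = "/sys/fs/cgroup/memory/docker/%s/memory.stat" % container_id
    stats = {}
    with open(path) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    # cache + rss + swap, as described in the kernel cgroup-v1 docs;
    # "swap" only appears when swap accounting is enabled.
    return stats.get("cache", 0) + stats.get("rss", 0) + stats.get("swap", 0)

if __name__ == "__main__":
    print(container_memory_bytes(sys.argv[1]))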

Like hdparm, how to calculate disk speed from the fio command

Using hdparm, I can get the disk speed directly with the following command:
hdparm -t test_filesystem | awk 'NF'
Likewise, please let me know how to calculate the disk speed of a device from the fio command's output.
I am using the fio command below:
fio --name=job1 --rw=read --size=1g --output-format=json --directory=test_directory
Warning: ensure your disk does not have partitions or filesystems or data that you want to keep on it!
To be closer to hdparm you will want to:
Use the filename option (http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-filename) with the name of the disk (e.g. filename=/dev/sdj) rather than directory=.
Use an I/O engine (http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-ioengine) that can submit I/O asynchronously (e.g. ioengine=libaio).
Use options such as direct (http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-direct) that submit I/O in a way that bypasses your OS's cache (e.g. direct=1).
In bash you can use a tool like jq to extract the "bw" key from the JSON output; note that the value is nested within jobs -> [<direction, e.g. "read">].
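For example, a small sketch in Python that does the same extraction, assuming the read workload from the question (fio reports "bw" in KiB/s in its JSON output):

import json
import sys

# Load the JSON that `fio --output-format=json ...` wrote to a file.
with open(sys.argv[1]) as f:
    result = json.load(f)

bw_kib = result["jobs"][0]["read"]["bw"]   # bandwidth in KiB/s
print("read bandwidth: %.1f MB/s" % (bw_kib * 1024 / 1e6))

The jq equivalent would be something like jq '.jobs[0].read.bw'.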

How to implement a CPU-time timeout for a script/program in Linux?

It's crucial to measure not the real (wall-clock) time of some program/script but its CPU time, and to kill it when that limit is breached.
What's the best way to do it?
One of the most obvious solutions is to periodically check the process tree to see whether the program/script has breached its limits. It's implemented in a Perl script (pshved/timeout). I'm looking for other approaches.
You can use ulimit(1) or setrlimit(2) to limit the CPU time. The process is automatically killed when it exceeds the limit. You can also set a soft limit, which first delivers SIGXCPU so the process can catch it or ignore it before the hard limit kills it.
Simple example:
#! /bin/bash
(
    # Limit this subshell to 5 seconds of CPU time.
    ulimit -t 5
    python -c '
a, b = 0, 1
while True:
    a += b
    b += a
'
    echo $?
)
echo "..."
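The same soft/hard split is available from setrlimit(2) directly. A minimal Python sketch (the 5- and 6-second limits are arbitrary examples):

import resource
import signal

# Soft limit of 5s CPU time (delivers SIGXCPU), hard limit of 6s (then SIGKILL).
resource.setrlimit(resource.RLIMIT_CPU, (5, 6))

def on_xcpu(signum, frame):
    # The soft limit fired: clean up and exit before the hard limit hits.
    raise SystemExit("CPU time limit reached")

signal.signal(signal.SIGXCPU, on_xcpu)

# Burn CPU until the limit triggers.
a, b = 0, 1
while True:
    a += b
    b += a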

CPU usage per process?

How can I grab the percentage of CPU usage on a per-process basis? For example, I'd like to run my program prog and get the CPU usage it incurred, in a format like:
prog name   cpu0   cpu1   cpu2   cpu3   total
prog          15     20     45     47    127%
Is there any tool for this?
Thanks.
I think that you can make use of the information in /proc/[pid]/stat and /proc/stat to estimate this.
Check out the great answers to How to calculate the CPU usage of a process by PID in Linux from C? which explain how to calculate CPU usage % for a single processor.
The 6th from last number you get from /proc/[pid]/stat is "processor %d, CPU number last executed on" (on Ubuntu 12.04 at least).
To extend to multiple processors, you could sample the CPU usage over a period and (very roughly!) estimate the proportion of time on each processor. Then use these proportions to split the CPU usage between the processors. Based on the info in /proc/stat you can also sample the total time for each processor and then you have all the variables you need!
See http://linux.die.net/man/5/proc for more info about proc.
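For instance, a small sketch of reading that field (hedged; per man 5 proc the "processor" entry is field 39 overall, which on the 12.04-era layout is also 6th from last):

import sys

def last_cpu(pid):
    # The comm field (field 2) can contain spaces, so split off everything
    # up to the closing ')' before counting fields.
    with open("/proc/%s/stat" % pid) as f:
        fields = f.read().rsplit(")", 1)[1].split()
    # fields[0] is field 3 ("state"), so field 39 ("processor") is fields[36].
    return int(fields[36])

print(last_cpu(sys.argv[1]))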
For firefox:
while true; do ps --no-heading -C firefox -L -o command,psr,pcpu | sort -k 2 -n; echo; sleep 1; done
You'd have to sum the third column (which I see no ridiculously easy way to do; one option is sketched below), because it's actually showing you every thread. The first column is the name, the second the processor, and the third the %CPU.
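One hedged way to do the summing: ask ps for only the pcpu column, so the multi-word command column can't confuse the parsing, and pipe it into Python:

ps --no-heading -C firefox -L -o pcpu |
  python -c 'import sys; print(sum(float(x) for x in sys.stdin))'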
The Linux Process Explorer project provides this functionality; you can see a graph of the CPU/memory/IO for each process in the properties dialog.
Here is a simple Python script I've made:
import re
import sys
import time

cpuNum = 0

if len(sys.argv) == 1:
    print("usage: pidcpu <pid1,pid2,..,pidn>")
    sys.exit(0)

pids = sys.argv.pop()

def getCpuTot():
    # Sum user+nice+system+idle jiffies from the aggregate "cpu" line;
    # counting the per-CPU "cpuN" lines gives the number of processors.
    global cpuNum
    with open("/proc/stat") as f:
        ln = f.read()
    # e.g. "cpu 858286704 148088 54216880 117129864 2806189 5046 16997674 0 0 0"
    r = re.findall(r"cpu\d*\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", ln)
    cpuNum = len(r) - 1
    return int(r[0][0]) + int(r[0][1]) + int(r[0][2]) + int(r[0][3])

def getPidCPU(pid):
    # utime + stime, fields 14 and 15 of /proc/[pid]/stat.
    # (Naive splitting assumes the process name contains no spaces.)
    with open("/proc/" + str(pid) + "/stat") as f:
        a = f.readline().split(" ")
    return int(a[13]) + int(a[14])

cpu1 = getCpuTot()
cpupid1 = [getPidCPU(pid) for pid in pids.split(",")]
time.sleep(1)
cpu2 = getCpuTot()
cpupid2 = [getPidCPU(pid) for pid in pids.split(",")]

for i, pid in enumerate(pids.split(",")):
    perc = int(cpuNum * (cpupid2[i] - cpupid1[i]) * 100 / float(cpu2 - cpu1))
    print(pid, perc)

Get CPU usage in shell script?

I'm running some JMeter tests against a Java process to determine how responsive a web application is under load (500+ users). JMeter will give the response time for each web request, and I've written a script to ping the Tomcat Manager every X seconds which will get me the current size of the JVM heap.
I'd like to collect stats on the server of the % of CPU being used by Tomcat. I tried to do it in a shell script using ps like this:
PS_RESULTS=`ps -o pcpu,pmem,nlwp -p $PID`
...running the command every X seconds and appending the results to a text file. (for anyone wondering, pmem = % mem usage and nlwp is number of threads)
However I've found that this gives a different definition of "% of CPU Utilization" than I'd like - according to the manpages for ps, pcpu is defined as:
cpu utilization of the process in "##.#" format. It is the CPU time used divided by the time the process has been running (cputime/realtime ratio), expressed as a percentage.
In other words, pcpu gives me the % CPU utilization for the process for the lifetime of the process.
Since I want to take a sample every X seconds, I'd like to be collecting the CPU utilization of the process at the current time only - similar to what top would give me
(CPU utilization of the process since the last update).
How can I collect this from within a shell script?
Use top -b (and other switches if you want different outputs). It will just dump to stdout instead of jumping into a curses window.
The most useful tool I've found for monitoring a server while performing a test such as JMeter on it is dstat. It not only gives you a range of stats from the server, it outputs to csv for easy import into a spreadsheet and lets you extend the tool with modules written in Python.
User load: top -b -n 2 | grep Cpu | tail -n 1 | awk '{print $2}' | sed 's/.[^.]*$//'
System load: top -b -n 2 | grep Cpu | tail -n 1 | awk '{print $3}' | sed 's/.[^.]*$//'
Idle load: top -b -n 1 | grep Cpu | tail -n 1 | awk '{print $5}' | sed 's/.[^.]*$//'
Each command outputs a whole number; the trailing sed strips the decimal part.
Off the top of my head, I'd use the /proc filesystem view of the system state - Look at man 5 proc to see the format of the entry for /proc/PID/stat, which contains total CPU usage information, and use /proc/stat to get global system information. To obtain "current time" usage, you probably really mean "CPU used in the last N seconds"; take two samples a short distance apart to see the current rate of CPU consumption. You can then munge these values into something useful. Really though, this is probably more a Perl/Ruby/Python job than a pure shell script.
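Along those lines, a minimal Python sketch of the two-sample approach (field positions per man 5 proc; the 1-second interval is arbitrary):

import sys
import time

def proc_ticks(pid):
    # utime + stime for the process, in clock ticks
    # (fields 14 and 15 of /proc/[pid]/stat, counted after the '(comm)' field).
    with open("/proc/%s/stat" % pid) as f:
        fields = f.read().rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])

def total_ticks():
    # Sum all jiffy counters on the aggregate "cpu" line of /proc/stat.
    with open("/proc/stat") as f:
        return sum(int(x) for x in f.readline().split()[1:])

pid = sys.argv[1]
p1, t1 = proc_ticks(pid), total_ticks()
time.sleep(1)
p2, t2 = proc_ticks(pid), total_ticks()
# This is the share of total machine capacity; multiply by the CPU count
# for top-style per-core percentages.
print("%.1f%%" % (100.0 * (p2 - p1) / (t2 - t1)))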
You might be able to get the rough data you're after with /proc/PID/status, which gives a Sleep average for the process. Pretty coarse data though.
You can also use 1 as the iteration count, so you get the current snapshot immediately instead of waiting $delay seconds for another sample:
top -b -n 1
This will not give you a per-process metric, but the Stress Terminal UI is super useful for knowing how badly you're punishing your boxes. Add the -c flag to make it dump the data to a CSV file.
