SLURM sbatch output buffering

I created some SLURM scripts and then tried to execute them with sbatch, but the output file is updated only infrequently (maybe once a minute).
Is there a way to reduce the output buffering latency with sbatch? I know stdbuf is used in such situations, but I could not make it work with sbatch.

The issue is certainly buffering. If you are running Python code, add flush=True to your print calls, e.g. print(..., flush=True).
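As a rough sketch (job and script names are placeholders), a job script like the following keeps the output file reasonably current, either by running Python unbuffered with -u or by wrapping an arbitrary program in stdbuf; note that stdbuf has to wrap the command inside the job script, not the sbatch call itself:

#!/bin/bash
#SBATCH --job-name=demo        # placeholder job name
#SBATCH --output=demo_%j.out   # %j expands to the SLURM job ID

# Option 1: ask Python for unbuffered output (same effect as PYTHONUNBUFFERED=1)
python3 -u my_script.py

# Option 2: force line buffering on a program that uses C stdio
stdbuf -oL -eL ./my_long_running_program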

Related

Bash - Evaluate CPU and memory performance of a single command that runs instantly

I'm writing a Bash script to evaluate the time/CPU/memory performance of commands given as input to the script.
I implemented the time measurement using the date command, but I'm having trouble evaluating the CPU and memory usage of a single command. I know I can use the top command, but it only shows currently running processes.
My issue is that when I run the script with a command as input, I don't know in advance the PID assigned to that command, and if I want to evaluate an instantaneous command such as whoami, I cannot find it in top at all, even if I pipe them together.
For commands that take longer I would like to calculate an average, but for commands like whoami, ls or similar instantaneous commands, I have no idea how to get the CPU and memory usage for that specific instant of time.
Thank you in advance!
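One way to get at this, sketched below with placeholder names, is GNU time (/usr/bin/time, not the bash built-in), which reports CPU time and peak resident memory even for commands that finish instantly, so there is no need to find the PID or catch it in top:

#!/bin/bash
# measure.sh -- placeholder wrapper script name
# Usage: ./measure.sh whoami
#        ./measure.sh sort bigfile.txt

# -v prints elapsed time, user/system CPU time and maximum resident set size;
# the report goes to stderr, so redirect it to a file if you want to parse it
/usr/bin/time -v "$@" 2> perf_report.txt

cat perf_report.txt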

SGE: run an interactive session (qrsh) within a batch submission (qsub)

I am trying to run a simple piece of code within SGE. However, I get very different results from the same code when running it in an interactive session (qrsh) versus via qsub. In most cases the code simply fails to run from qsub (without any warning or error).
Is there any way to set up an interactive session within a batch submission (running qrsh within qsub)?
qsub test.sh
-V
-cwd
source MYPATH
qrsh
MYCODEHERE
Not sure if what you ask is possible. I can think of two reasons why you might be observing different results:
1) Environment differences between cluster nodes.
2) Incomplete outputs: maybe the code runs into an edge case (not enough memory, etc.) and exits silently.
Not exactly what you asked for but just trying to help.
You could submit a parallel job and then use qrsh -inherit <hostname> <command> to run a command under qrsh. Unfortunately, grid engine limits the number of times you can call qrsh -inherit to either the number of slots in the job or one less (depending on the job_is_first_task setting of the PE).
However, it is likely that the problems are caused by a different environment between the qrsh session and the one provided by qsub by default. If you are selecting the shell that interprets your job script in the traditional Unix way (putting #!/bin/bash or similar as the first line of your job script), you could try adding -l to that line to make it a login shell (#!/bin/bash -l), which is likely closer to what you get with qrsh.
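To make that concrete, here is a minimal sketch of the submitted script using a login shell; the path and command placeholders mirror the snippet in the question:

#!/bin/bash -l
#$ -V              # export the submission environment to the job
#$ -cwd            # run in the current working directory

source MYPATH      # same environment setup used in the qrsh session
MYCODEHERE         # the actual command(s) to run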

qsub: What is the standard way to get occasional updates on a submitted job?

I have just begun using an HPC, and I'm having trouble adjusting my workflow.
I submit a job using qsub myjob.sh. Then I can view the status of the job by typing qstat -u myusername. This gives me some details about my job, such as how long it has been running for.
My job is a Python script that occasionally prints an update to indicate how things are going in the program. I know this output will end up in the output file once the job is done, but how can I monitor the program as it runs? One way is to print the output to a file instead of printing to the screen, but that seems like a bit of a hack.
Any other tips on improving this process would be great.
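One common pattern, sketched below with placeholder names, is to make sure the script flushes its output (print(..., flush=True) or python -u, as in the SLURM question above) and then follow a file while the job runs. Whether the scheduler's own output file is visible before the job finishes depends on how output spooling is configured, so having the job write its own progress log in the working directory is the more portable option:

# check the job's status
qstat -u myusername

# if the scheduler writes the output file as the job runs, follow it live
tail -f myjob.sh.o12345        # placeholder output file name / job ID

# otherwise, have the job write its own progress log and follow that
tail -f progress.log           # placeholder log written by the Python script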

Automatic Background Perl Execution on Ubuntu

I've been troubleshooting this issue for about a week and I am nowhere, so I wanted to reach out for some help.
I have a perl script that I execute via the command line, usually in a manner of
nohup ./script.pl --param arg --param2 arg2 &
I usually have about ten of these running at once to process the same type of data from different sources (which are specified through parameters). The script works fine; I can see logs for everything in nohup.out and monitor status via ps output. The script also uses a SQL database to track the status of various tasks, so I can track when certain sources finish.
However, that was too much work, so I wrote a wrapper script to execute the script automatically and that is where I am running into problems. I want something exactly the same as I have, but automatic.
The getwork.pl script runs ps and parses its output to find out how many other processes are running; if that count is below the configured threshold, it queries the database for the most out-of-date source and kicks off the script.
The problem is that the kicked-off jobs aren't running properly: sometimes they terminate without any error messages, and sometimes they just hang and sit idle until I kill them.
The getwork script queries SQL and builds the entire execution command via SQL concatenation, so in the SQL query I am doing something like CONCAT('nohup ./script.pl --arg ',param1,' --arg2 ',param2,' &') to get the command string.
I've tried everything to get these kicked off. I've tried using system(), but again, some jobs kick off, some don't; sometimes it gets stuck, sometimes jobs start and then die within a minute. If I take the exact command used to start the job and run it in bash, it works fine.
I've also tried to open a pipe to the command, like
open my $ca, "| $command" or die ($!);
print $ca $command;
close $ca;
That works just about as well as everything else I've tried. The getwork script used to be executed through cron every 30 minutes, but I scrapped that because I needed another shell wrapper script, so now there is an infinite loop in the getwork script that executes a function every 30 minutes.
I've also tried many variations of the execution command, including redirecting output to different files, etc. Nothing seems to be consistent. Any help would be much appreciated, because I am truly stuck here.
EDIT:
Also, I've tried adding separate logging within each script; each process would start a new log file named with its PID ($$). There was a bunch of weirdness there too: all the log files would get created, but then some of the processes would be running and writing to their file, others would just have an empty text file, and some would have only one or two log entries. Sometimes the process would still be running and just not doing anything; other times it would die with nothing in the log. Running the command directly in a shell always works, though.
Thanks in advance
You need some kind of job management framework.
One of the biggest ones is Gearman: http://www.slideshare.net/andy.sh/gearman-and-perl
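Independent of a framework, here is a hedged sketch of a launch command that tends to behave better from a wrapper than a bare nohup ... & (names and arguments are placeholders): detach the child from the controlling terminal with setsid, close stdin, and send each job's output to its own log file so it cannot block on a shared pipe:

# one log file per source keeps the jobs from fighting over nohup.out
setsid ./script.pl --param arg --param2 arg2 </dev/null >>script_arg.log 2>&1 &

# then confirm it is running and detached
ps -ef | grep '[s]cript.pl'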

Progress of a command running in unix

I have a very long command running on a very large file. It involves the sort, uniq, grep and awk commands in a single command line that pipes the results of one command into the next.
Once I issue this command for execution, the command prompt doesn't return back until the command has completely executed.
Is there a way to know the progress of the command, in terms of how much of its execution has completed, or anything similar that gives an idea of how far along a particular command inside this pipeline is?
Without knowing exactly what you're doing I can't say whether or not it would work for you, but have a look at pv. It might fit the bill.
Perl was originally created because AWK wasn't quite powerful enough for the task at hand. With commands like sort and grep, and a syntax very similar to AWK's, it should not be hard to translate a command line using those programs into a short Perl script.
The advantage of Perl is that you can easily communicate the progress of your script via print statements. For example, you could indicate when the input file was done being loaded, when the sort was completed, etc.
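For the pv suggestion, a quick sketch (the file name and pattern are placeholders): inserting pv at the start of the pipeline shows how much of the input file has been read, which serves as a rough progress bar for the whole command:

# pv prints a progress bar, throughput and ETA while feeding the file downstream
pv very_large_file.txt | grep 'pattern' | sort | uniq -c | awk '{print $2, $1}'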
