Can multiple qsub submissions read the same group of files?

I would like to use a bash script and qsub to run 30-40 python programs at once.
Each python program reads and searches through the same set of files (~400 total) for a set of sequences.
Could there be a problem when multiple Python programs try to read from the same file at the same time? If so, what are the consequences?

Torque imposes no restrictions on multiple jobs reading from the same files. (I think that statement is likely to be true for any resource manager / scheduler.)
The main issue I can imagine would be your file system's performance and whether or not it can keep up with potentially concurrent accesses to those files.
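For reference, a minimal sketch of the kind of submission loop described in the question (the script name search_sequences.py, the query file names and the resource request are assumptions, not part of the question):

#!/bin/bash
# submit 40 independent jobs; every job reads the same ~400 data files
for i in $(seq 1 40); do
    echo "cd \$PBS_O_WORKDIR; python search_sequences.py query_${i}.txt" \
        | qsub -N search_${i} -l nodes=1:ppn=1
done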

Related

HPC SLURM and batch calls to MPI-enabled application in Master-Worker system

I am trying to implement some sort of Master-Worker system on an HPC cluster with the SLURM resource manager, and I am looking for advice on how to implement such a system.
I have to use some python code that plays the role of the Master, in the sense that between batches of calculations the Master will run 2 seconds of its own calculations, before sending a new batch of work to the Workers. Each Worker must run an external executable over a single node of the HPC. The external executable (Gromacs) is itself MPI-enabled. There will be ~25 Workers and many batches of calculations.
What I have in mind at the moment (also see the EDIT further below), and what I am currently trying:
Allocate via SLURM as many MPI tasks as the number of nodes I want to use, within a bash script that I submit via sbatch run.sh:
#!/bin/bash -l
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
module load required_env_module_for_external_executable
srun python my_python_code.py
Within my_python_code.py, get the current MPI rank, and use rank/node 0 to run the Master Python code:
from mpi4py import MPI

name = MPI.Get_processor_name()
rank = MPI.COMM_WORLD.Get_rank()
size = MPI.COMM_WORLD.Get_size()

if rank == 0:  # Master
    run_initialization_and_distribute_work_to_Workers()
else:          # Workers
    start_Worker_waiting_for_work()
Within the python code of the Workers, start the external (MPI-enabled) application using MPI.COMM_SELF.Spawn()
def start_Worker_waiting_for_work():
    # here we are on a single node
    executable = 'gmx_mpi'
    # Spawn() expects the arguments as a list of strings
    exec_args = ['mdrun', '-deffnm', 'calculation_n']
    # create some relationship between the current MPI rank
    # and the one the executable should use ?
    mpi_info = MPI.Info.Create()
    mpi_info.Set('host', MPI.Get_processor_name())
    commspawn = MPI.COMM_SELF.Spawn(executable, args=exec_args,
                                    maxprocs=1, info=mpi_info)
    commspawn.Barrier()
    commspawn.Disconnect()
    res_analysis = do_some_analysis()  # check what the executable produced
    return res_analysis
What I would like some explanations on:
Can someone confirm that this approach seems valid for implementing the desired system? Or is it obvious that it has no chance of working? If so, why?
I am not sure that MPI.COMM_SELF.Spawn() will make the executable inherit the SLURM resource allocation. If not, how can I fix this? I think MPI.COMM_SELF.Spawn() is what I am looking for, but I am not sure.
The external executable requires some environment modules to be loaded. If they are loaded when sbatch run.sh is submitted, are they still loaded when I invoke MPI.COMM_SELF.Spawn() from my_python_code.py?
As a slightly different approach, is it possible to have something like pre-allocations/reservations to book resources for the Workers, and then use MPI.COMM_WORLD.Spawn() together with those pre-allocations/reservations? The goal is also to avoid entering the SLURM queue at each new batch, as this may waste a lot of clock time (hence the wish to book all required resources at the very beginning).
Since the Python Master always has to stay alive anyway, SLURM job dependencies cannot be useful here, can they?
Thank you so much for any help you may provide !
EDIT: Simplification of the workflow
In an attempt to keep my question simple, I first omitted the fact that I actually had the Workers doing some analysis. But this work can be done on the Master using OpenMP multiprocessing, as Gilles Gouillardet suggested; it executes fast enough.
The Workers are indeed necessary, because each task takes about 20-25 min on a single Worker/Node.
I also added some bits about maintaining my own queue of tasks to be sent to the SLURM queue and ultimately to the Workers, in case the number of tasks exceeds a few tens or hundreds of jobs. This should also provide some flexibility in the future, when re-using this code for different applications.
This is probably fine as it is. I will try to go this way and update these lines. EDIT: It works fine.
At first glance, this looks overly convoluted to me:
there is no communication between a slave and GROMACS
there is some master/slave communication, but is MPI really necessary?
are the slaves really necessary? (e.g. can the master process simply serialize the computation and then directly start GROMACS?)
A much simpler architecture would be to have one process on your frontend that will:
prepare the GROMACS inputs
sbatch gromacs (start several jobs in a row)
wait for the GROMACS jobs to complete
analyze the GROMACS outputs
re-iterate or exit
If the slave is doing some work that you do not want to serialize on the master, can you replace the MPI communications with files on a shared filesystem? In that case, you can do the computation on the compute nodes within a GROMACS job, before and after executing GROMACS. If not, maybe TCP/IP-based communications can do the trick.
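A minimal sketch of that simpler frontend loop (the helper script names and the use of sbatch --wait are assumptions, not something prescribed by the answer above):

#!/bin/bash
# runs on the frontend; one iteration per batch of calculations
while true; do
    python prepare_gromacs_inputs.py || break   # prepare inputs; exit when nothing is left to do
    for n in $(seq 1 25); do
        sbatch --wait gromacs_job_${n}.sh &     # one single-node GROMACS job per batch member
    done
    wait                                        # block until all GROMACS jobs have completed
    python analyze_gromacs_outputs.py           # the Master-side analysis between batches
done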

How to automate executions of the same program in PyCharm?

I have a bunch of datasets that need to be tested, always using the same .py program.
I want to automate the whole testing process, so that after one dataset has been tested and evaluated, the .py program automatically starts testing (and evaluating) the next one.
I'm using the PyCharm IDE, and I tried adding run configurations that execute the same .py program but take different file paths as input.
I was wondering whether there is a tool (or a sequence of commands to run on the command line) that lets me automate the testing process, or that lets me call all the created configurations one after the other.
I'm talking about 6-8 hours per configuration, and I currently have 4 configurations.
Thanks in advance!
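For what it's worth, a minimal sketch of the kind of command-line loop being asked about (the script name evaluate.py and the dataset paths are assumptions; each run starts only after the previous one has finished):

#!/bin/bash
# run the same .py program once per dataset, sequentially (~6-8 h each)
for dataset in data/config1 data/config2 data/config3 data/config4; do
    python evaluate.py "$dataset" > "results_$(basename "$dataset").log" 2>&1
done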

Multiple cronjobs possibly accessing log file at the same time

As the title says: I have multiple cron jobs that launch Python scripts, starting at different times. The scripts do different things and thus take different amounts of time to finish, so it's pretty likely that they will overlap from time to time. During their runtime they write all their output into the same log file. The scripts (and their results) are completely independent of each other.
What happens if multiple scripts try to write to the same logfile at the same time?
If there is a problem, how can I solve it?
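One way to sidestep the question entirely (a sketch, assuming the scripts are launched straight from crontab and log via shell redirection; the paths and times are made up) is to give each job its own log file:

# crontab entries: each script appends to its own log, so the jobs never share a file
0 * * * *   /usr/bin/python /home/user/job_a.py >> /home/user/logs/job_a.log 2>&1
15 * * * *  /usr/bin/python /home/user/job_b.py >> /home/user/logs/job_b.log 2>&1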

Does a PBS batch system move multiple serial jobs across nodes?

If I need to run many serial programs "in parallel" (because the problem is simple but time consuming - I need to read in many different data sets for the same program), the solution is simple if I only use one node. All I do is keep submitting serial jobs with an ampersand after each command, e.g. in the job script:
./program1 &
./program2 &
./program3 &
./program4
which will naturally run each serial program on a different processor. This works well on a login server or standalone workstation, and of course for a batch job asking for only one node.
But what if I need to run 110 different instances of the same program to read 110 different data sets? If I submit to multiple nodes (say 14) with a script which submits 110 ./program# commands, will the batch system run each job on a different processor across the different nodes, or will it try to run them all on the same 8-core node?
I have tried to use a simple MPI code to read different data, but various errors result, with about 100 out of the 110 processes succeeding, and the others crashing. I have also considered job arrays, but I'm not sure if my system supports it.
I have tested the serial program extensively on individual data sets - there are no runtime errors, and I do not exceed the available memory on each node.
No, PBS won't automatically distribute the jobs among nodes for you. But this is a common thing to want to do, and you have a few options.
Easiest, and in some ways most advantageous for you, is to bunch the tasks into 1-node-sized chunks and submit those bundles as individual jobs. This will get your jobs started faster; a 1-node job will normally get scheduled faster than a (say) 14-node job, simply because there are more one-node-sized holes in the schedule than 14-node-sized ones. This works particularly well if all the jobs take roughly the same amount of time, because then doing the division is pretty simple.
If you do want to do it all in one job (say, to simplify the bookkeeping), you may or may not have access to the pbsdsh command; there's a good discussion of it here. This lets you run a single script on all the processors in your job. You then write a script which queries $PBS_VNODENUM to find out which of the nnodes*ppn jobs it is, and runs the appropriate task.
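A minimal sketch of such a task-selection script (the program and data set names are assumptions); it would be launched on every processor of the job with pbsdsh from the main job script:

#!/bin/bash
# selector.sh -- pbsdsh starts one copy of this per processor in the job;
# PBS_VNODENUM numbers those copies 0 .. nnodes*ppn-1
cd $PBS_O_WORKDIR
./program data_set_${PBS_VNODENUM}.dat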
If not pbsdsh, GNU parallel is another tool which can enormously simplify these tasks. It's like xargs, if you're familiar with that, but it will run commands in parallel, including on multiple nodes. So you'd submit your (say) 14-node job and have the first node run a GNU parallel script. The nice thing is that this will do the scheduling for you even if the jobs are not all of the same length. The advice we give to users on our system for using GNU parallel for these sorts of things is here. Note that if GNU parallel isn't installed on your system, and for some reason your sysadmins won't do it, you can set it up in your home directory; it's not a complicated build.
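A sketch of the GNU parallel route, assuming a shared filesystem and data sets named data_set_*.dat (the program name is also an assumption):

#!/bin/bash
#PBS -l nodes=14:ppn=8
cd $PBS_O_WORKDIR
sort -u $PBS_NODEFILE > nodes.txt       # one line per node
# run up to 8 tasks per node, spreading the 110 commands over all 14 nodes
parallel --jobs 8 --sshloginfile nodes.txt --workdir $PBS_O_WORKDIR \
    ./program {} ::: data_set_*.dat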
You should consider job arrays.
Briefly, you insert #PBS -t 0-109 into your shell script (where the range 0-109 can be any integer range you want, but you stated you had 110 data sets) and Torque will:
run 110 instances of your script, allocating each with the resources you specify (in the script with #PBS tags or as arguments when you submit).
assign a unique integer from 0 to 109 to the environment variable PBS_ARRAYID for each job.
Assuming you have access to environment variables within the code, you can just tell each job to run on data set number PBS_ARRAYID.
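A sketch of what the array submission script might look like (the data set naming and the program name are assumptions):

#!/bin/bash
#PBS -t 0-109
#PBS -l nodes=1:ppn=1
# each of the 110 array elements receives its own PBS_ARRAYID (0..109)
cd $PBS_O_WORKDIR
./program data_set_${PBS_ARRAYID}.dat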

Use full processing power with perl

I have a Perl script which runs correctly but only uses 1 core of my 2-core CPU. How can I make it utilise all cores?
I know that I can create threads using threads->new(), but how do I fit that into something like:
my $twig= new XML::Twig::XPath(TwigRoots => {TrdCaptRpt => \&top_level});
$twig->parsefile($file);
where the subroutine is being called by something else.
The standard approach with Perl is to not try to use multiple cores with one invocation of the script, but instead to run jobs in parallel on separate cores.
Yes, you can use threading with Perl, but Perl's threading is (very) heavyweight. To avoid potential race conditions, when you spawn a thread Perl simply copies everything that you have not explicitly shared. Therefore using threading is likely to be much slower than not.
You would need to modify the code of XML::Twig; there is no canned answer for what would need to be done. If you find yourself having to run this script for multiple files, a better and very simple option is to write your script so it can process more than one file at a time. You could do that with threads, or you could do it with a wrapper script that executes two copies of your script at the same time (perhaps with xargs?).
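A sketch of the wrapper-script route with xargs (the script and file names are assumptions; -P 2 keeps two invocations running at a time, one per core):

# process every XML file, running two copies of the Perl script in parallel
ls *.xml | xargs -n 1 -P 2 perl parse_twig.pl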
