How to limit job submission depending on location or partition - linux

I am wondering if there is a way to limit job submission on an HPC system depending on the location from which the submission was made.
The thing is, storage for a scratch disk was recently added, so now I have two partitions:
home directory
scratch directory
I want all HPC users to be forced to submit their jobs only from the scratch directory, not from the home directory.
The HPC system uses LSF as its job scheduler. Can job submissions (i.e. bsub) be controlled through LSF so that only jobs submitted from under the scratch directory run on the HPC system?
Thanks in advance.

I don't think there's a way to do this natively, but there is a way to customize LSF submission-time checks to reject jobs submitted from the wrong directory. Take a look at this documentation:
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/chap_sub_exec_controls_lsf_admin.html
In particular, what you want to do is define an esub script which checks for the appropriate submission CWD. Let's say you name your script esub.dircheck; it would look something like this:
#!/bin/bash
# Reject the job if it was submitted from under /home
if [[ $PWD/ = /home/* ]]; then
    echo "Job submission from /home forbidden"
    exit $LSB_SUB_ABORT_VALUE
fi
Now you can place the esub.dircheck into $LSF_SERVERDIR (make sure it's executable by all). Finally, if you want the check to happen for every job submission, set the following parameter in lsf.conf:
LSB_ESUB_METHOD=dircheck
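In concrete terms, the earlier placement step could be as simple as the following (a sketch; it assumes $LSF_SERVERDIR is set in your shell, which it normally is after sourcing LSF's profile.lsf):
cp esub.dircheck "$LSF_SERVERDIR"/
chmod a+rx "$LSF_SERVERDIR"/esub.dircheck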
One final note: I'm just checking that PWD has /home as a prefix in the code above, but you'll probably need something a bit more sophisticated if you want to be sure that the directory you're in really lives under /home, because symbolic links could gum up the prefix check. Take a look at this answer for details.
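For example, a variant of the check that canonicalizes the submission directory first could look like this (a sketch only; it assumes realpath from GNU coreutils is available on the submission hosts):
#!/bin/bash
# Resolve symlinks before the prefix check so a symlinked path into /home is still caught.
real_cwd=$(realpath "$PWD")/
if [[ $real_cwd = /home/* ]]; then
    echo "Job submission from /home forbidden"
    exit $LSB_SUB_ABORT_VALUE
fi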

Related

How can I configure SLURM at the user level (e.g. with something like a ".slurmrc")?

Is there something like .slurmrc for SLURM that would allow each user to set their own defaults for parameters that they would normally specify on the command line?
For example, I run 95% of my jobs on what I'll call our HighMem partition. Since my routine jobs can easily go over the default of 1GB, I almost always request 10GB of RAM. To make the best use of my time, I would like to put the partition and RAM requests in a configuration file so that I don't have to type them in all the time. So, instead of typing the following:
sbatch --partition=HighMem --mem=10G script.sh
I could just type this:
sbatch script.sh
I tried searching for multiple variations on "SLURM user-level configuration" and it seemed that all SLURM-related hits dealt with slurm.conf (a global-level configuration file).
I even tried creating slurm.conf and .slurmrc in my home directory, just in case that worked, but they didn't have any effect on the partition used.
update 1
Yes, I thought about scontrol, but the only configuration file it deals with is global and most parameters in it aren't even relevant for a normal user.
update 2
My supervisor pointed out the SLURM Perl API to me. The last time I looked at it, it seemed too complicated, but this time, after looking at the code in https://github.com/SchedMD/slurm/blob/master/contribs/perlapi/libslurm/perl/t/06-complete.t, it seems it wouldn't be too hard to create a script that behaves similarly to sbatch, reads in a default configuration file, and sets the desired parameters. However, I haven't had any success in setting 'std_out' to a file name that gets written to.
If your example is representative, defining an alias
alias sbatch='sbatch --partition=HighMem --mem=10G'
could be the easiest way. Alternatively, a Bash function could also be used
sbatch() {
    command sbatch --partition=HighMem --mem=10G "$@"
}
Put either of these in your .bash_profile to make it persistent.
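If you would rather pursue the idea from update 2 without going through the Perl API, a small wrapper script (call it mysbatch, say) could read per-user defaults from a file and pass them along. This is only a sketch; the file name ~/.sbatch_defaults and its one-option-per-line format are assumptions of mine, not a SLURM convention:
#!/bin/bash
# Hypothetical per-user defaults file, one sbatch option per line, e.g.:
#   --partition=HighMem
#   --mem=10G
defaults_file="$HOME/.sbatch_defaults"
defaults=()
if [ -f "$defaults_file" ]; then
    while IFS= read -r line; do
        [ -n "$line" ] && defaults+=("$line")
    done < "$defaults_file"
fi
# Defaults go first so anything typed on the command line comes after them.
exec sbatch "${defaults[@]}" "$@"
Whether an explicitly typed option overrides a repeated default depends on how your sbatch version resolves duplicated options, so test that before relying on it.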

SLURM / Sbatch creates many small output files

I am running a pipeline on a SLURM cluster, and for some reason a lot of small files (between 500 and 2000 bytes in size) named along the lines of slurm-XXXXXX.out (where XXXXXX is a number) are being created. I've tried to find out what these files are on the SLURM website, but I can't find any mention of them. I assume they are some sort of in-progress files that the system uses while parsing my pipeline?
If it matters, the pipeline I'm running uses snakemake. I know I've seen these types of files before, without snakemake, but they weren't a big problem back then. I'm afraid that clearing the working directory of these files after each step of the workflow will interrupt in-progress steps, so I'm not doing anything with them at the moment.
What are these files, and how can I suppress their output or, alternatively, delete them after their corresponding job is finished? Did I mess up my workflow somehow, and that's why they are created?
You might want to take a look at the sbatch documentation. The files that you are referring to are essentially SLURM logs as explained there:
By default both standard output and standard error are directed to a
file of the name "slurm-%j.out", where the "%j" is replaced with the
job allocation number.
You can change the filename with the --error=<filename pattern> and --output=<filename pattern> command line options. The filename pattern can contain one or more symbols that will be replaced as explained in the documentation. According to the FAQ, you should be able to suppress standard output and standard error by using the following command line options:
sbatch --output=/dev/null --error=/dev/null [...]
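If you would rather keep the logs but stop them from cluttering the working directory, you can instead point them at a dedicated directory using the replacement symbols, for example %j for the job ID (note that SLURM will not create the logs/ directory for you, so it has to exist beforehand):
sbatch --output=logs/slurm-%j.out --error=logs/slurm-%j.err [...]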

Mysterious find command hogging memory on Linux Mint

I'm running Linux Mint 17 and I notice that every so often my computer slows to a crawl. When I look at top I see "/usr/bin/find / -ignore_readdir_race (..." etc. sucking up most of my memory. It runs for a really long time (several hours), and my guess is that it's an automated indexing process for my hard drive.
I'm working on a project that requires me to have over 6 million audio files on a mounted SSD so another guess is that the filesystem manager is trying to index all these files for quick search. Is that the case? Is there any way to turn it off for the SSD?
The locate command reports data collected for its database by a regular cron task. You can exclude directories from the database, making the task run more quickly. According to updatedb.conf(5)
PRUNEPATHS
A whitespace-separated list of path names of directories which should not be scanned by updatedb(8). Each path name must be exactly in the form in which the directory would be reported by locate(1).
By default, no paths are skipped.
On my Debian machine for instance, /etc/updatedb.conf contains this line:
PRUNEPATHS="/tmp /var/spool /media"
You could modify your /etc/updatedb.conf to add the directories which you want to ignore. Only the top-level directory of a directory tree need be listed; subdirectories are ignored when the parent is ignored.
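For instance, assuming the SSD with the audio files is mounted at /mnt/audio (substitute your actual mount point), the line would become:
PRUNEPATHS="/tmp /var/spool /media /mnt/audio"
The next run of updatedb will then skip that whole tree, at the cost of locate no longer finding anything under it.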
Further reading:
Tip of the day: Speed up `locate`
How do I get mlocate to only index certain directories?
It's a daily cron job that updates databases used by the locate command. See updatedb(8) if you want to learn more. Having six million audio files will likely cause this process to eat up a lot of CPU as it's trying to index your local filesystems.
If you don't use locate, I'd recommend simply disabling updatedb, something like this:
sudo kill -9 <PID>
sudo chmod -x /etc/cron.daily/mlocate
sudo mv /var/lib/mlocate/mlocate.db /var/lib/mlocate/mlocate.db.bak
If all else fails just remove the package.

Make isolated build or any shell command

Please give me a hint towards the simplest and lightest solution for isolating a Linux shell script (usually on Ubuntu, in case that has anything special).
What I mean by isolation:
1. Filesystem - the most important - the script should not be able to access (read) any folders outside its workspace except those I manually configure in some way
2. actually, other types of isolation do not matter
"Soft" isolation is OK, meaning the script may simply fail/abort when trying to access (read) denied paths, but "hard" isolation, where such attempts get "Not found", looks like a cleaner solution.
I do not need any process isolation; the script may use sudo/fakeroot/etc. inside it, but this should not affect the isolation.
Also, I plan to use different isolations inside one workspace.
For example, I have the folders:
a/
b/
include/
target/
I want to run make a giving it access only to "a" (rw), "include" (r) and "target" (rw+sudo),
and make b giving it access only to "b" (rw), "include" (r) and "target" (rw+sudo),
and target will get the results from both A and B, with B allowed to overwrite anything from A - the same as if there were no isolation.
The point of the isolation I'm talking about is to prevent B from reading A (or even knowing that A exists), and vice versa.
Thanks!
Two different users and SSH is a simple way to solve your problem. One of the key benefits is that this will start a "clean" environment in a new shell.
ssh <user_a>@localhost '<path_to_build_script_a>'
ssh <user_b>@localhost '<path_to_build_script_b>'
Users a and b must both be members of the group that owns the common directories.
Note that it's the directory's write permission that decides whether a user can create new files inside that directory.
Edit: 2013-07-29
For lots of sequential isolated builds like in your case, one solution is to do as you have already suggested: automate the file permission changes so that each build only has access to the files and folders it should.
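As a sketch of what the permission layout could look like for the make a / make b example (the user and group names are placeholders; adapt them to your own accounts):
# Two build users and a shared group for the common directories.
sudo groupadd build
sudo useradd -m -G build user_a
sudo useradd -m -G build user_b
# Private source trees: only the owning user may read or write them.
sudo chown -R user_a: a/
sudo chown -R user_b: b/
sudo chmod -R u+rwX,go-rwx a/ b/
# Shared directories: group-readable headers, group-writable target.
sudo chgrp -R build include/ target/
sudo chmod -R g+rX,o-rwx include/
sudo chmod -R g+rwX,o-rwx target/
Then run the builds as above, e.g. ssh user_a@localhost '<path_to_build_script_a>'. Because a/ is closed to the group and to others, user_b cannot read anything inside it, and vice versa; hiding the fact that a/ exists at all would additionally require closing off read permission on the parent workspace directory.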

transferring files with cron?

I am trying to figure out if it is possible to set up a cron job that will transfer all files from one folder on my server to another folder, and then take a set number of files (chosen randomly) from another folder and put them in the original folder. If so, any hints on how to do this? I have no experience with cron at all; I just don't want to have to log in with ftp and do the transfers manually.
Cron is really simple: all it does is run a command of your choice at the specified times of day.
In your case, you probably want to write a shell script that uses rsync, scp or ftp to transfer the files, makes sure the transfer exited successfully (check the exit code, stored in the $? variable), and then moves the set of files into the original folder.
I would use rsync and passwordless authentication via ssh keys. That's good for security, and if you want to, you can even limit the receiving side to only allow that ssh key to run rsync's server side.
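A sketch of such a script (all directory names and the file count are placeholders; it also assumes the cron job runs on the server itself so the moves are local - if not, replace the local calls with rsync over ssh as described above):
#!/bin/bash
# Placeholder paths -- replace with your real directories.
src=/path/to/original
dest=/path/to/destination
pool=/path/to/pool
count=5   # how many random files to pull back into $src
# 1. Move everything from the original folder to the destination folder,
#    and stop if the transfer did not succeed.
rsync -a --remove-source-files "$src"/ "$dest"/ || exit 1
# 2. Pick $count random files from the pool and move them into the original folder.
find "$pool" -maxdepth 1 -type f | shuf -n "$count" | while IFS= read -r f; do
    mv "$f" "$src"/
done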
If this script is called /opt/scripts/myscript.sh, and is to be run once every 10 minutes, add the following to your crontab (run crontab -e to edit your crontab):
*/10 * * * * /opt/scripts/myscript.sh
Remember that the environment variables available in your shell are not the same as those available when the cron job runs, so PATH, etc., may be different. This often causes cron jobs to fail the first few times you run them (see my law on cron jobs: http://efod.se/blog/archive/2010/02/19/forsbergs-law-on-cron-jobs :-)). Any output from cron is sent via mail to the user running the cron job, which is helpful for debugging. Writing simple debug messages to some file in /tmp/ is also often a good way to get your cron jobs running.
In many cases it makes sense to run cron jobs as a special user. Don't run your cron jobs as root unless they absolutely must have root access; it's better to run things as special users that only have limited permissions in the file system.
To edit your cron file:
crontab -e
An example entry for transferring files would look like:
30 3 * * * rsync -av School/* username@server:~/School/ >| /home/username/CronLogs/school_update
the fields are: minute, hour, day, month, day of week, command
So in my example, I transfer files every day at 3:30am by executing the rsync command listed. Note that the *'s mark the fields as unused.
For a quick reference/tutorial, see: this link
