How to chroot a job within Sun Grid Engine? - linux

I have a two-part question about making our cluster more secure against accidental changes and deletions by running cluster jobs. I don't care about restricting the job's read or execute permissions; basically, I want the job to be unable to delete or change any files outside of its initial directory.
1) Is it possible to chroot a job submitted to Sun Grid Engine so the job can only write to the directory specified when the job is submitted?
2) Is it possible to do the above but also allow read-only access to the rest of the system with all paths intact? Essentially, the filesystem should appear normal and complete, but the job can only write and delete within the directory it started in.
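For illustration, the kind of jail asked about in (2) can in principle be built from bind mounts plus chroot, run as root before the job starts (for example from a site-configured prolog or wrapper script). The following is only a sketch: the /jails path and the /work/$JOB_ID job directory are assumptions, and $JOB_ID is the job id Grid Engine exports to the job environment.
# expose the whole filesystem read-only, then bind the job's own directory
# back in read-write, and chroot into the result
JAIL=/jails/$JOB_ID
mkdir -p "$JAIL"
mount --bind / "$JAIL"
mount -o remount,bind,ro "$JAIL"
mount --bind "/work/$JOB_ID" "$JAIL/work/$JOB_ID"   # assumes /work lives on the root filesystem
chroot "$JAIL" /bin/sh -c "cd /work/$JOB_ID && exec ./job_script.sh"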

Related

Weird "Stale file handle, errno=116" on remote cluster after dozens of hours running

I'm now running a simulation code called CMAQ on a remote cluster. I first ran a benchmark test in serial to see the performance of the software. However, the job always runs for dozens of hours and then crashes with the following "Stale file handle, errno=116" error message:
PBS Job Id: 91487.master.cluster
Job Name: cmaq_cctm_benchmark_serial.sh
Exec host: hs012/0
An error has occurred processing your job, see below.
Post job file processing error; job 91487.master.cluster on host hs012/0Unknown resource type REJHOST=hs012.cluster MSG=invalid home directory '/home/shangxin' specified, errno=116 (Stale file handle)
This is very strange because I never modify the home directory and this "/home/shangxin/" is surely my permanent directory where the code is....
Also, in the standard output .log file, the following message is always shown when the job fails:
Bus error
100247.930u 34.292s 27:59:02.42 99.5% 0+0k 16480+0io 2pf+0w
What does this message mean specifically?
I once thought this error was due to the job exhausting the RAM, i.e. an out-of-memory problem. However, when I logged into the compute node during a run and checked memory usage with the "free -m" and "htop" commands, I saw that both RAM and swap usage never exceeded 10%, a very low level, so memory is not the problem.
Because I used "tee" to record the run to a log file, that file can grow to tens of thousands of lines and over 1 MB. To test whether this standard output was overwhelming the cluster, I ran the same job again without the log file. The new job still failed with the same "Stale file handle, errno=116" error after dozens of hours, so the standard output is not the cause either.
I also tried running the job in parallel on multiple cores; it still failed with the same error after dozens of hours.
I am confident the code itself is fine because it finishes successfully on other clusters. The administrator of this cluster is looking into the issue but has not found a specific cause so far.
Has anyone ever run into this weird error? What should we do to fix this problem on the cluster? Any help is appreciated!
On academic clusters, home directories are frequently mounted via NFS on each node in the cluster to give you a uniform experience across all of the nodes. If this were not the case, each node would have its own version of your home directory, and you would have to take explicit action to copy relevant files between worker nodes and/or the login node.
It sounds like the NFS mount of your home directory on the worker node probably failed while your job was running. This isn't a problem you can fix directly unless you have administrative privileges on the cluster. If you need a workaround and cannot wait for the sysadmins to address the problem, you could:
Try using a different network drive on the worker node (if one is available). On clusters I've worked on, there is often scratch space or another NFS mount directly under the root directory /. You might get lucky and find an NFS mount that is more reliable than your home directory.
Have your job work in a temporary directory local to the worker node and write all of its output files and logs to that directory. At the end of the job, have it copy everything back to your home directory on the login node (a sketch of this approach follows at the end of this answer). This can be awkward over ssh if your keys live in your home directory, and may require copying keys into the temporary directory, which is generally a bad idea unless you restrict access to them with file permissions.
Try getting assigned to a different node of the cluster. In my experience, academic clusters often have some nodes that are flakier than others. Depending on local settings, you may be able to request certain nodes directly, or request resources that are only available on stable nodes. If you can track which nodes are unstable and you find your job assigned to one, you can resubmit the job and then cancel the copy running on the unstable node.
The easiest solution is to work with the cluster administrators, but I understand they don't always work on your schedule.
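As a sketch of the second suggestion above, a PBS script along the following lines stages data to node-local disk and only touches the NFS-mounted home directory at the very end. The /scratch/local path, the staging directory, and the result directory name are assumptions, not details from the question:
#!/bin/bash
#PBS -N cmaq_local_scratch
# run in node-local scratch; copy results back to $HOME only once, at the end
SCRATCH=/scratch/local/$USER/$PBS_JOBID
mkdir -p "$SCRATCH"
cp -r "$HOME/cmaq_run/." "$SCRATCH/"        # stage input from the (NFS) home directory
cd "$SCRATCH"
./cmaq_cctm_benchmark_serial.sh > run.log 2>&1
cp -r "$SCRATCH" "$HOME/cmaq_results_$PBS_JOBID"   # the only step that touches NFS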

How to insure rsync gets complete file if remote cron deletes it during transfer

Alternate title: How can I prevent the deletion of a file if rsync (or any other app) currently has the file open for reading?
I have a cron on my live system that dumps a database to a backup file every 30 min.
On my backup system I have a cron that runs rsync to fetch that backup file, also every 30 min.
I thought about creating lock files and using them to tell the remote side whether it's safe to fetch the file, or vice versa. I also thought about having the dump script rotate through a set of file names and write a "name" file that the remote fetches to know which file is "safe" to retrieve. I know I could synchronize the cron jobs, but I'm wondering if there is a better, safer way.
Is there some OS feature I can use to accomplish this?
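For reference, one OS-level feature that sidesteps the race (not something the question itself proposes) is an atomic rename on the dump side: write to a temporary name and rename it into place. rename() is atomic within a filesystem, and an rsync that has already opened the old file keeps reading the old inode, so it always transfers a complete dump. The paths and the dump command below are placeholders:
#!/bin/sh
# dump-side cron script: never expose a half-written backup file
TMP=/var/backups/db.dump.tmp.$$
OUT=/var/backups/db.dump
pg_dump mydb > "$TMP" && mv "$TMP" "$OUT"   # mv = atomic rename on the same filesystem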

Deploying and scheduling changes with Ansible OSS

Please note: I am not interested in any enterprise/for-pay (Tower?) solutions here, only solutions available via Ansible's OSS offering.
OK so I've got my Ansible project configured and working perfectly, woo hoo! Looks something like this:
myansible01.example.com:/opt/ansible/
    site.yml
    fizz.yml
    buzz.yml
    group_vars/
    roles/
        common/
            tasks/
                main.yml
            handlers/
                main.yml
        foos/
            tasks/
                main.yml
            handlers/
                main.yml
There are several things I need to accomplish to get this working in a production environment:
I need to be able to automate the deployment of changes to this project
I need to schedule playbooks to be run, say, every 30 seconds (to ensure all managed nodes are always in compliance)
So my concerns:
How are changes usually deployed to live Ansible projects? Say the project is located at myansible01.example.com:/opt/ansible (my Ansible server). Is it sufficient to simply delete the Ansible project root (rm -rf /opt/ansible) and then copy the latest version of the project (containing the changes) back to the same location? What happens if Ansible is currently running any plays while I perform this "drop-n-swap"?
It looks like the commercial offering (Ansible Tower) has a scheduling feature built in, but the OSS offering does not. How can I schedule Ansible OSS to run plays at certain times? For instance, I might want certain plays to be run every 30 seconds, so as to ensure nodes are always in compliance. Is cron sufficient for this, or is there a more standard approach?
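For reference, the cron approach being asked about would look something like the entries below: two staggered entries because cron's resolution is one minute, with flock to keep overlapping runs from piling up. The default inventory, log path, and lock file are assumptions:
# crontab -e on myansible01.example.com
* * * * * flock -n /tmp/ansible-site.lock ansible-playbook /opt/ansible/site.yml >> /var/log/ansible-site.log 2>&1
* * * * * sleep 30 && flock -n /tmp/ansible-site.lock ansible-playbook /opt/ansible/site.yml >> /var/log/ansible-site.log 2>&1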
For this kind of task you typically want an orchestration engine such as Jenkins to do all your, well, orchestration.
You can set Jenkins to run playbooks on timers or on other events, such as a push to an SCM like git.
Typically a job starts by checking out a tag/branch of our Ansible code base and then applying it to all of our specified servers, so you always know what is being run. If you want, this can simply be the head of master (in git terms) so it always applies the most recent changes. If you also hook this into your SCM repo, then a simple push will cause those changes to be applied to all of your servers.
Because of that immediacy, you might want to consider only doing this on some test servers, and then running some form of testing against them (such as Serverspec) to verify that your changes are good before rolling them out to a production environment.
Jenkins, by default, will not run a job while the same job is already running (or if you are maxed out on executor slots), so you can always be sure that it will only pull the repo (including any changes) after your Ansible run is complete. If you have multiple jobs, you can use blocking to prevent them from running at the same time (and trying to apply potentially different configurations to the servers), but you don't have to worry about a new job starting and pulling the repo into an already running job, as Jenkins separates these into separate workspaces.
We use Jenkins for manual runs of Ansible against our environment, but we also have a "self healing" Jenkins job that simply runs a tagged commit of our Ansible code base against the environment, forcing it back to that known state and preventing natural configuration drift. When we need to do something different in the environment, or are running a slightly newer commit of our code base against it, we can disable the self-healing job until we're happy, and then either re-enable it to put things back or advance the tag that Jenkins uses to the more recent commit.
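As a rough sketch of the "self healing" job described above, the shell step Jenkins runs could be as simple as checking out the pinned tag and re-applying it, relying on Ansible's idempotence to correct drift. The repository URL, tag name, and inventory file are placeholders, not details from the answer:
#!/bin/sh
set -e
rm -rf ansible && git clone git@example.com:ops/ansible.git ansible
cd ansible && git checkout stable-approved          # the tag the self-healing job tracks
ansible-playbook -i production site.yml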

How to run a command in Linux restricting file access in a certain directory?

What I want to do is to process some batch jobs involving scripts or programs that may unexpectedly perform destructive operations (like doing recursive delete on the "/" dir). The scripts and programs are written by amateurs, who are very likely to do stupid things.
To prevent my OS from being destroyed, I want to restrict file access permissions on a certain directory for each job. I do not want the job script or executable to be able to write or delete anything outside their assigned directory (including the directories assigned for other jobs, I don't want any job to disturb the result of others).
Currently, I thought of two plans:
One is to chroot when executing the jobs. But inside a chroot I would not be able to access shell utilities or other installed programs such as ffmpeg, which is not acceptable for my script jobs.
The other is to create a user for each job and run each job as that user. But that way I would have to create thousands of users, and cleaning them up afterwards would be a major headache.
I much prefer the second idea, but I'm looking for a way to avoid creating so many users and having to delete them afterwards.
Put all the tools you need inside a directory and then chroot into it perhaps?
The second option is fine too. You don't have to create many users; just one or two, and all the scripts can run as them (a minimal sketch of this is given at the end of this answer).
Third and probably the best is to spawn off a VM and just ssh into it and run the scripts. If the VM gets ruined, you can always spin up another one.
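A minimal sketch of the shared-user option above (the user name and paths are made up): create one unprivileged account, give it a writable work area per job, and launch each script as that user, so it can read the rest of the system but only write where it owns files.
sudo useradd --system --create-home jobrunner
sudo mkdir -p /srv/jobs/job42
sudo chown jobrunner: /srv/jobs/job42
# the script can read the system normally, but writes succeed only under /srv/jobs/job42
sudo -u jobrunner bash -c 'cd /srv/jobs/job42 && ./run.sh'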
How do your users access your computer?
Do they have to be root? Because otherwise they won't be able to recursively delete /...
Anyway, if it's through ssh, you could configure a separate write-only chrooted environment, like:
sshd_config:
AllowGroups trusted users
# all users get chrooted to a directory
ChrootDirectory /srv/jail
# users from a certain Unix group (let's call it "trusted") get chrooted to /
Match Group trusted
ChrootDirectory /
fstab:
# mount-bind non-sensitive directories (e.g. /home, /var/spool/mail) read-write
/home            /srv/jail/home            none  bind             0 0
/var/spool/mail  /srv/jail/var/spool/mail  none  bind             0 0
# mount-bind sensitive directories read-only (bind first, then remount read-only)
/usr             /srv/jail/usr             none  bind             0 0
/usr             /srv/jail/usr             none  bind,remount,ro  0 0
# etc.
I would personally advise against Docker...

Running mesos-local for testing a framework fails with Permission denied

I am sharing a Linux box with some coworkers, all of them developing in the Mesos ecosystem. The most convenient way to test a framework I am hacking on is usually to run mesos-local.sh (which combines master and slaves in one process).
That works great as long as none of my coworkers does the same. As soon as one of them has used that shortcut, nobody else can, because the master-specific temp files are stored in /tmp/mesos and are owned by the user who ran that instance of Mesos. When another user then tries to run any task from a framework, something like the following happens:
F0207 05:06:02.574882 20038 paths.hpp:344] CHECK_SOME(mkdir): Failed
to create executor directory
'/tmp/mesos/0/slaves/201402051726-3823062160-5050-31807-0/frameworks/201402070505-3823062160-5050-20015-0000/executors/default/runs/d46e7a7d-29a2-4f66-83c9-b5863e018fee'Permission
denied
Unfortunately, mesos-local.sh does not offer a flag for overriding that path whereas mesos-master.sh does via --work_dir=VALUE.
Hence the obvious workaround is to not use mesos-local.sh but to run master and slave as separate instances. Not too convenient, though...
The easiest workaround for this problem, whether you run mesos-master.sh or mesos-local.sh, is to patch the environment setup within bin/mesos-master-flags.sh.
That file is used by both the mesos-master itself and mesos-local, hence it is the perfect place to override the work directory.
Edit bin/mesos-master-flags.sh and add the following to it:
export MESOS_WORK_DIR=/tmp/mesos-"$USER"
Now run bin/mesos-local.sh and you should see something like this at the beginning of its log output:
I0207 05:36:58.791069 20214 state.cpp:33] Recovering state from
'/tmp/mesos-tillt/0/meta'
With that, every user who patches their mesos-master-flags.sh accordingly gets their own personal work directory, and there is no more stepping on each other's feet.
And if you prefer not to patch any files, you can just as well set the environment variable manually when starting that Mesos instance:
MESOS_WORK_DIR=/tmp/mesos-foo bin/mesos-local.sh
