I'd like to do two things in sequence:
Submit a job with sbatch
Once the job has been allocated, retrieve the hostname of the allocated node and, using that name, execute a second command on the host (login) node.
Step 2 is the hard part. I suppose I could write a Python script that polls squeue. Is there a better way? Can I set up a callback that Slurm will execute once a job starts running?
(In case you're curious, my motivation is to launch a Jupyter notebook on a compute node and automatically set up ssh port forwarding as in this blog post.)
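For concreteness, here is roughly what the polling approach I have in mind would look like as a shell script run on the login node (the job script name jupyter.sbatch and port 8888 are placeholders):

# Submit the job; --parsable makes sbatch print only the job ID.
jobid=$(sbatch --parsable jupyter.sbatch)

# Poll squeue until the job is RUNNING, then read the allocated node name.
node=""
while [ -z "$node" ]; do
    sleep 5
    if [ "$(squeue -h -j "$jobid" -o %T)" = "RUNNING" ]; then
        node=$(squeue -h -j "$jobid" -o %N)
    fi
done

# Forward port 8888 on the login node to the notebook on the compute node.
ssh -N -L 8888:localhost:8888 "$node"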
I am using PM2 in cluster mode and have 2 instances of my node.js application running. I have some long-running cron jobs (about 30 seconds each) that I am trying to run. I am placing an if statement before the execution of the cron jobs to ensure that they only run on the first process:
if (process.env.NODE_APP_INSTANCE === '0') { // env values are strings, so compare against '0'
    myCronFunction()
}
The goal was that, since there are two processes and PM2 should be load balancing them, if the cron job executes on process one, then process two would still be available to respond to requests. I am not sure what is going on, whether PM2 is failing to load balance them or something else, but when my cron job executes on instance one, instance two still does not respond to requests until the job on instance one finishes executing.
I'm not sure why that is. It is my understanding that they are supposed to be completely independent of one another.
Anyone have any ideas?
I have a main server, 'A', hosting the Slurm cluster. The setup is working as expected.
I wanted to know if there is a way to submit jobs to that main server remotely from another server, 'B', and get the responses.
This situation arises because I don't want to give the users on 'B' access to the terminal of the main server 'A'.
I have gone through the documentation and FAQs, but unfortunately couldn't find the details.
If you install the Slurm client on server B, copy your slurm.conf to it, and then ensure it has the correct authentication (i.e. the correct Munge key), it should work.
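As a rough sketch (package names and paths assume a Debian/Ubuntu host with the default locations; adjust for your distribution):

# On server B: install the Slurm client commands and Munge.
sudo apt-get install slurm-client munge

# Copy the cluster's slurm.conf and Munge key over from server A.
sudo scp serverA:/etc/slurm/slurm.conf /etc/slurm/
sudo scp serverA:/etc/munge/munge.key /etc/munge/
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl restart munge

# Jobs can now be submitted from B and queried as usual.
sbatch myjob.sh
squeue -u "$USER"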
Is it possible to do that without writing my own daemon? I know Slurm can send you an email for each job, but I would like a single email when I have no more pending or running jobs.
One option is to submit an empty job whose only purpose is to send the email, and ask Slurm to run that job last.
You can do that using the --dependency=singleton option. From the documentation:
singleton: This job can begin execution after any previously launched jobs sharing the same job name and user have terminated.
So you need to give all your jobs the same name (--job-name=commonname), and you should request the minimum resources possible for this final job to make sure it is not delayed further once all your other jobs have finished.
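A sketch of what that looks like (the address is a placeholder, and it assumes mail notifications are configured on the cluster):

# All real jobs share the same name.
sbatch --job-name=commonname job1.sh
sbatch --job-name=commonname job2.sh

# Final notification job: same name, singleton dependency, minimal resources.
# It can only start once every other "commonname" job has terminated,
# so its BEGIN mail is effectively an "all jobs done" notification.
sbatch --job-name=commonname --dependency=singleton \
       --ntasks=1 --time=00:01:00 \
       --mail-type=BEGIN --mail-user=you@example.com \
       --wrap="true"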
On cloudControl, I can either run a local task via a worker or I can run a cronjob.
What if I want to perform a local task on a regular basis (I don't want to call a publicly accessible website)?
I see two possible solutions:
According to the documentation,
"cronjobs on cloudControl are periodical calls to a URL you specify."
So calling the file locally is not possible(?). I would therefore have to create a page I can call via URL, and I would have to check whether the client is on localhost (i.e. the server itself). I would like to avoid this approach.
I make the worker sleep() for the desired amount of time and then make it re-run.
// do some arbitrary action
Foo::doSomeAction();
// e.g. sleep 1 day
sleep(86400);
// restart worker
exit(2);
Which one is recommended?
(Or: Can I simply call a local file via cron?)
The first option is not possible, because the URL request is made from a separate web service.
You could use HTTP authentication in the cron task, but the worker solution is also completely valid.
Just keep in mind that the worker can get migrated to a different server (in case of software updates or hardware failure), so doSomeAction() may occasionally be executed more than once per day.
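For the HTTP authentication route, a minimal illustration (the URL and credentials are placeholders, and whether cloudControl's cron will call a URL with credentials embedded in it is an assumption you would need to verify):

# The protected cron URL would then look like this when called by hand:
curl -u cronuser:secret https://yourapp.cloudcontrolled.com/cron/daily-task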
I configured an Ubuntu server (AWS EC2 instance) as a cron server; 9 cron jobs run between 4:15-7:15 and 21:00-23:00. I wrote a cron job on another system (EC2 instance) to stop this cron server after 7:15 and start it again at 21:00. I want the cron server to stop by itself after the execution of the last script. Is it possible to write such a script?
When you start the temporary instance, specify
--instance-initiated-shutdown-behavior terminate
Then, when the instance has completed all its tasks, simply run the equivalent of
sudo halt
or
sudo shutdown -h now
With the above flag, this will tell the instance that shutting down from inside the instance should terminate the instance (instead of just stopping it).
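For example (the AMI ID, instance type, and key name are placeholders; this uses the modern AWS CLI rather than the old API tools):

# Launch the temporary instance so that an OS-level shutdown terminates it.
aws ec2 run-instances \
    --image-id ami-12345678 \
    --instance-type t3.micro \
    --key-name my-key \
    --instance-initiated-shutdown-behavior terminate

# Last line of the final cron script on that instance:
sudo shutdown -h now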
Yes, you can add an ec2stop command to the end of the last script (sketched below).
You'll need to:
- install the EC2 API tools
- put your AWS credentials on the instance, or create IAM credentials that have authority to stop instances
- get the instance ID, perhaps from the instance metadata
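For example, the final cron script could end with something like this (shown with the modern AWS CLI instead of the API tools; it assumes credentials or an IAM role allowing ec2:StopInstances):

# Look up this instance's own ID from the instance metadata, then stop it.
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 stop-instances --instance-ids "$instance_id"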
Another option is to run the cron jobs as commands from the controlling instance. The main cron job might look like this:
- run the processing instance
- wait for sshd to accept connections
- ssh to the processing instance, running each processing script
- stop the processing instance
This approach gets all the processing jobs done back to back, leaving your instance up for the least amount of time, and you don't have to put credentials on the processing instance.
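A hedged sketch of such a controlling-instance cron job (the instance ID, ssh user, and script paths are placeholders):

#!/bin/bash
INSTANCE_ID=i-0123456789abcdef0

# Start the processing instance and wait until it is running.
aws ec2 start-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
HOST=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
    --query 'Reservations[0].Instances[0].PublicDnsName' --output text)

# Wait for sshd to accept connections.
until ssh -o ConnectTimeout=5 -o BatchMode=yes "ubuntu@$HOST" true 2>/dev/null; do
    sleep 10
done

# Run each processing script back to back, then stop the instance.
ssh "ubuntu@$HOST" '/opt/jobs/job1.sh && /opt/jobs/job2.sh'
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"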
If your use case allows for the instance to be terminated instead of stopped, then you might be able to replace the start/stop cron jobs with EC2 Auto Scaling, which now supports schedules for running instances.
http://docs.amazonwebservices.com/AutoScaling/latest/DeveloperGuide/index.html?scaling_plan.html
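If you go that route, scheduled scaling actions look roughly like this with the AWS CLI (group name, action names, and times are placeholders):

# Scale the group up to one instance every evening...
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name cron-workers \
    --scheduled-action-name start-evening \
    --recurrence "0 21 * * *" \
    --desired-capacity 1

# ...and back down to zero after the morning window.
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name cron-workers \
    --scheduled-action-name stop-morning \
    --recurrence "30 7 * * *" \
    --desired-capacity 0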