linux batch jobs in parallel

linux batch jobs in parallel - linux

I have seven licenses of a particular software. Therefore, I want to start 7 jobs simultaneously. I can do that using '&'. Now, 'wait' command waits till the end of all of those 7 processes to be finished to spawn the next 7. Now, I would like to write the shell script where after I start the first seven, as and when a job gets completed I would like to start another. This is because some of those 7 jobs might take very long while some others get over really quickly. I don't want to waste time waiting for all of them to finish. Is there a way to do this in linux? Could you please help me?
Thanks.

GNU parallel is the way to go. It is designed for launching multiples instances of a same command, each with a different argument retrieved either from stdin or an external file.
Let's say your licensed script is called myScript, each instance having the same options --arg1 --arg2 and taking a variable parameter --argVariable for each instance spawned, those parameters being stored in file myParameters :
cat myParameters | parallel -halt 1 --jobs 7 ./myScript --arg1 --argVariable {} --arg2
Explanations :
-halt 1 tells parallel to halt all jobs if one fails
--jobs 7 will launch 7 instances of myScript
On a debian-based linux system, you can install parallel using :
sudo apt-get install parallel
As a bonus, if your licenses allow it, you can even tell parallel to launch these 7 instances amongst multiple computers.

You could check how many are currently running and start more if you have less than 7:
while true; do
if [ "`ps ax -o comm | grep process-name | wc -l`" -lt 7 ]; then
process-name &
fi
sleep 1
done

Write two scripts. One which restarts a job everytime it is finished and one that starts 7 times the first script.
Like:
script1:
./script2 job1
...
./script2 job7
and
script2:
while(...)
./jobX

I found a fairly good solution using make, which is a part of the standard distributions. See here

Related

How do i activate cron command once within specific time frame?

Basic information about my system: I have a music system where people can schedule songs to start and end at a specific time.
OS: Arch linux
It sets two crons at the moment. One lets say at 1.50 (start time with a command like "play etc") and another set at 3.20 (end time with a command like "end etc").
My setup works perfectly and i can end delete schedules etc etc but i now noticed an issue! If i set the above times and turn the system off (My system is a raspberry pi) and turn back on at lets say 2.00 and i missed the 1.50 deadline, the music doesnt start (obviously) and i want to try make it so no matter what time i turn it on within a range lets say: 1.50 - 3.20 it will start the play command. But it will run the command once!
I looked around and the commands i got was like:
0 1.50-3.20/2 * * * your_command.sh
But thats to run every 2 hours. I want it to run once only between these times?
Thanks!

You could add an additional cron job which starts a script on every reboot. For instance, you could add a line like this to your crontab:
#reboot /home/pi/startplayback.sh
Your startplayback.sh script should check if current time is within the desired period and run the desired command if it is. For example the code below will print PLAY! if the script is run between 1:50 and 3:20. You could replace echo 'PLAY!' by run WHATEVER
#!/bin/bash
current=$(date '+%H%M')
(( current=(10#$current) ))
((current > 150 && current < 320 )) && echo 'PLAY!'
P.S. Don't forget to make your script executable sudo chmod +x startplayback.sh

You might want to look at the at command and its utilities.
SYNOPSIS
at [-q queue] [-f file] [-mldbv] time
at [-q queue] [-f file] [-mldbv] -t [[CC]YY]MMDDhhmm[.SS]
at -c job [job ...]
at -l [job ...]
at -l -q queue
at -r job [job ...]
atq [-q queue] [-v]
atrm job [job ...]
batch [-q queue] [-f file] [-mv] [time]
at is good for scheduling one time jobs to be run at some point in the future. It maintains a queue of these jobs, so you can use it to schedule things with a great variety of different time specifications.
Cron is in my opinion a scheduler for jobs that are to be repeated over and over.
So a quick and dirty example for you:
echo 'ls -lathF' | at now + 1 minute
As expected you will see a job to be run in one minute. Try atq to see the list of jobs.
When the job is done, output will be mailed to your user by default.

I solved the issue by creating a PHP file and load the page on reboot then do its work and redirect back to such and such.

How to control multithreaded background jobs in for loop in shell script

I found that my linux workstation with 12 CPUs had almost stopped to work after I executed a shell script (tcsh) having a for-loop where more than hundreds of loops are executed simultaneously by adding '&' at the end of the command. Is there any way to control the number or executing time for background processes in the for-loop using tcsh?

GNU Parallel is made for this kind of situations.
GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:
GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Have 5 scripts running at any given time

I have a bash script (running under CentOS 6.4) that launches 90 different PHP scripts, ie.
#!/bin/bash
php path1/some_job_1.php&
php path2/some_job_2.php&
php path3/some_job_3.php&
php path4/some_job_4.php&
php path5/some_job_5.php
php path6/some_job_6.php&
php path7/some_job_7.php&
php path8/some_job_8.php&
php path9/some_job_9.php&
php path10/some_job_10.php
...
exit 0
In order to avoid overloading my server, I use the ampersand &, it works, but my goal is to always have 5 scripts running at the same time
Is there a way to achieve this?

This question is popped several times, but I could not find a proper answer for it. I think now I found a good solution!
Unfortunately parallel is not the part of the standard distributions, but make is. It has a switch -j to do makes parallel.
man make(1)]: (more info on make's parallel execution)
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If
there is more than one -j option, the last one is effective. If
the -j option is given without an argument, make will not limit
the number of jobs that can run simultaneously.
So with a proper Makefile the problem could be solved.
.PHONY: all $(PHP_DEST)
# Create an array of targets in the form of PHP1 PHP2 ... PHP90
PHP_DEST := $(addprefix PHP, $(shell seq 1 1 90))
# Default target
all: $(PHP_DEST)
# Run the proper script for each target
$(PHP_DEST):
N=$(subst PHP,,$#); php path$N/some_job_$N.php
It creates 90 of PHP# targets each calls php path#/some_job_#.php. If You run make -j 5 then it will run 5 instance of php parallel. If one finishes it starts the next.
I renamed the Makefile to parallel.mak, I run chmod 700 parallel.mak and I added #!/usr/bin/make -f to the first line. Now it can be called as ./parallel.mak -j 5.
Or even You can use the more sophisticated -l switch:
-l [load], --load-average[=load]
Specifies that no new jobs (commands) should be started if there
are others jobs running and the load average is at least load (a
floating-point number). With no argument, removes a previous load
limit.
In this case make will decide how many jobs can be launched depending on the system's load.
I tested it with ./parallel.mak -j -l 1.0 and run nicely. It started 4 programs in parallel at first contrary -j without args means run as many process parallel as it can.

Use cron and schedule them at the same time.
Or use parallel.

Instance limited cron job

I want to run a cron job every minute that will launch a script. Simple enough there. However, I need to make sure that not more than X number (defined in the script) of instances are ever running. These are queue workers, so if at any minute interval 6 workers are still active, then I would not launch another instance. The script simply launches a PHP script which exits if no job available. Right now I have a shell script that perpetually launches itself every 10 seconds after exit... but there are long periods of time where there are no jobs, and a minute delay is fine. Eventually I would like to have two cron jobs for peak and off-peak, with different intervals.

Make sure you have unique script name.
Then check if 6 instances are already running
if [ $(pgrep '^UNIQUE_SCIPT_NAME$' -c) -lt 6 ]
then
# start my script
else
# do not start my script
fi

I'd say that if you want to iterate as often as every minute, then a process like your current shell script that relaunches itself is what you actually want to do. Just increase the delay from 10 seconds to a minute.
That way, you can also more easily control your delay for peak and off-peak, as you wanted. It would be rather elegant to simply use a shorter delay if the script found something to do the last time it was launched, or a longer delay if it did not find anything.

You could use a script like OneAtATime to guard against multiple simultaneous executions.

This is what i am using in my shell scripts:
echo -n "Checking if job is already running... "
me=`basename $0`
running=$(ps aux | grep ${me} | grep -v .log | grep -v grep | wc -l)
if [ $running -gt 1 ];
then
echo "already running, stopping job"
exit 1
else
echo "OK."
fi;
The command you're looking for is in line 3. Just replace $(me) with your php script name. In case you're wondering about the grep .log part: I'm piping the output into a log file, whose name partially contains the script name, so this way i'm avoiding it to be double-counted.

Is there a variable in Linux that shows me the last time the machine was turned on?

I want to create a script that, after knowing that my machine has been turned on for at least 7h, it does something.
Is this possible? Is there a system variable or something like that that shows me the last time the machine was turned on?

The following command placed in /etc/rc.local:
echo 'touch /tmp/test' | at -t $(date -d "+7 hours" +%m%d%H%M)
will create a job that will run a touch /tmp/test in seven hours.
To protect against frequent reboots and prevent adding multiple jobs you could use one at queue exclusively for this type of jobs (e.g. c queue). Adding -q c to the list of at parameters will place the job in the c queue. Before adding new job you can delete all jobs from c queue:
for job in $(atq -q c | sed 's/[ \t].*//'); do atrm $job; done

You can parse the output of uptime I suppose.

As Pavel and thkala point out below, this is not a robust solution. See their comments!
The uptime command shows you how long the system has been running.
To accomplish your task, you can make a script that first does sleep 25200 (25200 seconds = 7 hours), and then does something useful. Have this script run at startup, for example by adding it to /etc/rc.local. This is a better idea than polling the uptime command to see if the machine has been up for 7 hours (which is comparable to a kid in the backseat of a car asking "are we there yet?" :-))

Just wait for uptime to equal seven hours.
http://linux.die.net/man/1/uptime

I don't know if this is what you are looking for, but uptime command will give you for how many computer was running since last reboot.

$ cut -d ' ' -f 1 </proc/uptime
This will give you the current system uptime in seconds, in floating point format.
The following could be used in a bash script:
if [[ "$(cut -d . -f 1 </proc/uptime)" -gt "$(($HOURS * 3600))" ]]; then
...
fi

Add the following to your crontab:
#reboot sleep 7h; /path/to/job
Either /etc/crontab, /etc/cron.d/, or your users crontab, depending on whether you want to run it as root or the user -- don't forget to put "root" after "#reboot" if you put it in /etc/crontab or cron.d
This has the benefit that if you reboot multiple times, the jobs get cancelled at shut down, so you won't get a bunch of them stacking up if you reboot several times within 7 hours. The "#reboot" time specification triggers the job to be run once when the system is rebooted. "sleep 7h;" waits for 7 hours before running "/path/to/job".

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

linux batch jobs in parallel - linux

You could check how many are currently running and start more if you have less than 7: while true; do if [ "`ps ax -o comm | grep process-name | wc -l`" -lt 7 ]; then process-name & fi sleep 1 done

Write two scripts. One which restarts a job everytime it is finished and one that starts 7 times the first script. Like: script1: ./script2 job1 ... ./script2 job7 and script2: while(...) ./jobX

I found a fairly good solution using make, which is a part of the standard distributions. See here

Related

How do i activate cron command once within specific time frame?

How to control multithreaded background jobs in for loop in shell script

Have 5 scripts running at any given time

Instance limited cron job

Is there a variable in Linux that shows me the last time the machine was turned on?

Categories

Resources