Instance limited cron job

Instance limited cron job - linux

I want to run a cron job every minute that will launch a script. Simple enough there. However, I need to make sure that not more than X number (defined in the script) of instances are ever running. These are queue workers, so if at any minute interval 6 workers are still active, then I would not launch another instance. The script simply launches a PHP script which exits if no job available. Right now I have a shell script that perpetually launches itself every 10 seconds after exit... but there are long periods of time where there are no jobs, and a minute delay is fine. Eventually I would like to have two cron jobs for peak and off-peak, with different intervals.

Make sure you have unique script name.
Then check if 6 instances are already running
if [ $(pgrep '^UNIQUE_SCIPT_NAME$' -c) -lt 6 ]
then
# start my script
else
# do not start my script
fi

I'd say that if you want to iterate as often as every minute, then a process like your current shell script that relaunches itself is what you actually want to do. Just increase the delay from 10 seconds to a minute.
That way, you can also more easily control your delay for peak and off-peak, as you wanted. It would be rather elegant to simply use a shorter delay if the script found something to do the last time it was launched, or a longer delay if it did not find anything.

You could use a script like OneAtATime to guard against multiple simultaneous executions.

This is what i am using in my shell scripts:
echo -n "Checking if job is already running... "
me=`basename $0`
running=$(ps aux | grep ${me} | grep -v .log | grep -v grep | wc -l)
if [ $running -gt 1 ];
then
echo "already running, stopping job"
exit 1
else
echo "OK."
fi;
The command you're looking for is in line 3. Just replace $(me) with your php script name. In case you're wondering about the grep .log part: I'm piping the output into a log file, whose name partially contains the script name, so this way i'm avoiding it to be double-counted.

Related

how to kill log running jobs in every one hour interval?

I want to search all the jobs which are running more that one hour. kill them. Then sleep for 60 mins. Again search if any job is running more than 60? loop the process.

If you want to find the PIDs for the processes running for more than 60 minutes on your linux box you can use a very simple and basic bash script like the one bellow:
#!/bin/sh
MIN=60
SEC=$((MIN*60))
ps -eo etimes=,pid= | while read sec pid; do
if [ ${sec} -gt ${SEC} ]; then
echo ${pid}
#kill -9 ${pid} # remove the # at the beginning of the line to actually kill those processes
fi
done
This will display the PIDs of the running processes, one per line
Assuming you name this script 60min.sh, you can run it every 60 minute using a cron job:
0 * * * * /bin/bash /path_to/60min.sh
This cron job will run your 60min.sh script every 60 minutes (or every hour)
Please keep in mind that you might accidentally kill system processes and your system might become unstable or unusable so you will have to reboot.
If you run different processes using a specific linux user I would recommend you to search the processes beloging to that user only and not to user root.

Linux - Run script after time period expires

I have a small NodeJS script that does some processing. Depending on the amount of data needing to be processed, this can take a couple of seconds to hours.
What I want is to do is schedule this command to run every hour after the previous attempt has completed. I'm wary of using something like cron because I need to ensure that two instances of the script aren't running at the same.

If you really don't like cron (or at) you can just use a simple bash script:
#!/bin/bash
while true
do
#Do something
echo Invoke long-running node.js script
#Wait an hour
sleep 3600
done
The (obvious) drawback is that you will have to make it run in background somehow (i.e. via nohup or screen) and add a proper error handling (taking that you script might fail, and you still want it to run again in an hour).
A bit more elaborate "custom script" solution might be like that:
#!/bin/bash
#Settings
LAST_RUN_FILE=/var/run/lock/hourly.timestamp
FLOCK_LOCK_FILE=/var/run/lock/hourly.lock
FLOCK_FD=100
#Minimum time to wait between two job runs
MIN_DELAY=3600
#Welcome message, parameter check
if [ -z $1 ]
then
echo "Please specify the command (job) to run, as follows:"
echo "./hourly COMMAND"
exit 1
fi
echo "[$(date)] MIN_DELAY=$MIN_DELAY seconds, JOB=$#"
#Set an exclusive lock, or skip execution if it is already set
eval "exec $FLOCK_FD>$FLOCK_LOCK_FILE"
if ! flock -n $FLOCK_FD
then
echo "Lock is already set, skipping execution."
exit 0
fi
#Last run timestamp
if ! [ -e $LAST_RUN_FILE ]
then
echo "Timestamp file ($LAST_RUN_FILE) is missing, creating a new one."
echo 0 >$LAST_RUN_FILE
fi
#Compute delay, and wait
let DELAY="$MIN_DELAY-($(date +%s)-$(cat $LAST_RUN_FILE))"
if [ $DELAY -gt 0 ]
then
echo "Waiting for $DELAY seconds, before proceeding..."
sleep $DELAY
fi
#Proceed with an actual task
echo "[$(date)] Running the task..."
echo
"$#"
#Update the last run timestamp
echo
echo "Done, going to update the last run timestamp now."
date +%s >$LAST_RUN_FILE
This will do 2 things:
Set an exclusive execution lock (with flock), so that no two instances of the job will run at the same time, irregardless of how you start them (manually or via cron e.t.c.);
If the last job was completed less then MIN_DELAY seconds ago,
it will sleep for the remaining time, before running the job again;
Now, if you schedule this script to run, say every 15 minutes with cron, like that:
* * * * * /home/myuser/hourly my_periodic_task and it's arguments
It will be guaranteed to execute with the fixed delay of at least MIN_DELAY (one hour) since the last job completed, and any intermediate runs will be skipped.
In the worst case, it will execute in MIN_DELAY + 15 minutes,
(as the scheduling period is discrete), but never earlier than that.
Other non-cron scheduling methods should work too (i.e. just running this script in a loop, or re-scheduling and each run with at).

You can use a cron and add process.exit(0) to your node script

How to kill a process on no output for some period of time

I've written a program that is suppose to run for a long time and it outputs the progress to stdout, however, under some circumstances it begins to hang and the easiest thing to do is to restart it.
My question is: Is there a way to do something that would kill the process only if it had no output for a specific number of seconds?
I have started thinking about it, and the only thing that comes to mind is something like this:
./application > output.log &
tail -f output.log
then create script which would look at the date and time of the last modification on output.log and restart the whole thing.
But it looks very tedious, and i would hate to go through all that if there were an existing command for that.

As far as I know, there isn't a standard utility to do it, but a good start for a one-liner would be:
timeout=10; if [ -z "`find output.log -newermt #$[$(date +%s)-${timeout}]`" ]; then killall -TERM application; fi
At least, this will avoid the tedious part of coding a more complex script.
Some hints:
Using the find utility to compare the last modification date of the output.log file against a time reference.
The time reference is returned by date utility as the current time in seconds (+%s) since EPOCH (1970-01-01 UTC).
Using bash $[] operation to subtract the $timeout value (10 seconds on the example)
If no output is returned from the above find, then the file wasn't changed for more than 10 seconds. This will trigger a true in the if condition and the killall command will be executed.
You can also set an alias for that, using:
alias kill_application='timeout=10; if [ -z "`find output.log -newermt #$[$(date +%s)-${timeout}]`" ]; then killall -TERM application; fi';
And then use it whenever you want by just issuing the command kill_application
If you want to automatically restart the application without human intervention, you can install a crontab entry to run every minute or so and also issue the application restart command after the killall (Probably you may also want to change the -TERM to -KILL, just in case the application becomes unresponsive to handleable signals).

The inotifywait could help here, it efficiently waits for changes to files. The exit status can be checked to identify if the event (modify) occurred in the specified interval of time.
$ inotifywait -e modify -t 10 output.log
Setting up watches.
Watches established.
$ echo $?
2
Some related info from man:
OPTIONS
-e <event>, --event <event>
Listen for specific event(s) only.
-t <seconds>, --timeout <seconds>
Exit if an appropriate event has not occurred within <seconds> seconds.
EXIT STATUS
2 The -t option was used and an event did not occur in the specified interval of time.
EVENTS
modify A watched file or a file within a watched directory was written to.

linux batch jobs in parallel

I have seven licenses of a particular software. Therefore, I want to start 7 jobs simultaneously. I can do that using '&'. Now, 'wait' command waits till the end of all of those 7 processes to be finished to spawn the next 7. Now, I would like to write the shell script where after I start the first seven, as and when a job gets completed I would like to start another. This is because some of those 7 jobs might take very long while some others get over really quickly. I don't want to waste time waiting for all of them to finish. Is there a way to do this in linux? Could you please help me?
Thanks.

GNU parallel is the way to go. It is designed for launching multiples instances of a same command, each with a different argument retrieved either from stdin or an external file.
Let's say your licensed script is called myScript, each instance having the same options --arg1 --arg2 and taking a variable parameter --argVariable for each instance spawned, those parameters being stored in file myParameters :
cat myParameters | parallel -halt 1 --jobs 7 ./myScript --arg1 --argVariable {} --arg2
Explanations :
-halt 1 tells parallel to halt all jobs if one fails
--jobs 7 will launch 7 instances of myScript
On a debian-based linux system, you can install parallel using :
sudo apt-get install parallel
As a bonus, if your licenses allow it, you can even tell parallel to launch these 7 instances amongst multiple computers.

You could check how many are currently running and start more if you have less than 7:
while true; do
if [ "`ps ax -o comm | grep process-name | wc -l`" -lt 7 ]; then
process-name &
fi
sleep 1
done

Write two scripts. One which restarts a job everytime it is finished and one that starts 7 times the first script.
Like:
script1:
./script2 job1
...
./script2 job7
and
script2:
while(...)
./jobX

I found a fairly good solution using make, which is a part of the standard distributions. See here

Is there a variable in Linux that shows me the last time the machine was turned on?

I want to create a script that, after knowing that my machine has been turned on for at least 7h, it does something.
Is this possible? Is there a system variable or something like that that shows me the last time the machine was turned on?

The following command placed in /etc/rc.local:
echo 'touch /tmp/test' | at -t $(date -d "+7 hours" +%m%d%H%M)
will create a job that will run a touch /tmp/test in seven hours.
To protect against frequent reboots and prevent adding multiple jobs you could use one at queue exclusively for this type of jobs (e.g. c queue). Adding -q c to the list of at parameters will place the job in the c queue. Before adding new job you can delete all jobs from c queue:
for job in $(atq -q c | sed 's/[ \t].*//'); do atrm $job; done

You can parse the output of uptime I suppose.

As Pavel and thkala point out below, this is not a robust solution. See their comments!
The uptime command shows you how long the system has been running.
To accomplish your task, you can make a script that first does sleep 25200 (25200 seconds = 7 hours), and then does something useful. Have this script run at startup, for example by adding it to /etc/rc.local. This is a better idea than polling the uptime command to see if the machine has been up for 7 hours (which is comparable to a kid in the backseat of a car asking "are we there yet?" :-))

Just wait for uptime to equal seven hours.
http://linux.die.net/man/1/uptime

I don't know if this is what you are looking for, but uptime command will give you for how many computer was running since last reboot.

$ cut -d ' ' -f 1 </proc/uptime
This will give you the current system uptime in seconds, in floating point format.
The following could be used in a bash script:
if [[ "$(cut -d . -f 1 </proc/uptime)" -gt "$(($HOURS * 3600))" ]]; then
...
fi

Add the following to your crontab:
#reboot sleep 7h; /path/to/job
Either /etc/crontab, /etc/cron.d/, or your users crontab, depending on whether you want to run it as root or the user -- don't forget to put "root" after "#reboot" if you put it in /etc/crontab or cron.d
This has the benefit that if you reboot multiple times, the jobs get cancelled at shut down, so you won't get a bunch of them stacking up if you reboot several times within 7 hours. The "#reboot" time specification triggers the job to be run once when the system is rebooted. "sleep 7h;" waits for 7 hours before running "/path/to/job".

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Instance limited cron job - linux

Make sure you have unique script name. Then check if 6 instances are already running if [ $(pgrep '^UNIQUE_SCIPT_NAME$' -c) -lt 6 ] then # start my script else # do not start my script fi

You could use a script like OneAtATime to guard against multiple simultaneous executions.

Related

how to kill log running jobs in every one hour interval?

Linux - Run script after time period expires

How to kill a process on no output for some period of time

linux batch jobs in parallel

Is there a variable in Linux that shows me the last time the machine was turned on?

Categories

Resources