Synchronizing four shell scripts to run one after another in unix - linux

I have 4 shell script to generate a file(let's say param.txt) which is used by another tool(informatica) and as the tool is done with processing, it deletes param.txt.
The intent here is all four scripts can get invoked at different time lets say 12:10 am, 12:13 am, 12:16 am, 12:17 am. First script runs at 12:10am and creates param.txt and trigger informatica process which uses param.txt. Informatica process takes another 5-10 minutes to complete and deletes the param.txt. The 2nd script invokes at 12:13 am and waits for unavailability of param.txt and as informatica process deletes it, script 2 creates new param.txt and triggers same informatica again. The same happen for another 2 scripts.
I am using Until and sleep commands in all 4 shell script to check the unavailability of param.txt like below:
until [ ! -f "$paramfile" ]
do
Sleep 10
done
<create param.txt file>
Issue here is, sometimes when all 4 scripts begin, the first one succeeds and generates param.txt(as there was no param.txt before) and other waits but when informatica process completes and deletes param.txt, remaining 3 scripts or 2 of them checks the unavailability at same time and one of them creates it but all succeed. I have checked different combinations of sleep interval between four scripts but this situation is occurring almost every time.

You are experiencing a classical race condition. To solve this issue, you need a shared "lock" (or similar) between your 4 scripts.
There are several ways to implement this. One way to do this in bash is by using the flock command, and an agreed-upon filename to use as a lock. The flock man page has some usage examples which resemble this:
(
flock -x 200 # try to acquire an exclusive lock on the file
# do whatever check you want. You are guaranteed to be the only one
# holding the lock
if [ -f "$paramfile" ]; then
# do something
fi
) 200>/tmp/lock-life-for-all-scripts
# The lock is automatically released when the above block is exited
You can also ask flock to fail right away if the lock can't be acquired, or to fail after a timeout (e.g. to print "still trying to acquire the lock" and restart).
Depending on your use case, you could also put the lock on the 'informatica' binary (be sure to use 200< in that case, to open the file for reading instead of (over)writing)

You can use GNU Parallel as a counting semaphore or a mutex, by invoking it as sem instead of as parallel. Scroll down to Mutex on this page.
So, you could use:
sem --id myGlobalId 'create input file; run informatica'
sem --id myGlobalId 'create input file; run informatica'
sem --id myGlobalId 'create input file; run informatica'
sem --id myGlobalId 'create input file; run informatica'
Note I have specified a global id in case you run the jobs from different terminals or cron. This is not necessary if you are starting all jobs from one terminal.

Thanks for your valuable suggestions. It did help me to think from other dimension. However I missed to mention that I am using Solaris UNIX where I couldn't find equivalent of flock or similar function. I could have asked team to install one utility but in mean time I found a workaround for this issue.
I read about the mkdir function being atomic in nature where as 'touch' command to create a file is not(still don't have complete explanation on how it works). That means at a time only 1 script can create/delete directory 'lockdir' out of 4 and other 3 has to wait.
while true;
do
if mkdir "$lockdir"; then
< create param file >
break;
fi
Sleep 30
done

Related

Multiple crontabs needed

So I have looked for a solution to this everywhere and I never got an answer.
How do I make multiple crontabs? I currently am running scripts that interfere and I am 110% sure that if I am able to run multiple crontabs that I will solve this issue. (Yeah I tried everything).
Can I perhaps make multiple users that each have their own crontab? And will those crontabs run at the same time?
Thanks!
There's no exclusivity inherent in multiple crontabs. If two separate crontabs each say to run a script every Monday at 4am, then cron will run both scripts more or less at the same time, on Mondays at 4am.
At a guess, you need locking, so that only one or the other of your interfering scripts runs at a time. flock(1) is a very convenient tool for use in shell scripts.
#!/bin/bash
exec 3> /path/to/lock
flock 3
# something useful here
exit 0 # releases lock
The above acquires an advisory lock, or waits until the lock is freed and then acquires it. To try only once without waiting, you can do the following:
#!/bin/bash
exec 3> /path/to/lock
flock -n 3 || exit 1

Automatic qsub job completion status notification

I have a shell script that calls five other scripts from it. The first script creates 50 qsub jobs in the cluster. Individual job execution time varies from a couple of minutes to an hour. I need to know when all the 50 jobs get finished because after completing all the jobs I need to run the second script. How to find whether all the qsub jobs are completed or not? One possible solution can be using an infinite loop and check job status by using qstate command with job ID. In this case, I need to check the job status continuously. It is not an excellent solution. Is it possible that after execution, qsub job will notify me by itself. Hence, I don't need to monitor frequently job status.
qsub is capable of handling job dependencies, using -W depend=afterok:jobid.
e.g.
#!/bin/bash
# commands to run on the cluster
COMMANDS="script1.sh script2.sh script3.sh"
# intiliaze JOBID variable
JOBIDS=""
# queue all commands
for CMD in $COMMANDS; do
# queue command and store the job id
JOBIDS="$JOBIDS:`qsub $CMD`"
done
# queue post processing, depended on the submitted jobs
qsub -W depend=afterok:$JOBIDS postprocessing.sh
exit 0
More examples can be found here http://beige.ucs.indiana.edu/I590/node45.html
I never heard about how to do that, and I would be really interested if someone came with a good answer.
In the meanwhile, I suggest that you use file tricks. Either your script outputs a file at the end, or you check for the existence of the log files (assuming they are created only at the end).
while [ ! -e ~/logs/myscript.log-1 ]; do
sleep 30;
done

Does a Cron job overwrite itself

I can't seem to get my script to run in parallel every minute via cron on Ubuntu 14.
I have created a cron job which executes every minute. The cron job executes a script that runs much longer than a minute. When a minute expires it seems the new cron execution overwrites the previous execution. Is this correct? Any ideas welcomed.
I need concurrent independent running jobs. The cron job runs a script which queries a mysql database. The idea is to poll a db- if yes execute script in its own process.
cron will not stop a previous execution of a process to start a new one. cron will simply kick off the new process even though the old process is still running.
If you need cron to terminate the previous process, you'll need to modify your script to handle that itself.
You need a locking mechanism to identify that the script is already running.
There are several ways of doing this but you need to be careful to use an atomic method.
I use lock directories as creating a directory is guaranteed to be atomic -
LOCKDIR=/tmp/myproc.lock
if ! mkdir $LOCKDIR >/dev/null 2>&1
then
print -u2 "Processing already running - terminating"
exit 1
fi
trap "rm -rf $LOCKDIR" EXIT
This is a common occurrence. Try adding a check in your script to see if a lockfile already exists. If it does, exit. If not, continue.
Cronjobs are not overrun. They do however have the possibility of overlapping. Unless your script explicitly kills any pre-existing process, it shouldn't be able to stop the previously running script.
However, introducing the concept of lockfiles will save you from all these confusions altogether.

Linux job scheduler launching a script 2 hours after it terminates

I have a script that runs unknown period of time that depends on its input. It can run one hour when little data available, or it can run for 8 hours if much data is to be processed.
I need to run it periodically, particularly 2 hours after previous run was completed.
Is there an utility to do that?
Use 'at' instead of 'cron' and at the end of your script add:
at now +2 hours $*
This means that each occurrence is chained - so if it terminates abnormally the next instance won't be scheduled - but I don't think there's a more robust solution without adding a lot of code/complexity.
I don't like the at solution proposed, so here another solution:
Use cron to launch your every two hours
Upon startup, your application(*) checks if there's a pidfile.
2.1 if it is present, then there may be another instance running: read contents of the file (pid) and see if that pid is the pid of an existing process, a zombie process or something else. If it is the pid of a running, existing process, then exit. If it is the pid of a zombie process then the previous job ended unexpectedly and then you have to delete the pidfile and go to step 3. Otherwise.
After deleting pidfile, you create a new one and put your pid into it. Then proceed to do your job.
*: In order not to add complexity, this application i cited could also be a simple wrapper that spawns your code using exec.
This solution can also be scripted quite easily.
Hope it helps,
SnoopyBBT
If it looks complicated, here is another, dirtier solution:
while true ; do
./your_application
sleep 7200
done
Hope this helps,
SnoopyBBT

Perl call a system command and keep the script running at the same time

I am running a perl script that does a logical check and if certain conditions been met. Example: If it's been over a certain length of time I want to run a system() command on a linux server that runs another script that updates that data. script that updates the file takes 10-15 seconds, with the current amount of files it has to go through, but can be up to 30 seconds during peak times of the month.
I want the perl script to run and if it has to run the system() command, I don't want it to wait for the system() to finish before finishing the rest of the script. What is the best way to go about this?
Thank you
System runs a command in the shell, so you can use all of your shell features, including job control. So just stick & at the end of your command thus:
system "sleep 30 &";
Use fork to create a child process, and then in the child, call your other script using exec instead of system. exec will execute your other script in a separate process and return immediately, which will allow the child to finish. Meanwhile, your parent script can finish what it needs to do and exit as well.
Check this out. It may help you.
There's another good example of how to use fork on this page.
Not intended to be a pun due to publication date, but beware of zombies!
It is a bit tricky, see perlipc for details.
However, as far as I understood your problem, you don't need to maintain any relation between the updater and the caller processes. In this case, it is easier to just "fire and forget":
use strict;
use warnings qw(all);
use POSIX;
# fork child
unless (fork) {
# create a new session
POSIX::setsid();
# fork grandchild
unless (fork) {
# close standard descriptors
open STDIN, '<', '/dev/null';
open STDOUT, '>', '/dev/null';
open STDERR, '>', '/dev/null';
# run another process
exec qw(sleep 10);
}
# terminate child
exit;
}
In this example, sleep 10 don't belong to the process' group anymore, so even killing the parent process won't affect the child.
There's a good tutorial about running external programs from Perl (including running background processes) at http://aaroncrane.co.uk/talks/pipes_and_processes/paper.html

Resources