How to make sure only one instance of a Bash script is running at a time? - linux

I want to make a shell script that will run at most once at any point in time.
Say, if I run the script and then run it again, how do I make it so that the second run fails with an error while the first is still working? I.e. I need to check whether the script is already running elsewhere before doing anything. How would I go about doing this?
The script I have runs a long-running process (i.e. it runs forever). I wanted to use something like cron to call the script every 15 minutes so that if the process fails, it will be restarted by the next cron run.

You want a pid file, maybe something like this:
pidfile=/path/to/pidfile
if [ -f "$pidfile" ] && kill -0 `cat $pidfile` 2>/dev/null; then
echo still running
exit 1
fi
echo $$ > $pidfile
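If the script can exit cleanly, it is also worth removing the pid file on the way out so a stale file never lingers; a one-line addition (not part of the original answer) placed right after the pid file is written:
# remove the pid file on any normal exit, so a clean shutdown never leaves a stale file
trap 'rm -f "$pidfile"' EXIT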

I think you need to use the lockfile command. See using lockfiles in shell scripts (BASH) or http://www.davidpashley.com/articles/writing-robust-shell-scripts.html.
The second article uses a hand-made lock file and shows how to catch script termination and release the lock; although using lockfile -l <timeout seconds> will probably be a good enough alternative for most cases.
Example of usage without timeout:
lockfile script.lock
<do some stuff>
rm -f script.lock
This ensures that any second instance of the script started while this one is running will wait indefinitely for the file to be removed before proceeding.
If we know that the script should not run for more than X seconds, and script.lock is still there, that probably means the previous instance was killed before it could remove script.lock. In that case we can tell lockfile to force re-create the lock after a timeout (X = 10 below):
lockfile -l 10 /tmp/mylockfile
<do some stuff>
rm -f /tmp/mylockfile
Since lockfile can create multiple lock files, there are parameters that control how long it waits before retrying to acquire the next file it needs (-<sleep before retry, seconds> and -r <number of retries>). There is also a parameter -s <suspend seconds> for the wait time after a lock has been removed by force (which complements the timeout used to decide when to force-break the lock).
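For example, combining these flags (all values below are illustrative):
# retry every 5 seconds, at most 10 times; force-break a lock older than
# 60 seconds, then suspend 5 seconds before continuing
lockfile -5 -r 10 -l 60 -s 5 /tmp/mylockfile
<do some stuff>
rm -f /tmp/mylockfile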

You can use the run-one package, which provides run-one, run-this-one and keep-one-running.
The package: https://launchpad.net/ubuntu/+source/run-one
The blog introducing it: http://blog.dustinkirkland.com/2011/02/introducing-run-one-and-run-this-one.html
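Typical usage looks like this (the commands are illustrative):
# run-one exits immediately if an identical command line is already running
run-one rsync -az /home /backup
# keep-one-running additionally respawns the command whenever it dies
keep-one-running /usr/local/bin/my-sync-daemon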

Write the process id into a file and then when a new instance starts, check the file to see if the old instance is still running.

(
    if ! flock -n 9
    then
        echo 'Not doing the critical operation (lock present).'
        exit 1
    fi
    # critical section goes here
) 9>'/run/lock/some_lock_file'
rm -f '/run/lock/some_lock_file'
Adapted from the example in the flock(1) man page (note the negated test: flock -n returns success when the lock is acquired). Very practical for use in shell scripts.

I just wrote a tool that does this:
https://github.com/ORESoftware/quicklock
Writing a good one takes about 15 LOC, so it's not something you want to include in every shell script.
Basically it works like this:
$ ql_acquire_lock
The above calls this bash function:
function ql_acquire_lock {
    set -e;
    name="${1:-$PWD}"  # the lock name is the first argument; if that is empty, use $PWD
    mkdir -p "$HOME/.quicklock/locks"
    fle=$(echo "${name}" | tr "/" _)
    qln="$HOME/.quicklock/locks/${fle}.lock"
    mkdir "${qln}" &> /dev/null || { echo "${ql_magenta}quicklock: could not acquire lock with name '${qln}'${ql_no_color}."; exit 1; }
    export quicklock_name="${qln}";  # export the var *only if* the mkdir above succeeds
    trap on_ql_trap EXIT;
}
When the script exits, it automatically releases the lock using the trap:
function on_ql_trap {
    echo "quicklock: process with pid $$ was trapped.";
    ql_release_lock
}
To manually release the lock at will, use ql_release_lock (ql_maybe_fail is a small helper it calls):
function ql_maybe_fail {
    if [[ "$1" == "true" ]]; then
        echo -e "${ql_magenta}quicklock: exiting with 1 since fail flag was set for your 'ql_release_lock' command.${ql_no_color}"
        exit 1;
    fi
}

function ql_release_lock () {
    if [[ -z "${quicklock_name}" ]]; then
        echo -e "quicklock: no lockname was defined. (\$quicklock_name was not set).";
        ql_maybe_fail "$1";
        return 0;
    fi
    if [[ "$HOME" == "${quicklock_name}" ]]; then
        echo -e "quicklock: dangerous value set for \$quicklock_name variable..was equal to user home directory, not good.";
        ql_maybe_fail "$1"
        return 0;
    fi
    rm -r "${quicklock_name}" &> /dev/null &&
        { echo -e "quicklock: lock with name '${quicklock_name}' was released."; } ||
        { echo -e "quicklock: no lock existed for lockname '${quicklock_name}'."; ql_maybe_fail "$1"; }
    trap - EXIT # clear/unset trap
}

I suggest using flock, but in a different way than suggested by @Josef Kufner. I think this is quite easy, and flock should be available on most systems by default:
flock -n lockfile myscript.sh
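If you would rather have the script lock itself instead of relying on a wrapper, the flock(1) man page suggests this boilerplate near the top of the script:
#!/bin/bash
# re-exec this script under flock, using the script file itself as the lock;
# the FLOCKER variable guards against re-execing a second time
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :
# ... the rest of the script runs under the exclusive lock ...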

Related

fork wget 's with ability to control specific downloads

I'm writing a bash script, i.e. a download manager.
The point of interest is making these simple lines more advanced:
for link in ${links}; do
    wget -q --show-progress ${link}
done
How do I fork all downloads and give the script user a friendly way to kill one specific download after they have all started?
Does wget -bqc run in parallel or not?
Is there anything to use instead of --show-progress to let the script user see the current status of a specific download?
So the idea is:
# declare associative array [download_url]=pid_of_download
declare -A downloads
# declare array of killed downloads
declare -a killedDownloads=()

function download() {
    local url=$1
    # killed downloads can be reloaded, but not successfully downloaded URLs
    if [ ${downloads[${url}]} ] && [[ ! ${killedDownloads[*]} =~ ${downloads[${url}]} ]]; then
        echo "Already download-[ing/ed] !"
    else
        wget -q ${url} &
        downloads[${url}]=$!
    fi
}
$! contains the process ID of the most recently executed background command, i.e. the PID of the wget command we just started.
When the user wants to kill some download, we print the whole downloads array. While printing, we can also show a status obtained as a substring from
jobs -l | grep ${downloads[$i]}
...and use killedDownloads to know which URLs were actually killed rather than finished:
if [[ ${killedDownloads[*]} =~ ${downloads[$i]} ]]; then
    status="killed"  # mark this entry as killed when printing
fi
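Putting that together, a sketch of a listing function (the function name and output format are my own, not from the question):
function list() {
    local url pid status
    for url in "${!downloads[@]}"; do
        pid=${downloads[$url]}
        status="active"
        if [[ ${killedDownloads[*]} =~ ${pid} ]]; then
            status="killed"
        elif ! ps -p ${pid} > /dev/null; then
            status="finished"
        fi
        echo "${pid}  ${status}  ${url}"
    done
}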
The kill itself:
function stop(){
    local pid=$1
    if ps -p ${pid} > /dev/null; then
        kill -9 ${pid}
        wait ${pid} 2>/dev/null
        echo "Killed downloading of ${pid}"
        killedDownloads+=(${pid})
    else
        echo "No active download with id = ${pid} exists"
    fi
}
And of course you'll need to put this inside some interactive loop, add checks for invalid urls, etc.

How to check if script is running or not from script itself?

Having below sample script sample.sh
#!/bin/bash
if ps aux | grep -o "sample.sh" >/dev/null
then
    echo "Already script running"
    exit 0
fi
echo "start script"
while true
do
    echo "script running"
    sleep 5
done
In the above script I want to check whether this script is already running; if it is, it should not run again.
The problem is that the check condition always evaluates to true (because checking the condition requires running the script), so it always shows me the "Already script running" message.
Any idea how to solve it?
You need a proper lock. I'd do it using flock like this:
exec 201> "/tmp/lock.$(basename "$0").file"
if ! flock -n 201; then
    echo "another instance of $0 is running"
    exit 1
fi
# cmds
exec 201>&-
rm -f "/tmp/lock.$(basename "$0").file"
This basically creates a lock for the script using a temporary file. The temporary file has no particular significance other than being used to tell whether your script has acquired the lock.
When there's an instance of this program running, the next run of the same program can't proceed, because the lock prevents it.
For me it is safer to use a lock file: create it when the process starts and delete it after completion.
Let the script record its own PID in a file. Before doing so, it first checks if that file currently contains an active PID, in which case it exits.
pid=$(< "${PID_FILE:?}") || exit
kill -0 "$pid" && exit
The next exercise is to prevent race conditions when writing the file.
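One common way to close that race is bash's noclobber option, which turns creating the pid file into an atomic test-and-set; a minimal sketch, assuming PID_FILE is set as above:
# with noclobber, the > redirection fails if the file already exists,
# so creating the pid file and claiming the lock is a single atomic step
if ( set -o noclobber; echo $$ > "${PID_FILE}" ) 2>/dev/null; then
    trap 'rm -f "${PID_FILE}"' EXIT
    # ... real work here ...
else
    echo "already running as pid $(cat "${PID_FILE}")"
    exit 1
fi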
Try this; it gives the number of sample.sh processes being run by the user:
ps aux | awk -v app='sample.sh' '$0 ~ app { print $1 }' | grep "$USERNAME" | wc -l
Write a tmp file to the /tmp directory.
Have your script check to see if the file exists; if it does, don't run.
#!/bin/sh

# our tmpfile
tmpfile="/tmp/mytmpfile"

# check to see if it exists.
# if it does then exit script
if [ -f "${tmpfile}" ]; then
    echo "script already running."
    exit
fi

# it doesn't exist at this point so let's make one
touch "${tmpfile}"

# do whatever now.

# end of script
rm "${tmpfile}"
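One weakness of this approach: if the script is killed before reaching the final rm, the stale tmp file will block every future run. A pair of traps set right after the touch (a small addition, not in the original) mitigates that:
# remove the tmp file on exit; trapping INT/TERM makes an interrupted
# run clean up too, since exiting from those handlers fires the EXIT trap
trap 'rm -f "${tmpfile}"' EXIT
trap 'exit 1' INT TERM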

How to store the PID of the lockfile command in Linux

I am using the lockfile command in Linux to manage access to a special file.
When my principal script crashes for some reason, I end up with hanging locks that prevent any new launch of the principal script and heavily disturb its execution.
Is there a way to store the PID of my lockfile processes so I can track them and do proper clean-up before relaunching my principal script?
Hope I was clear enough...
This is a fragile mechanism. I prefer to use real file locks, so that when the process that owns them dies, the OS releases the lock automatically. That is easy to do in Perl (using the flock function), but I don't know if it's possible in Bash.
More to the point, I suppose you could use the lock file itself to hold the PID of the script holding the lock, right?
(I don't do shell scripting much... I think the code below is mostly right, but use at your own risk. There are race conditions.)
while ! lockfile -r 0 lock.file
do
    kill -0 "$(cat lock.file)"
    if [[ $? -ne 0 ]]
    then
        # process doesn't exist anymore
        echo $$ > lock.file
        # do something important
        rm -f lock.file
        break
    fi
    sleep 5
done
Or, how about this:
while true
do
    if [[ ! -e pid.file ]]
    then
        echo $$ > pid.file
    else
        if kill -0 "$(cat pid.file)"
        then
            # owner process exists
            sleep 30
        else
            # process gone, take ownership
            echo $$ > pid.file
            # ### DO SOMETHING IMPORTANT HERE ###
            rm -f pid.file
            break
        fi
    fi
done
I like the second one better. It's still far from perfect (lots of race conditions), but it might work if there aren't too many processes fighting for the lock. Also, the sleep 30 should include some randomness, if possible (the length of the sleep should have a random component).
But see here: it looks like you can use flock with some versions of the shell. That would be similar to what I do in Perl, and it would be safer than the alternatives I can think of.
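For reference, a small flock-based sketch that also records the lock holder's PID (paths and messages are illustrative):
# open in append mode so a failed contender never truncates the holder's record
exec 9>>/var/lock/myscript.lock
if ! flock -n 9; then
    echo "already running as pid $(cat /var/lock/myscript.lock)"
    exit 1
fi
echo $$ > /var/lock/myscript.lock   # safe to overwrite: we hold the lock
# ... critical section; the kernel drops the lock when this process dies ...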

wait child process but get error: 'pid is not a child of this shell'

I wrote a script to get data from HDFS in parallel, then I wait for these child processes in a for loop, but sometimes it returns "pid is not a child of this shell". Sometimes it works well; it's puzzling. I use "jobs -l" to show all the jobs running in the background. I am sure these PIDs are children of the shell process, and I use "ps aux" to make sure these PIDs are not assigned to other processes. Here is my script.
PID=()
FILE=()
let serial=0
while read index_tar
do
    echo $index_tar | grep index > /dev/null 2>&1
    if [[ $? -ne 0 ]]
    then
        continue
    fi
    suffix=`printf '%03d' $serial`
    mkdir input/output_$suffix
    $HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
        && mv input/output_$suffix/index_* input/output_$suffix/index &
    PID[$serial]=$!
    FILE[$serial]=$index_tar
    let serial++
done < file.list

for ((i=0; i<$serial; i++))
do
    wait ${PID[$i]}
    if [[ $? -ne 0 ]]
    then
        LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
        exit 1
    else
        LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
    fi
done
Just find the process ID of the process you want to wait for and substitute it for 12345 in the script below. Further changes can be made as per your requirements.
#!/bin/sh
PID=12345
while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
    sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log
/usr/bin/waitingScript.sh
http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
Either your while loop or your for loop runs in a subshell, which is why you cannot await a child of the (parent, outer) shell.
Edit: this can happen if the while loop or for loop is actually
(a) in a (...) subshell, or
(b) participating in a pipe (e.g. for....done|somepipe).
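To illustrate the pipe case (the commands are placeholders):
# broken: the pipe runs the while loop in a subshell, so the PIDs it
# collects belong to that subshell and the outer shell cannot wait on them
cat file.list | while read f; do process "$f" & PIDS+=($!); done
# works: the redirection keeps the loop in the current shell
while read f; do process "$f" & PIDS+=($!); done < file.list
wait "${PIDS[@]}"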
If you're running this in a container of some sort, the condition apparently can be caused by a bug in bash that is easier to encounter in a containerized environment.
From my reading of the bash source (specifically see comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like in their effort to optimize their tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you should happen to want to wait on a pruned process, the shell can't find it in the history and assumes this to mean that it never knew about it (i.e., that it "is not a child of this shell").

Using named pipes with bash - Problem with data loss

I did some searching online and found simple 'tutorials' on using named pipes. However, when I do anything with background jobs I seem to lose a lot of data.
[[Edit: found a much simpler solution, see reply to post. So the question I put forward is now academic - in case one might want a job server]]
Using Ubuntu 10.04 with Linux 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:52:42 UTC 2010 x86_64 GNU/Linux
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu).
My bash function is:
function jqs
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT SIGKILL

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        if read txt <"$pipe"
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            fi
        fi
    done
}
I run this in the background:
> jqs&
[1] 5336
And now I feed it:
for i in 1 2 3 4 5 6 7 8
do
    (echo aaa$i > /tmp/__job_control_manager__ && echo success$i &)
done
The output is inconsistent.
I frequently don't get all success echoes.
I get at most as many 'new text' echoes as success echoes, sometimes fewer.
If I remove the '&' from the 'feed', it seems to work, but I am blocked until the output is read. Hence me wanting to let sub-processes get blocked, but not the main process.
The aim being to write a simple job control script so I can run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run.
Full job manager below:
function jq_manage
{
    export __gn__="$1"
    pipe=/tmp/__job_control_manager_"$__gn__"__
    trap "rm -f $pipe" EXIT
    trap "break" SIGKILL

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        date
        jobs
        if (($(jobs | egrep "Running.*echo '%#_Group_#%_$__gn__'" | wc -l) < $__jN__))
        then
            echo "Waiting for new job"
            if read new_job <"$pipe"
            then
                echo "new job is [[$new_job]]"
                if [[ "$new_job" == 'quit' ]]
                then
                    break
                fi
                echo "In group $__gn__, starting job $new_job"
                eval "(echo '%#_Group_#%_$__gn__' > /dev/null; $new_job) &"
            fi
        else
            sleep 3
        fi
    done
}
function jq
{
    # __gn__ = first parameter to this function, the job group name (the pool within which to allocate __jN__ jobs)
    # __jN__ = second parameter to this function, the maximum number of jobs to run concurrently
    export __gn__="$1"
    shift
    export __jN__="$1"
    shift

    export __jq__=$(jobs | egrep "Running.*echo '%#_GroupQueue_#%_$__gn__'" | wc -l)
    if (( __jq__ < 1 ))
    then
        eval "(echo '%#_GroupQueue_#%_$__gn__' > /dev/null; jq_manage $__gn__) &"
    fi

    pipe=/tmp/__job_control_manager_"$__gn__"__
    echo $* > $pipe
}
Calling
jq <name> <max processes> <command>
jq abc 2 sleep 20
will start one process.
That part works fine. Starting a second one: fine.
One by one by hand, they seem to work fine.
But starting 10 in a loop seems to lose data, as in the simpler example above.
Any hints as to what I can do to solve this apparent loss of IPC data would be greatly appreciated.
Regards,
Alain.
Your problem is the if statement below:
while true
do
    if read txt <"$pipe"
    ....
done
What is happening is that your job queue server is opening and closing the pipe each time around the loop. This means that some of the clients are getting a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.
To fix this, change the loop in the server to open the pipe once for the entire loop:
while true
do
    if read txt
    ....
done < "$pipe"
Done this way, the pipe is opened once and kept open.
You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.
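For example (some_long_job stands in for whatever the loop launches):
# redirect the job's stdin away from the fifo so it cannot consume
# messages meant for the queue manager
some_long_job < /dev/null &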
Edit: With the problem now being that you are getting EOF on your reads when the last client closes the pipe, you can use jilles' method of duping the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:
while true
do
    if read txt
    ....
done < "$pipe" 3> "$pipe"
This will hold the write side of the pipe open on fd 3. The same caveat applies to this file descriptor as to stdin. You will need to close it so any child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.
As said in other answers you need to keep the fifo open at all times to avoid losing data.
However, once all writers have gone away after the fifo has been opened (so there was a writer), reads return immediately (and poll() returns POLLHUP). The only way to clear this state is to reopen the fifo.
POSIX does not provide a solution to this but at least Linux and FreeBSD do: if reads start failing, open the fifo again while keeping the original descriptor open. This works because in Linux and FreeBSD the "hangup" state is local to a particular open file description, while in POSIX it is global to the fifo.
This can be done in a shell script like this:
while :; do
    exec 3<tmp/testfifo
    exec 4<&-
    while read x; do
        echo "input: $x"
    done <&3
    exec 4<&3
    exec 3<&-
done
Just for those who might be interested: [[re-edited]] following comments by camh and jilles, here are two new versions of the test server script.
Both versions now work exactly as hoped.
camh's version for pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        fi
    done 3< "$pipe" 4> "$pipe" # 4 is just to keep the pipe opened so any real client does not end up causing read to return EOF
}
jille's version for pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    exec 3< "$pipe"
    exec 4<&-

    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        else
            # Close the pipe and reconnect it so that the next read does not end up returning EOF
            exec 4<&3
            exec 3<&-
            exec 3< "$pipe"
            exec 4<&-
        fi
    done
}
Thanks to all for your help.
As camh & Dennis Williamson say, don't break the pipe.
Now I have smaller examples, direct on the command line:
Server:
(
    for i in {0,1,2,3,4}{0,1,2,3,4,5,6,7,8,9}; do
        if read s; then
            echo ">>$i--$s//"
        else
            echo "<<$i"
        fi
    done < tst-fifo
)&
Client:
(
    for i in {%a,#b}{1,2}{0,1}; do
        echo "Test-$i" > tst-fifo
    done
)&
One can replace the key line with:
(echo "Test-$i" > tst-fifo&);
All client data sent to the pipe gets read, though with option two of the client one may need to start the server a couple of times before all data is read.
But although the read waits for data in the pipe to start with, once data has been pushed, it reads the empty string forever.
Any way to stop this?
Thanks for any insights again.
On the one hand the problem is worse than I thought:
Now there seems to be a case in my more complex example (jq_manage) where the same data is being read over and over again from the pipe (even though no new data is being written to it).
On the other hand, I found a simple solution (edited following Dennis' comment):
function jqn # compute the number of jobs running in that group
{
    __jqty__=$(jobs | egrep "Running.*echo '%#_Group_#%_$__groupn__'" | wc -l)
}

function jq
{
    __groupn__="$1"; shift # job group name (the pool within which to allocate $__jmax__ jobs)
    __jmax__="$1"; shift   # maximum number of jobs to run concurrently

    jqn
    while (( __jqty__ >= __jmax__ ))
    do
        sleep 1
        jqn
    done

    eval "(echo '%#_Group_#%_$__groupn__' > /dev/null; $*) &"
}
Works like a charm.
No socket or pipe involved.
Simple.
run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run
You can do this with GNU Parallel. You will not need to write this scripting yourself.
http://www.gnu.org/software/parallel/man.html#options
You can set --max-procs ("Number of jobslots. Run up to N jobs in parallel."). There is an option to set the number of CPU cores you want to use. You can save the list of executed jobs to a log file, but that is a beta feature.
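For example, something along these lines (the file and log names are illustrative):
# run at most 10 downloads at a time, queuing the rest; --joblog records
# each job's command, runtime and exit status
parallel -j 10 --joblog jobs.log wget -q {} :::: urls.txt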
