I have a bash script that is scheduled, yet sometimes it ends up being executed twice. So I added a few lines of code to check whether the script is already running. Initially this worked, but in the last three days the problem has come back:
PID=`echo $$`
PROCESS=${SL_ROOT_FOLDER}/xxx/xxx/xxx_xxx_$PID.txt
ps auxww | grep $scriptToVerify | grep -v $$ | grep -v grep > $PROCESS
num_proc=`awk -v RS='\n' 'END{print NR}' $PROCESS`
if [ $num_proc -gt 1 ];
then
sl_log "---------------------------Warning---------------------------"
sl_log "$scriptToVerify already executed"
sl_log "num proc $num_proc"
sl_log "--------"
sl_log $PROCESS
sl_log "--------"
exit 0;
fi
This way I check how many rows end up in that file; if the result is more than one, there are two processes running and one of them will be stopped.
This method doesn't work reliably, though. How can I fix my code so that it correctly checks how many instances of the script are running?
Anything that involves:
read some state information
check results
do action based on results
finish
must do the read, check, and act steps atomically (as one uninterruptible unit), otherwise there is a "race" condition. For example:
(A) reads state
(A) checks results (ok)
(A) does action (ok)
(A) finishes
(B) reads state
(B) checks results (bad)
(B) does action (bad)
(B) finishes
but if timing is slightly different:
(A) reads state
(A) checks results (ok)
(B) reads state
(B) checks results (ok)
(A) does action (ok)
(B) does action (ok)
(A) finishes
(B) finishes
The usual example people give is updating bank balances.
Using your method you may be able to reduce how often your code runs when it shouldn't, but you can never rule it out entirely.
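To make the window concrete, here is a toy sketch (the marker file /tmp/naive.running and the sleeps are purely illustrative) of a naive check-then-act guard started twice at almost the same moment:
#!/bin/bash
naive() {
    if ! [ -e /tmp/naive.running ]; then   # read state + check the result
        sleep 0.1                          # window in which the other copy passes the same check
        touch /tmp/naive.running           # act on a result that is now stale
        echo "$BASHPID thinks it is the only instance"
        sleep 1
        rm -f /tmp/naive.running
    fi
}
naive & naive &
wait
Both copies will normally print the message; shrinking the window (for example by counting processes instead of testing a file) only makes the race less likely, never impossible.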
A better solution is to use locking. This guarantees that only one process can run at a time. For example, using flock, you can wrap all calls to your script:
flock -x /var/lock/myscript-lockfile myscript
Or, inside your script, you can do something like:
exec 300>/var/lock/myscript-lockfile
flock -x 300
# rest of script
flock -u 300
or:
exec 300>/var/lock/myscript-lockfile
if ! flock -nx 300; then
# another process is running
exit 1
fi
# rest of script
flock -u 300
From the flock(1) man page example:
#!/bin/bash
# Makes sure we exit if flock fails.
set -e
(
# Wait for lock on /var/lock/.myscript.exclusivelock (fd 200) for 10 seconds
flock -x -w 10 200
# Do stuff
) 200>/var/lock/.myscript.exclusivelock
This ensures that the code between "(" and ")" is run by only one process at a time, and that a process does not wait too long for the lock.
Credit goes to Alex B.
I want to trigger curl requests every 400 ms in a shell script, put the results in a variable, and after the curl requests are finished (e.g. 10 requests) finally write all the results to a file. I use the following code for this purpose:
result="$(curl --location --request GET 'http://localhost:8087/say-hello')" & sleep 0.400;
Because & creates a new process, the result never makes it into the variable. So what should I do?
You can use the -m curl option instead of the sleep.
-m, --max-time <seconds>
Maximum time in seconds that you allow the whole operation to
take. This is useful for preventing your batch jobs from hang‐
ing for hours due to slow networks or links going down. See
also the --connect-timeout option.
The difference can be seen in the next sequence of commands:
a=1; a=$(echo 2) ; sleep 1; echo $a
2
and with a background process
a=1; a=$(echo 2) & sleep 1; echo $a
[1] 973
[1]+ Done a=$(echo 2)
1
Why is a not changed in the second case?
Actually it is changed... in a new environment. The & creates a new process with its own a, and that a is assigned the value 2. When the process finishes, the variable a of that subprocess is discarded and you only see the original value of a.
Depending on your requirements you might want to make a result directory, have every background curl process write to a different temp file, wait with wait until all curls are finished, and then collect your results.
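A minimal sketch of that approach (the URL and the 10-request count come from the question; the directory and file names are placeholders):
#!/bin/bash
# Each response goes into its own temp file; merge everything after all requests finish.
resultdir=$(mktemp -d)
for i in $(seq 1 10); do
    curl --silent --location --request GET 'http://localhost:8087/say-hello' > "$resultdir/result_$i" &
    sleep 0.400                      # keep the 400 ms spacing between launches
done
wait                                 # block until every background curl has exited
cat "$resultdir"/result_* > results.txt
rm -r "$resultdir"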
I have a Linux command that can be called by another application multiple times (in quick succession) with different parameters. The problem is that if the command is executed in too quick a succession, the function it performs will not work properly.
What I’m looking for is some simple way to ensure that each call to the command will be properly delayed/spaced (by a couple milliseconds) from each other.
Order of execution does not matter in this case and I have no control over how the application makes the calls.
Edit: The command being called is used to transmit an RF signal on a Raspberry Pi. As such, the command execution must be exclusive (no concurrency) with an additional delay between executions to prevent the receivers from misreading the signals.
For anyone with the same problem, this worked for me: https://unix.stackexchange.com/questions/408934/how-to-serialize-command-execution-on-linux
CMD="<some command> && sleep <some delay in seconds>"
flock /tmp/some_lockfile $CMD
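For instance, with a hypothetical transmit command rf_send and a 50 ms gap (the command name, its arguments, and the delay are illustrative, not from the original post):
# Every caller funnels through the same lock file, so transmissions never overlap;
# the sleep keeps a short gap before the lock is released to the next caller.
flock /tmp/rf_send.lock sh -c 'rf_send --code 123456 && sleep 0.05'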
For simple concurrency control, which will limit concurrent execution to N instances, consider the following while loop (modify as needed).
Note that the script must be invoked as /path/to/script.sh so that it can find the other instances by name. Starting it with 'bash /path/to/script.sh' will require changes!
#! /bin/bash
# Process identifier.
echo "START $$"
ME=${0##*/}
# Max number of instances
N=5
# Sleep while there are more than N instances.
while [[ "$(pgrep -c -x $ME)" -gt "$N" ]] ; do echo Waiting ... ; sleep 1 ; done
# Execute the job
sleep "$#"
echo "Done $$"
Consider the following example, which emulates a command that gives output after 10 seconds: exec 5< <(sleep 10; pwd)
In Solaris, if I check the file descriptor earlier than 10 seconds, I can see that it has a size of 0 and this tells me that it hasn't been populated with data yet. I can simply check every second until the file test condition is met (different from 0) and then pull the data:
while true; do
if [[ -s /proc/$$/fd/5 ]]; then
variable=$(cat <&5)
break
fi
sleep 1
done
But in Linux I can't do this (RedHat, Debian etc). All file descriptors appear with a size of 64 bytes no matter whether they hold data or not. For various commands that take a variable amount of time to dump their output, I will not know when I should read the file descriptor. No, I don't want to just wait for cat <&5 to finish; I need to know when I should perform the cat in the first place, because I am using this mechanism to issue simultaneous commands and assign their output to corresponding file descriptors. As mentioned already, this works great in Solaris.
Here is the skeleton of an idea :
#!/bin/bash
exec 5< <(sleep 4; pwd)
while true
do
if
read -t 0 -u 5 dummy
then
echo Data available
cat <&5
break
else
echo No data
fi
sleep 1
done
From the Bash reference manual:
If timeout is 0, read returns immediately, without trying to read any
data. The exit status is 0 if input is available on the specified file
descriptor, non-zero otherwise.
The idea is to use read with -t 0 (to have zero timeout) and -u 5 (read from file descriptor 5) to instantly check for data availability.
Of course this is just a toy loop to demonstrate the concept.
The solution given by User Fred using only bash builtins works fine, but is a tiny bit non-optimal due to polling for the state of a file descriptor. If calling another interpreter (for example Python) is not a no-go, a non-polling version is possible:
#! /bin/bash
(
sleep 4
echo "This is the data coming now"
echo "More data"
) | (
python3 -c 'import select;select.select([0],[],[])'
echo "Data is now available and can be processed"
# Replace with more sophisticated real-world processing, of course:
cat
)
The single line python3 -c 'import select;select.select([0],[],[])' waits until STDIN has data ready. It uses the standard select(2) system call, for which I have not found a direct shell equivalent or wrapper.
I have a program that has very big computation times. I need to call it with different arguments. I want to run them on a server with a lot of processors, so I'd like to launch them in parallel in order to save time. (One program instance only uses one processor)
I have tried my best to write a bash script which looks like this:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5
# fill the PID array with nonsense pid numbers
for (( PAR=1; PAR<=MAXPAR; PAR++ ))
do
PID[$PAR]=-18
done
# loop over the arguments
for ARG in 50 60 70 90
do
# endless loop that checks, if one of the parallel jobs has finished
while true
do
# check if PID[PAR] is still running, suppress error output of kill
if ! kill -0 ${PID[PAR]} 2> /dev/null
then
# if PID[PAR] is not running, the next job
# can run as parellel job number PAR
break
fi
# if it is still running, check the next parallel job
if [ $PAR -eq $MAXPAR ]
then
PAR=1
else
PAR=$[$PAR+1]
fi
# but sleep 10 seconds before going on
sleep 10
done
# call to the actual program (here sleep for example)
#./complicated_program $ARG &
sleep $ARG &
# get the pid of the process we just started and save it as PID[PAR]
PID[$PAR]=$!
# give some output, so we know where we are
echo ARG=$ARG, par=$PAR, pid=${PID[PAR]}
done
Now, this script works, but I don't quite like it.
Is there any better way to deal with the beginning? (Setting PID[*]=-18 looks wrong to me)
How do I wait for the first job to finish without the ugly infinite loop and sleeping some seconds? I know there is wait, but I'm not sure how to use it here.
I'd be grateful for any comments on how to improve style and conciseness.
I have much more complicated code that, more or less, does the same thing.
The things you need to consider:
Does the user need to approve the spawning of a new thread
Does the user need to approve the killing of an old thread
Does the thread terminate on its own or does it need to be killed
Does the user want the script to run endlessly, as long as it has MAXPAR threads
If so, does the user need an escape sequence to stop further spawning
Here is some code for you:
spawn() #function that spawns a thread
{ #usage: spawn 1 ls -l
i=$1 #save the thread index
shift 1 #shift arguments to the left
if [ "${thread[$i]:-0}" -eq 0 ] && #if that slot is not already running a thread
   [ "${#thread[@]}" -lt "$threads" ] #and we didn't reach the maximum number of threads
then
"$@" & #run the thread in the background, with all the arguments
thread[$i]=$! #associate the thread id with the thread index
fi
}
terminate() #function that terminates threads
{ #usage: terminate 1
[ your condition ] && #if your condition is met,
kill "${thread[$1]}" && #kill the thread and, if that succeeds,
thread[$1]=0 #mark the thread as terminated
}
Now, the rest of the code depends on your needs (the things to consider), so you will either loop through the input arguments and call spawn, and then after some time loop through the thread indexes and call terminate. Or, if the threads end on their own, loop through the input arguments and call both spawn and terminate, but the condition for terminate is then:
ps aux 2>/dev/null | grep " ${thread[$i]} " &>/dev/null
#look for the thread id in the process list (note the spaces around the id)
Or something along those lines; you get the point.
Using the tips @theotherguy gave in the comments, I rewrote the script in a better way using the sem command that comes with GNU Parallel:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5
# loop over the arguments
for ARG in 50 60 70 90
do
# call to the actual program (here sleep for example)
# prefixed by sem -j $MAXPAR
#sem -j $MAXPAR ./complicated_program $ARG
sem -j $MAXPAR sleep $ARG
# give some output, so we know where we are
echo ARG=$ARG
done
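As a side note (not from the original answers): if GNU Parallel is not available, bash 4.3 and later provide the wait -n builtin, which blocks until any one background job finishes, so the same loop can be written without polling. A minimal sketch:
#!/bin/bash
MAXPAR=5
for ARG in 50 60 70 90
do
    # if MAXPAR jobs are already running, block until any one of them exits
    while (( $(jobs -rp | wc -l) >= MAXPAR )); do
        wait -n
    done
    sleep $ARG &                     # stand-in for ./complicated_program $ARG &
    echo ARG=$ARG, pid=$!
done
wait                                 # wait for the jobs that are still running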
I want to run a cron job every minute that will launch a script. Simple enough there. However, I need to make sure that not more than X number (defined in the script) of instances are ever running. These are queue workers, so if at any minute interval 6 workers are still active, then I would not launch another instance. The script simply launches a PHP script which exits if no job available. Right now I have a shell script that perpetually launches itself every 10 seconds after exit... but there are long periods of time where there are no jobs, and a minute delay is fine. Eventually I would like to have two cron jobs for peak and off-peak, with different intervals.
Make sure you have a unique script name.
Then check if 6 instances are already running
if [ "$(pgrep -c '^UNIQUE_SCRIPT_NAME$')" -lt 6 ]
then
# start my script
else
# do not start my script
fi
I'd say that if you want to iterate as often as every minute, then a process like your current shell script that relaunches itself is what you actually want to do. Just increase the delay from 10 seconds to a minute.
That way, you can also more easily control your delay for peak and off-peak, as you wanted. It would be rather elegant to simply use a shorter delay if the script found something to do the last time it was launched, or a longer delay if it did not find anything.
You could use a script like OneAtATime to guard against multiple simultaneous executions.
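A sketch of that adaptive-delay idea (the worker path, the exit-code convention, and both delays are assumptions for illustration):
#!/bin/bash
# Self-relaunching worker loop: short pause while there is work, longer pause when the queue was empty.
BUSY_DELAY=10
IDLE_DELAY=60
while true; do
    if php /path/to/worker.php; then   # assumed convention: exit 0 = a job was processed
        sleep "$BUSY_DELAY"
    else                               # non-zero exit = queue was empty
        sleep "$IDLE_DELAY"
    fi
done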
This is what I am using in my shell scripts:
echo -n "Checking if job is already running... "
me=`basename $0`
running=$(ps aux | grep ${me} | grep -v .log | grep -v grep | wc -l)
if [ $running -gt 1 ];
then
echo "already running, stopping job"
exit 1
else
echo "OK."
fi;
The command you're looking for is in line 3. Just replace ${me} with your PHP script name. In case you're wondering about the grep -v .log part: I'm piping the output into a log file whose name partially contains the script name, so this way I avoid counting it twice.
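As a side note, the same count can usually be obtained with pgrep (assuming the script name is unique, short enough for the process table, and the script is invoked directly rather than as 'bash script.sh'), which avoids the grep -v grep and log-file filtering:
me=$(basename "$0")
# pgrep -x matches the exact process name, -c prints the number of matches.
# The count includes this instance, so more than 1 means another copy is running.
if [ "$(pgrep -xc "$me")" -gt 1 ]; then
    echo "already running, stopping job"
    exit 1
fi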