Linux wait process

I have a basic script to test the Linux wait command; the code is below.
#!/bin/bash
echo "testing wait command 1" &
process_id=$!
echo "testing wait command 2" &
wait $process_id
echo "command 1 completed"
echo "command 2 completed"
According to my understanding, the output should look like this:
testing wait command 1
testing wait command 2
command 1 completed
command 2 completed
But the actual output is this
testing wait command 2
testing wait command 1
command 1 completed
command 2 completed
I do not understand why command 2 comes before command 1.
If I remove & from echo "testing wait command 1" &, the output is what I expected.

I use GNU bash, version 5.0.18(1)-release (x86_64-slackware-linux-gnu), have run your script several
times, and always got testing wait command 1 printed first.
That being said, when & is used Bash forks a new process, and you
cannot expect any given order in which the kernel scheduler will run
different processes (or, more technically speaking, threads) on the
system. It may seem to you that all processes on the system run at
the same time, but in fact the scheduler constantly switches
between them and lets each use the CPU. This switch is known as a context
switch, and it happens so frequently that humans don't even
notice. A process can be preempted at any time, not only to let another
process use CPU resources but also due to a hardware interrupt, which is
triggered by hardware and handled by the kernel. You cannot even expect
script execution to take the same amount of time every time you
run it.

This means command 2 simply got scheduled first and completed before command 1. If you want a strict order, remove the '&' after command 1.
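For illustration, here is a minimal sketch of the script reworked so that each completion message is printed only after its own job has been waited on (note that the two "testing" lines may still appear in either order, since both echo commands run concurrently):

#!/bin/bash
echo "testing wait command 1" &
pid1=$!
echo "testing wait command 2" &
pid2=$!
wait "$pid1"
echo "command 1 completed"
wait "$pid2"
echo "command 2 completed"

wait only guarantees that the given process has finished; it says nothing about the order in which concurrent processes produce their output.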

Related

Difference between ctrl-z and "&" in linux

My understanding is that when you run any command (say sleep 10) within a given shell (say bash), what happens under the hood is that a fork system call is made, and sleep 10 now runs as a child process whose parent is the bash shell from which I executed sleep.
Now, if I want to send sleep to the background, I would either do sleep 10 & or run sleep 10 and press ctrl+z so the process is sent to the background. pstree shows that with either of these options, sleep remains a child process of the bash shell.
Now my question is, when doing this through SSH, I noted the following:
If I do sleep 999 & and sleep 888 followed by a ctrl+z, and then close the ssh session, only sleep 999 & survives.
Why is this? I actually was expecting one of these:
both processes get terminated because the parent process is gone
both processes get associated with init as the parent process.
or run sleep 10 and press ctrl+z so the process is sent to the background
No, not really. Didn't you see that big message that says "[1]+ Stopped sleep 10"? ctrl+z stops the process and returns you to the shell. You can now type fg to continue the process in the foreground, or type bg to continue it in the background. Research "bash job control" and see the bash manual's Job Control Basics.
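For illustration, a typical job-control session looks like this (the job number [1] is whatever bash assigns):

sleep 10      # press ctrl+z here: the job is stopped, not backgrounded
jobs          # shows: [1]+  Stopped    sleep 10
bg %1         # continue job 1 in the background, as if it had been started with &
fg %1         # bring job 1 back to the foreground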
Why is this?
Stopped processes are first continued (SIGCONT) and then sent SIGHUP so they can terminate.
Bash manual is available online: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html .
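If you want such a job to survive the SSH session, detach it from the shell before logging out. A minimal sketch, using the bash builtin disown (nohup is the portable way to start a command detached in the first place):

sleep 888     # press ctrl+z to stop it
bg %1         # continue it in the background
disown -h %1  # tell bash not to send SIGHUP to this job when the shell exits
# or, from the start:
nohup sleep 888 &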

How to start and monitor a set of programs in bash?

I use a system that is started by a script similar to this:
#!/bin/bash
prog_a & # run continuously
prog_b & # run continuously
prog_c & # run continuously
sleep 2 # wait for some stuff to be done
prog_d # start 'main' program ...
killall -9 prog_a
killall -9 prog_b
killall -9 prog_c
It works well. If I press ctrl-c in the terminal (or if prog_d crashes), then prog_d dies and the first processes prog_a, prog_b, prog_c are killed.
The problem I have is that sometimes prog_a, prog_b, or prog_c crashes while prog_d is still alive. What I would like is: if one program dies, then the other ones are killed.
Is it possible to do that simply in bash? I have tried to create something like:
wait pid1 pid2 pid3 ... # wait until pid1, pid2, or pid3 dies
But without success: wait only returns after all of them have exited, and I still need to be able to press ctrl-c to kill prog_d.
Thanks!
I would do that with GNU Parallel, which has nice handling for what to do when any job fails: whether one or more or a percentage fail, whether other jobs should be terminated immediately or only no new jobs should be started.
In your specific case:
parallel -j 4 --halt now,fail=1 --line-buffer ::: progA progB progC 'sleep 2; progD'
That says... "run all four jobs in parallel, and halt immediately, killing all others, if any job fails. Buffer the output by lines. The jobs to be run are then specified after the ::: and they are just your jobs, but with a delay before the final one."
You may like the output tagged by the job-name, so you can see which outputs came from which processes, if so, use parallel --tag ...
You may like to delay/stagger the starts of each job, in which case use parallel --delay 1 to start jobs at 1 second intervals and remove the sleep 2.
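If you would rather stay in plain bash, wait -n (available since bash 4.3) returns as soon as any background job exits, which is enough to build the "one dies, kill the rest" behaviour. A rough sketch, reusing the prog_a..prog_d names from the question:

#!/bin/bash
trap 'kill 0' EXIT   # on any exit (including ctrl-c), signal the whole process group

prog_a &
prog_b &
prog_c &
sleep 2              # wait for some stuff to be done
prog_d &

wait -n              # returns as soon as the FIRST of the four jobs exits

When the first program dies, wait -n returns, the script exits, and the EXIT trap takes the remaining processes down with it.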

nohup node service using cron job on CentOS 7 [duplicate]

I have a python script that'll be checking a queue and performing an action on each item:
# checkqueue.py
while True:
    check_queue()
    do_something()
How do I write a bash script that will check if it's running and, if not, start it? Roughly the following pseudocode (or maybe it should do something like ps | grep?):
# keepalivescript.sh
if processidfile exists:
    if processid is running:
        exit, all ok
run checkqueue.py
write processid to processidfile
I'll call that from a crontab:
# crontab
*/5 * * * * /path/to/keepalivescript.sh
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
    echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
    sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
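If fast crash loops are a concern, a variant with exponential backoff is a small change (a sketch; the 60-second cap is an arbitrary choice):

delay=1
until myserver; do
    echo "Server 'myserver' crashed with exit code $?. Respawning in ${delay}s.." >&2
    sleep "$delay"
    delay=$(( delay < 60 ? delay * 2 : 60 ))   # double the delay each time, capped at 60s
done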
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:
crontab -e
Then add a rule to start your monitor script:
@reboot /usr/local/bin/myservermonitor
Alternatively; look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
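Such an inittab entry has roughly this shape (the id field and the path are illustrative):

ms:2345:respawn:/usr/local/bin/myserver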
Edit.
Let me add some information on why not to use PID files. While they are very popular; they are also very flawed and there's no reason why you wouldn't just do it the correct way.
Consider this:
1. PID recycling (killing the wrong process):
   - /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
   - A while later: foo dies somehow.
   - A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
   - You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.
3. What if you don't even have write access or are in a read-only environment?
4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
See also: Are PID-files still flawed when doing it 'right'?
By the way, even worse than PID files is parsing ps! Don't ever do this.
ps is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output, and standard output is ONLY for human consumption, not for scripted parsing!
Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as an argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
If you don't want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.
Or do a simple script:
while true
do
    /your/script
    sleep 1
done
In-line:
while true; do <your-bash-snippet> && break; done
This will continuously restart <your-bash-snippet> if it fails; && break stops the loop if <your-bash-snippet> exits gracefully (return code 0).
To restart <your-bash-snippet> in all cases:
while true; do <your-bash-snippet>; done
e.g. #1
while true; do openconnect x.x.x.x:xxxx && break; done
e.g. #2
while true; do docker logs -f container-name; sleep 2; done
The easiest way to do it is using flock on a file. In a Python script you'd do something like:
import fcntl, os, sys

lf = open('/tmp/script.lock', 'w')
try:
    # flock() raises OSError when the lock is held elsewhere; it does not return a code
    fcntl.flock(lf, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    sys.exit('other instance already running')
lf.write('%d\n' % os.getpid())
lf.flush()
In shell you can actually test if it's running:
if [ "$(flock -xn /tmp/script.lock -c 'echo 1')" ]; then
    echo "it's not running"
    # restart it here
else
    echo -n "it's already running with PID "
    cat /tmp/script.lock
fi
But of course you don't have to test, because if it's already running and you restart it, it'll exit with 'other instance already running'.
When a process dies, all its file descriptors are closed and all locks are automatically removed.
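The same idea in pure shell is a common idiom: the script takes an exclusive flock on its own lock file through a dedicated file descriptor (a sketch; fd 200 and the lock path are arbitrary choices):

#!/bin/bash
exec 200>/tmp/script.lock   # open the lock file on file descriptor 200
flock -n 200 || { echo 'other instance already running' >&2; exit 1; }
echo $$ >&200               # record our PID in the lock file
# ... the rest of the script runs while holding the lock ...

The lock is released automatically when the script exits and fd 200 is closed.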
You should use monit, a well-known unix tool that can monitor different things on the system and react accordingly.
From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing
check process checkqueue.py with pidfile /var/run/checkqueue.pid
if changed pid then exec "checkqueue_restart.sh"
You can also configure monit to email you when it does do a restart.
if ! test -f "$PIDFILE" || ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    restart_process
    # Write PIDFILE (assumes restart_process backgrounds the new process)
    echo $! > "$PIDFILE"
fi
watch "yourcommand"
It will restart the process if/when it stops (after a 2-second delay).
watch -n 0.1 "yourcommand"
To restart it after 0.1 s instead of the default 2 seconds.
watch -e "yourcommand"
To stop restarting if the program exits with an error.
Advantages:
built-in command
one line
easy to use and remember
Drawbacks:
It only displays the command's output on the screen once the command has finished
I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one".
Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.
From man page:
run-one-constantly COMMAND [ARGS]
Note: obviously this could be called from within your script, but also it removes the need for having a script at all.
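For example, to keep the node service from the question's title alive (the path is illustrative):

run-one-constantly node /path/to/service.js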
I've used the following script with great success on numerous servers:
pid=`jps -v | grep "$INSTALLATION" | awk '{print $1}'`
echo "$INSTALLATION found at PID $pid"
while [ -e "/proc/$pid" ]; do sleep 0.1; done
notes:
- It's looking for a java process, so I can use jps; this is much more consistent across distributions than ps
- $INSTALLATION contains enough of the process path that it's totally unambiguous
- Use sleep while waiting for the process to die, to avoid hogging resources :)
This script is actually used to shut down a running instance of tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.
I use this for my npm process:
#!/bin/bash
for (( ; ; ))
do
    date +"%T"
    echo Start Process
    cd /toFolder
    sudo process
    date +"%T"
    echo Crash
    sleep 1
done

Disabling Hanging Script

When launching a bash script on Linux, the script runs and succeeds, yet the terminal hangs. I must always input CTRL+C to end the program. I am able to type in the terminal and press enter, but the script does not respond.
I cannot change the script files, but can I launch it so that it does not wait for user input? Any troubleshooting tips to disable this behaviour?
You can execute the script with & at the end; this gives control back to the shell (it executes the script as a background process).
./script.sh &
If you want to stop the script, you need to get its process id and then kill it. To get the process id, either execute ps aux | grep script where script is your script name, or execute echo $! right after you launched the script. When you have the process id, you can kill the process with kill 1234 where 1234 is the process id.
If the execution time of the script can be estimated, you can kill it automatically after a certain amount of time:
bash -c '(sleep 5m; kill $$ 2> /dev/null) & exec script' &
In this command, sleep 5m is the time after which the process will be killed, and script is the name of your script (or the command).
For example if the script's execution time is 30 seconds on average, then you can set the timeout to a minute or two to give it some extra time in case the execution is slower than usual. Note that this command doesn't guarantee that the script finished its execution, so use it with care.
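GNU coreutils also ships a timeout command that expresses the same idea more directly (5m and the script name are placeholders):

timeout 5m ./script.sh &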

linux batch jobs in parallel

I have seven licenses of a particular software, so I want to start 7 jobs simultaneously. I can do that using '&'. Now, the 'wait' command waits for all of those 7 processes to finish before spawning the next 7. I would like to write a shell script where, after I start the first seven, another job is started as soon as one of them completes. This is because some of those 7 jobs might take very long while others finish really quickly, and I don't want to waste time waiting for all of them. Is there a way to do this in linux? Could you please help me?
Thanks.
GNU parallel is the way to go. It is designed for launching multiple instances of the same command, each with a different argument retrieved either from stdin or an external file.
Let's say your licensed script is called myScript, each instance having the same options --arg1 --arg2 and taking a variable parameter --argVariable for each instance spawned, those parameters being stored in the file myParameters:
cat myParameters | parallel -halt 1 --jobs 7 ./myScript --arg1 --argVariable {} --arg2
Explanations:
-halt 1 tells parallel to halt all jobs if one fails
--jobs 7 will launch 7 instances of myScript
On a debian-based linux system, you can install parallel using:
sudo apt-get install parallel
As a bonus, if your licenses allow it, you can even tell parallel to launch these 7 instances amongst multiple computers.
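If GNU parallel is not available, xargs -P (a GNU/BSD extension, not POSIX) gives similar keep-7-slots-busy behaviour, starting a new instance as soon as one of the 7 finishes:

xargs -P 7 -I {} ./myScript --arg1 --argVariable {} --arg2 < myParameters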
You could check how many are currently running and start more if you have less than 7:
while true; do
    if [ "`ps ax -o comm | grep process-name | wc -l`" -lt 7 ]; then
        process-name &
    fi
    sleep 1
done
Write two scripts. One which restarts a job every time it is finished, and one that starts the first script 7 times.
Like:
script1:
./script2 job1 &
...
./script2 job7 &
and
script2:
while true; do
    "$1"    # run the job passed as the first argument; restart it whenever it exits
done
I found a fairly good solution using make, which is part of most standard distributions. See here
