Is a pipe in Linux asynchronous or not? - linux

I thought that when I run the command below
sleep 10 | sleep 2 | sleep 5
the Linux process list would look like
86014 ttys002 0:00.03 bash
86146 ttys002 0:00.00 sleep 10
and that when sleep 10 ends, sleep 2 would run, and when sleep 2 ends, sleep 5 would run.
That's what I thought.
But in Linux bash, sleep 10, sleep 2, and sleep 5 all show up in ps at the same time.
The standard output of the sleep 10 process will be redirected into the sleep 5 process, but in that case the sleep 5 process will finish before sleep 10.
I'm confused. Are there any Google keywords or concepts for this phenomenon?
(I'm not good at English, so maybe this text is hard to understand 🥲. Thank you.)

I think you expect the commands to run in sequence. But that is not what a pipe does.
To run two commands in sequence you use ;, forming what is called a command list:
$ time ( sleep 1 ; sleep 2 )
real 0m3.004s
You can also do command lists with && (or ||) so that the sequence is interrupted if one command returns failure (or success).
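A quick, minimal illustration of that short-circuiting, using true and false as stand-in commands:

```shell
true && echo "runs: the first command succeeded"
false && echo "skipped: the first command failed"
false || echo "runs: the first command failed"
true || echo "skipped: the first command succeeded"
```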
But when you run two commands with | both are run in parallel, and the stdout of the first is connected to the stdin of the second. That way, the pipe acts as a synchronization object:
If the second command is faster and empties the pipe, its reads will block until more data arrives.
If the first command is faster and writes too much data to the pipe, its buffer will fill up and it will block until some data is read.
Additionally, if the second command dies, then as soon as the first one writes to stdout it will get a SIGPIPE, and will most likely die too.
(Note what would happen if your programs were not run concurrently: the first program could write megabytes of text to stdout and, with nobody to read it, the pipe buffer would fill up and the writer would block forever.)
But since sleep neither reads from stdin nor writes to stdout, when you do sleep 1 | sleep 2 nothing special happens and both run concurrently.
The same happens with 3 or any other number of commands in your pipe.
The net effect is that the full sleep is the longest:
$ time ( sleep 1 | sleep 2 | sleep 3 | sleep 4 )
real 0m4.004s
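Both blocking behaviours are easy to observe with a writer that never stops and a reader that never reads (assuming coreutils yes, which prints "y" forever): yes quickly fills the pipe buffer (64 KiB on Linux) and blocks in write(); when sleep exits, the read end closes, the next write raises SIGPIPE, and yes dies, so the whole pipeline takes about 2 seconds:

```shell
start=$(date +%s)
yes | sleep 2     # yes blocks on the full pipe buffer, then dies of SIGPIPE
elapsed=$(( $(date +%s) - start ))
echo "pipeline took about ${elapsed}s"
```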

Related

How to start and monitor a set of programs in bash?

I use a system that is started by a script similar to this:
#!/bin/bash
prog_a & # run continuously
prog_b & # run continuously
prog_c & # run continuously
sleep 2 # wait for some stuff to be done
prog_d # start 'main' program ...
killall -9 prog_a
killall -9 prog_b
killall -9 prog_c
It works well. If I press Ctrl-C in the terminal (or if prog_d crashes), prog_d dies and the first processes prog_a, prog_b, and prog_c are killed.
The problem I have is that sometimes prog_a, prog_b, or prog_c crashes while prog_d is still alive. What I would like is: if one program dies, the others are killed too.
Is it possible to do that simply in bash? I have tried to create a kind of:
wait pid1 pid2 pid3 ... # wait until pid1 or pid2 or pid3 dies
but without success (and I still need to be able to press Ctrl-C to kill prog_d).
Thanks!
I would do that with GNU Parallel, which has nice handling for what to do when any job fails... whether one or more or a percentage fail, whether other jobs should be terminated immediately or only no new jobs should be started.
In your specific case:
parallel -j 4 --halt now,fail=1 --line-buffer ::: progA progB progC 'sleep 2; progD'
That says: "run all four jobs in parallel, and if any job fails, halt immediately, killing all the others. Buffer the output by lines. The jobs to be run are specified after the ::: and they are just your jobs, with a delay before the final one."
You may like the output tagged by job name, so you can see which output came from which process; if so, use parallel --tag ...
You may like to delay/stagger the starts of each job, in which case use parallel --delay 1 to start jobs at 1 second intervals and remove the sleep 2.
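As an aside, plain bash (4.3 or newer) can get the "kill the rest when one dies" behaviour with the builtin wait -n, which returns as soon as any background job exits. A minimal sketch, with sleeps standing in for prog_a, prog_b, and prog_c:

```shell
start=$(date +%s)
sleep 10 &                     # stand-in for prog_a
sleep 10 &                     # stand-in for prog_b
sleep 1 &                      # stand-in for prog_c; it exits first
wait -n                        # returns as soon as ANY background job exits
kill $(jobs -p) 2>/dev/null    # then kill the survivors
wait 2>/dev/null               # reap them; exit statuses are ignored
elapsed=$(( $(date +%s) - start ))
echo "first job ended after about ${elapsed}s; the rest were killed"
```

The same idea works with the real programs; the Ctrl-C requirement is covered because the foreground script still receives SIGINT.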

Launch the same program with different arguments in parallel via bash

I have a program that has very big computation times. I need to call it with different arguments. I want to run them on a server with a lot of processors, so I'd like to launch them in parallel in order to save time. (One program instance only uses one processor)
I have tried my best to write a bash script which looks like this:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5

# fill the PID array with nonsense pid numbers
for (( PAR=1; PAR<=MAXPAR; PAR++ ))
do
    PID[$PAR]=-18
done

# loop over the arguments
for ARG in 50 60 70 90
do
    # endless loop that checks if one of the parallel jobs has finished
    while true
    do
        # check if PID[PAR] is still running, suppress error output of kill
        if ! kill -0 ${PID[PAR]} 2> /dev/null
        then
            # if PID[PAR] is not running, the next job
            # can run as parallel job number PAR
            break
        fi
        # if it is still running, check the next parallel job
        if [ $PAR -eq $MAXPAR ]
        then
            PAR=1
        else
            PAR=$((PAR + 1))
        fi
        # but sleep 10 seconds before going on
        sleep 10
    done
    # call to the actual program (here sleep for example)
    #./complicated_program $ARG &
    sleep $ARG &
    # get the pid of the process we just started and save it as PID[PAR]
    PID[$PAR]=$!
    # give some output, so we know where we are
    echo ARG=$ARG, par=$PAR, pid=${PID[PAR]}
done
Now, this script works, but I don't quite like it.
Is there any better way to deal with the beginning? (Setting PID[*]=-18 looks wrong to me)
How do I wait for the first job to finish without the ugly infinite loop and sleeping some seconds? I know there is wait, but I'm not sure how to use it here.
I'd be grateful for any comments on how to improve style and conciseness.
I have much more complicated code that, more or less, does the same thing. The things you need to consider:
Does the user need to approve the spawning of a new thread?
Does the user need to approve the killing of an old thread?
Does the thread terminate on its own, or does it need to be killed?
Does the user want the script to run endlessly, as long as it has MAXPAR threads?
If so, does the user need an escape sequence to stop further spawning?
Here is some code for you:
spawn()                                      # function that spawns a thread
{                                            # usage: spawn 1 ls -l
    i=$1                                     # save the thread index
    shift 1                                  # the remaining arguments are the command
    if [ "${thread[$i]:-0}" -eq 0 ] &&       # if this slot is not already running
       [ "${#thread[@]}" -lt "$threads" ]    # and we didn't reach the maximum number of threads,
    then
        "$@" &                               # run the command in the background, with all its arguments
        thread[$i]=$!                        # associate the thread id with the thread index
    fi
}

terminate()                                  # function that terminates threads
{                                            # usage: terminate 1
    if [ your condition ]                    # if your condition is met,
    then
        kill "${thread[$1]}" &&              # kill the thread and, if that succeeds,
        thread[$1]=0                         # mark the thread as terminated
    fi
}
Now, the rest of the code depends on your needs (the things to consider above): you will either loop through the input arguments and call spawn, and then after some time loop through the thread indexes and call terminate. Or, if the threads end on their own, loop through the input arguments and call both spawn and terminate, where the condition inside terminate becomes:
ps aux 2>/dev/null | grep -q " ${thread[$1]} "
# look for the thread id in the process list (note the spaces around the id)
Or, something along the lines of that, you get the point.
Using the tips @theotherguy gave in the comments, I rewrote the script in a better way using the sem command that comes with GNU Parallel:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5

# loop over the arguments
for ARG in 50 60 70 90
do
    # call to the actual program (here sleep for example),
    # prefixed by sem -j $MAXPAR
    #sem -j $MAXPAR ./complicated_program $ARG
    sem -j $MAXPAR sleep $ARG
    # give some output, so we know where we are
    echo ARG=$ARG
done
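If installing GNU Parallel is not an option, bash 4.3+ can enforce the same limit with the builtin wait -n alone, which blocks until any one background job exits. The sketch below uses sleep times scaled down from the question's 50/60/70/90 so the effect is quick to see:

```shell
MAXPAR=2
running=0
start=$(date +%s)
for ARG in 1 2 1 2               # scaled-down sleep times for illustration
do
    if [ "$running" -ge "$MAXPAR" ]; then
        wait -n                  # block until any one job finishes
        running=$((running - 1))
    fi
    sleep "$ARG" &
    running=$((running + 1))
done
wait                             # wait for the remaining jobs
elapsed=$(( $(date +%s) - start ))
echo "all jobs done in about ${elapsed}s"
```

With 6 seconds of total work spread over 2 slots, the whole run takes about 4 seconds instead of 6.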

linux batch jobs in parallel

I have seven licenses for a particular piece of software, so I want to run 7 jobs simultaneously, which I can do using '&'. But the 'wait' command waits for all of those 7 processes to finish before spawning the next 7. I would like to write a shell script that starts the first seven and then, as soon as any job completes, starts another. This is because some of those 7 jobs might take very long while others finish quickly, and I don't want to waste time waiting for all of them. Is there a way to do this in Linux? Could you please help me?
Thanks.
GNU parallel is the way to go. It is designed for launching multiple instances of the same command, each with a different argument retrieved either from stdin or from an external file.
Let's say your licensed script is called myScript, each instance taking the same options --arg1 --arg2 and a variable parameter --argVariable, with those variable parameters stored in the file myParameters:
cat myParameters | parallel --halt now,fail=1 --jobs 7 ./myScript --arg1 --argVariable {} --arg2
Explanations:
--halt now,fail=1 tells parallel to kill all running jobs as soon as one fails
--jobs 7 launches at most 7 instances of myScript at a time
On a debian-based linux system, you can install parallel using :
sudo apt-get install parallel
As a bonus, if your licenses allow it, you can even tell parallel to launch these 7 instances amongst multiple computers.
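If GNU parallel is not available, xargs -P (supported by both GNU and BSD xargs) gives a similar capped fan-out. The first comment shows how the hypothetical myScript invocation above would look; the runnable line substitutes echo for myScript so the effect is visible:

```shell
# Real use:  cat myParameters | xargs -P 7 -I{} ./myScript --arg1 --argVariable {} --arg2
# Demonstration, with echo standing in for myScript:
printf '%s\n' 50 60 70 90 | xargs -P 7 -I{} echo "processed {}"
```

Note that xargs has no equivalent of --halt: a failing job does not stop the others.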
You could check how many are currently running and start more if you have less than 7:
while true; do
    if [ "$(ps ax -o comm | grep -c process-name)" -lt 7 ]; then
        process-name &
    fi
    sleep 1
done
Write two scripts: one that restarts a job every time it finishes, and one that starts 7 instances of the first script.
Like:
script1:
./script2 job1 &
...
./script2 job7 &
and
script2:
while true; do
    ./"$1"
done
I found a fairly good solution using make, which is a part of the standard distributions. See here
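For reference, the make approach looks roughly like this: give every job its own phony target and let make's -j flag cap the concurrency. The job names and the echo recipe below are placeholders; a real Makefile would invoke the licensed program instead:

```shell
{
    printf 'JOBS := job1 job2 job3\n'
    printf 'all: $(JOBS)\n'
    printf '$(JOBS):\n'
    printf '\t@echo "running $@"\n'
    printf '.PHONY: all $(JOBS)\n'
} > jobs.mk
out=$(make -f jobs.mk -j 7 all)    # at most 7 recipes run at once
echo "$out"
rm -f jobs.mk
```

As soon as one recipe finishes, make starts the next pending target, which is exactly the "refill the slot" behaviour the question asks for.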

Bash script how to sleep in new process then execute a command

So, I was wondering if there was a bash command that lets me fork a process which sleeps for several seconds, then executes a command.
Here's an example:
sleep 30 'echo executing...' &
^This doesn't actually work (because the sleep command only takes time arguments), but is there something that can do this: basically, a sleep that takes a time argument and something to execute when the interval is over? I want to fork it into a different process and then continue processing the shell script.
Also, I know I could write a simple script that does this, but due to some constraints of the situation (I'm actually passing this through an ssh call), I'd rather not do that.
You can do
(sleep 30 && command ...)&
Using && is safer than ; because it ensures that command ... will run only if the sleep timer expires.
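A shortened demonstration (1 second instead of 30) showing that the parent shell is not blocked while the subshell sleeps:

```shell
start=$(date +%s)
(sleep 1 && echo "delayed command fired") &
echo "the shell continues immediately"
wait                                   # collect the background job before exiting
elapsed=$(( $(date +%s) - start ))
```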
You can invoke another shell in the background and make it do what you want:
bash -c 'sleep 30; do-whatever-else' &
The default interval for sleep is in seconds, so the above would sleep for 30 seconds. You can specify other intervals like: 30m for 30 minutes, or 1h for 1 hour, or 3d for 3 days.

Instance limited cron job

I want to run a cron job every minute that will launch a script. Simple enough there. However, I need to make sure that not more than X number (defined in the script) of instances are ever running. These are queue workers, so if at any minute interval 6 workers are still active, then I would not launch another instance. The script simply launches a PHP script which exits if no job available. Right now I have a shell script that perpetually launches itself every 10 seconds after exit... but there are long periods of time where there are no jobs, and a minute delay is fine. Eventually I would like to have two cron jobs for peak and off-peak, with different intervals.
Make sure your script has a unique name, then check whether 6 instances are already running:
if [ "$(pgrep -c '^UNIQUE_SCRIPT_NAME$')" -lt 6 ]
then
    # start my script
else
    # do not start my script
fi
I'd say that if you want to iterate as often as every minute, then a process like your current shell script that relaunches itself is what you actually want to do. Just increase the delay from 10 seconds to a minute.
That way, you can also more easily control your delay for peak and off-peak, as you wanted. It would be rather elegant to simply use a shorter delay if the script found something to do the last time it was launched, or a longer delay if it did not find anything.
You could use a script like OneAtATime to guard against multiple simultaneous executions.
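A dependency-free variant of the same guard uses mkdir, which is atomic: only one process can create the lock directory. The path below is an example; in real use it must be one fixed path shared by all instances:

```shell
lockdir=/tmp/myjob.lock
if mkdir "$lockdir" 2>/dev/null; then
    trap 'rmdir "$lockdir"' EXIT   # release the lock when the script exits
    echo "got the lock, doing the work"
    # ... real work would go here ...
else
    echo "another instance holds the lock, exiting"
fi
```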
This is what I am using in my shell scripts:
echo -n "Checking if job is already running... "
me=$(basename "$0")
running=$(ps aux | grep "${me}" | grep -v .log | grep -v grep | wc -l)
if [ "$running" -gt 1 ]
then
    echo "already running, stopping job"
    exit 1
else
    echo "OK."
fi
The command you're looking for is the running= line; just replace ${me} with your PHP script's name. In case you're wondering about the grep -v .log part: I'm piping the output into a log file whose name partially contains the script name, so this way I avoid it being double-counted.
