Why 'wait' doesn't wait for detached jobs - linux

I followed this blog entry to parallelize sort by splitting a large file, sorting and merging.
The steps are:
split -l5000000 data.tsv '_tmp'
ls -1 _tmp* | while read FILE; do sort $FILE -o $FILE & done
sort -m _tmp* -o data.tsv.sorted
Between step 2 and 3, one must wait until the sorting step has finished.
I assumed that wait without any arguments would be the right thing, since according to the man page, if wait is called without arguments all currently active child processes are waited for.
However, when I try this in the shell (i.e. executing steps 1 and 2, and then wait), wait returns immediately, although top shows the sort processes are still running.
Ultimately I want to use this to speed up a script, so it's not a one-time thing I could do manually in the shell.
I know sort has had a --parallel option since coreutils 8.6; however, on the cluster where I am running this, an older version is installed, and I am also curious about how to solve this issue without it.

Here's a simple test case reproducing your problem:
true | { sleep 10 & }
wait
echo "This echoes immediately"
The problem is that the pipe creates a subshell, and the forked processes are part of that subshell. The solution is to wait in that subshell instead of your main parent shell:
true | { sleep 10 & wait; }
echo "This waits"
Translated back into your code, this means:
ls -1 _tmp* | { while read FILE; do sort $FILE -o $FILE & done; wait; }

From the bash man page:
Each command in a pipeline is executed as a separate process (i.e., in a subshell).
So when you pipe into while, a subshell is created, and everything else in step 2 (i.e., all the background processes) runs within this subshell. The script then exits the while loop, leaving the subshell, and wait is executed in the parent shell, where there is nothing to wait for. You can avoid the pipeline entirely by using process substitution:
while read FILE; do
sort $FILE -o $FILE &
done < <(ls -1 _tmp*)
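Putting the three steps together with the fix, the whole job might look like the following sketch. A plain glob in a for loop is used instead of piping ls into while, which sidesteps the subshell problem entirely (file names are as in the question):

```shell
#!/bin/bash
# Split, sort the chunks in parallel, wait, then merge.
split -l5000000 data.tsv '_tmp'
for FILE in _tmp*; do
    sort "$FILE" -o "$FILE" &
done
wait                              # no pipeline here, so this waits in the main shell
sort -m _tmp* -o data.tsv.sorted
rm -f _tmp*
```

Iterating over the glob directly also avoids parsing the output of ls, which breaks on unusual file names.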

Related

Csh script wait for multiple pid

Does the wait command work in a csh script to wait for more than one PID to finish?
That is, where the wait command waits for all the PIDs listed to complete before moving on to the next line,
e.g.
wait $job1_pid $job2_pid $job3_pid
nextline
since the online documentation I usually see only shows the wait command with a single PID, although I have read of using wait with multiple PIDs, like here:
http://www2.phys.canterbury.ac.nz/dept/docs/manuals/unix/DEC_4.0e_Docs/HTML/MAN/MAN1/0522____.HTM
which says, quote: "If one or more pid operands are specified that represent known process IDs, the wait utility waits until all of them have terminated"
No, the builtin wait command in csh can only wait for all jobs to finish. The command in the documentation that you're referencing is a separate executable that is probably located at /usr/bin/wait or similar. This executable cannot be used for what you want to use it for.
I recommend using bash and its more powerful wait builtin, which does allow you to wait for specific jobs or process ids.
From the tcsh man page, wait waits for all background jobs. tcsh is compatible with csh, which is what the university's documentation you linked is referring to.
wait The shell waits for all background jobs. If the shell is interactive, an interrupt will disrupt the wait and cause the shell
to print the names and job numbers of all outstanding jobs.
You can find this exact text on the csh documentation here.
The wait executable described in the documentation is actually a separate command that waits for a list of process ids.
However, the wait executable is not actually capable of waiting for the child processes of the running shell script and has no chance of doing the right thing in a shell script.
For instance, on OS X, /usr/bin/wait is this shell script.
#!/bin/sh
# $FreeBSD: src/usr.bin/alias/generic.sh,v 1.2 2005/10/24 22:32:19 cperciva Exp $
# This file is in the public domain.
builtin `echo ${0##*/} | tr \[:upper:] \[:lower:]` ${1+"$@"}
Anyway, I can't get the /usr/bin/wait executable to work reliably in a csh script, because the background jobs are not child processes of the /usr/bin/wait process itself.
#!/bin/csh -f
setenv PIDDIR "`mktemp -d`"
sleep 4 &
ps ax | grep 'slee[p]' | awk '{ print $1 }' > $PIDDIR/job
/usr/bin/wait `cat $PIDDIR/job`
I would highly recommend writing this script in bash or similar where the builtin wait does allow you to wait for pids and capturing pids from background jobs is easier.
#!/bin/bash
sleep 4 &
pid_sleep_4="$!"
sleep 7 &
pid_sleep_7="$!"
wait "$pid_sleep_4"
echo "waited for sleep 4"
wait "$pid_sleep_7"
echo "waited for sleep 7"
If you don't want to rewrite the entire csh script you're working on, you can call out to bash from inside a csh script like so.
#!/bin/csh -f
bash <<'EOF'
sleep 4 &
pid_sleep_4="$!"
sleep 7 &
pid_sleep_7="$!"
wait "$pid_sleep_4"
echo "waited for sleep 4"
wait "$pid_sleep_7"
echo "waited for sleep 7"
'EOF'
Note that you must end that heredoc with 'EOF' including the single quotes.
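One more point in bash's favor: its wait builtin returns the waited-for job's exit status, so you can check whether each background job succeeded. A minimal sketch:

```shell
#!/bin/bash
# wait "$pid" returns that child's exit status in $?.
{ sleep 0.1; exit 3; } &
pid="$!"
wait "$pid"
echo "background job exited with status $?"   # prints: background job exited with status 3
```

csh's wait offers no equivalent; it only blocks until all jobs are done.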

how can I make bash block on getting a line of stdout from a job that I have spawned

I need to launch a process within a shell script. (It is a special logging process.) It needs to live for most of the shell script, while some other processes will run, and then at the end we will kill it.
A problem that I am having is that I need to launch this process, and wait for it to "warm up", before proceeding to launch more processes.
I know that I can wait for a line of input from a pipe using read, and I know that I can spawn a child process using &. But when I use them together, it doesn't work like I expect.
As a mockup:
When I run this (sequential):
(sleep 1 && echo "foo") > read
my whole shell blocks for 1 second, and the echo of foo is consumed by read, as I expect.
I want to do something very similar, except that I run the "foo" job in parallel:
(sleep 1 && echo "foo" &) > read
But when I run this, my shell doesn't block at all, it returns instantly -- I don't know why the read doesn't wait for a line to be printed on the pipe?
Is there some easy way to combine "spawning of a job" (&) with capturing the stdout pipe within the original shell?
An example that is very close to what I actually need is, I need to rephrase this somehow,
(sleep 1 && echo "foo" && sleep 20 &) > read; echo "bar"
and I need for it to print "bar" after exactly one second, and not immediately, or 21 seconds later.
Here's an example using named pipes, pretty close to what I used in the end. Thanks to Luis for his comments suggesting named pipes.
#!/bin/sh
# Set up temporary fifo
FIFO=/tmp/test_fifo
rm -f "$FIFO"
mkfifo "$FIFO"
# Spawn a second job that writes to FIFO after some time
{ sleep 1 && echo "foo" && sleep 20; } >$FIFO &
# Block the main job on getting a line from the FIFO
read line <$FIFO
# So that we can see when the main job exits
echo $line
Thanks also to commenter Emily E., the example that I posted that was misbehaving was indeed writing to a file called read instead of using the shell-builtin command read.
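For reference, here is a slightly more defensive variant of the same idea, using mktemp for the FIFO path and a trap to clean it up. The sleeps stand in for the real logging process's warm-up time and lifetime:

```shell
#!/bin/bash
# Private FIFO in a temporary directory, removed on exit.
fifo_dir="$(mktemp -d)"
FIFO="$fifo_dir/ready_fifo"
mkfifo "$FIFO"
trap 'rm -rf "$fifo_dir"' EXIT

# The redirection applies to the whole group, so "ready" goes into the FIFO.
{ sleep 1; echo "ready"; sleep 20; } >"$FIFO" &

# Block until the first line arrives, then continue immediately.
read -r line <"$FIFO"
echo "got: $line"
```

The grouping braces matter: without them, only the last command's output would be redirected into the FIFO.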

crash-stopping bash pipeline [duplicate]

This question already has answers here:
How do you catch error codes in a shell pipe?
I have a pipeline, say a|b where if a runs into a problem, I want to stop the whole pipeline.
a exiting with status 1 doesn't do this, since b often doesn't care about return codes.
e.g.
echo 1|grep 0|echo $? <-- this shows that grep did exit=1
but
echo 1|grep 0 | wc <--- wc is unfazed by grep's exit here
If I ran the pipeline as a subprocess of an owning process, any of the pipeline processes could kill the owning process. But this seems a bit clumsy -- but it would zap the whole pipeline.
Not possible with basic shell constructs, probably not possible in shell at all.
Your first example doesn't do what you think. echo doesn't use standard input, so putting it on the right side of a pipe is never a good idea. The $? that you're echoing is not the exit value of the grep 0. All commands in a pipeline run simultaneously. echo has already been started, with the existing value of $?, before the other commands in the pipeline have finished. It echoes the exit value of whatever you did before the pipeline.
# The first command is to set things up so that $? is 2 when the
# second command is parsed.
$ sh -c 'exit 2'
$ echo 1|grep 0|echo $?
2
Your second example is a little more interesting. It's correct to say that wc is unfazed by grep's exit status. All commands in the pipeline are children of the shell, so their exit statuses are reported to the shell. The wc process doesn't know anything about the grep process. The only communication between them is the data stream written to the pipe by grep and read from the pipe by wc.
There are ways to find all the exit statuses after the fact (the linked question in the comment by shx2 has examples) but a basic rule that you can't avoid is that the shell will always wait for all the commands to finish.
Early exits in a pipeline sometimes do have a cascade effect. If a command on the right side of a pipe exits without reading all the data from the pipe, the command on the left of that pipe will get a SIGPIPE signal the next time it tries to write, which by default terminates the process. (The two phrases to pay close attention to there are "the next time it tries to write" and "by default". If the writing process spends a long time doing other things between writes to the pipe, it won't die immediately. If it handles the SIGPIPE, it won't die at all.)
In the other direction, when a command on the left side of a pipe exits, the command on the right side of that pipe gets EOF, which does cause the exit to happen fairly soon when it's a simple command like wc that doesn't do much processing after reading its input.
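The SIGPIPE cascade is easy to observe with an infinite writer (bash; PIPESTATUS is a bash array holding every stage's exit status):

```shell
#!/bin/bash
# yes writes lines forever, but head exits after one line;
# yes then dies of SIGPIPE on its next write.
yes | head -n 1
status=("${PIPESTATUS[@]}")       # capture immediately, before anything resets it
echo "head exited with ${status[1]}"   # prints: head exited with 0
echo "yes exited with ${status[0]}"    # prints: yes exited with 141 (128 + SIGPIPE)
```

Note that the pipeline as a whole still reports head's status (0), illustrating why a failing left-hand stage goes unnoticed by default.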
With direct use of pipe(), fork(), and wait3(), it would be possible to construct a pipeline, notice when one child exits badly, and kill the rest of them immediately. This requires a language more sophisticated than the shell.
I tried to come up with a way to do it in shell with a series of named pipes, but I don't see it. You can run all the processes as separate jobs and get their PIDs with $!, but the wait builtin isn't flexible enough to say "wait for any child in this set to exit, and tell me which one it was and what the exit status was".
If you're willing to mess with ps and/or /proc you can find out which processes have exited (they'll be zombies), but you can't distinguish successful exit from any other kind.
Write
set -e
set -o pipefail
at the beginning of your file.
set -e makes the script exit when a command fails, and set -o pipefail makes a pipeline return a failure code if any stage of the pipeline fails, not just the last one.
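A quick before-and-after, reusing the grep example from the question (bash; pipefail and PIPESTATUS are bash features):

```shell
#!/bin/bash
echo 1 | grep 0 | wc -l > /dev/null
echo "default status: $?"                      # prints: default status: 0 (only wc counts)

echo 1 | grep 0 | wc -l > /dev/null
echo "grep really returned ${PIPESTATUS[1]}"   # prints: grep really returned 1

set -o pipefail
echo 1 | grep 0 | wc -l > /dev/null
echo "pipefail status: $?"                     # prints: pipefail status: 1 (grep's failure wins)
```

pipefail doesn't stop the pipeline early, but it does make the failure visible so that set -e (or an explicit check) can act on it.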

Linux - Execute shell scripts simultaneously on the background and know when its done

I'm using rsync to transfer files from a server to another server (both owned by me), my only problem is that these files are over 50GB and I got a ton of them to transfer (Over 200 of them).
Now I could just open multiple tabs and run rsync or add the "&" at the end of the script to execute it in the background.
So my question is: how can I execute this command in the background and, when it's done transferring, have a message shown on the terminal window that executed the script?
(rsync -av --progress [FOLDER_NAME] [DESTINATION]:[PATH] &) && echo 'Finished'
I know that's completely wrong, but I need to use & to run it in the background and && to run echo after rsync finishes.
Next to the screen-based solution, you could use xargs tool, too.
echo '/srcpath1 host1 /dstpath1
/srcpath2 host2 /dstpath2
/srcpath3 host3 /dstpath3'| \
xargs -P 5 --max-lines 1 bash -c 'rsync -av --progress "$1" "$2":"$3"' _
xargs reads its input from stdin and executes a command for every word or every line; in this case, for every line.
What makes it very useful: it can run its child processes in parallel. In this configuration, xargs uses up to 5 parallel child processes; that number can be anything from 1 to effectively unlimited.
xargs exits once all of its children have finished, and it handles Ctrl-C, child reaping, and errors gracefully.
Instead of the echo, the input of xargs can come from a file, from a previous command in the pipe, or from a for or while loop.
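The same pattern with a harmless stand-in for rsync, so it can be tried anywhere (the echo plays the role of the transfer command):

```shell
#!/bin/bash
# Three "transfers" run up to 2 at a time; xargs returns only when all are done.
printf '%s\n' one two three | \
    xargs -P 2 -I{} bash -c 'echo "transferring {}"; sleep 0.1'
echo "all transfers finished"
```

Because xargs itself blocks until every child exits, the final echo is the "I'm done" message the question asks for.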
You could use GNU screen for that; screen can monitor output for silence and for activity. An additional benefit: you can close the terminal and reattach to the screen session later. Even better, if you run screen on the server, you can shut down or reboot your own machine and the processes in screen will still be executing.
Well, to answer your specific question, your invocation:
(rsync ... &) && echo 'Finished'
creates a subshell - the ( ... ) bit - in which rsync is run in the background, which means the subshell will exit as soon as it has started rsync, not after rsync finishes. The && echo ... part then notices that the subshell has exited successfully and does its thing, which is not what you want, because rsync is most likely still running.
To accomplish what you want, you need to do this:
(rsync ... && echo 'Finished') &
That will put the subshell itself in the background, and the subshell will run rsync and then echo. If you need to wait for that subshell to finish at some point later in your script, simply insert a wait at the appropriate point.
You could also structure it this way:
rsync ... &
# other stuff to do while rsync runs
wait
echo 'Finished'
Which is "better" is really just a matter of preference. There's one minor difference: the && runs echo only if rsync doesn't report an error exit code, so replacing && with ; would make the two patterns more equivalent. The second method makes the echo synchronous with the other output from your script, so it doesn't show up in the middle of other output, which may be slightly preferable; on the other hand, capturing the exit status of rsync would be more complicated if that were necessary.
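If the exit status is needed later, the backgrounded subshell's PID can be captured and waited on. A sketch, with `true` standing in for the rsync command:

```shell
#!/bin/bash
( true && echo 'Finished' ) &    # 'true' stands in for the long rsync command
bg_pid=$!
# ... other work runs here while the transfer proceeds ...
wait "$bg_pid"
echo "transfer subshell exited with status $?"
```

Here wait both synchronizes with the subshell and reports its exit status, combining the two patterns above.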

How to wait until the file is closed

I have an external process that starts writing to a file. How do I write a script that waits until the file is closed (i.e., when the other process is done with the writing)?
There are several ways to achieve this:
If you can, start the process from your script. The script will continue when the process terminates and that means it can't write any more data to the file.
If you can't control the process but you know that the process terminates after writing the file, you can find out the process ID and then check if the process is still running with kill -0 $PID. If $? is 0 afterwards, the process is still alive.
If that's not possible, then you can use lsof -np $PID to get a list of all open files for this process and check if your file is in the list. This is somewhat slow, though.
[EDIT] Note that all these approaches are somewhat brittle. The correct solution is to have the external process write the file using a temporary name and then rename it as soon as it's done.
The rename makes sure that everyone else either sees the whole file with all the data or nothing.
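On the writer's side, the temporary-name-plus-rename convention is only a couple of lines. A sketch (file names are illustrative):

```shell
#!/bin/bash
# Writer: produce the data under a temporary name, then rename.
# rename(2) is atomic within a filesystem, so readers see either
# no file or the complete file, never a partial one.
tmp="$(mktemp result.txt.XXXXXX)"
printf 'all the data\n' > "$tmp"
mv "$tmp" result.txt

# Reader: simply wait for the final name to appear.
until [ -e result.txt ]; do sleep 1; done
```

For atomicity, the temporary file must be on the same filesystem as the final name; otherwise mv falls back to a non-atomic copy.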
The easy way: let the script execute the program and wait until it's finished.
Create a small C program using inotify. The program should:
Create an inotify instance.
Add a watch to the instance for the IN_CLOSE_WRITE event for the file path of interest.
Wait for the event.
Exit with an appropriate code.
Then in your script, invoke the program with the file path as an argument.
You could extend this by adding a timeout argument, and allowing different events to be specified.
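If installing the inotify-tools package is an option, its inotifywait command already wraps this same mechanism, so no C is needed (assuming the package is available on the system):

```shell
#!/bin/bash
# Block until /tmp/foo.txt is closed by a process that had it open for writing.
inotifywait -e close_write /tmp/foo.txt
```

inotifywait also supports a timeout (-t) and other events, mirroring the extensions suggested above.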
This isn't nice, and it makes me feel dirty writing it... /tmp/foo.txt is the file being tested...
#!/bin/bash
until [ "$(find /proc/*/fd 2>/dev/null |
           xargs -I{} readlink {} |
           grep -c '^/tmp/foo.txt$')" = "0" ]; do
    sleep 1
done
A "loop until the file is stable" approach; this should work if you are waiting for experiment results (so you don't need real-time event handling):
EXPECTED_WRITE_INTERVAL_SECONDS=1
FILE="file.txt"
while : ; do
    omod=$(stat -c %Y "$FILE")
    # echo "OLD: $omod"
    sleep "$EXPECTED_WRITE_INTERVAL_SECONDS"
    nmod=$(stat -c %Y "$FILE")
    # echo "NEW: $nmod"
    if [ "$omod" = "$nmod" ]; then
        break
    fi
done
