Using named pipes with bash - Problem with data loss - linux

Did some searching online and found simple 'tutorials' on using named pipes. However, when I do anything with background jobs I seem to lose a lot of data.
[[Edit: found a much simpler solution, see reply to post. So the question I put forward is now academic - in case one might want a job server]]
Using Ubuntu 10.04 with Linux 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:52:42 UTC 2010 x86_64 GNU/Linux
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu).
My bash function is:
function jqs
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT SIGKILL

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        if read txt <"$pipe"
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            fi
        fi
    done
}
I run this in the background:
> jqs&
[1] 5336
And now I feed it:
for i in 1 2 3 4 5 6 7 8
do
    (echo aaa$i > /tmp/__job_control_manager__ && echo success$i &)
done
The output is inconsistent.
I frequently don't get all success echoes.
I get at most as many new text echoes as success echoes, sometimes fewer.
If I remove the '&' from the 'feed', it seems to work, but then I am blocked until the output is read. Hence my wanting to let the sub-processes block, but not the main process.
The aim is to write a simple job control script so I can run, say, at most 10 jobs in parallel and queue the rest for later processing, while reliably knowing that they do run.
Full job manager below:
function jq_manage
{
    export __gn__="$1"
    pipe=/tmp/__job_control_manager_"$__gn__"__

    trap "rm -f $pipe" EXIT
    trap "break" SIGKILL

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        date
        jobs
        if (($(jobs | egrep "Running.*echo '%#_Group_#%_$__gn__'" | wc -l) < $__jN__))
        then
            echo "Waiting for new job"
            if read new_job <"$pipe"
            then
                echo "new job is [[$new_job]]"
                if [[ "$new_job" == 'quit' ]]
                then
                    break
                fi
                echo "In group $__gn__, starting job $new_job"
                eval "(echo '%#_Group_#%_$__gn__' > /dev/null; $new_job) &"
            fi
        else
            sleep 3
        fi
    done
}
function jq
{
    # __gn__ = first parameter to this function, the job group name (the pool within which to allocate __jN__ jobs)
    # __jN__ = second parameter to this function, the maximum number of jobs to run concurrently

    export __gn__="$1"
    shift
    export __jN__="$1"
    shift

    export __jq__=$(jobs | egrep "Running.*echo '%#_GroupQueue_#%_$__gn__'" | wc -l)
    if (($__jq__ < 1))
    then
        eval "(echo '%#_GroupQueue_#%_$__gn__' > /dev/null; jq_manage $__gn__) &"
    fi

    pipe=/tmp/__job_control_manager_"$__gn__"__
    echo $@ >$pipe
}
Calling
jq <name> <max processes> <command>
jq abc 2 sleep 20
will start one process.
That part works fine. Start a second one, fine.
One by one by hand seem to work fine.
But starting 10 in a loop seems to lose data, as in the simpler example above.
Any hints as to what I can do to solve this apparent loss of IPC data would be greatly appreciated.
Regards,
Alain.

Your problem is the if statement below:
while true
do
    if read txt <"$pipe"
    ....
done
What is happening is that your job queue server is opening and closing the pipe each time around the loop. This means that some of the clients are getting a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.
To fix this, change the loop in your server to open the pipe once for the entire loop:
while true
do
    if read txt
    ....
done < "$pipe"
Done this way, the pipe is opened once and kept open.
You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.
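For example, a minimal sketch of that precaution, with a hypothetical process_job standing in for whatever you launch per request:
while true
do
    if read txt
    then
        # the spawned job would otherwise inherit the loop's stdin (the fifo)
        # and could swallow queued requests, so point its stdin elsewhere
        process_job "$txt" </dev/null &
    fi
done < "$pipe"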
Edit: With the problem now being that you are getting EOF on your reads when the last client closes the pipe, you can use jilles's method of duping the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:
while true
do
    if read txt
    ....
done < "$pipe" 3> "$pipe"
This will hold the write side of the pipe open on fd 3. The same caveat applies to this file descriptor as to stdin: you will need to close it so any child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.
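A sketch of that cleanup, again with a hypothetical process_job standing in for the real work:
while true
do
    if read txt
    then
        # close the held write end (fd 3) and detach stdin in the child, so it
        # neither keeps the fifo's write side open nor reads queued data from it
        ( exec 3>&-; process_job "$txt" </dev/null ) &
    fi
done < "$pipe" 3> "$pipe"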

As said in other answers, you need to keep the fifo open at all times to avoid losing data.
However, once all writers have gone away after the fifo has been opened (so there has been a writer), reads return immediately (and poll() returns POLLHUP). The only way to clear this state is to reopen the fifo.
POSIX does not provide a solution to this but at least Linux and FreeBSD do: if reads start failing, open the fifo again while keeping the original descriptor open. This works because in Linux and FreeBSD the "hangup" state is local to a particular open file description, while in POSIX it is global to the fifo.
This can be done in a shell script like this:
while :; do
    exec 3< tmp/testfifo
    exec 4<&-
    while read x; do
        echo "input: $x"
    done <&3
    exec 4<&3
    exec 3<&-
done

Just for those who might be interested, [[re-edited]] following comments by camh and jilles, here are two new versions of the test server script.
Both versions now work exactly as hoped.
camh's version for pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        fi
    done 3< "$pipe" 4> "$pipe" # 4 is just to keep the pipe opened so any real client does not end up causing read to return EOF
}
jilles's version for pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    exec 3< "$pipe"
    exec 4<&-

    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        else
            # Close the pipe and reconnect it so that the next read does not end up returning EOF
            exec 4<&3
            exec 3<&-
            exec 3< "$pipe"
            exec 4<&-
        fi
    done
}
Thanks to all for your help.

As camh & Dennis Williamson say, don't break the pipe.
Now I have smaller examples, direct on the command line:
Server:
(
    for i in {0,1,2,3,4}{0,1,2,3,4,5,6,7,8,9};
    do
        if read s;
        then echo ">>$i--$s//";
        else
            echo "<<$i";
        fi;
    done < tst-fifo
)&
Client:
(
    for i in {%a,#b}{1,2}{0,1};
    do
        echo "Test-$i" > tst-fifo;
    done
)&
Can replace the key line with:
(echo "Test-$i" > tst-fifo&);
All client data sent to the pipe gets read, though with option two of the client one may need to start the server a couple of times before all data is read.
But although the read waits for data in the pipe to start with, once data has been pushed, it reads the empty string forever.
Any way to stop this?
Thanks for any insights again.

On the one hand the problem is worse than I thought:
Now there seems to be a case in my more complex example (jq_manage) where the same data is being read over and over again from the pipe (even though no new data is being written to it).
On the other hand, I found a simple solution (edited following Dennis' comment):
function jqn # compute the number of jobs running in that group
{
    __jqty__=$(jobs | egrep "Running.*echo '%#_Group_#%_$__groupn__'" | wc -l)
}

function jq
{
    __groupn__="$1"; shift # job group name (the pool within which to allocate $__jmax__ jobs)
    __jmax__="$1"; shift   # maximum number of jobs to run concurrently

    jqn
    while (($__jqty__ >= $__jmax__))
    do
        sleep 1
        jqn
    done

    eval "(echo '%#_Group_#%_$__groupn__' > /dev/null; $@) &"
}
Works like a charm.
No socket or pipe involved.
Simple.
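A usage sketch (group name, limit and command are illustrative; simple, space-free arguments are assumed, since the command line is re-assembled through eval):
for f in *.tar.gz
do
    # at most 3 of these run at once in group "unpack"; jq blocks only while
    # the pool is full, then backgrounds the job and returns
    jq unpack 3 tar zxf "$f"
done
wait # for the last jobs of the pool to finish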

run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run
You can do this with GNU Parallel. You will not need this scripting.
http://www.gnu.org/software/parallel/man.html#options
You can set --max-procs ("Number of jobslots. Run up to N jobs in parallel."). There is an option to set the number of CPU cores you want to use. You can save the list of executed jobs to a log file, but that is a beta feature.
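For example, a quick sketch (jobs.txt and the gzip command are only placeholders for your own job list):
# one shell command per line in jobs.txt; run at most 10 at a time
parallel -j 10 --joblog /tmp/parallel.log < jobs.txt
# or build the jobs from arguments instead of a file
parallel -j 10 'gzip {}' ::: *.log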

Related

Why can the command 'exec' remove the blocking state of a fifo file?

I'm studying how to use multiple threads to process tasks, and I noticed that a fifo file can help with that. Here is the effect:
#!/bin/bash
my_cmd(){
    echo "process $1"
    sleep 3
}
ff="d:/myfifo/$$.fifo"
mkfifo $ff
exec 7<>$ff
for i in {1..10};do echo;done >&7
for i in {1..1000};do {
    read -u 7
    my_cmd $i
    echo >&7
}& done
rm $ff
wait
echo "end"
This shell script runs normally (it processes 1000 commands, 10 at a time). Then I modified the script slightly:
#!/bin/bash
my_cmd(){
    echo "process $1"
    sleep 3
}
ff="d:/myfifo/$$.fifo"
mkfifo $ff
exec 7<>$ff
for i in {1..10};do echo;done >$ff # modified
for i in {1..1000};do {
    read <$ff # modified
    my_cmd $i
    echo >$ff # modified
}& done
wait
rm $ff
echo "end"
As expected, the second script also runs normally. But I ran into a problem when I modified it again:
#!/bin/bash
my_cmd(){
    echo "process $1"
    sleep 3
}
ff="d:/myfifo/$$.fifo"
mkfifo $ff
# exec 7<>$ff modified
for i in {1..10};do echo;done >$ff
for i in {1..1000};do {
    read <$ff
    my_cmd $i
    echo >$ff
}& done
wait
rm $ff
echo "end"
The script waits for input to the fifo file, because the fifo file entered a blocking state. It seems that the command 'exec 7<>$ff' lifted the blocking state of this fifo file. Is this the case?
On Linux, at least (Not sure about other OSes, and POSIX doesn't define a behavior), opening a fifo for both reading and writing will succeed at once without blocking waiting for the other end of the pipe to be opened.
So when you commented out the exec 7<>$ff line, the next line, for i in {1..10};do echo;done >$ff, will open the fifo for writing, and block waiting for something else to open it for reading before going on. With the original version using the exec, it was already opened for reading, so there was no need to block.
The Linux fifo(7) documentation does note
A process that uses both ends of the connection in order to communicate with itself should be very careful to avoid deadlocks.
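A minimal sketch of the difference on Linux (the fifo path is illustrative):
mkfifo /tmp/demo.fifo
# this would block until another process opens the fifo for reading:
#   exec 7> /tmp/demo.fifo
# this returns immediately, because the same descriptor is both reader and writer:
exec 7<> /tmp/demo.fifo
echo hello >&7   # the line sits in the pipe buffer
read -u 7 line   # and can be read back on the same descriptor
echo "$line"
exec 7>&-        # close the descriptor
rm /tmp/demo.fifo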

When piping in BASH, is it possible to get the PID of the left command from within the right command?

The Problem
Given a BASH pipeline:
./a.sh | ./b.sh
The PID of ./a.sh being 10.
Is there a way to find the PID of ./a.sh from within ./b.sh?
I.e. if there is, and if ./b.sh looks something like the below:
#!/bin/bash
...
echo $LEFT_PID
cat
Then the output of ./a.sh | ./b.sh would be:
10
... Followed by whatever else ./a.sh printed to stdout.
Background
I'm working on this bash script, named cachepoint, that I can place in a pipeline to speed things up.
E.g. cat big_data | sed 's/a/b/g' | uniq -c | cachepoint | sort -n
This is a purposefully simple example.
The pipeline may run slowly at first, but on subsequent runs, it will be quicker, as cachepoint starts doing the work.
The way I picture cachepoint working is that it would use the first few hundred lines of input, along with a list of commands before it, in order to form a hash ID for the previously cached data, thus breaking the stdin pipeline early on subsequent runs, resorting instead to printing the cached data. Cached data would get deleted every hour or so.
I.e. everything left of | cachepoint would continue running, perhaps to 1,000,000 lines, in normal circumstances, but on subsequent executions of cachepoint pipelines, everything left of | cachepoint would exit after maybe 100 lines, and cachepoint would simply print the millions of lines it has cached. For the hash of the pipe sources and pipe content, I need a way for cachepoint to read the PIDs of what came before it in the pipeline.
I use pipelines a lot for exploring data sets, and I often find myself piping to temporary files in order to bypass repeating the same costly pipeline more than once. This is messy, so I want cachepoint.
This Shellcheck-clean code should work for your b.sh program on any Linux system:
#! /bin/bash

shopt -s extglob
shopt -s nullglob

left_pid=

# Get the identifier for the pipe connected to the standard input of this
# process (e.g. 'pipe:[10294010]')
input_pipe_id=$(readlink "/proc/self/fd/0")

if [[ $input_pipe_id != pipe:* ]]; then
    echo 'ERROR: standard input is not a pipe' >&2
    exit 1
fi

# Find the process that has standard output connected to the same pipe
for stdout_path in /proc/+([[:digit:]])/fd/1; do
    output_pipe_id=$(readlink -- "$stdout_path")
    if [[ $output_pipe_id == "$input_pipe_id" ]]; then
        procpid=${stdout_path%/fd/*}
        left_pid=${procpid#/proc/}
        break
    fi
done

if [[ -z $left_pid ]]; then
    echo "ERROR: Failed to set 'left_pid'" >&2
    exit 1
fi

echo "$left_pid"

cat
It depends on the fact that, on Linux, for a process with id PID the path /proc/PID/fd/0 looks like a symlink to the device connected to the standard input of the process and /proc/PID/fd/1 looks like a symlink to the device connected to the standard output of the process.
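A quick interactive check of that fact (the pipe inode shown is only an example of the output format):
# in a pipeline, fd 0 of the right-hand process points at the connecting pipe
echo hi | readlink /proc/self/fd/0   # prints something like pipe:[10294010]
# the fds of any process can be inspected the same way
ls -l /proc/$$/fd/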

Linux Single Instance Kill if running too long

I am using the following to keep a single instance of a script running on my server. I have a cronjob to run this every minute.
How do I daemonize an arbitrary script in unix?
#!/bin/bash

if [[ $# < 1 ]]; then
    echo "Name of pid file not given."
    exit
fi

# Get the pid file's name.
PIDFILE=$1
shift

if [[ $# < 1 ]]; then
    echo "No command given."
    exit
fi

echo "Checking pid in file $PIDFILE."

# Check to see if process running.
PID=$(cat $PIDFILE 2>/dev/null)
if [[ $? = 0 ]]; then
    ps -p $PID >/dev/null 2>&1
    if [[ $? = 0 ]]; then
        echo "Command $1 already running."
        exit
    fi
fi

# Write our pid to file.
echo $$ >$PIDFILE

# Get command.
COMMAND=$1
shift

# Run command
$COMMAND "$*"
Now I found out that my script had hung for some reason and therefore it was stuck. I'd like a way to check if the $PIDFILE is "old" and if so, kill the process. I know that's possible (check the timestamp on the file) but I don't know the syntax or if this is even a good idea. Also, when this script is running, the CPU should be pretty heavily used. If it hangs (rare but it happened at least once so far), the CPU usage drops to 0%. It would be nice if I could check that the process is really hung/not active, but I don't know if there's an easy way to do that (and I don't want to have many false positives where it gets killed but it's running fine).
To answer the question in your title, which seems quite different from your problem, use timeout.
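For instance (the path and the one-hour limit are placeholders):
# send SIGTERM if the job runs longer than an hour; timeout exits with 124 in that case
timeout 1h /path/to/your_job.sh
# optionally follow up with SIGKILL if the job ignores SIGTERM
timeout --kill-after=30s 1h /path/to/your_job.sh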
As for your problem, I don't see where it could hang, unless you gave it a fifo queue for the pid file. To run and respawn, you can just run this script once, on startup:
#!/bin/bash
while /bin/true; do
    "$@"
    wait
done
Which brings up another bug in the code you got from the other question: "$*" will pass all the arguments to the script as a single argument; without the quotes it will split arguments on white space. "$@" will pass them individually and handle white space properly.
Call with /path/to/script command [argument]....
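A small illustration of the difference between "$*" and "$@":
demo() {
    printf 'with "$*": <%s>\n' "$*"
    printf 'with "$@": <%s>\n' "$@"
}
demo "hello world" foo
# with "$*": <hello world foo>   (one argument)
# with "$@": <hello world>       (two arguments, white space preserved)
# with "$@": <foo>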

wait child process but get error: 'pid is not a child of this shell'

I wrote a script to get data from HDFS in parallel, then I wait for these child processes in a for loop, but sometimes wait returns "pid is not a child of this shell". Sometimes it works well; it's puzzling. I use "jobs -l" to show all the jobs running in the background. I am sure these pids are children of the shell process, and I use "ps aux" to make sure these pids are not assigned to other processes. Here is my script.
PID=()
FILE=()
let serial=0
while read index_tar
do
    echo $index_tar | grep index > /dev/null 2>&1
    if [[ $? -ne 0 ]]
    then
        continue
    fi
    suffix=`printf '%03d' $serial`
    mkdir input/output_$suffix
    $HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
        && mv input/output_$suffix/index_* input/output_$suffix/index &
    PID[$serial]=$!
    FILE[$serial]=$index_tar
    let serial++
done < file.list

for((i=0;i<$serial;i++))
do
    wait ${PID[$i]}
    if [[ $? -ne 0 ]]
    then
        LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
        exit -1
    else
        LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
    fi
done
Just find the process id of the process you want to wait for and replace that with 12345 in the script below. Further changes can be made as per your requirement.
#!/bin/sh
PID=12345
while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
    sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log
/usr/bin/waitingScript.sh
http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
Either your while loop or the for loop runs in a subshell, which is why you cannot await a child of the (parent, outer) shell.
Edit: this might happen if the while loop or the for loop is actually
(a) in a subshell, e.g. a (...) block, or
(b) participating in a pipe (e.g. for....done|somepipe);
see the sketch below.
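A minimal reproduction of the pipe case (the temp file is just a crude way to get the pid out of the subshell for the demo):
printf '%s\n' one two three |
while read -r line; do
    sleep 2 &
    echo "$!" > /tmp/last_pid   # runs in the pipeline's subshell
done
# back in the parent shell: that pid belongs to the subshell, not to this shell
wait "$(cat /tmp/last_pid)"     # fails: "pid ... is not a child of this shell"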
If you're running this in a container of some sort, the condition apparently can be caused by a bug in bash that is easier to encounter in a containerized environment.
From my reading of the bash source (specifically see comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like in their effort to optimize their tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you should happen to want to wait on a pruned process, the shell can't find it in the history and assumes this to mean that it never knew about it (i.e., that it "is not a child of this shell").

Is it possible to make a bash shell script interact with another command line program?

I am using an interactive command line program in a Linux terminal running the bash shell. I have a definite sequence of commands that I input to the program. The program writes its output to standard output. One of these commands is a 'save' command that writes the output of the previously run command to a file on disk.
A typical cycle is:
$prog
$$cmdx
$$<some output>
$$save <filename>
$$cmdy
$$<again, some output>
$$save <filename>
$$q
$<back to bash shell>
$ is the bash prompt
$$ is the program's prompt
q is the quit command for prog
prog is such that it appends the output of the previous command to filename
How can I automate this process? I would like to write a shell script that can start this program, cycle through the steps, feed it the commands one by one, and then quit. I hope the save command works correctly.
If your command doesn't care how fast you give it input, and you don't really need to interact with it, then you can use a heredoc.
Example:
#!/bin/bash
prog <<EOD
cmdx
save filex
cmdy
save filey
q
EOD
If you need branching based on the output of the program, or if your program is at all sensitive to the timing of your commands, then Expect is what you want.
I recommend you use Expect. This tool is designed to automate interactive shell applications.
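A minimal sketch of that approach, assuming the program's prompt is the literal $$ shown in the question and reusing the example commands from there (adjust the spawn target and the prompt pattern to the real program):
#!/bin/bash
expect <<'END_EXPECT'
    set timeout 30
    spawn ./prog
    expect "$$"
    send "cmdx\r"
    expect "$$"
    send "save filex\r"
    expect "$$"
    send "q\r"
    expect eof
END_EXPECT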
Where there's a need, there's a way! I think that it's a good bash lesson to see
how process management and ipc works. The best solution is, of course, Expect.
But the real reason is that pipes can be tricky and many commands are designed
to wait for data, meaning that the process will become a zombie for reasons that
may be difficult to predict. But learning how and why reminds us of what is
going on under the hood.
When two processes engage in a conversation, the danger is that one or both will
try to read data that will never arrive. The rules of engagement have to be
crystal clear. Things like CRLF and character encoding can kill the party.
Luckily, two close partners like a bash script and its child process are
relatively easy to keep in line. The easiest thing to miss is that bash is
launching a child process for just about everything it does. If you can make it
work with bash, you thoroughly know what you're doing.
The point is that we want to talk to another process. Here's a server:
# a really bad SMTP server
# a hint at courtesy to the client
shopt -s nocasematch
echo "220 $HOSTNAME SMTP [$$]"
while true
do
    read
    [[ "$REPLY" =~ ^helo\ [^\ ] ]] && break
    [[ "$REPLY" =~ ^quit ]] && echo "Later" && exit
    echo 503 5.5.1 Nice guys say hello.
done
NAME=`echo "$REPLY" | sed -r -e 's/^helo //i'`
echo 250 Hello there, $NAME
while read
do
    [[ "$REPLY" =~ ^mail\ from: ]] && { echo 250 2.1.0 Good guess...; continue; }
    [[ "$REPLY" =~ ^rcpt\ to: ]] && { echo 250 2.1.0 Keep trying...; continue; }
    [[ "$REPLY" =~ ^quit ]] && { echo Later, $NAME; exit; }
    echo 502 5.5.2 Please just QUIT
done
echo Pipe closed, exiting
Now, the script that hopefully does the magic.
# Talk to a subprocess using named pipes
rm -fr A B # don't use old pipes
mkfifo A B
# server will listen to A and send to B
./smtp.sh < A > B &
# If we write to A, the pipe will be closed.
# That doesn't happen when writing to a file handle.
exec 3>A
read < B
echo "$REPLY"
# send an email, so long as response codes look good
while read L
do
    echo "> $L"
    echo $L > A
    read < B
    echo $REPLY
    [[ "$REPLY" =~ ^2 ]] || break
done <<EOF
HELO me
MAIL FROM: me
RCPT TO: you
DATA
Subject: Nothing
Message
.
EOF
# This is tricky, and the reason sane people use Expect. If we
# send QUIT and then wait on B (ie. cat B) we may have trouble.
# If the server exits, the "Later" response in the pipe might
# disappear, leaving the cat command (and us) waiting for data.
# So, let cat have our STDOUT and move on.
cat B &
# Now, we should wait for the cat process to get going before we
# send the QUIT command. If we don't, the server will exit, the
# pipe will empty and cat will miss its chance to show the
# server's final words.
echo -n > B # also, 'sleep 1' will probably work.
echo "> quit"
echo "quit" > A
# close the file handle
exec 3>&-
rm A B
Notice that we are not simply dumping the SMTP commands on the server. We check
each response code to make sure things are OK. In this case, things will not be
OK and the script will bail.
I use Expect to interact with the shell for switch and router backups. A bash script calls the expect script with the correct variables.
for i in <list of machines> ; do expect_script.sh $i ; done
This will ssh to each box, run the backup commands, copy out the appropriate files, and then move on to the next box.
For simple use cases you may use a combination of subshell, echo & sleep:
# in Terminal.app
telnet localhost 25
helo localhost
ehlo localhost
quit
(sleep 5; echo "helo localhost"; sleep 5; echo "ehlo localhost"; sleep 5; echo quit ) |
telnet localhost 25
echo "cmdx\nsave\n...etc..." | prog
..?
