perl hangs on exit (after closing a filehandle)

perl hangs on exit (after closing a filehandle) - linux

I've got a function that does (in short):
my $file = IO::File->new("| some_command >> /dev/null 2>&1")
or die "cannot open some_command for writing: $!\n";
...
undef $file;
Right now I'm not even writing anything to $file. Currently there are no other operations on $file at all. When I run the program, it doesn't exit properly. I see that handle is closed, but my program is still waiting for the process to close. Captured with strace:
close(6) = 0
rt_sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
wait4(16861, ^C <unfinished ...>
I don't see this problem if I open the same process for reading.
What do I have to do to make the program to exit?
Edit: Suggestions so far were to use the Expect library or to finish the input stream via ctrl+d. But I do not want to interact with the program in any way at this point. I want it to finish exactly now without any more IO going on. Is that possible?

undef $file removes a reference count from the filehandle and makes it eligible for garbage collection. If $file is a handle to a regular file and there are no other references to the filehandle anywhere else, it should work as documented in IO::File. In this case $file is a handle to a shell command, and there may be some other internal references to the filehandle that keep it from getting destroyed. Using $file->close is safer and makes your intent much clearer.
To kill off the command when closing the filehandle doesn't work, you need the process ID. If you invoked the command like
my ($file,$pid);
$pid = open($file, "| some_command >> /dev/null 2>&1");
then you could
kill 'TERM',$pid;
at the end of your program. I don't know how to extract the process ID from the return value of IO::File::new though.

If some_command is waiting for input, it will likely sit there forever doing just that, waiting for input.
From what the docs say, I don't think it makes any difference, but I always do $file->close() instead of/before undef'ing the handle.
EDIT: Send it Control D?
Perhaps some_command is reading tty's instead of stdin, like passwd does. If you are in that realm, I'd suggest looking up Expect.
Control D simply duplicates the zero byte read that close should do anyway for a command line program.
Have you tried using $file->close() instead of the undef?

Does some_command slurp all input and process it? Such as grep?
Or does it prompt? Like, say... chfn?
Does it return any useful information? Like an indication that it's finished?
If it's the latter, you might want to read up on Expect so you that you can interact with it.

This ugly, ugly hack will cause some_command to be parented to init instead of staying in your perl's process tree. Perl no longer has any process to wait for, and the pipe still works -- yay UNIX.
my $file = IO::File->new("| some_command >> /dev/null 2>&1 &")
Cons: The shell will succeed at & even if some_command fails, so you won't get any errors back.
or die "cannot open some_command for writing: $!\n"; # now useless
If some_command exited as soon as it got an EOF on stdin (and never stops reading from stdin), though, I'd expect this wouldn't be necessary.
$ cat | some_command
^D
Does that hang, and can you fix that?

Related

Chronologically capturing STDOUT and STDERR

This very well may fall under KISS (keep it simple) principle but I am still curious and wish to be educated as to why I didn't receive the expected results. So, here we go...
I have a shell script to capture STDOUT and STDERR without disturbing the original file descriptors. This is in hopes of preserving the original order of output (see test.pl below) as seen by a user on the terminal.
Unfortunately, I am limited to using sh, instead of bash (but I welcome examples), as I am calling this from another suite and I may wish to use it in a cron in the future (I know cron has the SHELL environment variable).
wrapper.sh contains:
#!/bin/sh
stdout_and_stderr=$1
shift
command=$#
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
${command} >${out} 2>${err}
test.pl contains:
#!/usr/bin/perl
print "1: stdout1\n";
print STDERR "2: stderr1\n";
print "3: stdout2\n";
In the scenario:
sh wrapper.sh /tmp/xxx perl test.pl
STDOUT contains:
1: stdout1
3: stdout2
STDERR contains:
2: stderr1
All good so far...
/tmp/xxx contains:
2: stderr1
1: stdout1
3: stdout2
However, I was expecting /tmp/xxx to contain:
1: stdout1
2: stderr1
3: stdout2
Can anyone explain to me why STDOUT and STDERR are not appending /tmp/xxx in the order that I expected? My guess would be that the backgrounded tee processes are blocking the /tmp/xxx resource from one another since they have the same "destination". How would you solve this?
related: How do I write stderr to a file while using "tee" with a pipe?

It is a feature of the C runtime library (and probably is imitated by other runtime libraries) that stderr is not buffered. As soon as it is written to, stderr pushes all of its characters to the destination device.
By default stdout has a 512-byte buffer.
The buffering for both stderr and stdout can be changed with the setbuf or setvbuf calls.
From the Linux man page for stdout:
NOTES: The stream stderr is unbuffered. The stream stdout is line-buffered when it points to a terminal. Partial lines will not appear until fflush(3) or exit(3) is called, or a newline is printed. This can produce unexpected results, especially with debugging output. The buffering mode of the standard streams (or any other stream) can be changed using the setbuf(3) or setvbuf(3) call. Note that in case stdin is associated with a terminal, there may also be input buffering in the terminal driver, entirely unrelated to stdio buffering. (Indeed, normally terminal input is line buffered in the kernel.) This kernel input handling can be modified using calls like tcsetattr(3); see also stty(1), and termios(3).

After a little bit more searching, inspired by #wallyk, I made the following modification to wrapper.sh:
#!/bin/sh
stdout_and_stderr=$1
shift
command=$#
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
script -q -F 2 ${command} >${out} 2>${err}
Which now produces the expected:
1: stdout1
2: stderr1
3: stdout2
The solution was to prefix the $command with script -q -F 2 which makes script quite (-q) and then forces file descriptor 2 (STDOUT) to flush immediately (-F 2).
I am now researching to determine how portable this is. I think -F pipe may be Mac and FreeBSD and -f or --flush may be other distros...
related: How to make output of any shell command unbuffered?

How to handle updates from an continuous process pipe in Perl

I am trying to follow log files in Perl on Fedora but unfortunately, Fedora uses journalctl to read binary log files that I cannot parse directly. This, according to my understanding, means I can only read Fedora's log files by calling journalctl.
I tried using IO::Pipe to do this, but the problem is that $p->reader(..) waits until journalctl --follow is done writing output (which will be never since --follow is like tail -F) and then allows me to print everything out which is not what I want. I would like to be able to set a callback function to be called each time a new line is printed to the process pipe so that I can parse/handle each new log event.
use IO::Pipe;
my $p = IO::Pipe->new();
$p->reader("journalctl --follow"); #Waits for process to exit
while (<$p>) {
print;
}

I assume that journalctl is working like tail -f. If this is correct, a simple open should do the job:
use Fcntl; # Import SEEK_CUR
my $pid = open my $fh, '|-', 'journalctl --follow'
or die "Error $! starting journalctl";
while (kill 0, $pid) {
while (<$fh>) {
print $_; # Print log line
}
sleep 1; # Wait some time for new lines to appear
seek($fh,0,SEEK_CUR); # Reset EOF
}
open opens a filehandle for reading the output of the called command: http://perldoc.perl.org/functions/open.html
seek is used to reset the EOF marker: http://perldoc.perl.org/functions/seek.html Without reset, all subsequent <$fh> calls will just return EOF even if the called script issued additional output in the meantime.
kill 0,$pid will be true as long as the child process started by open is alive.
You may replace sleep 1 by usleep from Time::HiRes or select undef,undef,undef,$fractional_seconds; to wait less than a second depending on the frequency of incoming lines.
AnyEvent should also be able to do the job via it's AnyEvent::Handle.
Update:
Adding use POSIX ":sys_wait_h"; at the beginning and waitpid $pid, WNOHANG) to the outer loop would also detect (and reap) a zombie journalctl process:
while (kill(0, $pid) and waitpid($pid, WNOHANG) != $pid) {
A daemon might also want to check if $pid is still a child of the current process ($$) and if it's still the original journalctl process.

I have no access to journalctl, but if you avoid IO::Pipe and open the piped output directly then the data will not be buffered
use strict;
use warnings 'all';
open my $follow_fh, '-|', 'journalctl --follow' or die $!;
print while <$follow_fh>;

Bash output happening after prompt, not before, meaning I have to manually press enter

I am having a problem getting bash to do exactly what I want, it's not a major issue, but annoying.
1.) I have a third party software I run that produces some output as stderr. Some of it is useful, some of it is regularly stuff I don't care about and I don't want this dumped to screen, however I do want the useful parts of the stderr dumped to screen. I figured the best way to achieve this was to pass stderr to a function, then use conditions in that function to either show the stderr or not.
2.) This works fine. However the solution I have implemented dumped out my errors at the right time, but then returns a bash prompt and I want to summarise the status of the errors at the end of the function, but echo-ing here prints the text after the prompt meaning that I have to press enter to get back to a clean prompt. It shall become clear with the example below.
My error stream generator:
./TestErrorStream.sh
#!/bin/bash
echo "test1" >&2
My function to process this:
./Function.sh
#!/bin/bash
function ProcessErrors()
{
while read data;
do
echo Line was:"$data"
done
sleep 5 # This is used simply to simulate the processing work I'm doing on the errors.
echo "Completed"
}
I source the Function.sh file to make ProcessErrors() available, then I run:
2> >(ProcessErrors) ./TestErrorStream.sh
I expect (and want) to get:
user#user-desktop:~/path$ 2> >(ProcessErrors) ./TestErrorStream.sh
Line was:test1
Completed
user#user-desktop:~/path$
However what I really get is:
user#user-desktop:~/path$ 2> >(ProcessErrors) ./TestErrorStream.sh
Line was:test1
user#user-desktop:~/path$ Completed
And no clean prompt. Of course the prompt is there, but "Completed" is being printed after the prompt, I want to printed before, and then a clean prompt to appear.
NOTE: This is a minimum working example, and it's contrived. While other solutions to my error stream problem are welcome I also want to understand how to make bash run this script the way I want it to.
Thanks for your help
Joey

Your problem is that the while loop stay stick to stdin until the program exits.
The release of stdin occurs at the end of the "TestErrorStream.sh", so your prompt is almost immediately available compared to what remains to process in the function.
I suggest you wrap the command inside a script so you'll be able to handle the time you want before your prompt is back (I suggest 1sec more than the suspected time needed for the function to process the remaining lines of codes)
I successfully managed to do this like that :
./Functions.sh
#!/bin/bash
function ProcessErrors()
{
while read data;
do
echo Line was:"$data"
done
sleep 5 # simulate required time to process end of function (after TestErrorStream.sh is over and stdin is released)
echo "Completed"
}
./TestErrorStream.sh
#!/bin/bash
echo "first"
echo "firsterr" >&2
sleep 20 # any number here
./WrapTestErrorStream.sh
#!/bin/bash
source ./Functions.sh
2> >(ProcessErrors) ./TestErrorStream.sh
sleep 6 # <= this one is important
With the above you'll get a nice "Completed" before your prompt after 26 seconds of processing. (Works fine with or without the additional "time" command)
user#host:~/path$ time ./WrapTestErrorStream.sh
first
Line was:firsterr
Completed
real 0m26.014s
user 0m0.000s
sys 0m0.000s
user#host:~/path$
Note: the process substitution ">(ProcessErrors)" is a subprocess of the script "./TestErrorStream.sh". So when the script ends, the subprocess is no more tied to it nor to the wrapper. That's why we need that final "sleep 6"

#!/bin/bash
function ProcessErrors {
while read data; do
echo Line was:"$data"
done
sleep 5
echo "Completed"
}
# Open subprocess
exec 60> >(ProcessErrors)
P=$!
# Do the work
2>&60 ./TestErrorStream.sh
# Close connection or else subprocess would keep on reading
exec 60>&-
# Wait for process to exit (wait "$P" doesn't work). There are many ways
# to do this too like checking `/proc`. I prefer the `kill` method as
# it's more explicit. We'd never know if /proc updates itself quickly
# among all systems. And using an external tool is also a big NO.
while kill -s 0 "$P" &>/dev/null; do
sleep 1s
done
Off topic side-note: I'd love to see how posturing bash veterans/authors try to own this. Or perhaps they already did way way back from seeing this.

bash script read pipe or argument

I want my script to read a string either from stdin , if it's piped, or from an argument. So first i want to check if some text is piped and if not it should use an argument as input. My code looks something like this:
value=$(cat) # read from stdin
if [ "$pipe" != "" ]; then #check if pipe is not empty
#Do something with pipe string
else
#Do something with argument string
fi
The problem is when it's not piped, then the script will halt and wait for "ctrl d" and i dont want that. Any suggestions on how to solve this?
Thanks in advance.
/Tomas

What about checking the argument first?
if (($#)) ; then
process "$1"
else
cat | process
fi
Or, just take advantage from the same behaviour of cat:
cat "$#" | process

If you only need to know if it's a pipe or a redirection, it should be sufficient to determine if stdin is a terminal or not:
if [ -t 0 ]; then
# stdin is a tty: process command line
else
# stdin is not a tty: process standard input
fi
[ (aka test) with -t is equivalent to the libc isatty() function.
The above will work with both something | myscript and myscript < infile. This is the simplest solution, assuming your script is for interactive use.
The [ command is a builtin in bash and some other shells, and since [/test with -tis in POSIX, it's portable too (not relying on Linux, bash, or GNU utility features).
There's one edge case, test -t also returns false if the file descriptor is invalid, but it would take some slight adversity to arrange that. test -e will detect this, though assuming you have a filename such as /dev/stdin to use.
The POSIX tty command can also be used, and handles the adversity above. It will print the tty device name and return 0 if stdin is a terminal, and will print "not a tty" and return 1 in any other case:
if tty >/dev/null ; then
# stdin is a tty: process command line
else
# stdin is not a tty: process standard input
fi
(with GNU tty, you can use tty -s for silent operation)
A less portable way, though certainly acceptable on a typical Linux, is to use GNU stat with its %F format specifier, this returns the text "character special file", "fifo" and "regular file" in the cases of terminal, pipe and redirection respectively. stat requires a filename, so you must provide a specially-named file of the form /dev/stdin, /dev/fd/0, or /proc/self/fd/0, and use -L to chase symlinks:
stat -L -c "%F" /dev/stdin
This is probably the best way to handle non-interactive use (since you can't make assumptions about terminals then), or to detect an actual pipe (FIFO) distinct from redirection.
There is a slight gotcha with %F in that you cannot use it to tell the difference between a terminal and certain other device files, for example /dev/zero or /dev/null which are also "character special files" and might reasonably appear. An unpretty solution is to use %t to report the underlying device type (major, in hex), assuming you know what the underlying tty device number ranges are... and that depends on whether you're using BSD style ptys or Unix98 ptys, or whether you're on the actual console, among other things. In the simple case %t will be 0 though for a pipe or a redirection of a normal (non-special) file.
More general solutions to this kind of problem are to use bash's read with a timeout (read -t 0 ...) or non-blocking I/O with GNU dd (dd iflag=nonblock).
The latter will allow you to detect lack of input on stdin, dd will return an exit code of 1 if there is nothing ready to read. However, these are more suitable for non-blocking polling loops, rather than a once-off check: there is a race condition when you start two or more processes in a pipeline as one may be ready to read before another has written.

It's easier to check for command line arguments first and fallback to stdin if no arguments. Shell Parameter Expansion is a nice shorthand instead of the if-else:
value=${*:-`cat`}
# do something with $value

redirecting stdin _and_ stdout to a pipe

I would like to run a program "A", have its output go to the input to another program "B", as well as stdin going to intput of "B". If program "A" closes, I'd like "B" to continue running.
I can redirect A output to B input easily:
./a | ./b
And I can combine stderr into the output if I'd like:
./a 2>&1 | ./b
But I can't figure out how to combine stdin into the output. My guess would be:
./a 0>&1 | ./b
but it doesn't work.
Here's a test that doesn't require us to rewrite up any test programs:
$ echo ls 0>&1 | /bin/sh -i
$ a b info.txt
$
/bin/sh: Cannot set tty process group (No such process)
If possible, I'd like to do this using only bash redirection on the command line (I don't want to write a C program to fork off child processes and do anything complicated everytime I want to do some redirection of stdin to a pipe).

This cannot be done without writing an auxiliary program.
In general, stdin could be a read-only file descriptor (heck, it might refer to read-only file). So you cannot "insert" anything into it.
You will need to write a "helper" program that monitors two file descriptors (say, 0 and 3) in order to read from both and "merge" them. A simple select or poll loop would be sufficient, and you could write it in most scripting languages, but not the shell, I don't think.
Then you can use shell redirection to feed your program's output to descriptor 3 of the "helper".
Since what you want is basically the opposite of "tee", I might call it "eet"...
[edit]
If only you could launch "cat" in the background...
But that will fail because background processes with a controlling terminal cannot read from stdin. So if you could just detach "cat" from its controlling terminal and run it in the background...
On Linux, "setsid cat" should do it, roughly. But (a) I could not get it to work very well and (b) I really do not have time for this today and (c) it is non-standard anyway.
I would just write the helper program.
[edit 2]
OK, this seems to work:
{ seq 5 ; sleep 2 ; seq 5 ; } | /bin/bash -c 'set -m ; setsid cat ; echo HELLO'
The set -m thing forces bash to enable job control, which apparently is needed to prevent the shell from redirecting stdin from /dev/null.
Here, the echo HELLO represents your "program A". The seq commands (with the sleep in the middle) are just to provide some input. And yes, you can pipe this whole thing to process B.
About as ugly and non-portable a solution as you could ask for...

A pipe has two ends. One is for writing, and that which gets written appears in the other end, which is for reading.
It's a pipe, not a T or Y junction.
I don't think your scenario is possible. Having "stdin going to input of" anything doesn't make sense.

If I understand your requirements correctly, you want this set up (ASCII art to the fore):
o----+----->| A |----+---->| B |---->o
| ^
| |
+------------------+
with the additional constraint that if process A closes up shop, process B should be able to continue with the input stream going to B.
This is a non-standard setup, as you realize, and can only be achieved by using an auxilliary program to drive the input to A and B. You end up with some interesting synchronization issues but it will all work remarkably well as long as your messages are short enough.
The plumbing necessary to achieve this is notable - you'll need two pipes, one for the input to A and the other for the input to B, and the output of A will be connected to the input of B as well.
o---->| C |---------->| A |----+---->| B |---->o
| ^
| |
+--------------------------+
Note that C will be writing the data twice, once to A and once to B. Note, too, that the pipe from A to B is the same pipe as the pipe from C to A.

To make the given test case work you have to while ... read from the controlling terminal device /dev/tty inside a sh -c '...' construct.
Note the use of eval (could it be avoided here?) and that multi-line commands on input> will fail.
echo 'ls; export var=myval' | (
stdin="$(</dev/stdin)"
/bin/sh -i -c '
eval "$1";
while IFS="" read -e -r -p "input> " line; do
history -s "${line}"
eval "${line}";
done </dev/tty
' argv0 "${stdin}"
)
input> echo $var
For a similar problem and the use of named pipes see here:
BASH: Best architecture for reading from two input streams

This can't be done exactly as shown, but to perform your example you can make use of cat's ability to join files together:
cat <(echo ls) - | /bin/sh
(You can do -i, but then you'll have to have another process kill the /bin/sh, as your attempts to Ctrl-C and Ctrl-D out will fail.)
This assumes that you want to pass in your piped input and then accept from stdin. You can also make it so that it does something after stdin is done, or on both sides -- but it won't merge input character-by-character or line-by-line.

This seems to do what you want:
$ ( ./a <&-; cat ) | ./b
(It's not clear to me if you want a to get input...this solution sends all input to b)
Of course, in this case the inputs to b are strictly ordered: all of the output of a
is sent to b first, then a terminates, then input goes to b. If you want things
interleaved, try:
$ ( ./a <&- & cat ) | ./b

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string