pass stdout as file name for command line util? - linux

I'm working with a command line utility that requires passing the name of a file to write output to, e.g.
foo -o output.txt
The only thing it writes to stdout is a message that indicates that it ran successfully. I'd like to be able to pipe everything that is written to output.txt to another command line utility. My motivation is that output.txt will end up being a 40 GB file that I don't need to keep, and I'd rather pipe the streams than work on massive files in a stepwise manner.
Is there any way in this scenario to pipe the real output (i.e. output.txt) to another command? Can I somehow magically pass stdout as the file argument?

Solution 1: Using process substitution
The most convenient way of doing this is by using process substitution. In bash the syntax looks as follows:
foo -o >(other_command)
(Note that this is a bashism. There are similar solutions for other shells, but the bottom line is that it's not portable.)
Solution 2: Using named pipes explicitly
You can do the above explicitly / manually as follows:
Create a named pipe using the mkfifo command.
mkfifo my_buf
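Launch your other command with that file as input (in the background or in another terminal, because opening the pipe blocks until a writer shows up)
other_command < my_buf &
Execute foo and let it write its output to my_buf
foo -o my_buf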
Solution 3: Using /dev/stdout
You can also use the device file /dev/stdout as follows
foo -o /dev/stdout | other_command
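As a rough sketch of Solution 2 end to end: since the real utility isn't shown, foo below is a hypothetical stand-in (a shell function that writes three lines to the file named after -o and prints a status message), and wc -l plays the role of other_command. The temporary directory and file names are made up for the demo.

```bash
#!/bin/bash
# Hypothetical stand-in for the real utility: writes its data to the file
# named after -o and prints only a status message on stdout.
foo() {
    printf 'line 1\nline 2\nline 3\n' > "$2"
    echo "foo: finished successfully"
}

tmpdir=$(mktemp -d)
mkfifo "$tmpdir/my_buf"

# Start the reader first, in the background; it blocks until a writer opens the pipe.
wc -l < "$tmpdir/my_buf" > "$tmpdir/count" &

# The writer treats the named pipe like any other output file.
foo -o "$tmpdir/my_buf"

wait                                    # let the background reader finish
line_count=$(tr -d ' ' < "$tmpdir/count")
rm -rf "$tmpdir"                        # the named pipe has to be removed by hand
echo "other_command saw $line_count lines"
```

Nothing is stored on disk while the data flows; the only leftover is the pipe's directory entry, which is why the cleanup step is there.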

Named pipes work fine, but you have a nicer, more direct syntax available via bash process substitution that has the added benefit of not using a permanent named pipe that must later be deleted (process substitution uses temporary named pipes behind the scenes):
foo -o >(other command)
Also, should you want to pipe the output to your command and also save the output to a file, you can do this:
foo -o >(tee output.txt) | other command
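Here is a small sketch of the tee variant, again with a hypothetical foo function standing in for the real utility and wc -l standing in for the other command. One thing it makes visible: foo's own status message also goes down the pipe, next to the copy that tee forwards.

```bash
#!/bin/bash
# Hypothetical stand-in for the real utility (not the asker's actual program).
foo() {
    printf 'line 1\nline 2\nline 3\n' > "$2"
    echo "foo: finished successfully"
}

tmpdir=$(mktemp -d)

# tee saves a copy to output.txt and forwards the data into the pipeline;
# foo's status message travels down the same pipe.
piped_lines=$(foo -o >(tee "$tmpdir/output.txt") | wc -l | tr -d ' ')
saved_lines=$(wc -l < "$tmpdir/output.txt" | tr -d ' ')
rm -rf "$tmpdir"

echo "lines seen by the next command: $piped_lines"
echo "lines saved in output.txt:      $saved_lines"
```

If the status message must not reach the next command, it has to be filtered out downstream or the utility silenced by whatever option it offers for that.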

For the sake of making stackoverflow happy let me write a long enough sentence because my proposed solution is only 18 characters long instead of the required 30+
foo -o /dev/stdout

You could use the magic of UNIX and create a named pipe :)
Create the pipe
$ mknod mypipe p
Start the process that reads from the pipe
$ second-process < mypipe
Start the process, that writes into the pipe
$ foo -o mypipe

foo -o >(cat)
if for some reason you don't have permission to write to /dev/stdout

I use /dev/tty as the output filename, equivalent to using /dev/null when you want to output nothing at all. Then | and you are done.

Related

How to avoid creating tmp-Files on Linux?

I have a program that needs to access a file. The content of this file is in a variable and I want to avoid creating a real file. Is it possible to create some kind of virtual file?
X='some stuff'
Y='other stuff'
echo "$X" > some_file
echo "$Y" | command --file some_file -
Normally I would pipe this to the command, but STDIN is already "in use". Is there any way to pass both variables to the command without creating tmp files?
Standard input is file descriptor 0, which by convention is used to pass the “default” input. Apart from standard input, most commands only let you specify an input by passing a file name.
Linux and most other modern unices have a “special” file name which means “whatever is already open on this file descriptor”: /dev/fd/N. So to make command read from file descriptor 3 rather than some disk file, pass command --file /dev/fd/3. This works with most commands; the only obstacle at this point is that some commands insist on a file name with a specific extension. You can usually work around this with a symbolic link.
Since you're telling the command to read from file descriptor 3, you need to have it open when you run the command. You can open file descriptor 3 to read from a file with command --file /dev/fd/3 <some_file, but if you do that you might as well run command --file some_file. Where this gets useful is that file descriptor 3 can come from a pipe. Pipelines in the shell always connect the standard output of the left-hand side to the standard input of the right-hand side, but you can use file descriptor redirection to move the file descriptors around.
echo "$X" | {
echo "$Y" | command --file /dev/fd/3 -
} 3<&0
In this example, the whole braced group has its file descriptor 3 reading from the pipe that receives $X. Thus, when command reads from the file whose name is passed to --file, it sees the value of X.
This works in any sh variant.
There's a simpler syntax that works in bash, ksh and zsh: process substitution. The code below is pretty much equivalent to the code above:
echo "$Y" | command --file <(echo "$X") -
The shell picks a free file descriptor and makes a pipe from echo "$X" to that descriptor, and replaces <(…) by the correct /dev/fd/N.
In both cases, the file passed to --file is a pipe. So this only works if the command works with a pipe. Some commands don't work with pipes because they don't read from the file linearly from start to end; with such commands, you have to use a temporary file.
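To try both forms without a special program, here is a sketch in which cat stands in for "command --file" (an assumption for the demo: cat simply reads the named file first and then standard input, given as -).

```bash
#!/bin/bash
X='some stuff'
Y='other stuff'

# File descriptor form: fd 3 of the braced group is the pipe carrying $X,
# while the inner pipeline supplies $Y on standard input.
via_fd=$(echo "$X" | { echo "$Y" | cat /dev/fd/3 -; } 3<&0)

# Process substitution form: the shell creates the pipe and substitutes its /dev/fd/N name.
via_procsub=$(echo "$Y" | cat <(echo "$X") -)

printf '%s\n' "$via_fd"
printf '%s\n' "$via_procsub"
```

Both variables end up holding the value of X followed by the value of Y, and no temporary file is created.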

Stream specific numbered Bash file descriptor into variable

I am trying to stream a specific numbered file descriptor into a variable in Bash. I can do this from normal standard in using the following function, but how do I do it from a specific file descriptor? I need to direct the FD into the sub-shell if I use the same approach. I could always do it reading line by line, but if I can do it in a continuous stream then that would be massively preferable.
The function I have is:
streamStdInTo ()
{
local STORE_INvar="${1}" ; shift
printf -v "${STORE_INvar}" '%s' "$( cat - )"
}
Yes, I know that this wouldn't work normally as the end of a pipeline would be lost (due to its execution in a sub-shell), however, either in the context of the Bash 4 set +m ; shopt -s lastpipe method of executing the end of a pipeline in the same shell as the start, or, by directing into this via a different file descriptor I am hoping to be able to use it.
So, my question is, How do I use the above but with different file descriptors than the normal?
It's not entirely clear what you mean, but perhaps you are looking for something like:
cat - <&4 # read from fd 4
Or, just call your current function with the redirect:
streamStdInTo foo <&4
edit:
Addressing some questions from the comment, you can use a fifo:
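#!/bin/bash
# Remove the fifo when the script exits
trap 'rm -f "$f"' 0
# Reserve a unique name (GNU mktemp needs at least three X's in the template),
# then swap the regular file for a fifo of the same name
f=$(mktemp fifo.XXXXXX)
rm "$f"
mkfifo "$f"
# The writer runs in the background so that opening the fifo for reading can complete
echo foo > "$f" &
exec 4< "$f"
cat - <&4
wait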
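For what it's worth, a minimal sketch of the "call your current function with the redirect" suggestion. The function is the one from the question; the data source on fd 4 (a printf in a process substitution) and the variable name captured are just assumptions for the demo.

```bash
#!/bin/bash
streamStdInTo ()
{
    local STORE_INvar="${1}" ; shift
    printf -v "${STORE_INvar}" '%s' "$( cat - )"
}

# Open fd 4 on some stream; here a process substitution supplies two lines.
exec 4< <(printf 'alpha\nbeta\n')

# The function still reads its standard input; the redirect makes that fd 4.
streamStdInTo captured <&4

exec 4<&-                      # close fd 4 again
printf '%s\n' "$captured"
```

Because the function runs in the current shell (no pipeline involved), the variable survives the call. As usual with $( ), trailing newlines are stripped from the captured text.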
I think there's a lot of confusion about what exactly you're trying to do. If I understand correctly the end goal here is to run a pipeline and capture the output in a variable, right? Kind of like this:
var=$(cmd1 | cmd2)
Except I guess the idea here is that the name of "$var" is stored in another variable:
varname=var
You can do an end-run around Bash's usual job control situation by using process substitution. So instead of this normal pipeline (which would work in ksh or zsh, but not in bash unless you set lastpipe):
cmd1 | cmd2 | read "$varname"
You would use this command, which is equivalent apart from how the shell handles the job:
read "$varname" < <(cmd1 | cmd2)
With process substitution, "read $varname" isn't run in a pipeline, so Bash doesn't fork to run it. (You could use your streamStdInTo() function there as well, of course)
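A quick way to see the difference, sketched with printf and tr as stand-ins for cmd1 and cmd2 (both are placeholders in the answer), run as a plain bash script without lastpipe:

```bash
#!/bin/bash
varname=var
unset var

# Pipeline form: in bash without lastpipe, read runs in a forked sub-shell,
# so the assignment is lost when the pipeline ends.
printf 'hello world\n' | tr 'a-z' 'A-Z' | read -r "$varname"
after_pipeline=${var-}

# Process substitution form: read runs in the current shell.
read -r "$varname" < <(printf 'hello world\n' | tr 'a-z' 'A-Z')
after_procsub=${var-}

echo "after pipeline:             '$after_pipeline'"
echo "after process substitution: '$after_procsub'"
```

Note that read only takes the first line; for multi-line output the streamStdInTo function from the question would be the thing to put on the left of the redirect.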
As I understand it, you wanted to solve this problem by using numeric file descriptors:
cmd1 | cmd2 >&$fd1 &
read "$varname" <&$fd2
To create those file descriptors that connect the pipeline background job to the "read" command, what you need is called a pipe, or a fifo. These can be created without touching the file system (the shell does it all the time!) but the shell doesn't directly expose this functionality, which is why we need to resort to mkfifo to create a named pipe. A named pipe is a special file that exists on the filesystem, but the data you write to it doesn't go to the disk. It's a data queue stored in memory (a pipe). It doesn't need to stay on the filesystem after you've opened it, either, it can be deleted almost immediately:
pipedir=$(mktemp -d /tmp/pipe_maker_XXXX)
mkfifo ${pipedir}/pipe
exec {temp_fd}<>${pipedir}/pipe # Open both ends of the pipe
exec {fd1}>${pipedir}/pipe
exec {fd2}<${pipedir}/pipe
exec {temp_fd}<&- # Close the read/write FD
rm -rf ${pipedir} # Don't need the named FIFO any more
One of the difficulties in working with named pipes in the shell is that attempting to open them just for reading, or just for writing causes the call to block until something opens the other end of the pipe. You can get around that by opening one end in a background job before trying to open the other end, or by opening both ends at once as I did above.
The "{fd}<..." syntax dynamically assigns an unused file descriptor number to the variable $fd and opens the file on that file descriptor. It's been around in ksh for ages (since 1993?), but in Bash I think it only goes back to 4.1 (from 2010).
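Putting the pieces together, here is a sketch of the whole round trip, assuming bash 4.1 or later for the {fd} syntax. printf and tr are stand-ins for cmd1 and cmd2, and the six X's in the template are my choice.

```bash
#!/bin/bash
pipedir=$(mktemp -d /tmp/pipe_maker_XXXXXX)
mkfifo "$pipedir/pipe"
exec {temp_fd}<>"$pipedir/pipe"   # open both ends at once so neither open blocks
exec {fd1}>"$pipedir/pipe"        # write end
exec {fd2}<"$pipedir/pipe"        # read end
exec {temp_fd}>&-                 # close the read/write descriptor
rm -rf "$pipedir"                 # the name is no longer needed

# Background job writes into the pipe through fd1.
printf 'hello\n' | tr 'a-z' 'A-Z' >&$fd1 &

# The current shell reads one line back through fd2.
varname=var
read -r "$varname" <&$fd2
wait

exec {fd1}>&-                     # close both ends when done
exec {fd2}<&-
echo "$var"
```

One caveat of this design: the shell itself keeps the write end open, so a reader that waits for end of file (such as cat) would hang until fd1 is closed. read is fine because it stops at the first newline.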

Linux All Output to a File

Is there any way to tell a Linux system to put all output (stdout, stderr) into a file?
Without using redirection, pipes, or modifying how the scripts get called.
Just tell Linux to use a file for output.
for example:
script test1.sh:
#!/bin/bash
echo "Testing 123 "
If I run it like "./test1.sh" (without redirection or a pipe),
I'd like to see "Testing 123" in a file (/tmp/linux_output).
Problem: in the system a binary makes a call to a script, and this script calls many other scripts. It is not possible to modify each call, so if I can make Linux put the output into a file, I can review the logs.
#!/bin/bash
exec >file 2>&1
echo "Testing 123 "
You can read more about exec here
If you are running the program from a terminal, you can use the command script.
It will open up a sub-shell. Do what you need to do.
It will copy all output to the terminal into a file. When you are done, exit the shell. ^D, or exit.
This does not use redirection or pipes.
You could set your terminal's scrollback buffer to a large number of lines and then see all the output from your commands in the buffer - depending on your terminal window and the options in its menus, there may be an option in there to capture terminal I/O to a file.
Your requirement if taken literally is an impractical one, because it is based in a slight misunderstanding. Fundamentally, to get the output to go in a file, you will have to change something to direct it there - which would violate your literal constraint.
But the practical problem is solvable, because unless explicitly counteracted in the child, the output directions configured in a parent process will be inherited. So you only have to set up the redirection once, using either a shell, or a custom launcher program or intermediary. After that it will be inherited.
So, for example:
cat > test.sh
#!/bin/sh
echo "hello on stdout"
rm nosuchfile
./test2.sh
And a child script for it to call
cat > test2.sh
#!/bin/sh
echo "hello on stdout from script 2"
rm thisfileisnteither
./nonexistantscript.sh
Run the first script redirecting both stdout and stderr (bash version - and you can do this in many ways such as by writing a C program that redirects its outputs then exec()'s your real program)
./test.sh &> logfile
Now examine the file and see results from stdout and stderr of both parent and child.
cat logfile
hello on stdout
rm: nosuchfile: No such file or directory
hello on stdout from script 2
rm: thisfileisnteither: No such file or directory
./test2.sh: line 4: ./nonexistantscript.sh: No such file or directory
Of course if you really dislike this, you can always modify the kernel - but again, that is changing something (and a very ungainly solution too).
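The inheritance described in this answer can also be set up from inside the top-level script with exec, so that only one file has to change. A self-contained sketch: the sub-shell plays the role of the top-level script, bash -c plays a child script, and the log file is a throwaway temp file rather than /tmp/linux_output.

```bash
#!/bin/bash
log=$(mktemp)

(
    exec >"$log" 2>&1                 # from here on, stdout and stderr of this shell go to the log
    echo "hello on stdout"
    echo "hello on stderr" >&2
    bash -c 'echo "hello from a child process"'   # the child inherits both descriptors
)

log_contents=$(cat "$log")
rm -f "$log"
printf '%s\n' "$log_contents"
```

The child never mentions the log file, yet its output lands there, which is the whole point: the redirection is done once at the top and everything started afterwards inherits it.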

Need explanations for Linux bash builtin exec command behavior

From Bash Reference Manual I get the following about exec bash builtin command:
If command is supplied, it replaces the shell without creating a new process.
Now I have the following bash script:
#!/bin/bash
exec ls;
echo 123;
exit 0
This executed, I got this:
cleanup.sh ex1.bash file.bash file.bash~ output.log
(files from the current directory)
Now, if I have this script:
#!/bin/bash
exec ls | cat
echo 123
exit 0
I get the following output:
cleanup.sh
ex1.bash
file.bash
file.bash~
output.log
123
My question is:
If exec replaces the shell without creating a new process, why is echo 123 printed when | cat is appended, but not without it? I would be happy if someone could explain the logic of this behavior.
Thanks.
EDIT:
After @torek's response, I get an even harder to explain behavior:
1. exec ls>out creates the out file and puts the result of the ls command in it;
2. exec ls>out1 ls>out2 creates only the files, but does not put any result in them. If the command works as suggested, I think command number 2 should have the same result as command number 1 (even more, I think it should not have created the out2 file).
In this particular case, you have the exec in a pipeline. In order to execute a series of pipeline commands, the shell must initially fork, making a sub-shell. (Specifically it has to create the pipe, then fork, so that everything run "on the left" of the pipe can have its output sent to whatever is "on the right" of the pipe.)
To see that this is in fact what is happening, compare:
{ ls; echo this too; } | cat
with:
{ exec ls; echo this too; } | cat
The former runs ls without leaving the sub-shell, so that this sub-shell is therefore still around to run the echo. The latter runs ls by leaving the sub-shell, which is therefore no longer there to do the echo, and this too is not printed.
(The use of curly-braces { cmd1; cmd2; } normally suppresses the sub-shell fork action that you get with parentheses (cmd1; cmd2), but in the case of a pipe, the fork is "forced", as it were.)
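The same comparison, sketched so that the result does not depend on what is in the current directory: echo replaces ls (with exec, the external echo found on PATH is run, since exec does not use builtins), and the output is captured into two variables whose names are made up for the demo.

```bash
#!/bin/bash
# Without exec: the sub-shell on the left of the pipe survives to run the second echo.
without_exec=$( { echo first; echo "this too"; } | cat )

# With exec: the sub-shell is replaced by the external echo, so nothing runs after it.
with_exec=$( { exec echo first; echo "this too"; } | cat )

printf 'without exec:\n%s\n' "$without_exec"
printf 'with exec:\n%s\n' "$with_exec"
```

In both cases the script itself keeps running afterwards, because it was only the forked sub-shell that exec replaced.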
Redirection of the current shell happens only if there is "nothing to run", as it were, after the word exec. Thus, e.g., exec >stdout 4<input 5>>append modifies the current shell, but exec foo >stdout 4<input 5>>append tries to exec command foo. [Note: this is not strictly accurate; see addendum.]
Interestingly, in an interactive shell, after exec foo >output fails because there is no command foo, the shell sticks around, but stdout remains redirected to file output. (You can recover with exec >/dev/tty. In a script, the failure to exec foo terminates the script.)
With a tip of the hat to @Pumbaa80, here's something even more illustrative:
#! /bin/bash
shopt -s execfail
exec ls | cat -E
echo this goes to stdout
echo this goes to stderr 1>&2
(note: cat -E is simplified down from my usual cat -vET, which is my handy go-to for "let me see non-printing characters in a recognizable way"). When this script is run, the output from ls has cat -E applied (on Linux this makes end-of-line visible as a $ sign), but the output sent to stdout and stderr (on the remaining two lines) is not redirected. Change the | cat -E to > out and, after the script runs, observe the contents of file out: the final two echos are not in there.
Now change the ls to foo (or some other command that will not be found) and run the script again. This time the output is:
$ ./demo.sh
./demo.sh: line 3: exec: foo: not found
this goes to stderr
and the file out now has the contents produced by the first echo line.
This makes what exec "really does" as obvious as possible (but no more obvious, as Albert Einstein did not put it :-) ).
Normally, when the shell goes to execute a "simple command" (see the manual page for the precise definition, but this specifically excludes the commands in a "pipeline"), it prepares any I/O redirection operations specified with <, >, and so on by opening the files needed. Then the shell invokes fork (or some equivalent but more-efficient variant like vfork or clone depending on underlying OS, configuration, etc), and, in the child process, rearranges the open file descriptors (using dup2 calls or equivalent) to achieve the desired final arrangements: > out moves the open descriptor to fd 1—stdout—while 6> out moves the open descriptor to fd 6.
If you specify the exec keyword, though, the shell suppresses the fork step. It does all the file opening and file-descriptor-rearranging as usual, but this time, it affects any and all subsequent commands. Finally, having done all the redirections, the shell attempts to execve() (in the system-call sense) the command, if there is one. If there is no command, or if the execve() call fails and the shell is supposed to continue running (is interactive or you have set execfail), the shell soldiers on. If the execve() succeeds, the shell no longer exists, having been replaced by the new command. If execfail is unset and the shell is not interactive, the shell exits.
(There's also the added complication of the command_not_found_handle shell function: bash's exec seems to suppress running it, based on test results. The exec keyword in general makes the shell not look at its own functions, i.e., if you have a shell function f, running f as a simple command runs the shell function, as does (f) which runs it in a sub-shell, but running (exec f) skips over it.)
As for why ls>out1 ls>out2 creates two files (with or without an exec), this is simple enough: the shell opens each redirection, and then uses dup2 to move the file descriptors. If you have two ordinary > redirects, the shell opens both, moves the first one to fd 1 (stdout), then moves the second one to fd 1 (stdout again), closing the first in the process. Finally, it runs ls ls, because that's what's left after removing the >out1 >out2. As long as there is no file named ls, the ls command complains to stderr, and writes nothing to stdout.
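This is easy to check in a scratch directory. In the sketch below the 2>err redirect is my addition so the complaint from ls can be inspected, and the state variables are made-up names; everything else is the command from the question.

```bash
#!/bin/bash
dir=$(mktemp -d)

# Effectively "ls ls" with stdout opened twice; ls fails because no file named ls exists here.
( cd "$dir" && ls >out1 ls >out2 2>err ) || true

out1_state=missing; out2_state=missing; err_state=empty
[ -e "$dir/out1" ] && [ ! -s "$dir/out1" ] && out1_state="created, empty"
[ -e "$dir/out2" ] && [ ! -s "$dir/out2" ] && out2_state="created, empty"
[ -s "$dir/err" ] && err_state="contains the complaint from ls"
rm -rf "$dir"

echo "out1: $out1_state"
echo "out2: $out2_state"
echo "err:  $err_state"
```

Both output files exist because the shell opens every redirection before running the command; both are empty because ls wrote only to stderr.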

How to implement pipe under Linux?

I would like my code to handle the output coming from pipe.
for example, ls -l | mycode
how to achieve this under Linux?
Just read from stdin, such as with scanf().
The pipe in Linux/Unix will transfer the output of the first program to the standard input of the second. How you access the standard input will depend on what language you are using.
When you type "ls -l | mycode" into the shell, it is the shell program itself (e.g. bash, zsh) that does all the trickery with pipes. It simply provides the output from ls -l to mycode on standard input. Similarly, anything you write on standard output or error can be redirected or piped by the shell to some other process or file. Exactly how to read and write to those files depends on the language.
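If mycode happens to be a shell script, "just read from stdin" could look like the sketch below. mycode is written as a function here to keep the example in one file, and numbering the lines is only something visible for it to do; printf stands in for ls -l so the input is predictable.

```bash
#!/bin/bash
# Hypothetical mycode: reads standard input line by line and numbers each line.
mycode() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$line"
    done
}

# The shell wires the pipe; mycode neither knows nor cares where its input comes from.
numbered=$(printf 'first\nsecond\n' | mycode)
printf '%s\n' "$numbered"
```

The same function works unchanged with ls -l | mycode or with mycode < some_file, which is the point the answers above are making.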

Resources