Understanding exec in bash - linux

After reading explanations of how the exec builtin works in bash, I understand that its basic function is to replace the current process without forking. It also seems to be used for redirecting I/O and closing file descriptors in the current process, which confuses me. Is this some unrelated additional thing exec does? Can it be understood in the context of "replacing the current process"? And how does this work when combined with process substitution, e.g. exec 3< <(my program)?

Here's what exec does:
Set up all redirections in the current process.
This is a combination of open, dup2 and close syscalls for most operations like > foo
pipe + fork + /dev/fd/* is used for process substition
Temporary files are created and opened for here-documents and here-strings
Replace the process image (using execve) with the specified program, if any
If you don't specify a program to run, step 2 is simply skipped, and all redirections therefore affect the rest of the script.
<(Process substitution) works by pipe+fork+/dev/fd/:
Create a pipe as normal.
Copy it to FD 63 or somewhere it won't be in the way
Fork and run a program that reads/writes to the pipe.
Replace the process substitution with /dev/fd/63, a special file that will return FD 63 when opened. (try echo <(ls)).
From then on, it works just like redirecting from any other file. You open /dev/fd/63 for reading on FD 3, and then you end up reading from the pipe. exec therefore doesn't need to do anything special.

Related

How redirecting output from a function works in bash?

From what I've learned, also stated in the answer to this thread, redirection of stdout works as follows:
When we do something like: ls > dirlist
bash does the followings:
forks a process, which still runs bash
in the subprocess, open the file dirlist for writing on file descriptor 1
calling exec passing to it the ls executable.
this way, when ls writes to FD 1, it actually writes to the file.
With this in mind, I wonder about the following:
$ foo() { echo "hello" ; }
$ foo > file
$ cat file
hello
as far as I know, functions run in the same shell process, so how does redirection works in that case?
Redirection itself is just a shell construct, so the shell can make it work however it wants. Every command, whether external processes or shell builtins, has its own idea of standard output, and standard output is inherited just as it is by child processes from parent processes. In this case, the command foo either inherits its standard output from the shell or takes whatever file a shell redirection specifies. Once inside the function, echo writes to whatever file it inherits from foo.
Put another way, for its own built-in commands (which includes functions, compound statements like while, if, etc) the shell effectively simulates exec without actually calling exec.

How does 'ls' command work when it has multiple arguments?

How does the 'ls' command work in Linux/Unix?
So that's some reference.
But I was wondering how a command such as
ls -1 | grep 'myfile'
would be executed by the shell, i.e. when is exec called, when is fork called, when id dup called(if at all).
Also, how is this entire command parsed?
What does fork do
Fork is the primary (and historically, only) method of process creation on Unix-like operating systems.
What does exec do
In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process
What does this mean
When you run a command (that is not built in like exit, cd), shell creates a child process using fork. This child process then uses exec executes the binary file for that command (e.g: /bin/ls)
What happens when during input/output redirecction
Every process is supplied with three streams standard input (STDIN), standard output (STDOUT) and standard error (STDERR). By default these streams are mapped to parent process's respective streams. Thus commands like wc or nano which reads from STDIN can be supplied with data from parent shell process's STDIN, and their output is captured by parent shell process and displayed.
However, when using redirection like
ls /tmp /abcd 1>out.log and 2>err.log
stdout is now mapped to a file output stream of out.log, similarly stderr is mapped to err.log. And the output is written to corresponding files.
PIPE chaining
ls -1 | grep 'myfile'
In shell PIPE | is used to chain the STDOUT of first command to STDIN of second command.
This means output of ls -1 (list of files and directories) is given as input to grep myfile which searches for lines containing "myfile" and prints to its STDOUT. The combined effect is to search filename containing char sequence "myfile"
I'm answering here specifically the textual question of the title,
How does 'ls' command work when it has multiple arguments?
...not addressing the question which came underneath. Was needing to check if a list of files was present in a directory and this question's phrasing was the closest to what I needed.
If you need to do this, either separate them with a space or wrap them in curly brackets with commas and no space as follows:
ls something.txt something_else.txt
or
ls {something.txt,something_else.txt}

Stream specific numbered Bash file descriptor into variable

I am trying to stream a specific numbered file descriptor into a variable in Bash. I can do this from normal standard in using the following function, but, how do it do it from a specific file descriptor. I need to direct the FD into the sub-shell if I use the same approach. I could always do it reading line by line, but, if I can do it in a continuous stream then that would be massively preferable.
The function I have is:
streamStdInTo ()
{
local STORE_INvar="${1}" ; shift
printf -v "${STORE_INvar}" '%s' "$( cat - )"
}
Yes, I know that this wouldn't work normally as the end of a pipeline would be lost (due to its execution in a sub-shell), however, either in the context of the Bash 4 set +m ; shopt -s lastpipe method of executing the end of a pipeline in the same shell as the start, or, by directing into this via a different file descriptor I am hoping to be able to use it.
So, my question is, How do I use the above but with different file descriptors than the normal?
It's not entirely clear what you mean, but perhaps you are looking for something like:
cat - <&4 # read from fd 4
Or, just call your current function with the redirect:
streamStdInTo foo <&4
edit:
Addressing some questions from the comment, you can use a fifo:
#!/bin/bash
trap 'rm -f $f' 0
f=$(mktemp xxx)
rm $f
mkfifo $f
echo foo > $f &
exec 4< $f
cat - <&4
wait
I think there's a lot of confusion about what exactly you're trying to do. If I understand correctly the end goal here is to run a pipeline and capture the output in a variable, right? Kind of like this:
var=$(cmd1 | cmd2)
Except I guess the idea here is that the name of "$var" is stored in another variable:
varname=var
You can do an end-run around Bash's usual job control situation by using process substitution. So instead of this normal pipeline (which would work in ksh or zsh, but not in bash unless you set lastpipe):
cmd1 | cmd2 | read "$varname"
You would use this command, which is equivalent apart from how the shell handles the job:
read "$varname" < <(cmd1 | cmd2)
With process substitution, "read $varname" isn't run in a pipeline, so Bash doesn't fork to run it. (You could use your streamStdInTo() function there as well, of course)
As I understand it, you wanted to solve this problem by using numeric file descriptors:
cmd1 | cmd2 >&$fd1 &
read "$varname" <&$fd2
To create those file descriptors that connect the pipeline background job to the "read" command, what you need is called a pipe, or a fifo. These can be created without touching the file system (the shell does it all the time!) but the shell doesn't directly expose this functionality, which is why we need to resort to mkfifo to create a named pipe. A named pipe is a special file that exists on the filesystem, but the data you write to it doesn't go to the disk. It's a data queue stored in memory (a pipe). It doesn't need to stay on the filesystem after you've opened it, either, it can be deleted almost immediately:
pipedir=$(mktemp -d /tmp/pipe_maker_XXXX)
mkfifo ${pipedir}/pipe
exec {temp_fd}<>${pipedir}/pipe # Open both ends of the pipe
exec {fd1}>${pipedir}/pipe
exec {fd2}<${pipedir}/pipe
exec {temp_fd}<&- # Close the read/write FD
rm -rf ${pipedir} # Don't need the named FIFO any more
One of the difficulties in working with named pipes in the shell is that attempting to open them just for reading, or just for writing causes the call to block until something opens the other end of the pipe. You can get around that by opening one end in a background job before trying to open the other end, or by opening both ends at once as I did above.
The "{fd}<..." syntax dynamically assigns an unused file descriptor number to the variable $fd and opens the file on that file descriptor. It's been around in ksh for ages (since 1993?), but in Bash I think it only goes back to 4.1 (from 2010).

When does I/O redirection occur relative to the execution of commands in the Linux shell?

On my desktop, there is only one file,its name is "file1.txt",then I execute shell script like this:
$ find . -name "*.txt" > file2.txt
After that, I run the other shell script like this:
$ cat file2.txt
Its output is:
./file1.txt
./file2.txt
So it looks like that the execution of find command is behind the creat of file “file2.txt", Am I right?
You are correct; the I/O redirection takes place before the find command is executed, so the file file2.txt already exists (but is empty) when the find command is running. Therefore, the output of the find command will include file2.txt.
It makes sense if you think about it. The redirection has to be done before find executes. You can't have it writing to the terminal first and then going to the file, even if there was a mechanism that allowed that.
You are right: the shell opens the output file first, creating it. Then it creates a subprocess with fork. The shell then closes the file and waits for the child to return. The child process calls dup or dup2 to open the output file with file descriptor 1, and only then it executes the command with one of the functions of the exec family.

Need explanations for Linux bash builtin exec command behavior

From Bash Reference Manual I get the following about exec bash builtin command:
If command is supplied, it replaces the shell without creating a new process.
Now I have the following bash script:
#!/bin/bash
exec ls;
echo 123;
exit 0
This executed, I got this:
cleanup.sh ex1.bash file.bash file.bash~ output.log
(files from the current directory)
Now, if I have this script:
#!/bin/bash
exec ls | cat
echo 123
exit 0
I get the following output:
cleanup.sh
ex1.bash
file.bash
file.bash~
output.log
123
My question is:
If when exec is invoked it replaces the shell without creating a new process, why when put | cat, the echo 123 is printed, but without it, it isn't. So, I would be happy if someone can explain what's the logic of this behavior.
Thanks.
EDIT:
After #torek response, I get an even harder to explain behavior:
1.exec ls>out command creates the out file and put in it the ls's command result;
2.exec ls>out1 ls>out2 creates only the files, but do not put inside any result. If the command works as suggested, I think the command number 2 should have the same result as command number 1 (even more, I think it should not have had created the out2 file).
In this particular case, you have the exec in a pipeline. In order to execute a series of pipeline commands, the shell must initially fork, making a sub-shell. (Specifically it has to create the pipe, then fork, so that everything run "on the left" of the pipe can have its output sent to whatever is "on the right" of the pipe.)
To see that this is in fact what is happening, compare:
{ ls; echo this too; } | cat
with:
{ exec ls; echo this too; } | cat
The former runs ls without leaving the sub-shell, so that this sub-shell is therefore still around to run the echo. The latter runs ls by leaving the sub-shell, which is therefore no longer there to do the echo, and this too is not printed.
(The use of curly-braces { cmd1; cmd2; } normally suppresses the sub-shell fork action that you get with parentheses (cmd1; cmd2), but in the case of a pipe, the fork is "forced", as it were.)
Redirection of the current shell happens only if there is "nothing to run", as it were, after the word exec. Thus, e.g., exec >stdout 4<input 5>>append modifies the current shell, but exec foo >stdout 4<input 5>>append tries to exec command foo. [Note: this is not strictly accurate; see addendum.]
Interestingly, in an interactive shell, after exec foo >output fails because there is no command foo, the shell sticks around, but stdout remains redirected to file output. (You can recover with exec >/dev/tty. In a script, the failure to exec foo terminates the script.)
With a tip of the hat to #Pumbaa80, here's something even more illustrative:
#! /bin/bash
shopt -s execfail
exec ls | cat -E
echo this goes to stdout
echo this goes to stderr 1>&2
(note: cat -E is simplified down from my usual cat -vET, which is my handy go-to for "let me see non-printing characters in a recognizable way"). When this script is run, the output from ls has cat -E applied (on Linux this makes end-of-line visible as a $ sign), but the output sent to stdout and stderr (on the remaining two lines) is not redirected. Change the | cat -E to > out and, after the script runs, observe the contents of file out: the final two echos are not in there.
Now change the ls to foo (or some other command that will not be found) and run the script again. This time the output is:
$ ./demo.sh
./demo.sh: line 3: exec: foo: not found
this goes to stderr
and the file out now has the contents produced by the first echo line.
This makes what exec "really does" as obvious as possible (but no more obvious, as Albert Einstein did not put it :-) ).
Normally, when the shell goes to execute a "simple command" (see the manual page for the precise definition, but this specifically excludes the commands in a "pipeline"), it prepares any I/O redirection operations specified with <, >, and so on by opening the files needed. Then the shell invokes fork (or some equivalent but more-efficient variant like vfork or clone depending on underlying OS, configuration, etc), and, in the child process, rearranges the open file descriptors (using dup2 calls or equivalent) to achieve the desired final arrangements: > out moves the open descriptor to fd 1—stdout—while 6> out moves the open descriptor to fd 6.
If you specify the exec keyword, though, the shell suppresses the fork step. It does all the file opening and file-descriptor-rearranging as usual, but this time, it affects any and all subsequent commands. Finally, having done all the redirections, the shell attempts to execve() (in the system-call sense) the command, if there is one. If there is no command, or if the execve() call fails and the shell is supposed to continue running (is interactive or you have set execfail), the shell soldiers on. If the execve() succeeds, the shell no longer exists, having been replaced by the new command. If execfail is unset and the shell is not interactive, the shell exits.
(There's also the added complication of the command_not_found_handle shell function: bash's exec seems to suppress running it, based on test results. The exec keyword in general makes the shell not look at its own functions, i.e., if you have a shell function f, running f as a simple command runs the shell function, as does (f) which runs it in a sub-shell, but running (exec f) skips over it.)
As for why ls>out1 ls>out2 creates two files (with or without an exec), this is simple enough: the shell opens each redirection, and then uses dup2 to move the file descriptors. If you have two ordinary > redirects, the shell opens both, moves the first one to fd 1 (stdout), then moves the second one to fd 1 (stdout again), closing the first in the process. Finally, it runs ls ls, because that's what's left after removing the >out1 >out2. As long as there is no file named ls, the ls command complains to stderr, and writes nothing to stdout.

Resources