How does 'ls' command work when it has multiple arguments? - linux

How does the 'ls' command work in Linux/Unix?
So that's some reference.
But I was wondering how a command such as
ls -1 | grep 'myfile'
would be executed by the shell, i.e. when is exec called, when is fork called, and when is dup called (if at all).
Also, how is this entire command parsed?

What does fork do
Fork is the primary (and historically, only) method of process creation on Unix-like operating systems.
What does exec do
In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process
What does this mean
When you run a command that is not a builtin (builtins include exit and cd), the shell creates a child process using fork. The child process then uses exec to run the binary file for that command (e.g. /bin/ls).
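A small sketch of the two ideas, assuming a POSIX sh is on the PATH. The variable names are made up for illustration. It shows that an external command runs in a new process with its own PID, and that exec replaces the program while keeping the same PID.

```shell
# Hypothetical variable names; assumes a POSIX sh is available on PATH.
parent_pid=$$                     # PID of the shell running this script
child_pid=$(sh -c 'echo $$')      # sh runs in a new (forked) process

# Inside one child: print the PID, then exec a new sh that prints its PID.
# exec replaces the process image without forking, so both lines match.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
before_exec=$(printf '%s\n' "$exec_pids" | sed -n 1p)
after_exec=$(printf '%s\n' "$exec_pids" | sed -n 2p)

echo "parent=$parent_pid child=$child_pid"
echo "before exec=$before_exec after exec=$after_exec"
```

The first line printed should show two different PIDs, the second line the same PID twice.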
What happens during input/output redirection
Every process is supplied with three streams: standard input (STDIN), standard output (STDOUT) and standard error (STDERR). By default a child inherits these streams from its parent process. Thus commands like wc or nano, which read from STDIN, read from the same terminal as the parent shell, and their output goes to the same terminal the shell writes to.
However, when using redirection like
ls /tmp /abcd 1>out.log 2>err.log
stdout is now mapped to the file out.log and stderr to err.log, so the listing and the error messages are written to the corresponding files instead of the terminal.
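The redirection above can be tried safely in throwaway directories. This is only a sketch: the directory and file names (data, logs, present.txt, missing) are placeholders, and mktemp -d is assumed to be available.

```shell
# Hypothetical names; assumes mktemp -d is available.
data=$(mktemp -d)    # directory that will be listed
logs=$(mktemp -d)    # where the two log files go
: > "$data/present.txt"

# One argument exists, one does not: the listing goes to FD 1 (out.log),
# the complaint about the missing path goes to FD 2 (err.log).
status=0
ls "$data" "$data/missing" 1>"$logs/out.log" 2>"$logs/err.log" || status=$?

out=$(cat "$logs/out.log")
err=$(cat "$logs/err.log")
rm -rf "$data" "$logs"

echo "exit status: $status"
echo "stdout captured: $out"
echo "stderr captured: $err"
```

Nothing from ls reaches the terminal; both streams end up in their files, and the exit status is non-zero because one argument could not be listed.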
PIPE chaining
ls -1 | grep 'myfile'
In the shell, a pipe (|) connects the STDOUT of the first command to the STDIN of the second command.
This means the output of ls -1 (one file or directory name per line) is given as input to grep 'myfile', which searches for lines containing "myfile" and prints them to its own STDOUT. The combined effect is to list the names that contain the character sequence "myfile".
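To see the pipeline from the question at work without touching real files, here is a sketch that runs it in a temporary directory. The three file names are invented for the example.

```shell
# Hypothetical file names in a throwaway directory.
tmp=$(mktemp -d)
: > "$tmp/myfile.txt"
: > "$tmp/myfile_old.txt"
: > "$tmp/notes.txt"

# ls writes one name per line to the pipe; grep reads them from its stdin.
matches=$(ls -1 "$tmp" | grep 'myfile')
match_count=$(printf '%s\n' "$matches" | grep -c 'myfile')
rm -rf "$tmp"

printf '%s\n' "$matches"
echo "matching names: $match_count"
```

Only the two names containing "myfile" come out of the pipeline; notes.txt is filtered away by grep.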

I'm answering here specifically the textual question of the title,
How does 'ls' command work when it has multiple arguments?
...not addressing the question which came underneath. I needed to check whether a list of files was present in a directory, and this question's phrasing was the closest to what I needed.
If you need to do this, either separate the names with spaces or wrap them in curly brackets with commas and no spaces, as follows. The second form relies on brace expansion, which shells such as bash perform before ls runs, so ls receives the same two arguments either way:
ls something.txt something_else.txt
or
ls {something.txt,something_else.txt}
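For the "are all of these files present?" use case, the exit status of ls is what matters. A minimal sketch, with made-up file names in a temporary directory. It uses only the space-separated form, since brace expansion is not available in every sh.

```shell
# Hypothetical file names; absent.txt is deliberately never created.
tmp=$(mktemp -d)
: > "$tmp/something.txt"
: > "$tmp/something_else.txt"

# ls exits with 0 only if every argument could be listed.
all_present=0
ls "$tmp/something.txt" "$tmp/something_else.txt" >/dev/null 2>&1 || all_present=$?

one_missing=0
ls "$tmp/something.txt" "$tmp/absent.txt" >/dev/null 2>&1 || one_missing=$?
rm -rf "$tmp"

echo "both present -> status $all_present"
echo "one missing  -> status $one_missing"
```

The exact non-zero value differs between ls implementations, so a script should only test for zero versus non-zero.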

Related

How redirecting output from a function works in bash?

From what I've learned, also stated in the answer to this thread, redirection of stdout works as follows:
When we do something like: ls > dirlist
bash does the following:
forks a process, which still runs bash
in the subprocess, opens the file dirlist for writing on file descriptor 1
calls exec, passing it the ls executable
This way, when ls writes to FD 1, it actually writes to the file.
With this in mind, I wonder about the following:
$ foo() { echo "hello" ; }
$ foo > file
$ cat file
hello
As far as I know, functions run in the same shell process, so how does redirection work in that case?
Redirection itself is just a shell construct, so the shell can make it work however it wants. Every command, whether external processes or shell builtins, has its own idea of standard output, and standard output is inherited just as it is by child processes from parent processes. In this case, the command foo either inherits its standard output from the shell or takes whatever file a shell redirection specifies. Once inside the function, echo writes to whatever file it inherits from foo.
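Put another way, for its own built-in commands (which include functions and compound statements like while, if, etc.) the shell effectively simulates what happens around exec without actually calling exec: it applies the redirection in the current process, runs the command, and restores the original descriptors afterwards.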
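One way to convince yourself that the function really runs in the current shell is to let it change a variable while its output is redirected. In this sketch the counter named calls and the temporary file are additions for illustration; foo is the function from the question.

```shell
# "calls" and the temp file are hypothetical additions to the question's foo.
tmp=$(mktemp -d)
calls=0
foo() { echo "hello"; calls=$((calls + 1)); }

foo > "$tmp/file"        # stdout of the function goes to the file

content=$(cat "$tmp/file")
rm -rf "$tmp"

# If foo had run in a child process, calls would still be 0 here.
echo "file content: $content"
echo "calls: $calls"
```

The file contains hello and calls is 1, so the redirection took effect without a separate process, and the final echo lines show that the shell's own stdout was restored afterwards.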

Does Linux Pipe command needs a process to execute?

ls -la | sort | less
In the above command, how many Linux processes run?
Is it 3 (one for ls -la, one for sort, one for less)?
Or is it 5 (one for ls -la, one for sort, one for less, and one for each pipe)?
Does the | itself need a separate process to run?
3 processes. The parent process, or the process where you're calling this command from (your command line), calls pipe(2) once for each pair of adjacent commands, so ls -la | sort | less needs to call pipe(2) twice to create two pipes: one for piping ls to sort, and one to pipe sort to less. Bash then forks itself once for each command (in this case 3 times). Before the children run their commands, they replace their stdin and/or stdout with the appropriate pipe ends. An example flow of the command would be:
Bash creates 2 pipes, one from ls to sort, and one from sort to less
Bash forks itself 3 times
Child 1 (ls) sets its stdout fd to write to pipe A
Child 2 (sort) sets its stdin fd to read from pipe A
Child 2 (sort) sets its stdout fd to write to pipe B
Child 3 (less) sets its stdin fd to read from pipe B
Each child runs its command
The pipes are used to direct stdin and/or stdout of the child processes, but the pipes themselves are not processes.
A pipe in a shell is, in general, a call to pipe(2), two calls to dup2(2), and invocation of the two commands. The pipe itself does not require a separate process since the kernel is responsible for channeling data from one process to the next.
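The plumbing can be made visible with named pipes, which behave like the anonymous pipes created by pipe(2) but have a path on disk. This is only an illustration of the flow above, not what bash does internally. The names pipeA and pipeB and the sample words are invented, and mkfifo is assumed to be available.

```shell
# Sketch with named pipes standing in for the two anonymous pipes.
tmp=$(mktemp -d)
mkfifo "$tmp/pipeA" "$tmp/pipeB"

# Child 1 (stands in for ls): stdout -> pipe A
printf 'pear\napple\nfig\n' > "$tmp/pipeA" &

# Child 2 (sort): stdin <- pipe A, stdout -> pipe B
sort < "$tmp/pipeA" > "$tmp/pipeB" &

# Child 3 (stands in for less): stdin <- pipe B
result=$(cat < "$tmp/pipeB")

wait                 # reap the two background children
rm -rf "$tmp"
printf '%s\n' "$result"
```

Three commands run and two pipes carry the data between them. The pipes are kernel objects, so no extra process is involved in moving the bytes.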

Understanding exec in bash

After reading explanations of how the exec builtin works in bash, I understand that its basic function is to replace the current process without forking. It also seems to be used for redirecting I/O and closing file descriptors in the current process, which confuses me. Is this some unrelated additional thing exec does? Can it be understood in the context of "replacing the current process"? And how does this work when combined with process substitution, e.g. exec 3< <(my program)?
Here's what exec does:
1. Set up all redirections in the current process.
   This is a combination of open, dup2 and close syscalls for most operations like > foo
   pipe + fork + /dev/fd/* is used for process substitution
   Temporary files are created and opened for here-documents and here-strings
2. Replace the process image (using execve) with the specified program, if any
If you don't specify a program to run, step 2 is simply skipped, and all redirections therefore affect the rest of the script.
<(Process substitution) works by pipe+fork+/dev/fd/:
Create a pipe as normal.
Copy it to FD 63 or somewhere it won't be in the way
Fork and run a program that reads/writes to the pipe.
Replace the process substitution with /dev/fd/63, a special file that will return FD 63 when opened. (try echo <(ls)).
From then on, it works just like redirecting from any other file. You open /dev/fd/63 for reading on FD 3, and then you end up reading from the pipe. exec therefore doesn't need to do anything special.
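The "no program, only redirections" form of exec can be tried with an ordinary file. In this sketch the file name input and the variables line1 and line2 are hypothetical. A regular file is used instead of <(...) because process substitution is a bash feature and not part of POSIX sh.

```shell
# Hypothetical file and variable names; a regular file replaces <(my program).
tmp=$(mktemp -d)
printf 'first\nsecond\n' > "$tmp/input"

exec 3< "$tmp/input"     # open the file on FD 3 in the current shell
read -r line1 <&3        # each read continues where the last one stopped,
read -r line2 <&3        # because FD 3 stays open between commands
exec 3<&-                # close FD 3 again

rm -rf "$tmp"
echo "line1=$line1 line2=$line2"
```

Because no program was given to exec, the shell keeps running and FD 3 remains available to every later command until it is closed.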

Grep command Linux ordering of source string and target string

The grep command syntax is:
grep "literal_string" filename --> searches for the string in filename.
So I am assuming the order is like this:
-- keyword (grep) --> string to be searched for --> filename/source string, and the command is interpreted from left to right.
My question is how a command such as this gets processed:
ps -ef | grep rman
Is the order optional?
How is grep able to know that the source is on the left and not on the right? Or am I missing something here?
When using Unix pipes, a command takes the output of the previous command (to the left of the pipe) as its input, and passes its own output on to the command to the right of the pipe.
The order is important when using grep with or without a pipe.
Thus
grep doberman /file/about/dogs
is the same as
cat /file/about/dogs | grep doberman
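The equivalence is easy to check on a small file. In this sketch the file name dogs and its contents are invented. A third variant with input redirection is included because it also feeds grep through stdin.

```shell
# Hypothetical sample file with one dog breed per line.
tmp=$(mktemp -d)
printf 'poodle\ndoberman\nbeagle\n' > "$tmp/dogs"

from_file=$(grep doberman "$tmp/dogs")           # grep opens the file itself
from_pipe=$(cat "$tmp/dogs" | grep doberman)     # grep reads its stdin (a pipe)
from_redirect=$(grep doberman < "$tmp/dogs")     # grep reads its stdin (the file)

rm -rf "$tmp"
echo "$from_file / $from_pipe / $from_redirect"
```

In every case the pattern comes first. When no filename follows it, grep simply reads whatever is connected to its standard input.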
See Pipes on http://linuxcommand.org/lts0060.php for some more information.
A step further down from Kyle's answer regarding pipes: most shell commands read their input from stdin and write their output to stdout. Many commands will also allow you to specify a filename to read from or write to, or you can redirect a file to stdin as input and redirect the command's stdout to a file. When no input file is named, the command processes input from its stdin and provides output on stdout (errors on stderr). stdin, stdout, and stderr are the names for file descriptors 0, 1 and 2, respectively.
This basic function is what allows commands to be piped together. A pipe (represented by the | character) does nothing more than take the stdout from the first command (on the left) and direct it to the next command's stdin. As such, yes, the order is important.
Another point to remember is that each piped process is run in its own subshell. Put another way, the shell forks a child for every command in the pipeline (in bash, by default, even the last one). This has implications if you are relying on the environment of one process for the next: a variable set in one stage is not visible in the other stages or in the parent shell.
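The subshell effect can be seen by assigning a variable inside a pipeline stage. The variable name marker is made up for this sketch. The assignment is placed in the first stage because that stage runs in a subshell in every common shell.

```shell
# "marker" is a hypothetical variable used only to observe the subshell.
marker=unset

# The braces group runs as the first stage of a pipeline, i.e. in a subshell.
{ marker=changed; echo "from first stage"; } | cat >/dev/null
after_pipe=$marker

# The same group without a pipe runs in the current shell.
{ marker=changed; }
after_group=$marker

echo "after pipeline: $after_pipe"
echo "after plain group: $after_group"
```

The assignment made inside the pipeline is lost when that stage exits, while the one made outside a pipeline persists.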
Hopefully, these answers will give you a better feel for what is taking place.

When does I/O redirection occur relative to the execution of commands in the Linux shell?

On my desktop there is only one file, named "file1.txt". Then I execute a shell command like this:
$ find . -name "*.txt" > file2.txt
After that, I run another command:
$ cat file2.txt
Its output is:
./file1.txt
./file2.txt
So it looks like the find command is executed after the creation of the file "file2.txt". Am I right?
You are correct; the I/O redirection takes place before the find command is executed, so the file file2.txt already exists (but is empty) when the find command is running. Therefore, the output of the find command will include file2.txt.
It makes sense if you think about it. The redirection has to be done before find executes. You can't have it writing to the terminal first and then going to the file, even if there was a mechanism that allowed that.
You are right: the output file is opened, and thereby created, before the command runs. The shell creates a subprocess with fork and then waits for the child to return. The output file is opened for writing (in bash this happens in the child, after the fork), dup or dup2 is used to make file descriptor 1 refer to it, and only then does the child execute the command with one of the functions of the exec family.
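The observation from the question can be reproduced in a scratch directory instead of the desktop. In this sketch the directory comes from mktemp -d and the variable names are invented. The counts are taken with grep -c so that the order in which find reports the files does not matter.

```shell
# Hypothetical scratch directory standing in for the desktop.
tmp=$(mktemp -d)
: > "$tmp/file1.txt"

# The shell creates file2.txt before find starts, so find sees it too.
find "$tmp" -name "*.txt" > "$tmp/file2.txt"

total=$(grep -c '\.txt$' "$tmp/file2.txt")
self=$(grep -c 'file2\.txt$' "$tmp/file2.txt")
rm -rf "$tmp"

echo "entries found: $total, of which file2.txt itself: $self"
```

Writing the output outside the searched directory, or excluding it with an extra find test, avoids the self-reference.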

Resources