Does the Linux pipe command need a process to execute? - linux

ls -la | sort | less
In the above command, how many Linux processes run?
Is it 3 (one for ls -la, one for sort, one for less)?
Or is it 5 (one for ls -la, one for sort, one for less, and one for each pipe)?
Does each | need a separate process to run?

3 processes. The parent process, that is, the shell you type the command into, calls pipe(2) once for each pair of processes that get piped together, so ls -la | sort | less needs two calls to pipe(2) to create two pipes: one from ls to sort and one from sort to less. Bash then forks itself once for each command (in this case 3 times). Before the children run their commands, they replace their stdin and/or stdout with the appropriate pipe ends. An example flow of the command would be:
Bash creates 2 pipes, one from ls to sort, and one from sort to less
Bash forks itself 3 times
Child 1 (ls) sets its stdout fd to write to pipe A
Child 2 (sort) sets its stdin fd to read from pipe A
Child 2 (sort) sets its stdout fd to write to pipe B
Child 3 (less) sets its stdin fd to read from pipe B
Each child runs its command
The pipes are used to direct stdin and/or stdout of the child processes, but the pipes themselves are not processes.

A pipe in a shell is, in general, a call to pipe(2), two calls to dup2(2), and invocation of the two commands. The pipe itself does not require a separate process since the kernel is responsible for channeling data from one process to the next.
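On Linux, one way to see that a pipe is a kernel object rather than a process is to ask each side of a pipeline what its file descriptors point at. The following is a minimal sketch, assuming a Linux /proc filesystem and the readlink utility (neither is mentioned in the answers above):

```shell
#!/bin/sh
# Assumes Linux: /proc/self/fd/N is a symlink to whatever fd N refers to.
# Left stage:  print what its stdout (fd 1) points at.
# Right stage: print what its stdin (fd 0) points at, then relay the
#              line written by the left stage.
report=$(readlink /proc/self/fd/1 | { readlink /proc/self/fd/0; cat; })

# Both lines name the same kernel object, in the form pipe:[inode].
printf '%s\n' "$report"
```

The two lines should be identical, because the writer's stdout and the reader's stdin are two ends of one pipe; no extra process appears anywhere in between.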

Related

Trouble understanding bash piping behaviour

I'm a bit confused about how bash performs pipe redirections.
First, on piping behaviour:
cat /dev/random | ls doesn't wait for cat and ends as soon as the ls result is printed, whereas
cat /dev/random | grep foo waits for cat before executing grep.
It makes sense, because ls doesn't need cat's output to work the way grep does, but I don't understand how it works. Does bash wait for some of the processes with waitpid calls? Does it wait for EOF on the write end for the pipe's right side, and on the read end for the pipe's left side?
Also, I'm not sure which commands are forked and which are not, or where that is done:
I guess built-in commands are executed in the main process, since most of them modify shell settings. On the other hand, I guess binaries are always executed in a subprocess created with fork. Am I right?
If I am, it means pipe redirection doesn't call fork itself, as it doesn't know whether the commands to execute are built-ins or not.
The bash source code is far beyond my skills and I don't understand it. Can someone explain how it behaves?
I tried to reimplement its behaviour (without success); I asked here, with some code, if you want to check.

Shell script for running subprocesses two at a time

Let's assume there are a total of 10 subprocesses which I want my shell script to run, a subprocess being a process created within the shell script; call them x1...x10 for simplicity. A normal shell script would have 10 lines; let's assume each line calls ./xi. However, to maximize efficiency, I know my hardware allows two of the subprocesses to run at the same time. Therefore, at any point in time, two of these processes should be running. The moment one is done, the next is launched. No order should be assumed in how they finish; any order is fine, as they are assumed to be independent. Is there an elegant way of doing this in a shell script? Note that each of x1...x10 should run once only.
seq 10 | xargs -P2 -I{} ./x{}
seq 10 outputs the numbers 1 to 10, one per line.
xargs runs a command for each input line.
-P2 runs up to two processes at a time.
-I{} replaces each {} in the command with the input line.
./x{} is the command run for each line of input, so it expands to ./x1 through ./x10.
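The xargs -P pattern can be tried without real x1...x10 programs. This is a minimal sketch, assuming an xargs that supports -P (GNU and BSD xargs do; it is not required by POSIX); the "task" is a hypothetical stand-in that just sleeps and prints its number:

```shell
#!/bin/sh
# Stand-in for ./x1 ... ./x4: each task sleeps briefly and reports its number.
# -P 2 keeps at most two tasks running at once; -I{} substitutes the input line.
# The number is passed as a positional argument ($1) rather than spliced
# into the script text.
results=$(seq 4 | xargs -P 2 -I{} sh -c 'sleep 1; echo "task $1 done"' sh {})

# Completion order is not guaranteed, so sort before printing.
printf '%s\n' "$results" | sort
```

With four one-second tasks and two slots, the whole run takes roughly two seconds instead of four.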
Final answer: cat myshellscript.sh | xargs -L 1 -I CMD -P 2 bash -c CMD
With myshellscript.sh being a file like this:
./task-jsuqh
./task-siuww
./task-uqywh
./task-sdqaw

How does 'ls' command work when it has multiple arguments?

How does the 'ls' command work in Linux/Unix?
So that's some reference.
But I was wondering how a command such as
ls -1 | grep 'myfile'
would be executed by the shell, i.e. when is exec called, when is fork called, and when is dup called (if at all)?
Also, how is this entire command parsed?
What does fork do
Fork is the primary (and historically, only) method of process creation on Unix-like operating systems.
What does exec do
In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process
What does this mean
When you run a command that is not a built-in (built-ins are commands like exit and cd), the shell creates a child process using fork. This child process then uses exec to execute the binary file for that command (e.g. /bin/ls).
What happens during input/output redirection
Every process is supplied with three streams: standard input (STDIN), standard output (STDOUT) and standard error (STDERR). By default the child inherits these from the parent process. Thus commands like wc or nano, which read from STDIN, receive data from the same source as the parent shell's STDIN (usually the terminal), and their output goes to the same destination as the shell's output, where it is displayed.
However, when using redirection like
ls /tmp /abcd 1>out.log 2>err.log
stdout is now mapped to the file out.log and, similarly, stderr is mapped to err.log, so the output is written to the corresponding files.
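The effect of redirecting the two streams separately can be checked in a scratch directory. This is a sketch only; the directory and file names below are made up for illustration:

```shell
#!/bin/sh
# Work in a scratch directory so nothing outside it is touched.
dir=$(mktemp -d)
mkdir "$dir/real"
touch "$dir/real/a.txt"

# One path exists and one does not, so ls writes to both stdout and stderr.
# 1> sends stdout to out.log, 2> sends stderr to err.log.
status=0
ls "$dir/real" "$dir/missing" 1>"$dir/out.log" 2>"$dir/err.log" || status=$?

out=$(cat "$dir/out.log")
err=$(cat "$dir/err.log")
rm -rf "$dir"

echo "exit status: $status"
echo "--- out.log ---"; printf '%s\n' "$out"
echo "--- err.log ---"; printf '%s\n' "$err"
```

The listing of the existing directory ends up in out.log, while the complaint about the missing path ends up in err.log; nothing reaches the terminal until the script prints the two files.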
PIPE chaining
ls -1 | grep 'myfile'
In the shell, the pipe | is used to connect the STDOUT of the first command to the STDIN of the second command.
This means the output of ls -1 (a list of files and directories) is given as input to grep 'myfile', which searches for lines containing "myfile" and prints them to its STDOUT. The combined effect is to list the file names containing the character sequence "myfile".
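A small reproducible version of this pipeline, using hypothetical file names in a temporary directory:

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/myfile.txt" "$dir/notes.txt" "$dir/old_myfile.bak"

# ls -1 writes one name per line into the pipe;
# grep keeps only the lines containing "myfile".
matches=$(ls -1 "$dir" | grep 'myfile')
printf '%s\n' "$matches"

rm -rf "$dir"
```

Only myfile.txt and old_myfile.bak are printed; notes.txt is written by ls but filtered out by grep.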
I'm answering here specifically the textual question of the title,
How does 'ls' command work when it has multiple arguments?
...not addressing the question which came underneath. I needed to check whether a list of files was present in a directory, and this question's phrasing was the closest to what I needed.
If you need to do this, either separate the names with spaces or wrap them in curly brackets with commas and no spaces (brace expansion, in shells that support it such as bash), as follows:
ls something.txt something_else.txt
or
ls {something.txt,something_else.txt}

Grep command Linux ordering of source string and target string

The grep command syntax is:
grep "literal_string" filename --> search for the string in filename.
So I am assuming the order is like this:
-- keyword (grep) --> string to be searched for --> filename/source string, and the command is interpreted from left to right.
My question is how the commands such as this got processed:
ps -ef | grep rman
Is the order optional?
How is grep able to know that the source is on the left and not on the right? Or am I missing something here?
When using Unix pipes, most system commands take the output of the previous command (to the left of the pipe) as their input, and then pass their own output on to the command to the right of the pipe.
The order is important when using grep, with or without a pipe.
Thus
grep doberman /file/about/dogs
is the same as
cat /file/about/dogs | grep doberman
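The equivalence can be checked directly. This sketch uses a temporary file with made-up contents in place of /file/about/dogs, and adds a third form (input redirection) for comparison:

```shell
#!/bin/sh
file=$(mktemp)
printf '%s\n' 'poodle' 'doberman pinscher' 'beagle' > "$file"

from_file=$(grep doberman "$file")        # grep opens the file itself
from_pipe=$(cat "$file" | grep doberman)  # grep reads cat's output on stdin
from_redirect=$(grep doberman < "$file")  # the shell opens the file as grep's stdin

printf '%s\n' "$from_file" "$from_pipe" "$from_redirect"
rm -f "$file"
```

All three print the same matching line. In every case the pattern comes first; what changes is only where grep gets its input from.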
See Pipes on http://linuxcommand.org/lts0060.php for some more information.
A step further from Kyle's answer regarding pipes: most shell commands read their input from stdin and write their output to stdout. Many commands also allow you to specify a filename to read from or write to, and the shell lets you redirect a file to stdin as input and redirect the command's stdout to a file. When no file name is given, the command processes input from its stdin and provides output on stdout (errors on stderr). stdin, stdout, and stderr are the names for file descriptors 0, 1, and 2, respectively.
This basic function is what allows commands to be piped together. A pipe (represented by the | character) does nothing more than take the stdout of the first command (on the left) and direct it to the next command's stdin. As such, yes, the order is important.
Another point to remember is that, in bash by default, each command in a pipeline runs in its own child process (a subshell, in the case of shell built-ins and compound commands). Put another way, the shell forks a separate process for each side of a |. This has implications if you are relying on the environment of one process for the next.
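One such implication: a variable assigned inside a pipeline stage does not survive the pipeline. A minimal sketch (the variable name x is arbitrary):

```shell
#!/bin/sh
x=outer

# The left side of the pipe runs in a child process, so this assignment
# changes only the child's copy of x.
{ x=inner; echo "left side sees x=$x"; } | cat

# Back in the parent shell, x is unchanged.
echo "after the pipeline x=$x"
```

The first line printed shows x=inner, the second shows x=outer, because the assignment happened in a process that has already exited.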
Hopefully, these answers will give you a better feel for what is taking place.

How linux Os let applications read from pipe

I am confused about how Linux lets an application read from a pipe, as in "cat /etc/hosts | grep 'localhost'". I know that a single program can fork a child and communicate with it through a pipe. But I don't know how two independent programs communicate through a pipe.
In the example "cat /etc/hosts | grep 'localhost'", how does grep know which file descriptor to read in order to get its input from "cat /etc/hosts"? Is there a "conventional" pipe provided by the OS that lets grep know where to get its input? I want to know the mechanism behind this.
grep in your example gets it from stdin. It is the shell's responsibility to call pipe(2) to create the pipe and then dup2(2) in each of the fork(2) children to assign their end of the pipe to stdin or stdout before calling one of the exec(3) functions to actually run the other executables.
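The same wiring can be imitated by hand from the shell. The sketch below swaps in a named pipe (mkfifo) for the anonymous pipe that | creates with pipe(2), and a made-up sample file for /etc/hosts; it is an illustration of the idea, not what the shell does internally:

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' '127.0.0.1 localhost' '10.0.0.5 buildhost' > "$dir/hosts"
mkfifo "$dir/pipe"

# Writer: the shell opens the FIFO as cat's stdout before cat starts.
cat "$dir/hosts" > "$dir/pipe" &

# Reader: the shell opens the FIFO as grep's stdin; grep is given no file
# name and simply reads file descriptor 0.
found=$(grep 'localhost' < "$dir/pipe")
wait

printf '%s\n' "$found"
rm -rf "$dir"
```

Neither cat nor grep knows a pipe is involved: each is started with its standard descriptors already pointing at the right place, which is exactly what the shell arranges for cat /etc/hosts | grep 'localhost'.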

Resources