How does the Linux OS let applications read from a pipe?

I am confused about how Linux lets an application read from a pipe, as in "cat /etc/hosts | grep 'localhost'". I know that a single program can fork a child and communicate with it through a pipe. But I don't know how two independent programs communicate through a pipe.
In the example "cat /etc/hosts | grep 'localhost'", how does grep know which file descriptor to read to get the output of "cat /etc/hosts"? Is there a "conventional" pipe provided by the OS that tells grep where to get its input? I want to know the mechanism behind this.

grep in your example gets it from stdin. It is the shell's responsibility to call pipe(2) to create the pipe and then dup2(2) in each of the fork(2) children to assign their end of the pipe to stdin or stdout before calling one of the exec(3) functions to actually run the other executables.
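The sequence described above can be sketched in Python, whose os module exposes the same system calls. This is only a rough illustration of what a shell does, not the code of any real shell: run_pipeline and exec_or_die are hypothetical names, and the second pipe (out_r/out_w) exists only so the sketch can hand the result back to the caller instead of printing it.

```python
import os


def exec_or_die(argv):
    """Replace the current (child) process with argv; exit with 127 if exec fails."""
    try:
        os.execvp(argv[0], argv)
    except OSError:
        os._exit(127)


def run_pipeline(left, right):
    """Run `left | right` and return everything `right` wrote to stdout, as bytes."""
    pipe_r, pipe_w = os.pipe()  # connects left's stdout to right's stdin
    out_r, out_w = os.pipe()    # sketch-only: lets the caller capture right's output

    left_pid = os.fork()
    if left_pid == 0:           # first child: will become `left`
        os.dup2(pipe_w, 1)      # fd 1 (stdout) now refers to the pipe's write end
        for fd in (pipe_r, pipe_w, out_r, out_w):
            os.close(fd)        # the originals are no longer needed
        exec_or_die(left)

    right_pid = os.fork()
    if right_pid == 0:          # second child: will become `right`
        os.dup2(pipe_r, 0)      # fd 0 (stdin) now refers to the pipe's read end
        os.dup2(out_w, 1)       # stdout goes to the capture pipe
        for fd in (pipe_r, pipe_w, out_r, out_w):
            os.close(fd)
        exec_or_die(right)

    # Parent ("the shell"): close its copies so the children see EOF correctly.
    for fd in (pipe_r, pipe_w, out_w):
        os.close(fd)

    chunks = []
    while True:
        chunk = os.read(out_r, 4096)
        if not chunk:           # empty read means every writer has closed its end
            break
        chunks.append(chunk)
    os.close(out_r)

    os.waitpid(left_pid, 0)
    os.waitpid(right_pid, 0)
    return b"".join(chunks)
```

Calling run_pipeline(["cat", "/etc/hosts"], ["grep", "localhost"]) mirrors the command in the question. Note that grep itself does nothing special: it reads file descriptor 0, which was already pointing at the pipe before exec ran.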

Related

How does 'ls' command work when it has multiple arguments?

How does the 'ls' command work in Linux/Unix?
So that's some reference.
But I was wondering how a command such as
ls -1 | grep 'myfile'
would be executed by the shell, i.e. when exec is called, when fork is called, and when dup is called (if at all).
Also, how is this entire command parsed?
What does fork do?
Fork is the primary (and historically, only) method of process creation on Unix-like operating systems.
What does exec do?
In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process.
What does this mean?
When you run a command that is not a shell built-in (built-ins include exit and cd), the shell creates a child process using fork. The child process then uses exec to run the binary file for that command (e.g. /bin/ls).
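As a minimal sketch of that fork-then-exec step, here is the same idea in Python. The function name run_command is made up for illustration, and the exit code 127 for a command that cannot be executed is a shell convention that this sketch simply copies.

```python
import os


def run_command(argv):
    """Fork a child, exec argv inside it, wait, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed (e.g. command not found)
    # Parent (the "shell"): wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For example, run_command(["ls", "-1"]) lists the current directory on the caller's terminal, because the child inherits the parent's stdin, stdout and stderr.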
What happens during input/output redirection?
Every process is supplied with three streams: standard input (STDIN), standard output (STDOUT) and standard error (STDERR). By default the child inherits these streams from its parent process. Thus commands like wc or nano, which read from STDIN, receive whatever is typed into the parent shell's terminal, and their output appears on that same terminal because they write to the STDOUT inherited from the shell.
However, when using redirection like
ls /tmp /abcd 1>out.log 2>err.log
stdout is now mapped to the file out.log and, similarly, stderr is mapped to err.log. The output is written to the corresponding files instead of the terminal.
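A rough sketch of how a shell could set up such a redirection, written in Python: between fork and exec, the child opens the two files and uses dup2 to place them on descriptors 1 and 2. The name run_redirected is hypothetical, and real shells handle many more cases (appending, input redirection, and so on).

```python
import os


def run_redirected(argv, out_path, err_path):
    """Run argv with stdout sent to out_path and stderr sent to err_path."""
    pid = os.fork()
    if pid == 0:
        try:
            flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
            out_fd = os.open(out_path, flags, 0o644)
            err_fd = os.open(err_path, flags, 0o644)
            os.dup2(out_fd, 1)  # fd 1 (stdout) now refers to out_path
            os.dup2(err_fd, 2)  # fd 2 (stderr) now refers to err_path
            os.close(out_fd)
            os.close(err_fd)
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)           # only reached if open or exec failed
    _, status = os.waitpid(pid, 0)
    return status
```

With this sketch, run_redirected(["ls", "/tmp", "/abcd"], "out.log", "err.log") corresponds to the command above. The program being run is unaware of the redirection; it just writes to descriptors 1 and 2.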
Pipe chaining
ls -1 | grep 'myfile'
In the shell, the pipe character | is used to connect the STDOUT of the first command to the STDIN of the second command.
This means the output of ls -1 (a list of files and directories, one per line) is given as input to grep 'myfile', which searches for lines containing "myfile" and prints them to its STDOUT. The combined effect is to find the filenames that contain the character sequence "myfile".
I'm answering here specifically the textual question of the title,
How does 'ls' command work when it has multiple arguments?
...not addressing the question which came underneath. I needed to check whether a list of files was present in a directory, and this question's phrasing was the closest to what I needed.
If you need to do this, either separate the names with spaces or wrap them in curly brackets with commas and no spaces (the shell expands the braces into separate arguments before ls runs), as follows:
ls something.txt something_else.txt
or
ls {something.txt,something_else.txt}

Get OutputStream from already running process

I want to write to the stdin of a running process (not Java). How can I get the Process object or the OutputStream directly? Runtime.getRuntime() only helps me spawn new things, not find existing processes.
This looks possible on Linux, no idea about elsewhere. Searching for "get stdin of running process" revealed several promising looking discussions:
Writing to stdin of background process
Write to stdin of a running process using pipe
Can I send some text to the STDIN of an active process running in a screen session?
Essentially, you can write to the 0th file descriptor of a process via /proc/$pid/fd/0. From there, you just have to open an OutputStream to that path.
I just tested this (not the Java part, that's presumably straightforward) and it worked as advertised:
Shell-1 $ cat
This blocks, waiting on stdin
Shell-2 $ ps aux | grep 'cat$' | awk '{ print $2 }'
1234
Shell-2 $ echo "Hello World" > /proc/1234/fd/0
Now back in Shell-1:
Shell-1 $ cat
Hello World
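Note that this does not close the process's stdin. You can keep writing to the file descriptor. One caveat: when the process's stdin is a terminal, as in this test, the write goes to the terminal device, so the text is displayed in that terminal but is not delivered to the process as input. The text does reach the process as input when its stdin is a pipe or FIFO.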
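The question asks about Java, but the idea of "open an output stream to that path" is the same in any language. Here is a small Python sketch, assuming a Linux /proc filesystem and permission to open the target's descriptors. The names stdin_path and send_to_stdin are invented for this example, and the proc_root parameter is a hypothetical addition that only exists so the function can be tried against a scratch directory.

```python
import os


def stdin_path(pid, proc_root="/proc"):
    """Return the path that refers to file descriptor 0 of the given process."""
    return os.path.join(proc_root, str(pid), "fd", "0")


def send_to_stdin(pid, text, proc_root="/proc"):
    """Open the process's fd 0 for writing and write text to it."""
    with open(stdin_path(pid, proc_root), "w") as stream:
        stream.write(text)
```

With the process from the test above, send_to_stdin(1234, "Hello World\n") would be the counterpart of the echo command.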

Grep command Linux ordering of source string and target string

The grep command syntax is:
grep "literal_string" filename --> search for the string in filename.
So I am assuming the order is like this:
-- keyword (grep) --> string to be searched for --> filename/source string, and the command is interpreted from left to right.
My question is how commands such as this get processed:
ps -ef | grep rman
Is the order optional?
How is grep able to know that the source is on the left and not on the right? Or am I missing something here?
When using Unix pipes, most system commands take the output of the previous command (to the left of the pipe) as their input and then pass their own output on to the command to the right of the pipe.
The order is important when using grep with or without a pipe.
Thus
grep doberman /file/about/dogs
is the same as
cat /file/about/dogs | grep doberman
See Pipes on http://linuxcommand.org/lts0060.php for some more information.
To go a step further than Kyle's answer regarding pipes: most shell commands read their input from stdin and write their output to stdout. Many commands also let you specify a filename to read from or write to, or you can redirect a file to stdin as input and redirect the command's stdout to a file. When no filename is given, the command processes input from its stdin and provides output on stdout (errors on stderr). stdin, stdout, and stderr are the names for file descriptors 0, 1 and 2, respectively.
This basic behavior is what allows commands to be piped together. A pipe (represented by the | character) does nothing more than take the stdout of the first command (on the left) and connect it to the next command's stdin. As such, yes, the order is important.
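To make the "filename or stdin" convention concrete, here is a toy grep-like filter in Python. It is a sketch of the convention only, not how grep is actually implemented: it does plain substring matching, and the names grep_lines and main, as well as the stdin/stdout parameters, are invented for the example.

```python
import sys


def grep_lines(pattern, stream):
    """Yield the lines of stream that contain pattern (plain substring match)."""
    for line in stream:
        if pattern in line:
            yield line


def main(argv, stdin=None, stdout=None):
    """argv is [program, pattern] or [program, pattern, filename]."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    pattern = argv[1]
    if len(argv) > 2:
        # A filename was given: open it and search that.
        with open(argv[2]) as source:
            stdout.writelines(grep_lines(pattern, source))
    else:
        # No filename: fall back to stdin, which may be a pipe set up by the shell.
        stdout.writelines(grep_lines(pattern, stdin))
```

The pattern always comes first because the program assigns meaning to its arguments by position. Whatever sits to the left of a | is not an argument at all; it only determines what file descriptor 0 is connected to.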
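Another point to remember is that each stage of a pipeline runs in its own child process (in bash, by default, a subshell). Put another way, each | makes the shell fork another process to run the following command in. This has implications if you rely on the environment of one stage in the next: a variable set inside one stage is not visible to the other stages or to the parent shell.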
Hopefully, these answers will give you a better feel for what is taking place.

How to implement pipe under Linux?

I would like my code to handle the output coming from a pipe.
For example: ls -l | mycode
How can I achieve this under Linux?
Just read from stdin, such as with scanf().
The pipe in Linux/Unix will transfer the output of the first program to the standard input of the second. How you access the standard input will depend on what language you are using.
When you type "ls -l | mycode" into the shell, it is the shell program itself (e.g. bash, zsh) that does all the trickery with pipes. It simply provides the output from ls -l to mycode on standard input. Similarly, anything you write to standard output or standard error can be redirected or piped by the shell to some other process or file. Exactly how to read from and write to those streams depends on the language.
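As an illustration in one language, here is a Python sketch of what "mycode" might look like. It reads standard input like any other stream and, purely for demonstration, reports what kind of object a file descriptor is connected to. The function names describe_fd and count_lines are made up for this example.

```python
import os
import stat
import sys


def describe_fd(fd=0):
    """Say what a file descriptor is connected to (fd 0 is standard input)."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISCHR(mode):
        return "terminal or character device"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


def count_lines(stream):
    """Consume a text stream line by line and return how many lines it had."""
    return sum(1 for _ in stream)
```

A script that ends with print(describe_fd(0), count_lines(sys.stdin)) could then be run as ls -l | python3 mycode.py. The reading code is the same whether the input comes from a pipe, a file or the keyboard; only what describe_fd reports differs.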

Program dumps data to stdout fast. Looking for way to write commands without getting flooded

My program is dumping to stdout, and while I try to type new commands I can't see what I'm writing because it gets mixed in with the output. Is there a shell that separates commands and output? Or can I use two shells, running commands in one and making the program dump to the stdout of the other?
You can redirect the output of the program to another terminal window. For example:
program > /dev/pts/2 &
The terminal's device name depends on how your system is organized; running tty in the target terminal window prints its name.
There's 'more' to let you paginate through output, and 'tee', which lets you split a program's output so that it goes to both stdout and a file.
$ yourapp | more    # show in page-sized chunks
$ yourapp | tee output.txt    # flood to stdout, but also save a copy in output.txt
and best of all
$ yourapp | tee output.txt | more    # paginate + save copy
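If it helps to see what tee is doing conceptually, here is a tiny Python sketch of the same idea: copy each line of the input to several destinations. This is not the real tee implementation, and the function name and its return value are choices made for this example.

```python
import sys


def tee(source, *sinks):
    """Copy every line from source to each sink; return the number of lines copied."""
    count = 0
    for line in source:
        for sink in sinks:
            sink.write(line)
        count += 1
    return count
```

Used as a filter, it would look like this: with open("output.txt", "w") as copy: tee(sys.stdin, sys.stdout, copy).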
Either redirect standard output and error when you run the program, so it doesn't bother you:
./myprog >myprog.out 2>&1
or, alternatively, run a different terminal to do your work in. That leaves your program free to output whatever it likes to its terminal without bothering you.
Having said that, I'd still capture the information from the program to a file in case you have to go back and look at it.
