How does Shell implement pipe programmatically? - linux

I understand how I/O redirection works in Unix/Linux, and I know Shell uses this feature to pipeline programs with a special type of file - anonymous pipe. But I'd like to know the details of how Shell implements it programmatically? I'm interested in not only the system calls involved, but also the whole picture.
For example ls | sort, how does Shell perform I/O redirection for ls and sort?

The whole picture is complex and the best way to understand is to study a small shell. For a limited picture, here goes. Before doing anything, the shell parses the whole command line so it knows exactly how to chain processes. Let's say it encounters proc1 | proc2.
It sets up a pipe. Long story short, whatever is written into thepipe[1] can be read from thepipe[0]
int thepipe[2];
pipe(thepipe);
It forks the first process and changes the direction of its stdout before exec
dup2 (thepipe[1], STDOUT_FILENO);
It execs the new program which is blissfully unaware of redirections and just writes to stdout like a well-behaved process
It forks the second process and changes the source of its stdin before exec
dup2 (thepipe[0], STDIN_FILENO);
It execs the new program, which is unaware its input comes from another program
Like I said, this is a limited picture. In a real picture the shell daisy-chains these in a loop and also remembers to close pipe ends at opportune moments.

This is a sample program from the book Operating System Concepts by Silberschatz.
The program is self-explanatory if you know the concepts of fork() and related things. Hope this helps! (If you still want an explanation then I can explain it!)
Obviously some changes (around fork() and so on) must be made to this program if you want it to work like
ls | sort
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#define BUFFER_SIZE 25
#define READ_END 0
#define WRITE_END 1
int main(void)
{
char write_msg[BUFFER_SIZE] = "Greetings";
char read_msg[BUFFER_SIZE];
int fd[2];
pid_t pid;
/* create the pipe */
if (pipe(fd) == -1) {
fprintf(stderr,"Pipe failed");
return 1;
}
/* fork a child process */
pid = fork();
if (pid < 0) { /* error occurred */
fprintf(stderr, "Fork Failed");
return 1;
}
if (pid > 0) { /* parent process */
/* close the unused end of the pipe */
close(fd[READ_END]);
/* write to the pipe */
write(fd[WRITE_END], write_msg, strlen(write_msg)+1);
/* close the write end of the pipe */
close(fd[WRITE_END]);
}
else { /* child process */
/* close the unused end of the pipe */
close(fd[WRITE_END]);
/* read from the pipe */
read(fd[READ_END], read_msg, BUFFER_SIZE);
printf("read %s",read_msg);
/* close the read end of the pipe */
close(fd[READ_END]);
}
return 0;
}

Related

Execve problems when reading input from pipe

I wrote a simple C program to execute another program using execve.
exec.c:
#include <unistd.h>
#include <stdio.h>
int main(int argc, char** argv) {
char path[128];
scanf("%s", path);
char* args[] = {path, NULL};
char* env[] = {NULL};
execve(path, args, env);
printf("error\n");
return 0;
}
I compiled it:
gcc exec.c -o exec
and after running it and typing "/bin/sh", it successfully ran the shell and displayed the $ sign like a normal shell.
Then I did the following: I created a server using nc -l 12345 and ran nc localhost 12345 | ./exec. It worked, but for some reason I can't understand, the $ sign was not displayed this time. I couldn't figure out the reason for this.
Now, here is the weirdest thing.
When I try to pass the program path AND more input via the pipe at once it seems like the executed process just ignores the input and closes.
For example:
But, if I run the following it works exactly the same way it worked when I piped nc output:
So, to conclude my questions:
I don't understand why the executed shell doesn't print the $ prompt sign when it reads input from a pipe instead of a terminal.
Why won't the executed program read input from the pipe when the input is already there and not waiting? It seems like it works only in the cases where the pipe remains open after the command execution.
As AlexP already mentioned, the prompt sign is only displayed when input comes from a terminal.
The second question was trickier: when you call the libc function scanf, its implementation will not only consume /bin/sh from the pipe, but also read ahead and store the next input, ls, in its internal buffers. Those buffers live in the process's memory, which execve replaces, so the shell gets nothing.
Here is your program without scanf to verify this:
#include <unistd.h>
#include <stdio.h>
int main(int argc, char** argv) {
char path[128];
read(0, path, 8); // consume exactly "/bin/sh" plus the newline (8 bytes)
path[7] = '\0'; // overwrite the newline with the terminator
char* args[] = {path, NULL};
char* env[] = {NULL};
execve(path, args, env);
printf("error\n");
return 0;
}
Why did the example with cat work in the first place?
That's (probably) because of buffering also. Try:
(echo /bin/sh; echo ls) | stdbuf -i0 ./exec
I recommend this nice Article about buffering for further reading.

Get the content on the command line with an external program

I would like to write a small program which will analyze my current input on the command line and generate some suggestions like search engines do.
The problem is: how can an external program get the content of the command line? For example
# an external program started and got passed in the PID of the shell below.
# the user typed something in the shell like this...
<PROMPT> $ echo "grab this command"
# the external program now gets 'echo "grab this command"'
# and ideally this could be done in real time.
Moreover, can I modify the content of the current command line?
EDIT
bash uses libreadline to manage the command line, but I still cannot imagine how to make use of this.
You could write your own shell wrapper in C. Open bash in a process using popen and use fgetc and fputc to write the data to the process and the output file.
A quick and dirty hack could look like this (bash isn't started in interactive mode, so there is no prompt, but otherwise it should work fine). Note that the "r+" mode relies on a bidirectional popen, which some systems (e.g. FreeBSD and macOS) provide; glibc's popen accepts only "r" or "w":
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
pid_t pid;
void kill_ch(int sig) {
kill(pid, SIGKILL);
}
/**
*
*/
int main(int argc, char** argv) {
int b;
FILE *cmd = NULL;
FILE *log = NULL;
signal(SIGALRM, (void (*)(int))kill_ch);
cmd = popen("/bin/bash -s", "r+");
if (cmd == NULL) {
fprintf(stderr, "Error: Failed to open process");
return EXIT_FAILURE;
}
setvbuf(cmd, NULL, _IOLBF, 0);
log = fopen("out.txt", "a");
if (log == NULL) {
fprintf(stderr, "Error: Failed to open logfile");
return EXIT_FAILURE;
}
setvbuf(log, NULL, _IONBF, 0);
pid = fork();
if (pid != 0)
goto EXEC_WRITE;
else
goto EXEC_READ;
EXEC_READ:
while (1) {
b = fgetc(stdin);
if (b != EOF) {
fputc((char) b, cmd);
fputc((char) b, log);
}
}
EXEC_WRITE:
while (1) {
b = fgetc(cmd);
if (b == EOF) {
return EXIT_SUCCESS;
}
fputc(b, stdout);
fputc(b, log);
}
return EXIT_SUCCESS;
}
I might not fully understand your question but I think you'd basically have two options.
The first option would be to explicitly call your "magic" program by prefixing your call with it like so
<PROMPT> $ magic echo "grab this command"
(magic analyzes $* and says...)
Your input would print "grab this command" to stdout
<PROMPT> $
In this case the arguments to "magic" would be handled as positional parameters ($*, $1 ...)
The second option would be to wrap an interpreter-like something around your typing. E.g. the Python interpreter does so if called without arguments. You start the interpreter, which will basically read anything you type (stdin) in an endless loop, interpret it, and produce some output (typically on stdout).
<PROMPT> $ magic
<MAGIC_PROMPT> $ echo "grab this command"
(your magic interpreter processes the input and says...)
Your input would print "grab this command" to stdout
<MAGIC_PROMPT> $

Detect if pid is zombie on Linux

We can detect whether some process is a zombie via the shell command line
ps ef -o pid,stat | grep <pid> | grep Z
To get that info in our C/C++ programs we use popen(), but we would like to avoid using popen(). Is there a way to get the same result without spawning additional processes?
We are using Linux 2.6.32-279.5.2.el6.x86_64.
You need to use the proc(5) filesystem. Access to files inside it (e.g. /proc/1234/stat ...) is really fast (it does not involve any physical I/O).
You probably want the third field from /proc/1234/stat (which is readable by everyone, but you should read it sequentially, since it is unseekable.). If that field is Z then process of pid 1234 is zombie.
No need to fork a process (e.g. with popen or system); in C you might code
pid_t somepid;
// put the process pid you are interested in into somepid
bool iszombie = false;
// open the /proc/*/stat file
char pbuf[32];
snprintf(pbuf, sizeof(pbuf), "/proc/%d/stat", (int) somepid);
FILE* fpstat = fopen(pbuf, "r");
if (!fpstat) { perror(pbuf); exit(EXIT_FAILURE); };
{
int rpid =0; char rcmd[32]; char rstatc = 0;
fscanf(fpstat, "%d %30s %c", &rpid, rcmd, &rstatc);
iszombie = rstatc == 'Z';
}
fclose(fpstat);
Consider also procps and libproc; see this answer.
(You could also read the State: line of /proc/1234/status but this is probably harder to parse in C or C++ code)
BTW, I find that the stat file in /proc/ has a weird format: if your executable happens to contain both spaces and parentheses in its name (which is disgusting, but permitted) parsing the /proc/*/stat file becomes tricky.

Reading with cat: Stop when not receiving data

Is there any way to tell the cat command to stop reading when not receiving any data? maybe with some "timeout" that specifies for how long no data is incoming.
Any ideas?
There is a timeout(1) command. Example:
timeout 5s cat /dev/random
Depending on your circumstances (e.g. you run bash with -e and normally care about the exit code), you may want to ignore the non-zero exit status that timeout returns when it kills the command:
timeout 5s cat /dev/random || true
cat itself, no. It reads the input stream until told it's the end of the file, blocking for input if necessary.
There's nothing to stop you writing your own cat equivalent which will use select on standard input to timeout if nothing is forthcoming fast enough, and exit under those conditions.
In fact, I once wrote a snail program (because a snail is slower than a cat) which took an extra argument of characters per second to slowly output a file (a).
So snail 10 myprog.c would output myprog.c at ten characters per second. For the life of me, I can't remember why I did this - I suspect I was just mucking about, waiting for some real work to show up.
Since you're having troubles with it, here's a version of dog.c (based on my afore-mentioned snail program) that will do what you want:
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/select.h>
static int dofile (FILE *fin) {
int ch = ~EOF, rc;
fd_set fds;
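struct timeval tv;
// Unbuffered, so stdio never holds back bytes that select cannot see.
setvbuf (fin, NULL, _IONBF, 0);
while (ch != EOF) {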
// Set up for fin file, 5 second timeout.
FD_ZERO (&fds); FD_SET (fileno (fin), &fds);
tv.tv_sec = 5; tv.tv_usec = 0;
rc = select (fileno(fin)+1, &fds, NULL, NULL, &tv);
if (rc < 0) {
fprintf (stderr, "*** Error on select (%d)\n", errno);
return 1;
}
if (rc == 0) {
fprintf (stderr, "*** Timeout on select\n");
break;
}
// Data available, so it will not block.
if ((ch = fgetc (fin)) != EOF) putchar (ch);
}
return 0;
}
int main (int argc, char *argv[]) {
int argp, rc;
FILE *fin;
if (argc == 1)
rc = dofile (stdin);
else {
argp = 1;
while (argp < argc) {
if ((fin = fopen (argv[argp], "rb")) == NULL) {
fprintf (stderr, "*** Cannot open input file [%s] (%d)\n",
argv[argp], errno);
return 1;
}
rc = dofile (fin);
fclose (fin);
if (rc != 0)
break;
argp++;
}
}
return rc;
}
Then, you can simply run dog without arguments (so it will use standard input) and, after five seconds with no activity, it will output:
*** Timeout on select
(a) Actually, it was called slowcat but snail is much nicer and I'm not above a bit of minor revisionism if it makes the story sound better :-)
mbuffer, with its -W option, works for me.
I needed to sink stdin to a file, but with an idle timeout:
I did not need to actually concatenate multiple sources (but perhaps there are ways to use mbuffer for this.)
I did not need any of cat's possible output-formatting options.
I did not mind the progress bar that mbuffer brings to the table.
I did need to add -A /bin/false to suppress a warning, based on a suggestion in the linked man page. My invocation for copying stdin to a file with 10 second idle timeout ended up looking like
mbuffer -A /bin/false -W 10 -o ./the-output-file
Here is the code for timeout-cat:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
void timeout(int sig) {
exit(EXIT_FAILURE);
}
int main(int argc, char* argv[]) {
int sec = 0; /* seconds to timeout (0 = no timeout) */
int c;
if (argc > 1) {
sec = atoi(argv[1]);
signal(SIGALRM, timeout);
alarm(sec);
}
while((c = getchar()) != EOF) {
alarm(0);
putchar(c);
alarm(sec);
}
return EXIT_SUCCESS;
}
It does basically the same as paxdiablo's dog.
Without an argument it works like cat, copying stdin to stdout. Provide the timeout in seconds as the first argument.
One limitation (applies to dog as well): input from a terminal is line-buffered, so you have n seconds to provide a whole line (not just any character) to reset the timeout alarm. This is because the terminal driver, in its default canonical mode, only hands input to the program once a line is complete.
Usage:
Instead of the potentially endless:
cat < some_input > some_output
you can compile the code above to timeout_cat and run:
./timeout_cat 5 < some_input > some_output
Consider tail -f --pid
I am assuming that you are reading some file and when the producer is finished (gone?) you stop.
Example that will process /var/log/messages until watcher.sh finishes.
./watcher.sh&
tail -f /var/log/messages --pid $! | ... do something with the output
I faced the same issue of the cat command blocking while reading from a tty port via adb shell, but did not find any solution (the timeout command was also not working). Below is the final command I used in my Python script (running on Ubuntu) to make it non-blocking. Hope this will help someone.
bash_command = "adb shell \"echo -en 'ATI0\\r\\n' > /dev/ttyUSB0 && cat /dev/ttyUSB0\" & sleep 1; kill $!"
response = subprocess.check_output(['bash', '-c', bash_command])
Simply cat then kill the cat after 5 sec.
cat xyz & sleep 5; kill $!
Get the cat output as a reply after 5 seconds
reply="`cat xyz & sleep 5; kill $!`"
echo "reply=$reply"

Understanding a piece of code in C++

Can you help me understand the following code?
void errorexit(char *pchar) {
// display an error to the standard err.
fprintf(stderr, pchar);
fprintf(stderr, "\n");
exit(1);
}
Calling errorexit("Error Message") will print "Error Message" to the standard error stream (often in a terminal) and exit the program. Any programs (such as the shell) that called your program will know that there was an error since your program exited with a non-zero status.
It is printing out the string pointed to by pchar to the standard error output via fprintf and then forcing the application to exit with a return code of 1. This would be used for critical errors when the application can't continue running.
That function prints the provided string and a newline to stderr and then terminates the current running program, providing 1 as the return value.
fprintf is like printf in that it outputs characters, but fprintf is a little different in that it takes a file handle as an argument. In this case stderr is the file handle for standard error. This handle is already defined for you by stdio.h, and corresponds to the error output stream. stdout is what printf outputs to, so fprintf(stdout, "hello") is equivalent to printf("hello").
exit is a function that terminates the execution of the current process and returns whatever value was its argument as the return code to the parent process (usually the shell). A non-zero return code usually indicates failure, the specific value indicating the type of failure.
If you ran this program from the shell:
#include <stdio.h>
#include "errorexit.h"
int main(int argc, char* argv[])
{
printf("Hello world!\n");
errorexit("Goodbye :(");
printf("Just kidding!\n");
return 0;
}
You'd see this output:
Hello world!
Goodbye :(
And your shell would show "1" as the return value (in bash, you can view the last return code with echo $?).
Note that "Just kidding!" would not be printed, as errorexit calls exit, ending the program before main finishes.
