Can I do something with a pipe of another process? (Linux)

Here I want to inspect the status of a pipe between two running processes. I can identify the pipe file descriptors at both ends in /proc/<pid>/fd, and thus the pipe's inode number, using ls -l like this:
l-wx------ 1 user user 64 Jul 22 08:53 5 -> 'pipe:[135687452]'
As I know, a pipe is seen as a file in the Linux VFS philosophy, so I suspect there is some way to dive into it without disturbing the owner processes. Is there anything I can do with the file descriptor or the pipe ID to get more access to the pipe? I would like to read the content sitting in the pipe, or at least find out how much data it holds.
Many Thanks!

Reading the actual content would disturb the owner process. You can use FIONREAD to see the content length, though. Per man 7 pipe:
The following ioctl(2) operation, which can be applied to a file
descriptor that refers to either end of a pipe, places a count of
the number of unread bytes in the pipe in the int buffer pointed
to by the final argument of the call:
ioctl(fd, FIONREAD, &nbytes);
The FIONREAD operation is not specified in any standard, but is
provided on many implementations.
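For illustration, here is a minimal, self-contained C sketch of that ioctl. It creates its own pipe, since FIONREAD must be applied to a descriptor the calling process actually holds; inspecting another process's pipe would first require obtaining its descriptor (e.g. via a debugger).

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1)
        return 1;

    // Put some bytes into the pipe without reading them back.
    const char msg[] = "hello";
    write(fds[1], msg, strlen(msg));

    // FIONREAD reports how many unread bytes sit in the pipe.
    int nbytes = 0;
    if (ioctl(fds[0], FIONREAD, &nbytes) == 0)
        printf("%d bytes buffered in the pipe\n", nbytes);  // prints 5

    close(fds[0]);
    close(fds[1]);
    return 0;
}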

What does fs.writeSync(1, "a string") mean in Node.js?

From the docs:
process.on('uncaughtException', (err) => {
  fs.writeSync(1, `Caught exception: ${err}\n`);
});
Is 1 the stdout stream? I've read the docs for fs.write and there's no discussion of how to use an integer in the first argument. The source code wasn't much help either.
I put this line
(require('fs')).writeSync(1, `Starting...`);
into my code, thinking it would go to stdout, but at first I saw no such output when I started my app.
OK, now I see it did actually work (it printed to the console): where is this documented, though?
From the fs documentation
fs.writeSync(fd, buffer[, offset[, length[, position]]])
It's common to pass file handles around on Unix-derived systems using nominal types like FILE, but in reality every open file is represented and referenced by a small non-negative integer called a file descriptor or fd (which can also refer to other kinds of open files, including pipes, FIFOs, sockets, terminals and devices).
In regards to where the argument of 1 comes from, all systems compliant with the Single Unix Specification inherit three such file descriptors when starting a shell, which processes spawned from that shell inherit.
Descriptor 0 is standard input, the fd from which the process takes its input.
Descriptor 1 is standard output, the fd to which the process writes its output.
Descriptor 2 is standard error, the fd to which the process writes error messages.
All three of these are normally connected to the controlling terminal (and thus to the terminal emulator's window).
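As an aside, fs.writeSync(1, ...) ultimately performs the same write() system call you would issue from C; a minimal sketch of the equivalent:

#include <string.h>
#include <unistd.h>

int main(void) {
    const char msg[] = "Starting...\n";
    // Descriptor 1 is standard output; no open() is needed because
    // the process inherited it from its parent (typically the shell).
    write(1, msg, strlen(msg));
    // Descriptor 2 sends the same bytes to standard error instead.
    write(2, msg, strlen(msg));
    return 0;
}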
The docs absolutely do show an integer fd as the first argument; I'm not sure why you say they don't. What the number 1 means isn't covered in the Node docs beyond the statement that these are POSIX file descriptors. That info is covered on Wikipedia: https://en.m.wikipedia.org/wiki/File_descriptor

Is it possible that Linux file descriptors 0, 1 and 2 are not for stdin, stdout and stderr?

When a program begins, does it take file descriptors 0, 1 and 2 for stdin, stdout and stderr by default? Will API calls such as open(...) and socket(...) avoid returning 0, 1 and 2, since those values are already taken? And is there any case in which open(...) or socket(...) would return 0, 1 or 2, with those descriptors not related to stdin, stdout and stderr?
At the file descriptor level, stdin is defined to be file descriptor 0, stdout is defined to be file descriptor 1, and stderr is defined to be file descriptor 2. See this.
Even if your program -or the shell- changes (e.g. redirect with dup2(2)) what is file descriptor 0, it always stays stdin (since by definition STDIN_FILENO is 0).
So of course stdin could be a pipe or a socket or a file (not a terminal). You could test with isatty(3) if it is a tty, and/or use fstat(2) to get status information on it.
Syscalls like open(2) or pipe(2) or socket(2) may give e.g. STDIN_FILENO (i.e. 0) if that file descriptor is free (e.g. because it has been close(2)-d before). But when that occurs, it is still stdin by definition.
Of course, in stdio(3), the FILE stream stdin is a bit more complex. Your program could fclose(3), freopen(3), fdopen(3) ...
Probably the kernel sets stdin, stdout, and stderr file descriptors to the console when magically starting /sbin/init as the first process.
Though a couple of answers already existed, I didn't find them informative enough to explain the complete story.
Since I went ahead and researched more, I am adding my findings.
Whenever a process starts, an entry for it is added to the /proc/<pid> directory. This is the place where all of the data related to the process is kept. Also, on process start the kernel allocates 3 file descriptors to the process for communication with the 3 data streams referred to as stdin, stdout and stderr.
The Linux kernel always assigns the lowest available integer value to a new FD, so these data streams are mapped to the numbers 0, 1 and 2.
Since these are nothing but references to streams, and a stream can be closed, one can simply call close(<fd>), in our case close(1), to close the file descriptor.
On doing ls -l /proc/<pid>/fd/, we then see only 2 FDs listed there: 0 and 2.
If we now make an open() call, the kernel will create a new FD to map this new file reference, and since the kernel uses the lowest-integer-first algorithm, it will pick up the integer value 1.
So the newly created FD points to the file we opened (using the open() syscall).
Any data transfer that happens now goes not via the default data stream that was linked earlier, but via the new file that we opened.
So yes, we can map FD 0, 1 or 2 to any file, and not necessarily stdin, stdout or stderr.
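A minimal C sketch of that experiment (the path /tmp/demo.txt is just an arbitrary example):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    close(1);  // close stdout, freeing descriptor 1

    // The kernel hands out the lowest free descriptor, so this open()
    // returns 1, and anything written to "stdout" now lands in the
    // file instead of on the terminal.
    int fd = open("/tmp/demo.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return 1;

    const char msg[] = "written via descriptor 1\n";
    write(1, msg, strlen(msg));  // goes into /tmp/demo.txt
    return fd == 1 ? 0 : 1;      // fd is 1 on any normal run
}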
When a program begins, does it take file descriptors 0, 1 and 2 for stdin, stdout and stderr by default?
If you launch your program in an interactive shell normally, yes.
By @EJP:
Inheriting a socket as FD 0 also happens if the program is started by inetd, or anything else that behaves the same way.
Will APIs such as open(...) and socket(...) not return 0, 1 or 2, since these values are already taken?
Yes.
Is there any case where open(...) or socket(...) would return 0, 1 or 2?
Yes, if you do something like
close(0);
close(1);
close(2);
open(...); /* this open will return 0 if success */

Linux read operations requesting duplicate bytes?

This is a bit of a strange question. I'm writing a FUSE module using the go-fuse library. At the moment I have a "fake" file with a size of 6000 bytes, which outputs some unrelated data for all read requests. My read function looks like this:
func (f *MyFile) Read(buf []byte, off int64) (fuse.ReadResult, fuse.Status) {
    log.Printf("Reading into buffer of len %d from %d\n", len(buf), off)
    FillBuffer(buf, uint64(off), f.secret)
    return fuse.ReadResultData(buf), fuse.OK
}
As you can see I'm outputting a log on every read containing the range of the read request. The weird thing is that when I cat the file I get the following:
2013/09/13 21:09:03 Reading into buffer of len 4096 from 0
2013/09/13 21:09:03 Reading into buffer of len 8192 from 0
So cat is apparently reading the first 4096 bytes of data, discarding it, then reading 8192 bytes, which encompasses all the data and so succeeds. I've tried with other programs too, including hexdump and vim, and they all do the same thing. Interestingly, if I do a head -c 3000 dir/fakefile it still does the two reads, even though the second one is completely unnecessary. Does anyone have any insights into why this might be happening?
I suggest you strace your cat process to see for yourself. On my system, cat reads by 64K chunks, and does a final read() to make sure it read the whole file. That last read() is necessary to make the distinction between a reading a "chunk-sized file" and a bigger file. i.e. it makes sure there is nothing left to read, as the file size could have changed between the fstat() and the read() system calls.
Is your "fake file" size being returned correctly to FUSE by stat/fstat() system calls?
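In C terms, the pattern the answer describes looks roughly like this; the final, apparently redundant read() is what confirms EOF, since the size reported by fstat() may already be stale (the name drain and the 64K buffer size here are illustrative choices, not anything cat is required to use):

#include <unistd.h>

// Read an already-open descriptor to the end, cat-style: keep issuing
// read() calls until one returns 0 (EOF) or fails (negative return).
ssize_t drain(int fd) {
    char buf[65536];
    ssize_t n, total = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;  // process buf[0..n) here
    return n < 0 ? n : total;
}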

How can I show the size of files in /proc? It should not be size zero

From the following output, we know that there are two characters in the file /proc/sys/net/ipv4/ip_forward, but why does ls show the file as size zero?
I know this is not a file on disk but a file in memory; is there any command with which I can see the real size of the files in /proc?
root@OpenWrt:/proc/sys/net/ipv4# cat ip_forward | wc -c
2
root@OpenWrt:/proc/sys/net/ipv4# ls -l ip_forward
-rw-r--r-- 1 root root 0 Sep 3 00:20 ip_forward
root@OpenWrt:/proc/sys/net/ipv4# pwd
/proc/sys/net/ipv4
Those are not really files on disk (as you mention) but they are also not files in memory - the names in /proc correspond to calls into the running kernel in the operating system, and the contents are generated on the fly.
The system doesn't know how large the files would be without generating them, but if you read the "file" twice there's no guarantee you get the same data because the system may have changed.
You might be looking for the program
sysctl -a
instead.
Things in /proc are not really files. In most cases, they're not even files in memory. When you access these files, the proc filesystem driver performs a system call that gets data appropriate for the file, and then formats it for output. This is usually dynamic data that's constructed on the fly. An example of this is /proc/net/arp, which contains the current ARP cache.
Getting the size of these things can only be done by formatting the entire output, so it's not done just when listing the file. If you want the sizes, use wc -c as you did.
The /proc/ filesystem is an "illusion" maintained by the kernel, which does not bother giving the size of (most of) its pseudo-files (since computing that "real" size would usually involve having built the entire textual pseudo-file's content), and expects most [pseudo-] textual files from /proc/ to be read in sequence from first to last byte (i.e. till EOF), in reasonably sized (e.g. 1K) blocks. See proc(5) man page for details.
So there is no way to get the true size of such a file (like /proc/self/maps or /proc/sys/net/ipv4/ip_forward) in a single syscall like stat(2): it would report a size of 0, as shown by the stat(1) or ls(1) commands. A typical way of reading these textual files might be:
FILE *f = fopen("/proc/self/maps", "r");
// or some other textual /proc file,
// e.g. /proc/sys/net/ipv4/ip_forward
if (f)
{
    char line[256];
    // you could use getline(3) instead of fgets
    while (fgets(line, sizeof line, f) != NULL)
    {
        // do something with line, for example:
        fputs(line, stdout);
    }
    fclose(f);
}
Of course, some files (e.g. /proc/self/cmdline) are documented as possibly containing NUL bytes. You'll need fread(3) for those.
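For example, a small sketch that dumps the calling process's own /proc/self/cmdline, where argv entries are separated by NUL bytes:

#include <stdio.h>

int main(void) {
    char buf[4096];
    FILE *f = fopen("/proc/self/cmdline", "r");
    if (!f)
        return 1;
    // fgets() can't delimit this data (the NULs look like string
    // terminators), so read the raw bytes with fread() instead.
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    for (size_t i = 0; i < n; i++)
        putchar(buf[i] ? buf[i] : ' ');  // show NUL separators as spaces
    putchar('\n');
    return 0;
}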
It's not really a file in memory; it's an interface between the user and the kernel.

Confused about stdin, stdout and stderr?

I am rather confused about the purpose of these three files. If my understanding is correct, stdin is the file into which a program writes its requests to run a task in the process, stdout is the file into which the kernel writes its output so that the requesting process can access the information, and stderr is the file into which all exceptions are entered. On opening these files to check whether this actually happens, I found nothing to suggest so!
What I would like to know is the exact purpose of these files, in an absolutely dumbed-down answer with very little tech jargon!
Standard input - this is the file handle that your process reads to get information from you.
Standard output - your process writes conventional output to this file handle.
Standard error - your process writes diagnostic output to this file handle.
That's about as dumbed-down as I can make it :-)
Of course, that's mostly by convention. There's nothing stopping you from writing your diagnostic information to standard output if you wish. You can even close the three file handles totally and open your own files for I/O.
When your process starts, it should already have these handles open and it can just read from and/or write to them.
By default, they're probably connected to your terminal device (e.g., /dev/tty) but shells will allow you to set up connections between these handles and specific files and/or devices (or even pipelines to other processes) before your process starts (some of the manipulations possible are rather clever).
An example being:
my_prog <inputfile 2>errorfile | grep XYZ
which will:
create a process for my_prog.
open inputfile as your standard input (file handle 0).
open errorfile as your standard error (file handle 2).
create another process for grep.
attach the standard output of my_prog to the standard input of grep.
Re your comment:
When I open these files in /dev folder, how come I never get to see the output of a process running?
It's because they're not normal files. While UNIX presents everything as a file in a file system somewhere, that doesn't make it so at the lowest levels. Most files in the /dev hierarchy are either character or block devices, effectively a device driver. They don't have a size but they do have a major and minor device number.
When you open them, you're connected to the device driver rather than a physical file, and the device driver is smart enough to know that separate processes should be handled separately.
The same is true for the Linux /proc filesystem. Those aren't real files, just tightly controlled gateways to kernel information.
It would be more correct to say that stdin, stdout, and stderr are "I/O streams" rather than files. As you've noticed, these entities do not live in the filesystem. But the Unix philosophy, as far as I/O is concerned, is "everything is a file". In practice, that really means that you can use the same library functions and interfaces (printf, scanf, read, write, select, etc.) without worrying about whether the I/O stream is connected to a keyboard, a disk file, a socket, a pipe, or some other I/O abstraction.
Most programs need to read input, write output, and log errors, so stdin, stdout, and stderr are predefined for you, as a programming convenience. This is only a convention, and is not enforced by the operating system.
As a complement to the answers above, here is a summary of redirections. (The answer originally included a reference graphic, which is not reproduced here.)
EDIT: This graphic is not entirely correct.
The first example does not use stdin at all; it passes "hello" as an argument to the echo command.
The graphic also says 2>&1 has the same effect as &>, however
ls Documents ABC > dirlist 2>&1
# does not give the same output as
ls Documents ABC > dirlist &>
This is because &> requires a file to redirect to, and 2>&1 simply sends stderr to wherever stdout currently points.
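In syscall terms, what the shell does for cmd > dirlist 2>&1 is roughly an open() followed by two dup2() calls; a hypothetical sketch (dirlist is just the file name from the example above):

#include <fcntl.h>
#include <unistd.h>

// Roughly what the shell sets up for: cmd > dirlist 2>&1
int redirect_like_shell(void) {
    int fd = open("dirlist", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    dup2(fd, 1);  // "> dirlist": descriptor 1 now points at the file
    dup2(1, 2);   // "2>&1": descriptor 2 now points wherever 1 points
    if (fd > 2)
        close(fd);
    return 0;
}

Running the dup2() calls in the opposite order is exactly why cmd 2>&1 > dirlist behaves differently: stderr would be duplicated from the old stdout (the terminal) before stdout is redirected to the file.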
I'm afraid your understanding is completely backwards. :)
Think of "standard in", "standard out", and "standard error" from the program's perspective, not from the kernel's perspective.
When a program needs to print output, it normally prints to "standard out". A program typically prints output to standard out with printf, which prints ONLY to standard out.
When a program needs to print error information (not necessarily exceptions, those are a programming-language construct, imposed at a much higher level), it normally prints to "standard error". It normally does so with fprintf, which accepts a file stream to use when printing. The file stream could be any file opened for writing: standard out, standard error, or any other file that has been opened with fopen or fdopen.
"standard in" is used when the file needs to read input, using fread or fgets, or getchar.
Any of these files can be easily redirected from the shell, like this:
cat /etc/passwd > /tmp/out    # redirect cat's standard out to /tmp/out
cat /nonexistant 2> /tmp/err  # redirect cat's standard error to /tmp/err
cat < /etc/passwd             # redirect cat's standard input to /etc/passwd
Or, the whole enchilada:
cat < /etc/passwd > /tmp/out 2> /tmp/err
There are two important caveats: First, "standard in", "standard out", and "standard error" are just a convention. They are a very strong convention, but it's all just an agreement that it is very nice to be able to run programs like this: grep echo /etc/services | awk '{print $2;}' | sort and have the standard outputs of each program hooked into the standard input of the next program in the pipeline.
Second, I've given the standard ISO C functions for working with file streams (FILE * objects) -- at the kernel level, it is all file descriptors (int references to the file table) and much lower-level operations like read and write, which do not do the happy buffering of the ISO C functions. I figured to keep it simple and use the easier functions, but I thought all the same you should know the alternatives. :)
I think it is misleading when people say stderr should be used only for error messages.
It should also be used for informative messages that are meant for the user running the command and not for any potential downstream consumers of the data (i.e. if you run a shell pipe chaining several commands, you do not want informative messages like "getting item 30 of 42424" to appear on stdout, as they will confuse the consumer, but you might still want the user to see them).
See this for historical rationale:
"All programs placed diagnostics on the standard output. This had
always caused trouble when the output was redirected into a file, but
became intolerable when the output was sent to an unsuspecting
process. Nevertheless, unwilling to violate the simplicity of the
standard-input-standard-output model, people tolerated this state of
affairs through v6. Shortly thereafter Dennis Ritchie cut the Gordian
knot by introducing the standard error file. That was not quite enough.
With pipelines diagnostics could come from any of several programs
running simultaneously. Diagnostics needed to identify themselves."
stdin
Reads input through the console (e.g. Keyboard input).
Used in C with scanf
scanf(<formatstring>,<pointer to storage> ...);
stdout
Produces output to the console.
Used in C with printf
printf(<string>, <values to print> ...);
stderr
Produces 'error' output to the console.
Used in C with fprintf
fprintf(stderr, <string>, <values to print> ...);
Redirection
The source for stdin can be redirected. For example, instead of coming from keyboard input, it can come from a file (cat < file.txt) or from another program (ps | grep <userid>).
The destinations for stdout, stderr can also be redirected. For example stdout can be redirected to a file: ls . > ls-output.txt, in this case the output is written to the file ls-output.txt. Stderr can be redirected with 2>.
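Putting the three together, a minimal C sketch that touches each stream:

#include <stdio.h>

int main(void) {
    char name[64];

    // stdin: read a token typed by the user (descriptor 0)
    if (scanf("%63s", name) != 1)
        return 1;

    // stdout: conventional output (descriptor 1)
    printf("hello, %s\n", name);

    // stderr: diagnostics (descriptor 2)
    fprintf(stderr, "log: greeted %s\n", name);
    return 0;
}

Run it as ./a.out > out.txt and the greeting goes into the file while the log line still appears on the terminal, because only stdout was redirected.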
Using ps aux reveals current processes, all of which are listed in /proc/ as /proc/<pid>/. Calling cat /proc/<pid>/fd/0 reads from whatever that process has attached as its standard input. So:
/proc/<pid>/fd/0 - standard input
/proc/<pid>/fd/1 - standard output
/proc/<pid>/fd/2 - standard error
But this only worked well for /bin/bash; other processes generally had nothing on 0, though many had errors written on 2.
For authoritative information about these files, check out the man pages by running this command in your terminal:
$ man stdout
But for a simple answer, each file is for:
stdout for the output stream
stdin for the input stream
stderr for printing errors or log messages
Every Unix program has each one of those streams.
stderr does not do I/O cache buffering, so if your application needs to print critical information (errors, exceptions) to the console or to a file, use it. stdout, on the other hand, does use I/O cache buffering, so use it for general log info; but be aware that the application may exit before the buffer is actually written out, which can make debugging complex.
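A small C demonstration of that buffering difference; _exit() skips the flush that a normal return from main() would perform, standing in for an application that dies unexpectedly:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    // stdout is typically fully buffered when redirected to a file,
    // so this line can be lost if the process dies before a flush.
    printf("buffered message on stdout\n");

    // stderr is unbuffered, so this line is written immediately.
    fprintf(stderr, "immediate message on stderr\n");

    // Try: ./a.out > out.txt 2> err.txt
    // err.txt will contain its line; out.txt will likely be empty.
    _exit(1);
}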
A file with associated buffering is called a stream and is declared to be a pointer to a defined type FILE. The fopen() function creates certain descriptive data for a stream and returns a pointer to designate the stream in all further transactions. Normally there are three open streams with constant pointers declared in the header and associated with the standard open files.
At program startup three streams are predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). When opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
https://www.mkssoftware.com/docs/man5/stdio.5.asp
Here is a lengthy article on stdin, stdout and stderr:
What Are stdin, stdout, and stderr on Linux?
To summarize:
Streams Are Handled Like Files
Streams in Linux, like almost everything else, are treated as though they were files. You can read text from a file, and you can write text into a file. Both of these actions involve a stream of data. So the concept of handling a stream of data as a file isn't that much of a stretch.
Each file associated with a process is allocated a unique number to identify it. This is known as the file descriptor. Whenever an action is required to be performed on a file, the file descriptor is used to identify the file.
These values are always used for stdin, stdout, and stderr:
0: stdin
1: stdout
2: stderr
Ironically, I found this question on Stack Overflow and the article above while searching for information on abnormal, non-standard streams. So my search continues.
