I've read numerous posts about tty. They all start with the historical reasons for the tty name. Please leave this out and just describe the tty system as it exists today. Then they talk about how a tty is a file and that stdin, stdout and stderr of a process started in a terminal are all mapped to this file.
How are three files mapped to a single file?
Some say that tty allows line editing before enter is hit and does other line discipline stuff. There is a blog post which says that each tty has its own stdin and stdout. The blog post by Linus Akesson, which I'm still grappling with, explains that there is in fact a tty driver in the kernel and a tty device file. Then there is the controlling terminal, sessions, terminal emulators, raw and cooked modes, pty and what not.
To better understand what tty is, can someone explain to me what happens in this simple situation:
A terminal is opened and it runs the default shell. From the shell a process is run and it asks for input.
What happens when the call scanf is made?
How does the terminal know that scanf is called?
The editing buffer in the terminal which we see afterwards (the line where we enter text) - where does it come from? Does this buffer exist in the tty device file, and is it being output the way an stdout file is printed?
Which process is controlling this buffer? The tty driver?
What happens when we press enter? Does the tty driver 'submit' the line to the stdin part of the tty device?
How does the process know that input has been submitted?
And the output part: When the same process outputs something, does it write to the tty device? But isn't the tty already outputting the current editing buffer line?
If there's a better way to describe what a tty does without answering the above questions, then please do so. If I missed some crucial part, please fill in as you think necessary.
This turned out really long, so brace yourself...
TTY Device
You are looking at a TTY as a screen divided into e.g. 80x24 tiles.
But a TTY is a console: it contains an input device (generally connected to a keyboard) and an output device (generally connected to a screen).
TTY Abstraction
While the TTY is connected to (physical or simulated) devices, Unix processes don't see devices, they see an abstraction. This abstraction consists of an input stream, an output stream, and a control interface.
The control interface can turn on/off some "fancy" features like sending the input it receives not only to the process using the terminal, but also to its own output stream (that feature is called "echo" and can be controlled with stty echo), but the fact that you are seeing something on the terminal output doesn't mean it's connected to any form of stdout.
This abstraction is implemented in the kernel by the TTY driver and the line discipline. You can think of the line discipline as the Strategy design pattern.
This abstraction has to be available to userspace, and the way Unix drivers export anything to userspace is by creating special files such as /dev/tty.
A TTY file allows you to read() the input stream, write() to the output stream, and turn features on/off via ioctl().
TTY Files
Usually, every time you launch a new terminal, a new TTY file is created by the driver.
Any process can open a tty file, regardless of whether that file will be the process' stdin, stdout, stderr, all of them, or none of them.
You can see for yourself: Open a terminal and type tty. Let's say it printed /dev/pts/3.
Now open another terminal and run:
exec 10>/dev/pts/3 # open /dev/pts/3 as file descriptor 10
echo hello >&10 # write "hello" through file descriptor 10
This will cause echo to write hello to file descriptor 10, which is the first terminal. Accordingly, the first terminal will print hello.
Standard Streams
Unix implements the 3 standard streams, stdin, stdout, and stderr. These receive no special treatment whatsoever from the terminal driver or line discipline, and most of their implementation is in the shell.
When you start your terminal emulator, it opens a tty file, say /dev/pts/3. It then creates a new process (fork()), opens /dev/pts/3 as file descriptors 0 (stdin), 1 (stdout), and 2 (stderr), and then executes the shell.
Which means that when the shell starts, it has the terminal file, with its streams and control interfaces. When the shell writes to either stdout or stderr, both writes go to the TTY's output stream.
When the shell executes another process, the process inherits /dev/pts/3 as its file descriptors 0, 1, 2, unless the shell does redirection, or the executed program changes these file descriptors.
Specific Answers
Now we're ready to answer your questions:
What happens when the call scanf is made?
scanf() calls read(STDIN), which calls the TTY driver's implementation of read().
In cooked mode, this will block until the input stream has buffered a full line. In raw mode, it will block until at least one character was read.
Then the TTY input buffer is copied to scanf's buffer.
How does the terminal know that scanf is called?
It doesn't. If you type something into the terminal while the program is running and not awaiting your input, it will be buffered in the terminal's input stream.
Then, whenever scanf is called, if at all, it will read that buffer. If scanf isn't called, then the program ends, control returns to the shell, and the shell reads that buffer. You can see it by running sleep 30, and while it's running, typing another command and pressing enter. The shell will execute it after sleep is done:
bash-4.3$ sleep 30
echo hello
bash-4.3$ echo hello
hello
bash-4.3$
The editing buffer in the terminal which we see afterwards (the line where we enter text) - where does it come from? Does this buffer exist in the tty device file, and is it being output the way an stdout file is printed?
The buffer exists in the kernel and is attached to the TTY file.
If the terminal's echo feature is on, the line discipline will send the input stream not only to the input buffer, but also to the output stream.
If the terminal is in cooked (default) mode, the line discipline will give special treatment to characters like backspace (yes, backspace is a character, ASCII 8). In the case of backspace, it will remove the last character from the input buffer, and send to the output stream a control sequence to delete the last character from the screen.
Note that the buffer is managed separately from what you see on the screen.
Which process is controlling this buffer? The tty driver?
The buffer is in the kernel, and not controlled by a process, but rather by the line discipline, which is controlled by the TTY driver.
What happens when we press enter? Does the tty driver 'submit' the line to the stdin part of the tty device?
When you press "enter" a linefeed character (\n) is added to the buffer, and, if any process is waiting on terminal input, the input buffer is copied to the process' buffer and the process becomes unblocked and continues to run.
The more interesting question is what happens when you press something that is NOT "enter". In raw mode, it doesn't matter if it's "enter" or not, because \n doesn't get any special treatment. In cooked mode, however, the input buffer is not copied and the reading process is not notified.
How does the process know that input has been submitted?
The process calls e.g. scanf(), which calls read(STDIN), which will block the process until input is available. When input is available, the TTY driver will unblock the blocked process (i.e. wake it up).
Note that this is not special to TTY files or to STDIN, it applies to all files, that's just how read() works.
Also note that scanf() doesn't know whether or not STDIN is a TTY file.
When the same process outputs something, does it write to the tty device?
When you call something like printf(), it calls write(STDOUT), which calls the TTY driver's implementation of write(), which writes to the TTY's output stream.
Again, note that printf() doesn't know whether or not STDOUT is a TTY file.
But isn't the tty already outputting the current editing buffer line?
In Unix, a file (any file, not just TTY files) can be opened and written to by multiple writers, and no synchronization among them is guaranteed.
As you can see with the echo hello >&10 example above, the process running in the terminal is not the only thing which can write to the TTY's output stream, but even an unrelated process can write to the TTY's output stream.
And when echo is enabled, the line discipline can also write to the TTY's output stream.
All these writes will be interleaved, the driver won't try to synchronize them or make sense of them.
Someone reports that given a stream of strings on the serial port which is pipelined to the OCaml program below, the output of the program is not continuous, but instead it appears in chunks (of a few tens of lines), as if buffered.
What can be the cause of the non-continuous output?
(The output buffer should be flushed after each new line due to the use of '%!'. So this shouldn't be the cause, right?)
let tp = ref 0
let get_next_entry ic =
  try
    let (ts, pred, v) = Scanf.fscanf ic " #%d %s#(%d)\n" (fun x y z -> (x,y,z)) in
    Printf.printf "at timepoint %d (timestamp %d): %s(%d)\n%!" !tp ts pred v;
    incr tp;
    true
  with End_of_file ->
    false
let _ =
  while get_next_entry stdin do
    ()
  done
The OCaml version used is 4.05.
It is a threefold problem. From the least likely to the most likely.
The glitching output
It is all in the eye of the beholder, as how the program output will look like depends on the environment in which it is run, i.e., on a program that runs your program and renders this on a visual device. In other words, it involves a lot of variables that are beyond the context of this program.
With that said, let me explain what flush means for the printf function. The printf facility relies on buffered channels, and each channel is roughly a pair of a buffer and a system-specific file descriptor. When someone (including printf) outputs to a channel, the information first goes into the buffer and remains there until the buffer fills up (i.e., there is no more space in the buffer) or until the flush function is called explicitly. Then the buffer is flushed, which means that the information in the buffer is transferred to the operating system (e.g., using the write system call or library function).
What happens afterward is system dependent. If the file descriptor was associated with a regular file, then you might expect that the information will be passed to it entirely(though the file system has its own hierarchy of caches, so there're caveats also). If the descriptor was associated with a Unix-style shell process through a pipe, then it will go into the pipe's buffer, extracted from it by the shell and printed using a terminal interface, usually fulfilled with some terminal emulator. By default shells are line-buffered, so the line should be printed as a whole unless the user of the shell changes its parameters somehow.
Basically, I hope you get the idea: it is not your program which is actually manipulating the terminal and lighting up pixels on your monitor. Your program is just outputting data, and some other program is receiving this data and drawing it on the screen. And this other program (a terminal, or terminal emulator, e.g., minicom) is making this output glitchy, not your program. Your program is doing its best to be printed correctly - full line or nothing.
Your program is glitching
And it is. The in_channel is also buffered, so it will accumulate a few bytes before fscanf returns and printf is called. Therefore, you can't just read from the buffered channel and expect a realtime response to it. The most reliable way for you would be to use the Unix module and process the input using your own buffering.
The glitching input
Finally, the input program can also give you the information in chunks. This is especially true for serial interfaces, so make sure that you have correctly set up your terminal interface using the Unix.tcsetattr function. In particular, when your program is blocked on the input, the operating system may decide not to wake it up on each arrived character or line. This behavior is controlled by the terminal interface (see the canonical and non-canonical modes; if your input doesn't have newlines, you should use the non-canonical mode).
Finally, the device itself could be jittery, and if you have an oscilloscope nearby you can observe the signals it is sending. Also make sure that you have configured your serial port as prescribed in the user manual of your device.
One possibility is that fscanf is waiting until it sees everything it's looking for.
I'm a beginner in assembly (using nasm). I'm learning assembly through a college course.
I'm trying to understand the behavior of the sys_read linux system call when it's invoked. Specifically, sys_read stops when it reads a new line or line feed. According to what I've been taught, this is true. This online tutorial article also affirms the fact/claim.
When sys_read detects a linefeed, control returns to the program and the users input is located at the memory address you passed in ECX.
I checked the Linux programmer's manual for the sys_read call (via "man 2 read"). It does not mention this behavior, though it should if the claim is true, right?
read() attempts to read up to count bytes from file descriptor fd
into the buffer starting at buf.
On files that support seeking, the read operation commences at the
file offset, and the file offset is incremented by the number of bytes
read. If the file offset is at or past the end of file, no bytes are
read, and read() returns zero.
If count is zero, read() may detect the errors described below. In
the absence of any errors, or if read() does not check for errors, a
read() with a count of 0 returns zero and has no other effects.
If count is greater than SSIZE_MAX, the result is unspecified.
So my question really is, why does the behavior happen? Is it a specification in the linux kernel that this should happen or is it a consequence of something else?
It's because you're reading from a POSIX tty in canonical mode (where backspace works before you press return to "submit" the line; that's all handled by the kernel's tty driver). Look up POSIX tty semantics / stty / ioctl. If you ran ./a.out < input.txt, you wouldn't see this behaviour.
Note that read() on a TTY will return without a newline if you hit control-d (the EOF tty control-sequence).
Assuming that read() reads whole lines is ok for a toy program, but don't start assuming that in anything that needs to be robust, even if you've checked that you're reading from a TTY. I forget what happens if the user pastes multiple lines of text into a terminal emulator. Quite probably they all end up in a single read() buffer.
See also my answer on a question about small read()s leaving unread data on the terminal: if you type more characters on one line than the read() buffer size, you'll need at least one more read system call to clear out the input.
As you noted, the read(2) libc function is just a thin wrapper around sys_read. The answer to this question really has nothing to do with assembly language, and is the same for systems programming in C (or any other language).
Further reading:
stty(1) man page: where you can change which control character does what.
The TTY demystified: some history, and some diagrams showing how xterm, the kernel, and the process reading from the tty all interact. And stuff about session management, and signals.
https://en.wikipedia.org/wiki/POSIX_terminal_interface#Canonical_mode_processing and related parts of that article.
This is not an attribute of the read() system call, but rather a property of termios, the terminal driver. In the default configuration, termios buffers incoming characters (i.e. what you type) until you press Enter, after which the entire line is sent to the program reading from the terminal. This is for convenience so you can edit the line before sending it off.
As Peter Cordes already said, this behaviour is not present when reading from other kinds of files (like regular files) and can be turned off by configuring termios.
What the tutorial says is garbage, please disregard it.
I developed my own log processing program. To process logs originating from printk(), I read from the kernel ring buffer like this:
#define _PATH_KLOG "/proc/kmsg"
CGR_INT kernelRingBufferFileDescriptor = open(_PATH_KLOG, O_RDONLY|O_NONBLOCK);
CGR_CHAR kernelLogMessage[MAX_KERNEL_RING_BUFFER + 1] = {'\0'};
while (1)
{
    ...
    read(kernelRingBufferFileDescriptor, kernelLogMessage + residueSize, MAX_KERNEL_RING_BUFFER);
    ...
}
My program runs in user space. As I remember it, whenever someone uses read() to read data from the ring buffer (like I did above), the part that is read is cleared from the ring buffer. Is that the case or not?
I am confused about this, since there is always something in the ring buffer, and as a result, my program is very busy processing all these logs. So I am not sure whether some module keeps sending logs to me or whether I read the same logs again and again because they are not cleared.
To figure it out, I use klogctl() to check the ring buffer:
CGR_CHAR buf[MAX_KERNEL_RING_BUFFER] = {0};
int byteCount = klogctl(4, buf, MAX_KERNEL_RING_BUFFER - 1); /* 4 -- Read and clear all messages remaining in the ring buffer */
printf("%s %d: data read from kernel ring buffer = \"%s\"\n",__FILE__, __LINE__, buf);
and I keep getting data all the time. Since klogctl() with argument 4 reads and clears the ring buffer, I tend to believe some module DOES keep sending logs to me.
Can anyone tell me - does read() clear ring buffer?
Become root and run cat /proc/kmsg >> File1.txt and then cat /proc/kmsg >> File2.txt. Compare File1.txt and File2.txt. You will immediately know whether the ring buffer is getting cleared on read(), because cat internally invokes read() anyway!
Also read about ring buffers and how they behave in the Kernel Documentation here-
http://www.mjmwired.net/kernel/Documentation/trace/ring-buffer-design.txt
EDIT: I found something interesting in the book Linux Device Drivers by Jonathan Corbet-
The printk function writes messages into a circular buffer that is
__LOG_BUF_LEN bytes long: a value from 4 KB to 1 MB chosen while configuring the kernel. The function then wakes any process that is
waiting for messages, that is, any process that is sleeping in the
syslog system call or that is reading /proc/kmsg. These two interfaces
to the logging engine are almost equivalent, but note that reading
from /proc/kmsg consumes the data from the log buffer, whereas the
syslog system call can optionally return log data while leaving it for
other processes as well. In general, reading the /proc file is easier
and is the default behavior for klogd. The dmesg command can be used
to look at the content of the buffer without flushing it; actually,
the command returns to stdout the whole content of the buffer, whether
or not it has already been read
So in your particular case, if you are using a plain read(), I think the buffer is indeed getting cleared and new data is being constantly written into it and hence you find some data all the time! Kernel experts can correct me here!
From reading the do_syslog function, it seems that messages are cleared when they're read.
By your description, you get the same behavior with klogctl(4), which also clears the buffer, so it makes sense.
So maybe there's indeed someone that keeps writing messages.
You can find which printk it is by the text, disable it, and see what you get. Or you can add the jiffies value to the message, so you'll know whether you keep getting new messages or whether these are the same ones.
I am rather confused with the purpose of these three files. If my understanding is correct, stdin is the file in which a program writes into its requests to run a task in the process, stdout is the file into which the kernel writes its output and the process requesting it accesses the information from, and stderr is the file into which all the exceptions are entered. On opening these files to check whether these actually do occur, I found nothing seem to suggest so!
What I would want to know is what exactly is the purpose of these files, absolutely dumbed down answer with very little tech jargon!
Standard input - this is the file handle that your process reads to get information from you.
Standard output - your process writes conventional output to this file handle.
Standard error - your process writes diagnostic output to this file handle.
That's about as dumbed-down as I can make it :-)
Of course, that's mostly by convention. There's nothing stopping you from writing your diagnostic information to standard output if you wish. You can even close the three file handles totally and open your own files for I/O.
When your process starts, it should already have these handles open and it can just read from and/or write to them.
By default, they're probably connected to your terminal device (e.g., /dev/tty) but shells will allow you to set up connections between these handles and specific files and/or devices (or even pipelines to other processes) before your process starts (some of the manipulations possible are rather clever).
An example being:
my_prog <inputfile 2>errorfile | grep XYZ
which will:
create a process for my_prog.
open inputfile as your standard input (file handle 0).
open errorfile as your standard error (file handle 2).
create another process for grep.
attach the standard output of my_prog to the standard input of grep.
Re your comment:
When I open these files in /dev folder, how come I never get to see the output of a process running?
It's because they're not normal files. While UNIX presents everything as a file in a file system somewhere, that doesn't make it so at the lowest levels. Most files in the /dev hierarchy are either character or block devices, effectively a device driver. They don't have a size but they do have a major and minor device number.
When you open them, you're connected to the device driver rather than a physical file, and the device driver is smart enough to know that separate processes should be handled separately.
The same is true for the Linux /proc filesystem. Those aren't real files, just tightly controlled gateways to kernel information.
It would be more correct to say that stdin, stdout, and stderr are "I/O streams" rather
than files. As you've noticed, these entities do not live in the filesystem. But the
Unix philosophy, as far as I/O is concerned, is "everything is a file". In practice,
that really means that you can use the same library functions and interfaces (printf,
scanf, read, write, select, etc.) without worrying about whether the I/O stream
is connected to a keyboard, a disk file, a socket, a pipe, or some other I/O abstraction.
Most programs need to read input, write output, and log errors, so stdin, stdout,
and stderr are predefined for you, as a programming convenience. This is only
a convention, and is not enforced by the operating system.
As a complement of the answers above, here is a sum up about Redirections:
EDIT: This graphic is not entirely correct.
The first example does not use stdin at all, it's passing "hello" as an argument to the echo command.
The graphic also says 2>&1 has the same effect as &> however
ls Documents ABC > dirlist 2>&1
#does not give the same output as
ls Documents ABC > dirlist &>
This is because &> requires a file to redirect to, and 2>&1 is simply sending stderr into stdout
I'm afraid your understanding is completely backwards. :)
Think of "standard in", "standard out", and "standard error" from the program's perspective, not from the kernel's perspective.
When a program needs to print output, it normally prints to "standard out". A program typically prints output to standard out with printf, which prints ONLY to standard out.
When a program needs to print error information (not necessarily exceptions, those are a programming-language construct, imposed at a much higher level), it normally prints to "standard error". It normally does so with fprintf, which accepts a file stream to use when printing. The file stream could be any file opened for writing: standard out, standard error, or any other file that has been opened with fopen or fdopen.
"standard in" is used when the program needs to read input, using fread or fgets, or getchar.
Any of these files can be easily redirected from the shell, like this:
cat /etc/passwd > /tmp/out # redirect cat's standard out to /tmp/out
cat /nonexistant 2> /tmp/err # redirect cat's standard error to /tmp/err
cat < /etc/passwd # redirect cat's standard input to /etc/passwd
Or, the whole enchilada:
cat < /etc/passwd > /tmp/out 2> /tmp/err
There are two important caveats: First, "standard in", "standard out", and "standard error" are just a convention. They are a very strong convention, but it's all just an agreement that it is very nice to be able to run programs like this: grep echo /etc/services | awk '{print $2;}' | sort and have the standard outputs of each program hooked into the standard input of the next program in the pipeline.
Second, I've given the standard ISO C functions for working with file streams (FILE * objects) -- at the kernel level, it is all file descriptors (int references to the file table) and much lower-level operations like read and write, which do not do the happy buffering of the ISO C functions. I figured to keep it simple and use the easier functions, but I thought all the same you should know the alternatives. :)
I think people saying stderr should be used only for error messages is misleading.
It should also be used for informative messages that are meant for the user running the command and not for any potential downstream consumers of the data (i.e. if you run a shell pipe chaining several commands, you do not want informative messages like "getting item 30 of 42424" to appear on stdout, as they will confuse the consumer, but you might still want the user to see them).
See this for historical rationale:
"All programs placed diagnostics on the standard output. This had
always caused trouble when the output was redirected into a file, but
became intolerable when the output was sent to an unsuspecting
process. Nevertheless, unwilling to violate the simplicity of the
standard-input-standard-output model, people tolerated this state of
affairs through v6. Shortly thereafter Dennis Ritchie cut the Gordian
knot by introducing the standard error file. That was not quite enough.
With pipelines diagnostics could come from any of several programs
running simultaneously. Diagnostics needed to identify themselves."
stdin
Reads input through the console (e.g. Keyboard input).
Used in C with scanf
scanf(<formatstring>,<pointer to storage> ...);
stdout
Produces output to the console.
Used in C with printf
printf(<string>, <values to print> ...);
stderr
Produces 'error' output to the console.
Used in C with fprintf
fprintf(stderr, <string>, <values to print> ...);
Redirection
The source for stdin can be redirected. For example, instead of coming from keyboard input, it can come from a file (cat < file.txt), or another program (ps | grep <userid>).
The destinations for stdout, stderr can also be redirected. For example stdout can be redirected to a file: ls . > ls-output.txt, in this case the output is written to the file ls-output.txt. Stderr can be redirected with 2>.
Using ps -aux reveals current processes, all of which are listed in /proc/ as /proc/(pid)/. Running cat /proc/(pid)/fd/0 reads from whatever that process has open as its standard input, I think. So perhaps,
/proc/(pid)/fd/0 - Standard Input File
/proc/(pid)/fd/1 - Standard Output File
/proc/(pid)/fd/2 - Standard Error File
for example
But this only worked well for /bin/bash; other processes generally had nothing in 0, though many had errors written in 2.
For authoritative information about these files, check out the man pages by running this command in your terminal:
$ man stdout
But for a simple answer, each file is for:
stdout for a stream out
stdin for a stream input
stderr for printing errors or log messages.
Each unix program has each one of those streams.
stderr is not fully buffered, so if your application needs to print critical messages (errors, exceptions) to the console or to a file, use stderr. Use stdout for general log output: because stdout is buffered, the application may exit abnormally before the buffered messages are written, which makes debugging harder.
A file with associated buffering is called a stream and is declared to be a pointer to a defined type FILE. The fopen() function creates certain descriptive data for a stream and returns a pointer to designate the stream in all further transactions. Normally there are three open streams with constant pointers declared in the <stdio.h> header and associated with the standard open files.
At program startup three streams are predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). When opened the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device
https://www.mkssoftware.com/docs/man5/stdio.5.asp
Here is a lengthy article on stdin, stdout and stderr:
What Are stdin, stdout, and stderr on Linux?
To summarize:
Streams Are Handled Like Files
Streams in Linux—like almost everything else—are treated as though
they were files. You can read text from a file, and you can write text
into a file. Both of these actions involve a stream of data. So the
concept of handling a stream of data as a file isn’t that much of a
stretch.
Each file associated with a process is allocated a unique number to
identify it. This is known as the file descriptor. Whenever an action
is required to be performed on a file, the file descriptor is used to
identify the file.
These values are always used for stdin, stdout, and stderr:
0: stdin
1: stdout
2: stderr
Ironically I found this question on stack overflow and the article above because I was searching for information on abnormal / non-standard streams. So my search continues.