Unexpected periodic, non-continuous output for OCaml program - io

Someone reports that when a stream of strings coming from the serial port is piped into the OCaml program below, the program's output is not continuous; instead it appears in chunks (of a few tens of lines), as if buffered.
What can be the cause of the non-continuous output?
(The output buffer should be flushed after each new line due to the use of '%!'. So this shouldn't be the cause, right?)
let tp = ref 0

let get_next_entry ic =
  try
    let (ts, pred, v) = Scanf.fscanf ic " #%d %s#(%d)\n" (fun x y z -> (x, y, z)) in
    Printf.printf "at timepoint %d (timestamp %d): %s(%d)\n%!" !tp ts pred v;
    incr tp;
    true
  with End_of_file ->
    false

let _ =
  while get_next_entry stdin do
    ()
  done
The OCaml version used is 4.05.

It is a threefold problem; let's go from the least likely cause to the most likely one.
The glitching output
It is all in the eye of the beholder: how the program's output looks depends on the environment in which it is run, i.e., on the program that runs your program and renders its output on a visual device. In other words, it involves a lot of variables that are beyond the control of this program.
With that said, let me explain what flush means for the printf function. The printf facility relies on buffered channels. Each channel is roughly a pair of a buffer and a system-specific file descriptor. When someone (including printf) outputs to a channel, the information first goes into the buffer and remains there until the next portion of information overflows the buffer (i.e., there is no more space in it) or until the flush function is called explicitly. Then the buffer is flushed, which means that the information in it is transferred to the operating system (e.g., using the write system call or library function).
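As a tiny illustration of that handoff (a self-contained sketch, not part of the poster's program):

(* Bytes written to a buffered out_channel sit in the channel's buffer until
   it fills up or until [flush] hands them to the operating system.  The "%!"
   specifier in Printf is just an implicit [flush] of that channel. *)
let () =
  output_string stdout "buffered for now...";
  flush stdout;                              (* explicit flush: bytes go to the OS *)
  Printf.printf " flushed immediately\n%!"   (* same effect, via %! *)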
What happens after the flush is system dependent. If the file descriptor is associated with a regular file, then you might expect the information to be passed to it entirely (though the file system has its own hierarchy of caches, so there are caveats there too). If the descriptor is associated with a Unix-style shell process through a pipe, then it will go into the pipe's buffer, be extracted from it by the shell, and be printed using a terminal interface, usually implemented by some terminal emulator. By default shells are line buffered, so the line should be printed as a whole, unless the user of the shell has changed its parameters somehow.
Basically, I hope you get the idea: it is not your program that is actually manipulating the terminal and lighting up pixels on your monitor. Your program just outputs data, and some other program receives this data and draws it on the screen. And it is that other program (a terminal, or terminal emulator, e.g., minicom) that is making the output glitchy, not your program. Your program is doing its best to have its output printed correctly: a full line or nothing.
Your program is glitching
And it is. The in_channel is also buffered, so it will accumulate a number of bytes before your Scanf.fscanf call ever gets to see them. Therefore, you cannot just read from a buffered channel and expect a real-time response to the input. The most reliable way is to use the Unix module and do the input buffering yourself, as sketched below.
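Here is a minimal sketch of that approach (this is not the poster's code; handle_line, the buffer sizes, and the example driver at the bottom are only illustrative, and the program must be linked against the unix library):

(* Read raw bytes from stdin via Unix.read, buffer them ourselves, and call
   [handle_line] for every complete line as soon as it becomes available. *)
let process_lines handle_line =
  let chunk = Bytes.create 4096 in
  let acc = Buffer.create 4096 in
  let rec loop () =
    let n = Unix.read Unix.stdin chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes acc chunk 0 n;
      let s = Buffer.contents acc in
      Buffer.clear acc;
      let rec emit = function
        | [] -> ()
        | [tail] -> Buffer.add_string acc tail        (* unfinished line: keep it *)
        | line :: rest -> handle_line line; emit rest
      in
      emit (String.split_on_char '\n' s);
      loop ()
    end
  in
  loop ()

let () =
  process_lines (fun line -> Printf.printf "line: %s\n%!" line)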
The glitching input
Finally, whatever feeds you the input can also deliver it in chunks. This is especially true for serial interfaces, so make sure that you have correctly set up your terminal interface using the Unix.tcsetattr function. In particular, when your program is blocked on input, the operating system may decide not to wake it up on each arriving character or line. This behavior is controlled by the terminal interface (see canonical and non-canonical modes; if your input doesn't contain newlines, you should use non-canonical mode).
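For example, a minimal sketch of switching a descriptor to non-canonical mode with Unix.tcgetattr/Unix.tcsetattr (the VMIN/VTIME choice below is only illustrative, and the descriptor must actually refer to a terminal or serial device, otherwise the calls fail with ENOTTY):

(* Put [fd] into non-canonical mode: bytes are delivered to read as soon as
   they arrive instead of being held back until a whole line is assembled. *)
let set_noncanonical fd =
  let tio = Unix.tcgetattr fd in
  tio.Unix.c_icanon <- false;   (* no line assembly in the driver *)
  tio.Unix.c_echo   <- false;
  tio.Unix.c_vmin   <- 1;       (* read returns once at least one byte is there *)
  tio.Unix.c_vtime  <- 0;       (* ... with no inter-byte timeout *)
  Unix.tcsetattr fd Unix.TCSANOW tio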
Beyond that, the device itself could be jittery; if you have an oscilloscope nearby, you can observe the signals it is sending. Also make sure that you have configured your serial port as prescribed in the user manual of your device.

One possibility is that fscanf is waiting until it sees everything it's looking for.
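You can see the same thing with a much smaller format string (a standalone sketch, unrelated to the poster's exact format): run it from a terminal and nothing is printed until both numbers have been entered, because Scanf only calls the receiver function once the whole format has been matched.

(* Scanf blocks until every conversion in the format has been satisfied. *)
let () =
  Scanf.scanf " %d %d" (fun a b -> Printf.printf "got %d and %d\n%!" a b)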

Related

What buffers are between commands in a pipeline?

I thought it was 1 buffer, but now it occurs to me that it might be 2.
I mean in a pipeline:
cmd1 | cmd2
cmd2's output might be, e.g., line buffered, and there is no pipe on that side. This should be the buffer managed by the libc FILE * stream functions like fwrite(); or is this buffer also used by write(2)? However, I just remembered that pipe(7) talks about the size of the pipe buffer, which is apparently controlled in the kernel.
Is stdin buffered, too? Are there 3 buffers, 1 in kernel space and 2 in user space?
I previously thought that read(2) hung when the pipe buffer was empty, but when stdin is not a pipe but rather a terminal, there is no pipe buffer, right? If it doesn't have its own buffer, does it check different buffers depending on whether stdin is a pipe, a terminal, or a regular file?
EDIT: Changed "how many" to "what" in the question. I hope that's not too big a change. I'm interested in knowing what the buffers are, not just a number.
Is stdin buffered, too? Are there 3 buffers, 1 in kernel space and 2 in user space?
Yes, in general there can be 3 buffers here: a stdio buffer on cmd1's stdout, a stdio buffer on cmd2's stdin (both in user space), and the pipe buffer in kernel space.
I previously thought that read(2) hung when the pipe buffer was empty, but when stdin is not a pipe but rather a terminal, there is no pipe buffer, right?
The read system call blocks when there is no input, but that has nothing to do with stdio buffering.
The kernel buffer exists regardless of whether the input is from a terminal or not (it would be very inefficient for the kernel to transfer one character at a time).
By default the stdio library will not buffer terminal input, but the application can change that with explicit calls to e.g. setvbuf.
A blog post with more details is here.

Linux tty flip buffer lock when reading part of available data

I have a driver that builds on the new serdev bus in the Linux kernel.
In my driver I receive messages from an external device; all messages end with a null byte (0x00), and the protocol (COBS) ensures that there are no null bytes in the payload. Now I try to have the TTY layer hand me full messages by scanning for zeros in my input, and if there are none I just return zero from the callback that the tty layer calls when bytes are available.
This kind of works, or rather it works for some messages. After a while, though, it locks up and the tty layer keeps reporting the same number of received bytes indefinitely. My guess is that this happens when one half of the tty flip buffer is full and the rest of my message is in the other half.
I have two questions:
Am I correct in that the tty layer can "hang" until I read out all data in one half of the flip buffer?
If that is so, is there some way to prevent this from happening? I'd rather not implement my own buffering scheme on top of the tty buffer already available.
Thanks
Judging from drivers/tty/tty_buffer.c and the function flush_to_ldisc, it looks like it is not possible to do what I attempted. When the tty buffer is about to flip over, the consumer has to do a read and buffer any half-received messages itself.
That is, returning zero and hoping for a larger chunk of data in your callback next time only works up until the end of the first part of the buffer; after that, the remaining data must be read out.
This is not a problem in user space, because a read call takes an argument giving the maximum number of bytes you want, but read is free to return fewer bytes than requested.
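The user-space side of that point can be sketched like this (a hedged example, not related to the kernel driver above; handle_frame and the chunk size are illustrative): read whatever is available and only act once a complete, zero-terminated frame has been accumulated.

(* read may hand us any prefix of a message, so keep a buffer of our own and
   only act on complete, zero-terminated frames. *)
let read_frames fd handle_frame =
  let chunk = Bytes.create 256 in
  let acc = Buffer.create 256 in
  let rec loop () =
    let n = Unix.read fd chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      for i = 0 to n - 1 do
        (match Bytes.get chunk i with
         | '\000' ->
           handle_frame (Buffer.contents acc);   (* a complete frame *)
           Buffer.clear acc
         | c -> Buffer.add_char acc c)
      done;
      loop ()
    end
  in
  loop ()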

Why does the sys_read system call end when it detects a new line?

I'm a beginner in assembly (using nasm). I'm learning assembly through a college course.
I'm trying to understand the behavior of the sys_read Linux system call when it's invoked. Specifically, sys_read stops when it reads a newline (line feed). According to what I've been taught, this is true. This online tutorial article also affirms the claim.
When sys_read detects a linefeed, control returns to the program and the users input is located at the memory address you passed in ECX.
I checked the Linux programmer's manual for the sys_read call (via "man 2 read"). It does not mention this behavior, even though it's supposed to, right?
read() attempts to read up to count bytes from file descriptor fd
into the buffer starting at buf.
On files that support seeking, the read operation commences at the
file offset, and the file offset is incremented by the number of bytes
read. If the file offset is at or past the end of file, no bytes are
read, and read() returns zero.
If count is zero, read() may detect the errors described below. In
the absence of any errors, or if read() does not check for errors, a
read() with a count of 0 returns zero and has no other effects.
If count is greater than SSIZE_MAX, the result is unspecified.
So my question really is, why does the behavior happen? Is it a specification in the linux kernel that this should happen or is it a consequence of something else?
It's because you're reading from a POSIX tty in canonical mode (where backspace works before you press return to "submit" the line; that's all handled by the kernel's tty driver). Look up POSIX tty semantics / stty / ioctl. If you ran ./a.out < input.txt, you wouldn't see this behaviour.
Note that read() on a TTY will return without a newline if you hit control-d (the EOF tty control-sequence).
Assuming that read() reads whole lines is ok for a toy program, but don't start assuming that in anything that needs to be robust, even if you've checked that you're reading from a TTY. I forget what happens if the user pastes multiple lines of text into a terminal emulator. Quite probably they all end up in a single read() buffer.
See also my answer on a question about small read()s leaving unread data on the terminal: if you type more characters on one line than the read() buffer size, you'll need at least one more read system call to clear out the input.
As you noted, the read(2) libc function is just a thin wrapper around sys_read. The answer to this question really has nothing to do with assembly language, and is the same for systems programming in C (or any other language).
Further reading:
stty(1) man page: where you can change which control character does what.
The TTY demystified: some history, and some diagrams showing how xterm, the kernel, and the process reading from the tty all interact. And stuff about session management, and signals.
https://en.wikipedia.org/wiki/POSIX_terminal_interface#Canonical_mode_processing and related parts of that article.
This is not an attribute of the read() system call, but rather a property of termios, the terminal driver. In the default configuration, termios buffers incoming characters (i.e. what you type) until you press Enter, after which the entire line is sent to the program reading from the terminal. This is for convenience so you can edit the line before sending it off.
As Peter Cordes already said, this behaviour is not present when reading from other kinds of files (like regular files) and can be turned off by configuring termios.
What the tutorial says is garbage; please disregard it.
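If you want to observe this for yourself without writing any assembly, here is a small sketch in OCaml (any language with a thin wrapper around read would do) that prints the size of each raw read on stdin. Run it from a terminal and each line you type arrives as one read; redirect a file into it and you get large block-sized reads instead.

(* Print how many bytes each raw read on stdin returns. *)
let () =
  let buf = Bytes.create 65536 in
  let rec loop () =
    let n = Unix.read Unix.stdin buf 0 (Bytes.length buf) in
    if n > 0 then begin
      Printf.printf "read returned %d byte(s)\n%!" n;
      loop ()
    end
  in
  loop ()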

How to read from STDIN_FILENO even if terminal is closed?

I'm trying to write a program on Linux that reads every input from the keyboard, but using STDIN_FILENO it only reads what is entered in the terminal. What I want is for it to keep reading the keyboard during execution even if the terminal is closed.
STDIN_FILENO is just a helper macro.
From stdin you receive a stream of bytes that is passed to your program; it doesn't necessarily come from a terminal, it can also come from a file, etc. It is not capturing the keyboard. The terminal captures the keyboard and then passes the entered data to your program's stdin.
In order to capture the keyboard you will need some other method of receiving events. I guess you are running a GUI, i.e., an X server; normally applications create windows and receive events related to them. In order to capture all keyboard events, you will have to go more low-level. Take a look at xlib, which should be sufficient for you, even if it is not the most convenient option.

Serial port determinism

This seems like a simple question, but it is difficult to search for. I need to interface with a device over the serial port. In the event my program (or another) does not finish writing a command to the device, how do I ensure the next run of the program can successfully send a command?
Example:
The foo program runs and begins writing "A_VERY_LONG_COMMAND"
The user terminates the program, but the program has only written, "A_VERY"
The user runs the program again, and the command is resent. Except, the device sees "A_VERYA_VERY_LONG_COMMAND," which isn't what we want.
Is there any way to make this more deterministic? Serial port programming feels very out-of-control due to issues like this.
The required method depends on the device.
Serial ports have additional control signal lines as well as the serial data line; perhaps one of them will reset the device's input. I've never done serial port programming but I think ioctl() handles this.
There may be a single byte which will reset, e.g. some control character.
There might be a timing-based signal, e.g. Hayes command set modems use “pause +++ pause”.
It might just reset after not receiving a complete command after a fixed time.
It might be useful to know whether the device was originally intended to support interactive use (serial terminal), control by a program, or both.
I would guess that if you call write("A_VERY_LONG_COMMAND"), and then the user hits Ctrl+C while the bytes are going out on the line, the driver layer should finish sending the full buffer. And if the user interrupts in the middle of the call, the driver layer will probably just ignore the whole thing.
Just in case, when you open a new COM port, it's always wise to clear the port.
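A hedged sketch of what "clearing the port" can look like with the OCaml Unix module (the device path is only an example): right after opening, discard anything already queued in either direction.

(* Open the serial device and drop any stale bytes queued in the driver. *)
let open_clean_port path =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_NOCTTY] 0 in
  Unix.tcflush fd Unix.TCIOFLUSH;   (* discard pending input and output *)
  fd

let () = ignore (open_clean_port "/dev/ttyUSB0")   (* hypothetical device path; real code keeps the fd *)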
Do you have control over the device end? It might make sense to implement a timeout to make the device ignore unfinished or otherwise corrupt packets.
The embedded device should be implemented such that you can either send an abort/clear/break character that will dump the contents of its command buffer and give you a clean slate on your client app startup.
Or else it should provide a software reset character which will reset the command buffer and all state.
Or else it should be designed so that you can send a command terminator (perhaps a newline, etc., depending on the command protocol), possibly have an error generated when the garbled partial command left in its buffer is parsed, query/clear the error, and then be good to go.
It wouldn't be a bad idea, upon connection of your client program, to send some health/status/error query repeatedly until you get a sound response, and only then start sending configuration or operation commands (a sketch of this follows below). Unless you can determine via a query that the device was left in a suitable state, you probably want to assume nothing and configure it from scratch, after a configuration reset if one is available.
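One way this "query until you get a sound response" idea could look in OCaml (the query string, timeout, retry count and looks_ok check are all illustrative, not part of any real protocol):

(* Repeatedly send a status query and wait briefly for an answer; give up after
   a few attempts.  [looks_ok] stands for whatever validity check the real
   protocol requires. *)
let sync_with_device fd ~query ~looks_ok =
  let buf = Bytes.create 256 in
  let rec attempt tries =
    if tries = 0 then false
    else begin
      ignore (Unix.write_substring fd query 0 (String.length query));
      match Unix.select [fd] [] [] 0.5 with        (* wait up to 500 ms *)
      | [_], _, _ ->
        let n = Unix.read fd buf 0 (Bytes.length buf) in
        if n > 0 && looks_ok (Bytes.sub_string buf 0 n)
        then true
        else attempt (tries - 1)
      | _ -> attempt (tries - 1)
    end
  in
  attempt 5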

Resources