Logging RS232 on Linux without waiting for newline

I was trying to log data from RS232 into a file with cat:
cat /dev/ttyS0 > rs232.log
The result was that I had everything in my file except for the last line.
By printing to stdout, I discovered that cat only writes output once it receives a newline character ('\n'). I saw the same behavior with:
dd bs=1 if=/dev/ttyS0 of=rs232.log
After reading How can I print text immediately without waiting for a newline in Perl? I started to wonder whether this could be a buffering problem in either the Linux kernel or the coreutils package.
Following TJD's comment, I wrote my own C program, but it showed the same behavior:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* args[])
{
    int buffer;                    /* int so that EOF can be detected */
    FILE* serial;

    serial = fopen(args[1], "r");
    if (serial == NULL)
        return 1;

    while (1)
    {
        buffer = fgetc(serial);
        printf("%c", buffer);
    }
}
Judging by the results of my own C code, this seems to be a Linux kernel related issue.

You're opening a TTY. When that TTY is in cooked (aka canonical) mode, it performs line processing (e.g. backspace removes the previous character from the buffer). You'll want to put the TTY into raw mode in order to get every single byte when it arrives instead of waiting for the end of line.
From the man page:
Canonical and noncanonical mode

The setting of the ICANON canon flag in c_lflag determines whether the terminal is operating in canonical mode (ICANON set) or noncanonical mode (ICANON unset). By default, ICANON is set.

In canonical mode:

* Input is made available line by line. An input line is available when one of the line delimiters is typed (NL, EOL, EOL2; or EOF at the start of line). Except in the case of EOF, the line delimiter is included in the buffer returned by read(2).

* Line editing is enabled (ERASE, KILL; and if the IEXTEN flag is set: WERASE, REPRINT, LNEXT). A read(2) returns at most one line of input; if the read(2) requested fewer bytes than are available in the current line of input, then only as many bytes as requested are read, and the remaining characters will be available for a future read(2).

In noncanonical mode input is available immediately (without the user having to type a line-delimiter character), and line editing is disabled.
The simplest thing to do is just to call cfmakeraw.
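A minimal C sketch of that approach (the device path defaults to /dev/ttyS0 as in the question; error handling kept short): it opens the serial device, switches it to raw mode with cfmakeraw(), and writes every byte to stdout as soon as it arrives.

#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/ttyS0";
    int fd = open(dev, O_RDONLY | O_NOCTTY);
    if (fd < 0) { perror("open"); return 1; }

    struct termios tio;
    if (tcgetattr(fd, &tio) < 0) { perror("tcgetattr"); return 1; }
    cfmakeraw(&tio);                              /* clears ICANON, ECHO, ISIG, ... */
    if (tcsetattr(fd, TCSANOW, &tio) < 0) { perror("tcsetattr"); return 1; }

    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)   /* returns as soon as bytes arrive */
        write(STDOUT_FILENO, buf, n);

    close(fd);
    return 0;
}

Redirect stdout to your log file, e.g. ./rawlog /dev/ttyS0 > rs232.log (rawlog is just a placeholder name), to get the behavior the question asks for.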

Does this work?
perl -e 'open(IN, "/dev/ttyS0") || die; while (sysread(IN, $c, 1)) { print "$c" }'
This DOES work:
$ echo -n ccc|perl -e 'while (sysread(STDIN, $c, 1)) { print "$c" } '
ccc$

Related

Understanding file descriptor duplication in bash

I'm having a hard time understanding something about redirections in bash.
I'll start with what I know:
Each process has file descriptors opened which it can write to/read from. These file descriptors may represent files on disk, terminals, devices, etc.
When we start a terminal with bash, we have file descriptors stdin (0), stdout (1), and stderr (2) opened, pointing to the terminal. Whenever we run a command (a new process), that process inherits the file descriptors of its parent (bash), so by default it will print stdout and stderr messages to the terminal, and read from the terminal as well.
When we redirect, for example:
$ ls 1>filelist
We're actually changing file descriptor 1 of the ls process, to point to the filelist file, instead of the terminal. So when ls will write(1, ...) it will go to the file.
So to sum it up, a redirection basically changes which file a given file descriptor refers to when the program reads from or writes to it.
Now, let's say I have the following C program:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = 0;
    fd = open("info.log", O_CREAT | O_RDWR);
    printf("%d", fd);
    write(fd, "INFO::", 6);
    return 0;
}
This program opens a file info.log, which is referred to by a file descriptor (usually 3).
Indeed, if I now compile this program and run it:
$ ./app
3
It creates the file info.log which contains the "INFO::" text in it.
But here's what I don't get: according to the logic described above, if I now redirect FD 3 to another file:
$ ./app 3> another_file
The text should be written to this other file, but for some reason it isn't.
Can someone explain?
Hint: when you run ./app 3> another_file, it'll print "4" instead of "3".
More detailed explanation: when you run ./app 3> another_file in the shell, a series of things happens:
The shell fork()s a subprocess that'll run ./app. The subprocess is basically a clone of its parent process, so it'll still be running the shell program.
In that subprocess, the shell opens "another_file" on file descriptor #3 for writing.
Then it uses one of the execl() family of calls to execute the ./app binary (with "another_file" still open on FD#3).
The program runs open("info.log", O_CREAT | O_RDWR), which creates "info.log" and opens it on the next available file descriptor. Since FD#3 is already in use, that's FD#4.
The program writes "INFO::" to FD#4, which is "info.log".
Since open() uses a new FD, it's not really affected by any active redirects. And actually, if the program did open something on FD#3, that'd replace the connection to "another_file" with whatever it had opened instead, essentially overriding the redirect.
If the program wanted to use the redirect, it'd have to write to FD#3 without first opening anything on it. This is what's normally done with FD#1 and 2 (standard output and error), and that's why redirecting those works.
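To make that concrete, here is a minimal hypothetical sketch (call it app2; the name is not from the question) that writes to FD 3 without opening anything on it, so the shell redirect decides where the bytes go:

#include <unistd.h>

int main(void)
{
    /* FD 3 must already be open, e.g. via the shell redirect: ./app2 3> another_file */
    if (write(3, "INFO::", 6) < 0)
        return 1;   /* fails with EBADF if nothing was redirected onto FD 3 */
    return 0;
}

Run it as ./app2 3> another_file and the text lands in another_file; run it without the redirect and the write() simply fails.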

sending non-printable characters in expect script

I found that when I use certain bytes as input to a program in an expect script, bytes above 0x7f are automatically converted to multibyte sequences. For example, the following line in the script:
spawn ./myprog [exec perl -e { print "\x7f\x80" }]
sends actually three instead of two bytes to myprog: 0x7f 0xc2 0x80
myprog is a simple test program that prints the input it gets:
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
    int i;
    for (i = 0; i < strlen(argv[1]); i++) {
        printf("%x\n", (unsigned char)argv[1][i]);
    }
    return 0;
}
I understand that 0x7f is the magic boundary to unicode-related encodings, but how can I just send a byte like 0x80 to my program? In the expect script I already tried conversions like [encoding convertto iso8859-1 [exec perl ...]] described in https://www.tcl.tk/doc/howto/i18n.html, but nothing works.
On the other hand, when I do the identical thing on the command line, e.g.:
./myprog `perl -e 'print "\x7f\x80"'`
I do get only two bytes, as expected (the {} in the expect script line, compared to the '' here, is Tcl's equivalent quoting).
How can I force the same behavior in an expect script?
After some more experimentation, I found that the only way to do that is to have the argument handover outside the expect logic, e.g.:
set input [binary format H* 7f80]
exec echo "$input" > input.dat
spawn sh -c "./myprog `cat input.dat`"
Note that using ${...} instead of backticks does not seem to work easily due to the special meaning of $ to expect.
Of course, having spawned a shell instead of the process directly is not the same thing but that does not matter for most of my use cases.

Bash: Can't append a string to an existing string - it appears to overwrite the start of that string instead [duplicate]

This question already has answers here:
Are shell scripts sensitive to encoding and line endings?
(14 answers)
Closed 8 months ago.
This is a script that helps us in creating a dhcpd.conf file.
sample inputs (ex. mac-tab-;-tab-IP)
DC:D3:21:75:61:90 ; 10.25.131.17
;
expected outputs
Host 27-48 { hardware ethernet DC:D3:21:75:61:90 ; fixed-address 10.25.131.17 ; }
#host 27-48 { hardware ethernet ; fixed-address ; }
Currently the line outputted is this:
Host 27-48 { hardware ethernet 00:16:6B:C8:3D:C9 ; fixed-address 10.25.129.185
Specific line in code I'm stuck on
outputLine="Host $((names[i]))-$((startingNumber+counter)) { hardware ethernet $first ; fixed-address $second"
If I try adding ; }
outputLine="Host $((names[i]))-$((startingNumber+counter)) { hardware ethernet $first ; fixed-address $second ; }"
I get this:
; } 27-48 { hardware ethernet 00:16:6B:C8:3D:C9 ; fixed-address 10.25.129.185
The issue is, whenever I append " ; }" to the end of the above line, it overwrites the beginning of the line. I've tried a few tricks to work around it, such as writing the above line to a string, and then trying to append to the string, but the same issue occurs. I had an idea to export the entire contents to a file, and re-reload the file into an array just so I can append, but it seems a little overkill.
for ((j=1; j<=${sizes[i]}; j++ )); do
    #split line, read split as two entries for an arrIN
    IN=(${line[counter+1]})
    arrIN=(${IN//;/ })
    first="${arrIN[0]}"
    second=${arrIN[1]}
    if [ ${lineSize[counter+1]} -gt 5 ]
    then
        #sed 's/$/ ; }/' $outputLine > newoutputLine
        outputLine="Host $((names[i]))-$((startingNumber+counter)) { hardware ethernet $first ; fixed-address $second"
        echo $outputLine
    else
        echo "#host $((names[i])) $((startingNumber+counter)) { hardware ethernet ; fixed-address ; }"
    fi
    counter=$((counter+1))
done
As ruakh explains in a comment on the question, the problem was a CR (0xD, \r) character at the end of the value of variable $second, which can be removed with the following parameter expansion: second="${second//$'\r'/}".
The rest of this answer explains the symptom and provides background information.
The issue is, whenever I append " ; }" to the end of the above line, it overwrites the beginning of the line.
"overwriting the beginning of a line" almost always points to an embedded CR (0xD, \r) character, which, when a string is printed to the terminal, gives the appearance of overwriting the start of the line:
$ printf 'abc\rd' # `printf '\r'` produces a CR
dbc # It LOOKS LIKE 'd' replaced 'a', but that's an artifact of printing to the terminal.
It is only because the terminal interprets the CR (\r) as "place the cursor at the start of the line" that d - the remaining string after \r - appears to "overwrite" the start of the already-printed part of the string.
You can visualize embedded CRs (and other control characters) with cat -et, which represents them in caret notation as ^M:
$ printf 'abc\rd' | cat -et
abc^Md # ^M represents a CR (\r)
As you can see, the d didn't actually overwrite a, the start of the string.
CR (\r) instances are rarely used in the world of Unix text processing. If they do appear as part of text data, it is usually from Windows-originated sources that use CRLF (\r\n) line endings rather than the Unix LF-only (\n) line endings.
Often, the simplest approach is to simply remove the CR (\r) instances before using the data with Unix utilities.
There are many, many existing answers that cover this territory, often recommending use of the third-party utility dos2unix, easily installable via the package managers of many platforms (e.g., sudo apt install dos2unix on Ubuntu, or brew install dos2unix on macOS, via Homebrew).
Alternatively,
for text already stored in a variable, use ruakh's approach based on parameter expansion; e.g., second="${second//$'\r'/}"
for files, solutions using standard utilities can act as a makeshift dos2unix.
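As a sketch of that "makeshift dos2unix" idea (not a replacement for the real tool), a tiny C filter can copy stdin to stdout and drop every CR byte:

#include <stdio.h>

int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        if (c != '\r')          /* drop CRs; every other byte passes through */
            putchar(c);
    return 0;
}

Compile it and run it as, say, ./stripcr < input.txt > clean.txt (the file names are placeholders).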

Writing Block buffered data to a File without fflush(stdout)

From what I understand about buffers: a buffer is temporarily stored data.
For example: let's assume that you wanted to implement an algorithm for determining whether something is speech or just noise. How would you do this using a constant stream of sound data? It would be very difficult. Therefore, by storing this data into an array you can perform analysis on it.
This array of data is called a buffer.
Now, I have a Linux command where the output is continuous:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0
If I were to write the output of this command to a file, nothing would get written, because the output is probably block buffered: only an empty text file is generated when I terminate the command (CTRL+C).
Here is what I mean by block buffered.
The three types of buffering available are unbuffered, block buffered, and line buffered. When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered many characters are saved up and written as a block; when it is line buffered characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin). The function fflush(3) may be used to force the block out early. (See fclose(3).) Normally all files are block buffered. When the first I/O operation occurs on a file, malloc(3) is called, and a buffer is obtained. If a stream refers to a terminal (as stdout normally does) it is line buffered. The standard error stream stderr is always unbuffered by default.
Now, executing this command,
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0 > outputfile.txt
An empty file is generated because the buffer block may not have been completed when I terminated the process, and since I don't know the block buffer size, there is no way to wait for the block to complete.
In order to write the output of this command to a file, I have to use fflush() inside awk, which does write the output to the text file; I have already done this successfully.
Here it goes:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/GGA/ {print "Latitude:",$3,$4,"Longitude:",$5,$6,"Altitude:",$10,$11,"Time:",substr($2+50000,1,2),".",substr($2,3,2),".",substr($2,5,2); fflush(stdout) }' < /dev/ttyUSB0 | head -n 2 > GPS_data.txt
But my question is:
Is there any way to set the buffer block size so that I know when a buffer block is generated, eliminating the need for fflush()?
OR
Is there any way to change the buffering type from block buffered to unbuffered or line buffered?
You can use stdbuf to run a command with a modified buffer size.
For example, stdbuf -o 100 awk ... will run awk with a 100 byte standard output buffer.
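stdbuf achieves something similar for dynamically linked programs by preloading a small library that adjusts the stdio buffering before main() runs; a program can also change its own buffering directly with setvbuf(3). A minimal sketch (the message is just for illustration):

#include <stdio.h>

int main(void)
{
    /* Switch stdout to line buffering: output is flushed at every '\n',
       even when stdout is a file or a pipe.  Use _IONBF for unbuffered,
       or _IOFBF with an explicit size for a chosen block size. */
    setvbuf(stdout, NULL, _IOLBF, BUFSIZ);

    printf("this line reaches the output file immediately\n");
    return 0;
}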

Linux terminal input: reading user input from terminal truncating lines at 4095 character limit

In a bash script, I try to read lines from standard input using the built-in read command after setting IFS=$'\n'. The lines are truncated at a 4095-character limit if I paste input into read. This limitation seems to come from reading from the terminal, because this worked perfectly fine:
fill=
for i in $(seq 1 94); do fill="${fill}x"; done
for i in $(seq 1 100); do printf "%04d00$fill" $i; done | (read line; echo $line)
I experience the same behavior with a Python script (it does not accept input longer than 4095 characters from the terminal, but does from a pipe):
#!/usr/bin/python
from sys import stdin
line = stdin.readline()
print('%s' % line)
Even a C program behaves the same, using read(2):
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[32768];
    int sz = read(0, buf, sizeof(buf) - 1);
    buf[sz] = '\0';
    printf("READ LINE: [%s]\n", buf);
    return 0;
}
In all cases, I cannot enter longer than about 4095 characters. The input prompt stops accepting characters.
Question-1: Is there a way to interactively read from terminal longer than 4095 characters in Linux systems (at least Ubuntu 10.04 and 13.04)?
Question-2: Where does this limitation come from?
Systems affected: I noticed this limitation in Ubuntu 10.04/x86 and 13.04/x86, but Cygwin (recent version at least) does not truncate yet at over 10000 characters (did not test further since I need to get this script working in Ubuntu). Terminals used: Virtual Console and KDE konsole (Ubuntu 13.04) and gnome-terminal (Ubuntu 10.04).
Please refer to termios(3) manual page, under section "Canonical and noncanonical mode".
Typically, the terminal (standard input) is in canonical mode; in this mode the kernel will buffer the input line before returning the input to the application. The hard-coded limit for Linux (N_TTY_BUF_SIZE defined in ${linux_source_path}/include/linux/tty.h) is set to 4096 allowing input of 4095 characters not counting the ending new line. You can also have a look at file ${linux_source_path}/drivers/tty/n_tty.c, function n_tty_receive_buf_common() and the comment above that.
In noncanonical mode there is by default no buffering by the kernel, and the read(2) system call returns as soon as a single character of input is available (a key is pressed). You can manipulate the terminal settings to read a specified number of characters or set a time-out for non-canonical mode, but there too the hard-coded limit is 4095 per the termios(3) manual page (and the comment above the above-mentioned n_tty_receive_buf_common()).
Bash read builtin command still works in non-canonical mode as can be demonstrated by the following:
IFS=$'\n' # Allow spaces and other white spaces.
stty -icanon # Disable canonical mode.
read line # Now we can read without inhibitions set by terminal.
stty icanon # Re-enable canonical mode (assuming it was enabled to begin with).
After adding stty -icanon you can paste a string longer than 4096 characters and read it successfully using the bash built-in read command (I successfully tried longer than 10000 characters).
If you put this in a file, i.e. make it a script, you can use strace to see the system calls called, and you will see read(2) called multiple times, each time returning a single character when you type input to it.
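The same trick can be done from C by clearing ICANON with tcsetattr(3) before reading, along the lines of the C program in the question. A minimal sketch (buffer size and output format are just for illustration):

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
    struct termios saved, raw;

    tcgetattr(STDIN_FILENO, &saved);          /* remember current settings */
    raw = saved;
    raw.c_lflag &= ~ICANON;                   /* non-canonical: no 4095-byte line limit */
    raw.c_cc[VMIN] = 1;                       /* read(2) returns after at least 1 byte */
    raw.c_cc[VTIME] = 0;
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    char buf[32768];
    ssize_t total = 0, n;
    while (total < (ssize_t)sizeof(buf) - 1 &&
           (n = read(STDIN_FILENO, buf + total, sizeof(buf) - 1 - total)) > 0) {
        total += n;
        if (buf[total - 1] == '\n')           /* stop at the end of the pasted line */
            break;
    }
    buf[total] = '\0';

    tcsetattr(STDIN_FILENO, TCSANOW, &saved); /* restore the terminal settings */
    printf("READ %zd BYTES\n", total);
    return 0;
}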
I do not have a workaround for you, but I can answer question 2.
In Linux, PIPE_BUF is set to 4096 (in limits.h). If you do a write of more than 4096 bytes to a pipe, it will be truncated.
From /usr/include/linux/limits.h:
#ifndef _LINUX_LIMITS_H
#define _LINUX_LIMITS_H
#define NR_OPEN 1024
#define NGROUPS_MAX 65536 /* supplemental group IDs are available */
#define ARG_MAX 131072 /* # bytes of args + environ for exec() */
#define LINK_MAX 127 /* # links a file may have */
#define MAX_CANON 255 /* size of the canonical input queue */
#define MAX_INPUT 255 /* size of the type-ahead buffer */
#define NAME_MAX 255 /* # chars in a file name */
#define PATH_MAX 4096 /* # chars in a path name including nul */
#define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */
#define XATTR_NAME_MAX 255 /* # chars in an extended attribute name */
#define XATTR_SIZE_MAX 65536 /* size of an extended attribute value (64k) */
#define XATTR_LIST_MAX 65536 /* size of extended attribute namelist (64k) */
#define RTSIG_MAX 32
#endif
The problem is definitely not read() itself, since it can read up to any valid integer count. The problem comes from heap memory or the pipe size, as those are the only possible limiting factors.
