How can I select() (ie, simultaneously read from) standard input *and* a file in bash? - linux

I have a program that accepts input on one FIFO and emits output to another FIFO. I want to write a small script to control this program. The script needs to listen both to standard input (so I can input commands to adjust things in real time) and the program's output FIFO (so it can respond to events happening there as well).
Essentially my control program needs to select between standard input and a file (my FIFO).
I like learning how to figure out how to develop simple and elegant bash-based solutions to complex problems, and after a little headscratching I remembered that that tail -f will happily select on multiple files and tell you when one of them changes in real time, so I initially tried
tail -f <(od -An -vtd1 -w1) <(cat fifo)
to read both standard input (I'd previously run stty icanon min 1; this od invocation shows each stdin character on a separate line alongside its ASCII code, and is great for escape sequence parsing) and my FIFO. This failed epically (as does cat <(cat)): od gets run here as a backgrounded task, so it doesn't get access to the controlling TTY, and fails with a cryptic "I/O error" that was explained incredibly well here.
So now I'm a bit stumped. I realize that I can use any scripting language like Perl/Python/Ruby/Tcl to solve this; my compsci/engineering question is whether/how I might be able to solve this using (Linux) shell scripting.

Related

Linux - read or collect file content faster (e.g. cpu temp every sec.)

I'm working on a system on which ubuntu is running. I'm reading basic data like CPU frequency and temperature out of the thermal zones provided in /sys/class/thermal.
Unfortunately, I've got around 100 thermal_zones from which I need to read the data. I do it with:
for SENSOR_NODE in /sys/class/thermal/thermal_zone*; do printf "%s: %s\n" $(cat ${SENSOR_NODE}/type) $(cat ${SENSOR_NODE}/temp); done
To collect all data takes ~2.5-3 sec. which is way to long.
Since I want to collect the data every second my question is, if there is a way to "read" or "collect" the data faster?
Thank you in advance
There's only so much you can do while writing your code in shell, but let's start with the basics.
Command substitutions, $(...), are expensive: They require creating a FIFO, fork()ing a new subprocess, connecting the FIFO to that subprocess's stdout, reading from the FIFO and waiting for the commands running in that subshell to exit.
External commands, like cat, are expensive: They require linking and loading a separate executable; and when you run them without exec (in which case they inherit and consume the shell's process ID), they also require a new process to be fork()ed off.
All POSIX-compliant shells give you a read command:
for sensor_node in /sys/class/thermal/thermal_zone*; do
read -r sensor_type <"$sensor_node/type" || continue
read -r sensor_temp <"$sensor_node/temp" || continue
printf '%s: %s\n' "$sensor_type" "$sensor_temp"
done
...which lets you avoid the command substitution overhead and the overhead of cat. However, read reads content only one byte at a time; so while you're not paying that overhead, it's still relatively slow.
If you switch from /bin/sh to bash, you get a faster alternative:
for sensor_node in /sys/class/thermal/thermal_zone*; do
printf '%s: %s\n' "$(<"$sensor_node/type)" "$(<"$sensor_node/temp")"
done
...as $(<file) doesn't need to do the one-byte-at-a-time reads that read does. That's only faster for being bash, though; it doesn't mean it's actually fast. There's a reason modern production monitoring systems are typically written in Go or with a JavaScript runtime like Node.

Program based on pipe in Linux

Write a program for this------>>>>>>
One program will open a pipe, write a number to pipe.
Other program will open the same pipe, will read the number and print them.
Close both the pipes.
how can i write a program based on this any one knows it then please help me...!!!!
I think what you're looking for is:
echo <number you want to use> (or output from program) | <program you want to pipe into>
For instance:
echo 5 | more
which will simply display:
5
The "|" is your pipe; it redirects output from the left to the right connecting their standard streams, which usually does not include stderr.
Hope that helps.
A pipe is probably the simplest IPC solution under Linux; so talking about a pipe i like talking about a specific interprocess communication solution.
The IPC lives in the kernel space and it's managed by the kernel itself, works in 1 direction and only between a caller and a callee, it's unidirectional.
for more you should just read a good article about the pipes and the IPC under Linux, you will found a gozillion of articles on the internet, for a short example you can go here.

Time taken by `less` command to show output

I have a script that produces a lot of output. The script pauses for a few seconds at point T.
Now I am using the less command to analyze the output of the script.
So I execute ./script | less. I leave it running for sufficient time so that the script would have finished executing.
Now I go through the output of the less command by pressing Pg Down key. Surprisingly while scrolling at the point T of the output I notice the pause of few seconds again.
The script does not expect any input and would have definitely completed by the time I start analyzing the output of less.
Can someone explain how the pause of few seconds is noticable in the output of less when the script would have finished executing?
Your script is communicating with less via a pipe. Pipe is an in-memory stream of bytes that connects two endpoints: your script and the less program, the former writing output to it, the latter reading from it.
As pipes are in-memory, it would be not pleasant if they grew arbitrarily large. So, by default, there's a limit of data that can be inside the pipe (written, but not yet read) at any given moment. By default it's 64k on Linux. If the pipe is full, and your script tries to write to it, the write blocks. So your script isn't actually working, it stopped at some point when doing a write() call.
How to overcome this? Adjusting defaults is a bad option; what is used instead is allocating a buffer in the reader, so that it reads into the buffer, freeing the pipe and thus letting the writing program work, but shows to you (or handles) only a part of the output. less has such a buffer, and, by default, expands it automatically, However, it doesn't fill it in the background, it only fills it as you read the input.
So what would solve your problem is reading the file until the end (like you would normally press G), and then going back to the beginning (like you would normally press g). The thing is that you may specify these commands via command line like this:
./script | less +Gg
You should note, however, that you will have to wait until the whole script's output loads into memory, so you won't be able to view it at once. less is insufficiently sophisticated for that. But if that's what you really need (browsing the beginning of the output while the ./script is still computing its end), you might want to use a temporary file:
./script >x & less x ; rm x
The pipe is full at the OS level, so script blocks until less consumes some of it.
Flow control. Your script is effectively being paused while less is paging.
If you want to make sure that your command completes before you use less interactively, invoke less as less +G and it will read to the end of the input, you can then return to the start by typing 1G into less.
For some background information there's also a nice article by Alexander Sandler called "How less processes its input"!
http://www.alexonlinux.com/how-less-processes-its-input
Can I externally enforce line buffering on the script?
Is there an off the shelf pseudo tty utility I could use?
You may try to use the script command to turn on line-buffering output mode.
script -q /dev/null ./script | less # FreeBSD, Mac OS X
script -c "./script" /dev/null | less # Linux
For more alternatives in this respect please see: Turn off buffering in pipe.

Linux - Program Design for Debug - Print STDOUT streams from several programs

Let's say I have 10 programs (in terminals) working in tandem: {p1,p2,p3,...,p10}.
It's hard to keep track of all STDOUT debug statements in their respective terminal. I plan to create a GUI to keep track of each STDOUT such that, if I do:
-- Click on p1 would "tail" program 1's output.
-- Click on p3 would "tail" program 4's output.
It's a decent approach but there may be better ideas out there? It's just overwhelming to have 10 terminals; I'd rather have 1 super terminal that keeps track of this.
And unfortunately, linux "screen" is not an option. RESTRICTIONS: I only have the ability to either: redirect STDOUT to a file. (or read directly from STDOUT).
If you are looking for a creative alternative, I would suggest that you look at sockets.
If each program writes to the socket (rather than STDOUT), then your master terminal can act as a server and organize the output.
Now from what you described, it seems as though you are relatively constrained to STDOUT, however it could be possible to do something like this:
# (use netcat (or nc on some systems) to write to a socket on the provided port)
./prog1 | netcat localhost 12312
I'm not sure if this fits in the requirements of what you are doing (and it might be more effort than it is worth!), but it could provide a very stable solution.
EDIT: As was pointed out in the comments, netcat does exactly what you would need to make this work.

Confused about stdin, stdout and stderr?

I am rather confused with the purpose of these three files. If my understanding is correct, stdin is the file in which a program writes into its requests to run a task in the process, stdout is the file into which the kernel writes its output and the process requesting it accesses the information from, and stderr is the file into which all the exceptions are entered. On opening these files to check whether these actually do occur, I found nothing seem to suggest so!
What I would want to know is what exactly is the purpose of these files, absolutely dumbed down answer with very little tech jargon!
Standard input - this is the file handle that your process reads to get information from you.
Standard output - your process writes conventional output to this file handle.
Standard error - your process writes diagnostic output to this file handle.
That's about as dumbed-down as I can make it :-)
Of course, that's mostly by convention. There's nothing stopping you from writing your diagnostic information to standard output if you wish. You can even close the three file handles totally and open your own files for I/O.
When your process starts, it should already have these handles open and it can just read from and/or write to them.
By default, they're probably connected to your terminal device (e.g., /dev/tty) but shells will allow you to set up connections between these handles and specific files and/or devices (or even pipelines to other processes) before your process starts (some of the manipulations possible are rather clever).
An example being:
my_prog <inputfile 2>errorfile | grep XYZ
which will:
create a process for my_prog.
open inputfile as your standard input (file handle 0).
open errorfile as your standard error (file handle 2).
create another process for grep.
attach the standard output of my_prog to the standard input of grep.
Re your comment:
When I open these files in /dev folder, how come I never get to see the output of a process running?
It's because they're not normal files. While UNIX presents everything as a file in a file system somewhere, that doesn't make it so at the lowest levels. Most files in the /dev hierarchy are either character or block devices, effectively a device driver. They don't have a size but they do have a major and minor device number.
When you open them, you're connected to the device driver rather than a physical file, and the device driver is smart enough to know that separate processes should be handled separately.
The same is true for the Linux /proc filesystem. Those aren't real files, just tightly controlled gateways to kernel information.
It would be more correct to say that stdin, stdout, and stderr are "I/O streams" rather
than files. As you've noticed, these entities do not live in the filesystem. But the
Unix philosophy, as far as I/O is concerned, is "everything is a file". In practice,
that really means that you can use the same library functions and interfaces (printf,
scanf, read, write, select, etc.) without worrying about whether the I/O stream
is connected to a keyboard, a disk file, a socket, a pipe, or some other I/O abstraction.
Most programs need to read input, write output, and log errors, so stdin, stdout,
and stderr are predefined for you, as a programming convenience. This is only
a convention, and is not enforced by the operating system.
As a complement of the answers above, here is a sum up about Redirections:
EDIT: This graphic is not entirely correct.
The first example does not use stdin at all, it's passing "hello" as an argument to the echo command.
The graphic also says 2>&1 has the same effect as &> however
ls Documents ABC > dirlist 2>&1
#does not give the same output as
ls Documents ABC > dirlist &>
This is because &> requires a file to redirect to, and 2>&1 is simply sending stderr into stdout
I'm afraid your understanding is completely backwards. :)
Think of "standard in", "standard out", and "standard error" from the program's perspective, not from the kernel's perspective.
When a program needs to print output, it normally prints to "standard out". A program typically prints output to standard out with printf, which prints ONLY to standard out.
When a program needs to print error information (not necessarily exceptions, those are a programming-language construct, imposed at a much higher level), it normally prints to "standard error". It normally does so with fprintf, which accepts a file stream to use when printing. The file stream could be any file opened for writing: standard out, standard error, or any other file that has been opened with fopen or fdopen.
"standard in" is used when the file needs to read input, using fread or fgets, or getchar.
Any of these files can be easily redirected from the shell, like this:
cat /etc/passwd > /tmp/out # redirect cat's standard out to /tmp/foo
cat /nonexistant 2> /tmp/err # redirect cat's standard error to /tmp/error
cat < /etc/passwd # redirect cat's standard input to /etc/passwd
Or, the whole enchilada:
cat < /etc/passwd > /tmp/out 2> /tmp/err
There are two important caveats: First, "standard in", "standard out", and "standard error" are just a convention. They are a very strong convention, but it's all just an agreement that it is very nice to be able to run programs like this: grep echo /etc/services | awk '{print $2;}' | sort and have the standard outputs of each program hooked into the standard input of the next program in the pipeline.
Second, I've given the standard ISO C functions for working with file streams (FILE * objects) -- at the kernel level, it is all file descriptors (int references to the file table) and much lower-level operations like read and write, which do not do the happy buffering of the ISO C functions. I figured to keep it simple and use the easier functions, but I thought all the same you should know the alternatives. :)
I think people saying stderr should be used only for error messages is misleading.
It should also be used for informative messages that are meant for the user running the command and not for any potential downstream consumers of the data (i.e. if you run a shell pipe chaining several commands you do not want informative messages like "getting item 30 of 42424" to appear on stdout as they will confuse the consumer, but you might still want the user to see them.
See this for historical rationale:
"All programs placed diagnostics on the standard output. This had
always caused trouble when the output was redirected into a file, but
became intolerable when the output was sent to an unsuspecting
process. Nevertheless, unwilling to violate the simplicity of the
standard-input-standard-output model, people tolerated this state of
affairs through v6. Shortly thereafter Dennis Ritchie cut the Gordian
knot by introducing the standard error file. That was not quite enough.
With pipelines diagnostics could come from any of several programs
running simultaneously. Diagnostics needed to identify themselves."
stdin
Reads input through the console (e.g. Keyboard input).
Used in C with scanf
scanf(<formatstring>,<pointer to storage> ...);
stdout
Produces output to the console.
Used in C with printf
printf(<string>, <values to print> ...);
stderr
Produces 'error' output to the console.
Used in C with fprintf
fprintf(stderr, <string>, <values to print> ...);
Redirection
The source for stdin can be redirected. For example, instead of coming from keyboard input, it can come from a file (echo < file.txt ), or another program ( ps | grep <userid>).
The destinations for stdout, stderr can also be redirected. For example stdout can be redirected to a file: ls . > ls-output.txt, in this case the output is written to the file ls-output.txt. Stderr can be redirected with 2>.
Using ps -aux reveals current processes, all of which are listed in /proc/ as /proc/(pid)/, by calling cat /proc/(pid)/fd/0 it prints anything that is found in the standard output of that process I think. So perhaps,
/proc/(pid)/fd/0 - Standard Output File
/proc/(pid)/fd/1 - Standard Input File
/proc/(pid)/fd/2 - Standard Error File
for example
But only worked this well for /bin/bash other processes generally had nothing in 0 but many had errors written in 2
For authoritative information about these files, check out the man pages, run the command on your terminal.
$ man stdout
But for a simple answer, each file is for:
stdout for a stream out
stdin for a stream input
stderr for printing errors or log messages.
Each unix program has each one of those streams.
stderr will not do IO Cache buffering so if our application need to print critical message info (some errors ,exceptions) to console or to file use it where as use stdout to print general log info as it use IO Cache buffering there is a chance that before writing our messages to file application may close ,leaving debugging complex
A file with associated buffering is called a stream and is declared to be a pointer to a defined type FILE. The fopen() function creates certain descriptive data for a stream and returns a pointer to designate the stream in all further transactions. Normally there are three open streams with constant pointers declared in the header and associated with the standard open files.
At program startup three streams are predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). When opened the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device
https://www.mkssoftware.com/docs/man5/stdio.5.asp
Here is a lengthy article on stdin, stdout and stderr:
What Are stdin, stdout, and stderr on Linux?
To summarize:
Streams Are Handled Like Files
Streams in Linux—like almost everything else—are treated as though
they were files. You can read text from a file, and you can write text
into a file. Both of these actions involve a stream of data. So the
concept of handling a stream of data as a file isn’t that much of a
stretch.
Each file associated with a process is allocated a unique number to
identify it. This is known as the file descriptor. Whenever an action
is required to be performed on a file, the file descriptor is used to
identify the file.
These values are always used for stdin, stdout, and stderr:
0: stdin
1: stdout
2: stderr
Ironically I found this question on stack overflow and the article above because I was searching for information on abnormal / non-standard streams. So my search continues.

Resources