Creating a fake file that shows the output of a program - Linux

I am trying to "stitch" two programs together. The first program, which I can change as I want, generates an output with some data. The second program cannot be changed, and expects to read the data that is generated by the first program.
This second program expects a file; I cannot use a pipe. I don't want to regenerate the file every x seconds.
Is there a way on Linux to create a "fake" file that fetches the first program's output every time it's opened for reading? This would be transparent to the second program. Is it doable with FUSE?

If you're using bash, you can use process substitution:
program2 <(program1)
If you're not using a shell with process substitution, you can use a named pipe.
mkfifo /tmp/pipe
program1 > /tmp/pipe &
program2 /tmp/pipe
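Note that a plain FIFO carries only a single run of program1: once program1 exits and the reader has drained the pipe, the next open gets end-of-file. If you want a fresh copy of the output every time the second program opens the "file", one possible sketch (using the same placeholder names as above) is to keep re-running program1 in a loop; the redirection blocks until a reader opens the FIFO, so each open triggers a new run:
mkfifo /tmp/pipe
while true; do
    program1 > /tmp/pipe   # blocks here until a reader opens the FIFO
done &
program2 /tmp/pipe         # reads one complete, freshly generated copy of the output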
Many programs that require a filename argument for their input also allow that filename to be -, which they interpret to mean standard input. This allows you to pipe to them:
program1 | program2 -

Related

How can I select() (i.e., simultaneously read from) standard input *and* a file in bash?

I have a program that accepts input on one FIFO and emits output to another FIFO. I want to write a small script to control this program. The script needs to listen both to standard input (so I can input commands to adjust things in real time) and the program's output FIFO (so it can respond to events happening there as well).
Essentially my control program needs to select between standard input and a file (my FIFO).
I like figuring out how to develop simple and elegant bash-based solutions to complex problems, and after a little head-scratching I remembered that tail -f will happily select on multiple files and tell you when one of them changes in real time, so I initially tried
tail -f <(od -An -vtd1 -w1) <(cat fifo)
to read both standard input (I'd previously run stty icanon min 1; this od invocation shows each stdin character on a separate line alongside its ASCII code, and is great for escape sequence parsing) and my FIFO. This failed epically (as does cat <(cat)): od gets run here as a backgrounded task, so it doesn't get access to the controlling TTY, and fails with a cryptic "I/O error" that was explained incredibly well here.
So now I'm a bit stumped. I realize that I can use any scripting language like Perl/Python/Ruby/Tcl to solve this; my compsci/engineering question is whether/how I might be able to solve this using (Linux) shell scripting.
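For what it's worth, here is a rough shell-only sketch of a polling approach (not a true select(); it assumes bash 4+ for fractional read timeouts, line-oriented input, and that "fifo" is the program's output FIFO from the question):
exec 3<fifo                        # open the program's output FIFO on fd 3
while true; do
    if IFS= read -r -t 0.2 cmd; then
        echo "stdin: $cmd"         # a command typed by the user
    fi
    if IFS= read -r -t 0.2 event <&3; then
        echo "fifo:  $event"       # an event from the program's output FIFO
    fi
done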

Streaming split

I am trying to split the output of a program into smaller files. This is a long-running program that prints its output to stderr and I'd like to capture the logs in a series of smaller files rather than in one gigantic file. So what I have is:
program 2>&1 | split -l100 &
... but to my dismay I found that the split tool doesn't actually write anything out to disk until its input ends. What I want is a tool that copies its input to the output files as the data arrives, without waiting for the source stream to end (waiting is unnecessary in my case). I've also tried the -u option of split, but it doesn't seem to work unless you also choose the -n option, and that option doesn't really apply in my case because the number of generated files could be arbitrarily high. Is there a Unix tool that might let me do this?
Barmar's suggestion to add a call to fflush() after every iteration in the awk script worked for me. This was preferable to calling close() on each file when it's done, since that would only flush once each file is full, whereas I wanted line-buffered behavior. I also had to make the output feeding the pipe line-buffered, so the command in the end looks like this:
stdbuf -oL -eL command 2>&1 | awk -v number=1 '++i>1000 {++number; i=0} {print > "file" number; fflush("file" number)}'
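A quick way to convince yourself that the files really are written as the data arrives is to replace the long-running program with a deliberately slow generator and watch the line counts grow (the generator, the chunk size of 10, and the watch call are only an illustration, not part of the original answer; the filename concatenation is parenthesized here for portability across awk implementations):
( for i in $(seq 1 100); do echo "line $i"; sleep 0.1; done ) |
    awk -v number=1 '++i>10 {++number; i=0} {print > ("file" number); fflush("file" number)}' &
watch -n1 'wc -l file*'   # the per-file counts increase while the generator is still running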

Is there some way in Linux to create a "virtual file" that is the concatenation of two files?

I have two data sets that I want to run a program on. I want to compare the results of running the analysis on each set individually with the results on the combined data. Since they're large files, I'd prefer not to create a whole new file that's the two data sets concatenated, doubling the disk space required. Is there some way in Linux I can create something like a symbolic link, but that points to two files, so that when it's read it will read the two files in sequence, as if they were concatenated into one file? I'm guessing probably not, but I thought I'd check.
Can your program read from standard input?
cat file-1 file-2 | yourprogram
If your program can only read from a file that is named on the command line, then this might work:
yourprogram <(cat file-1 file-2)
I think you need to be running the /bin/bash shell for the second example to work. The shell replaces <(foobar) with a filename (a /dev/fd entry or a named pipe, depending on the system) that your program can open and read like a file. The shell runs the foobar command in another process and sends its output into the other end of the pipe.
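If you are curious what filename the program actually sees, you can print the substituted argument; the exact path is system-dependent (on Linux it is typically a /dev/fd entry backed by an anonymous pipe):
echo <(cat file-1 file-2)    # prints something like /dev/fd/63
ls -l <(cat file-1 file-2)   # on Linux, shows a symlink to an anonymous pipe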

How to make gnu-parallel split multiple input files

I have a script which takes three arguments and is run like this:
myscript.sh input1.fa input2.fa out.txt
The script reads one line each from input1.fa and input2.fa, does some comparison, and writes the result to out.txt. The two inputs are required to have the same number of lines, and out.txt will also have the same number of lines after the script finishes.
Is it possible to parallelize this using GNU parallel?
I do not care if the output is in a different order from the inputs, but I do need to compare the ith line of input1.fa with the ith line of input2.fa. Also, it is acceptable if I get multiple output files (like output{#}) instead of one -- I'll just cat them together.
I found this topic, but the answer wasn't quite what I wanted.
I know I can split the two input files and process them in parallel in pairs using xargs, but would like to do this in one line if possible...
If you can change myscript.sh so that it reads from a pipe and writes to a pipe, you can do:
paste input1.fa input2.fa | parallel --pipe myscript.sh > out.txt
So in myscript.sh you will need to read from STDIN and split on TAB to recover the lines from input1.fa and input2.fa, as in the sketch below.
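A minimal sketch of what the rewritten myscript.sh could look like (the actual comparison is left as a placeholder, since the original script isn't shown here):
#!/bin/bash
# paste(1) joins the two inputs with a TAB, so each line on STDIN carries
# one line from input1.fa and one line from input2.fa.
while IFS=$'\t' read -r line1 line2; do
    # ... compare "$line1" with "$line2" here ...
    printf '%s\n' "placeholder result for: $line1 / $line2"
done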

Time taken by `less` command to show output

I have a script that produces a lot of output. The script pauses for a few seconds at point T.
Now I am using the less command to analyze the output of the script.
So I execute ./script | less. I leave it running long enough that the script will have finished executing.
Now I go through the output of the less command by pressing the Page Down key. Surprisingly, while scrolling past point T of the output, I notice the pause of a few seconds again.
The script does not expect any input and would definitely have completed by the time I start analyzing the output in less.
Can someone explain how the pause of a few seconds is noticeable in the output of less when the script has already finished executing?
Your script is communicating with less via a pipe. A pipe is an in-memory stream of bytes that connects two endpoints: your script and the less program, the former writing output to it, the latter reading from it.
Since pipes live in memory, they are not allowed to grow arbitrarily large. There is a limit on how much data can sit inside the pipe (written but not yet read) at any given moment; by default it is 64 KiB on Linux. If the pipe is full and your script tries to write to it, the write blocks. So your script isn't actually finished: it is stopped somewhere inside a write() call.
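You can watch this happening with any fast writer and a reader that never reads (purely an illustration): seq on its own finishes in a fraction of a second, but here it is still alive a second later, stuck in write() because sleep never drains the pipe.
seq 1 10000000 | sleep 30 &
sleep 1
pgrep -a seq   # still running: seq is blocked writing to the full pipe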
How to overcome this? Adjusting the defaults is a bad option; what is done instead is allocating a buffer in the reader, so that it reads into the buffer, freeing the pipe and thus letting the writing program continue, while showing you (or handling) only a part of the output. less has such a buffer and, by default, expands it automatically. However, it doesn't fill it in the background; it only fills it as you read the input.
So what would solve your problem is reading the file to the end (as you would by pressing G) and then going back to the beginning (as you would by pressing g). The trick is that you can pass these commands on the command line, like this:
./script | less +Gg
Note, however, that you will have to wait until the whole of the script's output has been loaded into memory, so you won't be able to start viewing it right away; less isn't sophisticated enough for that. If that is what you really need (browsing the beginning of the output while ./script is still computing the end), you might want to use a temporary file:
./script >x & less x ; rm x
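If you go the temporary-file route and want to watch the output while it is still being produced, less can also follow a growing file, much like tail -f (standard less behaviour, not part of the original answer):
./script >x & less +F x   # Ctrl-C stops following; pressing F inside less resumes it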
The pipe is full at the OS level, so the script blocks until less consumes some of it.
Flow control. Your script is effectively being paused while less is paging.
If you want to make sure that your command completes before you use less interactively, invoke less as less +G and it will read to the end of the input; you can then return to the start by typing 1G into less.
For some background information there's also a nice article by Alexander Sandler called "How less processes its input"!
http://www.alexonlinux.com/how-less-processes-its-input
Can I externally enforce line buffering on the script?
Is there an off-the-shelf pseudo-tty utility I could use?
You can try the script command, which runs the program under a pseudo-tty so that its standard output is line-buffered by default.
script -q /dev/null ./script | less # FreeBSD, Mac OS X
script -c "./script" /dev/null | less # Linux
For more alternatives in this respect please see: Turn off buffering in pipe.
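If the script's output is chunked only because of ordinary stdio buffering (rather than buffering the program manages itself), GNU coreutils' stdbuf is a lighter-weight alternative to a pseudo-tty, along the lines of the linked question:
stdbuf -oL ./script | less   # force line-buffered stdout; has no effect on programs that do their own buffering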
