How to avoid deadlock with parallel named pipes? - Linux

I am working on a flow-based programming system called net2sh. It is currently based on shell tools connected by named pipes. Several processes work together to get the job done, communicating over named pipes, not unlike a production line in a factory.
In general it is working delightfully well; however, there is one major problem with it. When processes communicate over two or more named pipes, the "sending" process and the "receiving" process must open the pipes in the same order. This is because when a process opens a named pipe, it blocks until the other end has also been opened.
I want a way to avoid this, without spawning extra "helper" processes for each pipe, without having to hack existing components, and without having to mess with the program networks to avoid this problem.
Ideally I am looking for some "non-blocking FIFO" option, where "open" on a FIFO always succeeds immediately but subsequent operations may block if the pipe buffer is full (or empty, for reads). I'd even consider using a kernel patch to that effect. According to fifo(7), O_NONBLOCK does something different when opening FIFOs, which is not exactly what I want, and to use it I would have to rewrite every existing shell tool such as cat.
Here is a minimal example which deadlocks:
mkfifo a b
(> a; > b; ) &
(< b; < a; ) &
wait
If you can help me to solve this sensibly I will be very grateful!

There is a good description of using O_NONBLOCK with named pipes here: How do I perform a non-blocking fopen on a named pipe (mkfifo)?
It sounds like you want this to work across your entire environment without changing any C code. One approach, then, would be to set LD_PRELOAD to a shared library containing a wrapper for open(2) that adds O_NONBLOCK to the flags whenever the pathname refers to a named pipe.
A concise example of using LD_PRELOAD to override a library function is here: https://www.technovelty.org/c/using-ld_preload-to-override-a-function.html
Whether this actually works in practice without breaking anything else, you'll have to find out for yourself (please let us know!).
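To see what such a wrapper would actually change, here is a minimal Python sketch of the fifo(7) semantics it relies on. It is not the LD_PRELOAD shim itself, and the helper names are hypothetical. With O_NONBLOCK, opening a FIFO's read end succeeds immediately, and the flag can be cleared afterwards with fcntl so that reads behave normally. Opening the write end with O_NONBLOCK, however, fails with ENXIO when no reader is present instead of succeeding. There is also a catch on the reading side: per POSIX, read() on an empty FIFO that has no writer returns end-of-file, so a tool like cat that opens early may exit before the writer arrives.

```python
import errno
import fcntl
import os


def open_fifo_reader_nonblocking(path):
    """Open a FIFO's read end without waiting for a writer, then clear
    O_NONBLOCK so later read() calls wait for data while a writer is connected."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)  # returns immediately
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
    return fd


def try_open_fifo_writer_nonblocking(path):
    """Open a FIFO's write end, or return None if no reader has it open yet."""
    try:
        return os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:  # fifo(7): no process has it open for reading
            return None
        raise
```

In the deadlocking example, this means a wrapper could let the readers' opens proceed, but a writer that opens first would get an error rather than a usable descriptor.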

Related

std::process, with stdin and stdout from buffers

I have a command cmd and three Vec<u8>: buff1, buff2, and buff3.
I want to execute cmd, using buff1 as stdin, and capturing stdout into buff2 and stderr into buff3.
And I'd like to do all this without explicitly writing any temporary files.
std::process seems to allow all of those things, just not all at the same time.
If I use Command::new(cmd).output() it will return the buffers for stdout and stderr, but there's no way to give it stdin.
If I use Command::new(cmd).stdin(Stdio::piped()).spawn(), then I can call child.stdin.as_mut().unwrap().write_all(&buff1), but I can't capture stdout and stderr.
As far as I can tell, there's no way to call Command::new(cmd).stdout(XXX) to explicitly tell it to capture stdout in a buffer, the way it does by default with .output().
It seems like something like this should be possible:
Command::new(cmd)
.stdin(buff1)
.stdout(buff2)
.stderr(buff3)
.output()
That seems reasonable, since Rust makes a Vec<u8> look like a File, but Vec doesn't implement Into<Stdio>.
Am I missing something? Is there a way to do this, or do I need to read and write with actual files?
If you're ok with using an external library, the subprocess crate supports this use case:
let (buff2, buff3) = subprocess::Exec::cmd(cmd)
.stdin(buff1)
.communicate()?
.read()?;
Doing this with std::process::Command is trickier than it seems because the OS doesn't make it easy to connect a region of memory to a subprocess's stdin. It's easy to connect a file or anything file-like, but to feed a chunk of memory to a subprocess, you basically have to write() in a loop. While Vec<u8> does implement std::io::Read, you can't use it to construct an actual File (or anything else that contains a file descriptor/handle).
Feeding data into a subprocess while at the same time reading its output is sometimes called communicating, after the communicate() method introduced in 2004 with the then-new subprocess module of Python 2.4. You can implement it yourself using std::process, but you need to be careful to avoid deadlock in case the command generates output while you are trying to feed it input. (For example, a naive loop that feeds a chunk of data to the subprocess and then reads its stdout and stderr is prone to such deadlocks.) The documentation describes a possible approach to implement it safely using just the standard library.
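For comparison, this is roughly what the Python API mentioned above looks like from the caller's side. subprocess.run with input= and capture_output=True (Python 3.7+) feeds a bytes buffer to stdin and collects stdout and stderr, calling communicate() internally so the deadlock cannot occur. The function name run_with_buffers is hypothetical.

```python
import subprocess


def run_with_buffers(cmd, stdin_data):
    """Feed stdin_data to cmd and return its (stdout, stderr) as bytes."""
    result = subprocess.run(cmd, input=stdin_data, capture_output=True)
    return result.stdout, result.stderr
```

This is the shape of API the question asks for; in Rust the same effect needs either the subprocess crate or a hand-written loop.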
If you want to read and write with buffers, you need to use the piped forms. The reason is that, at least on Unix, a process's input and output go through file descriptors. Since a buffer cannot intrinsically be turned into a file descriptor, you have to use a pipe and both read and write incrementally. Rust providing an abstraction for buffers doesn't change the fact that the operating system doesn't, and Rust doesn't abstract this for you.
However, since you'll be using pipes for both reading and writing, you'll need to use something like select so you don't deadlock. Otherwise, you could end up trying to write while your subprocess was not accepting new input because it needed data to be read from its standard output. Using select or poll (or similar) lets you determine when each of those file descriptors is ready to be read or written to. In Rust, these functions are in the libc crate; I don't believe that Rust provides them natively. Windows will have some similar functionality, but I have no clue what it is.
It should be noted that unless you are certain that the subprocess's output can fit into memory, it may be better to process it in a more incremental way. Since you're going to be using select, that shouldn't be too difficult.
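The select/poll approach described in this answer can be sketched in Python with the standard selectors module. This is an analogue of what you would build on top of the libc crate, not Rust code. The parent writes stdin in chunks only when the pipe is writable and drains stdout and stderr whenever they are readable, so neither side can stall the other. The function name, the chunk size, and the choice to discard remaining input if the child stops reading are assumptions of this sketch.

```python
import os
import selectors
import subprocess

CHUNK = 64 * 1024  # arbitrary I/O chunk size


def communicate(cmd, stdin_data):
    """Feed stdin_data to cmd while collecting its stdout and stderr."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd, err_fd = proc.stdout.fileno(), proc.stderr.fileno()
    collected = {out_fd: bytearray(), err_fd: bytearray()}
    pending = memoryview(stdin_data)

    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    sel.register(proc.stderr, selectors.EVENT_READ)
    if pending:
        os.set_blocking(proc.stdin.fileno(), False)  # never block inside write()
        sel.register(proc.stdin, selectors.EVENT_WRITE)
    else:
        proc.stdin.close()  # nothing to send: signal EOF right away

    while sel.get_map():
        for key, _ in sel.select():
            if key.fileobj is proc.stdin:
                try:
                    pending = pending[os.write(key.fd, pending[:CHUNK]):]
                except BlockingIOError:
                    continue  # pipe filled up again; wait for the next event
                except BrokenPipeError:
                    pending = pending[:0]  # child closed stdin; drop the rest
                if not pending:
                    sel.unregister(proc.stdin)
                    proc.stdin.close()  # EOF lets the child finish
            else:
                chunk = os.read(key.fd, CHUNK)
                if chunk:
                    collected[key.fd].extend(chunk)
                else:  # EOF on this output stream
                    sel.unregister(key.fileobj)
                    key.fileobj.close()
    sel.close()
    proc.wait()
    return bytes(collected[out_fd]), bytes(collected[err_fd])
```

Running cat on input far larger than a pipe buffer is the case that deadlocks a naive write-then-read loop; here it completes because output is drained while input is still being written.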

Get triggered by GPIO state change in bash

I have a GPIO pin whose value is represented in the sysfs node /sys/class/gpio/gpioXXXX/value, and I want to detect changes to the value of this pin. According to the sysfs documentation, you should use poll(2) or select(2) for this.
However, both poll and select only seem to be available as system calls, not from bash. Is there some way to get triggered by a state change of the GPIO pin from a bash script?
My intention is to avoid (semi-)busy waiting and userland polling. I would also like to do this simply from bash without having to dip into another language. I don't plan to stick with bash throughout the project, but I do want to use it for this very first version. Writing a simple C program to be called from bash for just this is a possibility, but before doing that, I would like to know whether I'm missing something.
Yes, you'll need a C or Python helper -- and you might think about abandoning bash for this project entirely.
See this gist for an implementation of such a helper (named "wfi", for "watch-for-interrupt", modified from an answer to a Raspberry Pi StackExchange question).
That said:
If you want to (semi-)efficiently have a shell script monitor for a GPIO signal change, you'll want to have a C helper that uses poll() and writes to stdout whenever a noteworthy change occurs. Given that, you can then write a shell loop akin to the following:
while IFS= read -r event; do
echo "Processing $event"
done < <(wfi /sys/class/gpio/gpioXXXX/value)
Using process substitution in this way ensures that the startup cost for your monitor-gpio-signal helper is paid only once. Note some caveats:
Particularly if anything inside the body of your loop calls an external command (rather than relying on shell builtins alone), this is still going to be much slower than using a program written in C, Go or even an otherwise-relatively-slow language such as Python.
If the shell script isn't ready to receive a write, that write may block for an indefinite amount of time. A tool such as pv may be useful to add a buffer to your pipeline:
done < <(wfi /sys/class/gpio/gpioXXXX/value | pv -q -B 1M)
...for instance, will establish a 1MB buffer.
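The helper described above does not have to be C. Here is a minimal Python sketch following the kernel's sysfs GPIO documentation. The pin's edge file must first be set (for example to both) so that the value file generates notifications. The helper then polls for POLLPRI | POLLERR, seeks back to offset 0, rereads the value, and prints one line per change. The names watch_gpio and read_value are hypothetical, and this sketch has not been checked against the gist's behaviour.

```python
import os
import select
import sys


def read_value(fd):
    """Rewind a sysfs attribute and return its current contents as text."""
    os.lseek(fd, 0, os.SEEK_SET)
    return os.read(fd, 64).decode().strip()


def watch_gpio(path, out=sys.stdout):
    """Print the GPIO value once, then again after every edge interrupt.

    Assumes the pin's "edge" file has already been set to "rising",
    "falling" or "both"; otherwise poll() never reports a change.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        poller = select.poll()
        poller.register(fd, select.POLLPRI | select.POLLERR)
        print(read_value(fd), file=out, flush=True)  # report the starting value
        while True:
            poller.poll()  # sleeps in the kernel: no busy waiting
            print(read_value(fd), file=out, flush=True)
    finally:
        os.close(fd)
```

Saved as a script that calls watch_gpio(sys.argv[1]), it could stand in for wfi in the read loop above; flushing after each line keeps the shell loop from waiting on Python's output buffer.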

Can I override a system function before calling fork?

I'd like to be able to intercept filenames with a certain prefix from any child processes that I launch. These would be names like "pipe://pipe_name". I think wrapping the open() system call would be a good way to do this for my application, but I'd like to do it without having to compile a separate shared library and hook it in with the LD_PRELOAD trick (or use FUSE and have a mounted directory).
I'll be forking the processes myself. Is there a way to redirect open() to my own function before forking and have it persist in the child after an exec()?
Edit: The thought behind this is that I want to implement multi-reader pipes by having an intermediate process tee() the data from one pipe into all the others. I'd like this to be transparent to my child processes, so that they can take a filename and open() it, and, if it's a pipe, I'll return the file descriptor for it, while if it's a normal file, I'll just pass it to the regular open() function. Any alternative way to do this that is transparent to the child processes would be interesting to hear. I'd rather not have to compile a separate library that has to be preloaded, though.
I believe the answer here is no, it's not possible. To my knowledge, there are only three ways to achieve this:
1. LD_PRELOAD trick: compile a .so that's preloaded to override the system call.
2. Implement a FUSE filesystem, pass the client program a path inside it, and intercept calls there.
3. Use ptrace to intercept system calls and fix them up as needed.
Options 2 and 3 will be very slow, since every intercepted I/O call has to be handed back to a user-space process to handle; probably around 500% slower than normal. Option 1 requires building and maintaining an external shared library to link in.
For my application, where I just want to be able to pass in a path to a pipe, I realized I can create both ends in my process (via pipe()), then take the path to the read end in /proc//fd and pass that path to the client program. I think that gives me everything I need.
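That last approach can be sketched in Python as follows. This variant lets the child inherit the read end and hands it the path /proc/self/fd/N, which the child resolves against its own descriptor table (much like the /dev/fd/N paths bash uses for process substitution). The helper name and the {path} placeholder convention are hypothetical. The input is written from a separate thread, because writing it all before reading the child's stdout could deadlock on large inputs, the same trap as in the std::process question above.

```python
import os
import subprocess
import threading


def run_with_pipe_path(argv_template, data):
    """Run a command that expects a filename, giving it a path to the read end
    of an anonymous pipe; write `data` into the pipe; return the command's stdout."""
    r, w = os.pipe()
    path = f"/proc/self/fd/{r}"  # "self" is resolved by the child, which inherits r
    argv = [arg.replace("{path}", path) for arg in argv_template]
    child = subprocess.Popen(argv, stdout=subprocess.PIPE, pass_fds=(r,))
    os.close(r)  # the child keeps its own copy of the read end

    def feed():
        try:
            with os.fdopen(w, "wb") as writer:
                writer.write(data)  # closing the write end gives the reader EOF
        except BrokenPipeError:
            pass  # the child exited without reading everything

    feeder = threading.Thread(target=feed)
    feeder.start()
    out, _ = child.communicate()
    feeder.join()
    return out
```

For example, run_with_pipe_path(["cat", "{path}"], b"hello\n") runs cat on a path that looks like an ordinary file but is backed by the pipe. This relies on Linux's /proc and on the child running as a process that can open its own descriptors.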

How to create a virtual command-backed file in Linux?

What is the most straightforward way to create a "virtual" file in Linux that allows reading from it, always returning the output of some particular command (run every time the file is read from)? That is, every read operation would cause the command to be executed, capturing its output and passing it on as the "content" of the file.
There is no way to create such a so-called "virtual file". On the other hand, you can achieve this behaviour by implementing a simple synthetic filesystem in userspace via FUSE. Moreover, you don't have to use C; there are bindings even for scripting languages such as Python.
Edit: And chances are that something like this already exists: see for example scriptfs.
Below is a great answer, which I copied here.
Basically, named pipes let you do this in scripting, and FUSE lets you do it easily in Python.
You may be looking for a named pipe.
mkfifo f
{
echo 'V cebqhpr bhgchg.'
sleep 2
echo 'Urer vf zber bhgchg.'
} >f
rot13 < f
Writing to the pipe doesn't start the listening program. If you want to process input in a loop, you need to keep a listening program running.
while true; do rot13 <f >decoded-output-$(date +%s.%N); done
Note that all data written to the pipe is merged, even if there are multiple processes writing. If multiple processes are reading, only one gets the data. So a pipe may not be suitable for concurrent situations.
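Turning the loop around gives something close to what the question asks for: a writer loop that reopens the FIFO each time, so that every reader that opens it triggers a fresh run of the command. Below is a minimal Python sketch, assuming one reader at a time (per the caveat about concurrent readers just above). The function name and the times parameter, which bounds the loop mainly for testing, are hypothetical.

```python
import os
import stat
import subprocess


def serve_command_on_fifo(path, cmd, times=None):
    """Each time a reader opens `path`, run `cmd` and stream its stdout to that reader.

    `times` limits the number of reads served; None means serve forever.
    """
    if not os.path.exists(path):
        os.mkfifo(path)
    elif not stat.S_ISFIFO(os.stat(path).st_mode):
        raise ValueError(f"{path} exists and is not a FIFO")
    served = 0
    while times is None or served < times:
        with open(path, "wb") as fifo:  # blocks until a process opens it for reading
            subprocess.run(cmd, stdout=fifo)  # the command writes straight into the FIFO
        served += 1  # closing the write end delivered EOF to the reader
```

With serve_command_on_fifo("f", ["date"]) running in the background, each cat f would print the current date. A reader that opens the FIFO while a previous run is still writing would join that run instead of starting a new one.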
A named socket can handle concurrent connections, but this is beyond the capabilities of basic shell scripts.
At the most complex end of the scale are custom filesystems, which let you design and mount a filesystem where each open, write, etc. triggers a function in a program. The minimum investment is tens of lines of nontrivial code, for example in Python. If you only want to execute commands when reading files, you can use scriptfs or fuseflt.
No one has mentioned this, but if you can choose the path to the file, you can use standard input via /dev/stdin.
Every time the cat program runs, it reads the output of the program writing to the pipe, which here is simply echo my input:
for i in 1 2 3; do
echo my input | cat /dev/stdin
done
outputs:
my input
my input
my input
I'm afraid this is not easily possible. When a process reads from a file, it uses system calls like open, fstat, read. You would need to intercept these calls and output something different from what they would return. This would require writing some sort of kernel module, and even then it may turn out to be impossible.
However, if you simply need to trigger something whenever a certain file is accessed, you could play with inotifywait:
#!/bin/bash
while inotifywait -qq -e access /path/to/file; do
echo "$(date +%s)" >> /tmp/access.txt
done
Run this as a background process, and you will get an entry in /tmp/access.txt each time your file is read.

Designing a perl script with multithreading and data sharing between threads

I'm writing a Perl script to run some kind of pipeline. I start by reading a JSON file with a bunch of parameters in it. I then do some work, mainly building some data structures needed later and calling external programs that generate some output files I keep references to.
I usually use a subroutine for each of these steps. Each such subroutine will usually write some data to a unique place that no other subroutine writes to (i.e. a specific key in a hash) and reads data that other subroutines may have generated.
These steps can take a good couple of minutes if done sequentially, but most of them can be run in parallel with some simple dependency logic that I know how to handle (using threads and a queue). So I wonder how I should implement this to allow sharing data between the threads. What framework would you suggest? Perhaps an object (of which I will have only one instance) that keeps all the shared data in $self? Perhaps a simple script (no objects) with some "global" shared variables? ...
I would obviously prefer a simple, neat solution.
Read threads::shared. By default, as you may know, Perl variables are not shared between threads, but if you place the shared attribute on them, they are.
use threads;
use threads::shared;
my %repository :shared;
Then, if you want to synchronize access to them, the easiest way is to lock the variable inside a block:
{ lock( %repository );
$repository{JSON_dump} = $json_dump;
}
# %repository will be unlocked at the end of scope.
However, you could instead use a Thread::Queue, which is supposed to be muss-free, and achieve the same thing:
$repo_queue->enqueue( JSON_dump => $json_dump );
Then your consumer thread could just:
my ( $key, $value ) = $repo_queue->dequeue( 2 );
$repository{ $key } = $value;
You can certainly do that in Perl; I suggest you look at perldoc threads and perldoc threads::shared, as these manual pages best describe the methods and pitfalls encountered when using threads in Perl.
What I would really suggest, provided you can, is a queue management system such as Gearman, which has various interfaces including a Perl module. This allows you to create as many "workers" as you want (the subs actually doing the work) and one simple "client" that schedules the appropriate tasks and then collates the results, without needing tricks such as task-specific hashref keys.
This approach would also scale better, and you'd be able to have clients and workers (even managers) on different machines, should you choose to.
Other queue systems, such as TheSchwartz, would not be suitable, as they lack the feedback/result mechanism that Gearman provides. For all intents and purposes, using Gearman this way is much like the threaded system you described, just without the hassles and headaches that any thread-based system may eventually suffer from: locking variables, using semaphores, joining threads.
