How to use a persistent named pipe under Linux?

Named pipes are sometimes very convenient, e.g. mkfifo file.fifo.
But file.fifo is not persistent: if the computer restarts or the writer process crashes, I can read nothing from the pipe. Are there any methods to have the piped data stored on disk rather than in memory?

The simplest solution is to store the data in plain files and use a pipe (or something similar) only to notify readers that new data is available. You must take care of interprocess locking, of course.
Or you can use POSIX message queues (see mqueue.h). They survive a process crash, but not a system reboot.
Or you can use a third-party system that implements persistent message queues, such as an MQTT broker or RabbitMQ.
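As an illustration of the first approach, here is a minimal Rust sketch (the idea itself is language-agnostic; the file names data.log and notify.fifo are made up, and locking between multiple writers is omitted). Records are appended to an ordinary file, which is the part that survives crashes and reboots, and a single byte is written into a fifo created beforehand with mkfifo purely to wake up a blocked reader:

use std::fs::OpenOptions;
use std::io::Write;

// Append one record to a plain file, then poke the fifo so a blocked
// reader knows there is something new to read.
fn publish(record: &[u8]) -> std::io::Result<()> {
    // The persistent part: an ordinary file opened in append mode.
    let mut data = OpenOptions::new().create(true).append(true).open("data.log")?;
    data.write_all(record)?;
    data.write_all(b"\n")?;

    // The notification part: opening a fifo for writing blocks until a
    // reader has it open; the byte itself carries no information.
    let mut fifo = OpenOptions::new().write(true).open("notify.fifo")?;
    fifo.write_all(b"x")?;
    Ok(())
}

The reader blocks on the fifo and, each time it wakes up, reads whatever has been appended to data.log since the position it last reached; after a crash or reboot it simply resumes from the file.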

Related

std::process, with stdin and stdout from buffers

I have a command cmd and three Vec<u8>: buff1, buff2, and buff3.
I want to execute cmd, using buff1 as stdin, and capturing stdout into buff2 and stderr into buff3.
And I'd like to do all this without explicitly writing any temporary files.
std::process seems to allow all of those things, just not all at the same time.
If I use Command::new(cmd).output() it will return the buffers for stdout and stderr, but there's no way to give it stdin.
If I use Command::new(cmd).stdin(Stdio::piped()).spawn()
then I can child.stdin.as_mut().unwrap().write_all(buff1)
but I can't capture stdout and stderr.
As far as I can tell, there's no way to call Command::new(cmd).stdout(XXX) to explicitly tell it to capture stdout in a buffer, the way it does by default with .output().
It seems like something like this should be possible:
Command::new(cmd)
    .stdin(buff1)
    .stdout(buff2)
    .stderr(buff3)
    .output()
Rust makes it easy to treat a Vec<u8> as an in-memory reader or writer, but Vec doesn't implement Into<Stdio>.
Am I missing something? Is there a way to do this, or do I need to read and write with actual files?
If you're ok with using an external library, the subprocess crate supports this use case:
let (buff2, buff3) = subprocess::Exec::cmd(cmd)
    .stdin(buff1)
    .communicate()?
    .read()?;
Doing this with std::process::Command is trickier than it seems, because the OS doesn't make it easy to connect a region of memory to a subprocess's stdin. It's easy to connect a file or anything file-like, but to feed a chunk of memory to a subprocess, you basically have to write() in a loop. While a Vec<u8> is easy to read from (via &[u8] or Cursor), you can't use it to construct an actual File (or anything else that contains a file descriptor/handle).
Feeding data into a subprocess while at the same time reading its output is sometimes referred to as communicating in reference to the Python method introduced in 2004 with the then-new subprocess module of Python 2.4. You can implement it yourself using std::process, but you need to be careful to avoid deadlock in case the command generates output while you are trying to feed it input. (E.g. a naive loop that feeds a chunk of data to the subprocess and then reads its stdout and stderr will be prone to such deadlocks.) The documentation describes a possible approach to implement it safely using just the standard library.
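A minimal sketch of that standard-library approach might look like the following (the helper name run_with_input is made up for illustration): pipe all three streams, feed stdin from a separate thread, and let wait_with_output() drain stdout and stderr.

use std::io::Write;
use std::process::{Command, Stdio};
use std::thread;

fn run_with_input(cmd: &str, buff1: Vec<u8>) -> std::io::Result<(Vec<u8>, Vec<u8>)> {
    let mut child = Command::new(cmd)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Write stdin from its own thread so this can never deadlock against
    // the child filling up its stdout/stderr pipes.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let writer = thread::spawn(move || {
        // The child may exit before consuming all input; ignore broken pipes.
        let _ = stdin.write_all(&buff1);
        // Dropping stdin here closes the pipe, so the child sees EOF.
    });

    // wait_with_output() drains stdout and stderr concurrently and reaps
    // the child, so it cannot deadlock against the writer thread.
    let output = child.wait_with_output()?;
    let _ = writer.join();

    Ok((output.stdout, output.stderr))
}

This is essentially what a communicate() helper does for you, minus timeouts and incremental reading.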
If you want to read and write with buffers, you need to use the piped forms. The reason is that, at least on Unix, input and output to a process go through file descriptors. Since a buffer cannot itself be turned into a file descriptor, you have to use a pipe and both read and write incrementally. Rust's buffer abstractions cannot hide the fact that the operating system provides no such abstraction, and Rust doesn't paper over that for you.
However, since you'll be using pipes for both reading and writing, you'll need something like select so you don't deadlock. Otherwise, you could end up blocked trying to write while the subprocess is not accepting new input because it is itself waiting for its standard output to be read. Using select or poll (or similar) lets you determine when each of those file descriptors is ready to be read from or written to. In Rust, these functions are available through the libc crate; I don't believe the standard library provides them natively. Windows has similar functionality, but I have no idea what it looks like.
It should be noted that unless you are certain that the subprocess's output can fit into memory, it may be better to process it in a more incremental way. Since you're going to be using select, that shouldn't be too difficult.

fs.createWriteStream over several processes

How can I implement a system where multiple Node.js processes write to the same file with fs.createWriteStream, such that they don't overwrite data? It looks like the default setup for fs.createWriteStream is that the file is cleared out when that method is called. My goal is to clear out the file once, and then have all other subsequent writers only append data.
Should I use fs.createWriteStream and then fs.appendFile? Or is there a way to open up a stream for each process, not just for the first process to open the file?
Should I use fs.createWriteStream and then fs.appendFile?
You can use either.
With fs.createWriteStream you have to change the flags option, like this:
const fs = require('fs');
fs.createWriteStream('your_file', {
  flags: 'a', // the default 'w' truncates; 'a' appends ('a+' would also allow reading)
});
This creates the file if it doesn't exist, or opens it for writing with the position set to the end of the file (append mode).
fs.appendFile should be self-explanatory; it does pretty much the same thing on each call.
Now to the problem of multiple processes accessing the same file. Nothing stops several processes from opening the same file for writing at once, but their writes are not coordinated and can interleave or clobber each other.
Therefore you need some form of locking, so that a process waits until the file is released before it writes. You will probably want a library for that,
for example this one: https://www.npmjs.com/package/lockup
or this one: https://github.com/Perennials/mutex-node
You can also find a lot more here: https://www.npmjs.com/browse/keyword/lock
or here: https://www.npmjs.com/browse/keyword/mutex
I have not tried any of those libraries, but the first one and several others in those lists should do exactly what you need.
Writing to a single file from multiple processes while ensuring data integrity is a fairly complex operation that you can orchestrate using file locking.
However, there are two simpler approaches:
Write to a temporary file per process, then concatenate the files once all operations have finished.
Send what needs to be written to a single dedicated process and let it do all the writing. Keep in mind that passing messages between processes can be expensive.

Transparent fork-server on Linux

Suppose I have an application A that takes some time to load (it opens a couple of libraries). A processes stdin into some stdout.
I want to serve A on a network over a socket (instead of stdin and stdout).
The simplest way of doing that efficiently that I can think of is by hacking at the code and adding a forking server loop, replacing stdin and stdout with socket input and output.
The performance improvement over an independent server application that spawns A (fork+exec) on each connection comes at a cost, however: the latter is much easier to write, and I don't need access to A's source code or even to know what language it's written in.
I want to have my cake and eat it too. Is there a mechanism that would extract that forking loop?
What I want is something like fast_spawnp("A", "/tmp/A.pid", stdin_fd, stdout_fd, stderr_fd) (start process A unless it's already running, clone A from outside and make sure the standard streams of the child point to the argument-supplied file descriptors).
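For comparison, the fork+exec-per-connection baseline mentioned above is easy to sketch without touching A at all. A hypothetical Rust wrapper (the program name "A" and the port are placeholders) hands each accepted socket to the child as its stdin and stdout:

use std::net::TcpListener;
use std::os::unix::io::{FromRawFd, IntoRawFd};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9000")?;
    for stream in listener.incoming() {
        let stream = stream?;
        // Give the connected socket to the child as both stdin and stdout.
        let stdin_fd = stream.try_clone()?.into_raw_fd();
        let stdout_fd = stream.into_raw_fd();
        // SAFETY: both descriptors are valid and ownership moves into Stdio.
        let stdin = unsafe { Stdio::from_raw_fd(stdin_fd) };
        let stdout = unsafe { Stdio::from_raw_fd(stdout_fd) };
        Command::new("A").stdin(stdin).stdout(stdout).spawn()?;
        // A real server would also reap exited children to avoid zombies.
    }
    Ok(())
}

This pays A's full startup cost on every connection, which is exactly what the question wants to avoid; it only illustrates the baseline, not the asked-for fast_spawnp mechanism.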

Communication between processes

What's the simplest way to send a simple string to a process (I know the pid of the process and I don't need to do any checks of any sort) in Python 3-4?
Should I just go for communication via a socket?
If you only want two processes to communicate, you can use a pipe (either named, created with mkfifo, or anonymous, created with the pipe and dup syscalls).
If you want one server and several clients, you can use sockets. A TCP socket works over the network, but Unix/Linux also has so-called "Unix domain sockets", which behave much like named pipes.
Other ways to communicate between applications are real-time signals and/or shared memory.
More information about Unix sockets: http://beej.us/guide/bgipc/output/html/multipage/unixsock.html
About pipes, Google will tell you plenty.
But to actually answer your question (which does not have only one "right" answer): I think the simplest option is a named pipe, because you use it just like reading and writing a file on disk.

One variable shared across all forked instances?

I have a Perl script that forks itself repeatedly. I wish to gather statistics about each forked instance: whether it passed or failed and how many instances there were in total. For this task, is there a way to create a variable that is shared across all instances?
My Perl version is v5.8.8.
You should use IPC in some shape or form, most typically a shared memory segment with a semaphore guarding access to it. Alternatively, you could use some kind of hybrid memory/disk database whose access API handles concurrent access for you, but that might be overkill. Finally, you could use a file with record locking.
IPC::Shareable does what you literally ask for. Each process will have to take care to lock and unlock a shared hash (for example), but the data will appear to be shared across processes.
However, ordinary UNIX facilities provide easier ways (IMHO) of collecting worker status and counts. Have every process write ($| = 1) "ok\n" or "not ok\n" when it END{}s, for example, and make sure that they are writing to a FIFO, since comparatively short writes (at most PIPE_BUF bytes) are atomic and will not be interleaved. Then capture that output (e.g., ./my-script.pl | tee /tmp/my.log) and you're done. Another approach would have them record their status in simple files, e.g. open(my $status, '>', "./status.$$"), in a directory specially prepared for this.
