Pass stdin to Command without interfering with user stdin - rust

I've been having trouble succinctly describing what I'm trying to accomplish. What I'm attempting to do is write a program that executes other programs/scripts as plugins and, during that execution, facilitates communication between my program and the plugin for various requests (like asking for a user list, for example). However, if these plugins need to communicate with the end user, I'd also like to stay out of the way of that. I explored some other means of doing this (D-Bus, an HTTP API, a separate file handle, etc.), and those options are either too heavy a solution for what I'm trying to do, or I didn't have much success implementing them (as was the case with the last idea), so just using normal STDIN/STDOUT seems like the cleanest path forward.
Here's what I have at the moment:
let mut child = Command::new("test.pl")
    .stdin(Stdio::piped())
    .stdout(Stdio::piped())
    .spawn()
    .expect("Failed to spawn child process");

let mut stdin = child.stdin.as_mut().unwrap();
writeln!(&mut stdin, "Hey, testing!").unwrap();

let stdout = child.stdout.as_mut().unwrap();
let stdout_reader = BufReader::new(stdout);
let stdout_lines = stdout_reader.lines();

for line in stdout_lines {
    match line.unwrap().as_str() {
        "member_list" => {
            println!("Here's a member list");
            let mlist = cluster::members_list()?;
            for (member, port) in mlist.iter() {
                writeln!(&mut stdin, "{}:{}", member, port);
            }
        }
        e => {
            println!("{}", e);
        }
    };
}

child.wait().unwrap();
This works well for communication between my program and the other one, passing STDOUT through to the user when a line doesn't match any of the keywords expected for communication. However, I can't think of a way to pass information to the STDIN of the other program without getting in the way of that program requesting input from the user (such as when the program needs to ask the user, not my program, for input). I tried a silly usage of drop(stdin); and redeclaring stdin later on, which of course didn't work due to scope issues.
I'm really trying to avoid acting as an intermediary for STDIN since that seems like it would be terribly messy. I would appreciate any insight into how I might be able to accomplish this. I'm also open to other ideas on facilitating this communication other than through STDIN/STDOUT.
Thanks all!

You could use an anonymous pipe. On Linux, at a high level, that would look like this:
Your process creates the pipe using the pipe system call, which returns a pair of FDs, one for the reading end of the pipe and the other for the writing end
Your process then forks. In the child subprocess, you then call exec to start the process you want to run, passing it one of the FDs as a command line argument
The new program takes the FD and uses it to communicate with the parent process.
Both the parent and child process should take care to close the FDs representing the half of the connection they are not using.
You can do this in Rust - the needed system calls are exposed in the libc crate, or by the more Rust-friendly nix crate.
I also stumbled across the interprocess crate which provides an abstraction over these low level calls.
These newly created FDs are completely independent of the stdin / stdout / stderr of the process.
If bidirectional communication is required you could use two such pipes (one for each direction) - however in that case it is probably easier to use an anonymous socket, which you create using the socketpair system call. This works in a similar fashion to the pipe, returning a pair of FDs - except that in this case the FDs are bi-directional. Again, libc and nix expose this system call.
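For illustration, here's a minimal sketch of the pipe approach using the libc crate directly (test.pl is the plugin from the question; the --ctrl-fd flag is a made-up convention, not something defined anywhere): the parent creates the pipe, passes the write end's FD number to the child on its command line, and leaves stdin/stdout free for the plugin to talk to the user.

use std::io::Read;
use std::os::unix::io::FromRawFd;
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Create the pipe; libc::pipe does not set FD_CLOEXEC, so both ends
    // are inherited by the child across Command::spawn.
    let mut fds = [0i32; 2];
    if unsafe { libc::pipe(fds.as_mut_ptr()) } != 0 {
        return Err(std::io::Error::last_os_error());
    }
    let (read_fd, write_fd) = (fds[0], fds[1]);

    // Tell the plugin which FD to write its requests to. The plugin keeps
    // its own stdin/stdout for talking to the user.
    let mut child = Command::new("test.pl")
        .arg(format!("--ctrl-fd={}", write_fd))
        .spawn()?;

    // The parent only reads on this pipe, so close its copy of the write end
    // (otherwise it would never see EOF). The plugin should likewise close
    // the read end it inherited.
    unsafe { libc::close(write_fd) };
    let mut ctrl = unsafe { std::fs::File::from_raw_fd(read_fd) };

    let mut buf = String::new();
    ctrl.read_to_string(&mut buf)?;
    println!("plugin said: {}", buf);

    child.wait()?;
    Ok(())
}

A socketpair-based version looks almost identical, except that each FD returned by socketpair can be used for both reading and writing, so a single FD per process is enough for bidirectional traffic.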

Related

Spawn child process, then later send argument to that process

I want to spawn a child process, and then at a later time send an argument to it that will then execute. How can I do that? (NodeJS, on Mac)
For example, I have a command to execute a script file:
osascript script-test.scpt
This command works in the terminal, and it also works using exec, like so:
const { exec } = require('child_process')
var script = 'osascript script-test.scpt'
exec(script)
But how do I get it to work in an already running child process?
I've tried the following, but nothing happens (no errors, and no activity):
const { spawn } = require('child_process')
var process = spawn('osascript')
...
[at some later point (after spawned process has been created)]
process.stdin.write('script-test.scpt')
In all current operating systems, a process is spawned with a given set of arguments (also called argv, argument values) and preserves this set until execution ends. This means that you cannot change arguments on the fly.
For a program to support multiple job submissions after spawning, it needs to implement this explicitly using some form of communication - this is known as IPC, or Inter-Process Communication. A program that supports IPC will usually allow another program to control its behavior to some extent - for example, submit jobs for processing and report back on their completion.
Popular methods of implementing IPC include:
Network communication
Local calls via a "message bus" such as D-Bus
Pipes (direct communication over stdin/stdout)
Inspect the documentation for the program that you're trying to call and find out whether it supports any of the forms of control listed above. If it does, you may be able to integrate with it (in a program-specific way). If not, you will need to spawn a new instance every time you need to process a new job.
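To make the pipe option concrete, here's what the calling side of such an integration might look like (written in Rust to match the rest of this page; the worker program and its one-job-per-line protocol are made up for illustration and are not something osascript supports):

use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // "worker" is a hypothetical program that implements pipe-based IPC:
    // it reads one job per line from stdin and prints one result per line
    // to stdout.
    let mut child = Command::new("worker")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take().unwrap();
    let mut results = BufReader::new(child.stdout.take().unwrap()).lines();

    // Submit jobs one at a time, long after the process was spawned.
    for job in ["script-test.scpt", "another-job.scpt"] {
        writeln!(stdin, "{}", job)?;
        if let Some(result) = results.next() {
            println!("worker answered: {}", result?);
        }
    }

    drop(stdin); // closing stdin tells the worker that no more jobs are coming
    child.wait()?;
    Ok(())
}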

Why does Command's stdin write_all never terminate?

I have the following code. It runs python3 solution.py, whose stdin is supplied from the variable input:
let mut cmd = Command::new("python3");
let mut child = cmd
    .stdin(Stdio::piped())
    .stdout(Stdio::piped())
    .arg("solution.py")
    .spawn()
    .expect("Failed to execute solution");

println!("Writing to stdin");
child
    .stdin
    .as_mut()
    .unwrap()
    .write_all(input.as_bytes())
    .unwrap();

// never printed
println!("Finish writing to stdin");
child.stdin.as_mut().unwrap().flush().unwrap()
It always runs OK (and terminates quickly) when the input variable is small, but when it is ~3 MB (a large String), it never terminates. I guess there is a deadlock somewhere, since the CPU usage is low.
Could you suggest how to make it run with large input? And why does it seem to suffer from a deadlock?
Update
Simplified version of solution.py where this problem still happens (it just prints whitespace strings):
t = int(input())
for tt in range(t):
    input()
    res = [' '] * 1000
    result = ''.join(res)
    print("Case #{}: {}".format(tt+1, result))
Interestingly, if I change line 4 to the below, the program terminates
res = [' '] * 100
Is it perhaps due to the large output size?
Your Rust program is writing to a pipe connected to the stdin of the child process. The child process is writing to its stdout, which is again a pipe. The operating system buffers some amount of data for both pipes, but when a buffer is full it waits for the process on the reading end to consume the data before accepting any further writes.
The write_all() call keeps writing data into the stdin pipe, which is read by the child process. The child process writes data to its stdout pipe, but nobody is consuming that data. Once the stdout pipe buffer fills up, the child process blocks trying to write further data, so it stops reading from stdin. That means the stdin pipe buffer also fills up, at which point the parent process blocks while trying to write further data.
The easiest way to resolve this is to move the writing to the stdin pipe into a separate thread and read from stdout in the main thread. This way, your Rust program reads and writes data in parallel, and no deadlock occurs. The documentation of std::process has an example demonstrating this approach.
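Here's a minimal sketch of that pattern applied to the code from the question (the 3 MB input is simulated here; substitute your real input):

use std::io::{Read, Write};
use std::process::{Command, Stdio};
use std::thread;

fn main() {
    let input = " ".repeat(3_000_000); // stand-in for the question's ~3 MB String

    let mut child = Command::new("python3")
        .arg("solution.py")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to execute solution");

    // Take ownership of the handles so the writer can move to its own thread.
    let mut stdin = child.stdin.take().expect("failed to open stdin");
    let mut stdout = child.stdout.take().expect("failed to open stdout");

    // The writer runs on a separate thread so the main thread is free to drain stdout.
    let writer = thread::spawn(move || {
        stdin
            .write_all(input.as_bytes())
            .expect("failed to write to stdin");
        // stdin is dropped here, closing the pipe so the child sees EOF
    });

    // Meanwhile, keep consuming the child's output so its stdout pipe never fills up.
    let mut output = String::new();
    stdout
        .read_to_string(&mut output)
        .expect("failed to read stdout");

    writer.join().expect("writer thread panicked");
    child.wait().expect("child wasn't running");
    println!("child produced {} bytes of output", output.len());
}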
The accepted answer explains the underlying issue and gives directions on how to solve it using the standard library.
In addition to that, you can use the subprocess crate whose communicate() method (modeled after Python's Popen.communicate()) is designed to prevent deadlocks in situations like yours:
let (out, _) = subprocess::Exec::cmd("python3")
    .arg("solution.py")
    .stdin("some data")
    .communicate()?
    .read_string()?;
// out is a String
Disclaimer: I'm the author of subprocess.

How can I launch a daemon in a websocket handler with actix-web?

Given a basic setup of a WebSocket server with Actix, how can I launch a daemon inside my message handler?
I've extended the example starter code linked above to call daemon(false, true) using the fork crate.
use actix::{Actor, StreamHandler};
use actix_web::{web, App, Error, HttpRequest, HttpResponse, HttpServer};
use actix_web_actors::ws;
use fork::{daemon, Fork};

/// Define HTTP actor
struct MyWs;

impl Actor for MyWs {
    type Context = ws::WebsocketContext<Self>;
}

/// Handler for ws::Message message
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for MyWs {
    fn handle(
        &mut self,
        msg: Result<ws::Message, ws::ProtocolError>,
        ctx: &mut Self::Context,
    ) {
        match msg {
            Ok(ws::Message::Ping(msg)) => ctx.pong(&msg),
            Ok(ws::Message::Text(text)) => {
                println!("text message received");
                if let Ok(Fork::Child) = daemon(false, true) {
                    println!("from daemon: this print but then the websocket crashes!");
                };
                ctx.text(text)
            },
            Ok(ws::Message::Binary(bin)) => ctx.binary(bin),
            _ => (),
        }
    }
}

async fn index(req: HttpRequest, stream: web::Payload) -> Result<HttpResponse, Error> {
    let resp = ws::start(MyWs {}, &req, stream);
    println!("{:?}", resp);
    resp
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| App::new().route("/ws/", web::get().to(index)))
        .bind("127.0.0.1:8080")?
        .run()
        .await
}
The above code starts the server but when I send it a message, I receive a Panic in Arbiter thread.
text message received
from daemon: this print but then the websocket crashes!
thread 'actix-rt:worker:0' panicked at 'failed to park', /Users/xxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.25/src/runtime/basic_scheduler.rs:158:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Panic in Arbiter thread.
The issue with your application is that the actix-web runtime (i.e. Tokio) is multi-threaded. This is a problem because the fork() call (used internally by daemon()) only replicates the thread that called fork().
Even if your parent process has N threads, your child process will have only 1. If your parent process has any mutexes locked by those threads, their state will be replicated in the child process, but as those threads do not exist there, they will remain locked forever.
If you have an Rc/Arc, it will never deallocate its memory, because it will never be dropped, so its internal count will never reach zero. The same applies to any pointers and shared state.
Or said more simply - your forked child will end up in an undefined state.
This is best explained in Calling fork() in a Multithreaded Environment:
The fork( ) system call creates an exact duplicate of the address space from which it is called, resulting in two address spaces executing the same code. Problems can occur if the forking address space has multiple threads executing at the time of the fork( ). When multithreading is a result of library invocation, threads are not necessarily aware of each other's presence, purpose, actions, and so on. Suppose that one of the other threads (any thread other than the one doing the fork( )) has the job of deducting money from your checking account. Clearly, you do not want this to happen twice as a result of some other thread's decision to call fork( ).
Because of these types of problems, which in general are problems of threads modifying persistent state, POSIX defined the behavior of fork( ) in the presence of threads to propagate only the forking thread. This solves the problem of improper changes being made to persistent state. However, it causes other problems, as discussed in the next paragraph.
In the POSIX model, only the forking thread is propagated. All the other threads are eliminated without any form of notice; no cancels are sent and no handlers are run. However, all the other portions of the address space are cloned, including all the mutex state. If the other thread has a mutex locked, the mutex will be locked in the child process, but the lock owner will not exist to unlock it. Therefore, the resource protected by the lock will be permanently unavailable.
Here you can find a more reputable source with more details
To answer your other question:
"how can I launch a daemon inside my message handler?"
I assume you want to implement the classical Unix "fork() on accept()" model. In that case you are out of luck, because servers such as actix-web, and async/await in general, are not designed with that in mind. Even if you have a single-threaded async/await server, then:
When a child is forked it inherits all file descriptors from the parent, so it's common after a fork for the child to close its listening socket in order to avoid a resource leak - but there is no way to do that on any of the async/await based servers, not because it's impossible, but because it's not implemented.
An even more important reason to do that is to prevent the child process from accepting new connections - because even if you run a single-threaded server, it is still capable of processing many tasks concurrently, i.e. when your handler calls .await on something, the acceptor would be free to accept a new connection (by stealing it from the socket's queue) and start processing it.
Your parent server may also have already spawned a lot of tasks, and those would be replicated in each forked child, thus executing the very same thing multiple times, independently in each process.
And well... there is no way to prevent any of that on any of the async/await based servers I'm familiar with. You would need a custom server that checks in its acceptor task whether it is running in a child and, if so, closes the listening socket and drops the acceptor. It should also not execute any other task that was forked from the parent, but there is no way to achieve that.
In other words - async/await and "fork() on accept()" are two different and incompatible models for processing tasks concurrently.
A possible solution would be to have a non-async acceptor daemon that only accepts connections and forks itself, then spawns a web server in the child and feeds it the accepted socket. But although possible, none of the servers currently have support for that.
As described in the other answer, the async runtime you're relying on may completely break if you touch it in the child process. Merely touching anything can violate assumptions the actix or tokio devs made, and wacky stuff will happen if you so much as return from the function.
See this response by one of the key authors of tokio to someone doing something similar (calling fork() in the context of a threadpool with hyper):
Threads + fork is bad news... you can fork if you immediately exec and do not allocate memory or perform any other operation that may have been corrupted by the fork.
Going back to your question:
The objective is for my websocket to respond to messages and be able to launch isolated long-running processes that launch successfully and do not exit when the websocket exits.
I don't think you want to manually fork() at all. Utility functions provided by actix/tokio should integrate well with their runtimes. You may:
Run blocking or CPU-heavy code in a dedicated thread with actix_web::block
Spawn a future with actix::AsyncContext::spawn. You would ideally want to use e.g. tokio::process::Command rather than the std version to avoid blocking in an async context.
If all you're doing in the child process is running Command::new() and later Command::spawn(), I'm pretty sure you can just call it directly. There's no need to fork; it does that internally.
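As a sketch, the daemon() call in the handler from the question could be replaced with a plain spawn (the long_running_task program name is made up for illustration):

// Inside the StreamHandler from the question, replacing the daemon() call:
Ok(ws::Message::Text(text)) => {
    println!("text message received");
    // Command::spawn() does the fork+exec internally; the spawned child is a
    // separate process that keeps running after this handler returns and is
    // not killed when the websocket or the server shuts down.
    match std::process::Command::new("long_running_task").spawn() {
        Ok(child) => println!("spawned child with pid {}", child.id()),
        Err(e) => eprintln!("failed to spawn: {}", e),
    }
    ctx.text(text)
},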

Linux: is there a way to use named fifos on the writer side in non-blocking mode?

I've found many questions and answers about pipes on Linux, but almost all discuss the reader side.
For a process that shall be ready to deliver data to a named pipe as soon as the data is available and a reading process is connected, is there a way to, in a non-blocking fashion:
wait (poll(2)) for reader to open the pipe,
wait in a loop (again poll(2)) for signal that writing to the pipe will not block, and
when such signal is received, check how many bytes may be written to the pipe without blocking
I understand how to do (2.), but I wasn't able to find consistent answers for (1.) and (3.).
EDIT: I was looking for (something like) FIONWRITE for pipes, but Linux does not have FIONWRITE (for pipes) (?)
EDIT2: The intended main loop for the writer (kind of pseudo code, target language is C/C++):
forever
    poll(can_read_command, can_write_to_the_fifo)
    if (can_read_command) {
        read and parse command
        update internal status
        continue
    }
    if (can_write_to_the_fifo) {
        length = min(data_available, space_for_nonblocking_write)
        write(output_fifo, buffer, length)
        update internal status
        continue
    }

How to handle long running external function calls such as blocking I/O in Rust?

Editor's note: This question is from a version of Rust prior to 1.0 and uses terms and functions that do not exist in Rust 1.0 code. The concepts expressed are still relevant.
I need to read data provided by an external process via a POSIX file descriptor in my Rust program. The file descriptor connection is kept up a very long time (hours) and the other side passes data to me from time to time. I need to read and process the data stream continuously.
To do so, I wrote a loop that calls libc::read() (readv actually) to read the data and processes it when received. Since this would block the whole scheduler, I'm spawning a task on a new scheduler (task::spawn_sched(SingleThreaded)). This works fine as long as it runs, but I can't find a way to cleanly shut down the loop.
Since the loop is blocking most of the time, I can't use a port/channel to notify the loop to exit.
I tried to kill the loop task by taking it down using a failing linked task (spawn the loop task supervised, spawn a linked task within it, and wait for a signal on a port before fail!()ing and taking down the loop task with it). It works well in tests, but the libc::read() isn't interrupted (the task doesn't fail until the read finishes and it hits task::yield() at some point).
I learned a lot looking at libcore sources, but I can't seem to find a proper solution.
Is there a way to kill a (child) task in Rust even if it's doing some long external function call like a blocking read?
Is there a way to do non-blocking reads on a POSIX file descriptor so that Rust keeps control over the task?
How can I react to signals, e.g. SIGTERM, if the user terminates my program? There doesn't seem to be something like sigaction() in Rust yet.
According to Mozilla, killing a task is no longer possible, for now, let alone interrupting a blocking read.
It will be possible to do so after mozilla/rust/pull/11410, see also my other issue report for rust-zmq erickt/rust-zmq/issues/24 which also depends on this. (sorry about the links)
Maybe the signal listener will work for you.
Is there a way to kill a (child) task in Rust even if it's doing some long external function call like a blocking read?
No.
See also:
How does Rust handle killing threads?
How to terminate or suspend a Rust thread from another thread?
What is the standard way to get a Rust thread out of blocking operations?
Is there a way to do non-blocking reads [...] so that Rust keeps control over the task?
Yes.
See also:
How can I read non-blocking from stdin?
How do I read the output of a child process without blocking in Rust?
How can I force a thread that is blocked reading from a file to resume in Rust?
Force non blocking read with TcpStream
on a POSIX file descriptor
Yes.
See also:
How can I read from a specific raw file descriptor in Rust?
How do I write to a specific raw file descriptor from Rust?
How to get tokio-io's async_read for a File handle
How to asynchronously read a file?
How can I react to signals
Decide your desired platform support, then pick an appropriate crate.
See also:
How to catch signals in Rust
Is there a way to listen to signals on Windows
How to handle SIGSEGV signal in userspace using Rust?
Putting it all together
use future::Either;
use signal_hook::iterator::Signals;
use std::os::unix::io::FromRawFd;
use tokio::{fs::File, io, prelude::*};

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn main() -> Result<()> {
    let signals = Signals::new(&[signal_hook::SIGUSR1])?;
    let signals = signals.into_async()?;

    let input = unsafe { std::fs::File::from_raw_fd(5) };
    let input = File::from_std(input);
    let lines = io::lines(std::io::BufReader::new(input));

    let signals = signals.map(Either::A);
    let lines = lines.map(Either::B);
    let combined = signals.select(lines);

    tokio::run({
        combined
            .map_err(|e| panic!("Early error: {}", e))
            .for_each(|v| match v {
                Either::A(signal) => {
                    println!("Got signal: {:?}", signal);
                    Err(())
                }
                Either::B(data) => {
                    println!("Got data: {:?}", data);
                    Ok(())
                }
            })
    });

    Ok(())
}
Cargo.toml
[package]
name = "future_example"
version = "0.1.0"
authors = ["An Devloper <an.devloper@example.com>"]
edition = "2018"
[dependencies]
tokio = "0.1.22"
signal-hook = { version = "0.1.9", features = ["tokio-support"] }
shim.sh
#!/bin/bash
set -eu
exec 5< /tmp/testpipe
exec ./target/debug/future_example
Execution
cargo build
mkfifo /tmp/testpipe
./shim.sh
Another terminal
printf 'hello\nthere\nworld' > /tmp/testpipe
kill -s usr1 $PID_OF_THE_PROCESS
