Why does Command's stdin write_all never terminate? - rust

I have the following code. It runs python3 solution.py, whose stdin is supplied from the variable input:
use std::io::Write;
use std::process::{Command, Stdio};

let mut cmd = Command::new("python3");
let mut child = cmd
    .stdin(Stdio::piped())
    .stdout(Stdio::piped())
    .arg("solution.py")
    .spawn()
    .expect("Failed to execute solution");
println!("Writing to stdin");
child
    .stdin
    .as_mut()
    .unwrap()
    .write_all(input.as_bytes())
    .unwrap();
// never printed
println!("Finish writing to stdin");
child.stdin.as_mut().unwrap().flush().unwrap();
It always runs OK (and terminates quickly) when the input variable is small. But when it is ~3 MB (a large String), it never terminates. I guess there is a deadlock somewhere, since the CPU usage is low.
Could you suggest how to make it run with large input? And why does it seem to suffer from a deadlock?
Update
Simplified version of solution.py where this problem still happens (it just prints whitespace strings):
t = int(input())
for tt in range(t):
    input()
    res = [' '] * 1000
    result = ''.join(res)
    print("Case #{}: {}".format(tt+1, result))
Interestingly, if I change line 4 to the following, the program terminates:
    res = [' '] * 100
It is probably due to the large output size?

Your Rust program is writing to a pipe connected to the stdin of the child process. The child process is writing to its stdout, which is again a pipe. The operating system buffers some amount of data for each pipe, but when a buffer is full it waits for the process on the reading end to consume the data before accepting any further writes.
The write_all() call is constantly writing data into the stdin pipe, which is read by the child process. The child process is writing data to its stdout pipe, but nobody is consuming that data. Once the stdout pipe buffer fills up, the child process blocks trying to write further data, so it stops reading from stdin. That means the stdin pipe buffer also fills up, at which point the parent process blocks as well while trying to write further data.
The easiest way to resolve this is to move the writing to the stdin pipe to a separate thread, and to add code in the main thread that reads from stdout. This way, your Rust program reads and writes data in parallel, and no deadlock will occur. The documentation of std::process has an example demonstrating this approach.
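A minimal sketch of that thread-based approach, adapted to the code from the question (the generated string stands in for the real ~3 MB input, and error handling is kept simple):

use std::io::{Read, Write};
use std::process::{Command, Stdio};
use std::thread;

fn main() {
    let input = " ".repeat(3 * 1024 * 1024); // stand-in for the real input

    let mut child = Command::new("python3")
        .arg("solution.py")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to execute solution");

    // Take ownership of the stdin handle and write on a separate thread,
    // leaving the main thread free to drain stdout at the same time.
    let mut stdin = child.stdin.take().expect("Failed to open stdin");
    let writer = thread::spawn(move || {
        stdin
            .write_all(input.as_bytes())
            .expect("Failed to write to stdin");
        // stdin is dropped here, closing the pipe so the child sees EOF
    });

    // Meanwhile, consume the child's stdout so its pipe buffer never fills up.
    let mut output = String::new();
    child
        .stdout
        .take()
        .expect("Failed to open stdout")
        .read_to_string(&mut output)
        .expect("Failed to read stdout");

    writer.join().expect("Writer thread panicked");
    child.wait().expect("Failed to wait on child");
    println!("Got {} bytes of output", output.len());
}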

The accepted answer explains the underlying issue and gives directions for solving it using the standard library.
In addition to that, you can use the subprocess crate, whose communicate() method (modeled after Python's Popen.communicate()) is designed to prevent deadlocks in situations like yours:
let (out, _) = subprocess::Exec::cmd("python3")
    .arg("solution.py")
    .stdin("some data")
    .communicate()?
    .read_string()?;
// out is a String
Disclaimer: I'm the author of subprocess.

Related

Pass stdin to Command without interfering with user stdin

I've been having trouble succinctly describing what I'm trying to accomplish, but here it is: I'm attempting to write a program that will execute other programs/scripts as plugins and, during that execution, facilitate communication between my program and the other program/script for various requests (like asking for a user list, for example). However, if these other programs/scripts need to communicate with the end user, I'd also like not to get in the way of that. I explored some other means of doing this (like D-Bus, an HTTP API, a different file handle, etc.), and all those options are either too heavy a solution for what I'm trying to do, or I didn't have much success implementing them (as was the case with the last idea); just using normal STDIN/STDOUT seems like the cleanest path forward.
Here's what I have at the moment:
let mut child = Command::new("test.pl")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.expect("Failed to spawn child process");
let mut stdin = child.stdin.as_mut().unwrap();
writeln!(&mut stdin, "Hey, testing!").unwrap();
let stdout = child.stdout.as_mut().unwrap();
let stdout_reader = BufReader::new(stdout);
let stdout_lines = stdout_reader.lines();
for line in stdout_lines {
match line.unwrap().as_str() {
"member_list" => {
println!("Here's a member list");
let mlist = cluster::members_list()?;
for (member, port) in mlist.iter() {
writeln!(&mut stdin, "{}:{}", member, port);
}
}
e => {
println!("{}", e);
}
};
}
child.wait().unwrap();
This works well for communication between my program and the other one, passing STDOUT through to the user when it doesn't match any of the keywords expected for communication. However, I can't think of a way to pass information to the STDIN of the other program without getting in the way of that program requesting input from the user (such as when the program needs to ask the user, not my program, for input). I tried a silly usage of drop(stdin); and redeclaring stdin later on, which of course didn't work due to scope issues.
I'm really trying to avoid acting as an intermediary for STDIN since that seems like it would be terribly messy. I would appreciate any insight into how I might be able to accomplish this. I'm also open to other ideas on facilitating this communication other than through STDIN/STDOUT.
Thanks all!
You could use an anonymous pipe. On Linux, at a high level, that would look like this:
Your process creates the pipe using the pipe system call, which returns a pair of FDs, one for the reading end of the pipe and one for the writing end.
Your process then forks. In the child process, you then call exec to start the program you want to run, passing it one of the FDs as a command line argument.
The new program takes the FD and uses it to communicate with the parent process.
Both the parent and the child process should take care to close the FD representing the half of the connection they are not using.
You can do this in Rust - the needed system calls are exposed in the libc crate, or by the more Rust-friendly nix crate.
I also stumbled across the interprocess crate, which provides an abstraction over these low-level calls.
These newly created FDs are completely independent of the stdin/stdout/stderr of the process.
If bidirectional communication is required, you could use two such pipes (one for each direction) - however, in that case it is probably easier to use an anonymous socket, which you create using the socketpair system call. This works in a similar fashion to the pipe, returning a pair of FDs - except that in this case the FDs are bidirectional. Again, libc and nix expose this system call. A sketch of the pipe-based approach follows below.
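For illustration, here is a rough sketch of the parent side of the pipe-based approach, using the libc crate directly. The ./plugin path and the convention of passing the write-end FD number as the first argument are made up for this example:

use std::io::{self, Read};
use std::os::unix::io::FromRawFd;
use std::process::Command;

fn main() -> io::Result<()> {
    // pipe(2) fills fds[0] with the read end and fds[1] with the write end.
    let mut fds: [libc::c_int; 2] = [0; 2];
    if unsafe { libc::pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    let (read_fd, write_fd) = (fds[0], fds[1]);

    // Spawn the child, telling it which FD to write to. The FD survives
    // exec because raw FDs created via libc do not have CLOEXEC set.
    let mut child = Command::new("./plugin") // hypothetical plugin binary
        .arg(write_fd.to_string())
        .spawn()?;

    // The parent closes the end it does not use.
    unsafe { libc::close(write_fd) };

    // Read whatever the child writes; EOF arrives once the child closes
    // its copy of the write end.
    let mut pipe = unsafe { std::fs::File::from_raw_fd(read_fd) };
    let mut buf = String::new();
    pipe.read_to_string(&mut buf)?;
    println!("plugin said: {}", buf);

    child.wait()?;
    Ok(())
}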

How to read binary data over a pipe from another process in python?

I launch another process using the following command:
p = subprocess.Popen(binFilePath, shell=False, stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=False)
According to the documentation, p.stdout should now be a binary data stream, since universal_newlines is set to False.
If the other program now sends binary data, how can I read it? A call to the following does not return, even though there is a limited amount of data waiting to be read:
returnedData = p.stdout.read()
I want to read exactly the amount of data waiting in the pipe (if data is available, and otherwise block until data is available). So how do I do that?
It's not simple, as Python is not designed for this kind of work the way other programming languages are.
First: You have to switch the pipe to non-blocking mode. Otherwise a call to read() will block in all cases. This is done with the following code:
import fcntl
import os

fd = p.stdout.fileno()
fl = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
Note: Be aware that p.stdout is not your output stream to the other process but the process's output stream, which is your input stream.
Now that we have a non-blocking stream, we can proceed.
Second: Wait until data is available. This can be done with select():
import select

streams = [p.stdout]
temp0 = []
readable, writable, exceptional = select.select(streams, temp0, temp0, 5)
if len(readable) == 0:
    raise Exception("Timeout of 5 seconds reached!")
As far as I know, exceptional will never receive any data, since we are dealing with pipes here.
Third: Now let's read the data:
temp = bytearray(4096)
numberOfBytesReceived = p.stdout.readinto(temp)
if numberOfBytesReceived <= 0:
    raise Exception("No data received!")
Additional information:
Of course, you have no idea how much data the sender actually sent. You have to read data repeatedly and check whether you have all of the sending process's output. You can only be sure of that either when the process closes the stream - but that would render this question completely obsolete, as then there would be no need for this kind of I/O implementation at all - or when some specific all-data-sent marker has been sent.
Additional note:
If you are required to perform multiple reads on the pipe in order to fully read a meaningful chunk of data sent by another process, you will end up doing this in a loop and appending data to a buffer. This requires copying data from the temporary buffer that p.stdout.readinto(temp) has been writing to into the real buffer where you want to keep the data. As far as I know, there is no more efficient way in Python, as readinto() always (!) writes to the beginning of a pre-allocated buffer; it is not possible to write data at a specific offset as is possible in other programming languages. If there really is no other way, as it seems to me here, this must be considered a design flaw in the Python API.

Reading stdout of child process unbuffered

I'm trying to read the output of a Python script launched by Node.js as it arrives. However, I only get access to the data once the process has finished.
var proc, args;

args = [
    './bin/build_map.py',
    '--min_lon',
    opts.sw.lng,
    '--max_lon',
    opts.ne.lng,
    '--min_lat',
    opts.sw.lat,
    '--max_lat',
    opts.ne.lat,
    '--city',
    opts.city
];

proc = spawn('python', args);

proc.stdout.on('data', function (buf) {
    console.log(buf.toString());
    socket.emit('map-creation-response', buf.toString());
});
If I launch the process with { stdio : 'inherit' } I can see the output as it happens, directly in the console. But doing something like proc.stdout.on('data', ...) will not work.
How do I make sure I can read the output from the child process as it arrives and direct it somewhere else?
The process doing the buffering is Python: it knows its output is being redirected and not really going to the terminal. You can easily tell Python not to do this buffering: just run python -u instead of python. It should be as easy as that.
When a process is spawned by child_process.spawn(), the streams connected to the child process's standard output and standard error are actually unbuffered on the Node.js side. To illustrate this, consider the following program:
const spawn = require('child_process').spawn;

var proc = spawn('bash', [
    '-c',
    'for i in $(seq 1 80); do echo -n .; sleep 1; done'
]);

proc.stdout
    .on('data', function (b) {
        process.stdout.write(b);
    })
    .on('close', function () {
        process.stdout.write("\n");
    });
This program runs bash and has it emit . characters every second for 80 seconds, while consuming the child process's standard output via data events. You should notice that the dots are emitted by the Node program every second, confirming that buffering does not occur on the Node.js side.
Also, as explained in the Node.js documentation on child_process:
By default, pipes for stdin, stdout and stderr are established between
the parent Node.js process and the spawned child. It is possible to
stream data through these pipes in a non-blocking way. Note, however,
that some programs use line-buffered I/O internally. While that does
not affect Node.js, it can mean that data sent to the child process
may not be immediately consumed.
You may want to confirm that your Python program does not buffer its output. If you feel you're emitting data from your Python program as separate distinct writes to standard output, consider running sys.stdout.flush() following each write to suggest that Python should actually write data instead of trying to buffer it.
Update: In this commit, that passage from the Node.js documentation was removed for the following reason:
doc: remove confusing note about child process stdio
It’s not obvious what the paragraph is supposed to say. In particular,
whether and what kind of buffering mechanism a process uses for its
stdio streams does not affect that, in general, no guarantees can be
made about when it consumes data that was sent to it.
This suggests that there could be buffering at play before the Node.js process receives data. In spite of this, care should be taken to ensure that processes within your control upstream of Node.js are not buffering their output.

Linux: is there a way to use named fifos on the writer side in non-blocking mode?

I've found many questions and answers about pipes on Linux, but almost all discuss the reader side.
For a process that shall be ready to deliver data to a named pipe as soon as data is available and a reading process is connected, is there a way to, in a non-blocking fashion:
1. wait (poll(2)) for a reader to open the pipe,
2. wait in a loop (again poll(2)) for a signal that writing to the pipe will not block, and
3. when such a signal is received, check how many bytes may be written to the pipe without blocking?
I understand how to do (2.), but I wasn't able to find consistent answers for (1.) and (3.).
EDIT: I was looking for (something like) FIONWRITE for pipes, but Linux does not have FIONWRITE (for pipes) (?)
EDIT2: The intended main loop for the writer (kind of pseudocode; the target language is C/C++ - a sketch of steps 1 and 2 follows the pseudocode):
forever
    poll(can_read_command, can_write_to_the_fifo)
    if (can_read_command) {
        read and parse command
        update internal status
        continue
    }
    if (can_write_to_the_fifo) {
        length = min(data_available, space_for_nonblocking_write)
        write(output_fifo, buffer, length)
        update internal status
        continue
    }
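As a rough sketch of steps 1 and 2 - in Rust rather than C/C++, and assuming the FIFO lives at /tmp/testpipe - note that a write-only, non-blocking open of a FIFO fails with ENXIO until a reader has opened the other end, so the open must be retried. Linux has no FIONWRITE for pipes, so step 3 is approximated here by capping each write at PIPE_BUF bytes (up to which pipe writes are atomic) and treating WouldBlock as a full buffer:

use std::fs::OpenOptions;
use std::io::{ErrorKind, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::os::unix::io::AsRawFd;
use std::{thread, time::Duration};

fn main() -> std::io::Result<()> {
    // Step 1: a non-blocking open of a FIFO for writing fails with ENXIO
    // while no reader has the other end open, so retry until one connects.
    let mut fifo = loop {
        match OpenOptions::new()
            .write(true)
            .custom_flags(libc::O_NONBLOCK)
            .open("/tmp/testpipe")
        {
            Ok(f) => break f,
            Err(e) if e.raw_os_error() == Some(libc::ENXIO) => {
                thread::sleep(Duration::from_millis(100)); // no reader yet
            }
            Err(e) => return Err(e),
        }
    };

    // Step 2: poll(2) with POLLOUT signals when a write will not block.
    let mut pfd = libc::pollfd {
        fd: fifo.as_raw_fd(),
        events: libc::POLLOUT,
        revents: 0,
    };
    if unsafe { libc::poll(&mut pfd, 1, 5000) } > 0
        && pfd.revents & libc::POLLOUT != 0
    {
        // Step 3 (approximation): cap the chunk at PIPE_BUF bytes and
        // treat WouldBlock as "pipe buffer is currently full".
        let chunk = vec![b'.'; libc::PIPE_BUF];
        match fifo.write(&chunk) {
            Ok(n) => println!("wrote {} bytes without blocking", n),
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                println!("pipe buffer is full right now");
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}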

How to handle long running external function calls such as blocking I/O in Rust?

Editor's note: This question is from a version of Rust prior to 1.0 and uses terms and functions that do not exist in Rust 1.0 code. The concepts expressed are still relevant.
I need to read data provided by an external process via a POSIX file descriptor in my Rust program. The file descriptor connection is kept up for a very long time (hours), and the other side passes data to me from time to time. I need to read and process the data stream continuously.
To do so, I wrote a loop that calls libc::read() (readv, actually) to read the data and processes it when received. Since this would block the whole scheduler, I'm spawning a task on a new scheduler (task::spawn_sched(SingleThreaded)). This works fine as long as it runs, but I can't find a way to cleanly shut down the loop.
Since the loop is blocking most of the time, I can't use a port/channel to notify the loop to exit.
I tried to kill the loop task by taking it down using a failing linked task (spawn the loop task supervised, spawn a linked task within it, and wait for a signal on a port before fail!()ing and taking the loop task down with it). It works well in tests, but the libc::read() isn't interrupted (the task doesn't fail until the read finishes and hits a task::yield() at some point).
I learned a lot looking at libcore sources, but I can't seem to find a proper solution.
Is there a way to kill a (child) task in Rust even if it's doing some long external function call like a blocking read?
Is there a way to do non-blocking reads on a POSIX file descriptor so that Rust keeps control over the task?
How can I react to signals, e.g. SIGTERM if the user terminates my program? There doesn't seem to be anything like sigaction() in Rust yet.
According to Mozilla, killing a task is no longer possible, for now, let alone interrupting a blocking read.
It will become possible after mozilla/rust/pull/11410; see also my issue report for rust-zmq, erickt/rust-zmq/issues/24, which also depends on this. (Sorry about the links.)
Maybe the signal listener will work for you.
Is there a way to kill a (child) task in Rust even if it's doing some long external function call like a blocking read?
No.
See also:
How does Rust handle killing threads?
How to terminate or suspend a Rust thread from another thread?
What is the standard way to get a Rust thread out of blocking operations?
Is there a way to do non-blocking reads [...] so that Rust keeps control over the task?
Yes.
See also:
How can I read non-blocking from stdin?
How do I read the output of a child process without blocking in Rust?
How can I force a thread that is blocked reading from a file to resume in Rust?
Force non blocking read with TcpStream
on a POSIX file descriptor
Yes.
See also:
How can I read from a specific raw file descriptor in Rust?
How do I write to a specific raw file descriptor from Rust?
How to get tokio-io's async_read for a File handle
How to asynchronously read a file?
How can I react to signals
Decide your desired platform support, then pick an appropriate crate.
See also:
How to catch signals in Rust
Is there a way to listen to signals on Windows
How to handle SIGSEGV signal in userspace using Rust?
Putting it all together
use futures::future::Either;
use signal_hook::iterator::Signals;
use std::os::unix::io::FromRawFd;
use tokio::{fs::File, io, prelude::*};

type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn main() -> Result<()> {
    let signals = Signals::new(&[signal_hook::SIGUSR1])?;
    let signals = signals.into_async()?;

    let input = unsafe { std::fs::File::from_raw_fd(5) };
    let input = File::from_std(input);
    let lines = io::lines(std::io::BufReader::new(input));

    let signals = signals.map(Either::A);
    let lines = lines.map(Either::B);
    let combined = signals.select(lines);

    tokio::run({
        combined
            .map_err(|e| panic!("Early error: {}", e))
            .for_each(|v| match v {
                Either::A(signal) => {
                    println!("Got signal: {:?}", signal);
                    Err(())
                }
                Either::B(data) => {
                    println!("Got data: {:?}", data);
                    Ok(())
                }
            })
    });

    Ok(())
}
Cargo.toml
[package]
name = "future_example"
version = "0.1.0"
authors = ["An Devloper <an.devloper@example.com>"]
edition = "2018"

[dependencies]
futures = "0.1.29"
tokio = "0.1.22"
signal-hook = { version = "0.1.9", features = ["tokio-support"] }
shim.sh
#!/bin/bash
set -eu
exec 5< /tmp/testpipe
exec ./target/debug/future_example
Execution
cargo build
mkfifo /tmp/testpipe
./shim.sh
Another terminal
printf 'hello\nthere\nworld' > /tmp/testpipe
kill -s usr1 $PID_OF_THE_PROCESS
