Repeatedly read subprocess output with tokio-process - rust

For a GUI tool I'm writing in Rust I'd like to kick off long-lived subprocesses and then repeatedly poll them for output. I don't want to block indefinitely waiting for output from any given one, so I'm using tokio and tokio-process to run the process and timeout the output reading. I need to store the subprocess in a struct as well.
I'm running into problems because it seems I need to consume the subprocess's output stream in order to read from it, so after a single read I'm unable to access it again.
I've included a simplified reproduction of my problem below. It prints actual output for the first invocation of print_output, but then prints No process_output_fut for the second invocation since I had to take the output future out of the Plugin struct and consume it in order to read from it.
Any suggestions for how to refactor this code to avoid consuming the process output future with each attempt to fetch the output?
extern crate tokio;
use std::process::{Command, Stdio};
use std::time::Duration;
use tokio::prelude::*;
use tokio::codec::{FramedRead, LinesCodec};
use tokio::prelude::stream::StreamFuture;
use tokio::timer::Timeout;
use tokio_process::{Child, ChildStdout, CommandExt};
pub struct Plugin {
process: Option<Child>,
process_output_fut: Option<Timeout<StreamFuture<FramedRead<ChildStdout, LinesCodec>>>>,
tokio_runtime: tokio::runtime::Runtime,
}
impl Plugin {
pub fn new() -> Self {
return Plugin {
process: None,
process_output_fut: None,
tokio_runtime: tokio::runtime::Runtime::new().expect(
"Could not create tokio runtime"
),
}
}
pub fn spawn(&mut self, arg: String) {
let mut process = Command::new("/bin/ls")
.arg(arg)
.stdout(Stdio::piped())
.spawn_async()
.expect("Failed to spawn command");
if let Some(plugin_output) = process.stdout().take() {
let lines_codec = LinesCodec::new();
self.process_output_fut = Some(FramedRead::new(plugin_output, lines_codec)
.into_future()
.timeout(Duration::from_millis(3000)));
}
self.process = Some(process);
}
pub fn print_output(&mut self) {
if let Some(process_output_fut) = self.process_output_fut.take() {
let result = self.tokio_runtime.block_on(process_output_fut).unwrap();
if let Some(output) = result.0 {
println!("{:?}", output);
};
} else {
println!("No process_output_fut");
}
}
}
fn main() {
let mut p = Plugin::new();
p.spawn("/");
p.print_output();
p.print_output();
}

Related

Calling an async function synchronously with tokio [duplicate]

I am trying to use hyper to grab the content of an HTML page and would like to synchronously return the output of a future. I realized I could have picked a better example since synchronous HTTP requests already exist, but I am more interested in understanding whether we could return a value from an async calculation.
extern crate futures;
extern crate hyper;
extern crate hyper_tls;
extern crate tokio;
use futures::{future, Future, Stream};
use hyper::Client;
use hyper::Uri;
use hyper_tls::HttpsConnector;
use std::str;
fn scrap() -> Result<String, String> {
let scraped_content = future::lazy(|| {
let https = HttpsConnector::new(4).unwrap();
let client = Client::builder().build::<_, hyper::Body>(https);
client
.get("https://hyper.rs".parse::<Uri>().unwrap())
.and_then(|res| {
res.into_body().concat2().and_then(|body| {
let s_body: String = str::from_utf8(&body).unwrap().to_string();
futures::future::ok(s_body)
})
}).map_err(|err| format!("Error scraping web page: {:?}", &err))
});
scraped_content.wait()
}
fn read() {
let scraped_content = future::lazy(|| {
let https = HttpsConnector::new(4).unwrap();
let client = Client::builder().build::<_, hyper::Body>(https);
client
.get("https://hyper.rs".parse::<Uri>().unwrap())
.and_then(|res| {
res.into_body().concat2().and_then(|body| {
let s_body: String = str::from_utf8(&body).unwrap().to_string();
println!("Reading body: {}", s_body);
Ok(())
})
}).map_err(|err| {
println!("Error reading webpage: {:?}", &err);
})
});
tokio::run(scraped_content);
}
fn main() {
read();
let content = scrap();
println!("Content = {:?}", &content);
}
The example compiles and the call to read() succeeds, but the call to scrap() panics with the following error message:
Content = Err("Error scraping web page: Error { kind: Execute, cause: None }")
I understand that I failed to launch the task properly before calling .wait() on the future but I couldn't find how to properly do it, assuming it's even possible.
Standard library futures
Let's use this as our minimal, reproducible example:
async fn example() -> i32 {
42
}
Call executor::block_on:
use futures::executor; // 0.3.1
fn main() {
let v = executor::block_on(example());
println!("{}", v);
}
Tokio
Use the tokio::main attribute on any function (not just main!) to convert it from an asynchronous function to a synchronous one:
use tokio; // 0.3.5
#[tokio::main]
async fn main() {
let v = example().await;
println!("{}", v);
}
tokio::main is a macro that transforms this
#[tokio::main]
async fn main() {}
Into this:
fn main() {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(async { {} })
}
This uses Runtime::block_on under the hood, so you can also write this as:
use tokio::runtime::Runtime; // 0.3.5
fn main() {
let v = Runtime::new().unwrap().block_on(example());
println!("{}", v);
}
For tests, you can use tokio::test.
async-std
Use the async_std::main attribute on the main function to convert it from an asynchronous function to a synchronous one:
use async_std; // 1.6.5, features = ["attributes"]
#[async_std::main]
async fn main() {
let v = example().await;
println!("{}", v);
}
For tests, you can use async_std::test.
Futures 0.1
Let's use this as our minimal, reproducible example:
use futures::{future, Future}; // 0.1.27
fn example() -> impl Future<Item = i32, Error = ()> {
future::ok(42)
}
For simple cases, you only need to call wait:
fn main() {
let s = example().wait();
println!("{:?}", s);
}
However, this comes with a pretty severe warning:
This method is not appropriate to call on event loops or similar I/O situations because it will prevent the event loop from making progress (this blocks the thread). This method should only be called when it's guaranteed that the blocking work associated with this future will be completed by another thread.
Tokio
If you are using Tokio 0.1, you should use Tokio's Runtime::block_on:
use tokio; // 0.1.21
fn main() {
let mut runtime = tokio::runtime::Runtime::new().expect("Unable to create a runtime");
let s = runtime.block_on(example());
println!("{:?}", s);
}
If you peek in the implementation of block_on, it actually sends the future's result down a channel and then calls wait on that channel! This is fine because Tokio guarantees to run the future to completion.
See also:
How can I efficiently extract the first element of a futures::Stream in a blocking manner?
As this is the top result that come up in search engines by the query "How to call async from sync in Rust", I decided to share my solution here. I think it might be useful.
As #Shepmaster mentioned, back in version 0.1 futures crate had beautiful method .wait() that could be used to call an async function from a sync one. This must-have method, however, was removed from later versions of the crate.
Luckily, it's not that hard to re-implement it:
trait Block {
fn wait(self) -> <Self as futures::Future>::Output
where Self: Sized, Self: futures::Future
{
futures::executor::block_on(self)
}
}
impl<F,T> Block for F
where F: futures::Future<Output = T>
{}
After that, you can just do following:
async fn example() -> i32 {
42
}
fn main() {
let s = example().wait();
println!("{:?}", s);
}
Beware that this comes with all the caveats of original .wait() explained in the #Shepmaster's answer.
This works for me using tokio:
tokio::runtime::Runtime::new()?.block_on(fooAsyncFunction())?;

Why does multithreaded execution not work when the subroutine is expensive? [duplicate]

This question already has an answer here:
Why are my Rust threads not running in parallel?
(1 answer)
Closed 3 years ago.
This program randomly prints the index number such as 1, 4, 2, 3, 100 ....
use std::thread;
fn main() {
for x in 0..100 {
print!("{}: {:?} ", x, child.join());
}
}
However, once I add the ping() function, which does something other than console output, it no longer executes concurrently, instead just iterating the ping() function.
extern crate serde;
extern crate serde_derive;
extern crate reqwest;
use reqwest::Client;
use std::thread;
use std::process::Command;
fn main() {
for x in 0..100 {
let child = thread::spawn(move || {
ping(x);
});
print!("{}: {:?} ", x, child.join());
}
}
fn ping(x: i32) {
let output = if cfg!(target_os = "windows") {
Command::new("cmd")
.args(&["/C", "echo hello"])
.output()
.expect("failed to execute process")
} else {
Command::new("sh")
.arg("-c")
.arg("https://keisugano.blogspot.com/")
.output()
.expect("failed to execute process")
};
println!("{:?}", output);
}
What is the root cause of this and how do I fix it?
The first example is incomplete without actually spawning threads, so not sure what happened there.
The key point here is that join is blocking, meaning it won't return until the underlying thread is completed. From the documentation:
pub fn join(self) -> Result<T>
Waits for the associated thread to finish.
In terms of atomic memory orderings, the completion of the associated thread synchronizes with this function returning. In other words, all operations performed by that thread are ordered before all operations that happen after join returns.
With this new knowledge, it is clear that what your code actually did was: create a new thread, wait for it to its completion, and then create the next one. So it is still sequntial and clearly not what you intended.
The solution is straightforward: create all the threads first, then wait them to finish as showed below:
use std::thread;
fn main() {
let handles = (0..100)
.into_iter()
.map(|x| {
thread::spawn(move || {
ping(x);
})
})
.collect::<Vec<_>>();
for thread in handles {
thread.join().unwrap();
}
}
fn ping(x: i32) {
// Do things.
}

How to add special NotReady logic to tokio-io?

I'm trying to make a Stream that would wait until a specific character is in buffer. I know there's read_until() on BufRead but I actually need a custom solution, as this is a stepping stone to implement waiting until a specific string in in buffer (or, for example, a regexp match happens).
In my project where I first encountered the problem, problem was that future processing just hanged when I get a Ready(_) from inner future and return NotReady from my function. I discovered I shouldn't do that per docs (last paragraph). However, what I didn't get, is what's the actual alternative that is promised in that paragraph. I read all the published documentation on the Tokio site and it doesn't make sense for me at the moment.
So following is my current code. Unfortunately I couldn't make it simpler and smaller as it's already broken. Current result is this:
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
<ad infinum>
Expected result is getting some Ok(Ready(_)) out of it, while printing W and W', and waiting for specific character in buffer.
extern crate futures;
extern crate tokio_core;
extern crate tokio_io;
extern crate tokio_io_timeout;
extern crate tokio_process;
use futures::stream::poll_fn;
use futures::{Async, Poll, Stream};
use tokio_core::reactor::Core;
use tokio_io::AsyncRead;
use tokio_io_timeout::TimeoutReader;
use tokio_process::CommandExt;
use std::process::{Command, Stdio};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
struct Process {
child: tokio_process::Child,
stdout: Arc<Mutex<tokio_io_timeout::TimeoutReader<tokio_process::ChildStdout>>>,
}
impl Process {
fn new(
command: &str,
reader_timeout: Option<Duration>,
core: &tokio_core::reactor::Core,
) -> Self {
let mut cmd = Command::new(command);
let cat = cmd.stdout(Stdio::piped());
let mut child = cat.spawn_async(&core.handle()).unwrap();
let stdout = child.stdout().take().unwrap();
let mut timeout_reader = TimeoutReader::new(stdout);
timeout_reader.set_timeout(reader_timeout);
let timeout_reader = Arc::new(Mutex::new(timeout_reader));
Self {
child,
stdout: timeout_reader,
}
}
}
fn work() -> Result<(), ()> {
let window = Arc::new(Mutex::new(Vec::new()));
let mut core = Core::new().unwrap();
let process = Process::new("cat", Some(Duration::from_secs(20)), &core);
let mark = Arc::new(Mutex::new(b'c'));
let read_until_stream = poll_fn({
let window = window.clone();
let timeout_reader = process.stdout.clone();
move || -> Poll<Option<u8>, std::io::Error> {
let mut buf = [0; 8];
let poll;
{
let mut timeout_reader = timeout_reader.lock().unwrap();
poll = timeout_reader.poll_read(&mut buf);
}
match poll {
Ok(Async::Ready(0)) => Ok(Async::Ready(None)),
Ok(Async::Ready(x)) => {
{
let mut window = window.lock().unwrap();
println!("W: {:?}", *window);
println!("buf: {:?}", &buf[0..x]);
window.extend(buf[0..x].into_iter().map(|x| *x));
println!("W': {:?}", *window);
if let Some(_) = window.iter().find(|c| **c == *mark.lock().unwrap()) {
Ok(Async::Ready(Some(1)))
} else {
Ok(Async::NotReady)
}
}
}
Ok(Async::NotReady) => Ok(Async::NotReady),
Err(e) => Err(e),
}
}
});
let _stream_thread = thread::spawn(move || {
for o in read_until_stream.wait() {
println!("{:?}", o);
}
});
match core.run(process.child) {
Ok(_) => {}
Err(e) => {
println!("Child error: {:?}", e);
}
}
Ok(())
}
fn main() {
work().unwrap();
}
This is complete example project.
If you need more data you need to call poll_read again until you either find what you were looking for or poll_read returns NotReady.
You might want to avoid looping in one task for too long, so you can build yourself a yield_task function to call instead if poll_read didn't return NotReady; it makes sure your task gets called again ASAP after other pending tasks were run.
To use it just run return yield_task();.
fn yield_inner() {
use futures::task;
task::current().notify();
}
#[inline(always)]
pub fn yield_task<T, E>() -> Poll<T, E> {
yield_inner();
Ok(Async::NotReady)
}
Also see futures-rs#354: Handle long-running, always-ready futures fairly #354.
With the new async/await API futures::task::current is gone; instead you'll need a std::task::Context reference, which is provided as parameter to the new std::future::Future::poll trait method.
If you're already manually implementing the std::future::Future trait you can simply insert:
context.waker().wake_by_ref();
return std::task::Poll::Pending;
Or build yourself a Future-implementing type that yields exactly once:
pub struct Yield {
ready: bool,
}
impl core::future::Future for Yield {
type Output = ();
fn poll(self: core::pin::Pin<&mut Self>, cx: &mut core::task::Context<'_>) -> core::task::Poll<Self::Output> {
let this = self.get_mut();
if this.ready {
core::task::Poll::Ready(())
} else {
cx.waker().wake_by_ref();
this.ready = true; // ready next round
core::task::Poll::Pending
}
}
}
pub fn yield_task() -> Yield {
Yield { ready: false }
}
And then use it in async code like this:
yield_task().await;

Split a futures connection into a sink and a stream and use them in two different tasks

I'm experimenting with the futures API using the websocket library. I have this code:
use futures::future::Future;
use futures::future;
use futures::sink::Sink;
use futures::stream::Stream;
use futures::sync::mpsc::channel;
use futures::sync::mpsc::{Sender, Receiver};
use tokio_core::reactor::Core;
use websocket::{ClientBuilder, OwnedMessage};
pub fn main() {
let mut core = Core::new().unwrap();
let handle = core.handle();
let handle_clone = handle.clone();
let (send, recv): (Sender<String>, Receiver<String>) = channel(100);
let f = ClientBuilder::new("wss://...")
.unwrap()
.async_connect(None, &handle_clone)
.map_err(|e| println!("error: {:?}", e))
.map(|(duplex, _)| duplex.split())
.and_then(move |(sink, stream)| {
// this task consumes the channel, writes messages to the websocket
handle_clone.spawn(future::loop_fn(recv, |recv: Receiver<String>| {
sink.send(OwnedMessage::Close(None))
.and_then(|_| future::ok(future::Loop::Break(())))
.map_err(|_| ())
}));
// the main tasks listens the socket
future::loop_fn(stream, |stream| {
stream
.into_future()
.and_then(|_| future::ok(future::Loop::Break(())))
.map_err(|_| ())
})
});
loop {
core.turn(None)
}
}
After connecting to the server, I want to run "listener" and "sender" tasks without one blocking the other one. The problem is I can't use sink in the new task, it fails with:
error[E0507]: cannot move out of captured outer variable in an `FnMut` closure
--> src/slack_conn.rs:29:17
|
25 | .and_then(move |(sink, stream)| {
| ---- captured outer variable
...
29 | sink.send(OwnedMessage::Close(None))
| ^^^^ cannot move out of captured outer variable in an `FnMut` closure
I could directly use duplex to send and receive, but that leads to worse errors.
Any ideas on how to make this work? Indeed, I'd be happy with any futures code that allows me to non-blockingly connect to a server and spawn two async tasks:
one that reads from the connection and takes some action (prints to screen etc.)
one that reads from a mpsc channel and writes to the connection
It's fine if I have to write it in a different style.
SplitSink implements Sink which defines send to take ownership:
fn send(self, item: Self::SinkItem) -> Send<Self>
where
Self: Sized,
On the other hand, loop_fn requires that the closure be able to be called multiple times. These two things are fundamentally incompatible — how can you call something multiple times which requires consuming a value?
Here's a completely untested piece of code that compiles — I don't have rogue WebSocket servers lying about.
#[macro_use]
extern crate quick_error;
extern crate futures;
extern crate tokio_core;
extern crate websocket;
use futures::{Future, Stream, Sink};
use futures::sync::mpsc::channel;
use tokio_core::reactor::Core;
use websocket::ClientBuilder;
pub fn main() {
let mut core = Core::new().unwrap();
let handle = core.handle();
let (send, recv) = channel(100);
let f = ClientBuilder::new("wss://...")
.unwrap()
.async_connect(None, &handle)
.from_err::<Error>()
.map(|(duplex, _)| duplex.split())
.and_then(|(sink, stream)| {
let reader = stream
.for_each(|i| {
println!("Read a {:?}", i);
Ok(())
})
.from_err();
let writer = sink
.sink_from_err()
.send_all(recv.map_err(Error::Receiver))
.map(|_| ());
reader.join(writer)
});
drop(send); // Close the sending channel manually
core.run(f).expect("Unable to run");
}
quick_error! {
#[derive(Debug)]
pub enum Error {
WebSocket(err: websocket::WebSocketError) {
from()
description("websocket error")
display("WebSocket error: {}", err)
cause(err)
}
Receiver(err: ()) {
description("receiver error")
display("Receiver error")
}
}
}
The points that stuck out during implementation were:
everything has to become a Future eventually
it's way easier to define an error type and convert to it
Knowing if the Item and Error associated types were "right" was tricky. I ended up doing a lot of "type assertions" ({ let x: &Future<Item = (), Error = ()> = &reader; }).

How to implement a long running process with progress in Rust, available via a Rest api?

I am a beginner in Rust.
I have a long running IO-bound process that I want to spawn and monitor via a REST API. I chose Iron for that, following this tutorial . Monitoring means getting its progress and its final result.
When I spawn it, I give it an id and map that id to a resource that I can GET to get the progress. I don't have to be exact with the progress; I can report the progress from 5 seconds ago.
My first attempt was to have a channel via which I send request for progress and receive the status. I got stuck where to store the receiver, as in my understanding it belongs to one thread only. I wanted to put it in the context of the request, but that won't work as there are different threads handling subsequent requests.
What would be the idiomatic way to do this in Rust?
I have a sample project.
Later edit:
Here is a self contained example which follows the sample principle as the answer, namely a map where each thread updates its progress:
extern crate iron;
extern crate router;
extern crate rustc_serialize;
use iron::prelude::*;
use iron::status;
use router::Router;
use rustc_serialize::json;
use std::io::Read;
use std::sync::{Mutex, Arc};
use std::thread;
use std::time::Duration;
use std::collections::HashMap;
#[derive(Debug, Clone, RustcEncodable, RustcDecodable)]
pub struct Status {
pub progress: u64,
pub context: String
}
#[derive(RustcEncodable, RustcDecodable)]
struct StartTask {
id: u64
}
fn start_process(status: Arc<Mutex<HashMap<u64, Status>>>, task_id: u64) {
let c = status.clone();
thread::spawn(move || {
for i in 1..100 {
{
let m = &mut c.lock().unwrap();
m.insert(task_id, Status{ progress: i, context: "in progress".to_string()});
}
thread::sleep(Duration::from_secs(1));
}
let m = &mut c.lock().unwrap();
m.insert(task_id, Status{ progress: 100, context: "done".to_string()});
});
}
fn main() {
let status: Arc<Mutex<HashMap<u64, Status>>> = Arc::new(Mutex::new(HashMap::new()));
let status_clone: Arc<Mutex<HashMap<u64, Status>>> = status.clone();
let mut router = Router::new();
router.get("/:taskId", move |r: &mut Request| task_status(r, &status.lock().unwrap()));
router.post("/start", move |r: &mut Request|
start_task(r, status_clone.clone()));
fn task_status(req: &mut Request, statuses: & HashMap<u64,Status>) -> IronResult<Response> {
let ref task_id = req.extensions.get::<Router>().unwrap().find("taskId").unwrap_or("/").parse::<u64>().unwrap();
let payload = json::encode(&statuses.get(&task_id)).unwrap();
Ok(Response::with((status::Ok, payload)))
}
// Receive a message by POST and play it back.
fn start_task(request: &mut Request, statuses: Arc<Mutex<HashMap<u64, Status>>>) -> IronResult<Response> {
let mut payload = String::new();
request.body.read_to_string(&mut payload).unwrap();
let task_start_request: StartTask = json::decode(&payload).unwrap();
start_process(statuses, task_start_request.id);
Ok(Response::with((status::Ok, json::encode(&task_start_request).unwrap())))
}
Iron::new(router).http("localhost:3000").unwrap();
}
One possibility is to use a global HashMap that associate each worker id with the progress (and result). Here is simple example (without the rest stuff)
#[macro_use]
extern crate lazy_static;
use std::sync::Mutex;
use std::collections::HashMap;
use std::thread;
use std::time::Duration;
lazy_static! {
static ref PROGRESS: Mutex<HashMap<usize, usize>> = Mutex::new(HashMap::new());
}
fn set_progress(id: usize, progress: usize) {
// insert replaces the old value if there was one.
PROGRESS.lock().unwrap().insert(id, progress);
}
fn get_progress(id: usize) -> Option<usize> {
PROGRESS.lock().unwrap().get(&id).cloned()
}
fn work(id: usize) {
println!("Creating {}", id);
set_progress(id, 0);
for i in 0..100 {
set_progress(id, i + 1);
// simulates work
thread::sleep(Duration::new(0, 50_000_000));
}
}
fn monitor(id: usize) {
loop {
if let Some(p) = get_progress(id) {
if p == 100 {
println!("Done {}", id);
// to avoid leaks, remove id from PROGRESS.
// maybe save that the task ends in a data base.
return
} else {
println!("Progress {}: {}", id, p);
}
}
thread::sleep(Duration::new(1, 0));
}
}
fn main() {
let w = thread::spawn(|| work(1));
let m = thread::spawn(|| monitor(1));
w.join().unwrap();
m.join().unwrap();
}
You need to register one channel per request thread, because if cloning Receivers were possible the responses might/will end up with the wrong thread if two request are running at the same time.
Instead of having your thread create a channel for answering requests, use a future. A future allows you to have a handle to an object, where the object doesn't exist yet. You can change the input channel to receive a Promise, which you then fulfill, no output channel necessary.

Resources