tokio with multiqueue sometimes hangs, sometimes works - rust

I'm trying to benchmark the multiqueue crate with tokio to implement something along the lines of publisher/subscriber by making Streams that can be iterated. I'm not convinced of the efficiency (I may need dozens or hundreds of listeners that filter on the items, and the single publisher will be publishing somewhere around 10 messages per millisecond), so I'd like to benchmark the approach before I commit to it. However, right now I'm encountering a strange bug where sometimes the tokio::timer::Interval just doesn't seem to fire at all.
The full code is below:
#![feature(test)]
extern crate futures;
extern crate multiqueue;
extern crate test;
extern crate tokio;
#[cfg(test)]
mod tests {
use super::*;
use futures::future::lazy;
use futures::sync::mpsc::{channel, Receiver, Sender};
use futures::{Async, Poll, Stream};
use futures::{Future, Sink};
use test::Bencher;
use tokio::timer::Interval;
#[bench]
fn bench_many(b: &mut Bencher) {
tokio::run(lazy(|| {
let (tx, rx) = multiqueue::mpmc_fut_queue(1000);
tokio::spawn(
Interval::new_interval(std::time::Duration::from_micros(100))
.take(100)
.map(|_| 100)
.map_err(|e| {
eprintln!("Got interval error = {:?}", e);
})
.fold(tx, |tx, num| {
println!("Sending {}", num);
tx.send(num).map_err(|e| println!("send err = {:?}", e))
})
.map(|_| ()),
);
for i in 0..3 {
println!("Starting");
let rx = rx.clone();
tokio::spawn(rx.for_each(move |num| {
println!("{} Got a num! {}", i, num);
Ok(())
}));
}
Ok(())
}));
}
}
I'm running it with cargo bench. futures is on version "0.1", tokio is on version "0.1", and multiqueue is on version "0.3".
Sometimes the whole test completes with many "[0-2] Got a num! 100" and "Sending 100" messages, but sometimes it hangs, either in the middle (after several "Sending" and "Got a" messages) or with just the 3 "Starting" messages.
I suspect this may be an issue with the number of tasks I can run at the same time with tokio, but I don't really understand why this would be a limitation I'd be running into as both types of tasks I'm spawning yield time to the executor frequently.
How can I make this more reliable?

Related

Can a Tokio task terminate the whole runtime gracefully?

I start up a Tokio runtime with code like this:
tokio::run(my_future);
My future goes on to start a bunch of tasks in response to various conditions.
One of those tasks is responsible for determining when the program should shut down. However, I don't know how to have that task gracefully terminate the program. Ideally, I'd like to find a way for this task to cause the run function call to terminate.
Below is an example of the kind of program I would like to write:
extern crate tokio;
use tokio::prelude::*;
use std::time::Duration;
use std::time::Instant;
use tokio::timer::{Delay, Interval};
fn main() {
let kill_future = Delay::new(Instant::now() + Duration::from_secs(3));
let time_print_future = Interval::new_interval(Duration::from_secs(1));
let mut runtime = tokio::runtime::Runtime::new().expect("failed to start new Runtime");
runtime.spawn(time_print_future.for_each(|t| Ok(println!("{:?}", t))).map_err(|_| ()));
runtime.spawn(
kill_future
.map_err(|_| {
eprintln!("Timer error");
})
.map(move |()| {
// TODO
unimplemented!("Shutdown the runtime!");
}),
);
// TODO
unimplemented!("Block until the runtime is shutdown");
println!("Done");
}
shutdown_now seems promising, but upon further investigation, it's probably not going to work. In particular, it takes ownership of the runtime, and Tokio is probably not going to allow both the main thread (where the runtime was created) and some random task to own the runtime.
You can use a oneshot channel to communicate from inside the runtime to outside. When the delay expires, we send a single message through the channel.
Outside of the runtime, once we receive that message we initiate a shutdown of the runtime and wait for it to finish.
use std::time::{Duration, Instant};
use tokio::{
prelude::*,
runtime::Runtime,
sync::oneshot,
timer::{Delay, Interval},
}; // 0.1.15
fn main() {
let mut runtime = Runtime::new().expect("failed to start new Runtime");
let (tx, rx) = oneshot::channel();
runtime.spawn({
let every_second = Interval::new_interval(Duration::from_secs(1));
every_second
.for_each(|t| Ok(println!("{:?}", t)))
.map_err(drop)
});
runtime.spawn({
let in_three_seconds = Delay::new(Instant::now() + Duration::from_secs(3));
in_three_seconds
.map_err(|_| eprintln!("Timer error"))
.and_then(move |_| tx.send(()))
});
rx.wait().expect("unable to wait for receiver");
runtime
.shutdown_now()
.wait()
.expect("unable to wait for shutdown");
println!("Done");
}
See also:
How do I gracefully shutdown the Tokio runtime in response to a SIGTERM?
Is there any way to shutdown `tokio::runtime::current_thread::Runtime`?
How can I stop the hyper HTTP web server and return an error?

How can I test a future that is bound to a tokio TcpStream?

I have a future which wraps a TCP stream in a Framed using the LinesCodec.
When I try to wrap this in a test, the future blocks around 20% of the time, but because nothing is listening on the socket I'm trying to connect to, I expect to always get the error:
thread 'tokio-runtime-worker-0' panicked at 'error: Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }', src/lib.rs:35:24
note: Run with RUST_BACKTRACE=1 for a backtrace.
This is the test code I have used:
#[macro_use(try_ready)]
extern crate futures; // 0.1.24
extern crate tokio; // 0.1.8
use std::io;
use std::net::SocketAddr;
use tokio::codec::{Framed, LinesCodec};
use tokio::net::TcpStream;
use tokio::prelude::*;
struct MyFuture {
addr: SocketAddr,
}
impl Future for MyFuture {
type Item = Framed<TcpStream, LinesCodec>;
type Error = io::Error;
fn poll(&mut self) -> Result<Async<Framed<TcpStream, LinesCodec>>, io::Error> {
let strm = try_ready!(TcpStream::connect(&self.addr).poll());
Ok(Async::Ready(Framed::new(strm, LinesCodec::new())))
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::net::Shutdown;
#[test]
fn connect() {
let addr: SocketAddr = "127.0.0.1:4222".parse().unwrap();
let fut = MyFuture { addr: addr }
.and_then(|f| {
println!("connected");
let cn = f.get_ref();
cn.shutdown(Shutdown::Both)
}).map_err(|e| panic!("error: {:?}", e));
tokio::run(fut)
}
}
playground
I have seen patterns in other languages where the test binary itself offers a mechanism to return results asynchronously, but haven't found a good way of using a similar mechanism in Rust.
A simple way to test async code may be to use a dedicated runtime for each test: start it, wait for future completion and shutdown the runtime at the end of the test.
#[test]
fn my_case() {
// setup future f
// ...
tokio::run(f);
}
I don't know if there are consolidated patterns already in the Rust ecosystem; see this discussion about the evolution of testing support for future based code.
Why your code does not work as expected
When you invoke poll(), the future is queried to check if a value is available.
If a value is not available, an interest is registered so that poll() will be invoked again when something happens that can resolve the future.
When your MyFuture::poll() is invoked:
1. TcpStream::connect creates a new future, TcpStreamNew.
2. TcpStreamNew::poll is invoked immediately, exactly once, on the future created at step 1.
3. That future then goes out of scope, so the next time MyFuture::poll is invoked, a brand-new future is created and the previously created one is never resolved.
In other words, you register interest in a future that, if it does not resolve the first time you poll it, you never poll again for a value or an error.
The "nondeterministic" behavior arises because the first poll sometimes resolves immediately with a ConnectionRefused error, and sometimes it waits for a connection event or failure that is never retrieved.
Look at mio::sys::unix::tcp::TcpStream used by Tokio:
impl TcpStream {
pub fn connect(stream: net::TcpStream, addr: &SocketAddr) -> io::Result<TcpStream> {
set_nonblock(stream.as_raw_fd())?;
match stream.connect(addr) {
Ok(..) => {}
Err(ref e) if e.raw_os_error() == Some(libc::EINPROGRESS) => {}
Err(e) => return Err(e),
}
Ok(TcpStream {
inner: stream,
})
}
}
When you connect on a non-blocking socket, the system call may connect or fail immediately, or it may return EINPROGRESS; in the latter case the socket must be polled again to retrieve the eventual result or error.
The issue is not with the test but with the implementation.
This working test case based on yours has no custom future implementation and only calls TcpStream::connect(). It works as you expect it to.
extern crate futures;
extern crate tokio;
#[cfg(test)]
mod tests {
use super::*;
use std::net::Shutdown;
use std::net::SocketAddr;
use tokio::net::TcpStream;
use tokio::prelude::*;
#[test]
fn connect() {
let addr: SocketAddr = "127.0.0.1:4222".parse().unwrap();
let fut = TcpStream::connect(&addr)
.and_then(|f| {
println!("connected");
f.shutdown(Shutdown::Both)
}).map_err(|e| panic!("error: {:?}", e));
tokio::run(fut)
}
}
playground
You are connecting to the same endpoint over and over again in your poll() method. That's not how a future works. The poll() method will be called repeatedly, with the expectation that at some point it will return either Ok(Async::Ready(..)) or Err(..).
If you initiate a new TCP connection every time poll() is called, it will be unlikely to complete in time.
Here is a modified example that does what you expect:
#[macro_use(try_ready)]
extern crate futures;
extern crate tokio;
use std::io;
use std::net::SocketAddr;
use tokio::codec::{Framed, LinesCodec};
use tokio::net::{ConnectFuture, TcpStream};
use tokio::prelude::*;
struct MyFuture {
tcp: ConnectFuture,
}
impl MyFuture {
fn new(addr: SocketAddr) -> MyFuture {
MyFuture {
tcp: TcpStream::connect(&addr),
}
}
}
impl Future for MyFuture {
type Item = Framed<TcpStream, LinesCodec>;
type Error = io::Error;
fn poll(&mut self) -> Result<Async<Framed<TcpStream, LinesCodec>>, io::Error> {
let strm = try_ready!(self.tcp.poll());
Ok(Async::Ready(Framed::new(strm, LinesCodec::new())))
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::net::Shutdown;
#[test]
fn connect() {
let addr: SocketAddr = "127.0.0.1:4222".parse().unwrap();
let fut = MyFuture::new(addr)
.and_then(|f| {
println!("connected");
let cn = f.get_ref();
cn.shutdown(Shutdown::Both)
}).map_err(|e| panic!("error: {:?}", e));
tokio::run(fut)
}
}
I'm not certain what you intend your future to do, though, so I can't comment on whether this is the right approach.
To some degree, you can drop in Tokio's test support to make this easier; the #[tokio::test] attribute lets you write async/await unit tests.
#[tokio::test]
async fn my_future_test() {
let addr: SocketAddr = "127.0.0.1:4222".parse().unwrap();
// Assumes MyFuture implements std::future::Future (or is adapted via a compat layer).
match (MyFuture { addr }).await {
Ok(_framed) => { /* assert something about the framed stream */ }
Err(e) => panic!("something bad: {:?}", e),
}
}
https://docs.rs/tokio/0.3.3/tokio/attr.test.html

Why doesn't dropping this SpawnHandle cancel its future?

Here is an example program:
extern crate futures;
extern crate tokio_core;
use futures::{Async, Future, Stream};
use tokio_core::reactor::Core;
use tokio_core::net::TcpListener;
fn main() {
let mut core = Core::new().unwrap();
futures::sync::oneshot::spawn(
TcpListener::bind(&"127.0.0.1:5000".parse().unwrap(), &core.handle())
.unwrap()
.incoming()
.for_each(|_| {
println!("connection received");
Ok(())
}),
&core,
);
let ft = futures::future::poll_fn::<(), (), _>(|| {
std::thread::sleep_ms(50);
Ok(Async::NotReady)
});
core.run(ft);
}
As you can see, I call oneshot::spawn and then immediately drop its return value, which should theoretically cancel the future contained inside. However, when I run this program and then make a connection to 127.0.0.1:5000, it still prints "connection received." Why does it do this? I expected it to not print anything and drop the TcpListener, unbinding from the port.
This is a (now fixed) bug in the futures crate; version 0.1.18 should include the fix.
It used inverted values for keep_running: bool in SpawnHandle/Executor.
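With a fixed futures version, keeping the SpawnHandle alive keeps the listener running, and dropping it cancels the future and unbinds the port. A minimal sketch of the expected behavior, assuming futures 0.1.18+ and the same tokio_core setup as the question:
extern crate futures; // 0.1.18+
extern crate tokio_core;

use futures::{Async, Future, Stream};
use tokio_core::net::TcpListener;
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();

    let listener = TcpListener::bind(&"127.0.0.1:5000".parse().unwrap(), &core.handle())
        .unwrap()
        .incoming()
        .for_each(|_| {
            println!("connection received");
            Ok(())
        });

    // Keeping `handle` alive keeps the listener running; dropping it cancels
    // the spawned future and releases the TcpListener (and the port).
    let handle = futures::sync::oneshot::spawn(listener, &core);
    drop(handle);

    // Keep the reactor busy, as in the original program. Connections to
    // port 5000 should now be refused and nothing should be printed.
    let idle = futures::future::poll_fn::<(), (), _>(|| {
        std::thread::sleep(std::time::Duration::from_millis(50));
        Ok(Async::NotReady)
    });
    core.run(idle).unwrap();
}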

Why do my Futures not max out the CPU?

I am creating a few hundred requests to download the same file (this is a toy example). When I run the equivalent logic in Go, I get 200% CPU usage and it returns in ~5 seconds with 800 requests. In Rust, with only 100 requests, it takes nearly 5 seconds and spawns 16 OS threads with 37% CPU utilization.
Why is there such a difference?
From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
From the perf data, it seems like I am only using 1 core despite the ThreadPoolExecutor.
extern crate curl;
extern crate fibers;
extern crate futures;
extern crate futures_cpupool;
use std::io::{Write, BufWriter};
use curl::easy::Easy;
use futures::future::*;
use std::fs::File;
use futures_cpupool::CpuPool;
fn make_file(x: i32, data: &mut Vec<u8>) {
let f = File::create(format!("./data/{}.txt", x)).expect("Unable to open file");
let mut writer = BufWriter::new(&f);
writer.write_all(data.as_mut_slice()).unwrap();
}
fn collect_request(x: i32, url: &str) -> Result<i32, ()> {
let mut data = Vec::new();
let mut easy = Easy::new();
easy.url(url).unwrap();
{
let mut transfer = easy.transfer();
transfer
.write_function(|d| {
data.extend_from_slice(d);
Ok(d.len())
})
.unwrap();
transfer.perform().unwrap();
}
make_file(x, &mut data);
Ok(x)
}
fn main() {
let url = "https://en.wikipedia.org/wiki/Immanuel_Kant";
let pool = CpuPool::new(16);
let output_futures: Vec<_> = (0..100)
.into_iter()
.map(|ind| {
pool.spawn_fn(move || {
let output = collect_request(ind, url);
output
})
})
.collect();
// println!("{:?}", output_futures.Item());
for i in output_futures {
i.wait().unwrap();
}
}
My equivalent Go code
From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
This is not correct. The documentation for CpuPool states, emphasis mine:
A thread pool intended to run CPU intensive work.
Downloading a file is not CPU-bound, it's IO-bound. All you have done is spin up many threads and then tell each thread to block while waiting for IO to complete.
Instead, use tokio-curl, which adapts the curl library to the Future abstraction. You can then remove the threadpool completely. This should drastically improve your throughput.
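A rough sketch of that approach, assuming tokio-curl 0.1's Session::new / Session::perform API on top of a tokio-core reactor (the URL and request count mirror the original program; the write callback here just discards the body to keep the example short):
extern crate curl;
extern crate futures;
extern crate tokio_core;
extern crate tokio_curl; // assumed version 0.1

use curl::easy::Easy;
use futures::future::join_all;
use futures::Future;
use tokio_core::reactor::Core;
use tokio_curl::Session;

fn main() {
    let url = "https://en.wikipedia.org/wiki/Immanuel_Kant";
    let mut core = Core::new().unwrap();
    let session = Session::new(core.handle());

    // Start all transfers up front; they are multiplexed on the reactor
    // instead of each blocking its own thread.
    let requests: Vec<_> = (0..100)
        .map(|i| {
            let mut easy = Easy::new();
            easy.url(url).unwrap();
            easy.write_function(|data| Ok(data.len())).unwrap();
            session.perform(easy).map(move |easy| (i, easy))
        })
        .collect();

    let done = core.run(join_all(requests)).expect("a transfer failed");
    for (i, mut easy) in done {
        println!("request {}: HTTP {}", i, easy.response_code().unwrap());
    }
}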

Running interruptible Rust program that spawns threads

I am trying to write a program that spawns a bunch of threads and then joins the threads at the end. I want it to be interruptible, because my plan is to make this a constantly running program in a UNIX service.
The idea is that worker_pool will contain all the threads that have been spawned, so terminate can be called at any time to collect them.
I can't seem to find a way to use the chan_select crate to do this, because it requires that I spawn a thread first to spawn my child threads, and once I do that I can no longer use the worker_pool variable when joining the threads on interrupt, because it had to be moved into the closure for the main loop. If you comment out the line in the interrupt arm that terminates the workers, it compiles.
I'm a little frustrated, because this would be really easy to do in C. I could set up a static pointer, but when I try to do that in Rust I get an error, because I am using a vector for my threads and I can't initialize a static to an empty vector. I know it is safe to join the workers in the interrupt code, because execution stops there waiting for the signal.
Perhaps there is a better way to do the signal handling, or maybe I'm missing something that I can do.
The error and code follow:
MacBook8088:video_ingest pjohnson$ cargo run
Compiling video_ingest v0.1.0 (file:///Users/pjohnson/projects/video_ingest)
error[E0382]: use of moved value: `worker_pool`
--> src/main.rs:30:13
|
24 | thread::spawn(move || run(sdone, &mut worker_pool));
| ------- value moved (into closure) here
...
30 | worker_pool.terminate();
| ^^^^^^^^^^^ value used here after move
<chan macros>:42:47: 43:23 note: in this expansion of chan_select! (defined in <chan macros>)
src/main.rs:27:5: 35:6 note: in this expansion of chan_select! (defined in <chan macros>)
|
= note: move occurs because `worker_pool` has type `video_ingest::WorkerPool`, which does not implement the `Copy` trait
main.rs
#[macro_use]
extern crate chan;
extern crate chan_signal;
extern crate video_ingest;
use chan_signal::Signal;
use video_ingest::WorkerPool;
use std::thread;
use std::ptr;
///
/// Starts processing
///
fn main() {
let mut worker_pool = WorkerPool { join_handles: vec![] };
// Signal gets a value when the OS sent a INT or TERM signal.
let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);
// When our work is complete, send a sentinel value on `sdone`.
let (sdone, rdone) = chan::sync(0);
// Run work.
thread::spawn(move || run(sdone, &mut worker_pool));
// Wait for a signal or for work to be done.
chan_select! {
signal.recv() -> signal => {
println!("received signal: {:?}", signal);
worker_pool.terminate(); // <-- Comment out to compile
},
rdone.recv() => {
println!("Program completed normally.");
}
}
}
fn run(sdone: chan::Sender<()>, worker_pool: &mut WorkerPool) {
loop {
worker_pool.ingest();
worker_pool.terminate();
}
}
lib.rs
extern crate libc;
use std::thread;
use std::thread::JoinHandle;
use std::os::unix::thread::JoinHandleExt;
use libc::pthread_join;
use libc::c_void;
use std::ptr;
use std::time::Duration;
pub struct WorkerPool {
pub join_handles: Vec<JoinHandle<()>>
}
impl WorkerPool {
///
/// Does the actual ingestion
///
pub fn ingest(&mut self) {
// Use 9 threads for an example.
for i in 0..10 {
self.join_handles.push(
thread::spawn(move || {
// Get the videos
println!("Getting videos for thread {}", i);
thread::sleep(Duration::new(5, 0));
})
);
}
}
///
/// Joins all threads
///
pub fn terminate(&mut self) {
println!("Total handles: {}", self.join_handles.len());
for handle in &self.join_handles {
println!("Joining thread...");
unsafe {
let mut state_ptr: *mut *mut c_void = 0 as *mut *mut c_void;
pthread_join(handle.as_pthread_t(), state_ptr);
}
}
self.join_handles = vec![];
}
}
terminate can be called at any time to collect them.
I don't want to stop the threads; I want to collect them with join. I agree stopping them would not be a good idea.
These two statements don't make sense to me. You can only join a thread once it has completed. The words "interruptible" and "at any time" suggest that you want to be able to stop a thread while it is still doing some processing. Which behavior do you want?
If you want to be able to stop a thread that has partially completed, you have to enhance your code to check if it should exit early. This is usually complicated by the fact that you are doing some big computation that you don't have control over. Ideally, you break that up into chunks and check your exit flag frequently. For example, with video work, you could check every frame. Then the response delay is roughly the time to process a frame.
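As a sketch of that idea, each worker can share an atomic flag and check it between chunks of work (process_frame here is a hypothetical stand-in for one unit of the real video work):
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for one chunk of the real work (e.g. one frame).
fn process_frame() {
    thread::sleep(Duration::from_millis(10));
}

fn spawn_worker(stop: Arc<AtomicBool>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Check the flag between chunks, so the delay in responding to an
        // interrupt is roughly the time to process one chunk.
        while !stop.load(Ordering::SeqCst) {
            process_frame();
        }
    })
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let workers: Vec<_> = (0..10).map(|_| spawn_worker(stop.clone())).collect();

    // ... later, e.g. after receiving a signal:
    stop.store(true, Ordering::SeqCst);
    for worker in workers {
        worker.join().expect("worker panicked");
    }
}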
this would be really easy to do in C.
This would be really easy to do incorrectly. For example, the code currently presented attempts to perform mutation to the pool from two different threads without any kind of synchronization. That's a sure-fire recipe to make broken, hard-to-debug code.
// Use 9 threads for an example.
0..10 creates 10 threads.
Anyway, it seems like the missing piece of knowledge is Arc and Mutex. Arc allows sharing ownership of a single item between threads, and Mutex allows for run-time mutable borrowing between threads.
#[macro_use]
extern crate chan;
extern crate chan_signal;
use chan_signal::Signal;
use std::thread::{self, JoinHandle};
use std::sync::{Arc, Mutex};
fn main() {
let worker_pool = Arc::new(Mutex::new(WorkerPool::new()));
let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);
let (work_done_tx, work_done_rx) = chan::sync(0);
let worker_pool_clone = worker_pool.clone();
thread::spawn(move || run(work_done_tx, worker_pool_clone));
// Wait for a signal or for work to be done.
chan_select! {
signal.recv() -> signal => {
println!("received signal: {:?}", signal);
let mut pool = worker_pool.lock().expect("Unable to lock the pool");
pool.terminate();
},
work_done_rx.recv() => {
println!("Program completed normally.");
}
}
}
fn run(_work_done_tx: chan::Sender<()>, worker_pool: Arc<Mutex<WorkerPool>>) {
loop {
let mut worker_pool = worker_pool.lock().expect("Unable to lock the pool");
worker_pool.ingest();
worker_pool.terminate();
}
}
pub struct WorkerPool {
join_handles: Vec<JoinHandle<()>>,
}
impl WorkerPool {
pub fn new() -> Self {
WorkerPool {
join_handles: vec![],
}
}
pub fn ingest(&mut self) {
self.join_handles.extend(
(0..10).map(|i| {
thread::spawn(move || {
println!("Getting videos for thread {}", i);
})
})
)
}
pub fn terminate(&mut self) {
for handle in self.join_handles.drain(..) {
handle.join().expect("Unable to join thread")
}
}
}
Beware that the program logic itself is still poor; even though an interrupt is sent, the loop in run continues to execute. The main thread will lock the mutex, join all the current threads1, unlock the mutex and exit the program. However, the loop can lock the mutex before the main thread has exited and start processing some new data! And then the program exits right in the middle of processing. It's almost the same as if you didn't handle the interrupt at all.
1: Haha, tricked you! There are no running threads at that point. Since the mutex is locked for the entire loop, the only time another lock can be acquired is when the loop is resetting. However, since the last instruction in the loop is to join all the threads, there won't be any more running.
I don't want to let the program terminate before all threads have completed.
Perhaps it's an artifact of the reduced problem, but I don't see how the infinite loop can ever exit, so the "I'm done" channel seems superfluous.
I'd probably just add a flag that says "please stop" when an interrupt is received. Then I'd check that instead of the infinite loop and wait for the running thread to finish before exiting the program.
use std::sync::atomic::{AtomicBool, Ordering};
fn main() {
let worker_pool = WorkerPool::new();
let signal = chan_signal::notify(&[Signal::INT, Signal::TERM]);
let please_stop = Arc::new(AtomicBool::new(false));
let threads_please_stop = please_stop.clone();
let runner = thread::spawn(|| run(threads_please_stop, worker_pool));
// Wait for a signal
chan_select! {
signal.recv() -> signal => {
println!("received signal: {:?}", signal);
please_stop.store(true, Ordering::SeqCst);
},
}
runner.join().expect("Unable to join runner thread");
}
fn run(please_stop: Arc<AtomicBool>, mut worker_pool: WorkerPool) {
while !please_stop.load(Ordering::SeqCst) {
worker_pool.ingest();
worker_pool.terminate();
}
}
