Rust Concurrent Execution with Futures and Tokio

I've got some Rust code that currently looks like this:
fn read_stdin(mut tx: mpsc::Sender<String>) {
    loop {
        // read from stdin and send value over tx.
    }
}

fn sleep_for(n: u64) -> impl Future<Item = (), Error = ()> {
    thread::sleep(time::Duration::from_millis(n));
    println!("[{}] slept for {} ms", Local::now().format("%T%.3f"), n);
    future::ok(())
}

fn main() {
    let (stdin_tx, stdin_rx) = mpsc::channel(0);
    thread::spawn(move || read_stdin(stdin_tx));
    let server = stdin_rx
        .map(|data| data.trim().parse::<u64>().unwrap_or(0))
        .for_each(|n| tokio::spawn(sleep_for(n * 100)));
    tokio::run(server);
}
It uses tokio and futures, with the aim of running some "CPU-heavy" work (emulated by the sleep_for function) and then printing some output to stdout.
When I run it, things seem to work fine and I get this output:
2
[00:00:00.800] slept for 200 ms
10
1
[00:00:01.800] slept for 1000 ms
[00:00:01.900] slept for 100 ms
The first output, for the input 2, is exactly as expected: the timestamp is printed after 200 ms. But for the next inputs it becomes clear that the sleep_for function is being executed sequentially, not concurrently.
The output that I want to see is:
2
[00:00:00.800] slept for 200 ms
10
1
[00:00:00.900] slept for 100 ms
[00:00:01.900] slept for 1000 ms
It seems that to get this output I need the sleeps for the inputs 10 and 1 (that is, sleep_for(1000) and sleep_for(100)) to execute concurrently. How would I go about doing this in Rust with futures and tokio?
(Note: the actual timestamp values aren't important; I'm using them to show the order of execution within the program.)

I found a solution using the futures-timer crate:
use chrono::Local;
use futures::{future, sync::mpsc, Future, Sink, Stream};
use futures_timer::Delay;
use std::{io::stdin, thread, time::Duration};

fn read_stdin(mut tx: mpsc::Sender<String>) {
    let stdin = stdin();
    loop {
        let mut buf = String::new();
        stdin.read_line(&mut buf).unwrap();
        // `send` consumes the sender and hands it back once the value is sent
        tx = tx.send(buf).wait().unwrap()
    }
}

fn main() {
    let (stdin_tx, stdin_rx) = mpsc::channel(0);
    thread::spawn(move || read_stdin(stdin_tx));
    let server = stdin_rx
        .map(|data| data.trim().parse::<u64>().unwrap_or(0) * 100)
        .for_each(|delay| {
            println!("[{}] {} ms -> start", Local::now().format("%T%.3f"), delay);
            tokio::spawn({
                // non-blocking timer: the task is parked until the delay fires
                Delay::new(Duration::from_millis(delay))
                    .and_then(move |_| {
                        println!("[{}] {} ms -> done", Local::now().format("%T%.3f"), delay);
                        future::ok(())
                    })
                    .map_err(|e| panic!("timer error: {:?}", e))
            })
        });
    tokio::run(server);
}
The issue is that, rather than letting the future be parked and notifying the task once the delay had elapsed, the code in the question simply put the thread to sleep, so no other progress could be made on that thread in the meantime.
Update: I've since come across tokio-timer, which seems to be the standard way of doing this.
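To make the difference concrete, here is a minimal std-only sketch. It uses plain threads instead of futures, so it is an analogy for the executor behaviour rather than the Tokio mechanism itself, and the helper names run_sequential and run_concurrent are hypothetical. Sleeping in turn on one worker reproduces the ordering from the question, while letting the waits overlap lets the shorter one finish first.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Sleeps for each delay in turn on the calling thread and returns the
/// delays in the order they finished (always the input order).
fn run_sequential(delays_ms: &[u64]) -> Vec<u64> {
    let mut finished = Vec::new();
    for &ms in delays_ms {
        thread::sleep(Duration::from_millis(ms)); // blocks everything queued behind it
        finished.push(ms);
    }
    finished
}

/// Gives every delay its own thread and returns the delays in completion order.
fn run_concurrent(delays_ms: &[u64]) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    for &ms in delays_ms {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(ms));
            tx.send(ms).unwrap();
        });
    }
    drop(tx); // the iterator below ends once every worker has dropped its sender
    rx.iter().collect()
}

fn main() {
    println!("sequential: {:?}", run_sequential(&[300, 30]));
    println!("concurrent: {:?}", run_concurrent(&[300, 30]));
}
```

With a non-blocking timer such as Delay, an executor gets the second ordering without needing one thread per wait, because a parked task does not occupy a worker thread.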

Related

Rust threadpool with init code in each thread?

The following code works; it can be tested in the Playground:
use std::{thread, time::Duration};
use rand::Rng;
fn main() {
let mut hiv = Vec::new();
let (sender, receiver) = crossbeam_channel::unbounded();
// make workers
for t in 0..5 {
println!("Make worker {}", t);
let receiver = receiver.clone(); // clone for this thread
let handler = thread::spawn(move || {
let mut rng = rand::thread_rng(); // each thread have one
loop {
let r = receiver.recv();
match r {
Ok(x) => {
let s = rng.gen_range(100..1000);
thread::sleep(Duration::from_millis(s));
println!("w={} r={} working={}", t, x, s);
},
_ => { println!("No more work for {} --- {:?}.", t, r); break},
}
}
});
hiv.push(handler);
}
// Generate jobs
for x in 0..10 {
sender.send(x).expect("all threads hung up :(");
}
drop(sender);
// wait for jobs to finish.
println!("Wait for all threads to finish.\n");
for h in hiv {
h.join().unwrap();
}
println!("join() done. Work Finish.");
}
My question is the following:
Can I remove the boilerplate code by using threadpool, rayon, or some other Rust crate?
I know that I could write my own implementation, but I would like to know whether there is a crate with the same functionality.
From my research, threadpool and rayon are useful when you "send" code to be executed, but I have not found a way to create N threads that each hold some code or state they need to remember.
The basic idea is in let mut rng = rand::thread_rng(); this is an instance that each thread needs to have on its own.
Also, if there are any other problems with the code, please point them out.

Yes, you can use Rayon to eliminate a lot of that code and make the remaining code much more readable, as illustrated in this gist:
https://gist.github.com/BillBarnhill/db07af903cb3c3edb6e715d9cedae028
The worker pool model is not a great fit in Rust, due to the ownership rules. As a result, parallel iterators are often a better choice.
I originally forgot to address your main concern, per-thread context. You can see how to store per-thread context using the thread_local! macro in this answer:
https://stackoverflow.com/a/42656422/204343
I will try to come back and edit the code to reflect thread_local! use as soon as I have more time.
The gist requires nightly because of thread_id_value, but that feature is all but stable and can be removed if needed.
The real catch is that the gist has timing code and compares main_new with main_original, with surprising results. Perhaps it is not so surprising, since Rayon has good debug support.
On a debug build, the timing output is:
main_new duration: 1.525667954s
main_original duration: 1.031234059s
You can see that main_new takes almost 50% longer to run.
On a release build, however, main_new is marginally faster:
main_new duration: 1.584190936s
main_original duration: 1.5851124s
A slimmed-down version of the gist is below, with only the new code:
#![feature(thread_id_value)]
use std::{thread, time::Duration, time::Instant};
use rand::Rng;
#[allow(unused_imports)]
use rayon::prelude::*;

fn do_work(x: u32) -> String {
    let mut rng = rand::thread_rng(); // each thread has its own
    let s = rng.gen_range(100..1000);
    let thread_id = thread::current().id();
    let t = thread_id.as_u64(); // nightly-only, used just for the log line
    thread::sleep(Duration::from_millis(s));
    format!("w={} r={} working={}", t, x, s)
}

fn process_work_product(output: String) {
    println!("{}", output);
}

fn main() {
    // a bit hacky, but let's set the number of threads to 5,
    // matching the five workers in the original code
    rayon::ThreadPoolBuilder::new()
        .num_threads(5)
        .build_global()
        .unwrap();
    let x = 0..10;
    x.into_par_iter()
        .map(do_work)
        .for_each(process_work_product);
}
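Since the question is specifically about state that each worker thread keeps for itself, here is a minimal std-only sketch of that idea using the standard thread_local! macro. It is not the gist's code: the HANDLED counter and the run_pool helper are hypothetical names chosen for illustration, and a shared std::sync::mpsc receiver stands in for crossbeam_channel.

```rust
use std::cell::RefCell;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

thread_local! {
    // Hypothetical per-thread state: how many jobs this thread has handled.
    // Every thread that touches HANDLED gets its own independent copy.
    static HANDLED: RefCell<u32> = RefCell::new(0);
}

/// Starts `workers` threads, feeds them `jobs` job ids, and returns the
/// number of jobs each worker handled (read from its thread-local counter).
fn run_pool(workers: usize, jobs: u32) -> Vec<u32> {
    let (tx, rx) = mpsc::channel::<u32>();
    let rx = Arc::new(Mutex::new(rx));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                loop {
                    // The lock is held only while taking one job off the queue.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(_id) => HANDLED.with(|h| *h.borrow_mut() += 1),
                        Err(_) => break, // sender dropped: no more work
                    }
                }
                HANDLED.with(|h| *h.borrow())
            })
        })
        .collect();
    for id in 0..jobs {
        tx.send(id).unwrap();
    }
    drop(tx);
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("jobs handled per worker: {:?}", run_pool(3, 10));
}
```

The same macro also works inside a Rayon closure, because each pool thread initializes its own copy on first access; whether that is preferable to creating the state inside do_work depends on how expensive the state is to build.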

If "futures do nothing unless awaited", why does `tokio::spawn` work anyway?

I have read that futures in Rust do nothing unless they are awaited. However, I tried a more complex example, and it is unclear to me why the message from the 2nd print is printed in this example, because task::spawn gives me a JoinHandle on which I never call .await.
Meanwhile, I tried the same example with an .await on the line above the 2nd print, and now only the message from the 1st print is printed.
If I wait for all the futures at the end, both messages are printed, which I understand. My question is why the program behaves as it does in the previous two cases.
use futures::stream::{FuturesUnordered, StreamExt};
use futures::TryStreamExt;
use rand::prelude::*;
use std::collections::VecDeque;
use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::task::JoinHandle;
use tokio::{task, time};
fn candidates() -> Vec<i32> {
Vec::from([2, 2])
}
async fn produce_event(nanos: u64) -> i32 {
println!("waiting {}", nanos);
time::sleep(time::Duration::from_nanos(nanos)).await;
1
}
async fn f(seconds: i64, semaphore: &Arc<Semaphore>) {
let mut futures = vec![];
for (i, j) in (0..1).enumerate() {
for (i, event) in candidates().into_iter().enumerate() {
let permit = Arc::clone(semaphore).acquire_owned().await;
let secs = 500;
futures.push(task::spawn(async move {
let _permit = permit;
produce_event(500); // 2nd example has an .await here
println!("Event produced at {}", seconds);
}));
}
}
}
#[tokio::main()]
async fn main() {
let semaphore = Arc::new(Semaphore::new(45000));
for _ in 0..1 {
let mut futures: FuturesUnordered<_> = (0..2).map(|moment| f(moment, &semaphore)).collect();
while let Some(item) = futures.next().await {
let () = item;
}
}
}

"However, I tried a more complex example and it is a little unclear why I get a message printed by the 2nd print in this example because task::spawn gives me a JoinHandle on which I do not do any .await."
You're spawning tasks. A task is a separate thread of execution that can run concurrently with the current task and can be scheduled in parallel.
All the JoinHandle does is let you wait for that task to end; it doesn't control whether the task runs.
"Meanwhile, I tried the same example, but with an await above the 2nd print, and now I get printed only the message in the 1st print."
You spawn a bunch of tasks and make them sleep. Since you neither wait for them to terminate (you don't join them) nor sleep in their parent task, the loops terminate once all the tasks have been spawned, you reach the end of the main function, and the program terminates.
At that point all the tasks are still sleeping.
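The same two behaviours can be observed with plain OS threads, which makes for a small std-only sketch. It is an analogy, not Tokio code, and the worker_finished helper is a hypothetical name. The spawned worker starts running whether or not anyone joins it; the handle only decides whether the caller waits for it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawns a worker that sleeps and then sets a flag. Returns whether the flag
/// was already set at the moment the caller looked.
fn worker_finished(join_first: bool) -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&done);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(200));
        flag.store(true, Ordering::SeqCst);
    });
    if join_first {
        // Like awaiting a JoinHandle: block until the worker has ended.
        handle.join().unwrap();
    }
    // Without the join, the worker is still sleeping when we look here,
    // and it is cut short if the whole program exits first.
    done.load(Ordering::SeqCst)
}

fn main() {
    println!("without join: finished = {}", worker_finished(false));
    println!("with join:    finished = {}", worker_finished(true));
}
```

In the question's code, awaiting the collected JoinHandles at the end of f plays the role of the join call here.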

How do I run an asynchronous task periodically and also sometimes on demand?

I have a task (downloading something from the Web) that runs regularly, with 10-minute pauses between runs.
If my program notices that the data is outdated, it should run the download task immediately, unless it is already running. If the download task ran out of schedule, the next run should happen 10 minutes after that out-of-schedule run, so all subsequent runs and pauses shift later in time.
How do I do this with Tokio?
I made a library to run a sequence of tasks, but my attempt to use it for this problem failed.
mod tasks_with_regular_pauses;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use tokio::spawn;
use tokio::sync::mpsc::{channel, Receiver, Sender};
use tokio::sync::Mutex;
use tokio::task::JoinHandle;
use tokio_interruptible_future::{
interruptible, interruptible_sendable, interruptible_straight, InterruptError,
};
pub type TaskItem = Pin<Box<dyn Future<Output = ()> + Send>>;
/// Execute futures from a stream of futures in order in a Tokio task. Not tested code.
pub struct TaskQueue {
tx: Sender<TaskItem>,
rx: Arc<Mutex<Receiver<TaskItem>>>,
}
impl TaskQueue {
pub fn new() -> Self {
let (tx, rx) = channel(1);
Self {
tx,
rx: Arc::new(Mutex::new(rx)),
}
}
async fn _task(this: Arc<Mutex<Self>>) {
// let mut rx = ReceiverStream::new(rx);
loop {
let this2 = this.clone();
let fut = {
// block to shorten locks lifetime
let obj = this2.lock().await;
let rx = obj.rx.clone();
let mut rx = rx.lock().await;
rx.recv().await
};
if let Some(fut) = fut {
fut.await;
} else {
break;
}
}
}
pub fn spawn(
this: Arc<Mutex<Self>>,
notify_interrupt: async_channel::Receiver<()>,
) -> JoinHandle<Result<(), InterruptError>> {
spawn(interruptible_straight(notify_interrupt, async move {
Self::_task(this).await;
Ok(())
}))
}
pub async fn push_task(&self, fut: TaskItem) {
let _ = self.tx.send(fut).await;
}
}

I'd recommend using select! instead of interruptible futures to detect one of three conditions in your loop:
the download task has finished
the "data is outdated" signal has arrived
the data-expiry timeout has elapsed
The "data is outdated" signal can be conveyed using a dedicated channel.
select! allows waiting for futures (like downloading and timeouts), and reading from channels at the same time. See the tutorial for examples of that.
Solution sketch:
loop {
    // it is time to download
    let download_future = ...; // make your URL request
    let download_result = download_future.await;
    // if the outdated signal was generated while the download
    // was in progress, ignore the signal by draining the receiver
    while outdated_data_signal_receiver.try_recv().is_ok() {}
    // send results upstream for processing
    download_results_sender.send(download_result);
    // wait to re-download
    select! {
        // after a 10 min pause
        _ = sleep(Duration::from_secs(10 * 60)) => {},
        // or on an external signal
        _ = outdated_data_signal_receiver.recv() => {},
    }
    // either branch falls through to the next iteration;
    // a `break` here would end the loop after the first download
}
This logic can be simplified further by the timeout primitive:
loop {
    // it is time to download
    let download_future = ...; // make your URL request
    let download_result = download_future.await;
    // if the outdated signal was generated while the download
    // was in progress, ignore the signal by draining the receiver
    while outdated_data_signal_receiver.try_recv().is_ok() {}
    // send results upstream for processing
    download_results_sender.send(download_result);
    // re-download on a signal or on timeout (whichever comes first)
    let _ = timeout(Duration::from_secs(10 * 60), outdated_data_signal_receiver.recv()).await;
}
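The loop shape above (run, drain stale signals, then wait for a signal or a timeout) can be tried out without an async runtime. The following is a minimal blocking sketch using only the standard library, where Receiver::recv_timeout plays the role of timeout(..., recv()). The run_periodic function, its parameters, and the short durations in main are hypothetical, chosen only so the demo finishes quickly.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Runs `job`, then waits for either `period` to elapse or a refresh signal,
/// and repeats. Returns the number of runs once every sender has been dropped.
fn run_periodic(refresh: mpsc::Receiver<()>, period: Duration, mut job: impl FnMut()) -> u32 {
    let mut runs = 0;
    loop {
        job();
        runs += 1;
        // Signals that arrived while the job was running are stale: drop them.
        while refresh.try_recv().is_ok() {}
        match refresh.recv_timeout(period) {
            // A signal or the timeout both mean "run again"; the next pause
            // is measured from the end of that run, which shifts the schedule.
            Ok(()) | Err(RecvTimeoutError::Timeout) => continue,
            // No senders left: nobody can ask for a refresh, so stop.
            Err(RecvTimeoutError::Disconnected) => return runs,
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        run_periodic(rx, Duration::from_millis(200), || println!("downloading..."))
    });
    thread::sleep(Duration::from_millis(50));
    tx.send(()).unwrap(); // on-demand refresh
    thread::sleep(Duration::from_millis(500));
    drop(tx); // shut the worker down
    println!("runs: {}", worker.join().unwrap());
}
```

Because the job runs inside the loop itself, a refresh request can never start a second run while one is in progress, which is the "unless it is already running" requirement from the question.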

How to let struct hold a thread and destroy thread as soon as it go out of scope

struct ThreadHolder {
    state: ???
    thread: ???
}

impl ThreadHolder {
    fn launch(&mut self) {
        self.thread = ???
        // in thread change self.state
    }
}

#[test]
fn test() {
    let mut th = ThreadHolder{...};
    th.launch();
    // the thread should be destroyed as soon as th goes out of scope
}
I think this has something to do with lifetimes, but I don't know how to write it.

What you want is simple enough that the timer doesn't need to be mutable at all, which makes it trivial to share across threads, unless you want to reset it. You said you need to keep a thread, for one reason or another, so I'll assume that you don't care about resetting.
Instead, you can poll it every tick (most games run in ticks, so I don't think that will be hard to implement).
I will provide an example that uses sleep, so it is not the most accurate thing (this is painfully obvious in the last sub-second interval), but I am not trying to do your work for you; there are enough resources on the internet to help you deal with that.
Here it goes:
use std::{
sync::Arc,
thread::{self, Result},
time::{Duration, Instant},
};
struct Timer {
end: Instant,
}
impl Timer {
fn new(duration: Duration) -> Self {
// this code is valid for now, but might break in the future
// future so distant, that you really don't need to care unless
// you let your players draw for eternity
let end = Instant::now().checked_add(duration).unwrap();
Timer { end }
}
fn left(&self) -> Duration {
self.end.saturating_duration_since(Instant::now())
}
// more usable than above with fractional value being accounted for
fn secs_left(&self) -> u64 {
let span = self.left();
span.as_secs() + if span.subsec_millis() > 0 { 1 } else { 0 }
}
}
fn main() -> Result<()> {
let timer = Timer::new(Duration::from_secs(10));
let timer_main = Arc::new(timer);
let timer = timer_main.clone();
let t = thread::spawn(move || loop {
let seconds_left = timer.secs_left();
println!("[Worker] Seconds left: {}", seconds_left);
if seconds_left == 0 {
break;
}
thread::sleep(Duration::from_secs(1));
});
loop {
let seconds_left = timer_main.secs_left();
println!("[Main] Seconds left: {}", seconds_left);
if seconds_left == 5 {
println!("[Main] 5 seconds left, waiting for worker thread to finish work.");
break;
}
thread::sleep(Duration::from_secs(1));
}
t.join()?;
println!("[Main] worker thread finished work, shutting down!");
Ok(())
}
By the way, this kind of implementation wouldn't be any different in any other language, so please don't blame Rust for it. It's not the easiest language, but it provides more than enough tools to build anything you want from scratch as long as you put effort into it.
Good luck :)

I think I got it to work:
use std::sync::{Arc, Mutex};
use std::thread::{sleep, spawn, JoinHandle};
use std::time::Duration;

struct Timer {
    pub(crate) time: Arc<Mutex<u32>>,
    jh_ticker: Option<JoinHandle<()>>,
}

impl Timer {
    fn new<T>(i: T, duration: Duration) -> Self
    where
        T: Iterator<Item = u32> + Send + 'static,
    {
        let time = Arc::new(Mutex::new(0));
        let arc_time = time.clone();
        let jh_ticker = Some(spawn(move || {
            for item in i {
                let mut mg = arc_time.lock().unwrap();
                *mg = item;
                drop(mg); // needed, otherwise this thread would hold the lock while sleeping
                sleep(duration);
            }
        }));
        Timer { time, jh_ticker }
    }
}

impl Drop for Timer {
    fn drop(&mut self) {
        // wait for the ticker thread to run to completion;
        // the join result is ignored instead of being silently discarded
        if let Some(jh) = self.jh_ticker.take() {
            let _ = jh.join();
        }
    }
}

#[test]
fn test_timer() {
    let t = Timer::new(0..=10, Duration::from_secs(1));
    let a = t.time.clone();
    for _ in 0..100 {
        let b = *a.lock().unwrap();
        println!("{}", b);
        sleep(Duration::from_millis(100));
    }
}
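Note that the Drop implementation above waits for the ticker to finish its iterator rather than stopping it. If the thread should end as soon as the holder goes out of scope, as the question asks, one common approach is a shared stop flag that Drop sets before joining. The following is a minimal sketch under that assumption; the stop field, the 5 ms polling interval, and the use of an atomic counter for state are choices made for illustration, not part of either answer.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

struct ThreadHolder {
    state: Arc<AtomicU32>,  // value the worker thread keeps changing
    stop: Arc<AtomicBool>,  // set by Drop to ask the worker to exit
    thread: Option<JoinHandle<()>>,
}

impl ThreadHolder {
    fn launch() -> Self {
        let state = Arc::new(AtomicU32::new(0));
        let stop = Arc::new(AtomicBool::new(false));
        let (worker_state, worker_stop) = (Arc::clone(&state), Arc::clone(&stop));
        let thread = thread::spawn(move || loop {
            worker_state.fetch_add(1, Ordering::SeqCst);
            if worker_stop.load(Ordering::SeqCst) {
                break;
            }
            thread::sleep(Duration::from_millis(5));
        });
        ThreadHolder { state, stop, thread: Some(thread) }
    }
}

impl Drop for ThreadHolder {
    fn drop(&mut self) {
        // Ask the worker to stop, then wait until it has actually exited.
        self.stop.store(true, Ordering::SeqCst);
        if let Some(t) = self.thread.take() {
            let _ = t.join();
        }
    }
}

fn main() {
    let holder = ThreadHolder::launch();
    thread::sleep(Duration::from_millis(50));
    println!("ticks so far: {}", holder.state.load(Ordering::SeqCst));
    // `holder` is dropped here, which stops and joins the worker thread
}
```

The stop is cooperative: the worker only notices the flag between sleeps, so dropping the holder can block for up to one polling interval. No lifetimes are needed, because the thread owns Arc clones instead of borrowing from the struct.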

How can I run a set of functions concurrently on a recurring interval without running the same function at the same time using Tokio?

My goal is to run N functions concurrently, but I don't want to spawn more until all of them have finished. This is what I have so far:
extern crate tokio;
extern crate futures;

use futures::future::lazy;
use std::process::Command; // needed for the mock delay below
use std::{thread, time};
use tokio::prelude::*;
use tokio::timer::Interval;

fn main() {
    let task = Interval::new(time::Instant::now(), time::Duration::new(1, 0))
        .for_each(|interval| {
            println!("Interval: {:?}", interval);
            for i in 0..5 {
                tokio::spawn(lazy(move || {
                    println!("Hello from task {}", i);
                    // mock delay (something blocking)
                    // thread::sleep(time::Duration::from_secs(3));
                    Command::new("sleep").arg("3").output().expect("failed to execute process");
                    Ok(())
                }));
            }
            Ok(())
        })
        .map_err(|e| panic!("interval errored; err={:?}", e));
    tokio::run(task);
}
Every second I am spawning 5 functions, but I would now like to wait until all of them have finished before spawning more.
From my understanding (I am probably getting the idea wrong), I am returning a Future within another future:
task (Interval ----------------------+ (outer future)
  for i in 0..5 {                    |
    tokio::spawn( ----+              |
      // my function  | (inner)      |
      Ok(())          |              |
    ) ----------------+              |
  }                                  |
  Ok(()) ----------------------------+
I am stuck trying to wait for the inner future to finish.

You can achieve this by joining your worker futures so that they all run concurrently but must all finish together. You can then join that with a delay of 1 second, by the same reasoning. Wrap that in a loop to run it forever (or for 5 iterations, for the demo).
Tokio 1.3
use futures::{future, future::BoxFuture, stream, FutureExt, StreamExt}; // 0.3.13
use std::time::{Duration, Instant};
use tokio::time; // 1.3.0
#[tokio::main]
async fn main() {
let now = Instant::now();
let forever = stream::unfold((), |()| async {
eprintln!("Loop starting at {:?}", Instant::now());
// Resolves when all pages are done
let batch_of_pages = future::join_all(all_pages());
// Resolves when both all pages and a delay of 1 second is done
future::join(batch_of_pages, time::sleep(Duration::from_secs(1))).await;
Some(((), ()))
});
forever.take(5).for_each(|_| async {}).await;
eprintln!("Took {:?}", now.elapsed());
}
fn all_pages() -> Vec<BoxFuture<'static, ()>> {
vec![page("a", 100).boxed(), page("b", 200).boxed()]
}
async fn page(name: &'static str, time_ms: u64) {
eprintln!("page {} starting", name);
time::sleep(Duration::from_millis(time_ms)).await;
eprintln!("page {} done", name);
}
Loop starting at Instant { t: 1022680437923626 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { t: 1022681444390534 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { t: 1022682453240399 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { t: 1022683469924126 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { t: 1022684493522592 }
page a starting
page b starting
page a done
page b done
Took 5.057315596s
Tokio 0.1
use futures::future::{self, Loop}; // 0.1.26
use std::time::{Duration, Instant};
use tokio::{prelude::*, timer::Delay}; // 0.1.18
fn main() {
let repeat_count = Some(5);
let forever = future::loop_fn(repeat_count, |repeat_count| {
eprintln!("Loop starting at {:?}", Instant::now());
// Resolves when all pages are done
let batch_of_pages = future::join_all(all_pages());
// Resolves when both all pages and a delay of 1 second is done
let wait = Future::join(batch_of_pages, ez_delay_ms(1000));
// Run all this again
wait.map(move |_| {
if let Some(0) = repeat_count {
Loop::Break(())
} else {
Loop::Continue(repeat_count.map(|c| c - 1))
}
})
});
tokio::run(forever.map_err(drop));
}
fn all_pages() -> Vec<Box<dyn Future<Item = (), Error = ()> + Send + 'static>> {
vec![Box::new(page("a", 100)), Box::new(page("b", 200))]
}
fn page(name: &'static str, time_ms: u64) -> impl Future<Item = (), Error = ()> + Send + 'static {
future::ok(())
.inspect(move |_| eprintln!("page {} starting", name))
.and_then(move |_| ez_delay_ms(time_ms))
.inspect(move |_| eprintln!("page {} done", name))
}
fn ez_delay_ms(ms: u64) -> impl Future<Item = (), Error = ()> + Send + 'static {
Delay::new(Instant::now() + Duration::from_millis(ms)).map_err(drop)
}
Loop starting at Instant { tv_sec: 4031391, tv_nsec: 806352322 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { tv_sec: 4031392, tv_nsec: 807792559 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { tv_sec: 4031393, tv_nsec: 809117958 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { tv_sec: 4031394, tv_nsec: 813142458 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { tv_sec: 4031395, tv_nsec: 814407116 }
page a starting
page b starting
page a done
page b done
Loop starting at Instant { tv_sec: 4031396, tv_nsec: 815342642 }
page a starting
page b starting
page a done
page b done
See also:
Why does Future::select choose the future with a longer sleep period first?
What is the best approach to encapsulate blocking I/O in future-rs?
How do I read the output of a child process without blocking in Rust?
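The "join the batch, then join that with a delay" idea from this answer does not depend on the futures API. As a rough std-only sketch, assuming one OS thread per task and a hypothetical run_batches helper, it can be written like this:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `rounds` batches. Each batch starts one thread per entry in `task_ms`,
/// waits for all of them, and then sleeps out whatever is left of `interval`.
/// Returns the start time of every batch, measured from the first call.
fn run_batches(rounds: usize, interval: Duration, task_ms: &[u64]) -> Vec<Duration> {
    let origin = Instant::now();
    let mut starts = Vec::with_capacity(rounds);
    for _ in 0..rounds {
        let batch_start = Instant::now();
        starts.push(batch_start.duration_since(origin));
        let handles: Vec<_> = task_ms
            .iter()
            .map(|&ms| thread::spawn(move || thread::sleep(Duration::from_millis(ms))))
            .collect();
        for h in handles {
            h.join().unwrap(); // the "join_all" step: wait for the whole batch
        }
        // The "join with a delay" step: never start the next batch early.
        // If the batch overran the interval there is nothing left to sleep.
        if let Some(rest) = interval.checked_sub(batch_start.elapsed()) {
            thread::sleep(rest);
        }
    }
    starts
}

fn main() {
    for start in run_batches(3, Duration::from_millis(200), &[50, 100]) {
        println!("batch started at {:?}", start);
    }
}
```

A batch that takes longer than the interval simply delays the next one, so two batches never overlap, which matches the behaviour of joining the batch with the delay in the async versions.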

"From my understanding (I am probably getting the idea wrong), I am returning a Future within another future"
You are not wrong, but in the code that you provided the only returned future is Ok(()), which implements IntoFuture. tokio::spawn just spawns the new task onto Tokio's DefaultExecutor.
If I understand your question correctly, you want to spawn the next batch once the previous one is done, but if the previous batch finishes in less than 1 second, you want to wait out that second before spawning the next batch.
Implementing your own future and handling the polling yourself would be a better solution, but this can be done roughly as follows:
Use join_all to collect the batch tasks. This produces a new future that waits for all the collected futures to complete.
For the 1-second wait, you can use atomic state. If the state is still locked when a tick fires, that tick is skipped, so the next batch starts on the first tick after the state is released.
Here is the code (Playground):
extern crate futures;
extern crate tokio;
use futures::future::lazy;
use std::time::{self, Duration, Instant};
use tokio::prelude::*;
use tokio::timer::{Delay, Interval};
use futures::future::join_all;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
fn main() {
let locker = Arc::new(AtomicBool::new(false));
let task = Interval::new(time::Instant::now(), time::Duration::new(1, 0))
.map_err(|e| panic!("interval errored; err={:?}", e))
.for_each(move |interval| {
let is_locked = locker.load(Ordering::SeqCst);
println!("Interval: {:?} --- {:?}", interval, is_locked);
if !is_locked {
locker.store(true, Ordering::SeqCst);
println!("locked");
let futures: Vec<_> = (0..5)
.map(|i| {
lazy(move || {
println!("Running Task-{}", i);
// mock delay
Delay::new(Instant::now() + Duration::from_millis(100 - i))
.then(|_| Ok(()))
})
.and_then(move |_| {
println!("Task-{} is done", i);
Ok(())
})
})
.collect();
let unlocker = locker.clone();
tokio::spawn(join_all(futures).and_then(move |_| {
unlocker.store(false, Ordering::SeqCst);
println!("unlocked");
Ok(())
}));
}
Ok(())
});
tokio::run(task.then(|_| Ok(())));
}
Output :
Interval: Instant { tv_sec: 4036783, tv_nsec: 211837425 } --- false
locked
Running Task-0
Running Task-1
Running Task-2
Running Task-3
Running Task-4
Task-4 is done
Task-3 is done
Task-2 is done
Task-1 is done
Task-0 is done
unlocked
Interval: Instant { tv_sec: 4036784, tv_nsec: 211837425 } --- false
locked
Running Task-0
Running Task-1
Running Task-2
Running Task-3
Running Task-4
Task-3 is done
Task-4 is done
Task-0 is done
Task-1 is done
Task-2 is done
unlocked
Warning: please check Shepmaster's comment.
Even for demonstration, you should not use thread::sleep in futures.
There are better alternatives.
