I am trying to understand how polling works in an async Rust Future. Using the following code, I tried to run two futures, Fut0 and Fut1, such that they interleave as follows: Fut0 -> Fut1 -> Fut0 -> Fut0.
extern crate futures; // 0.3.1
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};
use std::cell::RefCell;
use std::rc::Rc;
use std::collections::HashMap;
use futures::executor::block_on;
use futures::future::join_all;
#[derive(Default, Debug)]
struct Fut {
    id: usize,
    step: usize,
    wakers: Rc<RefCell<HashMap<usize, Waker>>>,
}

impl Future for Fut {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        self.step += 1;
        println!("Fut{} at step {}", self.id, self.step);
        {
            let mut wakers = self.wakers.borrow_mut();
            wakers.insert(self.id, cx.waker().clone());
        }
        {
            let next_id = (self.id + self.step) % 2;
            let wakers = self.wakers.borrow();
            if let Some(w) = wakers.get(&next_id) {
                println!("Waking up Fut{} from Fut{}", next_id, self.id);
                w.wake_by_ref();
            }
        }
        if self.step > 1 {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

macro_rules! create_fut {
    ($i:ident, $e:expr, $w:expr) => (
        let $i = Fut {
            id: $e,
            step: 0,
            wakers: $w.clone(),
        };
    )
}

fn main() {
    let wakers = Rc::new(RefCell::new(HashMap::new()));
    create_fut!(fut0, 0, wakers);
    create_fut!(fut1, 1, wakers);
    block_on(join_all(vec![fut0, fut1]));
}
But they are always polled in round-robin fashion, i.e. Fut0 -> Fut1 -> Fut0 -> Fut1 -> ...:
Fut0 at step 1
Fut1 at step 1
Waking up Fut0 from Fut1
Fut0 at step 2
Waking up Fut0 from Fut0
Fut1 at step 2
Waking up Fut1 from Fut1
It seems all of their Contexts are the same, and hence the Wakers for each of the Futures are the same too, so waking one of them wakes the other. Is it possible to have a different Context (or Waker) for each future?
The function futures::future::join_all returns a future that polls the given futures in sequence rather than in parallel. The way to look at it is that futures are nested: the executor only holds a reference to the top-most future that is scheduled (in this case, the future returned by futures::future::join_all).
This means that when the join_all future is polled, it passes its context down to the nested future it is currently executing, then to the next nested future, and so on, effectively using the same context for all nested futures. This can be verified by reading the source of the JoinAll future in the futures crate.
The block_on executor can only execute a single future at a time. Executors such as tokio that use thread pools can actually execute futures in parallel, and thus will use different contexts for different scheduled futures (but still the same one for all the children of a JoinAll, for the reasons described above).
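To make this concrete, here is a simplified, hypothetical model of a join-all-style combinator (not the actual JoinAll source). Every child is polled with the same cx, so every child that clones cx.waker() ends up holding the same Waker:

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// A naive join-all: completes once every child has completed.
struct NaiveJoinAll<F> {
    children: Vec<Option<F>>, // None = already finished
}

impl<F: Future<Output = ()> + Unpin> Future for NaiveJoinAll<F> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut();
        let mut all_done = true;
        for slot in &mut this.children {
            if let Some(child) = slot {
                // The *same* `cx` is handed to every nested future.
                match Pin::new(child).poll(cx) {
                    Poll::Ready(()) => *slot = None,
                    Poll::Pending => all_done = false,
                }
            }
        }
        if all_done {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

When any child wakes that shared Waker, the executor re-polls the whole join-all future, which then re-polls its still-pending children in order; that is exactly the round-robin pattern visible in the output above.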
Related
I have a collection of Futures, and I would like to execute all of them and get the first one that resolves successfully and abort the others still processing.
But I want to take care of the scenario where the first future that resolves actually returns an invalid value, hence leading to a situation where a retry is needed.
I found the select! macro from tokio, but it does not support racing a collection of futures: with select! one needs to explicitly list the futures to be raced, which makes it unusable for my use case. I also do not see it supporting any retry mechanism.
So how do I race a collection of futures in Rust, with retry?
If your futures return Result and you need to retry on Err (where "retry" means trying the other futures, not re-running the failed one), you can use futures' select_ok():
use std::fmt;
use std::future::Future;

async fn first_successful<T, E, Fut>(futures: Vec<Fut>) -> T
where
    E: fmt::Debug,
    Fut: Future<Output = Result<T, E>> + Unpin,
{
    let (output, _remaining_futures) = futures::future::select_ok(futures)
        .await
        .expect("all futures failed");
    output
}
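For illustration, a hypothetical caller racing three already-completed futures; select_ok discards the Err future and resolves with the first successful result:

use futures::executor::block_on;
use futures::future;

fn main() {
    let futures = vec![
        future::ready(Err::<u32, &str>("boom")),
        future::ready(Ok(42)),
        future::ready(Ok(7)),
    ];
    // The Err("boom") future is dropped; the first Ok in polling
    // order (here 42) wins.
    let first = block_on(first_successful(futures));
    assert_eq!(first, 42);
}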
If not, and you need more control, you can use the powerful FuturesUnordered. For example, for trying others with a custom predicate:
use std::future::Future;

use futures::stream::{FuturesUnordered, StreamExt};

async fn first_successful<Fut: Future + Unpin>(
    futures: Vec<Fut>,
    mut failed: impl FnMut(&Fut::Output) -> bool,
) -> Fut::Output {
    let mut futures = FuturesUnordered::from_iter(futures);
    while let Some(v) = futures.next().await {
        if !failed(&v) {
            return v;
        }
    }
    panic!("all futures failed");
}
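And a hypothetical usage, with a custom predicate deciding what counts as a failure:

use futures::executor::block_on;
use futures::future;

fn main() {
    // Treat any value below 10 as "failed": 5 is rejected by the
    // predicate, so the function keeps waiting and returns 42.
    let futures = vec![future::ready(5), future::ready(42)];
    let v = block_on(first_successful(futures, |v| *v < 10));
    assert_eq!(v, 42);
}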
In Rust, I would like to run multiple tasks in parallel and, when each task finishes, run a follow-up task handled by the main process.
I know the tasks will finish at different times, and I don't want to wait for all of them before starting the next task.
I've tried spawning multiple threads from the main process, but I have to wait for all the threads to finish before doing anything else (or maybe I misunderstood):
for handle in handles {
    handle.join().unwrap();
}
How can I run a task on the main process after each thread ends, without blocking the whole main thread?
Here is a diagram to explain what I want to do:
If I'm not clear, or if you have a better idea to handle my problem, feel free to tell me!
Here's an example of how to implement this using FuturesUnordered and Tokio:
use futures::{stream::FuturesUnordered, StreamExt};
use tokio::time::sleep;
use std::{time::Duration, future::ready};

#[tokio::main]
async fn main() {
    let tasks = FuturesUnordered::new();
    tasks.push(some_task(1000));
    tasks.push(some_task(2000));
    tasks.push(some_task(500));
    tasks.push(some_task(1500));

    tasks
        .for_each(|result| {
            println!("Task finished after {} ms.", result);
            ready(())
        })
        .await;
}

async fn some_task(delay_ms: u64) -> u64 {
    sleep(Duration::from_millis(delay_ms)).await;
    delay_ms
}
If you run this code, you can see that the closure passed to for_each() is executed immediately whenever a task finishes, even though the tasks don't finish in the order they were created.
Note that all of these futures run inside the single main task here: FuturesUnordered polls them concurrently, but not in parallel on separate threads. For tasks that mostly sleep or wait on IO, as in this example, that is all you need. If you also want them to run in parallel, spawn each one onto Tokio's thread pool (by default, one worker thread per CPU core) and push the JoinHandles into the FuturesUnordered instead, as sketched below.
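A sketch of that spawning variant (same some_task as above; a JoinHandle is itself a future, and it yields a Result because a spawned task can panic):

use futures::{stream::FuturesUnordered, StreamExt};
use std::time::Duration;
use tokio::time::sleep;

#[tokio::main]
async fn main() {
    let tasks = FuturesUnordered::new();
    for delay in [1000u64, 2000, 500, 1500] {
        // tokio::spawn submits the task to the runtime's thread pool
        // right away; the JoinHandle lets us await its completion.
        tasks.push(tokio::spawn(some_task(delay)));
    }

    tasks
        .for_each(|result| async move {
            // A JoinHandle yields Result<T, JoinError>.
            println!("Task finished after {} ms.", result.unwrap());
        })
        .await;
}

async fn some_task(delay_ms: u64) -> u64 {
    sleep(Duration::from_millis(delay_ms)).await;
    delay_ms
}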
To compile this, you need to add this to your Cargo.toml file:
[dependencies]
futures = "0.3"
tokio = { version = "1", features = ["full"] }
If you want to add some proper error propagation, the code becomes only slightly more complex – most of the added code is for the custom error type:
use futures::{stream::FuturesUnordered, TryStreamExt};
use tokio::time::sleep;
use std::{time::Duration, future::ready};

#[tokio::main]
async fn main() -> Result<(), MyError> {
    let tasks = FuturesUnordered::new();
    tasks.push(some_task(1000));
    tasks.push(some_task(2000));
    tasks.push(some_task(500));
    tasks.push(some_task(1500));

    tasks
        .try_for_each(|result| {
            println!("Task finished after {} ms.", result);
            ready(Ok(()))
        })
        .await
}

async fn some_task(delay_ms: u64) -> Result<u64, MyError> {
    sleep(Duration::from_millis(delay_ms)).await;
    Ok(delay_ms)
}

#[derive(Debug)]
struct MyError {}

impl std::fmt::Display for MyError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "MyError occurred")
    }
}

impl std::error::Error for MyError {}
When running code like this:
use futures::executor;
// ...

pub fn store_temporary_password(email: &str, password: &str) -> Result<(), Box<dyn Error>> {
    let client = DynamoDbClient::new(Region::ApSoutheast2);
    // ...
    let future = client.put_item(input);
    executor::block_on(future)?; // <- crashes here
    Ok(())
}
I get the error:
thread '<unnamed>' panicked at 'there is no reactor running, must be called from the context of a Tokio 1.x runtime
My main has the tokio annotation as it should:
#[tokio::main]
async fn main() {
    // ...
My Cargo.toml looks like:
[dependencies]
...
futures = { version="0", features=["executor"] }
tokio = "1"
My Cargo.lock shows that I only have one version each of futures and tokio ("0.3.12" and "1.2.0" respectively).
This exhausts the explanations I found elsewhere for this problem. Any ideas? Thanks.
You have to enter the tokio runtime context before calling block_on:
let handle = tokio::runtime::Handle::current();
let _guard = handle.enter(); // keep the guard alive: dropping it leaves the runtime context
executor::block_on(future)?;
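Put together, a minimal runnable sketch of the fix (assuming a multi-thread runtime, since block_on parks the current worker while another worker drives Tokio's timer):

use std::time::Duration;
use tokio::runtime::Handle;

fn sync_helper() {
    // Handle::current() panics unless we're on a runtime thread.
    let handle = Handle::current();
    // The guard must stay alive for as long as we need the context.
    let _guard = handle.enter();
    futures::executor::block_on(async {
        // A Tokio timer now finds the reactor instead of panicking.
        tokio::time::sleep(Duration::from_millis(10)).await;
    });
    println!("slept without a panic");
}

// Two workers, so one can drive the timer while the other is
// blocked inside block_on.
#[tokio::main(worker_threads = 2)]
async fn main() {
    sync_helper();
}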
Note that your code is violating the rule that async functions should never spend a long time without reaching a .await. Ideally, store_temporary_password should be marked as async to avoid blocking the current thread:
pub async fn store_temporary_password(email: &str, password: &str) -> Result<(), Box<dyn Error>> {
    // ...
    let future = client.put_item(input);
    future.await?;
    Ok(())
}
If that is not an option, you should wrap any calls to store_temporary_password in tokio::task::spawn_blocking to run the blocking operation on a separate thread pool.
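For example, a sketch with a hypothetical stand-in for store_temporary_password; note the error type is changed to Box<dyn Error + Send + Sync> so the Result can be sent back from the blocking thread:

use std::error::Error;

// Hypothetical stand-in for the real blocking function.
fn store_temporary_password(
    _email: &str,
    _password: &str,
) -> Result<(), Box<dyn Error + Send + Sync>> {
    Ok(())
}

#[tokio::main]
async fn main() {
    let (email, password) = ("a@b.c".to_string(), "hunter2".to_string());
    // Run the blocking call on Tokio's dedicated blocking thread pool
    // instead of stalling an async worker thread.
    let result = tokio::task::spawn_blocking(move || {
        store_temporary_password(&email, &password)
    })
    .await
    .expect("blocking task panicked");
    result.expect("storing the password failed");
}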
I'm trying to learn async programming, but this very basic example doesn't work:
use std::future::Future;

fn main() {
    let t = async {
        println!("Hello, world!");
    };
    t.poll();
}
Everything I've read from the specs says this should work, but cargo complains that method "poll" can't be found in "impl std::future::Future". What am I doing wrong?
poll has this signature:
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
There are two problems with calling this in the way you do:
poll is not implemented on a future Fut directly, but on Pin<&mut Fut>, so you need to get a pinned reference first. The pin_mut! macro is often useful here, and if the future implements Unpin, you can use Pin::new as well.
The bigger problem, however, is that poll takes a &mut Context<'_> argument. The context is created by the asynchronous runtime and passed to the poll function of the outermost future. This means that you can't just poll a future like that; you need to be in an asynchronous runtime to do it.
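That said, if you only want to experiment with a single poll, you can build a Context from a no-op Waker yourself. A minimal sketch using futures' pin_mut! and noop_waker_ref (note that nothing will ever re-poll the future if it returns Pending):

use std::future::Future;
use std::task::{Context, Poll};

use futures::pin_mut;
use futures::task::noop_waker_ref;

fn main() {
    let t = async {
        println!("Hello, world!");
    };
    // Pin the future on the stack so we can call poll on Pin<&mut _>.
    pin_mut!(t);

    // A Context backed by a Waker that does nothing when woken.
    let mut cx = Context::from_waker(noop_waker_ref());
    match t.as_mut().poll(&mut cx) {
        Poll::Ready(()) => println!("future completed"),
        Poll::Pending => println!("future is pending"),
    }
}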
Instead, you can use a crate like tokio or async-std to run a future in a synchronous context:
// tokio
use tokio::runtime::Runtime;

let runtime = Runtime::new().unwrap();
let result = runtime.block_on(async {
    // ...
});

// async-std
let result = async_std::task::block_on(async {
    // ...
});
Or even better, you can use #[tokio::main] or #[async_std::main] to convert your main function into an asynchronous function:
// tokio
#[tokio::main]
async fn main() {
    // ...
}

// async-std
#[async_std::main]
async fn main() {
    // ...
}
I am creating a few hundred requests to download the same file (this is a toy example). When I run the equivalent logic in Go, I get 200% CPU usage and finish in ~5 seconds with 800 requests. In Rust, with only 100 requests, it takes nearly 5 seconds and spawns 16 OS threads with 37% CPU utilization.
Why is there such a difference?
From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
From the perf data, it seems like I am only using one core, despite the 16-thread CpuPool.
extern crate curl;
extern crate fibers;
extern crate futures;
extern crate futures_cpupool;

use curl::easy::Easy;
use futures::future::*;
use futures_cpupool::CpuPool;
use std::fs::File;
use std::io::{BufWriter, Write};

fn make_file(x: i32, data: &mut Vec<u8>) {
    let f = File::create(format!("./data/{}.txt", x)).expect("Unable to open file");
    let mut writer = BufWriter::new(&f);
    writer.write_all(data.as_mut_slice()).unwrap();
}

fn collect_request(x: i32, url: &str) -> Result<i32, ()> {
    let mut data = Vec::new();
    let mut easy = Easy::new();
    easy.url(url).unwrap();
    {
        let mut transfer = easy.transfer();
        transfer
            .write_function(|d| {
                data.extend_from_slice(d);
                Ok(d.len())
            })
            .unwrap();
        transfer.perform().unwrap();
    }
    make_file(x, &mut data);
    Ok(x)
}

fn main() {
    let url = "https://en.wikipedia.org/wiki/Immanuel_Kant";
    let pool = CpuPool::new(16);
    let output_futures: Vec<_> = (0..100)
        .map(|ind| pool.spawn_fn(move || collect_request(ind, url)))
        .collect();

    for i in output_futures {
        i.wait().unwrap();
    }
}
My equivalent Go code
From what I understand, if I have a CpuPool managing Futures across N cores, this is functionally what the Go runtime/goroutine combo is doing, just via fibers instead of futures.
This is not correct. The documentation for CpuPool states, emphasis mine:
A thread pool intended to run CPU intensive work.
Downloading a file is not CPU-bound, it's IO-bound. All you have done is spin up many threads and then tell each thread to block while waiting for IO to complete.
Instead, use tokio-curl, which adapts the curl library to the Future abstraction. You can then remove the thread pool completely. This should drastically improve your throughput.
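For illustration only, a sketch in the spirit of tokio-curl's documented usage (futures 0.1 / tokio-core era; exact signatures may differ by version). All transfers are multiplexed concurrently on one event-loop thread via curl's multi interface instead of blocking one thread each:

extern crate curl;
extern crate futures;
extern crate tokio_core;
extern crate tokio_curl;

use curl::easy::Easy;
use futures::future::join_all;
use tokio_core::reactor::Core;
use tokio_curl::Session;

fn main() {
    // One event loop (reactor) drives every transfer on a single thread.
    let mut core = Core::new().unwrap();
    let session = Session::new(core.handle());

    let url = "https://en.wikipedia.org/wiki/Immanuel_Kant";
    let requests: Vec<_> = (0..100)
        .map(|_| {
            let mut req = Easy::new();
            req.url(url).unwrap();
            // Discard the body here; collecting it needs a 'static buffer,
            // e.g. an Arc<Mutex<Vec<u8>>> shared with the callback.
            req.write_function(|data| Ok(data.len())).unwrap();
            session.perform(req)
        })
        .collect();

    // Wait until every transfer has completed.
    core.run(join_all(requests)).unwrap();
}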