How to race a collection of futures in Rust, with retry

I have a collection of futures. I would like to run all of them, take the first one that resolves successfully, and abort the others that are still in progress.
I also want to handle the case where the first future to resolve returns an invalid value, so that a retry is needed.
I found the select! macro from tokio, but it does not support racing a collection of futures. With select! one has to list the raced futures explicitly, which makes it unusable for my use case. I also do not see it supporting any retry mechanism.
So how do I race a collection of futures in Rust, with retry?

If your futures return Result and you need to retry on Err, and by "retry" you don't mean retrying the failed future but trying the others, you can use select_ok() from the futures crate:
use std::fmt;
use std::future::Future;

async fn first_successful<T, E, Fut>(futures: Vec<Fut>) -> T
where
    E: fmt::Debug,
    Fut: Future<Output = Result<T, E>> + Unpin,
{
    // select_ok resolves with the first Ok value; it yields an error only
    // if every future fails.
    let (output, _remaining_futures) = futures::future::select_ok(futures)
        .await
        .expect("all futures failed");
    output
}
If not, and you need more control, you can use the powerful FuturesUnordered. For example, to try the other futures based on a custom predicate:
use std::future::Future;
use futures::stream::{FuturesUnordered, StreamExt};

async fn first_successful<Fut: Future + Unpin>(
    futures: Vec<Fut>,
    mut failed: impl FnMut(&Fut::Output) -> bool,
) -> Fut::Output {
    let mut futures: FuturesUnordered<Fut> = futures.into_iter().collect();
    // Outputs arrive in completion order; return the first acceptable one.
    // Returning drops the remaining futures, which cancels them.
    while let Some(v) = futures.next().await {
        if !failed(&v) {
            return v;
        }
    }
    panic!("all futures failed");
}

Related

Call a random function with variable arguments dynamically

I have a list of functions with variable arguments, and I want to randomly pick one of them, in runtime, and call it, on a loop. I'm looking to enhance the performance of my solution.
I have a function that calculates the arguments based on some randomness, and then (should) return a function pointer, which I could then call.
pub async fn choose_random_endpoint(
&self,
rng: ThreadRng,
endpoint_type: EndpointType,
) -> impl Future<Output = Result<std::string::String, MyError>> {
match endpoint_type {
EndpointType::Type1 => {
let endpoint_arguments = self.choose_endpoint_arguments(rng);
let endpoint = endpoint1(&self.arg1, &self.arg2, &endpoint_arguments.arg3);
endpoint
}
EndpointType::Type2 => {
let endpoint_arguments = self.choose_endpoint_arguments(rng);
let endpoint = endpoint2(
&self.arg1,
&self.arg2,
&endpoint_arguments.arg3,
rng.clone(),
);
endpoint
}
EndpointType::Type3 => {
let endpoint_arguments = self.choose_endpoint_arguments(rng);
let endpoint = endpoint3(
&self.arg1,
&self.arg2,
&endpoint_arguments.arg3,
rng.clone(),
);
endpoint
}
}
}
The error I obtain is
expected opaque type `impl Future<Output = Result<std::string::String, MyError>>` (opaque type at <src/calls/type1.rs:14:6>)
found opaque type `impl Future<Output = Result<std::string::String, MyError>>` (opaque type at <src/type2.rs:19:6>)
The compiler advises me to await the endpoints, and this solves the issue, but is there a performance overhead to this?
Outer function:
Assume there is a loop calling this function:
pub async fn make_call(arg1: &str, arg2: &str) -> Result<String> {
let mut rng = rand::thread_rng();
let random_endpoint_type = choose_random_endpoint_type(&mut rng);
let random_endpoint = choose_random_endpoint(&rng, random_endpoint_type);
// call the endpoint
Ok(response)
}
Now, I want to call make_call every X seconds, but I don't want my main thread to block during the endpoint calls, as those are expensive. I suppose the right approach is to spawn a new thread every X seconds that calls make_call?
Also, performance-wise: having so many clones on the rng seems quite expensive. Is there a more performant way to do this?

The error you get is sort of unrelated to async. It's the same one you get when you try to return two different iterators from a function. Your function as written doesn't even need to be async. I'm going to remove async from it when it's not needed, but if you need async (like for implementing an async-trait) then you can add it back and it'll probably work the same.
I've reduced your code into a simpler example that has the same issue (playground):
async fn a() -> &'static str {
"a"
}
async fn b() -> &'static str {
"b"
}
fn a_or_b() -> impl Future<Output = &'static str> {
if rand::random() {
a()
} else {
b()
}
}
What you're trying to write
When you want to return a trait, but the specific type that implements that trait isn't known at compile time, you can return a trait object. A future has to be pinned before it can be polled, and Box<dyn Future> only implements Future itself when the boxed type is Unpin, so this uses a pinned box (playground).
fn a_or_b() -> Pin<Box<dyn Future<Output = &'static str>>> {
if rand::random() {
Box::pin(a())
} else {
Box::pin(b())
}
}
You may need the type to be something like Pin<Box<dyn Future<Output = &'static str> + Send + Sync + 'static>> depending on the context.
What you should write
I think the only reason you'd do the above is if you want to generate the future with some kind of async rng, then do something else, and then run the generated future after that. Otherwise there's no need to have nested futures; just await the inner futures when you call them (playground).
async fn a_or_b() -> &'static str {
if rand::random() {
a().await
} else {
b().await
}
}
This is conceptually equivalent to the Pin<Box> method, just without having to allocate a Box. Instead, you have an opaque type that implements Future itself.
Blocking
The blocking behavior of these is only slightly different. Pin<Box> will block on non-async things when you call it, while the async one will block on non-async things where you await it. This is probably mostly the random generation.
The blocking behavior of the endpoint is the same and depends on what happens inside there. It'll block or not block wherever you await either way.
If you want to have multiple make_call calls happening at the same time, you'll need to do that outside the function anyway. Using the tokio runtime, it would look something like this:
use tokio::task;
use futures::future::join_all;
let tasks: Vec<_> = (0..100).map(|_| task::spawn(make_call())).collect();
let results = join_all(tasks).await;
This also lets you do other stuff while the futures are running, in between collect(); and let results.
If something inside your function blocks, you'd want to spawn it with task::spawn_blocking (and then await that handle) so that the await call in make_call doesn't get blocked.
RNG
If your runtime is multithreaded, the ThreadRng will be an issue. You could create a type that implements Rng + Send with from_entropy, and pass that into your functions. Or you can call thread_rng or even just rand::random where you need it. This makes a new rng per thread, but will reuse them on later calls since it's a thread-local static. On the other hand, if you don't need as much randomness, you can go with a Rng + Send type from the beginning.
If your runtime isn't multithreaded, you should be able to pass &mut ThreadRng all the way through, assuming the borrow checker is smart enough. You won't be able to pass it into an async function and then spawn it, though, so you'd have to create a new one inside that function.

How can I execute an action after each end of thread?

In Rust, I would like to do multiple tasks in parallel and when each task finishes, I would like to do another task handled by the main process.
I know that tasks will finish at different timings, and I don't want to wait for all the tasks to do the next task.
I've tried doing multiple threads handled by the main process but I have to wait for all the threads to finish before doing another action or maybe I did not understand.
for handle in handles {
handle.join().unwrap();
}
How can I manage to do a task handled by the main process after each end of threads without blocking the whole main thread?
Here is a diagram to explain what I want to do:
If I'm not clear, or if you have a better idea for handling my problem, don't hesitate to tell me!

Here's an example of how to implement this using FuturesUnordered and Tokio:
use futures::{stream::FuturesUnordered, StreamExt};
use tokio::time::sleep;
use std::{time::Duration, future::ready};
#[tokio::main]
async fn main() {
let tasks = FuturesUnordered::new();
tasks.push(some_task(1000));
tasks.push(some_task(2000));
tasks.push(some_task(500));
tasks.push(some_task(1500));
tasks.for_each(|result| {
println!("Task finished after {} ms.", result);
ready(())
}).await;
}
async fn some_task(delay_ms: u64) -> u64 {
sleep(Duration::from_millis(delay_ms)).await;
delay_ms
}
If you run this code, you can see that the closure passed to for_each() is executed immediately whenever a task finishes, even though they don't finish in the order they were created.
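Note that the futures stored in a FuturesUnordered are all polled from the task that owns the collection, so in this example they run concurrently but not in parallel. To have Tokio schedule the work across its worker threads (by default, one per CPU core), wrap each task in tokio::spawn() and push the returned JoinHandle instead; the output of each entry is then a Result wrapping the task's value.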
To compile this, you need to add this to your Cargo.toml file:
[dependencies]
futures = "0.3"
tokio = { version = "1", features = ["full"] }
If you want to add some proper error propagation, the code becomes only slightly more complex – most of the added code is for the custom error type:
use futures::{stream::FuturesUnordered, TryStreamExt};
use tokio::time::sleep;
use std::{time::Duration, future::ready};
#[tokio::main]
async fn main() -> Result<(), MyError> {
let tasks = FuturesUnordered::new();
tasks.push(some_task(1000));
tasks.push(some_task(2000));
tasks.push(some_task(500));
tasks.push(some_task(1500));
tasks.try_for_each(|result| {
println!("Task finished after {} ms.", result);
ready(Ok(()))
}).await
}
async fn some_task(delay_ms: u64) -> Result<u64, MyError> {
sleep(Duration::from_millis(delay_ms)).await;
Ok(delay_ms)
}
#[derive(Debug)]
struct MyError {}
impl std::fmt::Display for MyError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "MyError occurred")
}
}
impl std::error::Error for MyError {}
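If you would rather stay with plain threads, as in the question, a channel gives the same "react as each one finishes" behavior. This is a minimal sketch using only the standard library; run_and_react and the sleep-based workload are placeholders invented for this example:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Spawns one worker thread per delay and returns the results in the order
/// in which they arrived on the channel.
fn run_and_react(delays_ms: &[u64]) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    for &delay in delays_ms {
        let tx = tx.clone();
        thread::spawn(move || {
            // Stand-in for the real work done by the thread.
            thread::sleep(Duration::from_millis(delay));
            // Sending only fails if the receiver is gone; ignore that case.
            let _ = tx.send(delay);
        });
    }
    // Drop the original sender so the loop below ends once every worker
    // has finished and dropped its clone.
    drop(tx);

    let mut finished = Vec::new();
    // The main thread handles each result as soon as it arrives, without
    // waiting for the slower workers.
    for delay in rx {
        println!("Task finished after {delay} ms.");
        finished.push(delay);
    }
    finished
}
```

The main thread blocks only while no result is available; each message is processed as soon as a worker sends it, so there is no need to join the handles in creation order.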

Sleep in Future::poll

I am trying to create a future polling for inputs from the crossterm crate, which does not provide an asynchronous API, as far as I know.
At first I tried to do something like the following:
use crossterm::event::poll as crossterm_poll;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;
use tokio::time::{sleep, timeout};
struct Polled {}
impl Polled {
pub fn new() -> Polled {
Polled {}
}
}
impl Future for Polled {
type Output = bool;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
// If there are events pending, it returns "Ok(true)", else it returns instantly
let poll_status = crossterm_poll(Duration::from_secs(0));
if poll_status.is_ok() && poll_status.unwrap() {
return Poll::Ready(true);
}
Poll::Pending
}
}
pub async fn poll(d: Duration) -> Result<bool, ()> {
let polled = Polled::new();
match timeout(d, polled).await {
Ok(b) => Ok(b),
Err(_) => Err(()),
}
}
It technically works, but obviously the program started using 100% CPU all the time, since the executor always tries to poll the future in case there's something new. I therefore wanted to add an asynchronous equivalent of sleep that would delay the next time the executor polls the future, so I tried adding the following right before returning Poll::Pending. It obviously did not work, since sleep_future.poll() just returns Pending:
let mut sleep_future = sleep(Duration::from_millis(50));
tokio::pin!(sleep_future);
sleep_future.poll(cx);
cx.waker().wake_by_ref();
The fact that poll is not async forbids the use of async functions, and I'm starting to wonder whether what I want to do is actually feasible, or whether I'm approaching the problem the wrong way.
Is finding a way to do an async sleep the right way to go?
If not, what is? Am I missing something in the asynchronous paradigm?
Or is it sometimes simply impossible to wrap synchronous logic in a Future if the crate does not give you the necessary tools to do so?
Thanks in advance anyway!
EDIT: I found a way to do what I want using an async block:
pub async fn poll(d: Duration) -> Result<bool, ()> {
let mdr = async {
loop {
let a = crossterm_poll(Duration::from_secs(0));
if a.is_ok() && a.unwrap() {
break;
}
sleep(Duration::from_millis(50)).await;
}
true
};
match timeout(d, mdr).await {
Ok(b) => Ok(b),
Err(_) => Err(()),
}
}
Is this the idiomatic way to do it? Or did I miss something more elegant?

Yes, using an async block is a good way to compose futures, like your custom poller and tokio's sleep.
However, if you did want to write your own Future which also invokes tokio's sleep, here's what you would need to do differently:
Don't call wake_by_ref() immediately — the sleep future will take care of that when its time comes, and that's how you avoid spinning (using 100% CPU).
You must construct the sleep() future once when you intend to sleep (not every time you're polled), then store it in your future (this will require pin-projection) and poll the same future again the next time you're polled. That's how you ensure you wait the intended amount of time and not shorter.
Async blocks are usually a much easier way to get the same result.
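To make the two points above concrete, here is a rough std-only sketch of a hand-written future that stores its timer between polls. Delay is a hypothetical thread-based stand-in for tokio's sleep(), and PollUntil, ThreadWaker and block_on are likewise names made up for this illustration. Because Delay happens to be Unpin, no pin-projection is needed here; tokio's Sleep is not Unpin, so with tokio you would store it as Pin<Box<Sleep>> or use a pin-projection crate.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// Hypothetical stand-in for tokio's sleep(): a helper thread sets the flag
/// and wakes the stored waker once the duration has elapsed.
struct Delay {
    state: Arc<Mutex<(bool, Option<Waker>)>>,
}

impl Delay {
    fn new(duration: Duration) -> Self {
        let state = Arc::new(Mutex::new((false, None::<Waker>)));
        let shared = Arc::clone(&state);
        thread::spawn(move || {
            thread::sleep(duration);
            let mut guard = shared.lock().unwrap();
            guard.0 = true;
            if let Some(waker) = guard.1.take() {
                waker.wake();
            }
        });
        Delay { state }
    }
}

impl Future for Delay {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut guard = self.state.lock().unwrap();
        if guard.0 {
            return Poll::Ready(());
        }
        // Remember the latest waker so the timer thread can wake this task.
        guard.1 = Some(cx.waker().clone());
        Poll::Pending
    }
}

/// Runs the non-blocking `check` until it returns true, sleeping in between.
struct PollUntil<C> {
    check: C,
    interval: Duration,
    sleep: Option<Delay>, // created once per wait and kept across polls
}

impl<C: FnMut() -> bool + Unpin> PollUntil<C> {
    fn new(interval: Duration, check: C) -> Self {
        PollUntil { check, interval, sleep: None }
    }
}

impl<C: FnMut() -> bool + Unpin> Future for PollUntil<C> {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut();
        loop {
            if let Some(sleep) = this.sleep.as_mut() {
                // Poll the stored timer again. It wakes us when it elapses,
                // so there is no wake_by_ref() call and no busy loop.
                if Pin::new(sleep).poll(cx).is_pending() {
                    return Poll::Pending;
                }
                this.sleep = None;
            }
            if (this.check)() {
                return Poll::Ready(());
            }
            this.sleep = Some(Delay::new(this.interval));
        }
    }
}

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny single-future executor, only here to keep the sketch self-contained.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

In the crossterm case, check would be a closure that calls the zero-timeout poll from the question. The sketch mainly shows why the async block is preferable: it expresses the same state machine in a handful of lines.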

Spawn non-static future with Tokio

I have an async method that should execute some futures in parallel, and only return after all futures finished. However, it is passed some data by reference that does not live as long as 'static (it will be dropped at some point in the main method). Conceptually, it's similar to this (Playground):
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
for i in array {
let task = spawn(do_sth(i));
tasks.push(task);
}
for task in tasks {
task.await;
}
}
#[tokio::main]
async fn main() {
parallel_stuff(&[3, 1, 4, 2]).await;
}
Now, tokio wants futures that are passed to spawn to be valid for the 'static lifetime, because I could drop the handle without the future stopping. That means that my example above produces this error message:
error[E0759]: `array` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
--> src/main.rs:12:25
|
12 | async fn parallel_stuff(array: &[u64]) {
| ^^^^^ ------ this data with an anonymous lifetime `'_`...
| |
| ...is captured here...
...
15 | let task = spawn(do_sth(i));
| ----- ...and is required to live as long as `'static` here
So my question is: How do I spawn futures that are only valid for the current context that I can then wait until all of them completed?

It is not possible to spawn a non-'static future from async Rust. This is because any async function might be cancelled at any time, so there is no way to guarantee that the caller really outlives the spawned tasks.
It is true that there are various crates that allow scoped spawns of async tasks, but these crates cannot be used from async code. What they do allow is spawning scoped async tasks from non-async code. This avoids the problem above, because the non-async code that spawned them cannot be cancelled at an arbitrary point, as it is not async.
Generally there are two approaches to this:
Spawn a 'static task by using Arc rather than ordinary references.
Use the concurrency primitives from the futures crate instead of spawning.
Generally to spawn a static task and use Arc, you must have ownership of the values in question. This means that since your function took the argument by reference, you cannot use this technique without cloning the data.
async fn do_sth(with: Arc<[u64]>, idx: usize) {
delay_for(Duration::new(with[idx], 0)).await;
println!("{}", with[idx]);
}
async fn parallel_stuff(array: &[u64]) {
// Make a clone of the data so we can share it across tasks.
let shared: Arc<[u64]> = Arc::from(array);
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
for i in 0..array.len() {
// Cloning an Arc does not clone the data.
let shared_clone = shared.clone();
let task = spawn(do_sth(shared_clone, i));
tasks.push(task);
}
for task in tasks {
task.await;
}
}
Note that if you have a mutable reference to the data, and the data is Sized, i.e. not a slice, it is possible to temporarily take ownership of it.
async fn do_sth(with: Arc<Vec<u64>>, idx: usize) {
delay_for(Duration::new(with[idx], 0)).await;
println!("{}", with[idx]);
}
async fn parallel_stuff(array: &mut Vec<u64>) {
// Swap the array with an empty one to temporarily take ownership.
let vec = std::mem::take(array);
let shared = Arc::new(vec);
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
// `array` is empty after the take above, so iterate over the shared Vec.
for i in 0..shared.len() {
// Cloning an Arc does not clone the data.
let shared_clone = shared.clone();
let task = spawn(do_sth(shared_clone, i));
tasks.push(task);
}
for task in tasks {
task.await;
}
// Put back the vector where we took it from.
// This works because there is only one Arc left.
*array = Arc::try_unwrap(shared).unwrap();
}
Another option is to use the concurrency primitives from the futures crate. These have the advantage of working with non-'static data, but the disadvantage that the tasks will not be able to run on multiple threads at the same time.
For many workflows this is perfectly fine, as async code should spend most of its time waiting for IO anyway.
One approach is to use FuturesUnordered. This is a special collection that can store many different futures, and it has a next function that runs all of them concurrently, and returns once the first of them finished. (The next function is only available when StreamExt is imported)
You can use it like this:
use futures::stream::{FuturesUnordered, StreamExt};
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
let mut tasks = FuturesUnordered::new();
for i in array {
let task = do_sth(i);
tasks.push(task);
}
// This loop runs everything concurrently, and waits until they have
// all finished.
while let Some(()) = tasks.next().await { }
}
Note: The FuturesUnordered must be defined after the shared value. Otherwise you will get a borrow error that is caused by them being dropped in the wrong order.
Another approach is to use a Stream. With streams, you can use buffer_unordered. This is a utility that uses FuturesUnordered internally.
use futures::stream::StreamExt;
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
// Create a stream going through the array.
futures::stream::iter(array)
// For each item in the stream, create a future.
.map(|i| do_sth(i))
// Run at most 10 of the futures concurrently.
.buffer_unordered(10)
// Since Streams are lazy, we must use for_each or collect to run them.
// Here we use for_each and do nothing with the return value from do_sth.
.for_each(|()| async {})
.await;
}
Note that in both cases, importing StreamExt is important as it provides various methods that are not available on streams without importing the extension trait.

In case of code that uses threads for parallelism, it is possible to avoid copying by extending a lifetime with transmute. An example:
fn main() {
let now = std::time::Instant::now();
let string = format!("{now:?}");
println!(
"{now:?} has length {}",
parallel_len(&[&string, &string]) / 2
);
}
fn parallel_len(input: &[&str]) -> usize {
// SAFETY: this variable needs to be static, because it is passed into a thread,
// but the thread does not live longer than this function, because we wait for
// it to finish by calling `join` on it.
let input: &[&'static str] = unsafe { std::mem::transmute(input) };
let mut threads = vec![];
for txt in input {
threads.push(std::thread::spawn(move || txt.len()));
}
threads.into_iter().map(|t| t.join().unwrap()).sum()
}
It seems reasonable that this should also work for asynchronous code, but I do not know enough about that to say for sure.
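For the thread-based case, std::thread::scope (stable since Rust 1.63) lets threads borrow non-'static data without any unsafe. Below is a minimal sketch of the same parallel_len written that way, offered as an illustration rather than a drop-in replacement:

```rust
use std::thread;

/// Sums the lengths of the strings, computing each length on its own thread.
fn parallel_len(input: &[&str]) -> usize {
    thread::scope(|s| {
        // Scoped threads may borrow `input`, because the scope joins every
        // thread before `thread::scope` returns.
        let handles: Vec<_> = input
            .iter()
            .map(|txt| s.spawn(move || txt.len()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<usize>()
    })
}
```

The scope also joins any thread that was not joined manually, including when the closure panics, which the transmute version does not guarantee.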

How do you write test assertions inside of tokio::run futures?

How do you test your futures which are meant to be run in the Tokio runtime?
fn fut_dns() -> impl Future<Item = (), Error = ()> {
let f = dns::lookup("www.google.de", "127.0.0.1:53");
f.then(|result| match result {
Ok(smtptls) => {
println!("{:?}", smtptls);
assert_eq!(smtptls.version, "TLSRPTv1");
assert!(smtptls.rua.len() > 0);
assert_eq!(smtptls.rua[0], "mailto://...");
ok(())
}
Err(e) => {
println!("error: {:?}", e);
err(())
}
})
}
#[test]
fn smtp_log_test() {
tokio::run(fut_dns());
assert!(true);
}
The future runs, and the thread running it panics on an assert. You can read the panic in the console, but the test does not notice panics on the threads of tokio::run.
The question "How can I test a future that is bound to a tokio TcpStream?" doesn't answer this, because it simply says: "A simple way to test async code may be to use a dedicated runtime for each test".
I do this!
My question is about how the test can detect whether the future works. The future needs a started runtime environment.
The test passes even though an assertion inside the future fails or the future calls err().
So what can I do?

Do not write your assertions inside the future.
As described in How can I test a future that is bound to a tokio TcpStream?, create a Runtime to execute your future. As described in How do I synchronously return a value calculated in an asynchronous Future in stable Rust?, compute your value and then exit the async world:
fn run_one<F>(f: F) -> Result<F::Item, F::Error>
where
F: IntoFuture,
F::Future: Send + 'static,
F::Item: Send + 'static,
F::Error: Send + 'static,
{
let mut runtime = tokio::runtime::Runtime::new().expect("Unable to create a runtime");
runtime.block_on(f.into_future())
}
#[test]
fn smtp_log_test() {
let smtptls = run_one(dns::lookup("www.google.de", "127.0.0.1:53")).unwrap();
assert_eq!(smtptls.version, "TLSRPTv1");
assert!(smtptls.rua.len() > 0);
assert_eq!(smtptls.rua[0], "mailto://...");
}
