Rust: combine select! with FuturesUnordered.buffered(N)

Rust: combine select! with FuturesUnordered.buffered(N) - rust

I am using FuturesUnordered to enqueue async workloads onto a multi-threaded tokio runner. These futures return various different kinds of results. To differentiate them, I map each future's result to a custom Event type.
enum Event {
ResultTypeA {...},
ResultTypeB {...},
ResultTypeC {...},
ResultTypeD {...}
}
let pending_futures: FuturesUnordered<Pin<Box<dyn Future<Output = Event> + Send>>> = FuturesUnordered::default()
loop {
tokio::select! {
Some(future) = workload_receiver.recv() => {
pending_futures.push(future.boxed());
},
Some(event) = pending_futures.next() => process_event(event),
else => break,
}
}
Above code works well, however, I would like to limit the number of pending_futures to process in parallel. This is where buffered_unordered comes in. My naive approach was:
loop {
tokio::select! {
Some(future) = workload_receiver.recv() => {
pending_futures.push(future.boxed());
},
Some(event) = pending_futures.buffered(10).next() => process_event(event),
else => break,
}
}
This throws the following compilation error:
--> src/main.rs
|
257 | Some(event) = pending_futures.buffered(10).next() => process_event(event),
| ^^^^^^^^^^^^ `Event` is not a future
|
= help: the trait `futures::Future` is not implemented for `Event`
= note: Event must be a future or must implement `IntoFuture` to be awaited
note: required by a bound in `buffered`
--> futures-util-0.3.24/src/stream/stream/mod.rs:1359:21
|
1359 | Self::Item: Future,
| ^^^^^^ required by this bound in `buffered`
How can I limit FuturesUnordered to only ever process N futures of its underlying queue at the same time but still allow dynamically enqueuing new futures?

You do not want to use FuturesUnordered if you want to limit the concurrency, it will run all of the contained tasks, all the time. Using .buffered() won't help with that since the Stream that it implements is for the results of the tasks, after they've completed.
If your workload_receiver is a tokio::sync::mpsc::Receiver, then you're in luck! You can convert it directly into a Stream via ReceiverStream from the tokio-stream crate (wrappers for other things exist as well). This will work perfectly with .buffered() or .buffered_unordered() since the items you appear to be receiving are Futures.

Related

Waiting on multiple futures borrowing mutable self

Each of the following methods need (&mut self) to operate. The following code gives the error.
cannot borrow *self as mutable more than once at a time
How can I achieve this correctly?
loop {
let future1 = self.handle_new_connections(sender_to_connector.clone());
let future2 = self.handle_incoming_message(&mut receiver_from_peers);
let future3 = self.handle_outgoing_message();
tokio::pin!(future1, future2, future3);
tokio::select! {
_=future1=>{},
_=future2=>{},
_=future3=>{}
}
}

You are not allowed to have multiple mutable references to an object and there's a good reason for that.
Imagine you pass an object mutably to 2 different functions and they edited the object out of sync since you don't have any mechanism for that in place. then you'd end up with something called a race condition.
To prevent this bug rust allows only one mutable reference to an object at a time but you can have multiple immutable references and often you see people use internal mutability patterns.
In your case, you want data not to be able to be modified by 2 different threads at the same time so you'd wrap it in a Lock or RwLock then since you want multiple threads to be able to own this value you'd wrap that in an Arc.
here you can read about interior mutability in more detail.
Alternatively, while declaring the type of your function you could add proper lifetimes to indicate the resulting Future will be waited on in the same context by giving it a lifetime since your code waits for the future before the next iteration that would do the trick as well.

I encountered the same problem when dealing with async code. Here is what I figured out:
Let's say you have an Engine, that contains both incoming and outgoing:
struct Engine {
log: Arc<Mutex<Vec<String>>>,
outgoing: UnboundedSender<String>,
incoming: UnboundedReceiver<String>,
}
Our goal is to create two functions process_incoming and process_logic and then poll them simultaneously without messing up with the borrow checker in Rust.
What is important here is that:
You cannot pass &mut self to these async functions simultaneously.
Either incoming or outgoing will be only held by one function at most.
The data access by both process_incoming and process_logic need to be wrapped by a lock.
Any trying to lock Engine directly will lead to a deadlock at runtime.
So that leaves us giving up using the method in favor of the associated function:
impl Engine {
// ...
async fn process_logic(outgoing: &mut UnboundedSender<String>, log: Arc<Mutex<Vec<String>>>) {
loop {
Delay::new(Duration::from_millis(1000)).await.unwrap();
let msg: String = "ping".into();
println!("outgoing: {}", msg);
log.lock().push(msg.clone());
outgoing.send(msg).await.unwrap();
}
}
async fn process_incoming(
incoming: &mut UnboundedReceiver<String>,
log: Arc<Mutex<Vec<String>>>,
) {
while let Some(msg) = incoming.next().await {
println!("incoming: {}", msg);
log.lock().push(msg);
}
}
}
And we can then write main as:
fn main() {
futures::executor::block_on(async {
let mut engine = Engine::new();
let a = Engine::process_incoming(&mut engine.incoming, engine.log.clone()).fuse();
let b = Engine::process_logic(&mut engine.outgoing, engine.log).fuse();
futures::pin_mut!(a, b);
select! {
_ = a => {},
_ = b => {},
}
});
}
I put the whole example here.
It's a workable solution, only be aware that you should add futures and futures-timer in your dependencies.

Rust "future cannot be sent between threads safely"

I'm invoking an async implemented method:
let mut safebrowsing: MutexGuard<Safebrowsing> = self.safebrowsing.lock().unwrap();
safebrowsing.is_safe(input: &message.content).await;
The is_safe-Method:
pub async fn is_safe(&mut self, input: &str) {
let links = self.finder.links(input);
for link in links {
match reqwest::get("url").await {
Ok(response) => {
println!(
"{}",
response.text().await.expect("response has no content")
);
}
Err(_) => {
println!("Unable to get safebrowsing-response")
}
}
}
}
But unfortunately by invoking the is_safe-Method asynchronously, the compiler tells me that threads cannot be sent safely. The error is about:
future cannot be sent between threads safely
within `impl std::future::Future`, the trait `std::marker::Send` is not implemented for `std::sync::MutexGuard<'_, Safebrowsing>`
required for the cast to the object type `dyn std::future::Future<Output = ()> + std::marker::Send`
handler.rs(31, 9): future is not `Send` as this value is used across an await
^-- safebrowsing.is_safe(input: &message.content).await;
---
future cannot be sent between threads safely
the trait `std::marker::Send` is not implemented for `(dyn for<'r> Fn(&'r [u8]) -> Option<usize> + 'static)`
required for the cast to the object type `dyn std::future::Future<Output = ()> + std::marker::Send`
safebrowsing.rs(22, 19): future is not `Send` as this value is used across an await
^-- match reqwest::get("url").await
I already tried to implement the Send-Trait to my Safebrowsing-Struct, but that does also not work.
Is there something I need to do to get it working? Because I have no clue why that is appearing

The key of this error is that MutexGuard<T> is not Send. This means that you are trying to do an await while the mutex is locked, and that is usually a bad idea, if you think about it: await may wait, in principle, indefinitely, but by waiting so long with the mutex held, any other thread that tries to lock the mutex will block, also indefinitely (unless you set a timeout, of course).
So, as a rule of thumb, you should never sleep with a mutex locked. For example your code could be rewritten as (totally untested):
pub async fn is_safe(this: &Mutex<Safebrowsing>, input: &str) {
//lock, find, unlock
let links = this.lock().unwrap().finder.links(input);
//now we can await safely
for link in links {
match reqwest::get("url").await {
Ok(response) => {
println!(
"{}",
response.text().await.expect("response has no content")
);
}
Err(_) => {
println!("Unable to get safebrowsing-response")
}
}
}
}
If you need to lock the Mutex later in the function, beware of the races! It may have been modified by other thread, maybe that input is no longer a thing.

Use Mutex implementation from the async runtime you're using.
Before 😭
Using mutex from standard library:
use std::sync::Mutex; // stdlib
let m = Mutex::new(...);
let v = m.lock().unwrap();
After 😁
Using mutex from tokio:
use tokio::sync::Mutex; // tokio async runtime
let m = Mutex::new(...); // the same!
let v = m.lock().await;
But why?
Roughly speaking, the native mutex forces the lock to be kept in the same thread,
but async runtime does not understand it.
If your lock does not cross with an async, then you can use mutex
from stdlib (it can be faster).
See the discussion from tokio documentation.

Spawn non-static future with Tokio

I have an async method that should execute some futures in parallel, and only return after all futures finished. However, it is passed some data by reference that does not live as long as 'static (it will be dropped at some point in the main method). Conceptually, it's similar to this (Playground):
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
for i in array {
let task = spawn(do_sth(i));
tasks.push(task);
}
for task in tasks {
task.await;
}
}
#[tokio::main]
async fn main() {
parallel_stuff(&[3, 1, 4, 2]);
}
Now, tokio wants futures that are passed to spawn to be valid for the 'static lifetime, because I could drop the handle without the future stopping. That means that my example above produces this error message:
error[E0759]: `array` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
--> src/main.rs:12:25
|
12 | async fn parallel_stuff(array: &[u64]) {
| ^^^^^ ------ this data with an anonymous lifetime `'_`...
| |
| ...is captured here...
...
15 | let task = spawn(do_sth(i));
| ----- ...and is required to live as long as `'static` here
So my question is: How do I spawn futures that are only valid for the current context that I can then wait until all of them completed?

It is not possible to spawn a non-'static future from async Rust. This is because any async function might be cancelled at any time, so there is no way to guarantee that the caller really outlives the spawned tasks.
It is true that there are various crates that allow scoped spawns of async tasks, but these crates cannot be used from async code. What they do allow is to spawn scoped async tasks from non-async code. This doesn't violate the problem above, because the non-async code that spawned them cannot be cancelled at any time, as it is not async.
Generally there are two approaches to this:
Spawn a 'static task by using Arc rather than ordinary references.
Use the concurrency primitives from the futures crate instead of spawning.
Generally to spawn a static task and use Arc, you must have ownership of the values in question. This means that since your function took the argument by reference, you cannot use this technique without cloning the data.
async fn do_sth(with: Arc<[u64]>, idx: usize) {
delay_for(Duration::new(with[idx], 0)).await;
println!("{}", with[idx]);
}
async fn parallel_stuff(array: &[u64]) {
// Make a clone of the data so we can shared it across tasks.
let shared: Arc<[u64]> = Arc::from(array);
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
for i in 0..array.len() {
// Cloning an Arc does not clone the data.
let shared_clone = shared.clone();
let task = spawn(do_sth(shared_clone, i));
tasks.push(task);
}
for task in tasks {
task.await;
}
}
Note that if you have a mutable reference to the data, and the data is Sized, i.e. not a slice, it is possible to temporarily take ownership of it.
async fn do_sth(with: Arc<Vec<u64>>, idx: usize) {
delay_for(Duration::new(with[idx], 0)).await;
println!("{}", with[idx]);
}
async fn parallel_stuff(array: &mut Vec<u64>) {
// Swap the array with an empty one to temporarily take ownership.
let vec = std::mem::take(array);
let shared = Arc::new(vec);
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
for i in 0..array.len() {
// Cloning an Arc does not clone the data.
let shared_clone = shared.clone();
let task = spawn(do_sth(shared_clone, i));
tasks.push(task);
}
for task in tasks {
task.await;
}
// Put back the vector where we took it from.
// This works because there is only one Arc left.
*array = Arc::try_unwrap(shared).unwrap();
}
Another option is to use the concurrency primitives from the futures crate. These have the advantage of working with non-'static data, but the disadvantage that the tasks will not be able to run on multiple threads at the same time.
For many workflows this is perfectly fine, as async code should spend most of its time waiting for IO anyway.
One approach is to use FuturesUnordered. This is a special collection that can store many different futures, and it has a next function that runs all of them concurrently, and returns once the first of them finished. (The next function is only available when StreamExt is imported)
You can use it like this:
use futures::stream::{FuturesUnordered, StreamExt};
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
let mut tasks = FuturesUnordered::new();
for i in array {
let task = do_sth(i);
tasks.push(task);
}
// This loop runs everything concurrently, and waits until they have
// all finished.
while let Some(()) = tasks.next().await { }
}
Note: The FuturesUnordered must be defined after the shared value. Otherwise you will get a borrow error that is caused by them being dropped in the wrong order.
Another approach is to use a Stream. With streams, you can use buffer_unordered. This is a utility that uses FuturesUnordered internally.
use futures::stream::StreamExt;
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
// Create a stream going through the array.
futures::stream::iter(array)
// For each item in the stream, create a future.
.map(|i| do_sth(i))
// Run at most 10 of the futures concurrently.
.buffer_unordered(10)
// Since Streams are lazy, we must use for_each or collect to run them.
// Here we use for_each and do nothing with the return value from do_sth.
.for_each(|()| async {})
.await;
}
Note that in both cases, importing StreamExt is important as it provides various methods that are not available on streams without importing the extension trait.

In case of code that uses threads for parallelism, it is possible to avoid copying by extending a lifetime with transmute. An example:
fn main() {
let now = std::time::Instant::now();
let string = format!("{now:?}");
println!(
"{now:?} has length {}",
parallel_len(&[&string, &string]) / 2
);
}
fn parallel_len(input: &[&str]) -> usize {
// SAFETY: this variable needs to be static, because it is passed into a thread,
// but the thread does not live longer than this function, because we wait for
// it to finish by calling `join` on it.
let input: &[&'static str] = unsafe { std::mem::transmute(input) };
let mut threads = vec![];
for txt in input {
threads.push(std::thread::spawn(|| txt.len()));
}
threads.into_iter().map(|t| t.join().unwrap()).sum()
}
It seems reasonable that this should also work for asynchronous code, but I do not know enough about that to say for sure.

Await for future again after tokio::time::timeout

Background:
I have a process using tokio::process to spawn child processes with handles in the tokio runtime.
It is also responsible for freeing the resources after killing a child and, according to the documentation (std::process::Child, tokio::process::Child), this requires the parent to wait() (or await in tokio) for the process.
Not all process behave the same to a SIGINT or a SIGTERM, so I wanted to give the child some time to die, before I send a SIGKILL.
Desired solution:
pub async fn kill(self) {
// Close input
std::mem::drop(self.stdin);
// Send gracefull signal
let pid = nix::unistd::Pid::from_raw(self.process.id() as nix::libc::pid_t);
nix::sys::signal::kill(pid, nix::sys::signal::SIGINT);
// Give the process time to die gracefully
if let Err(_) = tokio::time::timeout(std::time::Duration::from_secs(2), self.process).await
{
// Kill forcefully
nix::sys::signal::kill(pid, nix::sys::signal::SIGKILL);
self.process.await;
}
}
However this error is given:
error[E0382]: use of moved value: `self.process`
--> src/bin/multi/process.rs:46:13
|
42 | if let Err(_) = tokio::time::timeout(std::time::Duration::from_secs(2), self.process).await
| ------------ value moved here
...
46 | self.process.await;
| ^^^^^^^^^^^^ value used here after move
|
= note: move occurs because `self.process` has type `tokio::process::Child`, which does not implement the `Copy` trait
And if I obey and remove the self.process.await, I see the child process still taking resources in ps.
Question:
How can I await for an amount of time and perform actions and await again if the amount of time expired?
Note:
I solved my immediate problem by setting a tokio timer that always sends the SIGKILL after two seconds, and having a single self.process.await at the bottom. But this solution is not desirable since another process may spawn in the same PID while the timer is running.
Edit:
Adding a minimal, reproducible example (playground)
async fn delay() {
for _ in 0..6 {
tokio::time::delay_for(std::time::Duration::from_millis(500)).await;
println!("Ping!");
}
}
async fn runner() {
let delayer = delay();
if let Err(_) = tokio::time::timeout(std::time::Duration::from_secs(2), delayer).await {
println!("Taking more than two seconds");
delayer.await;
}
}

You needed to pass a mutable reference. However, you first need to pin the future in order for its mutable reference to implement Future. pin_mut re-exported from the futures crate is a good helper around this:
use futures::pin_mut;
async fn delay() {
for _ in 0..6 {
tokio::time::delay_for(std::time::Duration::from_millis(500)).await;
println!("Ping!");
}
}
async fn runner() {
let delayer = delay();
pin_mut!(delayer);
if let Err(_) = tokio::time::timeout(std::time::Duration::from_secs(2), &mut delayer).await {
println!("Taking more than two seconds");
delayer.await;
}
}

How do I solve "cannot return value referencing local data" when using threads and async/await?

I am learning Rust especially multithreading and async requests in parallel.
I read the documentation and still I do not understand where I made a mistake. I assume I know where, but do not see how to resolve it.
main.rs
use std::thread;
struct Request {
url: String,
}
impl Request {
fn new(name: &str) -> Request {
Request {
url: name.to_string(),
}
}
async fn call(&self, x: &str) -> Result<(), Box<dyn std::error::Error>> {
let resp = reqwest::get(x).await;
Ok(())
}
}
#[tokio::main]
async fn main() {
let requests = vec![
Request::new("https://www.google.com/"),
Request::new("https://www.google.com/"),
];
let handles: Vec<_> = requests
.into_iter()
.map(|request| {
thread::spawn(move || async {
request.call(&request.url).await;
})
})
.collect();
for y in handles {
println!("{:?}", y);
}
}
error[E0515]: cannot return value referencing local data `request`
--> src/main.rs:29:35
|
29 | thread::spawn(move || async {
| ___________________________________^
30 | | request.call(&request.url).await;
| | ------- `request` is borrowed here
31 | | })
| |_____________^ returns a value referencing data owned by the current function
Cargo.toml
[dependencies]
reqwest = "0.10.4"
tokio = { version = "0.2", features = ["full"] }

Like closures, async blocks capture their variables as weakly as possible. In order of preference:
immutable reference
mutable reference
by value
This is determined by how the variable is used in the closure / async block. In your example, request is only used by reference, so it is only captured by reference:
async {
request.call(&request.url).await;
}
However, you need to transfer ownership of the variable to the async block so that the variable is still alive when the future is eventually executed. Like closures, this is done via the move keyword:
thread::spawn(move || async move {
request.call(&request.url).await;
})
See also:
What is the difference between `|_| async move {}` and `async move |_| {}`
Is there a way to have a Rust closure that moves only some variables into it?
It is very unlikely that you want to mix threads and async at this point in your understanding. One is inherently blocking and the other expects code to not block. You should follow the example outlined in How can I perform parallel asynchronous HTTP GET requests with reqwest? instead.
See also:
What is the best approach to encapsulate blocking I/O in future-rs?

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Rust: combine select! with FuturesUnordered.buffered(N) - rust

Related

Waiting on multiple futures borrowing mutable self

Rust "future cannot be sent between threads safely"

Spawn non-static future with Tokio

Await for future again after tokio::time::timeout

How do I solve "cannot return value referencing local data" when using threads and async/await?

Categories

Resources