Waiting on multiple futures borrowing mutable self


Each of the following methods needs &mut self to operate. The following code gives this error:
cannot borrow *self as mutable more than once at a time
How can I achieve this correctly?
loop {
let future1 = self.handle_new_connections(sender_to_connector.clone());
let future2 = self.handle_incoming_message(&mut receiver_from_peers);
let future3 = self.handle_outgoing_message();
tokio::pin!(future1, future2, future3);
tokio::select! {
_=future1=>{},
_=future2=>{},
_=future3=>{}
}
}

You are not allowed to have multiple mutable references to an object at the same time, and there's a good reason for that.
Imagine you pass an object mutably to two different functions and they both modify it with no mechanism in place to keep them in sync. You'd end up with a data race.
To prevent this bug, Rust allows only one mutable reference to an object at a time. You can have multiple immutable references, though, and you will often see people use interior mutability patterns to mutate through them.
In your case, you want the data not to be modified by two threads at the same time, so you'd wrap it in a Mutex or RwLock. Then, since you want multiple threads to be able to own this value, you'd wrap that in an Arc.
Here you can read about interior mutability in more detail.
Alternatively, when declaring your functions you could add explicit lifetimes to indicate that the resulting Future will be awaited in the same context. Since your code waits for the future before the next iteration, that might do the trick as well.
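As a rough illustration of the Arc plus Mutex suggestion, here is a minimal sketch. It uses plain std threads instead of async tasks so that it runs without extra crates. The State struct, its fields, and run_workers are hypothetical names and do not come from the question.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the state that the `&mut self` methods mutate.
#[derive(Default)]
struct State {
    connections: u32,
    messages: Vec<String>,
}

/// Runs two workers that both mutate the shared state and returns
/// (connection count, message count) once both have finished.
fn run_workers() -> (u32, usize) {
    let state = Arc::new(Mutex::new(State::default()));

    let for_connections = Arc::clone(&state);
    let connections = thread::spawn(move || {
        for _ in 0..3 {
            // The guard is a temporary, so the lock is released
            // at the end of this statement.
            for_connections.lock().unwrap().connections += 1;
        }
    });

    let for_messages = Arc::clone(&state);
    let messages = thread::spawn(move || {
        for i in 0..2 {
            for_messages.lock().unwrap().messages.push(format!("msg {i}"));
        }
    });

    connections.join().unwrap();
    messages.join().unwrap();

    let guard = state.lock().unwrap();
    (guard.connections, guard.messages.len())
}

fn main() {
    let (connections, messages) = run_workers();
    println!("{connections} connections, {messages} messages");
}
```

In async code the same shape applies, with the extra caveat that a std MutexGuard should not be held across an .await point.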

I encountered the same problem when dealing with async code. Here is what I figured out:
Let's say you have an Engine, that contains both incoming and outgoing:
struct Engine {
log: Arc<Mutex<Vec<String>>>,
outgoing: UnboundedSender<String>,
incoming: UnboundedReceiver<String>,
}
Our goal is to create two functions, process_incoming and process_logic, and then poll them simultaneously without fighting the borrow checker.
What is important here is that:
You cannot pass &mut self to both of these async functions simultaneously.
Each of incoming and outgoing is held by at most one function.
The data accessed by both process_incoming and process_logic needs to be wrapped in a lock.
Any attempt to lock the Engine itself will lead to a deadlock at runtime.
So that leaves us giving up on methods in favor of associated functions:
impl Engine {
// ...
async fn process_logic(outgoing: &mut UnboundedSender<String>, log: Arc<Mutex<Vec<String>>>) {
loop {
Delay::new(Duration::from_millis(1000)).await.unwrap();
let msg: String = "ping".into();
println!("outgoing: {}", msg);
log.lock().push(msg.clone());
outgoing.send(msg).await.unwrap();
}
}
async fn process_incoming(
incoming: &mut UnboundedReceiver<String>,
log: Arc<Mutex<Vec<String>>>,
) {
while let Some(msg) = incoming.next().await {
println!("incoming: {}", msg);
log.lock().push(msg);
}
}
}
And we can then write main as:
fn main() {
futures::executor::block_on(async {
let mut engine = Engine::new();
let a = Engine::process_incoming(&mut engine.incoming, engine.log.clone()).fuse();
let b = Engine::process_logic(&mut engine.outgoing, engine.log).fuse();
futures::pin_mut!(a, b);
select! {
_ = a => {},
_ = b => {},
}
});
}
I put the whole example here.
It's a workable solution; just be aware that you need to add futures and futures-timer to your dependencies.
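The field-splitting idea behind this answer can also be sketched with the standard library alone. The version below is an assumption-laden analogue, not the answer's code: it swaps the channels for Vec<String> buffers and the async tasks for scoped threads, and the run method is a hypothetical addition. What it keeps is the key point that each associated function borrows only the fields it needs.

```rust
use std::sync::Mutex;
use std::thread;

/// Simplified stand-in for the answer's `Engine` (Vecs instead of channels).
struct Engine {
    incoming: Vec<String>,
    outgoing: Vec<String>,
    log: Mutex<Vec<String>>,
}

impl Engine {
    // Associated functions: each one names only the fields it touches.
    fn process_incoming(incoming: &mut Vec<String>, log: &Mutex<Vec<String>>) {
        for msg in incoming.drain(..) {
            log.lock().unwrap().push(format!("incoming: {msg}"));
        }
    }

    fn process_outgoing(outgoing: &mut Vec<String>, log: &Mutex<Vec<String>>) {
        let msg = String::from("ping");
        log.lock().unwrap().push(format!("outgoing: {msg}"));
        outgoing.push(msg);
    }

    fn run(&mut self) {
        // Destructuring splits the single `&mut self` into independent
        // borrows of each field, which the borrow checker accepts.
        let Engine { incoming, outgoing, log } = self;
        let log: &Mutex<Vec<String>> = log;
        thread::scope(|s| {
            s.spawn(move || Engine::process_incoming(incoming, log));
            s.spawn(move || Engine::process_outgoing(outgoing, log));
        });
    }
}

fn main() {
    let mut engine = Engine {
        incoming: vec![String::from("hello")],
        outgoing: Vec::new(),
        log: Mutex::new(Vec::new()),
    };
    engine.run();
    println!("{:?}", engine.log.lock().unwrap());
}
```

Only the log is shared by both workers, so only the log needs a lock; incoming and outgoing are each borrowed mutably by exactly one worker.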

Related

Rust - How to pass function parameters to closure

I'm trying to write a function that takes two parameters. The function starts two threads and uses one of the parameters inside one of the thread closures. This doesn't work because of the error "Borrowed data escapes outside of closure". Here's the code.
pub fn measure_stats(testdatapath: &PathBuf, filenameprefix: &String) {
let (tx, rx) = mpsc::channel();
let filename = format!("test.txt");
let measure_thread = thread::spawn(move || {
let stats = sar();
fs::write(filename, stats).expect("failed to write output to file");
// Send a signal that we're done.
let _ = tx.send(());
});
thread::spawn(move || {
let mut n = 0;
loop {
// Break if the measure thread is done.
match rx.try_recv() {
Ok(_) | Err(TryRecvError::Disconnected) => break,
Err(TryRecvError::Empty) => {}
}
let filename = format!("{:04}.img", n);
let filepath = Path::new(testdatapath).join(&filename);
random_file_write(&filepath).unwrap();
random_file_read(&filepath).unwrap();
fs::remove_file(&filepath).expect("failed to remove file");
n += 1;
}
});
measure_thread.join().expect("joining measure thread panicked");
}
The problem is that testdatapath escapes the function body. I think this is a problem because the lifetime of testdatapath is only guaranteed until the end of the closure, but it needs to be the lifetime of the entire program. But it's a little confusing to me.
I've tried cloning the variable, but that didn't help. I'm not sure how I'm supposed to do this. How do I use a function parameter inside the closure or accomplish the same goal some other more canonical way?

If it's okay for the function not to return until both threads complete, then use std::thread::scope() to create scoped threads instead of std::thread::spawn(). Scoped threads allow borrowing data whereas regular spawning cannot, but require the threads to all terminate before the scope ends and the function that created them returns.
If this has to be a “background” task, then you need to make sure that all the data used by each thread is owned, i.e. not a reference. In this case, that means you should change the parameters to be owned:
pub fn measure_stats(testdatapath: PathBuf, filenameprefix: String) {
Then, those values will be moved into the receiving thread, without any lifetime constraints.
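To make the scoped-thread option concrete, here is a small sketch under simplifying assumptions. The build_paths function and its parameters are hypothetical and only mimic the shape of the question's code: worker threads borrow a &Path and a &str without cloning them.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Builds `count` file paths on worker threads while only borrowing the inputs.
fn build_paths(dir: &Path, prefix: &str, count: usize) -> Vec<PathBuf> {
    thread::scope(|s| {
        // Spawn every worker first so that they can run concurrently.
        let handles: Vec<_> = (0..count)
            .map(|n| s.spawn(move || dir.join(format!("{prefix}{n:04}.img"))))
            .collect();
        // Collect the results; the scope also joins any remaining
        // threads before `build_paths` is allowed to return.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    for path in build_paths(Path::new("testdata"), "run-", 3) {
        println!("{}", path.display());
    }
}
```

Because the scope cannot end while a worker is still running, the borrowed dir and prefix are guaranteed to outlive every thread, which is exactly what thread::spawn cannot promise.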

You're trying to make testdatapath live longer than the function. Since this is a value you're borrowing, and since you can't guarantee that the original PathBuf will outlive the closure running in the new thread, the compiler is telling you that you're assuming this will be the case without taking any precautions to ensure it.
The three simplest choices:
Move the PathBuf into the function instead of borrowing it (remove the &).
Use an Arc.
Clone it and move the clone into the thread.
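The second and third choices might look roughly like this. Both function names are hypothetical, and the thread bodies are placeholders that just build a path.

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Clone: the thread owns its own copy, so no borrow escapes the function.
fn spawn_with_clone(testdatapath: &Path) -> JoinHandle<PathBuf> {
    let owned: PathBuf = testdatapath.to_path_buf();
    thread::spawn(move || owned.join("0000.img"))
}

/// Arc: one heap copy is shared by several threads via reference counting.
fn spawn_with_arc(testdatapath: &Path, workers: usize) -> Vec<JoinHandle<PathBuf>> {
    let shared = Arc::new(testdatapath.to_path_buf());
    (0..workers)
        .map(|n| {
            // Cloning the Arc only bumps a counter; the path is not copied.
            let path = Arc::clone(&shared);
            thread::spawn(move || path.join(format!("{n:04}.img")))
        })
        .collect()
}

fn main() {
    let dir = PathBuf::from("testdata");
    println!("{}", spawn_with_clone(&dir).join().unwrap().display());
    for handle in spawn_with_arc(&dir, 2) {
        println!("{}", handle.join().unwrap().display());
    }
}
```

The caller keeps ownership of its PathBuf in both cases, and the returned handles can be joined later or simply dropped to let the threads run in the background.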

Is there a way of spawning new threads with copies of existing data that have a non-static lifetime?

I have a problem that is similar to what is discussed in Is there a succinct way to spawn new threads with copies of existing data?. Unlike the linked question, I am trying to move an object with an associated lifetime into the new thread.
Intuitively, what I am trying to do is to copy everything that is necessary to continue the computation to the new thread and exit the old one. However, when trying to move cloned data (with a lifetime) to the new thread, I get the following error:
error[E0759]: data has lifetime 'a but it needs to satisfy a 'static lifetime requirement
I created a reproducible example based on the referenced question here. This is just to exemplify the problem. Here, the lifetimes could be removed easily but in my actual use-case the data I want to move to the thread is much more complex.
Is there an easy way of making this work with Rust?

A qualified answer to the question in the title is "yes", but we can't do it by copying non-static references. The reasons for this seeming limitation are sound. The way we can get the required data/objects into the thread closures is by passing ownership of them (or copies of them, or other concrete objects that represent them) to the closures.
It may not be immediately clear on how to do this with a complex library like pyo3 since much of the API returns reference types to objects rather than concrete objects that can be passed as-is to other threads, but the library does provide ways to pass Python data/objects to other threads, which I'll cover in the second example below.
The start() function will need to put a 'static bound on the closure type associated with its data parameter because within its body, start() is passing these closures on to other threads. The compiler is working to guarantee that the closures aren't holding on to references to anything that may evaporate if a thread runs longer than its parent, which is why it gripes without the 'static guarantee.
fn start<'a>(data : Vec<Arc<dyn Fn() -> f64 + Send + Sync + 'static>>,
more_data : String)
{
for _ in 1..=4 {
let cloned_data = data.clone();
let cloned_more_data = more_data.clone();
thread::spawn(move || foo(cloned_data, cloned_more_data));
}
}
A 'static bound is different than a 'static lifetime applied to a reference (data: 'static vs. &'static data). In the case of a bound, it only means the type it's applied to doesn't contain any non-static references (if it even holds any references at all). It's pretty common to see this bound applied to method parameters in threaded code.
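The difference between the bound and the reference lifetime is easy to see in a tiny experiment. The helper run_in_thread below is a hypothetical function written for this illustration.

```rust
use std::thread;

/// `T: 'static` is a bound: `T` may not contain any non-'static borrow,
/// but `T` itself does not have to live for the whole program.
fn run_in_thread<T, F>(value: T, f: F) -> usize
where
    T: Send + 'static,
    F: FnOnce(T) -> usize + Send + 'static,
{
    thread::spawn(move || f(value)).join().unwrap()
}

fn main() {
    // An owned String satisfies the bound even though it is created at
    // runtime and dropped inside the thread.
    let owned = String::from("created at runtime");
    println!("{}", run_in_thread(owned, |s| s.len()));

    // A `&'static str` satisfies it as well.
    println!("{}", run_in_thread("literal", |s| s.len()));

    // A borrow of a local does not: uncommenting the next two lines is
    // rejected because `local` does not live long enough.
    // let local = String::from("short-lived");
    // run_in_thread(&local, |s| s.len());
}
```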
As this applies specifically to the pyo3 problem space, we can avoid forming closures that contain non-static references by converting any such references to owned objects, then when the callback, running in another thread, needs to do something with them, it can acquire the GIL and cast them back to Python object references.
More about this in the code comments below. I took a simple example from the pyo3 GitHub README and combined it with the code provided in the playground example.
Something to watch out for when applying this pattern is deadlock. The threads will need to acquire the GIL in order to use the Python objects they have access to. In the example, once the parent thread is done spawning new threads, it releases the GIL when it goes out of scope. The parent then waits for the child threads to complete by joining their handles.
use std::thread;
use std::thread::JoinHandle;
use std::sync::Arc;
use pyo3::prelude::*;
use pyo3::types::IntoPyDict;
use pyo3::types::PyDict;
type MyClosure<'a> = dyn Fn() -> f64 + Send + Sync + 'a;
fn main() -> Result<(), ()>
{
match Python::with_gil(|py| main_(py)
.map_err(|e| e.print_and_set_sys_last_vars(py)))
{
Ok(handles) => {
for handle in handles {
handle.join().unwrap();
}},
Err(e) => { println!("{:?}", e); },
}
Ok(())
}
fn main_(py: Python) -> PyResult<Vec<JoinHandle<()>>>
{
let sys = py.import("sys")?;
let version = sys.get("version")?.extract::<String>()?;
let locals = [("os", py.import("os")?)].into_py_dict(py);
let code = "os.getenv('USER') or os.getenv('USERNAME') or 'Unknown'";
let user = py.eval(code, None, Some(&locals))?.extract::<String>()?;
println!("Hello {}, I'm Python {}", user, version);
// The thread will do something with the `locals` dictionary. In order to
// pass this reference object to the thread, first convert it to a
// non-reference object.
// Convert `locals` to `PyObject`.
let locals_obj = locals.to_object(py);
// Now we can move `locals_obj` into the thread without concern.
let closure: Arc<MyClosure<'_>> = Arc::new(move || {
// We can print out the PyObject which reveals it to be a tuple
// containing a pointer value.
println!("{:?}", locals_obj);
// If we want to do anything with the `locals` object, we can cast it
// back to a `PyDict` reference. We'll need to acquire the GIL first.
Python::with_gil(|py| {
// We have the GIL, cast the dict back to a PyDict reference.
let py_dict = locals_obj.cast_as::<PyDict>(py).unwrap();
// Printing it out reveals it to be a dictionary with the key `os`.
println!("{:?}", py_dict);
});
1.
});
let data = vec![closure];
let more = "Important data.".to_string();
let handles = start(data, more);
Ok(handles)
}
fn start<'a>(data : Vec<Arc<MyClosure<'static>>>,
more : String
) -> Vec<JoinHandle<()>>
{
let mut handles = vec![];
for _ in 1..=4 {
let cloned_data = data.clone();
let cloned_more = more.clone();
let h = thread::spawn(move || foo(cloned_data, cloned_more));
handles.push(h);
}
handles
}
fn foo<'a>(data : Vec<Arc<MyClosure<'a>>>,
more : String)
{
for closure in data {
closure();
}
}
Output:
Hello todd, I'm Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
Py(0x7f3329ccdd40)
Py(0x7f3329ccdd40)
Py(0x7f3329ccdd40)
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
Py(0x7f3329ccdd40)
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
Something to consider: you may be able to minimize, or eliminate, the need to pass Python objects to the threads by extracting all the information needed from them into Rust objects and passing those to threads instead.

How can I move the data between threads safely?

I'm currently trying to call a function to which I pass multiple file names and expect the function to read the files and generate the appropriate structs and return them in a Vec<Audit>. I've been able to accomplish it reading the files one by one but I want to achieve it using threads.
This is the function:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut audits = Arc::new(Mutex::new(vec![]));
let mut handlers = vec![];
for file in files {
let audits = Arc::clone(&audits);
handlers.push(thread::spawn(move || {
let mut audits = audits.lock().unwrap();
audits.push(audit_from_xml_file(file.clone()));
audits
}));
}
for handle in handlers {
let _ = handle.join();
}
audits
.lock()
.unwrap()
.into_iter()
.fold(vec![], |mut result, audit| {
result.push(audit);
result
})
}
But it won't compile due to the following error:
error[E0277]: `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
--> src/main.rs:82:23
|
82 | handlers.push(thread::spawn(move || {
| ^^^^^^^^^^^^^ `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
|
::: /home/enthys/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:618:8
I have tried wrapping the generated Audit structs in Some(Audit) to avoid the MutexGuard but then I stumble with Poisonned Thread issues.

The cause of the error is that after pushing the new Audit into the (locked) audits vec, you then try to return the vec's MutexGuard.
In Rust, a thread's function can actually return values; the point of doing that is to send the value back to whoever is joining the thread. This means the value is going to move between threads, so it needs to be movable between threads (aka Send), which mutex guards have no reason to be[0].
The easy solution is to just... not do that. Just delete the last line of the spawn closure. The code still won't compile after that, though, as you still have a borrowing issue in the expression at the end: into_iter() needs to own the Vec, but it is being called through the MutexGuard.
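For completeness, here is one way the Arc and Mutex version might be made to compile. This is a sketch, not the asker's real code: Audit and audit_from_xml_file are hypothetical stand-ins, since the originals are not shown in the question.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the question's `Audit` type.
#[derive(Debug, PartialEq)]
struct Audit {
    source: String,
}

/// Hypothetical stand-in for the question's XML parser.
fn audit_from_xml_file(file: String) -> Audit {
    Audit { source: file }
}

fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
    let audits = Arc::new(Mutex::new(Vec::new()));
    let mut handlers = Vec::new();
    for file in files {
        let audits = Arc::clone(&audits);
        handlers.push(thread::spawn(move || {
            // Do the expensive work before taking the lock.
            let audit = audit_from_xml_file(file);
            audits.lock().unwrap().push(audit);
            // Nothing is returned, so no MutexGuard crosses threads.
        }));
    }
    for handle in handlers {
        handle.join().expect("worker thread panicked");
    }
    // Every clone was dropped with its thread, so this Arc is now unique
    // and the Vec can be moved out of the Mutex.
    Arc::try_unwrap(audits)
        .expect("an Arc clone is still alive")
        .into_inner()
        .unwrap()
}

fn main() {
    let files = vec![String::from("a.xml"), String::from("b.xml")];
    println!("{:?}", generate_audits_from_files(files));
}
```

Note that the order of the resulting Vec depends on thread scheduling, so it may differ from the order of files.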
An alternative is to lean into the feature (especially if Audit objects are not too big): drop the audits vec entirely and instead have each thread return its audit, then collect from the handlers when you join them:
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers.into_iter()
.map(|handler| handler.join().unwrap())
.collect()
}
Though at that point you might as well just let Rayon handle it:
use rayon::prelude::*;
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
files.into_par_iter().map(audit_from_xml_file).collect()
}
That also avoids crashing the program or bringing the machine to its knees if you happen to have millions of files.
[0] And they have every reason not to be: locking on one thread and unlocking on another is not necessarily supported. For example, the documentation for ReleaseMutex says:
The ReleaseMutex function fails if the calling thread does not own the mutex object.
(NB: in Windows lingo, "owning" a mutex means having acquired it via WaitForSingleObject, which translates to lock in POSIX lingo.)
It can even be plain UB, e.g. for pthread_mutex_unlock:
If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.

Your problem is that you are passing your Vec<Audit> (or more precisely the MutexGuard<Vec<Audit>>) to the threads and back again, without really needing to.
And you don't need Mutex or Arc for this simpler task:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers
.into_iter()
.flat_map(|x| x.join())
.collect()
}

Spawn non-static future with Tokio

I have an async method that should execute some futures in parallel, and only return after all futures finished. However, it is passed some data by reference that does not live as long as 'static (it will be dropped at some point in the main method). Conceptually, it's similar to this (Playground):
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
for i in array {
let task = spawn(do_sth(i));
tasks.push(task);
}
for task in tasks {
task.await;
}
}
#[tokio::main]
async fn main() {
parallel_stuff(&[3, 1, 4, 2]).await;
}
Now, tokio wants futures that are passed to spawn to be valid for the 'static lifetime, because I could drop the handle without the future stopping. That means that my example above produces this error message:
error[E0759]: `array` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
--> src/main.rs:12:25
|
12 | async fn parallel_stuff(array: &[u64]) {
| ^^^^^ ------ this data with an anonymous lifetime `'_`...
| |
| ...is captured here...
...
15 | let task = spawn(do_sth(i));
| ----- ...and is required to live as long as `'static` here
So my question is: How do I spawn futures that are only valid for the current context that I can then wait until all of them completed?

It is not possible to spawn a non-'static future from async Rust. This is because any async function might be cancelled at any time, so there is no way to guarantee that the caller really outlives the spawned tasks.
It is true that there are various crates that allow scoped spawns of async tasks, but these crates cannot be used from async code. What they do allow is to spawn scoped async tasks from non-async code. This doesn't violate the problem above, because the non-async code that spawned them cannot be cancelled at any time, as it is not async.
Generally there are two approaches to this:
Spawn a 'static task by using Arc rather than ordinary references.
Use the concurrency primitives from the futures crate instead of spawning.
Generally to spawn a static task and use Arc, you must have ownership of the values in question. This means that since your function took the argument by reference, you cannot use this technique without cloning the data.
async fn do_sth(with: Arc<[u64]>, idx: usize) {
delay_for(Duration::new(with[idx], 0)).await;
println!("{}", with[idx]);
}
async fn parallel_stuff(array: &[u64]) {
// Make a clone of the data so we can share it across tasks.
let shared: Arc<[u64]> = Arc::from(array);
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
for i in 0..array.len() {
// Cloning an Arc does not clone the data.
let shared_clone = shared.clone();
let task = spawn(do_sth(shared_clone, i));
tasks.push(task);
}
for task in tasks {
task.await;
}
}
Note that if you have a mutable reference to the data, and the data is Sized, i.e. not a slice, it is possible to temporarily take ownership of it.
async fn do_sth(with: Arc<Vec<u64>>, idx: usize) {
delay_for(Duration::new(with[idx], 0)).await;
println!("{}", with[idx]);
}
async fn parallel_stuff(array: &mut Vec<u64>) {
// Swap the array with an empty one to temporarily take ownership.
let vec = std::mem::take(array);
let shared = Arc::new(vec);
let mut tasks: Vec<JoinHandle<()>> = Vec::new();
// Iterate over `shared`: `array` is empty after `std::mem::take`.
for i in 0..shared.len() {
// Cloning an Arc does not clone the data.
let shared_clone = shared.clone();
let task = spawn(do_sth(shared_clone, i));
tasks.push(task);
}
for task in tasks {
task.await;
}
// Put back the vector where we took it from.
// This works because there is only one Arc left.
*array = Arc::try_unwrap(shared).unwrap();
}
Another option is to use the concurrency primitives from the futures crate. These have the advantage of working with non-'static data, but the disadvantage that the tasks will not be able to run on multiple threads at the same time.
For many workflows this is perfectly fine, as async code should spend most of its time waiting for IO anyway.
One approach is to use FuturesUnordered. This is a special collection that can store many different futures, and it has a next function that runs all of them concurrently, and returns once the first of them finished. (The next function is only available when StreamExt is imported)
You can use it like this:
use futures::stream::{FuturesUnordered, StreamExt};
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
let mut tasks = FuturesUnordered::new();
for i in array {
let task = do_sth(i);
tasks.push(task);
}
// This loop runs everything concurrently, and waits until they have
// all finished.
while let Some(()) = tasks.next().await { }
}
Note: The FuturesUnordered must be defined after the shared value. Otherwise you will get a borrow error that is caused by them being dropped in the wrong order.
Another approach is to use a Stream. With streams, you can use buffer_unordered. This is a utility that uses FuturesUnordered internally.
use futures::stream::StreamExt;
async fn do_sth(with: &u64) {
delay_for(Duration::new(*with, 0)).await;
println!("{}", with);
}
async fn parallel_stuff(array: &[u64]) {
// Create a stream going through the array.
futures::stream::iter(array)
// For each item in the stream, create a future.
.map(|i| do_sth(i))
// Run at most 10 of the futures concurrently.
.buffer_unordered(10)
// Since Streams are lazy, we must use for_each or collect to run them.
// Here we use for_each and do nothing with the return value from do_sth.
.for_each(|()| async {})
.await;
}
Note that in both cases, importing StreamExt is important as it provides various methods that are not available on streams without importing the extension trait.

In case of code that uses threads for parallelism, it is possible to avoid copying by extending a lifetime with transmute. An example:
fn main() {
let now = std::time::Instant::now();
let string = format!("{now:?}");
println!(
"{now:?} has length {}",
parallel_len(&[&string, &string]) / 2
);
}
fn parallel_len(input: &[&str]) -> usize {
// SAFETY: this variable needs to be static, because it is passed into a thread,
// but the thread does not live longer than this function, because we wait for
// it to finish by calling `join` on it.
let input: &[&'static str] = unsafe { std::mem::transmute(input) };
let mut threads = vec![];
for txt in input {
threads.push(std::thread::spawn(|| txt.len()));
}
threads.into_iter().map(|t| t.join().unwrap()).sum()
}
It seems reasonable that this should also work for asynchronous code, but I do not know enough about that to say for sure.
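For the threaded case, the same result can be had without unsafe by using std::thread::scope (stable since Rust 1.63). The sketch below is a safe rewrite of parallel_len under that assumption. Unlike the transmute version, it stays sound if a worker panics, because the scope still joins every thread before the borrowed input can go away.

```rust
use std::thread;

fn parallel_len(input: &[&str]) -> usize {
    thread::scope(|s| {
        // Each worker borrows one element of `input`; no 'static needed.
        let handles: Vec<_> = input
            .iter()
            .map(|txt| s.spawn(move || txt.len()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let string = format!("{:?}", std::time::Instant::now());
    println!(
        "{string} has length {}",
        parallel_len(&[&string, &string]) / 2
    );
}
```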

One mutable borrow and multiple immutable borrows

I'm trying to write a program that spawns a background thread that continuously inserts data into some collection. At the same time, I want to keep getting input from stdin and check if that input is in the collection the thread is operating on.
Here is a boiled down example:
use std::collections::HashSet;
use std::thread;
fn main() {
let mut set: HashSet<String> = HashSet::new();
thread::spawn(move || {
loop {
set.insert("foo".to_string());
}
});
loop {
let input: String = get_input_from_stdin();
if set.contains(&input) {
// Do something...
}
}
}
fn get_input_from_stdin() -> String {
String::new()
}
However this doesn't work because of ownership stuff.
I'm still new to Rust but this seems like something that should be possible. I just can't find the right combination of Arcs, Rcs, Mutexes, etc. to wrap my data in.

First of all, please read Need holistic explanation about Rust's cell and reference counted types.
There are two problems to solve here:
Sharing ownership between threads,
Mutable aliasing.
To share ownership, the simplest solution is Arc. It requires its argument to be Sync (accessible safely from multiple threads) which can be achieved for any Send type by wrapping it inside a Mutex or RwLock.
To safely get aliasing in the presence of mutability, both Mutex and RwLock will work. If you had multiple readers, RwLock might have an extra performance edge. Since you have a single reader there's no point: let's use the simple Mutex.
And therefore, your type is: Arc<Mutex<HashSet<String>>>.
The next trick is passing the value to the closure to run in another thread. The value is moved, and therefore you need to first make a clone of the Arc and then pass the clone, otherwise you've moved your original and cannot access it any longer.
Finally, accessing the data requires going through the borrows and locks...
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let set = Arc::new(Mutex::new(HashSet::new()));
let clone = set.clone();
thread::spawn(move || {
loop {
clone.lock().unwrap().insert("foo".to_string());
}
});
loop {
let input: String = get_input_from_stdin();
if set.lock().unwrap().contains(&input) {
// Do something...
}
}
}
The call to unwrap is there because Mutex::lock returns a Result; it may be impossible to lock the Mutex if it is poisoned, which means a panic occurred while it was locked and therefore its content is possibly garbage.
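Poisoning can be observed, and recovered from, with a few lines. The sketch below deliberately panics while holding the lock; contains_after_panic is a hypothetical helper, and whether recovering is appropriate depends on whether the data can be left half-updated in your program.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

fn contains_after_panic() -> bool {
    let set = Arc::new(Mutex::new(HashSet::new()));
    let clone = Arc::clone(&set);

    // This thread panics while holding the lock, which poisons the mutex.
    let result = thread::spawn(move || {
        let mut guard = clone.lock().unwrap();
        guard.insert("foo".to_string());
        panic!("simulated failure while holding the lock");
    })
    .join();
    assert!(result.is_err());
    assert!(set.is_poisoned());

    // `lock` now returns Err, but the error still carries the guard,
    // so the data can be inspected if the caller decides that is safe.
    let guard = set.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.contains("foo")
}

fn main() {
    println!("set contains foo: {}", contains_after_panic());
}
```

The panic message printed to stderr comes from the worker thread and is expected here.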
