Rust - How to pass function parameters to closure - multithreading

I'm trying to write a function that takes two parameters. The function starts two threads and uses one of the parameters inside one of the thread closures. This doesn't work because of the error "Borrowed data escapes outside of closure". Here's the code.
pub fn measure_stats(testdatapath: &PathBuf, filenameprefix: &String) {
    let (tx, rx) = mpsc::channel();
    let filename = format!("test.txt");
    let measure_thread = thread::spawn(move || {
        let stats = sar();
        fs::write(filename, stats).expect("failed to write output to file");
        // Send a signal that we're done.
        let _ = tx.send(());
    });
    thread::spawn(move || {
        let mut n = 0;
        loop {
            // Break if the measure thread is done.
            match rx.try_recv() {
                Ok(_) | Err(TryRecvError::Disconnected) => break,
                Err(TryRecvError::Empty) => {}
            }
            let filename = format!("{:04}.img", n);
            let filepath = Path::new(testdatapath).join(&filename);
            random_file_write(&filepath).unwrap();
            random_file_read(&filepath).unwrap();
            fs::remove_file(&filepath).expect("failed to remove file");
            n += 1;
        }
    });
    measure_thread.join().expect("joining measure thread panicked");
}
The problem is that testdatapath escapes the function body. I think this is a problem because the lifetime of testdatapath is only guaranteed until the end of the function, while the spawned thread (and its closure) may keep running for the rest of the program. But it's a little confusing to me.
I've tried cloning the variable, but that didn't help. I'm not sure how I'm supposed to do this. How do I use a function parameter inside the closure or accomplish the same goal some other more canonical way?

If it's okay for the function not to return until both threads complete, then use std::thread::scope() to create scoped threads instead of std::thread::spawn(). Scoped threads allow borrowing data whereas regular spawning cannot, but require the threads to all terminate before the scope ends and the function that created them returns.
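A minimal sketch of what the scoped-thread version could look like (the worker bodies are elided; sar, random_file_write, and the other helpers from the question are omitted):
use std::path::{Path, PathBuf};
use std::sync::mpsc;
use std::thread;

pub fn measure_stats(testdatapath: &PathBuf, filenameprefix: &String) {
    let (tx, rx) = mpsc::channel::<()>();
    thread::scope(|s| {
        s.spawn(move || {
            // ... run the measurement and write the stats file ...
            let _ = tx.send(());
        });
        s.spawn(move || {
            // Borrowing `testdatapath` is fine here: the scope guarantees this
            // thread finishes before `measure_stats` returns.
            let filepath = Path::new(testdatapath).join(format!("{filenameprefix}-0000.img"));
            // ... write/read/remove files until `rx` reports completion ...
            let _ = (filepath, rx.try_recv());
        });
    }); // both threads are joined here, before the function returns
}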
If this has to be a “background” task, then you need to make sure that all the data used by each thread is owned, i.e. not a reference. In this case, that means you should change the parameters to be owned:
pub fn measure_stats(testdatapath: PathBuf, filenameprefix: String) {
Then, those values will be moved into the receiving thread, without any lifetime constraints.
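A minimal sketch of that shape, assuming the caller can hand over (or clone) the values when it calls the function:
use std::path::PathBuf;
use std::thread;

pub fn measure_stats(testdatapath: PathBuf, filenameprefix: String) {
    let worker = thread::spawn(move || {
        // Both values are owned by this closure now; no lifetime constraints apply.
        println!("prefix {filenameprefix} in {}", testdatapath.display());
    });
    worker.join().expect("worker thread panicked");
}
At the call site, pass clones (measure_stats(path.clone(), prefix.clone())) if the caller still needs its own copies afterwards.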

You're trying to make testdatapath live longer than the function. Since it is a borrowed value, and since you can't guarantee that the original PathBuf will outlive the closure running in the new thread, the compiler is warning you that you're assuming this will be the case without taking any precautions to make it so.
The three simplest choices (the last two are sketched below):
Move the PathBuf into the function instead of borrowing it (remove the &).
Wrap it in an Arc.
Clone it and move the clone into the thread.
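A rough sketch of the last two options (the helper names are made up for illustration):
use std::path::PathBuf;
use std::sync::Arc;
use std::thread;

// Option 2: share one allocation between the caller and the thread.
fn spawn_with_arc(testdatapath: Arc<PathBuf>) {
    thread::spawn(move || println!("{}", testdatapath.display()));
}

// Option 3: give the thread its own deep copy and keep the original.
fn spawn_with_clone(testdatapath: &PathBuf) {
    let owned = testdatapath.clone();
    thread::spawn(move || println!("{}", owned.display()));
}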

Related

Waiting on multiple futures borrowing mutable self

Each of the following methods needs &mut self to operate. The following code gives the error:
cannot borrow *self as mutable more than once at a time
How can I achieve this correctly?
loop {
    let future1 = self.handle_new_connections(sender_to_connector.clone());
    let future2 = self.handle_incoming_message(&mut receiver_from_peers);
    let future3 = self.handle_outgoing_message();
    tokio::pin!(future1, future2, future3);
    tokio::select! {
        _ = future1 => {},
        _ = future2 => {},
        _ = future3 => {},
    }
}
You are not allowed to have multiple mutable references to an object, and there's a good reason for that.
Imagine you passed an object mutably to two different functions and they edited it out of sync, with no mechanism in place to coordinate them; you'd end up with a race condition.
To prevent this bug, Rust allows only one mutable reference to an object at a time, but you can have multiple immutable references; this is why you often see people reach for interior mutability patterns.
In your case, you don't want the data to be modified by two different threads at the same time, so you'd wrap it in a Mutex or RwLock; and since you want multiple threads to be able to own this value, you'd wrap that in an Arc.
Here you can read about interior mutability in more detail.
Alternatively, you could add proper lifetimes to your function signature to indicate that the resulting Future will be awaited in the same context; since your code awaits each future before the next iteration, that would do the trick as well.
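As a minimal sketch of the Arc<Mutex<...>> approach described above in async code (this assumes tokio with its full feature set; the Shared struct is just a stand-in for your own state):
use std::sync::Arc;
use tokio::sync::Mutex;

struct Shared {
    counter: u64,
}

#[tokio::main]
async fn main() {
    let shared = Arc::new(Mutex::new(Shared { counter: 0 }));

    // Each task gets its own clone of the Arc and locks only when it needs access.
    let writer = {
        let shared = Arc::clone(&shared);
        tokio::spawn(async move {
            for _ in 0..5 {
                shared.lock().await.counter += 1;
            }
        })
    };
    let reader = {
        let shared = Arc::clone(&shared);
        tokio::spawn(async move {
            println!("counter so far: {}", shared.lock().await.counter);
        })
    };

    let _ = tokio::join!(writer, reader);
}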
I encountered the same problem when dealing with async code. Here is what I figured out:
Let's say you have an Engine that contains both incoming and outgoing:
struct Engine {
    log: Arc<Mutex<Vec<String>>>,
    outgoing: UnboundedSender<String>,
    incoming: UnboundedReceiver<String>,
}
Our goal is to create two functions, process_incoming and process_logic, and then poll them simultaneously without running afoul of Rust's borrow checker.
What is important here is that:
You cannot pass &mut self to these async functions simultaneously.
Each of incoming and outgoing is held by at most one function.
The data accessed by both process_incoming and process_logic needs to be wrapped in a lock.
Any attempt to lock the Engine as a whole will lead to a deadlock at runtime.
So that leaves us giving up on methods in favor of associated functions:
impl Engine {
    // ...
    async fn process_logic(outgoing: &mut UnboundedSender<String>, log: Arc<Mutex<Vec<String>>>) {
        loop {
            Delay::new(Duration::from_millis(1000)).await.unwrap();
            let msg: String = "ping".into();
            println!("outgoing: {}", msg);
            log.lock().push(msg.clone());
            outgoing.send(msg).await.unwrap();
        }
    }

    async fn process_incoming(
        incoming: &mut UnboundedReceiver<String>,
        log: Arc<Mutex<Vec<String>>>,
    ) {
        while let Some(msg) = incoming.next().await {
            println!("incoming: {}", msg);
            log.lock().push(msg);
        }
    }
}
And we can then write main as:
fn main() {
    futures::executor::block_on(async {
        let mut engine = Engine::new();
        let a = Engine::process_incoming(&mut engine.incoming, engine.log.clone()).fuse();
        let b = Engine::process_logic(&mut engine.outgoing, engine.log).fuse();
        futures::pin_mut!(a, b);
        select! {
            _ = a => {},
            _ = b => {},
        }
    });
}
I put the whole example here.
It's a workable solution; just be aware that you need to add futures and futures-timer to your dependencies.

Safely move or dereference Receiver in a Fn?

I'm working on an app that optionally uses a GUI to display video data; it's roughly structured like this:
fn main() {
    let (window_tx, window_rx) =
        MainContext::channel::<MyStruct>(PRIORITY_DEFAULT);
    let some_thread = thread::spawn(move || -> () {
        // send data to window_tx
    });
    let application =
        gtk::Application::new(Some("com.my.app"), Default::default());
    application.connect_activate(move |app: &gtk::Application| {
        build_ui(app, window_rx);
    });
    application.run();
    some_thread.join().unwrap();
}

fn build_ui(application: &gtk::Application, window_rx: Receiver<MyStruct>) {
    window_rx.attach( ... );
}
The gtk Rust library requires that a Fn callback be passed to application.connect_activate on startup, so I can't use a FnOnce or FnMut closure to move the glib::Receiver into the callback. The compiler throws this error:
error[E0507]: cannot move out of `window_rx`, a captured variable in an `Fn` closure
I've tried to avoid the move by wrapping window_rx in an Rc, i.e.:
let r = Rc::new(RefCell::new(window_rx));
application.connect_activate(move |app: &gtk::Application| {
    build_ui(app, Rc::clone(&r));
});
But upon dereferencing the Rc in my build_ui function, I get this error:
error[E0507]: cannot move out of an `Rc`
The fallback I've used thus far is to move the channel and thread creation into my build_ui function, but since the GUI is optional, I was hoping to avoid touching GTK and the callback entirely when it isn't used. Is there some way I can either safely move window_rx within a closure or otherwise dereference it in the callback without causing an error?
When you need to move a value out from code that, by the type system but not in practice, could be called more than once, the simple tool to reach for is Option. Wrapping the value in an Option allows it to be swapped with an Option::None.
When you need something to be mutable even though you're inside a Fn, you need interior mutability; in this case, Cell will do. Here's a complete compilable program that approximates your situation:
use std::cell::Cell;
// Placeholders to let it compile
use std::sync::mpsc;
fn wants_fn_callback<F>(_f: F) where F: Fn() + 'static {}
struct MyStruct;

fn main() {
    let (_, window_rx) = mpsc::channel::<MyStruct>();
    let window_rx: Cell<Option<mpsc::Receiver<MyStruct>>> = Cell::new(Some(window_rx));
    wants_fn_callback(move || {
        let _: mpsc::Receiver<MyStruct> = window_rx.take().expect("oops, called twice");
    });
}
Cell::take() removes the Option<Receiver> from the Cell, leaving None in its place. The expect then removes the Option wrapper (and handles the possibility of the function being called twice by panicking in that case).
Applied to your original problem, this would be:
let window_rx: Cell<Option<Receiver<MyStruct>>> = Cell::new(Some(window_rx));
application.connect_activate(move |app: &gtk::Application| {
    build_ui(app, window_rx.take().expect("oops, called twice"));
});
However, be careful: if the library requires a Fn closure, there might be some condition under which the function could be called more than once, in which case you should be prepared to do something appropriate in that circumstance. If there isn't such a condition, then the library's API should be improved to take a FnOnce instead.
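If the callback really can fire more than once, a gentler variant of the placeholder program above simply ignores the extra calls instead of panicking (a sketch, reusing the same made-up wants_fn_callback stand-in):
use std::cell::Cell;
use std::sync::mpsc;

fn wants_fn_callback<F>(_f: F) where F: Fn() + 'static {}
struct MyStruct;

fn main() {
    let (_, window_rx) = mpsc::channel::<MyStruct>();
    let window_rx = Cell::new(Some(window_rx));
    wants_fn_callback(move || {
        if let Some(_rx) = window_rx.take() {
            // First call: hand the receiver to the UI-building code.
        } else {
            eprintln!("callback invoked again; the receiver was already consumed");
        }
    });
}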

Is there a way of spawning new threads with copies of existing data that have a non-static lifetime?

I have a problem that is similar to what is discussed in Is there a succinct way to spawn new threads with copies of existing data?. Unlike the linked question, I am trying to move an object with an associated lifetime into the new thread.
Intuitively, what I am trying to do is to copy everything that is necessary to continue the computation to the new thread and exit the old one. However, when trying to move cloned data (with a lifetime) to the new thread, I get the following error:
error[E0759]: data has lifetime 'a but it needs to satisfy a 'static lifetime requirement
I created a reproducible example based on the referenced question here. This is just to exemplify the problem. Here, the lifetimes could easily be removed, but in my actual use case the data I want to move into the thread is much more complex.
Is there an easy way of making this work with Rust?
A qualified answer to the question in the title is "yes", but we can't do it by copying non-static references. The reasons for this seeming limitation are sound. The way we can get the required data/objects into the thread closures is by passing ownership of them (or copies of them, or other concrete objects that represent them) to the closures.
It may not be immediately clear how to do this with a complex library like pyo3, since much of its API returns reference types to objects rather than concrete objects that can be passed as-is to other threads, but the library does provide ways to pass Python data/objects to other threads, which I'll cover in the second example below.
The start() function will need to put a 'static bound on the closure type associated with its data parameter because within its body, start() is passing these closures on to other threads. The compiler is working to guarantee that the closures aren't holding on to references to anything that may evaporate if a thread runs longer than its parent, which is why it gripes without the 'static guarantee.
fn start<'a>(data : Vec<Arc<dyn Fn() -> f64 + Send + Sync + 'static>>,
             more_data : String)
{
    for _ in 1..=4 {
        let cloned_data = data.clone();
        let cloned_more_data = more_data.clone();
        thread::spawn(move || foo(cloned_data, cloned_more_data));
    }
}
A 'static bound is different than a 'static lifetime applied to a reference (data: 'static vs. &'static data). In the case of a bound, it only means the type it's applied to doesn't contain any non-static references (if it even holds any references at all). It's pretty common to see this bound applied to method parameters in threaded code.
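A tiny illustration of that distinction (function names are made up for this sketch):
// `T: 'static` is a bound: T owns its data, or holds only `&'static` references.
fn store_owned<T: Send + 'static>(_value: T) {}

// `&'static str` is an actual reference that must live for the whole program.
fn store_static_ref(_s: &'static str) {}

fn demo() {
    store_owned(String::from("an owned String satisfies the 'static bound"));
    store_static_ref("string literals have the 'static lifetime");
}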
As this applies specifically to the pyo3 problem space, we can avoid forming closures that contain non-static references by converting any such references to owned objects; then, when the callback running in another thread needs to do something with them, it can acquire the GIL and cast them back to Python object references.
More about this in the code comments below. I took a simple example from the pyo3 GitHub README and combined it with the code provided in the playground example.
Something to watch out for when applying this pattern is deadlock. The threads will need to acquire the GIL in order to use the Python objects they have access to. In the example, once the parent thread is done spawning new threads, it releases the GIL when it goes out of scope. The parent then waits for the child threads to complete by joining their handles.
use std::thread;
use std::thread::JoinHandle;
use std::sync::Arc;

use pyo3::prelude::*;
use pyo3::types::IntoPyDict;
use pyo3::types::PyDict;

type MyClosure<'a> = dyn Fn() -> f64 + Send + Sync + 'a;

fn main() -> Result<(), ()>
{
    match Python::with_gil(|py| main_(py)
                           .map_err(|e| e.print_and_set_sys_last_vars(py)))
    {
        Ok(handles) => {
            for handle in handles {
                handle.join().unwrap();
            }},
        Err(e) => { println!("{:?}", e); },
    }
    Ok(())
}

fn main_(py: Python) -> PyResult<Vec<JoinHandle<()>>>
{
    let sys = py.import("sys")?;
    let version = sys.get("version")?.extract::<String>()?;
    let locals = [("os", py.import("os")?)].into_py_dict(py);
    let code = "os.getenv('USER') or os.getenv('USERNAME') or 'Unknown'";
    let user = py.eval(code, None, Some(&locals))?.extract::<String>()?;

    println!("Hello {}, I'm Python {}", user, version);

    // The thread will do something with the `locals` dictionary. In order to
    // pass this reference object to the thread, first convert it to a
    // non-reference object.

    // Convert `locals` to `PyObject`.
    let locals_obj = locals.to_object(py);

    // Now we can move `locals_obj` into the thread without concern.
    let closure: Arc<MyClosure<'_>> = Arc::new(move || {
        // We can print out the PyObject which reveals it to be a tuple
        // containing a pointer value.
        println!("{:?}", locals_obj);

        // If we want to do anything with the `locals` object, we can cast it
        // back to a `PyDict` reference. We'll need to acquire the GIL first.
        Python::with_gil(|py| {
            // We have the GIL, cast the dict back to a PyDict reference.
            let py_dict = locals_obj.cast_as::<PyDict>(py).unwrap();

            // Printing it out reveals it to be a dictionary with the key `os`.
            println!("{:?}", py_dict);
        });
        1.
    });

    let data = vec![closure];
    let more = "Important data.".to_string();
    let handles = start(data, more);

    Ok(handles)
}

fn start<'a>(data : Vec<Arc<MyClosure<'static>>>,
             more : String
) -> Vec<JoinHandle<()>>
{
    let mut handles = vec![];

    for _ in 1..=4 {
        let cloned_data = data.clone();
        let cloned_more = more.clone();
        let h = thread::spawn(move || foo(cloned_data, cloned_more));
        handles.push(h);
    }
    handles
}

fn foo<'a>(data : Vec<Arc<MyClosure<'a>>>,
           more : String)
{
    for closure in data {
        closure();
    }
}
Output:
Hello todd, I'm Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
Py(0x7f3329ccdd40)
Py(0x7f3329ccdd40)
Py(0x7f3329ccdd40)
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
Py(0x7f3329ccdd40)
{'os': <module 'os' from '/usr/lib/python3.8/os.py'>}
Something to consider: you may be able to minimize, or eliminate, the need to pass Python objects to the threads by extracting all the information needed from them into Rust objects and passing those to threads instead.
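For instance, once a value has been extracted into a plain Rust type under the GIL (like the user string in main_ above), the spawned thread never needs to touch Python at all. A sketch with a made-up helper:
use std::thread;
use std::thread::JoinHandle;

// `report_user` is hypothetical; it only receives plain Rust data.
fn report_user(user: String) -> JoinHandle<()> {
    thread::spawn(move || {
        println!("running without the GIL for user {user}");
    })
}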

One mutable borrow and multiple immutable borrows

I'm trying to write a program that spawns a background thread that continuously inserts data into some collection. At the same time, I want to keep getting input from stdin and check if that input is in the collection the thread is operating on.
Here is a boiled down example:
use std::collections::HashSet;
use std::thread;

fn main() {
    let mut set: HashSet<String> = HashSet::new();

    thread::spawn(move || {
        loop {
            set.insert("foo".to_string());
        }
    });

    loop {
        let input: String = get_input_from_stdin();
        if set.contains(&input) {
            // Do something...
        }
    }
}

fn get_input_from_stdin() -> String {
    String::new()
}
However this doesn't work because of ownership stuff.
I'm still new to Rust but this seems like something that should be possible. I just can't find the right combination of Arcs, Rcs, Mutexes, etc. to wrap my data in.
First of all, please read Need holistic explanation about Rust's cell and reference counted types.
There are two problems to solve here:
Sharing ownership between threads,
Mutable aliasing.
To share ownership, the simplest solution is Arc. It requires its argument to be Sync (accessible safely from multiple threads) which can be achieved for any Send type by wrapping it inside a Mutex or RwLock.
To safely get aliasing in the presence of mutability, both Mutex and RwLock will work. If you had multiple readers, RwLock might have an extra performance edge. Since you have a single reader there's no point: let's use the simple Mutex.
And therefore, your type is: Arc<Mutex<HashSet<String>>>.
The next trick is passing the value to the closure to run in another thread. The value is moved, and therefore you need to first make a clone of the Arc and then pass the clone, otherwise you've moved your original and cannot access it any longer.
Finally, accessing the data requires going through the borrows and locks...
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let set = Arc::new(Mutex::new(HashSet::new()));
    let clone = set.clone();

    thread::spawn(move || {
        loop {
            clone.lock().unwrap().insert("foo".to_string());
        }
    });

    loop {
        // get_input_from_stdin is the stub from the question above.
        let input: String = get_input_from_stdin();
        if set.lock().unwrap().contains(&input) {
            // Do something...
        }
    }
}
The call to unwrap is there because Mutex::lock returns a Result; it may be impossible to lock the Mutex if it is poisoned, which means a panic occurred while it was locked and therefore its content is possibly garbage.
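If you would rather recover from poisoning than panic, PoisonError lets you take the guard anyway. A minimal sketch (assuming you are happy to use whatever data the panicking thread left behind):
use std::collections::HashSet;
use std::sync::Mutex;

fn set_contains(set: &Mutex<HashSet<String>>, input: &str) -> bool {
    let guard = match set.lock() {
        Ok(guard) => guard,
        // Keep the (possibly inconsistent) data instead of propagating the panic.
        Err(poisoned) => poisoned.into_inner(),
    };
    guard.contains(input)
}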

Cannot move data out of a Mutex

Consider the following code example. I have a vector of JoinHandles which I need to iterate over in order to join the threads back to the main thread; however, upon doing so I get the error: cannot move out of borrowed content.
let threads = Arc::new(Mutex::new(Vec::new()));

for _x in 0..100 {
    let handle = thread::spawn(move || {
        // do some work
    });
    threads.lock().unwrap().push(handle);
}

for t in threads.lock().unwrap().iter() {
    t.join();
}
Unfortunately, you can't do this directly. Once the Mutex has consumed the data structure you fed to it, you can't get it back by value again. You can only get a &mut reference to it, which won't allow moving out of it. So even into_iter() won't work: it needs a self argument, which it can't get from a MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).
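A sketch of that simpler shape, assuming only the spawning thread ever touches the vector of handles:
use std::thread;

fn main() {
    let mut threads = Vec::new();
    for x in 0..100 {
        threads.push(thread::spawn(move || {
            // do some work
            let _ = x;
        }));
    }
    // Consuming the Vec directly hands each JoinHandle to `join` by value.
    for t in threads {
        t.join().unwrap();
    }
}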
As referenced in How to take ownership of T from Arc<Mutex<T>>?, this is now possible to do without any trickery in Rust using Arc::try_unwrap and Mutex::into_inner():
let threads = Arc::new(Mutex::new(Vec::new()));

for _x in 0..100 {
    let handle = thread::spawn(move || {
        println!("{}", _x);
    });
    threads.lock().unwrap().push(handle);
}

let threads_unwrapped: Vec<JoinHandle<_>> = Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
for t in threads_unwrapped.into_iter() {
    t.join().unwrap();
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178
While drain is a good solution, you can also do the following:
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();

// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
I would prefer to clone the data so the original value stays intact. Note that cloning copies every element, so it does carry some overhead compared to drain, which moves the elements out.
