Lifetime of variables passed to a new thread - rust

I have trouble compiling this program:
use std::env;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;
fn main() {
let args: Vec<_> = env::args().skip(1).collect();
let (tx, rx) = mpsc::channel();
for arg in &args {
let t = tx.clone();
thread::spawn(move || {
thread::sleep(Duration::from_millis(50));
let _new_arg = arg.to_string() + "foo";
t.send(arg);
});
}
for _ in &args {
println!("{}", rx.recv().unwrap());
}
}
I read all arguments from the command line and emulate doing some work on each argument in the thread. Then I print out the results of this work, which I do using a channel.
error[E0597]: `args` does not live long enough
--> src/main.rs:11:17
|
11 | for arg in &args {
| ^^^^ does not live long enough
...
24 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
If I understand correctly, the lifetime of args must be 'static (i.e. the entire time of program execution), while it only lives within the scope of the main function. I don't understand the reason behind this, or how I could fix it.

The problem lies in spawning a background thread. When you call thread::spawn, you effectively have to pass ownership of any resource used in it to the thread, because the thread might run indefinitely; that is why everything the closure captures must be 'static.
There are two options to resolve that: the simplest one would be to pass ownership. Your code here
let new_arg = arg.to_string() + "foo";
t.send(arg);
looks like you actually wanted to send new_arg, in which case you could just create the owned result of arg.to_string() before spawning the thread, thus eliminating the need to pass the reference arg.
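A minimal sketch of that first option, assuming the goal is to send the processed string back. The helper name process_all and the final sort are additions for the demo, not part of the original program:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical helper: processes each argument on its own thread and
// collects the results. `args` stands in for the command-line arguments.
fn process_all(args: &[String]) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    for arg in args {
        let t = tx.clone();
        // Build the owned String *before* spawning, so the closure
        // captures no reference into `args`.
        let owned = arg.to_string();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(50));
            let new_arg = owned + "foo";
            t.send(new_arg).unwrap();
        });
    }
    // Drop the original sender so `rx` disconnects once all workers finish.
    drop(tx);
    let mut results: Vec<String> = rx.iter().collect();
    results.sort(); // arrival order is not deterministic
    results
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    for line in process_all(&args) {
        println!("{}", line);
    }
}
```

Because each closure owns its String, nothing borrowed from args crosses the thread boundary, and the 'static requirement is met.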
Another, slightly more involved option that may be useful at some point is scoped threads, as implemented in crossbeam, for example. These are bound to an explicit scope in which you spawn them, and they are all joined at the end of that scope. It looks somewhat like this:
crossbeam::scope(|scope| {
scope.spawn(|| {
println!("Hello from a scoped thread!");
});
});
Have a look at the docs for further details.

Related

Rust - How to pass function parameters to closure

I'm trying to write a function that takes two parameters. The function starts two threads and uses one of the parameters inside one of the thread closures. This doesn't work because of the error "Borrowed data escapes outside of closure". Here's the code.
pub fn measure_stats(testdatapath: &PathBuf, filenameprefix: &String) {
let (tx, rx) = mpsc::channel();
let filename = format!("test.txt");
let measure_thread = thread::spawn(move || {
let stats = sar();
fs::write(filename, stats).expect("failed to write output to file");
// Send a signal that we're done.
let _ = tx.send(());
});
thread::spawn(move || {
let mut n = 0;
loop {
// Break if the measure thread is done.
match rx.try_recv() {
Ok(_) | Err(TryRecvError::Disconnected) => break,
Err(TryRecvError::Empty) => {}
}
let filename = format!("{:04}.img", n);
let filepath = Path::new(testdatapath).join(&filename);
random_file_write(&filepath).unwrap();
random_file_read(&filepath).unwrap();
fs::remove_file(&filepath).expect("failed to remove file");
n += 1;
}
});
measure_thread.join().expect("joining measure thread panicked");
}
The problem is that testdatapath escapes the function body. I think this is a problem because the lifetime of testdatapath is only guaranteed until the end of the closure, but it needs to be the lifetime of the entire program. But it's a little confusing to me.
I've tried cloning the variable, but that didn't help. I'm not sure how I'm supposed to do this. How do I use a function parameter inside the closure or accomplish the same goal some other more canonical way?
If it's okay for the function not to return until both threads complete, then use std::thread::scope() to create scoped threads instead of std::thread::spawn(). Scoped threads allow borrowing data whereas regular spawning cannot, but require the threads to all terminate before the scope ends and the function that created them returns.
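As a rough sketch of the scoped-thread approach (std::thread::scope is in the standard library since Rust 1.63): the body below is a simplified stand-in for the question's function, and the return value and file names are made up for the demo.

```rust
use std::path::{Path, PathBuf};
use std::thread;

// Hypothetical stand-in for the question's function: both threads borrow
// the parameters, which is allowed because `thread::scope` joins every
// spawned thread before it returns.
fn measure_stats(testdatapath: &Path, filenameprefix: &str) -> (PathBuf, String) {
    thread::scope(|s| {
        // Neither closure needs `move`; they borrow the function's arguments.
        let a = s.spawn(|| testdatapath.join("0000.img"));
        let b = s.spawn(|| format!("{filenameprefix}-stats.txt"));
        (a.join().unwrap(), b.join().unwrap())
    })
}

fn main() {
    let (path, name) = measure_stats(Path::new("/tmp/testdata"), "run1");
    println!("{} {}", path.display(), name);
}
```

The trade-off is the one described above: the function blocks until both threads are done.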
If this has to be a “background” task, then you need to make sure that all the data used by each thread is owned, i.e. not a reference. In this case, that means you should change the parameters to be owned:
pub fn measure_stats(testdatapath: PathBuf, filenameprefix: String) {
Then, those values will be moved into the receiving thread, without any lifetime constraints.
You're trying to make testdatapath live longer than the function. It is a value you're borrowing, and the compiler can't guarantee that the original PathBuf will outlive the closure running in the new thread, so it rejects the code: you're assuming the borrow stays valid without doing anything to ensure it.
The three simplest choices:
Move the PathBuf into the function instead of borrowing it (remove the &).
Use an Arc.
Clone it and move the clone into the thread.
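The clone and Arc options could look roughly like this. It is a sketch with a hypothetical helper spawn_workers and made-up file names; the point is that the clone has to be made before thread::spawn, so that the closure captures an owned value rather than the reference:

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::thread;

// Hypothetical helper: the function still takes a borrowed path, but each
// spawned thread gets its own owned handle to the data.
fn spawn_workers(testdatapath: &Path) -> Vec<PathBuf> {
    // Clone the data outside the closure and move the clone into the thread.
    let owned: PathBuf = testdatapath.to_path_buf();
    let first = thread::spawn(move || owned.join("0000.img"));

    // Share one allocation between several threads via Arc.
    let shared: Arc<PathBuf> = Arc::new(testdatapath.to_path_buf());
    let handles: Vec<_> = (1..3)
        .map(|n| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.join(format!("{:04}.img", n)))
        })
        .collect();

    let mut out = vec![first.join().unwrap()];
    out.extend(handles.into_iter().map(|h| h.join().unwrap()));
    out
}

fn main() {
    for p in spawn_workers(Path::new("/tmp/testdata")) {
        println!("{}", p.display());
    }
}
```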

Lifetime struggles with "borrowed value does not live long enough" for lazy_static value

Rust newbie here that has been struggling for a full day on how to get the compiler to recognize that the lifetime of a lazy_static struct instance is 'static. A minimal example of what I am trying to do is the following:
use redis::{Client, Connection, PubSub};
use std::sync::Mutex;
#[macro_use]
extern crate lazy_static;
lazy_static! {
static ref REDIS_CLIENT: Mutex<Client> =
Mutex::new(Client::open("redis://127.0.0.1/").unwrap());
static ref RECEIVER_CONNECTIONS: Mutex<Vec<Connection>> = Mutex::new(vec![]);
static ref RECEIVERS: Mutex<Vec<PubSub<'static>>> = Mutex::new(vec![]);
}
pub fn create_receiver() -> u64 {
let client_instance = match REDIS_CLIENT.lock() {
Ok(i) => i,
Err(_) => return 0,
};
let connection: Connection = match client_instance.get_connection() {
Ok(conn) => conn,
Err(_) => return 0,
};
let mut receiver_connections_instance = match RECEIVER_CONNECTIONS.lock() {
Ok(i) => i,
Err(_) => return 0,
};
let receiver_connection_index = receiver_connections_instance.len();
receiver_connections_instance.push(connection);
let receiver_connection = &mut receiver_connections_instance[receiver_connection_index];
let receiver = receiver_connection.as_pubsub();
let mut receivers_instance = match RECEIVERS.lock() {
Ok(i) => i,
Err(_) => return 0,
};
receivers_instance.push(receiver);
let receiver_handle = receivers_instance.len();
receiver_handle.try_into().unwrap()
}
But I am getting the following error:
error[E0597]: `receiver_connections_instance` does not live long enough
--> src/lib.rs:33:36
|
33 | let receiver_connection = &mut receiver_connections_instance[receiver_connection_index];
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ borrowed value does not live long enough
34 | let receiver = receiver_connection.as_pubsub();
| ------------------------------- argument requires that `receiver_connections_instance` is borrowed for `'static`
...
45 | }
| - `receiver_connections_instance` dropped here while still borrowed
I don't understand this because RECEIVER_CONNECTIONS is a lazy_static variable and I don't think my code uses the receiver_connections_instance past the end of the function.
Many thanks and infinite karma to whoever can help me understand what I'm doing wrong here. :)
The problem is that your Connection isn't 'static at the time you invoke as_pubsub(); you access it through a mutex guard with a limited lifetime. As soon as you drop the guard, the connection is no longer exclusively yours, and neither is the PubSub, which is why PubSub<'static> is not allowed. The redis Rust API doesn't seem to allow exactly what you're after (at least without unsafe), because Connection::as_pubsub() requires &mut self, prohibiting you from invoking as_pubsub() directly on a globally stored Connection.
But since your connections are global and never removed anyway, you could simply not store the connection, but "leak" it instead and only store the PubSub. Here "leak" is meant in the technical sense of creating a value that is allocated and then never dropped, much like a global variable, not in the sense of an uncontrolled memory leak that would indicate a bug. Leaking the connection gives you a &'static mut Connection, which you can use to create a PubSub<'static>, which you can then store in a global variable. For example, this compiles:
lazy_static! {
static ref REDIS_CLIENT: Client = Client::open("redis://127.0.0.1/").unwrap();
static ref RECEIVERS: Mutex<Vec<PubSub<'static>>> = Default::default();
}
pub fn create_receiver() -> RedisResult<usize> {
let connection = REDIS_CLIENT.get_connection()?;
let connection = Box::leak(Box::new(connection)); // make it immortal
let mut receivers = RECEIVERS.lock().unwrap();
receivers.push(connection.as_pubsub());
Ok(receivers.len() - 1)
}
Several tangential notes:
The redis Client doesn't need to be wrapped in a Mutex because get_connection() takes &self.
You don't need to pattern-match every mutex lock: locking can fail only if a thread that held the lock panicked. In that case you most likely want to just propagate the panic, so an unwrap() is appropriate.
Using 0 as a special value is not idiomatic Rust; you can use Option<u64> or Result<u64> to signal that a value could not be returned. That allows the function to use the ? operator.
The code above has these improvements applied, resulting in a significantly reduced line count.
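To see the leaking trick in isolation, here is a sketch that uses only the standard library. Conn and Sub are hypothetical stand-ins for redis::Connection and redis::PubSub, and sub_id exists only so the demo can read something back:

```rust
use std::sync::Mutex;

// Hypothetical stand-ins: `Sub` borrows its `Conn` mutably, the same way
// a `PubSub<'a>` borrows the connection it was created from.
struct Conn {
    id: u32,
}

struct Sub<'a> {
    conn: &'a mut Conn,
}

impl Conn {
    fn as_sub(&mut self) -> Sub<'_> {
        Sub { conn: self }
    }
}

static SUBS: Mutex<Vec<Sub<'static>>> = Mutex::new(Vec::new());

fn create_sub(id: u32) -> usize {
    // Box::leak hands back `&'static mut Conn`; the allocation is never freed.
    let conn: &'static mut Conn = Box::leak(Box::new(Conn { id }));
    let mut subs = SUBS.lock().unwrap();
    subs.push(conn.as_sub());
    subs.len() - 1
}

// Reads the connection id back through the stored subscriber.
fn sub_id(index: usize) -> u32 {
    SUBS.lock().unwrap()[index].conn.id
}

fn main() {
    let index = create_sub(7);
    println!("subscriber {} uses connection {}", index, sub_id(index));
}
```

Replacing Box::leak with a borrow through a guard or a local variable brings back the "does not live long enough" error, because the borrow would then be shorter than 'static.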
The TLDR is that the relevant reference is not 'static because it's tied to the lifetime of the mutex guard. I'll explain the issue by walking through the relevant parts of the code.
You start by locking the RECEIVER_CONNECTIONS mutex, storing the guard in receiver_connections_instance:
let mut receiver_connections_instance = match RECEIVER_CONNECTIONS.lock() {
Ok(i) => i,
Err(_) => return 0,
};
Then you get a mutable reference to data inside the guard, and store it in receiver_connection:
let receiver_connection = &mut receiver_connections_instance[receiver_connection_index];
You then call the as_pubsub() method on receiver_connection and store the result in receiver:
let receiver = receiver_connection.as_pubsub();
The signature of that as_pubsub() method is the following:
fn as_pubsub(&mut self) -> PubSub<'_>
which if we un-elide the lifetimes can be written as
fn as_pubsub<'a>(&'a mut self) -> PubSub<'a>
We can see from the lifetimes that the return type PubSub captures the input lifetime. (This is because PubSub stores the mutable reference inside itself). So all of this means the lifetime of receiver is bound to the lifetime of the mutex guard. The code that follows then tries to store receiver in the static RECEIVERS variable, but that cannot work because receiver cannot outlive the mutex guard receiver_connections_instance, which is dropped at the end of the function.

Return a reference to a T inside a lazy static RwLock<Option<T>>?

I have a lazy static struct that I want to be able to set to some random value in the beginning of the execution of the program, and then get later. This little silly snippet can be used as an example:
use lazy_static::lazy_static;
use std::sync::RwLock;
struct Answer(i8);
lazy_static! {
static ref ANSWER: RwLock<Option<Answer>> = RwLock::new(None);
}
fn answer_question() {
*ANSWER.write().unwrap() = Some(Answer(42));
}
fn what_is_the_answer() -> &'static Answer {
ANSWER
.read()
.unwrap()
.as_ref()
.unwrap()
}
This code fails to compile:
error[E0515]: cannot return value referencing temporary value
--> src/lib.rs:15:5
|
15 | ANSWER
| _____^
| |_____|
| ||
16 | || .read()
17 | || .unwrap()
| ||_________________- temporary value created here
18 | | .as_ref()
19 | | .unwrap()
| |__________________^ returns a value referencing data owned by the current function
I know you can not return a reference to a temporary value. But I want to return a reference to ANSWER which is static - the very opposite of temporary! I guess it is the RwLockReadGuard that the first call to unwrap returns that is the problem?
I can get the code to compile by changing the return type:
fn what_is_the_answer() -> RwLockReadGuard<'static, Option<Answer>> {
ANSWER
.read()
.unwrap()
}
But now the calling code becomes very unergonomic - I have to do two extra calls to get to the actual value:
what_is_the_answer().as_ref().unwrap()
Can I somehow return a reference to the static ANSWER from this function? Can I get it to return a RwLockReadGuard<&Answer> maybe by mapping somehow?
once_cell is designed for this: use .set(...).unwrap() in answer_question and .get().unwrap() in what_is_the_answer.
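A sketch of that idea, using std::sync::OnceLock (in the standard library since Rust 1.70) instead of the once_cell crate. That swap is my assumption, and once_cell::sync::OnceCell exposes the same set/get pair. Returning a bool from answer_question, rather than calling unwrap() on the result of set, is also a choice made for the demo:

```rust
use std::sync::OnceLock;

#[derive(Debug, PartialEq)]
struct Answer(i8);

// No lock guard is involved on the read path, so a plain &'static works.
static ANSWER: OnceLock<Answer> = OnceLock::new();

// Returns true if this call stored the value and false if it was already set.
fn answer_question() -> bool {
    ANSWER.set(Answer(42)).is_ok()
}

fn what_is_the_answer() -> &'static Answer {
    // Panics if `answer_question` has not been called yet.
    ANSWER.get().unwrap()
}

fn main() {
    answer_question();
    println!("{:?}", what_is_the_answer());
}
```

The value can be set only once; later calls to set are rejected, which is what makes handing out a &'static reference sound.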
As far as I understand your intention, the value of Answer can't be computed while it is being initialized in the lazy_static but depends on parameters known only when answer_question is called. The following may not be the most elegant solution, yet it allows for having a &'static-reference to a value that depends on parameters only known at runtime.
The basic approach is to use two lazy_static-values, one of which serves as a "proxy" to do the necessary synchronization, the other being the value itself. This avoids having to access multiple layers of locks and unwrapping of Option-values whenever you access ANSWER.
The ANSWER value is initialized by waiting on a Condvar, which signals when the value has been computed. The value is then placed in the lazy_static and is immovable from then on. Hence &'static is possible (see get_the_answer()). I have chosen String as the example type. Notice that accessing ANSWER without calling generate_the_answer() will cause the initialization to wait forever, deadlocking the program.
use std::{sync, thread};
lazy_static::lazy_static! {
// A proxy to synchronize when the value is generated
static ref ANSWER_PROXY: (sync::Mutex<Option<String>>, sync::Condvar) = {
(sync::Mutex::new(None), sync::Condvar::new())
};
// The actual value, which is initialized from the proxy and stays in place
// forever, hence allowing &'static access
static ref ANSWER: String = {
let (lock, cvar) = &*ANSWER_PROXY;
let mut answer = lock.lock().unwrap();
loop {
// As long as the proxy is None, the answer has not been generated
match answer.take() {
None => answer = cvar.wait(answer).unwrap(),
Some(answer) => return answer,
}
}
};
}
// Generate the answer and place it in the proxy. The `param` is just here
// to demonstrate we can move owned values into the proxy
fn generate_the_answer(param: String) {
// We don't need a thread here, yet we can
thread::spawn(move || {
println!("Generating the answer...");
let mut s = String::from("Hello, ");
s.push_str(&param);
thread::sleep(std::time::Duration::from_secs(1));
let (lock, cvar) = &*ANSWER_PROXY;
*lock.lock().unwrap() = Some(s);
cvar.notify_one();
println!("Answer generated.");
});
}
// Nothing to see here, except that we have a &'static reference to the answer
fn get_the_answer() -> &'static str {
println!("Asking for the answer...");
&ANSWER
}
fn main() {
println!("Hello, world!");
// Accessing `ANSWER` without generating it will deadlock!
//get_the_answer();
generate_the_answer(String::from("John!"));
println!("The answer is \"{}\"", get_the_answer());
// The second time a value is generated, no one is listening.
// This is the flipside of `ANSWER` being a &'static
generate_the_answer(String::from("Peter!"));
println!("The answer is still \"{}\"", get_the_answer());
}

How can multiple threads share an iterator?

I've been working on a function that will copy a bunch of files from a source to a destination using Rust and threads. I'm having trouble making the threads share the iterator. I am still not used to the borrowing system:
extern crate libc;
extern crate num_cpus;
use libc::{c_char, size_t};
use std::thread;
use std::fs::copy;
fn python_str_array_2_str_vec<T, U, V>(_: T, _: U) -> V {
unimplemented!()
}
#[no_mangle]
pub extern "C" fn copyFiles(
sources: *const *const c_char,
destinies: *const *const c_char,
array_len: size_t,
) {
let src: Vec<&str> = python_str_array_2_str_vec(sources, array_len);
let dst: Vec<&str> = python_str_array_2_str_vec(destinies, array_len);
let mut iter = src.iter().zip(dst);
let num_threads = num_cpus::get();
let threads = (0..num_threads).map(|_| {
thread::spawn(|| while let Some((s, d)) = iter.next() {
copy(s, d);
})
});
for t in threads {
t.join();
}
}
fn main() {}
I'm getting this compilation error that I have not been able to solve:
error[E0597]: `src` does not live long enough
--> src/main.rs:20:20
|
20 | let mut iter = src.iter().zip(dst);
| ^^^ does not live long enough
...
30 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
error[E0373]: closure may outlive the current function, but it borrows `**iter`, which is owned by the current function
--> src/main.rs:23:23
|
23 | thread::spawn(|| while let Some((s, d)) = iter.next() {
| ^^ ---- `**iter` is borrowed here
| |
| may outlive borrowed value `**iter`
|
help: to force the closure to take ownership of `**iter` (and any other referenced variables), use the `move` keyword, as shown:
| thread::spawn(move || while let Some((s, d)) = iter.next() {
I've seen the following questions already:
Value does not live long enough when using multiple threads
I'm not using chunks, I would like to try to share an iterator through the threads although creating chunks to pass them to the threads will be the classic solution.
Unable to send a &str between threads because it does not live long enough
I've seen some of the answers to use channels to communicate with the threads, but I'm not quite sure about using them. There should be an easier way of sharing just one object through threads.
Why doesn't a local variable live long enough for thread::scoped
This got my attention, scoped is supposed to fix my error, but since it is in the unstable channel I would like to see if there is another way of doing it just using spawn.
Can someone explain how should I fix the lifetimes so the iterator can be accessed from the threads?
Here's a minimal, reproducible example of your problem:
use std::thread;
fn main() {
let src = vec!["one"];
let dst = vec!["two"];
let mut iter = src.iter().zip(dst);
thread::spawn(|| {
while let Some((s, d)) = iter.next() {
println!("{} -> {}", s, d);
}
});
}
There are multiple related problems:
The iterator lives on the stack and the thread's closure takes a reference to it.
The closure takes a mutable reference to the iterator.
The iterator itself has a reference to a Vec that lives on the stack.
The Vec itself has references to string slices that likely live on the stack but are not guaranteed to live longer than the thread either way.
Said another way, the Rust compiler has stopped you from executing four separate pieces of memory unsafety.
A main thing to recognize is that any thread you spawn might outlive the place where you spawned it. Even if you call join right away, the compiler cannot statically verify that will happen, so it has to take the conservative path. This is the point of scoped threads — they guarantee the thread exits before the stack frame they were started in.
Additionally, you are attempting to use a mutable reference in multiple concurrent threads. There's zero guarantee that the iterator (or any of the iterators it was built on) can be safely called in parallel. It's entirely possible that two threads call next at exactly the same time. The two pieces of code run in parallel and write to the same memory address. One thread writes half of the data and the other thread writes the other half, and now your program crashes at some arbitrary point in the future.
Using a tool like crossbeam, your code would look something like:
use crossbeam; // 0.7.3
fn main() {
let src = vec!["one"];
let dst = vec!["two"];
let mut iter = src.iter().zip(dst);
while let Some((s, d)) = iter.next() {
crossbeam::scope(|scope| {
scope.spawn(|_| {
println!("{} -> {}", s, d);
});
})
.unwrap();
}
}
Note that this will only spawn one thread at a time, waiting for it to finish. An alternative to get more parallelism (the usual point of this exercise) is to interchange the calls to next and spawn. This requires transferring ownership of s and d to the thread via the move keyword:
use crossbeam; // 0.7.3
fn main() {
let src = vec!["one", "alpha"];
let dst = vec!["two", "beta"];
let mut iter = src.iter().zip(dst);
crossbeam::scope(|scope| {
while let Some((s, d)) = iter.next() {
scope.spawn(move |_| {
println!("{} -> {}", s, d);
});
}
})
.unwrap();
}
If you add a sleep call inside the spawn, you can see the threads run in parallel.
I'd have written it using a for loop, however:
let iter = src.iter().zip(dst);
crossbeam::scope(|scope| {
for (s, d) in iter {
scope.spawn(move |_| {
println!("{} -> {}", s, d);
});
}
}).unwrap();
In the end, the iterator is exercised on the current thread, and each value returned from the iterator is then handed off to a new thread. The new threads are guaranteed to exit before the data behind the captured references goes out of scope.
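If you really do want several worker threads to pull from one iterator, one possible sketch is to wrap the iterator in a Mutex and use the standard library's std::thread::scope (stable since Rust 1.63) in place of crossbeam. Both are my substitutions rather than part of the code above, and copy_all with its string output is a hypothetical stand-in for the real file copy:

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: `workers` threads pull (source, destination) pairs
// from one shared iterator and record what they would have copied.
fn copy_all(src: &[&str], dst: &[&str], workers: usize) -> Vec<String> {
    // The Mutex makes each `next` call exclusive, so threads never race on it.
    let iter = Mutex::new(src.iter().zip(dst.iter()));
    let done = Mutex::new(Vec::new());
    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Hold the lock only long enough to fetch one item.
                let next = iter.lock().unwrap().next();
                match next {
                    Some((s, d)) => done.lock().unwrap().push(format!("{} -> {}", s, d)),
                    None => break,
                }
            });
        }
    });
    let mut done = done.into_inner().unwrap();
    done.sort(); // completion order is not deterministic
    done
}

fn main() {
    for line in copy_all(&["one", "alpha"], &["two", "beta"], 4) {
        println!("{}", line);
    }
}
```

Fetching the item in a separate let statement matters: writing while let Some(..) = iter.lock().unwrap().next() would keep the guard alive for the whole loop body and serialize the workers.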
You may be interested in Rayon, a crate that allows easy parallelization of certain types of iterators.
See also:
How can I pass a reference to a stack variable to a thread?
Lifetime troubles sharing references between threads
How do I use static lifetimes with threads?
Thread references require static lifetime?
Lifetime woes when using threads
Cannot call a function in a spawned thread because it "does not fulfill the required lifetime"

How to avoid mutex borrowing problems when using its guard

I want a method of my struct to run in a synchronized way. I wanted to do this by using a Mutex (Playground):
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
map: BTreeMap<String, String>,
mutex: Mutex<()>,
}
impl A {
pub fn new() -> A {
A {
map: BTreeMap::new(),
mutex: Mutex::new(()),
}
}
}
impl A {
fn synchronized_call(&mut self) {
let mutex_guard_res = self.mutex.try_lock();
if mutex_guard_res.is_err() {
return
}
let mut _mutex_guard = mutex_guard_res.unwrap(); // safe because of check above
let mut lambda = |text: String| {
let _ = self.map.insert("hello".to_owned(),
"d".to_owned());
};
lambda("dd".to_owned());
}
}
Error message:
error[E0500]: closure requires unique access to `self` but `self.mutex` is already borrowed
--> <anon>:23:26
|
18 | let mutex_guard_res = self.mutex.try_lock();
| ---------- borrow occurs here
...
23 | let mut lambda = |text: String| {
| ^^^^^^^^^^^^^^ closure construction occurs here
24 | if let Some(m) = self.map.get(&text) {
| ---- borrow occurs due to use of `self` in closure
...
31 | }
| - borrow ends here
As I understand when we borrow anything from the struct we are unable to use other struct's fields till our borrow is finished. But how can I do method synchronization then?
The closure needs a mutable reference to self.map in order to insert something into it. But closure capturing works with whole bindings only (this is the behavior of editions before Rust 2021; the 2021 edition captures individual fields instead). This means that if you say self.map, the closure attempts to capture self, not self.map. And self can't be mutably borrowed/captured, because self.mutex is already immutably borrowed by the guard.
We can solve this closure-capturing problem by introducing a new binding for the map alone such that the closure is able to capture it (Playground):
let mm = &mut self.map;
let mut lambda = |text: String| {
let _ = mm.insert("hello".to_owned(), text);
};
lambda("dd".to_owned());
However, there is something you overlooked: since synchronized_call() accepts &mut self, you don't need the mutex! Why? Mutable references are also called exclusive references, because the compiler can assure at compile time that there is only one such mutable reference at any given time.
Therefore you statically know that there is at most one instance of synchronized_call() running on one specific object at any given time, provided the function is not recursive (does not call itself).
If you have mutable access to a mutex, you know that the mutex is unlocked. See the Mutex::get_mut() method for more explanation. Isn't that amazing?
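A small sketch of what Mutex::get_mut() buys you; the helper name insert_exclusive is made up for the demo:

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Hypothetical helper: with `&mut Mutex<_>` the borrow checker already
// proves exclusive access, so `get_mut` reaches the data without locking.
fn insert_exclusive(map: &mut Mutex<BTreeMap<String, String>>, key: &str, value: &str) {
    // `get_mut` only fails if the mutex was poisoned by a panicking thread.
    map.get_mut().unwrap().insert(key.to_owned(), value.to_owned());
}

fn main() {
    let mut map = Mutex::new(BTreeMap::new());
    insert_exclusive(&mut map, "hello", "dd");
    println!("{:?}", map.lock().unwrap().get("hello"));
}
```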
Rust mutexes do not work the way you are trying to use them. In Rust, a mutex protects specific data relying on the borrow-checking mechanism used elsewhere in the language. As a consequence, declaring a field Mutex<()> doesn't make sense, because it is protecting read-write access to the () unit object that has no values to mutate.
As Lukas explained, your synchronized_call as declared doesn't need to do synchronization because its signature already requests an exclusive (mutable) reference to self, which prevents it from being invoked from multiple threads on the same object. In other words, you need to change the signature of synchronized_call because the current one does not match the functionality it is intended to provide.
synchronized_call needs to accept a shared reference to self, which will signal to Rust that it can be called from multiple threads in the first place. Inside synchronized_call, a call to Mutex::lock will simultaneously lock the mutex and provide a mutable reference to the underlying data, carefully scoped so that the lock is held for the duration of the reference:
use std::sync::Mutex;
use std::collections::BTreeMap;
pub struct A {
synced_map: Mutex<BTreeMap<String, String>>,
}
impl A {
pub fn new() -> A {
A {
synced_map: Mutex::new(BTreeMap::new()),
}
}
}
impl A {
fn synchronized_call(&self) {
let mut map = self.synced_map.lock().unwrap();
// omitting the lambda for brevity, but it would also work
// (as long as it refers to map rather than self.map)
map.insert("hello".to_owned(), "d".to_owned());
}
}
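To show the &self signature paying off, here is a sketch that calls the method from several threads through an Arc. The key parameter, the len method, and the run helper are additions for the demo and are not part of the code above:

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

struct A {
    synced_map: Mutex<BTreeMap<String, String>>,
}

impl A {
    fn new() -> A {
        A { synced_map: Mutex::new(BTreeMap::new()) }
    }

    // Takes a key (an addition for this demo) so each thread inserts
    // a distinct entry.
    fn synchronized_call(&self, key: &str) {
        let mut map = self.synced_map.lock().unwrap();
        map.insert(key.to_owned(), "d".to_owned());
    }

    fn len(&self) -> usize {
        self.synced_map.lock().unwrap().len()
    }
}

// Hypothetical driver: shares one `A` between `threads` worker threads.
fn run(threads: usize) -> usize {
    let a = Arc::new(A::new());
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let a = Arc::clone(&a);
            thread::spawn(move || a.synchronized_call(&format!("hello{}", i)))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    a.len()
}

fn main() {
    println!("entries: {}", run(4));
}
```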

Resources