Understanding &* to access a Rust Arc - rust

Reading about Condvar (condition variable for Rust) at https://doc.rust-lang.org/beta/std/sync/struct.Condvar.html I stumbled upon:
use std::sync::{Arc, Mutex, Condvar};
use std::thread;

let pair = Arc::new((Mutex::new(false), Condvar::new()));
let pair2 = pair.clone();

// Inside of our lock, spawn a new thread, and then wait for it to start.
thread::spawn(move || {
    let (lock, cvar) = &*pair2;
    let mut started = lock.lock().unwrap();
    *started = true;
    // We notify the condvar that the value has changed.
    cvar.notify_one();
});

// Wait for the thread to start up.
let (lock, cvar) = &*pair;
let mut started = lock.lock().unwrap();
while !*started {
    started = cvar.wait(started).unwrap();
}
What is this &*pair2 thing? I think it has to do with retrieving the pair from inside the Arc, but wouldn't it be better to have a simple method that retrieves the internal object of the Arc as a reference?
Can somebody explain to me exactly what &* does?

The * operator turns the Arc<T> into T. The & operator then borrows that T, giving &T.
So, put together, &*pair goes from Arc<T> to &T.
Another way of writing that code (with use std::ops::Deref; in scope) would be:
let (lock, cvar) = pair2.deref();
Indeed, the original &*pair2 is shorthand for &*Deref::deref(&pair2): the * forces the compiler to insert a deref() call, and it is that method which performs the actual conversion from the Arc to a reference to its contents.

Related

Multithreaded list iteration while using Mutex to prevent dealing with the same type at the same time

I am writing an application that needs to run on many threads at the same time. It will process a long list of items where one property of each item is a user_id. I am trying to make sure that items belonging to the same user_id are never processed at the same time. This means that the closure running in the worker threads needs to wait until no other thread is processing data for the same user.
I do not understand how to solve this. My simplified, current example, looks like this:
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use threadpool::ThreadPool;

fn main() {
    let pool = ThreadPool::new(num_cpus::get());
    let mut locks: HashMap<String, Mutex<bool>> = HashMap::new();
    let queue = Arc::new(vec![
        "1".to_string(),
        "1".to_string(),
        "2".to_string(),
        "1".to_string(),
        "3".to_string(),
    ]);
    let count = queue.len();
    for i in 0..count {
        let user_id = queue[i].clone();
        // Problem: cannot borrow `locks` as mutable more than once at a time
        // mutable borrow starts here in previous iteration of loop
        let lock = locks.entry(user_id).or_insert(Mutex::new(true));
        pool.execute(move || {
            // Wait until the user_id becomes free.
            lock.lock().unwrap();
            // Do stuff with user_id, but never process
            // the same user_id more than once at the same time.
            println!("{:?}", user_id);
        });
    }
    pool.join();
}
I am trying to keep a map of Mutexes which I then use to wait for the user_id to become free, but the borrow checker does not allow this. The queue items and the item-processing code are much more complex in the actual application I am working on.
I am not allowed to change the order of the items in the queue (though some reordering is acceptable as a side effect of waiting for locks).
How to solve this scenario?
First of all, HashMap::entry() consumes the key, so since you want to use it in the closure as well, you'll need to clone it, i.e. .entry(user_id.clone()).
Since you need to share the Mutex<bool> between the main thread and the worker threads, you likewise need to wrap it in an Arc. You can also use Entry::or_insert_with(), so a new Mutex is only created when the key is actually missing.
let mut locks: HashMap<String, Arc<Mutex<bool>>> = HashMap::new();
// ...
let lock = locks
    .entry(user_id.clone())
    .or_insert_with(|| Arc::new(Mutex::new(true)))
    .clone();
Lastly, you must store the guard returned by lock(), otherwise it is immediately released.
let _guard = lock.lock().unwrap();
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use threadpool::ThreadPool;

fn main() {
    let pool = ThreadPool::new(num_cpus::get());
    let mut locks: HashMap<String, Arc<Mutex<bool>>> = HashMap::new();
    let queue = Arc::new(vec![
        "1".to_string(),
        "1".to_string(),
        "2".to_string(),
        "1".to_string(),
        "3".to_string(),
    ]);
    let count = queue.len();
    for i in 0..count {
        let user_id = queue[i].clone();
        let lock = locks
            .entry(user_id.clone())
            .or_insert_with(|| Arc::new(Mutex::new(true)))
            .clone();
        pool.execute(move || {
            // Wait until the user_id becomes free.
            let _guard = lock.lock().unwrap();
            // Do stuff with user_id, but never process
            // the same user_id more than once at the same time.
            println!("{:?}", user_id);
        });
    }
    pool.join();
}

Confusing automatic dereferencing of Arc

This is an example taken from the Mutex documentation:
use std::sync::{Arc, Mutex};
use std::sync::mpsc::channel;
use std::thread;

const N: usize = 10;

fn main() {
    let data = Arc::new(Mutex::new(0));
    let (tx, rx) = channel();
    for _ in 0..N {
        let (data, tx) = (data.clone(), tx.clone());
        thread::spawn(move || {
            // snippet
        });
    }
    rx.recv().unwrap();
}
My question is where the snippet comment is. It is given as
let mut data = data.lock().unwrap();
*data += 1;
if *data == N {
    tx.send(()).unwrap();
}
The type of data is Arc<Mutex<usize>>, so when calling data.lock(), I assumed that the Arc would be automatically dereferenced and a usize assigned to data. Why do we need a * in front of data again to dereference it?
The following code, which first dereferences the Arc and then proceeds with just a usize, also works in place of the snippet.
let mut data = *data.lock().unwrap();
data += 1;
if data == N {
    tx.send(()).unwrap();
}
Follow the docs. Starting with Arc<T>:
Does Arc::lock exist? No. Check Deref.
Deref::Target is T. Check Mutex<T>.
Does Mutex::lock exist? Yes. It returns LockResult<MutexGuard<T>>.
Where does unwrap come from? LockResult<T> is a synonym for Result<T, PoisonError<T>>. So it's Result::unwrap, which results in a MutexGuard<T>.
Therefore, data is of type MutexGuard<usize>.
So this is wrong:
so when calling data.lock(), I assumed that the Arc is being automatically dereferenced and an usize is assigned to data.
Thus the question is not why you can't assign directly, but how you're able to assign a usize value at all. Again, follow the docs:
data is a MutexGuard<usize>, so check MutexGuard<T>.
*data is a pointer dereference in a context that requires mutation, so look for an implementation of DerefMut.
For MutexGuard<T>, it implements DerefMut::deref_mut(&mut self) -> &mut T.
Thus, *data goes through deref_mut() and refers to the usize stored inside the guard, which is why it can be read and assigned to directly.
Then we have your modified example. At this point, it should be clear that this is not at all doing the same thing: it copies the value out of the mutex into a local variable, and the temporary guard is dropped (releasing the lock) at the end of that statement. Because it's a local variable, changing it has absolutely no bearing on the contents of the mutex.
Thus, the short version is: the result of locking a mutex is a "smart pointer" wrapping the actual value, not the value itself. Thus you have to dereference it to access the value.

Cannot move data out of a Mutex

Consider the following code example. I have a vector of JoinHandles which I need to iterate over to join the threads back to the main thread; however, upon doing so I am getting the error error: cannot move out of borrowed content.
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
    let handle = thread::spawn(move || {
        // do some work
    });
    threads.lock().unwrap().push(handle);
}
for t in threads.lock().unwrap().iter() {
    t.join();
}
Unfortunately, you can't do this directly. Once Mutex has consumed the data structure you fed to it, you can't get it back by value again; you can only get a &mut reference to it, which won't allow moving out of it. So even into_iter() won't work: it takes self by value, which it can't get from a MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
    t.join();
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).
As referenced in How to take ownership of T from Arc<Mutex<T>>?, this is now possible to do without any trickery in Rust using Arc::try_unwrap and Mutex::into_inner():
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
    let handle = thread::spawn(move || {
        println!("{}", _x);
    });
    threads.lock().unwrap().push(handle);
}
let threads_unwrapped: Vec<JoinHandle<_>> =
    Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
for t in threads_unwrapped.into_iter() {
    t.join().unwrap();
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178
While draining the vector is a good solution, you can also do the following:
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();
// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
I would prefer to clone the data to get the original value, although cloning copies every element, so draining is cheaper when you no longer need the vector inside the mutex.

Code not running in parallel when using thread::scoped

Can someone please explain why the code below does not run in parallel? I guess I don't understand how thread::scoped works.
use std::thread;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use std::old_io::timer;

fn main() {
    let buf = Arc::new(Mutex::new(Vec::<String>::new()));
    let res = test(buf);
    println!("{:?}", *res.lock().unwrap());
}

fn test(buf: Arc<Mutex<Vec<String>>>) -> Arc<Mutex<Vec<String>>> {
    let guards: Vec<_> = (0..3).map(|i| {
        let mtx = buf.clone();
        thread::scoped(|| {
            println!("Thread: {}", i);
            let mut res = mtx.lock().unwrap();
            timer::sleep(Duration::seconds(5));
            res.push(format!("thread {}", i));
        });
    }).collect();
    buf
}
The code is based on the examples here where it's stated:
The scoped function takes one argument, a closure, indicated by the double bars ||. This closure is executed in a new thread created by scoped. The method is called scoped because it returns a 'join guard', which will automatically join the child thread when it goes out of scope. Because we collect these guards into a Vec, and that vector goes out of scope at the end of our program, our program will wait for every thread to finish before finishing.
Thanks
This is a tricky case. The problem is the humble semicolon. Look at this minimized code:
thread::scoped(|| {});
That semicolon means that the result of the collect isn't a vector of JoinGuards — it's a Vec<()>! Each JoinGuard is dropped immediately, forcing the thread to finish before the next iteration starts.
When you fix this issue, you'll hit the next problem, which is that i and mtx don't live long enough. You'll need to move them into the closure:
thread::scoped(move || {})

What do I use to share an object with many threads and one writer in Rust?

What is the right approach to share a common object between many threads when the object may sometimes be written to by one owner?
I tried to create one Configuration trait object with several methods to get and set config keys. I'd like to pass this to other threads where configuration items may be read. Bonus points would be if it can be written and read by everyone.
I found a Reddit thread which talks about Rc and RefCell; would that be the right way? I think these would not allow me to borrow the object immutably multiple times and still mutate it.
Rust has a built-in concurrency primitive exactly for this task called RwLock. Together with Arc, it can be used to implement what you want:
use std::sync::{Arc, RwLock};
use std::sync::mpsc;
use std::thread;

const N: usize = 12;

let shared_data = Arc::new(RwLock::new(Vec::new()));
let (finished_tx, finished_rx) = mpsc::channel();

for i in 0..N {
    let shared_data = shared_data.clone();
    let finished_tx = finished_tx.clone();
    if i % 4 == 0 {
        thread::spawn(move || {
            let mut guard = shared_data.write().expect("Unable to lock");
            guard.push(i);
            finished_tx.send(()).expect("Unable to send");
        });
    } else {
        thread::spawn(move || {
            let guard = shared_data.read().expect("Unable to lock");
            println!("From {}: {:?}", i, *guard);
            finished_tx.send(()).expect("Unable to send");
        });
    }
}

// wait until everything's done
for _ in 0..N {
    let _ = finished_rx.recv();
}

println!("Done");
This example is very silly but it demonstrates what RwLock is and how to use it.
Also note that Rc and RefCell/Cell are not appropriate in a multithreaded environment because they are not synchronized. Rust won't even allow you to move them into a thread::spawn() closure, because they do not implement Send. To share data between threads you must use Arc, and to share mutable data you must additionally use one of the synchronization primitives like RwLock or Mutex.
