Get HashMap from thread - multithreading

I am trying to get a value from a thread, in this case a HashMap. I reduced the code to the following (I originally tried to share a HashMap containing a Vec):
use std::thread;
use std::sync::mpsc;
use std::sync::Mutex;
use std::sync::Arc;
use std::collections::HashMap;
fn main() {
let(tx, rx) = mpsc::channel();
let n_handle= thread::spawn( || {
tx.send(worker());
});
print!("{:?}", rx.recv().unwrap().into_inner().unwrap());
}
fn worker() -> Arc<Mutex<HashMap<String, i32>>>{
let result: HashMap<String, i32> = HashMap::new();
// some computation
Arc::from(Mutex::from(result))
}
Still Rust says that:
std::sync::mpsc::Sender<std::sync::Arc<std::sync::Mutex<std::collections::HashMap<std::string::String, i32>>>> cannot be shared between threads safely
I read some confusing stuff about putting everything into Arc<Mutex<..>> which I also tried with the value:
let result: HashMap<String, Arc<Mutex<i32>>> = HashMap::new();
Can anyone point me to a document that explains the usage of the mpsc::channel with values such as HashMaps? I understand why it is not working, as the trait Sync is not implemented for the HashMap, which is required to share the stuff. Still I have no idea how to get it to work.

You can pass values between threads using an mpsc channel, as long as you tag the closure passed to thread::spawn with the move keyword, like this:
thread::spawn(move || {});
Because you did not use move, the closure does not take ownership of the outer variables; it only borrows them. Every outer variable the closure uses must therefore implement Sync, so that a reference to it can be shared with the new thread.
mpsc::Sender does not implement Sync, which is why you get the "cannot be shared between threads safely" error.
The ideal solution in your case is to move the sender into the thread's closure with move, as follows:
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use std::thread;
fn main() {
let (tx, rx) = mpsc::channel();
thread::spawn(move || {
let _ = tx.send(worker());
});
let arc = rx.recv().unwrap();
let hashmap_guard = arc.lock().unwrap();
print!(
"HashMap that retrieved from thread : {:?}",
hashmap_guard.get("Hello").unwrap()
);
}
fn worker() -> Arc<Mutex<HashMap<String, i32>>> {
let mut result: HashMap<String, i32> = HashMap::new();
result.insert("Hello".to_string(), 2);
// some computation
Arc::new(Mutex::new(result))
}
For further info: I'd recommend reading The Rust Programming Language, specifically the chapter on concurrency. It introduces Arc, which is what you need when you want to share data between threads.
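If the worker only has to hand its finished map to the main thread once, the Arc<Mutex<..>> wrapper may not be needed at all: HashMap<String, i32> is Send, so it can be moved through the channel directly. A minimal sketch under that assumption (compute_in_thread is a hypothetical helper name, and this worker returns a plain map instead of the wrapped one from the question):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical variant of the question's `worker` that returns a plain map.
fn worker() -> HashMap<String, i32> {
    let mut result = HashMap::new();
    result.insert("Hello".to_string(), 2);
    result
}

// Hypothetical helper: runs `worker` on another thread and receives its map.
fn compute_in_thread() -> HashMap<String, i32> {
    let (tx, rx) = mpsc::channel();
    // `move` makes the closure own `tx` instead of borrowing it.
    let handle = thread::spawn(move || {
        tx.send(worker()).expect("receiver was dropped");
    });
    // The map is moved through the channel; no lock is involved.
    let map = rx.recv().expect("sender was dropped without sending");
    handle.join().expect("worker thread panicked");
    map
}

fn main() {
    let map = compute_in_thread();
    println!("{:?}", map.get("Hello"));
}
```

Arc<Mutex<..>> is still the right tool when several threads need access to the same map at the same time.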

Related

Testing a thread Worker with an anonymous function

I am adding tests to the 'hello' web server from the rust book.
My issue/error is around how to test whether a Worker has processed a Job.
My idea is to pass an anonymous function which updates a bool from false to true.
I think ownership is an issue here. I tried wrapping f in a Box, thinking it would prevent passing bool as a value as opposed to a reference. Using Box I struggled to mutate the value of state_updated when it was wrapped in this way.
I also tried writing a basic struct to wrap and update the bool. I have since reverted back to a mut bool.
First question: What changes do I need to make to get the test to pass?
Second question: Is there a better way for me to test this?
Below is a minimal version which reproduces my issue.
The full code is available at the bottom of this page in the rust book.
My current test creates a Worker, sends a Job to the worker, and asserts on an expected change
that could only have occurred if the Worker has processed the Job.
I intend to iterate on this test to add proper thread cleanup in the future.
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use hello_server_help::Worker;
use std::thread;
use std::time::Duration;
#[test]
fn test_worker_processes_job() {
let (sender, r) = mpsc::channel();
let receiver = Arc::new(Mutex::new(r));
let _ = Worker::new(0, receiver);
let mut state_updated = false;
let f = move || state_updated = true;
sender.send(Box::new(f)).unwrap();
thread::sleep(Duration::from_secs(1)); // primitive wait, for now
assert_eq!(state_updated, true);
}
It's my understanding that f is taking ownership of state_updated. In the assert line, however,
at the end, there is no error along the lines of "referenced after move".
Running the tests gives me the output:
running 1 test
test test_worker_processes_job ... FAILED
failures:
---- test_worker_processes_job stdout ----
thread 'test_worker_processes_job' panicked at 'assertion failed: `(left == right)`
left: `false`,
right: `true`', tests/worker_tests.rs:19:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
The MRE implementation:
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use std::thread;
pub type Job = Box<dyn FnOnce() + Send + 'static>;
pub struct Worker {
id: usize,
handle: Option<thread::JoinHandle<()>>,
}
impl Worker {
pub fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(move || loop {
let job = receiver
.lock()
.expect("Error obtaining lock.")
.recv()
.unwrap();
job();
});
Worker {
id,
handle: Some(thread),
}
}
}
state_updated is a boolean so it implements Copy, which is why you can move it into your closure and keep using it afterwards, and why you can't see the changes: the one that is modified by the closure is the copy and not the original.
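The copy can be observed in isolation, without any threads. A minimal sketch (flags_after_move_closure is a hypothetical helper, not part of the question's code):

```rust
// Hypothetical helper: returns the flag as seen inside the closure and the
// caller's variable after the closure has run.
fn flags_after_move_closure() -> (bool, bool) {
    let mut state_updated = false;
    // `bool` is `Copy`, so `move` copies the value into the closure.
    let mut f = move || {
        state_updated = true; // writes to the closure's own copy
        state_updated
    };
    let seen_by_closure = f();
    // The original variable is still usable and still `false`.
    (seen_by_closure, state_updated)
}

fn main() {
    println!("{:?}", flags_after_move_closure()); // prints "(true, false)"
}
```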
If you want to update a boolean in the thread and have it visible in the caller, you will need to make sure that you send a reference and you will need to have some synchronization mechanism. Two solutions:
Use an Arc<Mutex<bool>>:
use std::sync::Arc;
use std::sync::Mutex;
let state_updated = Arc::new (Mutex::new (false));
let state_ref = state_updated.clone();
let f = move || *state_ref.lock().unwrap() = true;
…
assert_eq!(*state_updated.lock().unwrap(), true);
Or use an AtomicBool:
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering;
let state_updated = AtomicBool::new (false);
let state_ref = &state_updated;
let f = move || state_ref.store (true, Ordering::Release);
…
assert_eq!(state_updated.load (Ordering::Acquire), true);
The compiler will complain that "state_ref does not live long enough", but you can get around that by using scoped threads (for example from rayon or crossbeam), or with a bit of unsafe: let state_ref: &'static AtomicBool = unsafe { transmute (&state_updated) }; (just make sure you join the child thread before state_updated goes out of scope).
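As an illustration of the scoped-thread route, here is a minimal sketch that assumes Rust 1.63 or later, where the standard library provides std::thread::scope. It spawns the thread directly rather than going through the question's Worker, which uses thread::spawn and therefore still requires 'static jobs; set_flag_in_scoped_thread is a hypothetical helper name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Hypothetical helper: sets a flag from a scoped thread and reads it back.
fn set_flag_in_scoped_thread() -> bool {
    let state_updated = AtomicBool::new(false);
    // `thread::scope` joins every thread spawned inside it before returning,
    // so the closure may borrow `state_updated` from this stack frame.
    thread::scope(|s| {
        s.spawn(|| state_updated.store(true, Ordering::Release));
    });
    state_updated.load(Ordering::Acquire)
}

fn main() {
    println!("{}", set_flag_in_scoped_thread()); // prints "true"
}
```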
It might however be better to use a channel for the return value:
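use std::sync::mpsc;
use std::time::Duration;
let (rsend, rrecv) = mpsc::channel();
// A Job must return (), so consume the Result of send inside the closure.
let f = move || rsend.send(()).unwrap();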
…
assert_eq!(rrecv.recv_timeout (Duration::from_secs (1)), Ok(()));
that way you only wait until the result is available (the duration is just a timeout if the thread takes too long to compute the result).

Rust chunks method with owned values?

I'm trying to perform a parallel operation on several chunks of strings at a time, and I'm having an issue with the borrow checker:
(for context, identifiers is a Vec<String> from a CSV file, client is reqwest and target is an Arc<String> that is write once read many)
use futures::{stream, StreamExt};
use std::sync::Arc;
async fn nop(
person_ids: &[String],
target: &str,
url: &str,
) -> String {
let noop = format!("{} {}", target, url);
let noop2 = person_ids.iter().for_each(|f| {f.as_str();});
"Some text".into()
}
#[tokio::main]
async fn main() {
let target = Arc::new(String::from("sometext"));
let url = "http://example.com";
let identifiers = vec!["foo".into(), "bar".into(), "baz".into(), "qux".into(), "quux".into(), "quuz".into(), "corge".into(), "grault".into(), "garply".into(), "waldo".into(), "fred".into(), "plugh".into(), "xyzzy".into()];
let id_sets: Vec<&[String]> = identifiers.chunks(2).collect();
let responses = stream::iter(id_sets)
.map(|person_ids| {
let target = target.clone();
tokio::spawn( async move {
let resptext = nop(person_ids, target.as_str(), url).await;
})
})
.buffer_unordered(2);
responses
.for_each(|b| async { })
.await;
}
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e41c635e99e422fec8fc8a581c28c35e
Given chunks yields a Vec<&[String]>, the compiler complains that identifiers doesn't live long enough because it potentially goes out of scope while the slices are being referenced. Realistically this won't happen because there's an await. Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
There was a similarly asked question that used into_owned() as a solution, but when I try that, rustc complains about the slice size not being known at compile time in the request_user function.
EDIT: Some other questions as well:
Is there a more direct way of using target in each thread without needing Arc? From the moment it is created, it never needs to be modified, just read from. If not, is there a way of pulling it out of the Arc that doesn't require the .as_str() method?
How do you handle multiple error types within the tokio::spawn() block? In the real world use, I'm going to receive quick_xml::Error and reqwest::Error within it. It works fine without tokio spawn for concurrency.
Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
You can chunk a Vec<T> into a Vec<Vec<T>> without cloning by using the itertools crate:
use itertools::Itertools;
fn main() {
let items = vec![
String::from("foo"),
String::from("bar"),
String::from("baz"),
];
let chunked_items: Vec<Vec<String>> = items
.into_iter()
.chunks(2)
.into_iter()
.map(|chunk| chunk.collect())
.collect();
for chunk in chunked_items {
println!("{:?}", chunk);
}
}
["foo", "bar"]
["baz"]
This is based on the answers here.
Your issue here is that id_sets is a Vec of slices that borrow from identifiers. Those borrows are not guaranteed to stay valid for as long as the spawned task runs, because tokio::spawn requires the future (here the async move block) to be 'static.
The solution to the immediate problem is to convert the Vec<&[String]> to a Vec<Vec<String>>.
One way of accomplishing that would be:
let id_sets: Vec<Vec<String>> = identifiers
.chunks(2)
.map(|x: &[String]| x.to_vec())
.collect();
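To show the owned chunks being moved into concurrent work, here is a minimal sketch that uses std::thread::spawn instead of tokio::spawn, so that it runs without tokio or reqwest; both impose a 'static bound on what they run. process_chunk and process_in_chunks are hypothetical names standing in for the question's nop and main.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for the question's `nop`: joins the ids of one chunk.
fn process_chunk(person_ids: &[String], target: &str) -> String {
    format!("{}: {}", target, person_ids.join(","))
}

// Hypothetical helper: splits `identifiers` into owned chunks and processes
// each chunk on its own thread.
fn process_in_chunks(identifiers: Vec<String>, target: Arc<String>) -> Vec<String> {
    // Clone each borrowed slice into an owned Vec so it can be moved.
    let id_sets: Vec<Vec<String>> = identifiers.chunks(2).map(|c| c.to_vec()).collect();
    let handles: Vec<_> = id_sets
        .into_iter()
        .map(|person_ids| {
            let target = Arc::clone(&target);
            // The closure owns `person_ids` and `target`, so it is 'static.
            thread::spawn(move || process_chunk(&person_ids, &target))
        })
        .collect();
    // Joining in spawn order keeps the results in chunk order.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let ids = vec!["foo".to_string(), "bar".to_string(), "baz".to_string()];
    println!("{:?}", process_in_chunks(ids, Arc::new("sometext".to_string())));
}
```

Passing &target where a &str is expected works through deref coercion from &Arc<String>, so an explicit .as_str() call is not required at that call site.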

How to use std::slice::Chunks on Arc<Mutex<Vec<u8>>> properly between threads in Rust?

I'd like to understand why the following does not work properly in Rust.
I'd like to chunk a vector and give every thread a chunk to work on. I tried a combination of Arc and Mutex to get mutually exclusive access to my vec.
This was my first (obvious) attempt:
Declare the vec, chunk it, and send a chunk into each thread. In my understanding it should work because the chunks methods guarantee non-overlapping chunks.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new(vec![0;20]));
let chunk_size = 5;
let mut threads = vec![];
let chunks: Vec<&mut [u8]> = data.lock().unwrap().chunks_mut(chunk_size).collect();
for chunk in chunks.into_iter(){
threads.push(thread::spawn(move || {
inside_thread(chunk)
}));
}
}
fn inside_thread(chunk: &mut [u8]) {
// now do something with chunk
}
The error says data does not live long enough. Silly me: with the chunking I created pointers into the array but passed no Arc reference into the thread.
So I changed a few lines, but it would make no sense because I'd have an unused reference in my thread!?
for i in 0..data.lock().unwrap().len() / 5 {
let ref_to_data = data.clone();
threads.push(thread::spawn(move || {
inside_thread(chunk, ref_to_data)
}));
}
The error still says that data does not live long enough.
The next attempt didn't work either. I thought I could work around it by chunking inside the thread and getting my chunk by an index. But even if it worked, the code wouldn't be very idiomatic Rust :/
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let data = Arc::new(Mutex::new(vec![0;20]));
let chunk_size = 5;
let mut threads = vec![];
for i in 0..data.lock().unwrap().len() / chunk_size {
let ref_to_data = data.clone();
threads.push(thread::spawn(move || {
inside_thread(ref_to_data, i, chunk_size)
}));
}
}
fn inside_thread(data: Arc<Mutex<Vec<u8>>>, index: usize, chunk_size: usize) {
let chunk: &mut [u8] = data.lock().unwrap().chunks_mut(chunk_size).collect()[index];
// now do something with chunk
}
The error says:
--> src/main.rs:18:72
|
18 | let chunk: &mut [u8] = data.lock().unwrap().chunks_mut(chunk_size).collect()[index];
| ^^^^^^^
| |
| cannot infer type for type parameter `B` declared on the method `collect`
| help: consider specifying the type argument in the method call: `collect::<B>`
|
= note: type must be known at this point
And when I try to specify the type, it doesn't work either. I have tried a lot, nothing has worked, and I'm out of ideas.
What I could always do is the chunking on my own: just mutating the vector inside the threads works fine. But in my opinion this is not a nice, idiomatic way to do it.
Question: Is there an idiomatic Rust way to solve this problem?
Hint: I know I could work with a scoped thread pool or something similar, but I want to gain this knowledge for my thesis.
Much thanks for spending your time on this!
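For reference, the attempts above run into the same two constraints: thread::spawn requires its closure to be 'static, and the slices produced by chunks_mut borrow from the MutexGuard, which is a temporary that is dropped at the end of the statement. Scoped threads lift the 'static requirement, so non-overlapping chunks can be handed out without Arc or Mutex. A minimal sketch, assuming Rust 1.63 or later for std::thread::scope (fill_chunks is a hypothetical helper, and writing the chunk index into each byte is only a placeholder for real work):

```rust
use std::thread;

// Hypothetical helper: fills each chunk with its chunk index, using one
// thread per chunk.
fn fill_chunks(data: &mut [u8], chunk_size: usize) {
    thread::scope(|s| {
        for (i, chunk) in data.chunks_mut(chunk_size).enumerate() {
            // Each thread gets exclusive access to one non-overlapping chunk.
            s.spawn(move || {
                for byte in chunk.iter_mut() {
                    *byte = i as u8;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}

fn main() {
    let mut data = vec![0u8; 20];
    fill_chunks(&mut data, 5);
    println!("{:?}", data);
}
```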

How do I share a mutable object between threads using Arc?

I'm trying to share a mutable object between threads in Rust using Arc, but I get this error:
error[E0596]: cannot borrow data in a `&` reference as mutable
--> src/main.rs:11:13
|
11 | shared_stats_clone.add_stats();
| ^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
This is the sample code:
use std::{sync::Arc, thread};
fn main() {
let total_stats = Stats::new();
let shared_stats = Arc::new(total_stats);
let threads = 5;
for _ in 0..threads {
let mut shared_stats_clone = shared_stats.clone();
thread::spawn(move || {
shared_stats_clone.add_stats();
});
}
}
struct Stats {
hello: u32,
}
impl Stats {
pub fn new() -> Stats {
Stats { hello: 0 }
}
pub fn add_stats(&mut self) {
self.hello += 1;
}
}
What can I do?
Arc's documentation says:
Shared references in Rust disallow mutation by default, and Arc is no exception: you cannot generally obtain a mutable reference to something inside an Arc. If you need to mutate through an Arc, use Mutex, RwLock, or one of the Atomic types.
You will likely want a Mutex combined with an Arc:
use std::{
sync::{Arc, Mutex},
thread,
};
struct Stats;
impl Stats {
fn add_stats(&mut self, _other: &Stats) {}
}
fn main() {
let shared_stats = Arc::new(Mutex::new(Stats));
let threads = 5;
for _ in 0..threads {
let my_stats = shared_stats.clone();
thread::spawn(move || {
let mut shared = my_stats.lock().unwrap();
shared.add_stats(&Stats);
});
// Note: Immediately joining, no multithreading happening!
// THIS WAS A LIE, see below
}
}
This is largely cribbed from the Mutex documentation.
How can I use shared_stats after the for? (I'm talking about the Stats object). It seems that the shared_stats cannot be easily converted to Stats.
As of Rust 1.15, it's possible to get the value back. See my additional answer for another solution as well.
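One possible way to get the plain value back, shown here as a sketch rather than as what the linked answer does, is to join every thread and then unwrap the Arc with Arc::try_unwrap and the Mutex with Mutex::into_inner. count_with_threads is a hypothetical helper, and Stats reuses the hello field from the question:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug)]
struct Stats {
    hello: u32,
}

// Hypothetical helper: increments the shared counter from `threads` threads
// and returns the plain `Stats` afterwards.
fn count_with_threads(threads: u32) -> Stats {
    let shared_stats = Arc::new(Mutex::new(Stats { hello: 0 }));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let my_stats = Arc::clone(&shared_stats);
            thread::spawn(move || {
                my_stats.lock().unwrap().hello += 1;
            })
        })
        .collect();
    // After joining, every clone of the Arc held by a thread has been dropped.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    // With a single owner left, unwrap the Arc, then the Mutex.
    Arc::try_unwrap(shared_stats)
        .expect("other Arc clones are still alive")
        .into_inner()
        .expect("mutex was poisoned")
}

fn main() {
    println!("{:?}", count_with_threads(5)); // prints "Stats { hello: 5 }"
}
```

Arc::try_unwrap returns the original Arc as an error if other clones still exist, which is why the threads are joined first.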
[A comment in the example] says that there is no multithreading. Why?
Because I got confused! :-)
In the example code, the result of thread::spawn (a JoinHandle) is immediately dropped because it's not stored anywhere. When the handle is dropped, the thread is detached and may or may not ever finish. I was confusing it with JoinGuard, an old, removed API that joined when it was dropped. Sorry for the confusion!
For a bit of editorial, I suggest avoiding mutability completely:
use std::{ops::Add, thread};
#[derive(Debug)]
struct Stats(u64);
// Implement addition on our type
impl Add for Stats {
type Output = Stats;
fn add(self, other: Stats) -> Stats {
Stats(self.0 + other.0)
}
}
fn main() {
let threads = 5;
// Start threads to do computation
let threads: Vec<_> = (0..threads).map(|_| thread::spawn(|| Stats(4))).collect();
// Join all the threads, fail if any of them failed
let result: Result<Vec<_>, _> = threads.into_iter().map(|t| t.join()).collect();
let result = result.unwrap();
// Add up all the results
let sum = result.into_iter().fold(Stats(0), |sum, i| sum + i);
println!("{:?}", sum);
}
Here, we keep the JoinHandle of every thread and then wait for all the threads to finish. We then collect the results and add them all up. This is the common map-reduce pattern. Note that no thread needs any mutability; it all happens in the master thread.

What do I use to share an object with many threads and one writer in Rust?

What is the right approach to share a common object between many threads when the object may sometimes be written to by one owner?
I tried to create one Configuration trait object with several methods to get and set config keys. I'd like to pass this to other threads where configuration items may be read. Bonus points would be if it can be written and read by everyone.
I found a Reddit thread which talks about Rc and RefCell; would that be the right way? I think these would not allow me to borrow the object immutably multiple times and still mutate it.
Rust has a built-in concurrency primitive exactly for this task called RwLock. Together with Arc, it can be used to implement what you want:
use std::sync::{Arc, RwLock};
use std::sync::mpsc;
use std::thread;
const N: usize = 12;
let shared_data = Arc::new(RwLock::new(Vec::new()));
let (finished_tx, finished_rx) = mpsc::channel();
for i in 0..N {
let shared_data = shared_data.clone();
let finished_tx = finished_tx.clone();
if i % 4 == 0 {
thread::spawn(move || {
let mut guard = shared_data.write().expect("Unable to lock");
guard.push(i);
finished_tx.send(()).expect("Unable to send");
});
} else {
thread::spawn(move || {
let guard = shared_data.read().expect("Unable to lock");
println!("From {}: {:?}", i, *guard);
finished_tx.send(()).expect("Unable to send");
});
}
}
// wait until everything's done
for _ in 0..N {
let _ = finished_rx.recv();
}
println!("Done");
This example is very silly but it demonstrates what RwLock is and how to use it.
Also note that Rc and RefCell/Cell are not appropriate in a multithreaded environment because they are not synchronized properly. Rust won't even allow you to use them at all with thread::spawn(). To share data between threads you must use an Arc, and to share mutable data you must additionally use one of the synchronization primitives like RwLock or Mutex.
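Applied to the configuration use case from the question, the same pattern might look like the following sketch. The Config alias, the "mode" key, and the read_after_write helper are all assumptions for illustration; the question's Configuration trait object is not reproduced here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical configuration type; the key/value layout is an assumption.
type Config = Arc<RwLock<HashMap<String, String>>>;

// Hypothetical helper: one writer sets a key, then several readers read it.
fn read_after_write(readers: usize) -> Vec<Option<String>> {
    let config: Config = Arc::new(RwLock::new(HashMap::new()));

    // The single writer takes the exclusive write lock.
    let writer = {
        let config = Arc::clone(&config);
        thread::spawn(move || {
            let mut guard = config.write().expect("lock poisoned");
            guard.insert("mode".to_string(), "fast".to_string());
        })
    };
    writer.join().expect("writer thread panicked");

    // Any number of readers may hold the read lock at the same time.
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let config = Arc::clone(&config);
            thread::spawn(move || {
                let guard = config.read().expect("lock poisoned");
                guard.get("mode").cloned()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("reader thread panicked"))
        .collect()
}

fn main() {
    println!("{:?}", read_after_write(3));
}
```

The writer is joined before the readers start only to make the result deterministic; without that, a reader could acquire the lock first and see no value.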

Resources