Multithreaded hashmap insertion using tokio and Mutex - multithreading

I need to run parallel calls (two in this example) once and insert the results into the same mutable HashMap defined earlier. Only after all of them have completed (each running once) should the program continue and extract the HashMap from the Mutex<>.
let mut REZN:Mutex<HashMap<u8, (u128, u128)>> = Mutex::new(HashMap::new());
let b=vec![0, 1, 2, (...), 4999, 5000];
let payload0 = &b[0..2500];
let payload1 = &b[2500..5000];
tokio::spawn(async move{
let result_ = //make calls
for (i,j) in izip!(payload0.iter(), result_.iter()){
REZN.lock().unwrap().insert(*i, (j[0], j[1]));
};
});
tokio::spawn(async move{
let result_ = //make calls
for (i,j) in izip!(payload1.iter(), result_.iter()){
REZN.lock().unwrap().insert(*i, (j[0], j[1]));
};
});
I'm just starting with multithreading in Rust. Both the hashmap and the object used to make calls are moved into the spawned thread. I read that cloning should be done and I tried it, but the compiler says:
&mut REZN.lock().unwrap().clone().insert(*i, (j[0], j[1]));
| |---- use occurs due to use in generator
what does that mean? what's a generator in that context?
and
value moved here, in previous iteration of loop errors are abundant.
I don't want it to do more than 1 iteration. How can I put a stop once each is done its job inserting into the HashMap?
Later, I'm trying to escape the lock/extract the Hashmap from inside of Mutex<>:
let mut REZN:HashMap<u8, (u128, u128)> = *REZN.lock().unwrap();
| ^^^^^^^^^^^^^^^^^^^^^
| |
| move occurs because value has type `HashMap<u8, (u128, u128)>`, which does not implement the `Copy` trait
| help: consider borrowing here: `&*REZN.lock().unwrap()`
But if I borrow here errors appear elsewhere. Could this work though if there was no conflict? I read that Mutex is removed automatically when threads are done working on it, but I don't know how that happens exactly on a lower level (if you can reccomend resources I'll be glad to read up on that).
I tried clone() both in the threads and the later attempt of extracting the HashMap, and they fail unfortunately. Am I doing it wrong?
Finally, how can I await until both are completed to proceed further in my program?

what does that mean? what's a generator in that context?
An async block compiles to a generator.
I tried clone() both in the threads and the later attempt of extracting the HashMap, and they fail unfortunately. Am I doing it wrong?
Yes. If you clone inside the thread/tasks, then first the map is moved into the routine then it's cloned when used. That's not helpful, because once the map has been moved it can't be used from the caller anymore.
A common solution to that is the "capture clause pattern", where you use an outer block which can then do the setup for a closure or inner block:
tokio::spawn({
let REZN = REZN.clone();
async move{
let result_ = [[6, 406], [7,407]];//make calls
for (i,j) in izip!(payload0.iter(), result_.iter()){
REZN.lock().unwrap().insert(*i, (j[0], j[1]));
};
}
});
This way only the cloned map will be moved into the closure.
However this is not very useful, or efficient, or convenient: by cloning the map, each task gets its own map (a copy of the original), and you're left with just the unmodified original. This means there's nothing to extract, because in practice it's as if nothing had happened. This also makes the mutex redundant: since each task has its own (copy of the) map, there's no need for synchronisation because there's no sharing.
The solution is to use shared ownership primitives, namely Arc:
let REZN: Arc<Mutex<HashMap<u8, (u128, u128)>>> = Arc::new(Mutex::new(HashMap::new()));
This way you can share the map between all coroutines, and the mutex will synchronise access: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=33ce606b1ab7c2dfc7f4897de69855ef
Alternatively, Rust threads and tasks can return values, so each task could create a map internally, return it after it's done, and the parent can get those maps and merge them:
let task1 = tokio::spawn(async move {
let mut map = HashMap::new();
let result_ = [[6, 406], [7, 407]]; //make calls
for (i, j) in izip!(payload0.iter(), result_.iter()) {
map.insert(*i, (j[0], j[1]));
}
map
});
let task2 = tokio::spawn(async move {
let mut map = HashMap::new();
let result_ = [[6, 106], [7, 907]]; //make calls
for (i, j) in izip!(payload1.iter(), result_.iter()) {
map.insert(*i, (j[0], j[1]));
}
map
});
match tokio::join![task1, task2] {
(Ok(mut m1), Ok(m2)) => {
m1.extend(m2.into_iter());
eprintln!("{:?}", m1);
}
e => eprintln!("Error {:?}", e),
}
This has a higher number of allocations, but there is no synchronisation necessary between the workers.
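A minimal sketch of this return-and-merge approach, using std::thread instead of tokio so that it needs no external crate. The fetch helper is a hypothetical stand-in for "make calls", and the keys are u16 here because values up to 5000 do not fit in the u8 used in the question.

```rust
use std::collections::HashMap;
use std::thread;

type Map = HashMap<u16, (u128, u128)>;

// Hypothetical stand-in for "make calls": derives a value pair from each key.
fn fetch(keys: &[u16]) -> Map {
    keys.iter()
        .map(|&k| (k, (u128::from(k), u128::from(k) * 2)))
        .collect()
}

fn collect_in_parallel(keys: Vec<u16>) -> Map {
    let mid = keys.len() / 2;
    let (left, right) = (keys[..mid].to_vec(), keys[mid..].to_vec());

    // Each worker owns its half of the input and returns its own map.
    let h1 = thread::spawn(move || fetch(&left));
    let h2 = thread::spawn(move || fetch(&right));

    // join() blocks until the worker is done and hands back its return value.
    let mut merged = h1.join().expect("worker 1 panicked");
    merged.extend(h2.join().expect("worker 2 panicked"));
    merged
}
```

Calling collect_in_parallel((0..=5000).collect()) gives one merged map, and no lock is ever taken because nothing is shared while the workers run.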

A mutex will give you safe multithread access, but you also need to share the ownership of the mutex itself between those threads.
If you used scoped threads, you could just use a &Mutex<HashMap<...>>, but if you want or need to use normal tokio spawned tasks, you cannot pass a reference because normal tokio tasks require the callback to be 'static, and a reference to a local variable will not comply. In this case the idiomatic solution is to use an Arc<Mutex<HashMap<...>>>.
let REZN:Mutex<HashMap<u8, (u128, u128)>> = Mutex::new(HashMap::new());
let REZN = Arc::new(REZN);
And then pass a clone of the Arc to the spawned tasks. There are several ways to write that but my favourite currently is this:
let task1 = {
let REZN = Arc::clone(&REZN);
tokio::spawn(async move{
//...
})
};
A little known fact about Arc is that you can extract the inner value using Arc::try_unwrap(), but that will only work if your Arc is the only one pointing to this value. In your case you can ensure that by waiting for (joining) the spawned tasks.
task1.await.unwrap();
task2.await.unwrap();
And then you can unwrap the Arc and the Mutex with this nice looking line:
let REZN = Arc::try_unwrap(REZN).unwrap().into_inner().unwrap();
These four unwraps are for the following:
Arc::try_unwrap(REZN) gets to the inner value of the Arc.
But only if this is the only clone of the Arc so we have a Result that we have to unwrap().
We get a Mutex that we unwrap using into_inner(). Note that we do not lock the mutex to extract the inner value: since into_inner() requires the mutex by value we are sure that it is not borrowed anywhere and we are sure we have exclusive access.
But this can fail too if the mutex is poisoned, so another unwrap() to get the real value. This is not needed if you use tokio::sync::Mutex instead, because it doesn't have poisoning.
You can see the whole thing in this playground.
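Putting the Arc, the joins and the final unwrapping together, here is a sketch under the assumption that plain std threads are acceptable in place of tokio tasks (the shape is the same with tokio::spawn and awaiting the handles). The function name fill_shared_map and the placeholder values it inserts are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

fn fill_shared_map(keys: Vec<u16>) -> HashMap<u16, (u128, u128)> {
    let shared: Arc<Mutex<HashMap<u16, (u128, u128)>>> =
        Arc::new(Mutex::new(HashMap::new()));
    let mid = keys.len() / 2;
    let chunks = vec![keys[..mid].to_vec(), keys[mid..].to_vec()];

    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            // Clone the Arc (a cheap pointer copy), not the map.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for k in chunk {
                    // Placeholder values; real code would use the call results.
                    shared.lock().unwrap().insert(k, (u128::from(k), 0));
                }
            })
        })
        .collect();

    // Wait for every worker; once joined, their Arc clones have been dropped.
    for handle in handles {
        handle.join().unwrap();
    }

    // Only one Arc is left, so try_unwrap succeeds; into_inner needs no lock.
    Arc::try_unwrap(shared)
        .expect("a worker still holds a reference")
        .into_inner()
        .expect("mutex was poisoned")
}
```

The expect calls mark the two places that can fail: another Arc clone still being alive, and a worker having panicked while holding the lock.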

Related

Why did moving variables out of HashMap's insert method scope solve a deadlock?

OB_HM is Arc<Mutex<HashMap<...>>>
This caused the deadlock at very first iteration:
let mut OB_HM__:HashMap<u8, (f64, f64, f64, f64)> = HashMap::new();
for i in &p {
OB_HM__.insert(
*i,
(
OB_HM.lock().unwrap()[i].0, OB_HM.lock().unwrap()[i].1,
R[&i].0,
R[&i].1
)
);
};
let mut OB_HM = OB_HM__;
And this solved it:
let mut OB_HM__:HashMap<u8, (f64, f64, f64, f64)> = HashMap::new();
for i in &p {
let x = OB_HM.lock().unwrap()[i].0;
let y = OB_HM.lock().unwrap()[i].1;
OB_HM__.insert(
*i,
(
x, y,
R[&i].0,
R[&i].1
)
);
};
let mut OB_HM = OB_HM__;
Why so? I had mere intuition but I need to understand the internals behind it. I'm guessing it has to do with how Rust creates tuples or how the method insert works?
And a side question about insert - why does it require ; EOL when it's the only instruction inside a loop?
This caused the deadlock at very first iteration [...]
Why so?
In short, because of how long temporaries live.
Generally[1], OB_HM.lock() creates a new object, a mutex guard.
This is a temporary object, and it implements the Deref trait (allowing autoderef).
Indexing through the guard borrows the inner object (i.e., the HashMap), and a temporary is only dropped at the end of the statement that created it.
Therefore, the first mutex guard is not dropped until the entire insert statement has finished, so the Mutex stays locked. The second lock() in the same statement then waits for a lock that the same thread already holds, and that is a deadlock.
Maybe a toy-example might help to understand: example.
From the above,
foo(mutex.lock().value(), // <--- first lock
mutex.lock().value()); // <--- second lock
produces a deadlock because both locks are released only after the entire expression foo (the function call) has been evaluated.
Indeed the output:
MutexLock! // <--- First lock
MutexLock! // <--- Second lock
> !DEADLOCK! < // <--- Two interleaved locks! Deadlock!
foo: Call!
// <--- Now the function ends
MutexUnlock // <--- And only at the end the mutex are unlocked!
MutexUnlock
Note that, this is independent from tuples.
The "working" solution works because:
let x = OB_HM.lock().unwrap()[i].0;
let y = OB_HM.lock().unwrap()[i].1;
the temporary guard returned by lock() lasts only until the end of its own statement.
So after x has been assigned, the guard is dropped and the lock is released, allowing y to acquire it without deadlocking.
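Since the difference is only about when the guard is dropped, it can be observed without a real mutex. The sketch below uses a hypothetical Guard type that records "lock" when it is created and "unlock" in its Drop impl; none of these names come from the original code, and nothing here actually blocks.

```rust
use std::cell::RefCell;

type Log = RefCell<Vec<&'static str>>;

// Stand-in for a mutex guard: logs its creation and its drop.
struct Guard<'a> {
    log: &'a Log,
}

impl Guard<'_> {
    fn value(&self) -> i32 {
        1
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("unlock");
    }
}

fn lock(log: &Log) -> Guard<'_> {
    log.borrow_mut().push("lock");
    Guard { log }
}

fn add(a: i32, b: i32, log: &Log) -> i32 {
    log.borrow_mut().push("call");
    a + b
}

fn one_statement() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    // Both temporary guards stay alive until the end of this statement.
    let _sum = add(lock(&log).value(), lock(&log).value(), &log);
    log.into_inner()
}

fn separate_statements() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    let x = lock(&log).value(); // guard dropped at this semicolon
    let y = lock(&log).value(); // never overlaps with the first guard
    let _sum = add(x, y, &log);
    log.into_inner()
}
```

one_statement records two "lock" entries before any "unlock", which is exactly the overlap that blocks forever with a real std::sync::Mutex; separate_statements records each "unlock" before the next "lock".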
And a side question about insert - why does it require ; EOL
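Because of Rust syntax.
From here: loop syntax
loop BlockExpression
where BlockExpression is defined as zero or more statements, optionally followed by a final expression.
A final expression without ; becomes the value of the block, and the body of a for loop must have type (). HashMap::insert returns an Option holding the previous value, so the call has to be turned into an expression statement with ; to discard that value.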
[1] It might depend on your Mutex type.
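On the side question about the semicolon, a small sketch of what HashMap::insert evaluates to may help. The function name build and its sample keys are only for illustration.

```rust
use std::collections::HashMap;

fn build(pairs: &[(u8, f64)]) -> (HashMap<u8, f64>, Option<f64>) {
    let mut map = HashMap::new();
    for &(k, v) in pairs {
        // The `;` discards the returned Option<f64>, so the loop body is `()`.
        map.insert(k, v);
    }
    // Used as a value, the same call yields the previous entry for the key.
    let previous = map.insert(0, 0.0);
    (map, previous)
}
```

Dropping the semicolon inside the loop makes the body evaluate to Option<f64>, which the compiler rejects as a type mismatch against ().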

Unable to join threads from JoinHandles stored in a Vector - Rust

I am writing a program which scrapes data from a list of websites and stores it into a struct called Listing which is then collected into a final struct called Listings.
use std::{ thread,
sync::{ Arc, Mutex }
};
fn main() {
// ... some declarations
let sites_count = site_list.len(); // site_list is a vector containing the list of websites
// The variable to be updated by the thread instances ( `Listing` is a struct holding the information )
let listings: Arc<Mutex<Vec<Vec<types::Listing<String>>>>> = Arc::new(Mutex::new(Vec::new()));
// A vector containing all the JoinHandles for the spawned threads
let mut fetch_handle: Vec<thread::JoinHandle<()>> = Vec::new();
// Spawn a thread for each concurrent website
for i in 0..sites_count {
let slist = Arc::clone(&site_list);
let listng = Arc::clone(&listings);
fetch_handle.push(
thread::spawn(move || {
println!("⌛ Spawned Thread: {}",i);
let site_profile = read_profile(&slist[i]);
let results = function1(function(2)); // A long list of functions from a submodule that make the http request and parse the data into `Listing`
listng.lock().unwrap().push(results);
}));
}
for thread in fetch_handle.iter_mut() {
thread.join().unwrap();
}
// This is the one line version of the above for loop - yields the same error.
// fetch_handle.iter().map(|thread| thread.join().unwrap());
// The final println to just test feed the target struct `Listings` with the values
println!("{}",types::Listings{ date_time: format!("{}", chrono::offset::Local::now()),
category: category.to_string(),
query: (&search_query).to_string(),
listings: listings.lock().unwrap() // It prevents me from owning this variable
}.to_json());
}
To which I stumble upon the error
error[E0507]: cannot move out of `*thread` which is behind a mutable reference
--> src/main.rs:112:9
|
112 | thread.join().unwrap();
| ^^^^^^ move occurs because `*thread` has type `JoinHandle<()>`, which does not implement the `Copy` trait
It prevents me from owning the variable after the thread.join() for loop.
When I tried assigning to check the output type
let all_listings = listings.lock().unwrap()
all_listings reports a type of MutexGuard(which is also true inside the thread for loop, but it allows me to call vector methods on it) and wouldn't allow me to own the data.
I changed the data type in the Listings struct to hold a reference instead of owning it. But it seems so the operations I perform on the struct in .to_json() require me to own its value.
The type declaration for listings inside the Listings Struct is Vec<Vec<Listing<T>>>.
This code however works just fine when I move the .join().unwrap() to the end of thread::spawn() block or apply to its handle inside the for loop(whilst disabling the external .join() ). But that makes all the threads execute in a chain which is not desirable, since the main intention of using threads was to execute same functions with different data values simultaneously.
I am quite new to Rust in general (it has been 3 weeks since I started using it) and it's my first time ever implementing multithreading. I have only ever written single-threaded programs in Java and Python before this, so if possible be a little noob friendly. However any help is appreciated :) .
I figured out what needed to happen. First, for this kind of thing, I agree that into_iter does what you want, but IMO it obscures why. The why is that when you iterate by reference you only borrow each handle and don't own it, and ownership is necessary for the join() method on the JoinHandle<()> struct. You'll note its signature takes self and not &mut self or anything like that. So it needs the real object there.
To do that, you need to get your object out of the Vec<thread::JoinHandle<()>> that it's inside. As stated, into_iter does this, because it "destroys" the existing Vec and takes it over, so it fully owns the contents, and the iteration returns the "actual" objects to be joined without a copy. But you can also own the contents one at a time with remove as demonstrated below:
while fetch_handle.len() > 0 {
let cur_thread = fetch_handle.remove(0); // moves it into cur_thread
cur_thread.join().unwrap();
}
This is instead of your for loop above. The complete example in the playground is linked if you want to try that.
I hope this is clearer on how to work with things that can't be copied, but methods need to fully own them, and the issues in getting them out of collections. Imagine if you needed to end just one of those threads, and you knew which one to end, but didn't want to end them all? Vec<_>::remove would work, but into_iter would not.
Thank you for asking a question which made me think, and prompted me to go look up the answer (and try it) myself. I'm still learning Rust as well, so this helped a lot.
Edit:
Another way to do it with pop() and while let:
while let Some(cur_thread) = fetch_handle.pop() {
cur_thread.join().unwrap();
}
This goes through the handles from the end (pop takes from the back, not the front), and unlike remove(0) it does not have to shift the remaining elements forward on every call.
Okay so the problem as pointed out by @PiRocks seems to be in the for loop that joins the threads.
for thread in fetch_handle.iter_mut() {
thread.join().unwrap();
}
The problem is the iter_mut(). Using into_iter() instead
for thread in fetch_handle.into_iter() {
thread.join().unwrap();
}
yields no errors and the program runs across the threads simultaneously as required.
The explanation to this, as given by @Kevin Anderson is:
Using into_iter() causes JoinHandle<()> to move into the for loop.
Also looking into the docs(std::iter)
I found that iter() and iter_mut() iterate over a reference of self whereas into_iter() iterates over self directly(owning it).
So iter_mut() was iterating over &mut thread::JoinHandle<()> instead of thread::JoinHandle<()>.
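For the second half of the question (owning the collected data after the joins), one option is to swap the Vec out of the mutex guard with std::mem::take. This is a sketch only: scrape is a hypothetical stand-in for the real request and parsing functions, and Vec<String> stands in for the Listing type.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the per-site scraping work.
fn scrape(site: &str) -> Vec<String> {
    vec![format!("listing from {}", site)]
}

fn scrape_all(sites: Vec<String>) -> Vec<Vec<String>> {
    let sites = Arc::new(sites);
    let listings: Arc<Mutex<Vec<Vec<String>>>> = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<thread::JoinHandle<()>> = (0..sites.len())
        .map(|i| {
            let sites = Arc::clone(&sites);
            let listings = Arc::clone(&listings);
            thread::spawn(move || {
                let results = scrape(&sites[i]);
                listings.lock().unwrap().push(results);
            })
        })
        .collect();

    // Iterating the Vec by value yields owned JoinHandles, which join(self) needs.
    for handle in handles {
        handle.join().unwrap();
    }

    // Swap the filled Vec out of the guard, leaving an empty Vec behind.
    let mut guard = listings.lock().unwrap();
    std::mem::take(&mut *guard)
}
```

The threads push in whatever order they finish, so the outer Vec is not guaranteed to follow the order of sites.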

Why I get "temporary value dropped while borrowed" if I assign, but not when passing via function?

I am quite fresh with Rust. I have experience mainly in C and C++.
This code from lol_html crate example works.
use lol_html::{element, HtmlRewriter, Settings};
let mut output = vec![];
{
let mut rewriter = HtmlRewriter::try_new(
Settings {
element_content_handlers: vec![
// Rewrite insecure hyperlinks
element!("a[href]", |el| {
let href = el
.get_attribute("href")
.unwrap()
.replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
})
],
..Settings::default()
},
|c: &[u8]| output.extend_from_slice(c)
).unwrap();
rewriter.write(b"<div><a href=").unwrap();
rewriter.write(b"http://example.com>").unwrap();
rewriter.write(b"</a></div>").unwrap();
rewriter.end().unwrap();
}
assert_eq!(
String::from_utf8(output).unwrap(),
r#"<div><a href="https://example.com"></a></div>"#
);
But if I move element_content_handlers vec outside and assign it, I get
temporary value dropped while borrowed
for the let line:
use lol_html::{element, HtmlRewriter, Settings};
let mut output = vec![];
{
let handlers = vec![
// Rewrite insecure hyperlinks
element!("a[href]", |el| {
let href = el
.get_attribute("href")
.unwrap()
.replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
}) // this element is deemed temporary
];
let mut rewriter = HtmlRewriter::try_new(
Settings {
element_content_handlers: handlers,
..Settings::default()
},
|c: &[u8]| output.extend_from_slice(c)
).unwrap();
rewriter.write(b"<div><a href=").unwrap();
rewriter.write(b"http://example.com>").unwrap();
rewriter.write(b"</a></div>").unwrap();
rewriter.end().unwrap();
}
assert_eq!(
String::from_utf8(output).unwrap(),
r#"<div><a href="https://example.com"></a></div>"#
);
I think that the method takes ownership of the vector, but I don't understand why it does not work with a simple assignment. I don't want to declare every element with let first. I expect that there is a simple idiom to make it own all elements.
EDIT:
Compiler proposed to bind the element before the line, but what if I have a lot of elements? I would like to avoid naming 50 elements for example. Is there a way to do this without binding all the elements? Also why the lifetime of the temporary ends there inside of vec! invocation in case of a let binding, but not when I put the vec! inside newly constructed struct passed to a method? The last question is very important to me.
When I first tried to reproduce your issue, I got that try_new didn't exist. It's been removed in the latest version of lol_html. Replacing it with new, your issue didn't reproduce. I was able to reproduce with v0.2.0, though. Since the issue had to do with code generated by macros, I tried cargo expand (something you need to install, see here).
Here's what let handlers = ... expanded to in v0.2.0:
let handlers = <[_]>::into_vec(box [(
&"a[href]".parse::<::lol_html::Selector>().unwrap(),
::lol_html::ElementContentHandlers::default().element(|el| {
let href = el.get_attribute("href").unwrap().replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
}),
)]);
and here's what it expands to in v0.3.0
let handlers = <[_]>::into_vec(box [(
::std::borrow::Cow::Owned("a[href]".parse::<::lol_html::Selector>().unwrap()),
::lol_html::ElementContentHandlers::default().element(|el| {
let href = el.get_attribute("href").unwrap().replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
}),
)]);
Ignore the first line, it's how the macro vec! expands. The second line shows the difference in what the versions generate. The first takes a borrow of the result of parse, the second takes a Cow::Owned of it. (Cow stands for copy on write, but it's more generally useful for anything where you want to be generic over either the borrowed or owned version of something.).
So the short answer is the macro used to expand to something that wasn't owned, and now it does. As for why it worked without a separate assignment, that's because Rust automatically created a temporary variable for you.
When using a value expression in most place expression contexts, a temporary unnamed memory location is created initialized to that value and the expression evaluates to that location instead, except if promoted to a static
https://doc.rust-lang.org/reference/expressions.html#tempora...
Initially rust created multiple temporaries for you, all valid for the same-ish scope, the scope of the call to try_new. When you break out the vector to its own assignment the temporary created for element! is only valid for the scope of the vector assignment.
I took a look at the git blame for the element! macro in lol_html, and they made the change because someone opened an issue with essentially your problem. So I'd say this is a bug in a leaky abstraction, not an issue with your understanding of rust.
You are creating a temporary value inside the vector (the result of element!). This means that the value created inside vec![] only exists for that fleeting lifetime inside of vec![]. At the end of the vec![] statement, the value is freed, meaning that it no longer exists:
let handlers = vec![
______
|
| element!("a[href]", |el| {
| let href = el.get_attribute("href").unwrap().replace("http:", "https:");
| el.set_attribute("href", &href).unwrap();
| Ok(())
| }),
|______ ^ This value is temporary
]; > the element is freed here, it no longer exists!
You then try to create a HtmlRewriter using a non-existent value!
Settings {
element_content_handlers: handlers,
// the element inside of `handlers` doesn't exist anymore!
..Settings::default()
},
Obviously, the borrow checker catches this issue, and your code doesn't compile.
The solution here is to bind that element to a variable with let:
let element = element!("a[href]", |el| {
let href = el.get_attribute("href").unwrap().replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
});
And then create the vector:
let handlers = vec![element];
Now, the value is bound to a variable (element), and so it lives long enough to be borrowed later in HtmlRewriter::try_new
When you create something, it gets bound to the innermost scope possible for the purposes of tracking its lifetime. Using a let binding at a higher scope binds the value to that scope, making its lifetime longer. If you're creating a lot of things, then applying an operation to them (for example, passing them to another function), it often makes sense to create a vector of values and then apply a transformation to them instead. As an example,
let xs: Vec<_> = (0..10).map(|n| SomeStruct { n }).map(|s| another_function(s)).collect();
This way you don't need to bind the SomeStruct objects to anything explicitly.
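The let-versus-argument difference can be reproduced without lol_html. The sketch below assumes a hypothetical Handler type that only borrows its selector, roughly like the borrowed pair the v0.2.0 macro produced. The rejected form is left as a comment, since it is the one that reported E0716 in the versions discussed here.

```rust
// Hypothetical handler that borrows its selector instead of owning it.
struct Handler<'a> {
    selector: &'a String,
}

fn total_len(handlers: Vec<Handler<'_>>) -> usize {
    handlers.iter().map(|h| h.selector.len()).sum()
}

fn inline() -> usize {
    // The temporary String lives until the end of this whole statement,
    // so it outlives the call that consumes the vector.
    let n = total_len(vec![Handler { selector: &String::from("a[href]") }]);
    n
}

fn bound_first() -> usize {
    // Binding the owner with `let` keeps it alive until the end of the block.
    let selector = String::from("a[href]");
    let handlers = vec![Handler { selector: &selector }];
    total_len(handlers)
}

// Rejected form: the String is dropped at the end of the `let` statement,
// while `handlers` still borrows it.
//
// let handlers = vec![Handler { selector: &String::from("a[href]") }];
// total_len(handlers); // error[E0716]: temporary value dropped while borrowed
```

To avoid naming many elements, the owners can also be kept in one collection (for example a Vec<String> of selectors) and the handlers built by iterating over it.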

How to spawn threads from a vector of structs to run an impl fn from each struct in the vector? [duplicate]

I have a struct which contains a Vec of instances of another base class of struct. I am trying to iterate over the Vec and spawn threads which each run a single impl fn from the base struct. Nothing needs mutable access at any time after the iteration of thread spawning begins; just some basic math returning an f64 (based on values in a HashMap using keys stored in a fixed Vec in each base struct).
I am running into lifetime issues which I don't fully understand and which the compiler error messages (for once) don't help with.
Here is a stripped down version of what I want to implement (with some annotation of the errors encountered):
struct BaseStruct {
non_copy_field: Vec<&'static str>, // BaseStruct has vector members (thus can't implement Copy).
}
impl BaseStruct {
fn target_function(&self, value_dict: &HashMap<&'static str, f64>) -> f64 {
// Does some calculations, returns the result.
// Uses self.non_copy_field to get values from value_dict.
0.0
}
}
struct StructSet {
values: HashMap<&'static str, f64>, // This will be set prior to passing to base_struct.target_function().
all_structs: Vec<BaseStruct>, // Vector to be iterated over.
}
impl StructSet {
fn parallel_calculation(&self) -> f64 {
let mut result = 0.0;
let handles: Vec<_> = self.all_structs.iter().map(|base_struct| {
// Complains about lifetime here ^^^^^ or ^ here if I switch to &base_struct
thread::spawn(move || {
base_struct.target_function(&self.values)
})
}).collect();
for process in handles.iter() {
result += process.join().unwrap();
};
// Shouldn't all base_structs from self.all_structs.iter() be processed by this point?
result
} // Why does it say "...so that reference does not outlive borrowed content" here?
}
I have been trying various combinations of RwLock/Arc/Mutex wrapping the contents of the fields of StructSet to attempt to gain thread-safe, read-only access to each of the elements iterated/passed, but nothing seems to work. I'm looking to keep the codebase light, but I guess I'd consider rayon or similar as I'll need to follow this same process multiple places in the full module.
Can anyone point me in the correct direction?
Rust's lifetime system needs to know that if you borrow something (i.e. have a reference to it), that underlying value will exist for the whole time that you borrow it. For regular function calls, this is easy, but threads cause problems here - a thread you start may outlive the function you start it from. So you can't borrow values into a thread, you have to move values into a thread (which is why you have to write thread::spawn(move || { not just thread::spawn(|| {).
When you call .iter() on a Vec, the values the iterator produces are references to the values in the Vec - they're being borrowed. So you can't use them as-is from another thread. You need to move some owned value into the thread.
There are a few ways you can go about this:
If you don't need the Vec after your processing, you could switch to use .into_iter() rather than .iter(). This will iterate over the owned values in the Vec, rather than borrowing them, which means you can move them into the threads. But because the Vec is giving up ownership of the items, your Vec stops being usable after that.
If you do need your Vec after, and your values are clonable (i.e. they implement the Clone trait), you could call .iter().cloned() instead of .iter() - this will make copies of each of the values, which you can then move into the thread.
If you need the Vec afterwards, and either your values aren't clonable, or you don't want to clone them (maybe because it's expensive, or because it matters to you that both threads are using the exact same object), you can have your Vec store Arc<BaseStruct> instead of just BaseStruct - but that's not quite enough - you'll also need to explicitly clone the values before moving them into the thread (perhaps by using .iter().cloned() like above).
A downside of using Arcs is that no thread will be able to modify the values afterwards. If you want to use Arcs but still be able to modify the values in the future, then instead of storing an Arc<BaseStruct> you'll need to store an Arc<Mutex<BaseStruct>> (or an Arc<RwLock<BaseStruct>>). The Mutex ensures that only one thread can be modifying (or indeed reading) the value at a time, and the Arc allows for the cloning (so you can move a copy into the other thread).
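As a rough illustration of the Arc option with read-only data, the sketch below stores Arc<BaseStruct> values and clones the Arcs with .iter().cloned(). The body of target_function (summing the looked-up values) and the field name keys are assumptions made for the example, not the original calculation.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

struct BaseStruct {
    keys: Vec<&'static str>,
}

impl BaseStruct {
    // Assumed calculation: sum the values found for this struct's keys.
    fn target_function(&self, values: &HashMap<&'static str, f64>) -> f64 {
        self.keys.iter().filter_map(|k| values.get(k)).sum()
    }
}

fn parallel_sum(
    all: &[Arc<BaseStruct>],
    values: &Arc<HashMap<&'static str, f64>>,
) -> f64 {
    let handles: Vec<_> = all
        .iter()
        .cloned() // clones each Arc (a reference-count bump), not the struct
        .map(|base| {
            let values = Arc::clone(values);
            // Both Arcs are owned by the closure, so it is 'static.
            thread::spawn(move || base.target_function(&values))
        })
        .collect();

    // Consume the handles by value so that join(self) can be called.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Because nothing is mutated, no Mutex or RwLock is needed here; the Arc alone satisfies the 'static requirement of thread::spawn.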
I found the issue. Specifically it was the use of self.all_structs.iter().map(): the closure passed to thread::spawn must be 'static, and the references produced by iter() (and &self) are not.
I switched to the following and it's working now:
fn parallel_calculation(&self) -> f64 {
let mut handles: Vec<_> = Vec::new();
for base_struct in &self.all_structs {
let for_solve = self.for_solve.clone();
let base_struct = base_struct.clone();
handles.push(thread::spawn(move || {
let for_solve = for_solve.read().unwrap();
base_struct.target_function(&for_solve)
}));
};
let mut result = 0.0;
for process in handles { result += process.join().unwrap(); };
return result
}
With Arcs and an RwLock in the main struct as follows:
pub struct StructSet {
all_structs: Vec<Arc<BaseStruct>>,
for_solve: Arc<RwLock<HashMap<&'static str, f64>>>,
}

Why doesn't this compile - use of undeclared type name `thread::scoped`

I'm trying to get my head around Rust. I've got an alpha version of 1.
Here's the problem I'm trying to program: I have a vector of floats. I want to set up some threads asynchronously. Each thread should wait for the number of seconds specified by each element of the vector, and return the value of the element, plus 10. The results need to be in input order.
It's an artificial example, to be sure, but I wanted to see if I could implement something simple before moving onto more complex code. Here is my code so far:
use std::thread;
use std::old_io::timer;
use std::time::duration::Duration;
fn main() {
let mut vin = vec![1.4f64, 1.2f64, 1.5f64];
let mut guards: Vec<thread::scoped> = Vec::with_capacity(3);
let mut answers: Vec<f64> = Vec::with_capacity(3);
for i in 0..3 {
guards[i] = thread::scoped( move || {
let ms = (1000.0f64 * vin[i]) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", vin[i]);
answers[i] = 10.0f64 + (vin[i] as f64);
})};
for i in 0..3 {guards[i].join(); };
for i in 0..3 {println!("{}", vin[i]); }
}
So the input vector is [1.4, 1.2, 1.5], and I'm expecting the output vector to be [11.4, 11.2, 11.5].
There appear to be a number of problems with my code, but the first one is that I get a compilation error:
threads.rs:7:25: 7:39 error: use of undeclared type name `thread::scoped`
threads.rs:7 let mut guards: Vec<thread::scoped> = Vec::with_capacity(3);
^~~~~~~~~~~~~~
error: aborting due to previous error
There also seem to be a number of other problems, including using vin within a closure. Also, I have no idea what move does, other than the fact that every example I've seen seems to use it.
Your error is due to the fact that thread::scoped is a function, not a type. What you want is a Vec<T> where T is the result type of the function. Rust has a neat feature that helps you here: It automatically detects the correct type of your variables in many situations.
If you use
let mut guards = Vec::with_capacity(3);
the type of guards will be chosen when you use .push() the first time.
There also seem to be a number of other problems.
1. You are accessing guards[i] in the first for loop, but the length of the guards vector is 0. Its capacity is 3, which means that you won't have any unnecessary allocations as long as the vector never contains more than 3 elements. Use guards.push(x) instead of guards[i] = x.
2. thread::scoped expects a closure of type FnOnce() -> T, so your closure can return an object. You get that object when you call .join(), so you don't need an answer-vector.
3. vin is moved to the closure. Therefore in the second iteration of the loop that creates your guards, vin isn't available anymore to be moved to the "second" closure. Every loop iteration creates a new closure.
4. i is moved to the closure. I have no idea what's going on there. But the solution is to let inval = vin[i]; outside the closure, and then use inval inside the closure. This also solves Point 3.
5. vin is mutable. Yet you never mutate it. Don't bind variables mutably if you don't need to.
6. vin is an array of f64. Therefore (vin[i] as f64) does nothing. Therefore you can simply use vin[i] directly.
7. join moves out of the guard. Since you cannot move out of an array, you cannot index into an array of guards and join the element at the specified index. What you can do is loop over the elements of the array and join each guard.
Basically this means: don't iterate over indices (for i in 0..3), but iterate over elements (for element in vector) whenever possible.
All of the above implemented:
use std::thread;
use std::old_io::timer;
use std::time::duration::Duration;
fn main() {
let vin = vec![1.4f64, 1.2f64, 1.5f64];
let mut guards = Vec::with_capacity(3);
for inval in vin {
guards.push(thread::scoped( move || {
let ms = (1000.0f64 * inval) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", inval);
10.0f64 + inval
}));
}
for guard in guards {
let answer = guard.join();
println!("{}", answer);
};
}
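The code above targets a pre-1.0 alpha: thread::scoped and std::old_io were removed before Rust 1.0 was released. A sketch of the same idea on current Rust, assuming version 1.63 or later for std::thread::scope, could look like the following. The function name delayed_add is made up, and the delay is scaled down to milliseconds so that the example finishes quickly.

```rust
use std::thread;
use std::time::Duration;

fn delayed_add(vin: &[f64]) -> Vec<f64> {
    thread::scope(|s| {
        // One scoped thread per element; each gets its own copy of the value.
        let handles: Vec<_> = vin
            .iter()
            .map(|&inval| {
                s.spawn(move || {
                    // Milliseconds instead of seconds, to keep the sketch quick.
                    thread::sleep(Duration::from_millis((inval * 10.0) as u64));
                    10.0 + inval
                })
            })
            .collect();

        // Joining in spawn order keeps the answers in input order.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<f64>>()
    })
}
```

The threads still run concurrently; only the collection of results is sequential, which is what keeps the output aligned with the input vector.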
To supplement Ker's answer: if you really need to mutate arrays within a thread, I suppose the closest valid solution for your task will be something like this:
use std::thread::spawn;
use std::old_io::timer;
use std::sync::{Arc, Mutex};
use std::time::duration::Duration;
fn main() {
let vin = Arc::new(vec![1.4f64, 1.2f64, 1.5f64]);
let answers = Arc::new(Mutex::new(vec![0f64, 0f64, 0f64]));
let mut workers = Vec::new();
for i in 0..3 {
let worker_vin = vin.clone();
let worker_answers = answers.clone();
let worker = spawn( move || {
let ms = (1000.0f64 * worker_vin[i]) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", worker_vin[i]);
let mut answers = worker_answers.lock().unwrap();
answers[i] = 10.0f64 + (worker_vin[i] as f64);
});
workers.push(worker);
}
for worker in workers { worker.join().unwrap(); }
for answer in answers.lock().unwrap().iter() {
println!("{}", answer);
}
}
To share vectors between several threads, I have to prove that these vectors outlive all of my threads. I cannot use a plain Vec, because it will be destroyed at the end of the main block, and another thread could live longer, possibly accessing freed memory. So I used Arc, a reference-counted pointer, which guarantees that my vectors will be destroyed only when the counter drops to zero.
Arc only allows me to share read-only data. To mutate the answers array, I need a synchronization tool such as Mutex. That is how Rust prevents me from creating data races.
