Split/gather pattern for jobs - multithreading

I have a set of jobs that I am trying to run in parallel. I want to run each task on its own thread and gather the responses on the calling thread.
Some jobs may take much longer than others, so I'd like to start using each result as it comes in, and not have to wait for all jobs to complete.
Here is an attempt:
struct Container<T> {
    items: Vec<T>
}

#[derive(Debug)]
struct Item {
    x: i32
}

impl Item {
    fn foo(&mut self) {
        self.x += 1; // consider an expensive mutating computation
    }
}

fn main() {
    use std;
    use std::sync::{Mutex, Arc};
    use std::collections::RingBuf;

    // set up a container with 2 items
    let mut item1 = Item { x: 0 };
    let mut item2 = Item { x: 1 };
    let container = Container { items: vec![item1, item2] };

    // set a gather system for our results
    let ringBuf = Arc::new(Mutex::new(RingBuf::<Item>::new()));

    // farm out each job to its own thread...
    for item in container.items {
        std::thread::Thread::spawn(|| {
            item.foo(); // job
            ringBuf.lock().unwrap().push_back(item); // push item back to caller
        });
    }

    loop {
        let rb = ringBuf.lock().unwrap();
        if rb.len() > 0 { // gather results as soon as they are available
            println!("{:?}", rb[0]);
            rb.pop_front();
        }
    }
}
For starters, this does not compile due to the impenetrable "cannot infer an appropriate lifetime due to conflicting requirements" error.
What am I doing wrong and how do I do it right?

You've got a couple of compounding issues, but the first one is a misuse / misunderstanding of Arc. You need to give each thread its own clone of the Arc: cloning just hands out another reference-counted handle to the same value, and the Mutex inside is what synchronizes the changes. The main changes were the addition of .clone() and the move keyword:
for item in container.items {
    let mrb = ringBuf.clone();
    std::thread::Thread::spawn(move || {
        item.foo(); // job
        mrb.lock().unwrap().push_back(item); // push item back to caller
    });
}
After changing this, you'll run into some simpler errors about forgotten mut qualifiers, and then you hit another problem: you are trying to send mutable references across threads. Your for loop would need to yield &mut Item to call foo, but that doesn't match your Vec. Changing it, we can get to something that compiles:
for mut item in container.items.into_iter() {
    let mrb = ringBuf.clone();
    std::thread::Thread::spawn(move || {
        item.foo(); // job
        mrb.lock().unwrap().push_back(item); // push item back to caller
    });
}
Here, we consume the input vector, moving each of the Items to the worker thread. Unfortunately, this still hits the Playpen timeout: the gather loop in main never breaks, so the program can never terminate on its own.
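If you do want to keep the shared-queue approach, the gather loop also needs an exit condition. Here's a rough sketch of that shape on current Rust, where VecDeque has replaced RingBuf and thread::spawn has replaced Thread::spawn; it still busy-waits, so treat it as illustration only:
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug)]
struct Item {
    x: i32,
}

impl Item {
    fn foo(&mut self) {
        self.x += 1;
    }
}

fn main() {
    let items = vec![Item { x: 0 }, Item { x: 1 }];
    let expected = items.len();
    let shared = Arc::new(Mutex::new(VecDeque::new()));

    for mut item in items {
        let shared = shared.clone();
        thread::spawn(move || {
            item.foo(); // job
            shared.lock().unwrap().push_back(item); // push item back to caller
        });
    }

    // Gather exactly `expected` results, then stop; without a bound the
    // loop would spin forever and the program would never exit.
    let mut received = 0;
    while received < expected {
        if let Some(item) = shared.lock().unwrap().pop_front() {
            println!("{:?}", item);
            received += 1;
        }
    }
}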
All that being said, I'd highly recommend using channels:
#![feature(std_misc)]
use std::sync::mpsc::channel;

#[derive(Debug)]
struct Item {
    x: i32
}

impl Item {
    fn foo(&mut self) { self.x += 1; }
}

fn main() {
    let items = vec![Item { x: 0 }, Item { x: 1 }];

    let rx = {
        let (tx, rx) = channel();
        for item in items.into_iter() {
            let my_tx = tx.clone();
            std::thread::Thread::spawn(move || {
                let mut item = item;
                item.foo();
                my_tx.send(item).unwrap();
            });
        }
        rx
    };

    for item in rx.iter() {
        println!("{:?}", item);
    }
}
This also times out in the Playpen, but works fine when compiled and run locally.
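For reference, on current stable Rust the channel version no longer needs the feature gate, and std::thread::spawn has replaced Thread::spawn; a sketch of roughly the same program:
use std::sync::mpsc::channel;
use std::thread;

#[derive(Debug)]
struct Item {
    x: i32,
}

impl Item {
    fn foo(&mut self) {
        self.x += 1;
    }
}

fn main() {
    let items = vec![Item { x: 0 }, Item { x: 1 }];
    let (tx, rx) = channel();

    for mut item in items {
        let my_tx = tx.clone();
        thread::spawn(move || {
            item.foo();
            my_tx.send(item).unwrap();
        });
    }

    // Drop the original sender so the receiver's iterator ends once
    // every worker's clone has been dropped.
    drop(tx);

    for item in rx {
        println!("{:?}", item);
    }
}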

Related

Channels for passing hashmap between threads | stuck in loop | Rust

I am solving a problem for the website Exercism in Rust, where I basically try to concurrently count how many times different letters occur in some text. I am doing this by passing hashmaps between threads, and somehow I end up in some kind of infinite loop. I think the issue is in my handling of the receiver, but I really don't know. Please help.
use std::collections::HashMap;
use std::thread;
use std::sync::mpsc;
use std::str;

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    // Empty case
    if input.is_empty() {
        return HashMap::new();
    }
    // Flatten input, set workload for each thread, create hashmap to catch results
    let mut flat_input = input.join("");
    let workload = input.len() / worker_count;
    let mut final_map: HashMap<char, usize> = HashMap::new();
    let (tx, rx) = mpsc::channel();
    for _i in 0..worker_count {
        let task = flat_input.split_off(flat_input.len() - workload);
        let tx_clone = mpsc::Sender::clone(&tx);
        // Separate threads ---------------------------------------------
        thread::spawn(move || {
            let mut partial_map: HashMap<char, usize> = HashMap::new();
            for letter in task.chars() {
                match partial_map.remove(&letter) {
                    Some(count) => {
                        partial_map.insert(letter, count + 1);
                    },
                    None => {
                        partial_map.insert(letter, 1);
                    }
                }
            }
            tx_clone.send(partial_map).expect("Didn't work fool");
        });
        // --------------------------------------------------
    }
    // iterate through the returned hashmaps to update the final map
    for received in rx {
        for (key, value) in received {
            match final_map.remove(&key) {
                Some(count) => {
                    final_map.insert(key, count + value);
                },
                None => {
                    final_map.insert(key, value);
                }
            }
        }
    }
    return final_map;
}
Iterating on the receiver rx blocks waiting for new messages as long as any sender still exists. The clones you've moved into the threads go out of scope when the threads finish, but the original sender tx is still in scope.
You can force tx out of scope by dropping it manually:
for _i in 0..worker_count {
    ...
}
std::mem::drop(tx); // <--------
for received in rx {
    ...
}
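Since frequency knows exactly how many maps to expect (one per worker), another option is to take a fixed number of messages instead of waiting for every sender to be dropped. A minimal standalone sketch of that pattern (not your exact function):
use std::sync::mpsc;
use std::thread;

fn main() {
    let worker_count = 4;
    let (tx, rx) = mpsc::channel();

    for i in 0..worker_count {
        let tx = tx.clone();
        thread::spawn(move || tx.send(i).unwrap());
    }

    // Each worker sends exactly one message, so stop after worker_count
    // messages even though the original `tx` is still alive.
    for received in rx.iter().take(worker_count) {
        println!("got {}", received);
    }
}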

How to let a struct hold a thread and destroy the thread as soon as it goes out of scope

struct ThreadHolder {
    state: ???
    thread: ???
}

impl ThreadHolder {
    fn launch(&mut self) {
        self.thread = ???
        // in the thread, change self.state
    }
}

#[test]
fn test() {
    let mut th = ThreadHolder { ... };
    th.launch();
    // the thread will be destroyed as soon as th goes out of scope
}
I think this has something to do with lifetimes, but I don't know how to write it.
What you want here is simple enough that it doesn't even need to be mutable in any way, and then it becomes trivial to share across threads, unless you want to reset it. You said you need to leave a thread running for one reason or another, so I'll assume you don't care about that.
You can instead poll it every tick (most games run in ticks, so I don't think there will be any issue implementing that).
I'll provide an example that uses sleep, so it's not the most accurate thing (it's painfully obvious on the last subsecond of the duration), but I'm not trying to do your work for you anyway; there are enough resources on the internet that can help you deal with it.
Here it goes:
use std::{
    sync::Arc,
    thread::{self, Result},
    time::{Duration, Instant},
};

struct Timer {
    end: Instant,
}

impl Timer {
    fn new(duration: Duration) -> Self {
        // this code is valid for now, but might break in the future
        // future so distant, that you really don't need to care unless
        // you let your players draw for eternity
        let end = Instant::now().checked_add(duration).unwrap();
        Timer { end }
    }

    fn left(&self) -> Duration {
        self.end.saturating_duration_since(Instant::now())
    }

    // more usable than above with fractional value being accounted for
    fn secs_left(&self) -> u64 {
        let span = self.left();
        span.as_secs() + if span.subsec_millis() > 0 { 1 } else { 0 }
    }
}

fn main() -> Result<()> {
    let timer = Timer::new(Duration::from_secs(10));
    let timer_main = Arc::new(timer);
    let timer = timer_main.clone();

    let t = thread::spawn(move || loop {
        let seconds_left = timer.secs_left();
        println!("[Worker] Seconds left: {}", seconds_left);
        if seconds_left == 0 {
            break;
        }
        thread::sleep(Duration::from_secs(1));
    });

    loop {
        let seconds_left = timer_main.secs_left();
        println!("[Main] Seconds left: {}", seconds_left);
        if seconds_left == 5 {
            println!("[Main] 5 seconds left, waiting for worker thread to finish work.");
            break;
        }
        thread::sleep(Duration::from_secs(1));
    }

    t.join()?;
    println!("[Main] worker thread finished work, shutting down!");
    Ok(())
}
By the way, this kind of implementation wouldn't be any different in any other language, so please don't blame Rust for it. It's not the easiest language, but it provides more than enough tools to build anything you want from scratch as long as you put effort into it.
Good luck :)
I think I got it to work:
use std::sync::{Arc, Mutex};
use std::thread::{sleep, spawn, JoinHandle};
use std::time::Duration;

struct Timer {
    pub(crate) time: Arc<Mutex<u32>>,
    jh_ticker: Option<JoinHandle<()>>,
}

impl Timer {
    fn new<T>(i: T, duration: Duration) -> Self
    where
        T: Iterator<Item = u32> + Send + 'static,
    {
        let time = Arc::new(Mutex::new(0));
        let arc_time = time.clone();
        let jh_ticker = Some(spawn(move || {
            for item in i {
                let mut mg = arc_time.lock().unwrap();
                *mg = item;
                drop(mg); // needed, otherwise this thread will always hold the lock
                sleep(duration);
            }
        }));
        Timer { time, jh_ticker }
    }
}

impl Drop for Timer {
    fn drop(&mut self) {
        // join the ticker thread; ignore its result since we can't propagate it from drop
        let _ = self.jh_ticker.take().unwrap().join();
    }
}

#[test]
fn test_timer() {
    let t = Timer::new(0..=10, Duration::from_secs(1));
    let a = t.time.clone();
    for _ in 0..100 {
        let b = *a.lock().unwrap();
        println!("{}", b);
        sleep(Duration::from_millis(100));
    }
}

Rust: concurrency error, program hangs after first thread

I have created a simplified version of my problem below: I have a Bag struct and an Item struct. I want to spawn 10 threads that execute the item_action method from Bag on each item in an item_list, and print a statement if both of the item's attributes are in the bag's attributes.
use std::sync::{Mutex, Arc};
use std::thread;

#[derive(Clone, Debug)]
struct Bag {
    attributes: Arc<Mutex<Vec<usize>>>
}

impl Bag {
    fn new(n: usize) -> Self {
        let mut v = Vec::with_capacity(n);
        for _ in 0..n {
            v.push(0);
        }
        Bag {
            attributes: Arc::new(Mutex::new(v)),
        }
    }

    fn item_action(&self, item_attr1: usize, item_attr2: usize) -> Result<(), ()> {
        if self.attributes.lock().unwrap().contains(&item_attr1) ||
           self.attributes.lock().unwrap().contains(&item_attr2) {
            println!("Item attributes {} and {} are in Bag attribute list!", item_attr1, item_attr2);
            Ok(())
        } else {
            Err(())
        }
    }
}

#[derive(Clone, Debug)]
struct Item {
    item_attr1: usize,
    item_attr2: usize,
}

impl Item {
    pub fn new(item_attr1: usize, item_attr2: usize) -> Self {
        Item {
            item_attr1: item_attr1,
            item_attr2: item_attr2
        }
    }
}

fn main() {
    let mut item_list: Vec<Item> = Vec::new();
    for i in 0..10 {
        item_list.push(Item::new(i, (i + 1) % 10));
    }

    let bag: Bag = Bag::new(10); // create 10 attributes
    let mut handles = Vec::with_capacity(10);
    for x in 0..10 {
        let bag2 = bag.clone();
        let item_list2 = item_list.clone();
        handles.push(
            thread::spawn(move || {
                bag2.item_action(item_list2[x].item_attr1, item_list2[x].item_attr2);
            })
        )
    }

    for h in handles {
        println!("Here");
        h.join().unwrap();
    }
}
When I run this, I only get one line of output, and the program just stops there without returning.
Item attributes 0 and 1 are in Bag attribute list!
May I know what went wrong? Please see code in Playground
Updated:
With the suggestion from @loganfsmyth, the program can return now... but it still only prints 1 line as above. I expect it to print 10 lines because my item_list has 10 items. Not sure if my thread logic is correct.
I have added println!("Here"); when joining all the threads, and I can see Here printed 10 times, just not the actual log from item_action.
I believe this is because the temporary guard returned by the first lock() call lives longer than you expect. Temporaries created while evaluating a condition are not dropped until the whole condition has been evaluated, so in
if self.attributes.lock().unwrap().contains(&item_attr1) ||
   self.attributes.lock().unwrap().contains(&item_attr2) {
the first guard is still alive when the second lock() is attempted. What you essentially end up with is
let condition = {
    let lock1 = self.attributes.lock().unwrap();
    let lock2 = self.attributes.lock().unwrap();
    lock1.contains(&item_attr1) || lock2.contains(&item_attr2)
};
if condition {
which is causing your code to deadlock.
You should instead write:
let attributes = self.attributes.lock().unwrap();
if attributes.contains(&item_attr1) ||
   attributes.contains(&item_attr2) {
so that there is only one lock.
Your code would also work as-is if you used an RwLock or ReentrantMutex instead of a Mutex since those allow the same thread to have multiple immutable references to the data.
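Applied to the method from the question, the single-guard version looks roughly like this (a sketch against the struct as posted, not tested with the rest of the program):
use std::sync::{Arc, Mutex};

struct Bag {
    attributes: Arc<Mutex<Vec<usize>>>,
}

impl Bag {
    fn item_action(&self, item_attr1: usize, item_attr2: usize) -> Result<(), ()> {
        // Take the lock once and reuse the guard for both checks, so the
        // same thread never tries to lock the Mutex twice in one expression.
        let attributes = self.attributes.lock().unwrap();
        if attributes.contains(&item_attr1) || attributes.contains(&item_attr2) {
            println!("Item attributes {} and {} are in Bag attribute list!", item_attr1, item_attr2);
            Ok(())
        } else {
            Err(())
        }
    }
}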

Implement a monitoring thread without lock?

I have a struct that sends messages to a channel as well as updating some of its own fields. How do I implement a monitoring thread that looks (read only) at its internal fields periodically?
I can write it using an Arc<Mutex<T>> wrapper, but I feel it is not that efficient, since A::x could have been a plain i32 stored and updated on the stack. Is there any better way to do it without the locks?
use std::sync::{Arc, Mutex};
use std::sync::mpsc::{channel, Sender};
use std::{thread, time};

struct A {
    x: Arc<Mutex<i32>>,
    y: Sender<i32>,
}

impl A {
    fn do_some_loop(&mut self) {
        let sleep_time = time::Duration::from_millis(200);
        // This is a long running thread.
        for x in 1..1000000 {
            *self.x.lock().unwrap() = x;
            self.y.send(x);
            thread::sleep(sleep_time);
        }
    }
}

fn test() {
    let (sender, recever) = channel();
    let x = Arc::new(Mutex::new(1));
    let mut a = A { x: x.clone(), y: sender };
    thread::spawn(move || {
        // Monitor every 10 secs.
        let sleep_time = time::Duration::from_millis(10000);
        loop {
            thread::sleep(sleep_time);
            println!("{}", *x.lock().unwrap());
        }
    });
    a.do_some_loop();
}
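If the shared field really is just an i32, one lock-free option is an atomic. A sketch of the same structure using Arc<AtomicI32>, assuming a relaxed load is enough for a periodic monitoring read:
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::{thread, time};

struct A {
    x: Arc<AtomicI32>,
    y: Sender<i32>,
}

impl A {
    fn do_some_loop(&mut self) {
        let sleep_time = time::Duration::from_millis(200);
        for x in 1..1000000 {
            // A plain atomic store; no lock is taken.
            self.x.store(x, Ordering::Relaxed);
            let _ = self.y.send(x);
            thread::sleep(sleep_time);
        }
    }
}

fn main() {
    let (sender, _receiver) = channel();
    let x = Arc::new(AtomicI32::new(1));
    let monitor = x.clone();
    thread::spawn(move || {
        // Monitor every 10 secs.
        let sleep_time = time::Duration::from_millis(10000);
        loop {
            thread::sleep(sleep_time);
            println!("{}", monitor.load(Ordering::Relaxed));
        }
    });
    let mut a = A { x, y: sender };
    a.do_some_loop();
}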

"thread '<main>' has overflowed its stack" when constructing a large tree

I implemented a tree struct:
use std::collections::VecDeque;
use std::rc::{Rc, Weak};
use std::cell::RefCell;

struct A {
    children: Option<VecDeque<Rc<RefCell<A>>>>
}

// I got thread '<main>' has overflowed its stack
fn main() {
    let mut tree_stack: VecDeque<Rc<RefCell<A>>> = VecDeque::new();
    // when num is 1000, everything works
    for i in 0..100000 {
        tree_stack.push_back(Rc::new(RefCell::new(A { children: None })));
    }
    println!("{:?}", "reach here means we are not out of mem");
    loop {
        if tree_stack.len() == 1 { break; }

        let mut new_tree_node = Rc::new(RefCell::new(A { children: None }));
        let mut tree_node_children: VecDeque<Rc<RefCell<A>>> = VecDeque::new();

        // combine last two nodes to one new node
        match tree_stack.pop_back() {
            Some(x) => {
                tree_node_children.push_front(x);
            },
            None => {}
        }
        match tree_stack.pop_back() {
            Some(x) => {
                tree_node_children.push_front(x);
            },
            None => {}
        }
        new_tree_node.borrow_mut().children = Some(tree_node_children);
        tree_stack.push_back(new_tree_node);
    }
}
Playpen link
But it crashes with
thread '<main>' has overflowed its stack
How do I fix that?
The problem that you are experiencing is because you have a giant linked list of nodes. When that list is dropped, the first element tries to free all the members of the struct first. That means that the second element does the same, and so on, until the end of the list. This means that you will have a call stack whose depth is proportional to the number of elements in your list!
Here's a small reproduction:
struct A {
    children: Option<Box<A>>
}

fn main() {
    let mut list = A { children: None };

    for _ in 0..1_000_000 {
        list = A { children: Some(Box::new(list)) };
    }
}
}
And here's how you would fix it:
impl Drop for A {
    fn drop(&mut self) {
        if let Some(mut child) = self.children.take() {
            while let Some(next) = child.children.take() {
                child = next;
            }
        }
    }
}
This code overrides the default recursive drop implementation with an iterative one. It rips the children out of the node, replacing it with a terminal item (None). It then allows the node to drop normally, but there will be no recursive calls.
The code is complicated a bit because we can't drop ourselves, so we need to do a little two-step dance to ignore the first item and then eat up all the children.
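For the Rc<RefCell<A>> tree from the original question, the same idea can be written with an explicit work list. A rough sketch (it restates the struct, and assumes shared nodes need no extra handling):
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

struct A {
    children: Option<VecDeque<Rc<RefCell<A>>>>,
}

impl Drop for A {
    fn drop(&mut self) {
        // Detach this node's children and walk them with an explicit work
        // list, so dropping a deep tree never recurses.
        let mut stack: Vec<Rc<RefCell<A>>> =
            self.children.take().into_iter().flatten().collect();
        while let Some(node) = stack.pop() {
            // Only the last Rc owner actually tears the node down.
            if let Ok(cell) = Rc::try_unwrap(node) {
                let mut node = cell.into_inner();
                if let Some(children) = node.children.take() {
                    stack.extend(children);
                }
                // `node` is dropped here with `children == None`, so its
                // own Drop impl has nothing left to recurse into.
            }
        }
    }
}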
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
How do I move out of a struct field that is an Option?
