I want to efficiently search the keys of two HashMaps for a single value and terminate both threads once the value has been found. I'm currently doing this with two separate message channels (i.e. two transmitters and two receivers), but I'm not sure this is the correct approach. Given that the "mpsc" in mpsc::channel stands for "multiple producer, single consumer", it feels wrong to have multiple producers and multiple consumers. So, is there a better way to concurrently search two hash maps?
My code (also available in the playground):
use std::collections::HashMap;
use std::array::IntoIter;
use std::thread;
use std::time::Duration;
use std::iter::FromIterator;
use std::sync::mpsc;

fn main() {
    let m1 = HashMap::<_, _>::from_iter(IntoIter::new([(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]));
    let m2 = HashMap::<_, _>::from_iter(IntoIter::new([
        (1, 2), (3, 4), (5, 6), (7, 8), (9, 10),
        (11, 12), (13, 14), (15, 16), (17, 18), (19, 20),
    ]));
    let (tx1, rx1) = mpsc::channel::<u8>();
    let (tx2, rx2) = mpsc::channel::<u8>();
    let handle1 = thread::spawn(move || {
        let iter_keys1 = m1.keys();
        for k in iter_keys1 {
            if k.clone() == 11u8 {
                tx2.send(*k);
                break;
            } else {
                println!("Key from handle1: {}", k);
            }
            thread::sleep(Duration::from_millis(1));
        }
        for received in rx1 {
            let into: u8 = received;
            if into == 11u8 {
                println!("handle2 sent a message to receiver1: {}", into);
                break;
            }
        }
        m1
    });
    let handle2 = thread::spawn(move || {
        let iter_keys2 = m2.keys();
        for k in iter_keys2 {
            if k.clone() == 11u8 {
                tx1.send(*k);
                break;
            } else {
                println!("Key from handle2: {}", k);
            }
            thread::sleep(Duration::from_millis(1));
        }
        for received in rx2 {
            let into: u8 = received;
            if into == 11u8 {
                println!("handle1 sent a message to receiver2: {}", into);
                break;
            }
        }
        m2
    });
    handle1.join().unwrap();
    handle2.join().unwrap();
}
A somewhat related question: is there a practical reason to use sleep, or does it just make it easier to see the results of concurrent processing on small samples? When I comment out the thread::sleep(Duration::from_millis(1)); lines, the threads appear to run sequentially:
Key from handle1: 9
Key from handle1: 5
Key from handle1: 3
Key from handle1: 1
Key from handle1: 7
Key from handle2: 1
handle2 sent a message to receiver1: 11
Clarification:
I'm trying to search for a key that could exist in two different hash maps. In this example, I'm searching for 11 in both sets of keys, and want to terminate both threads when I've found it in either one of the sets of keys.
In that case there is no reason to use mpsc to communicate the stop condition. You can use a simple atomic bool:
use std::array::IntoIter;
use std::collections::HashMap;
use std::iter::FromIterator;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let m1 = HashMap::<_, _>::from_iter(IntoIter::new([(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]));
    let m2 = HashMap::<_, _>::from_iter(IntoIter::new([
        (1, 2),
        (3, 4),
        (5, 6),
        (7, 8),
        (9, 10),
        (11, 12),
        (13, 14),
        (15, 16),
        (17, 18),
        (19, 20),
    ]));
    let stop_signal = Arc::new(AtomicBool::new(false));

    let stop = stop_signal.clone();
    let h1 = thread::spawn(move || {
        let keys = m1.keys();
        for &k in keys {
            if stop.load(Ordering::Relaxed) {
                println!("Another thread found it!");
                break;
            }
            if k == 11u8 {
                stop.store(true, Ordering::Relaxed);
                // do something with the found key
                println!("Found by thread 1");
                break;
            }
        }
        m1
    });

    let stop = stop_signal.clone();
    let h2 = thread::spawn(move || {
        let keys = m2.keys();
        for &k in keys {
            if stop.load(Ordering::Relaxed) {
                println!("Another thread found it!");
                break;
            }
            if k == 11u8 {
                stop.store(true, Ordering::Relaxed);
                // do something with the found key
                println!("Found by thread 2");
                break;
            }
        }
        m2
    });

    h1.join().unwrap();
    h2.join().unwrap();
}
Your original code had several issues:
One of the threads would have been kept alive even after it finished with its map, until it received a message.
Even if one of the threads found the key, the other would still have continued to search for it.
There is no point in calling thread::sleep() in the loop; it achieves nothing except slowing down the app.
I am trying to implement an outer function that could calculate the outer product of two 1D arrays. Something like this:
use std::thread;
use ndarray::prelude::*;

pub fn multithread_outer(A: &Array1<f64>, B: &Array1<f64>) -> Array2<f64> {
    let mut result = Array2::<f64>::default((A.len(), B.len()));
    let thread_num = 5;
    let n = A.len() / thread_num;
    // a & b are ArcArray1<f64>
    let a = A.to_owned().into_shared();
    let b = B.to_owned().into_shared();
    for i in 0..thread_num {
        let a = a.clone();
        let b = b.clone();
        thread::spawn(move || {
            for j in i * n..(i + 1) * n {
                for k in 0..b.len() {
                    // This is the line I want to change
                    result[[j, k]] = a[j] * b[k];
                }
            }
        });
    }
    // Use join to make sure all threads finish here
    // (not so related to this question, so I didn't include it)
    result
}
You can see that, by design, two threads will never write to the same element. However, the Rust compiler will not allow two mutable references to the same result variable, and using a Mutex would make this much slower. What is the right way to implement this function?
While it is possible to do this manually (with thread::scope and split_at_mut, for example), ndarray already has parallel iteration integrated into its library, based on rayon:
https://docs.rs/ndarray/latest/ndarray/parallel
Here is how your code would look with parallel iterators:
use ndarray::parallel::prelude::*;
use ndarray::prelude::*;

pub fn multithread_outer(a: &Array1<f64>, b: &Array1<f64>) -> Array2<f64> {
    let mut result = Array2::<f64>::default((a.len(), b.len()));
    result
        .axis_iter_mut(Axis(0))
        .into_par_iter()
        .enumerate()
        .for_each(|(row_id, mut row)| {
            for (col_id, cell) in row.iter_mut().enumerate() {
                *cell = a[row_id] * b[col_id];
            }
        });
    result
}

fn main() {
    let a = Array1::from_vec(vec![1., 2., 3.]);
    let b = Array1::from_vec(vec![4., 5., 6., 7.]);
    let c = multithread_outer(&a, &b);
    println!("{}", c)
}
[[4, 5, 6, 7],
[8, 10, 12, 14],
[12, 15, 18, 21]]
Is there an efficient way of adding together the values of a field across multiple structs?
I am learning Rust and trying to explore different methods and ways to get more efficient or more elegant code.
An easy method would be the code below, but is there maybe a better way, perhaps a more in-depth use of iterators and their .map() method? I have tried using it, but to no avail.
struct ControlBloc {
    name: String,
    value: u32,
}

fn create_bloc(name: String, value: u32) -> ControlBloc {
    ControlBloc { name, value }
}

fn main() {
    let vec_bloc = vec![
        create_bloc(String::from("b1"), 1),
        create_bloc(String::from("b2"), 2),
        create_bloc(String::from("b3"), 3),
        create_bloc(String::from("b4"), 4),
        create_bloc(String::from("b5"), 5),
    ];
    let mut count = 0;
    for ele in vec_bloc.iter() {
        count += ele.value;
    }
    println!("Count = {}", count);
}
A more idiomatic way:
struct ControlBloc {
    name: String,
    value: i32,
}

impl ControlBloc {
    fn new(name: String, value: i32) -> Self {
        Self { name, value }
    }
}

fn main() {
    let vec_bloc = vec![
        ControlBloc::new(String::from("b1"), 1),
        ControlBloc::new(String::from("b2"), 2),
        ControlBloc::new(String::from("b3"), 3),
        ControlBloc::new(String::from("b4"), 4),
        ControlBloc::new(String::from("b5"), 5),
    ];
    let count = vec_bloc.iter().fold(0, |acc, x| acc + x.value);
    println!("Count = {}", count);
}
As others have said, there's obviously a number of concepts you haven't come across yet which you should learn. I'd start with The Book!
I have a let mut arr = vec![100, 200, 300, 400, 500, 600]; and want to compare the sums of overlapping chunks:
chunkA: 100 + 200 + 300 = 600
chunkB: 200 + 300 + 400 = 900
then compare chunkA and chunkB.
To iterate over overlapping chunks in a slice, use windows(). To process pairs of chunks, you can use itertools' tuple_windows():
use itertools::Itertools;

for (prev, current) in v.windows(3).tuple_windows() {
    // ...
}
If you want some accumulated result, I'd recommend going with Iterator::fold() (or Iterator::reduce()).
Something like this should do it:
fn main() {
    let arr = vec![100, 200, 300, 400, 500, 600];
    println!(
        "{}",
        arr.windows(3)
            .fold((None, 0), |(prev, count), w| {
                let s = w.iter().sum::<i32>();
                (
                    Some(s),
                    if let Some(prev) = prev {
                        if s > prev { count + 1 } else { count }
                    } else {
                        0
                    },
                )
            })
            .1
    );
}
Playground
It uses slice::windows to create the "chunks", then Iterator::fold to process them.
Or equivalently:
fn main() {
    let arr = vec![100, 200, 300, 400, 500, 600];
    let mut chunks = arr.windows(3);
    println!(
        "{}",
        chunks
            .next()
            .map(|first| {
                chunks.fold((first.iter().sum::<i32>(), 0), |(prev, count), w| {
                    let s = w.iter().sum::<i32>();
                    (s, if s > prev { count + 1 } else { count })
                })
            })
            .unwrap_or((0, 0))
            .1
    );
}
Playground
I want to do some work on a vector shared by multiple threads, but I don't want to use a Mutex because it is not wait-free.
The code below is written as I would in C.
#![feature(core_intrinsics, ptr_internals)]
use std::intrinsics::atomic_xadd_rel;
use std::ptr::Unique;
use std::thread::spawn;

fn main() {
    let mut data = [0; 8];
    let mut pool = Vec::with_capacity(8);
    for index in 0..8 {
        let data_ptr = Unique::new(data.as_mut_ptr());
        pool.push(spawn(move || {
            println!("Thread {} -> {}", index, unsafe {
                atomic_xadd_rel(
                    data_ptr
                        .unwrap()
                        .as_ptr()
                        .add(if index % 2 != 0 { index - 1 } else { index }),
                    1,
                )
            });
        }));
    }
    for work in pool {
        work.join().unwrap();
    }
    println!("Data {:?}", data);
}
I've also written the code using only the stable API:
use std::iter::repeat_with;
use std::sync::atomic::{AtomicUsize, Ordering::*};
use std::sync::Arc;
use std::thread::spawn;

fn main() {
    let data = Arc::new(
        repeat_with(|| AtomicUsize::new(0))
            .take(8)
            .collect::<Vec<_>>(),
    );
    let mut pool = Vec::with_capacity(8);
    for index in 0..8 {
        let data_clone = data.clone();
        pool.push(spawn(move || {
            let offset = index - (index % 2 != 0) as usize;
            println!(
                "Thread {} -> {}",
                index,
                data_clone[offset].fetch_add(1, Relaxed)
            );
        }));
    }
    for work in pool {
        work.join().unwrap();
    }
    println!("Data {:?}", data);
}
This code returns
Thread 0 -> 0
Thread 1 -> 1
Thread 3 -> 0
Thread 5 -> 1
Thread 7 -> 1
Thread 2 -> 1
Thread 6 -> 0
Thread 4 -> 0
Data [2, 0, 2, 0, 2, 0, 2, 0]
Is there a proper way to do this in Rust?
I do not think this is a duplicate of How do I pass disjoint slices from a vector to different threads? because my vector / slice elements overlap between threads. In my sample, each odd index of the slice is incremented twice by two different threads.
Assuming that each thread has unique access to a particular element or sub-slice of your vector, this would be a case to use split_at_mut (or one of the similar functions). split_at_mut splits a mutable slice into two independent mutable slices; you can call it multiple times to split your slice into the correct number of segments, and pass each sub-slice to a separate thread.
The best way to pass the sub-slices to threads is to use something like the scoped threads in crossbeam.
I have a Vec<i64> and I want to know all the groups of integers that are consecutive. As an example:
let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
I'm expecting something like this or similar:
[[1, 2, 3], [5, 6, 7], [9, 10]];
The view (vector of vectors or maybe tuples or something else) really doesn't matter, but I should get several grouped lists with continuous numbers.
At the first look, it seems like I'll need to use itertools and the group_by function, but I have no idea how...
You can indeed use group_by for this, but you might not really want to. Here's what I would probably write instead:
fn consecutive_slices(data: &[i64]) -> Vec<&[i64]> {
    let mut slice_start = 0;
    let mut result = Vec::new();
    for i in 1..data.len() {
        if data[i - 1] + 1 != data[i] {
            result.push(&data[slice_start..i]);
            slice_start = i;
        }
    }
    if !data.is_empty() {
        result.push(&data[slice_start..]);
    }
    result
}
This is similar in principle to eXodiquas' answer, but instead of accumulating a Vec<Vec<i64>>, I use the indices to accumulate a Vec of slice references that refer to the original data. (This question explains why I made consecutive_slices take &[T].)
It's also possible to do the same thing without allocating a Vec, by returning an iterator; however, I like the above version better. Here's the zero-allocation version I came up with:
fn consecutive_slices(data: &[i64]) -> impl Iterator<Item = &[i64]> {
    let mut slice_start = 0;
    (1..=data.len()).flat_map(move |i| {
        if i == data.len() || data[i - 1] + 1 != data[i] {
            let begin = slice_start;
            slice_start = i;
            Some(&data[begin..i])
        } else {
            None
        }
    })
}
It's not as readable as a for loop, but it doesn't need to allocate a Vec for the return value, so this version is more flexible.
Here's a "more functional" version using group_by:
use itertools::Itertools;

fn consecutive_slices(data: &[i64]) -> Vec<Vec<i64>> {
    (&(0..data.len()).group_by(|&i| data[i] as usize - i))
        .into_iter()
        .map(|(_, group)| group.map(|i| data[i]).collect())
        .collect()
}
The idea is to make a key function for group_by that takes the difference between each element and its index in the slice. Consecutive elements will have the same key because indices increase by 1 each time. One reason I don't like this version is that it's quite difficult to get slices of the original data structure; you almost have to create a Vec<Vec<i64>> (hence the two collects). The other reason is that I find it harder to read.
However, when I first wrote my preferred version (the first one, with the for loop), it had a bug (now fixed), while the other two versions were correct from the start. So there may be merit to writing denser code with functional abstractions, even if there is some hit to readability and/or performance.
let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
let mut res = Vec::new();
let mut prev = v[0];
let mut sub_v = Vec::new();
sub_v.push(prev);
for i in 1..v.len() {
    if v[i] == prev + 1 {
        sub_v.push(v[i]);
        prev = v[i];
    } else {
        res.push(sub_v.clone());
        sub_v.clear();
        sub_v.push(v[i]);
        prev = v[i];
    }
}
res.push(sub_v);
This should solve your problem.
Iterate over the given vector, checking whether the current i64 (in my case i32) is +1 from the previous one; if so, push it into a vector (sub_v). When the series breaks, push sub_v into the result vector, then repeat.
But I guess you wanted something functional?
Another possible solution, that uses std only, could be:
fn consecutive_slices(v: &[i64]) -> Vec<Vec<i64>> {
    let t: Vec<Vec<i64>> = v
        .into_iter()
        .chain([*v.last().unwrap_or(&-1)].iter())
        .scan(Vec::new(), |s, &e| match s.last() {
            None => {
                s.push(e);
                Some((false, Vec::new()))
            }
            Some(&p) if p == e - 1 => {
                s.push(e);
                Some((false, Vec::new()))
            }
            Some(&p) if p != e - 1 => {
                let o = s.clone();
                *s = vec![e];
                Some((true, o))
            }
            _ => None,
        })
        .filter_map(|(n, v)| match n {
            true => Some(v),
            false => None,
        })
        .collect();
    t
}
The chain is used to get the last vector.
I like the answers above but you could also use peekable() to tell if the next value is different.
https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html
I would probably use a fold for this?
That's because I'm very much a functional programmer.
Obviously mutating the accumulator is weird :P but this works too and represents another way of thinking about it.
This is basically a recursive solution and can be modified easily to use immutable datastructures.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=43b9e3613c16cb988da58f08724471a4
fn main() {
    let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
    let mut res: Vec<Vec<i32>> = vec![];
    let (last_group, _): (Vec<i32>, Option<i32>) = v
        .iter()
        .fold((vec![], None), |(mut cur_group, last), x| match last {
            None => {
                cur_group.push(*x);
                (cur_group, Some(*x))
            }
            Some(last) => {
                if x - last == 1 {
                    cur_group.push(*x);
                    (cur_group, Some(*x))
                } else {
                    res.push(cur_group);
                    (vec![*x], Some(*x))
                }
            }
        });
    res.push(last_group);
    println!("{:?}", res);
}