parallel sorting on separate sections of a single slice - multithreading

I'm trying to implement a kind of parallel bubble sort, i.e. have a number of threads work on distinct parts of the same slice and then have a final thread combine those sorted parts, similar to a merge sort.
I have this code so far:
pub fn parallel_bubble_sort(to_sort: Arc<&[i32]>) {
let midpoint = to_sort.len() / 2;
let ranges = [0..midpoint, midpoint..to_sort.len()];
let handles = (ranges).map(|range| {
thread::spawn(|| {
to_sort[range].sort();
})
});
}
But I get a series of errors relating to to_sort's lifetime, etc.
How would someone go about modifying distinct slices of a larger slice across thread boundaries?

Disclaimer: I assume that you want to sort in place, as you call .sort().
There are a couple of problems with your code:
to_sort isn't mutable, so you won't be able to modify it, which is an essential part of sorting ;) So I think that Arc<&[i32]> should most certainly be &mut [i32].
You cannot split a mutable slice like this. Rust doesn't know whether your ranges overlap, and therefore disallows this entirely. You can, however, use split_at_mut to split it into two non-overlapping mutable parts, which is what matters in your case.
You cannot move mutable references into threads created with thread::spawn, because it's unknown how long the thread will exist. Overcoming this issue is the hardest part, I'm afraid; I don't know how easy it is in normal Rust without the use of unsafe. I think the easiest solution would be to use a library like rayon, which has already solved those problems for you.
EDIT: Rust 1.63 introduced scoped threads, which eliminate the need for rayon in this use case.
This should be a good start for you:
pub fn parallel_bubble_sort(to_sort: &mut [i32]) {
let midpoint = to_sort.len() / 2;
let (left, right) = to_sort.split_at_mut(midpoint);
std::thread::scope(|s| {
s.spawn(|| left.sort());
s.spawn(|| right.sort());
});
// TODO: merge left and right
}
fn main() {
let mut data = [1, 6, 3, 4, 9, 7, 4];
parallel_bubble_sort(&mut data);
println!("{:?}", data);
}
[1, 3, 6, 4, 4, 7, 9]
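The TODO above leaves the merge step open. A minimal sketch of one way to finish it, assuming Rust 1.63 or later and a temporary buffer for the merged result; the names parallel_sort and merge are hypothetical and not part of the original answer:

```rust
use std::thread;

/// Sorts each half on its own scoped thread, then merges the two sorted halves.
pub fn parallel_sort(to_sort: &mut [i32]) {
    let midpoint = to_sort.len() / 2;
    {
        let (left, right) = to_sort.split_at_mut(midpoint);
        // The scope joins both threads before it returns, so both borrows end here.
        thread::scope(|s| {
            s.spawn(|| left.sort());
            s.spawn(|| right.sort());
        });
    }
    // Merge into a temporary buffer, then copy the result back in place.
    let merged = merge(&to_sort[..midpoint], &to_sort[midpoint..]);
    to_sort.copy_from_slice(&merged);
}

/// Hypothetical helper: two-pointer merge of two already sorted slices.
fn merge(left: &[i32], right: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    // At most one of these two tails is non-empty.
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}
```

Calling parallel_sort on the same data as above should leave the whole array sorted rather than two sorted halves. The extra allocation keeps the merge simple; an in-place merge is possible but considerably more involved.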
Previous answer for Rust versions older than 1.63
pub fn parallel_bubble_sort(to_sort: &mut [i32]) {
let midpoint = to_sort.len() / 2;
let (left, right) = to_sort.split_at_mut(midpoint);
rayon::scope(|s| {
s.spawn(|_| left.sort());
s.spawn(|_| right.sort());
});
// TODO: merge left and right
}
fn main() {
let mut data = [1, 6, 3, 4, 9, 7, 4];
parallel_bubble_sort(&mut data);
println!("{:?}", data);
}
[1, 3, 6, 4, 4, 7, 9]

Doing more than 1 thing in a iter().map()

I would like to use a map to create a new vector, but at the same time, do other things inside that map. I'm working on Advent of Code 2021, day 6 part 1.
This code loops through a vector and decrements all the values by one. If the value is at 0, then it resets that position to 6 and adds an 8 to the end of the vector.
fn run_growth_simulation(mut state: Vec<u8>, days: i32) -> usize {
for _day in 0..days {
let mut new_fish = 0;
state.iter_mut().map(|x| match x {
num: u8 @ 1..=8 => {num - 1},
0 => {new_fish += 1; 6},
_ => unreachable!()
})
for _fish in 0..new_fish {
state.push(8);
}
}
state.iter().count() as usize
}
How do I return the right item from the closure?
I would mutate the values through the iterator directly instead of building a new array. For that, use for_each instead of map (or, preferably, a plain for loop).
Then inside the match statement mutate the value:
state.iter_mut().for_each(|x| match x {
// `: u8` removed because it gave me a syntax error
// mutate the number directly (we have to use `num` because x was moved)
num @ 1..=8 => {*num -= 1;},
// mutate the number
0 => {new_fish += 1; *x = 6;},
_ => unreachable!()
});
A slightly different approach would be to count the 0s in the vector, remove them, subtract 1 from each remaining value, and then add the new fish.
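A rough sketch of that counting idea follows. It is a variation rather than the answerer's exact code: the zeros are reset to 6 in place instead of being removed and re-added, and the days parameter is assumed to be a u32.

```rust
/// Simulates the fish timers day by day and returns the final population size.
fn run_growth_simulation(mut state: Vec<u8>, days: u32) -> usize {
    for _day in 0..days {
        // Count the fish that spawn today before touching any timer.
        let new_fish = state.iter().filter(|&&timer| timer == 0).count();
        for timer in state.iter_mut() {
            // A fish at 0 resets to 6; every other timer ticks down by one.
            *timer = if *timer == 0 { 6 } else { *timer - 1 };
        }
        // Newborn fish are appended only after the existing ones were updated.
        state.extend(std::iter::repeat(8).take(new_fish));
    }
    state.len()
}
```

Because counting happens in a separate pass, nothing mutates the vector while it is being iterated, so the borrow checker has nothing to object to.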
As a complement to the answer stating that for_each() is preferable to map() here (since we don't consume what map() emits), below is a simpler example trying to illustrate the problem (and why the borrow-checker is right when it forbids such attempts).
In both cases (test1() and test2()) we are iterating over a vector while we are extending it (this is what was intended in the question).
In test1(), the iterator captures the storage for the values once and for all when it is created.
For all the subsequent iterations, it will refer to this initial storage, thus this storage must not move elsewhere in memory in the meantime.
That's why the iterator borrows the vector (mutably or not, this is not important here).
However, during these iterations we try to append new values to this vector: this may move the storage (for reallocation purpose) and fortunately this requires a mutable borrow of the vector (then it's rejected).
In test2() we avoid keeping a reference to the initial storage, and use a counter instead.
This works, but this is suboptimal since at each iteration this index operation ([]) needs to check the bounds.
The iterator in the previous function knows the bounds once and for all; that's why iterators give the compiler better optimisation opportunities.
Note that len() is evaluated only once, at the beginning of the loop; this is probably what we want, but if we wanted to reevaluate it at each iteration, we would have to use a loop {} instruction.
What is discussed here is not specific to the language but to the problem itself.
With a more permissive programming language, the first attempt might have been allowed but would have led to memory errors; otherwise, such a language would have to systematically fall back to the second attempt and pay the cost of bounds checking at each iteration.
In the end, your solution with a second loop is probably the best choice.
fn test1() {
let mut v = vec![1, 2, 3, 4, 5, 6, 7, 8];
v.iter_mut().for_each(|e| {
if *e <= 3 {
let n = *e + 100;
// v.push(n) // !!! INCORRECT !!!
// we are trying to reallocate the storage while iterating over it
} else {
*e += 10;
}
});
println!("{:?}", v);
}
fn test2() {
let mut v = vec![1, 2, 3, 4, 5, 6, 7, 8];
for i in 0..v.len() {
let e = &mut v[i];
if *e <= 3 {
let n = *e + 100;
v.push(n);
} else {
*e += 10;
}
}
println!("{:?}", v);
}
fn main() {
test1(); // [1, 2, 3, 14, 15, 16, 17, 18]
test2(); // [1, 2, 3, 14, 15, 16, 17, 18, 101, 102, 103]
}
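For comparison, here is a hedged sketch of the variant mentioned above in which the length is reevaluated on every iteration. It uses a while loop instead of loop {}, and the name test3 is hypothetical:

```rust
fn test3() -> Vec<i32> {
    let mut v = vec![1, 2, 3, 4, 5, 6, 7, 8];
    let mut i = 0;
    // v.len() is re-read on every pass, so the pushed elements are visited too.
    while i < v.len() {
        if v[i] <= 3 {
            let n = v[i] + 100;
            v.push(n);
        } else {
            v[i] += 10;
        }
        i += 1;
    }
    v
}
```

Here the appended values 101, 102 and 103 are themselves processed later in the loop, so the result is [1, 2, 3, 14, 15, 16, 17, 18, 111, 112, 113]. The loop terminates only because the pushed values are greater than 3 and therefore never trigger another push.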

How to copy to slice with different size in Rust? [duplicate]

If I have two arrays of different sizes:
let mut array1 = [0; 8];
let array2 = [1, 2, 3, 4];
How would I copy array2 into the first 4 elements of array1? I can take a mutable 4-element slice of array1, but I'm not sure how or if I can assign into it.
Manually one can do
for (dst, src) in array1.iter_mut().zip(&array2) {
*dst = *src
}
for a typical slice. However, there is a likely faster specialization in clone_from_slice:
dst[..4].clone_from_slice(&src)
A slightly older method is to use std::io::Write, which was implemented for &mut [u8].
use std::io::Write;
let _ = dst.write(&src)
This will write up to the end of dst and return how many bytes were written in a Result. If you use write_all instead, it will return an Err if not all bytes could be written.
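To make the difference between write and write_all concrete, here is a small sketch for byte buffers. The helper names write_some and write_everything are hypothetical:

```rust
use std::io::{ErrorKind, Write};

/// Copies as much of `src` as fits into `buf`; returns the number of bytes copied.
fn write_some(buf: &mut [u8], src: &[u8]) -> usize {
    // `Write` is implemented for `&mut [u8]`, so we need a mutable binding to the slice.
    let mut dst: &mut [u8] = buf;
    dst.write(src).expect("writing into a slice does not fail")
}

/// Copies all of `src` into `buf`, or reports the error kind if `buf` is too short.
fn write_everything(buf: &mut [u8], src: &[u8]) -> Result<(), ErrorKind> {
    let mut dst: &mut [u8] = buf;
    dst.write_all(src).map_err(|e| e.kind())
}
```

With an 8-byte buffer and a 4-byte source, write_some returns 4 and leaves the tail untouched. With a 2-byte buffer it returns 2, while write_everything reports ErrorKind::WriteZero for the same input.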
The most flexible way is to use iterators to handle each element successively:
for (place, data) in array1.iter_mut().zip(array2.iter()) {
*place = *data
}
.iter_mut creates an Iterator that yields mutable references (&mut T) pointing into the slice/array. .iter does the same but with shared references. .zip takes two iterators and steps over them in lock-step, yielding the elements from both as a tuple (and stops as soon as either one stops).
If you need/want to do anything 'fancy' with the data before writing to place this is the approach to use.
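However, plain copying was also provided as single methods in pre-1.0 Rust (these APIs have since been removed; copy_from_slice and clone_from_slice are the current equivalents):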
.copy_from, used like array1.copy_from(array2).
std::slice::bytes::copy_memory, although you will need to trim the two arrays because copy_memory requires they are the same length:
use std::cmp;
use std::slice::bytes;
let len = cmp::min(array1.len(), array2.len());
bytes::copy_memory(array1.mut_slice_to(len), array2.slice_to(len));
(If you know that array1 is always longer than array2 then bytes::copy_memory(array1.mut_slice_to(array2.len()), array2) should also work.)
At the moment, the bytes version optimises the best, down to a memcpy call, but hopefully rustc/LLVM improvements will eventually take them all to that.
You could simply use copy_from_slice() and use Range & Co:
fn main() {
let mut dest = [0; 8];
let src = [1, 2, 3, 4];
dest[..4].copy_from_slice(&src);
assert_eq!(dest, [1, 2, 3, 4, 0, 0, 0, 0]);
}
Inverse case:
fn main() {
let src = [1, 2, 3, 4, 5, 6, 7, 8];
let mut dest = [0; 4];
dest.copy_from_slice(&src[2..6]);
assert_eq!(dest, [3, 4, 5, 6]);
}
Combined case:
fn main() {
let src = [1, 2, 3, 4, 5, 6, 7, 8];
let mut dest = [0; 4];
dest[1..3].copy_from_slice(&src[3..5]);
assert_eq!(dest, [0, 4, 5, 0]);
}
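Note that copy_from_slice panics when the two slices differ in length, which is why the examples above slice both sides explicitly. When the lengths are not known in advance, a small helper can trim both sides to the shorter one; this is a sketch, and the name copy_prefix is hypothetical:

```rust
/// Copies as many leading elements as both slices can hold; returns that count.
fn copy_prefix<T: Copy>(dest: &mut [T], src: &[T]) -> usize {
    let len = dest.len().min(src.len());
    // Both sides are trimmed to `len`, so copy_from_slice cannot panic here.
    dest[..len].copy_from_slice(&src[..len]);
    len
}
```

For element types that are Clone but not Copy, the same shape should work with clone_from_slice and a T: Clone bound.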

Implementing PHP array_column in Rust

I'm in the process of learning Rust, but I could not find an answer to this question.
In PHP, there's the array_column function and it works this way:
given an array of arrays (this would be a vector of vectors in Rust):
$records = [
[1,2,3],
[1,2,3],
[1,2,3],
[1,2,3]
];
if I want to get an array containing all the first elements (a "column") of the inner arrays I can do:
$column = array_column($records, 0);
This way, for example, I get [1,1,1,1]. If I change that 0 with 1, I get [2,2,2,2] and so on.
Since there's no array_column equivalent in Rust (that is: I could not find it), what could be the best way to implement a similar behavior with a vector of vectors?
I decided to play with iterators, as you tried in the comments.
This version works with any clonable values (numbers included). We iterate over the subvectors, and for each one we call the get method, which yields either Some(&e), a reference to the element, or None if the index is out of bounds.
and_then accepts the value from get: if it is None, None is returned; if it is Some(&e), Some(e.clone()) is returned. We clone the value because get only gives us a reference, which we can't store in the new vector of owned values.
collect then works on an iterator of Option<T> and conveniently turns it into an Option<Vec<T>>: it returns None if any None was in the iterator (which means some subvector was too short), or Some(Vec<T>) if everything is fine.
fn main() {
let array = vec![
vec![1, 2, 3, 4],
vec![1, 2, 3, 4, 5],
vec![1, 2, 3, 4],
vec![1, 2, 3, 4],
];
let ac = array_column(&array, 0);
println!("{:?}", ac); // Some([1, 1, 1, 1])
let ac = array_column(&array, 3);
println!("{:?}", ac); // Some([4, 4, 4, 4])
let ac = array_column(&array, 4);
println!("{:?}", ac); // None
}
fn array_column<T: Clone>(array: &Vec<Vec<T>>, column: usize) -> Option<Vec<T>> {
array.iter()
.map( |subvec| subvec.get(column).and_then(|e| Some(e.clone())) )
.collect()
}
Alex's version is good, but you can generalize it by returning references, so the items no longer need to be Clone:
fn array_column<'a, T>(array: &'a Vec<Vec<T>>, column: usize) -> Option<Vec<&'a T>> {
array.iter()
.map( |subvec| subvec.get(column) )
.collect()
}
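As an illustration of why the reference-returning variant is more general, here is a sketch that uses it with a type that does not implement Clone. The Cell type and the name array_column_ref are hypothetical, and the parameter is taken as a slice of vectors instead of &Vec:

```rust
/// A type that deliberately does not implement `Clone` (for illustration only).
#[derive(Debug, PartialEq)]
struct Cell(u32);

/// Returns references to the `column`-th element of every row, or `None`
/// if any row is too short.
fn array_column_ref<T>(array: &[Vec<T>], column: usize) -> Option<Vec<&T>> {
    array.iter().map(|row| row.get(column)).collect()
}
```

The returned references borrow from the input, so the original vector of vectors has to outlive the extracted column.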

Rust's drain, iterator dropped ... "removes any remaining elements"

On page 327 of Programming Rust you can find the following statement
However, unlike the into_iter() method, which takes the collection by value and consumes it, drain merely borrows a mutable reference to the collection, and when the iterator is dropped, it removes any remaining elements from the collection, and leaves it empty.
I'm confused about what it means when it says it removes any remaining elements from the collection. With this code, I can see that the remaining elements of a are still there after the iterator is dropped:
fn main() {
let mut a = vec![1, 2, 3, 4, 5];
{
let b: Vec<i32> = a.drain(0..3).collect();
}
println!("Hello, world! {:?}", a);
}
Perhaps I'm confused at merely the wording. Is there something more to this?
This looks like slightly imprecise wording.
The real meaning of these words is: if you drop the drain iterator without exhausting it, it will still remove all the elements in the range it was created with. As you've asked it to drain only the first three elements, it won't empty the entire vector, only that first part; but it will do so even if the iterator is never used:
fn main() {
let mut a = vec![1, 2, 3, 4, 5];
{
let _ = a.drain(0..3);
}
println!("Hello, world! {:?}", a);
}
Hello, world! [4, 5]
You could understand this in the following way: the "collection" mentioned here is not the initial collection that drain was called on, but rather the "sub-collection" specified by the passed range.
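A short sketch of both situations, a partially consumed drain over a sub-range and an unused drain over the full range. The function names are hypothetical:

```rust
/// Takes one element out of `drain(0..3)` and then drops the iterator early.
/// Note: `drain(0..3)` panics if the vector has fewer than three elements.
fn drain_partially(mut v: Vec<i32>) -> (Option<i32>, Vec<i32>) {
    let mut drained = v.drain(0..3);
    let first = drained.next();
    // Dropping the iterator removes the unread rest of the range 0..3 as well.
    drop(drained);
    (first, v)
}

/// Drops a full-range drain without reading anything from it.
fn drain_everything(mut v: Vec<i32>) -> Vec<i32> {
    drop(v.drain(..));
    v
}
```

For vec![1, 2, 3, 4, 5], the first function yields Some(1) and leaves [4, 5]; the second leaves the vector empty, which is the case the book's wording describes.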

How to set a range in a Vec or slice?

My end goal is to shuffle the rows of a matrix (for which I am using nalgebra).
To address this I need to set a mutable range (slice) of an array.
Supposing I have an array as such (let's say it's a 3x3 matrix):
let mut scores = [7, 8, 9, 10, 11, 12, 13, 14, 15];
I have extracted a row like this:
let r = &scores[..].chunks(3).collect::<Vec<_>>()[1];
Now, for the knuth shuffle I need to swap this with another row. What I need to do is:
scores.chunks_mut(3)[0] = r;
however this fails as such:
cannot index a value of type `core::slice::ChunksMut<'_, _>`
Example: http://is.gd/ULkN6j
I ended up looping over the columns and swapping element by element, which seems like a cleaner implementation to me:
fn swap_row<T>(matrix: &mut [T], row_src: usize, row_dest: usize, cols: usize){
for c in 0..cols {
matrix.swap(cols * row_src + c, cols * row_dest + c);
}
}
Your code, as you'd like to write it, can never work. You have an array that you are trying to read from and write to at the same time. This will cause you to have duplicated data:
[1, 2, 3, 4]
// Copy last two to first two
[3, 4, 3, 4]
// Copy first two to last two
[3, 4, 3, 4]
Rust will prevent you from having mutable and immutable references to the same thing for this very reason.
cannot index a value of type core::slice::ChunksMut<'_, _>
chunks_mut returns an iterator. The only thing that an iterator is guaranteed to do is return "the next thing". You cannot index it, because its items are not all available at once as an indexable collection.
To move things around, you are going to need somewhere temporary to store the data. One way is to copy the array:
let scores = [7, 8, 9, 10, 11, 12, 13, 14, 15];
let mut new_scores = scores;
for (old, new) in scores[0..3].iter().zip(new_scores[6..9].iter_mut()) {
*new = *old;
}
for (old, new) in scores[3..6].iter().zip(new_scores[0..3].iter_mut()) {
*new = *old;
}
for (old, new) in scores[6..9].iter().zip(new_scores[3..6].iter_mut()) {
*new = *old;
}
Then it's a matter of following one of these existing questions to copy from one to the other.
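If copying the whole array is undesirable, another option is to split the slice into two disjoint mutable halves and swap the rows directly. This is a sketch that assumes a row-major layout and in-range row indices; the name swap_rows is hypothetical:

```rust
/// Swaps two rows of a row-major matrix stored in a flat slice.
/// Panics if either row index is out of range.
fn swap_rows<T>(matrix: &mut [T], row_a: usize, row_b: usize, cols: usize) {
    if row_a == row_b {
        return;
    }
    let (lo, hi) = if row_a < row_b { (row_a, row_b) } else { (row_b, row_a) };
    // Split just before the later row so the two rows live in disjoint halves.
    let (head, tail) = matrix.split_at_mut(hi * cols);
    head[lo * cols..(lo + 1) * cols].swap_with_slice(&mut tail[..cols]);
}
```

Because split_at_mut hands out two non-overlapping mutable slices, no temporary storage and no Clone bound are needed.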
That's probably closer to what you wanted to do:
fn swap_row<T: Clone>(matrix: &mut [T], row_src: usize, row_dest: usize, cols: usize) {
let v = matrix[..].to_vec();
let mut chunks = v.chunks(cols).collect::<Vec<&[T]>>();
chunks.swap(row_src, row_dest);
matrix.clone_from_slice(&chunks.concat());
}
I would prefer:
fn swap_row<T: Clone>(matrix: &[T], row_src: usize, row_dest: usize, cols: usize) -> Vec<T> {
let mut chunks = matrix[..].chunks(cols).collect::<Vec<&[T]>>();
chunks.swap(row_src, row_dest);
chunks.concat()
}
btw: nalgebra provides unsafe fn as_slice_unchecked(&self) -> &[T] for all kinds of Storage and RawStorage.
Shuffling this slice avoids the need for row swapping.
