Explicit partial array initialisation in Rust

In C, I can write int foo[100] = { 7, 8 }; and I will get [7, 8, 0, 0, 0...].
This allows me to explicitly and concisely choose initial values for a contiguous group of elements at the beginning of the array, and the remainder will be initialised as if they had static storage duration (i.e. to the zero value of the appropriate type).
Is there an equivalent in Rust?

To the best of my knowledge, there is no such shortcut. You do have a few options, though.
The direct syntax
The direct syntax to initialize an array works with Copy types (integers are Copy):
let array = [0; 1024];
initializes an array of 1024 elements with all 0s.
Based on this, you can afterwards modify the array:
let array = {
    let mut array = [0; 1024];
    array[0] = 7;
    array[1] = 8;
    array
};
Note the trick of using a block expression to isolate the mutability to a smaller section of the code; we'll reuse it below.
The iterator syntax
You can also fill the array through an iterator over its elements:
let array = {
    let mut array = [0; 1024];
    for (i, element) in array.iter_mut().enumerate().take(2) {
        *element = i + 7;
    }
    array
};
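You can even (optionally) start from an uninitialized state, using an unsafe block. The old mem::uninitialized() is deprecated, and it is undefined behaviour for integers, so use MaybeUninit instead:
use std::mem::{self, MaybeUninit};
let array: [i32; 10] = {
    // Create an array of uninitialized slots (an array of MaybeUninit needs no initialization).
    let mut array: [MaybeUninit<i32>; 10] = unsafe { MaybeUninit::uninit().assume_init() };
    let nonzero = 2;
    for (i, element) in array.iter_mut().enumerate().take(nonzero) {
        // `write` stores the value without reading or dropping the old (uninitialized) one.
        element.write(i as i32 + 7);
    }
    for element in array.iter_mut().skip(nonzero) {
        element.write(0);
    }
    // SAFETY: every element has been written above.
    unsafe { mem::transmute::<[MaybeUninit<i32>; 10], [i32; 10]>(array) }
};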
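The shorter slice syntax
There is a shorter form based on copy_from_slice (or clone_from_slice for types that are only Clone). Both methods are stable now, so no feature flag is needed. Both panic if the two slices have different lengths, so slice the destination down to the length of the prefix first:
let array = {
    let mut array = [0; 32];
    // Override the beginning of the array
    array[..2].copy_from_slice(&[7, 8]);
    array
};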
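Since Rust 1.63 there is also std::array::from_fn. It builds an array by calling a closure once for each index, so it needs neither a mutable temporary nor a Copy element type. The following is a minimal sketch. with_prefix is a hypothetical helper name, and the padding value is T::default(), which is 0 for integers and so mirrors C's zero-initialisation:

```rust
/// Builds an N-element array whose first elements are cloned from `prefix`;
/// every remaining element is `T::default()` (0 for integers).
/// Prefix elements beyond N are ignored.
fn with_prefix<T: Clone + Default, const N: usize>(prefix: &[T]) -> [T; N] {
    std::array::from_fn(|i| prefix.get(i).cloned().unwrap_or_default())
}
```

For example, let a: [i32; 5] = with_prefix(&[7, 8]); gives [7, 8, 0, 0, 0]. The same helper also works for non-Copy types such as String. Extra prefix elements are silently dropped here, so add an assert if you would rather catch that case.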

Here is a macro (it relies on Default, which the standard library implements only for arrays of up to 32 elements):
macro_rules! array {
    ($($v:expr),*) => (
        {
            let mut array = Default::default();
            {
                let mut e = <_ as ::std::convert::AsMut<[_]>>::as_mut(&mut array).iter_mut();
                $(*e.next().unwrap() = $v);*;
            }
            array
        }
    )
}
fn main() {
    let a: [usize; 5] = array!(7, 8);
    assert_eq!([7, 8, 0, 0, 0], a);
}
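The macro above cannot be used where C would put a static array, because Default::default() is not a const fn. The following is a minimal sketch of a const fn that fills that role. zero_padded is a hypothetical helper name, and the sketch assumes i32 elements and a toolchain with const generics:

```rust
/// Copies `prefix` into the start of an N-element array and zero-fills the rest.
/// Because it is a `const fn`, it can initialise `const` and `static` items.
/// It panics (a compile-time error when used in a `static`) if `prefix` is longer than N.
const fn zero_padded<const N: usize>(prefix: &[i32]) -> [i32; N] {
    let mut out = [0; N];
    let mut i = 0;
    // Iterators and `for` loops are not allowed in a const fn, so use a `while` loop.
    while i < prefix.len() {
        out[i] = prefix[i];
        i += 1;
    }
    out
}

// The closest analogue of C's `static int foo[100] = { 7, 8 };`
static FOO: [i32; 100] = zero_padded(&[7, 8]);
```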

Related

How to let several threads write to the same variable without mutex in Rust?

I am trying to implement an outer function that could calculate the outer product of two 1D arrays. Something like this:
use std::thread;
use ndarray::prelude::*;
pub fn multithread_outer(A: &Array1<f64>, B: &Array1<f64>) -> Array2<f64> {
    let mut result = Array2::<f64>::default((A.len(), B.len()));
    let thread_num = 5;
    let n = A.len() / thread_num;
    // a & b are ArcArray2<f64>
    let a = A.to_owned().into_shared();
    let b = B.to_owned().into_shared();
    for i in 0..thread_num {
        let a = a.clone();
        let b = b.clone();
        thread::spawn(move || {
            for j in i * n..(i + 1) * n {
                for k in 0..b.len() {
                    // This is the line I want to change
                    result[[j, k]] = a[j] * b[k];
                }
            }
        });
    }
    // Use join to make sure all threads finish here
    // Not so related to this question, so I didn't put it here
    result
}
You can see that, by design, two threads will never write to the same element. However, the Rust compiler will not allow two mutable references to the same result variable, and using a mutex would make this much slower. What is the right way to implement this function?
While it is possible to do manually (with thread::scope and split_at_mut, for example), ndarray already has parallel iteration integrated into its library, based on rayon:
https://docs.rs/ndarray/latest/ndarray/parallel
Here is how your code would look with parallel iterators (this needs ndarray's rayon feature enabled):
use ndarray::parallel::prelude::*;
use ndarray::prelude::*;
pub fn multithread_outer(a: &Array1<f64>, b: &Array1<f64>) -> Array2<f64> {
    let mut result = Array2::<f64>::default((a.len(), b.len()));
    result
        .axis_iter_mut(Axis(0))
        .into_par_iter()
        .enumerate()
        .for_each(|(row_id, mut row)| {
            for (col_id, cell) in row.iter_mut().enumerate() {
                *cell = a[row_id] * b[col_id];
            }
        });
    result
}
fn main() {
    let a = Array1::from_vec(vec![1., 2., 3.]);
    let b = Array1::from_vec(vec![4., 5., 6., 7.]);
    let c = multithread_outer(&a, &b);
    println!("{}", c)
}
Output:
[[4, 5, 6, 7],
 [8, 10, 12, 14],
 [12, 15, 18, 21]]
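For comparison, the manual route mentioned above can be sketched with only the standard library. The sketch assumes a plain row-major Vec<f64> in place of ndarray's Array2. It uses chunks_mut, which splits the buffer the same way split_at_mut would, so that each scoped thread receives a disjoint block of whole rows. outer_scoped is a hypothetical name:

```rust
use std::thread;

/// Outer product of `a` and `b`, returned as a row-major `a.len() x b.len()` buffer.
/// Each thread gets its own disjoint `&mut` block of whole rows, so no Mutex is needed.
fn outer_scoped(a: &[f64], b: &[f64], threads: usize) -> Vec<f64> {
    let cols = b.len();
    let mut result = vec![0.0; a.len() * cols];
    if a.is_empty() || cols == 0 {
        return result;
    }
    let threads = threads.max(1);
    // Rows per thread, rounded up so that every row is covered.
    let rows_per_chunk = (a.len() + threads - 1) / threads;
    thread::scope(|s| {
        for (chunk_id, chunk) in result.chunks_mut(rows_per_chunk * cols).enumerate() {
            let first_row = chunk_id * rows_per_chunk;
            s.spawn(move || {
                for (r, row) in chunk.chunks_mut(cols).enumerate() {
                    let x = a[first_row + r];
                    for (cell, &y) in row.iter_mut().zip(b) {
                        *cell = x * y;
                    }
                }
            });
        }
    }); // thread::scope joins all spawned threads before returning
    result
}
```

Because thread::scope joins every thread before it returns, the borrows of a, b and result are guaranteed to end in time. No Mutex or unsafe code is needed.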

into_shape after remove_index fails

I have an &[f64] with 309,760 elements. This dataset is an array of 7040 property sets. Each property set contains a pair of f64s and 14 triplets of f64 (2 + 14 * 3 = 44 values).
I am only interested in the triplets.
I can read this dataset into an ndarray like this:
let array = Array::from_iter(data);
let mut propertysets = array.into_shape(IxDyn(&[7040, 44])).unwrap();
and I can remove the first two f64 of each property set like this:
propertysets.remove_index(Axis(1), 0);
propertysets.remove_index(Axis(1), 0);
println!("{:?}", propertysets.shape()); // [7040, 42]
which looks promising. But now I want to reshape the array into [7040, 14, 3], which should work because 3 * 14 = 42, but:
let result = propertysets.into_shape(IxDyn(&[7040, 14, 3])).unwrap();
panics with this message:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ShapeError/IncompatibleLayout: incompatible memory layout'
The documentation of remove_index says:
the elements are not deinitialized or dropped by this, just moved out of view
which probably explains why this fails. But how to do it right? Do I have to copy propertysets somehow into a new ndarray of the correct shape? But how?
Using Array::from_iter(propertysets.iter()) results in an ndarray of &f64 instead of f64.
For the into_shape operation to work, arrays have to be C-contiguous or Fortran-contiguous in memory (see docs). After
array.into_shape(IxDyn(&[7040, 44])).unwrap();
they are contiguous. But after
propertysets.remove_index(Axis(1), 0);
they are not. Why? remove_index does not reallocate or compact the array; the elements are just moved out of view (see docs). Each row still occupies 44 slots of the underlying buffer (the row stride stays 44) while only 42 of them are visible, so the visible elements are no longer adjacent in memory.
How to solve this? There are two options:
reassemble using from_shape_vec
use the new to_shape function (at the time of writing it was not yet in the docs)
Example:
use ndarray::{Array, ArrayBase, Axis, IxDyn, OwnedRepr};
fn into_shape_reassemble(data: Vec<f64>) -> Array<f64, IxDyn> {
    let array = Array::from_iter(data);
    let mut result = array.into_shape(IxDyn(&[7040, 44])).unwrap();
    result.remove_index(Axis(1), 0);
    result.remove_index(Axis(1), 0);
    // Copy the visible elements (in logical order) into a fresh, contiguous buffer.
    let result = Array::from_shape_vec((7040, 42), result.iter().cloned().collect()).unwrap();
    let result = result.into_shape(IxDyn(&[7040, 14, 3])).unwrap();
    println!("{:?}", result.shape());
    result
}
fn to_shape(data: Vec<f64>) -> ArrayBase<OwnedRepr<f64>, IxDyn> {
    let array = Array::from_iter(data);
    let mut result = array.into_shape(IxDyn(&[7040, 44])).unwrap();
    result.remove_index(Axis(1), 0);
    result.remove_index(Axis(1), 0);
    // to_shape returns a CowArray: a view if the layout allows it, otherwise a copy.
    let result = result.to_shape((7040, 14, 3)).unwrap().to_owned();
    println!("{:?}", result.shape());
    result.into_dyn()
}
#[cfg(test)]
mod tests {
    #[test]
    fn test_into_shape() {
        let data = vec![0.; 7040 * 44];
        super::into_shape_reassemble(data);
    }
    #[test]
    fn test_to_shape() {
        let data = vec![0.; 7040 * 44];
        super::to_shape(data);
    }
}
Output:
[7040, 14, 3]
[7040, 14, 3]
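The reassembly step can be illustrated without ndarray. The sketch below assumes the raw data is a flat slice of 7040 records of 44 values each, as in the question. drop_record_prefix is a hypothetical helper that skips the first two values of every record and packs the remaining 42 contiguously. That is exactly the layout a (7040, 14, 3) array in standard (row-major) order expects:

```rust
/// Drops the first `skip` values of every `stride`-long record in `data` and returns
/// the remaining values packed contiguously, in row-major order.
/// Panics if `data.len()` is not a multiple of `stride`.
fn drop_record_prefix(data: &[f64], stride: usize, skip: usize) -> Vec<f64> {
    assert!(stride > 0 && skip <= stride && data.len() % stride == 0);
    data.chunks_exact(stride)
        .flat_map(|record| record[skip..].iter().copied())
        .collect()
}
```

With ndarray, the resulting Vec could be passed straight to Array::from_shape_vec((7040, 14, 3), packed). This skips remove_index and the intermediate (7040, 42) array entirely.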

How to create statically sized array in Rust?

I have the following array of constants in Rust:
const COUNTS: [usize; 5] = [1, 2, 4, 8, 16];
And the point is that I want to iterate over it like this:
for (i, count) in COUNTS.iter().enumerate() {
    const length: usize = *count/2;
    let indices: [usize; length] = Vec::from_iter(0..length).try_into().unwrap();
    let set: IndexSet<usize> = IndexSet::from(indices);
    ...
}
The problem is that the from method of IndexSet requires a statically sized array, i.e. [T; N]. What is the proper way to create a statically sized array that contains the first half of the values? The code above fails to compile at const length: usize = *count/2, because count is not a constant.
You probably want to use .collect instead, which is the standard way to build a collection (here an IndexSet) from another collection (here a range of numbers).
use indexmap::set::IndexSet;
const COUNTS: [usize; 5] = [1, 2, 4, 8, 16];
fn main() {
    for (i, count) in COUNTS.iter().copied().enumerate() {
        let length = count / 2;
        let set = (0..length).collect::<IndexSet<_>>();
    }
}
See the playground.
You need to statically add the array size:
let indices: [usize; 5] = Vec::from_iter(0..length).try_into().unwrap();
This defeats the purpose of your code. That said, IndexSet implements FromIterator:
use indexmap::IndexSet;
const COUNTS: [usize; 5] = [1, 2, 4, 8, 16];
fn main() {
    for (i, count) in COUNTS.iter().enumerate() {
        let length: usize = *count / 2;
        let set: IndexSet<usize> = IndexSet::from_iter(0..length);
    }
}
Playground
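If you really do need a fixed-size array, for example to call a From<[T; N]> impl, the length has to be a compile-time constant, typically a const generic parameter. The following is a minimal sketch. It uses std's BTreeSet as a stand-in for IndexSet (both can be built from an array), and first_indices and index_set are hypothetical names:

```rust
use std::collections::BTreeSet;

/// Returns [0, 1, ..., N - 1]. N is a const generic, so it must be a compile-time constant.
fn first_indices<const N: usize>() -> [usize; N] {
    std::array::from_fn(|i| i)
}

/// Builds a set through `From<[T; N]>` (BTreeSet stands in for IndexSet here).
fn index_set<const N: usize>() -> BTreeSet<usize> {
    BTreeSet::from(first_indices::<N>())
}
```

The catch shows up at the call site. Each length needs its own call, such as index_set::<4>(), so a runtime loop over COUNTS cannot produce these arrays. That is why collecting from an iterator is the better fit here.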

Is there a way to shuffle two or more lists in the same order?

I want something like this pseudocode:
a = [1, 2, 3, 4];
b = [3, 4, 5, 6];
iter = a.iter_mut().zip(b.iter_mut());
shuffle(iter);
// example shuffle:
// a = [2, 4, 3, 1];
// b = [4, 6, 5, 3];
More specifically, is there some function which performs like:
fn shuffle<T>(iterator: IterMut<T>) { /* ... */ }
My specific case is trying to shuffle an Array2 by rows together with a vector (array2: ndarray::Array2<f32>, vec: Vec<usize>).
Specifically array2.iter_axis(Axis(1)).zip(vec.iter()).
Shuffling a generic iterator in-place is not possible.
However, it's pretty easy to implement shuffling for a slice:
use rand::Rng;
pub fn shufflex<T>(slice: &mut [T]) {
    let mut rng = rand::thread_rng();
    let len = slice.len();
    for i in 0..len {
        // Pick a random position in the not-yet-shuffled tail
        // (rand 0.8 range syntax; rand 0.7 used gen_range(i, len)).
        let next = rng.gen_range(i..len);
        slice.swap(i, next);
    }
}
But it's also possible to write a more general shuffle function that works on many types:
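use std::ops::IndexMut;
use rand::Rng;
// Minimal length trait; the complete playground example defines its own.
pub trait Len {
    fn len(&self) -> usize;
}
impl<T> Len for [T] {
    fn len(&self) -> usize { <[T]>::len(self) }
}
impl<T> Len for Vec<T> {
    fn len(&self) -> usize { Vec::len(self) }
}
pub fn shuffle<T>(indexable: &mut T)
where
    T: IndexMut<usize> + Len + ?Sized,
    T::Output: Copy,
{
    let mut rng = rand::thread_rng();
    let len = indexable.len();
    for i in 0..len {
        let next = rng.gen_range(i..len); // rand 0.8 syntax
        // Swap through a copy, since IndexMut alone offers no swap method.
        let tmp = indexable[i];
        indexable[i] = indexable[next];
        indexable[next] = tmp;
    }
}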
I wrote a complete example that also allows shuffling across multiple slices in the playground.
EDIT: I think I misunderstood what you want to do. To shuffle several slices in the same way, I would do this:
use rand::Rng;
pub fn shuffle<T>(slices: &mut [&mut [T]]) {
    if slices.is_empty() {
        return;
    }
    let mut rng = rand::thread_rng();
    let len = slices[0].len();
    assert!(slices.iter().all(|s| s.len() == len));
    for i in 0..len {
        let next = rng.gen_range(i..len); // rand 0.8 syntax
        // Apply the same swap to every slice so they stay aligned.
        for slice in slices.iter_mut() {
            slice.swap(i, next);
        }
    }
}
To shuffle in the same order, you can first remember the order and then reuse it for every shuffle. Starting with the Fisher-Yates shuffle from the rand crate:
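// Excerpt from rand's SliceRandom::shuffle; gen_index is an internal helper
// of rand that returns a random index below its second argument.
fn shuffle<R>(&mut self, rng: &mut R)
where R: Rng + ?Sized {
    for i in (1..self.len()).rev() {
        self.swap(i, gen_index(rng, i + 1));
    }
}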
It turns out that we need to store one random index per step: for each i from the length of the slice minus 1 down to 1, a random number between 0 and i (inclusive):
// create a vector of indices for shuffling slices of length slice_len
let indices: Vec<usize> = {
    let mut rng = rand::thread_rng();
    (1..slice_len).rev()
        .map(|i| rng.gen_range(0..=i)) // rand 0.8 syntax; 0.7 used gen_range(0, i + 1)
        .collect()
};
Then we can implement a variant of shuffle where, instead of generating new random numbers, we pick them up from the above list of random indices:
// shuffle SLICE taking indices from the provided vector
for (i, &rnd_ind) in (1..slice.len()).rev().zip(&indices) {
    slice.swap(i, rnd_ind);
}
Putting the two together, you can shuffle multiple slices in the same order using a method like this (playground):
use rand::Rng;
pub fn shuffle<T>(slices: &mut [&mut [T]]) {
    if slices.is_empty() {
        return;
    }
    let len = slices[0].len();
    // Same random choices Fisher-Yates would make: for i = len-1 down to 1, pick from 0..=i.
    let indices: Vec<usize> = {
        let mut rng = rand::thread_rng();
        (1..len).rev().map(|i| rng.gen_range(0..=i)).collect()
    };
    for slice in slices {
        // Comparing with len (not indices.len() + 1) also handles empty slices.
        assert_eq!(slice.len(), len);
        for (i, &rnd_ind) in (1..len).rev().zip(&indices) {
            slice.swap(i, rnd_ind);
        }
    }
}
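For the specific case in the question (the rows of an Array2 plus a Vec), the same precomputed swap sequence can be applied to both. The sketch below makes a few assumptions:

- the matrix is available as a contiguous row-major slice (ndarray's as_slice_mut returns one for standard-layout arrays);
- the swap indices come in the order produced above;
- the indices are passed in explicitly so the example stays deterministic.

apply_swaps_to_rows is a hypothetical name.

```rust
/// Applies one swap sequence (partner indices for i = len-1, len-2, ..., 1, as in
/// the `indices` vector above) to the rows of a row-major matrix and to `labels`,
/// so both end up permuted in the same order.
fn apply_swaps_to_rows<T, U>(matrix: &mut [T], ncols: usize, labels: &mut [U], swaps: &[usize]) {
    let nrows = labels.len();
    assert_eq!(matrix.len(), nrows * ncols);
    for (i, &j) in (1..nrows).rev().zip(swaps) {
        labels.swap(i, j);
        // Swap row i with row j, element by element.
        for c in 0..ncols {
            matrix.swap(i * ncols + c, j * ncols + c);
        }
    }
}
```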

Best way to remove elements of Vec depending on other elements of the same Vec

I have a vector of sets and I want to remove all sets that are subsets of other sets in the vector. Example:
a = {0, 3, 5}
b = {0, 5}
c = {0, 2, 3}
In this case I would like to remove b, because it's a subset of a. I'm fine with using a "dumb" n² algorithm.
Sadly, it's pretty tricky to get it working with the borrow checker. The best I've come up with is (Playground):
let mut v: Vec<HashSet<u8>> = vec![];
let mut to_delete = Vec::new();
for (i, set_a) in v.iter().enumerate().rev() {
    for set_b in &v[..i] {
        if set_a.is_subset(&set_b) {
            to_delete.push(i);
            break;
        }
    }
}
for i in to_delete {
    v.swap_remove(i);
}
(note: the code above is not correct! See comments for further details)
I see a few disadvantages:
I need an additional vector with additional allocations
Maybe there are more efficient ways than calling swap_remove often
If I need to preserve order, I can't use swap_remove, but have to use remove which is slow
Is there a better way to do this? I'm not just asking about my use case, but about the general case as it's described in the title.
Here is a solution that does not make additional allocations and preserves the order:
fn product_retain<T, F>(v: &mut Vec<T>, mut pred: F)
where
    F: FnMut(&T, &T) -> bool,
{
    let mut j = 0;
    for i in 0..v.len() {
        // invariants:
        // items v[0..j] will be kept
        // items v[j..i] will be removed
        if (0..j).chain(i + 1..v.len()).all(|a| pred(&v[i], &v[a])) {
            v.swap(i, j);
            j += 1;
        }
    }
    v.truncate(j);
}
fn main() {
    // test with a simpler example
    // unique elements
    let mut v = vec![1, 2, 3];
    product_retain(&mut v, |a, b| a != b);
    assert_eq!(vec![1, 2, 3], v);
    let mut v = vec![1, 3, 2, 4, 5, 1, 2, 4];
    product_retain(&mut v, |a, b| a != b);
    assert_eq!(vec![3, 5, 1, 2, 4], v);
}
This is a kind of partition algorithm. The elements in the first partition will be kept and in the second partition will be removed.
You can use a while loop instead of the for:
use std::collections::HashSet;
fn main() {
    let arr: &[&[u8]] = &[
        &[3],
        &[1, 2, 3],
        &[1, 3],
        &[1, 4],
        &[2, 3],
    ];
    let mut v: Vec<HashSet<u8>> = arr.iter()
        .map(|x| x.iter().cloned().collect())
        .collect();
    let mut pos = 0;
    while pos < v.len() {
        let is_sub = v[pos + 1..].iter().any(|x| v[pos].is_subset(x))
            || v[..pos].iter().any(|x| v[pos].is_subset(x));
        if is_sub {
            v.swap_remove(pos);
        } else {
            pos += 1;
        }
    }
    println!("{:?}", v);
}
There are no additional allocations.
To avoid using remove and swap_remove, you can change the type of vector to Vec<Option<HashSet<u8>>>:
use std::collections::HashSet;
fn main() {
    let arr: &[&[u8]] = &[
        &[3],
        &[1, 2, 3],
        &[1, 3],
        &[1, 4],
        &[2, 3],
    ];
    let mut v: Vec<Option<HashSet<u8>>> = arr.iter()
        .map(|x| Some(x.iter().cloned().collect()))
        .collect();
    for pos in 0..v.len() {
        let is_sub = match v[pos].as_ref() {
            // flatten() skips the entries that were already replaced by None
            Some(chk) =>
                v[..pos].iter().flatten().any(|x| chk.is_subset(x))
                || v[pos + 1..].iter().flatten().any(|x| chk.is_subset(x)),
            None => false,
        };
        if is_sub {
            v[pos] = None; // Replace with None instead of removing
        }
    }
    println!("{:?}", v); // [None, Some({3, 2, 1}), None, Some({1, 4}), None] (HashSet order may vary)
}
I need an additional vector with additional allocations
I wouldn't worry about that allocation, since the memory and runtime footprint of that allocation will be really small compared to the rest of your algorithm.
Maybe there are more efficient ways than calling swap_remove often.
If I need to preserve order, I can't use swap_remove, but have to use remove which is slow
I'd change to_delete from Vec<usize> to Vec<bool> and just mark whether a particular set should be removed. You can then use Vec::retain, which conditionally removes elements while preserving order. Unfortunately, retain doesn't pass the index to the closure, so we need a workaround (playground):
let mut to_delete = vec![false; v.len()];
for (i, set_a) in v.iter().enumerate().rev() {
    for set_b in &v[..i] {
        if set_a.is_subset(set_b) {
            to_delete[i] = true;
        }
    }
}
{
    // Vec::retain is documented to visit each element exactly once, in the
    // original order, so a running counter tracks the current index.
    let mut i = 0;
    v.retain(|_| {
        let ret = !to_delete[i];
        i += 1;
        ret
    });
}
If your sets have a special value that can never occur under normal conditions, you can use it to mark a set as "to delete" and then check for it in retain (this would require changing the outer loop from iterator-based to range-based, though).
Side note (if that HashSet<u8> is not just a toy example): a more efficient way to store and compare sets of small integers would be to use a bitset.
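The bitset idea can be sketched as follows. The sketch assumes the values are u8, so a set needs 256 bits, stored here as four u64 words. With that layout a subset test is just a few AND/NOT operations. ByteSet and remove_subsets are hypothetical names. remove_subsets keeps one copy of identical sets, which is a choice the question leaves open:

```rust
/// A set of u8 values stored as a 256-bit bitmap (4 x u64).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
struct ByteSet([u64; 4]);

impl ByteSet {
    fn from_slice(values: &[u8]) -> Self {
        let mut bits = [0u64; 4];
        for &v in values {
            bits[(v / 64) as usize] |= 1u64 << (v % 64);
        }
        ByteSet(bits)
    }

    /// `self` is a subset of `other` iff no bit set in `self` is missing from `other`.
    fn is_subset(&self, other: &ByteSet) -> bool {
        self.0.iter().zip(other.0.iter()).all(|(a, b)| a & !b == 0)
    }
}

/// Removes every set that is a subset of another set, preserving order.
/// Of several identical sets, only the first is kept.
fn remove_subsets(sets: &mut Vec<ByteSet>) {
    let snapshot = sets.clone(); // cheap: each ByteSet is 32 bytes
    let mut idx = 0;
    sets.retain(|s| {
        let i = idx;
        idx += 1;
        !snapshot
            .iter()
            .enumerate()
            .any(|(j, t)| j != i && s.is_subset(t) && (s != t || j < i))
    });
}
```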
