How can I group consecutive integers in a vector in Rust? - rust

I have a Vec<i64> and I want to know all the groups of integers that are consecutive. As an example:
let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
I'm expecting something like this or similar:
[[1, 2, 3], [5, 6, 7], [9, 10]];
The exact representation (a vector of vectors, tuples, or something else) doesn't matter, as long as I get separate groups of consecutive numbers.
At first glance it seems I'll need itertools and its group_by function, but I have no idea how to use it for this...

You can indeed use group_by for this, but you might not really want to. Here's what I would probably write instead:
fn consecutive_slices(data: &[i64]) -> Vec<&[i64]> {
    let mut slice_start = 0;
    let mut result = Vec::new();
    for i in 1..data.len() {
        // A gap between neighbours ends the current run.
        if data[i - 1] + 1 != data[i] {
            result.push(&data[slice_start..i]);
            slice_start = i;
        }
    }
    // The final run is still open when the loop ends.
    if !data.is_empty() {
        result.push(&data[slice_start..]);
    }
    result
}
This is similar in principle to eXodiquas' answer, but instead of accumulating a Vec<Vec<i64>>, I use the indices to accumulate a Vec of slice references that refer to the original data. (This question explains why I made consecutive_slices take &[T].)
It's also possible to do the same thing without allocating a Vec, by returning an iterator; however, I like the above version better. Here's the zero-allocation version I came up with:
fn consecutive_slices(data: &[i64]) -> impl Iterator<Item = &[i64]> {
    let mut slice_start = 0;
    (1..=data.len()).flat_map(move |i| {
        // Emit a slice at the end of the data or wherever a run breaks.
        if i == data.len() || data[i - 1] + 1 != data[i] {
            let begin = slice_start;
            slice_start = i;
            Some(&data[begin..i])
        } else {
            None
        }
    })
}
It's not as readable as a for loop, but it doesn't need to allocate a Vec for the return value, so this version is more flexible.
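Here's a "more functional" version using group_by (renamed to chunk_by in itertools 0.13):
use itertools::Itertools;

fn consecutive_slices(data: &[i64]) -> Vec<Vec<i64>> {
    // The key is computed in i64: `data[i] as usize - i` can underflow
    // when an element is smaller than its index.
    (&(0..data.len()).group_by(|&i| data[i] - i as i64))
        .into_iter()
        .map(|(_, group)| group.map(|i| data[i]).collect())
        .collect()
}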
The idea is to make a key function for group_by that takes the difference between each element and its index in the slice. Consecutive elements will have the same key because indices increase by 1 each time. One reason I don't like this version is that it's quite difficult to get slices of the original data structure; you almost have to create a Vec<Vec<i64>> (hence the two collects). The other reason is that I find it harder to read.
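To make the key idea concrete, here is a small std-only sketch that just computes the key (element minus index) for every position. The helper name group_keys is made up for this illustration and is not part of itertools.

```rust
// Hypothetical helper: the grouping key for every position,
// i.e. the element minus its index in the slice.
fn group_keys(data: &[i64]) -> Vec<i64> {
    data.iter()
        .enumerate()
        .map(|(i, &x)| x - i as i64)
        .collect()
}
```

For the input from the question, [1, 2, 3, 5, 6, 7, 9, 10], the keys come out as [1, 1, 1, 2, 2, 2, 3, 3], so each run of consecutive integers shares one key and group_by splits exactly where the key changes.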
However, when I first wrote my preferred version (the first one, with the for loop), it had a bug (now fixed), while the other two versions were correct from the start. So there may be merit to writing denser code with functional abstractions, even if there is some hit to readability and/or performance.
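On newer toolchains there may be a shorter route: since Rust 1.77 slices have a chunk_by method in the standard library that splits a slice wherever a predicate on two neighbouring elements fails. A minimal sketch, assuming that toolchain version (the function name consecutive_slices_std is made up here):

```rust
// Sketch assuming Rust 1.77+, where slice::chunk_by is stable.
// Neighbours stay in the same chunk while `b` is exactly `a + 1`;
// checked_add avoids an overflow panic at i64::MAX.
fn consecutive_slices_std(data: &[i64]) -> Vec<&[i64]> {
    data.chunk_by(|a, b| a.checked_add(1) == Some(*b)).collect()
}
```

Like the for-loop version, this borrows from the original data instead of copying it, and dropping the collect gives a lazy iterator of slices.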

let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
let mut res = Vec::new();
let mut prev = v[0];
let mut sub_v = Vec::new();
sub_v.push(prev);
for i in 1..v.len() {
if v[i] == prev + 1 {
sub_v.push(v[i]);
prev = v[i];
} else {
res.push(sub_v.clone());
sub_v.clear();
sub_v.push(v[i]);
prev = v[i];
}
}
res.push(sub_v);
This should solve your problem.
This iterates over the given vector and checks whether the current i64 (in my case i32) is exactly one more than the previous value. If so, it is pushed onto the current group (sub_v). When the run breaks, sub_v is pushed onto the result vector and a new group is started. Note that v[0] panics if the vector is empty.
But I guess you wanted something functional?

Another possible solution, that uses std only, could be:
fn consecutive_slices(v: &[i64]) -> Vec<Vec<i64>> {
let t: Vec<Vec<i64>> = v
.into_iter()
.chain([*v.last().unwrap_or(&-1)].iter())
.scan(Vec::new(), |s, &e| {
match s.last() {
None => { s.push(e); Some((false, Vec::new())) },
Some(&p) if p == e - 1 => { s.push(e); Some((false, Vec::new()))},
Some(&p) if p != e - 1 => {let o = s.clone(); *s = vec![e]; Some((true, o))},
_ => None,
}
})
.filter_map(|(n, v)| {
match n {
true => Some(v.clone()),
false => None,
}
})
.collect();
t
}
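The chain appends a sentinel (a copy of the last element) to the input. The sentinel never continues the current run, so it forces the final group to be emitted.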

I like the answers above but you could also use peekable() to tell if the next value is different.
https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html
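A minimal sketch of what the peekable() approach could look like. The function name consecutive_groups and the choice to return owned Vecs are assumptions made for this example.

```rust
// Sketch of the peekable() idea: look at the next value without
// consuming it to decide whether the current group ends here.
fn consecutive_groups(data: &[i64]) -> Vec<Vec<i64>> {
    let mut result = Vec::new();
    let mut current = Vec::new();
    let mut iter = data.iter().copied().peekable();
    while let Some(x) = iter.next() {
        current.push(x);
        // The run continues only if there is a next value and it is x + 1.
        let run_continues = iter
            .peek()
            .map_or(false, |&next| Some(next) == x.checked_add(1));
        if !run_continues {
            // Move the finished group out and leave an empty Vec behind.
            result.push(std::mem::take(&mut current));
        }
    }
    result
}
```

Because the group is closed as soon as the next value does not fit, no special handling is needed for the last group or for empty input.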

I would probably use a fold for this?
That's because I'm very much a functional programmer.
Obviously mutating the accumulator is weird :P but this works too and represents another way of thinking about it.
This is basically a recursive solution and can be modified easily to use immutable datastructures.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=43b9e3613c16cb988da58f08724471a4
fn main() {
let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
let mut res: Vec<Vec<i32>> = vec![];
let (last_group, _): (Vec<i32>, Option<i32>) = v
.iter()
.fold((vec![], None), |(mut cur_group, last), x| {
match last {
None => {
cur_group.push(*x);
(cur_group, Some(*x))
}
Some(last) => {
if x - last == 1 {
cur_group.push(*x);
(cur_group, Some(*x))
} else {
res.push(cur_group);
(vec![*x], Some(*x))
}
}
}
});
res.push(last_group);
println!("{:?}", res);
}

Related

How to let several threads write to the same variable without mutex in Rust?

I am trying to implement an outer function that could calculate the outer product of two 1D arrays. Something like this:
use std::thread;
use ndarray::prelude::*;
pub fn multithread_outer(A: &Array1<f64>, B: &Array1<f64>) -> Array2<f64> {
let mut result = Array2::<f64>::default((A.len(), B.len()));
let thread_num = 5;
let n = A.len() / thread_num;
// a & b are ArcArray2<f64>
let a = A.to_owned().into_shared();
let b = B.to_owned().into_shared();
for i in 0..thread_num{
let a = a.clone();
let b = b.clone();
thread::spawn(move || {
for j in i * n..(i + 1) * n {
for k in 0..b.len() {
// This is the line I want to change
result[[j, k]] = a[j] * b[k];
}
}
});
}
// Use join to make sure all threads finish here
// Not so related to this question, so I didn't put it here
result
}
You can see that by design, two threads will never write to the same element. However, the Rust compiler will not allow two mutable references to the same result variable, and using a mutex would make this much slower. What is the right way to implement this function?
While it is possible to do manually (with thread::scope and split_at_mut, for example), ndarray already has parallel iteration integrated into its library, based on rayon:
https://docs.rs/ndarray/latest/ndarray/parallel
Here is what your code would look like with parallel iterators:
use ndarray::parallel::prelude::*;
use ndarray::prelude::*;
pub fn multithread_outer(a: &Array1<f64>, b: &Array1<f64>) -> Array2<f64> {
let mut result = Array2::<f64>::default((a.len(), b.len()));
result
.axis_iter_mut(Axis(0))
.into_par_iter()
.enumerate()
.for_each(|(row_id, mut row)| {
for (col_id, cell) in row.iter_mut().enumerate() {
*cell = a[row_id] * b[col_id];
}
});
result
}
fn main() {
let a = Array1::from_vec(vec![1., 2., 3.]);
let b = Array1::from_vec(vec![4., 5., 6., 7.]);
let c = multithread_outer(&a, &b);
println!("{}", c)
}
[[4, 5, 6, 7],
[8, 10, 12, 14],
[12, 15, 18, 21]]
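For comparison, the manual route mentioned at the start of this answer might look roughly like the sketch below. It uses only std, stores the result as a flat row-major Vec<f64> rather than an Array2, and uses chunks_mut (a relative of split_at_mut) to hand each thread its own block of rows. The function name scoped_outer and the thread_num parameter are assumptions for this example.

```rust
use std::thread;

// Hypothetical std-only variant: the result is a flat, row-major Vec<f64>
// of a.len() rows and b.len() columns instead of an ndarray Array2.
fn scoped_outer(a: &[f64], b: &[f64], thread_num: usize) -> Vec<f64> {
    let cols = b.len();
    let mut result = vec![0.0; a.len() * cols];
    if result.is_empty() {
        return result;
    }
    // Rows handled per thread, rounded up so that every row is covered.
    let threads = thread_num.max(1);
    let rows_per_thread = (a.len() + threads - 1) / threads;
    thread::scope(|s| {
        // chunks_mut yields non-overlapping &mut [f64] blocks, so each
        // thread has exclusive access to its rows without a mutex.
        for (chunk_id, block) in result.chunks_mut(rows_per_thread * cols).enumerate() {
            s.spawn(move || {
                for (r, row) in block.chunks_mut(cols).enumerate() {
                    let a_val = a[chunk_id * rows_per_thread + r];
                    for (cell, &b_val) in row.iter_mut().zip(b) {
                        *cell = a_val * b_val;
                    }
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
    result
}
```

The borrow checker accepts this because the mutable blocks are provably disjoint and thread::scope guarantees that the threads finish before result is used again.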

How can I retain vector elements with their original index?

If I have a Vec I can iterate over elements using an index via v.iter().enumerate(), and I can remove elements via v.retain(). Is there a way to do both at once?
In this case the index could no longer be used to access the element - it would be the index of the element before the loop was started.
I can implement this myself but to be as efficient as .retain() I'd need to use unsafe, which I'd like to avoid.
This is the result I want:
let mut v: Vec<i32> = vec![1, 2, 3, 4, 5, 4, 7, 8];
v.iter()
.retain_with_index(|(index, item)| (index % 2 == 0) || item == 4);
assert_eq!(v, vec![1, 3, 4, 5, 4, 7]);
@Timmmm's and @Hauleth's answers are quite pragmatic; I wanted to provide a couple of alternatives.
Here's a playground with some benchmarks and tests:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=cffc3c39c4b33d981a1a034f3a092e7b
This is ugly, but if you really want a v.retain_with_index() method, you could do a little copy-pasting of the retain method with a new trait:
trait IndexedRetain<T> {
fn retain_with_index<F>(&mut self, f: F)
where
F: FnMut(usize, &T) -> bool;
}
impl<T> IndexedRetain<T> for Vec<T> {
fn retain_with_index<F>(&mut self, mut f: F)
where
F: FnMut(usize, &T) -> bool, // the signature of the callback changes
{
let len = self.len();
let mut del = 0;
{
let v = &mut **self;
for i in 0..len {
// only implementation change here
if !f(i, &v[i]) {
del += 1;
} else if del > 0 {
v.swap(i - del, i);
}
}
}
if del > 0 {
self.truncate(len - del);
}
}
}
such that the example would look like this:
v.retain_with_index(|index, item| (index % 2 == 0) || *item == 4);
Or... better yet, you could use a higher-order function:
fn with_index<T, F>(mut f: F) -> impl FnMut(&T) -> bool
where
F: FnMut(usize, &T) -> bool,
{
let mut i = 0;
move |item| (f(i, item), i += 1).0
}
such that the example would now look like this:
v.retain(with_index(|index, item| (index % 2 == 0) || *item == 4));
(my preference is the latter)
I found essentially the same question on the Rust User's Forum. They suggested this solution, which isn't too bad:
let mut index = 0;
v.retain(|item| {
index += 1;
((index - 1) % 2 == 0) || *item == 4
});
At the time it wasn't a valid solution because the iteration order of retain() was not guaranteed, but happily for me someone in that thread documented the order so now it is. :-)
If you want to enumerate, filter (retain), and then collect the resulting vector then I would say to do exactly that:
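let v: Vec<i32> = v
.iter()
.enumerate()
// compare in i32: the index is a usize, the value is an i32
.filter(|&(idx, &val)| val > idx as i32)
.map(|(_, &val)| val)
.collect();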

Accessing an array while iterating over it mutably

Disclaimer: I am fairly new to Rust.
Simplified Use-Case
From the best practices I have read about Rust so far, I understand that iterating with for elem in array {} is preferred over for i in 0..array.len() {}.
Is there any way to iterate over an array mutably, while simultaneously accessing specific elements from it by index?
My usecase is quite complex, so I wrote a simple fibonacci calculator to demonstrate the problem:
let mut arr = vec![0;10];
arr[0] = 1;
arr[1] = 1;
for (i, elem) in arr.iter_mut().skip(2).enumerate() {
*elem = arr[i-2] + arr[i-1];
}
println!("{:?}", arr);
error[E0502]: cannot borrow arr as immutable because it is also borrowed as mutable
Of course this makes sense, but is there a way around it? From a programmer's perspective, it is obvious that this code is safe, because we borrow immutably from an array that we already hold mutably in the current context, just not directly, but through an iterator.
Of course, if I implement it by iterating over the indices, it works:
let mut arr = vec![0;10];
arr[0] = 1;
arr[1] = 1;
for i in 2..arr.len(){
arr[i] = arr[i-2] + arr[i-1];
}
println!("{:?}", arr);
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
So, my question, is there another way to solve this problem, or do I have to use the second version?
Real use-case
This code is to demonstrate my use-case and does not do anything on its own.
let mut labels = vec![vec![0; width]; height];
for (y, row) in labels.iter_mut().enumerate() {
for (x, label) in row.iter_mut().enumerate() {
let label_left = {
if x > 0 && some_condition() {
Some(labels[y][x - 1]) // <== Fails
} else {
None
}
};
let label_top = {
if y > 0 && some_condition() {
Some(labels[y - 1][x]) // <== Fails
} else {
None
}
};
*label = some_function(label_left, label_top);
}
}
Rewriting this with a 2D-index based iteration feels a lot like I'm trying to force C programming style into Rust, so I can't believe it's the intended way.
A more 'functional' way to implement your simplified use-case could be:
fn main() {
let mut arr = vec![0; 10];
arr[0] = 1;
arr[1] = 1;
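// Keep the two seed values, then append the values produced by scan;
// without the chain the result would start at the third element.
let arr: Vec<i32> = arr[..2]
.iter()
.copied()
.chain(arr.iter().skip(2).scan((arr[0], arr[1]), |pair, _| {
let (a, b) = *pair;
let c = a + b;
*pair = (b, c);
Some(c)
}))
.collect();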
println!("{:?}", arr);
}
But this is not necessarily more rusty or easier to read than iterating with index. That said, if you are willing to go down the FP rabbit hole, it can be very rewarding.
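If you would rather keep writing into the existing vector, one option is to split the borrow explicitly with split_at_mut, which returns two non-overlapping mutable slices. The sketch below is only an illustration of that idea; the function name fib_table and the use of u64 are assumptions for this example.

```rust
// Sketch: read earlier elements and write the current one at the same
// time by splitting the vector into two disjoint slices.
fn fib_table(n: usize) -> Vec<u64> {
    // Every slot starts at 1, which already covers the two seed values.
    let mut arr = vec![1u64; n];
    for i in 2..n {
        // `head` is arr[..i] (read from), `tail` is arr[i..] (written to).
        let (head, tail) = arr.split_at_mut(i);
        tail[0] = head[i - 2] + head[i - 1];
    }
    arr
}
```

The same trick should carry over to the 2D case: splitting labels at row y gives read access to all rows above it while row y is being modified.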

Best way to remove elements of Vec depending on other elements of the same Vec

I have a vector of sets and I want to remove all sets that are subsets of other sets in the vector. Example:
a = {0, 3, 5}
b = {0, 5}
c = {0, 2, 3}
In this case I would like to remove b, because it's a subset of a. I'm fine with using a "dumb" n² algorithm.
Sadly, it's pretty tricky to get it working with the borrow checker. The best I've come up with is (Playground):
let mut v: Vec<HashSet<u8>> = vec![];
let mut to_delete = Vec::new();
for (i, set_a) in v.iter().enumerate().rev() {
for set_b in &v[..i] {
if set_a.is_subset(&set_b) {
to_delete.push(i);
break;
}
}
}
for i in to_delete {
v.swap_remove(i);
}
(note: the code above is not correct! See comments for further details)
I see a few disadvantages:
I need an additional vector with additional allocations
Maybe there are more efficient ways than calling swap_remove often
If I need to preserve order, I can't use swap_remove, but have to use remove which is slow
Is there a better way to do this? I'm not just asking about my use case, but about the general case as it's described in the title.
Here is a solution that does not make additional allocations and preserves the order:
fn product_retain<T, F>(v: &mut Vec<T>, mut pred: F)
where F: FnMut(&T, &T) -> bool
{
let mut j = 0;
for i in 0..v.len() {
// invariants:
// items v[0..j] will be kept
// items v[j..i] will be removed
if (0..j).chain(i + 1..v.len()).all(|a| pred(&v[i], &v[a])) {
v.swap(i, j);
j += 1;
}
}
v.truncate(j);
}
fn main() {
// test with a simpler example
// unique elements
let mut v = vec![1, 2, 3];
product_retain(&mut v, |a, b| a != b);
assert_eq!(vec![1, 2, 3], v);
let mut v = vec![1, 3, 2, 4, 5, 1, 2, 4];
product_retain(&mut v, |a, b| a != b);
assert_eq!(vec![3, 5, 1, 2, 4], v);
}
This is a kind of partition algorithm. The elements in the first partition will be kept and in the second partition will be removed.
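Applied to the sets from the question, this could be used roughly as follows. This is a sketch: remove_subsets is a made-up wrapper name, and product_retain is repeated only so that the example is self-contained.

```rust
use std::collections::HashSet;

// Same algorithm as product_retain above, repeated for self-containment.
fn product_retain<T, F>(v: &mut Vec<T>, mut pred: F)
where
    F: FnMut(&T, &T) -> bool,
{
    let mut j = 0;
    for i in 0..v.len() {
        // Compare v[i] against the kept items and the not-yet-visited items.
        if (0..j).chain(i + 1..v.len()).all(|a| pred(&v[i], &v[a])) {
            v.swap(i, j);
            j += 1;
        }
    }
    v.truncate(j);
}

// Hypothetical wrapper: drop every set that is a subset of another set.
fn remove_subsets(v: &mut Vec<HashSet<u8>>) {
    product_retain(v, |a, b| !a.is_subset(b));
}
```

With a = {0, 3, 5}, b = {0, 5} and c = {0, 2, 3}, only a and c remain, in their original order. If two sets are equal, the earlier one is dropped and the later one is kept.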
You can use a while loop instead of the for:
use std::collections::HashSet;
fn main() {
let arr: &[&[u8]] = &[
&[3],
&[1,2,3],
&[1,3],
&[1,4],
&[2,3]
];
let mut v:Vec<HashSet<u8>> = arr.iter()
.map(|x| x.iter().cloned().collect())
.collect();
let mut pos = 0;
while pos < v.len() {
let is_sub = v[pos+1..].iter().any(|x| v[pos].is_subset(x))
|| v[..pos].iter().any(|x| v[pos].is_subset(x));
if is_sub {
v.swap_remove(pos);
} else {
pos+=1;
}
}
println!("{:?}", v);
}
There are no additional allocations.
To avoid using remove and swap_remove, you can change the type of vector to Vec<Option<HashSet<u8>>>:
use std::collections::HashSet;
fn main() {
let arr: &[&[u8]] = &[
&[3],
&[1,2,3],
&[1,3],
&[1,4],
&[2,3]
];
let mut v:Vec<Option<HashSet<u8>>> = arr.iter()
.map(|x| Some(x.iter().cloned().collect()))
.collect();
for pos in 0..v.len(){
let is_sub = match v[pos].as_ref() {
Some(chk) =>
v[..pos].iter().flat_map(|x| x).any(|x| chk.is_subset(x))
|| v[pos+1..].iter().flat_map(|x| x).any(|x| chk.is_subset(x)),
None => false,
};
if is_sub { v[pos] = None }; // replace with None instead of removing
}
println!("{:?}", v);//[None, Some({3, 2, 1}), None, Some({1, 4}), None]
}
"I need an additional vector with additional allocations"
I wouldn't worry about that allocation, since the memory and runtime footprint of that allocation will be really small compared to the rest of your algorithm.
"Maybe there are more efficient ways than calling swap_remove often."
"If I need to preserve order, I can't use swap_remove, but have to use remove which is slow"
I'd change to_delete from Vec<usize> to Vec<bool> and just mark whether a particular hash set should be removed. You can then use Vec::retain, which conditionally removes elements while preserving order. Unfortunately, this function doesn't pass the index to the closure, so we have to create a workaround (playground):
let mut to_delete = vec![false; v.len()];
for (i, set_a) in v.iter().enumerate().rev() {
for set_b in &v[..i] {
if set_a.is_subset(&set_b) {
to_delete[i] = true;
}
}
}
{
// This relies on retain visiting each element exactly once, in the original order.
let mut i = 0;
v.retain(|_| {
let ret = !to_delete[i];
i += 1;
ret
});
}
If your hash set has a special value that can never occur under normal conditions, you can use it to mark a set as "to delete", and then check that condition in retain (this would require changing the outer loop from iterator-based to range-based, though).
Sidenote (if that HashSet<u8> is not just a toy example): a more efficient way to store and compare sets of small integers would be to use a bitset.
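As a rough illustration of the bitset idea, here is a sketch that assumes every element is below 64, so that a whole set fits into a single u64. The names to_bitset and is_subset are made up for this example; arbitrary u8 values would need four such words or a bitset crate.

```rust
// Sketch assuming all values are < 64: bit x is set when x is in the set.
fn to_bitset(items: &[u8]) -> u64 {
    items.iter().fold(0, |bits, &x| {
        assert!(x < 64, "this sketch only handles values below 64");
        bits | (1u64 << x)
    })
}

// `a` is a subset of `b` exactly when `a` has no bit that `b` lacks.
fn is_subset(a: u64, b: u64) -> bool {
    a & !b == 0
}
```

The subset test then becomes two bitwise operations instead of a hash lookup per element.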

Is there any way to unpack an iterator into a tuple?

Is there any way to accomplish something like the following:
let v = vec![1, 2, 3];
let (a, b) = v.iter().take(2);
Such that a = 1 and b = 2 at the end?
I know I could just use a vector but I would like to have named variables.
The itertools crate has methods like tuples and next_tuple that can help with this.
use itertools::Itertools; // 0.9.0
fn main() {
let v = vec![1, 2, 3];
let (a, b) = v.iter().next_tuple().unwrap();
assert_eq!(a, &1);
assert_eq!(b, &2);
}
This may not be exactly what you asked for, but I suppose you rarely want to convert an arbitrarily large vector to a tuple anyway. If you just want to extract the first few elements of a vector into a tuple, you can do so using slice pattern matching:
fn main() {
let v = vec![1, 2, 3];
let (a, b) = match &v[..] {
&[first, second, ..] => (first, second),
_ => unreachable!(),
};
assert_eq!((a, b), (1, 2));
}
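If the source is an arbitrary iterator rather than a slice, and you don't want to pull in itertools, a small helper along these lines might do. The name first_two is hypothetical; it uses only the standard library.

```rust
// Hypothetical helper: take the first two items of any iterator as a
// tuple, or None if the iterator yields fewer than two items.
fn first_two<I: Iterator>(mut iter: I) -> Option<(I::Item, I::Item)> {
    let a = iter.next()?;
    let b = iter.next()?;
    Some((a, b))
}
```

It would be used as let (a, b) = first_two(v.iter()).unwrap(); and, unlike the slice pattern, it reports a too-short input as None instead of reaching unreachable!().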
I wrote this ugly recursive macro that converts a Vec to a tuple because I wanted to learn something about macros.
macro_rules! tuplet {
{ ($y:ident $(, $x:ident)*) = $v:expr } => {
let ($y, $($x),*) = tuplet!($v ; 1 ; ($($x),*) ; ($v[0]) );
};
{ $v:expr ; $j:expr ; ($y:ident $(, $x:ident)*) ; ($($a:expr),*) } => {
tuplet!( $v ; $j+1 ; ($($x),*) ; ($($a),*,$v[$j]) )
};
{ $v:expr ; $j:expr ; () ; $accu:expr } => {
$accu
}
}
I am new to this and probably very bad at it, so there's most likely a better way to do it. This is just a proof of concept. It allows you to write:
fn main() {
let v = vec![1, 2, 3];
tuplet!((a, b, c) = v);
assert_eq!(a, 1);
assert_eq!(b, 2);
assert_eq!(c, 3);
}
Somewhere in that macro definition you find the part $v[$j], which you could replace by $v.nth($j) if you want to use it for iterators.
gcp is on the right track; his answer seems like the correct one to me.
I'm going to give a more compelling example, though, since the OP seemed in a comment to wonder whether what he asked for is even worthwhile ("I can't think of a good enough reason for this functionality to be possible."). Check out the Person::from_csv function below:
use itertools::Itertools;
#[derive(Debug)]
struct Person<'a> {
first: &'a str,
last: &'a str,
}
impl<'a> Person<'a> {
// Create a Person from a str of form "last,first".
fn from_csv(s: &'a str) -> Option<Self> {
s.split(',').collect_tuple().map(
|(last, first)| Person { first, last }
)
}
}
fn main() {
dbg!(Person::from_csv("Doe")); // None
dbg!(Person::from_csv("Doe,John")); // Some(...)
dbg!(Person::from_csv("Doe,John,foo")); // None
}
It takes the Iterator produced by split and collects the results into a tuple so that we can match and destructure it. If there are too many or too few commas, you won't get a matching tuple. This code is clean because collect_tuple lets us use pattern matching and destructuring.
