How can I retain vector elements with their original index?

If I have a Vec I can iterate over elements using an index via v.iter().enumerate(), and I can remove elements via v.retain(). Is there a way to do both at once?
In this case the index could no longer be used to access the element - it would be the index of the element before the loop was started.
I can implement this myself but to be as efficient as .retain() I'd need to use unsafe, which I'd like to avoid.
This is the result I want:
let mut v: Vec<i32> = vec![1, 2, 3, 4, 5, 4, 7, 8];
// retain_with_index is the hypothetical method I'm looking for
v.retain_with_index(|index, item| (index % 2 == 0) || *item == 4);
assert_eq!(v, vec![1, 3, 4, 5, 4, 7]);

@Timmmm's and @Hauleth's answers are quite pragmatic; I wanted to provide a couple of alternatives.
Here's a playground with some benchmarks and tests:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=cffc3c39c4b33d981a1a034f3a092e7b
This is ugly, but if you really want a v.retain_with_index() method, you could do a little copy-pasting of the retain method with a new trait:
trait IndexedRetain<T> {
fn retain_with_index<F>(&mut self, f: F)
where
F: FnMut(usize, &T) -> bool;
}
impl<T> IndexedRetain<T> for Vec<T> {
fn retain_with_index<F>(&mut self, mut f: F)
where
F: FnMut(usize, &T) -> bool, // the signature of the callback changes
{
let len = self.len();
let mut del = 0;
{
let v = &mut **self;
for i in 0..len {
// only implementation change here
if !f(i, &v[i]) {
del += 1;
} else if del > 0 {
v.swap(i - del, i);
}
}
}
if del > 0 {
self.truncate(len - del);
}
}
}
such that the example would look like this:
v.retain_with_index(|index, item| (index % 2 == 0) || *item == 4);
Or... better yet, you could use a higher-order function:
fn with_index<T, F>(mut f: F) -> impl FnMut(&T) -> bool
where
F: FnMut(usize, &T) -> bool,
{
let mut i = 0;
move |item| (f(i, item), i += 1).0
}
such that the example would now look like this:
v.retain(with_index(|index, item| (index % 2 == 0) || *item == 4));
(my preference is the latter)
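For reference, here is the higher-order version as one self-contained sketch, run against the data from the question. with_index is the adapter from above, written out with an explicit block; keep_even_index_or_four is a name made up for this illustration.

```rust
/// Wraps an index-aware predicate so it can be passed to `Vec::retain`.
/// The counter relies on `retain` visiting elements in their original order.
fn with_index<T, F>(mut f: F) -> impl FnMut(&T) -> bool
where
    F: FnMut(usize, &T) -> bool,
{
    let mut i = 0;
    move |item| {
        let keep = f(i, item);
        i += 1; // advance to the original index of the next element
        keep
    }
}

/// Hypothetical helper: keeps elements at even original indices, plus every 4.
fn keep_even_index_or_four(mut v: Vec<i32>) -> Vec<i32> {
    v.retain(with_index(|index, item| index % 2 == 0 || *item == 4));
    v
}
```

Calling keep_even_index_or_four(vec![1, 2, 3, 4, 5, 4, 7, 8]) gives [1, 3, 4, 5, 4, 7], the result asked for in the question.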

I found essentially the same question on the Rust User's Forum. They suggested this solution, which isn't too bad:
let mut index = 0;
v.retain(|item| {
index += 1;
((index - 1) % 2 == 0) || *item == 4
});
At the time it wasn't a valid solution because the iteration order of retain() was not guaranteed, but happily for me someone in that thread documented the order so now it is. :-)

If you want to enumerate, filter (retain), and then collect the resulting vector then I would say to do exactly that:
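let kept: Vec<i32> = v.iter()
.enumerate()
// idx is a usize and val an i32, so convert before subtracting
.filter(|&(idx, &val)| val - (idx as i32) > 0)
// drop the index again and copy the value out
.map(|(_, &val)| val)
.collect();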
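As a sketch of that approach with the predicate from the question: unlike retain, this builds a new Vec instead of modifying v in place. The function name retained_copy is an assumption for this example.

```rust
/// Returns a new vector holding the elements at even indices, plus every 4.
fn retained_copy(v: &[i32]) -> Vec<i32> {
    v.iter()
        .enumerate()
        // `index` is the position in the original slice
        .filter(|&(index, &item)| index % 2 == 0 || item == 4)
        // discard the index and copy the value out of the reference
        .map(|(_, &item)| item)
        .collect()
}
```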

Related

What is the correct way to decrement values in an integer hashmap?

I have the following code where nums is a Vec<i32>, k is an i32, and counts is a HashMap<&i32, &i32>:
for n in nums.iter() {
*counts.entry(n).or_insert(0) -= -1;
*counts.entry(k - n).or_insert(0) -= 1;
}
It seems like a HashMap wants the types to be references, which complicates things. As k is an i32 and n is an &i32 I would need to do something like:
*counts.entry(&(k - *n)).or_insert(0) -= 1;
However, this won't work because of the lifetime of &(k - *n): the temporary value does not live long enough to be stored as a key.

- Super wild mindreading guess -
Your minimal reproducible example:
use std::collections::HashMap;
fn main() {
let nums: Vec<i32> = vec![1, 2, 3];
let k: i32 = 100;
let mut counts = HashMap::<&i32, &i32>::new();
for n in nums.iter() {
*counts.entry(*n).or_insert(0) -= -1;
*counts.entry(k - n).or_insert(0) -= 1;
}
println!("counts: {:?}", counts);
}
Solution:
use std::collections::HashMap;
fn main() {
let nums: Vec<i32> = vec![1, 2, 3];
let k: i32 = 100;
let mut counts = HashMap::<i32, i32>::new();
for n in nums.iter() {
*counts.entry(*n).or_insert(0) -= -1;
*counts.entry(k - n).or_insert(0) -= 1;
}
println!("counts: {:?}", counts);
}
counts: {99: -1, 97: -1, 1: 1, 98: -1, 2: 1, 3: 1}
The mistake you made was that you meant to write HashMap<i32,i32> instead of HashMap<&i32,&i32>.
"It seems like a HashMap wants the types to be references"
That is an incorrect statement. It only wants you to use references because you explicitly told it to use references with your HashMap<&i32, &i32> type definition.
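The same idea as a small function, as a sketch: with owned i32 keys and values, entry() takes the computed key by value, so no temporary has to outlive the statement. The function name complement_counts is an assumption for this example, and it writes += 1 where the question wrote -= -1.

```rust
use std::collections::HashMap;

/// Hypothetical helper: adds 1 for every `n` and subtracts 1 for every `k - n`.
fn complement_counts(nums: &[i32], k: i32) -> HashMap<i32, i32> {
    let mut counts: HashMap<i32, i32> = HashMap::new();
    for &n in nums {
        // Keys are plain i32 values, so `k - n` can be passed directly.
        *counts.entry(n).or_insert(0) += 1;
        *counts.entry(k - n).or_insert(0) -= 1;
    }
    counts
}
```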

Reuse iter variable for conditional Skip, Filter, etc

I have an iterator to which I want to conditionally apply filters, skips, etc. before ultimately calling collect().
In a different language, I might write something like
// (Rust-like pseudocode)
let mut iter = [1,2,3,4].iter();
if should_filter {
iter = iter.filter(|x| x % 2 == 0);
}
if should_truncate {
iter = iter.take(2);
}
iter.collect()
But since iter is of type Iter, and skip(), filter() return types Skip, Filter, I've been unable to reuse the original binding for iter. As a result, my Rust code currently looks something like this:
let iter = [1,2,3,4].iter();
// conditionally filter
if should_filter {
let iter = iter.filter(|x| x % 2 == 0);
// conditionally "truncate"
if should_truncate {
let iter = iter.take(2);
return iter.collect();
}
return iter.collect();
}
// conditionally "truncate"
if should_truncate {
let iter = iter.take(2);
return iter.collect();
}
iter.collect()
Is there any way I can avoid this duplication?

You can usually rely on the compiler to optimize loop invariants in this case. Since should_filter cannot change while iterating, the compiler can work out that the condition only needs to be checked once, before the loop, and skip the per-element test entirely when should_filter is false. This means you can simply put the condition into the closure, which looks inefficient but gives much cleaner code. Even if the check is not hoisted out of the loop body, the CPU's branch predictor will handle it cheaply. Similarly, you can "inline" the should_truncate condition:
fn do_stuff(
inp: impl IntoIterator<Item = u32>,
should_filter: bool,
should_truncate: bool,
) -> Vec<u32> {
inp.into_iter()
// when should_filter is false, every element must pass
.filter(|x| !should_filter || x % 2 == 0)
.take(if should_truncate { 2 } else { usize::MAX })
.collect()
}

Easiest solution, and probably closest to what your "other language" would do under the hood, is to box the iterator:
let mut iter: Box<dyn Iterator<Item = &i32>> = Box::new([1, 2, 3, 4].iter());
if should_filter {
iter = Box::new(iter.filter(|x| *x % 2 == 0));
}
if should_truncate {
iter = Box::new(iter.take(2));
}
}
let v: Vec<_> = iter.collect();
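To compare the two suggestions, here is a sketch with both written as functions over a borrowed slice. The names collect_boxed and collect_inline are invented for this example. Because the slice is not 'static, the boxed trait object needs an explicit `+ '_` lifetime bound.

```rust
/// Boxed version: each step re-wraps the iterator behind a trait object,
/// so the binding keeps a single type.
fn collect_boxed(data: &[i32], should_filter: bool, should_truncate: bool) -> Vec<i32> {
    let mut iter: Box<dyn Iterator<Item = &i32> + '_> = Box::new(data.iter());
    if should_filter {
        iter = Box::new(iter.filter(|x| **x % 2 == 0));
    }
    if should_truncate {
        iter = Box::new(iter.take(2));
    }
    iter.copied().collect()
}

/// Inline version: the flags are folded into the adapters themselves.
fn collect_inline(data: &[i32], should_filter: bool, should_truncate: bool) -> Vec<i32> {
    data.iter()
        .copied()
        // When should_filter is false, every element passes.
        .filter(|x| !should_filter || x % 2 == 0)
        .take(if should_truncate { 2 } else { usize::MAX })
        .collect()
}
```

Both functions should agree for every combination of the two flags. The boxed version pays for an allocation and dynamic dispatch per adapter; the inline version keeps a single static iterator type.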

Is there a way to shuffle two or more lists in the same order?

I want something like this pseudocode:
a = [1, 2, 3, 4];
b = [3, 4, 5, 6];
iter = a.iter_mut().zip(b.iter_mut());
shuffle(iter);
// example shuffle:
// a = [2, 4, 3, 1];
// b = [4, 6, 5, 3];
More specifically, is there some function which performs like:
fn shuffle<T>(iterator: IterMut<T>) { /* ... */ }
My specific case is trying to shuffle an Array2 by rows and a vector (array2: ndarray::Array2<f32>, vec: Vec<usize>).
Specifically array2.iter_axis(Axis(1)).zip(vec.iter()).

Shuffling a generic iterator in-place is not possible.
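However, it's pretty easy to implement shuffling for a slice (the calls below use the two-argument gen_range(low, high) from rand 0.7; rand 0.8 takes a range instead, as in gen_range(i..len)):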
use rand::Rng;
pub fn shufflex<T: Copy>(slice: &mut [T]) {
let mut rng = rand::thread_rng();
let len = slice.len();
for i in 0..len {
let next = rng.gen_range(i, len);
let tmp = slice[i];
slice[i] = slice[next];
slice[next] = tmp;
}
}
But it's also possible to write a more general shuffle function that works on many types (Len here is not a std trait; it is a user-defined trait that provides len()):
use std::ops::{Index, IndexMut};
use rand::Rng;
pub fn shuffle<T>(indexable: &mut T)
where
T: IndexMut<usize> + Len + ?Sized,
T::Output: Copy,
{
let mut rng = rand::thread_rng();
let len = indexable.len();
for i in 0..len {
let next = rng.gen_range(i, len);
let tmp = indexable[i];
indexable[i] = indexable[next];
indexable[next] = tmp;
}
}
I wrote a complete example that also allows shuffling across multiple slices in the playground.
EDIT: I think I misunderstood what you want to do. To shuffle several slices in the same way, I would do this:
use rand::Rng;
pub fn shuffle<T: Copy>(slices: &mut [&mut [T]]) {
if slices.len() > 0 {
let mut rng = rand::thread_rng();
let len = slices[0].len();
assert!(slices.iter().all(|s| s.len() == len));
for i in 0..len {
let next = rng.gen_range(i, len);
for slice in slices.iter_mut() {
let tmp: T = slice[i];
slice[i] = slice[next];
slice[next] = tmp;
}
}
}
}

To shuffle in the same order, you can first remember the order and then reuse it for every shuffle. Starting with the Fisher-Yates shuffle from the rand crate:
fn shuffle<R>(&mut self, rng: &mut R)
where R: Rng + ?Sized {
for i in (1..self.len()).rev() {
self.swap(i, gen_index(rng, i + 1));
}
}
It turns out that we need to store random numbers between 0 and i + 1 for each i between 1 and the length of the slice, in reverse order:
// create a vector of indices for shuffling slices of given length
let indices: Vec<usize> = {
let mut rng = rand::thread_rng();
(1..slice_len).rev()
.map(|i| rng.gen_range(0, i + 1))
.collect()
};
Then we can implement a variant of shuffle where, instead of generating new random numbers, we pick them up from the above list of random indices:
// shuffle SLICE taking indices from the provided vector
for (i, &rnd_ind) in (1..slice.len()).rev().zip(&indices) {
slice.swap(i, rnd_ind);
}
Putting the two together, you can shuffle multiple slices in the same order using a method like this (playground):
pub fn shuffle<T>(slices: &mut [&mut [T]]) {
if slices.len() == 0 {
return;
}
let indices: Vec<usize> = {
let mut rng = rand::thread_rng();
(1..slices[0].len())
.rev()
.map(|i| rng.gen_range(0, i + 1))
.collect()
};
for slice in slices {
assert_eq!(slice.len(), indices.len() + 1);
for (i, &rnd_ind) in (1..slice.len()).rev().zip(&indices) {
slice.swap(i, rnd_ind);
}
}
}
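The two-step idea (record the swap partners once, then replay them on every slice) does not depend on the rand crate. Below is a std-only sketch in which the random source is passed in as a closure. The names swap_indices, apply_swaps and next_below are invented for this example, and next_below(n) is assumed to return a value in 0..n.

```rust
/// Records the Fisher-Yates swap partners for a slice of length `len`.
/// `next_below(n)` is an assumed random source returning a value in `0..n`.
fn swap_indices(len: usize, mut next_below: impl FnMut(usize) -> usize) -> Vec<usize> {
    (1..len).rev().map(|i| next_below(i + 1)).collect()
}

/// Replays recorded swaps on a slice, so every slice given the same
/// `indices` ends up permuted in the same way.
fn apply_swaps<T>(slice: &mut [T], indices: &[usize]) {
    // One swap partner is recorded for each position except index 0.
    assert_eq!(indices.len(), slice.len().saturating_sub(1));
    for (i, &j) in (1..slice.len()).rev().zip(indices) {
        slice.swap(i, j);
    }
}
```

With the rand crate available, the closure could wrap the generator, for example |n| rng.gen_range(0, n) in the rand 0.7 style used above. Since T needs no Copy bound, this also works for slices of non-Copy values.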

How can I group consecutive integers in a vector in Rust?

I have a Vec<i64> and I want to know all the groups of integers that are consecutive. As an example:
let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
I'm expecting something like this or similar:
[[1, 2, 3], [5, 6, 7], [9, 10]];
The view (vector of vectors or maybe tuples or something else) really doesn't matter, but I should get several grouped lists with continuous numbers.
At the first look, it seems like I'll need to use itertools and the group_by function, but I have no idea how...

You can indeed use group_by for this, but you might not really want to. Here's what I would probably write instead:
fn consecutive_slices(data: &[i64]) -> Vec<&[i64]> {
let mut slice_start = 0;
let mut result = Vec::new();
for i in 1..data.len() {
if data[i - 1] + 1 != data[i] {
result.push(&data[slice_start..i]);
slice_start = i;
}
}
if data.len() > 0 {
result.push(&data[slice_start..]);
}
result
}
This is similar in principle to eXodiquas' answer, but instead of accumulating a Vec<Vec<i64>>, I use the indices to accumulate a Vec of slice references that refer to the original data. (This question explains why I made consecutive_slices take &[T].)
It's also possible to do the same thing without allocating a Vec, by returning an iterator; however, I like the above version better. Here's the zero-allocation version I came up with:
fn consecutive_slices(data: &[i64]) -> impl Iterator<Item = &[i64]> {
let mut slice_start = 0;
(1..=data.len()).flat_map(move |i| {
if i == data.len() || data[i - 1] + 1 != data[i] {
let begin = slice_start;
slice_start = i;
Some(&data[begin..i])
} else {
None
}
})
}
It's not as readable as a for loop, but it doesn't need to allocate a Vec for the return value, so this version is more flexible.
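Here's a "more functional" version using group_by (deprecated in favor of chunk_by as of itertools 0.13):
use itertools::Itertools;
fn consecutive_slices(data: &[i64]) -> Vec<Vec<i64>> {
// compute the key in i64: subtracting in usize would underflow whenever data[i] < i
(&(0..data.len()).group_by(|&i| data[i] - i as i64))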
.into_iter()
.map(|(_, group)| group.map(|i| data[i]).collect())
.collect()
}
The idea is to make a key function for group_by that takes the difference between each element and its index in the slice. Consecutive elements will have the same key because indices increase by 1 each time. One reason I don't like this version is that it's quite difficult to get slices of the original data structure; you almost have to create a Vec<Vec<i64>> (hence the two collects). The other reason is that I find it harder to read.
However, when I first wrote my preferred version (the first one, with the for loop), it had a bug (now fixed), while the other two versions were correct from the start. So there may be merit to writing denser code with functional abstractions, even if there is some hit to readability and/or performance.

let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
let mut res = Vec::new();
let mut prev = v[0];
let mut sub_v = Vec::new();
sub_v.push(prev);
for i in 1..v.len() {
if v[i] == prev + 1 {
sub_v.push(v[i]);
prev = v[i];
} else {
res.push(sub_v.clone());
sub_v.clear();
sub_v.push(v[i]);
prev = v[i];
}
}
res.push(sub_v);
This should solve your problem, as long as v is non-empty (v[0] panics on an empty vector).
It iterates over the given vector and checks whether the current i64 (an i32 in my case) is exactly one more than the previous value. If so, the value is pushed onto the current sub-vector (sub_v). When the series breaks, sub_v is pushed onto the result vector and a new sub-vector is started.
But I guess you wanted something functional?

Another possible solution, that uses std only, could be:
fn consecutive_slices(v: &[i64]) -> Vec<Vec<i64>> {
let t: Vec<Vec<i64>> = v
.into_iter()
.chain([*v.last().unwrap_or(&-1)].iter())
.scan(Vec::new(), |s, &e| {
match s.last() {
None => { s.push(e); Some((false, Vec::new())) },
Some(&p) if p == e - 1 => { s.push(e); Some((false, Vec::new()))},
Some(&p) if p != e - 1 => {let o = s.clone(); *s = vec![e]; Some((true, o))},
_ => None,
}
})
.filter_map(|(n, v)| {
match n {
true => Some(v.clone()),
false => None,
}
})
.collect();
t
}
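The chain appends a copy of the last element to the end of the input. That duplicate breaks the final run, which forces the last group to be emitted.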

I like the answers above but you could also use peekable() to tell if the next value is different.
https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html
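A minimal sketch of that idea using only std: peek at the next value and close the current group whenever it is not exactly one more than the current value. The function name consecutive_groups is chosen for this example.

```rust
/// Groups runs of consecutive integers, e.g. [1, 2, 3, 5] -> [[1, 2, 3], [5]].
fn consecutive_groups(data: &[i64]) -> Vec<Vec<i64>> {
    let mut groups = Vec::new();
    let mut current = Vec::new();
    let mut iter = data.iter().copied().peekable();
    while let Some(x) = iter.next() {
        current.push(x);
        // The run continues only if a next value exists and equals x + 1.
        let run_continues = iter
            .peek()
            .map_or(false, |&next| x.checked_add(1) == Some(next));
        if !run_continues {
            // Move the finished group out and leave an empty Vec behind.
            groups.push(std::mem::take(&mut current));
        }
    }
    groups
}
```

Unlike the index-based loops, this never indexes into the data, so an empty input simply yields an empty result.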

I would probably use a fold for this?
That's because I'm very much a functional programmer.
Obviously mutating the accumulator is weird :P but this works too and represents another way of thinking about it.
This is basically a recursive solution and can be modified easily to use immutable datastructures.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=43b9e3613c16cb988da58f08724471a4
fn main() {
let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
let mut res: Vec<Vec<i32>> = vec![];
let (last_group, _): (Vec<i32>, Option<i32>) = v
.iter()
.fold((vec![], None), |(mut cur_group, last), x| {
match last {
None => {
cur_group.push(*x);
(cur_group, Some(*x))
}
Some(last) => {
if x - last == 1 {
cur_group.push(*x);
(cur_group, Some(*x))
} else {
res.push(cur_group);
(vec![*x], Some(*x))
}
}
}
});
res.push(last_group);
println!("{:?}", res);
}

Best way to remove elements of Vec depending on other elements of the same Vec

I have a vector of sets and I want to remove all sets that are subsets of other sets in the vector. Example:
a = {0, 3, 5}
b = {0, 5}
c = {0, 2, 3}
In this case I would like to remove b, because it's a subset of a. I'm fine with using a "dumb" n² algorithm.
Sadly, it's pretty tricky to get it working with the borrow checker. The best I've come up with is (Playground):
let mut v: Vec<HashSet<u8>> = vec![];
let mut to_delete = Vec::new();
for (i, set_a) in v.iter().enumerate().rev() {
for set_b in &v[..i] {
if set_a.is_subset(&set_b) {
to_delete.push(i);
break;
}
}
}
for i in to_delete {
v.swap_remove(i);
}
(note: the code above is not correct! See comments for further details)
I see a few disadvantages:
I need an additional vector with additional allocations
Maybe there are more efficient ways than calling swap_remove often
If I need to preserve order, I can't use swap_remove, but have to use remove which is slow
Is there a better way to do this? I'm not just asking about my use case, but about the general case as it's described in the title.

Here is a solution that does not make additional allocations and preserves the order:
fn product_retain<T, F>(v: &mut Vec<T>, mut pred: F)
where F: FnMut(&T, &T) -> bool
{
let mut j = 0;
for i in 0..v.len() {
// invariants:
// items v[0..j] will be kept
// items v[j..i] will be removed
if (0..j).chain(i + 1..v.len()).all(|a| pred(&v[i], &v[a])) {
v.swap(i, j);
j += 1;
}
}
v.truncate(j);
}
fn main() {
// test with a simpler example
// unique elements
let mut v = vec![1, 2, 3];
product_retain(&mut v, |a, b| a != b);
assert_eq!(vec![1, 2, 3], v);
let mut v = vec![1, 3, 2, 4, 5, 1, 2, 4];
product_retain(&mut v, |a, b| a != b);
assert_eq!(vec![3, 5, 1, 2, 4], v);
}
This is a kind of partition algorithm. The elements in the first partition will be kept and in the second partition will be removed.
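Applied to the original question, the predicate would be "a is not a subset of b". The sketch below repeats product_retain so that it compiles on its own; the wrapper name remove_subsets is made up for this example. Note that of two equal sets, the earlier one is dropped and the later one is kept.

```rust
use std::collections::HashSet;

fn product_retain<T, F>(v: &mut Vec<T>, mut pred: F)
where
    F: FnMut(&T, &T) -> bool,
{
    let mut j = 0;
    for i in 0..v.len() {
        // v[0..j] are kept so far; v[j..i] are marked for removal.
        if (0..j).chain(i + 1..v.len()).all(|a| pred(&v[i], &v[a])) {
            v.swap(i, j);
            j += 1;
        }
    }
    v.truncate(j);
}

/// Hypothetical wrapper: removes every set that is a subset of another set.
fn remove_subsets(v: &mut Vec<HashSet<u8>>) {
    product_retain(v, |a, b| !a.is_subset(b));
}
```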

You can use a while loop instead of the for:
use std::collections::HashSet;
fn main() {
let arr: &[&[u8]] = &[
&[3],
&[1,2,3],
&[1,3],
&[1,4],
&[2,3]
];
let mut v:Vec<HashSet<u8>> = arr.iter()
.map(|x| x.iter().cloned().collect())
.collect();
let mut pos = 0;
while pos < v.len() {
let is_sub = v[pos+1..].iter().any(|x| v[pos].is_subset(x))
|| v[..pos].iter().any(|x| v[pos].is_subset(x));
if is_sub {
v.swap_remove(pos);
} else {
pos+=1;
}
}
println!("{:?}", v);
}
There are no additional allocations.
To avoid using remove and swap_remove, you can change the type of vector to Vec<Option<HashSet<u8>>>:
use std::collections::HashSet;
fn main() {
let arr: &[&[u8]] = &[
&[3],
&[1,2,3],
&[1,3],
&[1,4],
&[2,3]
];
let mut v:Vec<Option<HashSet<u8>>> = arr.iter()
.map(|x| Some(x.iter().cloned().collect()))
.collect();
for pos in 0..v.len(){
let is_sub = match v[pos].as_ref() {
Some(chk) =>
v[..pos].iter().flat_map(|x| x).any(|x| chk.is_subset(x))
|| v[pos+1..].iter().flat_map(|x| x).any(|x| chk.is_subset(x)),
None => false,
};
if is_sub { v[pos] = None }; // replace with None instead of removing
}
println!("{:?}", v); // e.g. [None, Some({3, 2, 1}), None, Some({1, 4}), None] (order within each set may vary)
}

"I need an additional vector with additional allocations"
I wouldn't worry about that allocation, since the memory and runtime footprint of that allocation will be really small compared to the rest of your algorithm.
"Maybe there are more efficient ways than calling swap_remove often.
If I need to preserve order, I can't use swap_remove, but have to use remove which is slow"
I'd change to_delete from Vec<usize> to Vec<bool> and just mark whether a particular hash set should be removed. You can then use Vec::retain, which conditionally removes elements while preserving order. Unfortunately, this function doesn't pass the index to the closure, so we have to create a workaround (playground):
let mut to_delete = vec![false; v.len()];
for (i, set_a) in v.iter().enumerate().rev() {
for set_b in &v[..i] {
if set_a.is_subset(&set_b) {
to_delete[i] = true;
}
}
}
{
// This relies on retain visiting the elements in their original order.
let mut i = 0;
v.retain(|_| {
let ret = !to_delete[i];
i += 1;
ret
});
}
If your hash set has a special value which can never occur under normal conditions, you can use it to mark a set as "to delete", and then check that condition in retain (it would require changing the outer loop from iterator-based to range-based though).
Sidenote (if that HashSet<u8> is not just a toy example): a more efficient way to store and compare sets of small integers would be to use a bitset.
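Putting the marking pass and retain together, here is a sketch that compares each set against all others rather than only the earlier ones. The function name remove_subsets_in_order and the tie-break for equal sets (keep the first copy) are choices made for this example, not part of the answer above.

```rust
use std::collections::HashSet;

fn remove_subsets_in_order(v: &mut Vec<HashSet<u8>>) {
    // First pass: mark every set that is a subset of some other set.
    let to_delete: Vec<bool> = v
        .iter()
        .enumerate()
        .map(|(i, a)| {
            v.iter().enumerate().any(|(j, b)| {
                // For equal sets, only the later copy counts as redundant.
                i != j && a.is_subset(b) && (a.len() < b.len() || j < i)
            })
        })
        .collect();

    // Second pass: `retain` visits elements in their original order,
    // so a running index lines up with `to_delete`.
    let mut i = 0;
    v.retain(|_| {
        let keep = !to_delete[i];
        i += 1;
        keep
    });
}
```

This is still the "dumb" n² comparison from the question, with one extra Vec<bool> allocation, and it preserves the order of the surviving sets.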
