How do I match to a pattern like `&(&usize, &u32)`? - rust

Let's say I have vectors of primes and powers:
let mut primes: Vec<usize> = ...;
let mut powers: Vec<u32> = ...;
It is a fact that primes.len() == powers.len().
I'd like to return to the user a list of primes which have a corresponding power value greater than 0 (this code is missing proper refs and derefs):
primes.iter().zip(powers)
    .filter(|(p, power)| power > 0)
    .map(|(p, power)| p)
    .collect::<Vec<usize>>()
The compiler is complaining a lot, as you might imagine. In particular, the filter is receiving arguments of type &(&usize, &u32), but I am not correctly de-referencing in the pattern matching. I have tried various patterns the compiler suggests (e.g. &(&p, &power), which is the one that makes the most sense to me), but with no luck. How do I correctly perform the pattern matching so that I can do the power > 0 comparison without issue, and so that I can collect in the end a Vec<usize>?

primes.iter().zip(powers)
iter() iterates by reference, so you get &usize elements for primes. OTOH .zip() calls .into_iter() which iterates owned values, so powers are u32, and these iterators combined iterate over (&usize, u32). Technically, there's nothing wrong with iterating over such mixed type, but the inconsistency may be confusing. You can use .into_iter() or .iter().cloned() on primes to avoid the reference, or call .zip(powers.iter()) to get both as references.
Second thing is that .filter() takes items by reference &(_,_) (since it only "looks" at them), and .map() by owned value (_,_) (which allows it to change and return it).
For small values like integers, you'd usually use these methods like this:
.filter(|&item| …)
.map(|item| …)
Note that in closures the syntax is |pattern: type|, so in the example above &item is equivalent to:
.filter(|by_ref| {
    let item = *by_ref;
})
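Putting both points together for the vectors in the question, here is a minimal sketch. The function name `primes_with_positive_power` is made up for illustration, and it borrows slices instead of consuming the vectors, which is an assumption about what the caller wants:

```rust
/// Hypothetical helper: returns the primes whose matching power is > 0.
fn primes_with_positive_power(primes: &[usize], powers: &[u32]) -> Vec<usize> {
    primes
        .iter()
        .zip(powers.iter()) // items are (&usize, &u32)
        // filter sees &(&usize, &u32): the pattern peels off both reference layers
        .filter(|&(_, &power)| power > 0)
        // map sees (&usize, &u32) by value: `&p` copies the usize out
        .map(|(&p, _)| p)
        .collect()
}
```

Calling it with `&primes` and `&powers` leaves both vectors usable afterwards.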

That works:
fn main() {
    let primes: Vec<usize> = vec![2, 3, 5, 7];
    let powers: Vec<u32> = vec![2, 2, 2, 2];
    let ret = primes.iter().zip(powers.iter())
        .filter_map(|(p, pow)| { // both are refs, so we need to deref
            if *pow > 0 {
                Some(*p)
            } else {
                None
            }
        })
        .collect::<Vec<usize>>();
    println!("{:?}", ret);
}
Note that I also used powers.iter() which yields elements by reference. You could also use cloned() on both iterators and work with values.
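For comparison, a sketch of the cloned() variant mentioned above, where the closures work with plain values. The function name is hypothetical:

```rust
/// Hypothetical helper: same result, but the iterators yield copies, not references.
fn positive_power_primes_by_value(primes: &[usize], powers: &[u32]) -> Vec<usize> {
    primes
        .iter()
        .cloned()
        .zip(powers.iter().cloned()) // items are now (usize, u32)
        // filter still passes a reference to the tuple, hence the leading `&`
        .filter(|&(_, pow)| pow > 0)
        .map(|(p, _)| p)
        .collect()
}
```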

filter_map can be used well with match:
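.filter_map(|(p, pow)| match pow.cmp(&0) {
    // Use the full path (or import it): a bare `Greater` would be a catch-all binding
    std::cmp::Ordering::Greater => Some(*p),
    _ => None,
})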

Related

Flatten vector of enums in Rust

I am trying to flatten a vector of Enum in Rust, but I am having some issues:
enum Foo {
    A(i32),
    B(i32, i32),
}
fn main() {
    let vf = vec![Foo::A(1), Foo::A(2), Foo::B(3, 4)];
    let vi: Vec<i32> = vf
        .iter()
        .map(|f| match f {
            Foo::A(i) => [i].into_iter(),
            Foo::B(i, j) => [i, j].into_iter(),
        })
        .collect(); // this does not compile
    // I want vi = [1, 2, 3, 4]. vf must still be valid
}
I could just use a regular for loop and insert elements into an existing vector, but that would be no fun. I'd like to know if there is a more idiomatic Rust way of doing it.
Here's a way to do it that produces an iterator (rather than necessarily a vector, as the fold() based solution does).
use std::iter::once;
enum Foo {
    A(i32),
    B(i32, i32),
}
fn main() {
    let vf = vec![Foo::A(1), Foo::A(2), Foo::B(3, 4)];
    let vi: Vec<i32> = vf
        .iter()
        .flat_map(|f| {
            match f {
                &Foo::A(i) => once(i).chain(None),
                &Foo::B(i, j) => once(i).chain(Some(j)),
            }
        })
        .collect();
    dbg!(vi);
}
This does essentially the same thing that you were attempting, but in a way which will succeed. Here are the parts I changed, in the order they appear in the code:
I used .flat_map() instead of .map(). flat_map accepts a function which returns an iterator and produces the elements of that iterator ("flattening") whereas .map() would have just given the iterator.
I used & in the match patterns. This is because, since you are using .iter() on the vector (which is appropriate for your requirement “vf must still be valid”), you have references to enums, and pattern matching on a reference to an enum will normally give you references to its elements, but we almost certainly want to handle the i32s by value instead. There are several other things I could have done, such as using the * dereference operator on the values instead, but this is concise and tidy.
You tried to .into_iter() an array. In editions before Rust 2021 this does not do what you want: the method call resolves to the slice iterator, which yields references and borrows from the temporary array, so you can't actually return that iterator. (Since Rust 1.53 arrays implement IntoIterator by value, and in the 2021 edition .into_iter() on an array uses that implementation.) Even where it does mean what you wanted, you'd get an error because the two match arms have unequal types: one is an iterator over [i32; 1] and the other is an iterator over [i32; 2].
Instead, you need to build two possible iterators which are clearly of the same type. There are lots of ways to do this, and the way I picked was to use Iterator::chain to combine once(i), an iterator that returns the single element i, with an Option<i32> (which implements IntoIterator) that contains the second element j if it exists.
Notice that in the first match arm I wrote the seemingly useless expression .chain(None); this is so that the two arms have the same type. Another way to write the same thing, which is arguably clearer since it doesn't duplicate code that has to be identical, is:
let (i, opt_j) = match f {
    &Foo::A(i) => (i, None),
    &Foo::B(i, j) => (i, Some(j)),
};
once(i).chain(opt_j)
In either case, the iterator's type is std::iter::Chain<std::iter::Once<i32>, std::option::IntoIter<i32>> — you don't need to know this exactly, just notice that there must be a type which handles both the A(i) and the B(i, j) cases.
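Since the point of this approach is that it yields an iterator, here is a sketch that returns it from a function without collecting. The name `flatten` and the `impl Iterator` return type are my additions, not part of the answer above:

```rust
use std::iter::once;

enum Foo {
    A(i32),
    B(i32, i32),
}

/// Hypothetical helper: lazily yields every i32 stored in the enums, in order.
fn flatten(items: &[Foo]) -> impl Iterator<Item = i32> + '_ {
    items.iter().flat_map(|f| {
        // Both arms produce the same tuple type, so one chain expression suffices
        let (i, opt_j) = match f {
            &Foo::A(i) => (i, None),
            &Foo::B(i, j) => (i, Some(j)),
        };
        once(i).chain(opt_j)
    })
}
```

The caller decides whether to collect() the result or keep chaining adaptors, and the input slice is only borrowed.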
First of all, you need to change the i32 references to owned values by e.g. dereferencing them. Then you can circumvent proxying through inlined arrays by using fold():
enum Foo {
    A(i32),
    B(i32, i32),
}
fn main() {
    let vf = vec![Foo::A(1), Foo::A(2), Foo::B(3, 4)];
    let vi: Vec<i32> = vf
        .iter()
        .fold(Vec::new(), |mut acc, f| {
            match f {
                Foo::A(i) => acc.push(*i),
                Foo::B(i, j) => {
                    acc.push(*i);
                    acc.push(*j);
                }
            }
            acc
        });
}

Why can I not use a slice pattern to filter a Window iterator?

I have a vector of numbers and use the windows(2) method to create an iterator that gives me neighbouring pairs. For example, the vector [1, 2, 3] is transformed into [1, 2], [2, 3]. I want to use the find method to find a slice that fulfills a specific condition:
fn step(g: u64) -> Option<(u64, u64)> {
    let prime_list: Vec<u64> = vec![2, 3, 5, 7]; //For example
    if prime_list.len() < 2 {
        return None;
    }
    let res = prime_list.windows(2).find(|&&[a, b]| b - a == g)?;
    //...
    None
}
I get an error:
error[E0005]: refutable pattern in function argument: `&&[]` not covered
 --> src/lib.rs:6:43
  |
6 |     let res = prime_list.windows(2).find(|&&[a, b]| b - a == g)?;
  |                                           ^^^^^^^^ pattern `&&[]` not covered
I don't know what that error means: the list cannot have less than two elements, for example. Maybe the closure parameter is wrong? I tried to vary it but that didn't change anything. a and b are being properly detected as u64 in my IDE too. What is going on here?
You, the programmer, know that each iterated value will have a length of 2, but how do you know that? You can only tell that from the prose documentation of the function:
Returns an iterator over all contiguous windows of length size. The windows overlap. If the slice is shorter than size, the iterator returns no values.
Nowhere does the compiler know this information. The implementation of Windows only states that the iterated value will be a slice:
impl<'a, T> Iterator for Windows<'a, T> {
    type Item = &'a [T];
}
I'd convert the slice into an array reference, discarding any slices that were the wrong length (which you know cannot happen):
use std::convert::TryFrom;
fn step(g: u64) -> Option<(u64, u64)> {
    let prime_list: Vec<u64> = vec![2, 3, 5, 7]; // For example
    if prime_list.len() < 2 {
        return None;
    }
    let res = prime_list
        .windows(2)
        .flat_map(<&[u64; 2]>::try_from)
        .find(|&&[a, b]| b - a == g)?;
    //...
    None
}
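A different way to satisfy the compiler, not the try_from approach above, is to keep the slice and use a match with a catch-all arm, so the pattern is no longer required to be irrefutable. A rough sketch, where `find_gap` is a made-up name and the input is assumed to be sorted ascending so that `b - a` cannot underflow:

```rust
/// Hypothetical helper: first neighbouring pair whose difference equals `g`.
fn find_gap(primes: &[u64], g: u64) -> Option<(u64, u64)> {
    primes.windows(2).find_map(|w| match w {
        // Only windows of exactly two elements match this arm
        &[a, b] if b - a == g => Some((a, b)),
        // Any other length (which windows(2) never produces) or gap is skipped
        _ => None,
    })
}
```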
See also:
How to convert a slice into an array reference?
How can I find a subsequence in a &[u8] slice?
How do I imply the type of the value when there are no type parameters or ascriptions?
Alternatively, you could use an iterator of integers and chunk it up.
See also:
Are there equivalents to slice::chunks/windows for iterators to loop over pairs, triplets etc?
When this answer was written, const generics were not yet stable. They have since been stabilized (Rust 1.51), which allows an API to bake the array length into the function call and the return type.
See also:
Is it possible to control the size of an array using the type parameter of a generic?

Take slice of certain length known at compile time

In this code:
fn unpack_u32(data: &[u8]) -> u32 {
    assert_eq!(data.len(), 4);
    let res = data[0] as u32 |
        (data[1] as u32) << 8 |
        (data[2] as u32) << 16 |
        (data[3] as u32) << 24;
    res
}
fn main() {
    let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
    println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?
Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to said question (playground):
fn unpack_u32(data: [u8; 4]) -> u32 {
    // as before without assert
}
use std::convert::AsMut;
fn clone_into_array<A, T>(slice: &[T]) -> A
    where A: Default + AsMut<[T]>,
          T: Clone
{
    assert_eq!(slice.len(), std::mem::size_of::<A>()/std::mem::size_of::<T>());
    let mut a = Default::default();
    <A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
    a
}
fn main() {
    let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
    println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.
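On a reasonably recent compiler, the standard library's TryFrom conversion from a slice to an array can stand in for the hand-written helper. The length is still checked at runtime, but a mismatch surfaces as a value instead of a panic. A sketch under those assumptions, where `unpack_from_slice` is a hypothetical name and the little-endian byte order is inferred from the shifts in the question:

```rust
use std::convert::TryFrom;

/// Takes exactly four bytes, so no length check is needed here.
fn unpack_u32(data: [u8; 4]) -> u32 {
    // Same result as data[0] | data[1] << 8 | data[2] << 16 | data[3] << 24
    u32::from_le_bytes(data)
}

/// Hypothetical wrapper: returns None when the slice is not exactly 4 bytes long.
fn unpack_from_slice(data: &[u8]) -> Option<u32> {
    <[u8; 4]>::try_from(data).ok().map(unpack_u32)
}
```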
I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.
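If the goal is to decode many values in a row, a safe alternative to get_unchecked (a different technique from the one suggested above) is to let chunks_exact split the input into 4-byte pieces up front. A sketch, assuming little-endian input and that trailing bytes which do not fill a whole chunk should be ignored; `unpack_all` is a made-up name:

```rust
/// Hypothetical helper: decodes consecutive little-endian u32 values.
fn unpack_all(data: &[u8]) -> Vec<u32> {
    data.chunks_exact(4) // every chunk has exactly 4 bytes; the remainder is dropped
        .map(|chunk| u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect()
}
```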
I had a similar problem: creating a fixed-size byte array on the stack whose length matches the const length of another byte array (which may change during development).
A combination of a compiler plugin and a macro was the solution:
https://github.com/frehberg/rust-sizedbytes

How do I cope with lazy iterators?

I'm trying to sort an array with a map() over an iterator.
struct A {
    b: Vec<B>,
}
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct B {
    c: Vec<i32>,
}
fn main() {
    let mut a = A { b: Vec::new() };
    let b = B { c: vec![5, 2, 3] };
    a.b.push(b);
    a.b.iter_mut().map(|b| b.c.sort());
}
Gives the warning:
warning: unused `std::iter::Map` that must be used
  --> src/main.rs:16:5
   |
16 |     a.b.iter_mut().map(|b| b.c.sort());
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: #[warn(unused_must_use)] on by default
   = note: iterators are lazy and do nothing unless consumed
Which is true, sort() isn't actually called here. This warning is described in the book, but I don't understand why this variation with iter_mut() works fine:
a.b.iter_mut().find(|b| b == b).map(|b| b.c.sort());
As the book you linked to says:
If you are trying to execute a closure on an iterator for its side effects, use for instead.
That way it works, and it's much clearer to anyone reading the code. You should use map when you want to transform a vector to a different one.
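For the structs in the question, the for version could look like this sketch (the helper name `sort_all` is made up, and the derives are left out because they are not needed here):

```rust
struct A {
    b: Vec<B>,
}

struct B {
    c: Vec<i32>,
}

/// Hypothetical helper: sorts every inner vector in place.
fn sort_all(a: &mut A) {
    // The loop drives the iterator, so sort() really runs for each element
    for b in a.b.iter_mut() {
        b.c.sort();
    }
}
```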
I don't understand why this variation with iter_mut() works fine:
a.b.iter_mut().find(|b| b == b).map(|b| b.c.sort());
It works because find is not lazy; it's an iterator consumer. It returns an Option not an Iterator. This might be why it is confusing you, because Option also has a map method, which is what you are using here.
As others have said, map is intended for transforming data, without modifying it and without any other side-effects. If you really want to use map, you can map over the collection and assign it back:
fn main() {
    let mut a = A { b: Vec::new() };
    let b = B { c: vec![5, 2, 3] }; // no `mut` needed: b is only moved into the vector
    a.b.push(b);
    a.b =
        a.b.into_iter()
            .map(|mut b| {
                b.c.sort();
                b
            })
            .collect();
}
Note that vector's sort method returns (), so you have to explicitly return the sorted vector from the mapping function.
I use for_each.
According to the doc:
It is equivalent to using a for loop on the iterator, although break and continue are not possible from a closure. It's generally more idiomatic to use a for loop, but for_each may be more legible when processing items at the end of longer iterator chains. In some cases for_each may also be faster than a loop, because it will use internal iteration on adaptors like Chain.
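A small sketch of for_each at the end of a chain. The function and the filter step are invented for illustration, to show a case where the chain is more than one adaptor long:

```rust
/// Hypothetical helper: sorts every non-empty vector in place.
fn sort_non_empty(groups: &mut [Vec<i32>]) {
    groups
        .iter_mut()
        .filter(|g| !g.is_empty())
        // for_each consumes the iterator, so the closure runs immediately
        .for_each(|g| g.sort());
}
```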

What's the best way to compare 2 vectors or strings element by element?

What's the best way to compare 2 vectors or strings element by element in Rust, while being able to do processing on each pair of elements? For example if you wanted to keep count of the number of differing elements. This is what I'm using:
let mut diff_count: i32 = 0i32;
for (x, y) in a.chars().zip(b.chars()) {
    if x != y {
        diff_count += 1i32;
    }
}
Is that the correct way or is there something more canonical?
To get the count of matching elements, I'd probably use filter and count.
fn main() {
    let a = "Hello";
    let b = "World";
    let matching = a.chars().zip(b.chars()).filter(|&(a, b)| a == b).count();
    println!("{}", matching);
    let a = [1, 2, 3, 4, 5];
    let b = [1, 1, 3, 3, 5];
    let matching = a.iter().zip(&b).filter(|&(a, b)| a == b).count();
    println!("{}", matching);
}
Iterator::zip takes two iterators and produces another iterator of the tuple of each iterator's values.
Iterator::filter takes a reference to the iterator's value and discards any value where the predicate closure returns false. This performs the comparison.
Iterator::count counts the number of elements in the iterator.
Note that Iterator::zip stops iterating when one iterator is exhausted. If you need different behavior, you may also be interested in
Itertools::zip_longest or Itertools::zip_eq.
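The question asked for the number of differing elements, which is the same chain with the comparison flipped. A sketch, where `diff_count` as a function is my naming and only positions present in both strings are compared:

```rust
/// Hypothetical helper: counts positions where the two strings differ.
/// Characters beyond the shorter string's length are not compared.
fn diff_count(a: &str, b: &str) -> usize {
    a.chars().zip(b.chars()).filter(|(x, y)| x != y).count()
}
```

Because zip stops at the shorter input, `diff_count("abc", "abcdef")` is 0. Compare the lengths separately if that matters.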
If you wanted to use @Shepmaster's answer as the basis of an assertion to be used in a unit test, try this:
fn do_vecs_match<T: PartialEq>(a: &Vec<T>, b: &Vec<T>) -> bool {
    let matching = a.iter().zip(b.iter()).filter(|&(a, b)| a == b).count();
    matching == a.len() && matching == b.len()
}
Of course, be careful when using this on floats! Those pesky NaNs won't compare, and you might want to use a tolerance for comparing the other values. And you might want to make it fancy by telling the index of the first nonmatching value.
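For the "index of the first nonmatching value" idea, Iterator::position is one option. A sketch with a made-up function name; it only looks at the overlapping prefix, so a length mismatch has to be checked separately:

```rust
/// Hypothetical helper: index of the first pair of elements that differ, if any.
fn first_mismatch<T: PartialEq>(a: &[T], b: &[T]) -> Option<usize> {
    a.iter().zip(b.iter()).position(|(x, y)| x != y)
}
```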
