Getting Enumerate to work as ExactSizeIterator in Rust


I want to use Rust's Enumerate to get both a character and its index in the slice from each iteration:
fn main() {
    for (j, val) in "dummy string".chars().enumerate().rev() {
        // ...
    }
}
When I compile with cargo run I get:
error: the trait `core::iter::ExactSizeIterator` is not implemented for the type `core::str::Chars<'_>` [E0277]
for (j, val) in "dummy string".chars().enumerate().rev() {
^~~
help: see the detailed explanation for E0277
I can understand why this would fail: the rev method needs an ExactSizeIterator since it needs to know the last element in the slice and its index from the beginning. Is it possible to get an ExactSizeIterator in this case, or does the length of the iterator need to be baked in at compile time? If it is possible, is it just a matter of specifying the iterator with something like as ExactSizeIterator or something like that?

The docs for ExactSizeIterator state:
An iterator that knows its exact length.
Many Iterators don't know how many times they will iterate, but some do. If an iterator knows how many times it can iterate, providing access to that information can be useful. For example, if you want to iterate backwards, a good start is to know where the end is.
But that's not the actual trait required by rev!
fn rev(self) -> Rev<Self>
where Self: DoubleEndedIterator
The ExactSizeIterator requirement comes from Enumerate's implementation of DoubleEndedIterator:
impl<I> DoubleEndedIterator for Enumerate<I>
where I: ExactSizeIterator + DoubleEndedIterator
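To see the bound at work: an iterator that implements both traits, such as the one returned by str::bytes, can be enumerated and then reversed directly. This is only a small sketch, and the helper name reversed_byte_positions is made up for illustration:

```rust
// `Bytes` implements both `ExactSizeIterator` and `DoubleEndedIterator`,
// so `Enumerate<Bytes>` is double-ended and `rev()` is accepted.
// `reversed_byte_positions` is a hypothetical helper, not a std function.
fn reversed_byte_positions(s: &str) -> Vec<(usize, u8)> {
    s.bytes().enumerate().rev().collect()
}

fn main() {
    for (j, byte) in reversed_byte_positions("abc") {
        println!("{}, {}", j, byte as char);
    }
}
```

Note that this yields bytes, not characters, so it only matches the original intent for ASCII input.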
Is it possible to get an ExactSizeIterator in this case, or does the length of the iterator need to be baked in at compile time?
The Chars iterator needs to support both ExactSizeIterator and DoubleEndedIterator, but it only natively supports DoubleEndedIterator.
In order to implement ExactSizeIterator for Chars, you'd need to be able to look at an arbitrary string and know (in a small enough time) how many characters it is made of. This is not generally possible with the UTF-8 encoding, the only encoding of Rust strings.
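A short sketch of why the count is not known up front: the byte length of a string is stored, but the number of characters has to be found by walking the bytes. The helper name byte_and_char_len is an assumption made for this example.

```rust
// Hypothetical helper: returns (length in bytes, number of chars).
// `len()` is a stored value; `chars().count()` has to scan the whole string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII only: one byte per char, so both numbers agree.
    println!("{:?}", byte_and_char_len("dummy"));
    // U+00FC (u with diaeresis) takes two bytes in UTF-8, so they differ.
    println!("{:?}", byte_and_char_len("d\u{fc}mmy"));
    // Without scanning, `Chars` can only report bounds on its length.
    let (lower, upper) = "d\u{fc}mmy".chars().size_hint();
    println!("between {} and {:?} chars", lower, upper);
}
```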
The length of the iterator is never a compile-time constant.
is it just a matter of specifying the iterator with something like as ExactSizeIterator
You cannot make a type into something it is not.
If you really need this, you could collect it all into a big Vec:
fn main() {
    let chars: Vec<_> = "dummy string".chars().collect();
    for (j, val) in chars.into_iter().enumerate().rev() {
        println!("{}, {}", j, val)
    }
}
It's also possible you actually want the characters in reverse order with the count in increasing direction:
fn main() {
    for (j, val) in "dummy string".chars().rev().enumerate() {
        println!("{}, {}", j, val)
    }
}
But you said this:
a character and its index in the slice
Since strings are UTF-8, it's possible you mean you want the number of bytes into the slice. That can be found with the char_indices iterator:
fn main() {
    for (j, val) in "dummy string".char_indices().rev() {
        println!("{}, {}", j, val)
    }
}
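The two notions of "index" only differ once the string contains non-ASCII characters. The sketch below puts the Vec-based approach and char_indices side by side; both helper names are invented for this example.

```rust
// Hypothetical helpers contrasting the two notions of "index".

// Position counted in characters (requires collecting first).
fn rev_char_positions(s: &str) -> Vec<(usize, char)> {
    let chars: Vec<char> = s.chars().collect();
    chars.into_iter().enumerate().rev().collect()
}

// Position counted in bytes from the start of the string.
fn rev_byte_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().rev().collect()
}

fn main() {
    // U+00E9 (e with acute accent) occupies two bytes, so the 'b' after it
    // is the third character but starts at byte offset 3.
    let s = "a\u{e9}b";
    println!("{:?}", rev_char_positions(s));
    println!("{:?}", rev_byte_offsets(s));
}
```

The byte offsets from char_indices can be used to slice the original string; the character positions cannot.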

Related

`fold` values into a HashMap

After reading the article "Learning Programming Concepts by Jumping in at the Deep End", I can't work out exactly how fold() works in this context, mainly how fold() knows to grab the word variable from split().
Here's the example:
use std::collections::HashMap;
fn count_words(text: &str) -> HashMap<&str, usize> {
    text.split(' ').fold(
        HashMap::new(),
        |mut map, word| { *map.entry(word).or_insert(0) += 1; map }
    )
}
Rust docs say:
fold() takes two arguments: an initial value, and a closure with two arguments: an ‘accumulator’, and an element. The closure returns the value that the accumulator should have for the next iteration.
Iterator - fold
So I get that mut map is the accumulator, and I get that split() returns an iterator and therefore fold() is iterating over those values, but how does fold know to grab that value? It's being passed implicitly, but I can't seem to wrap my head around this. How is that being mapped to the word variable?
Not sure if I have the right mental model for this...
Thanks!

but how does fold know to grab that value?
fold() is a method on the iterator. That means that it has access to self which is the actual iterator, so it can call self.next() to get the next item (in this case the word, since self is of type Split, so its next() does get the next word). You could imagine fold() being implemented with the following pseudocode:
fn fold<B, F>(mut self, init: B, mut f: F) -> B
where
    Self: Sized,
    F: FnMut(B, Self::Item) -> B,
{
    let mut accum = init;
    while let Some(x) = self.next() {
        accum = f(accum, x);
    }
    accum
}
Ok, the above is not pseudocode, it's the actual implementation.
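To make the connection to the question concrete, here is a sketch of count_words written twice: once with fold, and once with the loop that fold runs internally. The names count_words_fold and count_words_loop are chosen for this example only.

```rust
use std::collections::HashMap;

// The version from the question: `fold` feeds each word to the closure.
fn count_words_fold(text: &str) -> HashMap<&str, usize> {
    text.split(' ').fold(HashMap::new(), |mut map, word| {
        *map.entry(word).or_insert(0) += 1;
        map
    })
}

// The same logic with the iteration spelled out by hand.
fn count_words_loop(text: &str) -> HashMap<&str, usize> {
    let mut words = text.split(' '); // plays the role of `self` inside `fold`
    let mut map = HashMap::new(); // plays the role of `init` / `accum`
    while let Some(word) = words.next() {
        // This is the body of the closure; `word` is whatever `next()` returned.
        *map.entry(word).or_insert(0) += 1;
    }
    map
}

fn main() {
    let text = "the quick the lazy the";
    println!("{:?}", count_words_fold(text).get("the"));
    println!("{:?}", count_words_loop(text).get("the"));
}
```

In other words, word is not looked up by name anywhere: it is just the second parameter of the closure, and fold passes each item from next() in that position.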

Proper signature for a function accepting an iterator of strings

I'm confused about the proper type to use for an iterator yielding string slices.
fn print_strings<'a>(seq: impl IntoIterator<Item = &'a str>) {
    for s in seq {
        println!("- {}", s);
    }
}
fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let it: std::str::Split<'_, char> = "a b c".split(' ');
    print_strings(&arr);
    print_strings(&vec);
    print_strings(it);
}
Using <Item = &'a str>, the arr and vec calls don't compile. If, instead, I use <Item = &'a &'a str>, they work, but the it call doesn't compile.
Of course, I can make the Item type generic too, and do
fn print_strings<'a, I: std::fmt::Display>(seq: impl IntoIterator<Item = I>)
but it's getting silly. Surely there must be a single canonical "iterator of string values" type?

The error you are seeing is expected because seq is &Vec<&str> and &Vec<T> implements IntoIterator with Item=&T, so with your code, you end up with Item=&&str where you are expecting it to be Item=&str in all cases.
The correct way to do this is to widen the Item type so that it can handle both &str and &&str. You can do this by using more generics, e.g.
fn print_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) {
    for s in seq {
        let s = s.as_ref();
        println!("- {}", s);
    }
}
This requires the Item to be something that you can retrieve a &str from, and then in your loop .as_ref() will return the &str you are looking for.
This also has the added bonus that your code will also work with Vec<String> and any other type that implements AsRef<str>.
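As a quick check of that claim, the sketch below uses the same signature but returns the formatted lines instead of printing them, so the result can be inspected. The function name bulleted is hypothetical.

```rust
// Same bound as `print_strings` above, but collecting the lines.
// `bulleted` is a made-up name for this illustration.
fn bulleted(seq: impl IntoIterator<Item = impl AsRef<str>>) -> Vec<String> {
    seq.into_iter()
        .map(|s| format!("- {}", s.as_ref()))
        .collect()
}

fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let owned: Vec<String> = vec!["a".to_owned(), "b".to_owned()];

    println!("{:?}", bulleted(&arr)); // items are &&str
    println!("{:?}", bulleted(&vec)); // items are &&str
    println!("{:?}", bulleted("a b c".split(' '))); // items are &str
    println!("{:?}", bulleted(&owned)); // items are &String
    println!("{:?}", bulleted(owned)); // items are String
}
```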

TL;DR The signature you use is fine, it's the callers that are providing iterators with wrong Item - but can be easily fixed.
As explained in the other answer, print_strings() doesn't accept &arr and &vec because IntoIterator for &[T; n] and &Vec<T> yields references to T. This is because &Vec, itself a reference, is not allowed to consume the Vec in order to move T values out of it. What it can do is hand out references to the T items sitting inside the Vec, i.e. items of type &T. In the case of your callers that don't compile, the containers contain &str, so their iterators hand out &&str.
Other than making print_strings() more generic, another way to fix the issue is to call it correctly to begin with. For example, these all compile:
print_strings(arr.iter().map(|sref| *sref));
print_strings(vec.iter().copied());
print_strings(it);
iter() is the method provided by slices (and therefore available on arrays and Vec) that iterates over references to elements, just like IntoIterator of &Vec. We call it explicitly to be able to call map() to convert &&str to &str the obvious way - by using the * operator to dereference the &&str. The copied() iterator adapter is another way of expressing the same, possibly a bit less cryptic than map(|x| *x). (There is also cloned(), equivalent to map(|x| x.clone()).)
It's also possible to call print_strings() if you have a container with String values:
let v = vec!["foo".to_owned(), "bar".to_owned()];
print_strings(v.iter().map(|s| s.as_str()));

How can I add onto the end of a large [u8] variable in rust? [duplicate]

The following compiles:
pub fn build_proverb(list: &[&str]) -> String {
    if list.is_empty() {
        return String::new();
    }
    let mut result = (0..list.len() - 1)
        .map(|i| format!("For want of a {} the {} was lost.", list[i], list[i + 1]))
        .collect::<Vec<String>>();
    result.push(format!("And all for the want of a {}.", list[0]));
    result.join("\n")
}
The following does not (see Playground):
pub fn build_proverb(list: &[&str]) -> String {
    if list.is_empty() {
        return String::new();
    }
    let mut result = (0..list.len() - 1)
        .map(|i| format!("For want of a {} the {} was lost.", list[i], list[i + 1]))
        .collect::<Vec<String>>()
        .push(format!("And all for the want of a {}.", list[0]))
        .join("\n");
    result
}
The compiler tells me
error[E0599]: no method named `join` found for type `()` in the current scope
 --> src/lib.rs:9:10
  |
9 |         .join("\n");
  |          ^^^^
I get the same type of error if I try to compose just with push.
What I would expect is that collect returns B, aka Vec<String>. Vec is not (), and Vec of course has the methods I want to include in the list of composed functions.
Why can't I compose these functions? The explanation might include describing the "magic" of terminating the expression after collect() to get the compiler to instantiate the Vec in a way that does not happen when I compose with push etc.

If you read the documentation for Vec::push and look at the signature of the method, you will learn that it does not return the Vec:
pub fn push(&mut self, value: T)
Since there is no explicit return type, the return type is the unit type (). There is no method called join on (). You will need to write your code in multiple lines.
See also:
What is the purpose of the unit type in Rust?
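For illustration, here is the multi-line form again with the unit return value made visible. This is a sketch rather than a recommended style: the let () = ... pattern is only there to show that push evaluates to ().

```rust
pub fn build_proverb(list: &[&str]) -> String {
    if list.is_empty() {
        return String::new();
    }
    let mut lines: Vec<String> = list
        .windows(2)
        .map(|w| format!("For want of a {} the {} was lost.", w[0], w[1]))
        .collect();
    // `push` mutates `lines` in place and evaluates to `()`.
    // The `()` pattern compiles only because that is the return type,
    // which is also why nothing useful can be chained after `push`.
    let () = lines.push(format!("And all for the want of a {}.", list[0]));
    lines.join("\n")
}

fn main() {
    println!("{}", build_proverb(&["nail", "shoe", "horse"]));
}
```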
I'd write this more functionally:
use itertools::Itertools; // 0.8.0
pub fn build_proverb(list: &[&str]) -> String {
    let last = list
        .get(0)
        .map(|d| format!("And all for the want of a {}.", d));
    list.windows(2)
        .map(|d| format!("For want of a {} the {} was lost.", d[0], d[1]))
        .chain(last)
        .join("\n")
}
fn main() {
    println!("{}", build_proverb(&["nail", "shoe"]));
}
See also:
What's an idiomatic way to print an iterator separated by spaces in Rust?

Thank you to everyone for the useful interactions. Everything stated in the previous response is precisely correct. And, there is a bigger picture as I'm learning Rust.
Coming from Haskell (with C training years ago), I bumped into the OO method chaining approach that uses a pointer to chain between method calls; no need for pure functions (i.e., what I was doing with let mut result = ..., which was then used/required to change the value of the Vec using push in result.push(...)). What I believe is a more general observation is that, in OO, it is "aok" to return unit because method chaining does not require a return value.
The custom code below defines push as a trait; it uses the same inputs as the "OO" push, but returns the updated self. Perhaps only as a side comment, this makes the function pure (output depends on input) but in practice, means the push defined as a trait enables the FP composition of functions I had come to expect was a norm (fair enough I thought at first given how much Rust borrows from Haskell).
What I was trying to accomplish, and at the heart of the question, is captured by the code solution that @Stargateur, @E_net4 and @Shepmaster put forward. With only the smallest edits, it is as follows:
(see playground)
pub fn build_proverb(list: &[&str]) -> String {
    if list.is_empty() {
        return String::new();
    }
    list.windows(2)
        .map(|d| format!("For want of a {} the {} was lost.", d[0], d[1]))
        .collect::<Vec<_>>()
        .push(format!("And all for the want of a {}.", list[0]))
        .join("\n")
}
The solution requires that I define push as a trait method that returns self, of type Vec in this instance.
trait MyPush<T> {
    fn push(self, x: T) -> Vec<T>;
}
impl<T> MyPush<T> for Vec<T> {
    fn push(mut self, x: T) -> Vec<T> {
        Vec::push(&mut self, x);
        self
    }
}
Final observation: in surveying many of the Rust traits, I could not find a trait function that returns () (modulo e.g. Write, which returns Result<()>).
This contrasts with what I learned here to expect with struct and enum methods. Both traits and the OO methods have access to self and thus have each been described as "methods", but there seems to be an inherent difference worth noting: OO methods use a reference to enable sequentially changing self, while FP traits (if you will) use function composition that relies on "pure", state-changing functions to accomplish the same (:: (self, newValue) -> self).
Perhaps as an aside, where Haskell achieves referential transparency in this situation by creating a new copy (modulo behind the scenes optimizations), Rust seems to accomplish something similar in the custom trait code by managing ownership (transferred to the trait function, and handed back by returning self).
A final piece to the "composing functions" puzzle: For composition to work, the output of one function needs to have the type required for the input of the next function. join worked both when I was passing it a value, and when I was passing it a reference (true with types that implement IntoIterator). So join seems to have the capacity to work in both the method chaining and function composition styles of programming.
Is this distinction between OO methods that don't rely on a return value and traits generally true in Rust? It seems to be the case "here and there". Case in point, in contrast to push where the line is clear, join seems to be on its way to being part of the standard library defined as both a method for SliceConcatExt and a trait function for SliceConcatExt (see rust src and the rust-lang issue discussion). The next question, would unifying the approaches in the standard library be consistent with the Rust design philosophy? (pay only for what you use, safe, performant, expressive and a joy to use)

Why does std::vec::Vec implement two kinds of the Extend trait?

The struct std::vec::Vec implements two kinds of Extend, as specified here – impl<'a, T> Extend<&'a T> for Vec<T> and impl<T> Extend<T> for Vec<T>. The documentation states that the first kind is an "Extend implementation that copies elements out of references before pushing them onto the Vec". I'm rather new to Rust, and I'm not sure if I'm understanding it correctly.
I would guess that the first kind is used with the equivalent of C++ normal iterators, and the second kind is used with the equivalent of C++ move iterators.
I'm trying to write a function that accepts any data structure that will allow inserting i32s to the back, so I take a parameter that implements both kinds of Extend, but I can't figure out how to specify the generic parameters to get it to work:
fn main() {
    let mut vec = std::vec::Vec::<i32>::new();
    add_stuff(&mut vec);
}
fn add_stuff<'a, Rec: std::iter::Extend<i32> + std::iter::Extend<&'a i32>>(receiver: &mut Rec) {
    let x = 1 + 4;
    receiver.extend(&[x]);
}
The compiler complains that &[x] "creates a temporary which is freed while still in use" which makes sense because 'a comes from outside the function add_stuff. But of course what I want is for receiver.extend(&[x]) to copy the element out of the temporary array slice and add it to the end of the container, so the temporary array will no longer be used after receiver.extend returns. What is the proper way to express what I want?

From the outside of add_stuff, Rec must be able to be extended with a reference whose lifetime is given in the inside of add_stuff. Thus, you could require that Rec must be able to be extended with references of any lifetime using higher-ranked trait bounds:
fn main() {
    let mut vec = std::vec::Vec::<i32>::new();
    add_stuff(&mut vec);
}
fn add_stuff<Rec>(receiver: &mut Rec)
where
    for<'a> Rec: std::iter::Extend<&'a i32>
{
    let x = 1 + 4;
    receiver.extend(&[x]);
}
Moreover, as you see, the trait bounds were overly tight. One of them should be enough if you use receiver consistently within add_stuff.
That said, I would simply require Extend<i32> and make sure that add_stuff does the right thing internally (if possible):
fn add_stuff<Rec>(receiver: &mut Rec)
where
    Rec: std::iter::Extend<i32>
{
    let x = 1 + 4;
    receiver.extend(std::iter::once(x));
}
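A small sketch showing both variants accepted by more than one container. The second function is renamed add_stuff_by_ref here purely so that both can live in one file; VecDeque is used as an arbitrary second collection.

```rust
use std::collections::VecDeque;

// Variant that only needs `Extend<i32>`: values are passed by value.
fn add_stuff<Rec>(receiver: &mut Rec)
where
    Rec: Extend<i32>,
{
    let x = 1 + 4;
    receiver.extend(std::iter::once(x));
}

// Variant with the higher-ranked bound: the reference to the temporary
// array only has to live for the duration of the `extend` call.
// (`add_stuff_by_ref` is a name made up for this example.)
fn add_stuff_by_ref<Rec>(receiver: &mut Rec)
where
    for<'a> Rec: Extend<&'a i32>,
{
    let x = 1 + 4;
    receiver.extend(&[x]);
}

fn main() {
    let mut vec: Vec<i32> = vec![1];
    add_stuff(&mut vec);
    add_stuff_by_ref(&mut vec);
    println!("{:?}", vec);

    let mut deque: VecDeque<i32> = VecDeque::new();
    add_stuff(&mut deque);
    add_stuff_by_ref(&mut deque);
    println!("{:?}", deque);
}
```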

Take slice of certain length known at compile time

In this code:
fn unpack_u32(data: &[u8]) -> u32 {
    assert_eq!(data.len(), 4);
    let res = data[0] as u32 |
        (data[1] as u32) << 8 |
        (data[2] as u32) << 16 |
        (data[3] as u32) << 24;
    res
}
fn main() {
    let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
    println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?

Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to said question (playground):
fn unpack_u32(data: [u8; 4]) -> u32 {
    // as before without assert
}
use std::convert::AsMut;
fn clone_into_array<A, T>(slice: &[T]) -> A
    where A: Default + AsMut<[T]>,
          T: Clone
{
    assert_eq!(slice.len(), std::mem::size_of::<A>()/std::mem::size_of::<T>());
    let mut a = Default::default();
    <A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
    a
}
fn main() {
    let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
    println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.
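On reasonably recent compilers the standard library can do the slice-to-array conversion itself, through the TryFrom<&[T]> implementation for [T; N]. The sketch below assumes such a compiler; unpack_at is a hypothetical wrapper added for this example, and u32::from_le_bytes stands in for the hand-written shifts (it uses the same little-endian byte order). The length check still happens at run time, but it surfaces as a None instead of a panic.

```rust
use std::convert::TryFrom; // already in the prelude from the 2021 edition on

// The array type guarantees exactly four bytes, so no assert is needed here.
fn unpack_u32(data: [u8; 4]) -> u32 {
    u32::from_le_bytes(data)
}

// Hypothetical wrapper: reads four bytes starting at `start`, if available.
fn unpack_at(v: &[u8], start: usize) -> Option<u32> {
    let end = start.checked_add(4)?;
    let window = v.get(start..end)?; // None if the range is out of bounds
    let bytes = <[u8; 4]>::try_from(window).ok()?; // slice length checked here
    Some(unpack_u32(bytes))
}

fn main() {
    let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
    match unpack_at(&v, 1) {
        Some(res) => println!("res: {:X}", res),
        None => println!("not enough bytes"),
    }
}
```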

I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.
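As a safe variation on that idea (swapped in here instead of get_unchecked), the slice method chunks_exact hands out sub-slices that are all exactly the requested length, so the total length only needs checking once. A minimal sketch, with unpack_all as an invented name:

```rust
// Hypothetical helper: decodes a whole buffer of little-endian u32 values.
// The length is validated once; each chunk is then known to hold 4 bytes.
fn unpack_all(data: &[u8]) -> Option<Vec<u32>> {
    if data.len() % 4 != 0 {
        return None;
    }
    Some(
        data.chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

fn main() {
    let bytes = [1_u8, 0, 0, 0, 2, 0, 0, 0];
    println!("{:?}", unpack_all(&bytes));
}
```

Whether the remaining index checks inside the closure are optimized away is up to the compiler; this sketch makes no promise about that.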

I had a similar problem: creating a fixed-size byte array on the stack whose length corresponds to the const length of another byte array (which may change during development).
A combination of a compiler plugin and a macro was the solution:
https://github.com/frehberg/rust-sizedbytes
