`fold` values into a HashMap

`fold` values into a HashMap - rust

After reading this article Learning Programming Concepts by Jumping in at the Deep End I can't seem to understand how exactly fold() is working in this context. Mainly how fold() knows to grab the word variable from split().
Here's the example:
use std::collections::HashMap;
fn count_words(text: &str) -> HashMap<&str, usize> {
text.split(' ').fold(
HashMap::new(),
|mut map, word| { *map.entry(word).or_insert(0) += 1; map }
)
}
Playground
Rust docs say:
fold() takes two arguments: an initial value, and a closure with two arguments: an ‘accumulator’, and an element. The closure returns the value that the accumulator should have for the next iteration.
Iterator - fold
So I get the mut map is the accumulator and I get that split() returns an iterator and therefore fold() is iterating over those values but how does fold know to grab that value? It's being implicitly passed but I cant seem to wrap my head around this. How is that being mapped to the word variable...
Not sure if I have the right mental model for this...
Thanks!

but how does fold know to grab that value?
fold() is a method on the iterator. That means that it has access to self which is the actual iterator, so it can call self.next() to get the next item (in this case the word, since self is of type Split, so its next() does get the next word). You could imagine fold() being implemented with the following pseudocode:
fn fold<B, F>(mut self, init: B, mut f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
let mut accum = init;
while let Some(x) = self.next() {
accum = f(accum, x);
}
accum
}
Ok, the above is not pseudocode, it's the actual implementation.

Related

Proper signature for a function accepting an iterator of strings

I'm confused about the proper type to use for an iterator yielding string slices.
fn print_strings<'a>(seq: impl IntoIterator<Item = &'a str>) {
for s in seq {
println!("- {}", s);
}
}
fn main() {
let arr: [&str; 3] = ["a", "b", "c"];
let vec: Vec<&str> = vec!["a", "b", "c"];
let it: std::str::Split<'_, char> = "a b c".split(' ');
print_strings(&arr);
print_strings(&vec);
print_strings(it);
}
Using <Item = &'a str>, the arr and vec calls don't compile. If, instead, I use <Item = &'a'a str>, they work, but the it call doesn't compile.
Of course, I can make the Item type generic too, and do
fn print_strings<'a, I: std::fmt::Display>(seq: impl IntoIterator<Item = I>)
but it's getting silly. Surely there must be a single canonical "iterator of string values" type?

The error you are seeing is expected because seq is &Vec<&str> and &Vec<T> implements IntoIterator with Item=&T, so with your code, you end up with Item=&&str where you are expecting it to be Item=&str in all cases.
The correct way to do this is to expand Item type so that is can handle both &str and &&str. You can do this by using more generics, e.g.
fn print_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) {
for s in seq {
let s = s.as_ref();
println!("- {}", s);
}
}
This requires the Item to be something that you can retrieve a &str from, and then in your loop .as_ref() will return the &str you are looking for.
This also has the added bonus that your code will also work with Vec<String> and any other type that implements AsRef<str>.

TL;DR The signature you use is fine, it's the callers that are providing iterators with wrong Item - but can be easily fixed.
As explained in the other answer, print_string() doesn't accept &arr and &vec because IntoIterator for &[T; n] and &Vec<T> yield references to T. This is because &Vec, itself a reference, is not allowed to consume the Vec in order to move T values out of it. What it can do is hand out references to T items sitting inside the Vec, i.e. items of type &T. In the case of your callers that don't compile, the containers contain &str, so their iterators hand out &&str.
Other than making print_string() more generic, another way to fix the issue is to call it correctly to begin with. For example, these all compile:
print_strings(arr.iter().map(|sref| *sref));
print_strings(vec.iter().copied());
print_strings(it);
Playground
iter() is the method provided by slices (and therefore available on arrays and Vec) that iterates over references to elements, just like IntoIterator of &Vec. We call it explicitly to be able to call map() to convert &&str to &str the obvious way - by using the * operator to dereference the &&str. The copied() iterator adapter is another way of expressing the same, possibly a bit less cryptic than map(|x| *x). (There is also cloned(), equivalent to map(|x| x.clone()).)
It's also possible to call print_strings() if you have a container with String values:
let v = vec!["foo".to_owned(), "bar".to_owned()];
print_strings(v.iter().map(|s| s.as_str()));

What does this Rust Closure argument syntax mean?

I modified code found on the internet to create a function that obtains the statistical mode of any Hashable type that implements Eq, but I do not understand some of the syntax. Here is the function:
use std::hash::Hash;
use std::collections::HashMap;
pub fn mode<'a, I, T>(items: I) -> &'a T
where I: IntoIterator<Item = &'a T>,
T: Hash + Clone + Eq, {
let mut occurrences: HashMap<&T, usize> = HashMap::new();
for value in items.into_iter() {
*occurrences.entry(value).or_insert(0) += 1;
}
occurrences
.into_iter()
.max_by_key(|&(_, count)| count)
.map(|(val, _)| val)
.expect("Cannot compute the mode of zero items")
}
(I think requiring Clone may be overkill.)
The syntax I do not understand is in the closure passed to map_by_key:
|&(_, count)| count
What is the &(_, count) doing? I gather the underscore means I can ignore that parameter. Is this some sort of destructuring of a tuple in a parameter list? Does this make count take the reference of the tuple's second item?

.max_by_key(|&(_, count)| count) is equivalent to .max_by_key(f) where f is this:
fn f<T>(t: &(T, usize)) -> usize {
(*t).1
}
f() could also be written using pattern matching, like this:
fn f2<T>(&(_, count): &(T, usize)) -> usize {
count
}
And f2() is much closer to the first closure you're asking about.
The second closure is essentially the same, except there is no reference slightly complicating matters.

Chain Vector and IntoIterator element in Rust

I want to write a function in Rust that will return the vector composed of start integer, then all intermediate integers and then end integer. The assertion it should hold is this:
assert_eq!(intervals(0, 4, 1..4), vec![0, 1, 2, 3, 4]);
The hint is to use chain method for iterators. The function declaration is predefined, I implemented it in one way, which is the following code:
pub fn intervals< I>(start: u32, end: u32, intermediate: I) -> Vec<u32>
where
I: IntoIterator<Item = u32>,
{
let mut a1 = vec![];
a1.push(start);
let inter: Vec<u32> = intermediate.into_iter().collect();
let mut iter : Vec<u32> = a1.iter().chain(inter.iter()).map(|x| *x).collect();
iter.push(end);
return iter;
}
But I am quite convinced this is not really optimal way to do this. I am sure I am doing lots of unnecessary things in the middle two lines. I tried to use intermediate directly like this:
let mut iter: Vec<u32> = a1.iter().chain(intermediate).map(|x| *x).collect();
But I am getting this error for chain method and I don't know how to solve it:
type mismatch resolving <I as std::iter::IntoIterator>::Item==&u32,
expected u32, found &u32
I am super new in Rust so any advice would be helpful to understand what's the right way to use intermediate parameter here.

Here are a few hints:
You have created three separate vectors (one explicitly, two using collect) when in fact you only need one.
You can use the std::iter::once iterator to produce iterators for the start and end integers
No need to collect the intermediate range. The intermediate argument implements IntoIterator, so you can feed it directly to chain. So, you can chain together the start, intermediate and end.
No need to use the 'return' keyword at the end of a function - the result of a function is the value of the last expression in it (as long as there is no semicolon on the end).
Applying those tips your function would look like this:
use std::iter::once;
pub fn intervals< I>(start: u32, end: u32, intermediate: I) -> Vec<u32>
where
I: IntoIterator<Item = u32>,
{
once(start).chain(intermediate).chain(once(end)).collect()
}
One additional thing to note, to answer your question from the comments:
why trying this: a1.iter().chain(intermediate) gives an error with chain method
Calling Vec::iter() returns an iterator that returns references to the values in the vector. This makes sense: calling iter() does not consume the vector, and its contents remain intact: you could iterate over it multiple times if you wanted.
On the other hand, invoking into_iter() from the IntoIterator trait returns an iterator that returns the values. This also makes sense: into_iter() does consume the object you are calling it on, so the iterator then takes ownership of the items that were previously owned by the object.
Trying to chain together two such iterators does not work because they are each iterating different types. One resolution would be to consume a1 as well, like this:
let mut iter : Vec<u32> = a1.into_iter().chain(intermediate).collect();

Is it possible to map a function over a Vec without allocating a new Vec?

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?

Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.

No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.

You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.

How do I compare a vector against a reversed version of itself?

Why won't this compile?
fn isPalindrome<T>(v: Vec<T>) -> bool {
return v.reverse() == v;
}
I get
error[E0308]: mismatched types
--> src/main.rs:2:25
|
2 | return v.reverse() == v;
| ^ expected (), found struct `std::vec::Vec`
|
= note: expected type `()`
found type `std::vec::Vec<T>`

Since you only need to look at the front half and back half, you can use the DoubleEndedIterator trait (methods .next() and .next_back()) to look at pairs of front and back elements this way:
/// Determine if an iterable equals itself reversed
fn is_palindrome<I>(iterable: I) -> bool
where
I: IntoIterator,
I::Item: PartialEq,
I::IntoIter: DoubleEndedIterator,
{
let mut iter = iterable.into_iter();
while let (Some(front), Some(back)) = (iter.next(), iter.next_back()) {
if front != back {
return false;
}
}
true
}
(run in playground)
This version is a bit more general, since it supports any iterable that is double ended, for example slice and chars iterators.
It only examines each element once, and it automatically skips the remaining middle element if the iterator was of odd length.

Read up on the documentation for the function you are using:
Reverse the order of elements in a slice, in place.
Or check the function signature:
fn reverse(&mut self)
The return value of the method is the unit type, an empty tuple (). You can't compare that against a vector.
Stylistically, Rust uses 4 space indents, snake_case identifiers for functions and variables, and has an implicit return at the end of blocks. You should adjust to these conventions in a new language.
Additionally, you should take a &[T] instead of a Vec<T> if you are not adding items to the vector.
To solve your problem, we will use iterators to compare the slice. You can get forward and backward iterators of a slice, which requires a very small amount of space compared to reversing the entire array. Iterator::eq allows you to do the comparison succinctly.
You also need to state that the T is comparable against itself, which requires Eq or PartialEq.
fn is_palindrome<T>(v: &[T]) -> bool
where
T: Eq,
{
v.iter().eq(v.iter().rev())
}
fn main() {
println!("{}", is_palindrome(&[1, 2, 3]));
println!("{}", is_palindrome(&[1, 2, 1]));
}
If you wanted to do the less-space efficient version, you have to allocate a new vector yourself:
fn is_palindrome<T>(v: &[T]) -> bool
where
T: Eq + Clone,
{
let mut reverse = v.to_vec();
reverse.reverse();
reverse == v
}
fn main() {
println!("{}", is_palindrome(&[1, 2, 3]));
println!("{}", is_palindrome(&[1, 2, 1]));
}
Note that we are now also required to Clone the items in the vector, so we add that trait bound to the method.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

`fold` values into a HashMap - rust

Related

Proper signature for a function accepting an iterator of strings

What does this Rust Closure argument syntax mean?

Chain Vector and IntoIterator element in Rust

Is it possible to map a function over a Vec without allocating a new Vec?

How do I compare a vector against a reversed version of itself?

Categories

Resources