Why does this HashMap key have to be dereferenced twice? - rust

This function computes the mode of a Vec<i32>, using a HashMap to count the occurrences of each value. I do not understand why it will not compile unless the key is dereferenced twice in the last line:
use std::collections::HashMap;
fn mode(vec: &Vec<i32>) -> i32 {
    let mut counts = HashMap::new();
    for n in vec {
        let count = counts.entry(n).or_insert(0);
        *count += 1;
    }
    **counts.iter().max_by_key(|a| a.1).unwrap().0
}

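It has to be dereferenced twice because you've created a double reference: the map's keys are already references, and iterating over the map borrows each key again.
You are iterating over &Vec<T>, which produces &T, so the keys you insert into the map are &i32.
You called HashMap::iter on HashMap<K, V>, which produces (&K, &V). With K = &i32, that is (&&i32, &i32).
Iterating by value and consuming the map avoids both references: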
fn mode(vec: &[i32]) -> i32 {
    let mut counts = std::collections::HashMap::new();
    for &n in vec {
        *counts.entry(n).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|a| a.1).unwrap().0
}
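To make the two layers of reference visible, here is a small sketch that writes out the types explicitly. The names mode_by_ref and mode_by_value are made up for this illustration; the type annotations are only there so the compiler confirms what each iterator yields.

```rust
use std::collections::HashMap;

// Borrowing version: the keys are `&i32`, so `iter()` hands back `&&i32`.
fn mode_by_ref(values: &[i32]) -> i32 {
    let mut counts: HashMap<&i32, u32> = HashMap::new();
    for n in values {
        // `n` is `&i32` because we iterate over a borrowed slice.
        *counts.entry(n).or_insert(0) += 1;
    }
    let (key, _count): (&&i32, &u32) = counts
        .iter()
        .max_by_key(|entry| entry.1)
        .unwrap();
    **key
}

// Owning version: the `&n` pattern copies each `i32` out of the slice,
// and `into_iter()` yields `(i32, u32)` pairs by value.
fn mode_by_value(values: &[i32]) -> i32 {
    let mut counts: HashMap<i32, u32> = HashMap::new();
    for &n in values {
        *counts.entry(n).or_insert(0) += 1;
    }
    let (key, _count): (i32, u32) = counts
        .into_iter()
        .max_by_key(|entry| entry.1)
        .unwrap();
    key
}

fn main() {
    let data = [1, 3, 3, 7, 3, 1];
    println!("{}", mode_by_ref(&data));
    println!("{}", mode_by_value(&data));
}
```

Both functions return the same value; only the number of dereferences needed at the end differs.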
See also:
What is the difference between iter and into_iter?
What does it mean to pass in a vector into a `for` loop versus a reference to a vector?
Iterating over a slice's values instead of references in Rust?
Meaning of '&variable' in arguments/patterns
What is the difference between `e1` and `&e2` when used as the for-loop variable?
What is the purpose of `&` before the loop variable?
Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?

Related

Construct string slice from vector of string slices

I have a vector holding n string slices. I would like to construct a string slice based on these.
fn main() {
    let v: Vec<&str> = vec!["foo", "bar"];
    let h: &str = "home";
    let result = format!("hello={}#{}&{}#{}", v[0], h, v[1], h);
    println!("{}", result);
}
I searched through the docs but I failed to find anything on this subject.
This can be done (somewhat inefficiently) with iterators:
let result = format!("hello={}",
    v.iter().map(|s| format!("{}#{}", s, h))
        .collect::<Vec<_>>()
        .join("&")
);
If high performance is needed, a loop that builds a String will be quite a bit faster. The approach above allocates an additional String for each input &str, then a vector to hold them all before finally joining them together.
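As a rough sketch of the loop-based alternative mentioned above (build_query is a hypothetical name, not something from the question), a single String can be filled in place so that no intermediate String or Vec is created:

```rust
// Hypothetical helper: appends "<item>#<suffix>" for each item,
// separated by '&', after the fixed "hello=" prefix.
fn build_query(items: &[&str], suffix: &str) -> String {
    let mut out = String::from("hello=");
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push('&'); // separator goes between items, not after the last
        }
        out.push_str(item);
        out.push('#');
        out.push_str(suffix);
    }
    out
}

fn main() {
    let v: Vec<&str> = vec!["foo", "bar"];
    let h: &str = "home";
    println!("{}", build_query(&v, h));
}
```

The output String may still reallocate as it grows; if the final size is known up front, String::with_capacity would avoid that too.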
Here's a more efficient way to implement this. The function below calls the passed function for each element in the iterator, giving it access to the std::fmt::Write reference passed in, and writes the delimiter in between successive calls. (Note that String implements std::fmt::Write!)
use std::fmt::Write;
fn delimited_write<W, I, V, F>(writer: &mut W, seq: I, delim: &str, mut func: F)
    -> Result<(), std::fmt::Error>
    where W: Write,
          I: IntoIterator<Item=V>,
          F: FnMut(&mut W, V) -> Result<(), std::fmt::Error>
{
    let mut iter = seq.into_iter();
    match iter.next() {
        None => { },
        Some(v) => {
            // Write the first element without a leading delimiter
            func(writer, v)?;
            for v in iter {
                writer.write_str(delim)?;
                func(writer, v)?;
            }
        },
    };
    Ok(())
}
You'd use it to implement your operation like so:
use std::fmt::Write;
fn main() {
    let v: Vec<&str> = vec!["foo", "bar"];
    let h: &str = "home";
    let mut result: String = "hello=".to_string();
    delimited_write(&mut result, v.iter(), "&", |w, i| {
        write!(w, "{}#{}", i, h)
    }).expect("write succeeded");
    println!("{}", result);
}
It's not as pretty, but it makes no temporary String or Vec allocations.
You will need to iterate over the vector as cdhowie suggests above. Let me explain why this is necessarily an O(n) problem and you can't create a single string slice from a vector of string slices without iterating over the vector:
Your vector only holds references to the strings; it doesn't hold the strings themselves. The strings are likely not stored contiguously in memory (only their references inside your vector are) so combining them into a single slice is not as simple as creating a slice that points to the beginning of the first string referenced in the vector and then extending the size of the slice.
Given that a &str is just an integer indicating the length of the slice and a pointer to a location in memory or the application binary where a str (essentially a sequence of UTF-8 bytes) is stored, you can imagine that if the first &str in your vector references a string on the stack and the next one references a hardcoded string that is stored in the executable binary of the program, there is no way to create a single &str that points to both strs without copying at least one of them (in practice, both will probably be copied).
In order to get a single string slice from all of those &str's in your vector, you need to copy each of the str's they reference to a single, contiguous chunk of memory and then create a slice of that chunk. That copying requires iterating over the vector.
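A minimal sketch of that copy-then-slice step, assuming plain concatenation is what's wanted (join_all is a made-up helper name). The standard concat method on slices does the iteration and copying internally:

```rust
// Hypothetical helper: copies every piece into one contiguous String.
fn join_all(parts: &[&str]) -> String {
    parts.concat()
}

fn main() {
    let owned_text = String::from("foo");   // bytes owned by a String
    let in_binary: &'static str = "bar";     // bytes stored in the executable
    let parts: Vec<&str> = vec![owned_text.as_str(), in_binary];

    // The pieces live in unrelated places, so they must be copied first.
    let joined: String = join_all(&parts);
    // Only now can a single slice cover all of the text.
    let whole: &str = &joined;
    println!("{}", whole);
}
```

Note that the resulting &str borrows from the new String, so the String has to stay alive for as long as the slice is used.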

How does Rust determine type in for-in loop? Does Rust auto reference/dereference on variable assignment? [duplicate]

I'm confused by how Rust for loops work. Consider the following:
fn print_type_of<T>(_: T) {
    // std::any::type_name is the stable replacement for the nightly-only intrinsic
    println!("{}", std::any::type_name::<T>());
}
fn main() {
    let nums = vec![1, 2, 3];
    for num in &nums { print_type_of(num); }
    for num in nums { print_type_of(num); }
}
It outputs the following:
&i32
&i32
&i32
i32
i32
i32
What does it mean to pass in a vector into for versus a reference to a vector? Why, when you pass in a reference, do you get a reference to the items and when you pass in an actual vector, you get the actual items?
The argument to a for loop must implement IntoIterator. If you check out the docs for Vec, you will see these two implementations of IntoIterator:
impl<T> IntoIterator for Vec<T> {
    type Item = T;
    type IntoIter = IntoIter<T>;
    // ...
}
impl<'a, T> IntoIterator for &'a Vec<T> {
    type Item = &'a T;
    type IntoIter = Iter<'a, T>;
    // ...
}
You get references for &vec and values for vec because that's how the iterators are defined.
Sometimes, you'll see these as the more explicit forms: iter or into_iter. The same logic applies; see What is the difference between iter and into_iter?
There's another form that you will encounter: &mut vec and iter_mut. These return mutable references to the elements in the vector.
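The three forms can be seen side by side in a short sketch. The function name three_ways is invented for the example; the methods themselves (iter, iter_mut, into_iter) are the standard ones on Vec.

```rust
// Hypothetical demo function exercising all three iteration forms.
fn three_ways(mut nums: Vec<i32>) -> (i32, Vec<i32>) {
    // `&nums` / `nums.iter()`: items are `&i32`, the vector is only borrowed.
    let total: i32 = nums.iter().sum();

    // `&mut nums` / `nums.iter_mut()`: items are `&mut i32`, so elements
    // can be changed in place.
    for n in &mut nums {
        *n *= 10;
    }

    // `nums` / `nums.into_iter()`: items are `i32`, and the vector is consumed.
    let shifted: Vec<i32> = nums.into_iter().map(|n| n + 1).collect();

    (total, shifted)
}

fn main() {
    let (total, shifted) = three_ways(vec![1, 2, 3]);
    println!("{} {:?}", total, shifted);
}
```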
As to why the differences at all...
Using a reference to the vector allows you to access the vector after the loop is done. This compiles:
let v = vec![1, 2, 3];
for i in &v {}
for i in &v {}
This does not:
let v = vec![1, 2, 3];
for i in v {}
for i in v {}
error[E0382]: use of moved value: `v`
--> src/main.rs:4:14
|
3 | for i in v {}
| - value moved here
4 | for i in v {}
| ^ value used here after move
|
= note: move occurs because `v` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
Ownership-wise, you can't get a value from a reference unless you clone the value (assuming the type can even be cloned!). This means &vec is unable to yield values that aren't references.
The implementer of Vec's iterator could have chosen to only yield references, but transferring ownership of the elements to the iterator allows the iterator's consumer to do more things; from a capability perspective it's preferred.
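One thing the owning iterator makes possible is moving elements out of the vector without cloning them. A small sketch, with longest as a hypothetical helper name:

```rust
// Hypothetical helper: takes ownership of the names and returns the
// longest one, moving it out of the vector rather than cloning it.
fn longest(names: Vec<String>) -> Option<String> {
    let mut best: Option<String> = None;
    for name in names {
        // `name` is an owned String here, not a `&String`.
        let is_longer = best.as_ref().map_or(true, |b| name.len() > b.len());
        if is_longer {
            best = Some(name); // a move: no new allocation
        }
    }
    best
}

fn main() {
    let names = vec!["ab".to_string(), "abcd".to_string(), "abc".to_string()];
    println!("{:?}", longest(names));
    // `names` can no longer be used here: the loop consumed it.
}
```

Iterating over &names instead would only give &String items, and returning an owned String would then require a clone.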
See also:
What is the difference between iter and into_iter?
Why can I iterate over a slice twice, but not a vector?

Using an iterator as an argument to a function multiple times from one vector

I'm trying to write some Rust code to decode GPS data from an SDR receiver. I'm reading samples in from a file and converting the binary data to a series of complex numbers, which is a time-consuming process. However, there are times when I want to stream samples in without keeping them in memory (e.g. one very large file processed only one way or samples directly from the receiver) and other times when I want to keep the whole data set in memory (e.g. one small file processed in multiple different ways) to avoid repeating the work of parsing the binary file.
Therefore, I want to write functions or structs with iterators to be as general as possible, but I know they aren't sized, so I need to put them in a Box. I would have expected something like this to work.
This is the simplest example I could come up with to demonstrate the same basic problem.
fn sum_squares_plus(iter: Box<Iterator<Item = usize>>, x: usize) -> usize {
    let mut ans: usize = 0;
    for i in iter {
        ans += i * i;
    }
    ans + x
}
fn main() {
    // Pretend this is an expensive operation that I don't want to repeat five times
    let small_data: Vec<usize> = (0..10).collect();
    for x in 0..5 {
        // Want to iterate over immutable references to the elements of small_data
        let iterbox: Box<Iterator<Item = usize>> = Box::new(small_data.iter());
        println!("{}: {}", x, sum_squares_plus(iterbox, x));
    }
    // 0..100 is more than 0..10 and I'm only using it once,
    // so I want to 'stream' it instead of storing it all in memory
    let x = 55;
    println!("{}: {}", x, sum_squares_plus(Box::new(0..100), x));
}
I've tried several different variants of this, but none seem to work. In this particular case, I'm getting
error[E0271]: type mismatch resolving `<std::slice::Iter<'_, usize> as std::iter::Iterator>::Item == usize`
--> src/main.rs:15:52
|
15 | let iterbox: Box<Iterator<Item = usize>> = Box::new(small_data.iter());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected reference, found usize
|
= note: expected type `&usize`
found type `usize`
= note: required for the cast to the object type `dyn std::iter::Iterator<Item = usize>`
I'm not worried about concurrency and I'd be happy to just get it working sequentially on a single thread, but a concurrent solution would be a nice bonus.
The current error you're running into is here:
let iterbox:Box<Iterator<Item = usize>> = Box::new(small_data.iter());
You're declaring that you want an iterator that returns usize items, but small_data.iter() is an iterator that returns references to usize items (&usize). That's why you get the error "expected reference, found usize". usize is a small Copy type, so you can simply use the .cloned() iterator adapter to get an iterator that actually returns usize.
let iterbox: Box<Iterator<Item = usize>> = Box::new(small_data.iter().cloned());
Once you're past that hurdle, the next problem is that the iterator over small_data holds a reference to small_data. Since sum_squares_plus is defined to accept a Box<Iterator<Item = usize>>, that signature implies that the Iterator trait object within the box has a 'static lifetime. The iterator you're providing does not, because it borrows small_data. To fix that, you need to adjust the sum_squares_plus definition to
fn sum_squares_plus<'a>(iter: Box<Iterator<Item = usize> + 'a>, x: usize) -> usize
Note the 'a lifetime annotations. (In current Rust the trait object is spelled with the dyn keyword, as in Box<dyn Iterator<Item = usize> + 'a>; the bare form is the older syntax.) The code should then compile, but unless there are constraints beyond what's shown here, a more idiomatic and efficient approach would be to avoid trait objects and the associated allocations. The code below should work using static dispatch without any trait objects.
fn sum_squares_plus<I: Iterator<Item = usize>>(iter: I, x: usize) -> usize {
    let mut ans: usize = 0;
    for i in iter {
        ans += i * i;
    }
    ans + x
}
fn main() {
    // Pretend this is an expensive operation that I don't want to repeat five times
    let small_data: Vec<usize> = (0..10).collect();
    for x in 0..5 {
        println!("{}: {}", x, sum_squares_plus(small_data.iter().cloned(), x));
    }
    // 0..100 is more than 0..10 and I'm only using it once,
    // so I want to 'stream' it instead of storing it all in memory
    let x = 55;
    // The range is passed directly; with static dispatch no Box is needed
    println!("{}: {}", x, sum_squares_plus(0..100, x));
}
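If a trait object really is required (for example, to pick between a streamed and a stored source at run time), the boxed variant with the lifetime annotation could look roughly like this. This is a sketch in current dyn syntax, not the asker's final code:

```rust
// `'a` lets the boxed iterator borrow from data that lives outside the
// function, instead of requiring the default `'static` bound.
fn sum_squares_plus<'a>(iter: Box<dyn Iterator<Item = usize> + 'a>, x: usize) -> usize {
    let mut ans: usize = 0;
    for i in iter {
        ans += i * i;
    }
    ans + x
}

fn main() {
    let small_data: Vec<usize> = (0..10).collect();
    for x in 0..5 {
        // This iterator borrows `small_data`, so the box is not `'static`.
        println!("{}: {}", x, sum_squares_plus(Box::new(small_data.iter().cloned()), x));
    }
    // An owned range borrows nothing and can be boxed the same way.
    let x = 55;
    println!("{}: {}", x, sum_squares_plus(Box::new(0usize..100), x));
}
```

Each call allocates one box and dispatches next() dynamically, which is the overhead the generic version avoids.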

Finding most frequently occurring string in a structure in Rust?

I'm looking for the string which occurs most frequently in the second part of the tuple of Vec<(String, Vec<String>)>:
use itertools::Itertools; // 0.8.0
fn main() {
    let edges: Vec<(String, Vec<String>)> = vec![];
    let x = edges
        .iter()
        .flat_map(|x| &x.1)
        .map(|x| &x[..])
        .sorted()
        .group_by(|x| x)
        .max_by_key(|x| x.len());
}
This:
takes the iterator
flat-maps to the second part of the tuple
turns elements into a &str
sorts it (via itertools)
groups it by string (via itertools)
finds the group with the highest count
This supposedly gives me the group with the most frequently occurring string, except it doesn't compile:
error[E0599]: no method named `max_by_key` found for type `itertools::groupbylazy::GroupBy<&&str, std::vec::IntoIter<&str>, [closure@src/lib.rs:9:19: 9:24]>` in the current scope
--> src/lib.rs:10:10
|
10 | .max_by_key(|x| x.len());
| ^^^^^^^^^^
|
= note: the method `max_by_key` exists but the following trait bounds were not satisfied:
`&mut itertools::groupbylazy::GroupBy<&&str, std::vec::IntoIter<&str>, [closure@src/lib.rs:9:19: 9:24]> : std::iter::Iterator`
I'm totally lost in these types.
You didn't read the documentation for a function you are using. This is not a good idea. The itertools documentation for the GroupBy type that group_by returns says:
This type implements IntoIterator (it is not an iterator itself), because the group iterators need to borrow from this value. It should be stored in a local variable or temporary and iterated.
Personally, I'd just use a BTreeMap or HashMap:
use std::collections::BTreeMap;
let mut counts = BTreeMap::new();
for word in edges.iter().flat_map(|x| &x.1) {
    *counts.entry(word).or_insert(0) += 1;
}
let max = counts.into_iter().max_by_key(|&(_, count)| count);
println!("{:?}", max);
If you really wanted to use the iterators, it could look something like this:
let groups = edges
    .iter()
    .flat_map(|x| &x.1)
    .sorted()
    .group_by(|&x| x);
let max = groups
    .into_iter()
    .map(|(key, group)| (key, group.count()))
    .max_by_key(|&(_, count)| count);
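For comparison, the same sort-then-group idea can be sketched with only the standard library by sorting the words and scanning for runs of equal neighbours. This swaps itertools' group_by for a hand-written loop; most_frequent is a hypothetical name, and ties go to whichever word sorts first.

```rust
// Hypothetical std-only helper: returns the most frequent string among all
// the second tuple fields, together with its count.
fn most_frequent<'a>(edges: &'a [(String, Vec<String>)]) -> Option<(&'a str, usize)> {
    let mut words: Vec<&str> = edges
        .iter()
        .flat_map(|edge| edge.1.iter().map(|s| s.as_str()))
        .collect();
    words.sort_unstable(); // equal words become adjacent

    let mut best: Option<(&str, usize)> = None;
    let mut start = 0;
    while start < words.len() {
        // Find the end of the run of words equal to words[start].
        let mut end = start;
        while end < words.len() && words[end] == words[start] {
            end += 1;
        }
        let run = end - start;
        if best.map_or(true, |(_, count)| run > count) {
            best = Some((words[start], run));
        }
        start = end;
    }
    best
}

fn main() {
    let edges = vec![
        ("a".to_string(), vec!["x".to_string(), "y".to_string()]),
        ("b".to_string(), vec!["y".to_string(), "z".to_string()]),
    ];
    println!("{:?}", most_frequent(&edges));
}
```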

What do I have to do to solve a "use of moved value" error?

I'm trying to compute the 10,001st prime in Rust (Project Euler 7), and as a part of this, my method to check whether or not an integer is prime references a vector:
fn main() {
    let mut count: u32 = 1;
    let mut num: u64 = 1;
    let mut primes: Vec<u64> = Vec::new();
    primes.push(2);
    while count < 10001 {
        num += 2;
        if vectorIsPrime(num, primes) {
            count += 1;
            primes.push(num);
        }
    }
}
fn vectorIsPrime(num: u64, p: Vec<u64>) -> bool {
    for i in p {
        if num > i && num % i != 0 {
            return false;
        }
    }
    true
}
When I try to reference the vector, I get the following error:
error[E0382]: use of moved value: `primes`
--> src/main.rs:9:31
|
9 | if vectorIsPrime(num, primes) {
| ^^^^^^ value moved here, in previous iteration of loop
|
= note: move occurs because `primes` has type `std::vec::Vec<u64>`, which does not implement the `Copy` trait
What do I have to do to primes in order to be able to access it within the vectorIsPrime function?
With the current definition of your function vectorIsPrime(), the function specifies that it requires ownership of the parameter because you pass it by value.
When a function requires a parameter by value, the compiler will check if the value can be copied by checking if it implements the trait Copy.
If it does, the value is copied (with a memcpy) and given to the function, and you can still continue to use your original value.
If it doesn't, then the value is moved to the given function, and the caller cannot use it afterwards.
That is the meaning of the error message you have.
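The copy-versus-move distinction can be seen in a small sketch. The function names here are invented for the illustration:

```rust
// Takes a `u64` by value: `u64` is `Copy`, so the caller keeps its value.
fn double(n: u64) -> u64 {
    n * 2
}

// Takes a `Vec<u64>` by value: `Vec` is not `Copy`, so the argument is moved.
fn count_owned(v: Vec<u64>) -> usize {
    v.len()
}

// Only borrows the data: the caller keeps ownership.
fn count_borrowed(v: &[u64]) -> usize {
    v.len()
}

fn main() {
    let n: u64 = 21;
    let doubled = double(n);
    println!("{} {}", n, doubled); // `n` is still usable after the call

    let primes: Vec<u64> = vec![2, 3, 5];
    let borrowed = count_borrowed(&primes); // `primes` is still ours
    let owned = count_owned(primes);        // `primes` is moved here
    // println!("{:?}", primes);            // would fail with E0382
    println!("{} {}", borrowed, owned);
}
```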
However, most functions do not require ownership of the parameters: they can work on "borrowed references", which means they do not actually own the value (and cannot for example put it in a container or destroy it).
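fn main() {
    let mut count: u32 = 1;
    let mut num: u64 = 1;
    let mut primes: Vec<u64> = Vec::new();
    primes.push(2);
    while count < 10001 {
        num += 2;
        // Lend the vector to the function; main keeps ownership
        if vector_is_prime(num, &primes) {
            count += 1;
            primes.push(num);
        }
    }
}
fn vector_is_prime(num: u64, p: &[u64]) -> bool {
    for &i in p {
        // Not prime if an earlier prime divides it evenly
        // (the question's test used != here, which rejects every odd candidate)
        if num > i && num % i == 0 {
            return false;
        }
    }
    true
}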
The function vector_is_prime() now specifies that it only needs a slice, i.e. a borrowed pointer to an array (including its size) that you can obtain from a vector using the borrow operator &.
For more information about ownership, I invite you to read the part of the book dealing with ownership.
Rust is, as I would say, a “value-oriented” language. This means that if you define primes like this
let primes: Vec<u64> = …
it is not a reference to a vector. It is practically a variable that stores a value of type Vec<u64> just like any u64 variable stores a u64 value. This means that if you pass it to a function defined like this
fn vec_is_prime(num: u64, vec: Vec<u64>) -> bool { … }
the function will get its own u64 value and its own Vec<u64> value.
The difference between u64 and Vec<u64>, however, is that a u64 value can be easily copied to another place, while a Vec<u64> value can only be moved to another place easily. If you want to give the vec_is_prime function its own Vec<u64> value while keeping one for yourself in main, you have to duplicate it somehow. That's what clone() is for. The reason you have to be explicit here is that this operation is not cheap. That's one nice thing about Rust: it's not hard to spot expensive operations. So, you could call the function like this
if vec_is_prime(num, primes.clone()) { …
but that's not really what you want. The function does not need its own Vec<u64> value. It just needs to borrow it for a short while. Borrowing is much more efficient and applicable in this case:
fn vec_is_prime(num: u64, vec: &Vec<u64>) -> bool { …
Invoking it now requires the “borrowing operator”:
if vec_is_prime(num, &primes) { …
Much better. But we can still improve it. If a function wants to borrow a Vec<T> just for the purpose of reading it, it's better to take a &[T] instead:
fn vec_is_prime(num: u64, vec: &[u64]) -> bool { …
It's just more general. Now you can lend a certain portion of a Vec to the function, or something else entirely (not necessarily a Vec, as long as it stores its values consecutively in memory, like a static lookup table). What's also nice is that, thanks to deref coercion (&Vec<T> coerces to &[T]), you don't need to alter anything at the call site. You can still call this function with &primes as the argument.
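A short sketch of that generality, using a made-up has_divisor function in place of vec_is_prime (whose body is elided above) and a made-up SMALL_PRIMES table:

```rust
// Hypothetical function taking a slice: true if any candidate smaller
// than `num` divides it evenly.
fn has_divisor(num: u64, candidates: &[u64]) -> bool {
    candidates.iter().any(|&p| p < num && num % p == 0)
}

// A static lookup table; not a Vec, but still contiguous in memory.
static SMALL_PRIMES: [u64; 4] = [2, 3, 5, 7];

fn main() {
    let primes: Vec<u64> = vec![2, 3, 5, 7, 11, 13];

    println!("{}", has_divisor(49, &primes));       // a whole Vec
    println!("{}", has_divisor(49, &primes[..3]));  // only part of the Vec
    println!("{}", has_divisor(49, &SMALL_PRIMES)); // a static array
    println!("{}", has_divisor(49, &[2, 3]));       // an array literal
}
```

With a &Vec<u64> parameter, only the first of these four calls would compile.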
For String and &str the situation is the same. String is for storing string values in the sense that a variable of this type owns that value. &str is for borrowing them.
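The string case looks much the same in practice. In this sketch, count_vowels is a hypothetical function that only needs to read the text, so it borrows a &str and accepts an owned String, a literal, or part of a String alike:

```rust
// Hypothetical reader function: borrows the text instead of owning it.
fn count_vowels(text: &str) -> usize {
    text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

fn main() {
    let owned: String = String::from("borrow checker");

    println!("{}", count_vowels(&owned));      // &String coerces to &str
    println!("{}", count_vowels("literal"));   // string literals are &str
    println!("{}", count_vowels(&owned[..6])); // a sub-slice: "borrow"
    println!("{}", owned);                     // still owned by main
}
```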
You move the value of primes into the function vectorIsPrime (by the way, Rust uses snake_case by convention). You have other options, but the best one is to borrow the vector instead of moving it:
fn vector_is_prime(num: u64, p: &Vec<u64>) -> bool { … }
And then pass a reference to it:
vector_is_prime(num, &primes)
