Aggregating (fold) hash sets to a struct requires clone - rust

I'm trying to aggregate a simple struct, which has a hash set and an integer, using fold. I eventually got it working, but I needed to use clone, which feels inefficient because it copies the data.
The code is here:
use std::collections::HashSet;
struct Result {
    value: usize,
    names: HashSet<String>,
}
fn main() {
    let results = [
        Result {
            value: 10,
            names: HashSet::from(["a".to_string(), "b".to_string()]),
        },
        Result {
            value: 20,
            names: HashSet::from(["a".to_string(), "c".to_string()]),
        },
    ];
    // Aggregating to the same struct type requires clone.
    let aggregated = results.iter().fold(
        Result {
            value: 0,
            names: HashSet::new(),
        },
        |mut acc, x| {
            acc.value += x.value;
            // Q1: This doesn't work. Why can't I do that?
            // acc.names.extend(&x.names);
            // Q2: I had to use clone. Is it inefficient?
            acc.names.extend(x.names.clone());
            acc
        },
    );
    println!("Aggregated: {}, {}", aggregated.value, aggregated.names.len());
    // Aggregating to a hash set works nicely.
    let aggregated = results.iter().fold(HashSet::new(), |mut acc, x| {
        // Q3: I can pass as &. No any copy, right?
        acc.extend(&x.names);
        acc
    });
    println!("Aggregated: {}", aggregated.len());
}
Q1: I don't understand why I can't pass as a reference. In Q3, I was able to do so. This is the error, which I don't clearly understand.
error[E0271]: type mismatch resolving `<&HashSet<String> as IntoIterator>::Item == String`
--> src/main.rs:29:30
|
29 | acc.names.extend(&x.names);
| ------ ^^^^^^^^ expected struct `String`, found `&String`
| |
| required by a bound introduced by this call
|
note: required by a bound in `extend`
Q2: Using clone makes it work. But does that mean the hash set was literally copied? The hash set can contain a million items, so I'd like to avoid that.
Q3: Super interesting. If I aggregate to a hash set only, I can pass a reference to extend.

I think your main problem is using .iter() instead of .into_iter():
// `.into_iter()` iterates over the elements by value,
// consuming the underlying structure.
let aggregated: Result = results.into_iter().fold(
    Result {
        value: 0,
        names: HashSet::new(),
    },
    |mut acc, x| {
        acc.value += x.value;
        acc.names.extend(x.names);
        acc
    },
);
println!("Aggregated: {}, {}", aggregated.value, aggregated.names.len());
Q1: You can't pass a reference because a &String is not the same type as a String: iterating over &x.names yields &String items, while acc.names stores owned String values. You only have a reference to work with because .iter() makes x a reference to a Result.
Q2: Yes, clone can be inefficient because it deep copies whatever data is in the data structure and can result in new heap allocations.
Q3: That works because you're aggregating into a HashSet<&String> rather than a HashSet<String>:
// `.iter()` iterates over each element by reference,
// leaving the original data structure untouched.
let aggregated: HashSet<&String> = results.iter().fold(HashSet::new(), |mut acc, x| {
    // No cloning necessary because you're storing
    // references to each name in the set.
    acc.extend(&x.names);
    acc
});
println!("Aggregated: {}", aggregated.len());
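As far as I know, the difference comes down to which Extend implementations HashSet provides: HashSet<T> can be extended with owned T values, but with &T only when T is Copy, and String is not Copy. If you need to keep results around and so can't use into_iter(), a minimal sketch is to clone element by element, which at least avoids building a temporary copy of each whole set. The struct name Totals (renamed so it doesn't shadow std's Result) and both function names are hypothetical:

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the question's `Result` struct.
struct Totals {
    value: usize,
    names: HashSet<String>,
}

// Borrowing version: `results` stays usable afterwards. Each name is still
// cloned, but no intermediate HashSet is built per element.
fn aggregate_borrowed(results: &[Totals]) -> Totals {
    results.iter().fold(
        Totals { value: 0, names: HashSet::new() },
        |mut acc, x| {
            acc.value += x.value;
            acc.names.extend(x.names.iter().cloned());
            acc
        },
    )
}

// Consuming version: the strings are moved into the accumulator, no clones.
fn aggregate_owned(results: Vec<Totals>) -> Totals {
    results.into_iter().fold(
        Totals { value: 0, names: HashSet::new() },
        |mut acc, x| {
            acc.value += x.value;
            acc.names.extend(x.names);
            acc
        },
    )
}
```

With the two sample values from the question, both functions should give a value of 30 and three distinct names.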

Related

Rust error :Cannot return value referencing temporary value

I'm trying to write code that returns the mode of a list of given numbers.
Here's the code:
use std::collections::HashMap;
fn mode(vector: &Vec<i32>) -> Vec<&&i32> {
    let mut occurrences = HashMap::new();
    let mut n = Vec::new();
    let mut mode = Vec::new();
    for i in vector {
        let j = occurrences.entry(i).or_insert(0);
        *j += 1;
    }
    for (num, occ) in occurrences.clone().iter() {
        if occ > n[0] {
            n.clear();
            mode.clear();
            n.push(occ);
            mode.push(num);
        } else if occ == n[0] {
            mode.push(num);
        }
    }
    mode
}
fn main() {
    let mut numbers: Vec<i32> = vec![1, 5, 2, 2, 5, 3]; // 2 and 5 are the mode
    numbers.sort();
    println!("the mode is {:?}:", mode(&numbers));
}
I used a vector for the mode since a dataset could be multimodal.
Anyway, I'm getting the following error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:26:5
|
13 | for (num, occ) in occurrences.clone().iter() {
| ------------------- temporary value created here
...
26 | mode
| ^^^^ returns a value referencing data owned by the current function
When you return from the current function, any owned values are destroyed (other than the ones being returned from the function), and any data referencing that destroyed data therefore cannot be returned, e.g.:
fn example() -> &str {
    let s = String::from("hello"); // owned data
    &s // error: returns a value referencing data owned by the current function
    // you can imagine this is added by the compiler
    drop(s);
}
The issue you have comes from iter(). iter() returns an iterator of shared references:
let values: Vec<i32> = vec![1, 2, 3];
for i in values.iter() {
    // i is a &i32
}
for i in values {
    // i is an i32
}
So when you call occurrences.clone().iter() you're creating a temporary value (via clone()) which is owned by the current function, then iterating over that data via shared reference. When you destructure the tuple in (num, occ), these are also shared references.
Because you later call mode.push(num), Rust infers that mode is a vector of references: Vec<&&i32> here, since the map keys are themselves &i32. There is an implicit lifetime on the outer reference, though. It borrows from the temporary clone, which cannot outlive the current function (let's call that lifetime 'a), so the full type of mode is Vec<&'a &i32>.
Because of that, you can't return it from the current function.
To fix
Removing iter() should work, since then you will be iterating over owned values; the return type then needs to become Vec<&i32>, because the keys are no longer behind an extra reference. You may also find that you can remove .clone(). I haven't looked too closely, but it seems redundant.
A couple of other points while you're here:
It's rare to take a &Vec<Foo> parameter; it's much more usual to take a slice, &[Foo]. Slices are more general and avoid one level of indirection (you can still pass your data in as &numbers).
Check out clippy. It has many lints that catch mistakes early, and it usually does a good job of explaining them: https://github.com/rust-lang/rust-clippy
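Putting those points together, here is a minimal sketch of one way the function could look. It is not the only fix: it assumes you are happy to return owned i32 values (they are Copy, so nothing references the local map), and it finds the highest count up front instead of indexing into n, which would panic on the first iteration because n starts out empty.

```rust
use std::collections::HashMap;

// Returns every value that occurs most often, in ascending order.
// An empty input yields an empty vector.
fn mode(values: &[i32]) -> Vec<i32> {
    let mut occurrences: HashMap<i32, usize> = HashMap::new();
    for &v in values {
        *occurrences.entry(v).or_insert(0) += 1;
    }
    // Highest count seen; 0 when the input is empty.
    let max = occurrences.values().copied().max().unwrap_or(0);
    let mut modes: Vec<i32> = occurrences
        .into_iter() // consumes the map, yielding owned (i32, usize) pairs
        .filter(|&(_, count)| count == max)
        .map(|(value, _)| value)
        .collect();
    // HashMap iteration order is unspecified, so sort for a stable result.
    modes.sort_unstable();
    modes
}
```

Calling mode(&numbers) with the vector from the question should return [2, 5].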

Mutating vector elements in HashMap<String, Vec<&mut String>>

I've boiled a problem I'm seeing down to this example:
use std::collections::HashMap;
fn process(mut inputs: Vec<String>) -> Vec<String> {
    // keep track of duplicate entries in the input Vec
    let mut duplicates: HashMap<String, Vec<&mut String>> = HashMap::new();
    for input in &mut inputs {
        duplicates.entry(input.clone())
            .or_insert_with(|| Vec::new())
            .push(input);
    }
    // modify the input vector in place to append the number of each duplicate
    for (key, instances) in duplicates {
        for (i, instance) in instances.iter().enumerate() {
            *instance = format!("{}_{}", instance, i);
        }
    }
    return inputs;
}
fn main() {
    println!("results: {:?}", process(vec![
        String::from("test"),
        String::from("another_test"),
        String::from("test")
    ]));
}
I would expect this to print something like results: test_0, another_test_0, test_1 but am instead running into build issues:
error[E0308]: mismatched types
--> src/main.rs:13:25
|
13 | *instance = format!("{}_{}", instance, i);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `&mut String`, found struct `String`
|
I'm still a bit new to Rust and haven't found anything online that has helped. I'm hoping that I'm doing something silly. Thanks in advance for the help!
for (key, instances) in duplicates {
    for (i, instance) in instances.iter().enumerate() {
        *instance = format!("{}_{}", instance, i);
    }
}
for (key, instances) in duplicates invokes the IntoIterator trait on HashMap<String, Vec<&mut String>>, with
type Item = (String, Vec<&mut String>);
so instances: Vec<&mut String>. [1] instances.iter() then produces references to the elements of instances, so instance: &&mut String. instance would therefore need to be dereferenced twice to reach the element in inputs, which unfortunately doesn't work:
error[E0594]: cannot assign to `**instance` which is behind a `&` reference
--> src/main.rs:14:13
|
13 | for (i, instance) in instances.iter().enumerate() {
| -------- help: consider changing this to be a mutable reference: `&mut &mut String`
14 | **instance = format!("{}_{}", instance, i);
| ^^^^^^^^^^ `instance` is a `&` reference, so the data it refers to cannot be written
The problem is that we are not allowed to modify the T value behind &&mut T. Here's the reason: immutable references can be freely copied, so there may be multiple &&mut T instances pointing to the same &mut T, defeating the anti-aliasing property of &mut.
Therefore, we have to change the type of instance to &mut &mut String, by using mut and .iter_mut():
for (key, mut instances) in duplicates {
    for (i, instance) in instances.iter_mut().enumerate() {
        **instance = format!("{}_{}", instance, i);
    }
}
Since key isn't needed, we can use .values_mut(), so that instances itself becomes an &mut:
for instances in duplicates.values_mut() {
    for (i, instance) in instances.iter_mut().enumerate() {
        **instance = format!("{}_{}", instance, i);
    }
}
[1] We'll use var: type to denote that var is of type type.
In general, references (especially mutable ones) stored in containers are tedious to work with, so I recommend an alternative approach: (with an in-place interface for simplicity)
use std::{collections::HashMap, fmt::Write};
fn process(texts: &mut Vec<String>) {
    let mut counter = HashMap::<String, usize>::new();
    for text in texts.iter_mut() {
        let count = counter.entry(text.clone()).or_insert(0);
        write!(text, "_{}", count).unwrap();
        *count += 1;
    }
}
Iterating returns you references over the values stored in your map's vecs.
As your map's vecs contain references to start with, you have to dereference twice, using **.
You also need to use iter_mut when iterating if you want to change the content.
All in all, you can change your loop like this:
for (key, instances) in duplicates.iter_mut() {
    for (i, instance) in instances.iter_mut().enumerate() {
        **instance = format!("{}_{}", instance, i);
    }
}
The problem you're facing is that immutability is transitive.
That is, coming from other languages you might expect that a & &mut Foo would let you dereference the outer reference, access the inner mutable reference, and update the Foo through it. However, consider the purpose of Rust's unique references: this would be the same as allowing multiple mutable references. You could take a mutable reference, create any number of immutable references to it, and every holder of one could modify the original object concurrently.
That's not acceptable, so all references in a chain must be unique (&mut) to get mutable access to what's at the end of the chain. You therefore need instances.iter_mut() and a double dereference (otherwise you're assigning to the &mut String at the first level, not the inner String), and for that instances itself must be mutable:
for (_, mut instances) in duplicates {
    for (i, instance) in instances.iter_mut().enumerate() {
        **instance = format!("{}_{}", instance, i);
    }
}
Alternatively, since you don't care about duplicates' keys:
for instances in duplicates.values_mut() {
    for (i, instance) in instances.iter_mut().enumerate() {
        **instance = format!("{}_{}", instance, i);
    }
}
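For completeness, here is a sketch of the whole function with the fix applied, assuming the rest of the question's code stays as it was. The only other change is or_default() in place of or_insert_with(|| Vec::new()), which should behave the same here.

```rust
use std::collections::HashMap;

fn process(mut inputs: Vec<String>) -> Vec<String> {
    // Group mutable references to equal strings under an owned copy of the key.
    let mut duplicates: HashMap<String, Vec<&mut String>> = HashMap::new();
    for input in &mut inputs {
        duplicates.entry(input.clone()).or_default().push(input);
    }
    // Consuming the map yields owned Vec<&mut String> values; binding them as
    // `mut` allows iter_mut(), so `instance` is a `&mut &mut String`.
    for (_, mut instances) in duplicates {
        for (i, instance) in instances.iter_mut().enumerate() {
            // Two dereferences reach the String stored in `inputs`.
            **instance = format!("{}_{}", instance, i);
        }
    }
    inputs
}
```

The map's iteration order is unspecified, but each key's references are stored in input order, so the numbering within a group of duplicates is deterministic.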

How to make Rust temporary value live longer?

I'm still learning Rust and have the following code.
use std::collections::BTreeMap;
#[derive(Debug)]
struct MyStruct {
    a: String,
    b: String,
}
fn main() {
    let mut hash = BTreeMap::new();
    let data = vec![
        MyStruct {
            a: "entry1".to_string(),
            b: "entry1 body".to_string(),
        },
        MyStruct {
            a: "entry2".to_string(),
            b: "entry2 body".to_string(),
        }
    ];
    let re = regex::Regex::new(r#".(\d)"#).unwrap();
    for item in &data {
        for m in re.captures_iter(&item.b) {
            hash.insert(&m[1].parse::<i32>().unwrap(), &item.a);
        }
    }
    println!("{:#?}", hash);
}
It generates an error:
error[E0716]: temporary value dropped while borrowed
--> src\main.rs:26:26
|
26 | hash.insert(&m[1].parse::<i32>().unwrap(), &item.a);
| ---- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
| | |
| | creates a temporary which is freed while still in use
| borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
What's the correct way to fix it? I tried to put &m[1].parse::<i32>().unwrap() in a variable but to no avail.
A BTreeMap (like HashMap and other collections) must either own the keys and values inserted into it, or hold references to data that outlives the map. In this case the key is an i32, which implements the Copy trait, so simply removing the & passes the i32 value in as the key. The value &item.a borrows from data, which outlives the map, so it can stay a reference. If you want the map to own its values, you can either clone the string or rewrite the loop to consume the data vector and move item.a in without cloning.
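A minimal sketch of the key-by-value version follows. To keep it free of external crates, the regex capture is replaced by a hypothetical digits helper that yields every ASCII digit in the text. That is an assumption for illustration and not equivalent to the `.(\d)` pattern in general, though it finds the same digits in the sample data. index_by_digit is likewise a made-up name.

```rust
use std::collections::BTreeMap;

struct MyStruct {
    a: String,
    b: String,
}

// Hypothetical stand-in for the regex capture: every ASCII digit in `text`.
fn digits(text: &str) -> impl Iterator<Item = i32> + '_ {
    text.chars().filter_map(|c| c.to_digit(10)).map(|d| d as i32)
}

// The key is an owned i32 (Copy); the value borrows from `data`, which the
// caller keeps alive for as long as the map is used.
fn index_by_digit(data: &[MyStruct]) -> BTreeMap<i32, &str> {
    let mut map = BTreeMap::new();
    for item in data {
        for key in digits(&item.b) {
            map.insert(key, item.a.as_str());
        }
    }
    map
}
```

With the two entries from the question, the map should end up with keys 1 and 2 pointing at "entry1" and "entry2".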

How can I conditionally provide a default reference without performing unnecessary computation when it isn't used?

I have some variable passed into my function by reference. I don't need to mutate it or transfer ownership, I just look at its contents. If the contents are in some state, I want to replace the value with a default value.
For instance, my function accepts a &Vec<String> and if the Vec is empty, replace it with vec!["empty"]:
fn accept(mut vec: &Vec<String>) {
    if vec.len() == 0 {
        vec = &vec!["empty".to_string()];
    }
    // ... do something with `vec`, like looping over it
}
But this gives the error:
error[E0716]: temporary value dropped while borrowed
--> src/lib.rs:3:16
|
1 | fn accept(mut vec: &Vec<String>) {
| - let's call the lifetime of this reference `'1`
2 | if vec.len() == 0 {
3 | vec = &vec!["empty".to_string()];
| -------^^^^^^^^^^^^^^^^^^^^^^^^^- temporary value is freed at the end of this statement
| | |
| | creates a temporary which is freed while still in use
| assignment requires that borrow lasts for `'1`
Avoiding the mut results in the same error as the previous example:
fn accept(input: &Vec<String>) {
    let vec = if input.len() == 0 {
        &vec!["empty".to_string()]
    } else {
        input
    };
    // ... do something with `vec`, like looping over it
}
The only solution I've come up with is to extract the default value outside the if and reference the value:
fn accept(input: &Vec<String>) {
    let default = vec!["empty".to_string()];
    let vec = if input.len() == 0 {
        &default
    } else {
        input
    };
    // ... do something with `vec`
}
This results in less clean code and also performs that computation even when the default isn't needed.
I know and understand the error... you're borrowing the default value inside the body of the if, but that value you're borrowing from doesn't exist outside the if. That's not my question.
Is there any cleaner way to write out this pattern?
I don't believe this is a duplicate of "Is there any way to return a reference to a variable created in a function?" because I have a reference I'd like to use first if possible. I don't want to dereference the reference or clone() it because that would perform unnecessary computation.
Can I store either a value or a reference in a variable at the same time?
You don't have to create the default vector if you don't use it. You just have to ensure the declaration is done outside the if block.
fn accept(input: &Vec<String>) {
    let def;
    let vec = if input.is_empty() {
        def = vec!["empty".to_string()];
        &def
    } else {
        input
    };
    // ... do something with `vec`
}
Note that you don't have to build a new default vector every time you receive an empty one. You can create it the first time this happens using lazy_static or once_cell:
#[macro_use]
extern crate lazy_static;
fn accept(input: &[String]) {
    let vec = if input.is_empty() {
        lazy_static! {
            static ref DEFAULT: Vec<String> = vec!["empty".to_string()];
        }
        &DEFAULT
    } else {
        input
    };
    // use vec
}
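If you'd rather not pull in a crate, the same idea can be sketched with std::sync::OnceLock from the standard library (stable since Rust 1.70, so this assumes a reasonably recent toolchain). The function names default_names and effective_names are made up for this example.

```rust
use std::sync::OnceLock;

// Builds the default vector the first time it is requested and reuses it
// on every later call.
fn default_names() -> &'static [String] {
    static DEFAULT: OnceLock<Vec<String>> = OnceLock::new();
    DEFAULT.get_or_init(|| vec!["empty".to_string()])
}

// Returns the input unchanged unless it is empty, in which case the shared
// default is returned instead. Nothing is cloned in either branch.
fn effective_names(input: &[String]) -> &[String] {
    if input.is_empty() {
        default_names()
    } else {
        input
    }
}
```

Inside accept you would then write something like let vec = effective_names(input); and carry on as before.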
It sounds like you may be looking for std::borrow::Cow, depending on how you're going to use it.
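A rough sketch of what that could look like, assuming a hypothetical helper named with_default: Cow holds either a borrow of the input or an owned default, so the default vector is only built when the input is empty.

```rust
use std::borrow::Cow;

// Borrows the input when it has elements; otherwise builds an owned default.
fn with_default(input: &[String]) -> Cow<'_, [String]> {
    if input.is_empty() {
        Cow::Owned(vec!["empty".to_string()])
    } else {
        Cow::Borrowed(input)
    }
}
```

A Cow<[String]> dereferences to a slice, so code that only reads from it (iterating, len(), indexing) works the same way for both variants.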

Why is a reference variable accessed via auto-deref moved?

I thought I got the idea of move semantics until this code.
fn main() {
    let v = Data {
        body: vec![10, 40, 30],
    };
    p(&v);
}
fn p(d: &Data) {
    for i in d.body {
        // &d.body, Why d.body move?
        println!("{}", i);
    }
}
struct Data {
    body: Vec<i32>,
}
error[E0507]: cannot move out of borrowed content
--> src/main.rs:9:14
|
9 | for i in d.body {
| ^^^^^^ cannot move out of borrowed content
error[E0507]: cannot move out of `d.body` which is behind a `&` reference
--> src/main.rs:9:14
|
8 | fn p(d: &Data) {
| ----- help: consider changing this to be a mutable reference: `&mut Data`
9 | for i in d.body {
| ^^^^^^
| |
| cannot move out of `d.body` which is behind a `&` reference
| `d` is a `&` reference, so the data it refers to cannot be moved
I passed a reference, and I accessed a field via auto-deref feature, so why is it a move?
What you are doing is a field access on a pointer.
See the reference on field access expressions:
"if the type of the expression to the left of the dot is a pointer, it is automatically dereferenced as many times as necessary to make the field access possible"
Here is how Rust evaluates a field access expression on borrowed content:
let d = Data { /* input */ };
let body = (&d).body; // -> (*&d).body -> d.body
let ref_body = &(&d).body; // -> &(*&d).body -> &d.body -> &(d.body)
Note: d is still borrowed content; auto-deref is only needed to access the fields.
Why move?
Consider this code:
struct Data {
    body: Vec<i32>,
    id: i32,
}
fn p(mut d: &Data) {
    let id = d.id;
}
This code will work as expected and there will be no moves here, so you will be able to reuse d.id. In this situation:
Rust will try to copy the value of d.id. Since d.id is i32 and implements the Copy trait, it will copy the value to id.
Consider this code:
fn p(mut d: &Data) {
    let id = d.id; // works
    let body = d.body; // fails
}
This code will not work because:
Rust will try to copy d.body but Vec<i32> has no implementation of the Copy trait.
Rust will try to move body from d, and you will get the "cannot move out of borrowed content" error.
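To make the options concrete, here is a small sketch (the function name summarize is made up) showing what is allowed through a shared reference: copying a Copy field, borrowing a non-Copy field, or cloning it explicitly when an owned value is really needed.

```rust
struct Data {
    body: Vec<i32>,
    id: i32,
}

fn summarize(d: &Data) -> (i32, usize, Vec<i32>) {
    let id = d.id; // i32 is Copy, so the value is copied out
    let body = &d.body; // Vec<i32> is not Copy, so borrow it instead of moving
    let owned = d.body.clone(); // or pay for an explicit deep copy
    (id, body.len(), owned)
}
```

None of these lines take anything away from d, so the caller can keep using the original Data afterwards.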
How does this affect the loop?
From the reference:
A for expression is a syntactic construct for looping over elements provided by an implementation of std::iter::IntoIterator
A for loop is equivalent to the following block expression.
'label: for PATTERN in iter_expr {
    /* loop body */
}
is equivalent to
{
    let result = match IntoIterator::into_iter(iter_expr) {
        mut iter => 'label: loop {
            let mut next;
            match Iterator::next(&mut iter) {
                Option::Some(val) => next = val,
                Option::None => break,
            };
            let PAT = next;
            let () = { /* loop body */ };
        },
    };
    result
}
This means the expression you loop over must implement IntoIterator, because IntoIterator::into_iter(self) takes self by value. Luckily, both impl<T> IntoIterator for Vec<T> and impl<'a, T> IntoIterator for &'a Vec<T> exist.
Why does this happen?
Simply:
When you use &d.body, your loop uses the &Vec implementation of IntoIterator.
This implementation returns an iterator over your vector's slice. This means you will get references to the elements of your vector.
When you use d.body, your loop uses the Vec implementation of IntoIterator.
This implementation returns an iterator which is a consuming iterator. This means your loop will have the ownership of actual elements, not their references. For the consuming part this implementation needs the actual vector not the reference, so the move occurs.
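The two cases can be put side by side in a short sketch (both function names are made up): the first only needs a &Data and iterates by reference, while the second has to take Data by value because the loop consumes the vector.

```rust
struct Data {
    body: Vec<i32>,
}

// `&d.body` selects the `&Vec<i32>` implementation: `i` is a `&i32`,
// and nothing is moved out of `d`.
fn sum_borrowed(d: &Data) -> i32 {
    let mut total = 0;
    for i in &d.body {
        total += *i;
    }
    total
}

// `d.body` selects the `Vec<i32>` implementation: `i` is an `i32`, and the
// vector is consumed, which is only allowed because `d` is owned here.
fn sum_owned(d: Data) -> i32 {
    let mut total = 0;
    for i in d.body {
        total += i;
    }
    total
}
```

In the question's p function only the first form compiles, because p receives a &Data and has nothing it is allowed to move out of.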
You are accessing the field body in d. body itself is a Vec<i32> which is not a reference. If you would use d directly, no & would be necessary, but since you are accessing a field in d, you must specify that you want to have the reference to the field.
Basically d owns body. If you borrow d you cannot steal body, it belongs to d but you can borrow it.
This loop will be desugared into something similar to the following:
let mut iter = IntoIterator::into_iter(d.body);
loop {
    match iter.next() {
        Some(x) => {
            // loop body
        },
        None => break,
    }
}
As you can see, it's using into_iter, which moves the vector d.body.
