How to fold using a HashMap as an accumulator?


This code works:
let stdin = std::io::stdin();
let mut rdr = csv::Reader::from_reader(stdin);
let mut hmap = HashMap::<String, u64>::new();
rdr.records()
    .map(|r| r.unwrap())
    .fold((), |_, item| {
        // TODO: Is there a way not to have to copy item[col] every time?
        let counter = hmap.entry(item[col].to_string()).or_insert(0);
        *counter += 1;
    });
This code fails with the message: "cannot move out of acc because it is borrowed"
let stdin = std::io::stdin();
let mut rdr = csv::Reader::from_reader(stdin);
let hmap = rdr.records()
    .map(|r| r.unwrap())
    .fold(HashMap::<String, u64>::new(), |mut acc, item| {
        // TODO: Is there a way not to have to copy item[col] every time?
        let counter = acc.entry(item[col].to_string()).or_insert(0);
        *counter += 1;
        acc
    });

You cannot return acc from the closure because you have a mutable borrow to it that still exists (counter).
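This is a limitation of the Rust compiler's borrow checker as it existed before non-lexical lifetimes (NLL). With NLL enabled, your original code works. NLL has since become the default (Rust 1.31 for the 2018 edition, 1.36 for the 2015 edition), so on a current compiler the feature attribute shown here is no longer needed: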
#![feature(nll)]
use std::collections::HashMap;
fn main() {
    let hmap = vec![1, 2, 3].iter().fold(HashMap::new(), |mut acc, _| {
        let counter = acc.entry("foo".to_string()).or_insert(0);
        *counter += 1;
        acc
    });
    println!("{:?}", hmap);
}
Before NLL, the compiler is overly conservative about how long a borrow will last. To work around this, you can introduce a new scope to constrain the mutable borrow:
use std::collections::HashMap;
fn main() {
    let hmap = vec![1, 2, 3].iter().fold(HashMap::new(), |mut acc, _| {
        {
            let counter = acc.entry("foo".to_string()).or_insert(0);
            *counter += 1;
        }
        acc
    });
    println!("{:?}", hmap);
}
You can also prevent the borrow from lasting beyond the line it's needed in:
use std::collections::HashMap;
fn main() {
    let hmap = vec![1, 2, 3].iter().fold(HashMap::new(), |mut acc, _| {
        *acc.entry("foo".to_string()).or_insert(0) += 1;
        acc
    });
    println!("{:?}", hmap);
}
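On the TODO in the question (avoiding a copy of the key on every record): one possible approach, sketched here under the assumption that the keys are available as `&str`, is to look the key up first and allocate a `String` only when it is missing. The function name `count_words` and the sample input are made up for illustration and are not part of the original code.

```rust
use std::collections::HashMap;

/// Counts occurrences of each word, allocating a `String` key only the
/// first time a word is seen. (Hypothetical helper, for illustration.)
fn count_words<'a, I>(words: I) -> HashMap<String, u64>
where
    I: IntoIterator<Item = &'a str>,
{
    words.into_iter().fold(HashMap::new(), |mut acc, word| {
        match acc.get_mut(word) {
            // Key already present: bump the counter, no allocation.
            Some(counter) => *counter += 1,
            // First sighting: this is the only place a `String` is created.
            None => {
                acc.insert(word.to_string(), 1);
            }
        }
        acc
    })
}

fn main() {
    let counts = count_words(vec!["a", "b", "a"]);
    println!("{:?}", counts);
}
```

The trade-off is a second hash lookup for keys that are new, in exchange for skipping the allocation for keys that repeat.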
"I assumed Rust would know that counter would go out of scope once acc was returned"
This is understandable and relates to the non-lexical lifetimes discussion. The "good" news is that Rust is being consistent about how references work when the thing being referenced moves. In this case, you are moving the accumulator into an "output slot". You can see this with plain functions as well:
fn foo(mut s: Vec<u8>) -> Vec<u8> {
    let borrow = &mut s[0];
    s
}
fn main() {}
But really, it's the same as moving a referred-to variable at all:
fn main() {
    let mut s = Vec::<u8>::new();
    let borrow = &mut s[0];
    let s2 = s;
}
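Both of these fail to compile before NLL and compile afterwards. (The second would panic at runtime, because it indexes into an empty vector, but the point here is only the borrow check.)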

Related

Mutable borrow inside loop

I have a vector of tuples, each containing two strings. I want to transfer (one of) the two strings as a mutable reference into a hashmap. The other string is also transferred, but does not have to be mutable. The background is that I want to overwrite one string with the value of the other one later.
Given the following code:
use std::collections::HashMap;
fn main() {
let mut foo = String::from("foo");
let mut bar = String::from("bar");
let mut v = vec![(foo, &mut bar)];
let mut counter: HashMap<&str, (&str, &mut String, u8)> = HashMap::new();
create_counter(&mut v, &mut counter);
}
fn create_counter<'a>(
rows: &'a mut Vec<(String, &'a mut String)>,
counter: &mut HashMap<&'a str, (&'a str, &'a mut String, u8)>,
) {
let mut skip_count = 0;
let len = rows.len();
for i in 0..len {
if i == len - 1 {
break;
}
if skip_count > 0 {
skip_count -= 1;
continue;
}
let r = rows[i..i + 3].as_mut();
if r[0].0 == r[1].0 && r[0].1 != r[1].1 {
if r.len() == 2 || r[0].0 != r[2].0 {
counter.entry(&r[0].0).or_insert((r[1].1, &mut r[0].1, 0)).2 += 1;
skip_count = 1;
} else {
skip_count = 2;
}
}
}
}
Unfortunately the borrow checker does not allow this and gives me two error messages:
cannot borrow `*rows` as mutable more than once at a time
cannot borrow `r[_].1` as mutable because it is also borrowed as immutable
I understand the problem, but unfortunately I have no idea how best to solve it.
Can someone please help me to solve these two problems?

What is a reborrow and how does it influence the code the compiler generates?

Part of the assert_eq macro code is:
($left:expr, $right:expr, $($arg:tt)+) => ({
match (&($left), &($right)) {
(left_val, right_val) => {
if !(*left_val == *right_val) {
// The reborrows below are intentional. Without them, the stack slot for the
// borrow is initialized even before the values are compared, leading to a
// noticeable slow down.
$crate::panic!(r#"assertion failed: `(left == right)`
left: `{:?}`,
right: `{:?}`: {}"#, &*left_val, &*right_val,
$crate::format_args!($($arg)+))
}
}
}
});
The comment says that the reborrow is intentional and that it would somehow influence how stack is used.
What is a reborrow and how does it influence the code the compiler generates?

I am not sure how it affects compilation but a reborrow is when you dereference something and then borrow it straight after.
For example:
fn main() {
let mut vec: Vec<u8> = vec![1, 2, 3];
let mut_ref: &mut Vec<u8> = &mut vec;
let reborrow: &Vec<u8> = &*mut_ref;
}
It can be used to:
get an immutable reference from a mutable reference (code above)
get a reference with a shorter lifetime
get a reference to something in a smart pointer
get a reference from a pointer
and probably other things too.. (I am fairly new to Rust 😃)
Example use for shortening lifetime (very contrived but this is oversimplified):
fn main() {
let mut outer_vec = vec![1, 2, 3];
let outer_ref = &mut outer_vec;
{
// shorten lifetime of outer_ref
// not doing this causes an error
let mut outer_ref = &mut *outer_ref; // mutable reborrow
let mut inner_vec = vec![1, 2, 3];
let inner_ref = &mut inner_vec;
// imagine a real condition
if true {
outer_ref = inner_ref;
}
// use outer_ref which could point to the outer_vec or inner_vec
println!("{:?}", outer_ref);
}
println!("{:?}", outer_ref);
}
Getting a reference to something behind a smart pointer (Box in this case):
fn main() {
let vec = vec![1, 2, 3];
let boxed = Box::new(vec);
let reborrow: &Vec<u8> = &*boxed;
println!("{:?}", reborrow);
}
Reference from pointer:
fn main() {
let num: u8 = 10;
let pointer: *const u8 = &num as *const u8;
unsafe {
let ref_from_ptr: &u8 = &*pointer;
}
}
Not sure if all of these are strictly considered a "reborrow" by the way.
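One more case that is commonly described as a reborrow happens implicitly: when a `&mut T` is passed to a function that expects `&mut T`, the compiler passes `&mut *r` instead of moving `r`, which is why the reference can be used again afterwards. The following is a minimal sketch; the function names are invented for illustration.

```rust
fn push_one(v: &mut Vec<u8>) {
    v.push(1);
}

fn sum_after_two_pushes(v: &mut Vec<u8>) -> u32 {
    // `v` is a `&mut Vec<u8>`, which is not `Copy`. The first call still
    // leaves `v` usable because the compiler passes a reborrow of it.
    push_one(v);
    // The same reborrow, written out explicitly.
    push_one(&mut *v);
    v.iter().map(|&x| u32::from(x)).sum()
}

fn main() {
    let mut data = vec![10u8];
    println!("{}", sum_after_two_pushes(&mut data));
}
```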

Mutable vs. immutable borrows in closure?

I can't figure out how to get the following to work. I think I need the closure to borrow by &mut Vec, but I don't know how to express that. This is distilled from a larger function, but shows the same error.
fn main() {
let mut v = vec![0; 10];
let next = |i| (i + 1) % v.len();
v[next(1usize)] = 1;
v.push(13);
v[next(2usize)] = 1;
}
Error:
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
 --> a.rs:9:5
  |
5 |     let next = |i| {
  |                --- immutable borrow occurs here
6 |         (i + 1) % v.len()
  |                   - first borrow occurs due to use of `v` in closure
...
9 |     v[next(1usize)] = 1;
  |     ^ ---- immutable borrow later used here
  |     |
  |     mutable borrow occurs here
error: aborting due to previous error

If you really want to do it with a closure, you will have to pass the vector by parameter:
let next = |v: &Vec<_>, i| (i + 1) % v.len();
This makes the closure borrow per-call, rather than capture for the scope. You still need to separate the borrows, though:
let j = next(&v, 1usize);
v[j] = 1;
To make your life easier, you can put everything inside the closure instead:
let next = |v: &mut Vec<_>, i, x| {
let j = (i + 1) % v.len();
v[j] = x;
};
Which allows you to simply do:
next(&mut v, 1usize, 1);
next(&mut v, 2usize, 2);
// etc.
This pattern is useful for cases where you are writing a closure just for avoiding local code repetition (which I suspect is why you are asking given the comments).
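Putting the per-call borrow together with the statements from the question gives roughly the following. This is a sketch: the wrapper function `run_with_per_call_borrow` and the concrete `i32` element type are assumptions added so the result can be inspected.

```rust
/// Runs the question's sequence of operations with a closure that takes the
/// vector as a parameter, then returns the resulting vector.
fn run_with_per_call_borrow() -> Vec<i32> {
    let mut v = vec![0; 10];
    // The parameter `v` shadows the outer `v`, so nothing is captured.
    let next = |v: &Vec<i32>, i: usize| (i + 1) % v.len();

    let j = next(&v, 1); // the shared borrow ends when the call returns
    v[j] = 1; // so this mutable use is accepted
    v.push(13);
    let k = next(&v, 2); // computed against the new length
    v[k] = 1;
    v
}

fn main() {
    println!("{:?}", run_with_per_call_borrow());
}
```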

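Since the closure only needs the length of the Vec, you can read the length before creating the closure. That avoids the borrowing issue entirely, because the closure no longer needs to borrow v. Note that the stored length goes stale if the vector later grows (for example after v.push(13)), so it has to be read again after such a change.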
fn main() {
let mut v = vec![0; 10];
let len = v.len();
let next = |i| (i + 1) % len;
v[next(1usize)] = 1;
}
Assuming your closure is not dependent on other things, then instead of a closure, you could define a trait with a method that does that.
For simplicity let's call the trait VecExt and the method set.
trait VecExt<T> {
fn set(&mut self, index: usize, value: T);
}
impl<T> VecExt<T> for Vec<T> {
fn set(&mut self, index: usize, value: T) {
let len = self.len();
self[(index + 1) % len] = value;
}
}
fn main() {
let mut v = vec![0; 10];
v.set(1, 1);
v.push(13);
v.set(2, 1);
}

A closure probably isn't the right tool for the job. The compiler is unhappy because your closure has taken a reference against your Vec, but then while that closure reference is still outstanding you're trying to mutate the Vec. Under Rust's borrow rules, that's not allowed.
The most straightforward approach would be storing your data inside a struct, and making next a member function. That way, there's no closure taking references; it can just check the length only when needed.
struct WrapVec<T>(Vec<T>);
impl<T> WrapVec<T> {
fn wrap_next(&mut self, index: usize) -> &mut T {
let index = (index + 1) % self.0.len();
&mut self.0[index]
}
}
fn main() {
let mut v = WrapVec(vec![0; 10]);
*v.wrap_next(1) = 1;
v.0.push(13);
*v.wrap_next(2) = 1;
}
If you want to be able to apply this function to any Vec, then you may find it useful to define a new trait. For example:
trait WrapNext<T> {
fn wrap_next(&mut self, index: usize) -> &mut T;
}
impl<T> WrapNext<T> for Vec<T> {
fn wrap_next(&mut self, index: usize) -> &mut T {
let index = (index + 1) % self.len();
&mut self[index]
}
}
fn main() {
let mut v = vec![0; 10];
*v.wrap_next(1) = 1;
v.push(13);
*v.wrap_next(2) = 1;
}

In addition to the other answers here, you can use a macro to scope v so you don't have to pass it in every call:
fn main() {
let mut v = vec![0; 10];
macro_rules! next {
// rule to get an index
($i: expr) => {
($i + 1) % v.len()
};
// rule to mutate the vector
($i: expr => $v: expr) => {{
let ind = next!($i);
v[ind] = $v;
}};
};
// get an index
let ind = next!(1usize);
// mutate the vector
v[ind] = 1;
v.push(13);
// or, with the mutation syntax
next!(2usize => 3);
println!("{:?}", v);
}

Why does multithreaded writing to a Vec<HashSet<&str>> using raw pointers segfault?

I want to modify a big vector from multiple threads in parallel.
Works fine: u32
use std::thread;
use std::sync::Arc;
fn main() {
let input = Arc::new([1u32, 2, 3, 4]);
let mut handles = Vec::new();
for t in 0..4 {
let inp = input.clone();
let handle = thread::spawn(move || unsafe {
let p = (inp.as_ptr() as *mut u32).offset(t as isize);
*p = inp[t] + t as u32 ;
});
handles.push(handle);
}
for h in handles {
h.join().unwrap();
}
println!("{:?}", input);
}
Segfaults: Vec<HashSet<&str>>
When I change the u32 to Vec<HashSet<&str>>, the pointer does not seem to work.
use std::thread;
use std::sync::Arc;
use std::collections::HashSet;
fn main() {
let mut a = HashSet::new();
a.insert("aaa");
let input = Arc::new(vec![a.clone(), a.clone(), a.clone(), a.clone()]);
let mut handles = Vec::new();
for _t in 0..4 {
let inp = input.clone();
let handle = thread::spawn(move || unsafe {
let p = (inp.as_ptr() as *mut Vec<HashSet<&str>>).offset(0);
(*p)[0].insert("bbb");
});
handles.push(handle);
}
for h in handles {
h.join().unwrap();
}
println!("{:?}", input);
}
What is the difference?

It is hard to say what is wrong with your initial code as it segfaults in the playground. You are likely invoking undefined behavior by taking a reference to immutable (!) vec and trying to mutate its elements by casting &Vec -> *mut Vec -> &mut Vec (on a method call). Multiple mutable references to the same thing are a big no-no. Besides, your code even uses the same HashSet ((*p)[0]) mutably in parallel, which, again, is undefined behavior.
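One concrete difference between the two programs, as far as can be told from the code shown: on an array of u32, `as_ptr()` returns a `*const u32`, so the cast to `*mut u32` at least keeps the pointee type. On a `Vec<HashSet<&str>>`, `as_ptr()` returns a pointer to the first `HashSet` in the buffer, not to the `Vec`, so casting it to `*mut Vec<HashSet<&str>>` reinterprets a `HashSet` as a `Vec`. The sketch below only checks where the pointer points; the helper name is made up.

```rust
use std::collections::HashSet;

/// Returns what `Vec::as_ptr` yields: a pointer to the first element.
fn first_element_ptr<'a>(v: &Vec<HashSet<&'a str>>) -> *const HashSet<&'a str> {
    v.as_ptr()
}

fn main() {
    let mut a = HashSet::new();
    a.insert("aaa");
    let input = vec![a.clone(), a];
    // The pointer refers to element 0 of the buffer, not to the `Vec` itself.
    assert!(std::ptr::eq(first_element_ptr(&input), &input[0]));
    println!("as_ptr() points at input[0]");
}
```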
The easiest way here would be to use crossbeam's scoped threads. They allow referencing stack variables, like your input. Vec can also give out distinct mutable references to its elements without using unsafe. Using this, your code seems to do the expected thing.
use crossbeam::thread;
use std::collections::HashSet;
fn main() {
let mut a = HashSet::new();
a.insert("aaa");
let mut input = vec![a.clone(), a.clone(), a.clone(), a.clone()];
thread::scope(|s| {
for set in &mut input {
s.spawn(move |_| {
set.insert("bbb");
});
}
}).unwrap();
println!("{:?}", input);
}
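Since Rust 1.63 the standard library also provides scoped threads, so the same idea can be written without an extra crate. The following is a sketch along the lines of the crossbeam version above; the function `insert_in_parallel` is a hypothetical wrapper introduced only to make the result easy to check.

```rust
use std::collections::HashSet;
use std::thread;

/// Inserts "bbb" into every set, one scoped thread per element.
fn insert_in_parallel(mut input: Vec<HashSet<&'static str>>) -> Vec<HashSet<&'static str>> {
    thread::scope(|s| {
        // `&mut input` hands out one distinct `&mut HashSet` per element,
        // so no two threads ever touch the same set.
        for set in &mut input {
            s.spawn(move || {
                set.insert("bbb");
            });
        }
        // All spawned threads are joined before `scope` returns.
    });
    input
}

fn main() {
    let mut a = HashSet::new();
    a.insert("aaa");
    let input = vec![a.clone(), a.clone(), a.clone(), a];
    println!("{:?}", insert_in_parallel(input));
}
```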

I have found the way:
use std::thread;
use std::sync::Arc;
use std::collections::HashSet;
fn main() {
let mut a = HashSet::new();
a.insert("aaa");
let input = Arc::new(vec![a.clone(), a.clone(), a.clone(), a.clone()]);
let mut handles = Vec::new();
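for t in 0..4 {
let inp = input.clone();
let handle = thread::spawn(move || unsafe {
// `as_ptr` yields a pointer to the first HashSet, so cast to
// `*mut HashSet<&str>` (not `*mut Vec<...>`) and give each thread its own element.
// Note: this still mutates data behind a shared reference, which Rust does not sanction.
let p = (inp.as_ptr() as *mut HashSet<&str>).offset(t as isize);
(*p).insert("bbb");
});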
handles.push(handle);
}
for h in handles {
h.join().unwrap();
}
println!("{:?}", input);
}
Thanks, everyone!

Read reference from Option<&mut T> multiple times

I have an Option<&mut T> and want to access the contained reference multiple times, like so:
fn f(a: Option<&mut i32>) {
if let Some(x) = a {
*x = 6;
}
// ...
if let Some(x) = a {
*x = 7;
}
}
fn main() {
let mut x = 5;
f(Some(&mut x));
}
That doesn't work, because if let Some(x) = a moves the reference value out of the Option, and the second if let Some(x) = a will result in a compiler error. Without the second if let ..., this works flawlessly, so a doesn't have to be mutable.
The following:
if let Some(ref x) = a {
**x = 6;
}
gives an error: "assignment into an immutable reference".
This would work:
fn f(mut a: Option<&mut i32>) {
if let Some(ref mut x) = a {
**x = 6;
}
if let Some(ref mut x) = a {
**x = 7;
}
}
The mut a is necessary, otherwise I get an error "cannot borrow immutable anonymous field (a:std::prelude::v1::Some).0 as mutable". But this feels wrong: a shouldn't have to be mutable, because I'm not modifying it (see above).
What's the correct solution?
Edit 1
My problem is different from the one in How to pass `Option<&mut ...>` to multiple function calls without causing move errors?. I want to mutably dereference the reference in an Option<&mut T> multiple times, while the other one wants to pass an Option to multiple function invocations. The solutions to the other question are not applicable to my situation.

What about this?
fn f(a: Option<&mut i32>) {
if let Some(&mut ref mut x) = a {
*x = 6;
}
// ...
if let Some(&mut ref mut x) = a {
*x = 7;
}
}
In this case, a doesn't need to be mutable.
The &mut ref mut feels a bit awkward, but it makes sense: first we remove a &mut by destructuring and then take a mutable reference to the value again. It's more obvious when we don't use the Option:
let mr: &mut Vec<u32> = &mut vec![];
{
let &mut ref mut a = mr;
a.push(3);
}
mr.push(4);
This also works. The third (special) line is equivalent to:
let a = &mut *mr;
//           ^^^----- this is an lvalue of type `Vec<u32>`
//      ^^^^^^^^----- together it's of type `&mut Vec<u32>` again
In the Option case, we can't use the &mut *X version, but need to do all of it inside of the pattern. Thus the &mut ref mut x.
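If declaring the parameter as `mut` is acceptable after all, `Option::as_deref_mut` (stable since Rust 1.40) gives a fresh `Option<&mut i32>` on each use and may read more plainly than the pattern. This is an alternative sketch, not the approach described above, and it does not meet the question's wish to keep `a` immutable; the function name `set_twice` is invented.

```rust
/// Writes through the optional reference twice without moving it out.
fn set_twice(mut a: Option<&mut i32>) {
    // `as_deref_mut` reborrows the inner reference, so `a` stays usable.
    if let Some(x) = a.as_deref_mut() {
        *x = 6;
    }
    if let Some(x) = a.as_deref_mut() {
        *x = 7;
    }
}

fn main() {
    let mut x = 5;
    set_twice(Some(&mut x));
    println!("{}", x);
    set_twice(None);
}
```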
