Consume Vector inside closure without cloning - rust

I have this data structure.
let bucket = HashMap<&str, Vec<&str>>
Given
let cluster = Vec<&str>
I want to expand it from the Vecs on Bucket and I can guarantee that I will just access each key value pair once and the &str in cluster are always a key in bucket.
use std::collections::HashMap;
fn main() {
let mut bucket: HashMap<&str, Vec<&str>> = HashMap::new();
bucket.insert("a", vec!["hello", "good morning"]);
bucket.insert("b", vec!["bye", "ciao"]);
bucket.insert("c", vec!["good"]);
let cluster = vec!["a", "b"];
let cluster2 = vec!["c"];
let mut clusters = [cluster, cluster2];
clusters.iter_mut().for_each(|cluster| {
// I don't like this clone
let tmp = cluster.clone();
let tmp = tmp.iter().flat_map(|seq| bucket[seq].
clone() // I really don't like this other clone
);
cluster.extend(tmp);
});
println!("{:?}", clusters);
}
This compiles but what I really want to do is drain the vector on bucket since I know I won't access it again.
let tmp = tmp.iter().flat_map(|seq| bucket.get_mut(seq).
unwrap().drain(..)
);
That gives me a compiler error:
error: captured variable cannot escape `FnMut` closure body
--> src/main.rs:13:45
|
4 | let mut bucket: HashMap<&str, Vec<&str>> = HashMap::new();
| ---------- variable defined here
...
13 | let tmp = tmp.iter().flat_map(|seq| bucket.get_mut(seq).
| - ^-----
| | |
| ___________________________________________|_variable captured here
| | |
| | inferred to be a `FnMut` closure
14 | | unwrap().drain(..)
| |______________________________^ returns a reference to a captured variable which escapes the closure body
|
= note: `FnMut` closures only have access to their captured variables while they are executing...
= note: ...therefore, they cannot allow references to captured variables to escape
Do I need to go unsafe? How? And more importantly, is it reasonable to want to remove that clone?

You can eliminate bucket[seq].clone() using std::mem::take():
let tmp = tmp.iter().flat_map(
|seq| std::mem::take(bucket.get_mut(seq).unwrap()),
);
That will transfer ownership of the existing Vec and leave an empty one in the hash map. Since the map remains in a well-defined state, this is 100% safe. Since the empty vector doesn't allocate, it is also efficient. And finally, since you can guarantee that you no longer access that key, it is correct. (Playground.)
As pointed out in the comments, an alternative is to remove the vector from the hash map, which also transfer the ownership of the vector:
let tmp = tmp.iter().flat_map(|seq| bucket.remove(seq).unwrap());
The outer cluster.clone() cannot be replaced with take() because you need the old contents. The issue here is that you cannot extend the vector you are iterating over, which Rust doesn't allow in order to implement efficient pointer-based iteration. A simple and effective solution here would be to use indices instead of iteration (playground):
clusters.iter_mut().for_each(|cluster| {
let initial_len = cluster.len();
for ind in 0..initial_len {
let seq = cluster[ind];
cluster.extend(std::mem::take(bucket.get_mut(seq).unwrap()));
}
});
Of course, with indexing you pay the price of indirection and bound checks, but rustc/llvm is pretty good at removing both when it is safe to do so, and even if it doesn't, indexed access might still be more efficient than cloning. The only way to be sure whether this improves on your original code is to benchmark both versions on production data.

Related

About the change of the ownership of the array and vec

This code causes a error. It seems reasonable, because the ownership has moved:
fn main() {
let mut arr = vec![1, 2];
let mut arr2 = vec![2, 6];
arr = arr2;
arr2[1] = 2;
}
error[E0382]: borrow of moved value: `arr2`
--> src/main.rs:5:5
|
3 | let mut arr2 = vec![2, 6];
| -------- move occurs because `arr2` has type `Vec<i32>`, which does not implement the `Copy` trait
4 | arr = arr2;
| ---- value moved here
5 | arr2[1] = 2;
| ^^^^ value borrowed here after move
This code won't cause an error:
fn main() {
let mut arr = [1, 2];
let mut arr2 = [2, 4];
arr = arr2;
arr2[1] = 2;
}
This will cause an error:
fn main() {
let mut arr = ["1".to_string(), "2".to_string()];
let mut arr2 = ["2".to_string(), "4".to_string()];
arr = arr2;
arr2[1] = "2".to_string();
}
error[E0382]: use of moved value: `arr2`
--> src/main.rs:5:5
|
3 | let mut arr2 = ["2".to_string(), "4".to_string()];
| -------- move occurs because `arr2` has type `[String; 2]`, which does not implement the `Copy` trait
4 | arr = arr2;
| ---- value moved here
5 | arr2[1] = "2".to_string();
| ^^^^^^^ value used here after move
This works fine...
fn main() {
let mut arr = ["1", "2"];
let mut arr2 = ["2", "4"];
arr = arr2;
arr2[1] = "2";
}
I am completely confused.
In Rust all types fall into one of two categories:
copy-semantic (if it implements the Copy-trait)
move-semantic (if it does not implement the Copy-trait)
These semantics are implicitly employed by Rust whenever you e.g. assign a value to a variable. You can't choose it at the assignment. Instead, it depends only on the type in question. Therefore, whether a in the example below is moved or copied, depends entirely on what T is:
let a: T = /* some value */;
let b = a; // move or copy?
Also notice, that generic types are rather a set of similar types, than a single type by themselves (i.e. it is not a concrete type). For instance, you can't tell whether [T;2] is Copy or not, since it is not a concrete type (would need to know T first). Thus, [&str;2] and [String;2] are two totally different concrete types, and consequently one can be Copy and the other not.
To further elaborate, a concrete type can only be Copy if all constituting types are Copy. For instance, arrays [T;2] might be Copy (if T is too), Vec may never be Copy (independent of T).
So in your examples, when it does not compile due to move-semantics, you either have a Vec that is not Copy or String that is not, and any combinations with them can not be Copy either. Only if you combine Copy-types (arrays, ints, &str) you get copy-semantics, and your examples compile.
Also, this not an issue about ownership, because if you have copy-semantics you can just generate new values (by coping them) wherever you need, which gives you (fresh) ownership of these new values.
Couple of things here. First of all you are not changing the ownership in the working examples. You are merely borrowing their values. The difference in your not working examples is that in them you are actually changing the ownership.
As #eggyal correctly pointed out String and Vec don't implement Copy and that's important because Copy is used automatically when assigning another variable to a new variable. and in your working examples you have i32 (default for numeric types) and &str(also known as a string slice) which both implement Copy.
Every string literal is initially a &str and not a String. The .to_string() method in your example converts a &str into a String. If you want more in-depth information about the differences between &str and String I suggest you either check this answer or read an article about it but what's most important in your case is that string slices point to a read-only portion of the memory making them effectively immutable and therefore safe for copying.
The compiler nowadays is so good that it tells you the entire story. I compiled your code and got this wonderful error message:
|
18 | let mut a = vec![2, 4];
| ----- move occurs because `a` has type `Vec<i32>`, which does not implement the `Copy` trait
19 | let b = a;
| - value moved here
20 | println!("{:?}", a);
| ^ value borrowed here after move

Programming in Rust , how to fix error[E0515] "cannot return value referencing local variable"?

Please help me to compile my code attached bellow. The compiler says that following 2 patterns depending on which lines I comment out.
The program reads a &str which is a simple "svg path command" like code then parses it. The pasted code has been simplified for simplicity. It uses Regex to split the input string into lines then study each line in the main for loop. Each loop pushes the parse result onto a vector. Finally the function returns the vector.
Basically the compiler says returning the vector is not allowed because it refers local variable. Though I don't have any workaround.
error[E0597]: `cmd` does not live long enough
--> src/main.rs:24:25
|
24 | codeV = re.captures(cmd.as_str());
| ----- ^^^ borrowed value does not live long enough
| |
| borrow might be used here, when `codeV` is dropped and runs the destructor for type `Option<regex::Captures<'_>>`
...
30 | }
| - `cmd` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are defined
error[E0515]: cannot return value referencing local variable `cmd`
--> src/main.rs:31:1
|
24 | codeV = re.captures(cmd.as_str());
| --- `cmd` is borrowed here
...
31 | V //Error
| ^ returns a value referencing data owned by the current function
Playground
use regex::Regex;
pub fn parse(path:&str) {//->Vec<Option<regex::Captures<>>> //Error
let reg_n=Regex::new(r"\n").unwrap();
let path=reg_n.replace_all("\n"," ");
let reg_cmd=Regex::new(r"(?P<cmd>[mlhv])").unwrap();
let path=reg_cmd.replace_all(&path,"\n${cmd}");
let cmdV=reg_n.split(&path);
//let cmdV:Vec<&str> = reg.split(path).map(|x|x).collect();
let mut V:Vec<Option<regex::Captures<>>>=vec![];
let mut codeV:Option<regex::Captures<>>=None;
let mut count=0;
for cmd_f in cmdV{//This loop block has been simplified.
count+=1;
if count==1{continue;}
let mut cmd="".to_string();
cmd=cmd_f.to_string();
cmd=cmd.replace(" ","");
let re = Regex::new(r"\{(?P<code>[^\{^\}]{0,})\}").unwrap();
codeV = re.captures(cmd.as_str());
//cmd= re.replace_all(cmd.as_str(),"").to_string();
let cmd_0=cmd.chars().nth(0).unwrap();
//cmd.remove(0);
//V.push(codeV); //Compile error
V.push(None); //OK
}
//V
}
fn main() {
parse("m {abcd} l {efgh}");
}
Though I don't have any workaround.
regex's captures refer to the string they matched for efficiency. This means they can't outlive that string, as the match groups are essentially just offsets into that string.
Since the strings you match are created in the loop body, this means captures can't escape the loop body.
Aside from not creating strings in the loop body (or even the function), the solution / workaround is to convert your capture groups to owned data and store that: instead of trying to return a vector of captures, extract from the capture the data you actually want, convert it to an owned String (or tuple thereof, or whatever), and push that onto your vector.
e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0107333e30f831a418d75b280e9e2f31
you can use cmd.clone().as_str() if you not sure the value has borrowed or no

Rust check borrow with the whole HashMap, not check the key, is there any good way?

I want move the elements of HashMap<u64, Vec> key=1 to key=2
use std::collections::HashMap;
fn main() {
let mut arr: HashMap<u64, Vec<u64>> = HashMap::new();
arr.insert(1, vec![10, 11, 12]); // in fact, elments more than 1,000,000, so can't use clone()
arr.insert(2, vec![20, 21, 22]);
// in fact, following operator is in recusive closure, I simplify the expression:
let mut vec1 = arr.get_mut(&1).unwrap();
let mut vec2 = arr.get_mut(&2).unwrap();
// move the elements of key=1 to key=2
for v in vec1 {
vec2.push(vec1.pop().unwrap());
}
}
got error:
error[E0499]: cannot borrow `arr` as mutable more than once at a time
--> src/main.rs:10:20
|
9 | let mut vec1 = arr.get_mut(&1).unwrap();
| --- first mutable borrow occurs here
10 | let mut vec2 = arr.get_mut(&2).unwrap();
| ^^^ second mutable borrow occurs here
11 | for v in vec1 {
| ---- first borrow later used here
Rust check borrow with the whole HashMap, not check the key.
Is there any good way ?
It's not clear what the context / constraints are, so depending on those there are various possibilities of different impact and levels of complexity
if you don't care about keeping an empty version of the first entry, you can just use HashMap::remove as it returns the value for the removed key: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1734142acb598bad2ff460fdff028b6e
otherwise, you can use something like mem::swap to swap the vector held by key 1 with an empty vector, then you can update the vector held by key 2: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e05941cb4d7ddf8982baf7c9437a0446
because HashMap doesn't have splits, the final option would be to use a mutable iterator, iterators are inherently non-overlapping so they provide mutable references to individual values, meaning they would let you obtain mutable references to both values simultanously, though the code is a lot more complicated: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87d3c0a151382ce2f47dda59dc089d70
While the third option has to traverse the entire hashmap (which is less efficient than directly finding the correct entries by hashing), it has the possible advantage of not "losing" v1's allocation which is useful if v1 will be filled again in the future: in the first option v1 is completely dropped, and in the second option v1 becomes a vector of capacity 0 (unless you swap in a vector with a predefined capacity, but that's still an extra allocation)
You can put the Vec into a RefCell moving the borrow check to runtime:
use std::cell::RefCell;
use std::collections::HashMap;
fn main() {
let mut arr: HashMap<u64, RefCell<Vec<u64>>> = HashMap::new();
arr.insert(1, RefCell::new(vec![10, 11, 12])); // in fact, elments more than 1,000,000, so can't use clone()
arr.insert(2, RefCell::new(vec![20, 21, 22]));
// in fact, following operator is in recusive closure, I simplify the expression:
let mut vec1 = arr.get(&1).unwrap().borrow_mut();
let mut vec2 = arr.get(&2).unwrap().borrow_mut();
// move the elements of key=1 to key=2
vec2.append(&mut vec1);
}
Tip: Use Vec::append which moves the values from one vector to another.

Insert into a HashMap based on another value in the same Hashmap

I'm trying to insert a value into a HashMap based on another value in the same HashMap, like so:
use std::collections::HashMap;
fn main() {
let mut some_map = HashMap::new();
some_map.insert("a", 1);
let some_val = some_map.get("a").unwrap();
if *some_val != 2 {
some_map.insert("b", *some_val);
}
}
which gives this warning:
warning: cannot borrow `some_map` as mutable because it is also borrowed as immutable
--> src/main.rs:10:9
|
7 | let some_val = some_map.get("a").unwrap();
| -------- immutable borrow occurs here
...
10 | some_map.insert("b", *some_val);
| ^^^^^^^^ --------- immutable borrow later used here
| |
| mutable borrow occurs here
|
= note: `#[warn(mutable_borrow_reservation_conflict)]` on by default
= warning: this borrowing pattern was not meant to be accepted, and may become a hard error in the future
= note: for more information, see issue #59159 <https://github.com/rust-lang/rust/issues/59159>
If I were instead trying to update an existing value, I could use interior mutation and RefCell, as described here.
If I were trying to insert or update a value based on itself, I could use the entry API, as described here.
I could work around the issue with cloning, but I would prefer to avoid that since the retrieved value in my actual code is somewhat complex. Will this require unsafe code?
EDIT
Since previous answer is simply false and doesn't answer the question at all, there's code which doesn't show any warning (playground)
Now it's a hashmap with Rc<_> values, and val_rc contains only a reference counter on actual data (number 1 in this case). Since it's just a counter, there's no cost of cloning it. Note though, that there's only one copy of a number exists, so if you modify a value of some_map["a"], then some_map["b"] is modified aswell, since they refer to a single piece of memory. Also note, that 1 lives on stack, so you better consider turn it into Rc<Box<_>> if you plan to add many heavy objects.
use std::collections::HashMap;
use std::rc::Rc;
fn main() {
let mut some_map = HashMap::new();
some_map.insert("a", Rc::new(1));
let val_rc = Rc::clone(some_map.get("a").unwrap());
if *val_rc != 2 {
some_map.insert("b", val_rc);
}
}
Previous version of answer
Hard to tell what exactly you're looking for, but in this particular case, if you only need to check the value, then destroy the borrowed value, before you update the hashmap. A dirty and ugly code would be like this:
fn main() {
let mut some_map = HashMap::new();
some_map.insert("a", 1);
let is_ok = false;
{
let some_val = some_map.get("a").unwrap();
is_ok = *some_val != 2;
}
if is_ok {
some_map.insert("b", *some_val);
}
}

swapping two entries of a HashMap

i have a simple HashMap; say HashMap<char, char>.
is there a way to swap two elements in this hashmap using std::mem::swap (or any other method)?
Of course there is the simple way getting the values with get and then replace them with insert - but that would trigger the hasher twice (once for getting then for inserting) and i was looking for a way to side-step the second hasher invocation (more out of curiosity than for performance).
what i tried is this (in several versions; none of which worked - and as remarked in the comments: entry would not do what i expect even if i got this past the compiler):
use std::collections::HashMap;
use std::mem::swap;
let mut hash_map: HashMap<char, char> = HashMap::default();
hash_map.insert('A', 'Z');
hash_map.insert('B', 'Y');
swap(&mut hash_map.entry('A'), &mut hash_map.entry('B'));
now the compiler complains (an i understand why it should)
error[E0499]: cannot borrow `hash_map` as mutable more than once at a time
--> tests.rs:103:42
|
103 | swap(&mut hash_map.entry('A'), &mut hash_map.entry('B'));
| ---- -------- ^^^^^^^^ second mutable borrow occurs here
| | |
| | first mutable borrow occurs here
| first borrow later used by call
also just getting the two values this way fails in more or less the same way:
let mut a_val = hash_map.get_mut(&'A').expect("failed to get A value");
let mut b_val = hash_map.get_mut(&'B').expect("failed to get B value");
swap(&mut a_val, &mut b_val);
is there a way to simply swap two entries of a HashMap?
I can't see any safe way to do it:
use std::collections::HashMap;
fn main() {
let mut map = HashMap::new();
map.insert('A', 'Z');
map.insert('B', 'Y');
let a = map.get_mut(&'A').unwrap() as *mut char;
let b = map.get_mut(&'B').unwrap() as *mut char;
unsafe {
std::ptr::swap(a, b);
}
assert_eq!(map.get(&'A'), Some(&'Y'));
assert_eq!(map.get(&'B'), Some(&'Z'));
}
There is one completely safe way I can think of to do this safely, but it's super inefficient: what you want is to get two &mut values, which means borrowck needs to know they're nonoverlapping. Missing a builtin along the lines of split_mut (or the collection being handled specially), the only way I see is to mutably iterate the entire collection, keep refs to the items you're interested in, and swap that:
let mut h = HashMap::new();
h.insert("a", "a");
h.insert("b", "b");
let mut it = h.iter_mut();
let e0 = it.next().unwrap();
let e1 = it.next().unwrap();
std::mem::swap(e0.1, e1.1);
println!("{:?}", h);
It requires a linear traversal of the map until you've found the entries whose values you want to swap though. So even though this has the advantage of not hashing at all edwardw's is answer is probably more practical.

Resources