Rust: How to combine the Entry API with owned data?

I have a HashMap and would like to update a value if it exists and otherwise add a default one. Normally I would do it like this:
some_map.entry(some_key)
.and_modify(|e| modify(e))
.or_insert(default)
But now my modify has type fn(T)->T, but the borrow checker obviously won’t allow me to write:
some_map.entry(some_key)
.and_modify(|e| *e = modify(*e))
.or_insert(default)
What is the preferred way of doing this in Rust? Should I just use remove and insert?

Assuming you can create an empty version of your T for cheap, you could use mem::replace:
some_map.entry(some_key)
.and_modify(|e| {
// swaps the second parameter in and returns the value which was there
let mut v = mem::replace(e, T::empty());
v = modify(v);
// puts the value back in; the empty placeholder is dropped
*e = v;
})
.or_insert(default)
This assumes modify does not panic, though; otherwise the "empty" value stays in your map. You would have a similar issue with remove / insert.
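A minimal sketch of the same idea with std::mem::take, which works whenever the value type implements Default (an assumption about T; String stands in for it here). The names shout and update_owned are hypothetical and only serve the illustration.

```rust
use std::collections::HashMap;
use std::mem;

// Hypothetical by-value transformation with the shape fn(T) -> T.
pub fn shout(s: String) -> String {
    s.to_uppercase() + "!"
}

// Applies `modify` to the value stored under `key`, or inserts `default`.
// Assumes V: Default so a cheap placeholder can sit in the map meanwhile.
pub fn update_owned<V: Default>(
    map: &mut HashMap<String, V>,
    key: &str,
    default: V,
    modify: fn(V) -> V,
) {
    map.entry(key.to_string())
        // mem::take moves the value out and leaves V::default() behind.
        .and_modify(|e| *e = modify(mem::take(e)))
        .or_insert(default);
}

fn main() {
    let mut map = HashMap::new();
    // First call inserts the default, second call transforms it.
    update_owned(&mut map, "greeting", String::from("hello"), shout);
    update_owned(&mut map, "greeting", String::from("hello"), shout);
    println!("{:?}", map.get("greeting"));
}
```

As with mem::replace, a panic inside modify would leave the placeholder (here an empty String) in the map.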

Related

Why match object is borrowed in rust?

The code is from rustlings 'quiz2.rs'. I know that command in ‘for (string, command)’ is borrowed from the vector iterator. command is borrowed, but why is the n in Append(n) also borrowed?
pub fn transformer(input: Vec<(String, Command)>) -> Vec<String> {
// TODO: Complete the output declaration!
let mut output: Vec<String> = vec![];
for (string, command) in input.iter() {
// TODO: Complete the function body. You can do it!
match command {
Command::Uppercase => output.push(string.to_uppercase()),
Command::Trim => output.push(string.trim().to_string()),
Command::Append(n) => {
let can_mv_str = string.to_string() + &"bar".repeat(*n);
output.push(can_mv_str);
}
}
}
output
}
First, Vec::iter() returns an iterator over references to the elements of the vector, that is, &(String, Command).
Then, whenever you write a pattern for some specific structure, like the 2-element tuple (string, command) in the for loop, but the input is a reference to that structure, Rust matches "through" the reference and automatically gives you references to the elements (because it's not possible in general to not get references, since not every type is Copy).
So, the type of command is &Command. Then the same thing happens to the match command { and every variable binding in the match's pattern (that is, n) will be a reference too.
If you want to avoid this, what you have to do is explicitly write out matching against the references:
match command {
...
&Command::Append(n) => {
let can_mv_str = string.to_string() + &"bar".repeat(n);
output.push(can_mv_str);
}
}
Or, you can dereference the input to the match (this won't necessarily try to move out of the reference — as long as you don't bind any non-Copy values):
match *command {
...
Command::Append(n) => {
let can_mv_str = string.to_string() + &"bar".repeat(n);
output.push(can_mv_str);
}
}
Finally, if you want to completely avoid the magic and write out a program that is doing the whole thing explicitly, you also need to adjust the for pattern:
for &(ref string, ref command) in input.iter() {
The ref means "please don't try to move this value out of what I'm matching; just give me a reference to it." It's rarely seen in modern Rust because the automatic matching behavior I'm talking about makes it mostly unnecessary. This feature of Rust is called “match ergonomics”, because it saves you from writing a lot of & and ref all the time. But, as you've seen, it can lead to surprising behavior, and the old explicit style also lets you avoid dealing with needless references to Copy types like integers.
If you'd like to try writing Rust without ever using match ergonomics, to understand “what's really going on”, you can enable a Clippy restriction lint to mark any places where the pattern doesn't actually fit the type of what it's matching:
#[warn(clippy::pattern_type_mismatch)]
and run cargo clippy to see the new warnings. You'll probably find that there's quite a lot of patterns that are implicitly working on references!
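To compare the two styles side by side, here is a small self-contained sketch. The Command enum is an assumed stand-in for the one in the quiz (the question does not show its definition), and the function names are made up for the example.

```rust
// Assumed shape of the quiz's enum; the original definition is not shown.
pub enum Command {
    Uppercase,
    Trim,
    Append(usize),
}

// Match ergonomics: `string`, `command` and `n` are all references.
pub fn with_ergonomics(input: &[(String, Command)]) -> Vec<String> {
    let mut output = Vec::new();
    for (string, command) in input.iter() {
        match command {
            Command::Uppercase => output.push(string.to_uppercase()),
            Command::Trim => output.push(string.trim().to_string()),
            // n: &usize, so it has to be dereferenced.
            Command::Append(n) => output.push(string.clone() + &"bar".repeat(*n)),
        }
    }
    output
}

// Explicit patterns: every reference is spelled out with & or ref.
pub fn explicit(input: &[(String, Command)]) -> Vec<String> {
    let mut output = Vec::new();
    for &(ref string, ref command) in input.iter() {
        match *command {
            Command::Uppercase => output.push(string.to_uppercase()),
            Command::Trim => output.push(string.trim().to_string()),
            // n: usize, copied out of the enum because usize is Copy.
            Command::Append(n) => output.push(string.clone() + &"bar".repeat(n)),
        }
    }
    output
}

fn main() {
    let input = vec![
        ("hello".to_string(), Command::Uppercase),
        ("  padded  ".to_string(), Command::Trim),
        ("foo".to_string(), Command::Append(2)),
    ];
    // Both styles produce the same output; only the binding types differ.
    assert_eq!(with_ergonomics(&input), explicit(&input));
    println!("{:?}", explicit(&input));
}
```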
This is because of RFC 2005. Before that, there used to be special syntax of ref and ref mut when you wanted to take a reference to an inner field in match.
What happens here is that your command is a &Command. You match on it, but the arms are all Command patterns, so command gets dereferenced automatically. But then, in Command::Append, you would take ownership of the inner n. This cannot happen, as your command was a reference, so Rust gives you a reference to n.
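You can fix this in multiple ways. The easiest is to dereference n yourself in the arm body, like so:
match command {
Command::Append(n) => { let n = *n; ... } // n is an owned copy from here on
...
}
This works when n is Copy. (Writing Command::Append(&n) in the pattern is rejected by current compilers, because the field itself has type usize, not &usize.) You can also do a let n = n.clone() in the match body itself if n is not Copy but is Clone.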
You can also match on the value behind the reference, or on a clone of it, like so:
match *command { // works if the fields you bind (here n) are Copy
Command::Append(n) => ..., // n is owned here
...
}
// OR
match command.clone() { // if Command is Clone
Command::Append(n) => ..., // n is owned here
...
}
If you cannot do any of the above, then you will need to work with the reference itself or maybe change your iterator from input.iter() to input.into_iter() to get an owned Command from the start.
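As a sketch of the into_iter() route: consuming the vector yields owned (String, Command) tuples, so n is a plain value and the owned string can be reused instead of cloned. The Command enum below is again an assumed stand-in for the quiz's definition.

```rust
// Assumed shape of the quiz's enum; the original definition is not shown.
pub enum Command {
    Uppercase,
    Trim,
    Append(usize),
}

pub fn transformer(input: Vec<(String, Command)>) -> Vec<String> {
    let mut output = Vec::new();
    // into_iter() consumes the vector, so each item is an owned (String, Command).
    for (string, command) in input.into_iter() {
        match command {
            Command::Uppercase => output.push(string.to_uppercase()),
            Command::Trim => output.push(string.trim().to_string()),
            // n: usize here, and `string` is owned, so it is extended without a clone.
            Command::Append(n) => output.push(string + &"bar".repeat(n)),
        }
    }
    output
}

fn main() {
    let input = vec![
        ("hello".to_string(), Command::Uppercase),
        ("foo".to_string(), Command::Append(1)),
    ];
    println!("{:?}", transformer(input));
}
```

The trade-off is that the caller gives up the input vector, which matches the signature in the question since it already takes the Vec by value.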

What is the proper way of modifying a value of an entry in a HashMap?

I am a beginner in Rust, I haven't finished the "Book" yet, but one thing made me ask this question.
Considering this code:
use std::collections::HashMap;
fn main() {
let mut entries = HashMap::new();
entries.insert("First".to_string(), 10);
entries.entry("Second".to_string()).or_insert(20);
assert_eq!(10, *entries.get("First").unwrap());
entries.entry(String::from("First")).and_modify(|value| { *value = 20});
assert_eq!(20, *entries.get("First").unwrap());
entries.insert("First".to_string(), 30);
assert_eq!(30, *entries.get("First").unwrap());
}
I have used two ways of modifying an entry:
entries.entry(String::from("First")).and_modify(|value| { *value = 20});
entries.insert("First".to_string(), 30);
The insert way looks clunky, and I wouldn't personally use it to modify a value in an entry, but... it works. Nevertheless, is there a reason not to use it other than semantics? As I said, I'd rather use the entry construct than brute-force an update using insert with an existing key. Is there something a newbie Rustacean like me could not possibly know?
insert() is a bit more idiomatic when you are replacing an entire value, particularly when you don't know (or care) if the value was present to begin with.
get_mut() is more idiomatic when you want to do something to a value that requires mutability, such as replacing only one field of a struct or invoking a method that requires a mutable reference. If you know the key is present you can use .unwrap(), otherwise you can use one of the other Option utilities or match.
entry(...).and_modify(...) by itself is rarely idiomatic; it's more useful when chaining other methods of Entry together, such as where you want to modify a value if it exists, otherwise add a different value. You might see this pattern when working with maps where the values are totals:
entries.entry(key)
.and_modify(|v| *v += 1)
.or_insert(1);
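The three approaches can be put next to each other in a short sketch (the function name demo and the keys are only illustrative):

```rust
use std::collections::HashMap;

// Returns (value replaced by insert, final "First", final "Second").
pub fn demo() -> (Option<i32>, i32, i32) {
    let mut entries: HashMap<String, i32> = HashMap::new();
    entries.insert("First".to_string(), 10);

    // insert replaces the whole value and returns the previous one, if any.
    let previous = entries.insert("First".to_string(), 30);

    // get_mut mutates in place; it returns None when the key is absent.
    if let Some(value) = entries.get_mut("First") {
        *value += 1;
    }

    // entry + and_modify + or_insert covers "update or create" in one chain.
    entries.entry("Second".to_string()).and_modify(|v| *v += 1).or_insert(1);
    entries.entry("Second".to_string()).and_modify(|v| *v += 1).or_insert(1);

    (previous, entries["First"], entries["Second"])
}

fn main() {
    println!("{:?}", demo());
}
```

Note that get_mut takes the key by reference, so unlike entry it does not need an owned String just to look something up.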

Why do I get "temporary value dropped while borrowed" if I assign, but not when passing via function?

I am quite fresh with Rust. I have experience mainly in C and C++.
This code from lol_html crate example works.
use lol_html::{element, HtmlRewriter, Settings};
let mut output = vec![];
{
let mut rewriter = HtmlRewriter::try_new(
Settings {
element_content_handlers: vec![
// Rewrite insecure hyperlinks
element!("a[href]", |el| {
let href = el
.get_attribute("href")
.unwrap()
.replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
})
],
..Settings::default()
},
|c: &[u8]| output.extend_from_slice(c)
).unwrap();
rewriter.write(b"<div><a href=").unwrap();
rewriter.write(b"http://example.com>").unwrap();
rewriter.write(b"</a></div>").unwrap();
rewriter.end().unwrap();
}
assert_eq!(
String::from_utf8(output).unwrap(),
r#"<div><a href="https://example.com"></a></div>"#
);
But if I move element_content_handlers vec outside and assign it, I get
temporary value dropped while borrowed
for the let line:
use lol_html::{element, HtmlRewriter, Settings};
let mut output = vec![];
{
let handlers = vec![
// Rewrite insecure hyperlinks
element!("a[href]", |el| {
let href = el
.get_attribute("href")
.unwrap()
.replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
}) // this element is deemed temporary
];
let mut rewriter = HtmlRewriter::try_new(
Settings {
element_content_handlers: handlers,
..Settings::default()
},
|c: &[u8]| output.extend_from_slice(c)
).unwrap();
rewriter.write(b"<div><a href=").unwrap();
rewriter.write(b"http://example.com>").unwrap();
rewriter.write(b"</a></div>").unwrap();
rewriter.end().unwrap();
}
assert_eq!(
String::from_utf8(output).unwrap(),
r#"<div><a href="https://example.com"></a></div>"#
);
I think that the method takes ownership of the vector, but I don't understand why it does not work with a simple assignment. I don't want to declare every element with let first. I expect that there is a simple idiom to make the vector own all of its elements.
EDIT:
The compiler proposed binding the element before the line, but what if I have a lot of elements? I would like to avoid naming 50 elements, for example. Is there a way to do this without binding all the elements? Also, why does the lifetime of the temporary end inside the vec! invocation in the case of a let binding, but not when I put the vec! inside a newly constructed struct passed to a method? The last question is very important to me.
When I first tried to reproduce your issue, I got an error that try_new didn't exist. It's been removed in the latest version of lol_html. After replacing it with new, your issue didn't reproduce. I was able to reproduce it with v0.2.0, though. Since the issue had to do with code generated by macros, I tried cargo expand (a tool you need to install separately).
Here's what let handlers = ... expanded to in v0.2.0:
let handlers = <[_]>::into_vec(box [(
&"a[href]".parse::<::lol_html::Selector>().unwrap(),
::lol_html::ElementContentHandlers::default().element(|el| {
let href = el.get_attribute("href").unwrap().replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
}),
)]);
and here's what it expands to in v0.3.0
let handlers = <[_]>::into_vec(box [(
::std::borrow::Cow::Owned("a[href]".parse::<::lol_html::Selector>().unwrap()),
::lol_html::ElementContentHandlers::default().element(|el| {
let href = el.get_attribute("href").unwrap().replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
}),
)]);
Ignore the first line; it's how the vec! macro expands. The second line shows the difference in what the versions generate. The first takes a borrow of the result of parse, the second takes a Cow::Owned of it. (Cow stands for copy on write, but it's more generally useful for anything where you want to be generic over either the borrowed or owned version of something.)
So the short answer is the macro used to expand to something that wasn't owned, and now it does. As for why it worked without a separate assignment, that's because Rust automatically created a temporary variable for you.
When using a value expression in most place expression contexts, a temporary unnamed memory location is created initialized to that value and the expression evaluates to that location instead, except if promoted to a static
https://doc.rust-lang.org/reference/expressions.html#tempora...
Initially Rust created multiple temporaries for you, all valid for the same-ish scope, the scope of the call to try_new. When you break out the vector to its own assignment, the temporary created for element! is only valid for the scope of the vector assignment.
I took a look at the git blame for the element! macro in lol_html, and they made the change because someone opened an issue with essentially your problem. So I'd say this is a bug in a leaky abstraction, not an issue with your understanding of Rust.
You are creating a temporary value inside the vector (element). This means that the value created inside vec![] only exists for that fleeting lifetime inside of vec![]. At the end of vec![], the value is freed, meaning that it no longer exists:
let handlers = vec![
______
|
| element!("a[href]", |el| {
| let href = el.get_attribute("href").unwrap().replace("http:", "https:");
| el.set_attribute("href", &href).unwrap();
| Ok(())
| }),
|______ ^ This value is temporary
]; > the element is freed here, it no longer exists!
You then try to create a HtmlRewriter using a non-existent value!
Settings {
element_content_handlers: handlers,
// the element inside of `handlers` doesn't exist anymore!
..Settings::default()
},
Obviously, the borrow checker catches this issue, and your code doesn't compile.
The solution here is to bind that element to a variable with let:
let element = element!("a[href]", |el| {
let href = el.get_attribute("href").unwrap().replace("http:", "https:");
el.set_attribute("href", &href).unwrap();
Ok(())
});
And then create the vector:
let handlers = vec![element];
Now, the value is bound to a variable (element), and so it lives long enough to be borrowed later in HtmlRewriter::try_new.
When you create something, it gets bound to the innermost scope possible for the purposes of tracking its lifetime. Using a let binding at a higher scope binds the value to that scope, making its lifetime longer. If you're creating a lot of things, then applying an operation to them (for example, passing them to another function), it often makes sense to create a vector of values and then apply a transformation to them instead. As an example,
let xs: Vec<_> = (0..10).map(|n| SomeStruct { n }).map(|s| another_function(s)).collect();
This way you don't need to bind the SomeStruct objects to anything explicitly.
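The temporary-lifetime behavior discussed in these answers can be reproduced without lol_html. The sketch below uses a hypothetical total_len function as a stand-in for an API that accepts a vector of borrowed items; it is not the crate's API. The rejected variant is kept as a comment so the block still compiles.

```rust
// Hypothetical stand-in for an API that accepts borrowed items.
pub fn total_len(items: Vec<&String>) -> usize {
    items.iter().map(|s| s.len()).sum()
}

pub fn inline_call() -> usize {
    // The temporaries created inside vec! live until the end of this
    // whole statement, so they are still alive during the call.
    let n = total_len(vec![&String::from("a"), &String::from("bc")]);
    n
}

pub fn named_bindings() -> usize {
    // Rejected with E0716 ("temporary value dropped while borrowed")
    // because `handlers` would be used after the let statement ends:
    // let handlers = vec![&String::from("a"), &String::from("bc")];
    let a = String::from("a");
    let bc = String::from("bc");
    let handlers = vec![&a, &bc];
    total_len(handlers)
}

pub fn owned_then_borrowed() -> usize {
    // For many items: build one owned collection, then borrow from it,
    // instead of naming every element.
    let owned: Vec<String> = ["a", "bc", "def"].iter().map(|s| s.to_string()).collect();
    let handlers: Vec<&String> = owned.iter().collect();
    total_len(handlers)
}

fn main() {
    println!("{} {} {}", inline_call(), named_bindings(), owned_then_borrowed());
}
```

The last function is one possible answer to the "50 elements" concern: only the owning collection needs a name, not each element.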

Not live long enough with CSV and dataflow

fn main() {
timely::execute_from_args(std::env::args().skip(0), move |worker| {
let (mut input, probe) = worker.dataflow::<_, _, _>(|scope| {
let (input, data) = scope.new_collection();
let probe = data.inspect(|x| println!("observed data: {:?}", x)).probe();
(input, probe)
});
let mut rdr = csv::ReaderBuilder::new()
.has_headers(false)
.flexible(true)
.delimiter(b'\t')
.from_reader(io::stdin());
for result in rdr.deserialize() {
let record = result.expect("a CSV record");
let mut vec = Vec::new();
for i in 0..13 {
vec.push(&record[i]);
}
input.insert(vec);
}
});
}
The error is that record does not live long enough. I am trying to read each CSV record into a vector and then insert the records into the dataflow. Each part works separately: I can read the CSV into a vector, and I can use the dataflow elsewhere.
The problem is that you are pushing to the Vec a borrowed value: &record[i]. The & means borrow, and as a consequence the original value record must outlive the borrower vec.
That might seem fine (both are in the for body, so both have the same lifetime and neither outlives the other), but it is not, because the line input.insert(vec) moves vec. This means that vec becomes owned by input and hence lives as long as input (as far as I understand). Because input is declared outside the for body, the moved vec lives as long as input and therefore outlives the record[i]s.
There are a few solutions, but all of them try to remove the dependency between record and input:
If the record is an array of primitive values, or something that implements the Copy trait, you can simply omit the borrow and the value will be copied into the vector: vec.push(record[i]).
Clone the record value into the vector: vec.push(record[i].clone()). This forces the creation of a clone, which as above, the vec becomes the owner, avoiding the borrow.
If the elements in the record array implement neither Copy nor Clone, you have to move them. Because the values are in an array, you have to move the array as a whole (an array cannot be left with some of its elements moved out). One solution is to transform it into an iterator that moves out the values one by one, and then push them into the vector:
for element in record.into_iter().take(13) {
vec.push(element)
}
Replace the record value with a different value. A final solution, which moves only some of the elements, is to replace each moved element with something else, so that the array stays valid. This needs mutable access, so record must be declared with let mut:
for i in 0..13 {
vec.push(std::mem::replace(&mut record[i], Default::default()));
}
You can replace Default::default() with another value if you want to.
I hope this helps. I'm still a noob in Rust, so improvements and critique on the answer are accepted :)
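The clone, move and replace options above can be sketched without the csv or timely crates. Here a Vec<String> is assumed as a stand-in for the record type, and the three helper names are hypothetical. std::mem::take is used as a shorthand for mem::replace with Default::default().

```rust
use std::mem;

// Clone: the record stays intact and the new vector owns copies.
pub fn collect_cloned(record: &[String], n: usize) -> Vec<String> {
    record.iter().take(n).cloned().collect()
}

// Move: the whole record is consumed and its values are moved out one by one.
pub fn collect_moved(record: Vec<String>, n: usize) -> Vec<String> {
    record.into_iter().take(n).collect()
}

// Replace: each taken slot is left holding String::default() (an empty string).
pub fn collect_taken(record: &mut [String], n: usize) -> Vec<String> {
    let mut out = Vec::new();
    for slot in record.iter_mut().take(n) {
        out.push(mem::take(slot));
    }
    out
}

fn main() {
    // `sink` plays the role of `input`: it outlives every record.
    let mut sink: Vec<Vec<String>> = Vec::new();
    let rows = vec![
        vec!["a".to_string(), "b".to_string(), "c".to_string()],
        vec!["d".to_string(), "e".to_string(), "f".to_string()],
    ];
    for mut record in rows {
        sink.push(collect_cloned(&record, 2));
        sink.push(collect_taken(&mut record, 1));
        sink.push(collect_moved(record, 3));
    }
    println!("{:?}", sink);
}
```

In every variant the sink ends up owning its strings, so nothing in it borrows from a record that is dropped at the end of the loop body.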

How does one create a HashMap with a default value in Rust?

Being fairly new to Rust, I was wondering how to create a HashMap with a default value for a key. For example, having a default value 0 for any key inserted in the HashMap.
In Rust, I know this creates an empty HashMap:
let mut mymap: HashMap<char, usize> = HashMap::new();
I am looking to maintain a counter for a set of keys, for which one way to go about it seems to be:
for ch in "AABCCDDD".chars() {
mymap.insert(ch, 0);
}
Is there a way to do it in a much better way in Rust, maybe something equivalent to what Ruby provides:
mymap = Hash.new(0)
mymap["b"] = 1
mymap["a"] # 0
Answering the problem you have...
I am looking to maintain a counter for a set of keys.
Then you want to look at How to lookup from and insert into a HashMap efficiently?. Hint: *map.entry(key).or_insert(0) += 1
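Spelled out for the string in the question, the hint could look like this (count_chars is just an illustrative name):

```rust
use std::collections::HashMap;

// Counts how often each character occurs in `text`.
pub fn count_chars(text: &str) -> HashMap<char, usize> {
    let mut counts = HashMap::new();
    for ch in text.chars() {
        // Insert 0 the first time a character is seen, then increment in place.
        *counts.entry(ch).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_chars("AABCCDDD");
    println!("{:?}", counts);
}
```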
Answering the question you asked...
How does one create a HashMap with a default value in Rust?
No, HashMaps do not have a place to store a default. Doing so would cause every user of that data structure to allocate space to store it, which would be a waste. You'd also have to handle the case where there is no appropriate default, or when a default cannot be easily created.
Instead, you can look up a value using HashMap::get and provide a default if it's missing using Option::unwrap_or:
use std::collections::HashMap;
fn main() {
let mut map: HashMap<char, usize> = HashMap::new();
map.insert('a', 42);
let a = map.get(&'a').cloned().unwrap_or(0);
let b = map.get(&'b').cloned().unwrap_or(0);
println!("{}, {}", a, b); // 42, 0
}
If unwrap_or doesn't work for your case, there are several similar functions that might:
Option::unwrap_or_else
Option::map_or
Option::map_or_else
Of course, you are welcome to wrap this in a function or a data structure to provide a nicer API.
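One possible shape for such a wrapper is sketched below. DefaultMap is a hypothetical type written for this example, not something from the standard library; it stores the default once and hands out a reference to it on a miss, without ever inserting.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical wrapper: a HashMap plus one shared default value.
pub struct DefaultMap<K, V> {
    inner: HashMap<K, V>,
    default: V,
}

impl<K: Eq + Hash, V> DefaultMap<K, V> {
    pub fn new(default: V) -> Self {
        DefaultMap { inner: HashMap::new(), default }
    }

    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        self.inner.insert(key, value)
    }

    // Returns the stored value, or the default on a miss; never inserts.
    pub fn get(&self, key: &K) -> &V {
        self.inner.get(key).unwrap_or(&self.default)
    }
}

fn main() {
    // Mirrors the Ruby snippet from the question.
    let mut mymap = DefaultMap::new(0);
    mymap.insert('b', 1);
    println!("{} {}", mymap.get(&'b'), mymap.get(&'a'));
}
```

Because get takes &self, looking up a missing key does not mutate the map, so existing references into it stay valid.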
ArtemGr brings up an interesting point:
in C++ there's a notion of a map inserting a default value when a key is accessed. That always seemed a bit leaky though: what if the type doesn't have a default? Rust is less demanding on the mapped types and more explicit about the presence (or absence) of a key.
Rust adds an additional wrinkle to this. Actually inserting a value would require that simply getting a value can also change the HashMap. This would invalidate any existing references to values in the HashMap, as a reallocation might be required. Thus you'd no longer be able to get references to two values at the same time! That would be very restrictive.
What about using entry to get an element from the HashMap, and then modifying it?
From the docs:
fn entry(&mut self, key: K) -> Entry<K, V>
Gets the given key's corresponding entry in the map for in-place
manipulation.
example
use std::collections::HashMap;
let mut letters = HashMap::new();
for ch in "a short treatise on fungi".chars() {
let counter = letters.entry(ch).or_insert(0);
*counter += 1;
}
assert_eq!(letters[&'s'], 2);
assert_eq!(letters[&'t'], 3);
assert_eq!(letters[&'u'], 1);
assert_eq!(letters.get(&'y'), None);
.or_insert() and .or_insert_with()
Adding to the existing example for .entry().or_insert(), I wanted to mention that if the default value passed to .or_insert() is dynamically generated, it's better to use .or_insert_with().
Using .or_insert_with() as below, the default value is not generated if the key already exists. It only gets created when necessary.
for v in 0..s.len() {
components.entry(unions.get_root(v))
.or_insert_with(|| vec![]) // vec only created if needed.
.push(v);
}
In the snippet below, the default vector passed to .or_insert() is generated on every call. If the key exists, a vector is being created and then disposed of, which can be wasteful.
components.entry(unions.get_root(v))
.or_insert(vec![]) // vec always created.
.push(v);
So for fixed values that don't have much creation overhead, use .or_insert(), and for values that have appreciable creation overhead, use .or_insert_with().
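The difference can be made visible by counting how often the default is constructed. In this sketch the counter, the make closure and the function name are all illustrative; the single key "k" is reused so that every insertion after the first hits an existing entry.

```rust
use std::cell::Cell;
use std::collections::HashMap;

// Returns (constructions with or_insert, constructions with or_insert_with).
pub fn count_constructions() -> (u32, u32) {
    let calls = Cell::new(0u32);
    // Builds the default value and records that it was built.
    let make = || {
        calls.set(calls.get() + 1);
        Vec::<u32>::new()
    };

    let mut eager: HashMap<&str, Vec<u32>> = HashMap::new();
    for v in 0..3 {
        // The argument is evaluated before or_insert runs, on every iteration.
        eager.entry("k").or_insert(make()).push(v);
    }
    let eager_calls = calls.get();

    calls.set(0);
    let mut lazy: HashMap<&str, Vec<u32>> = HashMap::new();
    for v in 0..3 {
        // The closure only runs when the entry is vacant.
        lazy.entry("k").or_insert_with(|| make()).push(v);
    }
    (eager_calls, calls.get())
}

fn main() {
    println!("{:?}", count_constructions());
}
```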
A way to start a map with initial values is to construct the map from a vector of tuples. For instance, consider the code below:
let map = vec![("field1".to_string(), value1), ("field2".to_string(), value2)].into_iter().collect::<HashMap<_, _>>();
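On Rust 1.56 and later, HashMap::from accepts an array of key/value pairs directly, which avoids the intermediate vector. A small sketch, with placeholder integer values standing in for value1 and value2:

```rust
use std::collections::HashMap;

// Builds the map from an array of pairs (requires Rust 1.56 or later).
pub fn from_array() -> HashMap<String, i32> {
    HashMap::from([
        ("field1".to_string(), 1),
        ("field2".to_string(), 2),
    ])
}

// The vector-of-tuples route from the text, for comparison.
pub fn from_pairs() -> HashMap<String, i32> {
    vec![("field1".to_string(), 1), ("field2".to_string(), 2)]
        .into_iter()
        .collect()
}

fn main() {
    // Both constructions yield equal maps.
    assert_eq!(from_array(), from_pairs());
    println!("{:?}", from_array());
}
```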
