Dereferencing strings and HashMaps in Rust

I'm trying to understand how HashMaps work in Rust and I have come up with this example.
use std::collections::HashMap;
fn main() {
    let mut roman2number: HashMap<&'static str, i32> = HashMap::new();
    roman2number.insert("X", 10);
    roman2number.insert("I", 1);

    let roman_num = "XXI".to_string();
    let r0 = roman_num.chars().take(1).collect::<String>();
    let r1: &str = &r0.to_string();

    println!("{:?}", roman2number.get(r1)); // This works
    // println!("{:?}", roman2number.get(&r0.to_string())); // This doesn't
}
When I try to compile the code with the last line uncommented, I get the following error:
error: the trait bound `&str: std::borrow::Borrow<std::string::String>` is not satisfied [E0277]
println!("{:?}", roman2number.get(&r0.to_string()));
^~~
note: in this expansion of format_args!
note: in this expansion of print! (defined in <std macros>)
note: in this expansion of println! (defined in <std macros>)
help: run `rustc --explain E0277` to see a detailed explanation
The Trait implementation section of the docs gives the dereferencing as fn deref(&self) -> &str
So what is happening here?

The error is caused by the compiler selecting the generic function HashMap::get over String during type inference, when what you actually want is HashMap::get over str.
So just change
println!("{:?}", roman2number.get(&r0.to_string()));
to
println!("{:?}", roman2number.get::<str>(&r0.to_string()));
to make it explicit. This helps the compiler to select the right function.
It looks to me like Deref<Target> coercion can only happen when the target type is known, so while the compiler is trying to infer which HashMap::get to use, it sees &r0.to_string() as the type &String but never as &str. And &'static str does not implement Borrow<String>, which results in a type error. When we specify HashMap::get::<str>, the function expects a &str, so coercion can be applied to the &String to get a matching &str.
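Alternatively, any expression that is already a &str sidesteps the ambiguity without the turbofish. A minimal sketch reusing the map from the question:

use std::collections::HashMap;

fn main() {
    let mut roman2number: HashMap<&'static str, i32> = HashMap::new();
    roman2number.insert("X", 10);
    roman2number.insert("I", 1);

    let r0: String = "XXI".chars().take(1).collect();

    // each of these passes a &str, so the compiler resolves get::<str>
    println!("{:?}", roman2number.get(r0.as_str()));
    println!("{:?}", roman2number.get(&*r0));    // explicit deref of the String
    println!("{:?}", roman2number.get(&r0[..])); // full-range slice
}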
You can check out Deref coercion and String Deref for more details.

The other answers are correct, but I wanted to point out that you have an unneeded to_string (you've already collected into a String) and to show an alternate way of coercing to a &str, using as:
let r0: String = roman_num.chars().take(1).collect();
println!("{:?}", roman2number.get(&r0 as &str));
In this case, though, I'd probably just rewrite the map to use char as the key:
use std::collections::HashMap;
fn main() {
    let mut roman2number = HashMap::new();
    roman2number.insert('X', 10);
    roman2number.insert('I', 1);

    let roman_num = "XXI";
    for c in roman_num.chars() {
        println!("{:?}", roman2number.get(&c));
    }
}
Note there's no need to have an explicit type for the map, it will be inferred.
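As a small extension of that idea (not part of the original answer), the char-keyed map can be used to sum a purely additive numeral like "XXI":

use std::collections::HashMap;

fn main() {
    let mut roman2number = HashMap::new();
    roman2number.insert('X', 10);
    roman2number.insert('I', 1);

    // sum the value of each character, silently skipping unknown ones
    let total: i32 = "XXI"
        .chars()
        .filter_map(|c| roman2number.get(&c))
        .sum();

    println!("{}", total); // 21
}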

The definition of the get method looks as follows:
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V> where K: Borrow<Q>, Q: Hash + Eq
The relevant part is the type of the object you pass in: Q. There are two constraints on Q:
the key type K needs to implement the Borrow trait over Q
Q needs to implement the Hash and Eq traits.
Replacing this with your actual types means that the key-type &'static str needs to implement Borrow<String>. By the definition of Borrow, this means that a &'static str needs to be convertible to &String. But all the docs/texts I've read state that everywhere you'd use &String you should be using &str instead. So it makes little sense to offer a &str -> &String conversion, even if it would make life a little easier sometimes.
Since every reference type is borrowable as a shorter-lived reference type, you can pass a &str where &'static str is the key type, because &'static str implements Borrow<str>.
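To make that last point concrete, here is a minimal sketch (not part of the original answer):

use std::borrow::Borrow;

fn main() {
    let key: &'static str = "X";

    // &'static str implements Borrow<str>, so it can be viewed as a &str ...
    let as_str: &str = key.borrow();
    println!("{}", as_str);

    // ... but there is no Borrow<String> impl for it, so this would not compile:
    // let as_string: &String = key.borrow();
}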


Why is a lifetime needed when implementing From<&[u8]>

I'm trying to have a MyType that supports a From<&[u8]> trait, but I'm running into "lifetime problems":
Here's a minimally viable example:
struct MyType {
    i: i32
}

impl MyType {
    fn from_bytes(_buf: &[u8]) -> MyType {
        // for example...
        MyType { i: 3 }
    }
}

impl From<&[u8]> for MyType {
    fn from(bytes: &[u8]) -> Self {
        MyType::from_bytes(bytes)
    }
}

fn do_smth<T>() -> T where T: From<&[u8]>
{
    // for example...
    let buf: Vec<u8> = vec![1u8, 2u8];
    T::from(buf.as_slice())
}
For reasons I cannot understand, the Rust compiler is telling me:
error[E0637]: `&` without an explicit lifetime name cannot be used here
--> src/lib.rs:17:36
|
17 | fn do_smth<T>() -> T where T: From<&[u8]>
| ^ explicit lifetime name needed here
I'm not an expert on lifetimes and I don't understand why this piece of code needs one. What would be the best way to fix this?
Might Rust be thinking that the type T could be a &[u8] itself? But, in that case, the lifetime should be inferred to be the same as the input to From::<&[u8]>::from(), no?
One fix I was given was to do:
fn do_smth<T>() -> T where for<'a> T: From<&'a [u8]>
{
    // for example...
    let buf: Vec<u8> = vec![1u8, 2u8];
    T::from(buf.as_slice())
}
...but I do not understand this fix, nor do I understand why lifetimes are needed in the first place.
Rust first wants you to write:
fn do_smth<'a, T>() -> T
where
    T: From<&'a [u8]>,
{
    // for example...
    let buf: Vec<u8> = vec![1u8, 2u8];
    T::from(&buf)
}
where you make explicit that this function can be called for any lifetime 'a and any type T such that T implements From<&'a [u8]>.
But Rust then complains:
error[E0597]: `buf` does not live long enough
--> src/lib.rs:24:13
|
18 | fn do_smth<'a, T>() -> T
| -- lifetime `'a` defined here
...
24 | T::from(&buf)
| --------^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `buf` is borrowed for `'a`
25 | }
| - `buf` dropped here while still borrowed
You promised that this function could work with any lifetime, but this turns out not to be true, because in the body of the function you create a fresh reference to the local Vec, which has a different lifetime, say 'local. Your function only works when 'a equals 'local, but you promised that it also works for all other lifetimes. What you need is a way to express that these lifetimes are the same, and the only way I think this is possible is by changing the local reference to an argument:
fn do_smth<'a, T>(buf: &'a [u8]) -> T
where
    T: From<&'a [u8]>,
{
    T::from(buf)
}
And then it compiles.
If, instead of the function promising it can work with any lifetime, you want the caller to promise that T works with any lifetime, you can use HRTBs (higher-ranked trait bounds):
fn do_smth<T>() -> T
where
    for<'a> T: From<&'a [u8]>,
{
    // for example...
    let buf: Vec<u8> = vec![1u8, 2u8];
    T::from(&buf)
}
Now, since you can use any lifetime, a local one also works and the code compiles.
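Putting the HRTB version together with the question's MyType gives a complete, compilable sketch (the from body here just uses the slice length, purely for illustration):

struct MyType {
    i: i32,
}

impl From<&[u8]> for MyType {
    fn from(bytes: &[u8]) -> Self {
        MyType { i: bytes.len() as i32 }
    }
}

fn do_smth<T>() -> T
where
    for<'a> T: From<&'a [u8]>,
{
    let buf: Vec<u8> = vec![1, 2];
    T::from(&buf)
}

fn main() {
    let v: MyType = do_smth();
    println!("{}", v.i); // 2
}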
Lifetimes represent a "duration" (metaphorically), or, more pragmatically, a scope, in which a variable is valid. Outside of its lifetime, the variable should be considered as having been freed from memory, even though you haven't done that explicitly, because that's how Rust manages memory.
It becomes a bit more complex when Rust tries to ensure that, once a variable is gone, no other part of the code that could have had access to that variable still has access. These shared accesses are called borrows, and that's why borrows have lifetimes too. The main condition Rust enforces on them is that a borrow's lifetime is always shorter than (or contained within, depending on how you see it) that of the original variable, i.e. you can't share something for more time than you actually own it.
Rust therefore requires all borrows (as well as all variables, really) to have an established lifetime at compile time. To lighten things, Rust has default rules about what a lifetime should be if it was not explicitly written by the user; that is, when you talk about a type that involves a lifetime, Rust lets you leave that lifetime out under certain conditions. However, this is not "lifetime inference" in the sense of inferring types: Rust will not try to work out what an elided lifetime should be from how it is used, it's a lot less smart about it. In particular, this elision can fail, in the sense that Rust may be unable to pick the lifetime to assign even though a lifetime that works does exist.
Back to business: your first error simply stems from the fact that Rust has no elision rule for the position pointed out by the error. As I said, Rust won't try to infer what the right lifetime would be; it simply checks whether leaving the lifetime out implicitly means something there. It doesn't, so you need to write one.
Your first reflex might be to make your function generic over the missing lifetime, which is often the right thing to do (and sometimes the only possible option), that is, to do something like this:
fn do_smth<'a, T>() -> T
where
    T: From<&'a [u8]>
{
    // for example...
    let buf: Vec<u8> = vec![1, 2];
    T::from(buf.as_slice())
}
What this means is that do_smth is generic over the lifetime 'a, just like it is generic over the type T. This has two consequences:
Rust will monomorphise your function, meaning it will provide a concrete implementation for each type T it is called with, and at each call site it will work out which lifetime 'a applies. This might seem to contradict what I said earlier about Rust not inferring lifetimes. The difference is that type inference and monomorphisation, although similar, are not the same step, and the compiler does not handle lifetimes the same way in each. Don't worry about this until you have understood the rest.
The second consequence, which is a bit disastrous, is that your function exposes the following contract: for any type T and any lifetime 'a such that T: From<&'a [u8]>, do_smth can produce a T. If you think about it, it means that even if T only implements From<&'a [u8]> for a lifetime 'a that is already finished (or, if you see lifetimes as scopes, for a lifetime 'a that is disjoint from do_smth's scope), you can produce an element of type T. This is not what you actually meant: you don't want the caller to give you an arbitrary lifetime. Instead, you know that the lifetime of the borrow of the slice is the one you chose it to be, within your function (because you own the underlying vector), and you want the type T to be buildable from that slice. That is, you want T: From<&'a [u8]> for an 'a that you have chosen, not one provided by the caller.
This last point should make it clear why the previous snippet of code is unsound and won't compile. Your function should not take a lifetime as an argument, just a type T with certain constraints. But then, how do you encode those constraints? That's where for<'a> comes into play. If you have a type T such that T: for<'a> From<&'a [u8]>, it means that for all 'a, T: From<&'a [u8]>. In particular, it is true for the lifetime of your slice. This is why the following works:
fn do_smth<T>() -> T
where
    T: for<'a> From<&'a [u8]>
{
    // for example...
    let buf: Vec<u8> = vec![1, 2];
    T::from(buf.as_slice())
}
Note that, as planned, this version of do_smth is not generic over a lifetime, that is, the caller does not provide a lifetime to the function.

Why doesn't &String automatically become &str in some cases?

In this toy example I'd like to map the items from a HashMap<String, String> with a helper function. There are two versions defined, one that takes arguments of the form &String and another with &str. Only the &String one compiles. I had thought that String always dereferences to &str but that doesn't seem to be the case here. What's the difference between a &String and a &str?
use std::collections::HashMap;
// &String works
fn process_item_1(key_value: (&String, &String)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}

// &str doesn't work (type mismatch in fn arguments)
fn process_item_2(key_value: (&str, &str)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}

fn main() {
    let mut map: HashMap<String, String> = HashMap::new();
    map.insert("a".to_string(), "b".to_string());
    for s in map.iter().map(process_item_2) { // <-- compile error on this line
        println!("{}", s);
    }
}
Here's the error for reference:
error[E0631]: type mismatch in function arguments
--> src/main.rs:23:29
|
12 | fn process_item_2(key_value: (&str, &str)) -> String {
| ---------------------------------------------------- found signature of `for<'r, 's> fn((&'r str, &'s str)) -> _`
...
23 | for s in map.iter().map(process_item_2) {
| ^^^^^^^^^^^^^^ expected signature of `fn((&String, &String)) -> _`
Thanks for your help with a beginner Rust question!
It gets even stranger than that:
map.iter().map(|s| process_item_2(s)) // Does not work
map.iter().map(|(s1, s2)| process_item_2((s1, s2))) // Works
The point is that Rust never performs any expensive coercion. Converting &String to &str is cheap: you just take the data pointer and length. But converting (&String, &String) to (&str, &str) is not so cheap anymore: you have to take the data and length of the first string, then of the second string, then build a new tuple out of them (and if it were done for tuples, what about (((&String, &String, &String), &String), (&String, &String))? Presumably it would then have to be done for arrays too, so what about &[&String; 10_000]?). That's why the first closure fails. The second closure, however, destructures the tuple and rebuilds it. That means that instead of coercing a tuple, we coerce &String twice and build a tuple from the results. That's fine.
The version without the closure is even more expensive: since you're passing a function directly to map(), and the iterator produces (&String, &String), someone needs to convert that to (&str, &str)! In order to do that, the compiler would have to introduce a shim - a small function that does that work: it takes (&String, &String) and calls process_item_2() with the argument coerced to (&str, &str). This is a hidden cost, so Rust (almost) never creates shims. This is why it wouldn't work even for a plain &String argument, not just for (&String, &String). And it is why |v| f(v) is not always the same as f - the first one performs coercions, while the second doesn't.
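Applied to the question's example, the destructure-and-rebuild fix looks like this (a sketch keeping process_item_2 exactly as written):

use std::collections::HashMap;

fn process_item_2(key_value: (&str, &str)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}

fn main() {
    let mut map: HashMap<String, String> = HashMap::new();
    map.insert("a".to_string(), "b".to_string());

    // destructure the (&String, &String) item and rebuild it,
    // so each &String is coerced to &str individually
    for s in map.iter().map(|(k, v)| process_item_2((k, v))) {
        println!("{}", s);
    }
}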

Clarification on Deref coercion

Consider this example:
fn main() {
    let string: String = "A string".to_string();
    let string_ref: &String = &string;

    let str_ref_a: &str = string_ref; // A
    let str_ref_b: &str = &string; // B
}
How exactly are lines A and B different? string_ref is of type &String, so my understanding is that in line A we have an example of Deref coercion. What about line B though? Is it correct to say that it has nothing to do with Deref coercion and we simply have "a direct borrowing of a String as str", due to this:
impl Borrow<str> for String {
    #[inline]
    fn borrow(&self) -> &str {
        &self[..]
    }
}
Both are essentially equivalent and involve deref coercion:
let str_ref_a: &str = string_ref; // A
let str_ref_b: &str = &string; // B
The value string above is of type String, so the expression &string is of type &String which coerces into &str due to deref coercion as String implements Deref<Target=str>.
Regarding your question about Borrow: No, you aren't calling borrow() on the string value. Instead, that would be:
let str_ref_b: &str = string.borrow(); // Borrow::borrow() on String
That is, unlike deref(), the call to borrow() isn't inserted automatically.
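For completeness, here is a compilable sketch of both forms; note that the explicit borrow() call needs the Borrow trait in scope (a detail added here, not stated in the answer):

use std::borrow::Borrow;

fn main() {
    let string: String = "A string".to_string();
    let string_ref: &String = &string;

    // deref coercion in both lines: String implements Deref<Target = str>
    let str_ref_a: &str = string_ref;
    let str_ref_b: &str = &string;

    // the explicit Borrow::borrow call; unlike deref, it is not inserted automatically
    let str_ref_c: &str = string.borrow();

    assert_eq!(str_ref_a, str_ref_b);
    assert_eq!(str_ref_b, str_ref_c);
}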
No. Both lines involve deref coercion.
The Borrow trait is not special in any way - it is not known to the compiler (not a lang item). The Deref trait is.
The difference between Deref and Borrow (and also AsRef) is that Deref can only have one implementation for a type (since Target is an associated type and not a generic parameter) while AsRef (and Borrow) take a generic parameter and thus can be implemented multiple times. This is because Deref is for smart pointers: a type should implement Deref<Target = T> if it is a T (note that the exact definition of "smart pointer" is in flux). A String is a str (plus additional features), a Vec is a slice, and a Box<T> is a T.
On the other hand, AsRef and Borrow are conversion traits. A type should implement AsRef<T> if it can be viewed as a T. String can be viewed as str. But take e.g. str and OsStr: a str is not an OsStr, yet it can be treated as if it were one, so it implements AsRef<OsStr>. Borrow is the same as AsRef except it has additional requirements (namely, the borrowed form should have the same Eq, Hash and Ord behaviour as the original value).
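As a small illustration of the difference in use (a sketch; the describe function is made up for this example): AsRef is typically used as an explicit generic bound, while Deref works implicitly at coercion sites.

use std::ffi::OsStr;

// AsRef bound: accept anything that can be *viewed* as an OsStr
fn describe<S: AsRef<OsStr>>(s: S) {
    println!("{:?}", s.as_ref());
}

fn main() {
    describe("plain str");            // str implements AsRef<OsStr>
    describe(String::from("string")); // String implements AsRef<OsStr>

    // Deref is implicit: &String coerces to &str because String: Deref<Target = str>
    let owned = String::from("hello");
    let s: &str = &owned;
    println!("{}", s);
}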

Destructure immutable reference and bind mutably in parameter list

I want to destructure a reference to a Copy-type and bind the resulting value mutably. Note: yes, I want to create a copy of the given argument and mutate that copy. This works:
fn foo(a: &u64) {
    let mut a = *a;
    mutate(&mut a);
}
But I want to do it via destructuring in the argument list. When I just destructure like this:
fn foo(&a: &u64) {
    mutate(&mut a);
}
Rust (understandably) complains:
<anon>:3:17: 3:18 error: cannot borrow immutable local variable `a` as mutable
<anon>:3 mutate(&mut a);
^
<anon>:1:9: 1:10 help: to make the local variable mutable, use `mut` as shown:
<anon>: fn foo(&mut a: &u64) {
But the solution the compiler suggests does not work!
<anon>:1:8: 1:14 error: mismatched types:
expected `&u64`,
found `&mut _`
(values differ in mutability) [E0308]
<anon>:1 fn foo(&mut a: &u64) {
^~~~~~
How can I do what I want?
I don't think you can do that. While patterns can destructure data (i.e., introduce bindings for parts of data), and bindings can be marked mutable, there doesn't seem to be a way to combine the two in the case of destructuring a reference, even though other combinations work:
struct X(i32);
let it = X(5);
let X(mut inside) = it;
inside = 1;
This may just be an edge case where other syntactic choices leave no good possibility. As you noticed, &mut x is already taken, and disambiguating with parentheses is not supported in patterns.
This isn't a big issue because there are several ways to do the same thing outside the pattern:
Don't make the binding mutable in the pattern and then make a new, mutable binding (let mut x = x;)
Don't destructure to begin with - it's not terribly useful with references since you can just say let mut x = *reference;.
Trying to be too clever in argument lists is detrimental to readability anyway IMHO.
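For completeness, the first option spelled out as a compilable sketch (mutate here is a hypothetical stand-in that just increments its argument):

fn mutate(x: &mut u64) {
    *x += 1;
}

fn foo(a: &u64) {
    // shadow `a` with a mutable copy of the value it points to
    let mut a = *a;
    mutate(&mut a);
    println!("{}", a);
}

fn main() {
    foo(&41); // prints 42
}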
This seems to work on the Rust playground:
fn foo(mut a: &mut u64) {
    mutate(a);
}
The short answer is that you need to take a mutable parameter whose type is a mutable reference to a u64. I cannot provide a better answer, I'm still learning.
I am not 100% sure if I understand what you're asking, but it's worth a shot.

&&str.to_owned() doesn't result in a String

I've got the following code:
use std::collections::HashMap;
fn main() {
    let xs: Vec<&str> = vec!("a", "b", "c", "d");
    let ys: Vec<i32> = vec!(1, 2, 3, 4);
    let mut map: HashMap<String,i32> = HashMap::new();

    for (x,y) in xs.iter().zip(ys) {
        map.insert(x.to_owned(), y);
    }
    println!("{:?}", map);
}
Which results in error:
<anon>:8:20: 8:32 error: mismatched types:
expected `collections::string::String`,
found `&str`
(expected struct `collections::string::String`,
found &-ptr) [E0308]
<anon>:8 map.insert(x.to_owned(), y);
But it doesn't make sense to me. x should be &&str at this point. So why doesn't &&str.to_owned() automagically Deref the same way x.to_string() does at this point? (Why is x.to_owned() a &str?)
I know I can fix this by either using x.to_string(), or xs.into_iter() instead.
Because ToOwned is implemented for T where T: Clone, and Clone is implemented for &T. You need to roughly understand how receiver matching on &self works when both T and &T are available. Using a pseudo-syntax for exposition:
str → String
    str doesn't match &self
    &str (auto-ref) matches &self with self == str
    Thus ToOwned<str> kicks in.

&str → String
    &str matches &self with self == str
    Thus ToOwned<str> kicks in.

&&str → &str
    &&str matches &self with self == &str
    Thus ToOwned<&T> kicks in.
Note that in this case, auto-deref can never kick in, since &T will always match in cases where T might, which lowers the complexity a bit. Note also that auto-ref only kicks in once (and once more for each auto-deref'd type).
To copy from huon's much better answer than mine,
The core of the algorithm is:
For each "dereference step" U (that is, set U = T and then U = *T, ...)
    if there's a method bar where the receiver type (the type of self in the method) matches U exactly, use it (a "by value method")
    otherwise, add one auto-ref (take & or &mut of the receiver), and, if some method's receiver matches &U, use it (an "autorefd method")
FWIW, .into() is normally prettier than .to_owned() (especially when types are implied; oft even when not), so I suggest that here. You still need a manual dereference, though.
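For reference, a compilable sketch of those fixes applied to the question's code (only the first insert is active; the commented lines are equivalent alternatives):

use std::collections::HashMap;

fn main() {
    let xs: Vec<&str> = vec!["a", "b", "c", "d"];
    let ys: Vec<i32> = vec![1, 2, 3, 4];
    let mut map: HashMap<String, i32> = HashMap::new();

    for (x, y) in xs.iter().zip(ys) {
        // x is a &&str here; dereference once so ToOwned sees a &str and produces a String
        map.insert((*x).to_owned(), y);
        // equivalent alternatives:
        // map.insert(x.to_string(), y);
        // map.insert((*x).into(), y);
    }
    println!("{:?}", map);
}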
