Why doesn't &String automatically become &str in some cases?

In this toy example I'd like to map the items from a HashMap<String, String> with a helper function. There are two versions defined, one that takes arguments of the form &String and another with &str. Only the &String one compiles. I had thought that String always dereferences to &str but that doesn't seem to be the case here. What's the difference between a &String and a &str?
use std::collections::HashMap;
// &String works
fn process_item_1(key_value: (&String, &String)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}
// &str doesn't work (type mismatch in fn arguments)
fn process_item_2(key_value: (&str, &str)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}
fn main() {
    let mut map: HashMap<String, String> = HashMap::new();
    map.insert("a".to_string(), "b".to_string());
    for s in map.iter().map(process_item_2) { // <-- compile error on this line
        println!("{}", s);
    }
}
Here's the error for reference:
error[E0631]: type mismatch in function arguments
--> src/main.rs:23:29
|
12 | fn process_item_2(key_value: (&str, &str)) -> String {
| ---------------------------------------------------- found signature of `for<'r, 's> fn((&'r str, &'s str)) -> _`
...
23 | for s in map.iter().map(process_item_2) {
| ^^^^^^^^^^^^^^ expected signature of `fn((&String, &String)) -> _`
Thanks for your help with a beginner Rust question!

It gets even stranger than that:
map.iter().map(|s| process_item_2(s)) // Does not work
map.iter().map(|(s1, s2)| process_item_2((s1, s2))) // Works
The point is that Rust never performs any expensive coercion. Converting &String to &str is cheap: you just take the data pointer and length. But converting (&String, &String) to (&str, &str) is not so cheap anymore: you have to take the data pointer and length of the first string, then of the second string, then build a new tuple out of them (and if it were done for tuples, what about (((&String, &String, &String), &String), (&String, &String))? And presumably it would then have to be done for arrays too, so what about &[&String; 10_000]?). That's why the first closure fails. The second closure, however, destructures the tuple and rebuilds it. That means that instead of coercing a whole tuple, we coerce each &String separately and build a new tuple from the results. That's fine.
The version without the closure would be even more costly: since you're passing a function directly to map(), and the iterator produces (&String, &String) items, someone needs to convert each item to (&str, &str)! To do that, the compiler would have to introduce a shim - a small function that does that work: it takes a (&String, &String) and calls process_item_2() with the tuple coerced to (&str, &str). That's a hidden cost, so Rust (almost) never creates such shims. This is also why passing the function directly wouldn't work even if the items were plain &String, not just (&String, &String). And it's why |v| f(v) is not always the same as f - the first one performs coercions at the call site, while the second doesn't.
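If you want to keep the &str-based helper, one option (just a sketch of the destructure-and-rebuild idea above, made fully explicit with as_str()) is:
for s in map.iter().map(|(k, v)| process_item_2((k.as_str(), v.as_str()))) {
    println!("{}", s);
}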

Related

Why do I get a "type mismatch" error when passing a function directly but it works with an equivalent closure?

In my attempt to make my code a little bit cleaner, I tried passing a function directly by name to map() on an iterator over a &[&str]. map will need to iterate over &&str, which is understandable, but my function only takes &str.
Code example:
fn capitalize_first(word: &str) -> String {
    word.chars().next().unwrap().to_uppercase().to_string()
}
fn main() {
    let cap_cap = |word: &&str| -> String {
        word.chars().next().unwrap().to_uppercase().to_string()
    };
    let a = ["we", "are", "testing"];
    let b = &a; // this is where this becomes interesting.
    b.iter().map(|&x| capitalize_first(x)); // Works
    b.iter().map(|x| capitalize_first(x)); // Works
    b.iter().map(cap_cap); // That's what the compiler expects.
    b.iter().map(capitalize_first); // fails to compile!
}
Compiler error:
error[E0631]: type mismatch in function arguments
--> src/main.rs:17:18
|
1 | fn capitalize_first(word: &str) -> String {
| ----------------------------------------- found signature of `for<'r> fn(&'r str) -> _`
...
16 | b.iter().map(capitalize_first); // fails to compile!
| ^^^^^^^^^^^^^^^^ expected signature of `fn(&&str) -> _`
A couple of questions:
Why does map(|x| capitalize_first(x)) work while map(capitalize_first) doesn't? What sort of magic happens behind the scenes in the first case? In my view, those should be identical.
Is there a reason why Rust can't transparently convert between &&str and &str?
Is there a reason why Rust can't transparently convert between &&str and &str?
Rust does transparently convert from &&str to &str; you can see it in one of your working examples:
b.iter().map(|x| capitalize_first(x))
// ^ x is a &&str here ^ but seems to be a &str here
What sort of magic happens behind the scenes[?]
The "magic" that happens is due to type coercions, specifically Deref coercion. "Type coercions are implicit operations that change the type of a value".
Why does map(|x| capitalize_first(x)) work while map(capitalize_first) doesn't?
What you've discovered is that while there is a coercion from &&str to &str, there is no coercion from fn(&&str) to fn(&str). The set of coercions that are available is pretty restricted.
On top of that, there are only specific places where coercion can occur, but what makes the first case work is that, with the closure, a coercion site is available right where you call capitalize_first:
b.iter().map(|x| capitalize_first(x))
// ^ the compiler injects code here to do the coercion
// as if you had written capitalize_first(*x)
When you use the function directly, there is no such site where coercion can occur.
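If you do want to pass the function by name, one workaround (a sketch using the same b as above) is to make the iterator yield &str before map() ever sees the function:
b.iter().copied().map(capitalize_first); // copied() turns each &&str item into a &str, so the signatures line up
copied() is available here because &str is Copy; cloned() would behave the same.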

Proper signature for a function accepting an iterator of strings

I'm confused about the proper type to use for an iterator yielding string slices.
fn print_strings<'a>(seq: impl IntoIterator<Item = &'a str>) {
    for s in seq {
        println!("- {}", s);
    }
}
fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let it: std::str::Split<'_, char> = "a b c".split(' ');
    print_strings(&arr);
    print_strings(&vec);
    print_strings(it);
}
Using <Item = &'a str>, the arr and vec calls don't compile. If, instead, I use <Item = &'a &'a str>, they work, but the it call doesn't compile.
Of course, I can make the Item type generic too, and do
fn print_strings<'a, I: std::fmt::Display>(seq: impl IntoIterator<Item = I>)
but it's getting silly. Surely there must be a single canonical "iterator of string values" type?
The error you are seeing is expected, because seq is a &Vec<&str>, and &Vec<T> implements IntoIterator with Item = &T, so with your code you end up with Item = &&str where your signature expects Item = &str in all cases.
The correct way to do this is to loosen the Item type so that it can handle both &str and &&str. You can do this by using more generics, e.g.
fn print_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) {
    for s in seq {
        let s = s.as_ref();
        println!("- {}", s);
    }
}
This requires the Item to be something that you can retrieve a &str from, and then in your loop .as_ref() will return the &str you are looking for.
This has the added bonus that your code will also work with Vec<String> and any other item type that implements AsRef<str>.
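For example, all of the original calls, plus a Vec<String>, should now compile against that signature; a quick sketch of a main exercising it:
fn main() {
    let arr: [&str; 3] = ["a", "b", "c"];
    let vec: Vec<&str> = vec!["a", "b", "c"];
    let owned: Vec<String> = vec!["a".to_string(), "b".to_string()];
    print_strings(&arr);               // Item = &&str, which implements AsRef<str>
    print_strings(&vec);               // Item = &&str
    print_strings("a b c".split(' ')); // Item = &str
    print_strings(owned);              // Item = String
}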
TL;DR: The signature you use is fine; it's the callers that are providing iterators with the wrong Item - but that is easily fixed.
As explained in the other answer, print_strings() doesn't accept &arr and &vec because IntoIterator for &[T; N] and &Vec<T> yields references to T. This is because &Vec, itself a reference, is not allowed to consume the Vec in order to move T values out of it. What it can do is hand out references to the T items sitting inside the Vec, i.e. items of type &T. In the case of your callers that don't compile, the containers contain &str, so their iterators hand out &&str.
Other than making print_strings() more generic, another way to fix the issue is to call it correctly to begin with. For example, these all compile:
print_strings(arr.iter().map(|sref| *sref));
print_strings(vec.iter().copied());
print_strings(it);
Playground
iter() is the method provided by slices (and therefore available on arrays and Vec) that iterates over references to elements, just like IntoIterator of &Vec. We call it explicitly to be able to call map() to convert &&str to &str the obvious way - by using the * operator to dereference the &&str. The copied() iterator adapter is another way of expressing the same, possibly a bit less cryptic than map(|x| *x). (There is also cloned(), equivalent to map(|x| x.clone()).)
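For completeness, the cloned() form would look like this (a sketch, equivalent to the copied() call above):
print_strings(vec.iter().cloned()); // clones each &str out of the &&str items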
It's also possible to call print_strings() if you have a container with String values:
let v = vec!["foo".to_owned(), "bar".to_owned()];
print_strings(v.iter().map(|s| s.as_str()));

How to use a Box<String> as a lookup key for a hashmap with &str keys?

My hashmap keys are expected to be of type &str, in particular &'static str, and I've received an owned Box<String>.
How do I search my string against my hashmap?
use std::collections::HashMap;
fn main() {
    let mut map: HashMap<&str, u32> = HashMap::new();
    let static_string = "a";
    map.insert(static_string, 5);
    let owned_boxed_string = Box::new(String::from("a"));
    map.get(owned_boxed_string);   // mismatched type (ok)
    map.get(*owned_boxed_string);  // mismatched type (ok)
    map.get(&*owned_boxed_string); // trait bound not satisfied (?)
}
Let's look at the definition of HashMap::get():
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
    K: Borrow<Q>,
    Q: Hash + Eq,
In your code, K is &'static str and Q is deduced from each of the calls to get. In plain English, get() takes a reference to a type Q such that &'static str implements Borrow<Q>.
The rationale is that usually you will store keys of type String or the like, and you will search using values of type &str. And naturally String implements Borrow<str> so you can do that.
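For illustration, a sketch of that usual case with a hypothetical map using String keys:
use std::collections::HashMap;
fn main() {
    let mut scores: HashMap<String, u32> = HashMap::new();
    scores.insert("a".to_string(), 5);
    println!("{:?}", scores.get("a")); // Q is deduced as str, and String implements Borrow<str>
}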
In your case, your key is not String but &'static str... Take a look at the documentation for Borrow and look for what kind of Borrow<Q> is implemented by &'static str. I can only see these two blanket implementations:
impl<'_, T: ?Sized> Borrow<T> for &'_ T
impl<T: ?Sized> Borrow<T> for T
The first one states that if your map key is a reference &T, then it borrows as T, so you can call get(k: &T). The second one says that any type K borrows as itself, so you can call get(k: &K).
So for your particular case of &'static str they are realized as:
impl Borrow<str> for &'static str
impl Borrow<&'static str> for &'static str
From that you can deduce that your function is either get(k: &str) or get(k: &&str). Here the easiest choice is the first one.
Now, you may think that your last line (map.get(&*owned_boxed_string);) should work because that value is of type &str, but it actually is not: it is of type &String. If HashMap::get() took a plain &str it would compile fine (deref coercion would kick in), but since it takes a generic &Q, no coercion happens: Q is deduced to be String, and &'static str does not implement Borrow<String>, so it fails.
TL;DR: Add a type cast or an explicitly typed temporary variable:
map.get(&*owned_boxed_string as &str); //ok
map.get(&owned_boxed_string as &str); //also ok
let s: &str = &*owned_boxed_string;
map.get(s);
let s: &str = &owned_boxed_string;
map.get(s);
Although an easier solution (thanks to @harmic for the comment below) is to use String::as_str(), which takes care of as many derefs as required and just returns the necessary &str:
map.get(owned_boxed_string.as_str());
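For reference, an explicit double dereference also works, since *owned_boxed_string is a String and **owned_boxed_string is a str; a sketch:
map.get(&**owned_boxed_string); // Box<String> -> String -> str, then take a &str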

Dereferencing strings and HashMaps in Rust

I'm trying to understand how HashMaps work in Rust and I have come up with this example.
use std::collections::HashMap;
fn main() {
    let mut roman2number: HashMap<&'static str, i32> = HashMap::new();
    roman2number.insert("X", 10);
    roman2number.insert("I", 1);
    let roman_num = "XXI".to_string();
    let r0 = roman_num.chars().take(1).collect::<String>();
    let r1: &str = &r0.to_string();
    println!("{:?}", roman2number.get(r1)); // This works
    // println!("{:?}", roman2number.get(&r0.to_string())); // This doesn't
}
When I try to compile the code with the last line uncommented, I get the following error:
error: the trait bound `&str: std::borrow::Borrow<std::string::String>` is not satisfied [E0277]
println!("{:?}", roman2number.get(&r0.to_string()));
^~~
note: in this expansion of format_args!
note: in this expansion of print! (defined in <std macros>)
note: in this expansion of println! (defined in <std macros>)
help: run `rustc --explain E0277` to see a detailed explanation
The Trait implementation section of the docs gives the dereferencing as fn deref(&self) -> &str
So what is happening here?
The error is caused by the compiler selecting the generic function HashMap::get with Q = String during type inference, but you want HashMap::get with Q = str.
So just change
println!("{:?}", roman2number.get(&r0.to_string()));
to
println!("{:?}", roman2number.get::<str>(&r0.to_string()));
to make it explicit. This helps the compiler to select the right function.
Check out Playground here.
It looks to me like Deref coercion can only happen when the target type is known, so while the compiler is trying to infer which HashMap::get to use, it sees &r0.to_string() as a &String but never as a &str. And &'static str does not implement Borrow<String>, which results in a type error. When we specify HashMap::get::<str>, the function expects a &str, and deref coercion can then be applied to the &String to get a matching &str.
You can check out Deref coercion and String Deref for more details.
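A small sketch of that "target type known" point, reusing r0 and roman2number from the question:
let s: &String = &r0;
let t: &str = s;                       // target type is known here, so &String coerces to &str
println!("{:?}", roman2number.get(t)); // and get() with Q = str compiles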
The other answers are correct, but I wanted to point out that you have an unneeded to_string (you've already collected into a String) and an alternate way of coercing to a &str, using as:
let r0: String = roman_num.chars().take(1).collect();
println!("{:?}", roman2number.get(&r0 as &str));
In this case, I'd probably just rewrite the map to contain char as the key though:
use std::collections::HashMap;
fn main() {
    let mut roman2number = HashMap::new();
    roman2number.insert('X', 10);
    roman2number.insert('I', 1);
    let roman_num = "XXI";
    for c in roman_num.chars() {
        println!("{:?}", roman2number.get(&c));
    }
}
Note there's no need to have an explicit type for the map, it will be inferred.
The definition of the get method looks as follows:
fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V> where K: Borrow<Q>, Q: Hash + Eq
The first part is the type of the object you pass in: Q. The constraints on Q are that
the key-type K needs to implement the Borrow trait over Q
Q needs to implement the Hash and Eq traits.
Replacing this with your actual types means that the key-type &'static str needs to implement Borrow<String>. By the definition of Borrow, this means that a &'static str needs to be convertible to &String. But all the docs/texts I've read state that everywhere you'd use &String you should be using &str instead. So it makes little sense to offer a &str -> &String conversion, even if it would make life a little easier sometimes.
Since every reference type is borrowable as a shorter-lived reference type, you can pass a &str when the key type is &'static str, because &'static str implements Borrow<str>.
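In other words, the lookups that compile are the ones that hand get() a &str; a sketch, reusing the map and r0 from the question:
println!("{:?}", roman2number.get("X"));         // Q = str; &'static str implements Borrow<str>
println!("{:?}", roman2number.get(r0.as_str())); // same, after converting the String to a &str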

&&str.to_owned() doesn't result in a String

I've got the following code:
use std::collections::HashMap;
fn main() {
    let xs: Vec<&str> = vec!["a", "b", "c", "d"];
    let ys: Vec<i32> = vec![1, 2, 3, 4];
    let mut map: HashMap<String, i32> = HashMap::new();
    for (x, y) in xs.iter().zip(ys) {
        map.insert(x.to_owned(), y);
    }
    println!("{:?}", map);
}
Which results in error:
<anon>:8:20: 8:32 error: mismatched types:
expected `collections::string::String`,
found `&str`
(expected struct `collections::string::String`,
found &-ptr) [E0308]
<anon>:8 map.insert(x.to_owned(), y);
But it doesn't make sense to me. x should be &&str at this point. So why doesn't &&str.to_owned() automagically Deref the same way x.to_string() does at this point? (Why is x.to_owned() a &str?)
I know I can fix this by either using x.to_string(), or xs.into_iter() instead.
Because ToOwned is implemented for T where T: Clone, and Clone is implemented for &T. You need to roughly understand how receiver matching against &self works when both T and &T are available. Using a pseudo-syntax for exposition:
str → String:
    str doesn't match &self
    &str (auto-ref) matches &self with self == str
    Thus ToOwned<str> kicks in.
&str → String:
    &str matches &self with self == str
    Thus ToOwned<str> kicks in.
&&str → &str:
    &&str matches &self with self == &str
    Thus ToOwned<&T> kicks in.
Note that in this case, auto-deref can never kick in, since &T will always match in cases where T might, which lowers the complexity a bit. Note also that auto-ref only kicks in once (and once more for each auto-deref'd type).
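Here's a tiny sketch of what that resolution means in practice (the types in the comments are what the compiler infers):
let s: &&str = &"hi";
let still_a_slice = s.to_owned();        // &str: the Clone-based ToOwned impl for &str wins
let a_string: String = (*s).to_owned();  // deref first, then ToOwned for str produces a String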
To copy from huon's much better answer than mine,
The core of the algorithm is:
For each "dereference step" U (that is, set U = T and then U = *T, ...)
if there's a method bar where the receiver type (the type of self in the method) matches U exactly, use it (a "by value method")
otherwise, add one auto-ref (take & or &mut of the receiver), and, if some method's receiver matches &U, use it (an "autorefd method")
FWIW, .into() is normally prettier than .to_owned() (especially when types are implied; often even when not), so I suggest that here. You still need a manual dereference, though.
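A sketch of that suggestion applied to the loop from the question; the explicit *x dereference turns the &&str into the &str that Into<String> is implemented for:
for (x, y) in xs.iter().zip(ys) {
    map.insert((*x).into(), y); // *x is a &str; (*x).into() builds the String the map expects
}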
