I was reading the docs on the Rust language website, and in chapter 4 they give the following example:
let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];
hello is of type &str that I created from a variable s of type String.
A few lines further down they define the following function:
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }
    &s[..]
}
This time s is of type &String but still &s[0..i] gave me a &str slice.
How is it possible? I thought that the correct way to achieve this would be something like &((*str)[0..i]).
Am I missing something? Maybe during the [0..i] operation Rust auto-dereferences the variable?
Thanks
Maybe during the [0..i] operation Rust auto-dereferences the variable?
This is exactly what happens. When you call a method on a reference, or index it, Rust automatically dereferences it before applying the operation. This behavior can also be implemented for your own types with the Deref trait. String implements Deref with a target of str, which means you can call str methods on a String. Read more about deref coercion here.
It's important to realize what happens with &s[1..5]: it parses as &(s[1..5]). That is, s[1..5] is evaluated first, which yields a value of type str, and then a reference to that value is taken. In fact, there's even more indirection: x[y] in Rust is actually syntactic sugar for *std::ops::Index::index(&x, y). Note the dereference: this function always returns a reference, which is dereferenced by the sugar and then referenced again by the & in our code. Naturally, the compiler will optimize this and ensure we are not pointlessly taking references only to dereference them again.
It so happens that the String type implements the Index<Range<usize>> trait, and its Index::Output type is str.
It also happens that the str type implements the same trait, and that its output type is also str, via a blanket implementation over SliceIndex.
On your question of auto-dereferencing: it is true that String also implements the Deref trait, so that in many contexts, such as this one, &String is automatically converted to &str. Any context that accepts a &str also accepts a &String, which means the implementation of Index<Range<usize>> on String is really an optimization that avoids this indirection. If it were not there, the code would still work, and perhaps the compiler could even optimize the indirection away.
But that automatic casting is not why it works; it simply works because indexing is defined on many different types.
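A minimal sketch of the points above, using only the standard library. The helper names first_word_string, first_word_str and hello_desugared are made up for illustration. It shows the same slicing expression compiling against both a &String and a &str, and spells out the Index::index call that the indexing sugar stands for.

```rust
use std::ops::Index;

// Hypothetical helper: slices through a &String, as in the book's example.
fn first_word_string(s: &String) -> &str {
    match s.find(' ') {
        Some(i) => &s[0..i],
        None => &s[..],
    }
}

// Hypothetical helper: identical body, but the parameter is a &str.
fn first_word_str(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[0..i],
        None => &s[..],
    }
}

// Spelled-out form of `&s[0..5]`: Index::index already returns a reference,
// so the `*` from the sugar and the `&` in front of it cancel out.
fn hello_desugared(s: &String) -> &str {
    Index::index(s, 0usize..5)
}

fn main() {
    let s = String::from("hello world");
    println!("{}", first_word_string(&s));
    println!("{}", first_word_str(&s)); // &String is coerced to &str here
    println!("{}", hello_desugared(&s));
}
```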
Finally:
I thought that the correct way to achieve this would be something like &((*str)[0..i]).
This would not work regardless: a &str is not the same as a &String and cannot be dereferenced to a String the way a &String can. In fact, a &str is in many ways closer to a String than to a &String. A &str is really just a fat pointer to a sequence of UTF-8 bytes, with the length of that sequence in the second word; a String is, if you will, an extra-fat pointer that also carries the current capacity of the buffer and owns the buffer it points to, so it can free and resize it.
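The pointer widths described above can be checked with std::mem::size_of. This is only a sketch: the words helper is hypothetical, and the three-word size of String is what current standard library implementations use, not a documented layout guarantee.

```rust
use std::mem::size_of;

// Hypothetical helper: size of T measured in machine words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // &String is a thin pointer to the String value.
    println!("&String: {} word(s)", words::<&String>());
    // &str is a fat pointer: address plus length.
    println!("&str:    {} word(s)", words::<&str>());
    // String carries pointer, length and capacity.
    println!("String:  {} word(s)", words::<String>());
}
```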
I am new to Rust, and I see that the * operator can be overloaded through the Deref trait. The std::string::String type implements Deref, and its deref method returns &str. However, when I run the following test, the compiler tells me the type of s2 is str, with the error message "size for values of type str cannot be known at compilation time", so the code does not compile. Why is s2 a str? Shouldn't it have the same type as s1?
let owned = "test".to_string(); // owned type is String
let s1 = owned.deref(); // s1 type is &str
let s2 = *owned; // s2 type is str
Deref is a bit of a special trait in Rust, and the rules can be found in the docs. There are some other places where Deref coercion occurs, but since you asked about unary *, the first rule on that page is relevant to you.
If T implements Deref<Target = U>, and x is a value of type T, then:
In immutable contexts, *x (where T is neither a reference nor a raw pointer) is equivalent to *Deref::deref(&x).
So after Deref::deref is called, Rust tries the unary * again. This can invoke Deref on some other type, as seen in this question. This is also the same way C++'s overloadable -> operator works. It does some sort of (user-defined) coercion and then tries to dereference again, which may recursively call -> on something else.
So this
let s2 = *owned;
is equivalent to
let s2 = *owned.deref();
And has type str. str is not a Sized type and hence can't be stored in a variable, which causes your error.
As for why Rust does this, the Deref trait is defined to take a reference and return a reference. This makes sense, since it's coercing some sort of reference behind the scenes, not actually creating data. Nine times out of ten, Deref simply returns a reference to some inner data on the outer structure (Box being a prime example of this).
On the other hand, when you as the programmer write *, you clearly don't want a reference. After all, you just went out of your way to dereference the data. So the * allows deref-coercion through the Deref trait but then still tries to take ownership of (or copy, if applicable) the data after coercion is finished.
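To make the rule concrete, here is a small sketch with a made-up wrapper type, Meters, whose Deref target is Copy. The functions via_sugar and via_trait are hypothetical names; they show that *x and *Deref::deref(&x) produce the same value. The String case is included for contrast, where the result of * is unsized and has to go straight back behind a reference.

```rust
use std::ops::Deref;

// Hypothetical wrapper type, used only for illustration.
struct Meters(f64);

impl Deref for Meters {
    type Target = f64;

    fn deref(&self) -> &f64 {
        &self.0
    }
}

// The sugared form: `*m` on a non-reference type goes through Deref.
fn via_sugar(m: Meters) -> f64 {
    *m
}

// What the sugar expands to; f64 is Copy, so the value is copied out.
fn via_trait(m: Meters) -> f64 {
    *Deref::deref(&m)
}

fn main() {
    println!("{}", via_sugar(Meters(2.5)));
    println!("{}", via_trait(Meters(2.5)));

    let owned = "test".to_string();
    // `*owned` has the unsized type str, so it is only usable behind a reference.
    let s: &str = &*owned;
    println!("{}", s);
}
```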
Let's take another look at the relationship between Deref and the dereference operator:
use std::ops::Deref; // brings the deref() method into scope
let owned = "test".to_string(); // owned type is String
let s1 = owned.deref(); // type of s1 is &str
let s2 = &*owned; // type of s2 is also &str
//let s3 = *owned; // if this compiled, type of s3 would be str
Note how *x expands to *x.deref(), not to x.deref() itself. You can therefore think of deref() as a "pre-processing" step that runs before the actual dereference operator is applied. This is why the above example needed &*owned, and why *owned doesn't compile even though owned.deref() compiles just fine.
In my code I often want to calculate a new value A, and then return some view of that value B, because B is a type that's more convenient to work with. The simplest case is where A is a vector and B is a slice that I would like to return. Let's say I want to write a function that returns a set of indices. Ideally this would return a slice directly because then I can use it immediately to index a string.
If I return a vector instead of a slice, I have to use to_slice:
fn all_except(except: usize, max: usize) -> Vec<usize> {
    (0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
string indices are ranges of `usize`
the type `str` cannot be indexed by `Vec<usize>`
help: the trait `SliceIndex<str>` is not implemented for `Vec<usize>`
I can't return a slice directly:
fn all_except(except: usize, max: usize) -> &[usize] {
    (0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
^ expected named lifetime parameter
missing lifetime specifier
help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
help: consider using the `'static` lifetime
I can't even return the underlying vector and a slice of it, for the same reason
pub fn except(index: usize, max: usize) -> (&[usize], Vec<usize>) {
    let v: Vec<usize> = (0..index).chain((index + 1)..max).collect();
    (v.as_slice(), v)
}
"abcdefg"[all_except(1, 7)[1]
Now it may be possible to hack this particular example using deref coercion (I'm not sure), but I have encountered this problem with more complex types. For example, I have a function that loads an ndarray::Array2<T> from a CSV file and then splits it into two parts using array.split_at(), but this returns two ArrayView2<T> values that reference the original Array2<T>, so I run into the same issue. I'm wondering whether there's a general solution to this problem. Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
Your code doesn't seem to make sense: you can't index a string using a slice. If you could, the first snippet would have worked with just an as_slice in the caller or something similar, since Vecs trivially coerce to slices. That's exactly what the compiler error is telling you: the compiler is looking for a SliceIndex, and a Vec (or slice) is definitely not that.
That aside,
Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
There are packages like owning_ref which can bundle owner and reference to avoid extra allocations. It tends to be somewhat fiddly.
I don't think there's any other general solution. Rust reasons at the function level, and the type checker has no notion of "tell the compiler to move A into the parent scope". So you need a construct that works around the borrow checker.
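One common workaround, sketched here on the assumption that handing the owner back to the caller is acceptable: return the owned Vec and let the caller borrow the view from it. The pick function is a hypothetical helper, not a standard one; it selects characters by position, since str itself cannot be indexed by a list of indices.

```rust
// Returns the owner; the caller decides how long it lives.
fn all_except(except: usize, max: usize) -> Vec<usize> {
    (0..except).chain((except + 1)..max).collect()
}

// Hypothetical helper: builds a new String from the characters at `indices`.
// Out-of-range indices are skipped.
fn pick(text: &str, indices: &[usize]) -> String {
    let chars: Vec<char> = text.chars().collect();
    indices.iter().filter_map(|&i| chars.get(i).copied()).collect()
}

fn main() {
    let owner = all_except(1, 7); // A: lives in the caller's frame
    let view: &[usize] = &owner;  // B: borrowed from A, valid while A is alive
    println!("{}", pick("abcdefg", view)); // prints "acdefg"
}
```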
A friend asked me to explain the following quirk in Rust. I was unable to, hence this question:
fn main() {
    let l: Vec<String> = Vec::new();
    //let ret = l.contains(&String::from(func())); // works
    let ret = l.contains(func()); // does not work
    println!("ret: {}", ret);
}
fn func() -> &'static str {
    "hello"
}
Example on the Rust Playground
The compiler will complain like this:
error[E0308]: mismatched types
 --> src/main.rs:4:26
  |
4 |     let ret = l.contains(func()); // does not work
  |                          ^^^^^^ expected struct `std::string::String`, found str
  |
  = note: expected type `&std::string::String`
             found type `&'static str`
In other words, &str does not coerce to &String.
At first I thought it was to do with 'static, however that is a red herring.
The commented line fixes the example at the cost of an extra allocation.
My questions:
Why doesn't &str coerce to &String?
Is there a way to make the call to contains work without the extra allocation?
Your first question should already be answered by @Marko.
Your second question is easy to answer as well; just use a closure:
let ret = l.iter().any(|x| x == func());
Edit:
This is no longer the "real" answer, but I'll leave it here for people who might be interested in a solution.
It seems that the Rust developers intend to adjust the signature of contains to allow the example posted above to work.
In some sense, this is a known bug in contains. It sounds like the fix won't allow those types to coerce, but will allow the above example to work.
std::string::String is a growable, heap-allocated data structure, whereas a string slice (str) is an immutable, fixed-length string somewhere in memory. A string slice is used as a borrowed type, via &str. Think of it as a view of some string data that resides somewhere in memory. That is why it does not make sense for str to coerce to String, while the other way around makes perfect sense: you have a heap-allocated String somewhere in memory and you want to use a view (a string slice) of that string.
To answer your second question: there is no way to make the code work in its current form. You either need to change to a vector of string slices (that way, there will be no extra allocation) or use something other than the contains method.
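Both options mentioned in the answers can be sketched as follows. The function names contains_str and contains_slice are made up for illustration; neither version allocates a new String for the comparison.

```rust
fn func() -> &'static str {
    "hello"
}

// Option 1: keep Vec<String> and compare element by element.
fn contains_str(haystack: &[String], needle: &str) -> bool {
    haystack.iter().any(|s| s == needle)
}

// Option 2: with a collection of string slices, `contains` works directly.
// It expects a `&T`, which here is a `&&str`.
fn contains_slice(haystack: &[&str], needle: &str) -> bool {
    haystack.contains(&needle)
}

fn main() {
    let owned: Vec<String> = vec![String::from("hello"), String::from("world")];
    println!("{}", contains_str(&owned, func()));

    // Borrow the Strings as slices; no string data is copied.
    let borrowed: Vec<&str> = owned.iter().map(String::as_str).collect();
    println!("{}", contains_slice(&borrowed, func()));
}
```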
I was reading through the book section about Strings and found they were using &* combined together to convert a piece of text. The following is what it says:
use std::net::TcpStream;
TcpStream::connect("192.168.0.1:3000"); // Parameter is of type &str.
let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // Convert `addr_string` to &str.
In other words, they are saying they are converting a String to a &str. But why is that conversion done using both of the aforementioned signs? Should this not be done using some other method? Does not the & mean we are taking its reference, then using the * to dereference it?
In short: the * triggers an explicit deref, which can be overloaded via ops::Deref.
More Detail
Look at this code:
let s = "hi".to_string(); // : String
let a = &s;
What's the type of a? It's simply &String! This shouldn't be very surprising, since we take the reference of a String. Ok, but what about this?
let s = "hi".to_string(); // : String
let b = &*s; // equivalent to `&(*s)`
What's the type of b? It's &str! Wow, what happened?
Note that *s is evaluated first. Like most operators, the dereference operator * is overloadable, and its use can be considered syntactic sugar for *std::ops::Deref::deref(&s) (note that we are recursively dereferencing here!). String does overload this operator:
impl Deref for String {
    type Target = str;
    fn deref(&self) -> &str { ... }
}
So, *s is actually *std::ops::Deref::deref(&s), in which the deref() function has the return type &str which is then dereferenced again. Thus, *s has the type str (note the lack of &).
Since str is unsized and not very handy on its own, we'd like to have a reference to it instead, namely &str. We can do this by adding a & in front of the expression! Tada, now we reached the type &str!
&*s is rather the manual and explicit form. Often, the Deref-overload is used via automatic deref coercion. When the target type is fixed, the compiler will deref for you:
fn takes_string_slice(_: &str) {}
let s = "hi".to_string(); // : String
takes_string_slice(&s); // this works!
In general, &* means to first dereference (*) and then reference (&) a value. In many cases, this would be silly, as we'd end up at the same thing.
However, Rust has deref coercions. Combined with the Deref and DerefMut traits, a type can dereference to a different type!
This is useful for Strings as that means that they can get all the methods from str, it's useful for Vec<T> as it gains the methods from [T], and it's super useful for all the smart pointers, like Box<T>, which will have all the methods of the contained T!
Following the chain for String:
String --deref--> str --ref--> &str
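A short sketch of the same chain for the other types mentioned above. The helpers len_of and sum are hypothetical; the point is only that references to String, Vec<T> and Box<T> are accepted where a reference to their Deref target is expected.

```rust
// Hypothetical helpers that only accept the borrowed "view" types.
fn len_of(s: &str) -> usize {
    s.len()
}

fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let s = String::from("hi");
    let v = vec![1, 2, 3];
    let boxed = Box::new(String::from("boxed"));

    println!("{}", len_of(&s));     // &String -> &str
    println!("{}", sum(&v));        // &Vec<i32> -> &[i32]
    println!("{}", len_of(&boxed)); // &Box<String> -> &String -> &str
    println!("{:?}", v.first());    // a slice method called directly on a Vec
}
```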
Does not the & mean we are taking its reference, then using the * to dereference it?
No, your order of operations is backwards. * and & associate to the right. In this example, dereferencing is first, then referencing.
I think now you can do this &addr_string
(from a comment)
Sometimes, this will do the same thing. See What are Rust's exact auto-dereferencing rules? for the full details, but yes, a &String can be passed to a function that requires a &str. There are still times where you need to do this little dance by hand. The most common I can think of is:
let name: Option<String> = Some("Hello".to_string());
let name2: Option<&str> = name.as_ref().map(|s| &**s);
You'll note that we actually dereference twice:
&String --deref--> String --deref--> str --ref--> &str
Although this case can now be done with name.as_ref().map(String::as_str);
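For comparison, here are the two forms above side by side with Option::as_deref, which is not mentioned in the original answer but has been in the standard library since Rust 1.40. The function names are made up for illustration.

```rust
// The manual "little dance": &String -> String -> str -> &str.
fn by_hand(name: &Option<String>) -> Option<&str> {
    name.as_ref().map(|s| &**s)
}

// Naming the conversion instead of spelling out the dereferences.
fn with_as_str(name: &Option<String>) -> Option<&str> {
    name.as_ref().map(String::as_str)
}

// Option::as_deref does the as_ref + deref step in one call.
fn with_as_deref(name: &Option<String>) -> Option<&str> {
    name.as_deref()
}

fn main() {
    let name: Option<String> = Some("Hello".to_string());
    println!("{:?}", by_hand(&name));
    println!("{:?}", with_as_str(&name));
    println!("{:?}", with_as_deref(&name));
}
```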
There's an answer here for how to capitalize the ASCII characters in a string. This is not quite adequate for my specific problem.
For anyone who wanders in off Google, things have improved since Rust 0.13.0. (I'm on Rust 1.13.0)
&str (string slice) provides to_uppercase() and you can coerce anything to it.
// First, with an &str, which any string can coerce to.
let str = "hello øåÅßç";
let up = str.to_uppercase();
println!("{}", up);
// And again with a String
let nonstatic_string = String::from(str);
let up2 = nonstatic_string.to_uppercase();
println!("{}", up2);
// ...if this fails in an edge case, use &nonstatic_string to force coercion
Regarding Matthieu M.'s comment, this conversion should be fully Unicode-compliant.
Here's a runnable copy on the Rust Playground
In case it helps newcomers to understand what's going on:
Strings implement Deref<Target=str>, so you can call any str methods on String.
str::to_uppercase() takes &str (a borrowed, immutable slice; in other words, a read-only reference to string data), so it'll accept pretty much anything.
str::to_uppercase() allocates a new String and returns it, so its return value isn't constrained by the borrowing rules for the input.
The caveat mentioned in the comment at the end is that, if you call my_string.to_uppercase() as str::to_uppercase(my_string), then it'll complain like this:
expected &str, found struct `std::string::String`
What's happening is that my_string.to_uppercase() isn't actually equivalent to that... it's equivalent to str::to_uppercase(&my_string).
(You can't auto-deref if you're not starting with a reference. You don't need to provide the & when making a method call because the &self in the method definition does it for you.)
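The difference between the two call forms can be sketched like this; shout_method and shout_path are hypothetical names.

```rust
// Method-call form: the compiler inserts the borrow and the deref for you.
fn shout_method(my_string: String) -> String {
    my_string.to_uppercase()
}

// Path form: you must pass a reference yourself; &String then coerces to &str.
// Writing str::to_uppercase(my_string) without the & is rejected.
fn shout_path(my_string: String) -> String {
    str::to_uppercase(&my_string)
}

fn main() {
    println!("{}", shout_method(String::from("hello")));
    println!("{}", shout_path(String::from("hello")));
}
```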
I don't think there's any function to do it directly, but you can use the functions in the UnicodeChar trait with chars and map like:
let str = "hello øåÅßç";
let up = str.chars().map(|c| c.to_uppercase()).collect::<String>();
println!("{}", up);
Output:
HELLO ØÅÅßÇ
Tested on rustc 0.13.0-dev (66601647c 2014-11-27 06:41:17 +0000)
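On current Rust the snippet above needs a small change: char::to_uppercase now returns an iterator, because one character can uppercase to several (ß becomes SS). A sketch of the per-character version for current Rust, with upper_by_char as a made-up name:

```rust
// flat_map flattens the per-character iterators into one stream of chars.
fn upper_by_char(s: &str) -> String {
    s.chars().flat_map(|c| c.to_uppercase()).collect()
}

fn main() {
    let text = "hello øåÅßç";
    println!("{}", upper_by_char(text));
    // str::to_uppercase gives the same result without the manual mapping.
    println!("{}", text.to_uppercase());
}
```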