How do I transform &str to ~str in Rust?

This is for the current 0.6 Rust trunk by the way, not sure the exact commit.
Let's say I want to for each over some strings, and my closure takes a borrowed string pointer argument (&str). I want my closure to add its argument to an owned vector of owned strings ~[~str] to be returned. My understanding of Rust is weak, but I think that strings are a special case where you can't dereference them with * right? How do I get my strings from &str into the vector's push method which takes a ~str?
Here's some code that doesn't compile
fn read_all_lines() -> ~[~str] {
    let mut result = ~[];
    let reader = io::stdin();
    let util = @reader as @io::ReaderUtil;
    for util.each_line |line| {
        result.push(line);
    }
    result
}
It doesn't compile because it's inferring result's type to be [&str] since that's what I'm pushing onto it. Not to mention its lifetime will be wrong since I'm adding a shorter-lived variable to it.
I realize I could use ReaderUtil's read_line() method which returns a ~str. But this is just an example.
So, how do I get an owned string from a borrowed string? Or am I totally misunderstanding.

You should call the StrSlice trait's method, to_owned, as in:
fn read_all_lines() -> ~[~str] {
    let mut result = ~[];
    let reader = io::stdin();
    let util = @reader as @io::ReaderUtil;
    for util.each_line |line| {
        result.push(line.to_owned());
    }
    result
}
StrSlice trait docs are here:
http://static.rust-lang.org/doc/core/str.html#trait-strslice
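For readers on a current toolchain: the syntax above is from Rust 0.6 and no longer compiles. As a rough sketch in today's Rust, assuming that ~str corresponds to String and ~[~str] to Vec<String>, the same idea looks like this. The helper name collect_owned_lines is made up for the illustration, and it takes the text as a parameter instead of reading stdin.

```rust
/// Copies every borrowed line of `text` into an owned `String`.
/// `collect_owned_lines` is a hypothetical helper name, not a std API.
fn collect_owned_lines(text: &str) -> Vec<String> {
    let mut result = Vec::new();
    for line in text.lines() {
        // `line` is a `&str` borrowed from `text`; `to_owned` allocates a copy.
        result.push(line.to_owned());
    }
    result
}
```

The returned vector owns its strings, so it can outlive the text it was built from.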

You can't.
For one, it doesn't work semantically: a ~str promises that only one thing owns it at a time. But a &str is borrowed, so what happens to the place you borrowed from? It has no way of knowing that you're trying to steal away its only reference, and it would be pretty rude to trash the caller's data out from under it besides.
For another, it doesn't work logically: ~-pointers and @-pointers are allocated in completely different heaps, and a & doesn't know which heap, so it can't be converted to ~ and still guarantee that the underlying data lives in the right place.
So you can either use read_line or make a copy, which I'm... not quite sure how to do :)
I do wonder why the API is like this, when & is the most restricted of the pointers. ~ should work just as well here; it's not like the iterated strings already exist somewhere else and need to be borrowed.

At first I thought it was possible to use copy line to create an owning pointer from the borrowed pointer to the string, but this apparently copies the borrowed pointer.
So I found str::from_slice(s: &str) -> ~str. This is probably what you need.

Related

Confused about ownership in situations involving lines and map

fn problem() -> Vec<&'static str> {
    let my_string = String::from("First Line\nSecond Line");
    my_string.lines().collect()
}
This fails with the compilation error:
  |
7 |     my_string.lines().collect()
  |     ---------^^^^^^^^^^^^^^^^^^
  |     |
  |     returns a value referencing data owned by the current function
  |     `my_string` is borrowed here
I understand what this error means - it's to stop you returning a reference to a value which has gone out of scope. Having looked at the type signatures of the functions involved, it appears that the problem is with the lines method, which borrows the string it's called on. But why does this matter? I'm iterating over the lines of the string in order to get a vector of the parts, and what I'm returning is this "new" vector, not anything that would (illegally) directly reference my_string.
(I'm aware I could fix this particular example very easily by just using the string literal rather than converting to an "owned" string with String::from. This is a toy example to reproduce the problem - in my "real" code the string variable is read from a file, so I obviously can't use a literal.)
What's even more mysterious to me is that the following variation on the function, which to me ought to suffer from the same problem, works fine:
fn this_is_ok() -> Vec<i32> {
    let my_string = String::from("1\n2\n3\n4");
    my_string.lines().map(|n| n.parse().unwrap()).collect()
}
The reason can't be map doing some magic, because this also fails:
fn also_fails() -> Vec<&'static str> {
    let my_string = String::from("First Line\nSecond Line");
    my_string.lines().map(|s| s).collect()
}
I've been playing about for quite a while, trying various different functions inside the map - and some pass and some fail, and I've honestly no idea what the difference is. And all this is making me realise that I have very little handle on how Rust's ownership/borrowing rules work in non-trivial cases, even though I thought I at least understood the basics. So if someone could give me a relatively clear and comprehensive guide to what is going on in all these examples, and how it might be possible to fix those which fail, in some straightforward way, I would be extremely grateful!
The key is in the type of the value yielded by lines: &str. In order to avoid unnecessary clones, lines actually returns references to slices of the string it's called on, and when you collect it into a Vec, that Vec's elements are simply references to slices of your string. So when your function exits and the string is dropped, the references inside the Vec would be left dangling, which is exactly what the compiler refuses to allow. Remember, &str is a borrowed string, and String is an owned string.
The parsing works because you take those &strs then you read them into an i32, so the data is transferred to a new value and you no longer need a reference to the original string.
To fix your problem, simply use str::to_owned to convert each element into a String:
fn problem() -> Vec<String> {
    let my_string = String::from("First Line\nSecond Line");
    my_string.lines().map(|v| v.to_owned()).collect()
}
It should be noted that to_string also works, and that to_owned is actually part of the ToOwned trait, so it is useful for other borrowed types as well.
For references to sized values (str is unsized so this doesn't apply), such as an Iterator<Item = &i32>, you can simply use Iterator::cloned to clone every element so they are no longer references.
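As a small sketch of that last point (the function name owned_numbers is only illustrative), cloned turns an iterator over &i32 into an iterator over i32, so the result no longer borrows from the local vector:

```rust
/// Builds a local vector and returns owned copies of its elements.
/// `owned_numbers` is a hypothetical name used only for this illustration.
fn owned_numbers() -> Vec<i32> {
    let numbers = vec![10, 20, 30];
    numbers
        .iter()   // yields `&i32`, borrowed from `numbers`
        .cloned() // `&i32` -> `i32`, so nothing borrows `numbers` any more
        .collect()
}
```

Without the cloned call the function would have to return Vec<&i32>, and it would fail for the same reason as the original problem function.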
An alternative solution would be to take the string as a borrowed argument, so that it, and therefore references into it, can outlive the function call:
fn problem(my_string: &str) -> Vec<&str> {
    my_string.lines().collect()
}
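Here is a sketch of how a caller might use that borrowed-argument version. The names split_lines and count_nonempty_lines are invented for the example; the point is only that the caller keeps the String alive for as long as the slices are in use.

```rust
/// Same shape as the borrowed-argument version: the slices borrow from `text`.
fn split_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

/// Hypothetical caller: it owns the `String`, so the slices stay valid here.
fn count_nonempty_lines() -> usize {
    let my_string = String::from("First Line\n\nSecond Line");
    let lines = split_lines(&my_string); // `&String` coerces to `&str`
    lines.iter().filter(|l| !l.is_empty()).count()
}
```

The vector of slices never leaves the scope that owns my_string, so no copying is needed.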
The problem here is that this line:
let my_string = String::from("First Line\nSecond Line");
copies the string data to a buffer allocated on the heap (so no longer 'static). Then lines returns references to that heap-allocated buffer.
Note that &str also implements a lines method, so you don't need to copy the string data to the heap, you can use your string directly:
fn problem() -> Vec<&'static str> {
    let my_string = "First Line\nSecond Line";
    my_string.lines().collect()
}
Playground
which avoids all unnecessary allocations and copying.

Slice of String vs Slice &String

I was reading the doc from rust lang website and in chapter 4 they did the following example:
let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];
hello is of type &str that I created from a variable s of type String.
Some rows below they define the following function:
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }
    &s[..]
}
This time s is of type &String but still &s[0..i] gave me a &str slice.
How is it possible? I thought that the correct way to achieve this would be something like &((*str)[0..i]).
Am I missing something? Maybe during the [0..i] operation Rust auto-dereferences the variable?
Thanks
Maybe during the [0..i] operation Rust auto-dereferences the variable?
This is exactly what happens. When you call a method on a reference, or index it, Rust automatically dereferences it before applying the operation. This behavior can also be implemented for your own types with the Deref trait. String implements Deref with a target of str, which means you can call str methods on a String. Read more about deref coercion here.
It's important to realize what happens with &s[1..5]: it parses as &(s[1..5]), namely, s[1..5] is evaluated first, which yields a value of type str, and then a reference to that value is taken. In fact, there's even more indirection: x[y] in Rust is actually syntactic sugar for *std::ops::Index::index(&x, y). Note the dereference: this function always returns a reference, which is dereferenced by the sugar and then referenced again by the & in our code. Naturally, the compiler will optimize this and ensure we are not pointlessly taking references only to dereference them again.
It so happens that the String type implements the Index<Range<usize>> trait, and its Index::Output type is str.
It also happens that the str type implements the same trait, and that its Output type is also str, via a blanket implementation over SliceIndex.
On your question of auto-dereferencing, it is true that String also implements the Deref trait, so that in many contexts, such as this one, &String is automatically coerced to &str. Nearly any context that accepts a &str also accepts a &String, meaning that the implementation of Index<Range<usize>> on String is really there to avoid this indirection. If it were not there, the code would still work, and perhaps the compiler could even optimize the indirection away.
But that automatic casting is not why it works — it simply works because indexing is defined on many different types.
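To make the indexing sugar a bit more concrete, here is a sketch that spells the same slice three ways. The function name first_five is made up, and the &String parameter is kept only to mirror the question.

```rust
use std::ops::Index;

/// Three spellings of the same slice; `first_five` is a hypothetical name.
fn first_five(s: &String) -> (&str, &str, &str) {
    // Indexing sugar applied directly to the `&String`.
    let sugar = &s[0..5];
    // Dereference to `String` by hand, then index it.
    let explicit_deref = &(*s)[0..5];
    // Call the `Index` trait method and re-borrow its result.
    let desugared = &*Index::index(s, 0..5);
    (sugar, explicit_deref, desugared)
}
```

All three expressions have type &str and point at the same bytes of the original String.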
Finally:
I thought that the correct way to achieve this would be something like &((*str)[0..i]).
This would not work regardless: a &str is not the same as a &String and cannot be dereferenced to a String the way a &String can. In fact, a &str is in many ways closer to a String than it is to a &String. A &str is really just a fat pointer to a sequence of UTF-8 bytes that also carries the length of that sequence in its second word; a String is, if you will, an extra-fat pointer that also stores the current capacity of the buffer and owns the buffer it points to, so it can free and resize it.
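One way to see the "fat pointer" description is to measure the types. This is only a sketch: the helper name widths_in_words is invented, and the exact layout of String is an implementation detail of the standard library, not a language guarantee, although it holds on current toolchains.

```rust
use std::mem::size_of;

/// Returns the sizes of `&String`, `&str`, and `String`, measured in machine words.
/// `widths_in_words` is a hypothetical helper for this illustration.
fn widths_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&String>() / word, // thin pointer to the String itself
        size_of::<&str>() / word,    // pointer + length
        size_of::<String>() / word,  // pointer + length + capacity
    )
}
```

On current toolchains this gives (1, 2, 3), which matches the pointer, pointer-plus-length, and pointer-plus-length-plus-capacity picture.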

Understanding rust borrowing and dereferencing

I was reading through the Rust documentation and can't quite seem to be able to wrap my head around what is going on. For example, over here I see the following example:
// This function takes ownership of a box and destroys it
fn eat_box_i32(boxed_i32: Box<i32>) {
    println!("Destroying box that contains {}", boxed_i32);
}

// This function borrows an i32
fn borrow_i32(borrowed_i32: &i32) {
    println!("This int is: {}", borrowed_i32);
}

fn main() {
    // Create a boxed i32, and a stacked i32
    let boxed_i32 = Box::new(5_i32);
    let stacked_i32 = 6_i32;

    // Borrow the contents of the box. Ownership is not taken,
    // so the contents can be borrowed again.
    borrow_i32(&boxed_i32);
    borrow_i32(&stacked_i32);

    {
        // Take a reference to the data contained inside the box
        let _ref_to_i32: &i32 = &boxed_i32;

        // Error!
        // Can't destroy `boxed_i32` while the inner value is borrowed later in scope.
        eat_box_i32(boxed_i32);
        // FIXME ^ Comment out this line

        // Attempt to borrow `_ref_to_i32` after inner value is destroyed
        borrow_i32(_ref_to_i32);
        // `_ref_to_i32` goes out of scope and is no longer borrowed.
    }

    // `boxed_i32` can now give up ownership to `eat_box` and be destroyed
    eat_box_i32(boxed_i32);
}
Things I believe:
eat_box_i32 takes a pointer to a Box
this line let boxed_i32 = Box::new(5_i32); makes it so that boxed_i32 now contains a pointer because Box is not a primitive
Things I don't understand:
why do we need to call borrow_i32(&boxed_i32); with the ampersand? Isn't boxed_i32 already a pointer?
on this line: let _ref_to_i32: &i32 = &boxed_i32; why is the ampersand required on the right hand side? Isn't boxed_i32 already an address?
how come borrow_i32 can be called with pointer to Box and pointer to i32 ?
Comment on the term "pointers"
You can skip this part if you'd like, I just figured given the questions you asked, this might be a helpful comment:
In Rust, &i32, &mut i32, *const i32, *mut i32, Box<i32>, Rc<i32>, Arc<i32> are all arguably a "pointer to i32" type. However, Rust will not let you convert between them casually, even between those that are laid out identically in memory.
It can be useful to talk about pointers in general sometimes, but as a rule of thumb, if you're trying to figure out why one piece of Rust code compiles, and another doesn't, I'd recommend keeping track of which kind of pointer you're working with.
Things you believe:
eat_box_i32 takes a pointer to a Box
Actually not quite. eat_box_i32 accepts a Box<i32>, and not a pointer to a Box<i32>. It just so happens that Box<i32> in memory is stored as a pointer to an i32.
this line let boxed_i32 = Box::new(5_i32); makes it so that boxed_i32 now contains a pointer because Box is not a primitive
Yes, boxed_i32 is a pointer.
Things you don't understand:
why do we need to call borrow_i32(&boxed_i32); with the ampersand? Isn't boxed_i32 already a pointer?
Yes, boxed_i32 is already a pointer. However, a boxed pointer still indicates ownership. If you passed boxed_i32 instead of &boxed_i32, you would still be passing a pointer, but Rust will consider that variable "consumed", and you would no longer be able to use boxed_i32 after that function call.
on this line: let _ref_to_i32: &i32 = &boxed_i32; why is the ampersand required on the right hand side? Isn't boxed_i32 already an address?
Yes, boxed_i32 is already an address, but the fact that it's an address is kind of meant to be opaque (like a struct with a single private field). The actual type of &boxed_i32 is &Box<i32>.
Though this is weird right? If &boxed_i32 is &Box<i32>, how can you assign it to a variable of type &i32?
This is actually a shorthand -- If a type T implements the Deref<Target=R> trait, it'll automatically convert values of type &T into values of type &R as needed. And it turns out that the Box<T> type implements Deref<Target=T>.
See https://doc.rust-lang.org/std/ops/trait.Deref.html for more info about Deref.
So if you wrote it out explicitly without that automatic conversion, that line would actually look something like:
let _ref_to_i32: &i32 = Deref::deref(&boxed_i32);
how come borrow_i32 can be called with pointer to Box and pointer to i32 ?
The reason is the same as with (2) above.
borrow_i32 accepts &i32 as its parameter. Passing &i32 is obviously ok because the types match exactly. If you try to pass it &Box<i32>, Rust will automatically convert it to &i32 for you, because Box<i32> implements Deref<Target = i32>.
EDIT: Thanks @kmdreko for pointing out that Deref allows the coercion, and not AsRef
Just to supplement @math4tots, the auto dereferencing is called deref coercion. It is explained in the Rust book here: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
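A compact sketch of the three cases side by side. The helper names double and three_ways are made up, and double returns a value instead of printing so the result is easy to check.

```rust
use std::ops::Deref;

/// Stand-in for `borrow_i32` that returns a value instead of printing.
fn double(borrowed_i32: &i32) -> i32 {
    *borrowed_i32 * 2
}

/// Calls `double` through a plain reference, through deref coercion,
/// and through an explicit `Deref::deref` call.
fn three_ways() -> (i32, i32, i32) {
    let boxed_i32 = Box::new(5_i32);
    let stacked_i32 = 6_i32;
    (
        double(&stacked_i32),             // `&i32`: the types match exactly
        double(&boxed_i32),               // `&Box<i32>` is coerced to `&i32`
        double(Deref::deref(&boxed_i32)), // the same coercion written out by hand
    )
}
```

The second and third calls do the same thing; the compiler inserts the deref call for you when the target type is known.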

Why does a &str not coerce to a &String when using Vec::contains?

A friend asked me to explain the following quirk in Rust. I was unable to, hence this question:
fn main() {
    let l: Vec<String> = Vec::new();
    //let ret = l.contains(&String::from(func())); // works
    let ret = l.contains(func()); // does not work
    println!("ret: {}", ret);
}

fn func() -> &'static str {
    "hello"
}
Example on the Rust Playground
The compiler will complain like this:
error[E0308]: mismatched types
 --> src/main.rs:4:26
  |
4 |     let ret = l.contains(func()); // does not work
  |                          ^^^^^^ expected struct `std::string::String`, found str
  |
  = note: expected type `&std::string::String`
             found type `&'static str`
In other words, &str does not coerce to &String.
At first I thought it was to do with 'static, however that is a red herring.
The commented line fixes the example at the cost of an extra allocation.
My questions:
Why doesn't &str coerce to &String?
Is there a way to make the call to contains work without the extra allocation?
Your first question should already be answered by @Marko.
Your second question should be easy to answer as well; just use a closure:
let ret = l.iter().any(|x| x == func());
Edit:
Not the "real" answer anymore, but I'll leave this here for people who might be interested in a solution.
It seems that the Rust developers intend to adjust the signature of contains to allow the example posted above to work.
In some sense, this is a known bug in contains. It sounds like the fix won't allow those types to coerce, but will allow the above example to work.
std::string::String is a growable, heap-allocated data structure, whereas a string slice (str) is an immutable, fixed-length string somewhere in memory. A string slice is used as a borrowed type, via &str. Think of it as a view into some string data that resides somewhere in memory. That is why it does not make sense for &str to coerce to &String, while the other way around makes perfect sense: you have a heap-allocated String somewhere in memory and you want to use a view (a string slice) into that string.
To answer your second question: there is no way to make the code work in its current form. You either need to change to a vector of string slices (that way, there will be no extra allocation) or use something other than the contains method.
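Both workarounds mentioned in this thread can be sketched as follows. The helper names contains_str and slices_contain are invented for the example; func is the function from the question.

```rust
fn func() -> &'static str {
    "hello"
}

/// Allocation-free membership test on a slice of `String`s,
/// using `any` with a closure instead of `contains`.
fn contains_str(haystack: &[String], needle: &str) -> bool {
    haystack.iter().any(|s| s == needle)
}

/// With a collection of string slices, `contains` works directly.
fn slices_contain(haystack: &[&str], needle: &str) -> bool {
    haystack.contains(&needle)
}
```

Neither version builds a temporary String just to perform the comparison.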

Why does cloned() allow this function to compile

I'm starting to learn Rust and I tried to implement a function to reverse a vector of strings. I found a solution but I don't understand why it works.
This works:
fn reverse_strings(strings: Vec<&str>) -> Vec<&str> {
    let actual: Vec<_> = strings.iter().cloned().rev().collect();
    return actual;
}
But this doesn't.
fn reverse_strings(strings: Vec<&str>) -> Vec<&str> {
    let actual: Vec<_> = strings.iter().rev().collect(); // without clone
    return actual;
}
Error message
src/main.rs:28:10: 28:16 error: mismatched types:
expected `collections::vec::Vec<&str>`,
found `collections::vec::Vec<&&str>`
(expected str,
found &-ptr) [E0308]
Can someone explain to me why? What happens in the second function? Thanks!
So the call to .cloned() is essentially like doing .map(|i| i.clone()) in the same position (i.e. you can replace the former with the latter).
The thing is that when you call iter(), you're iterating/operating on references to the items being iterated. Notice that the vector already consists of 'references', specifically string slices.
So to zoom in a bit, let's replace cloned() with the equivalent map() that I mentioned above, for pedagogical purposes, since they are equivalent. This is what it actually looks like:
.map(|i: & &str| i.clone())
So notice that that's a reference to a reference (slice), because like I said, iter() operates on references to the items, not the items themselves. So since a single element in the vector being iterated is of type &str, then we're actually getting a reference to that, i.e. & &str. By calling clone() on each of these items, we go from a & &str to a &str, just like calling .clone() on a &i64 would result in an i64.
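As a quick check of that equivalence, here is a sketch with both spellings next to each other. The function names are illustrative only.

```rust
/// Uses `cloned()` to go from `&&str` items to `&str` items.
fn reverse_with_cloned(strings: Vec<&str>) -> Vec<&str> {
    let actual: Vec<&str> = strings.iter().rev().cloned().collect();
    actual
}

/// Same thing with an explicit `map`.
fn reverse_with_map(strings: Vec<&str>) -> Vec<&str> {
    // Each `i` is a `&&str`; cloning the outer reference yields the inner `&str`.
    let actual: Vec<&str> = strings.iter().rev().map(|i| i.clone()).collect();
    actual
}
```

Both functions return the same vector for the same input; the clone copies a reference (a pointer and a length), not the string data.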
So to bring everything together, iter() iterates over references to the elements. So if you create a new vector from the collected items yielded by the iterator you construct (which you constructed by calling iter()) you would get a vector of references to references, that is:
let actual: Vec<& &str> = strings.iter().rev().collect();
So first of all realize that this is not the same as the type you're saying the function returns, Vec<&str>. More fundamentally, however, the lifetimes of these references would be local to the function, so even if you changed the return type to Vec<& &str> you would get a lifetime error.
Something else you could do, however, is to use the into_iter() method. This method actually does iterate over each element, not a reference to it. However, this means that the elements are moved from the original iterator/container. This is only possible in your situation because you're passing the vector by value, so you're allowed to move elements out of it.
fn reverse_strings(strings: Vec<&str>) -> Vec<&str> {
    let actual: Vec<_> = strings.into_iter().rev().collect();
    return actual;
}
playpen
This probably makes a bit more sense than cloning, since we are passed the vector by value, we're allowed to do anything with the elements, including moving them to a different location (in this case the new, reversed vector). And even if we don't, the vector will be dropped at the end of that function anyways, so we might as well. Cloning would be more appropriate if we're not allowed to do that (e.g. if we were passed the vector by reference, or a slice instead of a vector more likely).
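For the by-reference case mentioned at the end, a sketch might look like this. The name reverse_borrowed and the explicit lifetime 'a are my additions; the lifetime ties the returned slices to the original strings, not to the borrowed container.

```rust
/// Reverses a borrowed slice of string slices without consuming it.
/// `reverse_borrowed` is a hypothetical name for this illustration.
fn reverse_borrowed<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    // We cannot move out of a borrowed slice, so copy each `&str` instead.
    strings.iter().rev().cloned().collect()
}
```

The caller keeps its original vector and gets a new, reversed one.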

Resources