There's an answer here for how to capitalize the ASCII characters in a string. This is not quite adequate for my specific problem.
For anyone who wanders in off Google, things have improved since Rust 0.13.0. (I'm on Rust 1.13.0)
&str (string slice) provides to_uppercase() and you can coerce anything to it.
// First, with an &str, which any string can coerce to.
let str = "hello øåÅßç";
let up = str.to_uppercase();
println!("{}", up);
// And again with a String
let nonstatic_string = String::from(str);
let up2 = nonstatic_string.to_uppercase();
println!("{}", up2);
// ...if this fails in an edge case, use &nonstatic_string to force coercion
Regarding Matthieu M.'s comment, this conversion should be fully Unicode-compliant.
Here's a runnable copy on the Rust Playground
In case it helps newcomers to understand what's going on:
Strings implement Deref<Target=str>, so you can call any str methods on String.
str::to_uppercase() takes &str (a borrowed, non-mutable slice. In other words, a read-only reference to a string), so it'll accept pretty much anything.
str::to_uppercase() allocates a new String and returns it, so its return value isn't constrained by the borrowing rules for the input.
The caveat mentioned in the comment at the end is that, if you call my_string.to_uppercase() as str::to_uppercase(my_string), then it'll complain like this:
expected &str, found struct `std::string::String`
What's happening is that my_string.to_uppercase() isn't actually equivalent to that... it's equivalent to str::to_uppercase(&my_string).
(You can't auto-deref if you're not starting with a reference. You don't need to provide the & when making a method call because the &self in the method definition does it for you.)
I don't think there's any function to do it directly, but you can use the functions in the UnicodeChar trait with chars and map like:
let str = "hello øåÅßç";
let up = str.chars().map(|c| c.to_uppercase()).collect::<String>();
println!("{}", up);
Output:
HELLO ØÅÅßÇ
Tested on rustc 0.13.0-dev (66601647c 2014-11-27 06:41:17 +0000)
Related
fn problem() -> Vec<&'static str> {
let my_string = String::from("First Line\nSecond Line");
my_string.lines().collect()
}
This fails with the compilation error:
|
7 | my_string.lines().collect()
| ---------^^^^^^^^^^^^^^^^^^
| |
| returns a value referencing data owned by the current function
| `my_string` is borrowed here
I understand what this error means - it's to stop you returning a reference to a value which has gone out of scope. Having looked at the type signatures of the functions involved, it appears that the problem is with the lines method, which borrows the string it's called on. But why does this matter? I'm iterating over the lines of the string in order to get a vector of the parts, and what I'm returning is this "new" vector, not anything that would (illegally) directly reference my_string.
(I'm aware I could fix this particular example very easily by just using the string literal rather than converting to an "owned" string with String::from. This is a toy example to reproduce the problem - in my "real" code the string variable is read from a file, so I obviously can't use a literal.)
What's even more mysterious to me is that the following variation on the function, which to me ought to suffer from the same problem, works fine:
fn this_is_ok() -> Vec<i32> {
let my_string = String::from("1\n2\n3\n4");
my_string.lines().map(|n| n.parse().unwrap()).collect()
}
The reason can't be map doing some magic, because this also fails:
fn also_fails() -> Vec<&'static str> {
let my_string = String::from("First Line\nSecond Line");
my_string.lines().map(|s| s).collect()
}
I've been playing about for quite a while, trying various different functions inside the map - and some pass and some fail, and I've honestly no idea what the difference is. And all this is making me realise that I have very little handle on how Rust's ownership/borrowing rules work in non-trivial cases, even though I thought I at least understood the basics. So if someone could give me a relatively clear and comprehensive guide to what is going on in all these examples, and how it might be possible to fix those which fail, in some straightforward way, I would be extremely grateful!
The key is in the type of the value yielded by lines: &str. In order to avoid unnecessary clones, lines actually returns references to slices of the string it's called on, and when you collect it to a Vec, that Vec's elements are simply references to slices of your string. So, of course when your function exits and the string is dropped, the references inside the Vec will be dropped and invalid. Remember, &str is a borrowed string, and String is an owned string.
The parsing works because you take those &strs then you read them into an i32, so the data is transferred to a new value and you no longer need a reference to the original string.
To fix your problem, simply use str::to_owned to convert each element into a String:
fn problem() -> Vec<String> {
let my_string = String::from("First Line\nSecond Line");
my_string.lines().map(|v| v.to_owned()).collect()
}
It should be noted that to_string also works, and that to_owned is actually part of the ToOwned trait, so it is useful for other borrowed types as well.
For references to sized values (str is unsized so this doesn't apply), such as an Iterator<Item = &i32>, you can simply use Iterator::cloned to clone every element so they are no longer references.
An alternative solution would be to take the String as an argument so it, and therefore references to it, can live past the scope of the function:
fn problem(my_string: &str) -> Vec<&str> {
my_string.lines().collect()
}
The problem here is that this line:
let my_string = String::from("First Line\nSecond Line");
copies the string data to a buffer allocated on the heap (so no longer 'static). Then lines returns references to that heap-allocated buffer.
Note that &str also implements a lines method, so you don't need to copy the string data to the heap, you can use your string directly:
fn problem() -> Vec<&'static str> {
let my_string = "First Line\nSecond Line";
my_string.lines().collect()
}
Playground
which avoids all unnecessary allocations and copying.
I was reading the doc from rust lang website and in chapter 4 they did the following example:
let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];
hello is of type &str that I created from a variable s of type String.
Some rows below they define the following function:
fn first_word(s: &String) -> &str {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return &s[0..i];
}
}
&s[..]
}
This time s is of type &String but still &s[0..i] gave me a &str slice.
How is it possible? I thought that the correct way to achieve this would be something like &((*str)[0..i]).
Am I missing something? Maybe during the [0..i] operation Rust auto deference the variable?
Thanks
Maybe during the [0..i] operation Rust auto deference the variable?
This is exactly what happens. When you call methods/index a reference, it automatically dereferences before applying the method. This behavior can also be manually implemented with the Deref trait. String implements the Deref with a target of str, which means when you call str methods on String. Read more about deref coercion here.
It's important to realize what happens with &s[1..5], and that it's &(s[1..5]), namely, s[1..5] is first first evaluated, this returns a value of type str, and a reference to that value is taken. In fact, there's even more indirection: x[y] in rust is actually syntactic sugar for *std::ops::Index::index(x,y). Note the dereference, as this function always returns a reference, which is then dereferenced by the sugar, and then it is referenced again by the & in our code — naturally, the compiler will optimize this and ensure we are not pointlessly taking references to only dereference them again.
It so happens that the String type does support the Index<Range<usize>> trait and it's Index::output type is str.
It also happens that the str type supports the same, and that it's output type is also str, viā a blanket implementation of SliceIndex.
On your question of auto-dereferencing, it is true that Rust has a Deref trait defined on String as well so that in many contexts, such as this one, &String is automatically cast to &str — any context that accepts a &str also accepts a &String, meaning that the implementation on Index<usize> on String is actually for optimization to avoid this indirection. If it not were there, the code would still work, and perhaps the compiler could even optimize the indirection away.
But that automatic casting is not why it works — it simply works because indexing is defined on many different types.
Finally:
I thought that the correct way to achieve this would be something like &((*str)[0..i]).
This would not work regardless, a &str is not the same as a &String and cannot be dereferenced to a String like a &String. In fact, a &str in many ways is closer to a String than it is to a &String. a &str is really just a fat pointer to a sequence of unicode bytes, also containing the length of said sequence in the second word; a String is, if one will, an extra-fat pointer that also contains the current capacity of the buffer with it, and owns the buffer it points to, so it can delete and resize it.
A friend asked me to explain the following quirk in Rust. I was unable to, hence this question:
fn main() {
let l: Vec<String> = Vec::new();
//let ret = l.contains(&String::from(func())); // works
let ret = l.contains(func()); // does not work
println!("ret: {}", ret);
}
fn func() -> & 'static str {
"hello"
}
Example on the Rust Playground
The compiler will complain like this:
error[E0308]: mismatched types
--> src/main.rs:4:26
|
4 | let ret = l.contains(func()); // does not work
| ^^^^^^ expected struct `std::string::String`, found str
|
= note: expected type `&std::string::String`
found type `&'static str`
In other words, &str does not coerce with &String.
At first I thought it was to do with 'static, however that is a red herring.
The commented line fixes the example at the cost of an extra allocation.
My questions:
Why doesn't &str coerce with &String?
Is there a way to make the call to contains work without the extra allocation?
Your first question should be answer already by #Marko.
Your second question, should be easy to answer as well, just use a closure:
let ret = l.iter().any(|x| x == func());
Edit:
Not the "real" answer anymore, but I let this here for people who might be interested in a solution for this.
It seems that the Rust developers intend to adjust the signature of contains to allow the example posted above to work.
In some sense, this is a known bug in contains. It sounds like the fix won't allow those types to coerce, but will allow the above example to work.
std::string::String is a growable, heap-allocated data structure whereas string slice (str) is an immutable fixed-length string somewhere in memory. String slice is used as a borrowed type, via &str. Consider it as view to some string date that resides somewhere in memory. That is why it does not make sense for str to coerce to String, while the other way around perfectly makes sense. You have a heap-allocated String somewhere in memory and you want to use a view (a string slice) to that string.
To answer your second question. There is no way to make the code work in the current form. You either need to change to a vector of string slices (that way, there will be no extra allocation) or use something other then contains method.
I was reading through the book section about Strings and found they were using &* combined together to convert a piece of text. The following is what it says:
use std::net::TcpStream;
TcpStream::connect("192.168.0.1:3000"); // Parameter is of type &str.
let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // Convert `addr_string` to &str.
In other words, they are saying they are converting a String to a &str. But why is that conversion done using both of the aforementioned signs? Should this not be done using some other method? Does not the & mean we are taking its reference, then using the * to dereference it?
In short: the * triggers an explicit deref, which can be overloaded via ops::Deref.
More Detail
Look at this code:
let s = "hi".to_string(); // : String
let a = &s;
What's the type of a? It's simply &String! This shouldn't be very surprising, since we take the reference of a String. Ok, but what about this?
let s = "hi".to_string(); // : String
let b = &*s; // equivalent to `&(*s)`
What's the type of b? It's &str! Wow, what happened?
Note that *s is executed first. As most operators, the dereference operator * is also overloadable and the usage of the operator can be considered syntax sugar for *std::ops::Deref::deref(&s) (note that we recursively dereferencing here!). String does overload this operator:
impl Deref for String {
type Target = str;
fn deref(&self) -> &str { ... }
}
So, *s is actually *std::ops::Deref::deref(&s), in which the deref() function has the return type &str which is then dereferenced again. Thus, *s has the type str (note the lack of &).
Since str is unsized and not very handy on its own, we'd like to have a reference to it instead, namely &str. We can do this by adding a & in front of the expression! Tada, now we reached the type &str!
&*s is rather the manual and explicit form. Often, the Deref-overload is used via automatic deref coercion. When the target type is fixed, the compiler will deref for you:
fn takes_string_slice(_: &str) {}
let s = "hi".to_string(); // : String
takes_string_slice(&s); // this works!
In general, &* means to first dereference (*) and then reference (&) a value. In many cases, this would be silly, as we'd end up at the same thing.
However, Rust has deref coercions. Combined with the Deref and DerefMut traits, a type can dereference to a different type!
This is useful for Strings as that means that they can get all the methods from str, it's useful for Vec<T> as it gains the methods from [T], and it's super useful for all the smart pointers, like Box<T>, which will have all the methods of the contained T!
Following the chain for String:
String --deref--> str --ref--> &str
Does not the & mean we are taking its reference, then using the * to dereference it?
No, your order of operations is backwards. * and & associate to the right. In this example, dereferencing is first, then referencing.
I think now you can do this &addr_string
(from a comment)
Sometimes, this will do the same thing. See What are Rust's exact auto-dereferencing rules? for the full details, but yes, a &String can be passed to a function that requires a &str. There are still times where you need to do this little dance by hand. The most common I can think of is:
let name: Option<String> = Some("Hello".to_string());
let name2: Option<&str> = name.as_ref().map(|s| &**s);
You'll note that we actually dereference twice:
&String -->deref--> String --deref--> str --ref--> &str
Although this case can now be done with name.as_ref().map(String::as_str);
This is for the current 0.6 Rust trunk by the way, not sure the exact commit.
Let's say I want to for each over some strings, and my closure takes a borrowed string pointer argument (&str). I want my closure to add its argument to an owned vector of owned strings ~[~str] to be returned. My understanding of Rust is weak, but I think that strings are a special case where you can't dereference them with * right? How do I get my strings from &str into the vector's push method which takes a ~str?
Here's some code that doesn't compile
fn read_all_lines() -> ~[~str] {
let mut result = ~[];
let reader = io::stdin();
let util = #reader as #io::ReaderUtil;
for util.each_line |line| {
result.push(line);
}
result
}
It doesn't compile because it's inferring result's type to be [&str] since that's what I'm pushing onto it. Not to mention its lifetime will be wrong since I'm adding a shorter-lived variable to it.
I realize I could use ReaderUtil's read_line() method which returns a ~str. But this is just an example.
So, how do I get an owned string from a borrowed string? Or am I totally misunderstanding.
You should call the StrSlice trait's method, to_owned, as in:
fn read_all_lines() -> ~[~str] {
let mut result = ~[];
let reader = io::stdin();
let util = #reader as #io::ReaderUtil;
for util.each_line |line| {
result.push(line.to_owned());
}
result
}
StrSlice trait docs are here:
http://static.rust-lang.org/doc/core/str.html#trait-strslice
You can't.
For one, it doesn't work semantically: a ~str promises that only one thing owns it at a time. But a &str is borrowed, so what happens to the place you borrowed from? It has no way of knowing that you're trying to steal away its only reference, and it would be pretty rude to trash the caller's data out from under it besides.
For another, it doesn't work logically: ~-pointers and #-pointers are allocated in completely different heaps, and a & doesn't know which heap, so it can't be converted to ~ and still guarantee that the underlying data lives in the right place.
So you can either use read_line or make a copy, which I'm... not quite sure how to do :)
I do wonder why the API is like this, when & is the most restricted of the pointers. ~ should work just as well here; it's not like the iterated strings already exist somewhere else and need to be borrowed.
At first I thought it was possible to use copy line to create owning pointer from the borrowed pointer to the string but this apparently copies burrowed pointer.
So I found str::from_slice(s: &str) -> ~str. This is probably what you need.