Unknown size at compile time when trying to print string contents in Rust - rust

I have a couple of pieces of code; one errors out and the other doesn't, and I don't understand why.
The one that errors out when compiling:
fn main() {
let s1 = String::from("hello");
println!("{}", *s1);
}
This throws: doesn't have a size known at compile-time, on the line println!("{}", *s1);
The one that works:
fn main() {
let s1 = String::from("hello");
print_string(&s1);
}
fn print_string(s1: &String) {
println!("{}", *s1);
}
Why is this happening? Aren't both correct ways to access the string contents and printing them?

In the first snippet you’re dereferencing a String. This yields a str, which is a dynamically sized type (DST; older texts sometimes call these unsized types). DSTs are difficult to use directly, because most code, including the formatting machinery behind println!, expects values whose size is known at compile time.
In the second snippet you’re dereferencing a &String, which yields a regular String, which is a normal sized type.
In both cases the dereference is unnecessary: println!("{}", s1) works in either snippet.
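To make the types involved concrete, here is a minimal sketch. The helper name `render` is hypothetical and only mirrors the `print_string` function from the question; it returns the formatted text instead of printing it so the result can be inspected.

```rust
/// Mirrors the second snippet: `s` is a `&String`, so `*s` is a sized `String`.
fn render(s: &String) -> String {
    // The formatting macros only borrow their arguments, so nothing is moved here.
    format!("{}", *s)
}

fn main() {
    let s1 = String::from("hello");

    // `*s1` has the unsized type `str`; re-borrowing it yields a `&str`, which is sized.
    let slice: &str = &*s1;
    println!("{}", slice);

    // No dereference is needed at all: `String` implements `Display`.
    println!("{}", s1);
    println!("{}", render(&s1));
}
```

In everyday code, passing `s1` or `&s1` straight to the macro is the usual choice; the explicit `&*s1` is only shown to spell out what the dereference produces.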

Related

Why can't I use println with a str?

Prologue: I'm at my first day on Rust here.
This is my demo code:
fn main() {
println!("Hello, world!");
println!(Move::X.to_string());
}
enum Move {
Empty,
X,
O,
}
impl Move {
fn to_string(&self) -> &'static str {
match self {
Move::Empty => "Empty",
Move::X => "X",
Move::O => "O"
}
}
}
This is not compiling because of these errors
I would appreciate a fix, but mainly I need an explanation.
I tried
println!(String::from(Move::X.to_string()));
but the error is identical.
Because println! is a macro whose first argument must be a string literal. That literal is processed at compile time, so it can never be a reference to runtime data.
You can use inline format arguments (captured identifiers, available since Rust 1.58):
let x = Move::X.to_string();
println!("{x}");
or the usual positional formatting, as the error message suggests:
println!("{}", Move::X.to_string())
First of all, Move::X.to_string(): String, not Move::X.to_string(): &str or Move::X.to_string(): str. See this for an explanation. So even if println! did accept a &str, it's not by building a String that you would solve that issue (even though when calling a function that requires a &str, Rust can Deref String to a &str — but println! is not a function).
Second, the println! macro always and only wants a string literal as its first "argument". That's because it must know the required formatting at compile time.
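As an alternative sketch that is not part of the answers above, the enum could implement `std::fmt::Display` instead of a hand-written `to_string` method. The enum is then usable directly as a formatting argument, and `to_string()` (returning a `String`) comes for free through the standard `ToString` trait. The format string itself stays a literal in every call.

```rust
use std::fmt;

enum Move {
    Empty,
    X,
    O,
}

impl fmt::Display for Move {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Pick the static label for each variant and write it to the formatter.
        let name = match self {
            Move::Empty => "Empty",
            Move::X => "X",
            Move::O => "O",
        };
        f.write_str(name)
    }
}

fn main() {
    // Positional argument: the first macro argument is still a literal.
    println!("{}", Move::X);

    // Inline captured identifier (Rust 1.58 or later).
    let m = Move::O;
    println!("{m}");

    // `to_string` is provided by `ToString`, which is implemented for every `Display` type.
    let s: String = Move::Empty.to_string();
    println!("{}", s);
}
```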

How to generate a random String of alphanumeric chars?

The first part of the question is probably pretty common and there are enough code samples that explain how to generate a random string of alphanumerics. The piece of code I use is from here.
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect();
println!("{}", rand_string);
}
This piece of code does however not compile, (note: I'm on nightly):
error[E0277]: a value of type `String` cannot be built from an iterator over elements of type `u8`
--> src/main.rs:8:10
|
8 | .collect();
| ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=u8>`
|
= help: the trait `FromIterator<u8>` is not implemented for `String`
Ok, the elements that are generated are of type u8. So I guess this is an array or vector of u8:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let r = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>();
let s = String::from_utf8_lossy(&r);
println!("{}", s);
}
And this compiles and works!
2dCsTqoNUR1f0EzRV60IiuHlaM4TfK
All good, except that I would like to ask if someone could explain what exactly happens regarding the types and how this can be optimised.
Questions
.sample_iter(&Alphanumeric) produces u8 and not chars?
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
.sample_iter(&Alphanumeric) produces u8 and not chars?
Yes, this was changed in rand v0.8. You can see in the docs for 0.7.3:
impl Distribution<char> for Alphanumeric
But then in the docs for 0.8.0:
impl Distribution<u8> for Alphanumeric
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
There are a couple of ways to do this, the most obvious being to just cast every u8 to a char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(|x| x as char)
.collect();
Or, using the From<u8> instance of char:
let s: String = thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.map(char::from)
.collect();
Of course here, since you know every u8 must be valid UTF-8, you can use String::from_utf8_unchecked, which is faster than from_utf8_lossy (although probably around the same speed as the as char method):
let s = unsafe {
String::from_utf8_unchecked(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
)
};
If, for some reason, the unsafe bothers you and you want to stay safe, then you can use the slower String::from_utf8 and unwrap the Result, so you get a panic instead of undefined behavior (even though the code should never trigger either):
let s = String::from_utf8(
thread_rng()
.sample_iter(&Alphanumeric)
.take(30)
.collect::<Vec<_>>(),
).unwrap();
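The conversions above can be compared without the rand crate by feeding them fixed bytes. This is only a sketch: the function names `checked` and `mapped` are made up for illustration. It also shows one detail worth knowing: `char::from(u8)` maps a byte to the code point with the same value, so it agrees with UTF-8 decoding only for ASCII input (which is all that Alphanumeric produces).

```rust
/// Checked conversion: validates the bytes as UTF-8 and reuses the Vec's allocation.
fn checked(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Per-byte conversion: each byte becomes the code point with the same value
/// (U+0000 to U+00FF), then the chars are collected into a String.
fn mapped(bytes: &[u8]) -> String {
    bytes.iter().copied().map(char::from).collect()
}

fn main() {
    // For ASCII bytes both approaches give the same text.
    let ascii = b"aZ09".to_vec();
    println!("{}", mapped(&ascii));
    println!("{:?}", checked(ascii));

    // A lone 0xE9 byte is not valid UTF-8, but it is a valid code point value.
    let latin = vec![0xE9u8];
    println!("{}", mapped(&latin));
    println!("{:?}", checked(latin));
}
```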
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
First of all, trust me, you don't want arrays of chars. They are not fun to work with. If you want a stack string, have a u8 array then use a function like std::str::from_utf8 or the faster std::str::from_utf8_unchecked (again only usable since you know valid utf8 will be generated.)
As to optimizing the heap allocation away, refer to this answer. Basically, it's not possible without a bit of hackiness/ugliness (such as making your own function that collects an iterator into an array of 30 elements).
Once const generics are finally stabilized, there'll be a much prettier solution.
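For readers on a newer toolchain, here is a rough sketch of what such a helper could look like now that const generics are available (stable since Rust 1.51, as far as I know). The function `fill_buf` is hypothetical, and a deterministic byte iterator stands in for `thread_rng().sample_iter(&Alphanumeric)` so the example needs no external crate.

```rust
/// Fills a stack-allocated array of N bytes from an iterator.
/// Returns None if the iterator ends before the array is full.
fn fill_buf<I, const N: usize>(mut bytes: I) -> Option<[u8; N]>
where
    I: Iterator<Item = u8>,
{
    let mut buf = [0u8; N];
    for slot in buf.iter_mut() {
        *slot = bytes.next()?;
    }
    Some(buf)
}

fn main() {
    // Assumption: an endless ASCII source standing in for the random sampler.
    let source = (b'a'..=b'z').cycle();

    // N is inferred as 30 from the type annotation; no heap allocation happens here.
    let buf: [u8; 30] = fill_buf(source).expect("source ended too early");

    // The bytes are ASCII, so the checked conversion to &str cannot fail.
    let s: &str = std::str::from_utf8(&buf).expect("ASCII is valid UTF-8");
    println!("{}", s);
}
```

The checked `std::str::from_utf8` is used instead of the unchecked variant to keep the sketch free of unsafe code; on 30 bytes the validation cost is unlikely to matter.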
The first example in the docs for rand::distributions::Alphanumeric shows that if you want to convert the u8s into chars you should map them using the char::from function:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;
fn main() {
let rand_string: String = thread_rng()
.sample_iter(&Alphanumeric)
.map(char::from) // map added here
.take(30)
.collect();
println!("{}", rand_string);
}

Confused about ownership in situations involving lines and map

fn problem() -> Vec<&'static str> {
let my_string = String::from("First Line\nSecond Line");
my_string.lines().collect()
}
This fails with the compilation error:
|
7 | my_string.lines().collect()
| ---------^^^^^^^^^^^^^^^^^^
| |
| returns a value referencing data owned by the current function
| `my_string` is borrowed here
I understand what this error means - it's to stop you returning a reference to a value which has gone out of scope. Having looked at the type signatures of the functions involved, it appears that the problem is with the lines method, which borrows the string it's called on. But why does this matter? I'm iterating over the lines of the string in order to get a vector of the parts, and what I'm returning is this "new" vector, not anything that would (illegally) directly reference my_string.
(I'm aware I could fix this particular example very easily by just using the string literal rather than converting to an "owned" string with String::from. This is a toy example to reproduce the problem - in my "real" code the string variable is read from a file, so I obviously can't use a literal.)
What's even more mysterious to me is that the following variation on the function, which to me ought to suffer from the same problem, works fine:
fn this_is_ok() -> Vec<i32> {
let my_string = String::from("1\n2\n3\n4");
my_string.lines().map(|n| n.parse().unwrap()).collect()
}
The reason can't be map doing some magic, because this also fails:
fn also_fails() -> Vec<&'static str> {
let my_string = String::from("First Line\nSecond Line");
my_string.lines().map(|s| s).collect()
}
I've been playing about for quite a while, trying various different functions inside the map - and some pass and some fail, and I've honestly no idea what the difference is. And all this is making me realise that I have very little handle on how Rust's ownership/borrowing rules work in non-trivial cases, even though I thought I at least understood the basics. So if someone could give me a relatively clear and comprehensive guide to what is going on in all these examples, and how it might be possible to fix those which fail, in some straightforward way, I would be extremely grateful!
The key is in the type of the value yielded by lines: &str. To avoid unnecessary clones, lines returns references to slices of the string it's called on, and when you collect them into a Vec, that Vec's elements are simply references into your string. When your function exits and the string is dropped, those references would be left dangling, which is why the compiler rejects the code. Remember, &str is a borrowed string, and String is an owned string.
The parsing works because you take those &strs then you read them into an i32, so the data is transferred to a new value and you no longer need a reference to the original string.
To fix your problem, simply use str::to_owned to convert each element into a String:
fn problem() -> Vec<String> {
let my_string = String::from("First Line\nSecond Line");
my_string.lines().map(|v| v.to_owned()).collect()
}
It should be noted that to_string also works, and that to_owned is actually part of the ToOwned trait, so it is useful for other borrowed types as well.
For references to sized values (str is unsized so this doesn't apply), such as an Iterator<Item = &i32>, you can simply use Iterator::cloned to clone every element so they are no longer references.
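A short sketch of both cases follows; the function names `owned_lines` and `evens` are invented for illustration. The first converts borrowed `&str` items into owned `String`s, the second uses `Iterator::cloned` on references to a sized type.

```rust
/// Borrowed lines become owned Strings, so the result does not reference `text`.
fn owned_lines(text: &str) -> Vec<String> {
    text.lines().map(str::to_owned).collect()
}

/// `iter()` yields `&i32`; `cloned()` turns each reference into an owned `i32`.
fn evens(values: &[i32]) -> Vec<i32> {
    values.iter().filter(|v| **v % 2 == 0).cloned().collect()
}

fn main() {
    let lines = {
        // This String is dropped at the end of the block, but the Vec<String> survives.
        let my_string = String::from("First Line\nSecond Line");
        owned_lines(&my_string)
    };
    println!("{:?}", lines);
    println!("{:?}", evens(&[1, 2, 3, 4]));
}
```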
An alternative solution would be to take the String as an argument so it, and therefore references to it, can live past the scope of the function:
fn problem(my_string: &str) -> Vec<&str> {
my_string.lines().collect()
}
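Here is how the borrowing version might be used when the caller owns the text, for example after reading it from a file. The name `split_lines` is a stand-in for the `problem` function above, and the hard-coded String replaces the file contents so the sketch runs anywhere.

```rust
/// The returned slices borrow from `text`, so they live as long as the caller's data.
fn split_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

fn main() {
    // Assumption: in real code this could come from std::fs::read_to_string.
    let contents = String::from("First Line\nSecond Line");

    // `lines` borrows from `contents`, which outlives it in this scope.
    let lines = split_lines(&contents);
    println!("{:?}", lines);
}
```

The design trade-off is that the caller must keep `contents` alive for as long as `lines` is in use, in exchange for avoiding one allocation per line.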
The problem here is that this line:
let my_string = String::from("First Line\nSecond Line");
copies the string data to a buffer allocated on the heap (so no longer 'static). Then lines returns references to that heap-allocated buffer.
Note that &str also implements a lines method, so you don't need to copy the string data to the heap, you can use your string directly:
fn problem() -> Vec<&'static str> {
let my_string = "First Line\nSecond Line";
my_string.lines().collect()
}
which avoids all unnecessary allocations and copying.

How to convert a String into a &'static str

How do I convert a String into a &str? More specifically, I would like to convert it into a str with the static lifetime (&'static str).
Updated for Rust 1.0
You cannot obtain a &'static str from a String because Strings may not live for the entire life of your program, and that's what the 'static lifetime means. You can only get a slice parameterized by the String's own lifetime from it.
To go from a String to a slice &'a str you can use slicing syntax:
let s: String = "abcdefg".to_owned();
let s_slice: &str = &s[..]; // take a full slice of the string
Alternatively, you can use the fact that String implements Deref<Target=str> and perform an explicit reborrowing:
let s_slice: &str = &*s; // s : String
// *s : str (via Deref<Target=str>)
// &*s: &str
There is even another way which allows for even more concise syntax but it can only be used if the compiler is able to determine the desired target type (e.g. in function arguments or explicitly typed variable bindings). It is called deref coercion and it allows using just & operator, and the compiler will automatically insert an appropriate amount of *s based on the context:
let s_slice: &str = &s; // okay
fn take_name(name: &str) { ... }
take_name(&s); // okay as well
let not_correct = &s; // this will give &String, not &str,
// because the compiler does not know
// that you want a &str
Note that this pattern is not unique for String/&str - you can use it with every pair of types which are connected through Deref, for example, with CString/CStr and OsString/OsStr from std::ffi module or PathBuf/Path from std::path module.
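The same coercion can be seen side by side for String/&str and PathBuf/Path in the following sketch. The functions `shout` and `has_txt_extension` are hypothetical helpers written only for this illustration.

```rust
use std::path::{Path, PathBuf};

/// Accepts any string slice; a `&String` is coerced to `&str` at the call site.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Accepts any path slice; a `&PathBuf` is coerced to `&Path` at the call site.
fn has_txt_extension(p: &Path) -> bool {
    p.extension().and_then(|e| e.to_str()) == Some("txt")
}

fn main() {
    let name = String::from("abc");
    // `&name` is a `&String`, but the parameter type tells the compiler to deref to `&str`.
    println!("{}", shout(&name));

    let path = PathBuf::from("notes.txt");
    // Same mechanism through `Deref<Target = Path>`.
    println!("{}", has_txt_extension(&path));
}
```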
You can do it, but it involves leaking the memory of the String. This is not something you should do lightly. By leaking the memory of the String, we guarantee that the memory will never be freed (thus the leak). Therefore, any references to the inner object can be interpreted as having the 'static lifetime.
fn string_to_static_str(s: String) -> &'static str {
Box::leak(s.into_boxed_str())
}
fn main() {
let mut s = String::new();
std::io::stdin().read_line(&mut s).unwrap();
let s: &'static str = string_to_static_str(s);
}
As of Rust version 1.26, it is possible to convert a String to &'static str without using unsafe code:
fn string_to_static_str(s: String) -> &'static str {
Box::leak(s.into_boxed_str())
}
This converts the String instance into a boxed str and immediately leaks it. This frees all excess capacity the string may currently occupy.
Note that there are almost always solutions that are preferable over leaking objects, e.g. using the crossbeam crate if you want to share state between threads.
TL;DR: you can get a &'static str from a String which itself has a 'static lifetime.
Although the other answers are correct and most useful, there's a (not so useful) edge case, where you can indeed convert a String to a &'static str:
The lifetime of a reference must always be shorter than or equal to the lifetime of the referenced object, i.e. the referenced object has to live at least as long as the reference. Since 'static means the entire lifetime of a program, a longer lifetime does not exist. But an equal lifetime will be sufficient. So if a String has a lifetime of 'static, you can get a &'static str reference from it.
Creating a static of type String has theoretically become possible with Rust 1.31 when the const fn feature was released. Unfortunately, the only const function returning a String is String::new() currently, and it's still behind a feature gate (so Rust nightly is required for now).
So the following code does the desired conversion (using nightly) ...and actually has no practical use except for completeness of showing that it is possible in this edge case.
#![feature(const_string_new)]
static MY_STRING: String = String::new();
fn do_something(_: &'static str) {
// ...
}
fn main() {
do_something(&MY_STRING);
}
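On more recent toolchains the feature attribute should no longer be needed: to my knowledge `String::new` has been a stable const fn since Rust 1.39. Under that assumption, the same edge case can be sketched on stable Rust as follows (here `do_something` returns the length only so there is something to observe).

```rust
// An empty String created at compile time; it lives for the whole program.
static MY_STRING: String = String::new();

/// Requires a string slice with the 'static lifetime.
fn do_something(s: &'static str) -> usize {
    s.len()
}

fn main() {
    // `&MY_STRING` is a `&'static String`, which deref-coerces to `&'static str`.
    println!("{}", do_something(&MY_STRING));
}
```

As the answer says, this has little practical use, since an immutable static String can only ever hold what a const expression can build.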

Finding word in sentence

In the following example:
fn main() {
let str_vec: ~[&str] = "lorem lpsum".split(' ').collect();
if (str_vec.contains("lorem")) {
println!("found it!");
}
}
It will not compile, and says:
error: mismatched types: expected &&'static str
but found 'static str (expected &-ptr but found &'static str)
What's the proper way to find the word in sentence?
The contains() method on vectors (specifically, on all vectors satisfying the std::vec::ImmutableEqVector trait, which is for all vectors containing types that can be compared for equality), has the following signature,
fn contains(&self, x: &T) -> bool
where T is the type of the element in the array. In your code, str_vec holds elements of type &str, so you need to pass in a &&str -- that is, a borrowed pointer to a &str.
Since the type of "lorem" is &'static str, you might attempt first to just write
str_vec.contains(&"lorem")
In the current version of Rust, that doesn't work. Rust is in the middle of a language change referred to as dynamically-sized types (DST). One of the side effects is that the meaning of the expressions &"string" and &[element1, element2], where & appears before a string or array literal, will be changing (T is the type of the array elements element1 and element2):
Old behavior (still current as of Rust 0.9): The expressions &"string" and &[element1, element2] are coerced to slices &str and &[T], respectively. Slices refer to unknown-length ranges of the underlying string or array.
New behavior: The expressions &"string" and &[element1, element2] are interpreted as & &'static str and &[T, ..2], making their interpretation consistent with the rest of Rust.
Under either of these regimes, the most idiomatic way to obtain a slice of a statically-sized string or array is to use the .as_slice() method. Once you have a slice, just borrow a pointer to that to get the &&str type that .contains() requires. The final code is below (the if condition doesn't need to be surrounded by parentheses in Rust, and rustc will warn if you do have unnecessary parentheses):
fn main() {
let str_vec: ~[&str] = "lorem lpsum".split(' ').collect();
if str_vec.contains(&"lorem".as_slice()) {
println!("found it!");
}
}
Compile and run to get:
found it!
Edit: Recently, a change has landed to start warning on ~[T], which is being deprecated in favor of the Vec<T> type, which is also an owned vector but doesn't have special syntax. (For now, you need to import the type from the std::vec_ng library, but I believe the module std::vec_ng will go away eventually by replacing the current std::vec.) Once this change is made, it seems that you can't borrow a reference to "lorem".as_slice() because rustc considers the lifetime too short -- I think this is a bug too. On the current master, my code above should be:
use std::vec_ng::Vec; // Import will not be needed in the future
fn main() {
let str_vec: Vec<&str> = "lorem lpsum".split(' ').collect();
let slice = &"lorem".as_slice();
if str_vec.contains(slice) {
println!("found it!");
}
}
let sentence = "Lorem ipsum dolor sit amet";
if sentence.words().any(|x| x == "ipsum") {
println!("Found it!");
}
You could also do something with .position() or .count() instead of .any(). See Iterator trait.
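The snippets in this thread target a pre-1.0 compiler. As a hedged sketch of how the same ideas look in current Rust: the whitespace-splitting iterator is `str::split_whitespace`, vectors are written `Vec<T>`, and `contains(&"lorem")` works directly on a `Vec<&str>`. The helper name `contains_word` is made up for this example.

```rust
/// Returns true if `word` appears as a whitespace-separated token of `sentence`.
fn contains_word(sentence: &str, word: &str) -> bool {
    sentence.split_whitespace().any(|w| w == word)
}

fn main() {
    let sentence = "Lorem ipsum dolor sit amet";

    // Iterator-based check, no intermediate collection.
    if contains_word(sentence, "ipsum") {
        println!("Found it!");
    }

    // Collecting first also works; `contains` takes a reference to the element type.
    let words: Vec<&str> = sentence.split(' ').collect();
    if words.contains(&"dolor") {
        println!("Found it again!");
    }

    // `position` reports where the word occurs, if at all.
    println!("{:?}", sentence.split_whitespace().position(|w| w == "dolor"));
}
```

The comparison is case-sensitive, so "lorem" would not match "Lorem" here.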
