Why does the &str primitive exist? - string

If String is actually
pub struct String {
    vec: Vec<u8>,
}
Then why is there a special syntax (&str) for a slice of a Vec<u8>? In Chapter 3 of "Programming Rust" by Jim Blandy & Jason Orendorff it says,
&str is very much like &[T]: a fat pointer to some data. String is analogous to Vec<T>
Following that statement there is a chart which shows all the ways they're similar, but there isn't any mention of a single method that they're different. Is a &str just a &[T]?
Likewise in the answer to, What are the differences between Rust's String and str? it says
This is identical to the relationship between a vector Vec<T> and a slice &[T], and is similar to the relationship between by-value T and by-reference &T for general types.
That question focuses on the difference between String and &str. Knowing that a String really is a vector of u8, I'm more interested in &str, which I can't even find the source to. Why does this primitive even exist when we have a primitive (implemented as a fat pointer) for regular vector slices?

It exists for the same reason that String exists, and we don't just pass around Vec<u8> for every string.
A String is an owned, growable container of data that is guaranteed to be UTF-8.
&str is a borrowed, fixed-length container of data that is guaranteed to be UTF-8.
A Vec<u8> is an owned, growable container of u8.
&[u8] is a borrowed, fixed-length container of u8.
This is effectively the reason that types exist, period — to provide abstraction and guarantees (a.k.a. restrictions) on a looser blob of bits.
If we had access to the string as &mut [u8], then we could trivially ruin the UTF-8 guarantee, which is why all such methods are marked as unsafe. Even with an immutable &[u8], we wouldn't be able to make assumptions (a.k.a. optimizations) about the data and would have to write much more defensive code everywhere.
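To make the UTF-8 guarantee concrete, here is a minimal sketch. It relies only on std::str::from_utf8 from the standard library; the helper name char_count is made up for illustration.

```rust
/// Hypothetical helper: counts characters only if `bytes` is valid UTF-8.
fn char_count(bytes: &[u8]) -> Option<usize> {
    // `from_utf8` validates the bytes and borrows them as `&str` without copying.
    let text: &str = std::str::from_utf8(bytes).ok()?;
    // From here on, `chars()` can rely on the data being valid UTF-8.
    Some(text.chars().count())
}
```

With a plain &[u8], every consumer would have to repeat that validation (or trust the caller); with &str it happens once, at the boundary.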
but there isn't any mention of a single method that they're different
Looking at the documentation for str and slice quickly shows a number of methods that exist on one that don't exist on the other, so I don't understand your statement. split_last is the first one that caught my eye, for example.

&str is not necessarily a view to a String, it can be a view to anything that is a valid UTF-8 string.
For example, the crate arraystring allows creating a string on the stack that can be viewed as a &str.
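The same idea can be sketched with the standard library alone, without the arraystring crate: a byte array on the stack is validated and viewed as a &str. The function name shout_from_stack is hypothetical.

```rust
/// Hypothetical helper: views a stack-allocated byte array as a `&str`.
fn shout_from_stack() -> String {
    // Five bytes living on the stack; no `String` backs this data.
    let buffer: [u8; 5] = *b"hello";
    // Validate the bytes and borrow them as a string slice.
    let view: &str = std::str::from_utf8(&buffer).expect("buffer holds valid UTF-8");
    // Any `str` method works on the view; this one returns a new owned `String`.
    view.to_uppercase()
}
```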

Related

Why would you convert from a string in Rust to a String?

I'm new to Rust and come from a Python background. I've just started learning Rust, so I have 2 questions to ask. 1) Why are we defining "String" again (name: String) when I have already mentioned the datatype to be "String" in my "struct Person"? 2) What exactly is the use of from? Could someone please explain it to me in simple English?
fn main() {
    struct Person {
        name: String,
    }
    // instantiate Person struct
    let person = Person {
        name: String::from("Steve Austin"), //Why are we defining "string" again and "from"
    };
    // access value of name field in Person struct
    println!("Person name = {}", person.name);
}
Why are we defining "String" again (name: String) when already I have mentioned the datatype to be as "String" in my "struct Person". 2) What exactly is the use of from.
You're not defining anything, String::from is a static function (think classmethod), which converts to a string.
It's a bit like calling str() in Python.
The oddity here is that Rust has multiple string-adjacent types, and String is only one of them: an owned, heap-allocated, mutable, string buffer. Python doesn't really have an equivalent, StringIO is probably the closest relative (strarray would be if it existed, but it does not).
Meanwhile string literals are not owned heap-allocated strings, instead they are references to data stored in the rodata (or text) segment of the binary.
Python has nothing which really compares, because its strings are more or less always heap-allocated and created at runtime, they're just immutable. The closest general equivalent to &str would be a string version of memoryview, but it still lacks the lexical constraints, and the idea of a 'static lifetime.
For more, see
https://doc.rust-lang.org/rust-by-example/std/str.html
https://stackoverflow.com/a/24159933/8182118
https://doc.rust-lang.org/std/primitive.str.html
https://doc.rust-lang.org/std/string/struct.String.html
In Rust there's two kinds of strings: str and String. The former is a really lean construct and is passed around as a reference, like &str. These cannot be modified.1 They also can't be copied, they're references, so they will always refer to the same value.
The reason they exist is because they are the "minimum viable string", they are the cheapest possible representation of textual data. This efficiency does have trade-offs.
A String can be modified if it's mut, and can also be copied, and later altered again. This makes them more suitable for properties that can and will change, or need to be computed at runtime.
Learning the difference here can be a bit bewildering if you're used to Python where strings are strings, but once you get a handle on it, you'll realize what's going on here.
"x" is a &str value, while String::from("x") is a String converted from that value. You can also do "x".into() if the type is well understood by the compiler, such as for a function argument or struct property.
Strings are also such a common thing that there's to_string() and to_owned(), both of which effectively do the same thing here on &str values.
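As a rough sketch of the conversions mentioned above (the function names literal, owned_copies and make_person are invented for this example; the Person struct mirrors the one in the question):

```rust
struct Person {
    name: String,
}

/// A string literal is baked into the binary and has type `&'static str`.
fn literal() -> &'static str {
    "Steve Austin"
}

/// Four ways to copy a `&str` into a new owned `String`.
fn owned_copies(text: &str) -> [String; 4] {
    let a = String::from(text);
    let b: String = text.into(); // the annotation tells `into()` what to produce
    let c = text.to_string();
    let d = text.to_owned();
    [a, b, c, d]
}

/// `into()` works here because the field type is known to be `String`.
fn make_person() -> Person {
    Person { name: literal().into() }
}
```

All four conversions allocate a fresh buffer and copy the text, so which one to use is mostly a matter of taste.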
If you want the best of both of these features, you can use Cow<str> which can encapsulate either an &str value, or a String, and you can convert it to an "owned" String via the into_owned() function, which only copies the data if the value was still "borrowed" (&str).
These are more exotic, though, so I'd recommend only using them when you know what you're doing and need the performance gains they can offer.
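For the curious, a minimal Cow<str> sketch might look like the following. The helper underscore is hypothetical; it returns the input untouched when nothing needs changing and allocates only otherwise.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces spaces with underscores, allocating only when needed.
fn underscore(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        // Something has to change, so build a new owned `String`.
        Cow::Owned(input.replace(' ', "_"))
    } else {
        // Nothing to change: hand back the borrowed input as-is.
        Cow::Borrowed(input)
    }
}
```

Callers can treat the result as a &str through deref, or call into_owned() when they need a String.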
--
1 Treat these like const char* in C++, versus std::string. The former is compiled into non-modifiable data in the executable, while the other uses a buffer that can be dynamically allocated.
In your example String::from() is not needed at all, it will work just fine without:
struct Person<'a> {
    name: &'a str
}
fn main() {
    let person = Person {
        name: "Steve Austin"
    };
    println!("Person name = {}", person.name);
}
But it really depends on if your program would ever need to change Person.name and if you think that the struct should hold the data itself ("own" the data).
A str is a fixed string, it is stored in your executable and can be referenced in your program with a pointer & as a &str pointer. But it can not be modified.
In your case, "Steve Austin" can be used in your program as a &str.
In your example the name property is of type String. The contents of a String can be modified by your program because it lives in its own place in memory. This is where ownership comes in.
Rust will make sure that the String which is held by the struct lives long enough, by implicitly deriving both the lifetime of the struct and that of the String. Notice that to Rust, those are two different things. In your example, the struct really holds the data itself and so there is no question about what the lengths of the lifetimes should be.
In my example, Rust needs to be told the separate lifetimes for both the struct and the &str. Rust needs the &str to live (be present in memory) at least as long as the struct, because of the memory safety Rust offers via its borrow checker. This way the struct can not have a &str pointing to invalid memory (something that is not under control).
At first you might think that String in Rust is a pointer. But that is not the case. To Rust, String represents the data itself. When you want to reference that data, with a promise not to make modifications to it, you use a & (pointer). If you do want to make changes, you need to tell Rust your intention, and use a &mut (pointer to mutable data).
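A small sketch of the & versus &mut distinction, with made-up function names (name_len, add_title):

```rust
/// Reads through a shared reference; taking `&str` also lets callers pass `&String`.
fn name_len(name: &str) -> usize {
    name.len()
}

/// Modifies the caller's `String` in place through a mutable reference.
fn add_title(name: &mut String) {
    name.insert_str(0, "Mr. ");
}
```

A caller would write name_len(&name) to lend the data read-only and add_title(&mut name) to allow changes; in both cases the caller keeps ownership of the String.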
The difference between str and String in Rust gives you precise control over your program and its performance.

Rust difference between creating variables [duplicate]

I started learning rust and I'm unsure what is the difference between creating variables like this
let mut word = String::new();
and
let mut word = "";
and also
what is a variable which has & in front of it? Thanks for your help, I hope it's not a very stupid question.
There are two different string-like types in Rust.
String is the type of owned strings. If you have a String, you have total ownership over it. You can modify it, append to it, empty it, and, yes, even drop it. You can pass ownership to someone else, you can let someone else borrow it, and so on. If you take a reference to a String, you get a &String (an immutable reference to a String). This type is fairly useless, and most style checkers will warn you about that. On the other hand, the mutable reference of String, denoted &mut String, can be useful since the borrower can append to the string without taking ownership of it.
str is a string slice. It's an unsized type, so it can't be used as a function argument or a structure component without being wrapped in a reference or a Box. Most commonly, you see it in the form &str, which is an immutably-borrowed string slice. Since this is a reference, it's sized and thus can be used in most conventional contexts.
String::new returns, as the name implies, a new owned String. A string literal such as "" has type &'static str, i.e. a borrowed string slice with static lifetime.
You can borrow a String as a &str with as_str, or with the Deref instance for String. The latter will often kick in automatically if you try to call a str method on a String. On the other hand, there's no way to treat a &str as a String, since the former is borrowed and the latter implies ownership, a stronger guarantee. If you want to go the other way, you need to copy the data from the &str into a new String. This can be done in a couple of ways, simply by using different traits.
String::from(my_str)
my_str.to_owned()
Personally, I use String::from on string literals (i.e. String::from("some constant")) and to_owned on variables, but that's by no means a universal rule.
In summary, String is the type of owned strings, and &str is the type of borrowed string slices. String literals in Rust have type &'static str.
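To illustrate the borrowing and copying directions described above, here is a short sketch; first_word and borrowed_then_owned are names invented for the example.

```rust
/// Returns the first whitespace-separated word, borrowing from the input.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

/// Borrows a `String` as `&str` in two ways, then copies the result back into a `String`.
fn borrowed_then_owned() -> String {
    let owned = String::from("hello world");
    let via_as_str = first_word(owned.as_str()); // explicit borrow
    let via_deref = first_word(&owned); // `&String` coerces to `&str` via `Deref`
    assert_eq!(via_as_str, via_deref);
    via_deref.to_owned() // going back to `String` requires copying the data
}
```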

What is the relationship between slices and references in Rust?

I am completely new to Rust (as in I just started looking at it yesterday), and am working my way through "The Rust Programming Language". I'm a little stuck on Chapters 4.2 (References and Borrowing) / 4.3 (The Slice Type) and am trying to solidify my initial understanding of references before I move on. I'm an experienced programmer whose background is mainly in C++ (I am intimately familiar with several languages, but C++ is what I'm most comfortable with).
Consider the following Rust code:
let string_obj: String = String::from("My String");
let string_ref: &String = &string_obj;
let string_slice: &str = &string_obj[1..=5];
Based on my understanding, from the first line, string_obj is an object of type String that is stored on the stack, which contains three fields: (1) a pointer to the text "My String", allocated on the heap, encoded in UTF-8; (2) A length field with value 9; (3) A capacity field with a value >= 9. That's straightforward enough.
From the second line, string_ref is an immutable reference to a String object, also stored on the stack, which contains a single field - a pointer to string_obj. This leads me to believe that (leaving aside ownership rules, semantics, and other things I am yet to learn about references), a reference is essentially a pointer to some other object. Again, pretty straightforward.
It's the third line which is causing me some headaches. From the documentation, it would appear that string_slice is an object of type &str that is stored on the stack, and contains two fields: 1) a pointer to the text "y Str", within the text "My String" associated with string_obj. 2) A length field with value 5.
But, by appearances at least, the &str type is by definition an immutable reference to an object of type str. So my questions are as follows:
What exactly is an str, and how is it represented in memory?
How does &str - a reference type, which I thought was simply a pointer - contain TWO fields (a pointer AND a length)?
How does Rust know in general what / how many fields to create when constructing a reference? (and consequently how does the programmer know?)
Slices are primitive types in Rust, which means that they don't necessarily have to follow the syntax rules of other types. In this case, str and &str are special and are treated with a bit of magic.
The type str doesn't really exist, since you can't have a slice that owns its contents. The reason for requiring us to spell this type "&str" is syntactic: the & reminds us that we're working with data borrowed from somewhere else, and it's required to be able to specify lifetimes, such as:
fn example<'a>(x: &String, y: &'a String) -> &'a str {
    &y[..]
}
It's also necessary so that we can differentiate between an immutably-borrowed string slice (&str) and a mutably-borrowed string slice (&mut str). (Though the latter are somewhat limited in their usefulness and so you don't see them that often.)
Note that the same thing applies to array slices. We have arrays like [u8; 16] and we have slices like &[u8] but we don't really directly interact with [u8]. Here the mutable variant (&mut [u8]) is more useful than with string slices.
What exactly is an str, and how is it represented in memory?
As per above, str kind-of doesn't really exist by itself. The layout of &str though is as you suspect -- a pointer and a length.
(str is the actual characters referred to by the slice, and is a so-called dynamically-sized type. In the general case, a &T can't exist without a T to refer to. In this case it's a bit backwards in that the str doesn't exist without the &str slice.)
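The pointer-plus-length layout can be observed with std::mem::size_of. This is only a sketch: the function names are made up, and the "twice the size of usize" result reflects what current compilers do rather than something you should rely on.

```rust
use std::mem::size_of;

/// Sizes in bytes of a thin reference, a string slice, and a byte slice.
fn reference_sizes() -> (usize, usize, usize) {
    (size_of::<&String>(), size_of::<&str>(), size_of::<&[u8]>())
}

/// Rebuilds the slice from the question and reports its contents and length.
fn slice_from_question() -> (String, usize) {
    let string_obj = String::from("My String");
    // Byte range 1..=5 of "My String" is "y Str".
    let string_slice: &str = &string_obj[1..=5];
    (string_slice.to_owned(), string_slice.len())
}
```

On today's compilers &String is one pointer wide, while &str and &[u8] are two: the data pointer and the length.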
How does &str - a reference type, which I thought was simply a pointer - contain TWO fields (a pointer AND a length)?
As a primitive, it's a special case handled by the compiler.
How does Rust know in general what / how many fields to create when constructing a reference? (and consequently how does the programmer know?)
If it's a non-slice reference, then it's either a pointer or it's nothing (if the reference itself can be optimized away).

Do I have to create distinct structs for both owned (easy-to-use) and borrowed (more efficient) data structures?

I have a Message<'a> which has &'a str references on a mostly short-lived buffer.
Those references mandate a specific program flow as they are guaranteed to never outlive the lifetime 'a of the buffer.
Now I also want to have an owned version of Message, such that it can be moved around, sent via threads, etc.
Is there an idiomatic way to achieve this? I thought that Cow<'a, str> might help, but unfortunately, Cow does not magically allocate in case &'a str would outlive the buffer's lifetime.
AFAIK, Cow is not special in the sense that no matter if Cow holds an Owned variant, it must still pass the borrow checker on 'a.
Definition of std::borrow::Cow.
pub enum Cow<'a, B> {
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}
Is there an idiomatic way to have an owned variant of Message? For some reason we have &str and String, &[u8] and Vec<u8>, ... does that mean people generally would go for &msg and Message?
I suppose I still have to think about if an owned variant is really, really needed, but my experience shows that having an escape hatch for owned variants generally improves prototyping speed.
Yes, you need to have multiple types, one representing the owned concept and one representing the borrowed concept.
You'll see the same technique throughout the standard library and third-party crates.
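One possible shape for such a pair is sketched below. Everything here is an assumption made for illustration: the field names, the OwnedMessage type, the "topic:body" wire format and the method names are not from the question.

```rust
/// Borrowed view into a buffer (field names are illustrative).
struct Message<'a> {
    topic: &'a str,
    body: &'a str,
}

/// Owned counterpart that can outlive the buffer and cross threads.
struct OwnedMessage {
    topic: String,
    body: String,
}

impl<'a> Message<'a> {
    /// Splits a hypothetical "topic:body" format without copying.
    fn parse(buffer: &'a str) -> Option<Message<'a>> {
        let (topic, body) = buffer.split_once(':')?;
        Some(Message { topic, body })
    }

    /// Copies the borrowed data into an owned message.
    fn to_owned_message(&self) -> OwnedMessage {
        OwnedMessage {
            topic: self.topic.to_owned(),
            body: self.body.to_owned(),
        }
    }
}

impl OwnedMessage {
    /// Cheap borrowed view, analogous to `String::as_str`.
    fn as_message(&self) -> Message<'_> {
        Message {
            topic: &self.topic,
            body: &self.body,
        }
    }
}
```

Functions that only read a message can then take a Message<'_>, and both kinds of caller can use them: borrowed messages directly, owned ones through as_message().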
See also:
How to abstract over a reference to a value or a value itself?
How to avoid writing duplicate accessor functions for mutable and immutable references in Rust?

Is &[T] literally an alias of Slice in rust?

&[T] is confusing me.
I naively assumed that like &T, &[T] was a pointer, which is to say, a numeric pointer address.
However, I've seen some code like this, that I was rather surprised to see work fine (simplified for demonstration purposes; but you see code like this in many 'as_slice()' implementations):
extern crate core;
extern crate collections;
use self::collections::str::raw::from_utf8;
use self::core::raw::Slice;
use std::mem::transmute;
fn main() {
    let val = "Hello World";
    {
        let value:&str;
        {
            let bytes = val.as_bytes();
            let mut slice = Slice { data: &bytes[0] as *const u8, len: bytes.len() };
            unsafe {
                let array:&[u8] = transmute(slice);
                value = from_utf8(array);
            }
            // slice.len = 0;
        }
        println!("{}", value);
    }
}
So.
I initially thought that this was invalid code.
That is, the instance of Slice created inside the block scope is returned to outside the block scope (by transmute), and although the code runs, the println! is actually accessing data that is no longer valid through unsafe pointers. Bad!
...but that doesn't seem to be the case.
Consider uncommenting the line // slice.len = 0;
This code still runs fine (prints 'Hello World') when this happens.
So the line...
value = from_utf8(array);
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
A slice &[T], as you have noticed, is "equivalent" to the structure std::raw::Slice. In fact, Slice is the internal representation of a &[T] value, and yes, it is a pointer and the length of the data behind that pointer. Sometimes such a structure is called a "fat pointer", that is, a pointer plus an additional piece of information.
When you pass &[T] value around, you indeed are just copying its contents - the pointer and the length.
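In current Rust the same copy semantics can be sketched without any unsafe code; copy_and_narrow is a made-up name.

```rust
/// Copies a slice reference, narrows the copy, and reports both lengths.
fn copy_and_narrow(original: &[i32]) -> (usize, usize) {
    // Copies the pointer and the length; `&[T]` is `Copy`.
    let copy: &[i32] = original;
    // Re-slicing builds a new fat pointer with a smaller length.
    let narrowed = &copy[..copy.len() / 2];
    // The original reference is unaffected by what happened to its copy.
    (original.len(), narrowed.len())
}
```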
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
So, yes, exactly.
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
And this is also true. That's the whole idea of borrowed references, including slices - borrowed references are statically checked to be used as long as their referent is alive. When DST finally lands, slices and regular references will be even more unified.
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
And this is actually an absolutely valid concern; it is one of the problems with aliasing. However, Rust is designed exactly to prevent such bugs. There are two things which render aliasing of slices valid.
First, slices can't change length; there are no methods defined on &[T] which would allow you to change its length in place. You can create a derived slice from a slice, but it will be an entirely new object.
But even if slices can't change length, if the data could be mutated through them, they still could bring disaster if aliased. For example, if values in slices are enum instances, mutating a value in such an aliased slice could make a pointer to internals of enum value contained in this slice invalid. So, second, Rust aliasable slices (&[T]) are immutable. You can't change values contained in them and you can't take mutable references into them.
These two features (and compiler checks for lifetimes) make aliasing of slices absolutely safe. However, sometimes you do need to modify the data in a slice. And then you need a mutable slice, called &mut [T]. You can change your data through such a slice; but these slices are not aliasable. You can't create two overlapping mutable slices into the same structure (an array, for example), so you can't do anything dangerous.
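A short sketch of both rules in today's Rust (function names are invented): shared slices may overlap freely, while mutable access has to be split into disjoint parts, for example with split_at_mut.

```rust
/// Any number of shared slices may overlap, because none of them can mutate.
fn overlapping_sums(data: &[i32]) -> (i32, i32) {
    let all = &data[..];
    // Overlaps `all`; `get` avoids a panic when `data` is empty.
    let tail = data.get(1..).unwrap_or(&[]);
    (all.iter().sum(), tail.iter().sum())
}

/// Mutable slices must not overlap; `split_at_mut` hands out two disjoint halves.
fn swap_ends(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    // Each half is mutated independently, so no element is reachable twice.
    if let (Some(first), Some(last)) = (left.first_mut(), right.last_mut()) {
        std::mem::swap(first, last);
    }
}
```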
Note, however, that using transmute() to transform a slice into a Slice or vice versa is an unsafe operation. &[T] is guaranteed statically to be correct if you create it using right methods, like calling as_slice() on a Vec. However, creating it manually using Slice struct and then transmuting it into &[T] is error-prone and can easily segfault your program, for example, when you assign it more length than is actually allocated.
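For readers on current Rust, where the raw Slice struct from the question is not available, a roughly equivalent experiment can be written with std::slice::from_raw_parts. This is a sketch under that assumption, and rebuild is a hypothetical name.

```rust
/// Takes a `&str` apart into pointer and length, then rebuilds it.
fn rebuild(text: &str) -> &str {
    let bytes: &[u8] = text.as_bytes();
    let (data, len) = (bytes.as_ptr(), bytes.len());
    // SAFETY: `data` and `len` come from a live slice that stays borrowed
    // for as long as the returned reference is in use.
    let rebuilt: &[u8] = unsafe { std::slice::from_raw_parts(data, len) };
    // Re-validate rather than assume the bytes are UTF-8.
    std::str::from_utf8(rebuilt).expect("bytes came from a valid &str")
}
```

Passing a len larger than the real allocation would be undefined behavior, which is exactly the kind of mistake the safe slicing methods rule out.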

Resources