Is &[T] literally an alias of Slice in Rust?

&[T] is confusing me.
I naively assumed that like &T, &[T] was a pointer, which is to say, a numeric pointer address.
However, I've seen some code like this, that I was rather surprised to see work fine (simplified for demonstration purposes; but you see code like this in many 'as_slice()' implementations):
extern crate core;
extern crate collections;
use self::collections::str::raw::from_utf8;
use self::core::raw::Slice;
use std::mem::transmute;
fn main() {
let val = "Hello World";
{
let value:&str;
{
let bytes = val.as_bytes();
let mut slice = Slice { data: &bytes[0] as *const u8, len: bytes.len() };
unsafe {
let array:&[u8] = transmute(slice);
value = from_utf8(array);
}
// slice.len = 0;
}
println!("{}", value);
}
}
So.
I initially thought that this was invalid code.
That is, the instance of Slice created inside the block scope is returned to outside the block scope (by transmute), and although the code runs, the println! is actually accessing data that is no longer valid through unsafe pointers. Bad!
...but that doesn't seem to be the case.
Consider uncommenting the line // slice.len = 0;
This code still runs fine (prints 'Hello World') when this happens.
So the line...
value = from_utf8(array);
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)

A slice &[T], as you have noticed, is "equivalent" to the structure std::raw::Slice. In fact, Slice is the internal representation of a &[T] value, and yes, it is a pointer plus the length of the data behind that pointer. Such a structure is sometimes called a "fat pointer", that is, a pointer together with an additional piece of information.
When you pass &[T] value around, you indeed are just copying its contents - the pointer and the length.
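As a small illustration of that copy behaviour, here is a sketch in current Rust. The helper name parts is made up for this example; it only reads back the two pieces of information a slice reference carries.

```rust
// `parts` is a hypothetical helper: it returns the data pointer and the
// length that a `&[u8]` carries.
fn parts(s: &[u8]) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let data = vec![10u8, 20, 30, 40];
    let a: &[u8] = &data;

    // `&[T]` is Copy: this copies the pointer and the length, and `a`
    // remains usable afterwards.
    let b = a;
    assert_eq!(parts(a), parts(b));

    // A derived slice is a new pointer/length pair into the same data;
    // the original slice keeps its own length.
    let c = &a[1..3];
    assert_eq!(c.to_vec(), vec![20u8, 30]);
    assert_eq!(a.len(), 4);
}
```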
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
So, yes, exactly.
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
And this is also true. That's the whole idea of borrowed references, including slices: they are statically checked to be used only as long as their referent is alive. When DST finally lands, slices and regular references will be even more unified.
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
And this is actually an absolutely valid concern; it is one of the problems with aliasing. However, Rust is designed exactly to prevent such bugs. There are two things which render aliasing of slices valid.
First, slices can't change length; there are no methods defined on &[T] that would let you change its length in place. You can create a derived slice from a slice, but that will be an entirely new object.
But even if slices can't change length, if the data could be mutated through them, they still could bring disaster if aliased. For example, if values in slices are enum instances, mutating a value in such an aliased slice could make a pointer to internals of enum value contained in this slice invalid. So, second, Rust aliasable slices (&[T]) are immutable. You can't change values contained in them and you can't take mutable references into them.
These two features (and compiler checks for lifetimes) make aliasing of slices absolutely safe. However, sometimes you do need to modify the data in a slice. For that you need a mutable slice, written &mut [T]. You can change your data through such a slice, but these slices are not aliasable. You can't create two mutable slices into the same structure (an array, for example), so you can't do anything dangerous.
Note, however, that using transmute() to transform a slice into a Slice or vice versa is an unsafe operation. &[T] is guaranteed statically to be correct if you create it using right methods, like calling as_slice() on a Vec. However, creating it manually using Slice struct and then transmuting it into &[T] is error-prone and can easily segfault your program, for example, when you assign it more length than is actually allocated.
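On current Rust, where std::raw::Slice is no longer available, the same "pointer plus length" construction can be sketched with the standard-library function std::slice::from_raw_parts. The function name rebuild below is made up for illustration, and the example is only sound because the pointer and length are taken from an existing valid slice.

```rust
use std::slice;
use std::str;

// Hypothetical helper: takes a slice apart into pointer and length and
// puts it back together again.
fn rebuild(bytes: &[u8]) -> &[u8] {
    let ptr = bytes.as_ptr();
    let len = bytes.len();
    // SAFETY: `ptr` and `len` come from a valid slice, and the returned
    // slice is tied to the same lifetime by the function signature.
    // Passing a larger `len` here would be undefined behaviour.
    unsafe { slice::from_raw_parts(ptr, len) }
}

fn main() {
    let val = "Hello World";
    let rebuilt = rebuild(val.as_bytes());
    let text = str::from_utf8(rebuilt).expect("input was valid UTF-8");
    println!("{}", text);
}
```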

Related

Idiomatic way to pass "out slice" to C function?

Some Windows APIs take a slice as a parameter and write to it before returning. It's similar to an out pointer but in slice form (so that the caller doesn't need to pass an extra "length" parameter).
In cases of out pointer, I've been using MaybeUninit, which I think is the idiomatic way in Rust. However, I do not know how to use it in case of slices.
For example, many examples suggest to declare [MaybeUninit<u16>; 32], but how do I pass it to a function that accepts only &mut [u16]? I tried MaybeUninit<[u16; 32]>, but there is no way to get an uninitialized &mut T out of MaybeUninit. There is only as_mut_ptr, which is pointer, not slice.
Am I supposed to stick to let x: [u16; 32] = zeroed(); at the moment?
Creating a reference to uninitialized memory is Undefined Behavior, even if the memory contents are never read. Quoting the reference, "Behavior considered undefined" (emphasis mine):
...
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
...
A reference or Box<T> that is dangling, unaligned, or points to an invalid value.
...
Note: Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in "padding" (the gaps between the fields/elements of a type).
It is not obvious this is UB: there is an active discussion about that (part of the counterarguments is to allow things like your case). However, currently it is considered UB and you should avoid it.
(Note: the "has a restricted set of valid values" is somewhat ambiguous. Whether primitive types like integers are allowed to contain uninitialized bits is also a matter of active discussion, but in this case too you should avoid it until it is settled. You may claim the reference does not agree with that, because integers do not have a restricted set of values, but this is false: one can view the set of possible values for a byte as 0-255 plus the uninit byte, and in fact this is an interpretation used in many places. Integers cannot contain the uninit byte and so have a restricted set of values).
Initializing the array using mem::zeroed() is sound but uses unsafe for no reason: you can just use an initialized array, i.e. [0; size], and it will be just as performant.
You do not canonically need MaybeUninit, you can fill a buffer array with zeroes yourself:
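// The buffer must be declared `mut` to be borrowed as `&mut`.
let mut x: [u16; 32] = [0u16; 32];
// No semicolon inside the block, so `res` receives the call's return value.
let res = unsafe { GetClassNameW(param0, &mut x) };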
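The zero-initialized buffer pattern can be tried without any Windows bindings. In the sketch below, write_name is a made-up stand-in for an "out slice" API such as GetClassNameW; it is assumed, for illustration only, to return the number of UTF-16 units written.

```rust
// Hypothetical stand-in for an API that fills a caller-provided slice
// and returns how many UTF-16 units it wrote.
fn write_name(buf: &mut [u16]) -> usize {
    let mut written = 0;
    for (slot, unit) in buf.iter_mut().zip("Button".encode_utf16()) {
        *slot = unit;
        written += 1;
    }
    written
}

fn main() {
    // A fully initialized buffer: no unsafe and no MaybeUninit needed.
    let mut buf = [0u16; 32];
    let len = write_name(&mut buf);

    // Only the written prefix is meaningful.
    let name = String::from_utf16_lossy(&buf[..len]);
    println!("{}", name);
}
```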

What is the relationship between slices and references in Rust?

I am completely new to Rust (as in I just started looking at it yesterday), and am working my way through "The Rust Programming Language". I'm a little stuck on Chapters 4.2 (References and Borrowing) / 4.3 (The Slice Type) and am trying to solidify my initial understanding of references before I move on. I'm an experienced programmer whose background is mainly in C++ (I am intimately familiar with several languages, but C++ is what I'm most comfortable with).
Consider the following Rust code:
let string_obj: String = String::from("My String");
let string_ref: &String = &string_obj;
let string_slice: &str = &string_obj[1..=5];
Based on my understanding, from the first line, string_obj is an object of type String that is stored on the stack, which contains three fields: (1) a pointer to the text "My String", allocated on the heap, encoded in UTF-8; (2) A length field with value 9; (3) A capacity field with a value >= 9. That's straightforward enough.
From the second line, string_ref is an immutable reference to a String object, also stored on the stack, which contains a single field - a pointer to string_obj. This leads me to believe that (leaving aside ownership rules, semantics, and other things I am yet to learn about references), a reference is essentially a pointer to some other object. Again, pretty straightforward.
It's the third line which is causing me some headaches. From the documentation, it would appear that string_slice is an object of type &str that is stored on the stack, and contains two fields: 1) a pointer to the text "y Str", within the text "My String" associated with string_obj. 2) A length field with value 5.
But, by appearances at least, the &str type is by definition an immutable reference to an object of type str. So my questions are as follows:
What exactly is an str, and how is it represented in memory?
How does &str - a reference type, which I thought was simply a pointer - contain TWO fields (a pointer AND a length)?
How does Rust know in general what / how many fields to create when constructing a reference? (and consequently how does the programmer know?)
Slices are primitive types in Rust, which means that they don't necessarily have to follow the syntax rules of other types. In this case, str and &str are special and are treated with a bit of magic.
The type str doesn't really exist, since you can't have a slice that owns its contents. The reason for requiring us to spell this type "&str" is syntactic: the & reminds us that we're working with data borrowed from somewhere else, and it's required to be able to specify lifetimes, such as:
fn example<'a>(x: &String, y: &'a String) -> &'a str {
&y[..]
}
It's also necessary so that we can differentiate between an immutably-borrowed string slice (&str) and a mutably-borrowed string slice (&mut str). (Though the latter are somewhat limited in their usefulness and so you don't see them that often.)
Note that the same thing applies to array slices. We have arrays like [u8; 16] and we have slices like &[u8] but we don't really directly interact with [u8]. Here the mutable variant (&mut [u8]) is more useful than with string slices.
What exactly is an str, and how is it represented in memory?
As per above, str kind-of doesn't really exist by itself. The layout of &str though is as you suspect -- a pointer and a length.
(str is the actual characters referred to by the slice, and is a so-called dynamically-sized type. In the general case, a &T can't exist without a T to refer to. In this case it's a bit backwards in that the str doesn't exist without the &str slice.)
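The layout described above can be checked with a short sketch built on the question's own three lines. The helper name slice_of is made up here, and the size assertions reflect how current compilers lay out these references rather than something this answer states as a guarantee.

```rust
use std::mem::size_of;

// Hypothetical helper: the same slicing expression as in the question.
fn slice_of(s: &String) -> &str {
    &s[1..=5]
}

fn main() {
    let string_obj: String = String::from("My String");
    let string_ref: &String = &string_obj;
    let string_slice: &str = slice_of(string_ref);

    // A reference to a sized type is one pointer wide ...
    assert_eq!(size_of::<&String>(), size_of::<usize>());
    // ... while a string slice carries a pointer and a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // The slice points one byte into the String's heap buffer.
    assert_eq!(string_slice, "y Str");
    assert_eq!(string_slice.len(), 5);
    assert_eq!(string_slice.as_ptr(), string_obj.as_ptr().wrapping_add(1));
}
```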
How does &str - a reference type, which I thought was simply a pointer - contain TWO fields (a pointer AND a length)?
As a primitive, it's a special case handled by the compiler.
How does Rust know in general what / how many fields to create when constructing a reference? (and consequently how does the programmer know?)
If it's a non-slice reference, then it's either a pointer or it's nothing (if the reference itself can be optimized away).

Is Vec<&&str> the same as Vec<&str>?

I'm learning Rust and I'm trying to solve an advent of code challenge (day 9 2015).
I created a situation where I end up with a variable that has the type Vec<&&str> (note the double '&', it's not a typo). I'm now wondering if this type is different than Vec<&str>. I can't figure out if a reference to a reference to something would ever make sense. I know I can avoid this situation by using String for the from and to variables. I'm asking if Vec<&&str> == Vec<&str> and if I should try and avoid Vec<&&str>.
Here is the code that triggered this question:
use itertools::Itertools;
use std::collections::HashSet;
use std::fs;
fn main() {
let contents = fs::read_to_string("input.txt").unwrap();
let mut vertices: HashSet<&str> = HashSet::new();
for line in contents.lines() {
let data: Vec<&str> = line.split(" ").collect();
let from = data[0];
let to = data[2];
vertices.insert(from);
vertices.insert(to);
}
// `Vec<&&str>` originates from here
let permutations_iter = vertices.iter().permutations(vertices.len());
for perm in permutations_iter {
let length_trip = compute_length_of_trip(&perm);
}
}
fn compute_length_of_trip(trip: &Vec<&&str>) -> u32 {
...
}
Are Vec<&str> and Vec<&&str> different types?
I'm now wondering if this type is different than Vec<&str>.
Yes, a Vec<&&str> is a type different from Vec<&str> - you can't pass a Vec<&&str> where a Vec<&str> is expected and vice versa. Vec<&str> stores string slice references, which you can think of as pointers to data inside some strings. Vec<&&str> stores references to such string slice references, i.e. pointers to pointers to data. With the latter, accessing the string data requires an additional indirection.
However, Rust's auto-dereferencing makes it possible to use a Vec<&&str> much like you'd use a Vec<&str> - for example, v[0].len() will work just fine on either, v[some_idx].chars() will iterate over chars with either, and so on. The only difference is that Vec<&&str> stores the data more indirectly and therefore requires a bit more work on each access, which can lead to slightly less efficient code.
Note that you can always convert a Vec<&&str> to Vec<&str> - but since doing so requires allocating a new vector, if you decide you don't want Vec<&&str>, it's better not to create it in the first place.
Can I avoid Vec<&&str> and how?
Since &str is Copy, you can avoid creating a Vec<&&str> by adding .copied() when you iterate over vertices, i.e. change vertices.iter() to vertices.iter().copied(). If you don't need vertices to stick around, you can also use vertices.into_iter(), which yields &str and also frees the vertices set as soon as the iteration is done.
The reason why the additional reference arises and the ways to avoid it have been covered on StackOverflow before.
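A minimal sketch of the difference, using only the standard library (no itertools). The function names by_reference and by_value are made up for this example.

```rust
use std::collections::HashSet;

// Iterating by reference yields `&&str`: each item borrows from the set.
fn by_reference<'s, 'a>(set: &'s HashSet<&'a str>) -> Vec<&'s &'a str> {
    set.iter().collect()
}

// `.copied()` copies each `&str` out, so the result borrows only from
// the underlying string data, not from the set itself.
fn by_value<'a>(set: &HashSet<&'a str>) -> Vec<&'a str> {
    set.iter().copied().collect()
}

fn main() {
    let line = String::from("London to Dublin = 464");
    let data: Vec<&str> = line.split(' ').collect();
    let vertices: HashSet<&str> = vec![data[0], data[2]].into_iter().collect();

    let refs: Vec<&&str> = by_reference(&vertices);
    let mut names: Vec<&str> = by_value(&vertices);
    names.sort();

    // Auto-dereferencing makes method calls look the same on both.
    assert_eq!(refs[0].len(), 6);
    assert_eq!(names, vec!["Dublin", "London"]);
}
```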
Should I avoid Vec<&&str>?
There is nothing inherently wrong with Vec<&&str> that would require one to avoid it. In most code you'll never notice the difference in efficiency between Vec<&&str> and Vec<&str>. Having said that, there are some reasons to avoid it beyond performance in microbenchmarks. The additional indirection in Vec<&&str> requires the exact &strs it was created from (and not just the strings that own the data) to stick around and outlive the new collection. This is not relevant in your case, but would become noticeable if you wanted to return the permutations to the caller that owns the strings. Also, there is value in the simpler type that doesn't accumulate a reference on each transformation. Just imagine needing to transform the Vec<&&str> further into a new vector - you wouldn't want to deal with Vec<&&&str>, and so on for every new transformation.
Regarding performance, less indirection is usually better since it avoids an extra memory access and increases data locality. However, one should also note that a Vec<&str> takes up 16 bytes per element (on 64-bit architectures) because a slice reference is represented by a "fat pointer", i.e. a pointer/length pair. A Vec<&&str> (as well as Vec<&&&str> etc.) on the other hand takes up only 8 bytes per element, because a reference to a fat reference is represented by a regular "thin" pointer. So if your vector measures millions of elements, a Vec<&&str> might be more efficient than Vec<&str> simply because it occupies less memory. As always, if in doubt, measure.
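The per-element sizes mentioned above can be measured directly. This is a sketch only: it prints the sizes for the target it is compiled for, and the helper name element_size is made up.

```rust
use std::mem::size_of;

// Hypothetical helper: size in bytes of one element of a Vec<T>.
fn element_size<T>() -> usize {
    size_of::<T>()
}

fn main() {
    // On a 64-bit target this is expected to report 16 and 8.
    println!("&str:  {} bytes", element_size::<&str>());
    println!("&&str: {} bytes", element_size::<&&str>());

    // A reference to a fat pointer is itself a thin pointer.
    assert_eq!(element_size::<&&str>(), size_of::<usize>());
    assert_eq!(element_size::<&str>(), 2 * size_of::<usize>());
}
```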
The reason you have &&str is that the &str data is owned by vertices, and when you create an iterator over that data you simply get references to it, hence the &&str.
There's really nothing to avoid here. It simply shows your iterator references the data that is inside the HashSet.

Is it possible to create a Rust function that inputs and outputs an edited value, without killing its lifetime?

I need a function that accepts a vector without killing its lifetime, while still being able to modify it (so no reference).
The modification in this case is the following: You have a vector of a certain structure, which has a children attribute that is a vector of this same type. The function takes the last struct of the vector, goes into its children, again takes the last element of the children vector, goes into its children again, and so on, n times. Then the function returns the n-th level child.
How would I go about making a compilable code like following pseudo code?
fn g(vector: Vec<...>, n: numeric input) {
let temporary;
n times {
temporary = vector.last().unwrap().children;
}
return temporary;
}
let vec = Vec // Existing Vector
g(vec).push() // pushes into a certain child element of the vec element, this child is received over the function.
However, the above won't work: by giving the vector to the function, its ownership is transferred to the function and its lifetime ends there.
(This is all very difficult to do correctly without an M.R.E. or even a copypaste of your compiler-errors - thus I'll proceed via theoretical analysis and educated guessing.)
Even though you pass in the tokens via &mut, that gets automagically converted into a nonexclusive & reference because that's what's in the signature. You can't then get an exclusive reference from that, only a mutable name that holds nonexclusive references.
You need an exclusive reference in order to do that push operation - the "many readers XOR one writer" principle requires that.
Once you have outermost_vector as an exclusive reference, you then need to assign more exclusive references to it. last returns nonexclusive references, so that's out. But since outermost_vector is now an exclusive reference, you can call slice::last_mut on it. Then - assuming all of this is even possible in the first place - you just need to follow the compiler-errors to fix up other minor things such as the intricacies of correctly pattern-matching the intermediate values.
You also need to fix the function's overall typing; Vec::push returns (), which thankfully is not a member of the type Vec<Token>.
And as to the title of your question: since you're taking an &'a mut Vec<Token>, if you want to return some data borrowed from that then you should type your return as &'a mut Vec<Token>, not Vec<Token> over which your function has no ownership.
(Question was edited significantly, new answer follows:)
Your function must accept vector not by value, but by exclusive reference (spelled &mut rather than &). Then you can return another exclusive reference, one which is borrowed from the exclusive reference that was passed-in, but points directly to the desired receiver of push. The function's type signature would be something like fn g(vector: &mut Vec<T>, n: usize) -> &mut Vec<T>.
Passing by value would cause the function to take ownership of the whole thing - all owned data not returned are dropped, and this is probably not what you want.
Passing by nonexclusive reference is not what you want either, since there is no way to then convert that back to an exclusive reference with which to call push - even if you're back at the callsite where you have ownership of the data from which the reference is borrowed-from.
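A sketch of that signature in action. The Node struct and its children field are assumptions standing in for the question's unnamed structure, and the function panics if a visited level is empty, which the question's unwrap() would also do.

```rust
// Hypothetical tree node; the question does not name its type.
#[derive(Debug, Default)]
struct Node {
    children: Vec<Node>,
}

// Walks `n` levels down, always following the last element, and returns
// an exclusive reference to the children vector reached.
fn g(vector: &mut Vec<Node>, n: usize) -> &mut Vec<Node> {
    let mut current = vector;
    for _ in 0..n {
        current = &mut current
            .last_mut()
            .expect("every visited level must have at least one element")
            .children;
    }
    current
}

fn main() {
    let mut vec = vec![Node {
        children: vec![Node::default()],
    }];

    // The caller keeps ownership of `vec`; only a borrow is passed in.
    g(&mut vec, 2).push(Node::default());

    assert_eq!(vec[0].children[0].children.len(), 1);
}
```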

Why calling iter on array of floats gives "temporary value is freed" when using f64::from_bits?

I encountered a lifetime error which I am not able to explain why it is emitted by the compiler. I need this (which works fine):
fn iter<'a>() -> impl Iterator<Item = &'a f64> {
[3.14].iter()
}
However, when I try to use a float value which is transmuted from a specific byte representation using from_bits, like this:
fn iter<'a>() -> impl Iterator<Item = &'a f64> {
[f64::from_bits(0x7fffffffffffffff)].iter()
}
it gives me "creates a temporary which is freed while still in use". Playground here (stable 1.45.2).
My reasoning was that since f64 is Copy type (and it indeed works as expected if I use a constant value), this should work because no freeing should be performed on this value.
So the question is why the compiler emits the error in the second case?
Thanks for any pointers and explanations!
P.S. I need the iterator over references because it fits nicely with my other API.
Your code has two related problems:
You return a reference to a temporary object that goes out of scope at the end of the function body.
Your return type contains a lifetime parameter that isn't bound to any function input.
References can only live as long as the data they point to. The result of the method call f64::from_bits(0x7fffffffffffffff) is a temporary object which goes out of scope at the end of the expression. Returning a reference to a temporary value or a local variable from a function is not possible, since the value referred to won't be alive anymore once the function returns.
The Copy trait, or whether the object is stored on the heap, is completely unrelated to the fact that values that go out of scope can no longer be referred to. Any value created inside a function will go out of scope at the end of the function body, unless you move it out of the function via the return value. However, you need to move ownership of the value for this to work – you can't simply return a reference.
Since you can't return a reference to any value created inside your function, any reference in the return value necessarily needs to refer to something that was passed in via the function parameters. This implies that the lifetime of any reference in the return value needs to match the lifetime of some reference that was passed to the function. Which gets us to the second point – a lifetime parameter that only occurs in the return type is always an error. The lifetime parameter is chosen by the calling code, so you are essentially saying that your function returns a reference that lives for an arbitrary time chosen by the caller, which is only possible if the reference refers to static data that lives as long as the program.
This also gives an explanation why your first example works. The literal [3.14] defines a constant static array. This array will live as long as the program, so you can return references with arbitrary lifetime to it. However, you'd usually express this by explicitly specifying the static lifetime to make clear what is happening:
fn iter() -> impl Iterator<Item = &'static f64> {
[3.14].iter()
}
A lifetime parameter that only occurs in the return value isn't ever useful.
So how do you fix your problem? You probably need to return an iterator over an owned type, since your iter() function doesn't accept any arguments.
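One way to sketch the owned-iterator approach is with std::iter::once. The name iter_owned is made up for this example, and changing the item type from &f64 to f64 is an assumption about what the surrounding API can accept.

```rust
use std::iter;

// Returns the value itself rather than a reference, so nothing borrowed
// from the function body has to outlive the call.
fn iter_owned() -> impl Iterator<Item = f64> {
    iter::once(f64::from_bits(0x7fffffffffffffff))
}

fn main() {
    for x in iter_owned() {
        println!("is NaN: {}", x.is_nan());
    }
}
```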
f64 is Copy and is semantically copied, and that is exactly the problem: the value semantically exists on the stack only until the function returns. After that, the reference would point to memory that is no longer part of a live stack frame and is very likely to be overwritten soon; if Rust allowed this, the result would be undefined behaviour.
On top of that, from_bits isn't const (at least in the Rust version used here), which means the conversion cannot be done at compile time; it is a runtime operation. Why convert each time when you already know the value?
Why is this the case anyway?
from_bits:
This is currently identical to transmute::<u64, f64>(v) on all platforms.
If you take a look at transmute, you will find:
transmute is semantically equivalent to a bitwise move of one type into another. It copies the bits from the source value into the destination value, then forgets the original. It's equivalent to C's memcpy under the hood, just like transmute_copy.
While generated code may indeed just be a simple reinterpretation of static value, Rust cannot semantically allow this value to be moved onto stack, then dropped, while a reference is still pointing to it.
Solution.
Since you want to return a NaN, you should just do the same thing you did in the first example:
fn iter<'a>() -> impl Iterator<Item = &'a f64> {
[f64::NAN].iter()
}
This will iterate over a static slice directly, and there will be no issues.

Resources