Idiomatic way to pass "out slice" to C function? - rust

Some Windows APIs take a slice as a parameter and write to it before returning. It's similar to an out pointer, but in slice form (so that the caller doesn't need to pass an extra "length" parameter).
In cases of out pointer, I've been using MaybeUninit, which I think is the idiomatic way in Rust. However, I do not know how to use it in case of slices.
For example, many examples suggest declaring [MaybeUninit<u16>; 32], but how do I pass it to a function that accepts only &mut [u16]? I tried MaybeUninit<[u16; 32]>, but there is no way to get an uninitialized &mut T out of MaybeUninit. There is only as_mut_ptr, which returns a pointer, not a slice.
Am I supposed to stick to let x: [u16; 32] = zeroed(); at the moment?

Creating a reference to uninitialized memory is Undefined Behavior, even if the memory contents are never read. Quoting the Reference, "Behavior considered undefined" (emphasis mine):
...
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
...
A reference or Box<T> that is dangling, unaligned, or points to an invalid value.
...
Note: Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in "padding" (the gaps between the fields/elements of a type).
It is not obvious that this is UB, and there is an active discussion about it (part of the motivation for the counterarguments is to allow cases like yours). However, it is currently considered UB and you should avoid it.
(Note: "has a restricted set of valid values" is somewhat ambiguous. Whether primitive types like integers are allowed to contain uninitialized bits is also a matter of active discussion, but in this case too you should avoid it until it is settled. You may argue that the Reference does not agree with that, because integers do not have a restricted set of values, but this is false: one can view the set of possible values for a byte as 0-255 plus the uninit byte, and in fact this interpretation is used in many places. Integers cannot contain the uninit byte, so they do have a restricted set of values.)
Initializing the array using mem::zeroed() is sound but uses unsafe for no reason: you can just use an initialized array, i.e. [0; N], and it will be just as performant.
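If you do want to skip the initialization, the sound route is to never create a &mut [u16] over the uninitialized buffer and to hand the callee a raw pointer plus a length instead. That only works when you can call a variant of the function that takes a raw pointer (as the underlying C function does); a slice-taking binding can't be used this way. A minimal sketch, where fill_utf16 is a hypothetical stand-in for such a C function:

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a C function that writes up to `len` UTF-16
/// code units through `buf` and returns how many it wrote.
unsafe fn fill_utf16(buf: *mut u16, len: usize) -> usize {
    let src: Vec<u16> = "Button".encode_utf16().collect();
    let n = src.len().min(len);
    for (i, &unit) in src.iter().take(n).enumerate() {
        // Writing through a raw pointer never reads or references the old contents.
        unsafe { buf.add(i).write(unit) };
    }
    n
}

fn class_name() -> String {
    // Uninitialized storage; no &mut [u16] over it is ever created.
    let mut buf = [MaybeUninit::<u16>::uninit(); 32];
    let capacity = buf.len();
    let written = unsafe { fill_utf16(buf.as_mut_ptr().cast::<u16>(), capacity) };
    // SAFETY: fill_utf16 initialized exactly the first `written` elements,
    // so only that prefix is viewed as &[u16].
    let init: &[u16] = unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<u16>(), written) };
    String::from_utf16_lossy(init)
}

fn main() {
    println!("{}", class_name());
}
```

The extra unsafe and bookkeeping are the price of skipping a 64-byte zero fill, which is rarely worth it.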

You don't need MaybeUninit here; you can fill a buffer array with zeroes yourself:
let mut x: [u16; 32] = [0u16; 32];
let res = unsafe { GetClassNameW(param0, &mut x) };
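A sketch of the zero-initialized approach end to end, assuming the binding returns how many UTF-16 units it wrote (GetClassNameW reports the number of characters copied). get_class_name below is a hypothetical stand-in so the example runs without Windows:

```rust
/// Hypothetical stand-in for a slice-taking binding such as GetClassNameW:
/// writes UTF-16 into `buf` and returns the number of code units written.
fn get_class_name(buf: &mut [u16]) -> usize {
    let mut n = 0;
    for (slot, unit) in buf.iter_mut().zip("Edit".encode_utf16()) {
        *slot = unit;
        n += 1;
    }
    n
}

fn main() {
    // A zero-initialized buffer: no unsafe needed, no uninitialized memory.
    let mut buf = [0u16; 32];
    let written = get_class_name(&mut buf);
    // Only the first `written` units are meaningful.
    let name = String::from_utf16_lossy(&buf[..written]);
    println!("{name}");
}
```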


How does having multiple shared references which value can be accessed just works in Rust?

I'm new to Rust and already have read "the book" but I'm trying to understand the inners of references in the language.
As far as I know a reference is a type of pointer that takes you to some value in memory when dereferenced.
let x = 5;
let y = &x;
In this example y is a pointer to the memory address of x, and so the type and value of y are not equal to the type and value of x.
assert_eq!(y, x);
// fails to compile: no implementation for `&{integer} == {integer}`
So a reference is not the same as the value it refers to.
But if I dereference y using the * operator, I do get the value it refers to, and so the following code compiles.
assert_eq!(*y, x);
To access the value the reference points to, dereferencing is needed; but dereferencing implies moving the ownership of the referenced value to the new variable.
#[derive(Debug)]
struct Point { x: i32, y: i32 }
let x = Point { x: 1, y: 2 };
let y = &x;
let z = *y;
// fails to compile: move occurs because `*y` has type `Point`, which does not implement the `Copy` trait
By implementing the Copy trait for the type (Point in this case) the problem can be solved by letting Rust create a copy of the value and move the ownership of the copied value to the new variable.
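A minimal sketch of that fix. The question doesn't show Point's definition, so two i32 fields are assumed here; note that Copy also requires Clone:

```rust
// Copy requires Clone; PartialEq and Debug are only here for the assertion.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let x = Point { x: 1, y: 2 };
    let y = &x;
    let z = *y; // copies the Point instead of moving it
    // `x` and `y` are still usable afterwards.
    assert_eq!(z, x);
    println!("{z:?} {y:?}");
}
```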
The final question is, how can Rust access a value of a type that does not implement the Copy or Clone traits and is behind a reference without having the dereference (*) function take ownership of the value, thus making other shared references to the original value invalid?
E.g. (works just fine)
let x = Point {x:1, y:2};
let y = &x;
let z = &x;
fn print_point(a: &Point) {
    println!("{a:#?}");
}
println!("Printing y");
print_point(y);
println!("Printing x");
println!("{x:#?}");
println!("Printing z");
print_point(z);
The ownership semantics of Rust are more complex than simply giving ownership or denying access.
First, there is a little background to explain about pointers in Rust. A pointer is just an integer that indicates an address in memory. For beginners it's usually easiest to think of it as an arrow pointing to a location in memory. An arrow is not what it points to, but given the arrow it's easy to get the value it points to (just look where it points).
All pointer-likes in Rust are either just that arrow, or that arrow plus a little more information about the value (so that you don't have to read the value to know that information). By pointer-like I mean anything that behaves like a pointer, that is, anything that can be dereferenced (to keep it simple).
For example, the borrow of x (&x) is not exactly a pointer ("actual" pointers in Rust, i.e. raw pointers of type *const T or *mut T, are used less often than borrows); it's a pointer-like. This pointer-like is just an integer: once your program is compiled, you couldn't tell the difference between a borrow and a pointer. However, unlike "just a number", a borrow comes with additional constraints. For instance, it's never a null pointer (a pointer to the very first position of memory; by convention, and for technical reasons concerning the allocator, this area is never allocated, so such a pointer is always invalid).
More generally, a borrow should never be invalid: it should always be safe to dereference it (NB: safe doesn't mean allowed; Rust could still prevent you from dereferencing it, as in the example you posted), in the sense that the value it points to always "makes sense". This is still not the strongest guarantee provided by a borrow, which is that, for as long as someone holds the borrow (that is, for as long as someone could use it), no one is going to modify the value it points to (leaving aside interior-mutability types such as Cell and RefCell). This is not something you have to be careful about: Rust will prevent you from writing code that could break this.
Similarly, a mutable borrow (&mut x) is also a pointer-like that is "just an integer", but with different constraints. A mutable borrow (also called an "exclusive borrow"), besides always being valid, ensures that, as long as one party holds it, no one else can access the value it points to, either to modify it or just to read it. This prevents data races, because whoever holds the exclusive borrow can modify the value, even though they don't own it.
These are definitely the most common pointer-likes in Rust (they're everywhere): a shared borrow can be seen as the read-only but multi-access version of a pointer, and an exclusive borrow as the read-and-write but single-access version.
Having understood that, it's easier to understand Rust's logic. If I have y = &x, I am allowed to read the value x, but I can't "take" it (that is, take its ownership): I can't even write to it, which I could do if I owned it! Note that even with an exclusive borrow I couldn't take ownership of x, but I could "swap" it: take its ownership in exchange for the ownership of a value I own (possibly created for the occasion). For this reason, if you write let z = *y, you are trying to take ownership of x, so Rust complains. But note that this is due to the binding, not the dereferencing of y. To prove it, compare it with the following (String is not Copy):
let a = String::new();
let b = &a;
let c = &a;
assert_eq!(*b, *c);
Here, I dereference the borrow to a non-Copy value, but it's clear that I am not violating the borrow contract (to compare equality, I just need to read the value).
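The "swap" mentioned above is what std::mem::replace does through an exclusive borrow (std::mem::take does the same for types implementing Default, leaving the default value behind). A short sketch with a String; swap_out is a hypothetical helper name:

```rust
use std::mem;

/// Takes ownership of the value behind `slot` by leaving `new_value` in its place.
fn swap_out(slot: &mut String, new_value: String) -> String {
    // `let owned = *slot;` would not compile: it moves out of a borrow.
    mem::replace(slot, new_value)
}

fn main() {
    let mut x = String::from("original");
    let owned = swap_out(&mut x, String::from("replacement"));
    println!("took {owned:?}, left {x:?}");

    // mem::take is the same idea, leaving String::default() (empty) behind.
    let taken = mem::take(&mut x);
    println!("took {taken:?}, left {x:?}");
}
```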
Furthermore, in general, if I have a borrow of a struct, I can obtain borrows of its fields.
Incidentally, note that String is a pointer-like too! It's not "just a number": besides the address of its heap buffer, it stores a length and a capacity. (Pointers that carry extra information are often called fat pointers, although that term usually refers to references and raw pointers to dynamically sized types, such as &str, which is an address plus a length; String is better described as an owning smart pointer to a heap-allocated str.) String ensures that whoever owns it also owns the pointed-to value, which is why String is not Copy (otherwise, the pointed-to value could be owned by multiple parties).
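For example, with a hypothetical Person struct: a shared borrow of the struct gives shared borrows of its fields, and an exclusive borrow even allows borrowing two different fields mutably at the same time, because the compiler can see that they don't overlap:

```rust
/// Hypothetical struct with a non-Copy field.
#[derive(Debug)]
struct Person {
    name: String,
    age: u32,
}

/// Borrows a non-Copy field through a shared borrow; nothing is moved.
fn name_of(p: &Person) -> &str {
    &p.name
}

/// Holds exclusive borrows of two disjoint fields at once.
fn birthday(p: &mut Person) {
    let name = &mut p.name;
    let age = &mut p.age;
    *age += 1;
    name.push_str(" (older)");
}

fn main() {
    let mut p = Person { name: String::from("Ada"), age: 36 };
    println!("{}", name_of(&p));
    birthday(&mut p);
    println!("{p:?}");
}
```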
To access the value the reference points to, dereferencing is needed; but dereferencing implies moving the ownership of the referenced value to the new variable.
Dereferencing does not imply moving. The motivating example let z = *y simply dereferences y and then moves/copies the value into z. If you do not assign the value to a new variable, there is no new variable and therefore no transfer of ownership.
This might help build an intuition:
You make the (correct) case that a reference is a different thing from the object it points to. However, you imply that you need to assign the value to a variable after dereferencing it in order to do anything with it, which is wrong. Most of the time you are fine working directly on the members, which may be Copy. If they aren't, maybe the members of the members are. In the end, it likely all boils down to bytes.
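A small sketch of working "directly on the members" of a value that is neither Copy nor Clone. Point here is a hypothetical variant with an extra String field; describe never moves the Point, it only copies its i32 members and calls a &self method on the String:

```rust
/// Hypothetical struct that is deliberately neither Copy nor Clone.
struct Point {
    x: i32,
    y: i32,
    label: String,
}

fn describe(p: &Point) -> String {
    // `let whole = *p;` would try to move the Point and fails to compile,
    // but copying Copy members out through the reference is fine.
    let (x, y) = (p.x, (*p).y); // `p.x` is sugar for `(*p).x`
    // Non-Copy members can be used in place, e.g. via &self methods.
    let len = p.label.len();
    format!("{}: ({x}, {y}), label has {len} bytes", p.label)
}

fn main() {
    let pt = Point { x: 1, y: 2, label: String::from("origin") };
    println!("{}", describe(&pt));
    println!("still own {}", pt.label);
}
```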

What is the relationship between slices and references in Rust?

I am completely new to Rust (as in I just started looking at it yesterday), and am working my way through "The Rust Programming Language". I'm a little stuck on Chapters 4.2 (References and Borrowing) / 4.3 (The Slice Type) and am trying to solidify my initial understanding of references before I move on. I'm an experienced programmer whose background is mainly in C++ (I am intimately familiar with several languages, but C++ is what I'm most comfortable with).
Consider the following Rust code:
let string_obj: String = String::from("My String");
let string_ref: &String = &string_obj;
let string_slice: &str = &string_obj[1..=5];
Based on my understanding, from the first line, string_obj is an object of type String that is stored on the stack, which contains three fields: (1) a pointer to the text "My String", allocated on the heap, encoded in UTF-8; (2) A length field with value 9; (3) A capacity field with a value >= 9. That's straightforward enough.
From the second line, string_ref is an immutable reference to a String object, also stored on the stack, which contains a single field - a pointer to string_obj. This leads me to believe that (leaving aside ownership rules, semantics, and other things I am yet to learn about references), a reference is essentially a pointer to some other object. Again, pretty straightforward.
It's the third line that is causing me some headaches. From the documentation, it would appear that string_slice is an object of type &str that is stored on the stack, and contains two fields: 1) a pointer to the text "y Str", within the text "My String" associated with string_obj. 2) A length field with value 5.
But, by appearances at least, the &str type is by definition an immutable reference to an object of type str. So my questions are as follows:
What exactly is an str, and how is it represented in memory?
How does &str - a reference type, which I thought was simply a pointer - contain TWO fields (a pointer AND a length)?
How does Rust know in general what / how many fields to create when constructing a reference? (and consequently how does the programmer know?)
Slices are primitive types in Rust, which means that they don't necessarily have to follow the syntax rules of other types. In this case, str and &str are special and are treated with a bit of magic.
The type str doesn't really exist on its own: it is a dynamically sized type, so you can only work with it through some kind of pointer (most commonly &str; you can't have a local variable of type str). The reason for requiring us to spell this type "&str" is partly syntactic: the & reminds us that we're working with data borrowed from somewhere else, and it's required to be able to specify lifetimes, such as:
fn example<'a>(x: &String, y: &'a String) -> &'a str {
    &y[..]
}
It's also necessary so that we can differentiate between an immutably-borrowed string slice (&str) and a mutably-borrowed string slice (&mut str). (Though the latter are somewhat limited in their usefulness and so you don't see them that often.)
Note that the same thing applies to array slices. We have arrays like [u8; 16] and we have slices like &[u8] but we don't really directly interact with [u8]. Here the mutable variant (&mut [u8]) is more useful than with string slices.
What exactly is an str, and how is it represented in memory?
As per above, str kind-of doesn't really exist by itself. The layout of &str though is as you suspect -- a pointer and a length.
(str is the actual characters referred to by the slice, and is a so-called dynamically-sized type. In the general case, a &T can't exist without a T to refer to. In this case it's a bit backwards: a str never appears on its own, only behind a pointer such as &str.)
How does &str - a reference type, which I thought was simply a pointer - contain TWO fields (a pointer AND a length)?
As a primitive, it's a special case handled by the compiler.
How does Rust know in general what / how many fields to create when constructing a reference? (and consequently how does the programmer know?)
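If it's a reference to a sized type, then it's either a single pointer or it's nothing (if the reference itself can be optimized away). References to dynamically sized types carry one extra field of metadata next to the pointer: a length for slices and str, or a vtable pointer for trait objects such as &dyn Trait.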
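Checking this on the question's own variables (str_parts is a hypothetical helper that just returns the two fields of a &str). The size assertions hold on common targets, where a pointer and a usize have the same width:

```rust
use std::mem::size_of;

/// The two fields a `&str` carries: the address of the first byte and the length.
fn str_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let string_obj: String = String::from("My String");
    let string_ref: &String = &string_obj;
    let string_slice: &str = &string_obj[1..=5];

    // A reference to a sized type (here String) is one pointer wide...
    assert_eq!(size_of::<&String>(), size_of::<usize>());
    // ...while a reference to the unsized str also carries a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // The slice points one byte into string_obj's heap buffer, with length 5.
    let (ptr, len) = str_parts(string_slice);
    assert_eq!(ptr, string_obj.as_ptr().wrapping_add(1));
    assert_eq!(len, 5);
    println!("{string_ref} -> {string_slice:?}");
}
```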

Is Vec<&&str> the same as Vec<&str>?

I'm learning Rust and I'm trying to solve an advent of code challenge (day 9 2015).
I created a situation where I end up with a variable that has the type Vec<&&str> (note the double '&', it's not a typo). I'm now wondering if this type is different than Vec<&str>. I can't figure out if a reference to a reference to something would ever make sense. I know I can avoid this situation by using String for the from and to variables. I'm asking if Vec<&&str> == Vec<&str> and if I should try and avoid Vec<&&str>.
Here is the code that triggered this question:
use itertools::Itertools;
use std::collections::HashSet;
use std::fs;
fn main() {
    let contents = fs::read_to_string("input.txt").unwrap();
    let mut vertices: HashSet<&str> = HashSet::new();
    for line in contents.lines() {
        let data: Vec<&str> = line.split(" ").collect();
        let from = data[0];
        let to = data[2];
        vertices.insert(from);
        vertices.insert(to);
    }
    // `Vec<&&str>` originates from here
    let permutations_iter = vertices.iter().permutations(vertices.len());
    for perm in permutations_iter {
        let length_trip = compute_length_of_trip(&perm);
    }
}
fn compute_length_of_trip(trip: &Vec<&&str>) -> u32 {
    ...
}
Are Vec<&str> and Vec<&&str> different types?
I'm now wondering if this type is different than Vec<&str>.
Yes, a Vec<&&str> is a type different from Vec<&str> - you can't pass a Vec<&&str> where a Vec<&str> is expected and vice versa. Vec<&str> stores string slice references, which you can think of as pointers to data inside some strings. Vec<&&str> stores references to such string slice references, i.e. pointers to pointers to data. With the latter, accessing the string data requires an additional indirection.
However, Rust's auto-dereferencing makes it possible to use a Vec<&&str> much like you'd use a Vec<&str> - for example, v[0].len() will work just fine on either, v[some_idx].chars() will iterate over chars with either, and so on. The only difference is that Vec<&&str> stores the data more indirectly and therefore requires a bit more work on each access, which can lead to slightly less efficient code.
Note that you can always convert a Vec<&&str> to Vec<&str> - but since doing so requires allocating a new vector, if you decide you don't want Vec<&&str>, it's better not to create it in the first place.
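A sketch of both points, with a hypothetical helper flatten_refs: method calls auto-dereference through the extra reference, and the conversion to Vec<&str> dereferences each element once while collecting into a newly allocated vector:

```rust
/// Copies each `&str` out of its extra reference into a new vector.
fn flatten_refs<'a>(v: &[&&'a str]) -> Vec<&'a str> {
    v.iter().map(|s| **s).collect()
}

fn main() {
    let words = ["alpha", "beta"];
    // iter() over a [&str] yields &&str, so this really is a Vec<&&str>.
    let double: Vec<&&str> = words.iter().collect();

    // Auto-dereferencing makes method calls look the same on both types.
    assert_eq!(double[0].len(), 5);
    assert_eq!(double[1].chars().next(), Some('b'));

    // The conversion allocates a second vector.
    let single: Vec<&str> = flatten_refs(&double);
    assert_eq!(single, ["alpha", "beta"]);
}
```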
Can I avoid Vec<&&str> and how?
Since &str is Copy, you can avoid creating a Vec<&&str> by adding .copied() when you iterate over vertices, i.e. changing vertices.iter() to vertices.iter().copied(). If you don't need vertices to stick around, you can also use vertices.into_iter(), which yields &str and frees the vertices set as soon as the iteration is done.
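A sketch of both options on a HashSet<&str> like the question's vertices (the helper names are hypothetical, and the results are sorted only because HashSet iteration order is unspecified). With itertools you would chain .permutations(...) after .copied() in the same way:

```rust
use std::collections::HashSet;

/// `HashSet::iter` yields `&&str` here; `.copied()` turns each into a `&str`.
fn vertex_list<'a>(vertices: &HashSet<&'a str>) -> Vec<&'a str> {
    let mut v: Vec<&'a str> = vertices.iter().copied().collect();
    v.sort(); // iteration order of a HashSet is unspecified
    v
}

/// Consuming the set with `into_iter()` yields `&str` directly and frees the set.
fn into_vertex_list(vertices: HashSet<&str>) -> Vec<&str> {
    let mut v: Vec<&str> = vertices.into_iter().collect();
    v.sort();
    v
}

fn main() {
    let vertices: HashSet<&str> = ["London", "Dublin", "Belfast"].into_iter().collect();
    println!("{:?}", vertex_list(&vertices));
    println!("{:?}", into_vertex_list(vertices));
}
```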
The reason why the additional reference arises and the ways to avoid it have been covered on StackOverflow before.
Should I avoid Vec<&&str>?
There is nothing inherently wrong with Vec<&&str> that would require one to avoid it. In most code you'll never notice the difference in efficiency between Vec<&&str> and Vec<&str>. Having said that, there are some reasons to avoid it beyond performance in microbenchmarks. The additional indirection in Vec<&&str> requires the exact &strs it was created from (and not just the strings that own the data) to stick around and outlive the new collection. This is not relevant in your case, but would become noticeable if you wanted to return the permutations to the caller that owns the strings. Also, there is value in the simpler type that doesn't accumulate a reference on each transformation. Just imagine needing to transform the Vec<&&str> further into a new vector - you wouldn't want to deal with Vec<&&&str>, and so on for every new transformation.
Regarding performance, less indirection is usually better since it avoids an extra memory access and increases data locality. However, one should also note that a Vec<&str> takes up 16 bytes per element (on 64-bit architectures) because a slice reference is represented by a "fat pointer", i.e. a pointer/length pair. A Vec<&&str> (as well as Vec<&&&str> etc.) on the other hand takes up only 8 bytes per element, because a reference to a fat reference is represented by a regular "thin" pointer. So if your vector measures millions of elements, a Vec<&&str> might be more efficient than Vec<&str> simply because it occupies less memory. As always, if in doubt, measure.
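A quick way to check the per-element sizes on your own target; element_bytes is a hypothetical helper that counts only the bytes of the elements themselves, not the strings they point to or any spare capacity:

```rust
use std::mem::size_of;

/// Bytes occupied by the elements themselves.
fn element_bytes<T>(v: &[T]) -> usize {
    v.len() * size_of::<T>()
}

fn main() {
    let owned = vec![String::from("a"), String::from("b"), String::from("c")];
    let single: Vec<&str> = owned.iter().map(String::as_str).collect();
    let double: Vec<&&str> = single.iter().collect();

    // &str is a (pointer, length) pair; &&str is a single thin pointer.
    assert_eq!(element_bytes(&single), 2 * element_bytes(&double));
    println!("{} vs {} bytes", element_bytes(&single), element_bytes(&double));
}
```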
The reason you have &&str is that the &str values are owned by vertices, and when you create an iterator over that data you get references to those values, hence &&str.
There's really nothing to avoid here. It simply shows your iterator references the data that is inside the HashSet.

Is it possible to create a Rust function that inputs and outputs an edited value, without killing its lifetime?

I need a function that should accept a vector without killing its lifetime, but still be able to modify it (so no reference).
The modification in this case is the following: you have a vector of a certain struct, which has a children attribute that is a vector of this same type. The function takes the last struct of the vector, goes into its children, again takes the last element of the children's vector, goes into its children, and so on, n times. Then the function returns the n-th level child.
How would I go about making a compilable code like following pseudo code?
fn g(vector: Vec<...>, n: numeric input) {
    let temporary;
    n times {
        temporary = vector.last().unwrap().children;
    }
    return temporary;
}
let vec = Vec // Existing Vector
g(vec).push() // pushes into a certain child element of the vec element, this child is received over the function.
However, the above won't work, since by passing the vector to the function, its ownership is transferred to the function and its lifetime ends there.
(This is all very difficult to do correctly without an M.R.E. or even a copypaste of your compiler-errors - thus I'll proceed via theoretical analysis and educated guessing.)
Even though you pass in the tokens via &mut, that gets automagically converted into a nonexclusive & reference because that's what's in the signature. You can't then get an exclusive reference from that, only a mutable name that holds nonexclusive references.
You need an exclusive reference in order to do that push operation - the "many readers XOR one writer" principle requires that.
Once you have outermost_vector as an exclusive reference, you then need to assign more exclusive references to it. last returns nonexclusive references, so that's out. But since outermost_vector is now an exclusive reference, you can call slice::last_mut on it. Then - assuming all of this is even possible in the first place - you just need to follow the compiler-errors to fix up other minor things such as the intricacies of correctly pattern-matching the intermediate values.
You also need to fix the function's overall typing; Vec::push returns (), which thankfully is not a member of the type Vec<Token>.
And as to the title of your question: since you're taking an &'a mut Vec<Token>, if you want to return some data borrowed from that then you should type your return as &'a mut Vec<Token>, not Vec<Token> over which your function has no ownership.
(Question was edited significantly, new answer follows:)
Your function must accept vector not by value, but by exclusive reference (spelled &mut rather than &). Then you can return another exclusive reference, one which is borrowed from the exclusive reference that was passed-in, but points directly to the desired receiver of push. The function's type signature would be something like fn g(vector: &mut Vec<T>, n: usize) -> &mut Vec<T>.
Passing by value would cause the function to take ownership of the whole thing - all owned data not returned are dropped, and this is probably not what you want.
Passing by nonexclusive reference is not what you want either, since there is no way to then convert that back to an exclusive reference with which to call push - even if you're back at the callsite where you have ownership of the data from which the reference is borrowed-from.
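A minimal sketch of the signature suggested above. The question doesn't show its struct, so Node below is a hypothetical type with only a children field, and the function panics if it reaches an empty vector before n levels:

```rust
/// Hypothetical tree node: each element has a `children` vector of the same type.
#[derive(Debug, Default)]
struct Node {
    children: Vec<Node>,
}

/// Follows the last element's `children` n times and returns an exclusive
/// borrow of the vector reached, reborrowed from `vector`.
fn g(vector: &mut Vec<Node>, n: usize) -> &mut Vec<Node> {
    match n {
        0 => vector,
        _ => {
            let last = vector.last_mut().expect("intermediate vector is empty");
            g(&mut last.children, n - 1)
        }
    }
}

fn main() {
    // One root with one child; the child has no children yet.
    let mut roots = vec![Node { children: vec![Node::default()] }];

    // Push into the second-level vector; `roots` is only borrowed, not moved.
    g(&mut roots, 2).push(Node::default());

    println!("{roots:?}");
}
```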

Is &[T] literally an alias of Slice in rust?

&[T] is confusing me.
I naively assumed that like &T, &[T] was a pointer, which is to say, a numeric pointer address.
However, I've seen some code like this, that I was rather surprised to see work fine (simplified for demonstration purposes; but you see code like this in many 'as_slice()' implementations):
extern crate core;
extern crate collections;
use self::collections::str::raw::from_utf8;
use self::core::raw::Slice;
use std::mem::transmute;
fn main() {
    let val = "Hello World";
    {
        let value: &str;
        {
            let bytes = val.as_bytes();
            let mut slice = Slice { data: &bytes[0] as *const u8, len: bytes.len() };
            unsafe {
                let array: &[u8] = transmute(slice);
                value = from_utf8(array);
            }
            // slice.len = 0;
        }
        println!("{}", value);
    }
}
So.
I initially thought that this was invalid code.
That is, the instance of Slice created inside the block scope is returned to outside the block scope (by transmute), and although the code runs, the println! is actually accessing data that is no longer valid through unsafe pointers. Bad!
...but that doesn't seem to be the case.
Consider uncommenting the line // slice.len = 0;
This code still runs fine (prints 'Hello World') when this happens.
So the line...
value = from_utf8(array);
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
A slice &[T], as you have noticed, is "equivalent" to the structure std::raw::Slice (a type that is no longer part of the standard library). In fact, Slice was the internal representation of a &[T] value, and yes, it is a pointer plus the length of the data behind that pointer. Such a structure is sometimes called a "fat pointer", that is, a pointer plus an additional piece of information.
When you pass &[T] value around, you indeed are just copying its contents - the pointer and the length.
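A small demonstration with today's std (parts is a hypothetical helper returning the two fields): copying a &[T] copies the pointer and the length, and taking a sub-slice produces a new slice rather than changing the old one:

```rust
/// The (address, length) pair that a `&[T]` consists of.
fn parts<T>(s: &[T]) -> (*const T, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let data: [i32; 4] = [10, 20, 30, 40];
    let a: &[i32] = &data;
    let b = a; // &[T] is Copy: this copies the pointer and the length
    // Both copies remain usable and describe exactly the same memory.
    assert_eq!(parts(a), parts(b));

    // "Changing the length" means making a new slice; `a` itself is untouched.
    let shorter = &a[1..3];
    assert_eq!(shorter, [20, 30]);
    assert_eq!(a.len(), 4);
}
```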
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
So, yes, exactly.
Does that mean that in general it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
And this is also true. That's the whole idea of borrowed references, including slices - borrowed references are statically checked to be used as long as their referent is alive. When DST finally lands, slices and regular references will be even more unified.
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
And this is actually an absolutely valid concern; it is one of the problems with aliasing. However, Rust is designed exactly to prevent such bugs. There are two things which render aliasing of slices valid.
First, slices can't change length; there are no methods defined on &[T] that would let you change its length in place. You can create a derived slice from a slice, but it will be a new object nonetheless.
But even though slices can't change length, they could still bring disaster if aliased while the data could be mutated through them. For example, if the values in a slice are enum instances, mutating a value in such an aliased slice could invalidate a pointer into the internals of an enum value contained in this slice. So, second, Rust's aliasable slices (&[T]) are immutable: you can't change the values they contain and you can't take mutable references into them.
These two features (and compiler checks for lifetimes) make aliasing of slices absolutely safe. However, sometimes you do need to modify the data in a slice. Then you need a mutable slice, &mut [T]. You can change your data through such a slice, but these slices are not aliasable: you can't create two overlapping mutable slices into the same structure (an array, for example), so you can't do anything dangerous.
Note, however, that using transmute() to convert a slice into a Slice or vice versa is an unsafe operation. A &[T] is statically guaranteed to be correct if you create it using the right methods, like calling as_slice() on a Vec. However, creating it manually using the Slice struct and then transmuting it into &[T] is error-prone and can easily segfault your program, for example when you give it a larger length than is actually allocated.
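The sanctioned way to hold two exclusive slices into the same array at once is to make them provably disjoint, for example with slice::split_at_mut. A sketch; bump_heads is a hypothetical name:

```rust
/// Adds 100 to the first element of each half of `data`, split at `mid`.
/// Panics if either half is empty (mid == 0 or mid >= data.len()).
fn bump_heads(data: &mut [i32], mid: usize) {
    // Indexing twice, e.g. `&mut data[..mid]` and `&mut data[mid..]`, would be
    // rejected as two mutable borrows of `data`, even though the ranges do not
    // overlap. split_at_mut hands out two provably disjoint exclusive slices.
    let (left, right) = data.split_at_mut(mid);
    left[0] += 100;
    right[0] += 100;
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    bump_heads(&mut data, 2);
    println!("{data:?}");
}
```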
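In current Rust there is no Slice struct to transmute; the supported way to assemble a slice from a pointer and a length is std::slice::from_raw_parts, which is just as unsafe and has the same failure modes described above. A sketch of the question's snippet rewritten that way (the function name rebuild is hypothetical), using the checked std::str::from_utf8 instead of an unchecked conversion:

```rust
/// Rebuilds a `&str` from its raw parts, roughly what the question's code
/// does with `Slice` + `transmute`, using today's std APIs.
fn rebuild(val: &str) -> &str {
    let bytes = val.as_bytes();
    // SAFETY: pointer and length come from a live slice borrowed from `val`,
    // and the signature ties the result's lifetime to `val`.
    let array: &[u8] = unsafe { std::slice::from_raw_parts(bytes.as_ptr(), bytes.len()) };
    std::str::from_utf8(array).expect("bytes came from a &str, so they are valid UTF-8")
}

fn main() {
    let val = "Hello World";
    println!("{}", rebuild(val));
}
```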
