Why does clone_from_slice() fail here? [duplicate] - rust

This question already has answers here:
Trying to copy content of a Vec into an other, how to use copy_from_slice()?
(2 answers)
Closed 3 days ago.
I have the following code:
pub fn new(src_objects: &[Arc<dyn Hitable>], time0: f32, time1: f32) -> Self {
    let object_span = src_objects.len();
    let mut objects = Vec::with_capacity(object_span);
    objects.clone_from_slice(src_objects);
    ...
}
When I run this code, it panics with the message "destination and source slices have different lengths". I ran the same code with clone_from_slice replaced by extend_from_slice, but I want to understand what clone_from_slice is doing here that makes it fail.

Vec::with_capacity does not affect the length of the vector, only its capacity; the new vector still has length 0. The only difference is that it does not need to reallocate until its length reaches the requested capacity.
The method clone_from_slice is actually a method of slices ([T]), not of Vec<T>, but a Vec dereferences to a slice, so these methods are still callable on it. clone_from_slice requires the destination and the source to have the same length, which is why it panics here: the destination has length 0 while the source has object_span elements. And even though clone_from_slice overwrites every item in the destination slice, it still needs a valid slice of fully initialized items to overwrite.
There would be a lot of complications if the slice contained a mixture of initialized and uninitialized items. Initialized items may need to be dropped, while uninitialized items really must not be dropped. The maintainers of the std library could have solved this by adding an extra parameter to say where the boundary of initialized and uninitialized values is, but that would have made it much harder to use: if you were to pass in the wrong value for the boundary it would trigger Undefined Behaviour, so the method would also have to be marked unsafe.
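A minimal sketch of the failure and of two ways to fill the vector. The Hitable trait is not shown, so Arc<i32> is assumed here as a stand-in for Arc<dyn Hitable>; the length rules are the same for any Clone element type.

```rust
use std::sync::Arc;

// `Arc<i32>` is an assumed stand-in for the question's `Arc<dyn Hitable>`.
fn copy_with_extend(src: &[Arc<i32>]) -> Vec<Arc<i32>> {
    // Reserves room for src.len() items, but the length is still 0.
    let mut objects = Vec::with_capacity(src.len());
    // Clones every element and grows the length to src.len().
    objects.extend_from_slice(src);
    objects
}

fn copy_with_clone_from_slice(src: &[Arc<i32>]) -> Vec<Arc<i32>> {
    // clone_from_slice needs a destination of exactly the same length,
    // so the vector is filled with placeholders first.
    let mut objects = vec![Arc::new(0); src.len()];
    objects.clone_from_slice(src);
    objects
}

fn main() {
    let src = vec![Arc::new(1), Arc::new(2), Arc::new(3)];
    assert_eq!(copy_with_extend(&src), src);
    assert_eq!(copy_with_clone_from_slice(&src), src);
    // `to_vec` allocates and clones in a single call.
    assert_eq!(src.to_vec(), src);

    // Reproduces the panic: destination length 0, source length 3.
    // (The panic message is still printed to stderr.)
    let result = std::panic::catch_unwind(|| {
        let mut empty: Vec<i32> = Vec::with_capacity(3);
        empty.clone_from_slice(&[1, 2, 3]);
    });
    assert!(result.is_err());
}
```

In practice src_objects.to_vec(), or extend_from_slice on the with_capacity vector, is the straightforward choice; pre-filling just to call clone_from_slice does extra work and, for a trait-object element type, would also need some placeholder value to fill with.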

Related

Idiomatic way to pass "out slice" to C function?

Some Windows APIs take a slice as a parameter and write to it before returning. It's similar to an out pointer, but in slice form (so that callers don't need to pass an extra "length" parameter).
For out pointers, I've been using MaybeUninit, which I think is the idiomatic way in Rust. However, I don't know how to use it with slices.
For example, many examples suggest declaring [MaybeUninit<u16>; 32], but how do I pass that to a function that accepts only &mut [u16]? I tried MaybeUninit<[u16; 32]>, but there is no way to get an uninitialized &mut T out of a MaybeUninit. There is only as_mut_ptr, which returns a pointer, not a slice.
Am I supposed to stick to let x: [u16; 32] = zeroed(); at the moment?
Creating a reference to uninitialized memory is Undefined Behavior, even if the memory contents are never read. Quoting the Rust Reference, "Behavior considered undefined" (emphasis mine):
...
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
...
A reference or Box<T> that is dangling, unaligned, or points to an invalid value.
...
Note: Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in "padding" (the gaps between the fields/elements of a type).
It is not obvious this is UB: there is an active discussion about that (part of the counterarguments is to allow things like your case). However, currently it is considered UB and you should avoid it.
(Note: the phrase "has a restricted set of valid values" is somewhat ambiguous. Whether primitive types like integers are allowed to contain uninitialized bits is also under active discussion, but here too you should avoid it until that is settled. You may object that the reference does not agree with this, because integers do not have a restricted set of values, but that is false: one can view the set of possible values for a byte as the 256 values 0 to 255 plus the uninit byte, and this interpretation is in fact used in many places. Integers cannot contain the uninit byte, so they do have a restricted set of values.)
Initializing the array using mem::zeroed() is sound, but it uses unsafe for no reason: you can just use an initialized array, i.e. [0; size], and it will be just as performant.
You do not need MaybeUninit here; you can fill a buffer array with zeroes yourself. The buffer must be declared mut to be passed as &mut, and the semicolon belongs outside the unsafe block so that res receives the function's return value:
let mut x: [u16; 32] = [0u16; 32];
let res = unsafe { GetClassNameW(param0, &mut x) };
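A self-contained sketch of the zero-initialized buffer approach. Because GetClassNameW needs Windows, a hypothetical pure-Rust function fake_get_name stands in for it; like GetClassNameW, it writes UTF-16 code units into the slice and returns how many it wrote, so the caller knows which part of the buffer holds data.

```rust
// Hypothetical stand-in for an out-slice API such as GetClassNameW: it writes
// UTF-16 code units into `buf` and returns how many it wrote.
fn fake_get_name(buf: &mut [u16]) -> usize {
    let name: Vec<u16> = "Button".encode_utf16().collect();
    let n = name.len().min(buf.len());
    buf[..n].copy_from_slice(&name[..n]);
    n
}

fn read_name() -> String {
    // A fully initialized buffer: no unsafe and no MaybeUninit needed.
    let mut buf = [0u16; 32];
    // &mut buf coerces from &mut [u16; 32] to &mut [u16].
    let written = fake_get_name(&mut buf);
    // Only the first `written` units contain data.
    String::from_utf16_lossy(&buf[..written])
}

fn main() {
    println!("{}", read_name());
}
```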

How to determine whether to use .clone() and .cloned() with iterators [duplicate]

This question already has an answer here:
Why does cloned() allow this function to compile
(1 answer)
Closed 6 months ago.
Suppose we have a vector of some type that can be cloned:
let foo_vec = vec![clonable_item_1, clonable_item_2, ...];
How do I determine whether to use .clone() or .cloned() when iterating?
foo_vec.iter().cloned()...
// vs
foo_vec.clone().iter()...
I couldn't find anything written about the difference between the two. What's the difference?
They are not equivalent. If anything, the comparison should be v.iter().cloned() vs. v.clone().into_iter(): both produce an iterator over owned T, while v.clone().iter() produces an iterator over &T.
v.clone().into_iter() clones the Vec, allocating a new Vec of the same size and cloning all elements into it, then converts this newly created Vec into an iterator. v.iter().cloned(), on the other hand, creates a borrowed iterator over the Vec that yields &T, then applies the cloned() iterator adapter to it, which clones each &T produced by Iterator::next() on the fly to produce an owned T. Thus it doesn't need to allocate a new vector.
Because of that, you should prefer v.iter().cloned() whenever possible. It usually is possible, but Vec's IntoIter has some additional capabilities, such as as_slice() for viewing the remaining elements, which may occasionally be required.
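A small sketch of the three expressions from the question, using String as an example Clone type (an assumption; any Clone type behaves the same). The return types show which forms yield owned values and which yield references.

```rust
// Owned elements, cloned lazily from a borrowed iterator: no intermediate Vec.
fn via_iter_cloned(v: &Vec<String>) -> Vec<String> {
    v.iter().cloned().collect()
}

// Owned elements, but the whole Vec is cloned first and then consumed.
fn via_clone_into_iter(v: &Vec<String>) -> Vec<String> {
    v.clone().into_iter().collect()
}

// `v.clone().iter()` yields `&String` borrowed from a temporary clone, so the
// items cannot outlive this expression; here they are mapped to lengths.
fn via_clone_iter(v: &Vec<String>) -> Vec<usize> {
    v.clone().iter().map(|s| s.len()).collect()
}

fn main() {
    let v = vec![String::from("a"), String::from("bc")];
    assert_eq!(via_iter_cloned(&v), via_clone_into_iter(&v));
    println!("{:?}", via_clone_iter(&v));
}
```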

Is Vec<&&str> the same as Vec<&str>?

I'm learning Rust and trying to solve an Advent of Code challenge (2015, day 9).
I ended up with a variable of type Vec<&&str> (note the double '&'; it's not a typo). I'm now wondering whether this type is different from Vec<&str>. I can't figure out whether a reference to a reference to something would ever make sense. I know I can avoid this situation by using String for the from and to variables. I'm asking whether Vec<&&str> == Vec<&str>, and whether I should try to avoid Vec<&&str>.
Here is the code that triggered this question:
use itertools::Itertools;
use std::collections::HashSet;
use std::fs;

fn main() {
    let contents = fs::read_to_string("input.txt").unwrap();
    let mut vertices: HashSet<&str> = HashSet::new();
    for line in contents.lines() {
        let data: Vec<&str> = line.split(" ").collect();
        let from = data[0];
        let to = data[2];
        vertices.insert(from);
        vertices.insert(to);
    }
    // `Vec<&&str>` originates from here
    let permutations_iter = vertices.iter().permutations(vertices.len());
    for perm in permutations_iter {
        let length_trip = compute_length_of_trip(&perm);
    }
}

fn compute_length_of_trip(trip: &Vec<&&str>) -> u32 {
    ...
}
Are Vec<&str> and Vec<&&str> different types?
I'm now wondering if this type is different than Vec<&str>.
Yes, a Vec<&&str> is a type different from Vec<&str> - you can't pass a Vec<&&str> where a Vec<&str> is expected and vice versa. Vec<&str> stores string slice references, which you can think of as pointers to data inside some strings. Vec<&&str> stores references to such string slice references, i.e. pointers to pointers to data. With the latter, accessing the string data requires an additional indirection.
However, Rust's auto-dereferencing makes it possible to use a Vec<&&str> much like you'd use a Vec<&str> - for example, v[0].len() will work just fine on either, v[some_idx].chars() will iterate over chars with either, and so on. The only difference is that Vec<&&str> stores the data more indirectly and therefore requires a bit more work on each access, which can lead to slightly less efficient code.
Note that you can always convert a Vec<&&str> to Vec<&str> - but since doing so requires allocating a new vector, if you decide you don't want Vec<&&str>, it's better not to create it in the first place.
Can I avoid Vec<&&str> and how?
Since &str is Copy, you can avoid creating a Vec<&&str> by adding .copied() when you iterate over vertices, i.e. change vertices.iter() to vertices.iter().copied(). If you don't need vertices afterwards, you can also use vertices.into_iter(), which yields &str and frees the vertices set as soon as the iteration is done.
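A minimal sketch of these options on a HashSet<&str> like the question's vertices, including converting an existing Vec<&&str> into a Vec<&str>. itertools is left out so the block is self-contained, and the city names are made-up sample data.

```rust
use std::collections::HashSet;

// iter() on a HashSet<&str> yields &&str: references to the stored &str values.
fn double_refs<'s, 'a>(set: &'s HashSet<&'a str>) -> Vec<&'s &'a str> {
    set.iter().collect()
}

// copied() turns each &&str into a &str (possible because &str is Copy).
fn single_refs<'a>(set: &HashSet<&'a str>) -> Vec<&'a str> {
    set.iter().copied().collect()
}

// Converting an existing Vec<&&str> allocates a new Vec<&str>.
fn flatten<'a>(v: Vec<&&'a str>) -> Vec<&'a str> {
    v.into_iter().copied().collect()
}

// into_iter() consumes the set and yields &str directly.
fn consume(set: HashSet<&str>) -> Vec<&str> {
    set.into_iter().collect()
}

fn main() {
    let set: HashSet<&str> = HashSet::from(["London", "Dublin"]);
    let mut a = single_refs(&set);
    a.sort();
    let mut b = flatten(double_refs(&set));
    b.sort();
    assert_eq!(a, b);
    // The &str values are 'static literals, so `a` and `b` survive the move.
    let mut c = consume(set);
    c.sort();
    assert_eq!(c, vec!["Dublin", "London"]);
}
```

One caveat when switching the question's code to into_iter(): vertices.into_iter().permutations(vertices.len()) does not compile, because vertices is moved by into_iter() before vertices.len() is evaluated; read the length into a local first (let n = vertices.len();).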
The reason why the additional reference arises and the ways to avoid it have been covered on StackOverflow before.
Should I avoid Vec<&&str>?
There is nothing inherently wrong with Vec<&&str> that would require one to avoid it. In most code you'll never notice the difference in efficiency between Vec<&&str> and Vec<&str>. Having said that, there are some reasons to avoid it beyond performance in microbenchmarks. The additional indirection in Vec<&&str> requires the exact &strs it was created from (and not just the strings that own the data) to stick around and outlive the new collection. This is not relevant in your case, but would become noticeable if you wanted to return the permutations to the caller that owns the strings. Also, there is value in the simpler type that doesn't accumulate a reference on each transformation. Just imagine needing to transform the Vec<&&str> further into a new vector - you wouldn't want to deal with Vec<&&&str>, and so on for every new transformation.
Regarding performance, less indirection is usually better, since it avoids an extra memory access and improves data locality. However, note that a Vec<&str> takes up 16 bytes per element (on 64-bit architectures), because a string slice reference is represented by a "fat pointer", i.e. a pointer/length pair. A Vec<&&str> (as well as Vec<&&&str> etc.), on the other hand, takes up only 8 bytes per element, because a reference to a fat reference is represented by a regular "thin" pointer. So if your vector has millions of elements, a Vec<&&str> might be more efficient than a Vec<&str> simply because it occupies less memory. As always, if in doubt, measure.
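A quick way to check the per-element sizes mentioned above. The asserts reflect how current rustc represents these references (a &str is a pointer plus a length, a &&str is a single pointer); they are not measurements of any particular program.

```rust
use std::mem::size_of;

// Returns (bytes per &str element, bytes per &&str element).
fn element_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<&&str>())
}

fn main() {
    let (fat, thin) = element_sizes();
    // &str is a (pointer, length) pair: two machine words.
    assert_eq!(fat, 2 * size_of::<usize>());
    // A reference to a sized type such as &str is one machine word.
    assert_eq!(thin, size_of::<usize>());
    println!("&str: {fat} bytes, &&str: {thin} bytes");
}
```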
The reason you have &&str is that the &str values are owned by vertices, and when you create an iterator over that data you simply get references to those values, hence &&str.
There's really nothing to avoid here. It simply shows your iterator references the data that is inside the HashSet.

Is it possible to create a Rust function that takes and returns an edited value, without ending its lifetime?

I need a function that accepts a vector without ending its lifetime, but can still modify it (so not a reference).
The modification in this case is the following: you have a vector of a certain struct that has a children field, which is a vector of the same type. The function takes the last struct of the vector, goes into its children, again takes the last element of the children vector, goes into its children, and so on, n times. Then the function returns the n-th level child.
How would I go about making a compilable code like following pseudo code?
fn g(vector: Vec<...>, n: numeric input) {
let temporary;
n times {
temporary = vector.last().unwrap().children;
}
return temporary;
}
let vec = Vec // Existing Vector
g(vec).push() // pushes into a certain child element of the vec element, this child is received over the function.
However, the above won't work, since giving the vector to the function transfers its ownership to the function, and its lifetime ends there.
(This is all very difficult to do correctly without an M.R.E. or even a copypaste of your compiler-errors - thus I'll proceed via theoretical analysis and educated guessing.)
Even though you pass in the tokens via &mut, that gets automagically converted into a nonexclusive & reference because that's what's in the signature. You can't then get an exclusive reference from that, only a mutable name that holds nonexclusive references.
You need an exclusive reference in order to do that push operation - the "many readers XOR one writer" principle requires that.
Once you have outermost_vector as an exclusive reference, you then need to assign more exclusive references to it. last returns nonexclusive references, so that's out. But since outermost_vector is now an exclusive reference, you can call slice::last_mut on it. Then - assuming all of this is even possible in the first place - you just need to follow the compiler-errors to fix up other minor things such as the intricacies of correctly pattern-matching the intermediate values.
You also need to fix the function's overall typing; Vec::push returns (), which thankfully is not a member of the type Vec<Token>.
And as to the title of your question: since you're taking an &'a mut Vec<Token>, if you want to return some data borrowed from that then you should type your return as &'a mut Vec<Token>, not Vec<Token> over which your function has no ownership.
(Question was edited significantly, new answer follows:)
Your function must accept vector not by value, but by exclusive reference (spelled &mut rather than &). Then you can return another exclusive reference, one which is borrowed from the exclusive reference that was passed-in, but points directly to the desired receiver of push. The function's type signature would be something like fn g(vector: &mut Vec<T>, n: usize) -> &mut Vec<T>.
Passing by value would cause the function to take ownership of the whole thing - all owned data not returned are dropped, and this is probably not what you want.
Passing by nonexclusive reference is not what you want either, since there is no way to then convert that back to an exclusive reference with which to call push - even if you're back at the callsite where you have ownership of the data from which the reference is borrowed-from.
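A minimal sketch of the signature suggested above, assuming a hypothetical Node struct with a children: Vec<Node> field (the question does not show its actual type). It returns Option rather than calling unwrap() at each level, a design choice so that an empty vector along the path yields None instead of a panic.

```rust
// Hypothetical node type matching the question's description: each node
// owns a `children` vector of the same type.
#[derive(Debug)]
struct Node {
    value: u32,
    children: Vec<Node>,
}

// Follows the last element `n` times and returns an exclusive borrow of the
// vector reached. Returns None if an intermediate vector is empty.
fn g(vector: &mut Vec<Node>, n: usize) -> Option<&mut Vec<Node>> {
    let mut current = vector;
    for _ in 0..n {
        // last_mut gives an exclusive reference, unlike last().
        current = &mut current.last_mut()?.children;
    }
    Some(current)
}

fn leaf(value: u32) -> Node {
    Node { value, children: Vec::new() }
}

fn main() {
    // tree: [1] -> 1.children: [2] -> 2.children: []
    let mut tree = vec![Node { value: 1, children: vec![leaf(2)] }];

    // Push into the vector two levels down; `tree` is only borrowed.
    if let Some(level) = g(&mut tree, 2) {
        level.push(leaf(3));
    }

    // `tree` is still owned and usable here.
    assert_eq!(tree[0].children[0].children[0].value, 3);
    println!("{:?}", tree);
}
```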

Is &[T] literally an alias of Slice in rust?

&[T] is confusing me.
I naively assumed that like &T, &[T] was a pointer, which is to say, a numeric pointer address.
However, I've seen some code like this, that I was rather surprised to see work fine (simplified for demonstration purposes; but you see code like this in many 'as_slice()' implementations):
extern crate core;
extern crate collections;
use self::collections::str::raw::from_utf8;
use self::core::raw::Slice;
use std::mem::transmute;
fn main() {
let val = "Hello World";
{
let value:&str;
{
let bytes = val.as_bytes();
let mut slice = Slice { data: &bytes[0] as *const u8, len: bytes.len() };
unsafe {
let array:&[u8] = transmute(slice);
value = from_utf8(array);
}
// slice.len = 0;
}
println!("{}", value);
}
}
So.
I initially thought that this was invalid code.
That is, the instance of Slice created inside the block scope is returned to outside the block scope (by transmute), and although the code runs, the println! is actually accessing data that is no longer valid through unsafe pointers. Bad!
...but that doesn't seem to be the case.
Consider uncommenting the line // slice.len = 0;
This code still runs fine (prints 'Hello World') when this happens.
So the line...
value = from_utf8(array);
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
Does that mean that, in general, it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
A slice &[T], as you have noticed, is "equivalent" to the structure std::raw::Slice. In fact, Slice is the internal representation of a &[T] value, and yes, it is a pointer plus the length of the data behind that pointer. Such a structure is sometimes called a "fat pointer", that is, a pointer plus an additional piece of information.
When you pass &[T] value around, you indeed are just copying its contents - the pointer and the length.
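A small sketch of that copy behaviour in current Rust (the function name is illustrative): copying a &[u8] duplicates only the pointer/length pair, and re-slicing produces a new slice value without touching the original.

```rust
// Returns the lengths of the original slice and of a re-sliced copy.
fn copy_then_reslice(data: &[u8]) -> (usize, usize) {
    let a: &[u8] = data;
    // Copies the (pointer, length) pair; `a` remains usable afterwards.
    let mut b = a;
    // Rebinds `b` to a new, shorter slice over the same bytes (if non-empty).
    if let Some((_, rest)) = b.split_first() {
        b = rest;
    }
    (a.len(), b.len())
}

fn main() {
    let bytes: &[u8] = b"Hello World";
    let copy = bytes;
    // Both values point at the same buffer: no element was copied.
    assert_eq!(bytes.as_ptr(), copy.as_ptr());
    assert_eq!(copy_then_reslice(bytes), (11, 10));
}
```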
If it was an invalid pointer to the 'slice' variable, the len at the println() statement would be 0, but it is not. So effectively a copy not just of a pointer value, but a full copy of the Slice structure.
Is that right?
So, yes, exactly.
Does that mean that, in general, it's valid to return a &[T] as long as the actual inner data pointer is valid, regardless of the scope of the original &[T] that is being returned, because a &[T] assignment is a copy operation?
And this is also true. That's the whole idea of borrowed references, including slices: borrowed references are statically checked so that they are used only while their referent is alive. When DST finally lands, slices and regular references will be even more unified.
(This seems, to me, to be extremely counter intuitive... so perhaps I am misunderstanding; if I'm right, having two &[T] that point to the same data cannot be valid, because they won't sync lengths if you modify one...)
And this is actually an absolutely valid concern; it is one of the problems with aliasing. However, Rust is designed exactly to prevent such bugs. There are two things which render aliasing of slices valid.
First, slices can't change length; there are no methods on &[T] that change its length in place. You can create a derived slice from a slice, but it will be a new, independent value.
But even if slices can't change length, aliasing them could still be disastrous if the data could be mutated through them. For example, if the values in a slice are enum instances, mutating a value in such an aliased slice could invalidate a pointer into the internals of an enum value contained in that slice. So, second, Rust's aliasable slices (&[T]) are immutable: you can't change the values they contain, and you can't take mutable references into them.
These two properties (together with the compiler's lifetime checks) make aliasing slices absolutely safe. However, sometimes you do need to modify the data in a slice, and then you need a mutable slice, &mut [T]. You can change your data through such a slice, but these slices are not aliasable: you can't create two overlapping mutable slices into the same structure (an array, for example), so you can't do anything dangerous.
Note, however, that using transmute() to transform a slice into a Slice or vice versa is an unsafe operation. &[T] is guaranteed statically to be correct if you create it using right methods, like calling as_slice() on a Vec. However, creating it manually using Slice struct and then transmuting it into &[T] is error-prone and can easily segfault your program, for example, when you assign it more length than is actually allocated.
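The question's imports (core::raw::Slice, collections::str::raw::from_utf8) date from pre-1.0 Rust and no longer compile. A minimal sketch of the same round trip in current Rust uses std::slice::from_raw_parts instead of building a Slice and transmuting it; it still requires unsafe for exactly the reason given above, since the caller must guarantee that the pointer and length are valid for the returned lifetime.

```rust
// Rebuilds a &str from a raw pointer and length, the modern counterpart of
// constructing a Slice and transmuting it into &[u8].
fn rebuild(val: &str) -> &str {
    let ptr: *const u8 = val.as_ptr();
    let len: usize = val.len();
    // SAFETY: ptr/len describe exactly the bytes of `val`, and the returned
    // slice borrows with the same lifetime as `val` (via elision).
    let bytes: &[u8] = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Checked conversion; the bytes came from a &str, so this cannot fail.
    std::str::from_utf8(bytes).expect("bytes came from a &str, so they are valid UTF-8")
}

fn main() {
    let val = "Hello World";
    let value = rebuild(val);
    println!("{}", value);
    assert_eq!(value, "Hello World");
}
```

Passing a larger len than the data actually has would be Undefined Behavior here, just as with the manually built Slice.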
