Is the data in Vec<T> always densely packed? - rust

It is a common pattern to see this 'shortcut' code in Rust:
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
::std::slice::from_raw_parts(
(p as *const T) as *const u8,
::std::mem::size_of::<T>(),
)
}
I.e., given a struct, unsafely convert the underlying pointer to &[u8] to read the bytes.
However, is it valid to take the same approach when using Vec<T>?
For example, this appears to work:
use std::mem::size_of;
use std::slice::from_raw_parts;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Point {
pub x: u8,
pub y: u8,
pub z: u8,
}
fn as_bytes(data: &[Point]) -> &[u8] {
unsafe {
let raw_pointer = data.as_ptr();
from_raw_parts(raw_pointer as *const u8, size_of::<Point>() * data.len())
}
}
fn main() {
let points = vec![Point{x: 0u8, y: 1u8, z: 2u8}, Point{x: 3u8, y: 4u8, z: 5u8}];
let slice = points.as_slice();
println!("{:?}", slice);
let bytes = as_bytes(slice);
println!("{:?}", bytes);
assert!(bytes.len() == 6);
assert!(bytes[0] == 0u8);
assert!(bytes[1] == 1u8);
assert!(bytes[2] == 2u8);
assert!(bytes[3] == 3u8);
assert!(bytes[4] == 4u8);
assert!(bytes[5] == 5u8);
}
...but is it reliable to assume that Vec<T> is represented as a single contiguous block of data this way?
The documentation on https://doc.rust-lang.org/std/vec/struct.Vec.html#capacity-and-reallocation says:
If a Vec has allocated memory, then the memory it points to is on the heap (as defined by the allocator Rust is configured to use by default), and its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice), followed by capacity-len logically uninitialized, contiguous elements.
...but I'm not really sure if I understand what it means. Does this actually mean that for Vec<T> the underlying pointer is to a block of memory of length size_of::<T> * length of the Vec?

Yes, a Vec<T> can be made into something that can be treated as a pointer to a block of memory of length std::mem::size_of::<T>() times the length of Vec.
There is one caveat, as what you are actually interested in is the slice of T, which the Vec can provide; the Vec itself should be considered an implementation detail. Besides that:
A Vec<T> can deref to a slice [T]. Take that slice.
The Rust Reference defines that a slice has the same layout as the section of the array it slices. So when we deref from a Vec<T> to a [T], this slice of length n is guaranteed to have the same memory layout as an array [T; n].
The Rust Reference defines the memory layout of an array:
Arrays are laid out so that the nth element of the array is offset
from the start of the array by n * the size of the type bytes. An
array of [T; n] has a size of size_of::<T>() * n and the same
alignment of T.
We know n (from [T]) and we know "the size of the type bytes" (via mem::size_of::<T>()). Since all members of an array must be fully initialized at all times, and given the two sentences from the paragraph above, we know it is safe to access all bytes up to mem::size_of::<T>() * length of the Vec (strictly, the length of the slice, which is what brings the array memory layout rule into play).
To make use of all that, you should make sure that you get a slice of the Vec first, use as_ptr() on the slice, and cast the raw pointer you get. This ensures the sequence of definitions as above. Your fn as_bytes(data: &[Point]) -> &[u8] is exactly correct.
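As a minimal sketch of the same idea, std::mem::size_of_val can compute the byte length of the slice directly. The helper name points_as_bytes is hypothetical, and Point is the type from the question. Note the assumption that the element type has no padding bytes (three u8 fields under #[repr(C)] have none); padding is uninitialized and must not be read as u8.

```rust
use std::mem::size_of_val;
use std::slice::from_raw_parts;

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct Point {
    pub x: u8,
    pub y: u8,
    pub z: u8,
}

// Hypothetical helper: views a slice of `Point` as its raw bytes.
fn points_as_bytes(data: &[Point]) -> &[u8] {
    // SAFETY: `data` is a valid slice, so it covers `size_of_val(data)`
    // contiguous initialized bytes. `Point` has no padding (assumption
    // stated above), and `u8` has an alignment of 1.
    unsafe { from_raw_parts(data.as_ptr() as *const u8, size_of_val(data)) }
}

fn main() {
    let points = vec![Point { x: 0, y: 1, z: 2 }, Point { x: 3, y: 4, z: 5 }];
    // `&points` derefs from `&Vec<Point>` to `&[Point]`.
    let bytes = points_as_bytes(&points);
    println!("{:?}", bytes);
}
```

Taking the length from the slice rather than from the Vec keeps the reasoning tied to the slice layout guarantee described above.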

Related

How is from_raw_parts_mut able to transmute between types of different sizes?

I am looking at the code of from_raw_parts_mut:
pub unsafe fn from_raw_parts_mut<'a, T>(p: *mut T, len: usize) -> &'a mut [T] {
mem::transmute(Repr { data: p, len: len })
}
It uses transmute to reinterpret a Repr to a &mut [T]. As far as I understand, Repr is a 128 bit struct. How does this transmute of differently sized types work?
mem::transmute() only works when transmuting to a type of the same size, so that means a &mut [T] slice is also the same size.
Looking at Repr:
#[repr(C)]
struct Repr<T> {
pub data: *const T,
pub len: usize,
}
It has a pointer to some data and a length. This is exactly what a slice is - a pointer to an array of items (which might be an actual array, or owned by a Vec<T>, etc.) with a length to say how many items are valid.
The object which is passed around as a slice is (under the covers) exactly what the Repr looks like, even though the data it refers to can be anything from 0 to as many T as will fit into memory.
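To illustrate the pointer-plus-length view, here is a small sketch that splits a slice into those two components and rebuilds it with slice::from_raw_parts. The function name round_trip is hypothetical and only serves the illustration.

```rust
use std::slice;

// Hypothetical helper: splits a slice into (pointer, length) and rebuilds it.
fn round_trip(items: &[u32]) -> &[u32] {
    let data: *const u32 = items.as_ptr(); // pointer to the first element
    let len: usize = items.len(); // number of valid elements
    // SAFETY: `data` and `len` come from a live slice that is borrowed for
    // the same lifetime as the returned slice.
    unsafe { slice::from_raw_parts(data, len) }
}

fn main() {
    let values = [10u32, 20, 30];
    // A sub-slice carries its own start pointer and its own length.
    let tail = round_trip(&values[1..]);
    println!("{:?}", tail);
}
```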
In Rust, some references are not just implemented as a pointer, as they are in some other languages. Some types are "fat pointers". This might not be obvious at first, especially if you are familiar with references/pointers in some other languages! Some examples are:
Slices &[T] and &mut [T], which as described above, are actually a pointer and length. The length is needed for bounds checks. For example, you can pass a slice corresponding to part of an array or Vec to a function.
Trait objects like &Trait or Box<Trait>, where Trait is a trait rather than a concrete type, are actually a pointer to the concrete type and a pointer to a vtable — the information needed to call trait methods on the object, given that its concrete type is not known.
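A quick way to see the difference is to compare sizes with std::mem::size_of. The sketch below assumes a typical target where a thin reference is one usize wide; the function name pointer_sizes is hypothetical, and the trait object is written &dyn Debug, which is how &Trait is spelled in current Rust.

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Hypothetical helper: sizes in bytes of a thin reference, a slice
// reference, and a trait-object reference.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<&u8>(),        // pointer only
        size_of::<&[u8]>(),      // pointer + length
        size_of::<&dyn Debug>(), // pointer + vtable pointer
    )
}

fn main() {
    let (thin, slice, trait_object) = pointer_sizes();
    println!("thin: {}, slice: {}, trait object: {}", thin, slice, trait_object);
}
```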

How do I fix a missing lifetime specifier?

I have a very simple method. The first argument takes in vector components ("A", 5, 0) and I will compare this to every element of another vector to see if they have the same ( _ , 5 , _) and then print out the found element's string.
Comparing ("A", 5, 0 ) and ("Q", 5, 2) should print out Q.
fn is_same_space(x: &str, y1: i32, p: i32, vector: &Vec<(&str, i32, i32)>) -> (&str) {
let mut foundString = "";
for i in 0..vector.len() {
if y1 == vector[i].1 {
foundString = vector[i].0;
}
}
foundString
}
However, I get this error:
error[E0106]: missing lifetime specifier
--> src/main.rs:1:80
|
1 | fn is_same_space(x: &str, y1: i32, p: i32, vector: &Vec<(&str, i32, i32)>) -> (&str) {
| ^ expected lifetime parameter
|
= help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or one of `vector`'s 2 elided lifetimes
By specifying a lifetime:
fn is_same_space<'a>(x: &'a str, y1: i32, p: i32, vector: &'a Vec<(&'a str, i32, i32)>) -> (&'a str)
This is only one of many possible interpretations of what you might have meant for the function to do, and as such it's a very conservative choice - it uses a unified lifetime of all the referenced parameters.
Perhaps you wanted to return a string that lives as long as x or as long as vector or as long as the strings inside vector; all of those are potentially valid.
I strongly recommend that you go back and re-read The Rust Programming Language. It's free, and aimed at beginners to Rust, and it covers all the things that make Rust unique and are new to programmers. Many people have spent a lot of time on this book and it answers many beginner questions such as this one.
Specifically, you should read the chapters on:
ownership
references and borrowing
lifetimes
There's even a second edition in the works, with chapters like:
Understanding Ownership
Generic Types, Traits, and Lifetimes
For fun, I'd rewrite your code using iterators:
fn is_same_space<'a>(y1: i32, vector: &[(&'a str, i32, i32)]) -> &'a str {
vector.iter()
.rev() // start from the end
.filter(|item| item.1 == y1) // element that matches
.map(|item| item.0) // first element of the tuple
.next() // take the first (from the end)
.unwrap_or("") // Use a default value
}
Removed the unneeded parameters.
Using an iterator avoids the overhead of bounds checks, and more clearly exposes your intent.
Why is it discouraged to accept a reference to a String (&String) or Vec (&Vec) as a function argument?
Rust does not use camelCase variable names.
I assume that you do want to return the string from inside vector.
Remove the redundant parens on the return type
So the problem comes from the fact that vector has two inferred lifetimes, one for vector itself (the &Vec part) and one for the &str inside the vector. You also have an inferred lifetime on x, but that's really inconsequential.
To fix it, just specify that the returned &str lives as long as the &str in the vector:
fn is_same_space<'a>( // Must declare the lifetime here
x: &str, // This borrow doesn't have to be related (x isn't even used)
y1: i32, // Not borrowed
p: i32, // Not borrowed or used
vector: &'a Vec<(&'a str, i32, i32)> // Vector and some of its data are borrowed here
) -> &'a str { // This tells rustc how long the return value should live
...
}
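Put together, a complete version might look like the sketch below. It keeps only the parameters that are actually used and ties the return value to the strings stored in the vector rather than to the borrow of the vector itself, so the result can outlive the Vec. The main function is a hypothetical usage example.

```rust
// The returned &str borrows from the strings inside `vector` (lifetime 'a),
// not from the slice borrow itself, which gets its own elided lifetime.
fn is_same_space<'a>(y1: i32, vector: &[(&'a str, i32, i32)]) -> &'a str {
    let mut found_string = "";
    for item in vector {
        if item.1 == y1 {
            found_string = item.0; // the last match wins, as in the question
        }
    }
    found_string
}

fn main() {
    let found;
    {
        let vector = vec![("Q", 5, 2), ("R", 7, 1)];
        found = is_same_space(5, &vector);
    } // `vector` is dropped here, but the string it pointed to lives on
    println!("{}", found); // prints "Q"
}
```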

Convert "*mut *mut f32" into "&[&[f32]]"

I want to convert arrays.
Example:
func()-> *mut *mut f32;
...
let buffer = func();
for n in 0..48000 {
buffer[0][n] = 1.0;
buffer[1][n] = 3.0;
}
In Rust, &[T]/&mut [T] is called a slice. A slice is not an array; it is a pointer to the beginning of an array and the number of items in this array. Therefore, to create &mut [T] out of *mut T, you need to know the length of the array behind the pointer.
*mut *mut T looks like a C implementation of a 2D, possibly jagged, array, i.e. an array of arrays (this is different from a contiguous 2D array, as you probably know). There is no free way to convert it to &mut [&mut [T]], because, as I said before, *mut T is one pointer-sized number, while &mut [T] is two pointer-sized numbers. So you can't, for example, transmute *mut T to &mut [T], it would be a size mismatch. Therefore, you can't simply transform *mut *mut f32 to &mut [&mut [f32]] because of the layout mismatch.
In order to safely work with numbers stored in *mut *mut f32, you first need to determine the length of the outer array and the lengths of all of the inner arrays. For simplicity, let's assume that they are all known statically:
const ROWS: usize = 48000;
const COLUMNS: usize = 48000;
Now, since you know the length, you can convert the outer pointer to a slice of raw pointers:
use std::slice;
let buffer: *mut *mut f32 = func();
let buf_slice: &mut [*mut f32] = unsafe {
    // No trailing semicolon: the block must evaluate to the slice.
    slice::from_raw_parts_mut(buffer, ROWS)
};
Now you need to go through this slice and convert each item to a slice, collecting the results into a vector:
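// `matrix` must be `mut` because it is written through below.
let mut matrix: Vec<&mut [f32]> = buf_slice.iter_mut()
    // `p` is a `&mut *mut f32`, so dereference it to get the raw row pointer.
    .map(|p| unsafe { slice::from_raw_parts_mut(*p, COLUMNS) })
    .collect();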
And now you can indeed access your buffer by indices:
for n in 0..COLUMNS {
matrix[0][n] = 1.0;
matrix[1][n] = 3.0;
}
(I have put explicit types on bindings for readability, most of them in fact can be omitted)
So, there are two main things to consider when converting raw pointers to slices:
you need to know the exact length of the array to create a slice from it; if you know it, you can use slice::from_raw_parts() or slice::from_raw_parts_mut();
if you are converting nested arrays, you need to rebuild each layer of indirection, because raw pointers and slices have different sizes.
And naturally, you have to track who is the owner of the buffer and when it will be freed, otherwise you can easily get a slice pointing to a buffer which does not exist anymore. This is unsafe, after all.
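The two steps can be combined into one helper. The sketch below is a self-contained illustration, not the API from the question: rows_from_raw and fill_demo are hypothetical names, and the buffer is built from two small Vecs instead of coming from func(), so that the example can actually run.

```rust
use std::slice;

/// Hypothetical helper: rebuilds mutable row slices from a C-style `*mut *mut f32`.
///
/// # Safety
/// `ptr` must point to `rows` valid row pointers, and each row pointer must
/// point to `cols` initialized `f32` values. The rows must not overlap, must
/// stay alive for `'a`, and must not be accessed in any other way meanwhile.
unsafe fn rows_from_raw<'a>(ptr: *mut *mut f32, rows: usize, cols: usize) -> Vec<&'a mut [f32]> {
    unsafe {
        // Layer 1: the outer pointer becomes a slice of raw row pointers.
        let row_ptrs: &[*mut f32] = slice::from_raw_parts(ptr as *const *mut f32, rows);
        // Layer 2: each raw row pointer becomes a mutable slice of f32.
        row_ptrs
            .iter()
            .map(|&p| slice::from_raw_parts_mut(p, cols))
            .collect()
    }
}

// Builds a 2 x 4 buffer, fills it through the rebuilt slices, and returns the rows.
fn fill_demo() -> (Vec<f32>, Vec<f32>) {
    let mut left = vec![0.0f32; 4];
    let mut right = vec![0.0f32; 4];
    // Stand-in for the pointer array that `func()` would return.
    let mut row_ptrs = vec![left.as_mut_ptr(), right.as_mut_ptr()];
    {
        // SAFETY: two valid, distinct rows of four f32 each, alive in this scope.
        let mut matrix = unsafe { rows_from_raw(row_ptrs.as_mut_ptr(), 2, 4) };
        for n in 0..4 {
            matrix[0][n] = 1.0;
            matrix[1][n] = 3.0;
        }
    }
    (left, right)
}

fn main() {
    let (left, right) = fill_demo();
    println!("{:?} {:?}", left, right);
}
```

Marking the helper as unsafe fn keeps the length and ownership requirements visible at every call site, which matches the caveats listed above.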
Since your array seems to be an array of pointers to an array of 48000 f32s, you can simply use fixed size arrays ([T; N]) instead of slices ([T]):
fn func() -> *mut *mut f32 { unimplemented!() }
fn main() {
let buffer = func();
let buffer: &mut [&mut [f32; 48000]; 2] = unsafe { std::mem::transmute(buffer) };
for n in 0..48000 {
buffer[0][n] = 1.0;
buffer[1][n] = 3.0;
}
}

Mutable multidimensional array as a function argument

In rustc 1.0.0, I'd like to write a function that mutates a two dimensional array supplied by the caller. I was hoping this would work:
fn foo(x: &mut [[u8]]) {
x[0][0] = 42;
}
fn main() {
let mut x: [[u8; 3]; 3] = [[0; 3]; 3];
foo(&mut x);
}
It fails to compile:
$ rustc fail2d.rs
fail2d.rs:7:9: 7:15 error: mismatched types:
expected `&mut [[u8]]`,
found `&mut [[u8; 3]; 3]`
(expected slice,
found array of 3 elements) [E0308]
fail2d.rs:7 foo(&mut x);
^~~~~~
error: aborting due to previous error
I believe this is telling me I need to somehow feed the function a slice of slices, but I don't know how to construct this.
It "works" if I hard-code the nested array's length in the function signature. This isn't acceptable because I want the function to operate on multidimensional arrays of arbitrary dimension:
fn foo(x: &mut [[u8; 3]]) { // FIXME: don't want to hard code length of nested array
x[0][0] = 42;
}
fn main() {
let mut x: [[u8; 3]; 3] = [[0; 3]; 3];
foo(&mut x);
}
tl;dr: are there any zero-cost ways of passing a reference to a multidimensional array such that the function can use statements like x[1][2] = 3;?
This comes down to a matter of memory layout. Assuming a type T with a size known at compile time (this constraint can be written T: Sized), the size of [T; n] is known at compile time (it takes n times as much memory as T does); but [T] is an unsized type; its length is not known at compile time. Therefore it can only be used through some form of indirection, such as a reference (&[T]) or a box (Box<[T]>, though this is of limited practical value compared with Vec<T>, which uses overallocation so that you can add and remove items without needing to reallocate every single time).
A slice of an unsized type doesn’t make sense; it’s permitted for reasons that are not clear to me, but you can never actually have an instance of it. (Vec<T>, by comparison, requires T: Sized.)
&[T; n] can coerce to &[T], and &mut [T; n] to &mut [T], but this only applies at the outermost level; the contents of a slice are fixed (you’d need to create a new array or vector to achieve such a transformation, because the memory layout of each item is different). The effect of this is that arrays work for single‐dimensional work, but for multi‐dimensional work they fall apart. Arrays are currently very much second‐class citizens in Rust, and will be until the language supports making slices generic over length, which it is likely to eventually.
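Since Rust 1.51 (much later than the rustc 1.0.0 the question targets), const generics allow a function to be generic over the length of the inner arrays, which covers the case from the question. A minimal sketch, with W as a hypothetical parameter name:

```rust
// `W` is the length of each inner array; the compiler infers it at the call site.
// Note: indexing panics if the outer slice is empty or if W is 0.
fn foo<const W: usize>(x: &mut [[u8; W]]) {
    x[0][0] = 42;
}

fn main() {
    let mut x = [[0u8; 3]; 3];
    foo(&mut x); // W = 3
    let mut y = [[0u8; 5]; 2];
    foo(&mut y); // W = 5
    println!("{:?} {:?}", x, y);
}
```

Only the outermost level is coerced to a slice here; the inner arrays keep their fixed size, so the memory layout is unchanged and no copying is needed.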
I recommend that you use either a single‐dimensional array (suitable for square matrices, indexed by x * width + y or similar), or vectors (Vec<Vec<T>>). There may also be libraries already out there abstracting over a suitable solution.
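A minimal sketch of the single-dimensional approach, assuming row-major order (index = row * width + col). The Grid type and its methods are hypothetical names chosen for illustration.

```rust
// Hypothetical wrapper around a flat, row-major buffer.
struct Grid {
    width: usize,
    cells: Vec<u8>,
}

impl Grid {
    // Creates a zero-filled grid of `width` columns and `height` rows.
    fn new(width: usize, height: usize) -> Self {
        Grid { width, cells: vec![0; width * height] }
    }

    // Maps a (row, col) pair to a position in the flat buffer.
    fn index(&self, row: usize, col: usize) -> usize {
        assert!(col < self.width, "column out of range");
        row * self.width + col
    }

    fn get(&self, row: usize, col: usize) -> u8 {
        self.cells[self.index(row, col)]
    }

    fn set(&mut self, row: usize, col: usize, value: u8) {
        let i = self.index(row, col);
        self.cells[i] = value;
    }
}

// Works for a grid of any dimensions, since they are runtime values.
fn foo(grid: &mut Grid) {
    grid.set(0, 0, 42);
}

fn main() {
    let mut grid = Grid::new(3, 3);
    foo(&mut grid);
    println!("{}", grid.get(0, 0)); // prints "42"
}
```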

Is there a good way to convert a Vec<T> to an array?

Is there a good way to convert a Vec<T> with size S to an array of type [T; S]? Specifically, I'm using a function that returns a 128-bit hash as a Vec<u8>, which will always have length 16, and I would like to deal with the hash as a [u8; 16].
Is there something built-in akin to the as_slice method which gives me what I want, or should I write my own function which allocates a fixed-size array, iterates through the vector copying each element, and returns the array?
Arrays must be completely initialized, so you quickly run into concerns about what to do when you convert a vector with too many or too few elements into an array. These examples simply panic.
As of Rust 1.51 you can parameterize over an array's length.
use std::convert::TryInto;
fn demo<T, const N: usize>(v: Vec<T>) -> [T; N] {
v.try_into()
.unwrap_or_else(|v: Vec<T>| panic!("Expected a Vec of length {} but it was {}", N, v.len()))
}
As of Rust 1.48, each size needs to be a specialized implementation:
use std::convert::TryInto;
fn demo<T>(v: Vec<T>) -> [T; 4] {
v.try_into()
.unwrap_or_else(|v: Vec<T>| panic!("Expected a Vec of length {} but it was {}", 4, v.len()))
}
As of Rust 1.43:
use std::convert::TryInto;
fn demo<T>(v: Vec<T>) -> [T; 4] {
let boxed_slice = v.into_boxed_slice();
let boxed_array: Box<[T; 4]> = match boxed_slice.try_into() {
Ok(ba) => ba,
Err(o) => panic!("Expected a Vec of length {} but it was {}", 4, o.len()),
};
*boxed_array
}
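If panicking is not desirable, the same conversion can hand the error back to the caller. This is a sketch for the 16-byte hash case from the question, assuming Rust 1.48 or later; hash_to_array is a hypothetical name. On failure, try_into returns the original Vec unchanged.

```rust
use std::convert::TryInto;

// Hypothetical helper: converts a 16-byte Vec into an array, or returns the
// original Vec as the error when the length is not exactly 16.
fn hash_to_array(v: Vec<u8>) -> Result<[u8; 16], Vec<u8>> {
    v.try_into()
}

fn main() {
    match hash_to_array(vec![0xAB; 16]) {
        Ok(hash) => println!("hash: {:?}", hash),
        Err(v) => println!("expected 16 bytes, got {}", v.len()),
    }
}
```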
See also:
How to get a slice as an array in Rust?
How do I get an owned value out of a `Box`?
Is it possible to control the size of an array using the type parameter of a generic?
