Creating a fixed-size array on the heap in Rust

I've tried to use the following code in my program:
fn main() {
    let array = box [1, 2, 3];
}
and it results in a compile error: error: obsolete syntax: ~[T] is no longer a type.
AFAIU, there are no dynamically sized arrays in Rust (the size has to be known at compile time). However, in my code snippet the array does have a static size and should be of type ~[T, ..3] (an owned static array of size 3), whereas the compiler says it has the type ~[T]. Is there any deep reason why it isn't possible to get a statically sized array allocated on the heap?
P.S. Yeah, I've heard about Vec.

Since I ended up here, others might as well. Rust has moved on: at the time of this answer, stable Rust is at 1.53 and nightly at 1.55.
Box::new([1, 2, 3]) is the recommended way, and it does its job. However, there is a catch: the array is created on the stack and then copied over to the heap. This is documented behaviour of Box:
Move a value from the stack to the heap by creating a Box:
This means it involves a hidden memcpy, and with a large array the allocation can even fail with a stack overflow.
const X: usize = 10_000_000;
let failing_huge_heap_array = Box::new([1; X]);
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
There are several workarounds for this as of now (Rust 1.53); the most straightforward is to create a vector and turn it into a boxed slice:
const X: usize = 10_000_000;
let huge_heap_array = vec![1; X].into_boxed_slice();
This works, but has two small catches: it loses the type information (what should be Box<[i32; 10000000]> is now Box<[i32]>), and on a 64-bit target it takes up 16 bytes on the stack (a pointer plus a length), as opposed to a boxed array, which takes only 8.
...
println!("{}", std::mem::size_of_val(&huge_heap_array));
16
Not a huge deal, but it hurts my personal Monk factor.
Upon further research, discarding options that need nightly (like the OP's box [1, 2, 3], which seems to be coming back behind #![feature(box_syntax)], and the arr crate, which is nice but also needs nightly), the best solution I found for allocating an array on the heap without the hidden memcpy was a suggestion by Simias:
/// A macro similar to `vec![$elem; $size]` which returns a boxed array.
///
/// ```rustc
/// let _: Box<[u8; 1024]> = box_array![0; 1024];
/// ```
macro_rules! box_array {
    ($val:expr ; $len:expr) => {{
        // Use a generic function so that the pointer cast remains type-safe
        fn vec_to_boxed_array<T>(vec: Vec<T>) -> Box<[T; $len]> {
            let boxed_slice = vec.into_boxed_slice();
            let ptr = ::std::boxed::Box::into_raw(boxed_slice) as *mut [T; $len];
            // SAFETY: the Vec below is built with exactly `$len` elements,
            // so the allocation is valid for a `[T; $len]`.
            unsafe { Box::from_raw(ptr) }
        }
        vec_to_boxed_array(vec![$val; $len])
    }};
}
const X: usize = 10_000_000;
let huge_heap_array = box_array![1; X];
It does not overflow the stack and only takes up 8 bytes while preserving the type.
It uses unsafe, but limits it to a single line of code. Until the box [1; X] syntax arrives, this is IMHO a clean option.
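On a newer compiler the same result can be had without any unsafe code. The following is a minimal sketch, assuming Rust 1.51 or newer (for const generics); `boxed_array` is a hypothetical helper name, not part of std or of Simias's macro. It relies on the standard `TryFrom<Box<[T]>> for Box<[T; N]>` impl, which checks the length and reuses the existing allocation, so the array is never placed on the stack:

```rust
use std::convert::TryInto;

/// Hypothetical helper: build a `Box<[T; N]>` holding `N` clones of `val`
/// without ever materializing the array on the stack.
fn boxed_array<T: Clone, const N: usize>(val: T) -> Box<[T; N]> {
    // `vec!` writes the elements straight into a heap buffer of exactly N items.
    let slice: Box<[T]> = vec![val; N].into_boxed_slice();
    // `TryFrom<Box<[T]>> for Box<[T; N]>` only checks the length and keeps the
    // same allocation; on a length mismatch it hands the slice back as `Err`.
    let array: Result<Box<[T; N]>, Box<[T]>> = slice.try_into();
    match array {
        Ok(array) => array,
        Err(_) => unreachable!("the Vec was built with exactly N elements"),
    }
}
```

For example, `let huge: Box<[i32; 10_000_000]> = boxed_array(1);` should allocate directly on the heap, like the macro, while keeping the array type and the 8-byte thin pointer.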

As far as I know, the box expression is experimental. You can use Box::new() instead, as in the code below, to avoid needing the feature gate.
fn main() {
    let array1 = Box::new([1, 2, 3]);
    // or even
    let array2: Box<[i32]> = Box::new([1, 2, 3]);
}
As Shepmaster points out in a comment, these are different types: array1 is a Box<[i32; 3]>, while array2 is a boxed slice, Box<[i32]>.

Just write:
let mut buffer: Vec<u8> = vec![0; k];
This creates a heap-allocated Vec<u8> of length k, filled with zeros.

Related

Cast vector of i8 to vector of u8 in Rust? [duplicate]

This question already has an answer here: How do I convert a Vec<T> to a Vec<U> without copying the vector?
Closed 3 years ago.
Is there a better way to cast Vec<i8> to Vec<u8> in Rust than these two?
1. creating a copy by mapping and casting every entry
2. using std::mem::transmute
Option (1) is slow, and for (2), the docs say that "transmute should be the absolute last resort".
A bit of background maybe: I'm getting a Vec<i8> from the unsafe gl::GetShaderInfoLog() call and want to create a string from this vector of chars by using String::from_utf8().
The other answers provide excellent solutions for the underlying problem of creating a string from Vec<i8>. To answer the question as posed, creating a Vec<u8> from data in a Vec<i8> can be done without copying or transmuting the vector. As pointed out by @trentcl, transmuting the vector directly constitutes undefined behavior, because Vec is allowed to have a different layout for different types.
The correct (though still requiring the use of unsafe) way to transfer a vector's data without copying it is:
1. obtain the *mut i8 pointer to the data in the vector, along with its length and capacity;
2. leak the original vector to prevent it from freeing the data;
3. use Vec::from_raw_parts to build a new vector, giving it the pointer cast to *mut u8. This is the unsafe part, because we are vouching that the pointer contains valid and initialized data, that it is not in use by other objects, and so on.
This is not UB because i8 and u8 have the same size and alignment, so the allocation's layout is unchanged, and the new Vec is given a pointer of the correct type from the start. Code:
fn vec_i8_into_u8(v: Vec<i8>) -> Vec<u8> {
    // ideally we'd use Vec::into_raw_parts, but it's unstable,
    // so we have to do it manually:
    // first, make sure v's destructor doesn't free the data
    // it thinks it owns when it goes out of scope
    let mut v = std::mem::ManuallyDrop::new(v);
    // then, pick apart the existing Vec
    let p = v.as_mut_ptr();
    let len = v.len();
    let cap = v.capacity();
    // finally, adopt the data into a new Vec
    unsafe { Vec::from_raw_parts(p as *mut u8, len, cap) }
}
fn main() {
    let v = vec![-1i8, 2, 3];
    assert!(vec_i8_into_u8(v) == vec![255u8, 2, 3]);
}
transmute on a Vec is always, 100% wrong, causing undefined behavior, because the layout of Vec is not specified. However, as the page you linked also mentions, you can use raw pointers and Vec::from_raw_parts to perform this correctly. user4815162342's answer shows how.
(std::mem::transmute is the only item in the Rust standard library whose documentation consists mostly of suggestions for how not to use it. Take that how you will.)
However, in this case, from_raw_parts is also unnecessary. The best way to deal with C strings in Rust is with the wrappers in std::ffi, CStr and CString. There may be better ways to work this into your real code, but here's one way you could use CStr to borrow a byte buffer filled in by the C function as a &str:
use std::ffi::CStr;
const BUF_SIZE: usize = 1000;
let mut info_log: Vec<u8> = vec![0; BUF_SIZE];
let mut len: gl::types::GLsizei = 0;
unsafe {
    gl::GetShaderInfoLog(
        shader,
        BUF_SIZE as gl::types::GLsizei,
        &mut len,
        info_log.as_mut_ptr() as *mut gl::types::GLchar,
    );
}
// `len` excludes the nul terminator, so include one more byte
let log = CStr::from_bytes_with_nul(&info_log[..len as usize + 1])
    .expect("Slice must be nul terminated and contain no interior nul bytes")
    .to_str()
    .expect("Slice must be valid UTF-8 text");
Notice there is no unsafe code except the FFI call itself. You could also use with_capacity + set_len (as in wasmup's answer) to skip initializing the Vec to 1000 zeros, and the unsafe from_bytes_with_nul_unchecked to skip validating the C string, at the cost of more unsafe code.
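Since the snippet above needs a live GL context, here is a self-contained sketch of the same CStr pattern. `fake_get_log` is a hypothetical stand-in for the FFI call; like glGetShaderInfoLog, it reports a length that excludes the nul terminator, which is why the slice passed to `from_bytes_with_nul` is one byte longer:

```rust
use std::ffi::CStr;

/// Hypothetical stand-in for an FFI call such as glGetShaderInfoLog: writes a
/// nul-terminated message into `buf` and returns its length *excluding* the nul.
fn fake_get_log(buf: &mut [u8]) -> usize {
    let msg = b"0:1(1): error: syntax error";
    let n = msg.len().min(buf.len() - 1);
    buf[..n].copy_from_slice(&msg[..n]);
    buf[n] = 0;
    n
}

fn read_log() -> String {
    let mut buf = vec![0u8; 1000];
    let len = fake_get_log(&mut buf);
    // `len` excludes the terminator, so the slice needs one extra byte for
    // `from_bytes_with_nul` to accept it.
    CStr::from_bytes_with_nul(&buf[..len + 1])
        .expect("slice must end in a nul byte and contain no interior nul")
        .to_str()
        .expect("log must be valid UTF-8")
        .to_owned()
}
```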
See this:
fn get_compilation_log(&self) -> String {
    // INFO_LOG_LENGTH includes the nul terminator
    let mut len: gl::types::GLint = 0;
    unsafe { gl::GetShaderiv(self.id, gl::INFO_LOG_LENGTH, &mut len) };
    assert!(len > 0);
    let mut buf: Vec<u8> = Vec::with_capacity(len as usize);
    let buf_ptr = buf.as_mut_ptr() as *mut gl::types::GLchar;
    let mut written: gl::types::GLsizei = 0;
    unsafe {
        gl::GetShaderInfoLog(self.id, len, &mut written, buf_ptr);
        // `written` excludes the nul terminator, so it doesn't end up in the String
        buf.set_len(written as usize);
    }
    match String::from_utf8(buf) {
        Ok(log) => log,
        Err(err) => panic!("Could not convert compilation log from buffer: {}", err),
    }
}
See std::ffi::CStr:
// strz_ptr must point to a valid nul-terminated string
let s = unsafe { CStr::from_ptr(strz_ptr) }.to_str().unwrap();

How to safely reinterpret Vec<f64> as Vec<num_complex::Complex<f64>> with half the size?

I have complex number data filled into a Vec<f64> by an external C library (which I'd prefer not to change) in the form [i_0_real, i_0_imag, i_1_real, i_1_imag, ...], and it appears that this Vec<f64> has the same memory layout as a Vec<num_complex::Complex<f64>> of half the length would have, given that num_complex::Complex<f64>'s data structure is documented as memory-layout compatible with [f64; 2]. I'd like to use it as such without needing to reallocate a potentially large buffer.
I'm assuming that it's valid to use from_raw_parts() in std::vec::Vec to fake a new Vec that takes ownership of the old Vec's memory (by forgetting the old Vec) and use size / 2 and capacity / 2, but that requires unsafe code. Is there a "safe" way to do this kind of data re-interpretation?
The Vec is allocated in Rust as a Vec<f64> and is populated by a C function using .as_mut_ptr() that fills in the Vec<f64>.
My current compiling unsafe implementation:
extern crate num_complex;
pub fn convert_to_complex_unsafe(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
    let new_vec = unsafe {
        Vec::from_raw_parts(
            buffer.as_mut_ptr() as *mut num_complex::Complex<f64>,
            buffer.len() / 2,
            buffer.capacity() / 2,
        )
    };
    std::mem::forget(buffer);
    return new_vec;
}
fn main() {
    println!(
        "Converted vector: {:?}",
        convert_to_complex_unsafe(vec![3.0, 4.0, 5.0, 6.0])
    );
}
Is there a "safe" way to do this kind of data re-interpretation?
No. At the very least, this is because the information you need to know is not expressed in the Rust type system but is expressed via prose (a.k.a. the docs):
Complex<T> is memory layout compatible with an array [T; 2].
— Complex docs
If a Vec has allocated memory, then [...] its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice),
— Vec docs
Arrays coerce to slices ([T])
— Array docs
Since a Complex is memory-compatible with an array, an array's data is memory-compatible with a slice, and a Vec's data is memory-compatible with a slice, this transformation should be safe, even though the compiler cannot tell this.
This information should be attached (via a comment) to your unsafe block.
I would make some small tweaks to your function:
Having two Vecs at the same time pointing to the same data makes me very nervous. This can be trivially avoided by introducing some variables and forgetting one before creating the other.
Remove the return keyword to be more idiomatic
Add some asserts that the starting length of the data is a multiple of two.
As rodrigo points out, the capacity could easily be an odd number. To attempt to avoid this, we call shrink_to_fit. This has the downside that the Vec may need to reallocate and copy the memory, depending on the implementation.
Expand the unsafe block to cover all of the related code that is required to ensure that the safety invariants are upheld.
pub fn convert_to_complex(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
    // This is where I'd put the rationale for why this `unsafe` block
    // upholds the guarantees that I must ensure. Too bad I
    // copy-and-pasted from Stack Overflow without reading this comment!
    unsafe {
        buffer.shrink_to_fit();
        let ptr = buffer.as_mut_ptr() as *mut num_complex::Complex<f64>;
        let len = buffer.len();
        let cap = buffer.capacity();
        assert!(len % 2 == 0);
        assert!(cap % 2 == 0);
        std::mem::forget(buffer);
        Vec::from_raw_parts(ptr, len / 2, cap / 2)
    }
}
To avoid all the worrying about the capacity, you could instead convert a slice borrowed from the Vec. This doesn't need any extra memory allocation either. It's simpler because we could even "lose" an odd trailing value, since the Vec still owns the whole buffer.
pub fn convert_to_complex(buffer: &[f64]) -> &[num_complex::Complex<f64>] {
    // This is where I'd put the rationale for why this `unsafe` block
    // upholds the guarantees that I must ensure. Too bad I
    // copy-and-pasted from Stack Overflow without reading this comment!
    unsafe {
        let ptr = buffer.as_ptr() as *const num_complex::Complex<f64>;
        let len = buffer.len();
        assert!(len % 2 == 0);
        std::slice::from_raw_parts(ptr, len / 2)
    }
}
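As a sketch of what attaching the rationale via a comment could look like, here is a slice version with the reasoning spelled out in a SAFETY comment and the layout assumptions checked at compile time. To keep it self-contained it uses a hypothetical `#[repr(C)]` stand-in, `Complex64`, instead of `num_complex::Complex<f64>`, and assumes Rust 1.57 or newer for `assert!` in constants:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for `num_complex::Complex<f64>`: a `#[repr(C)]`
/// pair of `f64`s, which is what makes it layout-compatible with `[f64; 2]`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Complex64 {
    pub re: f64,
    pub im: f64,
}

// Compile-time checks of the layout facts the `unsafe` block relies on.
const _: () = assert!(size_of::<Complex64>() == 2 * size_of::<f64>());
const _: () = assert!(align_of::<Complex64>() == align_of::<f64>());

/// Borrow interleaved `[re, im, re, im, ...]` data as complex numbers.
/// Returns `None` if the length is odd.
pub fn as_complex(buffer: &[f64]) -> Option<&[Complex64]> {
    if buffer.len() % 2 != 0 {
        return None;
    }
    // SAFETY: `Complex64` has the size of `[f64; 2]` and the alignment of
    // `f64` (checked above), and any two `f64`s form a valid `Complex64`.
    // `buffer` holds `len` initialized, contiguous `f64`s, i.e. `len / 2`
    // complex values, and the result borrows `buffer`, so it cannot outlive it.
    Some(unsafe {
        std::slice::from_raw_parts(buffer.as_ptr() as *const Complex64, buffer.len() / 2)
    })
}
```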

Ergonomics issues with fixed size byte arrays in Rust

Rust sadly cannot produce a fixed size array [u8; 16] with a fixed size slicing operator s[0..16]. It'll throw errors like "expected array of 16 elements, found slice".
I've some KDFs that output several keys in wrapper structs like
pub struct LeafKey([u8; 16]);
pub struct MessageKey([u8; 32]);
fn kdfLeaf(...) -> (MessageKey, LeafKey) {
    // let mut r: [u8; 32+16];
    let mut r: (MessageKey, LeafKey);
    debug_assert_eq!(mem::size_of_val(&r), 384/8);
    let mut sha = Sha3::sha3_384();
    sha.input(...);
    // sha.result(r);
    sha.result(
        unsafe { mem::transmute::<&mut (MessageKey, LeafKey), &mut [u8; 32+16]>(&mut r) }
    );
    sha.reset();
    // (MessageKey(r[0..31]), LeafKey(r[32..47]))
    r
}
Is there a safer way to do this? We know mem::transmute will refuse to compile if the types do not have the same size, but that only checks that pointers have the same size here, so I added that debug_assert.
In fact, I'm not terribly worried about extra copies, since I'm running SHA3 here anyway, but AFAIK Rust offers no ergonomic way to copy between byte arrays.
Can I avoid writing (MessageKey, LeafKey) three times here? Is there a type alias for the return type of the current function? Is it safe to use _ in the mem::transmute given that I want the code to refuse to compile if the sizes do not match? Yes, I know I could make a type alias, but that seems silly.
As an aside, there is a longer discussion of s[0..16] not having type [u8; 16] here
There's the copy_from_slice method.
fn main() {
    use std::default::Default;
    // Using 16+8 because Default isn't implemented
    // for [u8; 32+16] due to type explosion unfortunateness
    let b: [u8; 24] = Default::default();
    let mut c: [u8; 16] = Default::default();
    let mut d: [u8; 8] = Default::default();
    c.copy_from_slice(&b[..16]);
    d.copy_from_slice(&b[16..16+8]);
}
Note that, unfortunately, copy_from_slice panics at runtime if the slices are not the same length, so make sure you test this thoroughly, or use the lengths of the other arrays to guard against it.
Unfortunately, c.copy_from_slice(&b[..c.len()]) doesn't work because Rust thinks c is borrowed both immutably and mutably at the same time.
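On current stable Rust, the keys can also be filled without transmute or copy_from_slice by splitting the buffer and using the standard `TryFrom<&[u8]> for [u8; N]` impl. A minimal sketch, reusing the question's key types and a hypothetical `split_keys` helper that takes the 48-byte KDF output as a plain array:

```rust
use std::convert::TryInto;

pub struct LeafKey([u8; 16]);
pub struct MessageKey([u8; 32]);

/// Hypothetical helper: split a 48-byte KDF output into the two keys
/// without `transmute`.
fn split_keys(r: &[u8; 48]) -> (MessageKey, LeafKey) {
    let (msg, leaf) = r.split_at(32);
    // `TryFrom<&[u8]> for [u8; N]` copies the bytes and fails only on a length
    // mismatch, which `split_at(32)` on a 48-byte array rules out.
    (
        MessageKey(msg.try_into().expect("32 bytes")),
        LeafKey(leaf.try_into().expect("16 bytes")),
    )
}
```

The lengths are still checked at runtime, but they are pinned down by `split_at(32)` on a `[u8; 48]`, so the `expect`s cannot fire.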
I marked the accepted answer as best since it's safe, and it led me to the clone_into_array answer here, but...
Another idea that improves safety is to make a version of mem::transmute for references that checks the sizes of the referenced types, as opposed to just the pointers. It might look like this:
#[inline]
unsafe fn transmute_ptr_mut<A, B>(v: &mut A) -> &mut B {
    // Compare the sizes of the referenced types, not of the references
    debug_assert_eq!(core::mem::size_of::<A>(), core::mem::size_of::<B>());
    core::mem::transmute::<&mut A, &mut B>(v)
}
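The debug_assert above only fires at runtime, and only in debug builds. A variant of the same idea can move the check to compile time and also check alignment, which a reference cast additionally requires. This is a sketch assuming Rust 1.57 or newer; `LayoutCheck` and `transmute_mut` are hypothetical names, and the pattern relies on an associated constant being evaluated when the function is instantiated for a concrete (A, B) pair:

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Hypothetical helper type: its associated const fails to evaluate (a compile
/// error at instantiation time) unless `A` and `B` have the same size and `A`
/// is at least as strictly aligned as `B`.
struct LayoutCheck<A, B>(PhantomData<(A, B)>);

impl<A, B> LayoutCheck<A, B> {
    const OK: () = assert!(
        size_of::<A>() == size_of::<B>() && align_of::<A>() >= align_of::<B>(),
        "transmute_mut: incompatible size or alignment"
    );
}

/// Reinterpret `&mut A` as `&mut B`, checking size and alignment at compile time.
///
/// # Safety
/// Every bit pattern of `A` must be a valid `B`, and vice versa.
unsafe fn transmute_mut<A, B>(v: &mut A) -> &mut B {
    // Referencing the const forces the check for this particular (A, B) pair.
    let () = LayoutCheck::<A, B>::OK;
    unsafe { &mut *(v as *mut A).cast::<B>() }
}
```

On typical targets, `transmute_mut::<[u8; 4], u32>` would then be rejected at compile time, since u32 is more strictly aligned than [u8; 4].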
I have raised an issue on the arrayref crate to discuss this, as arrayref might be a reasonable crate for it to live in.
Update: We have a new "best answer" from the arrayref crate developer:
let (a,b) = array_refs![&r,32,16];
(MessageKey(*a), LeafKey(*b))

What happens if I call Vec::from_raw_parts with a smaller capacity than the pointer actually has?

I have a vector of u8 that I want to interpret as a vector of u32. It is assumed that the bytes are in the right order. I don't want to allocate new memory and copy bytes after casting. I got the following to work:
use std::mem;
fn reinterpret(mut v: Vec<u8>) -> Option<Vec<u32>> {
    let v_len = v.len();
    v.shrink_to_fit();
    if v_len % 4 != 0 {
        None
    } else {
        let v_cap = v.capacity();
        let v_ptr = v.as_mut_ptr();
        println!("{:?}|{:?}|{:?}", v_len, v_cap, v_ptr);
        let v_reinterpret = unsafe { Vec::from_raw_parts(v_ptr as *mut u32, v_len / 4, v_cap / 4) };
        println!("{:?}|{:?}|{:?}",
                 v_reinterpret.len(),
                 v_reinterpret.capacity(),
                 v_reinterpret.as_ptr());
        println!("{:?}", v_reinterpret);
        println!("{:?}", v); // v is still alive, but is same as rebuilt
        mem::forget(v);
        Some(v_reinterpret)
    }
}
fn main() {
    let mut v: Vec<u8> = vec![1, 1, 1, 1, 1, 1, 1, 1];
    let test = reinterpret(v);
    println!("{:?}", test);
}
However, there's an obvious problem here. From the shrink_to_fit documentation:
It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.
Does this mean that my capacity may still not be a multiple of the size of u32 after calling shrink_to_fit? If in from_raw_parts I set capacity to v_len/4 with v.capacity() not an exact multiple of 4, do I leak those 1-3 bytes, or will they go back into the memory pool because of mem::forget on v?
Is there any other problem I am overlooking here?
I think moving v into reinterpret guarantees that it's not accessible from that point on, so there's only one owner from the mem::forget(v) call onwards.
This is an old question, and it looks like it has a working solution in the comments. I've just written up what exactly goes wrong here, and some solutions that one might create/use in today's Rust.
This is undefined behavior
Vec::from_raw_parts is an unsafe function, and thus you must satisfy its invariants, or you invoke undefined behavior.
Quoting from the documentation for Vec::from_raw_parts:
ptr needs to have been previously allocated via String/Vec (at least, it's highly likely to be incorrect if it wasn't).
T needs to have the same size and alignment as what ptr was allocated with. (T having a less strict alignment is not sufficient, the alignment really needs to be equal to satisfy the dealloc requirement that memory must be allocated and deallocated with the same layout.)
length needs to be less than or equal to capacity.
capacity needs to be the capacity that the pointer was allocated with.
So, to answer your question, if capacity is not equal to the capacity of the original vec, then you've broken this invariant. This gives you undefined behavior.
Note that the requirement isn't on size_of::<T>() * capacity either, though, which brings us to the next topic.
Is there any other problem I am overlooking here?
Three things.
First, the function as written disregards another requirement of from_raw_parts: T must have the same size and alignment as the original T. u32 is four times as big as u8, so this again breaks the requirement. Even if capacity * size stays the same, neither the size nor the capacity does. This function will never be sound as implemented.
Second, even if all of the above was valid, you've also ignored the alignment. u32 must be aligned to 4-byte boundaries, while a Vec<u8> is only guaranteed to be aligned to a 1-byte boundary.
A comment on the OP mentions:
I think on x86_64, misalignment will have performance penalty
It's worth noting that while this may be true of machine language, it is not true for Rust. The Rust Reference explicitly states "A value of alignment n must only be stored at an address that is a multiple of n." This is a hard requirement.
Why the exact type requirement?
Vec::from_raw_parts seems pretty strict, and that's for a reason. In Rust, the allocator API operates not only on the allocation size but on a Layout, which combines the size of the allocation with its alignment. In C with malloc, all the allocator can rely upon is that the size is the same, plus some minimum alignment. In Rust, though, the allocator is allowed to rely on the entire Layout, and deallocating with a different Layout is undefined behavior.
So in order to correctly deallocate the memory, Vec needs to know the exact type that it was allocated with. By converting a Vec<u8> into a Vec<u32>, it no longer knows this information, and so it can no longer properly deallocate this memory.
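The mismatch can be seen directly with std::alloc::Layout. The sketch below uses a hypothetical `buffer_layout` helper that computes the layout of an array buffer of `cap` elements of type T: eight u8s and two u32s occupy the same number of bytes, but the alignments differ, so the two layouts are not equal:

```rust
use std::alloc::Layout;

/// Hypothetical helper: the layout of a buffer holding `cap` values of type `T`.
fn buffer_layout<T>(cap: usize) -> Layout {
    Layout::array::<T>(cap).expect("capacity overflow")
}
```

On common targets this gives a size of 8 bytes for both, with an alignment of 1 for the u8 buffer and 4 for the u32 buffer.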
Alternative - Transforming slices
Vec::from_raw_parts's strictness comes from the fact that it needs to deallocate the memory. If we create a borrowing slice, &[u32] instead, we no longer need to deal with it! There is no capacity when turning a &[u8] into &[u32], so we should be all good, right?
Well, almost. You still have to deal with alignment. Primitives are generally aligned to their size, so a [u8] is only guaranteed to be aligned to 1-byte boundaries, while [u32] must be aligned to a 4-byte boundary.
If you want to chance it, though, and create a [u32] if possible, there's a function for that - <[T]>::align_to:
pub unsafe fn align_to<U>(&self) -> (&[T], &[U], &[T])
This will trim off any leading and trailing elements that don't fit, and then give you a slice of the new type in the middle. It's unsafe, but the only invariant you need to satisfy is that the elements in the middle slice are valid.
It's sound to reinterpret 4 u8 values as a u32 value, so we're good.
Putting it all together, a sound version of the original function would look like this. This operates on borrowed rather than owned values, but given that reinterpreting an owned Vec is instant-undefined-behavior in any case, I think it's safe to say this is the closest sound function:
fn reinterpret(v: &[u8]) -> Option<&[u32]> {
    let (trimmed_front, u32s, trimmed_back) = unsafe { v.align_to::<u32>() };
    if trimmed_front.is_empty() && trimmed_back.is_empty() {
        Some(u32s)
    } else {
        // either alignment % 4 != 0 or len % 4 != 0, so we can't do this op
        None
    }
}
fn main() {
    let v: Vec<u8> = vec![1, 1, 1, 1, 1, 1, 1, 1];
    let test = reinterpret(&v);
    println!("{:?}", test);
}
As a note, this could also be done with std::slice::from_raw_parts rather than align_to. However, that requires manually dealing with the alignment, and all it really gives is more things we need to ensure we're doing right. Well, that and compatibility with older compilers - align_to was introduced in 2018 in Rust 1.30.0, and wouldn't have existed when this question was asked.
Alternative - Copying
If you do need a Vec<u32> for long-term data storage, I think the best option is to just allocate new memory. The old memory was allocated for u8s anyway, and wouldn't work.
This can be made fairly simple with some functional programming:
use std::convert::TryInto;
fn reinterpret(v: &[u8]) -> Option<Vec<u32>> {
    let v_len = v.len();
    if v_len % 4 != 0 {
        None
    } else {
        let result = v
            .chunks_exact(4)
            .map(|chunk: &[u8]| -> u32 {
                let chunk: [u8; 4] = chunk.try_into().unwrap();
                let value = u32::from_ne_bytes(chunk);
                value
            })
            .collect();
        Some(result)
    }
}
First, we use <[T]>::chunks_exact to iterate over chunks of 4 u8s. Next, we use try_into to convert each &[u8] to a [u8; 4]. Each chunk is guaranteed to have length 4, so this never fails.
We use u32::from_ne_bytes to convert the bytes into a u32 using native endianness. If interacting with a network protocol, or on-disk serialization, then using from_be_bytes or from_le_bytes may be preferable. And finally, we collect to turn our result back into a Vec<u32>.
As a last note, a truly general solution might use both of these techniques. If we change the return type to Cow<'_, [u32]>, we could return aligned, borrowed data if it works, and allocate a new array if it doesn't! Not quite the best of both worlds, but close.
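A minimal sketch of that Cow-based combination, built from the two functions above; `reinterpret_cow` is a hypothetical name. Note that align_to is documented as being allowed to return everything in the prefix even for aligned data, so the copying branch has to be correct on its own, not only as a fallback for misaligned input:

```rust
use std::borrow::Cow;
use std::convert::TryInto;

/// Hypothetical combined version: borrow the bytes as `&[u32]` when the
/// buffer happens to be suitably aligned, otherwise fall back to copying.
/// Returns `None` if the length is not a multiple of 4.
fn reinterpret_cow(v: &[u8]) -> Option<Cow<'_, [u32]>> {
    if v.len() % 4 != 0 {
        return None;
    }
    // SAFETY: any 4 initialized bytes form a valid `u32`.
    let (front, middle, back) = unsafe { v.align_to::<u32>() };
    if front.is_empty() && back.is_empty() {
        Some(Cow::Borrowed(middle))
    } else {
        // Misaligned (or `align_to` chose not to split): copy instead.
        let owned: Vec<u32> = v
            .chunks_exact(4)
            .map(|chunk| u32::from_ne_bytes(chunk.try_into().unwrap()))
            .collect();
        Some(Cow::Owned(owned))
    }
}
```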

Take slice of certain length known at compile time

In this code:
fn unpack_u32(data: &[u8]) -> u32 {
    assert_eq!(data.len(), 4);
    let res = data[0] as u32 |
        (data[1] as u32) << 8 |
        (data[2] as u32) << 16 |
        (data[3] as u32) << 24;
    res
}
fn main() {
    let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
    println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?
Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to that question:
fn unpack_u32(data: [u8; 4]) -> u32 {
    // as before, but without the assert
    data[0] as u32 |
        (data[1] as u32) << 8 |
        (data[2] as u32) << 16 |
        (data[3] as u32) << 24
}
use std::convert::AsMut;
fn clone_into_array<A, T>(slice: &[T]) -> A
where
    A: Default + AsMut<[T]>,
    T: Clone,
{
    assert_eq!(slice.len(), std::mem::size_of::<A>() / std::mem::size_of::<T>());
    let mut a = Default::default();
    <A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
    a
}
fn main() {
    let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
    println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.
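On newer versions of Rust, where TryFrom is implemented for arrays, the helper is no longer needed: a slice can be converted with try_into, and u32::from_le_bytes does the same little-endian assembly as the shifts. The length check still happens at runtime, but it becomes a recoverable Option rather than a panic. A sketch, with `unpack_at` as a hypothetical helper:

```rust
use std::convert::TryInto;

/// Same little-endian assembly as the shift-and-or version, via the std helper.
fn unpack_u32(data: [u8; 4]) -> u32 {
    u32::from_le_bytes(data)
}

/// Hypothetical helper: read 4 bytes starting at `start`, or `None` if the
/// slice is too short. The length is still checked at runtime.
fn unpack_at(v: &[u8], start: usize) -> Option<u32> {
    let bytes: [u8; 4] = v.get(start..start + 4)?.try_into().ok()?;
    Some(unpack_u32(bytes))
}
```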
I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.
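A minimal sketch of that single-check approach, with a hypothetical `unpack_many` that decodes the first `n` little-endian u32 values: the length is verified once, and the loop then uses get_unchecked, relying on that check for soundness:

```rust
/// Hypothetical helper: decode the first `n` little-endian u32 values in
/// `data`, doing one length check up front instead of one per access.
fn unpack_many(data: &[u8], n: usize) -> Option<Vec<u32>> {
    let needed = n.checked_mul(4)?;
    if data.len() < needed {
        return None; // the single up-front check
    }
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let base = i * 4;
        // SAFETY: base + 3 < needed <= data.len(), established by the check above.
        let bytes = unsafe {
            [
                *data.get_unchecked(base),
                *data.get_unchecked(base + 1),
                *data.get_unchecked(base + 2),
                *data.get_unchecked(base + 3),
            ]
        };
        out.push(u32::from_le_bytes(bytes));
    }
    Some(out)
}
```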
I had a similar problem: creating a fixed byte array on the stack whose length matches the const length of another byte array (which may change during development).
A combination of compiler plugin and macro was the solution:
https://github.com/frehberg/rust-sizedbytes
