Take slice of certain length known at compile time - rust

In this code:
fn unpack_u32(data: &[u8]) -> u32 {
assert_eq!(data.len(), 4);
let res = data[0] as u32 |
(data[1] as u32) << 8 |
(data[2] as u32) << 16 |
(data[3] as u32) << 24;
res
}
fn main() {
let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?

Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to said question (playground):
fn unpack_u32(data: [u8; 4]) -> u32 {
// as before without assert
}
use std::convert::AsMut;
fn clone_into_array<A, T>(slice: &[T]) -> A
where A: Default + AsMut<[T]>,
T: Clone
{
assert_eq!(slice.len(), std::mem::size_of::<A>()/std::mem::size_of::<T>());
let mut a = Default::default();
<A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
a
}
fn main() {
let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.

I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.

I had a similar problem, creating a fixed byte-array on stack corresponding to const length of other byte-array (which may change during development time)
A combination of compiler plugin and macro was the solution:
https://github.com/frehberg/rust-sizedbytes

Related

Ergonomics issues with fixed size byte arrays in Rust

Rust sadly cannot produce a fixed size array [u8; 16] with a fixed size slicing operator s[0..16]. It'll throw errors like "expected array of 16 elements, found slice".
I've some KDFs that output several keys in wrapper structs like
pub struct LeafKey([u8; 16]);
pub struct MessageKey([u8; 32]);
fn kdfLeaf(...) -> (MessageKey,LeafKey) {
// let mut r: [u8; 32+16];
let mut r: (MessageKey, LeafKey);
debug_assert_eq!(mem::size_of_val(&r), 384/8);
let mut sha = Sha3::sha3_384();
sha.input(...);
// sha.result(r);
sha.result(
unsafe { mem::transmute::<&mut (MessageKey, LeafKey),&mut [u8;32+16]>(&r) }
);
sha.reset();
// (MessageKey(r[0..31]), LeafKey(r[32..47]))
r
}
Is there a safer way to do this? We know mem::transmute will refuse to compile if the types do not have the same size, but that only checks that pointers have the same size here, so I added that debug_assert.
In fact, I'm not terribly worried about extra copies though since I'm running SHA3 here, but afaik rust offers no ergonomic way to copy amongst byte arrays.
Can I avoid writing (MessageKey, LeafKey) three times here? Is there a type alias for the return type of the current function? Is it safe to use _ in the mem::transmute given that I want the code to refuse to compile if the sizes do not match? Yes, I know I could make a type alias, but that seems silly.
As an aside, there is a longer discussion of s[0..16] not having type [u8; 16] here
There's the copy_from_slice method.
fn main() {
use std::default::Default;
// Using 16+8 because Default isn't implemented
// for [u8; 32+16] due to type explosion unfortunateness
let b: [u8; 24] = Default::default();
let mut c: [u8; 16] = Default::default();
let mut d: [u8; 8] = Default::default();
c.copy_from_slice(&b[..16])
d.copy_from_slice(&b[16..16+8]);
}
Note, unfortunately copy_from_slice throws a runtime error if the slices are not the same length, so make sure you thoroughly test this yourself, or use the lengths of the other arrays to guard.
Unfortunately, c.copy_from_slice(&b[..c.len()]) doesn't work because Rust thinks c is borrowed both immutably and mutably at the same time.
I marked the accepted answer as best since it's safe, and led me to the clone_into_array answer here, but..
Another idea that improves the safety is to make a version of mem::transmute for references that checks the sizes of the referenced types, as opposed to just the pointers. It might look like :
#[inline]
unsafe fn transmute_ptr_mut<A,B>(v: &mut A) -> &mut B {
debug_assert_eq!(core::mem::size_of(A),core::mem::size_of(B));
core::mem::transmute::<&mut A,&mut B>(v)
}
I have raised an issue on the arrayref crate to discuss this, as arrayref might be a reasonable crate for it to live in.
Update : We've a new "best answer" by the arrayref crate developer :
let (a,b) = array_refs![&r,32,16];
(MessageKey(*a), LeafKey(*b))

Borrow a section of a borrowed array as a borrowed array

As the title reads, how would I go about doing this?
fn foo(array: &[u32; 10]) -> &[u32; 5] {
&array[0..5]
}
Compiler error
error[E0308]: mismatched types
--> src/main.rs:2:5
|
2 | &array[0..5]
| ^^^^^^^^^^^^ expected array of 5 elements, found slice
|
= note: expected type `&[u32; 5]`
= note: found type `&[u32]`
arrayref implements a safe interface for doing this operation, using macros (and compile-time constant slicing bounds, of course).
Their readme explains
The goal of arrayref is to enable the effective use of APIs that involve array references rather than slices, for situations where parameters must have a given size.
and
let addr: &[u8; 16] = ...;
let mut segments = [0u16; 8];
// array-based API with arrayref
for i in 0 .. 8 {
segments[i] = read_u16_array(array_ref![addr,2*i,2]);
}
Here the array_ref![addr,2*i,2] macro allows us to take an array reference to a slice consisting of two bytes starting at 2*i. Apart from the syntax (less nice than slicing), it is essentially the same as the slice approach. However, this code makes explicit the need for precisely two bytes both in the caller, and in the function signature.
Stable Rust
It's not possible to do this using only safe Rust. To understand why, it's important to understand how these types are implemented. An array is guaranteed to have N initialized elements. It cannot get smaller or larger. At compile time, those guarantees allow the size aspect of the array to be removed, and the array only takes up N * sizeof(element) space.
That means that [T; N] and [T; M] are different types (when N != M) and you cannot convert a reference of one to the other.
The idiomatic solution is to use a slice instead:
fn foo(array: &[u32; 10]) -> &[u32] {
&array[0..5]
}
A slice contains a pointer to the data and the length of the data, thus moving that logic from compile time to run time.
Nightly Rust
You can perform a runtime check that the slice is the correct length and convert it to an array in one step:
#![feature(try_from)]
use std::convert::TryInto;
fn foo(array: &[u32; 10]) -> &[u32; 5] {
array[0..5].try_into().unwrap()
}
fn main() {}
Unsafe Rust
Because someone might want to do this the unsafe way in an earlier version of Rust, I'll present code based on the standard library implementation:
fn foo(array: &[u32; 10]) -> &[u32; 5] {
let slice = &array[0..5];
if slice.len() == 5 {
let ptr = slice.as_ptr() as *const [u32; 5];
unsafe { &*ptr }
} else {
panic!("Needs to be length 5")
}
}
fn main() {
let input = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
let output = foo(&input);
println!("{:?}", output);
}

Using pointer casting to change the “type” of data in memory [duplicate]

I am reading raw data from a file and I want to convert it to an integer:
fn main() {
let buf: &[u8] = &[0, 0, 0, 1];
let num = slice_to_i8(buf);
println!("1 == {}", num);
}
pub fn slice_to_i8(buf: &[u8]) -> i32 {
unimplemented!("what should I do here?")
}
I would do a cast in C, but what do I do in Rust?
I'd suggest using the byteorder crate (which also works in a no-std environment):
use byteorder::{BigEndian, ReadBytesExt}; // 1.2.7
fn main() {
let mut buf: &[u8] = &[0, 0, 0, 1];
let num = buf.read_u32::<BigEndian>().unwrap();
assert_eq!(1, num);
}
This handles oddly-sized slices and automatically advances the buffer so you can read multiple values.
As of Rust 1.32, you can also use the from_le_bytes / from_be_bytes / from_ne_bytes inherent methods on integers:
fn main() {
let buf = [0, 0, 0, 1];
let num = u32::from_be_bytes(buf);
assert_eq!(1, num);
}
These methods only handle fixed-length arrays to avoid dealing with the error when not enough data is present. If you have a slice, you will need to convert it into an array.
See also:
How to get a slice as an array in Rust?
How to convert a slice into an array reference?
I'd like to give this answer here to commit the following additional details:
A working code snippet which converts slice to integer (two ways to do it).
A working solution in no_std environment.
To keep everything in one place for the people getting here from the search engine.
Without external crates, the following methods are suitable to convert from slices to integer even for no_std build starting from Rust 1.32:
Method 1 (try_into + from_be_bytes)
use core::convert::TryInto;
let src = [1, 2, 3, 4, 5, 6, 7];
// 0x03040506
u32::from_be_bytes(src[2..6].try_into().unwrap());
use core::conver::TryInto is for no_std build. And the way to use the standard crate is the following: use std::convert::TryInto;.
(And about endians it has been already answered, but let me keep it here in one place: from_le_bytes, from_be_bytes, and from_ne_bytes - use them depending on how integer is represented in memory).
Method 2 (clone_from_slice + from_be_bytes)
let src = [1, 2, 3, 4, 5, 6, 7];
let mut dst = [0u8; 4];
dst.clone_from_slice(&src[2..6]);
// 0x03040506
u32::from_be_bytes(dst);
Result
In both cases integer will be equal to 0x03040506.
This custom serialize_deserialize_u8_i32 library will safely convert back and forth between u8 array and i32 array i.e. the serialise function will take all of your u8 values and pack them into i32 values and the deserialise function will take this library’s custom i32 values and convert them back to the original u8 values that you started with.
This was built for a specific purpose, however it may come in handy for other uses; depending on whether you want/need a fast/custom converter like this.
https://github.com/second-state/serialize_deserialize_u8_i32
Here’s my implementation (for a different use case) that discards any additional bytes beyond 8 (and therefore doesn’t need to panic if not exact):
pub fn u64_from_slice(slice: &[u8]) -> u64 {
u64::from_ne_bytes(slice.split_at(8).0.try_into().unwrap())
}
The split_at() method returns a tuple of two slices: one from index 0 until the specified index and the other from the specified index until the end. So by using .0 to access the first member of the tuple returned by .split_at(8), it ensures that only the first 8 bytes are passed to u64::to_ne_bytes(), discarding the leftovers. Then, of course, it calls the try_into method on that .0 tuple member, and .unwrap() since split_at does all the custom panicking for you.

Mutable multidimensional array as a function argument

In rustc 1.0.0, I'd like to write a function that mutates a two dimensional array supplied by the caller. I was hoping this would work:
fn foo(x: &mut [[u8]]) {
x[0][0] = 42;
}
fn main() {
let mut x: [[u8; 3]; 3] = [[0; 3]; 3];
foo(&mut x);
}
It fails to compile:
$ rustc fail2d.rs
fail2d.rs:7:9: 7:15 error: mismatched types:
expected `&mut [[u8]]`,
found `&mut [[u8; 3]; 3]`
(expected slice,
found array of 3 elements) [E0308]
fail2d.rs:7 foo(&mut x);
^~~~~~
error: aborting due to previous error
I believe this is telling me I need to somehow feed the function a slice of slices, but I don't know how to construct this.
It "works" if I hard-code the nested array's length in the function signature. This isn't acceptable because I want the function to operate on multidimensional arrays of arbitrary dimension:
fn foo(x: &mut [[u8; 3]]) { // FIXME: don't want to hard code length of nested array
x[0][0] = 42;
}
fn main() {
let mut x: [[u8; 3]; 3] = [[0; 3]; 3];
foo(&mut x);
}
tldr; any zero-cost ways of passing a reference to a multidimensional array such that the function use statements like $x[1][2] = 3;$?
This comes down to a matter of memory layout. Assuming a type T with a size known at compile time (this constraint can be written T: Sized), the size of [T; n] is known at compile time (it takes n times as much memory as T does); but [T] is an unsized type; its length is not known at compile time. Therefore it can only be used through some form of indirection, such as a reference (&[T]) or a box (Box<[T]>, though this is of limited practical value, with Vec<T> which allows you to add and remove items without needing to reallocate every single time by using overallocation).
A slice of an unsized type doesn’t make sense; it’s permitted for reasons that are not clear to me, but you can never actually have an instance of it. (Vec<T>, by comparison, requires T: Sized.)
&[T; n] can coerce to &[T], and &mut [T; n] to &mut [T], but this only applies at the outermost level; the contents of slice is fixed (you’d need to create a new array or vector to achieve such a transformation, because the memory layout of each item is different). The effect of this is that arrays work for single‐dimensional work, but for multi‐dimensional work they fall apart. Arrays are currently very much second‐class citizens in Rust, and will be until the language supports making slices generic over length, which it is likely to eventually.
I recommend that you use either a single‐dimensional array (suitable for square matrices, indexed by x * width + y or similar), or vectors (Vec<Vec<T>>). There may also be libraries already out there abstracting over a suitable solution.

Creating a fixed-size array on heap in Rust

I've tried to use the following code:
fn main() {
let array = box [1, 2, 3];
}
, in my program, and it results in a compile error: error: obsolete syntax: ~[T] is no longer a type.
AFAIU, there are no dynamic size arrays in Rust (the size has to be known at compile time). However, in my code snippet the array does have static size and should be of type ~[T, ..3] (owned static array of size 3) whereas the compiler says it has the type ~[T]. Is there any deep reason why it isn't possible to get a static sized array allocated on the heap?
P.S. Yeah, I've heard about Vec.
Since I ended up here, others might as well. Rust has moved along and at the point of this answer Rust is at 1.53 for stable and 1.55 for nightly.
Box::new([1, 2, 3]) is the recommended way, and does its job, however there is a catch: The array is created on the stack and then copied over to the heap. This is a documented behaviour of Box:
Move a value from the stack to the heap by creating a Box:
Meaning, it contains a hidden memcopy, and with large array, the heap allocation even fails with a stack overflow.
const X: usize = 10_000_000;
let failing_huge_heap_array = [1; X];
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
There are several workarounds to this as of now (Rust 1.53), the most straightforward is to create a vector and turn the vector into a boxed slice:
const X: usize = 10_000_000;
let huge_heap_array = vec![1; X].into_boxed_slice();
This works, but has two small catches: It looses the type information, what should be Box<[i32; 10000000]> is now Box<[usize]> and additionally takes up 16 bytes on the stack as opposed to an array which only takes 8.
...
println!("{}", mem::size_of_val(&huge_heap_array);
16
Not a huge deal, but it hurts my personal Monk factor.
Upon further research, discarding options that need nightly like the OP box [1, 2, 3] which seems to be coming back with the feature #![feature(box_syntax)] and the arr crate which is nice but also needs nightly, the best solution I found to allocating an array on the heap without the hidden memcopy
was a suggestion by Simias
/// A macro similar to `vec![$elem; $size]` which returns a boxed array.
///
/// ```rustc
/// let _: Box<[u8; 1024]> = box_array![0; 1024];
/// ```
macro_rules! box_array {
($val:expr ; $len:expr) => {{
// Use a generic function so that the pointer cast remains type-safe
fn vec_to_boxed_array<T>(vec: Vec<T>) -> Box<[T; $len]> {
let boxed_slice = vec.into_boxed_slice();
let ptr = ::std::boxed::Box::into_raw(boxed_slice) as *mut [T; $len];
unsafe { Box::from_raw(ptr) }
}
vec_to_boxed_array(vec![$val; $len])
}};
}
const X: usize = 10_000_000;
let huge_heap_array = box_array![1; X];
It does not overflow the stack and only takes up 8 bytes while preserving the type.
It uses unsafe, but limits this to a single line of code. Until the arrival of the box [1;X] syntax, IMHO a clean option.
As far as I know the box expression is experimental. You can use Box::new() with something like the code below to suppress warnings.
fn main() {
let array1 = Box::new([1, 2, 3]);
// or even
let array2: Box<[i32]> = Box::new([1, 2, 3]);
}
Check out the comment by Shepmaster below, as these are different types.
Just write like:
let mut buffer= vec![0; k]; it makes u8 array with length equals k.

Resources