Ergonomics issues with fixed size byte arrays in Rust

Ergonomics issues with fixed size byte arrays in Rust - rust

Rust sadly cannot produce a fixed size array [u8; 16] with a fixed size slicing operator s[0..16]. It'll throw errors like "expected array of 16 elements, found slice".
I've some KDFs that output several keys in wrapper structs like
pub struct LeafKey([u8; 16]);
pub struct MessageKey([u8; 32]);
fn kdfLeaf(...) -> (MessageKey,LeafKey) {
// let mut r: [u8; 32+16];
let mut r: (MessageKey, LeafKey);
debug_assert_eq!(mem::size_of_val(&r), 384/8);
let mut sha = Sha3::sha3_384();
sha.input(...);
// sha.result(r);
sha.result(
unsafe { mem::transmute::<&mut (MessageKey, LeafKey),&mut [u8;32+16]>(&r) }
);
sha.reset();
// (MessageKey(r[0..31]), LeafKey(r[32..47]))
r
}
Is there a safer way to do this? We know mem::transmute will refuse to compile if the types do not have the same size, but that only checks that pointers have the same size here, so I added that debug_assert.
In fact, I'm not terribly worried about extra copies though since I'm running SHA3 here, but afaik rust offers no ergonomic way to copy amongst byte arrays.
Can I avoid writing (MessageKey, LeafKey) three times here? Is there a type alias for the return type of the current function? Is it safe to use _ in the mem::transmute given that I want the code to refuse to compile if the sizes do not match? Yes, I know I could make a type alias, but that seems silly.
As an aside, there is a longer discussion of s[0..16] not having type [u8; 16] here

There's the copy_from_slice method.
fn main() {
use std::default::Default;
// Using 16+8 because Default isn't implemented
// for [u8; 32+16] due to type explosion unfortunateness
let b: [u8; 24] = Default::default();
let mut c: [u8; 16] = Default::default();
let mut d: [u8; 8] = Default::default();
c.copy_from_slice(&b[..16])
d.copy_from_slice(&b[16..16+8]);
}
Note, unfortunately copy_from_slice throws a runtime error if the slices are not the same length, so make sure you thoroughly test this yourself, or use the lengths of the other arrays to guard.
Unfortunately, c.copy_from_slice(&b[..c.len()]) doesn't work because Rust thinks c is borrowed both immutably and mutably at the same time.

I marked the accepted answer as best since it's safe, and led me to the clone_into_array answer here, but..
Another idea that improves the safety is to make a version of mem::transmute for references that checks the sizes of the referenced types, as opposed to just the pointers. It might look like :
#[inline]
unsafe fn transmute_ptr_mut<A,B>(v: &mut A) -> &mut B {
debug_assert_eq!(core::mem::size_of(A),core::mem::size_of(B));
core::mem::transmute::<&mut A,&mut B>(v)
}
I have raised an issue on the arrayref crate to discuss this, as arrayref might be a reasonable crate for it to live in.
Update : We've a new "best answer" by the arrayref crate developer :
let (a,b) = array_refs![&r,32,16];
(MessageKey(*a), LeafKey(*b))

Related

Unable to use concat on vec<u8> in my function

I have a program where I need to append two Vec<u8> before they are are serialized.
Just to be sure how to do it, I made this example program:
let a: Vec<u8> = vec![1, 2, 3, 4, 5, 6];
let b: Vec<u8> = vec![7, 8, 9];
let c = [a, b].concat();
println!("{:?}", c);
Which works perfectly. The issue is now when I have to implement it in my own project.
Here I need to write a function, the function takes a struct as input that looks like this:
pub struct Message2 {
pub ephemeral_key_r: Vec<u8>,
pub c_r: Vec<u8>,
pub ciphertext2: Vec<u8>,
}
and the serialalization function looks like this:
pub fn serialize_message_2(msg: &Message2) -> Result<Vec<u8>> {
let c_r_and_ciphertext = [msg.c_r, msg.ciphertext2].concat();
let encoded = (
Bytes::new(&msg.ephemeral_key_r),
Bytes::new(&c_r_and_ciphertext),
);
Ok(cbor::encode_sequence(encoded)?)
}
The first issue that arises here is that it complains that msg.ciphertext2 and msg.c_r are moved values. This makes sense, so I add an & in front of both of them.
However, when I do this, the call to concat() fails, with this type error:
util.rs(77, 59): method cannot be called on `[&std::vec::Vec<u8>; 2]` due to unsatisfied trait bounds
So, when I borrow the values, then the expression [&msg.c_r, &msg.ciphertext2] becomes an array of two vec's, which there is not a concat() defined for.
I also tried calling clone on both vectors:
let c_r_and_ciphertext = [msg.c_r.clone(), msg.ciphertext2.clone()].concat();
and this actually works out!
But now I'm just wondering, why does borrowing the values change the types?
and is there any things to think about when slapping on clone to values that are moved, and where I cannot borrow for some reason?

The reasons on why .concat() behaves as it does are a bit awkward.
To be able to call .concat(), the Concat trait must be implemented. It is implemented on slices of strings, and slices of V, where V can be Borrowed as slices of copyable T.
First, you're calling concat on an array, not a slice. However, auto-borrowing and unsize coercion are applied when calling a function with .. This turns the [V; 2] into a &[V] (where V = Vec<u8> in the working case and V = &Vec<u8> in the non-workin case). Try calling Concat::concat([a, b]) and you'll notice the difference.
So now is the question whether V can be borrowed as/into some &[T] (where T = u8 in your case). Two possibilities exist:
There is an impl<T> Borrow<[T]> for Vec<T>, so Vec<u8> can be turned into &[u8].
There is an impl<'_, T> Borrow<T> for &'_ T, so if you already have a &[u8], that can be used.
However, there is no impl<T> Borrow<[T]> for &'_ Vec<T>, so concatting [&Vec<_>] won't work.
So much for the theory, on the practical side: You can avoid the clones by using [&msg.c_r[..], &msg.ciphertext2[..]].concat(), because you'll be calling concat on &[&[u8]]. The &x[..] is a neat trick to turn the Vecs into slices (by slicing it, without slicing anything off…). You can also do that with .borrow(), but that's a bit more awkward, since you may need an extra type specification: [msg.c_r.borrow(), msg.ciphertext2.borrow()].concat::<u8>()

I tried to reproduce your error message, which this code does:
fn main() {
let a = vec![1, 2];
let b = vec![3, 4];
println!("{:?}", [&a, &b].concat())
}
gives:
error[E0599]: the method `concat` exists for array `[&Vec<{integer}>; 2]`, but its trait bounds were not satisfied
--> src/main.rs:4:31
|
4 | println!("{:?}", [&a, &b].concat())
| ^^^^^^ method cannot be called on `[&Vec<{integer}>; 2]` due to unsatisfied trait bounds
|
= note: the following trait bounds were not satisfied:
`[&Vec<{integer}>]: Concat<_>`
It is a simple matter of helping the compiler to see that &a works perfectly fine as a slice, by calling it &a[..]:
fn main() {
let a = vec![1, 2];
let b = vec![3, 4];
println!("{:?}", [&a[..], &b[..]].concat())
}
why does borrowing the values change the types?
Borrowing changes a type into a reference to that same type, so T to &T. These types are related, but are not the same.
is there any things to think about when slapping on clone to values that are moved, and where I cannot borrow for some reason?
Cloning is a good way to sacrifice performance to make the borrow checker happy. It (usually) involves copying the entire memory that is cloned, but if your code is not performance critical (which most code is not), then it may still be a good trade-off...

Is there a way to create a mutable reference for a type that doesn't match underlying storage?

Consider a struct that is implemented as a [u8; 2]. Is it possible to construct a &mut u16 mutable reference to the whole struct? Is there a safe way to do it?
As an alternative way of phrasing this, is there a way to implement:
fn ref_all(&mut [u8; 2]) -> &mut u16
Is there a way to do this in general for custom types as well?

There is no perfectly safe method to do this, but there is align_to_mut (and its immutable counterpart align_to) defined for slices, which works for all types and is a safer alternative to the big hammer of mem::transmute:
fn ref_all(x: &mut [u8; 2]) -> &mut u16 {
let (prefix, chunks, suffix) = unsafe {x.align_to_mut::<u16>()};
// you don't need these asserts but know that chunks might not always have an element
assert!(prefix.is_empty());
assert!(suffix.is_empty());
assert_eq!(chunks.len(), 1);
&mut chunks[0]
}
For u16s, this should be fine, although it can cause architecture-dependent behavior due to the endianness of numbers. For other types it'd be very risky to do something like this.

Memory deallocation using slice::from_raw_parts_mut and ptr::drop_in_place in rust

I saw a piece of code online that was dropping allocated memory using a combination of std::slice::from_raw_parts_mut() and std::ptr::drop_in_place(). Below is a piece of code that allocates an array of ten integers and then de-allocates it:
use std::{
alloc::{alloc, Layout},
ptr::NonNull,
};
fn main() {
let len: usize = 10;
let layout: Layout = Layout::array::<i32>(len).unwrap();
let data: NonNull<i32> = unsafe { NonNull::new(alloc(layout) as *mut i32).unwrap() };
unsafe {
std::ptr::drop_in_place(std::slice::from_raw_parts_mut(data.as_ptr(), len));
}
}
The return type of std::slice::from_raw_parts_mut() is a mutable slice &mut [T], but the argument of std::ptr::drop_in_place() is *mut T. It seems to me that the conversion happens automatically. I'm pretty sure I'm missing something here since it shouldn't be allowed. Would someone explain what exactly is happening here?

When you write std::slice::from_raw_parts_mut(data.as_ptr(), len) you are building a value of type &mut [i32].
Then you are passing it to drop_in_place() that is defined more or less as:
fn drop_in_place<T: ?Sized>(to_drop: *mut T)
So you are coercing a &mut [i32] into a *mut T, that is solved in two steps: there is an automatic coercion from reference to pointer, and then T is resolved as [i32] which is the type whose drop is actually called.
(You may think that the automatic coercion from reference to pointer is dangerous and should not be automatic, but it is actually totally safe. What is unsafe is usually what you do with the pointer afterwards. And actually there are a couple of uses of raw pointers that are safe, such as std::ptr::eq or std::ptr::hash).
Slices implement Drop::drop by simply iterating over the elements and calling drop_in_place in each of them. This is a clever way to avoid writing the loop manually.
But note a couple of things about this code:
drop_in_place will call Drop::drop on every element of the slice, but since they are of type i32 it is effectively a no-op. I guess that your original code uses a generic type.
drop_in_place does not free the memory, for that you need a call to std::alloc::dealloc.

How to safely get an immutable byte slice from a `&mut [u32]`?

In a rather low level part of a project of mine, a function receives a mutable slice of primitive data (&mut [u32] in this case). This data should be written to a writer in little endian.
Now, this alone wouldn't be a problem, but all of this has to be fast. I measured my application and identified this as one of the critical paths. In particular, if the endianness doesn't need to be changed (since we're already on a little endian system), there shouldn't be any overhead.
This is my code (Playground):
use std::{io, mem, slice};
fn write_data(mut w: impl io::Write, data: &mut [u32]) -> Result<(), io::Error> {
adjust_endianness(data);
// Is this safe?
let bytes = unsafe {
let len = data.len() * mem::size_of::<u32>();
let ptr = data.as_ptr() as *const u8;
slice::from_raw_parts(ptr, len)
};
w.write_all(bytes)
}
fn adjust_endianness(_: &mut [u32]) {
// implementation omitted
}
adjust_endianness changes the endianness in place (which is fine, since a wrong-endian u32 is garbage, but still a valid u32).
This code works, but the critical question is: Is this safe? In particular, at some point, data and bytes both exist, being one mutable and one immutable slice to the same data. That sounds very bad, right?
On the other hand, I can do this:
let bytes = &data[..];
That way, I also have those two slices. The difference is just that data is now borrowed.
Is my code safe or does it exhibit UB? Why? If it's not safe, how to safely do what I want to do?

In general, creation of slices that violate Rust's safety rules, even briefly, is unsafe. If you cheat the borrow checker and make independent slices borrowing the same data as & and &mut at the same time, it will make Rust specify incorrect aliasing information in LLVM, and this may lead to actually miscompiled code. Miri doesn't flag this case, because you're not using data afterwards, but the exact details of what is unsafe are still being worked out.
To be safe, you should to explain the sharing situation to the borrow checker:
let shared_data = &data[..];
data will be temporarily reborrowed as shared/read-only for the duration shared_data is used. In this case it shouldn't cause any limitations. The data will keep being mutable after exiting this scope.
Then you'll have &[u32], but you need &[u8]. Fortunately, this conversion is safe to do, because both are shared, and u8 has lesser alignment requirement than u32 (if it was the other way, you'd have to use align_to!).
let shared_data = &data[..];
let bytes = unsafe {
let len = shared_data.len() * mem::size_of::<u32>();
let ptr = data.as_ptr() as *const u8;
slice::from_raw_parts(ptr, len)
};

Take slice of certain length known at compile time

In this code:
fn unpack_u32(data: &[u8]) -> u32 {
assert_eq!(data.len(), 4);
let res = data[0] as u32 |
(data[1] as u32) << 8 |
(data[2] as u32) << 16 |
(data[3] as u32) << 24;
res
}
fn main() {
let v = vec![0_u8, 1_u8, 2_u8, 3_u8, 4_u8, 5_u8, 6_u8, 7_u8, 8_u8];
println!("res: {:X}", unpack_u32(&v[1..5]));
}
the function unpack_u32 accepts only slices of length 4. Is there any way to replace the runtime check assert_eq with a compile time check?

Yes, kind of. The first step is easy: change the argument type from &[u8] to [u8; 4]:
fn unpack_u32(data: [u8; 4]) -> u32 { ... }
But transforming a slice (like &v[1..5]) into an object of type [u8; 4] is hard. You can of course create such an array simply by specifying all elements, like so:
unpack_u32([v[1], v[2], v[3], v[4]]);
But this is rather ugly to type and doesn't scale well with array size. So the question is "How to get a slice as an array in Rust?". I used a slightly modified version of Matthieu M.'s answer to said question (playground):
fn unpack_u32(data: [u8; 4]) -> u32 {
// as before without assert
}
use std::convert::AsMut;
fn clone_into_array<A, T>(slice: &[T]) -> A
where A: Default + AsMut<[T]>,
T: Clone
{
assert_eq!(slice.len(), std::mem::size_of::<A>()/std::mem::size_of::<T>());
let mut a = Default::default();
<A as AsMut<[T]>>::as_mut(&mut a).clone_from_slice(slice);
a
}
fn main() {
let v = vec![0_u8, 1, 2, 3, 4, 5, 6, 7, 8];
println!("res: {:X}", unpack_u32(clone_into_array(&v[1..5])));
}
As you can see, there is still an assert and thus the possibility of runtime failure. The Rust compiler isn't able to know that v[1..5] is 4 elements long, because 1..5 is just syntactic sugar for Range which is just a type the compiler knows nothing special about.

I think the answer is no as it is; a slice doesn't have a size (or minimum size) as part of the type, so there's nothing for the compiler to check; and similarly a vector is dynamically sized so there's no way to check at compile time that you can take a slice of the right size.
The only way I can see for the information to be even in principle available at compile time is if the function is applied to a compile-time known array. I think you'd still need to implement a procedural macro to do the check (so nightly Rust only, and it's not easy to do).
If the problem is efficiency rather than compile-time checking, you may be able to adjust your code so that, for example, you do one check for n*4 elements being available before n calls to your function; you could use the unsafe get_unchecked to avoid later redundant bounds checks. Obviously you'd need to be careful to avoid mistakes in the implementation.

I had a similar problem, creating a fixed byte-array on stack corresponding to const length of other byte-array (which may change during development time)
A combination of compiler plugin and macro was the solution:
https://github.com/frehberg/rust-sizedbytes

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Ergonomics issues with fixed size byte arrays in Rust - rust

Related

Unable to use concat on vec<u8> in my function

Is there a way to create a mutable reference for a type that doesn't match underlying storage?

Memory deallocation using slice::from_raw_parts_mut and ptr::drop_in_place in rust

How to safely get an immutable byte slice from a `&mut [u32]`?

Take slice of certain length known at compile time

Categories

Resources