What happens if I call Vec::from_raw_parts with a smaller capacity than the pointer actually has? - rust

I have a vector of u8 that I want to interpret as a vector of u32. It is assumed that the bytes are in the right order. I don't want to allocate new memory and copy bytes after casting. I got the following to work:
use std::mem;
fn reinterpret(mut v: Vec<u8>) -> Option<Vec<u32>> {
let v_len = v.len();
v.shrink_to_fit();
if v_len % 4 != 0 {
None
} else {
let v_cap = v.capacity();
let v_ptr = v.as_mut_ptr();
println!("{:?}|{:?}|{:?}", v_len, v_cap, v_ptr);
let v_reinterpret = unsafe { Vec::from_raw_parts(v_ptr as *mut u32, v_len / 4, v_cap / 4) };
println!("{:?}|{:?}|{:?}",
v_reinterpret.len(),
v_reinterpret.capacity(),
v_reinterpret.as_ptr());
println!("{:?}", v_reinterpret);
println!("{:?}", v); // v is still alive, but is same as rebuilt
mem::forget(v);
Some(v_reinterpret)
}
}
fn main() {
let mut v: Vec<u8> = vec![1, 1, 1, 1, 1, 1, 1, 1];
let test = reinterpret(v);
println!("{:?}", test);
}
However, there's an obvious problem here. From the shrink_to_fit documentation:
It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.
Does this mean that my capacity may still not be a multiple of the size of u32 after calling shrink_to_fit? If in from_raw_parts I set capacity to v_len/4 with v.capacity() not an exact multiple of 4, do I leak those 1-3 bytes, or will they go back into the memory pool because of mem::forget on v?
Is there any other problem I am overlooking here?
I think moving v into reinterpret guarantees that it's not accessible from that point on, so there's only one owner from the mem::forget(v) call onwards.

This is an old question, and it looks like it has a working solution in the comments. I've just written up what exactly goes wrong here, and some solutions that one might create/use in today's Rust.
This is undefined behavior
Vec::from_raw_parts is an unsafe function, and thus you must satisfy its invariants, or you invoke undefined behavior.
Quoting from the documentation for Vec::from_raw_parts:
ptr needs to have been previously allocated via String/Vec (at least, it's highly likely to be incorrect if it wasn't).
T needs to have the same size and alignment as what ptr was allocated with. (T having a less strict alignment is not sufficient, the alignment really needs to be equal to satsify the dealloc requirement that memory must be allocated and deallocated with the same layout.)
length needs to be less than or equal to capacity.
capacity needs to be the capacity that the pointer was allocated with.
So, to answer your question, if capacity is not equal to the capacity of the original vec, then you've broken this invariant. This gives you undefined behavior.
Note that the requirement isn't on size_of::<T>() * capacity either, though, which brings us to the next topic.
Is there any other problem I am overlooking here?
Three things.
First, the function as written is disregarding another requirement of from_raw_parts. Specifically, T must have the same size as alignment as the original T. u32 is four times as big as u8, so this again breaks this requirement. Even if capacity*size remains the same, size isn't, and capacity isn't. This function will never be sound as implemented.
Second, even if all of the above was valid, you've also ignored the alignment. u32 must be aligned to 4-byte boundaries, while a Vec<u8> is only guaranteed to be aligned to a 1-byte boundary.
A comment on the OP mentions:
I think on x86_64, misalignment will have performance penalty
It's worth noting that while this may be true of machine language, it is not true for Rust. The rust reference explicitly states "A value of alignment n must only be stored at an address that is a multiple of n." This is a hard requirement.
Why the exact type requirement?
Vec::from_raw_parts seems like it's pretty strict, and that's for a reason. In Rust, the allocator API operates not only on allocation size, but on a Layout, which is the combination of size, number of things, and alignment of individual elements. In C with memalloc, all the allocator can rely upon is that the size is the same, and some minimum alignment. In Rust, though, it's allowed to rely on the entire Layout, and invoke undefined behavior if not.
So in order to correctly deallocate the memory, Vec needs to know the exact type that it was allocated with. By converting a Vec<u32> into Vec<u8>, it no longer knows this information, and so it can no longer properly deallocate this memory.
Alternative - Transforming slices
Vec::from_raw_parts's strictness comes from the fact that it needs to deallocate the memory. If we create a borrowing slice, &[u32] instead, we no longer need to deal with it! There is no capacity when turning a &[u8] into &[u32], so we should be all good, right?
Well, almost. You still have to deal with alignment. Primitives are generally aligned to their size, so a [u8] is only guaranteed to be aligned to 1-byte boundaries, while [u32] must be aligned to a 4-byte boundary.
If you want to chance it, though, and create a [u32] if possible, there's a function for that - <[T]>::align_to:
pub unsafe fn align_to<U>(&self) -> (&[T], &[U], &[T])
This will trim of any starting and ending misaligned values, and then give you a slice in the middle of your new type. It's unsafe, but the only invariant you need to satisfy is that the elements in the middle slice are valid.
It's sound to reinterpret 4 u8 values as a u32 value, so we're good.
Putting it all together, a sound version of the original function would look like this. This operates on borrowed rather than owned values, but given that reinterpreting an owned Vec is instant-undefined-behavior in any case, I think it's safe to say this is the closest sound function:
use std::mem;
fn reinterpret(v: &[u8]) -> Option<&[u32]> {
let (trimmed_front, u32s, trimmed_back) = unsafe { v.align_to::<u32>() };
if trimmed_front.is_empty() && trimmed_back.is_empty() {
Some(u32s)
} else {
// either alignment % 4 != 0 or len % 4 != 0, so we can't do this op
None
}
}
fn main() {
let mut v: Vec<u8> = vec![1, 1, 1, 1, 1, 1, 1, 1];
let test = reinterpret(&v);
println!("{:?}", test);
}
As a note, this could also be done with std::slice::from_raw_parts rather than align_to. However, that requires manually dealing with the alignment, and all it really gives is more things we need to ensure we're doing right. Well, that and compatibility with older compilers - align_to was introduced in 2018 in Rust 1.30.0, and wouldn't have existed when this question was asked.
Alternative - Copying
If you do need a Vec<u32> for long term data storage, I think the best option is to just allocate new memory. The old memory is allocated for u8s anyways, and wouldn't work.
This can be made fairly simple with some functional programming:
fn reinterpret(v: &[u8]) -> Option<Vec<u32>> {
let v_len = v.len();
if v_len % 4 != 0 {
None
} else {
let result = v
.chunks_exact(4)
.map(|chunk: &[u8]| -> u32 {
let chunk: [u8; 4] = chunk.try_into().unwrap();
let value = u32::from_ne_bytes(chunk);
value
})
.collect();
Some(result)
}
}
First, we use <[T]>::chunks_exact to iterate over chunks of 4 u8s. Next, try_into to convert from &[u8] to [u8; 4]. The &[u8] is guaranteed to be length 4, so this never fails.
We use u32::from_ne_bytes to convert the bytes into a u32 using native endianness. If interacting with a network protocol, or on-disk serialization, then using from_be_bytes or from_le_bytes may be preferable. And finally, we collect to turn our result back into a Vec<u32>.
As a last note, a truly general solution might use both of these techniques. If we change the return type to Cow<'_, [u32]>, we could return aligned, borrowed data if it works, and allocate a new array if it doesn't! Not quite the best of both worlds, but close.

Related

How do I allocate a slice in Rust with a slice known at runtime?

I have seen replies telling you to use Vec::with_capacity(some_size); but that feels weird.
Why do I need to use a Vector to get a slice? I do not want to grow the slice later, I just want to use a plain slice, but the size is known at runtime because it comes from some program input.
My current solution is:
let mutable_zeroed_slice: &mut [u8] = &mut vec![0; size][..];
Is there a better, more idiomatic and non convoluted?
Updated: You can't use the trick before wrapped within a function as you will lose the slice reference upon return.
Update 2: As pointed out in the first reply, Vec::with_capacity(size)[..] will give you an immutable empty slice with the given capacity, which is not very useful.
You don't want the growable property, but you still need the storage to be allocated.
&[u8] refers to some data allocated somewhere; the Vec you try to create here will disappear at the end of the function, then any slice referring to its content will be dangling, and Rust prevents you from doing that.
The idea is to use a boxed-slice: it is boxed in order to allocate the storage, but it is not growable (as vectors are).
An easy way to achieve this is to build a vector, then steal its content.
Note that you seem to make a confusion between capacity and size/length.
fn new_byte_slice(size: usize) -> Box<[u8]> {
vec![0; size].into_boxed_slice()
}
fn main() {
let values = new_byte_slice(10);
println!("{:?}", values);
}
/*
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
*/

Capacity of vectors created with macro

let vec_macro = vec![0, 1, 2, 3, 4];
let mut vec = Vec::new();
for i in 0..5 {
vec.push(i)
}
println!("Capacity of vec with macro: {}", vec_macro.capacity());
println!("Capacity of vec push: {}", vec.capacity());
// Result:
// Capacity of vec with macro: 5
// Capacity of vec push: 8
I created 2 vectors, first one with macro vec! and the second one with Vec::new(), then I push item from 0 to 4 into them. The expected result is the capacity of these two vectors are the same, but it's not. Is this the bug in the implementation of vec!?
The macro knows the number of elements, so it creates the vector with the optimal capacity (see also with_capacity()).
Your loop invokes push() several times, but this operation does not know by itself when we will stop pushing elements.
So the strategy behind this is to amortize the cost of dynamic memory reallocation with an exponential capacity (0, 4, 8, 16, 32... for example on my computer).
When you push very few elements, the capacity stays low (we do not waste too much unused allocated memory), but if you push many elements, the reallocations will grow by bigger and bigger steps in order to happen not too often.
At the moment, we see that it simply doubles the capacity (whatever it is) when it is exhausted.
If we start from zero then just push, it seems to choose 4 as initial capacity then doubles it as needed, but if we start with 5 it will still continue to double the capacity (10, 20...).
However, what we see now is not guaranteed and might change at any time in future releases.
vec![a, b, ...] is not the same as a repeated push. If you look at the source code for the macro, it looks like this:
($($x:expr),+ $(,)?) => (
$crate::__rust_force_expr!(<[_]>::into_vec(box [$($x),+]))
);
The important part is the <[_]>::into_vec(box [$($x),+]):
box [$($x),+] is special syntax to create a "boxed slice" directly on the heap, as opposed to creating it on the stack and then moving it to the heap
<[_]>::into_vec calls slice::into_vec
If we look at the source code of into_vec:
pub fn into_vec<T, A: Allocator>(b: Box<[T], A>) -> Vec<T, A> {
unsafe {
let len = b.len();
let (b, alloc) = Box::into_raw_with_allocator(b);
Vec::from_raw_parts_in(b as *mut T, len, len, alloc)
}
}
And the signature of Vec::from_raw_parts_in:
pub unsafe fn from_raw_parts_in(
ptr: *mut T,
length: usize,
capacity: usize,
alloc: A
) -> Self;
You can see that the slice length is passed as both the Vec length and capacity.
The Vec documentation specifies this behavior of the macro:
vec![x; n], vec![a, b, c, d], and
[Vec::with_capacity(n)][Vec::with_capacity], will all produce a Vec
with exactly the requested capacity. If [len] == [capacity],
(as is the case for the [vec!] macro), then a Vec<T> can be converted to
and from a [Box<[T]>][owned slice] without reallocating or moving the elements.
As prog-fh said, Vec::push will exponentially reallocate when necessary, and will not end on a capacity of 5 unless you otherwise set the capacity manually via with_capacity or otherwise. It should be noted that this exponential growth is an implementation detail, not guaranteed by the Vec documentation.

How to safely get an immutable byte slice from a `&mut [u32]`?

In a rather low level part of a project of mine, a function receives a mutable slice of primitive data (&mut [u32] in this case). This data should be written to a writer in little endian.
Now, this alone wouldn't be a problem, but all of this has to be fast. I measured my application and identified this as one of the critical paths. In particular, if the endianness doesn't need to be changed (since we're already on a little endian system), there shouldn't be any overhead.
This is my code (Playground):
use std::{io, mem, slice};
fn write_data(mut w: impl io::Write, data: &mut [u32]) -> Result<(), io::Error> {
adjust_endianness(data);
// Is this safe?
let bytes = unsafe {
let len = data.len() * mem::size_of::<u32>();
let ptr = data.as_ptr() as *const u8;
slice::from_raw_parts(ptr, len)
};
w.write_all(bytes)
}
fn adjust_endianness(_: &mut [u32]) {
// implementation omitted
}
adjust_endianness changes the endianness in place (which is fine, since a wrong-endian u32 is garbage, but still a valid u32).
This code works, but the critical question is: Is this safe? In particular, at some point, data and bytes both exist, being one mutable and one immutable slice to the same data. That sounds very bad, right?
On the other hand, I can do this:
let bytes = &data[..];
That way, I also have those two slices. The difference is just that data is now borrowed.
Is my code safe or does it exhibit UB? Why? If it's not safe, how to safely do what I want to do?
In general, creation of slices that violate Rust's safety rules, even briefly, is unsafe. If you cheat the borrow checker and make independent slices borrowing the same data as & and &mut at the same time, it will make Rust specify incorrect aliasing information in LLVM, and this may lead to actually miscompiled code. Miri doesn't flag this case, because you're not using data afterwards, but the exact details of what is unsafe are still being worked out.
To be safe, you should to explain the sharing situation to the borrow checker:
let shared_data = &data[..];
data will be temporarily reborrowed as shared/read-only for the duration shared_data is used. In this case it shouldn't cause any limitations. The data will keep being mutable after exiting this scope.
Then you'll have &[u32], but you need &[u8]. Fortunately, this conversion is safe to do, because both are shared, and u8 has lesser alignment requirement than u32 (if it was the other way, you'd have to use align_to!).
let shared_data = &data[..];
let bytes = unsafe {
let len = shared_data.len() * mem::size_of::<u32>();
let ptr = data.as_ptr() as *const u8;
slice::from_raw_parts(ptr, len)
};

How to safely reinterpret Vec<f64> as Vec<num_complex::Complex<f64>> with half the size?

I have complex number data filled into a Vec<f64> by an external C library (prefer not to change) in the form [i_0_real, i_0_imag, i_1_real, i_1_imag, ...] and it appears that this Vec<f64> has the same memory layout as a Vec<num_complex::Complex<f64>> of half the length would be, given that num_complex::Complex<f64>'s data structure is memory-layout compatible with [f64; 2] as documented here. I'd like to use it as such without needing a re-allocation of a potentially large buffer.
I'm assuming that it's valid to use from_raw_parts() in std::vec::Vec to fake a new Vec that takes ownership of the old Vec's memory (by forgetting the old Vec) and use size / 2 and capacity / 2, but that requires unsafe code. Is there a "safe" way to do this kind of data re-interpretation?
The Vec is allocated in Rust as a Vec<f64> and is populated by a C function using .as_mut_ptr() that fills in the Vec<f64>.
My current compiling unsafe implementation:
extern crate num_complex;
pub fn convert_to_complex_unsafe(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
let new_vec = unsafe {
Vec::from_raw_parts(
buffer.as_mut_ptr() as *mut num_complex::Complex<f64>,
buffer.len() / 2,
buffer.capacity() / 2,
)
};
std::mem::forget(buffer);
return new_vec;
}
fn main() {
println!(
"Converted vector: {:?}",
convert_to_complex_unsafe(vec![3.0, 4.0, 5.0, 6.0])
);
}
Is there a "safe" way to do this kind of data re-interpretation?
No. At the very least, this is because the information you need to know is not expressed in the Rust type system but is expressed via prose (a.k.a. the docs):
Complex<T> is memory layout compatible with an array [T; 2].
— Complex docs
If a Vec has allocated memory, then [...] its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice),
— Vec docs
Arrays coerce to slices ([T])
— Array docs
Since a Complex is memory-compatible with an array, an array's data is memory-compatible with a slice, and a Vec's data is memory-compatible with a slice, this transformation should be safe, even though the compiler cannot tell this.
This information should be attached (via a comment) to your unsafe block.
I would make some small tweaks to your function:
Having two Vecs at the same time pointing to the same data makes me very nervous. This can be trivially avoided by introducing some variables and forgetting one before creating the other.
Remove the return keyword to be more idiomatic
Add some asserts that the starting length of the data is a multiple of two.
As rodrigo points out, the capacity could easily be an odd number. To attempt to avoid this, we call shrink_to_fit. This has the downside that the Vec may need to reallocate and copy the memory, depending on the implementation.
Expand the unsafe block to cover all of the related code that is required to ensure that the safety invariants are upheld.
pub fn convert_to_complex(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
buffer.shrink_to_fit();
let ptr = buffer.as_mut_ptr() as *mut num_complex::Complex<f64>;
let len = buffer.len();
let cap = buffer.capacity();
assert!(len % 2 == 0);
assert!(cap % 2 == 0);
std::mem::forget(buffer);
Vec::from_raw_parts(ptr, len / 2, cap / 2)
}
}
To avoid all the worrying about the capacity, you could just convert a slice into the Vec. This also doesn't have any extra memory allocation. It's simpler because we can "lose" any odd trailing values because the Vec still maintains them.
pub fn convert_to_complex(buffer: &[f64]) -> &[num_complex::Complex<f64>] {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
let ptr = buffer.as_ptr() as *mut num_complex::Complex<f64>;
let len = buffer.len();
assert!(len % 2 == 0);
std::slice::from_raw_parts(ptr, len / 2)
}
}

How to convert Vec<Rgb<u8>> to Vec<u8>

Using the Piston image crate, I can write an image by feeding it a Vec<u8>, but my actual data is Vec<Rgb<u8>> (because that is a lot easier to deal with, and I want to grow it dynamically).
How can I convert Vec<Rgb<u8>> to Vec<u8>? Rgb<u8> is really [u8; 3]. Does this have to be an unsafe conversion?
The answer depends on whether you are fine with copying the data. If copying is not an issue for you, you can do something like this:
let img: Vec<Rgb<u8>> = ...;
let buf: Vec<u8> = img.iter().flat_map(|rgb| rgb.data.iter()).cloned().collect();
If you want to perform the conversion without copying, though, we first need to make sure that your source and destination types actually have the same memory layout. Rust makes very few guarantees about the memory layout of structs. It currently does not even guarantee that a struct with a single member has the same memory layout as the member itself.
In this particular case, the Rust memory layout is not relevant though, since Rgb is defined as
#[repr(C)]
pub struct Rgb<T: Primitive> {
pub data: [T; 3],
}
The #[repr(C)] attribute specifies that the memory layout of the struct should be the same as an equivalent C struct. The C memory layout is not fully specified in the C standard, but according to the unsafe code guidelines, there are some rules that hold for "most" platforms:
Field order is preserved.
The first field begins at offset 0.
Assuming the struct is not packed, each field's offset is aligned to the ABI-mandated alignment for that field's type, possibly creating unused padding bits.
The total size of the struct is rounded up to its overall alignment.
As pointed out in the comments, the C standard theoretically allows additional padding at the end of the struct. However, the Piston image library itself makes the assumption that a slice of channel data has the same memory layout as the Rgb struct, so if you are on a platform where this assumption does not hold, all bets are off anyway (and I couldnt' find any evidence that such a platform exists).
Rust does guarantee that arrays, slices and vectors are densely packed, and that structs and arrays have an alignment equal to the maximum alignment of their elements. Together with the assumption that the layout of Rgb is as specified by the rules I quotes above, this guarantees that Rgb<u8> is indeed laid out as three consecutive bytes in memory, and that Vec<Rgb<u8>> is indeed a consecutive, densely packed buffer of RGB values, so our conversion is safe. We still need to use unsafe code to write it:
let p = img.as_mut_ptr();
let len = img.len() * mem::size_of::<Rgb<u8>>();
let cap = img.capacity() * mem::size_of::<Rgb<u8>>();
mem::forget(img);
let buf: Vec<u8> = unsafe { Vec::from_raw_parts(p as *mut u8, len, cap) };
If you want to protect against the case that there is additional padding at the end of Rgb, you can check whether size_of::<Rgb<u8>>() is indeed 3. If it is, you can use the unsafe non-copying version, otherwise you have to use the first version above.
You choose the Vec<Rgb<u8>> storage format because it's easier to deal with and you want it to grow dynamically. But as you noticed, there's no guarantee of compatibility of its storage with a Vec<u8>, and no safe conversion.
Why not take the problem the other way and build a convenient facade for a Vec<u8> ?
type Rgb = [u8; 3];
#[derive(Debug)]
struct Img(Vec<u8>);
impl Img {
fn new() -> Img {
Img(Vec::new())
}
fn push(&mut self, rgb: &Rgb) {
self.0.push(rgb[0]);
self.0.push(rgb[1]);
self.0.push(rgb[2]);
}
// other convenient methods
}
fn main() {
let mut img = Img::new();
let rgb : Rgb = [1, 2, 3];
img.push(&rgb);
img.push(&rgb);
println!("{:?}", img);
}

Resources