let vector: Vec<u8> = Vec::new();
Can the vector in the code above cause a stackoverflow if the vector grows too big?
let vector: Vec<Box<u8>> = Vec::new();
How bout this one? Since its elements are on the heap.
let vector: Box<Vec<u8>> = Box::new(Vec::new());
I'm assuming that in the code above no stackoverflow should be possible, am i correct?
No the actual data is on heap. So there will not be stack overflow.
What is on stack is capacity, length and the pointer to the actual data on heap. If regrowth is required then it is done on the heap. If it is moved (not cloned) then what is copied is just the length, capacity and pointer to data (bitwise shallow copy).
Not the actual implementation but if you have to implement Vector then you will start with:
pub struct Vec<T> {
ptr: Unique<T>,
cap: usize,
len: usize,
}
You see the ptr is actually pointing to heap location where the data is. The vector on stack will consists of just few fields like the 3 mentioned above.
You cannot grow an object on stack, as you push objects on stack frame if any object is allowed to grow it will run over other objects. On heap for growth, if contiguous memory is not available, then entire data is moved to another place with newer capacity; if contiguous memory block is available, then growth in capacity is instant.
Related
Forgive me if there is an obvious answer to the question I'm asking, but I just don't quite understand it.
The Dynamically Sized Types and the Sized Trait section in chapter 19.3 Advanced Types of the 《The Rust Programming Language》 mentions:
Rust needs to know how much memory to allocate for any value of a particular type, and all values of a type must use the same amount of memory. If Rust allowed us to write this code, these two str values would need to take up the same amount of space. But they have different lengths: s1 needs 12 bytes of storage and s2 needs 15. This is why it’s not possible to create a variable holding a dynamically sized type.
When it says "and all values of a type must use the same amount of memory", it is meant to refer to dynamically sized types, not types such as vectors or arrays, right? v1 and v2 are also unlikely to occupy the same amount of memory.
let v1 = vec![1, 2, 3];
let v2 = vec![1, 2, 3, 4, 5, 6];
It's correct and considers vectors as well. A Vec<T> is roughly just a pointer to a position on the heap, a capacity, and a length. It could be defined, more or less, as
pub struct Vec<T>(T*, usize, usize);
And every value of that structure clearly has the same size. When Rust says that every value of a type has to have the same size, it only refers to the size of the structure itself, not to the recursive size of all things it points to. Box<T> has a constant size, regardless of T, which is why Box can hold even things that are dynamically sized, such as trait objects. Likewise, String is basically just a pointer.
Likewise, if we define
pub enum MyEnum {
A(i32),
B(i32, i32),
}
Then MyEnum::A is no smaller than MyEnum::B, for similar reasons, despite the latter having more data than the former.
Every type that can be stored and accessed without the indirection of a reference or Box must have the Sized trait implemented. This means that every instance of the type will have the same size. A str is a DST, as the data it holds can be of a variable length, and thus, you can only access strs as references, or from String, which holds the str data on the heap, through a pointer.
Every Vec also takes the same space, which is 24 bytes on a 64-bit machine.
For example:
let vec = vec![1, 2, 3, 4];
println!("{}", std::mem::size_of_val(&vec)); // Prints '24'.
I want to dynamically allocate an array at runtime, so I use Vec to implement it. And I want to use a raw pointer to point the array address, like this:
fn alloc(&mut page: Page) {
page.data = vec![0; page.page_size].as_mut_ptr();//it's a Vec<u8>
}
I want to know if the pointer directly points to the vec buffer, and the length is exactly the page.page_size?
The effect I want to have is just like the following C code:
void alloc(Page* page) {
page->data = (u8*)malloc(page->page_size);
}
As far as I know the only guarantee you get from vec in this case, when it is created by new() or vec![] macro is it will have reserved at least page_size bytes of data. To reserve exact amount of bytes create it with Vec::with_capacity().
When working with raw pointers it is the responsibility of the programmer to ensure the data lives long enough. In this example the vec will only live within alloc so when your function return the address it will free the buffer at the same time.
I have complex number data filled into a Vec<f64> by an external C library (prefer not to change) in the form [i_0_real, i_0_imag, i_1_real, i_1_imag, ...] and it appears that this Vec<f64> has the same memory layout as a Vec<num_complex::Complex<f64>> of half the length would be, given that num_complex::Complex<f64>'s data structure is memory-layout compatible with [f64; 2] as documented here. I'd like to use it as such without needing a re-allocation of a potentially large buffer.
I'm assuming that it's valid to use from_raw_parts() in std::vec::Vec to fake a new Vec that takes ownership of the old Vec's memory (by forgetting the old Vec) and use size / 2 and capacity / 2, but that requires unsafe code. Is there a "safe" way to do this kind of data re-interpretation?
The Vec is allocated in Rust as a Vec<f64> and is populated by a C function using .as_mut_ptr() that fills in the Vec<f64>.
My current compiling unsafe implementation:
extern crate num_complex;
pub fn convert_to_complex_unsafe(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
let new_vec = unsafe {
Vec::from_raw_parts(
buffer.as_mut_ptr() as *mut num_complex::Complex<f64>,
buffer.len() / 2,
buffer.capacity() / 2,
)
};
std::mem::forget(buffer);
return new_vec;
}
fn main() {
println!(
"Converted vector: {:?}",
convert_to_complex_unsafe(vec![3.0, 4.0, 5.0, 6.0])
);
}
Is there a "safe" way to do this kind of data re-interpretation?
No. At the very least, this is because the information you need to know is not expressed in the Rust type system but is expressed via prose (a.k.a. the docs):
Complex<T> is memory layout compatible with an array [T; 2].
— Complex docs
If a Vec has allocated memory, then [...] its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice),
— Vec docs
Arrays coerce to slices ([T])
— Array docs
Since a Complex is memory-compatible with an array, an array's data is memory-compatible with a slice, and a Vec's data is memory-compatible with a slice, this transformation should be safe, even though the compiler cannot tell this.
This information should be attached (via a comment) to your unsafe block.
I would make some small tweaks to your function:
Having two Vecs at the same time pointing to the same data makes me very nervous. This can be trivially avoided by introducing some variables and forgetting one before creating the other.
Remove the return keyword to be more idiomatic
Add some asserts that the starting length of the data is a multiple of two.
As rodrigo points out, the capacity could easily be an odd number. To attempt to avoid this, we call shrink_to_fit. This has the downside that the Vec may need to reallocate and copy the memory, depending on the implementation.
Expand the unsafe block to cover all of the related code that is required to ensure that the safety invariants are upheld.
pub fn convert_to_complex(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
buffer.shrink_to_fit();
let ptr = buffer.as_mut_ptr() as *mut num_complex::Complex<f64>;
let len = buffer.len();
let cap = buffer.capacity();
assert!(len % 2 == 0);
assert!(cap % 2 == 0);
std::mem::forget(buffer);
Vec::from_raw_parts(ptr, len / 2, cap / 2)
}
}
To avoid all the worrying about the capacity, you could just convert a slice into the Vec. This also doesn't have any extra memory allocation. It's simpler because we can "lose" any odd trailing values because the Vec still maintains them.
pub fn convert_to_complex(buffer: &[f64]) -> &[num_complex::Complex<f64>] {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
let ptr = buffer.as_ptr() as *mut num_complex::Complex<f64>;
let len = buffer.len();
assert!(len % 2 == 0);
std::slice::from_raw_parts(ptr, len / 2)
}
}
I'm trying to write a little buffer-thing for parsing so I can pull records off the front of as I parse them out, ideally without making any copies and just transferring ownership of chunks of the front of the buffer off as I run. Here's my implementation:
struct BufferThing {
buf: Vec<u8>,
}
impl BufferThing {
fn extract(&mut self, size: usize) -> Vec<u8> {
assert!(size <= self.buf.len());
let remaining: usize = self.buf.len() - size;
let ptr: *mut u8 = self.buf.as_mut_ptr();
unsafe {
self.buf = Vec::from_raw_parts(ptr.offset(size as isize), remaining, remaining);
Vec::from_raw_parts(ptr, size, size)
}
}
}
This compiles, but panics with a signal: 11, SIGSEGV: invalid memory reference as it starts running. This is mostly the same code as the example in the Nomicon, but I'm trying to do it on Vec's and I'm trying to split a field instead of the object itself.
Is it possible to do this without copying out one of the Vecs? And is there some section of the Nomicon or other documentation that explains why I'm blowing everything up in the unsafe block?
Unfortunately, that's not how memory allocators work. It might have been possible in the past, when memory was at a premium, but today's allocators are geared for speed rather than memory preservation.
A common implementation of memory allocators is to use slabs. Basically, it's:
struct Allocator {
less_than_32_bytes: List<[u8; 32]>,
less_than_64_bytes: List<[u8; 64]>,
less_than_128_bytes: List<[u8; 128]>,
less_than_256_bytes: List<[u8; 256]>,
less_than_512_bytes: List<[u8; 512]>,
...
}
When you request 96 bytes, it takes an element from less_than_128_bytes.
When you free that element, it frees all of it, not just the first N bytes, and the whole block is now re-usable. Any pointer inside the block is now dangling and should NOT be dereferenced.
Furthermore, trying to free a pointer in the middle of a block will only confuse the allocator: it won't find it, because the contract is that you address blocks by their first byte.
You violated the contract using unsafe code, BOOM.
The solution I propose is simple:
use a single Vec<u8> containing the whole buffer to parse
use slices into this Vec for parsing
Rust will check the lifetimes, so your slices cannot outlive the buffer, and slicing a slice further (s[..offset], s[offset..]) does not allocate.
If you don't mind one allocation, there's Vec::split_off which allocates a new Vec big enough for the split part.
I have a vector of u8 that I want to interpret as a vector of u32. It is assumed that the bytes are in the right order. I don't want to allocate new memory and copy bytes after casting. I got the following to work:
use std::mem;
fn reinterpret(mut v: Vec<u8>) -> Option<Vec<u32>> {
let v_len = v.len();
v.shrink_to_fit();
if v_len % 4 != 0 {
None
} else {
let v_cap = v.capacity();
let v_ptr = v.as_mut_ptr();
println!("{:?}|{:?}|{:?}", v_len, v_cap, v_ptr);
let v_reinterpret = unsafe { Vec::from_raw_parts(v_ptr as *mut u32, v_len / 4, v_cap / 4) };
println!("{:?}|{:?}|{:?}",
v_reinterpret.len(),
v_reinterpret.capacity(),
v_reinterpret.as_ptr());
println!("{:?}", v_reinterpret);
println!("{:?}", v); // v is still alive, but is same as rebuilt
mem::forget(v);
Some(v_reinterpret)
}
}
fn main() {
let mut v: Vec<u8> = vec![1, 1, 1, 1, 1, 1, 1, 1];
let test = reinterpret(v);
println!("{:?}", test);
}
However, there's an obvious problem here. From the shrink_to_fit documentation:
It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.
Does this mean that my capacity may still not be a multiple of the size of u32 after calling shrink_to_fit? If in from_raw_parts I set capacity to v_len/4 with v.capacity() not an exact multiple of 4, do I leak those 1-3 bytes, or will they go back into the memory pool because of mem::forget on v?
Is there any other problem I am overlooking here?
I think moving v into reinterpret guarantees that it's not accessible from that point on, so there's only one owner from the mem::forget(v) call onwards.
This is an old question, and it looks like it has a working solution in the comments. I've just written up what exactly goes wrong here, and some solutions that one might create/use in today's Rust.
This is undefined behavior
Vec::from_raw_parts is an unsafe function, and thus you must satisfy its invariants, or you invoke undefined behavior.
Quoting from the documentation for Vec::from_raw_parts:
ptr needs to have been previously allocated via String/Vec (at least, it's highly likely to be incorrect if it wasn't).
T needs to have the same size and alignment as what ptr was allocated with. (T having a less strict alignment is not sufficient, the alignment really needs to be equal to satsify the dealloc requirement that memory must be allocated and deallocated with the same layout.)
length needs to be less than or equal to capacity.
capacity needs to be the capacity that the pointer was allocated with.
So, to answer your question, if capacity is not equal to the capacity of the original vec, then you've broken this invariant. This gives you undefined behavior.
Note that the requirement isn't on size_of::<T>() * capacity either, though, which brings us to the next topic.
Is there any other problem I am overlooking here?
Three things.
First, the function as written is disregarding another requirement of from_raw_parts. Specifically, T must have the same size as alignment as the original T. u32 is four times as big as u8, so this again breaks this requirement. Even if capacity*size remains the same, size isn't, and capacity isn't. This function will never be sound as implemented.
Second, even if all of the above was valid, you've also ignored the alignment. u32 must be aligned to 4-byte boundaries, while a Vec<u8> is only guaranteed to be aligned to a 1-byte boundary.
A comment on the OP mentions:
I think on x86_64, misalignment will have performance penalty
It's worth noting that while this may be true of machine language, it is not true for Rust. The rust reference explicitly states "A value of alignment n must only be stored at an address that is a multiple of n." This is a hard requirement.
Why the exact type requirement?
Vec::from_raw_parts seems like it's pretty strict, and that's for a reason. In Rust, the allocator API operates not only on allocation size, but on a Layout, which is the combination of size, number of things, and alignment of individual elements. In C with memalloc, all the allocator can rely upon is that the size is the same, and some minimum alignment. In Rust, though, it's allowed to rely on the entire Layout, and invoke undefined behavior if not.
So in order to correctly deallocate the memory, Vec needs to know the exact type that it was allocated with. By converting a Vec<u32> into Vec<u8>, it no longer knows this information, and so it can no longer properly deallocate this memory.
Alternative - Transforming slices
Vec::from_raw_parts's strictness comes from the fact that it needs to deallocate the memory. If we create a borrowing slice, &[u32] instead, we no longer need to deal with it! There is no capacity when turning a &[u8] into &[u32], so we should be all good, right?
Well, almost. You still have to deal with alignment. Primitives are generally aligned to their size, so a [u8] is only guaranteed to be aligned to 1-byte boundaries, while [u32] must be aligned to a 4-byte boundary.
If you want to chance it, though, and create a [u32] if possible, there's a function for that - <[T]>::align_to:
pub unsafe fn align_to<U>(&self) -> (&[T], &[U], &[T])
This will trim of any starting and ending misaligned values, and then give you a slice in the middle of your new type. It's unsafe, but the only invariant you need to satisfy is that the elements in the middle slice are valid.
It's sound to reinterpret 4 u8 values as a u32 value, so we're good.
Putting it all together, a sound version of the original function would look like this. This operates on borrowed rather than owned values, but given that reinterpreting an owned Vec is instant-undefined-behavior in any case, I think it's safe to say this is the closest sound function:
use std::mem;
fn reinterpret(v: &[u8]) -> Option<&[u32]> {
let (trimmed_front, u32s, trimmed_back) = unsafe { v.align_to::<u32>() };
if trimmed_front.is_empty() && trimmed_back.is_empty() {
Some(u32s)
} else {
// either alignment % 4 != 0 or len % 4 != 0, so we can't do this op
None
}
}
fn main() {
let mut v: Vec<u8> = vec![1, 1, 1, 1, 1, 1, 1, 1];
let test = reinterpret(&v);
println!("{:?}", test);
}
As a note, this could also be done with std::slice::from_raw_parts rather than align_to. However, that requires manually dealing with the alignment, and all it really gives is more things we need to ensure we're doing right. Well, that and compatibility with older compilers - align_to was introduced in 2018 in Rust 1.30.0, and wouldn't have existed when this question was asked.
Alternative - Copying
If you do need a Vec<u32> for long term data storage, I think the best option is to just allocate new memory. The old memory is allocated for u8s anyways, and wouldn't work.
This can be made fairly simple with some functional programming:
fn reinterpret(v: &[u8]) -> Option<Vec<u32>> {
let v_len = v.len();
if v_len % 4 != 0 {
None
} else {
let result = v
.chunks_exact(4)
.map(|chunk: &[u8]| -> u32 {
let chunk: [u8; 4] = chunk.try_into().unwrap();
let value = u32::from_ne_bytes(chunk);
value
})
.collect();
Some(result)
}
}
First, we use <[T]>::chunks_exact to iterate over chunks of 4 u8s. Next, try_into to convert from &[u8] to [u8; 4]. The &[u8] is guaranteed to be length 4, so this never fails.
We use u32::from_ne_bytes to convert the bytes into a u32 using native endianness. If interacting with a network protocol, or on-disk serialization, then using from_be_bytes or from_le_bytes may be preferable. And finally, we collect to turn our result back into a Vec<u32>.
As a last note, a truly general solution might use both of these techniques. If we change the return type to Cow<'_, [u32]>, we could return aligned, borrowed data if it works, and allocate a new array if it doesn't! Not quite the best of both worlds, but close.