Rust pointer being freed was not allocated error - rust

Here's the situation: I want to do some data conversion on a string. For convenience I converted it to a pointer partway through, and now I want to return part of the string, but I'm stuck with this error:
foo(74363,0x10fd2fdc0) malloc: *** error for object 0x7ff65ff000d1: pointer being freed was not allocated
foo(74363,0x10fd2fdc0) malloc: *** set a breakpoint in malloc_error_break to debug
When I try to debug the program, I get the error message shown above.
Here's my sample code:
fn main() {
    unsafe {
        let mut s = String::from_utf8_unchecked(vec![97, 98]);
        let p = s.as_ptr();
        let k = p.add(1);
        String::from_raw_parts(k as *mut u8, 1, 1);
    }
}

You should never use an unsafe function without understanding its documentation, 100%.
So, what does the documentation of String::from_raw_parts say?
Safety
This is highly unsafe, due to the number of invariants that aren't
checked:
The memory at ptr needs to have been previously allocated by the same allocator the standard library uses, with a required alignment of exactly 1.
length needs to be less than or equal to capacity.
capacity needs to be the correct value.
Violating these may cause problems like corrupting the allocator's internal data structures.
The ownership of ptr is effectively transferred to the String which may then deallocate, reallocate or change the contents of memory pointed to by the pointer at will. Ensure that nothing else uses the pointer after calling this function.
There are two things that stand out here:
The memory at ptr needs to have been previously allocated.
capacity needs to be the correct value.
And those are related to how allocations work in Rust. Essentially, deallocation only expects the very pointer value (and type) that allocation returned.
Shenanigans such as trying to deallocate a pointer pointing in the middle of an allocation, with a different alignment, or with a different size, are Not Allowed.
Furthermore, you also missed:
Ensure that nothing else uses the pointer after calling this function.
Here, the original instance of String still owns the allocation, and you are trying to deallocate one byte out of the middle of it. It cannot ever go well.
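If the goal is only to return part of the string, a minimal sketch that avoids raw pointers could look like the following. The helper name tail_from is hypothetical, and the sketch assumes that copying the tail into a new allocation is acceptable.

```rust
/// Hypothetical helper: returns an owned copy of `s` starting at byte `offset`.
/// Slicing panics if `offset` is out of bounds or not on a UTF-8 char boundary.
fn tail_from(s: &str, offset: usize) -> String {
    // The slice borrows from `s`; `to_owned` gives the result its own
    // allocation, so the original String keeps sole ownership of its buffer.
    s[offset..].to_owned()
}
```

Because the new String has its own allocation, both it and the original can be dropped independently without confusing the allocator.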

Related

How to shrink a Vec or String from an offset without reallocating it?

Multiple structures in rust have shrink_to or shrink_to_fit methods, such as Vec and String. But apparently there's nothing like shrink_from_to.
Why would I want that?
Assume I have an XY gigabyte string or vector in memory and know the exact start and end positions of the part I am interested in (which occupies only Z GB from start to end, somewhere in the middle). I could call truncate and then shrink_to_fit, effectively freeing the memory after end.
However, I still have gigabytes of memory occupied by [0..start], which is of no relevance for my further processing.
Question
Is there any way to free this memory too without reallocating and copying the relevant parts there?
Note that shrink_to_fit does reallocate by copying into a smaller buffer and freeing the old buffer. Your best bet is probably just converting the slice you care about into an owned Vec and then dropping the original Vec.
fn main() {
    let v1 = (0..1000).collect::<Vec<_>>(); // needs to be dropped
    println!("{}", v1.capacity()); // 1000
    let v2 = v1[100..150].to_owned(); // don't drop this!
    println!("{}", v2.capacity()); // 50
    drop(v1);
}
Is there any way to free this memory too without reallocating and copying the relevant parts there?
Move the segment you want to keep to the start of the collection (e.g. replace_range, drain, copy_within, rotate, ...), then truncate, then shrink.
APIs like realloc and mremap work in terms of "memory blocks" (aka allocations returned by malloc/mmap), they don't work in terms of random pointers. So a hypothetical shrink_from_to would just be doing that under the cover, since you can't really resize allocations from both ends.
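The "move, truncate, shrink" sequence above can be sketched as follows. The helper name keep_range is hypothetical. Note that this still moves the kept elements within the buffer, and shrink_to_fit may itself reallocate, so it is a sketch of the approach rather than a zero-copy solution.

```rust
/// Hypothetical helper: keeps only `v[start..end]` and releases spare capacity.
/// Panics if `start` is greater than the length left after truncation.
fn keep_range<T>(v: &mut Vec<T>, start: usize, end: usize) {
    // Drop everything after the segment we care about.
    v.truncate(end);
    // Drop the prefix; the remaining elements are shifted to the front.
    v.drain(..start);
    // Ask the allocator to shrink the buffer to (roughly) the new length.
    v.shrink_to_fit();
}
```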

Will data be copied byte by byte when moving a variable?

I'm new to Rust, and I am wondering what exactly happens when moving a variable.
struct Point {
    x: i32,
    y: i32,
}
fn main() {
    let p = Point { x: 1, y: 1 };
    let q = p;
}
When let q = p; executes, will the data (8 bytes in size) be copied from one memory address to another? Since p is moved here and cannot be used anymore, I think it would be fine to make q's underlying memory address equal to p's. In other words, I think it is OK for nothing to be copied in the machine code.
So my question is: will data be copied byte by byte when moving a variable? If it will, why?
[W]ill data be copied byte by byte when moving a variable?
In general, yes. To move a value, Rust simply performs a bitwise copy. If the value is not Copy, the source won't be used anymore after the move. If the value is Copy, both the source and the destination can be used.
However, there are many cases where the compiler backend can eliminate the copy by proving that the code behaves identically without it. This optimization happens completely in LLVM. In your example, the LLVM IR still contains the instructions to move the data, but the generated code does not contain the move even in debug mode.
If it will, why?
There are many reasons why the compiler can be unable to use the same memory for source and destination. In your example, with two variables in the same stack frame, it's easy to see that the move is not needed, but the code is a bit pointless anyway (though sometimes people do move values inside a function to make a variable immutable).
Here are just a few illustrations why the compiler may be unable to reuse the source memory for the destination:
The source value may be on the stack, while the destination is on the heap or vice versa. The statement let b = Box::new(3); will move the value 3 from the stack to the heap; let i = *b; will move it from the heap back to the stack. It's still possible that the compiler can eliminate these moves, e.g. by writing the constant 3 to the heap immediately, without writing it to the stack first.
Source and destination may be on different stack frames, when moving values across functions – e.g. when passing a value into a function, or when returning a value from a function.
Source and destination values may be stored in struct fields, so they need to have the right offset inside the struct.
These are just a few examples. The takeaway is that in general, a move may result in a bitwise copy. Keep in mind that a bitwise copy is very cheap, though, and that the optimizer usually does a good job, so you should only worry about this if you actually have a proven performance bottleneck.
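As a small illustration of the stack-to-heap case listed above, here is a sketch in which a value is moved into a Box and back out again. The function name move_through_heap is hypothetical, and the derives on Point are added only so the result can be compared. Conceptually each step is a bitwise copy of the 8 bytes, although the optimizer is free to elide some of them.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Hypothetical helper: moves `p` onto the heap and back to the stack.
fn move_through_heap(p: Point) -> Point {
    // Move from the caller's stack frame into a fresh heap allocation.
    let boxed = Box::new(p);
    // Move out of the box; the heap allocation is freed afterwards
    // and the value is returned to the caller by value.
    *boxed
}
```

After the call, the original binding passed as p can no longer be used, but the returned value carries the same bits.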

Does std::ptr::write transfer the "uninitialized-ness" of the bytes it writes?

I'm working on a library that helps transact types that fit in a pointer-sized int over FFI boundaries. Suppose I have a struct like this:
use std::mem::{size_of, align_of};
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0]
}
assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
This struct has 1 data byte and 7 padding bytes. I want to pack an instance of this struct into a usize and then unpack it on the other side of an FFI boundary. Because this library is generic, I'm using MaybeUninit and ptr::write:
use std::ptr;
use std::mem::MaybeUninit;
let data = PaddingDemo { data: 12, force_pad: [] };
// In order to ensure all the bytes are initialized,
// zero-initialize the buffer
let mut packed: MaybeUninit<usize> = MaybeUninit::zeroed();
let ptr = packed.as_mut_ptr() as *mut PaddingDemo;
let packed_int = unsafe {
    std::ptr::write(ptr, data);
    packed.assume_init()
};
// Attempt to trigger UB in Miri by reading the
// possibly uninitialized bytes
let copied = unsafe { ptr::read(&packed_int) };
Does that assume_init call trigger undefined behavior? In other words, when ptr::write copies the struct into the buffer, does it copy the uninitialized-ness of the padding bytes, overwriting the zero bytes that had been initialized?
Currently, when this or similar code is run in Miri, it doesn't detect any Undefined Behavior. However, per the discussion about this issue on GitHub, ptr::write is supposedly allowed to copy those padding bytes, and furthermore to copy their uninitialized-ness. Is that true? The docs for ptr::write don't talk about this at all, nor does the nomicon section on uninitialized memory.
Does that assume_init call trigger undefined behavior?
Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, next to the usual 0x00 - 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other possible value a byte can have is preserved by copies.
But the details are a bit more complicated.
There are two ways to copy data around in memory in Rust.
Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.
Untyped / byte-wise copy
In general, when a range of bytes is being copied, the source range just overwrites the target range -- so if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.
This is what memcpy/memmove in C behave like (in my interpretation of the standard, which is not very clear here unfortunately). In Rust, ptr::copy{,_nonoverlapping} probably performs a byte-wise copy, but it's not actually precisely specified right now and some people might want to say it is typed as well. This was discussed a bit in this issue.
Typed copy
The alternative is a "typed copy", which is what happens on every normal assignment (=) and when passing values to/from a function. A typed copy interprets the source memory at some type T, and then "re-serializes" that value of type T into the target memory.
The key difference to a byte-wise copy is that information which is not relevant at the type T is lost. This is basically a complicated way of saying that a typed copy "forgets" padding, and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information. Untyped copies preserve the underlying representation, typed copies just preserve the represented value.
So even when you transmute 0usize to PaddingDemo, a typed copy of that value can reset this to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding) -- assuming data sits at offset 0, which is not guaranteed (add #[repr(C)] if you want that guarantee).
In your case, ptr::write takes an argument of type PaddingDemo, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily, in particular they may become 0xUU.
Uninitialized usize
Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a usize is UB. The question is, does a (partially) uninitialized range of memory represent some integer? Currently, it does not and thus there is UB. However, whether that should be the case is heavily debated and it seems likely that we will eventually permit it.
Many other details are still unclear, though -- for example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer, i.e., integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers we would have to basically say that an integer has no abstract "value", it is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like /. (Some of this also depends on LLVM decisions around poison and freeze; LLVM might decide that when doing a load at integer type, the result is fully poison if any input byte is poison.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected because the data you want to transfer is being lost.
If you want to transfer raw bytes around, I suggest using a type suited for that, such as MaybeUninit. If you use an integer type, the goal should be to transfer integer values -- i.e., numbers.
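A minimal sketch of that suggestion, assuming the value stays wrapped in MaybeUninit<usize> for the whole round trip and assume_init is never called on it. The helper names pack and unpack are hypothetical, and #[repr(C)] is added here as an assumption so that data is guaranteed to sit at offset 0. Whether MaybeUninit<usize> is acceptable in a particular FFI signature is outside the scope of this sketch.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

#[repr(C)] // assumption: fixes the field order so `data` is at offset 0
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

/// Hypothetical helper: stores `value` in a pointer-sized buffer of raw bytes.
fn pack(value: PaddingDemo) -> MaybeUninit<usize> {
    assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
    assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
    let mut packed = MaybeUninit::<usize>::uninit();
    // The padding bytes may stay uninitialized; that is fine because the
    // buffer is never treated as an initialized usize.
    unsafe { packed.as_mut_ptr().cast::<PaddingDemo>().write(value) };
    packed
}

/// Hypothetical helper: reads the value back out of the buffer.
/// Safety: `packed` must have been produced by `pack`.
unsafe fn unpack(packed: MaybeUninit<usize>) -> PaddingDemo {
    unsafe { packed.as_ptr().cast::<PaddingDemo>().read() }
}
```

The design choice is that the bytes are only ever reinterpreted as PaddingDemo, a type for which uninitialized padding is allowed, so the question of uninitialized integers never arises.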

Is it safe to clone a type-erased Arc via raw pointer?

I'm in a situation where I'm working with data wrapped in an Arc, and I sometimes end up using into_raw to get the raw pointer to the underlying data. My use case also calls for type-erasure, so the raw pointer often gets cast to a *const c_void, then cast back to the appropriate concrete type when re-constructing the Arc.
I've run into a situation where it would be useful to be able to clone the Arc without needing to know the concrete type of the underlying data. As I understand it, it should be safe to reconstruct the Arc with a dummy type solely for the purpose of calling clone, so long as I never actually dereference the data. So, for example, this should be safe:
pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    let original = Arc::from_raw(handle);
    let copy = original.clone();
    mem::forget(original);
    Arc::into_raw(copy)
}
Is there anything that I'm missing that would make this actually unsafe? Also, I assume the answer would apply to Rc as well, but if there are any differences please let me know!
This is almost always unsafe.
An Arc<T> is just a pointer to a heap-allocated struct which roughly looks like
struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T, // You get a raw pointer to this element
}
into_raw() gives you a pointer to the data element. The implementation of Arc::from_raw() takes such a pointer, assumes that it points to the data element in an ArcInner<T>, and walks back in memory expecting to find the start of that ArcInner<T> there. This assumption depends on the memory layout of T, specifically its alignment and therefore its exact placement in ArcInner.
If you call into_raw() on an Arc<U> and then call from_raw() as if it was an Arc<V> where U and V differ in alignment, the offset-calculation of where U/V is in ArcInner will be wrong and the call to .clone() will corrupt the data structure. Dereferencing T is therefore not required to trigger memory unsafety.
In practice, this might not be a problem: since data is the third element after two usize elements, most T will probably be aligned the same way. However, if the stdlib implementation changes or you end up compiling for a platform where this assumption is wrong, reconstructing an Arc<V> via from_raw from a pointer that was created by an Arc<U>, where the memory layouts of V and U differ, is unsound and may crash.
Update:
Having thought about it some more I downgrade my vote from "might be safe, but cringy" to "most likely unsafe" because I can always do
#[repr(align(32))]
struct Foo;
let foo = Arc::new(Foo);
In this example Foo will be aligned to 32 bytes, making ArcInner<Foo> 32 bytes in size on a 64-bit target (8+8+16+0) while an ArcInner<()> is just 16 bytes (8+8+0+0). Since there is no way to tell what the alignment of T is after the type has been erased, there is no way to reconstruct a valid Arc.
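The arithmetic above can be checked with a small sketch. ArcInner is private to the standard library, so the helper data_offset below is hypothetical: it only mimics the layout rule for a header of two counters followed by data, and is not the standard library's own code.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;

#[repr(align(32))]
struct Foo;

/// Hypothetical helper: offset of `data` in a struct laid out as
/// two `AtomicUsize` counters followed by a field of type `T`.
fn data_offset<T>() -> usize {
    let header = 2 * size_of::<AtomicUsize>();
    // The field must start at a multiple of T's alignment (at least 1).
    let align = align_of::<T>();
    // Round the header size up to the next multiple of `align`.
    (header + align - 1) / align * align
}
```

With this rule, data_offset::<Foo>() and data_offset::<()>() differ, which is exactly why walking back from the data pointer with the wrong type lands on the wrong address.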
There is an escape hatch that might be safe in practice: By wrapping T into another Box, the layout of ArcInner<T> is always the same. In order to force this upon any user, you can do something like
struct ArcBox<T>(Arc<Box<T>>);
and implement Deref on that. Using ArcBox instead of Arc forces the memory layout of ArcInner to always be the same, because T is behind another pointer. This, however, means that all access to T requires a double dereference, which might badly affect performance.
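A minimal sketch of such a wrapper, assuming only Deref and Clone are needed. The constructor new is a hypothetical convenience, not something the answer above prescribes, and the sketch deliberately leaves out the type-erased raw-pointer cloning itself.

```rust
use std::ops::Deref;
use std::sync::Arc;

struct ArcBox<T>(Arc<Box<T>>);

impl<T> ArcBox<T> {
    /// Hypothetical constructor: boxes the value, then wraps it in an Arc.
    fn new(value: T) -> Self {
        ArcBox(Arc::new(Box::new(value)))
    }
}

impl<T> Clone for ArcBox<T> {
    fn clone(&self) -> Self {
        // Only the reference count is bumped; T itself is not cloned.
        ArcBox(Arc::clone(&self.0))
    }
}

impl<T> Deref for ArcBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // Double dereference: first through the Arc, then through the Box.
        &**self.0
    }
}
```

The extra pointer hop in deref is the performance cost mentioned above.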

How can I convert a Vec<T> into a C-friendly *mut T?

I have a Rust library that returns a u8 array to a C caller via FFI. The library also handles dropping the array after the client is done with it. The library has no state, so the client needs to own the array until it is passed back to the library for freeing.
Using Box::from_raw and Box::into_raw would be nice, but I couldn't manage to work out how to convert the array into the return type.
A Vec<T> is described by 3 values:
A pointer to its first element, that can be obtained with .as_mut_ptr()
A length, that can be obtained with .len()
A capacity, that can be obtained with .capacity()
In terms of a C array, the capacity is the size of the memory allocated, while the length is the number of elements actually contained in the array. Both are counted in units of T. You would normally need to provide all 3 values to your C code.
If you want length and capacity to be equal, you can call .shrink_to_fit() on the vector to reduce its capacity to as close to its length as the allocator allows.
If you hand ownership of the Vec<T> to your C code, don't forget to call std::mem::forget(v) on it once you have retrieved the 3 values described above, so that its destructor does not run at the end of the function.
Afterwards, you can create back a Vec from these 3 values using from_raw_parts(..) like this:
let v = unsafe { Vec::<T>::from_raw_parts(ptr, length, capacity) };
and when its destructor runs, the memory will be correctly freed. Be careful: the 3 values need to be correct for the memory to be deallocated correctly. The length matters less for a Vec<u8>, since u8 has no destructor, but in general the destructor of Vec runs the destructor of every element it contains, according to its length.
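Putting the steps above together, a sketch for the Vec<u8> case might look like this. The struct RawBuf and the function names are hypothetical, and the extern "C" / #[no_mangle] wiring that a real FFI boundary needs is left out to keep the example short.

```rust
/// Hypothetical C-friendly description of a Vec<u8>: pointer, length, capacity.
#[repr(C)]
pub struct RawBuf {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Hands ownership of the buffer to the caller.
pub fn into_raw_buf(mut v: Vec<u8>) -> RawBuf {
    let buf = RawBuf {
        ptr: v.as_mut_ptr(),
        len: v.len(),
        cap: v.capacity(),
    };
    // Prevent the Vec destructor from freeing the memory we just exposed.
    std::mem::forget(v);
    buf
}

/// Takes ownership back and frees the buffer.
/// Safety: `buf` must come from `into_raw_buf`, unmodified, and be freed once.
pub unsafe fn free_raw_buf(buf: RawBuf) {
    // Rebuilding the Vec with the original three values lets its
    // destructor deallocate the memory correctly.
    drop(unsafe { Vec::from_raw_parts(buf.ptr, buf.len, buf.cap) });
}
```

The C side would hold the three values until it passes them back for freeing; changing any of them in between would make the deallocation incorrect.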

Resources