Reuse raw pointer to rehydrate Rust Box - rust

I have a function that converts a Box::into_raw result into a u64. I later 're-Box' from the u64.
// Somewhere at inception
let bbox = Box::new(MyStruct::from_u32(1u32).unwrap());
let rwptr = Box::into_raw(bbox);
let bignum_ptr = rwptr as u64;
// Later in life
let rehydrate: Box<MyStruct> = unsafe {
Box::from_raw(bignum_ptr as *mut MyStruct)
};
What I would like to do is 're-Box' that bignum_ptr again, and again, as needed. Is this possible?

A box owns the data it points to, and will drop and deallocate it when it goes out of scope, so if you need to use the same pointer in more than one place, Box is not the correct type. To support multiple "revivals" of the pointer, you can use a reference instead:
// safety contract: bignum_ptr comes from a valid pointer, there are no
// mutable references
let rehydrate: &MyStruct = unsafe { &*(bignum_ptr as *const MyStruct) };
When the time comes to free the initial box and its data (and you know that no outstanding references exist), only then re-create the box using Box::from_raw:
// safety contract: bignum_ptr comes from a valid pointer, no references
// of any kind remain
drop(unsafe { Box::from_raw(bignum_ptr as *mut MyStruct) });
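Putting the two snippets together, a minimal sketch might look like the following. The MyStruct definition and the helper names stash, peek and release are made up for illustration; only the pointer casts come from the answer above.

```rust
#[derive(Debug)]
struct MyStruct {
    value: u32,
}

/// Gives up ownership of the box and returns its address as an integer.
fn stash(boxed: Box<MyStruct>) -> u64 {
    Box::into_raw(boxed) as u64
}

/// Safety: `handle` must come from `stash`, must not have been passed to
/// `release` yet, and no mutable reference to the value may exist.
unsafe fn peek<'a>(handle: u64) -> &'a MyStruct {
    unsafe { &*(handle as *const MyStruct) }
}

/// Safety: `handle` must come from `stash`, no reference obtained from
/// `peek` may still be in use, and this is called at most once per handle.
unsafe fn release(handle: u64) -> u32 {
    // Re-creating the box hands ownership back; it is freed at the end of scope.
    let boxed = unsafe { Box::from_raw(handle as *mut MyStruct) };
    boxed.value
}

fn main() {
    let handle = stash(Box::new(MyStruct { value: 1 }));

    // "Revive" the pointer as a shared reference as often as needed.
    let first = unsafe { peek(handle) };
    let second = unsafe { peek(handle) };
    println!("{:?} {:?}", first, second);

    // Only once no references remain, rebuild the box so the value is freed.
    let last = unsafe { release(handle) };
    println!("released value {}", last);
}
```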

What I would like to do is 're-Box' that bignum_ptr again, and again, as needed. Is this possible?
If you mean creating many boxes from the same pointer without taking it out each time, no.
If you mean putting it in and out repeatedly and round-tripping every time via an integer, probably yes; however, I would be careful with code like that. Most likely it will work, but be aware that the memory model for Rust is not formalized and the rules around pointer provenance may change in the future. Even the C and C++ standards (from which Rust's memory model derives) have open questions around those, including round-tripping via an integer type.
Furthermore, your code assumes a pointer fits in a u64, which is likely true for most architectures, but maybe not all in the future.
At the very least, I suggest you use mem::transmute rather than a cast.
In short: don't do it. There is likely a better design for what you are trying to achieve.
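For what it's worth, the "in and out" variant could be sketched as below. The helper names into_handle and from_handle and the MyStruct definition are hypothetical, and the const assertion is just one way to turn the "a pointer fits in a u64" assumption into a compile-time check; the provenance caveats above still apply.

```rust
use std::mem::size_of;

// Refuses to compile on a target where a pointer is wider than 64 bits.
const _: () = assert!(size_of::<*mut u8>() <= size_of::<u64>());

struct MyStruct {
    value: u32,
}

/// Takes the value out of the box's ownership and returns its address.
fn into_handle(boxed: Box<MyStruct>) -> u64 {
    Box::into_raw(boxed) as u64
}

/// Safety: `handle` must come from `into_handle` and must not be used
/// again until the returned box has been passed back to `into_handle`.
unsafe fn from_handle(handle: u64) -> Box<MyStruct> {
    unsafe { Box::from_raw(handle as *mut MyStruct) }
}

fn main() {
    let mut handle = into_handle(Box::new(MyStruct { value: 1 }));

    // Each round takes the box out, uses it, and puts it back in.
    for _ in 0..3 {
        let mut boxed = unsafe { from_handle(handle) };
        boxed.value += 1;
        handle = into_handle(boxed);
    }

    // The final box is dropped normally, freeing the allocation once.
    let boxed = unsafe { from_handle(handle) };
    println!("final value {}", boxed.value);
}
```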

Related

Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner.
I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.
I'm placing very few constraints on what an annotation can be, because I do not know what the users will want to store in the future.
The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.
User annotations are represented with a trait:
pub trait Annotation {
fn some_important_method(&self);
}
This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.
I can store a list of annotations this way:
pub struct Node {
// ...
annotations: Vec<Box<dyn Annotation>>,
}
I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:
impl Node {
fn annotations_with_type<T>(&self) -> Vec<&T>
where
T: Annotation,
{
// ??
}
}
I originally aimed to convert dyn Annotation to dyn Any, then use downcast_ref, however trait upcasting coercion is unstable.
Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.
Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.
What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.
If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:
struct MyAnnotation;
impl TypeMapKey for MyAnnotation {
type Value = String;
}
let mut map = TypeMap::new();
map.insert::<MyAnnotation>(String::from("Some Annotation"));
If you are curious, it uses a HashMap<TypeId, Box<(dyn Any + Send + Sync)>> underneath to store the data. To retrieve data, it calls downcast_ref on the Any type, which is stable. This could also be a pattern to implement yourself if needed.
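If you want to roll this yourself, a rough sketch of the pattern could look like this. AnnotationMap and its methods are hypothetical names, not the typemap_rev API, and this version keys entries by the value's own type instead of a separate key type.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// A tiny type map: at most one value per concrete type.
#[derive(Default)]
struct AnnotationMap {
    entries: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl AnnotationMap {
    /// Stores `value`, replacing any previous value of the same type.
    fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.entries.insert(TypeId::of::<T>(), Box::new(value));
    }

    /// Looks up the value stored for type `T`, if there is one.
    fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.entries
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

fn main() {
    let mut map = AnnotationMap::default();
    map.insert(String::from("Some Annotation"));
    map.insert(42u32);

    println!("{:?}", map.get::<String>());
    println!("{:?}", map.get::<u32>());
    println!("{:?}", map.get::<bool>());
}
```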
You don't have to worry whether this is valid - because it doesn't compile (playground):
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:7:18
|
7 | _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn Annotation` (128 bits)
= note: target type: `&i32` (64 bits)
The error message should be clear, I hope: &dyn Trait is a fat pointer, and has size 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized), of size of only one usize, and you cannot transmute between types of different sizes.
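You can check the sizes yourself. This is only what current compilers produce on common targets; the Annotation trait here is a stand-in for the one in the question.

```rust
use std::mem::size_of;

trait Annotation {
    fn some_important_method(&self);
}

fn main() {
    // A reference to a sized type is a thin pointer: one machine word.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());

    // A reference to a trait object is a fat pointer: currently two words.
    assert_eq!(size_of::<&dyn Annotation>(), 2 * size_of::<usize>());

    println!(
        "thin: {} bytes, fat: {} bytes",
        size_of::<&i32>(),
        size_of::<&dyn Annotation>()
    );
}
```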
You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:
Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.
Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, and thus transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents. It might not contain a data pointer at all (though I doubt anybody will ever do something like that). transmute_copy() copies the data from the beginning, meaning that for the code to work the data pointer should be there and should be first (which it is). For the code to be sound this needs to be guaranteed (which it is not).
So what can we do? Let's check how Any does its magic:
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
So it uses as for the conversion. Does it work? Certainly. And std can do that, because std is allowed to rely on things that are not guaranteed but work in practice. But we shouldn't. So, is it guaranteed?
I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.
Edit: @CAD97 pointed out on Zulip that the reference promises that *[const|mut] T as *[const|mut] V where V: Sized will be a pointer-to-pointer cast, and that can be read as a guarantee this will work.
But I still feel fine with relying on that. Because, unlike the transmute_copy(), people are doing it. In production. And there is no better way in stable. So the chance it will become undefined behavior is very low. It is much more likely to be defined.
Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:
#![feature(ptr_metadata)]
let v: &dyn Annotation;
let v = v as *const dyn Annotation;
let v: *const T = v.to_raw_parts().0.cast::<T>();
let v: &T = unsafe { &*v };
In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.
Last point, there may be a crate that already does that. Prefer that, if it exists.
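For completeness, here is a different technique from the ones discussed above that stays entirely in safe, stable Rust: add an as_any method to the trait and let Any::downcast_ref do the type check. This is only a sketch. The as_any method, Node::new, Node::add and the Label and Weight types are all made up for illustration, and it does require every implementor to write the one-line as_any.

```rust
use std::any::Any;

pub trait Annotation: Any {
    fn some_important_method(&self);

    /// Hypothetical helper: every implementor simply returns `self`.
    fn as_any(&self) -> &dyn Any;
}

pub struct Node {
    annotations: Vec<Box<dyn Annotation>>,
}

impl Node {
    pub fn new() -> Self {
        Node { annotations: Vec::new() }
    }

    pub fn add(&mut self, annotation: impl Annotation) {
        self.annotations.push(Box::new(annotation));
    }

    /// Returns every stored annotation whose concrete type is `T`.
    pub fn annotations_with_type<T: Annotation>(&self) -> Vec<&T> {
        self.annotations
            .iter()
            .filter_map(|a| a.as_any().downcast_ref::<T>())
            .collect()
    }
}

pub struct Label(pub String);
pub struct Weight(pub u32);

impl Annotation for Label {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Annotation for Weight {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let mut node = Node::new();
    node.add(Label(String::from("a")));
    node.add(Weight(3));
    node.add(Label(String::from("b")));

    for label in node.annotations_with_type::<Label>() {
        println!("label: {}", label.0);
    }
    for weight in node.annotations_with_type::<Weight>() {
        println!("weight: {}", weight.0);
    }
}
```

If an implementor returns something other than self from as_any, the lookup just fails to find it; nothing becomes unsound.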

Why doesn't reading a value allocated on the stack result in a double free?

Please tell me why the test with the value allocated on the stack doesn't result in a double free. Thanks.
#[test]
fn read_value_that_allocated_in_stack_is_no_problem() {
let origin = Value(1);
let copied = unsafe { std::ptr::read(&origin) };
assert_eq!(copied, Value(1));
assert_eq!(copied, origin);
}
/// test failed as expected: double free detected
#[test]
fn read_value_that_allocated_in_heap_will_result_in_double_free_problem() {
let origin = Box::new(Value(1));
let copied = unsafe { std::ptr::read(&origin) };
assert_eq!(copied, Box::new(Value(1)));
assert_eq!(copied, origin);
}
#[derive(Debug, PartialEq)]
struct Value<T>(T);
The unsafe function you are using just creates a bitwise copy of the referenced value. Doing this with a Box is not okay, but for something like your Value struct containing an integer it is okay to make the copy: dropping an integer has no side effects, while dropping a Box calls into the global allocator and changes its state.
If you do not understand any term I used for this explanation, try to search it or ask in the comments.
Those tests hide the fact that you use different types in them. It isn't really about stack or heap.
In the first one you use Value<i32> type, which is your custom type, presumably without custom Drop implemented. If so, then Rust will call Drop on each member, in this case the i32 member. Which does nothing. And so nothing happens when both objects go out of scope. Even if you implement Drop, it would have to have some serious side-effects (like call to free) for it to fail.
In the second one you actually use Box type, which does implement Drop. Internally it calls free (in addition to dropping the underlying object). And so free is called twice on drop, trying to free the same pointer (because of the unsafe copy).
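One way to see the difference between the two types is std::mem::needs_drop. Treat this as a sketch: the function is documented as an optimization hint that may conservatively return true, so the false result for Value<i32> is what current compilers report rather than a hard guarantee.

```rust
use std::mem::needs_drop;

#[derive(Debug, PartialEq)]
struct Value<T>(T);

fn main() {
    // No Drop impl and only an i32 inside: dropping it runs no code.
    assert!(!needs_drop::<Value<i32>>());

    // Box has drop glue that frees the heap allocation, so a bitwise
    // copy of it means the same allocation gets freed twice.
    assert!(needs_drop::<Box<Value<i32>>>());

    println!("{:?}", Value(1));
}
```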
This is not a double free because we do not free the memory twice. Because we do not free memory at all.
However, whether this is valid or UB is another question. Miri does not flag it as UB, but Miri does not aim to detect all UB. If Value<i32> were Copy, that would be fine. As it is not, it depends on whether std::ptr::read() invalidates the original data, i.e. is it always invalid to use data that was std::ptr::read()'ed, or only if doing so violates Stacked Borrows semantics, as in the case of copying the Box itself, where both destructors try to access the Box thereafter?
The answer is that it's not decided yet. As per UCG issue #307, "Are Copy implementations semantically relevant (besides specialization)?":
@steffahn
Overall, this still being an open question means that while miri doesn't complain, one should avoid code like this because it's not yet certain that it won't be UB, right?
@RalfJung
Yes.
In conclusion, you should avoid code like that.

Should functions that depend upon specific values be made unsafe?

I have a function that takes a usize equivalent to a pointer, and aligns it up to the next alignment point.
It doesn't require any unsafe as it's side effect free, but the alignment must be a power of two with this implementation. This means that if you use the function with bad parameters, you might get undefined behaviour later down the line. I can't check for this inside the function itself with assert! as it's supposed to be very fast.
/// Align the given address `addr` upwards to alignment `align`.
///
/// Unsafe as `align` must be a power of two.
unsafe fn align_next_unsafe(addr: usize, align: usize) -> usize {
(addr + align - 1) & !(align - 1)
}
Currently, I've made this unsafe for the above reasons, but I'm not sure if that's best practice. Should I only define a function as unsafe if it has side effects? Or is this a valid time to require an unsafe block?
I'll preface this by saying this is a fairly opinion-heavy answer, and represents a point of view, rather than "the truth".
Consider this code taken from the Vec docs:
let x = vec![1, 2, 4];
let x_ptr = x.as_ptr();
unsafe {
for i in 0..x.len() {
assert_eq!(*x_ptr.add(i), 1 << i);
}
}
The function you're describing seems to have a similar safety profile to Vec::as_ptr. Vec::as_ptr is not unsafe, and does nothing particularly bad on its own; having an invalid *const T isn't bad until you dereference it. That's why dereferencing the raw pointer requires unsafe.
Similarly, I'd argue that align_next doesn't do anything particularly bad unless that value is then passed into some unsafe context. As with any question of unsafe, it's a tradeoff between safety/risk and ergonomics.
In Vec::as_ptr's case, the risk is relatively low; the stdlib has lots of eyes on it, and is well "battle-tested". Moreover, it is a single function with a single implementation.
If your align_next was a function on a trait, I'd be much more tempted to make it unsafe, since someone in the future could implement it badly, and you might have other code whose safety relies on a correct implementation of align_next.
However, in your case, I'd say the pattern is similar to Vec::as_ptr, and you should make sure that any functions that consume this value are marked unsafe if they can cause UB.
I'd also second Martin Gallagher's point about creating a Result returning variant and benchmarking (you could also try an Option<usize>-returning API to make use of null-pointer optimizations).
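A sketch of what those variants could look like, so you have something to benchmark against. The names align_next_checked and align_next are hypothetical, and the debug_assert! is my addition: it costs nothing in release builds but catches bad alignments in debug builds.

```rust
/// Checked variant: returns `None` if `align` is not a power of two
/// or if rounding up would overflow `usize`.
fn align_next_checked(addr: usize, align: usize) -> Option<usize> {
    if !align.is_power_of_two() {
        return None;
    }
    let mask = align - 1;
    addr.checked_add(mask).map(|sum| sum & !mask)
}

/// Fast variant: safe to call, but the result is only meaningful when
/// `align` is a power of two. The check is compiled out in release builds.
fn align_next(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

fn main() {
    println!("{:?}", align_next_checked(13, 8));
    println!("{:?}", align_next_checked(13, 3));
    println!("{}", align_next(13, 8));
}
```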

Why do immutable references to copy types in rust exist?

So I just started learning rust (first few chapters of "the book") and am obviously quite a noob. I finished the ownership-basics chapter (4) and wrote some test programs to make sure I understood everything. I seem to have the basics down but I asked myself why immutable references to copy-types are even possible. I will try to explain my thoughts with examples.
I thought that you maybe want to store a reference to a copy-type so you can check its value later instead of having a copy of the old value but this can't be it since the underlying value can't be changed as long as it's been borrowed.
The most basic example of this would be this code:
let mut x = 10; // push i32
let x_ref = &x; // push immutable reference to x
// x = 100; change x which is disallowed since it's borrowed currently
println!("{}", x_ref); // do something with the reference since you want the current value of x
The only reason for this I can currently think of (with my current knowledge) is that they just exist so you can call generic methods which require references (like cmp) with them.
This code demonstrates this:
let x = 10; // push i32
// let ordering = 10.cmp(x); try to compare it but you can't since cmp wants a reference
let ordering = 10.cmp(&x); // this works since it's now a reference
So, is that the only reason you can create immutable references to copy-types?
Disclaimer:
I don't see Just continue reading the book as a valid answer. However I fully understand if you say something like Yes you need those for this and this use-case (optional example), it will be covered in chapter X. I hope you understand what I mean :)
EDIT:
Maybe worth mentioning, I'm a C# programmer and not new to programming itself.
EDIT 2:
I don't know if this is technically a duplicate of this question but I do not fully understand the question and the answer so I hope for a more simple answer understandable by a real noob.
An immutable reference to a Copy-type is still "an immutable reference". The code that gets passed the reference can't change the original value. It can make a (hopefully) trivial copy of that value, but it can still only ever change that copy after doing so.
That is, the original owner of the value is ensured that - while receivers of the reference may decide to make a copy and change that - the state of whatever is referenced can't ever change. If the receiver wants to change the value, it can feel free; nobody else is going to see it, though.
Immutable references to primitives are no different; although primitives are Copy, you probably have a better intuition for what "an immutable reference" means semantically for them. For instance
fn print_the_age(age: &i32) { ... }
That function could make a copy via *age and change it. But the caller will not see that change and it does not make much sense to do so in the first place.
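A small sketch of that point; the function name age_next_year is made up, and it returns the changed copy just so the effect is visible.

```rust
/// Copies the referenced age and changes only the copy.
fn age_next_year(age: &i32) -> i32 {
    let mut local = *age; // trivial copy out of the reference
    local += 1; // only the local copy changes
    local
}

fn main() {
    let age = 30;
    let next = age_next_year(&age);
    // The caller's value is untouched.
    println!("age = {}, next year = {}", age, next);
}
```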
Update due to comment: There is no advantage per se, at least as far as primitives are concerned (larger types may be costly to copy). It does boil down to the semantic relationship between the owner of the i32 and the receiver: "Here is a reference, it is guaranteed to not change while you have that reference, I - the owner - can't change or move or deallocate it, and there is no other thread, including myself, that could possibly do that".
Consider where the reference is coming from: If you receive an &i32, wherever it is coming from can't change and can't deallocate. The i32 may be part of a larger type, which - due to handing out a reference - can't move, change or get de-allocated; the receiver is guaranteed of that. It's hard to say there is an advantage per se in here; it might be advantageous to communicate more detailed type (and lifetime!) relationships this way.
They're very useful, because they can be passed to generic functions that expect a reference:
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
If immutable references to Copy types were forbidden, we would need two versions:
fn map_vec_own<T: !Copy, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
fn map_vec_copy<T: Copy, U>(v: &Vec<T>, f: impl Fn( T) -> U) -> Vec<U> {...}
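To make that concrete, here is one possible body for map_vec (the answer elides it, so the implementation is my guess), called once with a Copy element type and once with a non-Copy one.

```rust
/// One generic function serves both Copy and non-Copy element types,
/// because the closure always receives a reference.
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {
    v.iter().map(f).collect()
}

fn main() {
    // i32 is Copy; the closure still gets an &i32.
    let numbers: Vec<i32> = vec![1, 2, 3];
    let doubled = map_vec(&numbers, |n| n * 2);

    // String is not Copy; the same function works unchanged.
    let words = vec![String::from("a"), String::from("bcd")];
    let lengths = map_vec(&words, |w| w.len());

    println!("{:?} {:?}", doubled, lengths);
}
```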
Immutable references are, naturally, used to provide access to the referenced data. For instance, you could have loaded a dictionary and have multiple threads reading from it at the same time, each using their own immutable reference. Because the references are immutable those threads will not corrupt that common data.
Using only mutable references, you can't be sure of that so you need to make full copies. Copying data takes time and space, which are always limited. The primary question for performance tends to be if your data fits in CPU cache.
I'm guessing you were thinking of "copy" types as ones that fit in the same space as the reference itself, i.e. sizeof(type) <= sizeof(type*). Rust's Copy trait indicates data that could be safely copied, no matter the size. These are orthogonal concepts; for instance, a pointer might not be safely copied without adjusting a reference count, or an array might be copyable but take gigabytes of memory. This is why Rc<T> has the Clone trait, not Copy.
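A short sketch of both halves of that: a large array that is nevertheless Copy, and an Rc that is pointer-sized but only Clone because cloning has to bump the reference count. The helper requires_copy is a made-up name used only to make the compiler check the Copy bound.

```rust
use std::rc::Rc;

/// Accepts only `Copy` types; the argument is copied in, not moved.
fn requires_copy<T: Copy>(_value: T) {}

fn main() {
    // A 4 KiB array is Copy even though it is far bigger than a pointer.
    let big = [0u8; 4096];
    requires_copy(big);
    println!("still usable after the copy: {} bytes", big.len());

    // Rc is as small as a pointer, but cloning adjusts the count,
    // so it is Clone and not Copy.
    let shared = Rc::new(5);
    let other = Rc::clone(&shared);
    println!("count = {}", Rc::strong_count(&shared));
    drop(other);
    println!("count = {}", Rc::strong_count(&shared));
}
```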

Inefficient instance construction?

Here is a simple struct
pub struct Point {
x: uint,
y: uint
}
impl Point {
pub fn new() -> Point {
Point{x: 0u, y: 0u}
}
}
fn main() {
let p = box Point::new();
}
My understanding of how the constructor function works is as follows. The new() function creates an instance of Point in its local stack and returns it. Data from this instance is shallow copied into the heap memory created by box. The pointer to the heap memory is then assigned to the variable p.
Is my understanding correct? Do two separate memory regions get initialized to create one instance? This seems to be an inefficient way to initialize an instance compared to C++ where we get to directly write to the memory of the instance from the constructor.
From a relevant guide:
You may think that this gives us terrible performance: return a value and then immediately box it up ?! Isn't this pattern the worst of both worlds? Rust is smarter than that. There is no copy in this code. main allocates enough room for the box, passes a pointer to that memory into foo as x, and then foo writes the value straight into the Box.
This is important enough that it bears repeating: pointers are not for optimizing returning values from your code. Allow the caller to choose how they want to use your output.
While this talks about boxing the value, I believe the mechanism is general enough, and not specific to boxes.
Just to expand a bit on @Shepmaster's answer:
Rust (and LLVM) supports RVO, or return value optimization, where if a return value is used in a context like box, Rust is smart enough to generate code that uses some sort of out pointer to avoid the copy by writing the return value directly into its usage site. box is one of the major uses of RVO, but it can be used for other types and situations as well.
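The uint type, the 0u suffix and the box keyword in the question predate Rust 1.0. In current Rust the same program could be written roughly as below; u32 for the fields is an assumption on my part. Whether the intermediate copy is actually elided is up to the optimizer and is not something the language guarantees.

```rust
pub struct Point {
    x: u32,
    y: u32,
}

impl Point {
    pub fn new() -> Point {
        Point { x: 0, y: 0 }
    }
}

fn main() {
    // Box::new takes the returned value and places it on the heap.
    let p = Box::new(Point::new());
    println!("({}, {})", p.x, p.y);
}
```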
