How to create a DST type? - rust

DST (Dynamically Sized Types) are a thing in Rust now. I have used them successfully, with a flexible last member which is known to the compiler (such as [u8]).
What I am looking to do, however, is to create a custom DST. Say, for example:
struct Known<S> {
dropit: fn (&mut S) -> (),
data: S,
}
struct Unknown {
dropit: fn (&mut ()) -> (),
data: (),
}
With an expected usage being Box<Known<S>> => Box<Unknown> => Box<Known<S>>, where the middleware need not know about concrete types.
Note: yes, I know about Any, and no I am not interested in using it.
I am open to suggestions in the layout of both Known and Unknown, however:
1. size_of::<Box<Known>>() = size_of::<Box<Unknown>>() = size_of::<Box<u32>>(); that is, it should be a thin pointer.
2. dropping Box<Unknown> drops its content
3. cloning Box<Unknown> (assuming a clonable S) clones its content
4. ideally, fn dup(u: &Unknown) -> Box<Unknown> { box u.clone() } works
I have particular difficulties with (3) and (4). I could solve (3) by manually allocating memory (not using box, but directly calling malloc), but I would prefer to provide an idiomatic experience to the user.
I could not find any documentation on how to inform box of the right size to allocate.

There are exactly two types of unsized objects at present: slices ([T]), where it adds a length member; and trait objects (Trait, Trait + Send, &c.), where it adds a vtable including a destructor which knows how large an object to free.
There is not currently any mechanism for declaring your own variety of unsized objects.
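As a small illustration of the metadata described above, here is a minimal sketch showing that a reference to an unsized value carries enough information to recover the value's size at run time. The helper names slice_bytes and object_bytes are made up for this example.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

// For a slice, the size is computed from the length stored in the reference.
fn slice_bytes(s: &[u16]) -> usize {
    size_of_val(s)
}

// For a trait object, the size is read from the vtable stored in the reference.
fn object_bytes(o: &dyn Debug) -> usize {
    size_of_val(o)
}
```

Calling slice_bytes(&[1, 2, 3]) yields 6, and object_bytes(&0u64) yields 8, even though neither function knows the concrete length or type at compile time.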

At this point, you should seek inspiration from Arc::new_uninit_slice and Arc::from_ptr.
We've no nice mechanism to make custom DSTs play nicely together though, making Arc<Known<T>> nasty.
We still always create Arc<dyn Trait> with CoerceUnsized because you cannot make trait objects from DSTs currently.

You could try using the vptr crate, which stores the vtable pointer with the data instead of with the pointer.
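To make the idea concrete, here is a rough sketch of storing the destructor next to the data rather than next to the pointer, which is close to what the question's Known/Unknown pair is after. This is not the vptr crate's API: ThinBox, DropFn and drop_known are hypothetical names, and the sketch only handles dropping, not cloning. It assumes #[repr(C)] so that the function pointer sits at offset 0 for every S.

```rust
// Type-erased destructor stored in the allocation itself.
type DropFn = unsafe fn(*mut ());

#[repr(C)] // guarantees `drop_fn` is at offset 0 regardless of S
struct Known<S> {
    drop_fn: DropFn,
    #[allow(dead_code)] // only dropped, never read, in this sketch
    data: S,
}

// Rebuilds the concrete Box<Known<S>> and drops it.
unsafe fn drop_known<S>(p: *mut ()) {
    // SAFETY: the caller guarantees `p` came from Box::<Known<S>>::into_raw.
    drop(unsafe { Box::from_raw(p as *mut Known<S>) });
}

// A one-word owning pointer that no longer mentions S.
struct ThinBox {
    ptr: *mut (),
}

impl ThinBox {
    // S: 'static because the erased value may outlive any borrow it holds.
    fn new<S: 'static>(data: S) -> ThinBox {
        let known = Box::new(Known { drop_fn: drop_known::<S> as DropFn, data });
        ThinBox { ptr: Box::into_raw(known) as *mut () }
    }
}

impl Drop for ThinBox {
    fn drop(&mut self) {
        // SAFETY: `ptr` points to a live Known<S>; its first field is a DropFn
        // that was instantiated for that same S.
        unsafe {
            let drop_fn = *(self.ptr as *const DropFn);
            drop_fn(self.ptr);
        }
    }
}
```

The same header could carry a clone function pointer to cover requirement (3); that is left out here to keep the sketch short.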

Related

Object safe generic without type erasure

In the trait Extendable below, I'd like to make app generic where it currently uses the concrete type i32.
At first blush, you'd think to use a generic type but doing so while keeping Extendable object safe isn't easy.
trait Isoextender {
type Input;
type Output;
fn forward(&self, v: Self::Input) -> Self::Output;
fn backward(&self, v: Self::Output) -> Self::Input;
}
trait Extendable {
type Item;
fn app(
&self,
v: &dyn Isoextender<Input = Self::Item, Output = i32>,
) -> Box<dyn Extendable<Item = i32>>;
}
There's a (really neat) type erasure trick (link) I can use with std::Any but then I'm losing type information.
This is one of those things that feels like it should be possible from my understanding of how Rust works but which simply might not be. I do see there's an issue with size. Clearly Rust needs to know the size of the associated type Item, but I don't see how to solve it with references/pointers in a way that's both object safe and keeps the type information.
Is it the case that:
I am missing something and it's actually possible?
This is not possible today, but it may become possible in the future?
This is not likely to be possible ever?
Update:
So I suppose a part of the issue here is that I feel like the following is a generic function that has only 1 possible implementation. You should only have to compile this once.
fn pointer_identity<T>(v: &T) -> &T {
v
}
It feels like I should be able to put this function into a vtable, but still somehow express that it is generic. There are a whole class of easily identifiable functions that effectively act this way. Interactions between trait objects often act this way. It's all pointers to functions where the types don't matter except to be carried forward.
I found some documentation that may answer my question.
Currently, the compiler has two concepts surrounding Generic code, only one of which seems to be surfaced in the type system (link).
It sounds like this may be possible some day, but not currently expressible with Rust.
Monomorphization
The compiler stamps out a different copy of the code of a generic function for each concrete type needed.
Polymorphization
In addition to MIR optimizations, rustc attempts to determine when fewer copies of functions are necessary and avoid making those copies - known as "polymorphization".
As a result of polymorphization, items collected during monomorphization cannot be assumed to be monomorphic.
It is intended that polymorphization be extended to more advanced cases, such as where only the size/alignment of a generic parameter are required.

Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner.
I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.
I'm placing very little constraints on what an annotation can be, because I do not know what the users will want to store in the future.
The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.
User annotations are represented with a trait:
pub trait Annotation {
fn some_important_method(&self);
}
This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.
I can store a list of annotations this way:
pub struct Node {
// ...
annotations: Vec<Box<dyn Annotation>>,
}
I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:
impl Node {
fn annotations_with_type<T>(&self) -> Vec<&T>
where
T: Annotation,
{
// ??
}
}
I originally aimed to convert dyn Annotation to dyn Any, then use downcast_ref; however, trait upcasting coercion is unstable.
Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.
Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.
What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.
If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:
struct MyAnnotation;
impl TypeMapKey for MyAnnotation {
type Value = String;
}
let mut map = TypeMap::new();
map.insert::<MyAnnotation>("Some Annotation".to_string());
If you are curious, under the hood it uses a HashMap<TypeId, Box<(dyn Any + Send + Sync)>> to store the data. To retrieve data, it uses downcast_ref on the Any type, which is stable. This could also be a pattern to implement yourself if needed.
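If you would rather implement the pattern yourself, a minimal sketch could look like the following. AnnotationMap and its methods are hypothetical names, not part of typemap_rev, and this version keys the map on the stored type directly and omits the Send + Sync bounds for brevity.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Stores at most one value per concrete type.
#[derive(Default)]
struct AnnotationMap {
    entries: HashMap<TypeId, Box<dyn Any>>,
}

impl AnnotationMap {
    // Inserts `value`, replacing any previous value of the same type.
    fn insert<T: Any>(&mut self, value: T) {
        self.entries.insert(TypeId::of::<T>(), Box::new(value));
    }

    // Looks the value up by type and downcasts it back, entirely in safe code.
    fn get<T: Any>(&self) -> Option<&T> {
        self.entries
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}
```

Because the lookup key and the downcast target are the same T, a successful lookup always downcasts successfully; no unsafe code is involved.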
You don't have to worry whether this is valid - because it doesn't compile (playground):
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:7:18
|
7 | _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn Annotation` (128 bits)
= note: target type: `&i32` (64 bits)
The error message should be clear, I hope: &dyn Trait is a fat pointer, and has size 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized) with the size of only one usize, and you cannot transmute between types of different sizes.
You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:
Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.
Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, and thus transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents. It might not contain a data pointer at all (though I doubt somebody will ever do something like that). transmute_copy() copies the data from the beginning, meaning that for the code to work the data pointer should be there and should be first (which it is). For the code to be sound this needs to be guaranteed (which it is not).
So what can we do? Let's check how Any does its magic:
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
So it uses as for the conversion. Does it work? Certainly. And that means std can do that, because std can do things that are not guaranteed and rely on how things work in practice. But we shouldn't. So, is it guaranteed?
I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.
Edit: @CAD97 pointed out on Zulip that the reference promises that *[const|mut] T as *[const|mut] V where V: Sized will be a pointer-to-pointer cast, and that can be read as a guarantee this will work.
But I still feel fine with relying on that. Because, unlike the transmute_copy(), people are doing it. In production. And there is no better way in stable. So the chance it will become undefined behavior is very low. It is much more likely to be defined.
Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:
#![feature(ptr_metadata)]
let v: &dyn Annotation;
let v = v as *const dyn Annotation;
let v: *const T = v.to_raw_parts().0.cast::<T>();
let v: &T = unsafe { &*v };
In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.
Last point, there may be a crate that already does that. Prefer that, if it exists.
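For completeness, a commonly used safe alternative on stable is to add an as_any method to the trait, so that no cast from &dyn Annotation is needed at all. The following is only a sketch under that assumption: as_any, Node::new, Node::add, Color and Weight are names introduced for this example and do not come from the question.

```rust
use std::any::Any;

pub trait Annotation: Any {
    fn some_important_method(&self);
    // Each implementor returns itself; the compiler performs the unsizing.
    fn as_any(&self) -> &dyn Any;
}

pub struct Node {
    annotations: Vec<Box<dyn Annotation>>,
}

impl Node {
    pub fn new() -> Self {
        Node { annotations: Vec::new() }
    }

    pub fn add(&mut self, annotation: Box<dyn Annotation>) {
        self.annotations.push(annotation);
    }

    // Returns every stored annotation whose concrete type is T.
    pub fn annotations_with_type<T: Annotation>(&self) -> Vec<&T> {
        self.annotations
            .iter()
            .filter_map(|a| a.as_any().downcast_ref::<T>())
            .collect()
    }
}

// Two example implementors.
pub struct Color(pub &'static str);
pub struct Weight(pub u32);

impl Annotation for Color {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any { self }
}

impl Annotation for Weight {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any { self }
}
```

The cost is one extra boilerplate method per implementor; in exchange, the type check and the downcast are both done by Any, so a wrong or malicious as_any can at worst make the lookup return nothing.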

An Rc equivalent that supports pointing to a component of the owned data

This answer asserts that an equivalent of C++ shared_ptr in rust is std::rc::Rc. This is true on the surface but there is an important operation missing: Specifically, shared_ptr can be used to point to an unrelated, unmanaged pointer, via its aliasing constructor (8), which is mostly used to point to a subpart of the original allocation. The shared_ptr still keeps the object alive, but the original type is completely erased. This can be very useful to forget generic parameters (and other tricks).
Obviously, in Rust, managing a completely unrelated pointer would be incredibly unsafe. Yet there is precedent for such an operation: Ref::map allows one to obtain a Ref to a component of the original Ref, while "forgetting" the type of the former.
Implementation-wise, it is clear that the current Rc<T> can not possibly implement this behaviour. When the refcount hits 0, it has to deallocate after all, and the layout must be exactly the same as when it allocated, thus it must use the same T. But that's just because it doesn't store the Layout that was originally used for the allocation.
So my question: Is there a library or other overlooked type that supports the allocation management of Rc, while also allowing the equivalent of Ref::map? I should mention that the type should also support unsize coercion.
struct Foo<T> {
zet: T,
bar: u16,
}
let foo: MagicRc<Foo<usize>> = MagicRc::new(Foo { zet: 6969, bar: 42 });
// Now I want to erase the generic parameter. Imagine a queue of
// these MagicRc<u16> that point to the `bar` field of different
// Foo<T> for arbitrary T. It is very important that Foo<..> does
// not appear in this type.
let foobar: MagicRc<u16> = MagicRc::map(&foo, |f| &f.bar);
// I can drop foo, but foobar should still be kept alive
drop(foo);
assert_eq!(*foobar, 42);
// unsizing to a trait object should work
let foodbg: MagicRc<dyn Debug> = foobar;
Addressing comments:
OwningRef and the playground link (DerefFn) do not erase the owner type.
Addressing Cerberus's concern, such a type would store the Layout (a perfectly normal type) of the original allocation somewhere as part of the managed object and use it to free the value without having to have access to the original type of the allocation.
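One way to approximate the requested behaviour on top of the standard Rc is to keep the owner alive as an Rc<dyn Any> and store a raw pointer to the projected component. This is only a sketch of the idea, not an existing library: MappedRc and example are hypothetical names, the handle is three words wide rather than one, and unsize coercion of the handle itself (MappedRc<u16> to MappedRc<dyn Debug>) is not covered because CoerceUnsized cannot be implemented on stable.

```rust
use std::any::Any;
use std::ops::Deref;
use std::rc::Rc;

// Shares ownership of some erased allocation and points at a part of it.
struct MappedRc<U: ?Sized> {
    owner: Rc<dyn Any>, // keeps the original allocation alive, type erased
    ptr: *const U,      // points into (or is kept valid by) `owner`
}

impl<U: ?Sized + 'static> MappedRc<U> {
    // The higher-ranked closure can only return data borrowed from `owner`
    // or data that is 'static, so `ptr` stays valid while `owner` lives.
    fn new<T: 'static>(owner: Rc<T>, f: impl for<'a> FnOnce(&'a T) -> &'a U) -> Self {
        let ptr: *const U = f(&*owner);
        MappedRc { owner, ptr }
    }
}

impl<U: ?Sized> Clone for MappedRc<U> {
    fn clone(&self) -> Self {
        MappedRc { owner: Rc::clone(&self.owner), ptr: self.ptr }
    }
}

impl<U: ?Sized> Deref for MappedRc<U> {
    type Target = U;
    fn deref(&self) -> &U {
        // SAFETY: `owner` keeps the pointee alive and Rc contents never move.
        unsafe { &*self.ptr }
    }
}

struct Foo<T> {
    #[allow(dead_code)]
    zet: T,
    bar: u16,
}

// Mirrors the question's example: the result type does not mention Foo.
fn example() -> MappedRc<u16> {
    let foo = Rc::new(Foo { zet: 6969usize, bar: 42 });
    let foobar = MappedRc::new(Rc::clone(&foo), |f| &f.bar);
    drop(foo); // the allocation stays alive through `foobar`
    foobar
}
```

Going through Rc<dyn Any> lets Rc's own vtable-based drop glue free the allocation with the right layout, which sidesteps the need to store a Layout by hand.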

Is my understanding of a Rust vector that supports Rc or Box wrapped types correct?

I'm not looking for code samples. I want to state my understanding of Box vs. Rc and have you tell me if my understanding is right or wrong.
Let's say I have some trait ChattyAnimal and a struct Cat that implements this trait, e.g.
pub trait ChattyAnimal {
fn make_sound(&self);
}
pub struct Cat {
pub name: String,
pub sound: String
}
impl ChattyAnimal for Cat {
fn make_sound(&self) {
println!("Meow!");
}
}
Now let's say I have other structs (Dog, Cow, Chicken, ...) that also implement the ChattyAnimal trait, and let's say I want to store all of these in the same vector.
So step 1 is I would have to use a Box type, because the Rust compiler cannot determine the size of everything that might implement this trait. And therefore, we must store these items on the heap – voilà, using a Box type, which is like a smart pointer in C++. Anything wrapped with Box is automatically deleted by Rust when it goes out of scope.
// I can alias and use my Box type that wraps the trait like this:
pub type BoxyChattyAnimal = Box<dyn ChattyAnimal>;
// and then I can use my type alias, i.e.
pub struct Container {
animals: Vec<BoxyChattyAnimal>
}
Meanwhile, with Box, Rust's borrow checker requires ownership to change when I pass or reassign the instance. But if I actually want to have multiple references to the same underlying instance, I have to use Rc. And so to have a vector of ChattyAnimal instances where each instance can have multiple references, I would need to do:
pub type RcChattyAnimal = Rc<dyn ChattyAnimal>;
pub struct Container {
animals: Vec<RcChattyAnimal>
}
One important takeaway from this is that if I want to have a vector of some trait type, I need to explicitly set that vector's type to a Box or Rc that wraps my trait. And so the Rust language designers force us to think about this in advance so that a Box and an Rc cannot (at least not easily or accidentally) end up in the same vector.
This feels like a very well thought-out design – helping prevent me from introducing bugs in my code. Is my understanding as stated above correct?
Yes, all this is correct.
There's a second reason for this design: it allows the compiler to verify that the operations you're performing on the vector elements are using memory in a safe way, relative to how they're stored.
For example, if you had a method on ChattyAnimal that mutates the animal (i.e. takes a &mut self argument), you could call that method on elements of a Vec<Box<dyn ChattyAnimal>> as long as you had a mutable reference to the vector; the Rust compiler would know that there could only be one reference to the ChattyAnimal in question (because the only reference is inside the Box, which is inside the Vec, and you have a mutable reference to the Vec so there can't be any other references to it). If you tried to write the same code with a Vec<Rc<dyn ChattyAnimal>>, the compiler would complain; it wouldn't be able to completely eliminate the possibility that your code might be mutating the animal at the same time as the code that called it was in the middle of trying to read the animal, which might lead to some inconsistencies in the calling code.
As a consequence, the compiler needs to know that all the elements of the Vec have their memory treated in the same way, so that it can check to make sure that a reference to some arbitrary element of the Vec is being used appropriately.
(There's a third reason, too, which is performance; because the compiler knows that this is a "vector of Boxes" or "vector of Rcs", it can generate code that assumes a particular storage mechanism. For example, if you have a vector of Rcs, and clone one of the elements, the machine code that the compiler generates will work simply by going to the memory address listed in the vector and adding 1 to the reference count stored there – there's no need for any extra levels of indirection. If the vector were allowed to mix different allocation schemes, the generated code would have to be a lot more complex, because it wouldn't be able to assume things like "there is a reference count", and would instead need to (at runtime) find the appropriate piece of code for dealing with the memory allocation scheme in use, and then run it; that would be much slower.)
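The mutation point can be sketched in code. The trait below reuses the ChattyAnimal name from the question, but the volume and set_volume methods and both helper functions are made up for this illustration. With Box, a mutable borrow of the container is enough to mutate each element; with Rc, mutation is only available when the handle happens to be unique, which Rc::get_mut checks at run time.

```rust
use std::rc::Rc;

trait ChattyAnimal {
    fn volume(&self) -> u8;
    fn set_volume(&mut self, volume: u8);
}

struct Cat {
    volume: u8,
}

impl ChattyAnimal for Cat {
    fn volume(&self) -> u8 { self.volume }
    fn set_volume(&mut self, volume: u8) { self.volume = volume; }
}

// Each Box is the sole owner of its animal, so &mut access always works.
fn boost_boxed(animals: &mut [Box<dyn ChattyAnimal>]) {
    for animal in animals.iter_mut() {
        animal.set_volume(10);
    }
}

// An Rc may be shared; only animals with no other handle can be mutated.
// Returns how many animals were actually updated.
fn boost_shared(animals: &mut [Rc<dyn ChattyAnimal>]) -> usize {
    let mut updated = 0;
    for handle in animals.iter_mut() {
        if let Some(animal) = Rc::get_mut(handle) {
            animal.set_volume(10);
            updated += 1;
        }
    }
    updated
}
```

Calling handle.set_volume(10) directly on an Rc<dyn ChattyAnimal> is rejected at compile time; shared mutation would need something like Rc<RefCell<...>> instead.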

Can I have a generic type bound that requires that type to be a trait?

I want to declare a generic function that accepts trait objects and only trait objects. I want this because I want to type erase these and pass them as TraitObject objects across an ABI boundary.
A function written like this will fail to compile...
fn f<T: ?Sized>(t: &T) -> std::raw::TraitObject {
unsafe { std::mem::transmute(t) }
}
... with the following error:
error[E0512]: transmute called with differently sized types: &T (pointer to T) to std::raw::TraitObject (128 bits)
I understand why the compiler complains of different sizes: &T can be a pointer to a concrete type (like &i32), which is a single pointer (64 bits), or a trait object (like &Display), which is going to be two pointers with the same layout as std::raw::TraitObject (128 bits).
This function should be fine as long as &T is a trait object, i.e. T is a trait. Is there a way to express this requirement?
It is impossible to prove a negative... but as far as I know the answer is no, sorry.
The representation of TraitObject is unstable, notably because in the future Rust might be able to tack on multiple virtual pointers to a single data pointer (representing &(Display + Eq) for example).
In the meantime, I usually use low-level memory tricks to read the virtual pointer and data pointer, then build the TraitObject myself, guarded by a call to mem::size_of to ensure that &T has the right size for 2 *mut (), because ?Sized means Sized or not (and not !Sized).
If you use transmute_copy instead, you can have the compiler ignore the size mismatches. This, however, means you have to handle such issues yourself, by e.g. checking the size yourself and perhaps panicking if there's a mismatch. Not doing so can result in undefined behaviour.
fn f<T: ?Sized>(t: &T) -> std::raw::TraitObject {
assert!(std::mem::size_of::<&T>() == std::mem::size_of::<std::raw::TraitObject>());
unsafe { std::mem::transmute_copy(&t) }
}
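The size guard on its own can be written as a small generic helper. This is a sketch that relies on the current two-word representation of wide references, which is how things work in practice rather than a documented guarantee; is_wide_reference is a hypothetical name.

```rust
use std::mem::size_of;

// True when &T is two words wide, i.e. T is unsized on current compilers.
fn is_wide_reference<T: ?Sized>() -> bool {
    size_of::<&T>() == 2 * size_of::<usize>()
}
```

Note that the check also returns true for slices and str, whose second word is a length rather than a vtable pointer, so it cannot by itself prove that T is a trait object.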
I believe the answer is "no":
There is no way to refer to all trait objects generically
(emphasis mine)
