Object safe generic without type erasure

Object safe generic without type erasure - rust

In the trait Extendable below, I'd like to make app generic where it currently uses the concrete type i32.
At first blush, you'd think to use a generic type but doing so while keeping Extendable object safe isn't easy.
trait Isoextender {
type Input;
type Output;
fn forward(&self, v: Self::Input) -> Self::Output;
fn backward(&self, v: Self::Output) -> Self::Input;
}
trait Extendable {
type Item;
fn app(
&self,
v: &dyn Isoextender<Input = Self::Item, Output = i32>,
) -> Box<dyn Extendable<Item = i32>>;
}
There's a (really neat) type erasure trick (link) I can use with std::Any but then I'm losing type information.
This is one of those things that feels like it should be possible from my understanding of how Rust works but which simply might not be. I do see there's an issue with size. Clearly rust needs to know the size of the associated type Item, but I don't see how to solve with with references/pointers in a way that's both object safe and keeps the type information.
Is it the case that:
I am missing something and it's actually possible?
This is not possible today, but it may become possible in the future?
This is not likely to be possible ever?
Update:
So I suppose a part of the issue here is that I feel like the following is a generic function that has only 1 possible implementation. You should only have to compile this once.
fn pointer_identity<T>(v: &T) -> &T {
v
}
It feels like I should be able to put this function into a vtable, but still somehow express that it is generic. There are a whole class of easily identifiable functions that effectively act this way. Interactions between trait objects often act this way. It's all pointers to functions where the types don't matter except to be carried forward.

I found some document ion that may answer my question.
Currently, the compiler has two concepts surrounding Generic code, only one of which seems to be surfaced in the type system (link).
It sounds like this may be possible some day, but not currently expressible with Rust.
Monomorphization
The compiler stamps out a different copy of the code of a generic function for each concrete type needed.
Polymorphization
In addition to MIR optimizations, rustc attempts to determine when fewer copies of functions are necessary and avoid making those copies - known as "polymorphization".
As a result of polymorphization, items collected during monomorphization cannot be assumed to be monomorphic.
It is intended that polymorphization be extended to more advanced cases, such as where only the size/alignment of a generic parameter are required.

Related

Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner.
I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.
I'm placing very little constraints on what an annotation can be, because I do not know what the users will want to store in the future.
The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.
User annotations are represented with a trait:
pub trait Annotation {
fn some_important_method(&self)
}
This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.
I can store a list of annotations this way:
pub struct Node {
// ...
annotations: Vec<Box<dyn Annotation>>,
}
I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:
impl Node {
fn annotations_with_type<T>(&self) -> Vec<&T>
where
T: Annotation,
{
// ??
}
}
I originally aimed to convert dyn Annotation to dyn Any, then use downcast_ref, however trait upcasting coercion is unsable.
Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.
Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.

What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.
If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:
struct MyAnnotation;
impl TypeMapKey for MyAnnotation {
type Value = String;
}
let mut map = TypeMap::new();
map.insert::<MyAnnotation>("Some Annotation");
If you are curious. It underlying uses a HashMap<TypeId, Box<(dyn Any + Send + Sync)>> to store the data. To retrieve data, it uses a downcast_ref on the Any type which is stable. This could also be a pattern to implement it yourself if needed.

You don't have to worry whether this is valid - because it doesn't compile (playground):
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:7:18
|
7 | _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn Annotation` (128 bits)
= note: target type: `&i32` (64 bits)
The error message should be clear, I hope: &dyn Trait is a fat pointer, and has size 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized), of size of only one usize, and you cannot transmute between types of different sizes.
You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:
Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.
Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, and thus transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents. It can not contain a data pointer at all (though I doubt somebody will ever do something like that). transmute_copy() copies the data from the beginning, meaning that for the code to work the data pointer should be there and should be first (which it is). For the code to be sound this needs to be guaranteed (which is not).
So what can we do? Let's check how Any does its magic:
// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }
So it uses as for the conversion. Does it work? Certainly. And that means std can do that, because std can do things that are not guaranteed and relying on how things work in practice. But we shouldn't. So, is it guaranteed?
I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.
Edit: #CAD97 pointed on Zulip that the reference promises that *[const|mut] T as *[const|mut V] where V: Sized will be a pointer-to-pointer case, and that can be read as a guarantee this will work.
But I still feel fine with relying on that. Because, unlike the transmute_copy(), people are doing it. In production. And there is no better way in stable. So the chance it will become undefined behavior is very low. It is much more likely to be defined.
Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:
#![feature(ptr_metadata)]
let v: &dyn Annotation;
let v = v as *const dyn Annotation;
let v: *const T = v.to_raw_parts().0.cast::<T>();
let v: &T = unsafe { &*v };
In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.
Last point, there may be a crate that already does that. Prefer that, if it exists.

In Rust, how to define a generic function for converting Vec<T> to Vec<U>

I need something like:
fn my_convert<T, U>(v: &Vec<U>)->Vec<T>{
v.iter().map(|&t|t).collect()
}
Of course, I suppose that only vector of builtin numeric types could be passed to the function.
The compiler tells me that I need to give the trait FromIterator for Vec
the trait std::iter::FromIterator<U> is not implemented for `std::vec::Vec
then, I added
impl<T, U> std::iter::FromIterator<U> for Vec<T>{
fn from_iter<I>(iter:I)->Self where I: IntoIterator<Item=U>{
Vec::<T>::new() // just for test
}
}
conflicting implementations of trait std::iter::FromIterator<_> for type std::vec::Vec<_>:
note: conflicting implementation in crate alloc:
- impl std::iter::FromIterator for std::vec::Vec;rustc(E0119)
As we all known, in C++, this kind of conversion is straight forward and intutive. I am a beginner of Rust. I don't know how to define a function like above. I even don't known if it is could be achieved in Rust because of the complexity of the trait for generic type.
Thanks!

The Rust From trait is your general purpose "how to get from A to B". If there's a reasonable way to go from U to T, then T: From<U> will be implemented. Consider
fn my_convert<T, U>(v: Vec<U>) -> Vec<T>
where T: From<U> {
v.into_iter().map(T::from).collect()
}
Note that I do take ownership of the v vector. the From::from function consumes its argument, so we should also consume the vector. You could write a completely general version of this function that works with arbitrary iterators, and then it'll be easier to go back and forth between iterators which own their data and those that only borrow.
fn my_convert1<T, U, I>(v: I) -> Vec<T>
where T: From<U>,
I: Iterator<Item=U> {
v.map(T::from).collect()
}
But at that point, you'd be better off just writing v.map(T::from).collect() directly, and any reasonably proficient Rust programmer will instantly know what you're doing.
One point to note here, since you've mentioned that you're a C++ programmer. In C++, template functions can simply be written as-is, and then, when the caller instantiates a version of it, the compiler checks whether it's okay or not. So your proposed function in the OP works fine, since the compiler only checks it when it's called. Rust (and indeed most languages with generics) isn't like that.
In Rust, on the other hand, when you write the my_convert function, Rust checks at that time that all of the types line up. You can't write a function that only works some of the time. If you say my_convert works for all T and U, then it had better work for all of them, and Rust's type checker and borrow checker are going to verify that immediately. If not, then you'd better do as I did above and restrict the type signature. If you've used C++20 at all, you can think of traits as being somewhat similar to the "concepts" feature in C++20.

How can I simplify multiple uses of BigInt::from()?

I wrote a program where I manipulated a lot of BigInt and BigUint values and perform some arithmetic operations.
I produced code where I frequently used BigInt::from(Xu8) because it is not possible to directly add numbers from different types (if I understand correctly).
I want to reduce the number of BigInt::from in my code. I thought about a function to "wrap" this, but I would need a function for each type I want to convert into BigInt/BigUint:
fn short_name(n: X) -> BigInt {
return BigInt::from(n)
}
Where X will be each type I want to convert.
I couldn't find any solution that is not in contradiction with the static typing philosophy of Rust.
I feel that I am missing something about traits, but I am not very comfortable with them, and I did not find a solution using them.
Am I trying to do something impossible in Rust? Am I missing an obvious solution?

To answer this part:
I produced code where I frequently used BigInt::from(Xu8) because it is not possible to directly add numbers from different types (if I understand correctly).
On the contrary, if you look at BigInt's documentation you'll see many impl Add:
impl<'a> Add<BigInt> for &'a u64
impl Add<u8> for BigInt
and so on. The first allows calling a_ref_to_u64 + a_bigint, the second a_bigint + an_u8 (and both set OutputType to be BigInt). You don't need to convert these types to BigInt before adding them! And if you want your method to handle any such type you just need an Add bound similar to the From bound in Frxstrem's answer. Of course if you want many such operations, From may end up more readable.

The From<T> trait (and the complementary Into<T> trait) is what is typically used to convert between types in Rust. In fact, the BigInt::from method comes from the From trait.
You can modify your short_name function into a generic function with a where clause to accept all types that BigInt can be converted from:
fn short_name<T>(n: T) -> BigInt // function with generic type T
where
BigInt: From<T>, // where BigInt implements the From<T> trait
{
BigInt::from(n)
}

Differences generic trait-bounded method vs 'direct' trait method

I have this code:
fn main() {
let p = Person;
let r = &p as &dyn Eatable;
Consumer::consume(r);
// Compile error
Consumer::consume_generic(r);
}
trait Eatable {}
struct Person;
impl Eatable for Person {}
struct Consumer;
impl Consumer {
fn consume(eatable: &dyn Eatable) {}
fn consume_generic<T: Eatable>(eatable: &T) {}
}
Error:
the size for values of type dyn Eatable cannot be known at
compilation time
I think it is strange. I have a method that literally takes a dyn Eatable and compiles fine, so that method knows somehow the size of Eatable. The generic method (consume_generic) will properly compile down for every used type for performance and the consume method will not.
So a few questions arise: why the compiler error? Are there things inside the body of the methods in which I can do something which I can not do in the other method? When should I prefer the one over the other?
Sidenote: I asked this question for the language Swift as well: Differences generic protocol type parameter vs direct protocol type. In Swift I get the same compile error but the underlying error is different: protocols/traits do not conform to themselves (because Swift protocols can holds initializers, static things etc. which makes it harder to generically reference them). I also tried it in Java, I believe the generic type is erased and it makes absolutely no difference.

The problem is not with the functions themselves, but with the trait bounds on types.
Every generic types in Rust has an implicit Sized bound: since this is correct in the majority of cases, it was decided not to force the developer to write this out every time. But, if you are using this type only behind some kind of reference, as you do here, you may want to lift this restriction by specifying T: ?Sized. If you add this, your code will compile fine:
impl Consumer {
fn consume(eatable: &dyn Eatable) {}
fn consume_generic<T: Eatable + ?Sized>(eatable: &T) {}
}
Playground as a proof
As for the other questions, the main difference is in static vs dynamic dispatch.
When you use the generic function (or the semantically equivalent impl Trait syntax), the function calls are dispatched statically. That is, for every type of argument you pass to the function, compiler generates the definition independently of others. This will likely result in more optimized code in most cases, but the drawbacks are possibly larger binary size and some limitations in API (e.g. you can't easily create a heterogeneous collection this way).
When you use dyn Trait syntax, you opt in for dynamic dispatch. The necessary data will be stored into the table attached to trait object, and the correct implementation for every trait method will be chosen at runtime. The consumer, however, needs to be compiled only once. This is usually slower, both due to the indirection and to the fact that individual optimizations are impossible, but more flexible.
As for the recommendations (note that this is an opinion, not the fact) - I'd say it's better to stick to generics whenever possible and only change it to trait objects if the goal is impossible to achieve otherwise.

How to idiomatically require a trait to retrieve a slice

I'm working on a library that operates on [T] slices. I would like the library user to specify a type that can retrieve a slice in a mutable or immutable way.
My current lib defines 2 traits
pub trait SliceWrapper<T> {
fn slice(&self) -> &[T];
}
pub trait SliceWrapperMut<T> {
fn slice_mut (&mut self) -> &mut [T];
}
But the names slice and slice_mut seem arbitrary and not somewhere in the core Rust libs.
Is there a trait I should be requiring instead, like Into::<&mut [T]> ?
I'm afraid that Into consumes the type rather than just referencing it, so I can't simply demand the caller type implements core::convert::Into, can I?

The simplest way to answer this is to look at the documentation for existing types that more or less do what you want. This sounds like something Vec would do, so why not look at the documentation for Vec?
If you search through it looking for -> &, you'll find numerous methods that return borrowed slices:
Vec::as_slice
Borrow::borrow
Index::<Range<usize>>::index (plus for RangeTo<usize>, RangeFrom<usize>, RangeFull, RangeInclusive<usize>, and RangeToInclusive<usize>)
Deref::deref
AsRef::<[T]>::as_ref
Which of these should you implement? I don't know; which ones make sense? Read the documentation on each method (and the associated trait) and see if what it describes is what you want to permit. You say "that can retrieve a slice", but you don't really explain what that means. A big part of traits is not just abstracting out common interfaces, but giving those interfaces meaning beyond what the code strictly allows.
If the methods listed aren't right; if none of them quite convey the correct semantics, then make a new trait. Don't feel compelled to implement a trait just because you technically can.
As for Into, again, read the documentation:
A conversion that consumes self, which may or may not be expensive.
Emphasis mine. Implementing Into in this context makes no sense: if you consume the value, you can't borrow from it. Oh, also:
Library authors should not directly implement this trait, but should prefer implementing the From trait, which offers greater flexibility and provides an equivalent Into implementation for free, thanks to a blanket implementation in the standard library.
So yeah, I wouldn't use Into for this. Or From.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string