How to handle generic types with different concrete types in Rust efficiently?

The main goal is to implement a computation graph that handles nodes with values and nodes with operators (think of simple arithmetic operators like add, subtract, multiply, etc.). An operator node can take up to two value nodes, and "produces" a resulting value node.
Up to now, I'm using an enum to differentiate between a value and operator node:
pub enum Node<'a, T> where T : Copy + Clone {
Value(ValueNode<'a, T>),
Operator(OperatorNode)
}
pub struct ValueNode<'a, T> {
id: usize,
value_object : &'a dyn ValueType<T>
}
Update: Node::Value contains a struct, which itself contains a reference to a trait object ValueType, which is being implemented by a variety of concrete types.
But here comes the problem. During compilation, the generic types are monomorphized, i.e. replaced by the actual concrete types. The generic type T is also propagated throughout the computation graph (obviously):
pub struct ComputationGraph<'a, T> where T : Copy + Clone {
    nodes: Vec<Node<'a, T>>
}
This actually restricts the usage of ComputeGraph to one specific ValueType.
Furthermore, the generic type T cannot be Sized, since a value node can be an opaque type handled by a different backend not available through Rust (think of C opaque types made available through FFI).
One possible solution to this problem would be to introduce an additional enum type that "mirrors" the concrete implementations of the ValueType trait mentioned above. This approach would be similar to what enum dispatch does.
Is there anything I haven't thought of to use multiple implementations of ValueType?
update:
What I want to achieve is the following code:
pub struct Scalar<T> where T : Copy + Clone{
data : T
}
fn main() {
let cg = ComputeGraph::new();
// a new scalar type. doesn't have to be a tuple struct
let a = Scalar::new::<f32>(1.0);
let b_size = 32;
let b = Container::new::<opaque_type>(32);
let op = OperatorAdd::new();
// cg.insert_operator_node constructs four nodes: 3 value nodes
// and one operator nodes internally.
let result = cg.insert_operator_node::<Container>(&op, &a, &b);
}
update
ValueType<T> looks like this
pub trait ValueType<T> {
fn get_size(&self) -> usize;
fn get_value(&self) -> T;
}
update
To further increase the clarity of my question think of a small BLAS library backed by OpenCL. The memory management and device interaction shall be transparent to the user. A Matrix type allocates space on an OpenCL device with types as a primitive type buffer, and the appropriate call will return a pointer to that specific region of memory. Think of an operation that will scale the matrix by a scalar type, that is being represented by a primitive value. Both the (pointer to the) buffer and the scalar can be passed to a kernel function. Going back to the ComputeGraph, it seems obvious, that all BLAS operations form some type of computational graph, which can be reduced to a linear list of instructions ( think here of setting kernel arguments, allocating buffers, enqueue the kernel, storing the result, etc... ). Having said all that, a computation graph needs to be able to store value nodes with a variety of types.

As always the answer to the problem posed in the question is obvious. The graph expects one generic type (with trait bounds). Using an enum to "cluster" various subtypes was the solution, as already sketched out in the question.
An example to illustrate the solution. Consider following "subtypes":
struct Buffer<T> {
// fields
}
struct Scalar<T> {
// fields
}
struct Kernel {
// fields
}
The value containing types can be packed into an enum:
enum MemType {
    Buffer(Buffer<f32>),
    Scalar(Scalar<f32>),
    // more enum variants ..
}
MemType and Kernel can now be packed in an enum as well:
enum Node {
    Value(MemType),
    Operator(Kernel),
}
Node can now be used as the main type for nodes/vertices inside the graph. The solution might not be very elegant, but it does the trick for now. Maybe some code restructuring might be done in the future.
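To make the nesting concrete, here is a minimal sketch of how such a graph could store its nodes. The field layouts, the `Kernel` variants, and the `ComputeGraph::insert` / `value_count` helpers are assumptions made for illustration; the answer above leaves them unspecified.

```rust
// Hypothetical stand-ins for the answer's types; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Buffer<T> {
    data: Vec<T>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Scalar<T> {
    data: T,
}

// Assumed here to be a plain enum of operations.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kernel {
    Add,
    Scale,
}

#[derive(Debug, Clone, PartialEq)]
enum MemType {
    Buffer(Buffer<f32>),
    Scalar(Scalar<f32>),
}

#[derive(Debug, Clone, PartialEq)]
enum Node {
    Value(MemType),
    Operator(Kernel),
}

// The graph is no longer generic: every node has the single type `Node`.
#[derive(Default)]
struct ComputeGraph {
    nodes: Vec<Node>,
}

impl ComputeGraph {
    // Stores a node and returns its index, which serves as the node id.
    fn insert(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    // Counts the value nodes by matching on the outer enum.
    fn value_count(&self) -> usize {
        self.nodes
            .iter()
            .filter(|n| matches!(n, Node::Value(_)))
            .count()
    }
}

fn build_example() -> ComputeGraph {
    let mut cg = ComputeGraph::default();
    cg.insert(Node::Value(MemType::Scalar(Scalar { data: 2.0 })));
    cg.insert(Node::Value(MemType::Buffer(Buffer { data: vec![1.0, 2.0] })));
    cg.insert(Node::Operator(Kernel::Scale));
    cg
}

fn main() {
    let cg = build_example();
    println!("{} nodes, {} value nodes", cg.nodes.len(), cg.value_count());
}
```

The trade-off is the one already noted: adding a new value type means adding a variant to `MemType` and revisiting every `match` on it.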

Related

rust, how to run entities' original method when accessing it from a wrapped Box [duplicate]

How can I downcast a trait to a struct like in this C# example?
I have a base trait and several derived structs that must be pushed into a single vector of base traits.
I have to check if each item of the vector is castable to a specific derived struct and, if so, use it as a struct of that type.
This is my Rust code, I don't know how to implement the commented part.
trait Base {
fn base_method(&self);
}
struct Derived1;
impl Derived1 {
pub fn derived1_method(&self) {
println!("Derived1");
}
}
impl Base for Derived1 {
fn base_method(&self) {
println!("Base Derived1");
}
}
struct Derived2;
impl Derived2 {
pub fn derived2_method(&self) {
println!("Derived2");
}
}
impl Base for Derived2 {
fn base_method(&self) {
println!("Base Derived2");
}
}
fn main() {
let mut bases = Vec::<Box<dyn Base>>::new();
let der1 = Derived1{};
let der2 = Derived2{};
bases.push(Box::new(der1));
bases.push(Box::new(der2));
for b in bases {
b.base_method();
//if (b is Derived1)
// (b as Derived1).derived1_method();
//else if (b is Derived2)
// (b as Derived2).derived2_method();
}
}
Technically you can use as_any, as explained in this answer:
How to get a reference to a concrete type from a trait object?
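For reference, a minimal sketch of the as_any technique applied to the question's types might look like the following. The `as_any` method and the `describe` helper are additions for illustration, and the methods return `String` instead of printing so the result is easy to inspect.

```rust
use std::any::Any;

trait Base {
    fn base_method(&self) -> String;
    // Exposes the value as `&dyn Any` so callers can attempt a downcast.
    fn as_any(&self) -> &dyn Any;
}

struct Derived1;
impl Derived1 {
    fn derived1_method(&self) -> String {
        "Derived1".to_string()
    }
}
impl Base for Derived1 {
    fn base_method(&self) -> String {
        "Base Derived1".to_string()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

struct Derived2;
impl Derived2 {
    fn derived2_method(&self) -> String {
        "Derived2".to_string()
    }
}
impl Base for Derived2 {
    fn base_method(&self) -> String {
        "Base Derived2".to_string()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical helper: tries each concrete type in turn.
fn describe(b: &dyn Base) -> String {
    if let Some(d1) = b.as_any().downcast_ref::<Derived1>() {
        d1.derived1_method()
    } else if let Some(d2) = b.as_any().downcast_ref::<Derived2>() {
        d2.derived2_method()
    } else {
        b.base_method()
    }
}

fn main() {
    let bases: Vec<Box<dyn Base>> = vec![Box::new(Derived1), Box::new(Derived2)];
    for b in &bases {
        println!("{} / {}", b.base_method(), describe(b.as_ref()));
    }
}
```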
However, type-checking and downcasting when looping over a vector of trait objects is considered a code smell. If you put a bunch of objects into a vector and then loop over that vector, presumably the objects in that vector are supposed to play a similar role.
So then you should refactor your code such that you can call the same method on your object regardless of the underlying concrete type.
From your code, it seems you're purely checking the type (and downcasting) so that you can call the appropriate method. What you really should do, then, is introduce yet another trait that provides a unified interface that you then can call from your loop, so that the loop doesn't need to know the concrete type at all.
EDIT: Allow me to add a concrete example that highlights this, but I'm going to use Python to show this, because in Python it's very easy to do what you are asking to do, so we can then focus on why it's not the best design choice:
class Dog:
    def bark(self):
        print("Woof woof")

class Cat:
    def meow(self):
        print("Meow meow")

list_of_animals = [Dog(), Dog(), Cat(), Dog()]
for animal in list_of_animals:
    if isinstance(animal, Dog):
        animal.bark()
    elif isinstance(animal, Cat):
        animal.meow()
Here Python's dynamic typing allows us to just slap all the objects into the same list, iterate over it, and then figure out the type at runtime so we can call the right method.
But really the whole point of well-designed object oriented code is to lift that burden from the caller to the object. This type of design is very inflexible, because as soon as you add another animal, you'll have to add another branch to your if block, and you better do that everywhere you had that branching.
The solution is of course to identify the common role that both bark and meow play, and abstract that behavior into an interface. In Python of course we don't need to formally declare such an interface, so we can just slap another method in:
class Dog:
    ...
    def make_sound(self):
        self.bark()

class Cat:
    ...
    def make_sound(self):
        self.meow()

...
for animal in list_of_animals:
    animal.make_sound()
In your Rust code, you actually have two options, and the choice depends on the rest of your design. Either, as I suggested, add another trait that expresses the common role that the objects play (why put them into a vector in the first place otherwise?) and implement it for all your derived structs. Or, alternatively, express the various derived structs as variants of the same enum, and add a method to the enum that handles the dispatch. The enum is of course more closed to outside extension than the trait version, so the right solution depends on your needs.
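Both options can be sketched side by side. The trait name `Act`, the enum name `AnyDerived`, and the two `run_*` helpers are hypothetical names chosen for this illustration.

```rust
struct Derived1;
struct Derived2;

// Option 1: a trait that names the shared role of the objects.
trait Act {
    fn act(&self) -> String;
}
impl Act for Derived1 {
    fn act(&self) -> String {
        "Derived1".to_string()
    }
}
impl Act for Derived2 {
    fn act(&self) -> String {
        "Derived2".to_string()
    }
}

// Option 2: a closed set of variants with a dispatching method.
enum AnyDerived {
    One(Derived1),
    Two(Derived2),
}
impl AnyDerived {
    fn act(&self) -> String {
        match self {
            AnyDerived::One(d) => d.act(),
            AnyDerived::Two(d) => d.act(),
        }
    }
}

// The loop never inspects the concrete type.
fn run_trait_objects() -> Vec<String> {
    let items: Vec<Box<dyn Act>> = vec![Box::new(Derived1), Box::new(Derived2)];
    items.iter().map(|item| item.act()).collect()
}

fn run_enum() -> Vec<String> {
    let items = vec![AnyDerived::One(Derived1), AnyDerived::Two(Derived2)];
    items.iter().map(|item| item.act()).collect()
}

fn main() {
    println!("{:?}", run_trait_objects());
    println!("{:?}", run_enum());
}
```

With the trait, downstream code can add new implementors freely. With the enum, the compiler flags every `match` that needs updating when a variant is added.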

Best way to model storing an iterator for a vector inside same struct

Context
I'm a beginner, and, at a high level, what I want to do is store some mutable state (to power a state machine) in a struct with the following constraints
Mutating the state doesn't require the entire struct to be mut (since I'd have to update a ton of callsites to be mut + I don't want every field to be mutable)
The state is represented as an enum, and can, in the right state, store a way to index into the correct position in a vec that's in the same struct
I came up with two different approaches/examples that seem quite complicated and I want to see if there's a way to simplify. Here are some playgrounds that minimally reproduce what I'm exploring
Using a Cell and a usize
use std::cell::Cell;

#[derive(Clone, Copy)]
enum S {
A,
B(usize),
}
struct Test {
a: Vec<i32>,
b: Cell<S>,
}
where usage looks like this
println!("{}", t.a[index]);
t.b.set(S::B(index + 1));
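A self-contained sketch of this first approach, assuming a hypothetical `step` method that advances the state through a shared reference, could look like this:

```rust
use std::cell::Cell;

#[derive(Clone, Copy, Debug, PartialEq)]
enum S {
    A,
    B(usize),
}

struct Test {
    a: Vec<i32>,
    b: Cell<S>,
}

impl Test {
    fn new(a: Vec<i32>) -> Self {
        Test { a, b: Cell::new(S::A) }
    }

    // Advances the state machine through `&self`.
    // Returns the element visited in this step, if any.
    fn step(&self) -> Option<i32> {
        match self.b.get() {
            S::A => {
                // Enter the indexed state; nothing is visited yet.
                self.b.set(S::B(0));
                None
            }
            S::B(index) => {
                let value = self.a.get(index).copied();
                if value.is_some() {
                    self.b.set(S::B(index + 1));
                }
                value
            }
        }
    }
}

fn main() {
    let t = Test::new(vec![10, 20]);
    while t.b.get() == S::A || t.a.len() > 0 {
        match t.step() {
            Some(v) => println!("{}", v),
            None if t.b.get() == S::B(0) => continue,
            None => break,
        }
    }
}
```

Because the state only stores an index, `Test` needs no lifetime parameter, and only the `Cell` field is mutable from the caller's point of view.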
Using a RefCell and an iterator
use std::cell::RefCell;
use std::slice::Iter;

enum S<'a> {
A,
B(Iter<'a, i32>),
}
struct Test<'a> {
a: Vec<i32>,
b: RefCell<S<'a>>,
}
where usage looks like this
println!("{}", iter.next().unwrap());
Questions
Is there a better way to model this in general vs. what I've tried?
I like approach #2 with the iterator better in theory since it feels cleaner, but I don't like how it introduces explicit lifetime annotations into the struct...in the actual codebase I'm working on, I'd need to update a ton of callsites to add the lifetime annotation and the tiny bit of convenience doesn't seem worth it. Is there some way to do #2 without introducing lifetimes?

Pass struct generic type to trait generic method [duplicate]

In this question, an issue arose that could be solved by changing an attempt at using a generic type parameter into an associated type. That prompted the question "Why is an associated type more appropriate here?", which made me want to know more.
The RFC that introduced associated types says:
This RFC clarifies trait matching by:
Treating all trait type parameters as input types, and
Providing associated types, which are output types.
The RFC uses a graph structure as a motivating example, and this is also used in the documentation, but I'll admit to not fully appreciating the benefits of the associated type version over the type-parameterized version. The primary thing is that the distance method doesn't need to care about the Edge type. This is nice but seems a bit shallow of a reason for having associated types at all.
I've found associated types to be pretty intuitive to use in practice, but I find myself struggling when deciding where and when I should use them in my own API.
When writing code, when should I choose an associated type over a generic type parameter, and when should I do the opposite?
This is now touched on in the second edition of The Rust Programming Language. However, let's dive in a bit in addition.
Let us start with a simpler example.
When is it appropriate to use a trait method?
There are multiple ways to provide late binding:
trait MyTrait {
    fn hello_world(&self) -> String;
}
Or:
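struct MyTrait<T> {
    t: T,
    hello_world: fn(&T) -> String,
}
impl<T> MyTrait<T> {
    fn new(t: T, hello_world: fn(&T) -> String) -> MyTrait<T> {
        MyTrait { t, hello_world }
    }
    fn hello_world(&self) -> String {
        // Call the stored function pointer with a reference to the stored value.
        (self.hello_world)(&self.t)
    }
}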
Disregarding any implementation/performance strategy, both excerpts above allow the user to specify in a dynamic manner how hello_world should behave.
The one difference (semantically) is that the trait implementation guarantees that for a given type T implementing the trait, hello_world will always have the same behavior whereas the struct implementation allows having a different behavior on a per instance basis.
Whether using a method is appropriate or not depends on the usecase!
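The semantic difference can be seen in a small sketch. The names `Greeter`, `English`, `PerInstance`, `shout`, and `whisper` are made up for this illustration; the two excerpts above both use the name MyTrait, which cannot coexist in one file.

```rust
// Trait version: behavior is fixed per implementing type.
trait Greeter {
    fn hello_world(&self) -> String;
}

struct English;
impl Greeter for English {
    fn hello_world(&self) -> String {
        "Hello, world!".to_string()
    }
}

// Struct version: behavior is chosen per instance via a function pointer.
struct PerInstance<T> {
    t: T,
    hello_world: fn(&T) -> String,
}

impl<T> PerInstance<T> {
    fn new(t: T, hello_world: fn(&T) -> String) -> Self {
        PerInstance { t, hello_world }
    }
    fn hello_world(&self) -> String {
        (self.hello_world)(&self.t)
    }
}

fn shout(name: &String) -> String {
    format!("HELLO, {}!", name.to_uppercase())
}

fn whisper(name: &String) -> String {
    format!("hello, {}", name.to_lowercase())
}

fn main() {
    // Every `English` value greets the same way.
    println!("{}", English.hello_world());

    // Two values of the same type `PerInstance<String>` behave differently.
    let loud = PerInstance::new("Ann".to_string(), shout);
    let quiet = PerInstance::new("Ann".to_string(), whisper);
    println!("{}", loud.hello_world());
    println!("{}", quiet.hello_world());
}
```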
When is it appropriate to use an associated type?
Similarly to the trait methods above, an associated type is a form of late binding (though it occurs at compilation), allowing the user of the trait to specify for a given instance which type to substitute. It is not the only way (thus the question):
trait MyTrait {
type Return;
fn hello_world(&self) -> Self::Return;
}
Or:
trait MyTrait<Return> {
    fn hello_world(&self) -> Return;
}
These are equivalent to the late binding of methods above:
the first one enforces that for a given Self there is a single Return associated
the second one, instead, allows implementing MyTrait for Self for multiple Return
Which form is more appropriate depends on whether it makes sense to enforce unicity or not. For example:
Deref uses an associated type because without unicity the compiler would go mad during inference
Add uses an associated type because its author thought that given the two arguments there would be a logical return type
As you can see, while Deref is an obvious usecase (technical constraint), the case of Add is less clear cut: maybe it would make sense for i32 + i32 to yield either i32 or Complex<i32> depending on the context? Nonetheless, the author exercised their judgment and decided that overloading the return type for additions was unnecessary.
My personal stance is that there is no right answer. Still, beyond the unicity argument, I would mention that associated types make using the trait easier as they decrease the number of parameters that have to be specified, so in case the benefits of the flexibility of using a regular trait parameter are not obvious, I suggest starting with an associated type.
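As a rough illustration of the uniqueness argument, the sketch below implements one trait of each kind for the same type. `Describe`, `ConvertTo`, and `Meters` are hypothetical names, not standard library items.

```rust
// Associated type: each implementor picks exactly one output type.
trait Describe {
    type Output;
    fn describe(&self) -> Self::Output;
}

// Generic parameter: the same type may implement the trait once per `Out`.
trait ConvertTo<Out> {
    fn convert(&self) -> Out;
}

struct Meters(f64);

impl Describe for Meters {
    type Output = String;
    fn describe(&self) -> String {
        format!("{} m", self.0)
    }
}

impl ConvertTo<f64> for Meters {
    fn convert(&self) -> f64 {
        self.0
    }
}

impl ConvertTo<i64> for Meters {
    fn convert(&self) -> i64 {
        self.0.round() as i64
    }
}

fn main() {
    let m = Meters(2.6);

    // No annotation needed: `Meters` has a single `Describe::Output`.
    let text = m.describe();

    // The caller must say which implementation is meant.
    let exact: f64 = m.convert();
    let rounded: i64 = m.convert();

    println!("{} {} {}", text, exact, rounded);
}
```

A second `impl Describe for Meters` with a different `Output` would be rejected as a conflicting implementation, which is exactly the uniqueness guarantee the compiler relies on during inference.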
Associated types are a grouping mechanism, so they should be used when it makes sense to group types together.
The Graph trait introduced in the documentation is an example of this. You want a Graph to be generic, but once you have a specific kind of Graph, you don't want the Node or Edge types to vary anymore. A particular Graph isn't going to want to vary those types within a single implementation, and in fact, wants them to always be the same. They're grouped together, or one might even say associated.
Associated types can be used to tell the compiler "these two types between these two implementations are the same". Here's a double dispatch example that compiles, and is close to how the standard library relates Iterator to Sum:
trait MySum {
type Item;
fn sum<I>(iter: I)
where
I: MyIter<Item = Self::Item>;
}
trait MyIter {
type Item;
fn next(&self) {}
fn sum<S>(self)
where
S: MySum<Item = Self::Item>;
}
struct MyU32;
impl MySum for MyU32 {
type Item = MyU32;
fn sum<I>(iter: I)
where
I: MyIter<Item = Self::Item>,
{
iter.next()
}
}
struct MyVec;
impl MyIter for MyVec {
type Item = MyU32;
fn sum<S>(self)
where
S: MySum<Item = Self::Item>,
{
S::sum::<Self>(self)
}
}
fn main() {}
Also, https://blog.thomasheartman.com/posts/on-generics-and-associated-types has some good information on this as well:
In short, use generics when you want to type A to be able to implement a trait any number of times for different type parameters, such as in the case of the From trait.
Use associated types if it makes sense for a type to only implement the trait once, such as with Iterator and Deref.

Rust Data Structure

I am currently learning Rust for fun. I have some experience in C / C++ and other experience in other programming languages that use more complex paradigms like generics.
Background
For my first project (after the tutorial), I wanted to create a N-Dimensional array (or Matrix) data structure to practice development in Rust.
Here is what I have so far for my Matrix struct and a basic fill and new initializations.
Forgive the absent bound checking and parameter testing
pub struct Matrix<'a, T> {
data: Vec<Option<T>>,
dimensions: &'a [usize],
}
impl<'a, T: Clone> Matrix<'a, T> {
pub fn fill(dimensions: &'a [usize], fill: T) -> Matrix<'a, T> {
let mut total = if dimensions.len() > 0 { 1 } else { 0 };
for dim in dimensions.iter() {
total *= dim;
}
Matrix {
data: vec![Some(fill); total],
dimensions: dimensions,
}
}
pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
...
Matrix {
data: vec![None; total],
dimensions: dimensions,
}
}
}
I wanted the ability to create an "empty" N-Dimensional array using the New fn. I thought using the Option enum would be the best way to accomplish this, as I can fill the N-Dimensional with None and it would allocate space for this T generic automatically.
So then it comes down to being able to set the entries for this. I found the IndexMut and Index traits that looked like I could do something like m[&[2, 3]] = 23. Since the logic is similar to each other here is the IndexMut impl for Matrix.
impl<'a, T> ops::IndexMut<&[usize]> for Matrix<'a, T> {
fn index_mut(&mut self, indices: &[usize]) -> &mut Self::Output {
match self.data[get_matrix_index(self.dimensions, indices)].as_mut() {
Some(x) => x,
None => {
NOT SURE WHAT TO DO HERE.
}
}
}
}
Ideally what would happen is that the value (if there) would be changed i.e.
let mut mat = Matrix::fill(&[4, 4], 0);
mat[&[2, 3]] = 23;
This would set the value from 0 to 23 (which the above fn does via returning &mut x from Some(x)). But I also want None to set the value i.e.
let mut mat = Matrix::new(&[4, 4]);
mat[&[2, 3]] = 23;
Question
Finally, is there a way to make m[&[2,3]] = 23 possible with what the Vec struct requires to allocate the memory? If not what should I change and how can I still have an array with "empty" spots. Open to any suggestions as I am trying to learn. :)
Final Thoughts
From my research into the Vec struct impls, I see that the type T has to be Sized. This could be useful to allocate the Vec with the appropriate size via vec![pointer of T that is null but of size of T; total]. But I am unsure of how to do this.
So there are a few ways to make this more similar to idiomatic Rust, but first, let's look at why the None branch doesn't make sense.
You don't show the Index definition, so I'm going to assume that Output is T, which means index_mut returns &mut T; I feel safe in that assumption. The type &mut T means a mutable reference to an initialized T, unlike pointers in C/C++, which can point to initialized or uninitialized memory. What this means is that you have to return an initialized T, which the None branch cannot do because there is no initialized value. This leads to the first of the more idiomatic ways.
Return an Option<T>
The easiest way would be to change Index::Output to be an Option<T>. This is better because the user can decide what to do if there was no value there before, and it is close to what you are actually storing. Then you can also remove the panic in your index method and allow the caller to choose what to do if there is no value. At this point, I think you can go a little further by making the structure more generic, as in the next option.
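A minimal sketch of this first option follows. It makes two assumptions that are not in the question: the dimensions are stored in an owned Vec<usize> to keep the example short, and a hypothetical row-major `flat_index` method stands in for the question's `get_matrix_index`, whose body is not shown.

```rust
use std::ops::{Index, IndexMut};

struct Matrix<T> {
    data: Vec<Option<T>>,
    dimensions: Vec<usize>,
}

impl<T> Matrix<T> {
    fn new(dimensions: &[usize]) -> Self {
        let total: usize = if dimensions.is_empty() {
            0
        } else {
            dimensions.iter().product()
        };
        Matrix {
            // Built element by element, so `T: Clone` is not required.
            data: (0..total).map(|_| None).collect(),
            dimensions: dimensions.to_vec(),
        }
    }

    // Hypothetical row-major flattening of an N-dimensional index.
    fn flat_index(&self, indices: &[usize]) -> usize {
        assert_eq!(indices.len(), self.dimensions.len());
        indices
            .iter()
            .zip(&self.dimensions)
            .fold(0, |acc, (&i, &dim)| {
                assert!(i < dim, "index out of bounds");
                acc * dim + i
            })
    }
}

impl<T> Index<&[usize]> for Matrix<T> {
    type Output = Option<T>;
    fn index(&self, indices: &[usize]) -> &Option<T> {
        &self.data[self.flat_index(indices)]
    }
}

impl<T> IndexMut<&[usize]> for Matrix<T> {
    fn index_mut(&mut self, indices: &[usize]) -> &mut Option<T> {
        let i = self.flat_index(indices);
        &mut self.data[i]
    }
}

fn main() {
    let mut mat: Matrix<i32> = Matrix::new(&[4, 4]);
    let idx: &[usize] = &[2, 3];
    // The slot is always an initialized `Option`, so assignment works
    // whether or not a value was stored before.
    mat[idx] = Some(23);
    println!("{:?}", mat[idx]);
}
```

The caller writes `Some(23)` instead of `23`, which is the price of keeping the "empty" state visible in the type.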
Store a T directly
This method allows the caller to directly choose the type that's stored rather than wrapping it in an Option. This cleans up most of your index code nicely, as you just have to access what's already stored. The main problem is now initialization: how do you represent uninitialized values? You were correct that Option is the best way to do this[1], but now the caller can decide to have this optional initialization capability by storing an Option themselves. So that means we can always store initialized Ts without losing functionality. This only really changes your new function to not fill with None values. My suggestion here is to add a bound T: Default for the new function[2]:
impl<'a, T: Default> Matrix<'a, T> {
    pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
        // Same element count as in `fill`: the product of all dimensions,
        // or 0 when no dimensions are given.
        let total: usize = if dimensions.is_empty() { 0 } else { dimensions.iter().product() };
        Matrix {
            data: (0..total).map(|_| Default::default()).collect(),
            dimensions,
        }
    }
}
This method is much more common in the Rust world and allows the caller to choose whether to allow for uninitialized values. Option<T> also implements Default for all T and returns None, so the functionality is very similar to what you have currently.
Additional Info
As you're new to Rust, there are a few comments I can make about traps that I've fallen into before. To start, your struct contains a reference to the dimensions with a lifetime. What this means is that your structs cannot exist longer than the dimension object that created them. This hasn't caused you a problem so far, as all you've been passing is statically created dimensions: dimensions that are typed into the code and stored in static memory. This gives your object a lifetime of 'static, but this won't occur if you use dynamic dimensions.
How else can you store these dimensions so that your object always has a 'static lifetime (same as no lifetime)? Since you want an N-dimensional array, stack allocation is out of the question, since stack arrays must have a length known at compile time (a const in Rust). This means you have to use the heap. This leaves two real options: Box<[usize]> or Vec<usize>. Box is just another way of saying this is on the heap, and it gives a Sized handle to values that are ?Sized. Vec is a little more self-explanatory and adds the ability to be resized at the cost of a little overhead. Either would allow your matrix object to always have a 'static lifetime.
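As a sketch of the owned-dimensions idea, assuming the Box<[usize]> variant and a hypothetical `make` helper that builds its shape at run time:

```rust
pub struct Matrix<T> {
    data: Vec<T>,
    // Owned copy of the shape: no lifetime parameter on the struct.
    dimensions: Box<[usize]>,
}

impl<T: Default> Matrix<T> {
    pub fn new(dimensions: &[usize]) -> Self {
        let total: usize = if dimensions.is_empty() {
            0
        } else {
            dimensions.iter().product()
        };
        Matrix {
            data: (0..total).map(|_| T::default()).collect(),
            // Copies the slice onto the heap.
            dimensions: dimensions.into(),
        }
    }
}

// The shape is computed at run time and dropped when this function returns,
// yet the matrix can still be returned because it owns its own copy.
fn make(rank: usize, side: usize) -> Matrix<Option<i32>> {
    let dims = vec![side; rank];
    Matrix::new(&dims)
}

fn main() {
    let m = make(3, 2);
    println!("{} elements, rank {}", m.data.len(), m.dimensions.len());
}
```

With the borrowed `&'a [usize]` field from the question, `make` would not compile, because the returned matrix would reference the local `dims`.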
1. The other way to represent this without Option<T>'s discriminant is MaybeUninit<T>, which is unsafe territory. This allows you to have a chunk of uninitialized memory big enough to hold a T and then unsafely assume it's initialized. This can cause a lot of problems and is usually not worth it, as Option is already heavily optimized: if it stores a type with a non-null pointer, it uses compiler magic to encode the discriminant in whether or not that value is a null pointer.
2. The reason this section doesn't just use vec![Default::default(); total] is that this requires T: Clone as the way this macro works the first part is called once and cloned until there are enough values. This is an extra requirement that we don't need to have so the interface is smoother without it.
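The difference in bounds can be checked with a type that is Default but not Clone. `Handle` and `filled` are hypothetical names used only for this sketch.

```rust
// A type that implements `Default` but deliberately not `Clone`.
#[derive(Default, Debug, PartialEq)]
struct Handle(u32);

fn filled<T: Default>(total: usize) -> Vec<T> {
    // `vec![T::default(); total]` would not compile with this bound,
    // because the macro clones its first argument and so needs `T: Clone`.
    (0..total).map(|_| T::default()).collect()
}

fn main() {
    let handles: Vec<Handle> = filled(3);
    println!("{:?}", handles);
}
```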

What are the differences between the multiple ways to create zero-sized structs?

I found four different ways to create a struct with no data:
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
struct C(()); // unit-valued tuple struct
struct D; // unit struct
(I'm leaving arbitrarily nested tuples that contain only ()s and single-variant enum declarations out of the question, as I understand why those shouldn't be used).
What are the differences between these four declarations? Would I use them for specific purposes, or are they interchangeable?
The book and the reference were surprisingly unhelpful. I did find this accepted RFC (clarified_adt_kinds) which goes into the differences a bit, namely that the unit struct also declares a constant value D and that the tuple structs also declare constructors B() and C(_: ()). However it doesn't offer a design guideline on why to use which.
My guess would be that when I export them with pub, there are differences in which kinds can actually be constructed outside of my module, but I found no conclusive documentation about that.
There are only two functional differences between these four definitions (and a fifth possibility I'll mention in a minute):
Syntax (the most obvious). mcarton's answer goes into more detail.
When the struct is marked pub, whether its constructor (also called struct literal syntax) is usable outside the module it's defined in.
The only one of your examples that is not directly constructible from outside the current module is C. If you try to do this, you will get an error:
mod stuff {
pub struct C(());
}
let _c = stuff::C(()); // error[E0603]: tuple struct `C` is private
This happens because the field is not marked pub; if you declare C as pub struct C(pub ()), the error goes away.
There's another possibility you didn't mention that gives a marginally more descriptive error message: a normal struct, with a zero-sized non-pub member.
mod stuff {
pub struct E {
_dummy: (),
}
}
let _e = stuff::E { _dummy: () }; // error[E0451]: field `_dummy` of struct `main::stuff::E` is private
(Again, you can make the _dummy field available outside of the module by declaring it with pub.)
Since E's constructor is only usable inside the stuff module, stuff has exclusive control over when and how values of E are created. Many structs in the standard library take advantage of this, like Box (to take an obvious example). Zero-sized types work in exactly the same way; in fact, from outside the module it's defined in, the only way you would know that an opaque type is zero-sized is by calling mem::size_of.
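A short sketch of that pattern, assuming a hypothetical `E::new` constructor as the module's single entry point:

```rust
mod stuff {
    pub struct E {
        _dummy: (),
    }

    impl E {
        // The only way for code outside `stuff` to obtain an `E`.
        pub fn new() -> E {
            E { _dummy: () }
        }
    }
}

fn size_of_e() -> usize {
    std::mem::size_of::<stuff::E>()
}

fn main() {
    let _e = stuff::E::new();
    // let _e = stuff::E { _dummy: () }; // rejected: the field is private
    println!("size of E: {}", size_of_e());
}
```

The private unit field costs nothing at run time; the type is still zero-sized.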
See also
What is an idiomatic way to create a zero-sized struct that can't be instantiated outside its crate?
Why define a struct with single private field of unit type?
struct D; // unit struct
This is the usual way for people to write a zero-sized struct.
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
These are just special cases of basic struct and tuple struct which happen to have no parameters. RFC 1506 explains the rationale for allowing those (they were not allowed before):
Permit tuple structs and tuple variants with 0 fields. This restriction is artificial and can be lifted trivially. Macro writers dealing with tuple structs/variants will be happy to get rid of this one special case.
As such, they could easily be generated by macros, but people will rarely write those on their own.
struct C(()); // unit-valued tuple struct
This is another special case of tuple struct. In Rust, () is a type just like any other type, so struct C(()); isn't much different from struct E(u32);. While the type itself isn't very useful, forbidding it would make yet another special case that would need to be handled in macros or generics (struct F<T>(T) can of course be instantiated as F<()>).
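For comparison, the four declarations from the question are constructed as follows, and all of them occupy zero bytes. The `sizes` helper is only there for illustration.

```rust
struct A {}
struct B();
struct C(());
struct D;

fn sizes() -> [usize; 4] {
    use std::mem::size_of;
    [size_of::<A>(), size_of::<B>(), size_of::<C>(), size_of::<D>()]
}

fn main() {
    // Each declaration style has its matching construction syntax.
    let _a = A {};
    let _b = B();
    let _c = C(());
    let _d = D;
    println!("{:?}", sizes());
}
```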
Note that there are many other ways to have empty types in Rust. E.g., it is possible to have a function return Result<(), !> to indicate that it doesn't produce a value and cannot fail (using ! as a type argument like this currently requires nightly Rust). While you might think that returning () in that case would be better, you might have to do that if you implement a trait that dictates you to return Result<T, E> but lets you choose T = () and E = !.

Resources