Let's say I have a trait ShowStrings with a method show_strings. Calling show_strings deeply finds Strings and prints them out (with println!()).
The trait has an accompanying derive macro that traces all the fields of a structure and generates the recursive calls.
So, for example, given struct Person { first_name: String, last_name: String, children: Vec<Person>, height: f32 }, Person::show_strings will print the first and last name and then recurse to do the same for the children.
The ShowStrings implementation on String simply prints self.
The question I have is what to do about the height: f32 field. f32 is not a string, nor does it have any fields that could contain strings. There are two options for handling ShowStrings for f32:
Either the derive macro generates a ShowStrings::show_strings(&self.height) call and f32 gets a blank implementation: impl ShowStrings for f32 { fn show_strings(&self) {} }
Or the proc macro sees the f32 type reference and decides to ignore that field.
The first method is what the gc crate does. It can be automated with a simple macro, so it is not too manual. It would mean ShowStrings gets implemented on things like () which doesn't make sense and could confuse users of the API. I would also assume Rust compiles away a call like ShowStrings::show_strings(field_with_empty_implementation), so no actual call/jump is emitted.
The second method ends up with less generated code. But type references are weak: names can be shadowed, so the f32 in the field's type may not be the type the macro assumes.
I think I will go for the first method. Are there any performance, binary-size, or other concerns around having lots of empty implementations, i.e. impl MyTrait for BuiltinType { fn func() {} }?
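For concreteness, here is a minimal sketch of what option one could look like: the trait, a helper macro for the blank impls, and a hand-written version of what the derive might expand to. All names besides ShowStrings are illustrative assumptions, not the actual derive output.

```rust
trait ShowStrings {
    fn show_strings(&self);
}

impl ShowStrings for String {
    fn show_strings(&self) {
        println!("{}", self);
    }
}

// Blanket out the "leaf" types with empty impls via a small macro.
macro_rules! empty_show_strings {
    ($($t:ty),*) => {
        $(impl ShowStrings for $t {
            fn show_strings(&self) {}
        })*
    };
}
empty_show_strings!(f32, f64, u8, u16, u32, u64, usize, bool);

impl<T: ShowStrings> ShowStrings for Vec<T> {
    fn show_strings(&self) {
        for item in self {
            item.show_strings();
        }
    }
}

struct Person {
    first_name: String,
    last_name: String,
    children: Vec<Person>,
    height: f32,
}

// Roughly what the derive would expand to for Person.
impl ShowStrings for Person {
    fn show_strings(&self) {
        self.first_name.show_strings();
        self.last_name.show_strings();
        self.children.show_strings();
        self.height.show_strings(); // empty impl, trivially inlined away
    }
}
```

Since the empty bodies are trivially inlinable, an optimized build should emit no code for the f32 call.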
Related
How can I downcast a trait to a struct like in this C# example?
I have a base trait and several derived structs that must be pushed into a single vector of base traits.
I have to check if each item of the vector is castable to a specific derived struct and, if so, use it as a struct of that type.
This is my Rust code, I don't know how to implement the commented part.
trait Base {
    fn base_method(&self);
}

struct Derived1;

impl Derived1 {
    pub fn derived1_method(&self) {
        println!("Derived1");
    }
}

impl Base for Derived1 {
    fn base_method(&self) {
        println!("Base Derived1");
    }
}

struct Derived2;

impl Derived2 {
    pub fn derived2_method(&self) {
        println!("Derived2");
    }
}

impl Base for Derived2 {
    fn base_method(&self) {
        println!("Base Derived2");
    }
}

fn main() {
    let mut bases = Vec::<Box<dyn Base>>::new();
    let der1 = Derived1{};
    let der2 = Derived2{};
    bases.push(Box::new(der1));
    bases.push(Box::new(der2));
    for b in bases {
        b.base_method();
        //if (b is Derived1)
        //    (b as Derived1).derived1_method();
        //else if (b is Derived2)
        //    (b as Derived2).derived2_method();
    }
}
Technically you can use as_any, as explained in this answer:
How to get a reference to a concrete type from a trait object?
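For reference, a minimal sketch of that as_any approach, adapted to one of the structs from the question (the as_any method is an addition to the Base trait, as the linked answer describes):

```rust
use std::any::Any;

trait Base {
    fn base_method(&self);
    // Escape hatch: expose the concrete type behind the trait object.
    fn as_any(&self) -> &dyn Any;
}

struct Derived1;

impl Derived1 {
    fn derived1_method(&self) {
        println!("Derived1");
    }
}

impl Base for Derived1 {
    fn base_method(&self) {
        println!("Base Derived1");
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let bases: Vec<Box<dyn Base>> = vec![Box::new(Derived1)];
    for b in &bases {
        b.base_method();
        // Runtime type check + downcast, like C#'s `is` / `as`.
        if let Some(d) = b.as_any().downcast_ref::<Derived1>() {
            d.derived1_method();
        }
    }
}
```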
However, type-checking and downcasting when looping over a vector of trait objects is considered a code smell. If you put a bunch of objects into a vector and then loop over that vector, presumably the objects in that vector are supposed to play a similar role.
So then you should refactor your code such that you can call the same method on your object regardless of the underlying concrete type.
From your code, it seems you're purely checking the type (and downcasting) so that you can call the appropriate method. What you really should do, then, is introduce yet another trait that provides a unified interface that you then can call from your loop, so that the loop doesn't need to know the concrete type at all.
EDIT: Allow me to add a concrete example that highlights this, but I'm going to use Python to show this, because in Python it's very easy to do what you are asking to do, so we can then focus on why it's not the best design choice:
class Dog:
    def bark(self):
        print("Woof woof")

class Cat:
    def meow(self):
        print("Meow meow")

list_of_animals = [Dog(), Dog(), Cat(), Dog()]

for animal in list_of_animals:
    if isinstance(animal, Dog):
        animal.bark()
    elif isinstance(animal, Cat):
        animal.meow()
Here Python's dynamic typing allows us to just slap all the objects into the same list, iterate over it, and then figure out the type at runtime so we can call the right method.
But really, the whole point of well-designed object-oriented code is to lift that burden from the caller onto the object. This type of design is very inflexible: as soon as you add another animal, you have to add another branch to your if block, and you'd better do that everywhere you have such branching.
The solution is of course to identify the common role that both bark and meow play, and abstract that behavior into an interface. In Python of course we don't need to formally declare such an interface, so we can just slap another method in:
class Dog:
    ...
    def make_sound(self):
        self.bark()

class Cat:
    ...
    def make_sound(self):
        self.meow()

...

for animal in list_of_animals:
    animal.make_sound()
In your Rust code you actually have two options, and which to pick depends on the rest of your design. Either, as I suggested, add another trait that expresses the common role the objects play (why else put them into a vector in the first place?) and implement it for all your derived structs. Or express the various derived structs as variants of a single enum, and add a method to the enum that handles the dispatch. The enum is more closed to outside extension than the trait version, so the right solution depends on your needs.
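For example, the trait version could look like this (MakeSound is an assumed name mirroring the Python example):

```rust
trait MakeSound {
    fn make_sound(&self);
}

struct Dog;
struct Cat;

impl MakeSound for Dog {
    fn make_sound(&self) {
        println!("Woof woof");
    }
}

impl MakeSound for Cat {
    fn make_sound(&self) {
        println!("Meow meow");
    }
}

fn main() {
    let animals: Vec<Box<dyn MakeSound>> = vec![Box::new(Dog), Box::new(Cat)];
    for animal in &animals {
        // No type checks: the vtable dispatches to the right method.
        animal.make_sound();
    }
}
```

The enum version would instead make Dog and Cat variants of a single enum and match on self inside one make_sound method.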
I am currently learning Rust for fun. I have some experience in C/C++ and experience in other languages that use more complex features like generics.
Background
For my first project (after the tutorial), I wanted to create an N-dimensional array (or matrix) data structure to practice development in Rust.
Here is what I have so far for my Matrix struct and a basic fill and new initializations.
Forgive the absent bounds checking and parameter validation.
pub struct Matrix<'a, T> {
    data: Vec<Option<T>>,
    dimensions: &'a [usize],
}

impl<'a, T: Clone> Matrix<'a, T> {
    pub fn fill(dimensions: &'a [usize], fill: T) -> Matrix<'a, T> {
        let mut total = if dimensions.len() > 0 { 1 } else { 0 };
        for dim in dimensions.iter() {
            total *= dim;
        }
        Matrix {
            data: vec![Some(fill); total],
            dimensions: dimensions,
        }
    }

    pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
        // ...
        Matrix {
            data: vec![None; total],
            dimensions: dimensions,
        }
    }
}
I wanted the ability to create an "empty" N-dimensional array using the new fn. I thought the Option enum would be the best way to accomplish this, as I can fill the N-dimensional array with None and it will allocate space for the generic T automatically.
So then it comes down to being able to set the entries. I found the IndexMut and Index traits, which looked like they would let me write something like m[&[2, 3]] = 23. Since the logic for the two is similar, here is the IndexMut impl for Matrix.
impl<'a, T> ops::IndexMut<&[usize]> for Matrix<'a, T> {
    fn index_mut(&mut self, indices: &[usize]) -> &mut Self::Output {
        match self.data[get_matrix_index(self.dimensions, indices)].as_mut() {
            Some(x) => x,
            None => {
                // NOT SURE WHAT TO DO HERE.
            }
        }
    }
}
Ideally, the value (if present) would be changed, i.e.
    let mut mat = Matrix::fill(&[4, 4], 0);
    mat[&[2, 3]] = 23;
This would set the value from 0 to 23 (which the above fn does by returning &mut x from Some(x)). But I also want None entries to accept a value, i.e.
    let mut mat = Matrix::new(&[4, 4]);
    mat[&[2, 3]] = 23;
Question
Finally, is there a way to make m[&[2, 3]] = 23 work, given what the Vec struct requires to allocate the memory? If not, what should I change, and how can I still have an array with "empty" spots? Open to any suggestions as I am trying to learn. :)
Final Thoughts
Through my research of the Vec struct impls, I see that the type T has to be Sized. This could be useful for allocating the Vec with the appropriate size, something like vec![a null pointer of T but with the size of T; total], but I am unsure how to do this.
So there are a few ways to make this more idiomatic Rust, but first let's look at why the None branch doesn't make sense.
I'm going to assume the Output type is T, so that index_mut returns &mut T; you don't show your Index definition, but I feel safe in that assumption. The type &mut T means a mutable reference to an initialized T, unlike pointers in C/C++, which can point to initialized or uninitialized memory. This means you have to return an initialized T, which the None branch cannot do because there is no initialized value there. This leads to the first of the more idiomatic options.
Return an Option<T>
The easiest way would be to change Index::Output to Option<T>. This is better because the user can decide what to do if there was no value there before, and it is closer to what you are actually storing. You can then also remove the panic from your index method and let the caller choose what to do when there is no value. At this point, I think you can go a little further by gentrifying the structure, as in the next option.
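A minimal, self-contained sketch of this option, with a stand-in get_matrix_index helper since the question's version isn't shown:

```rust
use std::ops;

pub struct Matrix<'a, T> {
    data: Vec<Option<T>>,
    dimensions: &'a [usize],
}

// Assumed row-major flattening helper; bounds checks elided as in the question.
fn get_matrix_index(dimensions: &[usize], indices: &[usize]) -> usize {
    indices
        .iter()
        .zip(dimensions)
        .fold(0, |acc, (&i, &dim)| acc * dim + i)
}

impl<'a, 'b, T> ops::Index<&'b [usize]> for Matrix<'a, T> {
    type Output = Option<T>;

    fn index(&self, indices: &'b [usize]) -> &Option<T> {
        &self.data[get_matrix_index(self.dimensions, indices)]
    }
}

impl<'a, 'b, T> ops::IndexMut<&'b [usize]> for Matrix<'a, T> {
    fn index_mut(&mut self, indices: &'b [usize]) -> &mut Option<T> {
        let i = get_matrix_index(self.dimensions, indices);
        &mut self.data[i]
    }
}
```

The caller then writes mat[idx] = Some(23) and can match on mat[idx] to decide what a None means.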
Store a T directly
This method stores a T directly rather than wrapping it in an Option, letting the caller decide exactly what is stored. This cleans up most of your index code nicely, as you just access what's already stored. The main problem is now initialization: how do you represent uninitialized values? You were correct that Option is the best way to do this1, but now the caller can opt in to that capability by storing an Option themselves. That means we can always store initialized Ts without losing functionality. This only really changes your new function to not fill with None values. My suggestion here is to add a T: Default bound for the new function2:
impl<'a, T: Default> Matrix<'a, T> {
    pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
        let total = if dimensions.is_empty() {
            0
        } else {
            dimensions.iter().product()
        };
        Matrix {
            data: (0..total).map(|_| T::default()).collect(),
            dimensions: dimensions,
        }
    }
}
This method is much more common in the Rust world and allows the caller to choose whether to allow uninitialized values. Option<T> also implements Default for all T, returning None, so the functionality stays very similar to what you have currently.
Additional Info
As you're new to Rust, there are a few comments I can make about traps I've fallen into before. To start, your struct contains a reference to the dimensions, with a lifetime. This means your structs cannot outlive the dimensions object that created them. This hasn't caused you a problem so far because you've only been passing statically created dimensions: dimensions that are typed into the code and stored in static memory, which gives your object a 'static lifetime. That won't hold once you use dynamic dimensions.
How else can you store these dimensions so that your object always has a 'static lifetime (the same as no lifetime)? Since you want an N-dimensional array, stack allocation is out of the question, because stack arrays must have sizes known at compile time (const in Rust). That means you have to use the heap, which leaves two real options: Box<[usize]> or Vec<usize>. Box is just another way of saying "this lives on the heap" and makes a ?Sized value Sized. Vec is a little more self-explanatory and adds the ability to be resized, at the cost of a little overhead. Either would allow your matrix object to always have a 'static lifetime.
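A sketch of the same struct with owned dimensions, here using Box<[usize]>, so the lifetime parameter disappears entirely:

```rust
pub struct Matrix<T> {
    data: Vec<Option<T>>,
    dimensions: Box<[usize]>,
}

impl<T: Clone> Matrix<T> {
    pub fn fill(dimensions: &[usize], fill: T) -> Matrix<T> {
        let total = if dimensions.is_empty() {
            0
        } else {
            dimensions.iter().product::<usize>()
        };
        Matrix {
            data: vec![Some(fill); total],
            // Copy the dimensions onto the heap; the Matrix now owns them.
            dimensions: dimensions.to_vec().into_boxed_slice(),
        }
    }
}
```

Matrix<T> can now be returned from functions that compute dimensions at runtime, with no borrow tying it to the caller.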
1. The other way to represent this without Option<T>'s discriminant is MaybeUninit<T>, which is unsafe territory. It allows you to have a chunk of memory big enough to hold a T and then unsafely assume it's initialized. This can cause a lot of problems and is usually not worth it, as Option is already heavily optimized: if it stores a type containing a pointer, the compiler uses that niche to store the discriminant in whether or not that pointer is null.
2. The reason this section doesn't just use vec![Default::default(); total] is that the macro requires T: Clone: the first element is constructed once and cloned until there are enough values. That's an extra requirement we don't need, so the interface is smoother without it.
I'm implementing an object that owns several resources created from C libraries through FFI. In order to clean up what's already been done if the constructor panics, I'm wrapping each resource in its own struct and implementing Drop for them. However, when it comes to dropping the object itself, I cannot guarantee that resources will be dropped in a safe order because Rust doesn't define the order that a struct's fields are dropped.
Normally, you would solve this by making it so the object doesn't own the resources but rather borrows them (so that the resources may borrow each other). In effect, this pushes the problem up to the calling code, where the drop order is well defined and enforced with the semantics of borrowing. But this is inappropriate for my use case and in general a bit of a cop-out.
What's infuriating is that this would be incredibly easy if drop took self instead of &mut self for some reason. Then I could just call std::mem::drop in my desired order.
Is there any way to do this? If not, is there any way to clean up in the event of a constructor panic without manually catching and repanicking?
You can specify drop order of your struct fields in two ways:
Implicitly
I wrote RFC 1857 specifying drop order and it was merged 2017/07/03! According to the RFC, struct fields are dropped in the same order as they are declared.
You can check this by running the example below
struct PrintDrop(&'static str);

impl Drop for PrintDrop {
    fn drop(&mut self) {
        println!("Dropping {}", self.0)
    }
}

struct Foo {
    x: PrintDrop,
    y: PrintDrop,
    z: PrintDrop,
}

fn main() {
    let foo = Foo {
        x: PrintDrop("x"),
        y: PrintDrop("y"),
        z: PrintDrop("z"),
    };
}
The output should be:
Dropping x
Dropping y
Dropping z
Explicitly
RFC 1860 introduces the ManuallyDrop type, which wraps another type and disables its destructor. The idea is that you can manually drop the object by calling a special function (ManuallyDrop::drop). This function is unsafe, since the wrapped value is left in a logically uninitialized state after it is dropped.
You can use ManuallyDrop to explicitly specify the drop order of your fields in the destructor of your type:
#![feature(manually_drop)]

use std::mem::ManuallyDrop;

struct Foo {
    x: ManuallyDrop<String>,
    y: ManuallyDrop<String>,
}

impl Drop for Foo {
    fn drop(&mut self) {
        // Drop in reverse order!
        unsafe {
            ManuallyDrop::drop(&mut self.y);
            ManuallyDrop::drop(&mut self.x);
        }
    }
}

fn main() {
    Foo {
        x: ManuallyDrop::new("x".into()),
        y: ManuallyDrop::new("y".into()),
    };
}
If you need this behavior without being able to use either of the newer methods, keep on reading...
The issue with drop
The drop method cannot take its parameter by value, since the parameter would be dropped again at the end of the scope. This would result in infinite recursion for all destructors of the language.
A possible solution/workaround
A pattern that I have seen in some codebases is to wrap the values that are being dropped in an Option<T>. Then, in the destructor, you can replace each option with None and drop the resulting value in the right order.
For instance, in the scoped-threadpool crate, the Pool object contains threads and a sender that will schedule new work. In order to join the threads correctly upon dropping, the sender should be dropped first and the threads second.
pub struct Pool {
    threads: Vec<ThreadData>,
    job_sender: Option<Sender<Message>>,
}

impl Drop for Pool {
    fn drop(&mut self) {
        // By setting job_sender to `None`, the job_sender is dropped first.
        self.job_sender = None;
    }
}
A note on ergonomics
Of course, doing things this way is more of a workaround than a proper solution. Also, if the optimizer cannot prove that the option will always be Some, you now have an extra branch for each access to your struct field.
Fortunately, nothing prevents a future version of Rust from implementing a feature that allows specifying drop order. It would probably require an RFC, but it seems certainly doable. There is an ongoing discussion on the issue tracker about specifying drop order for the language, though it has been inactive in recent months.
A note on safety
If destroying your structs in the wrong order is unsafe, you should probably consider making their constructors unsafe and document this fact (in case you haven't done that already). Otherwise it would be possible to trigger unsafe behavior just by creating the structs and letting them fall out of scope.
Consider the following two structs:
pub struct BitVector<S: BitStorage> {
    data: Vec<S>,
    capacity: usize,
    storage_size: usize,
}

pub struct BitSlice<'a, S: BitStorage> {
    data: &'a [S],
    storage_size: usize,
}
Where BitStorage is practically a type that is restricted to all unsigned integers (u8, u16, u32, u64, usize).
How to implement the Deref trait? (BitVector<S> derefs to BitSlice<S> similar to how Vec<S> derefs to &[S])
I have tried the following (note that it doesn't compile, due to lifetime issues but more importantly because I try to return a reference to a value created on the stack):
impl<'b, S: BitStorage> Deref for BitVector<S> {
    type Target = BitSlice<'b, S>;

    fn deref<'a>(&'a self) -> &'a BitSlice<'b, S> {
        let slice = BitSlice {
            data: self.data,
            storage_size: self.storage_size,
        };
        &slice
    }
}
I am aware that it is possible to return a field of a struct by reference, so for example I could return &Vec<S> or &usize in the Deref trait, but is it possible to return a BitSlice noting that I essentially have all the data in the BitVector already as Vec<S> can be transformed into &[S] and storage_size is already there?
I would think this is possible if I could create a struct using both values and somehow tell the compiler to ignore the fact that it is a struct that is created on the stack and instead just use the existing values, but I have got no clue how.
Deref is required to return a reference. A reference always points to some existing memory, and any local variable will not exist long enough. While there are, in theory, some sick tricks you could play to create a new object in deref and return a reference to it, all that I'm aware of result in a memory leak. Let's ignore these technicalities and just say it's plain impossible.
Now what? You'll have to change your API. Vec can implement Deref because it derefs to [T], not to &[T] or anything like that. You may have success with the same strategy: make BitSlice<S> an unsized type containing only a slice [S], so that the return type is &'a BitSlice<S>. This assumes the storage_size member is not needed. But it seems that it refers to the number of bits that are logically valid (i.e., bits that can be accessed without extending the bit vector); if so, that member seems unavoidable1.
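A sketch of that strategy, dropping storage_size and the BitStorage bound for brevity. The pointer cast mirrors the trick std uses for Path wrapping OsStr; take it as illustrative rather than a guaranteed-layout implementation:

```rust
use std::ops::Deref;

pub struct BitVector<S> {
    data: Vec<S>,
}

// The only field is a bare slice, so BitSlice<S> is unsized and
// &BitSlice<S> is a fat pointer, just like &[S].
pub struct BitSlice<S> {
    data: [S],
}

impl<S> BitSlice<S> {
    fn from_slice(slice: &[S]) -> &BitSlice<S> {
        // Reinterpret &[S] as &BitSlice<S>; works because BitSlice is a
        // single-field wrapper around [S] (the same pattern std uses for
        // Path/OsStr).
        unsafe { &*(slice as *const [S] as *const BitSlice<S>) }
    }

    pub fn storage_len(&self) -> usize {
        self.data.len()
    }
}

impl<S> Deref for BitVector<S> {
    type Target = BitSlice<S>;

    fn deref(&self) -> &BitSlice<S> {
        BitSlice::from_slice(&self.data)
    }
}
```

Methods defined on BitSlice<S> then become callable on BitVector<S> through auto-deref, exactly like slice methods on Vec.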
The other alternative, of course, is to not implement a Deref. Inconvenient, but if your slice data type is too far from an actual slice, it may be the only option.
There is RFC PR #1524, which proposed custom dynamically-sized types; with that, you could have a type BitSlice<S> that is like a slice but can have additional contents such as storage_size. However, this doesn't exist yet, and it's far from certain it ever will.
1 The capacity member on BitVector, however, seems pointless. Isn't that just sizeof S * 8?
Forgive me if this is a dumb question, but I'm new to Rust, and having a hard time writing this toy program to test my understanding.
I want a function that given a string, returns the first word in each line, as an iterator (because the input could be huge, I don't want to buffer the result as an array). Here's the program I wrote which collects the result as an array first:
fn get_first_words(input: ~str) -> ~[&str] {
    return input.lines_any().filter_map(|x| x.split_str(" ").nth(0)).collect();
}

fn main() {
    let s = ~"Hello World\nFoo Bar";
    let words = get_first_words(s);
    for word in words.iter() {
        println!("{}", word);
    }
}
Result (as expected):
Hello
Foo
How do I modify this to return an Iterator instead? I'm apparently not allowed to make Iterator<&str> the return type. If I try @Iterator<&str>, rustc says
error: The managed box syntax is being replaced by the `std::gc::Gc` and `std::rc::Rc` types. Equivalent functionality to managed trait objects will be implemented but is currently missing.
I can't figure out for the life of me how to make that work.
Similarly, trying to return ~Iterator<&str> makes rustc complain that the actual type is std::iter::FilterMap<....blah...>.
In C# this is really easy, as you simply return the result of the equivalent map call as an IEnumerable<string>. Then the callee doesn't have to know what the actual type is that's returned, it only uses methods available in the IEnumerable interface.
Is there nothing like returning an interface in Rust??
(I'm using Rust 0.10)
I believe that the equivalent of the C# example would be returning ~Iterator<&str>. This can be done, but must be written explicitly: rather than returning x, return ~x as ~Iterator<&'a str>. (By the way, your function is going to have to take &'a str rather than ~str—if you don’t know why, ask and I’ll explain.)
This is not, however, idiomatic Rust because it is needlessly inefficient. The idiomatic Rust is to list the return type explicitly. You can specify it in one place like this if you like:
use std::iter::{FilterMap, Map};
use std::str::CharSplits;

type Foo<'a> = FilterMap<'a, &'a str, &'a str,
                         Map<'a, &'a str, &'a str,
                             CharSplits<'a, char>>>;
And then list Foo<'a> as the return type.
Yes, this is cumbersome. At present, there is no such thing as inferring a return type in any way. This has, however, been discussed and I believe it likely that it will come eventually in some syntax similar to fn foo<'a>(&'a str) -> Iterator<&'a str>. For now, though, there is no fancy sugar.