Extend Existing Data Structure Via an External Library - rust

How does one take an existing data structure (vec, hashmap, set) and extend the methods on it via an external library?
fn main() {
let vec = vec![1, 2, 3];
vec.my_new_method(...)
}

You can take advantage of the fact that a crate defining a trait can implement that trait on whatever type it wants. Here's a simple example of a combined "shift and then push" function. First it will shift the first element out of the vector (if there is one), then it will push the argument on to the end of the vector. If there was a shifted element, it is returned. (This is a bit of a silly operation, but works to demonstrate this technique.)
First we declare the trait, with the signature of the method(s) we want to add:
trait VecExt<T> {
fn shift_and_push(&mut self, v: T) -> Option<T>;
}
Now we can implement the trait for Vec<T>:
impl<T> VecExt<T> for Vec<T> {
fn shift_and_push(&mut self, v: T) -> Option<T> {
let r = if !self.is_empty() { Some(self.remove(0)) } else { None };
self.push(v);
r
}
}
Now, anywhere that VecExt is brought into scope with use (or by being in the same source file as its declaration) this extension method can be used on any vector.

Related

Struct property as key and the struct itself as value in HashMaps - Rust [duplicate]

This question already has answers here:
Why can't I store a value and a reference to that value in the same struct?
(4 answers)
Closed 8 months ago.
The following is a snippet of a more complicated code, the idea is loading a SQL table and setting a hashmap with one of the table struct fields as the key and keeping the structure as the value (implementation details are not important since the code works fine if I clone the String, however, the Strings in the DB can be arbitrarily long and cloning can be expensive).
The following code will fail with
error[E0382]: use of partially moved value: `foo`
--> src/main.rs:24:35
|
24 | foo_hashmap.insert(foo.a, foo);
| ----- ^^^ value used here after partial move
| |
| value partially moved here
|
= note: partial move occurs because `foo.a` has type `String`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0382`.
use std::collections::HashMap;
struct Foo {
a: String,
b: String,
}
fn main() {
let foo_1 = Foo {
a: "bar".to_string(),
b: "bar".to_string(),
};
let foo_2 = Foo {
a: "bar".to_string(),
b: "bar".to_string(),
};
let foo_vec = vec![foo_1, foo_2];
let mut foo_hashmap = HashMap::new();
foo_vec.into_iter().for_each(|foo| {
foo_hashmap.insert(foo.a, foo); // foo.a.clone() will make this compile
});
}
The struct Foo cannot implement Copy since its fields are String. I tried wrapping foo.a with Rc::new(RefCell::new()) but later went down the pitfall of missing the trait Hash for RefCell<String>, so currently I'm not certain in either using something else for the struct fields (will Cow work?), or to handle that logic within the for_each loop.
There are at least two problems here: First, the resulting HashMap<K, V> would be a self-referential struct, as the K borrows V; there are many questions and answers on SA about the pitfalls of this. Second, even if you could construct such a HashMap, you'd easily break the guarantees provided by HashMap, which allows you to modify V while assuming that K always stays constant: There is no way to get a &mut K for a HashMap, but you can get a &mut V; if K is actually a &V, one could easily modify K through V (by ways of mutating Foo.a ) and break the map.
One possibility is to change Foo.a from a String to a Rc<str>, which you can clone with minimal runtime cost in order to put the value both in the K and into V. As Rc<str> is Borrow<str>, you can still look up values in the map by means of &str. This still has the - theoretical - downside that you can break the map by getting a &mut Foo from the map and std::mem::swap the a, which makes it impossible to look up the correct value from its keys; but you'd have to do that deliberately.
Another option is to actually use a HashSet instead of a HashMap, and use a newtype for Foo which behaves like a Foo.a. You'd have to implement PartialEq, Eq, Hash (and Borrow<str> for good measure) like this:
use std::collections::HashSet;
#[derive(Debug)]
struct Foo {
a: String,
b: String,
}
/// A newtype for `Foo` which behaves like a `str`
#[derive(Debug)]
struct FooEntry(Foo);
/// `FooEntry` compares to other `FooEntry` only via `.a`
impl PartialEq<FooEntry> for FooEntry {
fn eq(&self, other: &FooEntry) -> bool {
self.0.a == other.0.a
}
}
impl Eq for FooEntry {}
/// It also hashes the same way as a `Foo.a`
impl std::hash::Hash for FooEntry {
fn hash<H>(&self, hasher: &mut H)
where
H: std::hash::Hasher,
{
self.0.a.hash(hasher);
}
}
/// Due to the above, we can implement `Borrow`, so now we can look up
/// a `FooEntry` in the Set using &str
impl std::borrow::Borrow<str> for FooEntry {
fn borrow(&self) -> &str {
&self.0.a
}
}
fn main() {
let foo_1 = Foo {
a: "foo".to_string(),
b: "bar".to_string(),
};
let foo_2 = Foo {
a: "foobar".to_string(),
b: "barfoo".to_string(),
};
let foo_vec = vec![foo_1, foo_2];
let mut foo_hashmap = HashSet::new();
foo_vec.into_iter().for_each(|foo| {
foo_hashmap.insert(FooEntry(foo));
});
// Look up `Foo` using &str as keys...
println!("{:?}", foo_hashmap.get("foo").unwrap().0);
println!("{:?}", foo_hashmap.get("foobar").unwrap().0);
}
Notice that HashSet provides no way to get a &mut FooEntry due to the reasons described above. You'd have to use RefCell (and read what the docs of HashSet have to say about this).
The third option is to simply clone() the foo.a as you described. Given the above, this is probably the most simple solution. If using an Rc<str> doesn't bother you for other reasons, this would be my choice.
Sidenote: If you don't need to modify a and/or b, a Box<str> instead of String is smaller by one machine word.

Entry::Occupied.get() returns a value referencing data owned by the current function even though hashmap should have the ownership

My goal was to implement the suggested improvement on the cacher struct of the rust book chapter 13.1, that is creating a struct which takes a function and uses memoization to reduce the number of calls of the given function. To do this, I created a struct with an HashMap
struct Cacher<T, U, V>
where T: Fn(&U) -> V, U: Eq + Hash
{
calculation: T,
map: HashMap<U,V>,
}
and two methods, one constructor and one which is resposible of the memoization.
impl<T, U, V> Cacher<T, U, V>
where T: Fn(&U) -> V, U: Eq + Hash
{
fn new(calculation: T) -> Cacher<T,U,V> {
Cacher {
calculation,
map: HashMap::new(),
}
}
fn value(&mut self, arg: U) -> &V {
match self.map.entry(arg){
Entry::Occupied(occEntry) => occEntry.get(),
Entry::Vacant(vacEntry) => {
let argRef = vacEntry.key();
let result = (self.calculation)(argRef);
vacEntry.insert(result)
}
}
}
}
I used the Entry enum, because I didn't found a better way of deciding if the HashMap contains a key and - if it doesn't - calculating the value and inserting it into the HashMap as well as returning a reference to it.
If I want to compile the code above, I get an error which says that occEntry is borrowed by it's .get() method (which is fine by me) and that .get() "returns a value referencing data owned by the current function".
My understanding is that the compiler thinks that the value which occEntry.get() is referencing to is owned by the function value(...). But shouldn't I get a reference of the value of type V, which is owned by the HashMap? Is the compiler getting confused because the value is owned by the function and saved as result for a short moment?
let result = (self.calculation)(argRef);
vacEntry.insert(result)
Please note that it is necessary to save the result temporarily because the insert method consumes the key and such argRef is not valid anymore. Also I acknowledge that the signature of value can be problematic (see Mutable borrow from HashMap and lifetime elision) but I tried to avoid a Copy Trait Bound.
For quick reproduction of the problem I append the use statements necessary. Thanks for your help.
use std::collections::HashMap;
use std::cmp::Eq;
use std::hash::Hash;
use std::collections::hash_map::{OccupiedEntry, VacantEntry, Entry};
Let's take a look at OccupiedEntry::get()'s signature:
pub fn get(&self) -> &V
What this signature is telling us is that the reference obtained from the OccupiedEntry can only live as long as the OccupiedEntry itself. However, the OccupiedEntry is a local variable, thus it's dropped when the function returns.
What we want is a reference whose lifetime is bound to the HashMap's lifetime. Both Entry and OccupiedEntry have a lifetime parameter ('a), which is linked to the &mut self parameter in HashMap::entry. We need a method on OccupiedEntry that returns a &'a V. There's no such method, but there's one that returns a '&a mut V: into_mut. A mutable reference can be implicitly coerced to a shared reference, so all we need to do to make your method compile is to replace get() with into_mut().
fn value(&mut self, arg: U) -> &V {
match self.map.entry(arg) {
Entry::Occupied(occ_entry) => occ_entry.into_mut(),
Entry::Vacant(vac_entry) => {
let arg_ref = vac_entry.key();
let result = (self.calculation)(arg_ref);
vac_entry.insert(result)
}
}
}

Rust: polymorphic calls for structs in a vector

I'm a complete newbie in Rust and I'm trying to get some understanding of the basics of the language.
Consider the following trait
trait Function {
fn value(&self, arg: &[f64]) -> f64;
}
and two structs implementing it:
struct Add {}
struct Multiply {}
impl Function for Add {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] + arg[1]
}
}
impl Function for Multiply {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] * arg[1]
}
}
In my main() function I want to group two instances of Add and Multiply in a vector, and then call the value method. The following works:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<&dyn Function> = vec![&Add {}, &Multiply {}];
for f in funcs {
println!("{}", f.value(&x));
}
}
And so does:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<Box<dyn Function>> = vec![Box::new(Add {}), Box::new(Multiply {})];
for f in funcs {
println!("{}", f.value(&x));
}
}
Is there any better / less verbose way? Can I work around wrapping the instances in a Box? What is the takeaway with trait objects in this case?
Is there any better / less verbose way?
There isn't really a way to make this less verbose. Since you are using trait objects, you need to tell the compiler that the vectors's items are dyn Function and not the concrete type. The compiler can't just infer that you meant dyn Function trait objects because there could have been other traits that Add and Multiply both implement.
You can't abstract out the calls to Box::new either. For that to work, you would have to somehow map over a heterogeneous collection, which isn't possible in Rust. However, if you are writing this a lot, you might consider adding helper constructor functions for each concrete impl:
impl Add {
fn new() -> Add {
Add {}
}
fn new_boxed() -> Box<Add> {
Box::new(Add::new())
}
}
It's idiomatic to include a new constructor wherever possible, but it's also common to include alternative convenience constructors.
This makes the construction of the vector a bit less noisy:
let funcs: Vec<Box<dyn Function>> = vec!(Add::new_boxed(), Multiply::new_boxed()));
What is the takeaway with trait objects in this case?
There is always a small performance hit with using dynamic dispatch. If all of your objects are the same type, they can be densely packed in memory, which can be much faster for iteration. In general, I wouldn't worry too much about this unless you are creating a library crate, or if you really want to squeeze out the last nanosecond of performance.

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has *const T field, but no PhantomData<T>:
The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.
What does it mean? If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.
Here is some sample code I have whipped up to illustrate this:
// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case,
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)
#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]
extern crate alloc;
use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;
#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }
#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);
impl<T: fmt::Debug> PrintOnDrop<T> {
fn new(name: &'static str, t: T) -> Self {
PrintOnDrop(name, t, State::Valid)
}
}
impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
fn drop(&mut self) {
println!("drop PrintOnDrop({}, {:?}, {:?})",
self.0,
self.1,
self.2);
self.2 = State::INVALID;
}
}
struct MyBox1<T> {
v: Box<T>,
}
impl<T> MyBox1<T> {
fn new(t: T) -> Self {
MyBox1 { v: Box::new(t) }
}
}
struct MyBox2<T> {
v: *const T,
}
impl<T> MyBox2<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox2 { v: p }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
fn drop(&mut self) {
unsafe {
// We want this to be *legal*. This destructor is not
// allowed to call methods on `T` (since it may be in
// an invalid state), but it should be allowed to drop
// instances of `T` as it deconstructs itself.
//
// (Note however that the compiler has no knowledge
// that `MyBox2<T>` owns an instance of `T`.)
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
struct MyBox3<T> {
v: *const T,
_pd: PhantomData<T>,
}
impl<T> MyBox3<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox3 { v: p, _pd: Default::default() }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
fn drop(&mut self) {
unsafe {
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
fn f1() {
// `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
let v1; let _mb1;
v1 = PrintOnDrop::new("v1", 13);
_mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}
fn f2() {
{
let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
v2a = PrintOnDrop::new("v2a", 13);
_mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
}
{
let (_mb2b, v2b); // Unsound!
v2b = PrintOnDrop::new("v2b", 13);
_mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
// namely, v2b dropped before _mb2b, but latter contains
// value that attempts to access v2b when being dropped.
}
}
fn f3() {
let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
v3 = PrintOnDrop::new("v3", 13);
_mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}
fn main() {
f1(); f2(); f3();
}
Caveat emptor — I'm not that strong in the extremely deep theory that truly answers your question. I'm just a layperson who has used Rust a bit and has read the related RFCs. Always refer back to those original sources for a less-diluted version of the truth.
RFC 769 introduced the actual The Drop-Check Rule:
Let v be some value (either temporary or named) and 'a be some
lifetime (scope); if the type of v owns data of type D, where (1.)
D has a lifetime- or type-parametric Drop implementation, and (2.)
the structure of D can reach a reference of type &'a _, and (3.)
either:
(A.) the Drop impl for D instantiates D at 'a
directly, i.e. D<'a>, or,
(B.) the Drop impl for D has some type parameter with a
trait bound T where T is a trait that has at least
one method,
then 'a must strictly outlive the scope of v.
It then goes further to define some of those terms, including what it means for one type to own another. This goes further to mention PhantomData specifically:
Therefore, as an additional special case to the criteria above for when the type E owns data of type D, we include:
If E is PhantomData<T>, then recurse on T.
A key problem occurs when two variables are defined at the same time:
struct Noisy<'a>(&'a str);
impl<'a> Drop for Noisy<'a> {
fn drop(&mut self) { println!("Dropping {}", self.0 )}
}
fn main() -> () {
let (mut v, s) = (Vec::new(), "hi".to_string());
let noisy = Noisy(&s);
v.push(noisy);
}
As I understand it, without The Drop-Check Rule and indicating that Vec owns Noisy, code like this might compile. When the Vec is dropped, the drop implementation could access an invalid reference; introducing unsafety.
Returning to your points:
If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The compiler must know that you own the value because you can/will call drop. Since the implementation of drop is arbitrary, if you are going to call it, the compiler must forbid you from accepting values that would cause unsafe behavior during drop.
Always remember that any arbitrary T can be a value, a reference, a value containing a reference, etc. When trying to puzzle out these types of things, it's important to try to use the most complicated variant for any thought experiments.
All of that should provide enough pieces to connect-the-dots; for full understanding, reading the RFC a few times is probably better than relying on my flawed interpretation.
Then it gets more complicated. RFC 1238 further modifies The Drop-Check Rule, removing this specific reasoning. It does say:
parametricity is a necessary but not sufficient condition to justify the inferences that dropck makes
Continuing to use PhantomData seems the safest thing to do, but it may not be required. An anonymous Twitter benefactor pointed out this code:
use std::marker::PhantomData;
#[derive(Debug)] struct MyGeneric<T> { x: Option<T> }
#[derive(Debug)] struct MyDropper<T> { x: Option<T> }
#[derive(Debug)] struct MyHiddenDropper<T> { x: *const T }
#[derive(Debug)] struct MyHonestHiddenDropper<T> { x: *const T, boo: PhantomData<T> }
impl<T> Drop for MyDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHiddenDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHonestHiddenDropper<T> { fn drop(&mut self) { } }
fn main() {
// Does Compile! (magic annotation on destructor)
{
let (a, mut b) = (0, vec![]);
b.push(&a);
}
// Does Compile! (no destructor)
{
let (a, mut b) = (0, MyGeneric { x: None });
b.x = Some(&a);
}
// Doesn't Compile! (has destructor, no attribute)
{
let (a, mut b) = (0, MyDropper { x: None });
b.x = Some(&a);
}
{
let (a, mut b) = (0, MyHiddenDropper { x: 0 as *const _ });
b.x = &&a;
}
{
let (a, mut b) = (0, MyHonestHiddenDropper { x: 0 as *const _, boo: PhantomData });
b.x = &&a;
}
}
This suggests that the changes in RFC 1238 made the compiler more conservative, such that simply having a lifetime or type parameter is enough to prevent it from compiling.
You can also note that Vec doesn't have this problem because it uses the unsafe_destructor_blind_to_params attribute described in the the RFC.

How to take ownership of Any:downcast_ref from trait object?

I've met a conflict with Rust's ownership rules and a trait object downcast. This is a sample:
use std::any::Any;
trait Node{
fn gen(&self) -> Box<Node>;
}
struct TextNode;
impl Node for TextNode{
fn gen(&self) -> Box<Node>{
Box::new(TextNode)
}
}
fn main(){
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
let foo = &node as &Any;
match foo.downcast_ref::<TextNode>(){
Some(n) => {
v.push(*n);
},
None => ()
};
}
The TextNode::gen method has to return Box<Node> instead of Box<TextNode>, so I have to downcast it to Box<TextNode>.
Any::downcast_ref's return value is Option<&T>, so I can't take ownership of the downcast result and push it to v.
====edit=====
As I am not good at English, my question is vague.
I am implementing (copying may be more precise) the template parser in Go standard library.
What I really need is a vector, Vec<Box<Node>> or Vec<Box<Any>>, which can contain TextNode, NumberNode, ActionNode, any type of node that implements the trait Node can be pushed into it.
Every node type needs to implement the copy method, return Box<Any>, and then downcasting to the concrete type is OK. But to copy Vec<Box<Any>>, as you don't know the concrete type of every element, you have to check one by one, that is really inefficient.
If the copy method returns Box<Node>, then copying Vec<Box<Node>> is simple. But it seems that there is no way to get the concrete type from trait object.
If you control trait Node you can have it return a Box<Any> and use the Box::downcast method
It would look like this:
use std::any::Any;
trait Node {
fn gen(&self) -> Box<Any>; // downcast works on Box<Any>
}
struct TextNode;
impl Node for TextNode {
fn gen(&self) -> Box<Any> {
Box::new(TextNode)
}
}
fn main() {
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
if let Ok(n) = node.downcast::<TextNode>() {
v.push(*n);
}
}
Generally speaking, you should not jump to using Any. I know it looks familiar when coming from a language with subtype polymorphism and want to recreate a hierarchy of types with some root type (like in this case: you're trying to recreate the TextNode is a Node relationship and create a Vec of Nodes). I did it too and so did many others: I bet the number of SO questions on Any outnumbers the times Any is actually used on crates.io.
While Any does have its uses, in Rust it has alternatives.
In case you have not looked at them, I wanted to make sure you considered doing this with:
enums
Given different Node types you can express the "a Node is any of these types" relationship with an enum:
struct TextNode;
struct XmlNode;
struct HtmlNode;
enum Node {
Text(TextNode),
Xml(XmlNode),
Html(HtmlNode),
}
With that you can put them all in one Vec and do different things depending on the variant, without downcasting:
let v: Vec<Node> = vec![
Node::Text(TextNode),
Node::Xml(XmlNode),
Node::Html(HtmlNode)];
for n in &v {
match n {
&Node::Text(_) => println!("TextNode"),
&Node::Xml(_) => println!("XmlNode"),
&Node::Html(_) => println!("HtmlNode"),
}
}
playground
adding a variant means potentially changing your code in many places: the enum itself and all the functions that do something with the enum (to add the logic for the new variant). But then again, with Any it's mostly the same, all those functions might need to add the downcast to the new variant.
Trait objects (not Any)
You can try putting the actions you'd want to perform on the various types of nodes in the trait, so you don't need to downcast, but just call methods on the trait object.
This is essentially what you were doing, except putting the method on the Node trait instead of downcasting.
playground
The (more) ideomatic way for the problem:
use std::any::Any;
pub trait Nodeable {
fn as_any(&self) -> &dyn Any;
}
#[derive(Clone, Debug)]
struct TextNode {}
impl Nodeable for TextNode {
fn as_any(&self) -> &dyn Any {
self
}
}
fn main() {
let mut v: Vec<Box<dyn Nodeable>> = Vec::new();
let node = TextNode {}; // or impl TextNode::new
v.push(Box::new(node));
// the downcast back to TextNode could be solved like this:
if let Some(b) = v.pop() { // only if we have a node…
let n = (*b).as_any().downcast_ref::<TextNode>().unwrap(); // this is secure *)
println!("{:?}", n);
};
}
*) This is secure: only Nodeables are allowd to be downcasted to types that had Nodeable implemented.

Resources