I'm trying to understand how trait objects are implemented in Rust. Please let me know if the following understanding is correct.
I have a function that takes any type that implements the Write trait:
fn some_func(write_to: &mut dyn Write) {}
Wherever a value of a type that implements this trait is passed to the function above, the compiler generates a "trait object", probably by adding a call to TraitObject::new(data, vtable).
If we have something like:
let input = get_user_input(); // say we are expecting the input to be 1 or 2
let mut file = File::create("blah.txt").unwrap();
let mut vec: Vec<u8> = vec![1, 2, 3];
match input {
1 => some_func(&mut file),
2 => some_func(&mut vec),
_ => {} // a match on an integer must also cover every other value
}
then it will probably turn out to be something like:
match input {
1 => {
let file_write_trait_object: &mut dyn Write =
TraitObject::new(&mut file, &vtable_for_file_write_trait);
some_func(file_write_trait_object);
}
2 => {
let vec_write_trait_object: &mut dyn Write =
TraitObject::new(&mut vec, &vtable_for_vec_write_trait);
some_func(vec_write_trait_object);
}
}
Inside some_func, each method call is dispatched through the vtable carried by the trait object that was passed in.
Trait objects are fat pointers, so fn some_func(write_to: &mut dyn Write) compiles to something like fn some_func(_: *mut OpaqueStruct, _: *const WriteVtable).
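A small sketch of the fat-pointer idea, using Vec<u8> and std::io::sink() as two concrete writers passed to the same function (the helper name dispatch_demo is made up for illustration). It prints the size of a thin reference next to the size of a &mut dyn Write:

```rust
use std::io::{self, Write};
use std::mem::size_of;

// Takes a trait object: at run time this is a (data pointer, vtable pointer) pair.
fn some_func(write_to: &mut dyn Write) -> io::Result<()> {
    // Which write_all implementation runs is looked up through the vtable.
    write_to.write_all(b"hi")
}

// Passes two different concrete writers to the same function and returns
// what the Vec<u8> received.
fn dispatch_demo() -> io::Result<Vec<u8>> {
    let mut vec: Vec<u8> = Vec::new();
    let mut sink = io::sink(); // discards everything written to it
    some_func(&mut vec)?; // fat pointer carrying Vec<u8>'s Write vtable
    some_func(&mut sink)?; // fat pointer carrying Sink's Write vtable
    Ok(vec)
}

fn main() {
    // A reference to a sized type is one pointer wide; a trait object reference is two.
    println!("&mut Vec<u8>:   {} bytes", size_of::<&mut Vec<u8>>());
    println!("&mut dyn Write: {} bytes", size_of::<&mut dyn Write>());
    println!("{:?}", dispatch_demo().unwrap());
}
```

On a typical 64-bit target the two sizes are 8 and 16 bytes: the extra word is the vtable pointer.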
TL;DR: I want to implement trait std::io::Write that outputs to a memory buffer, ideally String, for unit-testing purposes.
I must be missing something simple.
Similar to another question, Writing to a file or stdout in Rust, I am working on code that can work with any std::io::Write implementation.
It operates on a structure defined like this:
use std::fs::File;
use std::io::{Result, Write};
use std::path::Path;
pub struct MyStructure {
writer: Box<dyn Write>,
}
Now, it's easy to create an instance that writes to either a file or stdout:
impl MyStructure {
pub fn use_stdout() -> Self {
let writer = Box::new(std::io::stdout());
MyStructure { writer }
}
pub fn use_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let writer = Box::new(File::create(path)?);
Ok(MyStructure { writer })
}
pub fn printit(&mut self) -> Result<()> {
self.writer.write_all(b"hello")?; // write_all retries until every byte is written
Ok(())
}
}
But for unit testing, I also need a way to run the business logic (here represented by the method printit()) and capture its output, so that its content can be checked in the test.
I cannot figure out how to implement this. This playground code shows how I would like to use it, but it does not compile: buf is moved into the Box, so it can no longer be used afterwards.
// invalid code - does not compile!
fn main() {
let mut buf = Vec::new(); // This buffer should receive output
let mut x2 = MyStructure { writer: Box::new(buf) };
x2.printit().unwrap();
// now, get the collected output
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
// here I want to analyze the output, for instance in unit-test asserts
println!("Output to string was {}", output);
}
Any idea how to write the code correctly? I.e., how to implement a writer on top of a memory structure (String, Vec, ...) that can be accessed afterwards?
Something like this does work:
let mut buf = Vec::new();
{
// Use the buffer by a mutable reference
//
// Also, we're doing it inside another scope
// to help the borrow checker
let mut x2 = MyStructure { writer: Box::new(&mut buf) };
x2.printit().unwrap();
}
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
println!("Output to string was {}", output);
However, in order for this to work, you need to modify your type and add a lifetime parameter:
pub struct MyStructure<'a> {
writer: Box<dyn Write + 'a>,
}
Note that in your case (where you omit the + 'a part) the compiler assumes that you use 'static as the lifetime of the trait object:
// Same as your original variant
pub struct MyStructure {
writer: Box<dyn Write + 'static>
}
This limits the set of types that can be used here; in particular, you cannot use any kind of borrowed reference. Therefore, for maximum flexibility, we have to be explicit here and define a lifetime parameter.
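Putting the pieces of this answer together, here is a minimal self-contained sketch of the lifetime-parameterized version. The capture_output helper is hypothetical, and printit uses write_all rather than write so that a short write cannot silently drop bytes:

```rust
use std::io::{Result, Write};

pub struct MyStructure<'a> {
    // `+ 'a` lets the trait object hold borrowed writers such as `&mut Vec<u8>`.
    writer: Box<dyn Write + 'a>,
}

impl<'a> MyStructure<'a> {
    pub fn printit(&mut self) -> Result<()> {
        self.writer.write_all(b"hello")?;
        Ok(())
    }
}

// Runs the business logic against an in-memory buffer and returns what it wrote.
fn capture_output() -> String {
    let mut buf: Vec<u8> = Vec::new();
    {
        // `&mut Vec<u8>` implements Write, so the mutable borrow itself can be boxed.
        let mut x2 = MyStructure { writer: Box::new(&mut buf) };
        x2.printit().unwrap();
    } // x2, and with it the borrow of buf, ends here
    String::from_utf8(buf).unwrap()
}

fn main() {
    println!("Output to string was {}", capture_output());
}
```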
Also note that depending on your use case, you can use generics instead of trait objects:
pub struct MyStructure<W: Write> {
writer: W
}
In this case the types are fully visible at any point of your program, and therefore no additional lifetime annotation is needed.
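A sketch of the generic alternative, assuming the structure owns its Vec<u8> and hands it back through a hypothetical into_writer method once the business logic has run:

```rust
use std::io::{Result, Write};

pub struct MyStructure<W: Write> {
    writer: W,
}

impl<W: Write> MyStructure<W> {
    pub fn printit(&mut self) -> Result<()> {
        self.writer.write_all(b"hello")?;
        Ok(())
    }

    // Hypothetical helper: gives the writer back so a test can inspect it.
    pub fn into_writer(self) -> W {
        self.writer
    }
}

fn capture_output() -> String {
    // The structure owns the buffer, so no lifetime annotation is needed.
    let mut x2 = MyStructure { writer: Vec::<u8>::new() };
    x2.printit().unwrap();
    String::from_utf8(x2.into_writer()).unwrap()
}

fn main() {
    println!("Output to string was {}", capture_output());
    // Production code can still write to stdout with the same type.
    let mut out = MyStructure { writer: std::io::stdout() };
    out.printit().unwrap();
}
```

Because the writer type is a type parameter, each concrete writer (Vec<u8>, Stdout, File, ...) gets its own monomorphized copy of printit, and no vtable is involved.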
I created a structure in which an iterator over a file or stdin should be stored, but the compiler is yelling at me :)
I decided that Lines is the struct I need to store in my struct so I can iterate over it later, and that Box will let me store a value of unknown size, so I defined my structure like this:
pub struct A {
pub input: Box<Lines<BufRead>>,
}
I want to do something like this later:
let mut a = A {
input: /* don't know what should be here yet */,
};
if something {
a.input = Box::new(io::stdin().lock().lines());
} else {
a.input = Box::new(BufReader::new(file).lines());
}
And finally:
for line in a.input {
// ...
}
But I got an error from the compiler:
error[E0277]: the size for values of type `(dyn std::io::BufRead + 'static)` cannot be known at compilation time
--> src/context.rs:11:5
|
11 | pub input: Box<Lines<BufRead>>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `std::marker::Sized` is not implemented for `(dyn std::io::BufRead + 'static)`
= note: to learn more, visit <https://doc.rust-lang.org/book/second-edition/ch19-04-advanced-types.html#dynamically-sized-types-and-sized>
= note: required by `std::io::Lines`
How can I achieve my goal?
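The most generic answer to your question is that you don't / can't. Locking stdin returns a type that references the Stdin value. You cannot create a local value (stdin()), take a reference to it (.lock()), and then return that reference. (This applies to Rust versions before 1.61; since 1.61, Stdin::lock returns a StdinLock<'static>, which does not borrow the Stdin value.)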
If you just want to do this inside of a function without returning it, then you can create a trait object:
use std::io::{self, prelude::*, BufReader};
fn example(file: Option<std::fs::File>) {
let stdin;
let mut stdin_lines;
let mut file_lines;
let input: &mut dyn Iterator<Item = _> = match file {
None => {
stdin = io::stdin();
stdin_lines = stdin.lock().lines();
&mut stdin_lines
}
Some(file) => {
file_lines = BufReader::new(file).lines();
&mut file_lines
}
};
for line in input {
// ...
}
}
Or create a new generic function that you can pass either type of concrete iterator to:
use std::io::{self, prelude::*, BufReader};
fn example(file: Option<std::fs::File>) {
match file {
None => finally(io::stdin().lock().lines()),
Some(file) => finally(BufReader::new(file).lines()),
}
}
fn finally(input: impl Iterator<Item = io::Result<String>>) {
for line in input {
// ...
}
}
You could put either the trait object or the generic type into a structure even though you can't return it:
struct A<'a> {
input: &'a mut dyn Iterator<Item = io::Result<String>>,
}
struct A<I>
where
I: Iterator<Item = io::Result<String>>,
{
input: I,
}
If you are feeling adventurous, you might be able to use some unsafe code, or crates that wrap unsafe code, to store the Stdin value and the iterator referencing it together, but this is not universally safe.
See also:
Is there a way to use locked standard input and output in a constructor to live as long as the struct you're constructing?
Is there any way to return a reference to a variable created in a function?
Why can't I store a value and a reference to that value in the same struct?
How can I store a Chars iterator in the same struct as the String it is iterating on?
Are polymorphic variables allowed?
input: Box<Lines<BufRead>>,
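This is invalid because Lines is a struct, not a trait: its type parameter must be a concrete, sized type, so the bare trait BufRead cannot go there, and boxing the Lines does not change that. You want either: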
use std::io::{prelude::*, Lines};
pub struct A {
pub input: Lines<Box<dyn BufRead>>,
}
Or
use std::io;
pub struct A {
pub input: Box<dyn Iterator<Item = io::Result<String>>>,
}
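A sketch of the second option. It assumes Rust 1.61 or later, where Stdin::lock returns a StdinLock<'static> that no longer borrows the temporary Stdin, so the lines iterator can be boxed and stored. The constructor names from_stdin and from_reader are hypothetical, and the example reads from an in-memory Cursor instead of a file so it runs without touching the filesystem:

```rust
use std::io::{self, BufRead, BufReader, Cursor};

pub struct A {
    // Any iterator of lines, boxed so that A does not need a type parameter.
    pub input: Box<dyn Iterator<Item = io::Result<String>>>,
}

impl A {
    // Assumes Rust 1.61+, where the stdin lock is 'static.
    pub fn from_stdin() -> A {
        A { input: Box::new(io::stdin().lock().lines()) }
    }

    // Any owned reader works too, e.g. a File or an in-memory Cursor.
    pub fn from_reader<R: io::Read + 'static>(reader: R) -> A {
        A { input: Box::new(BufReader::new(reader).lines()) }
    }
}

fn collect_lines() -> Vec<String> {
    let a = A::from_reader(Cursor::new("first\nsecond\n"));
    a.input.map(|line| line.unwrap()).collect()
}

fn main() {
    for line in collect_lines() {
        println!("{}", line);
    }
}
```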
Consider the following code:
trait Animal {
fn make_sound(&self) -> String;
}
struct Cat;
impl Animal for Cat {
fn make_sound(&self) -> String {
"meow".to_string()
}
}
struct Dog;
impl Animal for Dog {
fn make_sound(&self) -> String {
"woof".to_string()
}
}
fn main () {
let dog: Dog = Dog;
let cat: Cat = Cat;
let v: Vec<Animal> = Vec::new();
v.push(cat);
v.push(dog);
for animal in v.iter() {
println!("{}", animal.make_sound());
}
}
The compiler tells me that v is a vector of Animal when I try to push cat (type mismatch).
So, how can I make a vector of objects that implement a trait and call the corresponding trait method on each element?
Vec<Animal> is not legal, but the compiler can't tell you that because the type mismatch somehow hides it. If we remove the calls to push, the compiler gives us the following error:
<anon>:22:9: 22:40 error: instantiating a type parameter with an incompatible type `Animal`, which does not fulfill `Sized` [E0144]
<anon>:22 let mut v: Vec<Animal> = Vec::new();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The reason why that's not legal is that a Vec<T> stores many T objects consecutively in memory. However, Animal is a trait, and a trait used as a type has no size known at compile time (a Cat and a Dog are not guaranteed to have the same size).
To solve this problem, we need to store something that has a size in the Vec. The most straightforward solution is to wrap the values in a Box, i.e. Vec<Box<Animal>>. Box<T> has a fixed size (a "fat pointer" if T is a trait, a simple pointer otherwise).
Here's a working main:
fn main() {
let dog: Dog = Dog;
let cat: Cat = Cat;
let mut v: Vec<Box<Animal>> = Vec::new();
v.push(Box::new(cat));
v.push(Box::new(dog));
for animal in v.iter() {
println!("{}", animal.make_sound());
}
}
You may use a reference trait object &Animal to borrow the elements and store these trait objects in a Vec. You can then enumerate it and use the trait's interface.
Altering the Vec's generic type by adding a & in front of the trait will work:
fn main() {
let dog: Dog = Dog;
let cat: Cat = Cat;
let mut v: Vec<&Animal> = Vec::new();
// ~~~~~~~
v.push(&dog);
v.push(&cat);
for animal in v.iter() {
println!("{}", animal.make_sound());
}
// Ownership is still bound to the original variable.
println!("{}", cat.make_sound());
}
This is useful if you want the original variable to keep ownership so you can reuse it later.
Keep in mind that, in the scenario above, you can't transfer ownership of dog or cat while the Vec still holds references to them.
Introducing a new scope makes the end of those borrows explicit (with non-lexical lifetimes, current compilers also end them after the last use of v, so the extra scope is often unnecessary):
fn main() {
let dog: Dog = Dog;
let cat: Cat = Cat;
{
let mut v: Vec<&Animal> = Vec::new();
v.push(&dog);
v.push(&cat);
for animal in v.iter() {
println!("{}", animal.make_sound());
}
}
let pete_dog: Dog = dog;
println!("{}", pete_dog.make_sound());
}
The existing answers explain the problem with Vec<Animal> well, but they use older syntax, which is not valid anymore.
In short, the vector needs to contain trait objects and its type should be (something like) Vec<Box<dyn Animal>>.
In modern Rust, the dyn keyword is used to specify a trait object. But we cannot use just Vec<dyn Animal>, because dyn Animal is not sized (Cat and Dog could potentially have fields of different sizes). Vectors can only contain elements of a fixed size, so the vector should instead store some kind of pointer to the actual structs. Box is one such option: a smart pointer that itself has a fixed size.
Let's test this (on a 64-bit machine):
use std::mem::size_of;
println!("size Cat = {}", size_of::<Cat>()); // 0 bytes (the Cat struct has no fields)
println!("size Dog = {}", size_of::<Dog>()); // 0 bytes (the Dog struct has no fields)
println!("size BoxCat = {}", size_of::<Box<Cat>>()); // 8 bytes (1 usize pointer)
println!("size BoxDyn = {}", size_of::<Box<dyn Animal>>()); // 16 bytes (2 usize pointers)
// println!("{}", size_of::<dyn Animal>()); // Error: doesn't have a size known at compile-time
Note that if Cat had fields, size_of::<Cat>() would have been more than 0, but size_of::<Box<Cat>>() and size_of::<Box<dyn Animal>>() wouldn't change at all.
Also note that Box<dyn Animal> actually contains 2 pointers:
one that points to the actual struct instance data;
one for the vtable (that's because of dyn; it's needed for dynamic dispatching).
Now to your example. To make it work, you just need to replace these three lines:
let v: Vec<Animal> = Vec::new();
v.push(cat);
v.push(dog);
with these:
let mut v: Vec<Box<dyn Animal>> = Vec::new();
v.push(Box::new(cat));
v.push(Box::new(dog));
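For reference, the whole program with those three lines replaced, restructured slightly so the collected sounds can be checked (the sounds helper is only for illustration):

```rust
trait Animal {
    fn make_sound(&self) -> String;
}

struct Cat;
impl Animal for Cat {
    fn make_sound(&self) -> String {
        "meow".to_string()
    }
}

struct Dog;
impl Animal for Dog {
    fn make_sound(&self) -> String {
        "woof".to_string()
    }
}

// Builds the heterogeneous Vec and collects each element's sound.
fn sounds() -> Vec<String> {
    let mut v: Vec<Box<dyn Animal>> = Vec::new();
    v.push(Box::new(Cat));
    v.push(Box::new(Dog));
    // Each call goes through the vtable stored next to the data pointer.
    v.iter().map(|animal| animal.make_sound()).collect()
}

fn main() {
    for sound in sounds() {
        println!("{}", sound);
    }
}
```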
Vecs support std::io::Write, so code can be written that takes a File or Vec, for example. From the API reference, it looks like neither Vec nor slices support std::io::Read.
Is there a convenient way to achieve this? Does it require writing a wrapper struct?
Here is an example of working code that reads and writes a file, with a single commented-out line that should read a vector.
use ::std::io;
// Generic IO
fn write_4_bytes<W>(mut file: W) -> Result<usize, io::Error>
where W: io::Write,
{
let len = file.write(b"1234")?;
Ok(len)
}
fn read_4_bytes<R>(mut file: R) -> Result<[u8; 4], io::Error>
where R: io::Read,
{
let mut buf: [u8; 4] = [0; 4];
file.read_exact(&mut buf)?; // read alone may return fewer than 4 bytes
Ok(buf)
}
// Type specific
fn write_read_vec() {
let mut vec_as_file: Vec<u8> = Vec::new();
{ // Write
println!("Writing Vec... {}", write_4_bytes(&mut vec_as_file).unwrap());
}
{ // Read
// println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Comment this line above to avoid an error!
}
}
fn write_read_file() {
let filepath = "temp.txt";
{ // Write
let mut file_as_file = ::std::fs::File::create(filepath).expect("open failed");
println!("Writing File... {}", write_4_bytes(&mut file_as_file).unwrap());
}
{ // Read
let mut file_as_file = ::std::fs::File::open(filepath).expect("open failed");
println!("Reading File... {:?}", read_4_bytes(&mut file_as_file).unwrap());
}
}
fn main() {
write_read_vec();
write_read_file();
}
This fails with the error:
error[E0277]: the trait bound `std::vec::Vec<u8>: std::io::Read` is not satisfied
--> src/main.rs:29:42
|
29 | println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
| ^^^^^^^^^^^^ the trait `std::io::Read` is not implemented for `std::vec::Vec<u8>`
|
= note: required by `read_4_bytes`
I'd like to write tests for a file format encoder/decoder, without having to write to the file-system.
While Vec<u8> doesn't implement std::io::Read, &[u8] (a reference to a byte slice) does.
There is some confusion here because Rust coerces a Vec into a slice in some situations but not others.
In this case, an explicit conversion to a slice is needed because read_4_bytes is generic: the compiler infers R = &Vec<u8> from the argument and does not apply deref coercion to look for a type that implements Read.
The code in the question will work when the vector is coerced into a slice using one of the following methods:
read_4_bytes(&*vec_as_file)
read_4_bytes(&vec_as_file[..])
read_4_bytes(vec_as_file.as_slice())
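A minimal sketch of the vector case from the question, assuming the read should fail if fewer than 4 bytes are available (hence read_exact instead of read); write_then_read_vec is a hypothetical helper:

```rust
use std::io::{self, Write};

fn read_4_bytes<R>(mut file: R) -> Result<[u8; 4], io::Error>
where
    R: io::Read,
{
    let mut buf: [u8; 4] = [0; 4];
    file.read_exact(&mut buf)?; // errors if fewer than 4 bytes are available
    Ok(buf)
}

fn write_then_read_vec() -> [u8; 4] {
    let mut vec_as_file: Vec<u8> = Vec::new();
    vec_as_file.write_all(b"1234").unwrap(); // Vec<u8> implements Write
    // &Vec<u8> does not implement Read, but the &[u8] slice does.
    read_4_bytes(vec_as_file.as_slice()).unwrap()
}

fn main() {
    println!("{:?}", write_then_read_vec());
}
```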
Note:
When asking the question initially, I was taking &Read instead of Read. This made passing a reference to a slice fail, unless I'd passed in &&*vec_as_file which I didn't think to do.
In recent versions of Rust, you can also use as_slice() to convert a Vec to a slice.
Thanks to #arete on #rust for finding the solution!
std::io::Cursor
std::io::Cursor is a simple and useful wrapper that implements Read for Vec<u8> (and for other buffers that implement AsRef<[u8]>), so it lets you use a vector as a readable entity.
let mut file = Cursor::new(vector);
read_something(&mut file);
The documentation also shows how to use Cursor instead of File to write unit tests!
Working example:
use std::io::Cursor;
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
let mut file = Cursor::new(vector);
read_something(&mut file);
}
From the documentation about std::io::Cursor:
Cursors are typically used with in-memory buffers to allow them to implement Read and/or Write...
The standard library implements some I/O traits on various types which are commonly used as a buffer, like Cursor<Vec<u8>> and Cursor<&[u8]>.
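For encoder/decoder tests like the one described in the question, a Cursor<Vec<u8>> can act as an in-memory file for both writing and reading. A minimal sketch (the round_trip name is illustrative); set_position(0) rewinds the cursor the way seeking back to the start would for a real file:

```rust
use std::io::{Cursor, Read, Write};

// Writes into an in-memory "file", rewinds it, and reads the bytes back.
fn round_trip() -> std::io::Result<Vec<u8>> {
    let mut file = Cursor::new(Vec::<u8>::new());
    file.write_all(b"1234")?; // Cursor<Vec<u8>> implements Write...
    file.set_position(0); // rewind to the start of the buffer
    let mut out = Vec::new();
    file.read_to_end(&mut out)?; // ...and Read
    Ok(out)
}

fn main() {
    println!("{:?}", round_trip().unwrap());
}
```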
Slice
The example above works for slices as well. In that case it would look like the following:
read_something(&mut &vector[..]);
Working example:
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
read_something(&mut &vector[..]);
}
&mut &vector[..] is a mutable reference to a slice reference (Read for &[u8] advances the slice itself as it reads, which is why the &mut is needed), so I find the explicit option with Cursor clearer and more elegant.
Cursor <-> Slice
Even more: if you have a Cursor that owns a buffer and you need to emulate, for instance, part of a "file", you can get a slice from the Cursor and pass it to the function.
read_something(&mut &file.get_ref()[1..3]);
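A runnable version of that snippet, with read_something changed to return what it read so the result can be inspected (that change and the read_part helper are for illustration only):

```rust
use std::io::{Cursor, Read};

// Variant of read_something that returns the bytes it read.
fn read_something(file: &mut impl Read) -> Vec<u8> {
    let mut out = Vec::new();
    file.read_to_end(&mut out).unwrap();
    out
}

fn read_part() -> Vec<u8> {
    let file = Cursor::new(vec![1u8, 2, 3, 4]);
    // get_ref() borrows the Vec inside the Cursor; [1..3] selects the
    // elements at indices 1 and 2, i.e. the bytes 2 and 3.
    read_something(&mut &file.get_ref()[1..3])
}

fn main() {
    println!("{:?}", read_part());
}
```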
I've run into a conflict between Rust's ownership rules and downcasting a trait object. This is a sample:
use std::any::Any;
trait Node{
fn gen(&self) -> Box<Node>;
}
struct TextNode;
impl Node for TextNode{
fn gen(&self) -> Box<Node>{
Box::new(TextNode)
}
}
fn main(){
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
let foo = &node as &Any;
match foo.downcast_ref::<TextNode>(){
Some(n) => {
v.push(*n);
},
None => ()
};
}
The TextNode::gen method has to return Box<Node> instead of Box<TextNode>, so I have to downcast it to Box<TextNode>.
Any::downcast_ref's return value is Option<&T>, so I can't take ownership of the downcast result and push it to v.
Edit:
As I am not good at English, my question was vague.
I am implementing (copying may be more precise) the template parser in Go standard library.
What I really need is a vector, Vec<Box<Node>> or Vec<Box<Any>>, into which any type of node that implements the trait Node (TextNode, NumberNode, ActionNode, ...) can be pushed.
I could make every node type implement the copy method returning Box<Any>, and then downcasting to the concrete type would work. But to copy a Vec<Box<Any>>, since you don't know the concrete type of each element, you have to check them one by one, which is really inefficient.
If the copy method returns Box<Node>, then copying Vec<Box<Node>> is simple. But it seems that there is no way to get the concrete type from trait object.
If you control the trait Node, you can have it return a Box<dyn Any> and use the Box::downcast method.
It would look like this:
use std::any::Any;
trait Node {
fn gen(&self) -> Box<dyn Any>; // downcast works on Box<dyn Any>
}
struct TextNode;
impl Node for TextNode {
fn gen(&self) -> Box<dyn Any> {
Box::new(TextNode)
}
}
fn main() {
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
if let Ok(n) = node.downcast::<TextNode>() {
v.push(*n);
}
}
Generally speaking, you should not jump to using Any. I know it looks familiar when you come from a language with subtype polymorphism and want to recreate a hierarchy of types with some root type (as in this case: you're trying to recreate the "TextNode is a Node" relationship and create a Vec of Nodes). I did it too, and so did many others: I bet the SO questions about Any outnumber the times Any is actually used on crates.io.
While Any does have its uses, in Rust it has alternatives.
In case you have not looked at them, I wanted to make sure you considered doing this with:
enums
Given different Node types you can express the "a Node is any of these types" relationship with an enum:
struct TextNode;
struct XmlNode;
struct HtmlNode;
enum Node {
Text(TextNode),
Xml(XmlNode),
Html(HtmlNode),
}
With that you can put them all in one Vec and do different things depending on the variant, without downcasting:
let v: Vec<Node> = vec![
Node::Text(TextNode),
Node::Xml(XmlNode),
Node::Html(HtmlNode)];
for n in &v {
match n {
&Node::Text(_) => println!("TextNode"),
&Node::Xml(_) => println!("XmlNode"),
&Node::Html(_) => println!("HtmlNode"),
}
}
Adding a variant means potentially changing your code in many places: the enum itself and all the functions that do something with the enum (to add the logic for the new variant). But then again, with Any it's mostly the same: all those functions might need to add the downcast to the new variant.
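One benefit of the enum approach for the copying problem in the question: a #[derive(Clone)] on the enum makes copying a whole Vec<Node> a single .clone() call, with no downcasting. A sketch with two hypothetical node types (their fields are made up for illustration):

```rust
// Hypothetical node types; the fields are made up for illustration.
#[derive(Clone, Debug, PartialEq)]
struct TextNode(String);

#[derive(Clone, Debug, PartialEq)]
struct NumberNode(i64);

// "A Node is any of these types", with Clone derived for the whole set.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Text(TextNode),
    Number(NumberNode),
}

fn describe(n: &Node) -> String {
    // Exhaustive match: adding a variant makes the compiler point here.
    match n {
        Node::Text(t) => format!("text {}", t.0),
        Node::Number(num) => format!("number {}", num.0),
    }
}

fn copy_and_describe() -> (bool, Vec<String>) {
    let v = vec![Node::Text(TextNode("hi".to_string())), Node::Number(NumberNode(42))];
    let copy = v.clone(); // element-wise clone, no downcasting
    (copy == v, copy.iter().map(describe).collect())
}

fn main() {
    let (same, descriptions) = copy_and_describe();
    println!("copy equals original: {}", same);
    for d in descriptions {
        println!("{}", d);
    }
}
```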
Trait objects (not Any)
You can try putting the actions you'd want to perform on the various types of nodes in the trait, so you don't need to downcast, but just call methods on the trait object.
This is essentially what you were doing, except that the operation is a method on the Node trait instead of something you reach by downcasting.
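Applied to the copying problem from the question, a common pattern is to put a clone-like method on the trait itself and implement Clone for Box<dyn Node> in terms of it. A sketch, where box_clone, describe, and the node fields are hypothetical names:

```rust
// Hypothetical trait: the "copy" operation lives on the trait itself,
// so copying a Vec<Box<dyn Node>> never needs a downcast.
trait Node {
    fn box_clone(&self) -> Box<dyn Node>;
    fn describe(&self) -> String;
}

#[derive(Clone)]
struct TextNode(String);

#[derive(Clone)]
struct NumberNode(i64);

impl Node for TextNode {
    fn box_clone(&self) -> Box<dyn Node> {
        Box::new(self.clone())
    }
    fn describe(&self) -> String {
        format!("text {}", self.0)
    }
}

impl Node for NumberNode {
    fn box_clone(&self) -> Box<dyn Node> {
        Box::new(self.clone())
    }
    fn describe(&self) -> String {
        format!("number {}", self.0)
    }
}

// Lets Vec<Box<dyn Node>> use the ordinary .clone(); each element is
// copied through its own vtable entry for box_clone.
impl Clone for Box<dyn Node> {
    fn clone(&self) -> Self {
        self.box_clone()
    }
}

fn copy_and_describe() -> Vec<String> {
    let mut v: Vec<Box<dyn Node>> = Vec::new();
    v.push(Box::new(TextNode("hi".to_string())));
    v.push(Box::new(NumberNode(42)));
    let copy = v.clone();
    copy.iter().map(|n| n.describe()).collect()
}

fn main() {
    for line in copy_and_describe() {
        println!("{}", line);
    }
}
```

The impl Clone for Box<dyn Node> is allowed by the orphan rules because Node is a local trait and Box is a fundamental type.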
The (more) idiomatic way to solve the problem:
use std::any::Any;
pub trait Nodeable {
fn as_any(&self) -> &dyn Any;
}
#[derive(Clone, Debug)]
struct TextNode {}
impl Nodeable for TextNode {
fn as_any(&self) -> &dyn Any {
self
}
}
fn main() {
let mut v: Vec<Box<dyn Nodeable>> = Vec::new();
let node = TextNode {}; // or impl TextNode::new
v.push(Box::new(node));
// the downcast back to TextNode could be solved like this:
if let Some(b) = v.pop() { // only if we have a node
let n = (*b).as_any().downcast_ref::<TextNode>().unwrap(); // this is safe *)
println!("{:?}", n);
};
}
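*) This is safe: downcast_ref checks the concrete type at run time and returns None on a mismatch, so a wrong guess makes the unwrap panic rather than misbehave. Here it succeeds because the only element pushed is a TextNode.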
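A small sketch showing that a mismatched downcast cannot go wrong silently: downcast_ref returns None for an element whose concrete type is not the one requested. NumberNode and the text_flags helper are hypothetical additions:

```rust
use std::any::Any;

pub trait Nodeable {
    fn as_any(&self) -> &dyn Any;
}

struct TextNode {}
struct NumberNode {} // hypothetical second node type

impl Nodeable for TextNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Nodeable for NumberNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// For each element, reports whether it is a TextNode. downcast_ref compares
// type IDs at run time and returns None instead of panicking on a mismatch.
fn text_flags() -> Vec<bool> {
    let mut v: Vec<Box<dyn Nodeable>> = Vec::new();
    v.push(Box::new(TextNode {}));
    v.push(Box::new(NumberNode {}));
    v.iter()
        .map(|b| b.as_any().downcast_ref::<TextNode>().is_some())
        .collect()
}

fn main() {
    println!("{:?}", text_flags());
}
```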