Traits in algebraic data types - rust

I'm having trouble understanding the rules about traits in algebraic data types.
Here's a simplified example:
use std::rc::Rc;
use std::cell::RefCell;
trait Quack {
fn quack(&self);
}
struct Duck;
impl Quack for Duck {
fn quack(&self) { println!("Quack!"); }
}
fn main() {
let mut pond: Vec<Box<Quack>> = Vec::new();
let duck: Box<Duck> = Box::new(Duck);
pond.push(duck); // This is valid.
let mut lake: Vec<Rc<RefCell<Box<Quack>>>> = Vec::new();
let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
lake.push(mallard); // This is a type mismatch.
}
The above fails to compile, yielding the following error message:
expected `alloc::rc::Rc<core::cell::RefCell<Box<Quack>>>`,
found `alloc::rc::Rc<core::cell::RefCell<Box<Duck>>>`
(expected trait Quack,
found struct `Duck`) [E0308]
src/main.rs:19 lake.push(mallard);
Why is it that pond.push(duck) is valid, yet lake.push(mallard) isn't? In both cases, a Duck has been supplied where a Quack was expected. In the former, the compiler is happy, but in the latter, it's not.
Is the reason for this difference related to CoerceUnsized?

This is a correct behavior, even if it is somewhat unfortunate.
In the first case we have this:
let mut pond: Vec<Box<Quack>> = Vec::new();
let duck: Box<Duck> = Box::new(Duck);
pond.push(duck);
Note that push(), when called on Vec<Box<Quack>>, accepts Box<Quack>, and you're passing Box<Duck>. This is OK - rustc is able to understand that you want to convert a boxed value to a trait object, like here:
let duck: Box<Duck> = Box::new(Duck);
let quack: Box<Quack> = duck; // automatic coercion to a trait object
In the second case we have this:
let mut lake: Vec<Rc<RefCell<Box<Quack>>>> = Vec::new();
let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
lake.push(mallard);
Here push() accepts Rc<RefCell<Box<Quack>>> while you provide Rc<RefCell<Box<Duck>>>:
let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
let quack: Rc<RefCell<Box<Quack>>> = mallard;
And now there is a problem. Box<T> is a DST-compatible type, so it can be used as a container for a trait object. The same thing will soon be true for Rc and other smart pointers when this RFC is implemented. However, in this case there is no coercion from a concrete type to a trait object because Box<Duck> is inside additional layers of types (Rc<RefCell<..>>).
Remember, a trait object is a fat pointer, so Box<Duck> is different from Box<Quack> in size. Consequently, in principle, they are not directly compatible: you can't just take the bytes of a Box<Duck> and write them to where a Box<Quack> is expected. Rust performs a special conversion, that is, it obtains a pointer to the virtual table for Duck, constructs a fat pointer and writes it to the Box<Quack>-typed variable.
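The size difference is easy to check. Here is a minimal sketch that redeclares the Quack and Duck from the question (written with the `dyn` keyword that current Rust expects for trait objects) and compares the two pointer widths. The helper name `pointer_sizes` is made up for this illustration.

```rust
use std::mem::size_of;

trait Quack {
    fn quack(&self);
}

struct Duck;

impl Quack for Duck {
    fn quack(&self) {
        println!("Quack!");
    }
}

// Hypothetical helper: returns (size of Box<Duck>, size of Box<dyn Quack>).
fn pointer_sizes() -> (usize, usize) {
    (size_of::<Box<Duck>>(), size_of::<Box<dyn Quack>>())
}

fn main() {
    let (thin, fat) = pointer_sizes();
    // The thin pointer is one machine word; the trait object pointer
    // additionally carries a pointer to the vtable.
    println!("Box<Duck>: {thin} bytes, Box<dyn Quack>: {fat} bytes");
    Duck.quack();
}
```

The second number should come out as twice the first, which is why the bytes of one cannot simply be reinterpreted as the other.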
When you have Rc<RefCell<Box<Duck>>>, however, rustc would need to know how to construct and destructure both RefCell and Rc in order to apply the same fat pointer conversion to its internals. Naturally, because these are library types, it can't know how to do it. This is also true for any other wrapper type, e.g. Arc or Mutex or even Vec. You don't expect that it would be possible to use Vec<Box<Duck>> as Vec<Box<Quack>>, right?
There is also the fact that, in the Rc example, the Rcs created from Box<Duck> and Box<Quack> would not be connected: they would have different reference counters.
That is, a conversion from a concrete type to a trait object can only happen if you have direct access to a smart pointer which supports DST, not when it is hidden inside some other structure.
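For what it's worth, on current Rust Rc does support this coercion, so one possible way around the problem is to drop the inner Box and let the outermost pointer do the unsizing. A minimal sketch, assuming a variant of Quack that returns a String so the result can be checked; the Goose type and the `lake_sounds` helper are invented for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Assumption: quack() returns a String here instead of printing.
trait Quack {
    fn quack(&self) -> String;
}

struct Duck;
struct Goose; // hypothetical second implementor

impl Quack for Duck {
    fn quack(&self) -> String {
        "Quack!".to_string()
    }
}

impl Quack for Goose {
    fn quack(&self) -> String {
        "Honk!".to_string()
    }
}

// Hypothetical helper: fills a lake and collects what each bird says.
fn lake_sounds() -> Vec<String> {
    let mut lake: Vec<Rc<RefCell<dyn Quack>>> = Vec::new();

    let mallard: Rc<RefCell<Duck>> = Rc::new(RefCell::new(Duck));
    let goose: Rc<RefCell<Goose>> = Rc::new(RefCell::new(Goose));

    // The coercion happens on the outermost pointer:
    // Rc<RefCell<Duck>> becomes Rc<RefCell<dyn Quack>>.
    lake.push(mallard);
    lake.push(goose);

    let sounds: Vec<String> = lake.iter().map(|bird| bird.borrow().quack()).collect();
    sounds
}

fn main() {
    for sound in lake_sounds() {
        println!("{sound}");
    }
}
```

Here the thing being converted is the Rc itself, which the compiler has direct access to, so nothing has to be rebuilt inside a wrapper.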
That said, I see how it may be possible to allow this for a few select types. For example, we could introduce some kind of Construct/Unwrap traits which are known to the compiler and which it could use to "reach" inside a stack of wrappers and perform the trait object conversion inside them. However, no one has designed this or written an RFC for it yet, probably because it is not a widely needed feature.

Vladimir's answer explained what the
compiler is doing. Based on that information, I developed a solution: creating a wrapper
struct around Box<Quack>.
The wrapper is called QuackWrap. It has a fixed size, and it can be used just like any
other struct (I think). The Box inside QuackWrap allows me to build a QuackWrap
around any type that implements Quack. Thus, I can have a Vec<Rc<RefCell<QuackWrap>>>
where the inner values are a mixture of Ducks, Gooses, etc.
use std::rc::Rc;
use std::cell::RefCell;
trait Quack {
fn quack(&self);
}
struct Duck;
impl Quack for Duck {
fn quack(&self) { println!("Quack!"); }
}
struct QuackWrap(Box<Quack>);
impl QuackWrap {
pub fn new<T: Quack + 'static>(value: T) -> QuackWrap {
QuackWrap(Box::new(value))
}
}
fn main() {
let mut pond: Vec<Box<Quack>> = Vec::new();
let duck: Box<Duck> = Box::new(Duck);
pond.push(duck); // This is valid.
// This would be a type error:
//let mut lake: Vec<Rc<RefCell<Box<Quack>>>> = Vec::new();
//let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
//lake.push(mallard); // This is a type mismatch.
// Instead, we can do this:
let mut lake: Vec<Rc<RefCell<QuackWrap>>> = Vec::new();
let mallard: Rc<RefCell<QuackWrap>> = Rc::new(RefCell::new(QuackWrap::new(Duck)));
lake.push(mallard); // This is valid.
}
As an added convenience, I'll probably want to implement Deref and DerefMut on
QuackWrap. But that's not necessary for the above example.
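A rough sketch of what those two impls could look like, written with the `dyn` keyword that current Rust expects. To make the effect observable I've assumed a slightly different Quack trait: quack() returns a String, and a made-up `louder` method takes `&mut self` so that DerefMut gets exercised. The `volume` field and the `lake_sounds` helper are hypothetical as well.

```rust
use std::cell::RefCell;
use std::ops::{Deref, DerefMut};
use std::rc::Rc;

// Assumption: a variant of Quack with a return value and a mutating method.
trait Quack {
    fn quack(&self) -> String;
    fn louder(&mut self);
}

struct Duck {
    volume: usize, // hypothetical field, used only for this demo
}

impl Quack for Duck {
    fn quack(&self) -> String {
        format!("Quack{}", "!".repeat(self.volume))
    }

    fn louder(&mut self) {
        self.volume += 1;
    }
}

struct QuackWrap(Box<dyn Quack>);

impl QuackWrap {
    pub fn new<T: Quack + 'static>(value: T) -> QuackWrap {
        QuackWrap(Box::new(value))
    }
}

// Deref lets callers invoke Quack methods directly on a QuackWrap.
impl Deref for QuackWrap {
    type Target = dyn Quack;

    fn deref(&self) -> &Self::Target {
        &*self.0
    }
}

// DerefMut does the same for methods that take &mut self.
impl DerefMut for QuackWrap {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut *self.0
    }
}

// Hypothetical helper: quack, get louder, quack again.
fn lake_sounds() -> Vec<String> {
    let mut lake: Vec<Rc<RefCell<QuackWrap>>> = Vec::new();
    lake.push(Rc::new(RefCell::new(QuackWrap::new(Duck { volume: 1 }))));

    let before = lake[0].borrow().quack();
    lake[0].borrow_mut().louder();
    let after = lake[0].borrow().quack();
    vec![before, after]
}

fn main() {
    println!("{:?}", lake_sounds());
}
```

With these impls the `.0` never has to appear at the call site; auto-deref goes from the RefCell guard through QuackWrap to the trait object.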

Related

References in rust self referential structs

Given the code snippet below:
use std::{io::BufWriter, pin::Pin};
pub struct SelfReferential {
pub writer: BufWriter<&'static mut [u8]>, // borrowed from buffer
pub buffer: Pin<Box<[u8]>>,
}
#[cfg(test)]
mod tests {
use std::io::Write;
use super::*;
fn init() -> SelfReferential {
let mut buffer = Pin::new(vec![0; 12].into_boxed_slice());
let writer = unsafe { buffer.as_mut().get_unchecked_mut() };
let writer = unsafe { (writer as *mut [u8]).as_mut().unwrap() };
let writer = BufWriter::new(writer);
SelfReferential { writer, buffer }
}
#[test]
fn move_works() {
let mut sr = init();
sr.writer.write(b"hello ").unwrap();
sr.writer.flush().unwrap();
let mut slice = &mut sr.buffer[6..];
slice.write(b"world!").unwrap();
assert_eq!(&sr.buffer[..], b"hello world!".as_ref());
let mut sr_moved = sr;
sr_moved.writer.write(b"W").unwrap();
sr_moved.writer.flush().unwrap();
assert_eq!(&sr_moved.buffer[..], b"hello World!".as_ref());
}
}
The first question: is it OK to assign 'static lifetime to mutable slice reference in BufWriter? As technically speaking, it's bound to the lifetime of struct instances themselves, and AFAIK there's no safe way to invalidate it.
The second question: besides the fact that unsafe instantiation of this type, in test example, creates two mutable references into the underlying buffer, is there any other potential dangers associated with such an "unidiomatic" (for the lack of better word) type?
is it OK to assign 'static lifetime to mutable slice reference in BufWriter?
Sort of, but there's a bigger problem. The lifetime itself is not worse than any other choice, because there is no lifetime that you can use here which is really accurate. But it is not safe to expose that reference, because then it can be taken:
let w: BufWriter<&'static mut [u8]> = {
let sr = init();
sr.writer
};
// `sr.buffer` has now been dropped, so `w` has a dangling reference
is there any other potential dangers associated with such an "unidiomatic" (for the lack of better word) type?
Yes, it's undefined behavior. Box isn't just managing an allocation; it also (currently) signals a claim of unique, non-aliasing access to the contents. You violate that non-aliasing by creating the writer and then moving the buffer: even though the heap memory is not actually touched, the move of buffer counts as invalidating all references into it.
This is an area of Rust semantics which is not yet fully nailed down, but as far as the current compiler is concerned, this is UB. You can see this if you run your test code under the Miri interpreter.
The good news is, what you're trying to do is a very common desire and people have worked on the problem. I personally recommend using ouroboros — with the help of a macro, it allows you to create the struct you want without writing any new unsafe code. There will be some restrictions on how you use the writer, but nothing you can't tidy out of the way by writing an impl io::Write for SelfReferential. Another, newer library in this space is yoke; I haven't tried it.
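If you don't strictly need the writer to hold a reference, a different technique (not ouroboros or yoke, just the standard library) is to let the writer own the buffer, so there is no self-reference at all. Below is a minimal sketch under that assumption; `OwnedWriter`, `buffer`, `buffer_mut` and `demo` are names I made up, and the demo mirrors the steps of the test in the question.

```rust
use std::io::{self, BufWriter, Cursor, Write};

// Hypothetical replacement for SelfReferential: the BufWriter owns the
// buffer through a Cursor, so no unsafe code or 'static lifetime is needed.
pub struct OwnedWriter {
    writer: BufWriter<Cursor<Box<[u8]>>>,
}

impl OwnedWriter {
    pub fn new(len: usize) -> Self {
        let buffer = vec![0u8; len].into_boxed_slice();
        OwnedWriter {
            writer: BufWriter::new(Cursor::new(buffer)),
        }
    }

    // Read access to the underlying bytes.
    pub fn buffer(&self) -> &[u8] {
        self.writer.get_ref().get_ref()
    }

    // Direct write access; borrows the whole struct, so it cannot alias the writer.
    pub fn buffer_mut(&mut self) -> &mut [u8] {
        self.writer.get_mut().get_mut()
    }
}

impl Write for OwnedWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.writer.write(data)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.writer.flush()
    }
}

// Same sequence as the move_works test from the question.
fn demo() -> Vec<u8> {
    let mut sr = OwnedWriter::new(12);
    sr.write_all(b"hello ").unwrap();
    sr.flush().unwrap();
    sr.buffer_mut()[6..].copy_from_slice(b"world!");

    let mut sr_moved = sr; // moving is fine: nothing points into the struct
    sr_moved.write_all(b"W").unwrap();
    sr_moved.flush().unwrap();
    sr_moved.buffer().to_vec()
}

fn main() {
    println!("{}", String::from_utf8_lossy(&demo()));
}
```

The trade-off is that direct access to the bytes goes through `&mut self`, so you can't hold a slice into the buffer while also writing through the writer, which is exactly the aliasing the original type allowed.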

In Rust, why can higher-level references be assigned to lower-level references, and why not the other way around?

Rust allows assigning references with a higher level of indirection to references with a lower level of indirection. For instance, the compiler allows assigning a &&&&&& to a &:
fn main() {
let mut some_number = 5;
// assign an &&&&&&i32 to an &i32, which works.
let reference : &i32 = &&&&&&some_number;
}
This also works for function parameters:
fn main() {
let num = 5;
// ref1 is an &&i32
let ref1 = &&num;
// Pass an &&i32 to a function parameter, which itself is an &i32 (works)
func(ref1);
}
fn func(test: &i32) {
println!("^^^^ This works!");
}
I've learned that this works because of automatic dereferencing, which allows the Rust compiler to dereference a type as much as it needs to match some other type (please correct me if I'm wrong on this).
However, Rust doesn't seem to allow assigning lower-indirection references to higher-indirection references:
fn main() {
let num = 5;
// Try assigning an &i32 to an &&i32 (error)
let ref1 : &&i32 = &num;
}
This results in an expected &i32, found integer compiler error. We get a similar compiler error when testing this with function parameters:
fn main() {
let num = 5;
// ref1 is an &&&i32
let ref1 = &&&num;
// Try passing an &&&i32 to a function parameter of type &&&&&i32 (error)
func(ref1);
}
fn func(test: &&&&&i32) {
println!("^^^^^^^^ This does not work!")
}
Here, we get a mismatched types error as well. Something I'm curious about, however, is that the compiler output isn't exactly what we expect. Rather than expected &&&&&i32, found &&&i32, the compiler error is expected &&i32, found integer. It seems that the compiler dereferenced both references until one was no longer a reference - why does it dereference both references? I thought it only dereferenced whatever was being passed to the function.
Overall, my main question is
Why, exactly, should assigning lower-indirection to higher-indirection references be disallowed when assigning higher-indirection to lower-indirection references is allowed? What is so different about these two things, that their behaviors must be different as well?
&&T can be coerced to &T because of deref coercion ("&T or &mut T to &U if T implements Deref<Target = U>") together with the impl Deref<Target = T> for &T. The other direction is not possible because there is no impl Deref<Target = &T> for T.
By repeated application, &&&&&&T can be coerced to &T.
As to why one is allowed while the other isn't: if implicit referencing were allowed everywhere, tracking ownership would be even harder than it currently is. We already have this problem with auto-referencing of method receivers.
let s = String::from("Hello");
my_fun(s);
The question "Does s get moved?" can't be answered without looking at the definition of my_fun if we allow automatic referencing.
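To make the asymmetry concrete, here is a small sketch. The function names are invented for the example. Removing layers of indirection happens implicitly at the call site, while adding one always takes an explicit `&`.

```rust
// Hypothetical helpers with different levels of indirection.
fn takes_ref(n: &i32) -> i32 {
    *n
}

fn takes_ref_ref(n: &&i32) -> i32 {
    **n
}

fn takes_str(s: &str) -> usize {
    s.len()
}

fn demo() -> (i32, i32, usize) {
    let num = 5;

    // Deref coercion, applied repeatedly: &&&i32 -> &&i32 -> &i32.
    let deep: &&&i32 = &&&num;
    let a = takes_ref(deep);

    // The other direction needs an explicit borrow; `takes_ref_ref(shallow)`
    // would be a type error.
    let shallow: &i32 = &num;
    let b = takes_ref_ref(&shallow);

    // Same mechanism with a library type: &String -> &str via Deref.
    // The explicit & makes it visible that `s` is borrowed, not moved.
    let s = String::from("Hello");
    let c = takes_str(&s);

    (a, b, c)
}

fn main() {
    println!("{:?}", demo());
}
```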

Rust: how to assign `iter().map()` or `iter().enumerate()` to same variable

struct A {...whatever...};
const MY_CONST_USIZE:usize = 127;
// somewhere in function
// vec1_of_A:Vec<A> vec2_of_A_refs:Vec<&A> have values from different data sources and have different inside_item types
let my_iterator;
if my_rand_condition() { // my_rand_condition is random and compiles for sake of simplicity
my_iterator = vec1_of_A.iter().map(|x| (MY_CONST_USIZE, &x)); // Map<Iter<Vec<A>>>
} else {
my_iterator = vec2_of_A_refs.iter().enumerate(); // Enumerate<Iter<Vec<&A>>>
}
How can I make this code compile?
In the end (based on the condition) I would like to have a single iterator that can be built from either input, and I don't know how to fit these Map and Enumerate types into a single variable without calling collect() to materialize the iterator as a Vec.
Reading material would be welcome.
In the vec1_of_A case, first you need to replace &x with x in your map function. The code you have will never compile because the mapping closure tries to return a reference to one of its parameters, which is never allowed in Rust. To make the types match up, you need to dereference the &&A in the vec2_of_A_refs case to &A instead of trying to add a reference to the other.
Also, -127 is an invalid value for usize, so you need to pick a valid value, or use a different type than usize.
Having fixed those, now you need some type of dynamic dispatch. The simplest approach would be boxing into a Box<dyn Iterator>.
Here is a complete example:
#![allow(unused)]
#![allow(non_snake_case)]
struct A;
// Fixed to be a valid usize.
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator: Box<dyn Iterator<Item=(usize, &A)>>;
if my_rand_condition() {
// Fixed to return x instead of &x
my_iterator = Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
// Added map to deref &&A to &A to make the types match
my_iterator = Box::new(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
Instead of a boxed trait object, you could also use the Either type from the either crate. This is an enum with Left and Right variants, but the Either type itself implements Iterator if both the left and right types also do, with the same type for the Item associated type. For example:
#![allow(unused)]
#![allow(non_snake_case)]
use either::Either;
struct A;
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator;
if my_rand_condition() {
my_iterator = Either::Left(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
my_iterator = Either::Right(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
Why would you choose one approach over the other?
Pros of the Either approach:
It does not require a heap allocation to store the iterator.
It implements dynamic dispatch via match which is likely (but not guaranteed) to be faster than dynamic dispatch via vtable lookup.
Pros of the boxed trait object approach:
It does not depend on any external crates.
It scales easily to many different types of iterators; the Either approach quickly becomes unwieldy with more than two types.
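If you want the match-based dispatch without pulling in a crate, a hand-written enum can play the same role as Either. This is only a sketch of that idea: `TwoIters` and `collect_pairs` are names I made up, and I gave A a u32 field purely so the output can be inspected.

```rust
struct A(u32); // assumption: a payload so results are observable

const MY_CONST_USIZE: usize = usize::MAX;

// Hypothetical stand-in for either::Either, limited to iteration.
enum TwoIters<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for TwoIters<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    // Dispatch with a match instead of a vtable.
    fn next(&mut self) -> Option<Self::Item> {
        match self {
            TwoIters::Left(left) => left.next(),
            TwoIters::Right(right) => right.next(),
        }
    }
}

// Hypothetical driver: picks one of the two sources and drains it.
fn collect_pairs(use_first: bool) -> Vec<(usize, u32)> {
    let vec1_of_a: Vec<A> = vec![A(1), A(2)];
    let others = [A(10), A(20)];
    let vec2_of_a_refs: Vec<&A> = others.iter().collect();

    let my_iterator = if use_first {
        TwoIters::Left(vec1_of_a.iter().map(|x| (MY_CONST_USIZE, x)))
    } else {
        TwoIters::Right(vec2_of_a_refs.iter().map(|x| *x).enumerate())
    };

    let pairs: Vec<(usize, u32)> = my_iterator.map(|(i, a)| (i, a.0)).collect();
    pairs
}

fn main() {
    println!("{:?}", collect_pairs(true));
    println!("{:?}", collect_pairs(false));
}
```

It has the same scaling problem as Either: every additional iterator type means another variant and another match arm.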
You can do this using a Boxed trait object like so:
let my_iterator: Box<dyn Iterator<Item = _>> = if my_rand_condition() {
Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)))
} else {
Box::new(vec2_of_A_refs.iter().enumerate().map(|(i, x)| (i, *x)))
};
I don't think this is a good idea generally though. A few things to note:
The use of trait objects means the iterator type must be resolved dynamically. This adds runtime overhead: a heap allocation for the Box and a virtual call for every next().
The closure in vec1's iterator's map method cannot return a reference to its argument. Instead, a second map must be added to vec2's iterator. The effect of this is that all the items are being copied regardless. If you are doing this, why not collect()? The overhead for creating the Vec or whatever you choose should be less than that of the dynamic resolution.
Bit pedantic, but remember if statements are expressions in Rust, and so the assignment can be expressed a little more cleanly as I have done above.

How to create a Box<UnsafeCell<[T]>>

The recommended way to create a regular boxed slice (i.e. Box<[T]>) seems to be to first create a std::Vec<T>, and use .into_boxed_slice(). However, nothing similar to this seems to work if I want the slice to be wrapped in UnsafeCell.
A solution with unsafe code is fine, but I'd really like to avoid having to manually manage the memory.
The only (not-unsafe) way to create a Box<[T]> is via Box::from, given a &[T] as the parameter. This is because [T] is ?Sized and can't be passed as a parameter. This in turn effectively requires T: Copy, because T has to be copied from behind the reference into the new Box. But UnsafeCell is not Copy, regardless of whether T is. Discussion about making UnsafeCell Copy has been going on for years, yielding no final conclusion, due to safety concerns.
If you really, really want a Box<UnsafeCell<[T]>>, there are only two ways:
Because Box and UnsafeCell are both CoerceUnsized, and [T; N] implements Unsize<[T]>, you can create a Box<UnsafeCell<[T; N]>> and coerce it to a Box<UnsafeCell<[T]>>. This limits you to initializing from fixed-size arrays.
Unsize coercion:
fn main() {
use std::cell::UnsafeCell;
let x: [u8;3] = [1,2,3];
let c: Box<UnsafeCell<[_]>> = Box::new(UnsafeCell::new(x));
}
Because UnsafeCell is #[repr(transparent)], you can create a Box<[T]> and unsafely convert it to a Box<UnsafeCell<[T]>>, as UnsafeCell<[T]> is guaranteed to have the same memory layout as [T], given that [T] doesn't use niche values (even if T does).
Transmute:
use std::cell::UnsafeCell;
use std::mem;
// enclose the transmute in a function accepting and returning proper type-pairs
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
unsafe {
mem::transmute(inp)
}
}
fn main() {
let x = vec![1,2,3];
let b = x.into_boxed_slice();
let c: Box<UnsafeCell<[_]>> = into_boxed_unsafecell(b);
}
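A variation on the same idea that avoids transmute is to go through raw pointers with Box::into_raw and Box::from_raw. This is a sketch rather than a vetted implementation: it leans on the same #[repr(transparent)] layout assumption as above, and the `demo` function is made up to show mutation through a shared handle.

```rust
use std::cell::UnsafeCell;

// Same signature as the transmute version, but using a pointer cast.
fn into_boxed_unsafecell<T>(inp: Box<[T]>) -> Box<UnsafeCell<[T]>> {
    let raw: *mut [T] = Box::into_raw(inp);
    // SAFETY (assumption): UnsafeCell<[T]> is #[repr(transparent)] over [T],
    // so the allocation's size and alignment are unchanged and the slice
    // length metadata carries over with the cast.
    unsafe { Box::from_raw(raw as *mut UnsafeCell<[T]>) }
}

// Hypothetical demo: write through the cell without a `mut` binding.
fn demo() -> Vec<u32> {
    let cell: Box<UnsafeCell<[u32]>> =
        into_boxed_unsafecell(vec![1u32, 2, 3].into_boxed_slice());

    // SAFETY: no other reference into the cell exists while `slice` is alive.
    let slice: &mut [u32] = unsafe { &mut *cell.get() };
    slice[1] = 10;

    let copy = slice.to_vec();
    copy
}

fn main() {
    println!("{:?}", demo());
}
```

The Box still owns the allocation afterwards, so the memory is freed as usual when it goes out of scope and nothing has to be managed by hand.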
Having said all this: I strongly suspect you are running into the XY problem. A Box<UnsafeCell<[T]>> is a very strange type (especially compared to UnsafeCell<Box<[T]>>). You may want to give details on what you are trying to accomplish with such a type.
Just swap the pointer types to UnsafeCell<Box<[T]>>:
use std::cell::UnsafeCell;
fn main() {
let mut res: UnsafeCell<Box<[u32]>> = UnsafeCell::new(vec![1, 2, 3, 4, 5].into_boxed_slice());
unsafe {
println!("{}", (*res.get())[1]);
res.get_mut()[1] = 10;
println!("{}", (*res.get())[1]);
}
}

Does Rust erase generic types or not?

Is there type erasure of generics in Rust (like in Java) or not? I am unable to find a definitive answer.
When you use a generic function or a generic type, the compiler generates a separate instance for each distinct set of type parameters (I believe lifetime parameters are ignored, as they have no influence on the generated code). This process is called monomorphization. For instance, Vec<i32> and Vec<String> are different types, and therefore Vec<i32>::len() and Vec<String>::len() are different functions. This is necessary, because Vec<i32> and Vec<String> have different memory layouts, and thus need different machine code! Therefore, no, there is no type erasure.
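One way to observe this is a generic function whose result depends only on the type parameter. In the sketch below, `element_size` is a hypothetical helper; each call site gets its own instantiation with the size of T baked in at compile time.

```rust
use std::mem::size_of;

// Hypothetical generic helper: the compiler emits a separate copy for each
// concrete T it is called with, and size_of::<T>() is a constant in each copy.
fn element_size<T>(_items: &[T]) -> usize {
    size_of::<T>()
}

fn main() {
    let ints: Vec<i32> = vec![1, 2, 3];
    let bytes: Vec<u8> = vec![1, 2, 3];

    // Two different instantiations: element_size::<i32> and element_size::<u8>.
    println!("i32 elements: {} bytes each", element_size(&ints[..]));
    println!("u8 elements: {} bytes each", element_size(&bytes[..]));
}
```

Nothing about T is looked up at runtime here, which is the opposite of what erasure would require.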
If we use Any::type_id(), as in the following example:
use std::any::Any;
fn main() {
let v1: Vec<i32> = Vec::new();
let v2: Vec<String> = Vec::new();
let a1 = &v1 as &dyn Any;
let a2 = &v2 as &dyn Any;
println!("{:?}", a1.type_id());
println!("{:?}", a2.type_id());
}
we obtain different type IDs for two instances of Vec. This supports the fact that Vec<i32> and Vec<String> are distinct types.
However, reflection capabilities in Rust are limited; Any is pretty much all we've got for now. You cannot obtain more information about the type of a runtime value, such as its name or its members. In order to be able to work with Any, you must cast it (using Any::downcast_ref() or Any::downcast_mut()) to a type that is known at compile time.
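For illustration, here is roughly what that casting looks like in practice. The `describe` function is a made-up example; note that every type it can recognize has to be spelled out in the source.

```rust
use std::any::Any;

// Hypothetical helper: tries a fixed list of compile-time-known types.
fn describe(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32: {n}")
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("String: {s}")
    } else {
        // No way to ask the value what it is; we can only give up.
        "unknown".to_string()
    }
}

fn main() {
    println!("{}", describe(&5i32));
    println!("{}", describe(&String::from("hi")));
    println!("{}", describe(&1.5f64));
}
```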
Rust does have type erasure in the form of virtual method dispatch via dyn Trait, which allows you to have a Vec where the elements have different concrete types:
fn main() {
let list: Vec<Box<dyn ToString>> = vec![Box::new(1), Box::new("hello")];
for item in list {
println!("{}", item.to_string());
}
}
Note that the compiler requires you to manually box the elements since it must know the size of every value at compile time. You can use a Box, which has the same size no matter what it points to since it's just a pointer to the heap. You can also use &-references:
fn main() {
let list: Vec<&dyn ToString> = vec![&1, &"hello"];
for item in list {
println!("{}", item.to_string());
}
}
However, note that if you use &-references you may run into lifetime issues.
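A typical case is returning such a list from a function. The sketch below uses made-up function names: the boxed version works because each Box owns its value, while the borrowed version (left commented out) would be rejected because the references would outlive the locals they point to.

```rust
// Hypothetical example: the boxes own their contents, so the list can
// leave the function that created the values.
fn make_list() -> Vec<Box<dyn ToString>> {
    let number = 1;
    let text = String::from("hello");
    let list: Vec<Box<dyn ToString>> = vec![Box::new(number), Box::new(text)];
    list
}

// The &-reference variant would not compile, since `number` and `text`
// are dropped when the function returns:
//
// fn make_refs<'a>() -> Vec<&'a dyn ToString> {
//     let number = 1;
//     let text = String::from("hello");
//     vec![&number, &text]
// }

// Hypothetical helper: calls to_string() through the vtable for each item.
fn render(list: &[Box<dyn ToString>]) -> Vec<String> {
    list.iter().map(|item| item.to_string()).collect()
}

fn main() {
    for line in render(&make_list()) {
        println!("{line}");
    }
}
```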

Resources