Why does Rust require Copy and Clone traits for simple enum - rust

The following code will not compile unless Copy and Clone traits are derived in the enum. Why is this a requirement given that the enum is basically an i8 and according to the documentation, integers automatically implement the Copy trait? Related to this, since the size of MyEnum should be well known at compile time, shouldn't it go onto the stack? Doesn't the Clone trait imply that it goes on the heap?
#[derive(Debug, Copy, Clone)]
#[repr(i8)]
enum MyEnum {
Some1,
}
fn main() {
let x = MyEnum::Some1;
let y = x;
println!("x={:?} y={:?}", x, y);
}

Clone has nothing to do with the heap. Clone does not imply heap-allocated for any type, be they structs, enums, and whether they have a #[repr] attribute or not. Clone is just a normal trait.
And traits aren't implemented automatically, ever, because when writing a library, this would be an implicit part of your public interface. That is, an type could be implicitly, and unexpectedly, Clone without the library author intending to make it Clone. This limits how the type could change in the future, and the library author might inadvertently break their public interface by changing the content of the type and losing that implicit trait implementation. Rust always prefers explicitness over hidden things.

Related

Is my understanding of a Rust vector that supports Rc or Box wrapped types correct?

I'm not looking for code samples. I want to state my understanding of Box vs. Rc and have you tell me if my understanding is right or wrong.
Let's say I have some trait ChattyAnimal and a struct Cat that implements this trait, e.g.
pub trait ChattyAnimal {
fn make_sound(&self);
}
pub struct Cat {
pub name: String,
pub sound: String
}
impl ChattyAnimal for Cat {
fn make_sound(&self) {
println!("Meow!");
}
}
Now let's say I have other structs (Dog, Cow, Chicken, ...) that also implement the ChattyAnimal trait, and let's say I want to store all of these in the same vector.
So step 1 is I would have to use a Box type, because the Rust compiler cannot determine the size of everything that might implement this trait. And therefore, we must store these items on the heap – viola using a Box type, which is like a smarter pointer in C++. Anything wrapped with Box is automatically deleted by Rust when it goes out of scope.
// I can alias and use my Box type that wraps the trait like this:
pub type BoxyChattyAnimal = Box<dyn ChattyAnimal>;
// and then I can use my type alias, i.e.
pub struct Container {
animals: Vec<BoxyChattyAnimal>
}
Meanwhile, with Box, Rust's borrow checker requires changing when I pass or reassign the instance. But if I actually want to have multiple references to the same underlying instance, I have to use Rc. And so to have a vector of ChattyAnimal instances where each instance can have multiple references, I would need to do:
pub type RcChattyAnimal = Rc<dyn ChattyAnimal>;
pub struct Container {
animals: Vec<RcChattyAnimal>
}
One important take away from this is that if I want to have a vector of some trait type, I need to explicitly set that vector's type to a Box or Rc that wraps my trait. And so the Rust language designers force us to think about this in advance so that a Box or Rc cannot (at least not easily or accidentally) end up in the same vector.
This feels like a very and well thought design – helping prevent me from introducing bugs in my code. Is my understanding as stated above correct?
Yes, all this is correct.
There's a second reason for this design: it allows the compiler to verify that the operations you're performing on the vector elements are using memory in a safe way, relative to how they're stored.
For example, if you had a method on ChattyAnimal that mutates the animal (i.e. takes a &mut self argument), you could call that method on elements of a Vec<Box<dyn ChattyAnimal>> as long as you had a mutable reference to the vector; the Rust compiler would know that there could only be one reference to the ChattyAnimal in question (because the only reference is inside the Box, which is inside the Vec, and you have a mutable reference to the Vec so there can't be any other references to it). If you tried to write the same code with a Vec<Rc<dyn ChattyAnimal>>, the compiler would complain; it wouldn't be able to completely eliminate the possibility that your code might be mutating the animal at the same time as the code that called it was in the middle of trying to read the animal, which might lead to some inconsistencies in the calling code.
As a consequence, the compiler needs to know that all the elements of the Vec have their memory treated in the same way, so that it can check to make sure that a reference to some arbitrary element of the Vec is being used appropriately.
(There's a third reason, too, which is performance; because the compiler knows that this is a "vector of Boxes" or "vector of Rcs", it can generate code that assumes a particular storage mechanism. For example, if you have a vector of Rcs, and clone one of the elements, the machine code that the compiler generates will work simply by going to the memory address listed in the vector and adding 1 to the reference count stored there – there's no need for any extra levels of indirection. If the vector were allowed to mix different allocation schemes, the generated code would have to be a lot more complex, because it wouldn't be able to assume things like "there is a reference count", and would instead need to (at runtime) find the appropriate piece of code for dealing with the memory allocation scheme in use, and then run it; that would be much slower.)

Does Rust automatically implement clone when dereference?

#[derive(Debug)]
struct Point {
x: i32,
y: i32,
}
impl Copy for Point {}
impl Clone for Point {
fn clone(&self) -> Point {
*self
}
}
When I only implement Copy, Rust tells me I need to implement Clone. When I only implement Clone, Rust tells me Point can be moved.
My question is, I have never implemented anything, This code is a bit like a circular dependency, but it works? Why?
TL;DR: Copy requires Clone for convenience and Clone on a Copy type is usually implemented using a copy.
The Rust compiler will routinely have to do bit-wise copy of your data. But there are two cases:
If the type does not implement Copy, the original instance isn't usable anymore, your data has moved.
If the type does implement Copy, the compiler knows that it's perfectly safe to continue using the original instance together with the new instance.
The Copy trait does not change the fact that the compiler will only automatically do bit-wise copies: It is a marker trait with no methods. Its job is only to tell the compiler “it's OK to continue using this after a bit-wise copy”.
The Clone trait isn't that special: It's a regular trait with a method that can do whatever you want it to do. The compiler never uses the Clone trait automatically and doesn't care about what it actually does. However it is clearly intended to create a clone of an instance, so it is perfectly normal to:
expect Copy types to implement Clone. After all, Copy is a more strict version of Clone. This is why Copy: Clone and you have to implement Clone on a Copy type.
expect Clone on a Copy type to be dumb and only perform a bit-wise copy. You wouldn't want two different semantics here, that would be very confusing. This is why clone is implemented with a simple *self in your example: This dereferences self, which causes a bit-wise copy.
It is however very uncommon to implement those trait manually, and most of the time, you'll just go with derive:
#[derive(Copy, Clone, Debug)]
struct Point {
x: i32,
y: i32,
}

Differences generic trait-bounded method vs 'direct' trait method

I have this code:
fn main() {
let p = Person;
let r = &p as &dyn Eatable;
Consumer::consume(r);
// Compile error
Consumer::consume_generic(r);
}
trait Eatable {}
struct Person;
impl Eatable for Person {}
struct Consumer;
impl Consumer {
fn consume(eatable: &dyn Eatable) {}
fn consume_generic<T: Eatable>(eatable: &T) {}
}
Error:
the size for values of type dyn Eatable cannot be known at
compilation time
I think it is strange. I have a method that literally takes a dyn Eatable and compiles fine, so that method knows somehow the size of Eatable. The generic method (consume_generic) will properly compile down for every used type for performance and the consume method will not.
So a few questions arise: why the compiler error? Are there things inside the body of the methods in which I can do something which I can not do in the other method? When should I prefer the one over the other?
Sidenote: I asked this question for the language Swift as well: Differences generic protocol type parameter vs direct protocol type. In Swift I get the same compile error but the underlying error is different: protocols/traits do not conform to themselves (because Swift protocols can holds initializers, static things etc. which makes it harder to generically reference them). I also tried it in Java, I believe the generic type is erased and it makes absolutely no difference.
The problem is not with the functions themselves, but with the trait bounds on types.
Every generic types in Rust has an implicit Sized bound: since this is correct in the majority of cases, it was decided not to force the developer to write this out every time. But, if you are using this type only behind some kind of reference, as you do here, you may want to lift this restriction by specifying T: ?Sized. If you add this, your code will compile fine:
impl Consumer {
fn consume(eatable: &dyn Eatable) {}
fn consume_generic<T: Eatable + ?Sized>(eatable: &T) {}
}
Playground as a proof
As for the other questions, the main difference is in static vs dynamic dispatch.
When you use the generic function (or the semantically equivalent impl Trait syntax), the function calls are dispatched statically. That is, for every type of argument you pass to the function, compiler generates the definition independently of others. This will likely result in more optimized code in most cases, but the drawbacks are possibly larger binary size and some limitations in API (e.g. you can't easily create a heterogeneous collection this way).
When you use dyn Trait syntax, you opt in for dynamic dispatch. The necessary data will be stored into the table attached to trait object, and the correct implementation for every trait method will be chosen at runtime. The consumer, however, needs to be compiled only once. This is usually slower, both due to the indirection and to the fact that individual optimizations are impossible, but more flexible.
As for the recommendations (note that this is an opinion, not the fact) - I'd say it's better to stick to generics whenever possible and only change it to trait objects if the goal is impossible to achieve otherwise.

Should trait bounds be duplicated in struct and impl?

The following code uses a struct with generic type. While its implementation is only valid for the given trait bound, the struct can be defined with or without the same bound. The struct's fields are private so no other code could create an instance anyway.
trait Trait {
fn foo(&self);
}
struct Object<T: Trait> {
value: T,
}
impl<T: Trait> Object<T> {
fn bar(object: Object<T>) {
object.value.foo();
}
}
Should the trait bound for the structure should be omitted to conform to the DRY principle, or should it be given to clarify the dependency? Or are there circumstances one solution should be preferred over the other?
I believe that the existing answers are misleading. In most cases, you should not put a bound on a struct unless the struct literally will not compile without it.
tl;dr
Bounds on structs express the wrong thing for most people. They are infectious, redundant, sometimes nearsighted, and often confusing. Even when a bound feels right, you should usually leave it off until it's proven necessary.
(In this answer, anything I say about structs applies equally to enums.)
0. Bounds on structs have to be repeated everywhere the struct is
This is the most obvious, but (to me) least compelling reason to avoid writing bounds on structs. As of this writing (Rust 1.65), you have to repeat every struct's bounds on every impl that touches it, which is a good enough reason not to put bounds on structs for now. However, there is an accepted RFC (implied_bounds) which, when implemented and stabilized, will change this by inferring the redundant bounds. But even then, bounds on structs are still usually wrong:
1. Bounds on structs leak out of abstractions.
Your data structure is special. "Object<T> only makes sense if T is Trait," you say. And perhaps you are right. But the decision affects not just Object, but any other data structure that contains an Object<T>, even if it does not always contain an Object<T>. Consider a programmer who wants to wrap your Object in an enum:
enum MyThing<T> { // error[E0277]: the trait bound `T: Trait` is not satisfied
Wrapped(your::Object<T>),
Plain(T),
}
Within the downstream code this makes sense because MyThing::Wrapped is only used with Ts that do implement Thing, while Plain can be used with any type. But if your::Object<T> has a bound on T, this enum can't be compiled without that same bound, even if there are lots of uses for a Plain(T) that don't require such a bound. Not only does this not work, but even if adding the bound doesn't make it entirely useless, it also exposes the bound in the public API of any struct that happens to use MyThing.
Bounds on structs limit what other people can do with them. Bounds on code (impls and functions) do too, of course, but those constraints are (presumably) required by your own code, while bounds on structs are a preemptive strike against anyone downstream who might use your struct in an innovative way. This may be useful, but unnecessary bounds are particularly annoying for innovators because they constrain what can compile without usefully constraining what can actually run (more on that in a moment).
2. Bounds on structs are redundant with bounds on code.
So you don't think downstream innovation is possible? That doesn't mean the struct itself needs a bound. To make it impossible to construct an Object<T> without T: Trait, it is enough to put that bound on the impl that contains Object's constructor(s); if it's impossible to call a_method on an Object<T> without T: Trait you can say that on the impl that contains a_method, or perhaps on a_method itself. (Until implied_bounds is implemented, you have to, anyway, so you don't even have the weak justification of "saving keystrokes.")
Even and especially when you can't think of any way for downstream to use an un-bounded Object<T>, you should not forbid it a priori, because...
3. Bounds on structs mean something different to the type system than bounds on code.
A T: Trait bound on Object<T> means more than "all Object<T>s have to have T: Trait"; it actually means something like "the concept of Object<T> itself does not make sense unless T: Trait", which is a more abstract idea. Think about natural language: I've never seen a purple elephant, but I can easily name the concept of "purple elephant" despite the fact that it corresponds to no real-world animal. Types are a kind of language and it can make sense to refer to the idea of Elephant<Purple>, even when you don't know how to create one and you certainly have no use for one. Similarly, it can make sense to express the type Object<NotTrait> in the abstract even if you don't and can't have one in hand right now. Especially when NotTrait is a type parameter, which may not be known in this context to implement Trait but in some other context does.
Case study: Cell<T>
For one example of a struct that originally had a trait bound which was eventually removed, look no farther than Cell<T>, which originally had a T: Copy bound. In the RFC to remove the bound many people initially made the same kinds of arguments you may be thinking of right now, but the eventual consensus was that "Cell requires Copy" was always the wrong way to think about Cell. The RFC was merged, paving the way for innovations like Cell::as_slice_of_cells, which lets you do things you couldn't before in safe code, including temporarily opt-in to shared mutation. The point is that T: Copy was never a useful bound on Cell<T>, and it would have done no harm (and possibly some good) to leave it off from the beginning.
This kind of abstract constraint can be hard to wrap one's head around, which is probably one reason why it's so often misused. Which relates to my last point:
4. Unnecessary bounds invite unnecessary parameters (which are worse).
This does not apply to all cases of bounds on structs, but it is a common point of confusion. You may, for instance, have a struct with a type parameter that should implement a generic trait, but not know what parameter(s) the trait should take. In such cases it is tempting to use PhantomData to add a type parameter to the main struct, but this is usually a mistake, not least because PhantomData is hard to use correctly. Here are some examples of unnecessary parameters added because of unnecessary bounds: 1 2 3 4 5 In the majority of such cases, the correct solution is simply to remove the bound.
Exceptions to the rule
Okay, when do you need a bound on a struct? I can think of two possible reasons.
In Shepmaster's answer, the struct will simply not compile without a bound, because the Iterator implementation for I actually defines what the struct contains. One other way that a struct won't compile without a bound is when its implementation of Drop has to use the trait somehow. Drop can't have bounds that aren't on the struct, for soundness reasons, so you have to write them on the struct as well.
When you're writing unsafe code and you want it to rely on a bound (T: Send, for example), you might need to put that bound on the struct. unsafe code is special because it can rely on invariants that are guaranteed by non-unsafe code, so just putting the bound on the impl that contains the unsafe is not necessarily enough.
But in all other cases, unless you really know what you're doing, you should avoid bounds on structs entirely.
Trait bounds that apply to every instance of the struct should be applied to the struct:
struct IteratorThing<I>
where
I: Iterator,
{
a: I,
b: Option<I::Item>,
}
Trait bounds that only apply to certain instances should only be applied to the impl block they pertain to:
struct Pair<T> {
a: T,
b: T,
}
impl<T> Pair<T>
where
T: std::ops::Add<T, Output = T>,
{
fn sum(self) -> T {
self.a + self.b
}
}
impl<T> Pair<T>
where
T: std::ops::Mul<T, Output = T>,
{
fn product(self) -> T {
self.a * self.b
}
}
to conform to the DRY principle
The redundancy will be removed by RFC 2089:
Eliminate the need for “redundant” bounds on functions and impls where
those bounds can be inferred from the input types and other trait
bounds. For example, in this simple program, the impl would no longer
require a bound, because it can be inferred from the Foo<T> type:
struct Foo<T: Debug> { .. }
impl<T: Debug> Foo<T> {
// ^^^^^ this bound is redundant
...
}
It really depends on what the type is for. If it is only intended to hold values which implement the trait, then yes, it should have the trait bound e.g.
trait Child {
fn name(&self);
}
struct School<T: Child> {
pupil: T,
}
impl<T: Child> School<T> {
fn role_call(&self) -> bool {
// check everyone is here
}
}
In this example, only children are allowed in the school so we have the bound on the struct.
If the struct is intended to hold any value but you want to offer extra behaviour when the trait is implemented, then no, the bound shouldn't be on the struct e.g.
trait GoldCustomer {
fn get_store_points(&self) -> i32;
}
struct Store<T> {
customer: T,
}
impl<T: GoldCustomer> Store {
fn choose_reward(customer: T) {
// Do something with the store points
}
}
In this example, not all customers are gold customers and it doesn't make sense to have the bound on the struct.

Is there a way other than traits to add methods to a type I don't own?

I'm trying to extend the Grid struct from the piston-2dgraphics library. There's no method for getting the location on the window of a particular cell, so I implemented a trait to calculate that for me. Then, I wanted a method to calculate the neighbours of a particular cell on the grid, so I implemented another trait.
Something about this is ugly and feels unnecessary seeing as how I'll likely never use these traits for anything other than this specific grid structure. Is there another way in Rust to extend a type without having to implement traits each time?
As of Rust 1.67, no, there is no other way. It's not possible to define inherent methods on a type defined in another crate.
You can define your own trait with the methods you need, then implement that trait for an external type. This pattern is known as extension traits. The name of extension traits, by convention, ends with Ext, to indicate that this trait is not meant to be used as a generic bound or as a trait object. There are a few examples in the standard library.
trait DoubleExt {
fn double(&self) -> Self;
}
impl DoubleExt for i32 {
fn double(&self) -> Self {
*self * 2
}
}
fn main() {
let a = 42;
println!("{}", 42.double());
}
Other libraries can also export extension traits (example: byteorder). However, as for any other trait, you need to bring the trait's methods in scope with use SomethingExt;.
No. Currently, the only way to write new methods for a type that has been defined in another crate is through traits. However, this seems too cumbersome as you have to write both the trait definition and the implementation.
In my opinion, the way to go is to use free functions instead of methods. This would at least avoid the duplication caused by traits.

Resources