Should trait bounds be duplicated in struct and impl?

The following code uses a struct with generic type. While its implementation is only valid for the given trait bound, the struct can be defined with or without the same bound. The struct's fields are private so no other code could create an instance anyway.
trait Trait {
    fn foo(&self);
}
struct Object<T: Trait> {
    value: T,
}
impl<T: Trait> Object<T> {
    fn bar(object: Object<T>) {
        object.value.foo();
    }
}
Should the trait bound on the struct be omitted to conform to the DRY principle, or should it be given to clarify the dependency? Or are there circumstances in which one solution should be preferred over the other?

I believe that the existing answers are misleading. In most cases, you should not put a bound on a struct unless the struct literally will not compile without it.
tl;dr
Bounds on structs express the wrong thing for most people. They are infectious, redundant, sometimes nearsighted, and often confusing. Even when a bound feels right, you should usually leave it off until it's proven necessary.
(In this answer, anything I say about structs applies equally to enums.)
0. Bounds on structs have to be repeated everywhere the struct is.
This is the most obvious, but (to me) least compelling reason to avoid writing bounds on structs. As of this writing (Rust 1.65), you have to repeat every struct's bounds on every impl that touches it, which is a good enough reason not to put bounds on structs for now. However, there is an accepted RFC (implied_bounds) which, when implemented and stabilized, will change this by inferring the redundant bounds. But even then, bounds on structs are still usually wrong:
1. Bounds on structs leak out of abstractions.
Your data structure is special. "Object<T> only makes sense if T is Trait," you say. And perhaps you are right. But the decision affects not just Object, but any other data structure that contains an Object<T>, even if it does not always contain an Object<T>. Consider a programmer who wants to wrap your Object in an enum:
enum MyThing<T> { // error[E0277]: the trait bound `T: Trait` is not satisfied
    Wrapped(your::Object<T>),
    Plain(T),
}
Within the downstream code this makes sense because MyThing::Wrapped is only used with Ts that do implement Trait, while Plain can be used with any type. But if your::Object<T> has a bound on T, this enum can't be compiled without that same bound, even if there are lots of uses for a Plain(T) that don't require such a bound. Not only does this not work, but even if adding the bound doesn't make the enum entirely useless, it also exposes the bound in the public API of any struct that happens to use MyThing.
Bounds on structs limit what other people can do with them. Bounds on code (impls and functions) do too, of course, but those constraints are (presumably) required by your own code, while bounds on structs are a preemptive strike against anyone downstream who might use your struct in an innovative way. This may be useful, but unnecessary bounds are particularly annoying for innovators because they constrain what can compile without usefully constraining what can actually run (more on that in a moment).
2. Bounds on structs are redundant with bounds on code.
So you don't think downstream innovation is possible? That doesn't mean the struct itself needs a bound. To make it impossible to construct an Object<T> without T: Trait, it is enough to put that bound on the impl that contains Object's constructor(s); if it's impossible to call a_method on an Object<T> without T: Trait you can say that on the impl that contains a_method, or perhaps on a_method itself. (Until implied_bounds is implemented, you have to, anyway, so you don't even have the weak justification of "saving keystrokes.")
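As a minimal sketch of points 1 and 2: the constructor `new`, the `&'static str` return type of `foo`, the helper `is_plain`, and the implementor `Impl` are assumptions added for illustration and are not part of the question's code. The bound lives only on the impl that needs it, and a downstream enum can still name `Object<T>` for any `T`:

```rust
trait Trait {
    // Returning a value is an assumption of this sketch, so calls are observable.
    fn foo(&self) -> &'static str;
}

// No bound on the struct itself.
struct Object<T> {
    value: T,
}

// The bound sits on the impl that holds the constructor and the methods,
// so outside this module an Object<T> still cannot be built unless T: Trait.
impl<T: Trait> Object<T> {
    fn new(value: T) -> Self {
        Object { value }
    }

    fn bar(&self) -> &'static str {
        self.value.foo()
    }
}

// Downstream code can mention Object<T> without repeating the bound.
enum MyThing<T> {
    Wrapped(Object<T>),
    Plain(T),
}

// Works for every T, including types that do not implement Trait.
fn is_plain<T>(thing: &MyThing<T>) -> bool {
    matches!(thing, MyThing::Plain(_))
}

// Hypothetical implementor used only for the demonstration.
struct Impl;

impl Trait for Impl {
    fn foo(&self) -> &'static str {
        "foo"
    }
}
```

Here `MyThing::Plain(5_i32)` is a valid value even though `i32` does not implement `Trait`, while `Object::new(5_i32)` is rejected by the compiler.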
Even and especially when you can't think of any way for downstream to use an un-bounded Object<T>, you should not forbid it a priori, because...
3. Bounds on structs mean something different to the type system than bounds on code.
A T: Trait bound on Object<T> means more than "all Object<T>s have to have T: Trait"; it actually means something like "the concept of Object<T> itself does not make sense unless T: Trait", which is a more abstract idea. Think about natural language: I've never seen a purple elephant, but I can easily name the concept of "purple elephant" despite the fact that it corresponds to no real-world animal. Types are a kind of language and it can make sense to refer to the idea of Elephant<Purple>, even when you don't know how to create one and you certainly have no use for one. Similarly, it can make sense to express the type Object<NotTrait> in the abstract even if you don't and can't have one in hand right now. Especially when NotTrait is a type parameter, which may not be known in this context to implement Trait but in some other context does.
Case study: Cell<T>
For one example of a struct that originally had a trait bound which was eventually removed, look no further than Cell<T>, which originally had a T: Copy bound. In the RFC to remove the bound many people initially made the same kinds of arguments you may be thinking of right now, but the eventual consensus was that "Cell requires Copy" was always the wrong way to think about Cell. The RFC was merged, paving the way for innovations like Cell::as_slice_of_cells, which lets you do things you couldn't before in safe code, including temporarily opting in to shared mutation. The point is that T: Copy was never a useful bound on Cell<T>, and it would have done no harm (and possibly some good) to leave it off from the beginning.
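To make the Cell case study concrete, here is a small sketch using only standard-library calls (`Cell::from_mut`, `Cell::as_slice_of_cells`, `Cell::replace`). The function names `scale_in_place` and `swap_name` are hypothetical and exist only for this illustration:

```rust
use std::cell::Cell;

// Temporarily opts a mutable slice in to shared mutation.
fn scale_in_place(data: &mut [i32]) {
    // &mut [i32] -> &Cell<[i32]> -> &[Cell<i32>]
    let cells: &[Cell<i32>] = Cell::from_mut(data).as_slice_of_cells();

    // Two shared views of the same slice coexist, and both may mutate it.
    let (a, b) = (cells, cells);
    for c in a {
        c.set(c.get() * 10);
    }
    if let Some(first) = b.first() {
        first.set(first.get() + 1);
    }
}

// Cell<T> works with non-Copy types too: `replace` moves values in and out.
fn swap_name(cell: &Cell<String>, new_name: &str) -> String {
    cell.replace(new_name.to_string())
}
```

Only `get` needs `T: Copy`, and that bound sits on the impl block that provides `get`, not on the struct.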
This kind of abstract constraint can be hard to wrap one's head around, which is probably one reason why it's so often misused. Which relates to my last point:
4. Unnecessary bounds invite unnecessary parameters (which are worse).
This does not apply to all cases of bounds on structs, but it is a common point of confusion. You may, for instance, have a struct with a type parameter that should implement a generic trait, but not know what parameter(s) the trait should take. In such cases it is tempting to use PhantomData to add a type parameter to the main struct, but this is usually a mistake, not least because PhantomData is hard to use correctly. Unnecessary parameters added because of unnecessary bounds show up regularly in practice. In the majority of such cases, the correct solution is simply to remove the bound.
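A sketch of how this tends to play out, with hypothetical types `BoundedCallback` and `Callback` invented for this illustration: bounding the struct by `Fn(A) -> A` forces an extra parameter `A` and a PhantomData field, whereas dropping the bound lets the argument type appear only on the method that needs it.

```rust
use std::marker::PhantomData;

// Tempting: the bound on the struct drags in an extra parameter `A`,
// which is otherwise unused and therefore needs a PhantomData field.
#[allow(dead_code)]
struct BoundedCallback<F, A>
where
    F: Fn(A) -> A,
{
    f: F,
    _marker: PhantomData<A>,
}

// Simpler: no bound on the struct, so no extra parameter.
struct Callback<F> {
    f: F,
}

impl<F> Callback<F> {
    fn new(f: F) -> Self {
        Callback { f }
    }

    // The argument type is introduced only where it is actually needed.
    fn call<A>(&self, arg: A) -> A
    where
        F: Fn(A) -> A,
    {
        (self.f)(arg)
    }
}
```

With the second form, `Callback::new(|x: i32| x + 1).call(41)` evaluates the closure, and no marker field is involved.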
Exceptions to the rule
Okay, when do you need a bound on a struct? I can think of two possible reasons.
In Shepmaster's answer, the struct will simply not compile without a bound, because the Iterator implementation for I actually defines what the struct contains. One other way that a struct won't compile without a bound is when its implementation of Drop has to use the trait somehow. Drop can't have bounds that aren't on the struct, for soundness reasons, so you have to write them on the struct as well.
When you're writing unsafe code and you want it to rely on a bound (T: Send, for example), you might need to put that bound on the struct. unsafe code is special because it can rely on invariants that are guaranteed by non-unsafe code, so just putting the bound on the impl that contains the unsafe is not necessarily enough.
But in all other cases, unless you really know what you're doing, you should avoid bounds on structs entirely.
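The Drop exception can be sketched as follows. `Counter` is a hypothetical implementor added so the effect of `drop` is observable; the struct and trait names follow the question:

```rust
use std::cell::Cell;

trait Trait {
    fn foo(&self);
}

// The bound has to be on the struct: the Drop impl below needs it, and a
// Drop impl may not require more than the struct definition does
// (the compiler reports E0367 otherwise).
struct Object<T: Trait> {
    value: T,
}

impl<T: Trait> Drop for Object<T> {
    fn drop(&mut self) {
        self.value.foo();
    }
}

// Hypothetical implementor that counts how often `foo` is called.
struct Counter<'a>(&'a Cell<u32>);

impl Trait for Counter<'_> {
    fn foo(&self) {
        self.0.set(self.0.get() + 1);
    }
}
```

Dropping an `Object { value: Counter(&calls) }` increments `calls` exactly once.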

Trait bounds that apply to every instance of the struct should be applied to the struct:
struct IteratorThing<I>
where
    I: Iterator,
{
    a: I,
    b: Option<I::Item>,
}
Trait bounds that only apply to certain instances should only be applied to the impl block they pertain to:
struct Pair<T> {
    a: T,
    b: T,
}
impl<T> Pair<T>
where
    T: std::ops::Add<T, Output = T>,
{
    fn sum(self) -> T {
        self.a + self.b
    }
}
impl<T> Pair<T>
where
    T: std::ops::Mul<T, Output = T>,
{
    fn product(self) -> T {
        self.a * self.b
    }
}
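A short usage sketch of the Pair example follows. The `swap` helper is an assumption added to show an operation that works for every `T`, including types that support neither addition nor multiplication:

```rust
use std::ops::{Add, Mul};

struct Pair<T> {
    a: T,
    b: T,
}

// Available only when T can be added.
impl<T: Add<Output = T>> Pair<T> {
    fn sum(self) -> T {
        self.a + self.b
    }
}

// Available only when T can be multiplied.
impl<T: Mul<Output = T>> Pair<T> {
    fn product(self) -> T {
        self.a * self.b
    }
}

// Hypothetical helper: needs no bound at all.
fn swap<T>(pair: Pair<T>) -> Pair<T> {
    Pair { a: pair.b, b: pair.a }
}
```

A `Pair<&str>` can be built and swapped, but calling `sum` on it is a compile-time error because `&str` does not implement `Add`.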
to conform to the DRY principle
The redundancy will be removed by RFC 2089:
Eliminate the need for “redundant” bounds on functions and impls where
those bounds can be inferred from the input types and other trait
bounds. For example, in this simple program, the impl would no longer
require a bound, because it can be inferred from the Foo<T> type:
struct Foo<T: Debug> { .. }
impl<T: Debug> Foo<T> {
    //  ^^^^^ this bound is redundant
    ...
}

It really depends on what the type is for. If it is only intended to hold values which implement the trait, then yes, it should have the trait bound e.g.
trait Child {
    fn name(&self);
}
struct School<T: Child> {
    pupil: T,
}
impl<T: Child> School<T> {
    fn roll_call(&self) -> bool {
        // check everyone is here
        todo!()
    }
}
In this example, only children are allowed in the school so we have the bound on the struct.
If the struct is intended to hold any value but you want to offer extra behaviour when the trait is implemented, then no, the bound shouldn't be on the struct e.g.
trait GoldCustomer {
    fn get_store_points(&self) -> i32;
}
struct Store<T> {
    customer: T,
}
impl<T: GoldCustomer> Store<T> {
    fn choose_reward(customer: T) {
        // Do something with the store points
    }
}
In this example, not all customers are gold customers and it doesn't make sense to have the bound on the struct.

Related

Differences generic trait-bounded method vs 'direct' trait method

I have this code:
fn main() {
    let p = Person;
    let r = &p as &dyn Eatable;
    Consumer::consume(r);
    // Compile error
    Consumer::consume_generic(r);
}
trait Eatable {}
struct Person;
impl Eatable for Person {}
struct Consumer;
impl Consumer {
    fn consume(eatable: &dyn Eatable) {}
    fn consume_generic<T: Eatable>(eatable: &T) {}
}
Error:
the size for values of type dyn Eatable cannot be known at
compilation time
I think it is strange. I have a method that literally takes a dyn Eatable and compiles fine, so that method knows somehow the size of Eatable. The generic method (consume_generic) will properly compile down for every used type for performance and the consume method will not.
So a few questions arise: why the compiler error? Are there things inside the body of the methods in which I can do something which I can not do in the other method? When should I prefer the one over the other?
Sidenote: I asked this question for the language Swift as well: Differences generic protocol type parameter vs direct protocol type. In Swift I get the same compile error but the underlying error is different: protocols/traits do not conform to themselves (because Swift protocols can hold initializers, static things etc., which makes it harder to reference them generically). I also tried it in Java, where I believe the generic type is erased and it makes absolutely no difference.
The problem is not with the functions themselves, but with the trait bounds on types.
Every generic type parameter in Rust has an implicit Sized bound: since this is correct in the majority of cases, it was decided not to force the developer to write it out every time. But if you use the type only behind some kind of reference, as you do here, you may want to lift this restriction by specifying T: ?Sized. If you add this, your code will compile fine:
impl Consumer {
    fn consume(eatable: &dyn Eatable) {}
    fn consume_generic<T: Eatable + ?Sized>(eatable: &T) {}
}
As for the other questions, the main difference is in static vs dynamic dispatch.
When you use the generic function (or the semantically equivalent impl Trait syntax), the function calls are dispatched statically. That is, for every type of argument you pass to the function, compiler generates the definition independently of others. This will likely result in more optimized code in most cases, but the drawbacks are possibly larger binary size and some limitations in API (e.g. you can't easily create a heterogeneous collection this way).
When you use dyn Trait syntax, you opt in to dynamic dispatch. The necessary data is stored in a table (the vtable) attached to the trait object, and the correct implementation of every trait method is chosen at runtime. The consumer, however, needs to be compiled only once. This is usually slower, both because of the indirection and because per-type optimizations such as inlining are generally not possible, but it is more flexible.
As for the recommendations (note that this is an opinion, not a fact): I'd say it's better to stick to generics whenever possible and only switch to trait objects if the goal is impossible to achieve otherwise.
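The two dispatch styles can be compared in a small sketch. The `name` method, the `Apple` type, and the free functions are assumptions added so the calls return something observable; the question's trait is empty:

```rust
trait Eatable {
    // Hypothetical method, added so the dispatch result is visible.
    fn name(&self) -> String;
}

struct Person;
struct Apple;

impl Eatable for Person {
    fn name(&self) -> String {
        "person".to_string()
    }
}

impl Eatable for Apple {
    fn name(&self) -> String {
        "apple".to_string()
    }
}

// Dynamic dispatch: compiled once, each call goes through the vtable.
fn consume(eatable: &dyn Eatable) -> String {
    eatable.name()
}

// Static dispatch: monomorphized per concrete T.
// `?Sized` additionally admits T = dyn Eatable.
fn consume_generic<T: Eatable + ?Sized>(eatable: &T) -> String {
    eatable.name()
}

// A heterogeneous collection is only expressible with trait objects.
fn consume_all(items: &[Box<dyn Eatable>]) -> Vec<String> {
    items.iter().map(|item| consume_generic(&**item)).collect()
}
```

With the `?Sized` bound in place, `consume_generic` accepts both `&Person` and `&dyn Eatable`, so the generic version loses nothing compared to `consume`.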

What is the most idiomatic/best way to abstract over many optional and independent capabilities of types via traits?

I want to abstract over a variety of different list data structures. My abstraction should be fairly flexible. I want a "base trait" (let's call it List) that represents the minimal interface required from all data structures. But there are optional capabilities the data structures could offer, which are independent from each other:
Some data structures allow mutation, while others only provide a read-only interface. This is something which is fairly common for Rust traits: the pair Trait and TraitMut (e.g. Index and IndexMut).
Some data structures provide a capability "foo".
Some data structures provide a capability "bar".
Initially, this seems easy: provide the traits List, ListMut, FooList and BarList where the latter three have List as super trait. Like this:
trait List {
    fn num_elements(&self) -> usize;
}
trait ListMut: List {
    fn clear(&mut self);
}
trait FooList: List {
    fn num_foos(&self) -> usize;
}
trait BarList: List {
    fn num_bars(&self) -> usize;
}
This works fine for the methods above. But the important part is that there are methods that require multiple capabilities. For example:
add_foo(&mut self): requires the mutability and the capability "foo"!
add_foo_and_bar(&mut self): requires mutability and the capabilities "foo" and "bar".
... and more: imagine there is a function for each combination of requirements.
Where should the methods with multiple requirements live?
One Trait per Combination of Capabilities
One way would be to additionally create a trait for each combination of optional requirements:
FooListMut
BarListMut
FooBarList
FooBarListMut
These traits would have appropriate super trait bounds and could house the methods with multiple requirements. There are two problems with this:
The number of traits would grow exponentially with the number of optional capabilities. Yes, there would only need to be as many traits as methods, but it can still lead to a very chaotic API with loads of traits where most traits only contain one/a small number of methods.
There is no way (I think) to force types that implement ListMut and FooList to also implement FooListMut. Thus, functions would probably need to add more bounds. This trait system would give implementors flexibility I might not want to give them.
where Self bounds on methods
One could also add where Self: Trait bounds to methods. For example:
trait FooList: List {
    // ...
    fn add_foo(&mut self)
    where
        Self: ListMut;
}
This also works, but has two important disadvantages:
Implementors of FooList that don't implement ListMut would need to dummy implement add_foo (usually with unreachable!()) because the Rust compiler still requires it.
It is not clear where to put the methods. add_foo could also live inside ListMut with the bound being where Self: FooList. This makes the trait API more confusing.
Define Capabilities via associated types/consts
In this solution, there would only be one trait. (Note that in the following code, dummy types are used. Ideally, this would be an associated const instead of type, but we cannot use consts in trait bounds yet, so dummy types it is.)
trait Bool {}
enum True {}
enum False {}
impl Bool for True {}
impl Bool for False {}
trait List {
    type SupportsMut: Bool;
    type SupportsFoo: Bool;
    type SupportsBar: Bool;
    fn add_foo(&mut self)
    where
        Self: List<SupportsMut = True, SupportsFoo = True>;
    // ...
}
This solves two problems: for one, we know that if Mut and Foo are supported, we can use add_foo (in contrast to the first solution, where a data structure could implement ListMut and FooList but not FooListMut). Also, since all methods live in one trait, it is no longer unclear where a method should live.
But:
Implementors still might need to add a bunch of unreachable!() implementations, as in the previous solution.
It is more noisy to bound for certain capabilities. One could add trait ListMut: List<SupportsMut = True> (the same for foo and bar) as trait alias (with blanket impl) to make this a bit better, though.
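The "trait alias with blanket impl" idea mentioned in the last point could look roughly like this. It is a sketch reduced to the SupportsMut capability; `VecList` and `reset` are hypothetical names introduced here:

```rust
trait Bool {}
enum True {}
#[allow(dead_code)]
enum False {}
impl Bool for True {}
impl Bool for False {}

trait List {
    type SupportsMut: Bool;

    fn num_elements(&self) -> usize;

    fn clear(&mut self)
    where
        Self: List<SupportsMut = True>;
}

// "Alias" trait: a shorter name for the capability bound.
trait ListMut: List<SupportsMut = True> {}

// Blanket impl: every list that declares SupportsMut = True gets it for free.
impl<T: List<SupportsMut = True>> ListMut for T {}

// Hypothetical mutable list backed by a Vec.
struct VecList(Vec<i32>);

impl List for VecList {
    type SupportsMut = True;

    fn num_elements(&self) -> usize {
        self.0.len()
    }

    fn clear(&mut self) {
        self.0.clear();
    }
}

// Callers can write the short bound instead of List<SupportsMut = True>.
fn reset<L: ListMut>(list: &mut L) {
    list.clear();
}
```

A type that declares `SupportsMut = False` does not satisfy `ListMut`, so passing it to `reset` fails to compile.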
Something else?
The three solutions so far are what I can think of. One could combine them somehow, of course. Or maybe there is a completely different solution even?
Are there clear advantages of one solution over the other solutions? Have they important semantic differences? Is one of these considered more idiomatic by the community? Have there been previous discussions about this? Which one should be preferred?

Why does T not implement AsRef<T>?

This code does not compile:
fn ref_on_int<T>(_: T) where T: AsRef<i32> {}
fn main() {
    ref_on_int(&0_i32)
}
because
the trait bound `i32: std::convert::AsRef<i32>` is not satisfied
Why is it so?
This could be useful for example with a newtype like
struct MyInt(i32);
impl AsRef<i32> for MyInt {
    /* etc. */
}
then you could pass either a reference to an i32 or a reference to a MyInt, because in memory both cases hold an i32.
AsRef and Borrow are pretty similar at first glance, but they are used for different things. The Book describes the difference between them pretty well:
Choose Borrow when you want to abstract over different kinds of
borrowing, or when you’re building a data structure that treats owned
and borrowed values in equivalent ways, such as hashing and
comparison.
Choose AsRef when you want to convert something to a reference
directly, and you’re writing generic code.
In your case Borrow is a more reasonable choice because there is no conversion involved.
As for the question of why AsRef is not implemented between different integral types, I guess this would go against the intent of Rust to be expressive about casts; I think it's similar to the question Why can't I compare two integers of different types?.
Here's an authoritative answer by Aaron Turon:
Borrow provides a blanket implementation T: Borrow<T>, which is essential for making the above collections work well. AsRef provides a different blanket implementation, basically &T: AsRef<U> whenever T: AsRef<U>, which is important for APIs like fs::open that can use a simpler and more flexible signature as a result. You can't have both blanket implementations due to coherence, so each trait is making the choice that's appropriate for its use case.
I think that is one of the differences of AsRef and Borrow.
That is, Borrow<T> is implemented directly for &T, while AsRef<T> is not implemented for &T.
Funny thing is that AsRef<U> is implemented for &T if T implements AsRef<U>. That is, if you can use AsRef with a type, you can use it with a reference to the same type.
And another funny thing is that Borrow<T> is implemented for &T but also for T!
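The blanket implementations described above can be checked with a small sketch. The helper functions `read_borrowed` and `read_as_ref` are hypothetical, and implementing Borrow<i32> for MyInt assumes that MyInt is meant to hash and compare exactly like the i32 it wraps:

```rust
use std::borrow::Borrow;

struct MyInt(i32);

// Assumption of this sketch: MyInt behaves like its inner i32.
impl Borrow<i32> for MyInt {
    fn borrow(&self) -> &i32 {
        &self.0
    }
}

impl AsRef<i32> for MyInt {
    fn as_ref(&self) -> &i32 {
        &self.0
    }
}

// Accepts i32 (via T: Borrow<T>), &i32 (via &T: Borrow<T>) and MyInt.
fn read_borrowed<T: Borrow<i32>>(x: T) -> i32 {
    *<T as Borrow<i32>>::borrow(&x)
}

// Accepts MyInt and &MyInt (via &T: AsRef<U> whenever T: AsRef<U>),
// but neither i32 nor &i32, because i32 does not implement AsRef<i32>.
fn read_as_ref<T: AsRef<i32>>(x: T) -> i32 {
    *<T as AsRef<i32>>::as_ref(&x)
}
```

Calling `read_as_ref(&0_i32)` reproduces the error from the question, while `read_borrowed(&0_i32)` compiles.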

When to use a reference or a box to have a field that implements a trait in a struct?

I have the following code:
pub trait MyTrait {
    fn do_something(&self);
}
If I want a struct A to have a field a that implements the trait MyTrait, there are 2 options:
pub struct A<'a> {
    a: &'a dyn MyTrait
}
or
pub struct A {
    a: Box<dyn MyTrait>
}
But on Difference between pass by reference and by box, someone said:
Really, Box<T> is only useful for recursive data structures (so that
they can be represented rather than being of infinite size) and for
the very occasional performance optimisation on large types (which you
shouldn’t try doing without measurements).
Unless A implements MyTrait, I'd say A is not a recursive data structure, so that makes me think I should prefer using a reference instead of a box.
If I have another struct B that has a reference to some A object, like this:
pub struct A<'a> {
    a: &'a dyn MyTrait
}
pub struct B<'a, 'b: 'a> {
    b: &'a A<'b>
}
I need to say that 'b outlives 'a, and according to the documentation:
You won't often need this syntax, but it can come up in situations
like this one, where you need to refer to something you have a
reference to.
I feel like that's a bad choice too, because the example here is really simple and probably doesn't need this kind of advanced feature.
How to decide whether I should use a reference or a box then?
Unfortunately, the quote you used applied to a completely different situation.
Really, Box<T> is only useful for recursive data structures (so that they can be represented rather than being of infinite size) and for the very occasional performance optimisation on large types (which you shouldn’t try doing without measurements).
That quote is about choosing between MyEnum and Box<MyEnum> for data members:
it is not comparing to references,
it is not talking about traits.
So... reset your brain, and let's start from scratch again.
The main difference between Box and a reference is ownership:
A Box indicates that the surrounding struct owns the piece of data,
A reference indicates that the surrounding struct borrows the piece of data.
Their use, therefore, is dictated by whether you want ownership or borrowing, which is a situational decision: neither is better than the other in the same way that a screwdriver and a hammer are not better than the other.
Rc (and Arc) can somewhat alleviate the need to decide, as they allow multiple owners; however, they also introduce the risk of reference cycles, which are their own nightmare to debug, so I would caution against overusing them.
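The ownership difference can be sketched like this. The names `Owner`, `Borrower`, `Impl` and `make_owner`, as well as the String return type of `do_something`, are assumptions made for the illustration:

```rust
trait MyTrait {
    // Returning a String is an assumption, so the result can be inspected.
    fn do_something(&self) -> String;
}

struct Impl(String);

impl MyTrait for Impl {
    fn do_something(&self) -> String {
        format!("did {}", self.0)
    }
}

// Owns its trait object: no lifetime parameter, can be returned freely.
struct Owner {
    a: Box<dyn MyTrait>,
}

// Borrows its trait object: cannot outlive the value it points to.
struct Borrower<'a> {
    a: &'a dyn MyTrait,
}

// Works because Owner carries the value with it. The equivalent function
// returning a Borrower of a local value would not compile.
fn make_owner() -> Owner {
    Owner {
        a: Box::new(Impl("owned".to_string())),
    }
}
```

A `Borrower` is the natural choice when some other part of the program already owns the value and keeps it alive for long enough.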

Introduce side effects with Rust Add trait

I am learning Rust and, for the sake of exercise, trying to implement an Instrumented<T> type which:
Has a value field of type T;
Supports all basic operations that T supports, for example, equality, ordering, arithmetic;
Delegates all these operations to T and counts invocations;
Prints a nice report.
The idea is borrowed from a programming course by Alexander Stepanov, where he implements this thing in C++. Using Instrumented<T>, one could easily measure the complexity of any algorithm on type T in terms of basic operations, in a very generic way, taking advantage of Rust's trait system.
To start, I am trying to implement a non-generic InstrumentedInt with the Add trait that counts all additions in an additions field. Full code: http://is.gd/AnF3Rf
And here is the trait itself:
impl Add<InstrumentedInt, InstrumentedInt> for InstrumentedInt {
    fn add(&self, rhs: &InstrumentedInt) -> InstrumentedInt {
        self.additions += 1;
        InstrumentedInt {value: self.value + rhs.value, additions: 0}
    }
}
And of course it does not work because &self is an immutable pointer and its fields cannot be assigned to.
Tried to declare arguments of add function mutable, but compiler says it is incompatible with trait definition.
Declared add in impl InstrumentedInt without Add trait, this does not provide support for addition (no duck typing).
Defined operands as mutable, that gives no effect.
So is this at all possible?
P. S. Afterwards, I'm going to:
replace additions with a pointer to a mutable array of uint, to count for many operations, not only addition;
make this pointer shared among all instances of InstrumentedInt, maybe just supply an argument in a constructor,
generalize the type to Instrumented<T>;
ideally, try to make implicit and distinct counters for different specializations. For example, if an algorithm uses both int and f32, Instrumented<int> and Instrumented<f32> should have different counters.
Maybe these are important for the solution of my current problem.
Thank you!
This is exactly the use case for various kinds of cells. Cells provide facilities to implement interior mutability, i.e. mutability behind & reference. For example, in your case InstrumentedInt could look like this:
struct InstrumentedInt {
    value: int,
    additions: Cell<uint>
}
impl Add<InstrumentedInt, InstrumentedInt> for InstrumentedInt {
    fn add(&self, rhs: &InstrumentedInt) -> InstrumentedInt {
        self.additions.set(self.additions.get()+1);
        InstrumentedInt {value: self.value + rhs.value, additions: Cell::new(0)}
    }
}
As for the things you're going to do afterwards, you may have difficulties implementing them. They seem possible to implement, but not easy. For example, you won't be able to use mutable global data without unsafe (it's probably possible to use synchronization tools like Arc and Mutex to use global variables without unsafe, but I've never done it so I don't know for sure). As for different counters for different specializations, you may be able to use libraries like AnyMap or TypeMap, again with some kind of global variable.
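The code in this question and answer predates Rust 1.0. In current Rust, Add takes its operands by value and names the result through an associated Output type, so a sketch of the same Cell-based idea might look like the following. Implementing Add for references, the `new` constructor, and the concrete `i64`/`u32` field types are choices made for this illustration:

```rust
use std::cell::Cell;
use std::ops::Add;

struct InstrumentedInt {
    value: i64,
    // Interior mutability: can be updated through a shared reference.
    additions: Cell<u32>,
}

impl InstrumentedInt {
    fn new(value: i64) -> Self {
        InstrumentedInt {
            value,
            additions: Cell::new(0),
        }
    }
}

// Implemented for references so the operands are not consumed and the
// counter on the left-hand side remains readable afterwards.
impl<'a, 'b> Add<&'b InstrumentedInt> for &'a InstrumentedInt {
    type Output = InstrumentedInt;

    fn add(self, rhs: &'b InstrumentedInt) -> InstrumentedInt {
        self.additions.set(self.additions.get() + 1);
        InstrumentedInt::new(self.value + rhs.value)
    }
}
```

As in the answer above, each addition is counted on the left-hand operand, and the result starts with a fresh counter.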
