Why use traits? [closed] - rust

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed last year.
What typical scenario would traits be used for in a Rust program? I have tried to wrap my head around them, but I still don't have a clear concept as to why one would decide to use them. I understand the syntax from reading about them from here.
How would one explain them to someone as if they were 5?

From the docs:
A trait is a collection of methods defined for an unknown type: Self. They can access other methods declared in the same trait.
Traits are similar to interfaces in languages like Java, or to abstract base classes in C++.
It would really be helpful to go over examples, such as the ToString and the Display traits.
If you write this:
format!("x is {x}");
the code will only compile if x implements the Display trait.
Similarly, this will only compile if x implements the Debug trait:
format!("{x:?}");
Leaving aside the "magic" associated with formatting strings, it should be clear x cannot be an instance of just any type. There needs to be a specific implementation that converts a given type to a string. Note that u16's implementation, for example, would be completely different from, say, IpAddr's implementation. However, both types have a common behavior. As a result, a string formatter doesn't even need to know which types implement the Display or Debug trait. It's not its concern. It only cares whether the type in question implements the given trait (and if it does not, the compiler simply won't accept that code).
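To make the Display and ToString relationship concrete, here is a minimal sketch. The Temperature type and the describe function are hypothetical names invented for this illustration; only std::fmt::Display and the to_string method it provides come from the standard library.

```rust
use std::fmt;

// Hypothetical type used only for illustration.
struct Temperature {
    celsius: f64,
}

// Implementing Display teaches format!, println!, and to_string()
// how to render a Temperature.
impl fmt::Display for Temperature {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1} C", self.celsius)
    }
}

// Generic code only asks for "something that implements Display";
// it never needs to know the concrete type.
fn describe<T: fmt::Display>(label: &str, value: T) -> String {
    format!("{label} is {value}")
}

fn main() {
    let t = Temperature { celsius: 21.5 };
    println!("{}", describe("t", &t)); // a reference to a Display type is also Display
    println!("{}", describe("n", 42u16)); // u16 has its own, unrelated implementation
    println!("{}", t.to_string()); // ToString is provided for every Display type
}
```

The same describe function handles both u16 and Temperature, even though their formatting code has nothing in common.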

One way to conceptualize traits is to think of them as 'behaviors you want available to, and to operate over, a particular chunk of state'. So, if you have some struct that contains state, and you want to do something with it, you would write a trait for it.
There are two primary usages:
1. You are dealing with a struct in your code, and you would like that struct to know how to perform some behavior. You can call the trait-defined behavior on the struct.
2. You would like to pass the struct to some other code (yours or a third party's) that might not know anything about the struct itself, but wants to perform some set of functions on it, trusting that the struct knows what to do in those cases.
In the first case, it allows you to do things like this:
struct Article {
    body: String,
}

trait Saveable {
    fn save(&self);
}

impl Saveable for Article {
    fn save(&self) {
        // ... All the code you need to run to save the Article object
    }
}

// A function called by your UX
fn handle_article_update(article: Article) {
    // ...
    article.save(); // Call the save functionality
}
The second case is arguably more interesting, though. Let's say you (or, more probably, a third party) have a function defined like this:
// A trait cannot be used as a bare parameter type, so accept "any type that implements Saveable"
fn save_object(obj: &impl Saveable) {
    // ...
    obj.save()
}

struct Person {
    name: String,
}

impl Saveable for Person {
    fn save(&self) {
        // ... Code needed to save a Person object, could be different from that needed for an Article object
    }
}

// ...
// Note that we are using the same function to save both of these, despite being different underlying structs
save_object(&article);
save_object(&person);
What this means is that the save_object function does not need to know anything about your custom Article struct in order to call save on it. It can simply refer to your implementation of that method in order to do so. In this way you can write custom objects that third party or generic library functions are able to act upon, because the Trait has defined a set of behaviors with a contract that the code can rely on safely.
Then the question of, 'when do you want to use a Trait' can be answered by saying: whenever you want to use behavior defined on a struct without needing to know everything about the struct - just that it has that behavior. So, in the above, you might also have an 'edit' functionality attached to Article, but not to Person. You don't want to have to change save_object to account for this, or to even care. All save_object needs to know is that the functions defined in the Saveable trait are implemented - it doesn't need to know anything else about the object to function equally well.
Another way to phrase this is to say, 'Use a trait when you want to pass an object based on what it can do, not what it is.'
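The Saveable idea above can be sketched as a complete program. This is an illustration under assumptions, not the only way to structure it: save returns a String describing what would be persisted (so the example runs without a database), and save_all is a hypothetical helper added to show the same trait used through dynamic dispatch.

```rust
// `save` returns a description instead of touching storage, purely so the
// sketch is runnable; a real implementation would write somewhere.
trait Saveable {
    fn save(&self) -> String;
}

struct Article {
    body: String,
}

struct Person {
    name: String,
}

impl Saveable for Article {
    fn save(&self) -> String {
        format!("article ({} bytes)", self.body.len())
    }
}

impl Saveable for Person {
    fn save(&self) -> String {
        format!("person {}", self.name)
    }
}

// Static dispatch: compiled once per concrete type that is passed in.
fn save_object(obj: &impl Saveable) -> String {
    obj.save()
}

// Dynamic dispatch: one function operating on a mixed collection.
fn save_all(objects: &[Box<dyn Saveable>]) -> Vec<String> {
    objects.iter().map(|o| o.save()).collect()
}

fn main() {
    let article = Article { body: String::from("hello") };
    let person = Person { name: String::from("Ada") };
    println!("{}", save_object(&article));
    println!("{}", save_object(&person));

    let all: Vec<Box<dyn Saveable>> = vec![Box::new(article), Box::new(person)];
    for line in save_all(&all) {
        println!("{line}");
    }
}
```

Neither save_object nor save_all mentions Article or Person; both rely only on the contract the trait defines.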

Related

Is there a way to automatically register trait implementors?

I'm trying to load JSON files that refer to structs implementing a trait. When the JSON files are loaded, the struct is grabbed from a hashmap. The problem is, I'll probably have to have a lot of structs put into that hashmap all over my code. I would like to have that done automatically. To me this seems to be doable with procedural macros, something like:
#[my_proc_macro(type=ImplementedType)]
struct MyStruct {}

impl ImplementedType for MyStruct {}

fn load_implementors() {
    let implementors = HashMap::new();
    load_implementors!(implementors, ImplementedType);
}
Is there a way to do this?
No
There is a core issue that makes it difficult to skip manually inserting into a structure. Consider this simplified example, where we simply want to print values that are provided separately in the code-base:
my_register!(alice);
my_register!(bob);

fn main() {
    my_print(); // prints "alice" and "bob"
}
In typical Rust, there is no mechanism to link the my_print() call to the multiple invocations of my_register. There is no support for declaration merging, run-time/compile-time reflection, or run-before-main execution that you might find in other languages that might make this possible (unless of course there's something I'm missing).
But Also Yes
There are third party crates built around link-time or run-time tricks that can make this possible:
ctor allows you to define functions that are executed before main(). With it, you can have my_register!() create individual functions for alice and bob that, when executed, add themselves to some global structure which can then be accessed by my_print().
linkme allows you to define a slice made from elements that are declared separately and gathered together at link time. The my_register!() macro simply needs to use this crate's attributes to add an element to the slice, which my_print() can easily access.
I understand skepticism of these methods since the declarative approach is often clearer to me, but sometimes they are necessary or the ergonomic benefits outweigh the "magic".
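For comparison, the explicit (declarative) approach can be sketched with the standard library alone. Everything here is an assumption made for illustration: the name method on ImplementedType, the Alice and Bob structs, and the construct helper are hypothetical. The point is that every implementor must be listed by hand in one place, which is exactly the step ctor or linkme would automate.

```rust
use std::collections::HashMap;

// Hypothetical trait, named after the one in the question.
trait ImplementedType {
    fn name(&self) -> &'static str;
}

#[derive(Default)]
struct Alice;

#[derive(Default)]
struct Bob;

impl ImplementedType for Alice {
    fn name(&self) -> &'static str {
        "alice"
    }
}

impl ImplementedType for Bob {
    fn name(&self) -> &'static str {
        "bob"
    }
}

// A constructor that yields a boxed trait object.
type Constructor = fn() -> Box<dyn ImplementedType>;

// Generic helper; each instantiation coerces to a plain function pointer.
fn construct<T: ImplementedType + Default + 'static>() -> Box<dyn ImplementedType> {
    Box::new(T::default())
}

// The manual registry: one line per implementor, maintained by hand.
fn load_implementors() -> HashMap<&'static str, Constructor> {
    let mut implementors: HashMap<&'static str, Constructor> = HashMap::new();
    implementors.insert("alice", construct::<Alice>);
    implementors.insert("bob", construct::<Bob>);
    implementors
}

fn main() {
    let registry = load_implementors();
    // Look up a constructor by the key a JSON file might contain.
    if let Some(make) = registry.get("bob") {
        println!("{}", make().name());
    }
}
```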

Why does `Drop::drop` take `&mut self` instead of `self`?

I am aware of a similar question that was once asked, but I am still rather perplexed after reading the accepted answer (to be clear, I do not see why the compiler does not have a special case for Drop::drop, as mentioned in the comments, since it already disallows moving out of an object that implements Drop). So let me please try to reiterate on this.
Suppose we have a struct Foo that owns some resources; say, a thread that has to be joined when the object goes out of scope (this situation can happen if we are writing an asynchronous runtime):
struct Foo {
    t: JoinHandle<()>,
}
Since std::ops::Drop::drop takes &mut self, with this definition we cannot really hope to join this thread in drop, because JoinHandle::join takes self by value. Therefore, we have to resort to making this field optional:
struct Foo {
    t: Option<JoinHandle<()>>,
}
And then we can go with if let Some(t) = self.t.take() { t.join(); }. But there still seem to be some unanswered questions here:
Is the drop signature hinting to us that the drop method may get called several times on some objects, or that drop may not be the last method that is called on an object during its lifetime? If yes, in which scenarios can this happen? Is it correct to write self.t.unwrap() in the example above?
How to achieve similar semantics without introducing the overhead of Option (which may not in all cases get covered by niche optimisation and is anyway rather verbose)?
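The Option-based workaround described above can be sketched as a complete program. The constructor and the AtomicBool flag are hypothetical additions, included only so that the effect of joining in drop is observable.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

struct Foo {
    // Option lets `drop` move the handle out even though it only has `&mut self`.
    t: Option<JoinHandle<()>>,
}

impl Foo {
    // Hypothetical constructor: the worker thread just sets a flag.
    fn new(done: Arc<AtomicBool>) -> Self {
        let t = thread::spawn(move || {
            done.store(true, Ordering::SeqCst);
        });
        Foo { t: Some(t) }
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        // take() leaves None behind and hands us the owned JoinHandle.
        if let Some(t) = self.t.take() {
            // join() returns Err if the worker panicked; it is ignored here
            // so that drop itself does not panic.
            let _ = t.join();
        }
    }
}

fn main() {
    let done = Arc::new(AtomicBool::new(false));
    let foo = Foo::new(Arc::clone(&done));
    drop(foo); // blocks until the worker thread has been joined
    assert!(done.load(Ordering::SeqCst));
    println!("worker joined");
}
```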

Is it idiomatic rust to accept arguments that `impl Borrow<T>` to abstract over references and values of T? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I find myself writing functions that accept arguments as Borrow<T> so that they take both values and references transparently.
Example:
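use std::borrow::Borrow;

// Copy requires Clone, so both must be derived
#[derive(Debug, Clone, Copy)]
struct Point {
    pub x: i32,
    pub y: i32,
}

pub fn manhattan<T, U>(p1: T, p2: U) -> i32
where
    T: Borrow<Point>,
    U: Borrow<Point>,
{
    let p1 = p1.borrow();
    let p2 = p2.borrow();
    // Manhattan distance is the sum of the absolute differences per axis
    (p1.x - p2.x).abs() + (p1.y - p2.y).abs()
}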
That can be useful for implementing std::ops traits like Add, which would otherwise require a lot of repetition to support references transparently.
Is this idiomatic? Are there drawbacks?
I think there are two parts to this question.
1. Is the Borrow trait the idiomatic way to abstract over ownership in Rust?
Yes. If what you intend is to write a function that takes either a Foo or a &Foo, F: Borrow<Foo> is the right bound to use. AsRef, on the other hand, has no blanket implementation for owned values (a type T does not automatically implement AsRef<T>), so it is usually only implemented for things that are reference-like.
2. Is it idiomatic in Rust to abstract over ownership at all?
Sometimes. This is an interesting question because there is a subtle but important distinction between a function like manhattan and how Borrow is idiomatically used.
In Rust, whether a function needs to own its arguments or merely borrow them is an important part of the function's interface. Rustaceans, as a rule, don't mind writing & in a function call because it's a syntactic marker of a relevant semantic fact about the function being called. A function that can accept either Point or &Point is no more generally useful than the one that can accept only &Point: if you have a Point, all you have to do is borrow it. So it's idiomatic to use the simpler signature that most accurately documents the type the function really needs: &Point.
But wait! There are other differences between those ways of accepting arguments. One difference is call overhead: a &Point will generally be passed in a single pointer-sized register, while a Point may be passed in multiple registers or on the stack, depending on the ABI. Another difference is code size: each unique instantiation of <T: Borrow<Point>> represents a monomorphization of the function, which bloats the binary. A third difference is drop order: if Point has destructors, a function that accepts T: Borrow<Point> will call Point::drop internally, while a function that accepts &Point will leave the object in place for the caller to deal with. Whether this is good or bad depends on the context; for performance, though, it's usually irrelevant (if you assume the Point will eventually be dropped anyway).
A function accepting T: Borrow<Point> suggests that it's doing something with T internally for which a mere &Point might be suboptimal. Drop order is probably the best reason for doing this (I wrote more about this in this answer, although the puts function I used as an example isn't a particularly strong one).
In the case of manhattan, drop order is irrelevant, because Point is Copy (Copy types cannot implement Drop, so they have no drop glue). So there is no performance advantage from accepting Point as well as &Point (and although a single function isn't likely to make much difference one way or another, if generics are used pervasively, the cost to code size may well be a disadvantage).
There is one more reason to avoid using generics unnecessarily: they interfere with type inference and can decrease the quality of error messages and suggestions from the compiler. For instance, imagine if Point only implemented Clone (not Copy) and you wrote manhattan(p, q) and then used p again later in the same function. The compiler would warn you that p was used after being moved into the function and suggest adding a .clone(). In fact, the better solution is to borrow p, and if manhattan takes references the compiler will enforce that you do just that.
The fact Point is small (so overhead to using it as a function argument is probably minimal) and Copy (so has no drop glue to worry about) raises another question: should manhattan simply accept Point and not use references at all? This is an opinion-based question and really it comes down to which better fits your mental model. Either accept &Point, and use & when a caller has an owned value, or accept Point, and use * when a caller has a reference - there is no hard and fast rule.
What is an appropriate use of Borrow, then?
The argument above strongly depends on the fact that references are easy to take anywhere, so you may as well take them concretely in the caller as abstractly inside the generic function. One time this is not the case is when the borrowed-or-owned type is not passed directly to the function, but wrapped in another generic data structure. Consider sorting a slice of Point-like things by their distance from (0, 0):
fn sort_by_radius<T: Borrow<Point>>(points: &mut [T]) {
    points.sort_by_key(|p| {
        let Point { x, y } = p.borrow();
        x * x + y * y
    });
}
In this case it's definitely not the case that the caller with a &mut [Point] can simply borrow it to get a &mut [&Point]. Yet we would like sort_by_radius to be able to accept both kinds of slices (without writing two functions) so Borrow<Point> comes to the rescue. The difference between sort_by_radius and your version of manhattan is that T is not being passed directly to the function to be immediately borrowed, but is a part of the type that sort_by_radius needs to treat like a Point in order to perform a task ultimately unrelated to borrowing (sorting a slice).
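As a rough check of the claim above, the following sketch calls sort_by_radius on both a slice of owned Points and a slice of &Point. The derives on Point and the explicit &Point annotation inside the closure are choices made for this illustration rather than part of the original answer.

```rust
use std::borrow::Borrow;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Works for any element type that can be viewed as a &Point.
fn sort_by_radius<T: Borrow<Point>>(points: &mut [T]) {
    points.sort_by_key(|p| {
        // The annotation makes the intended Borrow target explicit.
        let p: &Point = p.borrow();
        p.x * p.x + p.y * p.y // squared distance from (0, 0)
    });
}

fn main() {
    // A slice of owned values: T = Point.
    let mut owned = vec![Point { x: 3, y: 4 }, Point { x: 1, y: 0 }];
    sort_by_radius(&mut owned);

    // A slice of references: T = &Point.
    let a = Point { x: 0, y: 2 };
    let b = Point { x: 1, y: 1 };
    let mut borrowed = vec![&a, &b];
    sort_by_radius(&mut borrowed);

    println!("{owned:?}");
    println!("{borrowed:?}");
}
```

One generic function serves both callers; without Borrow, the caller holding &mut [Point] would have to build a separate collection of references first.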

How could I avoid cloning the entirety of a large struct to send to a thread when only parts are needed? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
My use case:
I have a large complex struct.
I want to take a snapshot of this struct to send it to a thread to do some calculation.
Many large fields within this struct are not necessary for the calculation.
Many fields within the struct are only partially required (a field may itself be a struct from which only a few parameters are needed).
At the moment I simply call .clone() and pass a clone of the entire struct to the thread.
It is difficult to give a good example, but this is a summary of my current method:
use tokio::task;

fn main() {
    let complex_struct = ComplexStruct::new(...);
    // some extra async stuff I don't think is 100% necessary to this question
    let future = async_function(complex_struct.clone()); // Ideally not cloning whole struct
    // some extra async stuff I don't think is 100% necessary to this question
}

fn async_function(complex_struct: ComplexStruct) -> task::JoinHandle<_> {
    task::spawn_blocking(move || {
        // bunch of work, then return something
    })
}
My current working idea is to have a separate struct such as ThreadData which is instantiated with ThreadData::new(ComplexStruct) and effectively clones only the required fields. I then pass ThreadData to the thread instead.
What is the best solution to this problem?
I think you've answered your own question. 😁 If you're just looking for validation, I believe a refactor to only the needed parts is a good idea. You may find ways to simplify your code, but the performance boost seems to be your reasoning. We can't see benchmarks on this, but perhaps you want to track that.
This part is just my opinion, but instead of ThreadData::new(), you could do ThreadData::from(), or better yet, impl From<ComplexStruct> for ThreadData {}. If it only has one purpose then it doesn't matter, but if ThreadData will ever be used in a different context, I like to keep the "new"/"from" functions available for a general instance. Otherwise I eventually have Struct::from_this(), Struct::from_that(), or Struct::from_some_random_input(). 😋
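A minimal sketch of the ThreadData idea follows, under several assumptions: the fields of ComplexStruct and Settings are invented for illustration, the conversion is implemented for &ComplexStruct (so the original is not consumed) rather than for ComplexStruct by value, and std::thread::spawn stands in for tokio's spawn_blocking to keep the example dependency-free.

```rust
use std::thread;

// Hypothetical stand-ins for the question's types.
struct Settings {
    scale: f64,
    label: String,
}

struct ComplexStruct {
    samples: Vec<f64>,  // needed by the calculation
    settings: Settings, // only `scale` is needed
    cache: Vec<u8>,     // large and not needed at all
}

// The snapshot carries only what the worker thread uses.
struct ThreadData {
    samples: Vec<f64>,
    scale: f64,
}

impl From<&ComplexStruct> for ThreadData {
    fn from(c: &ComplexStruct) -> Self {
        ThreadData {
            samples: c.samples.clone(), // clone only the required field
            scale: c.settings.scale,    // copy a single parameter out of a nested struct
        }
    }
}

fn spawn_calculation(data: ThreadData) -> thread::JoinHandle<f64> {
    thread::spawn(move || data.samples.iter().sum::<f64>() * data.scale)
}

fn main() {
    let complex = ComplexStruct {
        samples: vec![1.0, 2.0, 3.0],
        settings: Settings { scale: 2.0, label: String::from("demo") },
        cache: vec![0; 1024],
    };

    let handle = spawn_calculation(ThreadData::from(&complex));
    let total = handle.join().expect("worker thread panicked");

    // The original struct is still fully usable; `cache` was never cloned.
    println!("{}: {} ({} cached bytes)", complex.settings.label, total, complex.cache.len());
}
```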

Why can't Rust do more complex type inference? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
In Chapter 3 of The Rust Programming Language, the following code is used as an example for a kind of type inference that Rust cannot manage:
fn main() {
    let condition = true;
    let number = if condition { 5 } else { "six" };
    println!("The value of number is: {}", number);
}
with the explanation that:
Rust needs to know at compile time what type the number variable is, definitively, so it can verify at compile time that its type is valid everywhere we use number. Rust wouldn’t be able to do that if the type of number was only determined at runtime; the compiler would be more complex and would make fewer guarantees about the code if it had to keep track of multiple hypothetical types for any variable.
I'm not certain I understand the rationale, because the example does seem like something where a simple compiler could infer the type.
What exactly makes this kind of type inference so difficult? In this case, the value of condition can clearly be inferred at compile time (it's true), and thus the type of number can be too (it's i32?).
I can see how things could become a lot more complicated, if you were trying to infer types across multiple compilation units for instance, but is there something about this specific example that would add a lot of complexity to the compiler?
There are three main reasons I can think of:
1. Action at a distance effects
Let's suppose the language worked that way. Since we're extending type inference, we might as well make the language even smarter and have it infer return types as well. This allows me to write something like:
pub fn get_flux_capacitor() {
    let is_prod = true;
    if is_prod { FluxCapacitor::new() } else { MovieProp::new() }
}
And elsewhere in my project, I can get a FluxCapacitor by calling that function. However, one day, I change is_prod to false. Now, instead of getting an error that my function is returning the wrong type, I will get errors at every call site. A small change inside one function has led to errors in entirely unchanged files! That's pretty weird.
(If we don't want to add inferred return types, just imagine it's a very long function instead.)
2. Compiler internals exposed
What happens in the case where it's not so simple? Surely this should be the same as the above example:
pub fn get_flux_capacitor() {
    let is_prod = (1 + 1) == 2;
    ...
}
But how far does that extend? The compiler's constant propagation is mostly an implementation detail. You don't want the types in your program to depend on how smart this version of the compiler is.
3. What did you actually mean?
As a human looking at this code, it looks like something is missing. Why are you branching on true at all? Why not just write FluxCapacitor::new()? Perhaps there's logic missing to check and see if an env=DEV environment variable is missing. Perhaps a trait object should actually be used so that you can take advantage of runtime polymorphism.
In this kind of situation where you're asking the computer to do something that doesn't seem quite right, Rust often chooses to throw its hands up and ask you to fix the code.
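The trait-object alternative mentioned in point 3 might look like the sketch below. The Capacitor trait and its describe method are hypothetical; the original example names only FluxCapacitor and MovieProp. Because both branches produce the same declared type, Box<dyn Capacitor>, callers are unaffected when the condition changes.

```rust
// Hypothetical trait capturing what callers need from either type.
trait Capacitor {
    fn describe(&self) -> &'static str;
}

struct FluxCapacitor;
struct MovieProp;

impl Capacitor for FluxCapacitor {
    fn describe(&self) -> &'static str {
        "real flux capacitor"
    }
}

impl Capacitor for MovieProp {
    fn describe(&self) -> &'static str {
        "movie prop"
    }
}

// The return type is stated explicitly, so it no longer depends on
// which branch is taken; the choice is made at runtime.
fn get_flux_capacitor(is_prod: bool) -> Box<dyn Capacitor> {
    if is_prod {
        Box::new(FluxCapacitor)
    } else {
        Box::new(MovieProp)
    }
}

fn main() {
    for is_prod in [true, false] {
        println!("{}", get_flux_capacitor(is_prod).describe());
    }
}
```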
You're right, in this very specific case (where condition=true statically), the compiler could be made able to detect that the else branch is unreachable and therefore number must be 5.
This is just a contrived example, though... in the more general case, the value of condition would only be dynamically known at runtime.
It's in that case, as others have said, that inference becomes hard to implement.
On that topic, there are two things I haven't seen mentioned yet.
1. The Rust language design tends to err on the side of doing things as explicitly as possible
2. Rust type inference is only local
On point #1, the explicit way for Rust to deal with the "this type can be one of multiple types" use case are enums.
You can define something like this:
#[derive(Debug)]
enum Whatsit {
    Num(i32),
    Text(&'static str),
}
and then do let number = if condition { Whatsit::Num(5) } else { Whatsit::Text("six") };
On point #2, let's see how the enum (while wordier) is the preferred approach in the language. In the example from the book we just try printing the value of number.
In a more real-case scenario we would at one point use number for something other than printing.
This means passing it to another function or including it in another type. Or (to even enable use of println!) implementing the Debug or Display traits on it. Local inference means that (if you can't name the type of number in Rust), you would not be able to do any of these things.
Suppose you want to create a function that does something with a number; with the enum you would write:
fn do_something(number: Whatsit)
but without it...
fn do_something(number: /* what type is this? */)
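Putting the enum approach together, a runnable sketch might look like this. The Display implementation and the pick helper are additions made for illustration; Whatsit and do_something follow the names used above.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum Whatsit {
    Num(i32),
    Text(&'static str),
}

// Because the type has a name, traits can be implemented for it.
impl fmt::Display for Whatsit {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Whatsit::Num(n) => write!(f, "{n}"),
            Whatsit::Text(s) => write!(f, "{s}"),
        }
    }
}

// Both branches now have the same type, so the `if` expression type-checks.
fn pick(condition: bool) -> Whatsit {
    if condition {
        Whatsit::Num(5)
    } else {
        Whatsit::Text("six")
    }
}

// The named type can appear in other function signatures.
fn do_something(number: Whatsit) -> String {
    format!("The value of number is: {number}")
}

fn main() {
    println!("{}", do_something(pick(true)));
    println!("{}", do_something(pick(false)));
}
```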
In a nutshell, you're right that in principle it IS doable for the compiler to synthesize a type for number. For instance, the compiler might create an anonymous enum like Whatsit above when compiling that code.
But you - the programmer - would not know the name of that type, would not be able to refer to it, wouldn't even know what you can do with it (can I multiply two "numbers"?) and this would greatly limit its usefulness.
A similar approach was followed, for instance, to add closures to the language. The compiler knows what specific type a closure has, but you, the programmer, do not. If you're interested I can try to find the discussions on the difficulties that this approach introduced in the design of the language.
