I have been playing around with overloading some operators and ran into a situation that I don't quite understand. When implementing the PartialEq trait for my struct Value, I noticed that the implementation below works and doesn't move the values: I can keep using them after the == operator, even though I don't pass references into the operator.
On the other hand, this doesn't work for the implementation of the Neg trait (or Add, Sub, etc.). In order to use the - operator without moving the value, I have to implement the Neg trait on references to the Value struct.
Why can I implement the PartialEq trait without having to worry about a move when not passing in a reference to the values, but when implementing the Neg trait I do need to worry? Am I implementing the Neg trait incorrectly? Is there a subtlety to the PartialEq trait that I am overlooking?
Here is my code:
use std::ops::Neg;

struct Value {
    x: i32,
}

impl PartialEq for Value {
    fn eq(&self, other: &Value) -> bool {
        if self.x == other.x {
            true
        } else {
            false
        }
    }
}

impl Eq for Value {}

impl Neg for &Value {
    type Output = Value;

    fn neg(self) -> Self::Output {
        Value {
            x: -self.x,
        }
    }
}

fn main() {
    let v1: Value = Value { x: 1 };
    let v2: Value = Value { x: 2 };
    let equal = v1 == v2; // Not passing a reference, but also able to use v1 afterwards
    let v3 = -&v1;
    let v4 = -&v1; // Works because I am passing a reference. If I change the implementation
                   // to `impl Neg for Value` and remove the references here and on the line
                   // above (for v3), the compiler complains that v1 has been moved (as expected).
}
Is there a subtlety to the PartialEq trait that I am overlooking?
PartialEq's methods take self and other by reference (&self and other: &Rhs in the signature), while Neg, Add, Sub, etc. take self and (for the binary operators) the right-hand operand by value (self and rhs: Rhs in the signature). v1 == v2 desugars to PartialEq::eq(&v1, &v2), while -v1 desugars to Neg::neg(v1).
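Spelled out with the definitions from the question (and its use std::ops::Neg import) in scope, the operator forms and the explicit trait calls are equivalent; this sketch could replace the body of the main above:
let v1 = Value { x: 1 };
let v2 = Value { x: 2 };

let equal = v1 == v2;                 // sugar for...
let equal2 = PartialEq::eq(&v1, &v2); // ...this: only &Value is passed, nothing moves

let v3 = -&v1;                        // with `impl Neg for &Value`, sugar for...
let v4 = Neg::neg(&v1);               // ...this: again only a reference is passed

// Had Neg been implemented for `Value` instead, `-v1` would be `Neg::neg(v1)`
// and would move `v1`.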
The reason why you might want Neg to take ownership of the passed value is if the value has allocated dynamic memory (via Box, Vec, etc.). In that case, it might be more efficient to mutate self and then return self (or another object reusing the dynamic memory in the case where the Output type is different from the Self type) instead of allocating a new object (which would require new dynamic memory allocations), even if the original value is not used after the operation.
On the other hand, PartialEq's methods always return a bool. A bool doesn't allocate any dynamic memory, therefore there is no gain in passing the parameters by value. It's not expected that testing whether two objects are equal would need to mutate either or both of the objects, hence why the parameters are shared references.
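To make the allocation-reuse argument concrete, here is a minimal sketch with a made-up BigVector type (not from the question) whose Neg implementation negates in place and hands back the same heap buffer instead of allocating a new one:
use std::ops::Neg;

// Hypothetical type owning heap memory, purely for illustration.
struct BigVector {
    elems: Vec<i64>,
}

impl Neg for BigVector {
    type Output = BigVector;

    // Taking `self` by value lets us reuse the existing Vec allocation.
    fn neg(mut self) -> BigVector {
        for e in &mut self.elems {
            *e = -*e;
        }
        self
    }
}

fn main() {
    let v = BigVector { elems: vec![1, -2, 3] };
    let negated = -v; // moves `v`, but no new heap allocation happens
    assert_eq!(negated.elems, vec![-1, 2, -3]);
}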
Am I implementing the Neg trait incorrectly?
No, but you might want to consider implementing Neg for both Value and &Value (especially if you're writing a library for others to use).
If your type is cheap to copy (i.e. it's small and doesn't use dynamic memory), consider implementing Clone and Copy (possibly by deriving them) as well. That way, values passed to the operators are copied rather than moved, so you can keep using them afterwards.
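As a sketch of both suggestions, the two Neg implementations can share the same logic, and deriving Clone and Copy (reasonable here, since Value is just an i32) lets -v1 work without consuming v1:
use std::ops::Neg;

#[derive(Clone, Copy, PartialEq, Eq)]
struct Value {
    x: i32,
}

impl Neg for Value {
    type Output = Value;

    fn neg(self) -> Value {
        Value { x: -self.x }
    }
}

// Forward the reference implementation to the by-value one.
impl Neg for &Value {
    type Output = Value;

    fn neg(self) -> Value {
        -*self
    }
}

fn main() {
    let v1 = Value { x: 1 };
    let a = -v1;  // copies v1, because Value is Copy
    let b = -&v1; // also works, through the reference impl
    assert!(a == b);
    assert!(v1 == Value { x: 1 }); // v1 is still usable
}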
Related
I am currently learning Rust for fun. I have some experience in C/C++ and in other programming languages that use more complex paradigms like generics.
Background
For my first project (after the tutorial), I wanted to create an N-dimensional array (or matrix) data structure to practice development in Rust.
Here is what I have so far for my Matrix struct and basic fill and new constructors.
Forgive the absent bounds checking and parameter validation.
pub struct Matrix<'a, T> {
    data: Vec<Option<T>>,
    dimensions: &'a [usize],
}

impl<'a, T: Clone> Matrix<'a, T> {
    pub fn fill(dimensions: &'a [usize], fill: T) -> Matrix<'a, T> {
        let mut total = if dimensions.len() > 0 { 1 } else { 0 };
        for dim in dimensions.iter() {
            total *= dim;
        }
        Matrix {
            data: vec![Some(fill); total],
            dimensions: dimensions,
        }
    }

    pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
        ...
        Matrix {
            data: vec![None; total],
            dimensions: dimensions,
        }
    }
}
I wanted the ability to create an "empty" N-dimensional array using the new fn. I thought the Option enum would be the best way to accomplish this, as I can fill the array with None and it will allocate space for the generic T automatically.
So then it comes down to being able to set the entries. I found the Index and IndexMut traits, which looked like they would let me do something like m[&[2, 3]] = 23. Since the logic is similar for both, here is the IndexMut impl for Matrix.
impl<'a, T> ops::IndexMut<&[usize]> for Matrix<'a, T> {
    fn index_mut(&mut self, indices: &[usize]) -> &mut Self::Output {
        match self.data[get_matrix_index(self.dimensions, indices)].as_mut() {
            Some(x) => x,
            None => {
                // NOT SURE WHAT TO DO HERE.
            }
        }
    }
}
Ideally what would happen is that the value (if present) would be changed, i.e.
let mut mat = Matrix::fill(&[4, 4], 0);
mat[&[2, 3]] = 23;
This would set the value from 0 to 23 (which the above fn does by returning &mut x from the Some(x) arm). But I also want this to work when the slot is None, i.e.
let mut mat = Matrix::new(&[4, 4]);
mat[&[2, 3]] = 23;
Question
Finally, is there a way to make m[&[2, 3]] = 23 possible, given how the Vec struct allocates its memory? If not, what should I change, and how can I still have an array with "empty" spots? Open to any suggestions, as I am trying to learn. :)
Final Thoughts
Through my research into the Vec struct impls, I see that the type T has to be Sized. This could be useful for allocating the Vec with the appropriate size, via something like vec![a null placeholder of T's size; total], but I am unsure how to do this.
There are a few ways to make this more idiomatic Rust, but first, let's look at why the None branch doesn't make sense.
I'm going to assume the Output type for Index is T, so index_mut has to return &mut T; you don't show the Index definition, but I feel safe in that assumption. The type &mut T means a mutable reference to an initialized T, unlike pointers in C/C++, which can point to initialized or uninitialized memory. This means you have to return a reference to an initialized T, which the None branch cannot do because there is no initialized value. This leads to the first of the more idiomatic ways.
Return an Option<T>
The easiest way would be to change Index::Output to Option<T>. This is better because the user can decide what to do if there was no value there before, and it is close to what you are actually storing. You can then also remove the panic in your index method and let the caller choose what to do when there is no value. At this point, I think you can go a little further and tidy up the structure, which leads to the next option.
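Here is a minimal sketch of that option, reusing the Matrix layout from the question; the question's get_matrix_index helper isn't shown, so the row-major version below is purely illustrative:
use std::ops;

pub struct Matrix<'a, T> {
    data: Vec<Option<T>>,
    dimensions: &'a [usize],
}

// Illustrative row-major flattening; the question's own helper is not shown.
fn get_matrix_index(dimensions: &[usize], indices: &[usize]) -> usize {
    indices
        .iter()
        .zip(dimensions)
        .fold(0, |acc, (&i, &dim)| acc * dim + i)
}

impl<'a, T> ops::Index<&[usize]> for Matrix<'a, T> {
    type Output = Option<T>;

    fn index(&self, indices: &[usize]) -> &Option<T> {
        &self.data[get_matrix_index(self.dimensions, indices)]
    }
}

impl<'a, T> ops::IndexMut<&[usize]> for Matrix<'a, T> {
    fn index_mut(&mut self, indices: &[usize]) -> &mut Option<T> {
        &mut self.data[get_matrix_index(self.dimensions, indices)]
    }
}

fn main() {
    let dims = [4usize, 4];
    let mut mat = Matrix { data: vec![None; 16], dimensions: &dims };
    let idx: &[usize] = &[2, 3];
    mat[idx] = Some(23); // works even when the slot was previously None
    assert_eq!(mat[idx], Some(23));
}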
Store a T directly
This method lets the caller decide what type is actually stored, rather than it always being wrapped in an Option. It cleans up most of your indexing code nicely, since you just access what's already stored. The main problem is now initialization: how do you represent uninitialized values? You were correct that Option is the best way to do this1, but now the caller can opt into that capability by storing an Option themselves. That means the matrix can always store initialized Ts without losing any functionality. This really only changes your new function, which no longer fills with None values. My suggestion here is to add a T: Default bound for the new function2:
impl<'a, T: Default> Matrix<'a, T> {
    // Assumes the struct field is now `data: Vec<T>`, per "Store a T directly".
    pub fn new(dimensions: &'a [usize]) -> Matrix<'a, T> {
        // Total number of elements, computed as in `fill`.
        let mut total = if dimensions.len() > 0 { 1 } else { 0 };
        for dim in dimensions.iter() {
            total *= dim;
        }
        Matrix {
            data: (0..total).map(|_| T::default()).collect(),
            dimensions: dimensions,
        }
    }
}
This approach is much more common in the Rust world and lets the caller choose whether to allow "uninitialized" values. Option<T> also implements Default for any T (returning None), so the functionality is very similar to what you have currently.
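As an illustrative usage sketch (assuming the new above plus Index/IndexMut impls whose Output is T, which aren't shown here), a caller opts into "empty" slots simply by choosing an Option element type:
fn demo() {
    let dims = [4usize, 4];
    let idx: &[usize] = &[2, 3];

    // A caller who wants "empty" slots chooses Option as the element type.
    let mut sparse: Matrix<Option<i32>> = Matrix::new(&dims);
    sparse[idx] = Some(23);

    // A caller who never needs empty slots stores the values directly.
    let mut dense: Matrix<i32> = Matrix::new(&dims);
    dense[idx] = 23;
}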
Additional Info
As you're new to Rust, there are a few comments I can make about traps I've fallen into before. To start, your struct holds a reference to the dimensions, with a lifetime. This means your structs cannot outlive the dimensions object they were created from. That hasn't caused you a problem so far, because all you've been passing are statically created dimensions: dimensions typed into the code and stored in static memory. This gives your object a 'static lifetime, but that won't be the case if you use dynamic dimensions.
How else can you store the dimensions so that your object always has a 'static lifetime (i.e. no lifetime parameter at all)? Since you want an N-dimensional array, stack allocation is out of the question: the size of a stack array must be known at compile time (const in Rust). This means you have to use the heap, which leaves two real options: Box<[usize]> or Vec<usize>. Box essentially says "this lives on the heap" and gives a Sized handle to a ?Sized value; Vec is a little more self-explanatory and adds the ability to resize at the cost of a little overhead. Either lets your matrix own its dimensions and drop the lifetime.
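Here is a sketch of the owned-dimensions variant using Vec<usize> (Box<[usize]> works the same way); note that the struct no longer needs a lifetime parameter:
pub struct Matrix<T> {
    data: Vec<Option<T>>,
    dimensions: Vec<usize>, // owned storage: no borrow, no lifetime parameter
}

impl<T: Clone> Matrix<T> {
    pub fn fill(dimensions: &[usize], fill: T) -> Matrix<T> {
        let mut total = if dimensions.len() > 0 { 1 } else { 0 };
        for dim in dimensions.iter() {
            total *= dim;
        }
        Matrix {
            data: vec![Some(fill); total],
            dimensions: dimensions.to_vec(), // copy the dimensions into owned memory
        }
    }
}

fn main() {
    // The dimensions can now come from data that lives less long than the matrix.
    let dims = vec![2usize, 3];
    let m = Matrix::fill(&dims, 0);
    drop(dims); // fine: the matrix owns its own copy of the dimensions
    assert_eq!(m.data.len(), 6);
}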
1. The other way to represent this without Option<T>'s discriminant is MaybeUninit<T>, which is unsafe territory. It lets you have a chunk of uninitialized memory big enough to hold a T and then unsafely assume it has been initialized. This can cause a lot of problems and is usually not worth it, as Option is already heavily optimized: if it stores a type containing a non-nullable pointer, the compiler stores the discriminant in whether or not that pointer is null.
2. The reason this section doesn't just use vec![T::default(); total] is that it would require T: Clone: the way that macro works, the first expression is evaluated once and then cloned until there are enough values. That's an extra requirement we don't need, so the interface is smoother without it.
I want to write an LRU Cache with a memory size limitation rather than the "number of objects" limitation in std. After trying to figure it out for myself, I cheated and looked at an existing implementation, and I almost understand it, but this stops me:
use std::hash::{Hash, Hasher};

struct LruKeyRef<K> {
    k: *const K,
}

impl<K: Hash> Hash for LruKeyRef<K> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        unsafe { (*self.k).hash(state) }
    }
}

impl<K: PartialEq> PartialEq for LruKeyRef<K> {
    fn eq(&self, other: &LruKeyRef<K>) -> bool {
        unsafe { (*self.k).eq(&*other.k) }
    }
}
It's that last unsafe line that I don't understand. I'm using a HashMap as the underlying structure, the key is stored with the value, and I want the hasher to be able to find it. I make the working hash key a reference to the real key and provide Hash and PartialEq functions such that the HashMap can find and use the key for bucketing purposes. That's easy.
I understand then that I have to compare the two for PartialEq, and so it makes sense to me that I have to use *self.k to dereference the current object, so why &*other.k for the other object? That's what I don't understand. Why isn't it just *other.k? Aren't I just dereferencing both so I can compare the actual keys?
We wish to call PartialEq::eq:
trait PartialEq<Rhs = Self>
where
Rhs: ?Sized,
{
fn eq(&self, other: &Rhs) -> bool;
}
Assuming the default type parameter, where Rhs = Self and Self = K, we need to end up with two values of type &K:
other.k is of type *const K
*other.k is of type K
&*other.k is of type &K
This much should hopefully make sense.
self.k is of type *const K
*self.k is of type K
The piece that's missing is that method calls are allowed to automatically reference the value they are called on. This is why Rust has no distinct method-call syntax for a value versus a reference, as C and C++ have (foo.bar() vs foo->bar()).
Thus, the K is automatically referenced to get &K, fulfilling the signature.
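A tiny illustration of that auto-referencing with plain i32 values (nothing from the LRU code):
fn main() {
    let a: i32 = 5;
    let b: i32 = 5;

    // `eq` takes `&self`, yet we can call it on a value:
    // the receiver is automatically referenced.
    assert!(a.eq(&b));

    // The argument position gets no such help; `a.eq(b)` would not compile,
    // the `&` has to be written explicitly.
    assert!(PartialEq::eq(&a, &b)); // fully explicit form
}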
impl<K: PartialEq> PartialEq for LruKeyRef<K> {
fn eq(&self, other: &LruKeyRef<K>) -> bool {
unsafe { (*self.k).eq(&*other.k) }
}
}
Under typical circumstances, we can call methods taking &self with just a reference to the object. In addition, a chain of references to the object is also implicitly coerced. That is, we can write:
let a: &str = "I'm a static string";
assert_eq!(a.len(), 19);
assert_eq!((&&&&a).len(), 19);
In your case however, we start with a pointer, which must be explicitly dereferenced inside an unsafe scope. Here are the types of all relevant expressions:
self.k : *const K
(*self.k) : K
other.k : *const K
&*other.k : &K
Since eq takes a reference as its right-hand operand, we must pass a reference. Unlike in C++, you cannot pass an lvalue as a reference without making the reference-passing explicit, nor can you pass an rvalue to a const reference. You can, however, prepend & to a literal in order to obtain a reference to it (foo(&5)). It only appears asymmetrical because (in a way) self.k is the caller and other.k is the callee.
The struct std::vec::Vec implements two kinds of Extend, as specified here – impl<'a, T> Extend<&'a T> for Vec<T> and impl<T> Extend<T> for Vec<T>. The documentation states that the first kind is an "Extend implementation that copies elements out of references before pushing them onto the Vec". I'm rather new to Rust, and I'm not sure if I'm understanding it correctly.
I would guess that the first kind is used with the equivalent of C++ normal iterators, and the second kind is used with the equivalent of C++ move iterators.
I'm trying to write a function that accepts any data structure that will allow inserting i32s to the back, so I take a parameter that implements both kinds of Extend, but I can't figure out how to specify the generic parameters to get it to work:
fn main() {
let mut vec = std::vec::Vec::<i32>::new();
add_stuff(&mut vec);
}
fn add_stuff<'a, Rec: std::iter::Extend<i32> + std::iter::Extend<&'a i32>>(receiver: &mut Rec) {
let x = 1 + 4;
receiver.extend(&[x]);
}
The compiler complains that &[x] "creates a temporary which is freed while still in use" which makes sense because 'a comes from outside the function add_stuff. But of course what I want is for receiver.extend(&[x]) to copy the element out of the temporary array slice and add it to the end of the container, so the temporary array will no longer be used after receiver.extend returns. What is the proper way to express what I want?
From the outside of add_stuff, Rec must be able to be extended with a reference whose lifetime is chosen inside add_stuff. Thus, you could require that Rec be extendable with references of any lifetime, using higher-ranked trait bounds:
fn main() {
let mut vec = std::vec::Vec::<i32>::new();
add_stuff(&mut vec);
}
fn add_stuff<Rec>(receiver: &mut Rec)
where
for<'a> Rec: std::iter::Extend<&'a i32>
{
let x = 1 + 4;
receiver.extend(&[x]);
}
Moreover, as you can see, the original trait bounds were tighter than necessary: one of them is enough, as long as you use receiver consistently within add_stuff.
That said, I would simply require Extend<i32> and make sure that add_stuff does the right thing internally (if possible):
fn add_stuff<Rec>(receiver: &mut Rec)
where
Rec: std::iter::Extend<i32>
{
let x = 1 + 4;
receiver.extend(std::iter::once(x));
}
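A small usage sketch, assuming the Extend<i32> version of add_stuff above is in scope: because many std collections implement Extend<i32>, the same function works unchanged for a VecDeque or a HashSet.
use std::collections::{HashSet, VecDeque};

// Assumes the `add_stuff` with the `Rec: Extend<i32>` bound from above.
fn main() {
    let mut vec: Vec<i32> = Vec::new();
    let mut deque: VecDeque<i32> = VecDeque::new();
    let mut set: HashSet<i32> = HashSet::new();

    add_stuff(&mut vec);
    add_stuff(&mut deque);
    add_stuff(&mut set);

    assert_eq!(vec, [5]);
    assert_eq!(deque.front(), Some(&5));
    assert!(set.contains(&5));
}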
I noticed that Box<T> implements everything that T implements and can be used transparently. For Example:
let mut x: Box<Vec<u8>> = Box::new(Vec::new());
x.push(5);
I would like to be able to do the same.
This is one use case:
Imagine I'm writing functions that operate on an X axis and a Y axis. I'm using numeric values to change those axes, where each value belongs to only one axis or the other.
I would like the compiler to fail if I attempt an operation that mixes values from the wrong axes.
Example:
let x = AxisX(5);
let y = AxisY(3);
let result = x + y; // error: incompatible types
I can do this by making a struct that will wrap the numbers:
struct AxisX(i32);
struct AxisY(i32);
But that won't give me access to all the methods that i32 provides like abs(). Example:
x.abs() + 3 // error: abs() does not exist
// ...maybe another error because I don't implement the addition...
Another possible use case:
You can wrap a struct from another library and implement or derive anything extra you want. For example, a struct that doesn't derive Debug could be wrapped in a newtype that adds a Debug implementation.
You are looking for std::ops::Deref:
In addition to being used for explicit dereferencing operations with the (unary) * operator in immutable contexts, Deref is also used implicitly by the compiler in many circumstances. This mechanism is called 'Deref coercion'. In mutable contexts, DerefMut is used.
Further:
If T implements Deref<Target = U>, and x is a value of type T, then:
In immutable contexts, *x on non-pointer types is equivalent to *Deref::deref(&x).
Values of type &T are coerced to values of type &U
T implicitly implements all the (immutable) methods of the type U.
For more details, visit the chapter in The Rust Programming Language as well as the reference sections on the dereference operator, method resolution and type coercions.
By implementing Deref it will work:
use std::ops::Deref;

impl Deref for AxisX {
    type Target = i32;

    fn deref(&self) -> &i32 {
        &self.0
    }
}

x.abs() + 3
You can see this in action on the Playground.
However, if you call methods of the underlying type (i32 in this case), the return type stays the underlying type. Therefore (assuming an analogous Deref impl for AxisY)
assert_eq!(AxisX(10).abs() + AxisY(20).abs(), 30);
will pass. To solve this, you may override the methods you need:
impl AxisX {
    pub fn abs(&self) -> Self {
        // *self gets you `AxisX`
        // **self dereferences to i32
        AxisX((**self).abs())
    }
}
With this, the above code fails. Take a look at it in action.
I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
fn add(&self, other: &Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100 element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though.) I am interested in knowing a good rule of thumb for struct size and passing by value vs struct, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
type Output = Vector;
fn add(self, other: &'b Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
In its definition, Add::add always takes self by value. But references are types like any other1, so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
1 Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
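Putting it together with a complete Vector (the field types are assumed to be f64 here, since the question doesn't show them), both operands stay usable after the addition:
use std::ops::Add;

struct Vector {
    x: f64,
    y: f64,
}

impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Vector { x: 1.0, y: 2.0 };
    let b = Vector { x: 3.0, y: 4.0 };

    let sum = &a + &b; // only the references are passed by value, not the vectors
    assert_eq!(sum.x, 4.0);
    assert_eq!(sum.y, 6.0);

    // `a` and `b` were not moved, so they are still usable here.
    assert_eq!(a.x + b.y, 5.0);
}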
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In the Rust standard library itself, this is done through an internal macro.
Luckily, there is a Rust crate, impl_ops, that offers a macro to write that boilerplate for us: impl_op_ex!, which generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
use std::ops;

// Struct as implied by the sample: a `bananas: i32` field and a `new` constructor.
struct DonkeyKong {
    bananas: i32,
}

impl DonkeyKong {
    fn new(bananas: i32) -> DonkeyKong {
        DonkeyKong { bananas }
    }
}

impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });

fn main() {
    let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
    let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
    assert_eq!(6, total_bananas);
}
Even better, there is an impl_op_ex_commutative! macro that will also generate the operators with the parameters reversed, if your operator happens to be commutative.