Consider the following Rust function, which is meant to indicate whether the given 3-byte string is equal to b"foo".
fn is_foo(value: [u8; 3]) -> bool {
    value == b"foo"
}
This doesn't work:
error[E0277]: can't compare [u8; 3] with &[u8; 3]
The compiler is complaining that it can't compare a value of some type to a reference of the same type.
I found two ways of getting the equality check to work:
Turning the value into a ref first: &value == b"foo"
Turning the ref into a value first: value == *b"foo"
Coming from C++ (where a value and a reference are pretty much the same thing), both approaches look a bit strange to me. What is the most idiomatic way of comparing a value and a reference?
In Rust, T and &T (and also &mut T) are different types. Values of different types can be compared, but that is not the default. Equality comparison (the == operator) is provided by the PartialEq trait, which looks like this:
pub trait PartialEq<Rhs = Self> where
Rhs: ?Sized,
{
fn eq(&self, other: &Rhs) -> bool;
fn ne(&self, other: &Rhs) -> bool { ... }
}
And although it is generic over Rhs (meaning that one type can potentially be compared with many other types), it defaults to Self.
So, coming back to your example, you could just write value.eq(b"foo") (which compares the values behind the references). That is not wrong, but &value == b"foo" is probably more common. Dereferencing is fine too, but I seldom see it.
You also don't have to worry that equality between types and their references will differ, because the standard library has the following blanket implementation (and similar ones for mutable references):
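To make the three spellings concrete, here is a minimal sketch that puts them side by side. The function body is only an illustration (it checks the same thing three times on purpose); nothing here goes beyond the standard library.

```rust
/// Returns true when `value` holds the bytes of "foo".
fn is_foo(value: [u8; 3]) -> bool {
    // `b"foo"` has type `&'static [u8; 3]`, so each line below makes
    // both sides of the comparison agree on "value" or "reference".
    let by_ref = &value == b"foo"; // &[u8; 3] == &[u8; 3]
    let by_value = value == *b"foo"; // [u8; 3] == [u8; 3]
    let by_method = value.eq(b"foo"); // PartialEq::eq(&value, b"foo")

    // All three spellings perform the same byte-wise comparison.
    by_ref && by_value && by_method
}
```

All three forms end up in the same PartialEq implementation for [u8; 3], so the choice is purely stylistic.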
impl<A, B> PartialEq<&B> for &A where
A: PartialEq<B> + ?Sized,
B: ?Sized,
{ ... }
That automatically implements equality for references in a proper way.
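As a small sketch of what that blanket implementation gives you, the hypothetical Tag type below (not from the question) defines a custom, case-insensitive equality once, for values only. Comparing two references then yields the same answer as comparing the values behind them.

```rust
// Hypothetical type with a hand-written, case-insensitive equality.
struct Tag(String);

impl PartialEq for Tag {
    fn eq(&self, other: &Tag) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

/// Compares once through the values and once through the references.
fn compare_both_ways(a: &Tag, b: &Tag) -> (bool, bool) {
    let via_values = *a == *b; // Tag == Tag, uses the impl above
    let via_refs = a == b; // &Tag == &Tag, uses the blanket impl for references
    (via_values, via_refs)
}
```

Note that *a == *b does not move anything out of the references, because == only borrows its operands.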
I want to write an LRU Cache with a memory size limitation rather than the "number of objects" limitation in std. After trying to figure it out for myself, I cheated and looked at an existing implementation, and I almost understand it, but this stops me:
struct LruKeyRef<K> {
    k: *const K,
}
impl<K: Hash> Hash for LruKeyRef<K> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        unsafe { (*self.k).hash(state) }
    }
}
impl<K: PartialEq> PartialEq for LruKeyRef<K> {
    fn eq(&self, other: &LruKeyRef<K>) -> bool {
        unsafe { (*self.k).eq(&*other.k) }
    }
}
It's that last unsafe line that I don't understand. I'm using a HashMap as the underlying structure, the key is stored with the value, and I want the hasher to be able to find it. I make the working hash key a reference to the real key and provide Hash and PartialEq functions such that the HashMap can find and use the key for bucketing purposes. That's easy.
I understand then that I have to compare the two for PartialEq, and so it makes sense to me that I have to use *self.k to dereference the current object, so why &*other.k for the other object? That's what I don't understand. Why isn't it just *other.k? Aren't I just dereferencing both so I can compare the actual keys?
We wish to call PartialEq::eq:
trait PartialEq<Rhs = Self>
where
Rhs: ?Sized,
{
fn eq(&self, other: &Rhs) -> bool;
}
Assuming the default implementation where Rhs = Self and Self = K, we need to end up with two &K types
other.k is of type *const K
*other.k is of type K
&*other.k is of type &K
This much should hopefully make sense.
self.k is of type *const K
*self.k is of type K
The piece that's missing is that method calls are allowed to automatically reference the value they are called on. This is why there's no distinct syntax for a reference and a value, as there would be in C or C++ (foo.bar() vs foo->bar()).
Thus, the K is automatically referenced to get &K, fulfilling the signature.
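The type walk-through above can be sketched as a small self-contained program. The keys_equal helper is hypothetical and exists only to build two LruKeyRef values from ordinary references, which keeps the raw pointers valid for the duration of the comparison.

```rust
struct LruKeyRef<K> {
    k: *const K,
}

impl<K: PartialEq> PartialEq for LruKeyRef<K> {
    fn eq(&self, other: &LruKeyRef<K>) -> bool {
        // SAFETY (assumption of this sketch): both pointers refer to live keys.
        unsafe {
            // The method call auto-references `*self.k` to `&K`;
            // the argument has to be turned into `&K` by hand.
            let via_method = (*self.k).eq(&*other.k);
            // Fully explicit form: both sides are written as `&K`.
            let via_function = PartialEq::eq(&*self.k, &*other.k);
            // The `==` operator borrows both operands itself.
            let via_operator = *self.k == *other.k;
            via_method && via_function && via_operator
        }
    }
}

/// Hypothetical helper: wraps two borrowed keys and compares the wrappers.
fn keys_equal<K: PartialEq>(a: &K, b: &K) -> bool {
    let left = LruKeyRef { k: a as *const K };
    let right = LruKeyRef { k: b as *const K };
    left == right
}
```

The three spellings inside eq are equivalent; the original code simply uses the first one.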
impl<K: PartialEq> PartialEq for LruKeyRef<K> {
fn eq(&self, other: &LruKeyRef<K>) -> bool {
unsafe { (*self.k).eq(&*other.k) }
}
}
Under typical circumstances, we can call methods taking &self with just a reference to the object. In addition, a chain of references to the object is also implicitly coerced. That is, we can write:
let a: &str = "I'm a static string";
assert_eq!(a.len(), 19);
assert_eq!((&&&&a).len(), 19);
In your case however, we start with a pointer, which must be explicitly dereferenced inside an unsafe scope. Here are the types of all relevant expressions:
self.k : *const K
(*self.k) : K
other.k : *const K
&*other.k : &K
Since eq takes a reference as its right-hand operand, we must make it a reference. Unlike in C++, you cannot just pass an lvalue as a reference without making this reference-passing explicit, nor can you pass an rvalue to a const reference. You can, however, prepend & to a literal in order to obtain a reference to it (foo(&5)). It only appears asymmetrical because (in a way) self.k is the caller and other.k is the callee.
I have the following code:
fn example(known_primes: &[i32], number: i32, prime: i32, limit: i32) {
    let mut is_prime = true;
    for prime in known_primes {
        if number % prime == 0 {
            is_prime = false;
            break;
        }
        if *prime > limit {
            break;
        }
    }
}
Why do I need to dereference prime in the second condition (*prime > limit), when I don't need to do so in the first one (number % prime == 0)?
Both % and < are operators that take two numbers and return something. The only difference seems to be in what they return (a number vs. a boolean). While "Why isn't it possible to compare a borrowed integer to a literal integer?" does explain what would be required to make the code work (implementations for all overloads, ideally in the standard library), it does not say why it does work for a % b. Is there a fundamental difference between these operators? Or is it just not implemented yet?
Comparison operators actually do behave differently than arithmetic operators. The difference becomes obvious when looking at the trait definitions. As an example, here is the PartialEq trait
pub trait PartialEq<Rhs = Self>
where
Rhs: ?Sized,
{
fn eq(&self, other: &Rhs) -> bool;
fn ne(&self, other: &Rhs) -> bool { ... }
}
and the Add trait
pub trait Add<RHS = Self> {
type Output;
fn add(self, rhs: RHS) -> Self::Output;
}
We can see that comparison traits take the operands by reference, while the arithmetic traits take the operands by value. This difference is reflected in how the compiler translates operator expressions:
a == b ==> std::cmp::PartialEq::eq(&a, &b)
a + b ==> std::ops::Add::add(a, b)
The operands of comparisons are evaluated as place expressions, so they can never move values. Operands of arithmetic operators, on the other hand, are evaluated as value expressions, so they are moved or copied depending on whether the operand type is Copy.
If we implement PartialEq for the type A, we can not only compare A and A, but also &A and &A, thanks to the standard library's blanket implementation of PartialEq for references (operators do not apply deref coercion to their operands). For Add, on the other hand, we need a separate implementation to be able to add &A and &A.
I can't answer why the standard library implements the "mixed" versions for reference and value for arithmetic operators, but not for comparisons. I can't see a fundamental reason why the latter can't be done.
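The different desugaring is easiest to see with a type that is not Copy. The sketch below uses a hypothetical Money type: the comparison only borrows both operands, while the addition consumes them.

```rust
use std::ops::Add;

// Hypothetical non-Copy type (it owns a String).
#[derive(Debug, Clone, PartialEq)]
struct Money {
    cents: i64,
    note: String,
}

impl Add for Money {
    type Output = Money;

    // `add` takes both operands by value.
    fn add(self, rhs: Money) -> Money {
        Money {
            cents: self.cents + rhs.cents,
            note: self.note + &rhs.note,
        }
    }
}

fn compare_then_add(a: Money, b: Money) -> (bool, Money) {
    // Desugars to PartialEq::eq(&a, &b): `a` and `b` are only borrowed.
    let same = a == b;
    // Desugars to Add::add(a, b): `a` and `b` are moved into the call.
    let sum = a + b;
    // Using `a` or `b` after this point would be a compile error.
    (same, sum)
}
```

Swapping the two statements in compare_then_add would not compile, since the comparison would then use moved values.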
Because Rem can be implemented for different right-hand-side types, and the core library implements
impl<'a> Rem<&'a i32> for i32 { /* … */ }
The standard library provides no such mixed implementation of PartialOrd for i32 and &i32 (and Ord has no Rhs parameter at all), so you need to compare exactly the same types, in this case i32. That is why the dereference is required.
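Here is a compact, compilable variant of the loop from the question that shows both cases next to each other. The function name and the bool return value are additions made for this sketch; the caller is assumed to pass a sensible limit (for example the integer square root of number).

```rust
/// Trial division against a sorted list of known primes.
fn is_prime_among(known_primes: &[i32], number: i32, limit: i32) -> bool {
    for prime in known_primes {
        // `prime` is a `&i32`. This compiles because the standard library
        // implements `Rem<&i32> for i32`.
        if number % prime == 0 {
            return false;
        }
        // There is no `PartialOrd<i32> for &i32`, so the reference has to be
        // dereferenced to compare `i32` with `i32`.
        if *prime > limit {
            break;
        }
    }
    true
}
```

Iterating with for &prime in known_primes would bind prime as a plain i32 and remove the need for the dereference altogether.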
I tried to compile the following code to understand the behavior of comparison operators applied to references:
fn main() {
&1 == &2; // OK
&&1 == &&2; // OK
&1 == &mut 2; // OK
&mut(&1) == &(&mut 2); // OK
1 == &2; // Compilation Error
&1 == &&2; // Compilation Error
}
According to this result, for type T implementing PartialEq, it seems that
1. References of Ts, references of references of Ts, ... are comparable.
2. Shared and mutable references can be mixed.
3. The amount of references for both sides of == must be the same.
Where do these rules come from? Rule 1 and 2 can be derived from the semantics of comparison operators and deref coercion. Consider &1 == &mut 2:
The expression is interpreted as PartialEq::eq(&&1, &(&mut 2)) by the compiler.
The first argument &&1 turns into &1 by deref coercion because &T implements Deref<Target = T>.
The second argument &(&mut 2) turns into &2 by deref coercion because &mut T implements Deref<Target = T>.
Now the types of both arguments match the signature of PartialEq::eq implemented by i32. So PartialEq::<i32>::eq(&1, &2) is evaluated.
However, I don't understand where Rule 3 comes from. I would expect PartialEq::eq(&1, &&2) to be coerced to PartialEq::eq(&1, &2), because deref coercion is applied to both arguments independently.
What is the rationale of Rule 3? Please point to the documented semantics of Rust or the appropriate code in the compiler.
It seems like deref coercion isn't taking place when using the operators (I am not sure why this is), however borrows may still be used due to an implementation of PartialEq.
From the Rust documentation of PartialEq the following implementation can be seen:
impl<'a, 'b, A, B> PartialEq<&'b B> for &'a A
where
A: PartialEq<B> + ?Sized,
B: ?Sized,
This states there is an implementation for a borrow of type B and a borrow for type A if there exists an implementation of PartialEq for type A and B.
Given this definition, &i32 == &i32 can be used because i32 implements PartialEq and given the above impl this dictates PartialEq is implemented for borrows of i32s. This then leads to the recursive case that &&i32 == &&i32 works because PartialEq is implemented for &i32 so given the above impl PartialEq is also implemented for &&i32.
Because of the definition of this implementation the number of borrows on both sides must be the same.
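A short sketch of how this plays out in practice: each comparison below compiles because both sides carry the same number of reference layers, either naturally or after peeling layers off with *. The variable names are only illustrative.

```rust
fn reference_levels() -> [bool; 5] {
    let a = 7;
    let b = 7;
    let mut c = 7;
    let ra: &i32 = &a;
    let rrb: &&i32 = &&b;

    [
        ra == &b,     // &i32 == &i32: one layer on each side
        &ra == rrb,   // &&i32 == &&i32: the impl applies recursively
        ra == *rrb,   // peel one layer off the right-hand side
        *ra == **rrb, // i32 == i32: peel everything
        ra == &mut c, // shared vs. mutable: covered by a sibling impl
    ]
    // `ra == rrb` (&i32 vs. &&i32) is rejected by the compiler, because no
    // implementation matches a different number of layers.
}
```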
In Rust 1.14, the Index trait is defined as follows:
pub trait Index<Idx> where Idx: ?Sized {
type Output: ?Sized;
fn index(&self, index: Idx) -> &Self::Output;
}
The implicit Sized bound of the Output type is relaxed with ?Sized here. Which makes sense, because the index() method returns a reference to Output. Thus, unsized types can be used, which is useful; example:
impl<T> Index<Range<usize>> for Vec<T> {
type Output = [T]; // unsized!
fn index(&self, index: Range<usize>) -> &[T] { … } // no problem: &[T] is sized!
}
The Idx type parameter's implicit bound is also relaxed and can be unsized. But Idx is used by value as method argument and using unsized types as arguments is not possible AFAIK. Why is Idx allowed to be unsized?
I'm pretty sure this is just an accident of history. That looser bound was introduced in 2014. At that time, the trait looked a bit different:
// Syntax predates Rust 1.0!
pub trait Index<Sized? Index, Sized? Result> for Sized? {
/// The method for the indexing (`Foo[Bar]`) operation
fn index<'a>(&'a self, index: &Index) -> &'a Result;
}
Note that at this point in time, the Index type was passed by reference. Later on the renamed Idx type changed to pass by value:
fn index<'a>(&'a self, index: Idx) -> &'a Self::Output;
However, note that both forms coexisted in the different compiler bootstrap stages. That's probably why the optional Sized bound couldn't be immediately removed. It's my guess that it basically was forgotten due to more important changes, and now we are where we are.
It's an interesting thought experiment to decide if restricting the bound (by removing ?Sized) would break anything... maybe someone should submit a PR... ^_^
Thought experiment over! Lukas submitted a PR! There's been discussion that it might break downstream code that creates subtraits of Index like:
use std::ops::Index;
trait SubIndex<I: ?Sized>: Index<I> { }
There's also talk that someday, we might want to pass dynamically-sized types (DSTs) by value, although I don't understand how.
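For reference, here is a minimal sketch of both pieces discussed in this thread: an Index implementation with an unsized Output, and a subtrait of the shape that the PR discussion worried about. Buffer and middle are hypothetical names introduced for the example.

```rust
use std::ops::{Index, Range};

// Hypothetical container used only for this sketch.
struct Buffer {
    bytes: Vec<u8>,
}

impl Index<Range<usize>> for Buffer {
    // Unsized output type: fine, because `index` returns it behind a reference.
    type Output = [u8];

    fn index(&self, range: Range<usize>) -> &[u8] {
        &self.bytes[range]
    }
}

// A downstream subtrait that repeats the relaxed bound on the index type.
trait SubIndex<I: ?Sized>: Index<I> {}

impl SubIndex<Range<usize>> for Buffer {}

/// Returns elements 1 and 2 of the buffer via the `Index` impl above.
fn middle(buf: &Buffer) -> &[u8] {
    &buf[1..3]
}
```

Tightening the bound on Index would turn the SubIndex declaration into an error, which is the breakage mentioned above.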
I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
fn add(&self, other: &Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100 element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though.) I am interested in knowing a good rule of thumb for struct size and passing by value vs. by reference, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}
In its definition, Add::add always takes self by value. But references are types like any other¹, so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
¹ Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
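Put together with a struct definition, the implementation can be exercised as follows. The field types (f64) and the add_twice helper are assumptions made for this sketch; the question does not show the struct itself.

```rust
use std::ops::Add;

// Field types are assumed; deliberately not `Copy`, so that borrowing matters.
#[derive(Debug, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    // `self` is a `&'a Vector` here: the reference itself is passed by value.
    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

/// Hypothetical helper: computes a + b + b without consuming either operand.
fn add_twice(a: &Vector, b: &Vector) -> Vector {
    let first = a + b; // &Vector + &Vector
    &first + b // the intermediate result has to be borrowed again
}
```

Because the operator is only defined for references, chained additions need an explicit & on intermediate results, as in the last line.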
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In the Rust standard library, this is done through an internal macro.
Luckily, there is a Rust crate, impl_ops, that also offers a macro to write that boilerplate for us: the crate offers the impl_op_ex! macro, which generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
use std::ops;
impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });
fn main() {
let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
}
Even better, they have an impl_op_ex_commutative! macro that'll also generate the operators with the parameters reversed if your operator happens to be commutative.
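For comparison, this is roughly what the four combinations look like when written by hand without any crate. It is only a sketch using a hypothetical Meters type: the reference-reference implementation does the work and the other three forward to it.

```rust
use std::ops::Add;

// Hypothetical newtype used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Meters(f64);

// &T op &U: the one implementation that does the actual work.
impl<'a, 'b> Add<&'b Meters> for &'a Meters {
    type Output = Meters;
    fn add(self, rhs: &'b Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

// T op U: forwards to the reference implementation.
impl Add<Meters> for Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters {
        &self + &rhs
    }
}

// T op &U
impl<'b> Add<&'b Meters> for Meters {
    type Output = Meters;
    fn add(self, rhs: &'b Meters) -> Meters {
        &self + rhs
    }
}

// &T op U
impl<'a> Add<Meters> for &'a Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters {
        self + &rhs
    }
}
```

This is exactly the kind of repetition a macro such as impl_op_ex! is meant to remove.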