Rust: Implement Hash/Eq - rust

What is the simplest way to implement Hash/Eq on a struct so that two different instances with the same properties will be unequal?
Consider the following struct:
struct Person {
name: String,
}
What is the simplest way to implement Hash/Eq so that two different people are NOT equal, even if they have the same name? Is the only way to Box<> something?

You don't. Rust isn't Java. Two identical Person instances are represented in memory by the exact same sequence of bits and are, thus, indistinguishable. And Box won't even save you here: the PartialEq implementation for Box delegates to the contained value.
The only way to compare for pointer equality in the way you're describing is with std::ptr::eq, using std::pin to ensure that the pointers don't change. But, and I cannot emphasize this enough, this is the wrong approach. You don't want this kind of equality. It doesn't make sense in Rust. If Person { name: "Joe" } and Person { name: "Joe" } are meant to be distinct objects, then your data structure is poorly designed. It's your job to add a distinguishing field. If these are backed by a database, you might use
struct Person {
primary_key: u64,
name: String,
}
Or maybe everybody has a hexadecimal employee ID.
struct Person {
employee_id: String,
name: String,
}
The point is that the data structure itself (Person in our example) encodes everything about it. Rust eschews the Java-esque notion that every object intrinsically has an identity distinct from all others, in favor of your data explicitly describing itself to the world.
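To make the remarks about std::ptr::eq and Box concrete, here is a minimal sketch. It assumes Person derives PartialEq and Eq (the struct in the question has no such impls), and the joe() helper is a hypothetical name used only for illustration.

```rust
use std::ptr;

// `PartialEq`/`Eq` are derived only for this illustration;
// the struct in the question does not have them.
#[derive(Debug, PartialEq, Eq)]
struct Person {
    name: String,
}

// Hypothetical helper that builds a fresh "Joe" each time.
fn joe() -> Person {
    Person { name: "Joe".to_string() }
}

fn main() {
    let a = joe();
    let b = joe();

    // Value equality: same fields, so the two compare equal.
    assert_eq!(a, b);

    // Address comparison: two live values occupy different addresses.
    assert!(ptr::eq(&a, &a));
    assert!(!ptr::eq(&a, &b));

    // `Box` adds no identity: `==` on boxes compares the contents.
    assert_eq!(Box::new(joe()), Box::new(joe()));
}
```

The address comparison only holds while neither value is moved, which is why the answer recommends an explicit distinguishing field instead.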

As pointed out by @SilvioMayolo, value "identity" is not a thing in Rust because Rust's values are not heap-allocated by default. While you can take the address of any value, you can't use it to represent identity, because the address changes every time the value is moved to a different variable, passed to a function, or inserted in a container. You can make the address stable by heap-allocating the value, but that requires an allocation when the value is created, and an extra dereference on every access.
For values that heap-allocate their content, such as Strings, you could use the address of the contents to represent identity, as shown in @Smitop's answer. But that is also not a good representation of identity because it changes any time the string re-allocates, e.g. if you append some data to it. If you never plan to grow your strings, then that option will work well. Otherwise, you must use something else.
In general, instead of using an address to represent identity, you can explicitly track the identity as part of the object. Nothing stops you from adding a field representing identity, and assigning it in the constructor:
use std::sync::atomic::{AtomicU64, Ordering};
static NEXT_ID: AtomicU64 = AtomicU64::new(0);
pub struct Person {
id: u64,
name: String,
}
impl Person {
/// Create a new unique person.
pub fn new(name: String) -> Self {
Person {
id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
name,
}
}
/// Identity of a `Person`.
///
/// Two persons with the same name will still have different identities.
pub fn id(&self) -> u64 {
self.id
}
}
Then you can implement Hash and Eq to use this identity:
use std::hash::Hash;
impl Hash for Person {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.id.hash(state);
}
}
impl Eq for Person {}
impl PartialEq for Person {
fn eq(&self, other: &Self) -> bool {
self.id == other.id
}
}
// We can't #[derive(Clone)] because that would clone the id, and we
// want to generate a new one instead.
impl Clone for Person {
fn clone(&self) -> Person {
Person::new(self.name.clone())
}
}
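As a usage sketch, the pieces above can be assembled into one program and exercised with a HashSet. The name() accessor is a hypothetical addition for the example; everything else follows the code in this answer.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

pub struct Person {
    id: u64,
    name: String,
}

impl Person {
    /// Create a new unique person.
    pub fn new(name: String) -> Self {
        Person { id: NEXT_ID.fetch_add(1, Ordering::Relaxed), name }
    }

    /// Hypothetical accessor, added only for this example.
    pub fn name(&self) -> &str {
        &self.name
    }
}

impl PartialEq for Person {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}
impl Eq for Person {}

impl Hash for Person {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}

// A clone gets a fresh id, so it is a different person with the same name.
impl Clone for Person {
    fn clone(&self) -> Person {
        Person::new(self.name.clone())
    }
}

fn main() {
    let joe = Person::new("Joe".to_string());
    let other_joe = Person::new("Joe".to_string());
    let cloned = joe.clone();

    // Same name, different identities.
    assert!(joe != other_joe);
    assert!(joe != cloned);
    assert_eq!(cloned.name(), "Joe");

    // All three are kept as distinct set members.
    let mut people = HashSet::new();
    people.insert(joe);
    people.insert(other_joe);
    people.insert(cloned);
    assert_eq!(people.len(), 3);
}
```

One design consequence worth keeping in mind: with this Clone impl, x.clone() == x is false, which differs from what most code expects of Clone.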

You can compare the String pointers instead of the actual String data. Of course, this method has the issue that the hash of a Person will change on every execution, but you can't really get around that since there's no extra data in a Person:
use std::{cmp, hash};
impl cmp::PartialEq for Person {
fn eq(&self, other: &Self) -> bool {
self.name.as_ptr() == other.name.as_ptr()
}
}
impl cmp::Eq for Person {}
impl hash::Hash for Person {
fn hash<H: hash::Hasher>(&self, state: &mut H) {
self.name.as_ptr().hash(state);
}
}
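A small sketch of how this behaves, assuming the Person struct from the question. It checks only what is guaranteed: two live, non-empty strings have separate heap buffers, and moving a String keeps its buffer. Growing the name may reallocate and change the pointer, so the example does not touch the name after construction.

```rust
use std::hash::{Hash, Hasher};

struct Person {
    name: String,
}

impl PartialEq for Person {
    fn eq(&self, other: &Self) -> bool {
        // Compare the addresses of the heap buffers, not the text.
        self.name.as_ptr() == other.name.as_ptr()
    }
}
impl Eq for Person {}

impl Hash for Person {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.as_ptr().hash(state);
    }
}

fn main() {
    let a = Person { name: "Joe".to_string() };
    let b = Person { name: "Joe".to_string() };

    // Same text, but separate heap buffers, so they are unequal.
    assert!(a == a);
    assert!(a != b);

    // Moving the struct moves only the (pointer, length, capacity)
    // triple; the heap buffer, and therefore the identity, stays put.
    let ptr_before = a.name.as_ptr();
    let moved = a;
    assert_eq!(moved.name.as_ptr(), ptr_before);
}
```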

Related

Rust - Typescript - Keyof

I'm new to Rust and just wondering if there's an equivalent of the keyof (like in TypeScript) operator in Rust.
I don't know if this is possible, but I'm trying to access the key and value of a struct within another struct.
example:
interface Events {
msg:(data:string)=>any,
abort:()=>any
}
class EventEmitter<T>{
on(event: keyof T,callback:T[keyof T])
}
I'm trying to achieve the same on function in rust.
struct Events {
msg: Fn(&str)->(),
abort: Fn()->(),
}
struct EventEmitter<T> {
pub listeners: Vec<Listener<T>>,
}
Context: I'm trying to recreate EventEmitter exactly like node.js & ts
What you're describing is reflection. As mentioned in the comments to your question, Rust does not have reflection as a native feature. Using a String to access members of your struct could potentially create unpredictable or undefined behavior.
If it truly is important to access members of your struct that have the same type, you could look into creating a trait called "Keyable" or something similar. This trait should (probably) look something like this.
pub trait Keyable<T> {
fn get_string(&self, key: T) -> Option<&String>;
fn get_i32(&self, key: T) -> Option<&i32>;
}
pub enum UserKeys {
Id,
Username,
Password
}
pub struct User {
id: i32,
username: String,
password: String
}
impl Keyable<UserKeys> for User {
fn get_string(&self, key: UserKeys) -> Option<&String> {
match key {
UserKeys::Username => Some(&self.username),
UserKeys::Password => Some(&self.password),
_ => None
}
}
fn get_i32(&self, key: UserKeys) -> Option<&i32> {
match key {
UserKeys::Id => Some(&self.id),
_ => None
}
}
}
This would create a valid implementation of reflection in Rust. It is worth noting that you would not necessarily have to type all of this by hand; you could look into creating a Derive macro (Rust Book).
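For reference, a self-contained version of the trait and impl above with a short usage check might look like the following. The field values are made up for the example, and the key type parameter is named K here purely for readability.

```rust
pub trait Keyable<K> {
    fn get_string(&self, key: K) -> Option<&String>;
    fn get_i32(&self, key: K) -> Option<&i32>;
}

#[derive(Debug, Clone, Copy)]
pub enum UserKeys {
    Id,
    Username,
    Password,
}

pub struct User {
    id: i32,
    username: String,
    password: String,
}

impl Keyable<UserKeys> for User {
    fn get_string(&self, key: UserKeys) -> Option<&String> {
        match key {
            UserKeys::Username => Some(&self.username),
            UserKeys::Password => Some(&self.password),
            _ => None,
        }
    }

    fn get_i32(&self, key: UserKeys) -> Option<&i32> {
        match key {
            UserKeys::Id => Some(&self.id),
            _ => None,
        }
    }
}

fn main() {
    // Example values only.
    let user = User {
        id: 7,
        username: "alice".to_string(),
        password: "hunter2".to_string(),
    };

    assert_eq!(user.get_i32(UserKeys::Id), Some(&7));
    assert_eq!(
        user.get_string(UserKeys::Username).map(String::as_str),
        Some("alice")
    );
    // Asking for a key of the wrong type yields `None`, not a panic.
    assert_eq!(user.get_string(UserKeys::Id), None);
    assert_eq!(user.get_i32(UserKeys::Password), None);
}
```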
You could then add a type bound to your EventEmitter so it becomes:
struct EventEmitter<K, T: Keyable<K>> {
pub listeners: Vec<Listener<T>>,
}
This code says "I want to create a struct that can hold many instances of a Listener for a Keyable type (T) with a certain key type (K)."
There would still be quite a bit of work to do in order to get your events all connected, but taking care of reflection is a big step.
This is an example I've written that allows a derivation of a struct called ToJson. It allows all implementors to automatically inherit a to_json function that creates a String of all its properties. GitHub
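A different, simpler technique than the trait above is to replace keyof with a plain enum of event names and store the callbacks in a map. The sketch below is only an illustration under that assumption: EventKey, on and emit are hypothetical names, and unlike the TypeScript T[keyof T] version every callback shares the same Fn(&str) signature.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical enum standing in for `keyof Events`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum EventKey {
    Msg,
    Abort,
}

struct EventEmitter {
    listeners: HashMap<EventKey, Vec<Box<dyn Fn(&str)>>>,
}

impl EventEmitter {
    fn new() -> Self {
        EventEmitter { listeners: HashMap::new() }
    }

    /// Register a callback for one event.
    fn on(&mut self, event: EventKey, callback: impl Fn(&str) + 'static) {
        self.listeners.entry(event).or_default().push(Box::new(callback));
    }

    /// Call every callback registered for `event`; returns how many ran.
    fn emit(&self, event: EventKey, data: &str) -> usize {
        match self.listeners.get(&event) {
            Some(callbacks) => {
                for callback in callbacks {
                    callback(data);
                }
                callbacks.len()
            }
            None => 0,
        }
    }
}

fn main() {
    let received: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let sink = Rc::clone(&received);

    let mut emitter = EventEmitter::new();
    emitter.on(EventKey::Msg, move |data| {
        sink.borrow_mut().push(data.to_string());
    });

    assert_eq!(emitter.emit(EventKey::Msg, "hello"), 1);
    // No listener was registered for `Abort`.
    assert_eq!(emitter.emit(EventKey::Abort, ""), 0);
    assert_eq!(*received.borrow(), vec!["hello".to_string()]);
}
```

Misspelled event names become compile errors because the key is an enum variant rather than a string.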

How can I simultaneously access data from and call a method from a struct?

For my first project I wanted to create a terminal implementation of Monopoly. I have created Card, Player, and App structs. Originally my plan was to create an array inside the App struct holding a list of cards which I could then randomly select and run an execute() method on, which would push a log to the logs field of the App. What I thought would be best was this:
pub struct Card {
pub group: u8,
pub name: String,
pub id: u8,
}
impl Card {
pub fn new(group: u8, name: String, id: u8) -> Card {
Card { group, name, id }
}
pub fn execute(self, app: &mut App) {
match self.id {
1 => {
app.players[app.current_player].index = 24;
app.push_log("this works".to_string(), "LOG: ".to_string());
}
_ => {}
}
}
}
pub struct Player<'a> {
pub name: &'a str,
pub money: u64,
pub index: u8,
pub piece: char,
pub properties: Vec<u8>,
pub goojf: u8, // get out of jail free
}
impl<'a> Player<'a> {
pub fn new(name: &'a str, piece: char) -> Player<'a> {
Player {
name,
money: 1500,
index: 0,
piece,
properties: Vec::new(),
goojf: 0,
}
}
}
pub struct App<'a> {
pub cards: [Card; 1],
pub logs: Vec<(String, String)>,
pub players: Vec<Player<'a>>,
pub current_player: usize,
}
impl<'a> App<'a> {
pub fn new() -> App<'a> {
App {
cards: [Card::new(
11,
"Take a ride on the Penn. Railroad!".to_string(),
0,
)],
logs: vec![(
String::from("You've begun a game!"),
String::from("BEGIN!:"),
)],
players: vec![Player::new("Joe", '#')],
current_player: 0,
}
}
pub fn draw_card(&mut self) {
if self.players[self.current_player].index == 2
|| self.players[self.current_player].index == 17
|| self.players[self.current_player].index == 33
{
self.cards[0].execute(self);
}
}
pub fn push_log(&mut self, message: String, status: String) {
self.logs.push((message, status));
}
}
fn main() {}
However this code throws the following error:
error[E0507]: cannot move out of `self.cards[_]` which is behind a mutable reference
--> src/main.rs:76:13
|
76 | self.cards[0].execute(self);
| ^^^^^^^^^^^^^ move occurs because `self.cards[_]` has type `Card`, which does not implement the `Copy` trait
I managed to fix this error by simply declaring the array of cards in the method itself. However, this seems pretty crude and not at all efficient, especially since other methods in my program depend on cards. How could I just refer to a single array of cards for all of my methods implemented in App or elsewhere?
As others have pointed out, Card::execute() needs to take &self instead of self, but then you run into the borrow checker issue, which I'll spend the rest of this answer discussing.
It may seem odd, but Rust is actually protecting you here. The borrow checker does not look into functions to see what they do, so it has no idea that Card::execute() won't do something to invalidate the reference passed as the first argument. For example, if App::cards was a vector instead of an array, it could clear the vector.
Something that could actually practically happen here to cause undefined behavior would be if Card::execute() took a string slice from self.name and then cleared the card's name attribute through the mutable reference to app. None of these actions would be prohibited, and you'd be left with an invalid reference to a string slice. This is why the borrow checker isn't letting you make this method call, and this is exactly the kind of accident that Rust is designed to prevent.
There are a few ways around this. One option is to only pass the pieces of the App value needed to complete the task. You can borrow different parts of the same value. The problem here is that the reborrow of self overlaps with the borrow self.cards[0]. Passing each field separately isn't very ergonomic though, as in this case you'll wind up having to pass a reference to pretty much everything else on App.
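As a minimal sketch of the "pass only the pieces" option, the structs below are trimmed-down, hypothetical versions of the ones in the question (no lifetimes, fewer fields, and draw_card without the board-position check). The point is only that borrows of different fields of self do not conflict.

```rust
pub struct Card {
    pub id: u8,
}

pub struct Player {
    pub index: u8,
}

pub struct App {
    pub cards: [Card; 1],
    pub logs: Vec<(String, String)>,
    pub players: Vec<Player>,
    pub current_player: usize,
}

impl Card {
    // Takes only the pieces it needs instead of `&mut App`.
    pub fn execute(&self, player: &mut Player, logs: &mut Vec<(String, String)>) {
        if self.id == 1 {
            player.index = 24;
            logs.push(("this works".to_string(), "LOG: ".to_string()));
        }
    }
}

impl App {
    pub fn draw_card(&mut self) {
        // `players`, `cards` and `logs` are distinct fields, so a mutable
        // borrow of one does not conflict with a borrow of another.
        let player = &mut self.players[self.current_player];
        self.cards[0].execute(player, &mut self.logs);
    }
}

fn main() {
    let mut app = App {
        cards: [Card { id: 1 }],
        logs: Vec::new(),
        players: vec![Player { index: 0 }],
        current_player: 0,
    };

    app.draw_card();

    assert_eq!(app.players[0].index, 24);
    assert_eq!(app.logs.len(), 1);
}
```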
It looks like the Card values don't actually contain any game state, and are used as data for the game engine. If this is the case, then the cards can live outside of App, like so:
pub struct App<'a> {
// Changed to an unowned slice.
pub cards: &'a [Card],
pub logs: Vec<(String, String)>,
pub players: Vec<Player<'a>>,
pub current_player: usize,
}
impl<'a> App<'a> {
pub fn new(cards: &'a [Card]) -> App<'a> {
App {
cards,
// ...
Then in your main() you can initialize the data and borrow it:
fn main() {
let cards: [Card; 1] = [Card::new(
11,
"Take a ride on the Penn. Railroad!".to_string(),
0,
)];
let app = App::new(&cards);
}
This solves the compilation problem. I'd suggest making other changes, as well:
App should probably be renamed GameState or something to emphasize that this struct should contain only mutable game state, and no immutable reference data.
Player's name field should probably be an owned String instead of an unowned &str, otherwise some other entity in the program will need to own a string slice for the duration of the program so that Player can borrow it.
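Putting the borrowed-slice version together, a complete sketch could look like this. The structs are simplified, hypothetical versions of the ones in the question (Player owns no name here, and draw_card skips the board-position check), and the log text is made up.

```rust
pub struct Card {
    pub id: u8,
    pub name: String,
}

pub struct Player {
    pub index: u8,
}

pub struct App<'a> {
    // The cards are owned elsewhere; `App` only borrows them.
    pub cards: &'a [Card],
    pub logs: Vec<(String, String)>,
    pub players: Vec<Player>,
    pub current_player: usize,
}

impl Card {
    pub fn execute(&self, app: &mut App<'_>) {
        if self.id == 1 {
            app.players[app.current_player].index = 24;
            app.push_log(format!("drew: {}", self.name), "LOG: ".to_string());
        }
    }
}

impl<'a> App<'a> {
    pub fn new(cards: &'a [Card]) -> Self {
        App {
            cards,
            logs: Vec::new(),
            players: vec![Player { index: 0 }],
            current_player: 0,
        }
    }

    pub fn draw_card(&mut self) {
        // `&[Card]` is `Copy`: this copies the slice reference out of
        // `self`, so `self` is free to be passed on mutably.
        let cards = self.cards;
        cards[0].execute(self);
    }

    pub fn push_log(&mut self, message: String, status: String) {
        self.logs.push((message, status));
    }
}

fn main() {
    let cards = [Card {
        id: 1,
        name: "Take a ride on the Penn. Railroad!".to_string(),
    }];
    let mut app = App::new(&cards);

    app.draw_card();

    assert_eq!(app.players[0].index, 24);
    assert_eq!(app.logs.len(), 1);
}
```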
Does a card need to be able to mutate the cards in self? If you know that execute will not need access to cards, the easiest solution is to split App into further structs so you can better limit the access of the function.
Unless there is another layer to your program that interacts with App as a singular object, requiring everything be in a single struct to perform operations will likely only constrain your code. If possible it is better to split it into its core components so you can be more selective when sharing references and avoid needing to make so many fields pub.
Here is a rough example:
pub struct GameState<'a> {
pub players: Vec<Player<'a>>,
pub current_player: usize,
}
/// You may want to look into using https://crates.io/crates/log instead to make logging easier.
pub struct Logs {
entries: Vec<(String, String)>,
}
impl Logs {
/// Use Into<String> so you can pass either &str or String
pub fn push<S: Into<String>>(&mut self, message: S, status: S) {
self.entries.push((message.into(), status.into()));
}
}
pub struct Card {
group: u8,
name: String,
id: u8,
}
impl Card {
pub fn new(group: u8, name: String, id: u8) -> Self {
Card { group, name, id }
}
/// Use &self because there is no reason to consume the card
pub fn execute(&self, state: &mut GameState, logs: &mut Logs) {
if self.id == 1 {
state.players[state.current_player].index = 24;
logs.push("this works", "LOG");
}
}
}
Also, as a side note, the original execute consumes the card when called, since it takes self by value rather than by reference. Since Card does not implement Copy, the card would have to be removed from cards so that it can be moved.
Misc Tips and Code Review
Using IDs
It looks like you frequently use IDs to distinguish between items. However, I think your code will look cleaner and be easier to write if you used more human readable types. For example, do cards need an ID? Usually it is preferable to define your struct based on how the data is used. Last time I played Monopoly, I don't remember picking up a card and referring to the card ID to determine what to do. I would instead recommend defining a Card by how it is used in the game. If I am remembering correctly, each card contains a short message telling the player what to do. Technically a card could consist of just the message, but you can instead make your code a bit cleaner by separating out the action to an enum so actions are not hard coded to the text on the card.
pub struct Card {
text: String,
action: CardAction,
}
// Note: enums that can also hold data are more commonly referred to as
// "tagged unions" in computer science. It can be a pain to search for them if
// you don't know what they are called.
pub enum CardAction {
ProceedToGo,
GoToJail,
GainOrLoseMoney(i64),
// etc.
}
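To show how such an enum might drive the game logic, here is a rough sketch. The Player fields, the board positions GO_INDEX and JAIL_INDEX, and the signed money type are all assumptions made for the example, not part of the original code.

```rust
pub enum CardAction {
    ProceedToGo,
    GoToJail,
    GainOrLoseMoney(i64),
}

pub struct Card {
    pub text: String,
    pub action: CardAction,
}

// Hypothetical player with signed money so losses are simple additions.
pub struct Player {
    pub money: i64,
    pub index: usize,
    pub in_jail: bool,
}

// Assumed board positions, for illustration only.
const GO_INDEX: usize = 0;
const JAIL_INDEX: usize = 10;

impl Card {
    /// Apply the card's action to a player; the text is for display only.
    pub fn execute(&self, player: &mut Player) {
        match self.action {
            CardAction::ProceedToGo => player.index = GO_INDEX,
            CardAction::GoToJail => {
                player.index = JAIL_INDEX;
                player.in_jail = true;
            }
            CardAction::GainOrLoseMoney(amount) => player.money += amount,
        }
    }
}

fn main() {
    let mut player = Player { money: 1500, index: 7, in_jail: false };

    let fee = Card {
        text: "Pay a fee of 50".to_string(),
        action: CardAction::GainOrLoseMoney(-50),
    };
    fee.execute(&mut player);
    assert_eq!(player.money, 1450);

    let jail = Card {
        text: "Go to jail".to_string(),
        action: CardAction::GoToJail,
    };
    jail.execute(&mut player);
    assert_eq!(player.index, JAIL_INDEX);
    assert!(player.in_jail);
}
```

Because the match is exhaustive, adding a new CardAction variant makes the compiler point at every place that still needs to handle it.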
On a similar note, it looks like you are trying to be memory conscious by using the smallest type required for a given value. I would recommend against this thinking. If a value of 253u8 is just as invalid as 324234i32, then the smaller type is not doing anything to help you. You might as well use i32/u32 or i64/u64 since most systems will have an easier time operating on these types. The same goes for indices: using an integer type other than usize will only give you more work converting it to and from usize.
Sharing Owned References
Depending on your design philosophy you might want to store a reference to a struct in multiple places. This can be done using a reference counter Rc<T>. Here are some quick examples. Note that these can not be shared between threads.
use std::{cell::RefCell, rc::Rc};
let property: Rc<Property> = Rc::new(Property::new(/* etc */));
// Holds an owned reference to the same property as `property`; it can only be accessed immutably.
let ref_to_property: Rc<Property> = Rc::clone(&property);
// Or, if you want interior mutability, you can use Rc<RefCell<T>> instead.
let mutable_property = Rc::new(RefCell::new(Property::new(/* etc */)));
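A runnable version of the same idea, with a hypothetical Property struct (its name and houses fields are invented for the example):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for the `Property` type mentioned above.
#[derive(Debug)]
struct Property {
    name: String,
    houses: u8,
}

fn main() {
    // Shared, read-only ownership.
    let property = Rc::new(Property { name: "Boardwalk".to_string(), houses: 0 });
    let ref_to_property = Rc::clone(&property);

    // Both handles point at the same allocation.
    assert!(Rc::ptr_eq(&property, &ref_to_property));
    assert_eq!(Rc::strong_count(&property), 2);
    assert_eq!(ref_to_property.name, "Boardwalk");
    assert_eq!(ref_to_property.houses, 0);

    // Shared ownership with interior mutability.
    let mutable_property = Rc::new(RefCell::new(Property {
        name: "Park Place".to_string(),
        houses: 0,
    }));
    let shared = Rc::clone(&mutable_property);

    // A change made through one handle is visible through the other.
    shared.borrow_mut().houses += 1;
    assert_eq!(mutable_property.borrow().houses, 1);
}
```

RefCell checks borrows at run time, so holding a borrow_mut() while another borrow is active panics instead of failing to compile.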

What is an idiomatic way to have multiple structs with the same properties in Rust?

I'm aware that Rust does not have inheritance and that the language provides an easy way to share different implementations of the same methods across objects through the use of Traits. But is there an idiomatic way to share property name definitions or do they need to be defined on each struct?
My use case is that I have many different structs that track some information. Each piece of information can be updated and I want each struct to know the date of its last update. Is there a common pattern (maybe macros?) to add a last_update property to all the structs or must I add it to each struct explicitly?
There is currently no way to do this via traits, the closest thing is the "Fields in Traits" RFC (discussion, RFC), but that doesn't seem terribly active as of now.
The simplest way to do this is to have a type / struct with a method and include that field in any struct you want:
struct UpdateTimestamp {
timestamp: Timestamp, // dummy type
}
impl UpdateTimestamp {
fn update(&mut self) {
self.timestamp = now(); // dummy function
}
fn last_updated(&self) -> Timestamp {
self.timestamp
}
}
You could then include this in any struct where you want the functionality:
struct MyStruct {
my_field: u32,
my_other_field: i32,
update_ts: UpdateTimestamp,
}
impl MyStruct {
fn my_field(&self) -> u32 {
// Getter - no update
self.my_field
}
fn set_my_field(&mut self, my_field: u32) {
self.update_ts.update();
self.my_field = my_field;
}
fn last_updated(&self) -> Timestamp {
self.update_ts.last_updated()
}
}
Now you could write a complicated macro for this which automates the implementation part (injects updates into the setters and the last_updated method in the impl block), but unless you're doing this a lot I don't think it would be worth it.
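For a version that actually runs, the dummy Timestamp type and now() function can be replaced with std::time::Instant. That substitution, and the new() constructors, are assumptions made for this sketch.

```rust
use std::time::Instant;

#[derive(Debug, Clone, Copy)]
struct UpdateTimestamp {
    // `Instant` stands in for the dummy `Timestamp` type.
    timestamp: Instant,
}

impl UpdateTimestamp {
    fn new() -> Self {
        UpdateTimestamp { timestamp: Instant::now() }
    }

    fn update(&mut self) {
        self.timestamp = Instant::now();
    }

    fn last_updated(&self) -> Instant {
        self.timestamp
    }
}

struct MyStruct {
    my_field: u32,
    update_ts: UpdateTimestamp,
}

impl MyStruct {
    fn new(my_field: u32) -> Self {
        MyStruct { my_field, update_ts: UpdateTimestamp::new() }
    }

    // Getter: does not touch the timestamp.
    fn my_field(&self) -> u32 {
        self.my_field
    }

    // Setter: records the time of the change.
    fn set_my_field(&mut self, my_field: u32) {
        self.update_ts.update();
        self.my_field = my_field;
    }

    fn last_updated(&self) -> Instant {
        self.update_ts.last_updated()
    }
}

fn main() {
    let mut value = MyStruct::new(1);
    let created = value.last_updated();

    // Reading leaves the timestamp unchanged.
    assert_eq!(value.my_field(), 1);
    assert_eq!(value.last_updated(), created);

    // Writing refreshes it; `Instant` never goes backwards.
    value.set_my_field(2);
    assert_eq!(value.my_field(), 2);
    assert!(value.last_updated() >= created);
}
```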

Returning &String vs &str from immutable accessor in Rust

I am designing a simple struct which groups multiple pieces of owned data together. Once the data is inside the struct, I don't want to expose it to mutation. One of the fields of this struct is a String, I am unsure how I want to expose it through its getter function.
The two ways that jump to mind of doing this are as follows:
struct Foo {
bar: String,
}
impl Foo {
// Option 1
fn bar(&self) -> &String { ... }
// Option 2
fn bar(&self) -> &str { ... }
}
I am not sure what the cleanest way to design this would be in Rust. Which is better in a general case? What do the two options conceptually represent to a user of the API?
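To make the difference between the two signatures concrete, here is a small sketch with both accessors side by side. The names bar_string, bar_str and takes_str are hypothetical, chosen only so that both options can coexist in one impl.

```rust
struct Foo {
    bar: String,
}

impl Foo {
    // Option 1: exposes the fact that the field is a `String`.
    fn bar_string(&self) -> &String {
        &self.bar
    }

    // Option 2: exposes only the text; `&String` coerces to `&str` here.
    fn bar_str(&self) -> &str {
        &self.bar
    }
}

// A typical consumer that only needs to read the text.
fn takes_str(s: &str) -> usize {
    s.len()
}

fn main() {
    let foo = Foo { bar: "hello".to_string() };

    // Both results can be passed where a `&str` is expected, because
    // `&String` deref-coerces to `&str`.
    assert_eq!(takes_str(foo.bar_string()), 5);
    assert_eq!(takes_str(foo.bar_str()), 5);

    // Only option 1 gives access to `String`-specific read-only methods.
    assert!(foo.bar_string().capacity() >= 5);
}
```

Neither option allows mutation. The practical difference is that option 2 does not commit the API to storing a String, so the field type could change later without breaking callers.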

Is it possible to create a macro that implements Ord by delegating to a struct member?

I have a struct:
struct Student {
first_name: String,
last_name: String,
}
I want to create a Vec<Student> that can be sorted by last_name. I need to implement Ord, PartialOrd and PartialEq:
use std::cmp::Ordering;
impl Ord for Student {
fn cmp(&self, other: &Student) -> Ordering {
self.last_name.cmp(&other.last_name)
}
}
impl PartialOrd for Student {
fn partial_cmp(&self, other: &Student) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl PartialEq for Student {
fn eq(&self, other: &Student) -> bool {
self.last_name == other.last_name
}
}
This can be quite monotonous and repetitive if you have a lot of structs with an obvious field to sort by. Is it possible to create a macro to automatically implement this?
Something like:
impl_ord!(Student, Student.last_name)
I found Automatically implement traits of enclosed type for Rust newtypes (tuple structs with one field), but it's not quite what I'm looking for.
Yes, you can, but first: please read why you shouldn't!
Why not?
When a type implements Ord or PartialOrd it means that this type has a natural ordering, which in turn means that the ordering implemented is the only logical one. Take integers: 3 is naturally smaller than 4. There are other useful orderings, for sure. You could sort integers in decreasing order instead by using a reversed ordering, but there is only one natural one.
Now you have a type consisting of two strings. Is there a natural ordering? I claim: no! There are a lot of useful orderings, but is ordering by the last name more natural than ordering by the first name? I don't think so.
How to do it then?
There are two other sort methods:
sort_by(), and
sort_by_key().
Both let you modify the way the sorting algorithm compares values. Sorting by the last name can be done like this:
students.sort_by(|a, b| a.last_name.cmp(&b.last_name));
This way, you can specify how to sort on each method call. Sometimes you might want to sort by last name and other times you want to sort by first name. Since there is no obvious and natural way to sort, you shouldn't "attach" any specific way of sorting to the type itself.
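A self-contained sketch of both methods follows; the student() helper and the sample names are made up for the example. Note that the sort_by_key closure has to return an owned key, so the string is cloned there.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Student {
    first_name: String,
    last_name: String,
}

// Hypothetical helper to keep the example short.
fn student(first: &str, last: &str) -> Student {
    Student { first_name: first.to_string(), last_name: last.to_string() }
}

fn main() {
    let mut students = vec![
        student("Ada", "Lovelace"),
        student("Alan", "Turing"),
        student("Grace", "Hopper"),
    ];

    // Order chosen at the call site: by last name.
    students.sort_by(|a, b| a.last_name.cmp(&b.last_name));
    let by_last: Vec<&str> = students.iter().map(|s| s.last_name.as_str()).collect();
    assert_eq!(by_last, ["Hopper", "Lovelace", "Turing"]);

    // A different order on the same data: by first name.
    // The key must be owned, hence the clone.
    students.sort_by_key(|s| s.first_name.clone());
    let by_first: Vec<&str> = students.iter().map(|s| s.first_name.as_str()).collect();
    assert_eq!(by_first, ["Ada", "Alan", "Grace"]);
}
```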
But seriously, I want a macro...
Of course, it is possible in Rust to write such a macro. It's actually quite easy once you understand the macro system. But let's not do it for your Student example, because -- as I hope you understand by now -- it's a bad idea.
When is it a good idea? When only one field semantically is part of the type. Take this data structure:
struct Foo {
actual_data: String,
_internal_cache: String,
}
Here, the _internal_cache does not semantically belong to your type. It's just an implementation detail and thus should be ignored for Eq and Ord. The simple macro is:
macro_rules! impl_ord {
($type_name:ident, $field:ident) => {
impl Ord for $type_name {
fn cmp(&self, other: &$type_name) -> Ordering {
self.$field.cmp(&other.$field)
}
}
impl PartialOrd for $type_name {
fn partial_cmp(&self, other: &$type_name) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl PartialEq for $type_name {
fn eq(&self, other: &$type_name) -> bool {
self.$field == other.$field
}
}
impl Eq for $type_name {}
}
}
Why do I call such a big chunk of code simple, you ask? Well, the vast majority of this code is just exactly what you have already written: the impls. I performed two simple steps:
Add the macro definition around your code and think about what parameters we need (type_name and field)
Replace all your mentions of Student with $type_name and all your mentions of last_name with $field
That's why it's called "macro by example": you basically just write your normal code as an example, but can make parts of it variable per parameter.
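Applied to the Foo struct from above, usage might look like this. The foo() helper and the sample strings are assumptions for the example, and Ordering has to be in scope where the macro is invoked.

```rust
use std::cmp::Ordering;

macro_rules! impl_ord {
    ($type_name:ident, $field:ident) => {
        impl Ord for $type_name {
            fn cmp(&self, other: &$type_name) -> Ordering {
                self.$field.cmp(&other.$field)
            }
        }
        impl PartialOrd for $type_name {
            fn partial_cmp(&self, other: &$type_name) -> Option<Ordering> {
                Some(self.cmp(other))
            }
        }
        impl PartialEq for $type_name {
            fn eq(&self, other: &$type_name) -> bool {
                self.$field == other.$field
            }
        }
        impl Eq for $type_name {}
    };
}

struct Foo {
    actual_data: String,
    _internal_cache: String,
}

// Compare and order `Foo` by `actual_data` only.
impl_ord!(Foo, actual_data);

// Hypothetical helper to keep the example short.
fn foo(data: &str, cache: &str) -> Foo {
    Foo { actual_data: data.to_string(), _internal_cache: cache.to_string() }
}

fn main() {
    let a = foo("apple", "cache-1");
    let b = foo("apple", "cache-2");
    let c = foo("banana", "cache-1");

    // The cache is ignored for equality and ordering.
    assert!(a == b);
    assert!(a < c);
    assert_eq!(a.cmp(&c), Ordering::Less);

    // `Ord` also enables plain `sort()`.
    let mut items = vec![c, b, a];
    items.sort();
    assert_eq!(items[0].actual_data, "apple");
    assert_eq!(items[2].actual_data, "banana");
}
```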
I created a macro which allows implementing Ord by defining an expression which will be used to compare elements: ord_by_key::ord_eq_by_key_selector, similar to what you were asking.
use ord_by_key::ord_eq_by_key_selector;
#[ord_eq_by_key_selector(|s| &s.last_name)]
struct Student {
first_name: String,
last_name: String,
}
If you have to sort by different criteria in different cases, you can introduce wrapper types for your struct which implement different sorting strategies:
use ord_by_key::ord_eq_by_key_selector;
struct Student {
first_name: String,
last_name: String,
}
#[ord_eq_by_key_selector(|(s)| &s.first_name)]
struct StudentByFirstName(Student);
#[ord_eq_by_key_selector(|(s)| &s.last_name, &s.first_name)]
struct StudentByLastNameAndFirstName(Student);
