How should library crates be designed around Rust's Orphan Rule?

My understanding of the orphan rule of interest is that:
For any impl of a Trait on a Type, either the Trait or the Type must be defined in the same crate as the impl.
or equivalently:
It is impossible to implement a trait defined in a foreign crate on a type which is also defined in a foreign crate.
So the question is, how should one design a library crate that provides a type which should implement a foreign trait, but leave it to the consumer crate to define the implementation of that trait?
For example, imagine a hypothetical library crate, cards, which provides types representing cards in a standard 52-card deck of playing cards intended for use in external crates to facilitate the development of card game programs. Such a library might provide the following type:
/// Ranks of a standard French-suited playing card
pub enum Rank {
Ace,
Two,
Three,
Four,
Five,
Six,
Seven,
Eight,
Nine,
Ten,
Jack,
Queen,
King,
}
Since a substantial portion of popular card games involve comparing the ranks of two cards, it should make sense for enum Rank to implement std::cmp::PartialOrd so that ranks can easily be compared with the <, >, <=, and >= operators. But crate cards should not define this implementation because the ranks of playing cards are not intrinsically equipped with a partial ordering relation; such a relation is instead optionally imposed on them by the specific game within which the cards are being used, and in general the implementation varies by game. For example, some games consider Ace to be greater than all other ranks while others consider Ace to be less than all other ranks, and certain games may not even compare ranks at all and therefore not need a PartialOrd implementation.
In keeping with best practice regarding separation of concerns, it seems obvious that crate cards should not implement PartialOrd, but instead leave its implementation up to the consumers of cards, allowing each consumer to define their own partial ordering relation for the ranks. However, since in a crate depending on cards, both Rank and PartialOrd are crate-foreign, this is prohibited by the Orphan rule.
What is the best way to deal with this scenario? It seems to me the only correct thing to do is to refrain from implementing PartialOrd for Rank and let the consumers make their own functions for rank comparison like:
fn blackjack_rank_compare(first: &Rank, second: &Rank) -> Option<Ordering> {...}
fn poker_rank_compare(first: &Rank, second: &Rank) -> Option<Ordering> {...}
fn war_rank_compare(first: &Rank, second: &Rank) -> Option<Ordering> {...}
// etc.
However this is inconvenient. Is there any viable workaround for this problem?

There are a lot of answers to your question.
You could for example use a type parameter for defining the ordering (playground):
use core::cmp::{ PartialOrd, Ordering, };
use std::marker::PhantomData;
pub enum Card {
Ace, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Jack, Queen, King,
}
pub use Card::{
Ace, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Jack, Queen, King,
};
// Evaluated card structure; default game is DefaultT
pub struct MyValuedCard<T=DefaultT>(Card,PhantomData<T>);
impl<T> MyValuedCard<T> {
pub fn new(card: Card) -> Self { Self(card,PhantomData) }
}
pub trait MyOrder { // Trait for customizing the order; public so that users of the crate can implement it
fn order(c: &Card) -> u8;
}
// PartialEq and PartialOrd implementation
impl<T> PartialEq for MyValuedCard<T> where T: MyOrder {
fn eq(self: &Self, &Self(ref second, _): &Self,) -> bool {
PartialEq::eq(&T::order(&self.0), &T::order(second))
}
}
impl<T> PartialOrd for MyValuedCard<T> where T: MyOrder {
fn partial_cmp(self: &Self, &Self(ref second, _): &Self,) -> Option<Ordering> {
PartialOrd::partial_cmp(&T::order(&self.0), &T::order(second))
}
}
// Default game with its order
pub enum DefaultT {}
impl MyOrder for DefaultT {
fn order(c: &Card) -> u8 {
match c {
Ace => 1, Two => 2, Three => 3, Four => 4, Five => 5, _ => 6,
}
}
}
// Game G1 defined by user
pub enum G1 {}
impl MyOrder for G1 {
fn order(c: &Card) -> u8 {
match c {
Ace => 1, Two => 2, Three => 3, Four => 4, Five => 5, _ => 6,
}
}
}
// Game G2 defined by user
pub enum G2 {}
impl MyOrder for G2 {
fn order(c: &Card) -> u8 {
match c {
Ace => 6, Two => 5, Three => 4, Four => 3, Five => 2, _ => 1,
}
}
}
fn main() {
// Default game
let ace: MyValuedCard = MyValuedCard::new(Ace);
let king: MyValuedCard = MyValuedCard::new(King);
// Game G1
let ace_g1 = MyValuedCard::<G1>::new(Ace);
let king_g1 = MyValuedCard::<G1>::new(King);
// Game G2
let ace_g2 = MyValuedCard::<G2>::new(Ace);
let king_g2 = MyValuedCard::<G2>::new(King);
println!("ace <= king -> {}", ace <= king);
println!("ace_g1 <= king_g1 -> {}", ace_g1 <= king_g1);
println!("ace_g2 <= king_g2 -> {}", ace_g2 <= king_g2);
}
which results in:
ace <= king -> true
ace_g1 <= king_g1 -> true
ace_g2 <= king_g2 -> false
In this case, the user of your crate is able to change the order of your cards by defining a type that implements MyOrder.
But have I understood your question correctly?
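A different and commonly used workaround for the orphan rule is a newtype defined in the consumer crate: the wrapper is local to that crate, so it may implement foreign traits such as PartialOrd. The following is only a minimal sketch under stated assumptions. The module cards stands in for the library crate, the derives on Rank are assumed (the question's enum has none), and AceHigh and strength are hypothetical names.

```rust
use std::cmp::Ordering;

// Stand-in for the library crate `cards`; in a real project this would be a
// separate crate. The derives on Rank are an assumption of this sketch.
mod cards {
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum Rank {
        Ace, Two, Three, Four, Five, Six, Seven,
        Eight, Nine, Ten, Jack, Queen, King,
    }
}

use cards::Rank;

// Hypothetical consumer-side newtype. Because AceHigh is local to the game
// crate, implementing the foreign trait PartialOrd for it is allowed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AceHigh(pub Rank);

impl AceHigh {
    // Numeric strength under "ace high" rules: Two = 2, ..., King = 13, Ace = 14.
    fn strength(self) -> u8 {
        match self.0 {
            Rank::Ace => 14,
            // Declaration order gives Two = 1, ..., King = 12 when cast.
            other => other as u8 + 1,
        }
    }
}

impl PartialOrd for AceHigh {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        self.strength().partial_cmp(&other.strength())
    }
}
```

A game with "ace low" rules could define its own wrapper in the same way, so each consumer picks its ordering without the library committing to one. The cost is that values must be wrapped (AceHigh(rank)) before they can be compared with the operators.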

Related

Panic if the capacity of a vector is increased

I am working on implementing a Sieve of Atkin as my first decently sized program in Rust. This algorithm takes a number and returns a vector of all primes below that number. I need two different vectors in this function:
A BitVec with 1 for prime and 0 for not prime (flipped back and forth as part of the algorithm).
A vector containing all known primes.
The size of the BitVec is known as soon as the function is called. While the final size of the vector containing all known primes is not known, there are relatively accurate upper limits for the number of primes in a range. Using these, I can set the size of the vector to an upper bound and then shrink_to_fit it before returning. The upshot is that neither vector should ever need to have its capacity increased while the algorithm is running, and if this happens, something has gone horribly wrong with the algorithm.
Therefore, I would like my function to panic if the capacity of either the vector or the bitvec is changed during the running of the function. Is this possible and if so how would I be best off implementing it?
Thanks,
You can assert that the vec's capacity() and len() are different before each push:
assert_ne!(v.capacity(), v.len());
v.push(value);
If you want it done automatically you'd have to wrap your vec in a newtype:
struct FixedSizeVec<T>(Vec<T>);
impl<T> FixedSizeVec<T> {
pub fn push(&mut self, value: T) {
assert_ne!(self.0.len(), self.0.capacity());
self.0.push(value)
}
}
To save on forwarding unchanged methods you can impl Deref(Mut) for your newtype.
use std::ops::{Deref, DerefMut};
impl<T> Deref for FixedSizeVec<T> {
type Target = Vec<T>;
fn deref(&self) -> &Vec<T> {
&self.0
}
}
impl<T> DerefMut for FixedSizeVec<T> {
fn deref_mut(&mut self) -> &mut Vec<T> {
&mut self.0
}
}
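To tie the newtype back to the question's plan (reserve an upper bound, fill, then shrink_to_fit), here is a minimal sketch. The names with_capacity and into_inner on the wrapper are assumptions added for illustration. Unlike the snippet above, this sketch implements only Deref and not DerefMut, because with DerefMut callers could reach Vec methods such as extend or insert, which grow the vector without going through the checked push.

```rust
use std::ops::Deref;

/// A Vec wrapper whose `push` panics instead of reallocating.
pub struct FixedSizeVec<T>(Vec<T>);

impl<T> FixedSizeVec<T> {
    /// Allocates room for at least `upper_bound` elements up front.
    pub fn with_capacity(upper_bound: usize) -> Self {
        FixedSizeVec(Vec::with_capacity(upper_bound))
    }

    /// Pushes `value`, panicking if that would force a reallocation.
    pub fn push(&mut self, value: T) {
        assert_ne!(self.0.len(), self.0.capacity(), "push would reallocate");
        self.0.push(value);
    }

    /// Hands back the inner Vec after asking it to release unused capacity.
    pub fn into_inner(mut self) -> Vec<T> {
        self.0.shrink_to_fit();
        self.0
    }
}

// Read-only forwarding (len, capacity, iter, indexing, ...).
impl<T> Deref for FixedSizeVec<T> {
    type Target = Vec<T>;
    fn deref(&self) -> &Vec<T> {
        &self.0
    }
}
```

Note that Vec::with_capacity only promises at least the requested capacity, so the panic fires when the real capacity is exhausted, which may be slightly later than the requested bound.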
An alternative to the newtype pattern is to create a new trait with a method that performs the check, and implement it for the vector like so:
trait PushCheck<T> {
fn push_with_check(&mut self, value: T);
}
impl<T> PushCheck<T> for std::vec::Vec<T> {
fn push_with_check(&mut self, value: T) {
let prev_capacity = self.capacity();
self.push(value);
assert!(prev_capacity == self.capacity());
}
}
fn main() {
let mut v = Vec::new();
v.reserve(4);
dbg!(v.capacity());
v.push_with_check(1);
v.push_with_check(1);
v.push_with_check(1);
v.push_with_check(1);
// This push will panic
v.push_with_check(1);
}
The upside is that you aren't creating a new type, but the obvious downside is you need to remember to use the newly defined method.

How to define a range of u8 that can exhaust in pattern match

#[derive(PartialEq, Eq)]
pub struct SlidingRole(u8);
const KING: SlidingRole = SlidingRole(1);
const QUEEN: SlidingRole = SlidingRole(2);
const ALL_SLIDING_ROLES: [SlidingRole; 2] = [
KING,
QUEEN
];
// error[E0004]: non-exhaustive patterns: `SlidingRole(0_u8)` and `SlidingRole(3_u8..=u8::MAX)` not covered
fn adf(role: SlidingRole) -> u8 {
match role {
KING => 3,
QUEEN => 2
}
}
fn main() {
println!("Hello, world!");
}
This looks like a perfect use case for enum. Note that enum in Rust is not like it is in C. In C, enum is a narrow wrapper around an integer. In Rust, it's a full-fledged sum-of-product algebraic data type, capable of describing basically whatever you want.
#[derive(Clone, Copy)]
pub enum SlidingRole { King, Queen }
fn adf(role: SlidingRole) -> u8 {
match role {
SlidingRole::King => 3,
SlidingRole::Queen => 2,
}
}
Now SlidingRole is a new type with exactly two possible values. If you want, you can provide From / TryFrom instances for conversion to / from u8.
I also derive Clone and Copy so you don't have to worry about borrow semantics in the above example. For simple enums (where it's just a list of cases like yours is), this is fine. If your enum starts to contain something like a String, you won't be able to get Copy since it's no longer a plain memcpy to clone data. You'll probably want to add Eq, PartialEq, and potentially an ordering, depending on your specific use cases.
Finally, if you need the underlying representation to be u8 (for instance, if you're interfacing with a C library), you can use a repr annotation to force this.
#[repr(u8)]
#[derive(Clone, Copy)]
pub enum SlidingRole { King, Queen }
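Now SlidingRole is still a genuinely new type, distinct from u8. But it's guaranteed to use a memory layout equivalent to u8, so you can pass it to C functions expecting a u8, and you can convert it to a u8 with a plain as cast. Going the other way needs more care: transmuting a u8 into SlidingRole is only sound when the byte is a valid discriminant, so prefer a checked conversion such as TryFrom. This is a niche and more advanced use case, so I don't recommend doing it unless you specifically need the data to be a u8.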
In particular, you shouldn't do this for the purposes of premature optimization; Rust's compiler is already really good at optimizing datatypes, so "I think this fits best in u8" is not a good use case; the compiler is smarter than you or I when it comes to building assembly code. Stick with a plain enum unless you have a very niche use case.
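As a rough illustration of the From / TryFrom conversions mentioned above, here is a minimal sketch. The explicit discriminants 1 and 2 mirror the KING and QUEEN constants from the question, and using the rejected byte itself as the error type is an assumption made to keep the example short.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SlidingRole {
    King = 1,
    Queen = 2,
}

// Enum -> u8 can never fail, so From is appropriate.
impl From<SlidingRole> for u8 {
    fn from(role: SlidingRole) -> u8 {
        role as u8
    }
}

// u8 -> enum can fail, so TryFrom is used; the error carries the bad byte.
impl TryFrom<u8> for SlidingRole {
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(SlidingRole::King),
            2 => Ok(SlidingRole::Queen),
            other => Err(other),
        }
    }
}
```

With this in place, a match on SlidingRole stays exhaustive over two variants, and all handling of out-of-range bytes is concentrated in try_from.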

How to design DNA/RNA/protein sequences in Rust

At the moment I'm learning Rust. Coming from a biology background, I thought a good exercise would be to write a small bioinformatics tool. I came up with the idea to implement a struct/enum for sequences that provides some of the most common methods for such sequences, like transcription, translation, calculating GC content, etc. However, there are quite a few design choices to be made beforehand and I'm a bit lost at this point. My problem is the following:
There are essentially three types of sequences to be considered: DNA, RNA and proteins. All of these are sequences and they share some behavior (for instance, it should be possible to count the appearances of symbols in all of them) but also have differences (a DNA can be transcribed, a protein cannot). Inspired by the IpAddr enum from the standard library that was mentioned in the Rust book, I started by creating an enum that reflects the three possible sequence types.
pub enum Sequence {
DNA(DNASeq),
RNA(RNASeq),
Protein(AASeq)
}
pub struct DNASeq {
seq: String
}
pub struct RNASeq {
seq: String
}
pub struct AASeq { // AA = Amino acid => protein.
seq: String
}
As I mentioned some methods make sense for some sequences but not for others. For instance, DNA can be transcribed to RNA but a protein cannot be transcribed. So I decided to implement a transcribe method for the enum and a corresponding method for DNASeq:
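impl Sequence {
// Returns a Result because a protein cannot be transcribed.
pub fn transcribe(&self) -> Result<Sequence, &'static str> {
match self {
Sequence::DNA(seq) => Ok(Sequence::RNA(seq.transcribe())),
// RNA is already transcribed, so return a copy of it.
Sequence::RNA(seq) => Ok(Sequence::RNA(RNASeq { seq: seq.seq.clone() })),
Sequence::Protein(_) => Err("Amino acid sequences cannot be transcribed"),
}
}
}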
impl DNASeq {
pub fn transcribe(&self) -> RNASeq {
let seq = self.seq.chars()
.map(|x| match x {
'T' => 'U',
_ => x
})
.collect();
RNASeq{ seq }
}
}
For me, that makes sense so far. However, what about methods that are common to all three sequence variants? For instance, what about a method that counts the appearance of symbols in a sequence. The code is essentially the same in all cases. A naive solution would be to implement such a method for each struct:
use std::collections::HashMap;
impl Sequence {
// ...
pub fn count(&self) -> HashMap<char, usize> {
match self {
Sequence::DNA(seq) => seq.count(),
Sequence::RNA(seq) => seq.count(),
Sequence::Protein(seq) => seq.count(),
}
}
}
impl DNASeq {
// ...
pub fn count(&self) -> HashMap<char, usize> {
let mut cnt = HashMap::new();
for c in self.seq.chars() {
*cnt.entry(c).or_insert(0) += 1;
}
cnt
}
}
impl RNASeq {
// As above.
}
impl AASeq {
// As above.
}
This leads to a lot of code duplication. Of course, I could write some function that does the job and then call that function from the enum method, but that'd separate the count implementation from the structs. Another idea was to create a trait and provide a default implementation, but I cannot access the structs' fields from a default implementation (which makes sense to me), so this also doesn't work. What would be a better way to design this? Or is my entire approach questionable? Do you have any further valuable criticism?
Thank you a lot in advance!
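One way to sketch the trait idea mentioned in the question: a default method cannot read struct fields directly, but it can call a required accessor method that each struct implements in one line. The trait name Symbols, the accessor symbols, and the new constructors are hypothetical names chosen for this example. It is only one possible design, not a definitive answer.

```rust
use std::collections::HashMap;

// Hypothetical trait: each sequence type only has to expose its text.
pub trait Symbols {
    // Required: the one piece that differs per struct.
    fn symbols(&self) -> &str;

    // Provided: shared by every implementor, written once.
    fn count(&self) -> HashMap<char, usize> {
        let mut counts = HashMap::new();
        for c in self.symbols().chars() {
            *counts.entry(c).or_insert(0) += 1;
        }
        counts
    }
}

pub struct DNASeq {
    seq: String,
}

pub struct AASeq {
    seq: String,
}

impl DNASeq {
    pub fn new(seq: &str) -> Self {
        DNASeq { seq: seq.to_string() }
    }
}

impl AASeq {
    pub fn new(seq: &str) -> Self {
        AASeq { seq: seq.to_string() }
    }
}

impl Symbols for DNASeq {
    fn symbols(&self) -> &str {
        &self.seq
    }
}

impl Symbols for AASeq {
    fn symbols(&self) -> &str {
        &self.seq
    }
}
```

Methods that only make sense for one type, such as transcribe on DNASeq, can stay as inherent methods on that struct, while shared behavior lives in the trait.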

Vector of traits (dynamic dispatch) which contains associated type (also dynamic dispatch) [duplicate]

I have a program that involves examining a complex data structure to see if it has any defects. (It's quite complicated, so I'm posting example code.) All of the checks are unrelated to each other, and will all have their own modules and tests.
More importantly, each check has its own error type that contains different information about how the check failed for each number. I'm doing it this way instead of just returning an error string so I can test the errors (it's why Error relies on PartialEq).
My Code So Far
I have traits for Check and Error:
trait Check {
type Error;
fn check_number(&self, number: i32) -> Option<Self::Error>;
}
trait Error: std::fmt::Debug + PartialEq {
fn description(&self) -> String;
}
And two example checks, with their error structs. In this example, I want to show errors if a number is negative or even:
#[derive(PartialEq, Debug)]
struct EvenError {
number: i32,
}
struct EvenCheck;
impl Check for EvenCheck {
type Error = EvenError;
fn check_number(&self, number: i32) -> Option<EvenError> {
if number % 2 == 0 {
Some(EvenError { number: number })
} else {
None
}
}
}
impl Error for EvenError {
fn description(&self) -> String {
format!("{} is even", self.number)
}
}
#[derive(PartialEq, Debug)]
struct NegativeError {
number: i32,
}
struct NegativeCheck;
impl Check for NegativeCheck {
type Error = NegativeError;
fn check_number(&self, number: i32) -> Option<NegativeError> {
if number < 0 {
Some(NegativeError { number: number })
} else {
None
}
}
}
impl Error for NegativeError {
fn description(&self) -> String {
format!("{} is negative", self.number)
}
}
I know that in this example, the two structs look identical, but in my code, there are many different structs, so I can't merge them. Lastly, an example main function, to illustrate the kind of thing I want to do:
fn main() {
let numbers = vec![1, -4, 64, -25];
let checks = vec![
Box::new(EvenCheck) as Box<Check<Error = Error>>,
Box::new(NegativeCheck) as Box<Check<Error = Error>>,
]; // What should I put for this Vec's type?
for number in numbers {
for check in checks {
if let Some(error) = check.check_number(number) {
println!("{:?} - {}", error, error.description())
}
}
}
}
You can see the code in the Rust playground.
Solutions I've Tried
The closest thing I've come to a solution is to remove the associated types and have the checks return Option<Box<Error>>. However, I get this error instead:
error[E0038]: the trait `Error` cannot be made into an object
--> src/main.rs:4:55
|
4 | fn check_number(&self, number: i32) -> Option<Box<Error>>;
| ^^^^^ the trait `Error` cannot be made into an object
|
= note: the trait cannot use `Self` as a type parameter in the supertraits or where-clauses
because of the PartialEq in the Error trait. Rust has been great to me thus far, and I really hope I'm able to bend the type system into supporting something like this!
When you write an impl Check and specialize your type Error with a concrete type, you end up with different types.
In other words, Check<Error = NegativeError> and Check<Error = EvenError> are statically different types. Although you might expect Check<Error> to describe both, note that in Rust NegativeError and EvenError are not sub-types of Error. They are guaranteed to implement all methods defined by the Error trait, but calls to those methods will be statically dispatched to physically different functions that the compiler creates (one version for NegativeError, one for EvenError).
Therefore, you can't put them in the same Vec, even boxed (as you discovered). It's not so much a matter of knowing how much space to allocate, it's that Vec requires its elements to be of one type (you can't have a vec![1u8, 'a'] either, even though the character 'a' would fit in a u8).
Rust's way to "erase" some of the type information and gain the dynamic dispatch part of subtyping is, as you discovered, trait objects.
If you want to give another try to the trait object approach, you might find it more appealing with a few tweaks...
You might find it much easier if you used the Error trait in std::error instead of your own version of it.
You may need to impl Display to create a description with a dynamically built String, like so:
use std::error::Error;
use std::fmt;
// Display provides the dynamically built message.
impl fmt::Display for EvenError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{} is even", self.number)
}
}
impl Error for EvenError {
fn description(&self) -> &str { "even error" }
}
Now you can drop the associated type and have Check return a trait object:
trait Check {
fn check_number(&self, number: i32) -> Option<Box<dyn Error>>;
}
Your Vec now has an expressible type:
let mut checks: Vec<Box<dyn Check>> = vec![
Box::new(EvenCheck) ,
Box::new(NegativeCheck) ,
];
The best part of using std::error::Error...
is that now you don't need to use PartialEq to understand what error was returned. Error has various types of downcasts and type checks if you do need to retrieve the concrete Error type out of your trait object.
for number in numbers {
for check in &mut checks {
if let Some(error) = check.check_number(number) {
println!("{}", error);
if let Some(s_err) = error.downcast_ref::<EvenError>() {
println!("custom logic for EvenErr: {} - {}", s_err.number, s_err)
}
}
}
}
full example on the playground
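For reference, the fragments above can be assembled into one self-contained sketch in current Rust syntax, where trait objects are written with dyn. The helper run_checks and the "even"/"other" labels are assumptions added so the result can be inspected. The description method is left out because Display already provides the message.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct EvenError {
    number: i32,
}

impl fmt::Display for EvenError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} is even", self.number)
    }
}

impl Error for EvenError {}

#[derive(Debug)]
struct NegativeError {
    number: i32,
}

impl fmt::Display for NegativeError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} is negative", self.number)
    }
}

impl Error for NegativeError {}

// No associated type: every check returns the same boxed trait object.
trait Check {
    fn check_number(&self, number: i32) -> Option<Box<dyn Error>>;
}

struct EvenCheck;
struct NegativeCheck;

impl Check for EvenCheck {
    fn check_number(&self, number: i32) -> Option<Box<dyn Error>> {
        if number % 2 == 0 {
            Some(Box::new(EvenError { number }))
        } else {
            None
        }
    }
}

impl Check for NegativeCheck {
    fn check_number(&self, number: i32) -> Option<Box<dyn Error>> {
        if number < 0 {
            Some(Box::new(NegativeError { number }))
        } else {
            None
        }
    }
}

// Hypothetical helper: runs every check on every number and collects messages.
fn run_checks(numbers: &[i32]) -> Vec<String> {
    let checks: Vec<Box<dyn Check>> = vec![Box::new(EvenCheck), Box::new(NegativeCheck)];
    let mut report = Vec::new();
    for &number in numbers {
        for check in &checks {
            if let Some(error) = check.check_number(number) {
                // Recover the concrete type only where custom logic needs it.
                let kind = if error.downcast_ref::<EvenError>().is_some() {
                    "even"
                } else {
                    "other"
                };
                report.push(format!("{}: {}", kind, error));
            }
        }
    }
    report
}
```

Tests can then compare the Display output or use downcast_ref on the returned error, so the error types themselves no longer need PartialEq.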
I eventually found a way to do it that I'm happy with. Instead of having a vector of Box<Check<???>> objects, have a vector of closures that all have the same type, abstracting away the very functions that get called:
fn main() {
type Probe = Box<dyn Fn(i32) -> Option<Box<dyn Error>>>;
let numbers: Vec<i32> = vec![ 1, -4, 64, -25 ];
let checks = vec![
Box::new(|num| EvenCheck.check_number(num).map(|u| Box::new(u) as Box<dyn Error>)) as Probe,
Box::new(|num| NegativeCheck.check_number(num).map(|u| Box::new(u) as Box<dyn Error>)) as Probe,
];
for number in numbers {
for check in checks.iter() {
if let Some(error) = check(number) {
println!("{}", error.description());
}
}
}
}
Not only does this allow for a vector of Box<Error> objects to be returned, it allows the Check objects to provide their own Error associated type which doesn't need to implement PartialEq. The multiple as casts look a little messy, but on the whole it's not that bad.
I'd suggest some refactoring.
First, I'm pretty sure that vectors must be homogeneous in Rust, so there is no way to put elements of different types in them. Also, you cannot downcast traits to reduce them to a common base trait (as I remember, there was a question about it on SO).
So I'd use algebraic type with explicit match for this task, like this:
enum Checker {
Even(EvenCheck),
Negative(NegativeCheck),
}
let checks = vec![
Checker::Even(EvenCheck),
Checker::Negative(NegativeCheck),
];
As for error handling, consider using the From trait (the successor to the old FromError trait), so that you will be able to use the ? operator (or the older try! macro) in your code and convert error types from one to another.
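The enum suggestion above stops before the explicit match, so here is a minimal sketch of how the dispatch might look. CheckError, the inherent check methods, and run_checks are hypothetical names introduced for this example. Each check keeps its own error type, and the umbrella enum gives the Vec a single element type.

```rust
#[derive(Debug, PartialEq)]
struct EvenError {
    number: i32,
}

#[derive(Debug, PartialEq)]
struct NegativeError {
    number: i32,
}

// Hypothetical umbrella type so that every check yields the same error type.
#[derive(Debug, PartialEq)]
enum CheckError {
    Even(EvenError),
    Negative(NegativeError),
}

struct EvenCheck;
struct NegativeCheck;

impl EvenCheck {
    fn check(&self, number: i32) -> Option<EvenError> {
        if number % 2 == 0 { Some(EvenError { number }) } else { None }
    }
}

impl NegativeCheck {
    fn check(&self, number: i32) -> Option<NegativeError> {
        if number < 0 { Some(NegativeError { number }) } else { None }
    }
}

enum Checker {
    Even(EvenCheck),
    Negative(NegativeCheck),
}

impl Checker {
    // Explicit match: the compiler reports any variant that is not handled.
    fn check_number(&self, number: i32) -> Option<CheckError> {
        match self {
            Checker::Even(c) => c.check(number).map(CheckError::Even),
            Checker::Negative(c) => c.check(number).map(CheckError::Negative),
        }
    }
}

fn run_checks(numbers: &[i32]) -> Vec<CheckError> {
    let checks = vec![Checker::Even(EvenCheck), Checker::Negative(NegativeCheck)];
    let mut errors = Vec::new();
    for &number in numbers {
        for check in &checks {
            if let Some(error) = check.check_number(number) {
                errors.push(error);
            }
        }
    }
    errors
}
```

The trade-off compared with trait objects is that the set of checks is closed: adding a check means adding a variant to both enums, but no boxing or downcasting is needed and the errors stay comparable with PartialEq.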

Automatically implement traits of enclosed type for Rust newtypes (tuple structs with one field)

In Rust, tuple structs with only one field can be created like the following:
struct Centimeters(i32);
I want to do basic arithmetic with Centimeters without extracting their "inner" values every time with pattern matching, and without implementing the Add, Sub, ... traits and overloading operators.
What I want to do is:
let a = Centimeters(100);
let b = Centimeters(200);
assert_eq!(a + a, b);
is there a way to do it without extracting their "inner" values every time with pattern matching, and without implementing the Add, Sub, ... traits and overloading operators?
No, the only way is to implement the traits manually. Rust doesn't have an equivalent to Haskell's GHC extension GeneralizedNewtypeDeriving, which allows deriving on wrapper types to automatically implement any type class/trait that the wrapped type implements (and with the current set-up of Rust's #[derive] as a simple AST transformation, implementing it like Haskell is essentially impossible).
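For a single trait, the manual implementation is short. A minimal sketch for Add follows; the extra derives are an assumption so that the assert_eq! from the question would compile.

```rust
use std::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Centimeters(i32);

// Forward + to the wrapped i32 and re-wrap the result.
impl Add for Centimeters {
    type Output = Centimeters;

    fn add(self, rhs: Centimeters) -> Centimeters {
        Centimeters(self.0 + rhs.0)
    }
}
```

Sub, Mul and the other operator traits follow the same shape, which is why the repetition is usually factored out.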
To abbreviate the process, you could use a macro:
use std::ops::{Add, Sub};
macro_rules! obvious_impl {
(impl $trait_: ident for $type_: ident { fn $method: ident }) => {
impl $trait_<$type_> for $type_ {
type Output = $type_;
fn $method(self, $type_(b): $type_) -> $type_ {
let $type_(a) = self;
$type_(a.$method(&b))
}
}
}
}
#[derive(Eq, PartialEq, Ord, PartialOrd, Clone, Debug)]
pub struct Centimeters(i32);
obvious_impl! { impl Add for Centimeters { fn add } }
obvious_impl! { impl Sub for Centimeters { fn sub } }
#[derive(Eq, PartialEq, Ord, PartialOrd, Clone, Debug)]
pub struct Inches(i32);
obvious_impl! { impl Add for Inches { fn add } }
obvious_impl! { impl Sub for Inches { fn sub } }
fn main() {
let a = Centimeters(100);
let b = Centimeters(200);
let c = Inches(10);
let d = Inches(20);
println!("{:?} {:?}", a + b, c + d); // Centimeters(300) Inches(30)
// error:
// a + c;
}
I emulated the normal impl syntax in the macro to make it obvious what is happening just by looking at the macro invocation (i.e. reducing the need to look at the macro definition), and also to maintain Rust's natural searchability: if you're looking for traits on Centimeters, just grep for "for Centimeters" and you'll find these macro invocations along with the normal impls.
If you are accessing the contents of the Centimeters type a lot, you could consider using a proper struct with a field to define the wrapper:
struct Centimeters { amt: i32 }
This allows you to write self.amt instead of having to do the pattern matching. You can also define a function like fn cm(x: i32) -> Centimeters { Centimeters { amt: x } }, called like cm(100), to avoid the verbosity of constructing a full struct.
You can also access the inner values of a tuple struct using the .0, .1 syntax.
I made the derive_more crate for this problem. It can derive lots of traits for structs whose fields implement them.
You need to add derive_more to your Cargo.toml. Then you can write:
#[macro_use]
extern crate derive_more;
#[derive(Clone, Copy, Debug, PartialEq, Eq, Add)]
struct Centimeters(i32);
fn main() {
let a = Centimeters(100);
let b = Centimeters(200);
assert_eq!(a + a, b);
}
For Rust version 1.10.0, it seems to me that type aliases would perfectly fit the situation you are describing. They simply give a type a different name.
Let's say all centimeters are u32s. Then I could just use the code
type Centimeters = u32;
Any trait that u32 has, Centimeters would automatically have. This doesn't eliminate the possibility of adding Centimeters to Inches. If you're careful, you wouldn't need different types.
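To make that trade-off concrete, here is a small sketch; the function name add_lengths is hypothetical. Because both aliases name the same type, the compiler accepts mixing them.

```rust
type Centimeters = u32;
type Inches = u32;

// Compiles without complaint: both aliases are just other names for u32,
// so nothing stops a caller from adding centimeters to inches.
fn add_lengths(a: Centimeters, b: Inches) -> u32 {
    a + b
}
```

With the tuple-struct newtypes from the other answers, the equivalent expression is rejected at compile time, which is the safety the alias gives up.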
