I have a collection of types, representing various older and newer versions of my data schema:
struct Version1;
struct Version2;
struct Version3;
struct Version4;
These types can be migrated between each other, one at a time:
impl Version1 { fn migrate_to_v2(self) -> Version2 { Version2 } }
impl Version2 { fn migrate_to_v3(self) -> Version3 { Version3 } }
impl Version3 { fn migrate_to_v4(self) -> Version4 { Version4 } }
I have an enum containing all of these versions, which is the data format I read from the disk:
enum Versioned {
V1(Version1),
V2(Version2),
V3(Version3),
V4(Version4),
}
I'd like to write a function that performs a full migration of a Versioned object to Version4. There are various ways to do it, but all of them have obvious flaws; my question is, are there any other solutions I've overlooked?
Call the methods directly. The problem with this is quadratic blowup: as I add more versions over the lifespan of this product, the size of this code grows quadratically with the number of versions:
fn migrate_versioned(versioned: Versioned) -> Version4 {
use Versioned::*;
match versioned {
V1(data) => data.migrate_to_v2().migrate_to_v3().migrate_to_v4(),
V2(data) => data.migrate_to_v3().migrate_to_v4(),
V3(data) => data.migrate_to_v4(),
V4(data) => data,
}
}
Use a series of chained if lets. This requires round-tripping through the enum type, and we lose the exhaustiveness checking of a match:
fn migrate_versioned(mut versioned: Versioned) -> Version4 {
use Versioned::*;
if let V1(data) = versioned {
versioned = V2(data.migrate_to_v2());
}
if let V2(data) = versioned {
versioned = V3(data.migrate_to_v3());
}
if let V3(data) = versioned {
versioned = V4(data.migrate_to_v4());
}
if let V4(data) = versioned {
return data;
}
unreachable!();
}
Use a loop. This restores the exhaustiveness checking that was lost in the if let version, but we lose a syntactic halting guarantee, and still have the other flaws from if let:
fn migrate_versioned(mut versioned: Versioned) -> Version4 {
use Versioned::*;
loop {
versioned = match versioned {
V1(data) => V2(data.migrate_to_v2()),
V2(data) => V3(data.migrate_to_v3()),
V3(data) => V4(data.migrate_to_v4()),
V4(data) => break data,
};
}
}
Some chained traits. This is better than the others but is very heavy on boilerplate:
trait ToV2: Sized {
fn migrate_to_v2(self) -> Version2;
}
impl ToV2 for Version1 { ... }
trait ToV3: Sized {
fn migrate_to_v3(self) -> Version3;
}
impl ToV3 for Version2 { ... }
impl<T: ToV2> ToV3 for T {
fn migrate_to_v3(self) -> Version3 { self.migrate_to_v2().migrate_to_v3() }
}
trait ToV4: Sized {
fn migrate_to_v4(self) -> Version4;
}
impl ToV4 for Version3 { ... }
impl<T: ToV3> ToV4 for T {
fn migrate_to_v4(self) -> Version4 { self.migrate_to_v3().migrate_to_v4() }
}
fn migrate_versioned(versioned: Versioned) -> Version4 {
use Versioned::*;
match versioned {
V1(data) => data.migrate_to_v4(),
V2(data) => data.migrate_to_v4(),
V3(data) => data.migrate_to_v4(),
V4(data) => data,
}
}
Various macro-based solutions. Generally these use a macro to produce one of the noisy solutions (such as the trait version or the direct call version). Because of trickiness related to creating chaining calls using macro repetition rules (which requires a complex recursive macro, as far as I can tell), these tend to be extremely noisy, to the point where it can't meaningfully be said to be a complexity saver: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c8b8bed94afe654bbee7cfabfe24c784.
Are there any solutions that I've missed here? Conceptually it seems straightforward to want to do something like this:
// V1 starts here
let v2 = v1.migrate_to_v2();
// V2 starts here
let v3 = v2.migrate_to_v3();
// V3 starts here
let v4 = v3.migrate_to_v4();
// V4 starts here
But it's not at all clear to me how (if at all) it's possible to express the control flow for a solution that looks as simple as that.
One way would be a pair of traits. The first trait determines how to get to the next version, and the second trait would complete the chain.
pub trait Migrate {
type NextVersion;
fn migrate(self) -> Self::NextVersion;
}
pub trait MigrateToLatest {
type Latest;
fn migrate_to_latest(self) -> Self::Latest;
}
/// If the next version knows how to MigrateToLatest, call their implementation
impl<M: Migrate> MigrateToLatest for M
where
M::NextVersion: MigrateToLatest,
{
type Latest = <M::NextVersion as MigrateToLatest>::Latest;
#[inline]
fn migrate_to_latest(self) -> Self::Latest {
self.migrate().migrate_to_latest()
}
}
/// Start the waterfall for the LatestVersion type alias (below).
impl MigrateToLatest for LatestVersion {
type Latest = Self;
fn migrate_to_latest(self) -> Self::Latest {
self
}
}
Then we just need to implement Migrate for the previous version every time we have a new release.
impl Migrate for Version1 {
type NextVersion = Version2;
fn migrate(self) -> Self::NextVersion {
Version2
}
}
impl Migrate for Version2 {
type NextVersion = Version3;
fn migrate(self) -> Self::NextVersion {
Version3
}
}
impl Migrate for Version3 {
type NextVersion = Version4;
fn migrate(self) -> Self::NextVersion {
Version4
}
}
When a new version is made, the only changes needed are updating LatestVersion and adding a new Migrate impl for the previous version.
/// Declare the latest version which ends the waterfall
type LatestVersion = Version4;
This approach means we only need to deal with the latest version and don't need to write out the entire chain ourselves.
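To check that the waterfall actually compiles and resolves to the latest version, here is a minimal runnable sketch of the traits above wired into a plain match. The `migrations` counter on each version is a hypothetical field (the question's versions are unit structs) added only so the number of applied steps is observable.

```rust
trait Migrate {
    type NextVersion;
    fn migrate(self) -> Self::NextVersion;
}

trait MigrateToLatest {
    type Latest;
    fn migrate_to_latest(self) -> Self::Latest;
}

// A version that can migrate one step, and whose next version can reach
// the latest, can itself reach the latest.
impl<M: Migrate> MigrateToLatest for M
where
    M::NextVersion: MigrateToLatest,
{
    type Latest = <M::NextVersion as MigrateToLatest>::Latest;
    fn migrate_to_latest(self) -> Self::Latest {
        self.migrate().migrate_to_latest()
    }
}

// Hypothetical payload: how many migrations have been applied so far.
struct Version1 { migrations: u32 }
struct Version2 { migrations: u32 }
struct Version3 { migrations: u32 }
struct Version4 { migrations: u32 }

// End of the waterfall: the latest version "migrates" to itself.
type LatestVersion = Version4;

impl MigrateToLatest for LatestVersion {
    type Latest = Self;
    fn migrate_to_latest(self) -> Self::Latest {
        self
    }
}

impl Migrate for Version1 {
    type NextVersion = Version2;
    fn migrate(self) -> Version2 {
        Version2 { migrations: self.migrations + 1 }
    }
}

impl Migrate for Version2 {
    type NextVersion = Version3;
    fn migrate(self) -> Version3 {
        Version3 { migrations: self.migrations + 1 }
    }
}

impl Migrate for Version3 {
    type NextVersion = Version4;
    fn migrate(self) -> Version4 {
        Version4 { migrations: self.migrations + 1 }
    }
}

enum Versioned {
    V1(Version1),
    V2(Version2),
    V3(Version3),
    V4(Version4),
}

fn migrate_versioned(versioned: Versioned) -> LatestVersion {
    // Every arm is one call; existing arms stay unchanged when a version is added.
    match versioned {
        Versioned::V1(v) => v.migrate_to_latest(),
        Versioned::V2(v) => v.migrate_to_latest(),
        Versioned::V3(v) => v.migrate_to_latest(),
        Versioned::V4(v) => v.migrate_to_latest(),
    }
}

fn main() {
    let latest = migrate_versioned(Versioned::V1(Version1 { migrations: 0 }));
    println!("applied {} migrations", latest.migrations); // prints "applied 3 migrations"
}
```

The overlap between the blanket impl and `impl MigrateToLatest for LatestVersion` is accepted because `Migrate` is a local trait with no impl for `Version4`; adding `impl Migrate for Version4` without moving `LatestVersion` forward would make them conflict.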
You may also find it helpful to write your Versioned enum in a macro, since you likely need to perform a couple of uniform operations on every variant.
macro_rules! make_versioned {
($($name:ident: $type:ty),+) => {
enum Versioned {
// Define the elements of the macro in the enum
$($name($type)),+
}
impl MigrateToLatest for Versioned {
type Latest = LatestVersion;
fn migrate_to_latest(self) -> Self::Latest {
use Versioned::*;
match self {
// Take advantage of the macro to easily call on every variant
$($name(x) => x.migrate_to_latest()),+
}
}
}
}
}
// Define the elements in the Versioned enum
make_versioned! {
V1: Version1,
V2: Version2,
V3: Version3,
V4: Version4
}
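A self-contained sketch of the macro variant, so the traits are repeated here. It assumes tuple-struct versions carrying a hypothetical migration counter, and it accepts an optional trailing comma in the invocation via `$(,)?`.

```rust
trait Migrate {
    type NextVersion;
    fn migrate(self) -> Self::NextVersion;
}

trait MigrateToLatest {
    type Latest;
    fn migrate_to_latest(self) -> Self::Latest;
}

impl<M: Migrate> MigrateToLatest for M
where
    M::NextVersion: MigrateToLatest,
{
    type Latest = <M::NextVersion as MigrateToLatest>::Latest;
    fn migrate_to_latest(self) -> Self::Latest {
        self.migrate().migrate_to_latest()
    }
}

// Hypothetical payload: number of migrations applied so far.
struct Version1(u32);
struct Version2(u32);
struct Version3(u32);
struct Version4(u32);

type LatestVersion = Version4;

impl MigrateToLatest for LatestVersion {
    type Latest = Self;
    fn migrate_to_latest(self) -> Self { self }
}

impl Migrate for Version1 { type NextVersion = Version2; fn migrate(self) -> Version2 { Version2(self.0 + 1) } }
impl Migrate for Version2 { type NextVersion = Version3; fn migrate(self) -> Version3 { Version3(self.0 + 1) } }
impl Migrate for Version3 { type NextVersion = Version4; fn migrate(self) -> Version4 { Version4(self.0 + 1) } }

macro_rules! make_versioned {
    ($($name:ident: $type:ty),+ $(,)?) => {
        enum Versioned {
            $($name($type)),+
        }

        impl MigrateToLatest for Versioned {
            type Latest = LatestVersion;
            fn migrate_to_latest(self) -> Self::Latest {
                // One generated arm per variant, each delegating to the waterfall.
                match self {
                    $(Versioned::$name(x) => x.migrate_to_latest()),+
                }
            }
        }
    };
}

make_versioned! {
    V1: Version1,
    V2: Version2,
    V3: Version3,
    V4: Version4,
}

fn main() {
    let old = Versioned::V2(Version2(0));
    println!("{}", old.migrate_to_latest().0); // prints "2"
}
```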
I have made a minimal example. In lib.rs:
mod sealed {
pub enum Choice {
A,
B,
}
}
pub fn print_choice(choice: sealed::Choice) {
match choice {
sealed::Choice::A => println!("Choice A"),
sealed::Choice::B => println!("Choice B"),
}
}
I think: The enum Choice is public. However, it's in a private mod, and cannot be reached from outside of the crate. Therefore the function print_choice is not callable at all.
What is wrong with my thinking?
You could have something like
pub use sealed::Choice;
at the toplevel. That is a common way to split up the implementation while providing a simple single-module interface to external users.
Or even just another function returning an instance of Choice. Since it's pub, it's not considered a private type.
If you change pub enum to pub(crate) enum (meaning you state that the enum can not be made visible outside the crate) then the compilation will fail.
An important thing to understand is that Choice is not private. It is inside a private module, and thus unnamable, but it is public.
The only thing the module's privacy affects is that you cannot access the enum via this path. You can do anything else with it, e.g. access it via another path it is re-exported into:
mod sealed {
pub enum Choice {
A,
B,
}
}
pub use sealed::Choice;
// In other module
crate::Choice::A;
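A runnable version of the re-export, as a sketch: a `library` module stands in for the crate (an assumption so it fits in one file), and `describe` is a hypothetical variant of `print_choice` that returns the text instead of printing it.

```rust
mod library {
    mod sealed {
        pub enum Choice {
            A,
            B,
        }
    }

    // `sealed` stays private, but the enum itself is `pub`,
    // so it can be re-exported under a public path.
    pub use self::sealed::Choice;

    pub fn describe(choice: Choice) -> &'static str {
        match choice {
            Choice::A => "Choice A",
            Choice::B => "Choice B",
        }
    }
}

fn main() {
    // `library::sealed::Choice::B` would not compile: module `sealed` is private.
    let choice = library::Choice::B;
    println!("{}", library::describe(choice)); // prints "Choice B"
}
```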
Or manipulate it with generics and traits, for example:
mod sealed {
pub enum Choice {
A,
B,
}
impl Default for Choice {
fn default() -> Self { Self::A }
}
}
pub fn print_choice(choice: sealed::Choice) { ... }
// In other module
crate::print_choice(Default::default());
mod sealed {
#[derive(Debug)]
pub enum Choice {
A,
B,
}
}
pub fn print_choice(choice: sealed::Choice) { crate::print(choice) }
// Elsewhere, at the crate root
use std::fmt::Debug;
pub fn print<T: Debug>(v: T) { ... }
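The two snippets above can be combined into one runnable sketch: code outside the module never names `Choice`, yet it can still obtain one through type inference (`Default::default()`) or a public function, and pass it to generic code. `library`, `make_choice`, and `debug_string` are hypothetical names, and `print_choice` returns a `String` here only so the result can be checked.

```rust
use std::fmt::Debug;

mod library {
    mod sealed {
        #[derive(Debug)]
        pub enum Choice {
            A,
            B,
        }

        impl Default for Choice {
            fn default() -> Self {
                Self::A
            }
        }
    }

    pub fn print_choice(choice: sealed::Choice) -> String {
        format!("{:?}", choice)
    }

    pub fn make_choice(n: u32) -> sealed::Choice {
        if n % 2 == 0 { sealed::Choice::A } else { sealed::Choice::B }
    }
}

// Generic code works with the unnameable type through its trait impls.
fn debug_string<T: Debug>(v: T) -> String {
    format!("{:?}", v)
}

fn main() {
    // The argument type is inferred from `print_choice`'s signature.
    println!("{}", library::print_choice(Default::default())); // prints "A"
    println!("{}", debug_string(library::make_choice(1)));     // prints "B"
}
```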
This private type seems to leak, but you don't have full control over it.
You cannot build such a private Choice, but another public function can provide you with it.
mod outer {
mod sealed {
pub enum Choice {
A,
B,
}
}
pub fn print_choice(choice: sealed::Choice) {
match choice {
sealed::Choice::A => println!("Choice A"),
sealed::Choice::B => println!("Choice B"),
}
}
pub fn make_choice(n: u32) -> sealed::Choice {
if n % 2 == 0 {
sealed::Choice::A
} else {
sealed::Choice::B
}
}
}
fn main() {
// let ch = outer::sealed::Choice::A; // error: module `sealed` is private
let ch = outer::make_choice(2);
outer::print_choice(ch);
}
In C++, we can overload operator bool() to convert a struct to bool:
struct Example {
explicit operator bool() const {
return false;
}
};
int main() {
Example a;
if (a) { /* some work */ }
}
Can we do something simple (and elegant) in Rust so that this works:
pub struct Example {}
fn main() {
let k = Example {};
if k {
// some work
}
}
There's no direct equivalent of operator bool(). A close alternative would be to implement From (which will also implement Into) and call the conversion explicitly:
pub struct Example;
impl From<Example> for bool {
fn from(_other: Example) -> bool {
false
}
}
fn main() {
let k = Example;
if k.into() {
// some work
}
}
This will take ownership of Example, meaning you can't use k after it's been converted. You could implement it for a reference (impl From<&Example> for bool) but then the call site becomes uglier ((&k).into()).
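A sketch of the by-reference variant, assuming a hypothetical `enabled` field as the source of truth. Calling `bool::from(&k)` reads a little better than `(&k).into()`, and `k` remains usable afterwards.

```rust
pub struct Example {
    pub enabled: bool, // hypothetical field backing the conversion
}

// Allowed by the orphan rules because `&Example` counts as a local type.
impl From<&Example> for bool {
    fn from(e: &Example) -> bool {
        e.enabled
    }
}

fn main() {
    let k = Example { enabled: true };
    if bool::from(&k) {
        println!("some work");
    }
    // `k` was only borrowed, so it can still be used.
    let also: bool = (&k).into();
    println!("{}", also);
}
```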
I'd probably avoid using From / Into for this case. Instead, I'd create a predicate method on the type. This will be more readable and can take &self, allowing you to continue using the value.
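A predicate method might look like the following sketch; the `items` field and the `is_empty` name are hypothetical, and in real code the method should be named after what `true` means for your type.

```rust
pub struct Example {
    items: Vec<u32>, // hypothetical state the predicate inspects
}

impl Example {
    /// Takes `&self`, so the value is not consumed.
    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

fn main() {
    let k = Example { items: vec![] };
    if k.is_empty() {
        println!("some work");
    }
    // Still usable after the check.
    println!("{}", k.items.len());
}
```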
See also:
When should I implement std::convert::From vs std::convert::Into?
Rust does not have C++'s implicit type conversion via operator overloading. The closest means of implicit conversion is through a Deref impl, which provides a reference of a different type.
What is possible, albeit not necessarily idiomatic, is to implement the not operator ! so that it returns a boolean value, and perform the not operation twice when needed.
use std::ops::Not;
pub struct Example;
impl Not for Example {
type Output = bool;
fn not(self) -> bool { false }
}
fn main() {
let k = Example;
if !!k {
println!("OK!");
} else {
println!("WAT");
}
}
You have a few options, but I'd go for one of these:
Into<bool> (From<Example>)
If your type conceptually represents a bool, but maybe with some extra metadata, you can implement From<Example> for bool:
impl From<Example> for bool {
fn from(e: Example) -> bool {
// perform the conversion
}
}
Then you can:
fn main() {
let x = Example { /* ... */ };
if x.into() {
// ...
}
}
Custom method
If your type doesn't really represent a boolean value, I'd usually go for an explicit method:
impl Example {
fn has_property(&self) -> bool { /* ... */ }
}
This makes it more obvious what the intent is, for example, if you implemented From<User> for bool:
fn main() {
let user = User { /* ... */ };
if user.into() {
// when does this code get run??
}
// compared to
if user.logged_in() {
// much clearer
}
}
You can implement std::ops::Deref with the bool type. If you do that, you have to call *k to get the boolean.
This is not recommended though, according to the Rust documentation:
On the other hand, the rules regarding Deref and DerefMut were designed specifically to accommodate smart pointers. Because of this, Deref should only be implemented for smart pointers to avoid confusion.
struct Example {}
impl std::ops::Deref for Example {
type Target = bool;
fn deref(&self) -> &Self::Target {
&true
}
}
fn main() {
let k = Example {};
if *k {
// some work
}
}
Imagine the following (common) task at hand: a cross-platform library is needed which provides access to physical devices (e.g. USB ones) and shall expose a unified API across all platforms. The following additional properties are to be satisfied:
A platform can support multiple backends to interact with a physical device.
The backend can be chosen by consumers of the library by using URIs.
One must be able to choose the backend at runtime.
Here is the common implementation part representing two concrete USB device abstractions, "A" and "B":
// only available if feature "A" is selected
struct A {}
// only available if feature "B" is selected
struct B {}
trait Common {
fn doit(&self);
}
impl Common for A {
fn doit(&self) { /* do something */ }
}
impl Common for B {
fn doit(&self) { /* do something */ }
}
The following is a naive example of using a global context struct as "factory" to solve the task:
struct Context {}
impl Context {
pub fn create<S: AsRef<str>>(_uri: S) -> Option<Box<dyn Common>> {
#[cfg(feature = "A")]
if _uri.as_ref() == "A" {
return Some(Box::new(A {}))
}
#[cfg(feature = "B")]
if _uri.as_ref() == "B" {
return Some(Box::new(B {}))
}
None
}
}
fn main() {
// pretend feature "A" has been selected
let a = Context::create("A");
}
But there is another way to do this in Rust - using a single struct which hides the implementation details but acts as a single point of contact to consumers:
struct Device {
inner: Box<dyn Common>,
}
impl Device {
pub fn new<S: AsRef<str>>(_uri: S) -> Option<Self> {
#[cfg(feature = "A")]
if _uri.as_ref() == "A" {
return Some(Device {
inner: Box::new(A {})
});
}
#[cfg(feature = "B")]
if _uri.as_ref() == "B" {
return Some(Device {
inner: Box::new(B {})
});
}
None
}
}
impl Common for Device {
fn doit(&self) { self.inner.doit() }
}
fn main() {
// pretend feature "A" has been selected
let a = Device::new("A");
}
Which one is deemed more idiomatic in Rust?
What about auto traits such as Send or Sync?
For example, "A" and "B" being Send does not automatically make Box<dyn Common> Send; instead one has to explicitly add the bound: Box<dyn Common + Send>.
Adding such trait constraints in the first case (factory style) requires changing the function signature. In the second case, one only needs to change the internals of struct Device, the public facing API is not affected - apart from struct Device becoming e.g. Send.
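A sketch of the second (Device) style with the Send bound added, to illustrate that only the private field changes. Assumptions: the `#[cfg(feature = ...)]` gating is omitted so it runs as-is, only backend "A" exists, and `doit` returns a label instead of `()` so the result can be checked.

```rust
use std::thread;

trait Common {
    fn doit(&self) -> &'static str;
}

// Stand-in backend; the real one would sit behind #[cfg(feature = "A")].
struct A;

impl Common for A {
    fn doit(&self) -> &'static str {
        "backend A"
    }
}

pub struct Device {
    // Only this private field mentions `Send`; `Device::new` is unchanged.
    inner: Box<dyn Common + Send>,
}

impl Device {
    pub fn new<S: AsRef<str>>(uri: S) -> Option<Self> {
        match uri.as_ref() {
            "A" => Some(Device { inner: Box::new(A) }),
            _ => None,
        }
    }
}

impl Common for Device {
    fn doit(&self) -> &'static str {
        self.inner.doit()
    }
}

// Compiles only if `T: Send`; a cheap compile-time assertion.
fn assert_send<T: Send>() {}

fn main() {
    assert_send::<Device>();
    let device = Device::new("A").expect("backend A is available");
    // Moving the device into another thread requires `Device: Send`.
    let label = thread::spawn(move || device.doit()).join().unwrap();
    println!("{}", label);
}
```

With the factory style, the same change would show up in the public return type (`Option<Box<dyn Common + Send>>`).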
I have an object that I know is inside an Arc, because all instances are always Arced. I would like to be able to pass a cloned Arc of myself in a function call; the callee will call me back later on other threads.
In C++, there is a standard mixin called enable_shared_from_this. It enables me to do exactly this:
class Bus : public std::enable_shared_from_this<Bus>
{
....
void SetupDevice(Device device,...)
{
device->Attach(shared_from_this());
}
};
If this object is not under shared_ptr management (the closest C++ has to Arc) then this will fail at run time.
I cannot find an equivalent.
EDIT:
Here is an example of why it's needed. I have a timerqueue library. It allows a client to request that an arbitrary closure be run at some point in the future. The code is run on a dedicated thread. To use it, you pass a closure containing the code you want executed later.
use std::time::{Duration, Instant};
use timerqueue::*;
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
// inline me keeper cos not on github
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
// -----------------------------------
struct Test {
data:String,
me: MeKeeper<Self>,
}
impl Test {
pub fn new() -> Arc<Test>{
let arc = Arc::new(Self {
me: MeKeeper::new(),
data: "Yo".to_string()
});
arc.me.save(&arc);
arc
}
fn task(&self) {
println!("{}", self.data);
}
// in real use case the TQ and a ton of other status data is passed in the new call for Test
// to keep things simple here the 'container' passes tq as an arg
pub fn do_stuff(&self, tq: &TimerQueue) {
// stuff includes a async task that must be done in 1 second
//.....
let me = self.me.get().clone();
tq.queue(
Box::new(move || me.task()),
"x".to_string(),
Instant::now() + Duration::from_millis(1000),
);
}
}
fn main() {
// in real case (PDP11 emulator) there is a Bus class owning tons of objects thats
// alive for the whole duration
let tq = Arc::new(TimerQueue::new());
let test = Test::new();
test.do_stuff(&*tq);
// just to keep everything alive while we wait
let mut input = String::new();
std::io::stdin().read_line(&mut input).unwrap();
}
Cargo.toml
[package]
name = "tqclient"
version = "0.1.0"
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
timerqueue = { git = "https://github.com/pm100/timerqueue.git" }
parking_lot = "0.11"
There is no way to go from a &self to the Arc that self is stored in. This is because:
Rust references have additional assumptions compared to C++ references that would make such a conversion undefined behavior.
Rust's implementation of Arc does not even expose the information necessary to determine whether self is stored in an Arc or not.
Luckily, there is an alternative approach. Instead of creating a &self to the value inside the Arc, and passing that to the method, pass the Arc directly to the method that needs to access it. You can do that like this:
use std::sync::Arc;
struct Shared {
field: String,
}
impl Shared {
fn print_field(self: Arc<Self>) {
let clone: Arc<Shared> = self.clone();
println!("{}", clone.field);
}
}
Then the print_field method can only be called on a Shared wrapped in an Arc.
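For the "call me back later on another thread" case, a sketch under the same `self: Arc<Self>` approach: the method owns a strong reference and can move it into a thread. `field_on_other_thread` is a hypothetical method name; the caller clones the Arc so its own handle stays usable.

```rust
use std::sync::Arc;
use std::thread;

struct Shared {
    field: String,
}

impl Shared {
    // The receiver is an owned Arc, so it can be moved into the closure.
    fn field_on_other_thread(self: Arc<Self>) -> thread::JoinHandle<String> {
        thread::spawn(move || self.field.clone())
    }
}

fn main() {
    let shared = Arc::new(Shared { field: "Yo".to_string() });
    // Clone the Arc for the call so `shared` remains usable afterwards.
    let handle = Arc::clone(&shared).field_on_other_thread();
    println!("{}", handle.join().unwrap());
    println!("{}", shared.field);
}
```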
Having found that I needed this three times in recent days, I decided to stop trying to come up with other designs. Maybe it's poor data design as far as Rust is concerned, but I needed it.
It works by changing the new function of the types using it to return an Arc rather than a bare Self. All my objects are Arced anyway; before, they were Arced by the caller, now it's enforced.
A mini utility library called mekeeper:
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
to use it
pub struct Test {
me: MeKeeper<Self>,
foo:i8,
}
impl Test {
pub fn new() -> Arc<Self> {
let arc = Arc::new(Test {
me: MeKeeper::new(),
foo:42
});
arc.me.save(&arc);
arc
}
}
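A dependency-free sketch of the same MeKeeper, for trying it out: it swaps parking_lot::Mutex for std::sync::Mutex (hence the `unwrap()` on `lock()`), uses `expect` in place of the `unreachable!` match, and adds a hypothetical `arc_self` accessor on Test.

```rust
use std::sync::{Arc, Mutex, Weak};

pub struct MeKeeper<T> {
    them: Mutex<Weak<T>>,
}

impl<T> MeKeeper<T> {
    pub fn new() -> Self {
        Self { them: Mutex::new(Weak::new()) }
    }

    pub fn save(&self, arc: &Arc<T>) {
        *self.them.lock().unwrap() = Arc::downgrade(arc);
    }

    pub fn get(&self) -> Arc<T> {
        self.them
            .lock()
            .unwrap()
            .upgrade()
            .expect("MeKeeper::get called before save() or after the value was dropped")
    }
}

pub struct Test {
    me: MeKeeper<Self>,
    foo: i8,
}

impl Test {
    pub fn new() -> Arc<Self> {
        let arc = Arc::new(Test { me: MeKeeper::new(), foo: 42 });
        arc.me.save(&arc);
        arc
    }

    // Hypothetical accessor: hands out a new strong reference to ourselves.
    pub fn arc_self(&self) -> Arc<Self> {
        self.me.get()
    }
}

fn main() {
    let test = Test::new();
    let again = test.arc_self();
    println!("{} {}", Arc::ptr_eq(&test, &again), again.foo); // prints "true 42"
}
```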
Now, when an instance of Test wants to call a function that requires an Arc of itself, it does:
fn nargle(&self) {
let me = self.me.get();
Ooddle::fertang(me, 42); // fertang needs an Arc<T>
}
Storing a Weak is what shared_from_this does too, to avoid a reference cycle (a strong self-reference would keep the object alive forever); I borrowed that idea.
The unreachable! path is safe because the only place that can call MeKeeper::get is the instance of T (Test here) that owns it, and that call can only happen while the T instance is alive (except from T's own Drop impl, where the strong count is already zero). Hence Weak::upgrade never returns None.
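A variant of the same Weak-based idea, assuming Rust 1.60 or later: std's `Arc::new_cyclic` passes the constructor a `Weak` to the allocation being built, so the Weak can be stored directly, with no Mutex and no separate `save` step. `arc_self` is a hypothetical helper name.

```rust
use std::sync::{Arc, Weak};

pub struct Test {
    // Set once during construction and never changed, so no Mutex is needed.
    me: Weak<Test>,
    foo: i8,
}

impl Test {
    pub fn new() -> Arc<Self> {
        // The closure receives a Weak pointing at the Arc being constructed.
        Arc::new_cyclic(|me| Test { me: me.clone(), foo: 42 })
    }

    pub fn arc_self(&self) -> Arc<Self> {
        self.me
            .upgrade()
            .expect("fails only during construction or while Test is being dropped")
    }
}

fn main() {
    let test = Test::new();
    let again = test.arc_self();
    println!("{} {}", Arc::ptr_eq(&test, &again), again.foo); // prints "true 42"
}
```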
I'm working with a library that uses Rust types to keep track of state. As a simplified example, say you have two structs:
struct FirstStruct {}
struct SecondStruct {}
impl FirstStruct {
pub fn new() -> FirstStruct {
FirstStruct {}
}
pub fn second(self) -> SecondStruct {
SecondStruct {}
}
// configuration methods defined in this struct
}
impl SecondStruct {
pub fn print_something(&self) {
println!("something");
}
pub fn first(self) -> FirstStruct {
FirstStruct {}
}
}
And to actually use these structs you usually follow a pattern like so, after printing you may stay in second state or go back to first state depending on how you're using the library:
fn main() {
let first = FirstStruct::new();
let second = first.second(); // consumes first
second.print_something();
// go back to default state
let _first = second.first();
}
I want to create my own struct that handles the state changes internally and simplifies the interface. This also lets me have a single mutable reference around that I can pass to other functions and call the print method. Using it should look something like this:
fn main() {
let mut combined = CombinedStruct::new(FirstStruct::new());
combined.print();
}
I've come up with the following solution that works, at least in this simplified example:
enum StructState {
First(FirstStruct),
Second(SecondStruct),
}
struct CombinedStruct {
state: Option<StructState>,
}
impl CombinedStruct {
pub fn new(first: FirstStruct) -> CombinedStruct {
CombinedStruct {
state: Some(StructState::First(first)),
}
}
pub fn print(&mut self) {
let s = match self.state.take() {
Some(s) => match s {
StructState::First(first) => first.second(),
StructState::Second(second) => second,
},
None => panic!(),
};
s.print_something();
// If I forget to do this, then I lose access to my struct
// and next call will panic
self.state = Some(StructState::First(s.first()));
}
}
I'm still pretty new to Rust but this doesn't look right to me. I'm not sure if there's a concept I'm missing that could simplify this or if this solution could lead to ownership problems as my application gets more complicated. Is there a better way to do this?
I once had a similar problem and went basically with your solution, but I avoided the Option.
I.e. I basically kept your
enum StructState {
First(FirstStruct),
Second(SecondStruct),
}
If an operation tries to convert a FirstStruct to a SecondStruct, I introduced a function try_to_second roughly as follows:
impl StructState {
fn try_to_second(self) -> Result<SecondStruct, StructState> {
todo!() // implementation
}
}
In this case, an Err indicates that the StructState has not been converted to SecondStruct and preserves the status quo, while an Ok value indicates successful conversion.
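A sketch of what the enum-level `try_to_second` might look like. The answer does not say when the conversion fails, so a hypothetical `configured` flag on FirstStruct decides it here; an already-second state simply succeeds.

```rust
struct FirstStruct {
    configured: bool, // hypothetical condition for the transition
}

struct SecondStruct;

enum StructState {
    First(FirstStruct),
    Second(SecondStruct),
}

impl StructState {
    // Ok carries the SecondStruct; Err hands back the untouched state.
    fn try_to_second(self) -> Result<SecondStruct, StructState> {
        match self {
            StructState::Second(s) => Ok(s),
            StructState::First(f) if f.configured => Ok(SecondStruct),
            other => Err(other),
        }
    }
}

fn main() {
    let state = StructState::First(FirstStruct { configured: false });
    match state.try_to_second() {
        Ok(_second) => println!("converted"),
        // Nothing is lost on failure: the original state comes back in Err.
        Err(state) => println!("still first: {}", matches!(state, StructState::First(_))),
    }
}
```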
As an alternative, you could try to define try_to_second on FirstStruct:
impl FirstStruct {
fn try_to_second(self) -> Result<SecondStruct, FirstStruct> {
todo!() // implementation
}
}
Again, Err/Ok denote failure/success, but in this case, you have more concrete information encoded in the type.
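A sketch of the FirstStruct-level version, with Ok holding the new SecondStruct and Err returning the original FirstStruct. As before, the `configured` flag is a hypothetical stand-in for whatever decides whether the transition may happen.

```rust
struct FirstStruct {
    configured: bool, // hypothetical condition for the transition
}

struct SecondStruct;

impl FirstStruct {
    // Ok = transition happened; Err = we get our FirstStruct back unchanged.
    fn try_to_second(self) -> Result<SecondStruct, FirstStruct> {
        if self.configured {
            Ok(SecondStruct)
        } else {
            Err(self)
        }
    }
}

fn main() {
    let first = FirstStruct { configured: false };
    match first.try_to_second() {
        Ok(_second) => println!("now in the second state"),
        Err(first) => println!("still first, configured = {}", first.configured),
    }
}
```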