Zero cost builder pattern for recursive data structure using transmute. Is this safe? Is there a better approach? - rust

I would like to create a struct using the builder pattern which must be validated before construction, and I would like to minimize the construction overhead.
I've come up with a nice way to do that using std::mem::transmute, but I'm far from confident that this approach is really safe, or that it's the best approach.
Here's my code: (Rust Playground)
#[derive(Debug)]
pub struct ValidStruct {
items: Vec<ValidStruct>
}
#[derive(Debug)]
pub struct Builder {
pub items: Vec<Builder>
}
#[derive(Debug)]
pub struct InvalidStructError {}
impl Builder {
pub fn new() -> Self {
Self { items: vec![] }
}
pub fn is_valid(&self) -> bool {
self.items.len() % 2 == 1
}
pub fn build(self) -> Result<ValidStruct, InvalidStructError> {
if !self.is_valid() {
return Err(InvalidStructError {});
}
unsafe {
Ok(std::mem::transmute::<Builder, ValidStruct>(self))
}
}
}
fn main() {
let mut builder = Builder::new();
builder.items.push(Builder::new());
let my_struct = builder.build().unwrap();
println!("{:?}", my_struct)
}
So, this seems to work. I think it should be safe because I know the two structs are identical. Am I missing anything? Could this actually cause problems somehow, or is there a cleaner/better approach available?

You can't normally transmute between different structures just because they seem to have the same fields in the same order, because the compiler might change that. You can avoid the risk by forcing the memory layout but you're then fighting the compiler and preventing optimizations. This approach isn't usually recommended and is, in my opinion, not needed here.
What you want is to have
a recursive data structure with public fields so that you can easily build it
an identical structure, built from the first one but with no public access and only built after validation of the first one
And you want to avoid useless copies for performance reasons.
What I suggest is to have a wrapper class. This makes sense because wrapping a struct in another one is totally costless in Rust.
You could thus have
/// This is the "Builder" struct
pub struct Data {
pub items: Vec<Data>,
}
pub struct ValidStruct {
data: Data, // no public access here
}
impl Data {
pub fn build(self) -> Result<ValidStruct, InvalidStructError> {
if !self.is_valid() {
return Err(InvalidStructError {});
}
Ok(Self{ data })
}
}
(alternatively, you could declare a struct Builder as a wrapper of Data too but with a public access to its field)

Related

Factory pattern in Rust

I have a Terminal and a TerminalFactory type. What I want to achieve is a factory pattern or at least an implementation that is similar to it in regards to some key aspects.
In particular, I want every Terminal to have specific properties (e.G. id) that are set by TerminalFactory. In GRASP terms, TerminalFactory is the Creator and the Information Expert because it knows next_id.
TerminalFactory is not a singleton, it is an instance that will be injected in a container that will be handed around where necessary (I'm trying to implement an application based on the composite pattern to achieve the OCP of SOLID).
It looks like everything is in line with the fact that Rust discourages static state unless it's unsafe code.
The problem now is that instances of Terminal have an id which should not be set by arbitrary code. Therefore, it's not pub. On the other hand, it can be gotten. So I added a public getter, like this:
pub struct Terminal {
id: u32, // private
pub name: String
}
impl Terminal {
pub fn get_id(self: &Self) -> u32 {
self.id
}
}
As far as I can tell, there is no friend keyword in rust, no static state I can keep in the Terminal type and TerminalFactory cannot set Terminal::id during creation because it's not public.
I have the impression I can't implement the factory pattern correctly. Maybe I shouldn't even try to transfer "classical patterns" to Rust? Then what about the SOLID principles which are absolutely necessary to manage change in medium-scale applications (say, 50+ types and beyond 10000 lines of code)?
By adding an associate function new(id : u32) to Terminal I completely defeat the purpose of the factory. So this won't make sense, either.
As all of this code will be in a library that is consumed by other Rust code, how can I protect the consumer code from creating broken instances?
Put both types in the same module:
mod terminal {
pub struct Terminal {
id: u32,
pub name: String
}
impl Terminal {
pub fn get_id(&self) -> u32 {
self.id
}
}
pub struct TerminalFactory {
next_id: u32,
}
impl TerminalFactory {
pub fn new() -> Self {
Self {
next_id: 0,
}
}
pub fn new_terminal(&mut self, name: &str) -> Terminal {
let next_id = self.next_id;
self.next_id += 1;
Terminal {
id: next_id,
name: name.to_owned(),
}
}
}
}
Then this is OK:
let mut tf = terminal::TerminalFactory::new();
let t0 = tf.new_terminal("foo");
let t1 = tf.new_terminal("bar");
println!("{} {}", t0.get_id(), t0.name);
println!("{} {}", t1.get_id(), t1.name);
But this is not:
println!("{} {}", t0.id, t0.name); // error: private field
See Visibility and privacy for other scenarios.

How to program shared behaviors in Rust without repeating same code in each module?

For writing a very large program, I see no way to alleviate having to write the same code for each struct that uses a certain shared behaviour.
For example, Dog may "bark":
struct Dog {
is_barking: bool,
....
}
impl Dog {
pub fn bark(self) {
self.is_barking = true;
emit_sound("b");
emit_sound("a");
emit_sound("r");
emit_sound("k");
self.is_barking = false;
}
....
}
And many breeds of this dog may exist:
struct Poodle {
unique_poodle_val: &str
}
impl Poodle {
pub fn unique_behaviour(self) {
self.some_behaviour();
}
}
struct Rottweiler {
unique_rottweiler_val: u32
}
impl Rottweiler{
pub fn unique_behaviour(self) {
self.some_behaviour();
}
}
The issue is that Rust seems incapable of this in my current knowledge, but it needs to be done and I need a workaround for:
Allowing Poodle and Rottweiler to bark using the exact same behavior which the breeds should not need to regard.
Allowing this to be possible without recoding bark() in every breed module, which is programming hell as it leads to repetitious code and every module has to implement bark().
Traits are the inverse and cannot access the struct, so default-trait implements do not work. Rust does not support OOP-like inheritance and nor is it desired here.
Therefore, I ask:
How would it be possible to implement bark() without rewriting bark() in each module, since Poodle and Rottweiler bark exactly the same way after all?
Please provide an example of how to solve this issue in Rust code, idiomatic and slightly hacky solutions are welcome but please state which they are as I am still learning Rust.
Thank you.
Edit: The boolean is not a thread thing, rather it's a example of setting some state before doing something, i.e. emit_sound is within this state. Any code we put into bark() has the same issue. It's the access to the struct variables that is impossible with say, traits.
You've put the finger on something that Rust doesn't do nicely today: concisely add to a struct a behavior based on both internal data and a function.
The most typical way of solving it would probably be to isolate Barking in a struct owned by all dogs (when in doubt, prefer composition over inheritance):
pub struct Barking {
is_barking: bool,
}
impl Barking {
pub fn do_it(&mut self) {
self.is_barking = true; // <- this makes no sense in rust, due to the ownership model
println!("bark");
println!("a");
println!("r");
println!("k");
self.is_barking = false;
}
}
struct Poodle {
unique_poodle_val: String,
barking: Barking,
}
impl Poodle {
pub fn unique_behaviour(self) {
println!("some_behaviour");
}
pub fn bark(&mut self) {
self.barking.do_it();
}
}
struct Rottweiler {
unique_rottweiler_val: u32,
barking: Barking,
}
impl Rottweiler{
pub fn unique_behaviour(self) {
println!("unique behavior");
}
pub fn bark(&mut self) {
// maybe decide to bite instead
self.barking.do_it();
}
}
In some cases it can make sense to define a Barking trait, with a common implementation and declaring some functions to deal with the state:
pub trait Barking {
fn bark(&mut self) {
self.set_barking(true);
println!("bark");
println!("a");
println!("r");
println!("k");
self.set_barking(false);
}
fn set_barking(&mut self, b: bool);
}
struct Poodle {
unique_poodle_val: String,
is_barking: bool,
}
impl Poodle {
pub fn unique_behaviour(self) {
println!("some_behaviour");
}
}
impl Barking for Poodle {
fn set_barking(&mut self, b: bool) {
self.is_barking = b;
}
}
Beware that this half-OOP approach often ends up too much complex and less maintainable (like inheritance in OOP languages).

How can I determine if I have a unique Arc when it's dropped?

I've an Arc<Mutex<Thing>> field in a struct which is cloned many times. It is shared between concurrent threads. Drop::drop is called for each clone as it goes out of scope. Is there any way to determine when Drop::drop is called for the last (unique) Arc<Mutex<Thing>>?
It's clear that strong_count is subject to data races (I've seen them). So, you can't count on Arc::strong_count() == 1 (no pun intended).
I found that I couldn't use Arc::try_unwrap() due to a move issue.
Arc::is_unique() is private.
Other than keeping a Arc<AtomicUsize> field, which is incremented on clone and decremented on drop, is there any way to determine if a drop is for a unique Arc<Mutex<Thing>>?
Here's an MRE:
use std::sync::{Arc};
#[derive(Debug)]
enum Action {
One, Two, Three
}
// Thing trait which operates on an Action, which should be a enum, allowing for
// different action sets.
trait Thing<T> {
fn disconnected(&self);
fn action(&self, action: T);
}
// There are many instances of an ActionController.
// There may be zero or more clones of an instance.
// The final drop of the instances should call thing.disconnected()
// In a multi-core environment, the same instance may be running on multiple cores
// ActionController should not be generic.
#[derive(Clone)]
struct ActionController {
id: usize,
thing: Arc<dyn Thing<Action>>,
}
impl ActionController {
fn new(id: usize, thing: Box<dyn Thing<Action>>) -> Self {
Self { id, thing: Arc::from(thing) }
}
fn invoke(&self, action: Action) {
self.thing.action(action);
}
}
//
// To work around the drop issue, I've implemented Clone for ActionController which
// performs a fetch_add(1) on clone and a fetch_sub(1) on drop. This provides
// suficient information to call disconnected() -- but it just seems like there's
// got to be a better way.
impl Drop for ActionController {
fn drop(&mut self) {
// drop will be called for each clone of an Controller instance. When
// the unique instance is dropped, disconnected() must be called
self.thing.disconnected();
}
}
struct Controlled {}
impl Thing<Action> for Controlled {
fn disconnected(&self) { println!("disconnected")}
fn action(&self, action: Action) {println!("action: {:#?}", action)}
}
fn bad() {
let controlled = Controlled{};
let controlled = Box::new(controlled) as Box<dyn Thing<Action>>;
let controller = ActionController::new(1, controlled);
let clone = controller.clone();
controller.invoke(Action::One);
clone.invoke(Action::Two);
drop (controller);
clone.invoke(Action::Three);
}
fn main() {
bad();
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn incorrect() {
bad();
}
}
Arc::try_unwrap is probably the intended way to do this - is it possible to restructure your code to avoid the move issues you were running into?
Why do you want to know? If you have some extra cleanup code that needs to be executed before the Mutex<Thing> is dropped, maybe you could use an Arc<MyLockedThing> instead, where MyLockedThing is a struct containing a Mutex<Thing> that impls Drop to do the cleanup?
It seems like you want to be notified when the data inside the Arc is to be dropped. If so, this can be done by implementing Drop on the type "inside" the Arc.
Define a newtype:
struct ThingAction(Box<dyn Thing<Action>>);
impl Thing<Action> for ThingAction {
fn disconnected(&self) {
self.0.disconnected()
}
fn action(&self, action: Action) {
self.0.action(action)
}
}
And implement Drop:
impl Drop for ThingAction {
fn drop(&mut self) {
self.disconnected()
}
}
Then use the newtype:
#[derive(Clone)]
struct ActionController {
id: usize,
thing: Arc<ThingAction>,
}
impl ActionController {
fn new(id: usize, thing: Box<dyn Thing<Action>>) -> Self {
Self { id, thing: Arc::new(ThingAction(thing)) }
}
I don't think there's any perfect way to do this without stdlib support (go checkout out Arc::drop).
Weak::strong_count or Weak::upgrade is less subject to races so if you downgrade your Arc then drop it, if the weakref's strong count is 0 or trying to upgrade it fails you know the Arc is dead, but there is no guarantee the current thread killed it, two might have concurrently dropped the Arc at the same time before either had the time to check for the weakref's strong count.
I think the only bulletproof way would be to get notified by a Drop stored inside the Arc, that you're guaranteed is only called once.

What is an idiomatic way to have multiple structs with the same properties in Rust?

I'm aware that Rust does not have inheritance and that the language provides and easy way to share different implementations of the same methods across objects through the use of Traits. But is there an idiomatic way to share property name definitions or do they need to be defined on each struct?
My use case is that I have many different structs that track some information. Each piece of information can be updated and I want each struct to know the date of its last update. Is there a common pattern (maybe macros?) to add a last_update property to all the structs or must I add it to each struct explicitly?
There is currently no way to do this via traits, the closest thing is the "Fields in Traits" RFC (discussion, RFC), but that doesn't seem terribly active as of now.
The simplest way to do this is to have a type / struct with a method and include that field in any struct you want:
struct UpdateTimestamp {
timestamp: Timestamp, // dummy type
}
impl UpdateTimestamp {
fn update(&mut self) {
self.timestamp = now(); // dummy function
}
fn last_updated(&self) -> Timestamp {
self.timestamp
}
}
You could then include this in any struct where you want the functionality:
struct MyStruct {
my_field: u32,
my_other_field: i32,
update_ts: UpdateTimestamp,
}
impl MyStruct {
fn my_field(&self) -> u32 {
// Getter - no update
self.my_field
}
fn set_my_field(&mut self, my_field: u32) {
self.update_ts.update();
self.my_field = my_field;
}
fn last_updated(&self) -> Timestamp {
self.update_ts.last_updated()
}
}
Now you could write a complicated macro for this which automates the implementation part (injects updates into the setters and the last_updated method in the impl block), but unless you're doing this a lot I don't think it would be worth it.

How best to deal with struct field that can change types

I'm working with a library that uses Rust types to keep track of state. As a simplified example, say you have two structs:
struct FirstStruct {}
struct SecondStruct {}
impl FirstStruct {
pub fn new() -> FirstStruct {
FirstStruct {}
}
pub fn second(self) -> SecondStruct {
SecondStruct {}
}
// configuration methods defined in this struct
}
impl SecondStruct {
pub fn print_something(&self) {
println!("something");
}
pub fn first(self) -> FirstStruct {
FirstStruct {}
}
}
And to actually use these structs you usually follow a pattern like so, after printing you may stay in second state or go back to first state depending on how you're using the library:
fn main() {
let first = FirstStruct::new();
let second = first.second(); // consumes first
second.print_something();
// go back to default state
let _first = second.first();
}
I want to create my own struct that handles the state changes internally and simplifies the interface. This also lets me have a single mutable reference around that I can pass to other functions and call the print method. Using it should look something like this:
fn main() {
let mut combined = CombinedStruct::new(FirstStruct::new());
combined.print();
}
I've come up with the following solution that works, at least in this simplified example:
enum StructState {
First(FirstStruct),
Second(SecondStruct),
}
struct CombinedStruct {
state: Option<StructState>,
}
impl CombinedStruct {
pub fn new(first: FirstStruct) -> CombinedStruct {
CombinedStruct {
state: Some(StructState::First(first)),
}
}
pub fn print(&mut self) {
let s = match self.state.take() {
Some(s) => match s {
StructState::First(first) => first.second(),
StructState::Second(second) => second,
},
None => panic!(),
};
s.print_something();
// If I forget to do this, then I lose access to my struct
// and next call will panic
self.state = Some(StructState::First(s.first()));
}
}
I'm still pretty new to Rust but this doesn't look right to me. I'm not sure if there's a concept I'm missing that could simplify this or if this solution could lead to ownership problems as my application gets more complicated. Is there a better way to do this?
Playground link
I once had a similar problem and went basically with your solution, but I avoided the Option.
I.e. I basically kept your
enum StructState {
First(FirstStruct),
Second(SecondStruct),
}
If an operation tries to convert a FirstStruct to a SecondStruct, I introduced a function try_to_second roughly as follows:
impl StructState {
fn try_to_second(self) -> Result<SecondState, StructState> {
/// implementation
}
}
In this case, an Err indicates that the StructState has not been converted to SecondStruct and preserves the status quo, while an Ok value indicates successfull conversion.
As an alternative, you could try to define try_to_second on FirstStruct:
impl FirstStruct {
fn try_to_second(self) -> Result<FirstStruct, SecondStruct> {
/// implementation
}
}
Again, Err/Ok denote failure/success, but in this case, you have more concrete information encoded in the type.

Resources