Mutable Value Objects / Sharing State (and beer brewing!) - reference

I'm a rusty programmer attempting to become learned in the field again. I've discovered, fitfully, that my self-taught and formal education both induced some bad habits. As such, I'm trying to get my mind around good design patterns, and -- by extension -- when they're wrong. The language is Java, and here's my issue:
I'm attempting to write software to assist in beer brewing. In brewing, sometimes you must substitute a particular variety of hop for what's called for in the recipe. For example, you might have a recipe that calls for 'Amarillo' hops, but all you can get is 'Cascade', which has a similar enough aroma for substitution; hops have an Alpha Acid amount (per a given mass), and the ratio between two hops is part of the substitution formula. I'm attempting to model this (properly) in my program.
My initial go is to have two objects. One a HopVariety, which has general descriptive information about a variety of hop, and one a HopIngredient, which is a particular instantiation of a HopVariety and also includes the amount used in a given recipe. HopIngredient should have knowledge of its variety, and HopVariety should have knowledge of what can be used as a substitute for it (not all substitutions are symmetric). This seems like good OOP.
The problem is this: I'm trying to follow good practice and make my value objects immutable. (In my head, I'm classifying HopVariety and HopIngredient as value objects, not 'actors'.) However, I need the user to be able to update a given HopVariety with new viable substitutions. If I follow immutability, these changes will not propagate to individual ingredients. If choose mutability, I'm Behaving Badly by potentially introducing side-effects by sharing a mutable value object.
So, option B: introduce a VarietyCollection of sorts, and loosely couple the ingredients and the varieties by way of a name or unique identifier. And then a VarietySubstitutionManager, so that varieties don't hold references to other varieties, only to their ids. This goes against what I want to do, because holding a reference to the variety object makes intuitive sense, and now I'm introducing what feels like excessive levels of abstraction, and also separating functions from the data.
So, how do I properly share state amongst what amounts to specific instances? What's the proper, or at least, sanest way to solve the problem?

Are you sure HopVariety should be a value object? It sounds to me like an entity - "Amarillo" hop variety is one and only one, so it should be a uniquely identifiable object. Just like "Tony Wooster" is only one (well not exactly, but you get the point ;) )
I recommend reading about Domain Driven Design and the differences between entities and value objects.
BTW: DDD book has a lot of examples of situations like yours and how to handle them.

I think your choices are either to
have the 'update variety' function walk the existing object graph and create a new one with all of the ingredients updated, or
keep using mutability
With the information at hand, it isn't clear to me which is better for your situation.

How badly do you want to avoid mutation of objects with multiple references?
If you want the recipe collection to reflect the user updates to the variety set, and you want to avoid mutable fields in your objects, then you'll be forcing yourself into using functional object update to handle user input.
At the end of that road lies the I/O monad. How far do you want to go in that direction?
With a functional language that allows mutation side-effects, e.g. S/ML etc., you might opt to keep your variety and ingredient objects pure, and to store functions that return the current variety object from the latest variety collection stored in a single mutable reference cell. This might seem like a reasonable way to split the difference.

Related

Managing complex state in FP

I want to write a simulation of a multi-entity system. I believe such systems motivated creation of Simula and OOP where each object would maintain its own state and the runtime would manage the the entire system (e.g. stop threads, serialize data).
On the other hand, I would like to have ability to rewind, change the simulation parameters and compare the results. Thus, immutability sounds great (at least up to almost certain garbage collection issues caused by keeping track of possibly redundant data).
However I don't know how to model this. Does this mean that I must put every interacting entity into a single, huge structure where each object update would require locating it first?
I'm worried that such approach would affect performance badly because of GC overhead and constant structure traversals as opposed to keeping one fixed address of entity in memory.
UPDATE
To clarify, this question asks if there is any other design option available other than creating a single structure that contains all possibly interacting entities as a root. Intuitively, such a structure would imply logarithmic single update penalty unless updates are "clustered" somehow to amortize.
Is there a known system where interactions could be modelled differently? For example, like in cold/hot data storage optimization?
After some research, there seems to be a connection with N-body simulation where systems can be clustered but I'm not familiar with it yet. Even so, would that also mean I need to have a single structure of clusters?
While I agree with the people commenting that this is a vague question, I'll still try to address some of the issues put forth.
It's true that there's some performance overhead from immutability, because when you use mutable state, you can update some values in-place, whereas with immutable state, some copying has to take place.
It is, however, a common misconception that this is causes problems with big 'object' graphs. It doesn't have to.
Consider a Haskell data structure:
data BigDataStructure = BigDataStructure {
bigChild1 :: AnotherBigDataStructure
, bigChild2 :: YetAnotherBigDataStructure
-- more elements go here...
, bigChildN :: Whatever }
deriving (Show, Eq)
Imagine that each of these child elements are big and complex themselves. If you want to change, say, bigChild2, you could write something like:
updatedValue = myValue { bigChild2 = updatedChild }
When you do that, some data copying takes place, but it's often less that most people think. This expression does create a new BigDataStructure record, but it doesn't 'deep copy' any of its values. It just reuses bigChild1, updatedChild, bigChildN, and all the other values, because they're immutable.
In theory (but we'll get back to that in a minute), the flatter your data structures are, the more data sharing should be enabled. If, on the other hand, you have some deeply nested data structures, and you need to update the leafs, you'll need to create a copy of the immediate parents of those leafs, plus the parents of those parents, and their parents as well, all the way to the root. That might be expensive.
That's the theory, though, but we've known for decades that it's impractical to try predict how software will perform. Instead, try to measure it.
While the OP suggest that significant data is involved, it doesn't state how much, and neither does it state the hardware specs of the system that's going to run the simulation. So, as Eric Lippert explains so well, the person who can best answer questions about performance is you.
P.S. It's my experience that when I start to encounter performance problems, I need to get creative with how I design my system. Efficient data structures can address many performance issues. This is just as much the case in OOP as it is in FP.

UML and Implementation: Associating Classes through IDs

I was recently studying an online course. it was recommended that to reduce coupling we could simply pass the ID from the customer object to the Order object. that way the Order did not have to have a full reference to the Customer class.
The idea certainly seems simple and why pass a whole object if you don't need all its attributes?
1) What do you think of this idea?
2) How would I express the relationship between the Customer class and the Order class in UML if only an ID is passed. This isn't just an example of aggregation is it? Doesn't composition and aggregation require more than just passing a value?
Thanks!
First of all you need to be clear about what UML actually is. On the one hand you have an idea and on the other side there is some code running on hardware. Ideally the latter supports the first in a way that brings added value to a user of the idea. Now, there are many possibilities to describe the way from idea to code. And UML is one of them. It is possible to describe each step on this way but for pragmatic reasons UML stops at the border of code, namely programming languages.
Now for you concrete question: Any object can be seen as an instance. That is some concrete memory partition with a fixed address. Programming languages realize instances by allocating memory and using the start address as reference. And since this reference does not change the object can be identified by its address. Clearly then, an association will just be the a pointer. And an association class will hold two (or more) such pointers.
Honestly, the very first time I started with OO I was also confused and thought that it's a waste of resource to pass those large objects. But since it's just a pointer it's really easy going.
Again, things can get more difficult if you need to persist objects. In that case you need an artificial key you can save along with the object and you will likely need tables to map artificial key to the concrete instance address.
The answer to this question depends on a number of factors, which I started listing in a comment attached to your question. I will assume that you are either using UML to create a Domain Model, or you are describing an implementation done using a statically typed language.
If you are using UML to create a Domain Model, you are obfuscating the semantics when you use an ID to "link" classes. Just draw and annotate the association and you're done.
If you are describing an implementation done using a statically typed language - types exist for a reason. Using generic IDs to link things means that the information that the system needs most become more indirect, and therefore more opaque (which is bad). In your case, the Order object still must acquire a typed reference to a Customer object to do anything with it.
For example, the Order may acquire a reference to the Customer by invoking a lookup by the ID, but it must cast the reference to an appropriate type to invoke anything on the Customer object. So you haven't reduced the coupling from the Order to the Customer. You just buried it somwhere else.

How to go about creating an Aggregate

Well this time the question I have in mind is what should be the necessary level of abstraction required to construct an Aggregate.
e.g.
Order is composed on OrderWorkflowHistory, Comments
Do I go with
Order <>- OrderWorkflowHistory <>- WorkflowActivity
Order <>- CommentHistory <>- Comment
OR
Order <>- WorkflowActivity
Order <>- Comment
Where OrderWorkflowHistory is just an object which will encapsulate all the workflow activities that took place. It maintains a list. Order simply delegates the job of maintaining th list of activities to this object.
CommentHistory is similarly a wrapper around (list) comments appended by users.
When it comes to database, ultimately the Order gets written to ORDER table and the list of workflow activities gets written to WORKFLOW_ACTIVITY table. The OrderWorkflowHistory has no importance when it comes to persistence.
From DDD perspective which would be most optimal. Please share your experiences !!
As you describe it, the containers (OrderWorkflowHistory, CommentHistory) don't seem to encapsulate much behaviour. On that basis I'd vote to omit them and manage the lists directly in Order.
One caveat. You may find increasing amounts of behaviour required of the list (e.g. sophisticated searches). If that occurs it may make sense to introduce one/both containers to encapulate that logic and stop Order becoming bloated.
I'd likely start with the simple solution (no containers) and only introduce them if justified as above. As long as external clients make all calls through Order's interface you can refactor Order internally without impacting the clients.
hth.
This is a good question, how to model and enrich your domain. But sooo hard to answer since it vary so much for different domain.
My experince has been that when I started with DDD I ended up with a lots of repositories and a few Value Objects. I reread some books and looked into several DDD code examples with an open mind (there are so many different ways you can implement DDD. Not all of them suits your current project scenario).
I started to try to have in mind that "more value objects, more value objects, more value objects". Why?
Well Value objects brings less tight dependencies, and more behaviour.
In your example above with one to many (1-n) relationship I have solved 1-n rel. in different ways depending on my use cases uses the domain.
(1)Sometimes I create a wrapper class (like your OrderWorkflowHistory) that is a value object. The whole list of child objects is set when object is created. This scenario is good when you have a set of child objects that must be set during one request. For example a Qeustion Weights on a Questionaire form. Then all questions should get their question weight through a method Questionaire.ApplyTuning(QuestionaireTuning) where QuestionaireTuning is like your OrderWorkflowHistory, a wrapper around a List. This add a lot to the domain:
a) The Questionaire will never get in a invalid state. Once we apply tuning we do it against all questions in questionaire.
b) The QuestionaireTuning can provide good access/search methods to retrieve a weight for a specific question or to calculate average weight score... etc.
(2)Another approach has been to have the 1-n wrapper class not being a Value object. This approach suits more if you want to add a child object now and then. The parent cannot be in a invalid state because of x numbers of child objects. This typical wrapper class has Add(Child...) method and several search/contains/exists/check methods.
(3)The third approach is just having the IList exposed as a readonly collection. You can add some search functionality with Extension methods (new in .Net 3.0) but I think it's a design smell. Better to incapsulate the provided list access methods through a list-wrapper class.
Look at http://dddsamplenet.codeplex.com/ for some example of approach one.
I believe the entire discussion with modeling Value objects, entities and who is responsible for what behaviour is the most centric in DDD. Please share your thoughts around this topic...

Value Objects in CQRS - where to use

Let's say we have CQRS-inspired architecture, with components such as Commands, Domain Model, Domain Events, Read Model DTOs.
Of course, we can use Value Objects in our Domain Model. My question is, should they also be used in:
Commands
Events
DTOs
I haven't seen any examples where Value Objects (VO) are used in the components mentioned above. Instead, primitive types are used. Maybe it's just the simplistic examples. After all, my understanding of VOs use in DDD is that they act as a glue for the whole application.
My motivation:
Commands.
Let's say user submits a form which contains address fields. We have Address Value Object to represent this concept. When constructing command in the client, we should validate user input anyway, and when it is well-formed, we can create Address object right there and initialize Command with it. I see no need to delegate creation of Address object to command handler.
Domain Events.
Domain Model already operates in terms of Value Objects, so by publishing events with VOs instead of converting them to primitive types, we can avoid some mapping code. I'm pretty sure it's alright to use VOs in this case.
DTOs.
If our query-side DTOs can contain Value Objects, this allows for some more flexibility. E.g., if we have Money object, we can choose whether to display it in EUR or USD, no need to change Read Model.
Ok I've changed my mind. I have been trying to deal with VOs a bunch lately and after watching this http://www.infoq.com/presentations/Value-Objects-Dan-Bergh-Johnsson it clarified a couple of things for me.
Commands and Event are messages (and not objects, objects are data + behavior), in some respects much like DTOs, they communicate data about an event and they themselves encapsulate no behavior.
Value Objects are not like DTOs at all. They are a domain representation and they are, generally speaking, rich on behavior like all other domain representations.
Commands and Events communicate information into and out of the domain respectively, but they themselves do not encapsulate any behavior. From that perspective it seems wrong and a possibly a violation context boundaries to pass VOs inside of them.
To paraphrase Oren (though he was referring to nHibernate and WCF) "Don't send your domain across the wire".
http://ayende.com/Blog/archive/2009/05/14/the-stripper-pattern.aspx
If you want to communicate a value object, then I suggest passing the necessary attributes needed to re-construct the VO within them instead.
Original Text (for posterity):
If you are asking if Value Objects can be passed by the domain model to events or passed in by commands, I don't really see a huge problem with the former, though the latter may violate some of the rules of the aggregate root being the "owner" of values.
That said a value object represents concepts like for example a color. You don't have green, you are green or not. There seems to be nothing intrinsically wrong with a command telling you that you are green by passing this.
Reading the chapter from DDD on the Aggregate Root pattern explains Entities and Value Objects quite well and is worth reading over a few times.
I say it's a bad idea.
There's a reason we don't do the same with entities - to avoid coupling other parts of the system to the domain (in the wrong places). The same is true for Value Objects, the only difference between value objects and entities is lifetime and ownership - these differences do not affect how we should and should not couple to them.
Imagine that you make an event contain a VO. A change in your domain requires you to change that VO. You've now boxed yourself into a corner where your event is also forced to change, ditto for any Commands or DTO's it's a part of.
This is to be avoided.
Use DTO's and/or primitives. Map them (AutoMapper makes it a 1-line deal).
Similar to other answers, in SOA this would break encapsulation of the service as the domain is now leaking out.
According to Clean Code your DTOs are data structures (just to add another term), while value objects are objects. The difference that objects can have behavior. Mixing data structures with objects is generally a very bad idea, because it will be hard to maintain the hybrid you get.
I don't feel right to put value objects to DTOs from an architecture perspective as well. The value objects are inside of the domain model while the DTOs you mentioned are defining the interface of the model. We usually build an interface to decouple the outside world from the inside of something. So in the current case we added DTOs to decouple the outside world from the value objects (and other model related stuff). After that adding value objects to the interface is crazy.
So you haven't met this solution yet because it is an anti-pattern.
Value Objects are or at least should be immutable. Once instantiated with a value that value will never change throughout the lifetime of the object. Thus, passing VOs as data to DTOs (such as Events) should not be an issue since all you can do with them is get their value. At the most their value in a differing representation such as toString() as opposed to an original getValue() which might return an integer or whatever the value is.

Am I allowed to have "incomplete" aggregates in DDD?

DDD states that you should only ever access entities through their aggregate root. So say for instance that you have an aggregate root X which potentially has a lot of child Y entities. Now, for some scenario, you only really care about a subset of these Y entities at a time (maybe you're displaying them in a paged list or whatever).
Is it OK to implement a repository then, so that in such scenarios it returns an incomplete aggregate? Ie. an X object who'se Ys collection only contains the Y instances we're interested in and not all of them? This could for instance cause methods on X which perform some calculation involving the Ys to not behave as expected.
Is this perhaps an indication that the Y entity in question should be considered promoted to an aggregate root?
My current idea (in C#) is to leverage the delayed execution of LINQ, so that my X object has an IQueryable to represent its relationship with Y. This way, I can have transparent lazy loading with filtering... But getting this to work with an ORM (Linq to Sql in my case) might be a bit tricky.
Any other clever ideas?
I consider an aggregate root with a lot of child entities to be a code smell, or a DDD smell if you will. :-) Generally I look at two options.
Split your aggregate into many smaller aggregates. This means that my original design was not optimal and I need to identify some new entities.
Split your domain into multiple bounded contexts. This means that there are specific sets of scenarios that use a common subset of the entities in the aggregate, while there are other sets of scenarios that use a different subset.
Jimmy Nilsson hints in his book that instead of reading a complete aggregate you can read a snapshot of parts of it. But you are not supposed to be able to save changes in the snapshot classes to the database.
Jimmy Nilsson's book Chapter 6: Preparing for infrastructure - Querying. Page 226.
Snapshot pattern
You're really asking two overlapping questions.
The title and first half of your question are philosophical/theoretical. I think the reason for accessing entities only through their "aggregate root" is to abstract away the kinds of implementation details you're describing. Access through the aggregate root is a way to reduce complexity by having a trusted point of access. You're eliminating friction/ambiguity/uncertainty by adhering to a convention. It doesn't matter how it's implemented within the root, you just know that when you ask for an entity it will be there. I don't think this perspective rules out a "filtered repository" as you describe. But to provide a pit of success for devs to fall into, it should be impossible instantiate the repository without being explicit about its "filteredness;" likewise, if shared access to a repository instance is possible, the "filteredness" should be explicit when coding in the caller.
The second half of your question is about implementation on a specific platform. Not sure why you mention delayed execution, I think that's really orthogonal to the filtering question. The filtering itself could be a bit tricky to implement with LINQ. Maybe rather than inlining the Where lambdas, you set up a collection of them and select one depending on the filter you need.
You are allowed since the code will compile anyway, but if you're going for a pure DDD design you should not have incomplete instances of objects.
You should look into LazyLoading if you're afraid to load a huge object of which you will only use a small portion of its child entities.
LazyLoading delays the loading of whatever you decide to lazy-load until the moment they are accessed. They make use of callbacks to call the loading method once the code calls for them.
Is it OK to implement a repository then, so that in such scenarios it
returns an incomplete aggregate?
Not at all. Aggregate is a transnational boundary to change the state of your system. Never use aggregates for querying data. Split the system into Write and Read sides. (read about CQR & CQRS). When we think "CRUD" based, we implement our system, based on some resource. Lets say you have "Appointment" aggregate. Thinking "Crudish" means we should implement usecases Create, Update, Delete, GetAll appointments. That means Appointment[] should be returned for GetAll. When you think usecase based, (HexagonalArchitecture) your usecases would be ScheduleAppointment, RescheduleAppointment, CancelAppointment. But for query side it can be: /myCalendar. We return back all appointments for a specific user in a ClientCalendar object. Create separate DTO's for Query sides. Never use aggregates for this purpose.

Resources