DDD: Help me further understand Value Objects and Entities - domain-driven-design

There are several questions on this, and reading them isn't helping me. In Eric Evans's DDD book, he uses the example of an address being a value type in certain situations. For a mail-order company, the address is a value type because it doesn't really matter whether the address is shared or who else lives there, only that the package arrives at the address.
This makes sense to me until I start thinking about how this would be designed. Given the diagram on page 99, he has it like this:
+------------+
|Customer |
+------------+
|customerId |
|name |
|street |
|city |
|state |
+------------+
This changes to:
+------------+
|Customer | (entity)
+------------+
|customerId |
|name |
|address |
+------------+
+------------+
|Address | (value object)
+------------+
|street |
|city |
|state |
+------------+
If these were tables, Address would have its own Id in order to have a relationship with the customer, turning it into an entity.
Is the idea that in a relational database these would stay in the same table, such as in the first example, and that you'd use features of the ORM to abstract address as a value object (such as nHibernate's component features)?
I realize that a couple of pages later he talks about denormalization, I'm just trying to make sure I'm understanding the concept correctly.

When Eric Evans talks about "entities have identity, Value Objects do not", he's not talking about an ID column in the database - he's talking about identity as a concept.
VOs have no conceptual identity. That doesn't mean that they shouldn't have persistence identity. Don't let persistence implementation cloud your understanding of Entities vs VOs.
You can store the address in a separate table or in the same table as Customer.

"Is the idea that in a relational database these would stay in the same table, such as in the first example, and that you'd use features of the ORM to abstract address as a value object (such as nHibernate's component features)?"
Yes, generally, that is the idea.
Alternatively (if your ORM doesn't support Value Objects directly), you can let the VO tables have an ID, but hide that within your domain model.

I personally don't give a damn about having an ID on value objects, as long as they override equality comparison properly (because value objects differ by their value, not their identity).
Mapping value objects to the database is a technical concern. Sometimes you just need to sacrifice a bit of domain-model purity (e.g. marking properties virtual so the ORM can work underneath). Or make your infrastructure smarter, for example by using NHibernate components.
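To make the equality point concrete, here is a minimal Python sketch. The class and field names follow the diagrams in the question; the sample data and the choice of a frozen dataclass are assumptions for illustration, not something taken from Evans's book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Value object: immutable and compared by its attribute values."""
    street: str
    city: str
    state: str


@dataclass(eq=False)
class Customer:
    """Entity: compared by its identity (customer_id), not by its attributes."""
    customer_id: int
    name: str
    address: Address

    def __eq__(self, other):
        if not isinstance(other, Customer):
            return NotImplemented
        return self.customer_id == other.customer_id

    def __hash__(self):
        return hash(self.customer_id)


# Two separately created addresses with the same values are equal.
a1 = Address("1 Main St", "Springfield", "IL")
a2 = Address("1 Main St", "Springfield", "IL")
same_value = a1 == a2

# Two customer objects with the same id are the same customer,
# even if other attributes differ.
c1 = Customer(1, "Ann", a1)
c2 = Customer(1, "Ann Smith", a2)
same_customer = c1 == c2

# Same name and address but a different id: a different customer.
c3 = Customer(2, "Ann", a1)
different_customer = c1 != c3
```

Whether a persistence layer gives Address a surrogate key is invisible here; the domain code only ever compares addresses by value.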

Yes, generally Address would stay in the same table. Address would be mapped something like this:
+-----------------+
|Customer |
+-----------------+
|customerId |
|name |
|address_street |
|address_city |
|address_state |
+-----------------+
If Address were an entity, it would be in a separate table, as you said. If two Customers were linked to the same Address entity, then changing an attribute of that Address would affect both Customers. With a value object, each Customer holds its own copy of the values, so a change affects only that one Customer.
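As a rough sketch of that mapping, the following Python flattens an Address value object into the address_* columns of a Customer row and rebuilds it on load. The helper names to_row and from_row are hypothetical; an ORM's component mapping would do the equivalent work for you.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str


@dataclass
class Customer:
    customer_id: int
    name: str
    address: Address


def to_row(customer):
    """Flatten the customer and its embedded address into one table row."""
    return {
        "customerId": customer.customer_id,
        "name": customer.name,
        "address_street": customer.address.street,
        "address_city": customer.address.city,
        "address_state": customer.address.state,
    }


def from_row(row):
    """Rebuild the customer and its address value object from one row."""
    address = Address(row["address_street"], row["address_city"], row["address_state"])
    return Customer(row["customerId"], row["name"], address)


shared = Address("1 Main St", "Springfield", "IL")
ann = Customer(1, "Ann", shared)
bob = Customer(2, "Bob", shared)

# "Changing" a value object means replacing it, so only Ann is affected.
ann.address = replace(ann.address, street="9 Oak Ave")
```

Because Address is immutable, Ann moving house cannot silently change Bob's row, which is the behaviour described above for the value-object case.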


Do you store data in the Delta Lake Silver layer in a normalized format or do you derive it?

I am currently setting up a data lake trying to follow the principles of Delta Lake (landing in bronze, cleaning and merging into silver, and then, if needed, presenting the final view in gold) and have a question about what should be stored in Silver.
For example, if the data in bronze comes in from a REST API and is stored in the JSON form it comes in in this format:
id (Int)
name (String)
fields (Array of Strings)
An example looks like:
{
  "id": 12345,
  "name": "Test",
  "fields": ["Hello", "this", "is", "a", "test"]
}
In the end I want to present this as two tables. One would be the base table and look like:
TABLE 1
| id | name |
| -------- | -------------- |
| 12345 | Test |
And another would look like:
TABLE 2
| id | field_value |
| -------- | -------------- |
| 12345 | Hello |
| 12345 | this |
| 12345 | is |
| 12345 | a |
| 12345 | test |
My question is, should I pre-process the data in Spark and store the data in silver in separate folders like this:
-- root
---table 1
----file1.parquet
----etc.parquet
---table 2
----file1.parquet
----etc.parquet
Or store it all in silver under one folder and then derive those two tables using TSQL and functions like OPENJSON later?
Thank you for your help or insight!
I do not think there is a real answer to your questions, but here is a stab - based on your explicit example and this reference https://k21academy.com/microsoft-azure/data-engineer/delta-lake/
My question is, should I pre-process the data in Spark and store the
data in silver in separate folders like this: ...
Yes, I would, as JSON takes more time to process. On a current project I use JSON for the RAW area if the data arrives in that format, and in the REFined area we store arrays if needed, as opposed to JSON structs. But this is because we use a data hub approach based on the Distributed Data Mesh described by Zhamak Dehghani on Martin Fowler's site. We have a BUSiness area where we model the data according to a semantic model.
But for every expert there is an equal and opposite expert. Some would say do it on the fly, like SAP Hana ETL on-the-fly.
For datasets handed to data scientists, or for ad hoc analysis, the second approach is fine. The data would stay in the Bronze zone. That said, GDPR requirements could mean refining the data into the Silver zone with the GDPR-sensitive parts removed.
In short, it depends on your use case.
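For the pre-processing route, the split itself is small. The sketch below uses plain Python so that it stays self-contained; in Spark the same step is usually written with the `explode` function from `pyspark.sql.functions`, followed by writing each result to its own folder. The function name split_record is made up for this example.

```python
import json


def split_record(record):
    """Split one bronze record into a base row and one row per array element."""
    base_row = {"id": record["id"], "name": record["name"]}
    field_rows = [
        {"id": record["id"], "field_value": value}
        for value in record.get("fields", [])
    ]
    return base_row, field_rows


raw = '{"id": 12345, "name": "Test", "fields": ["Hello", "this", "is", "a", "test"]}'
base_row, field_rows = split_record(json.loads(raw))

# base_row corresponds to TABLE 1, field_rows to TABLE 2 in the question.
```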

Mapping UML Class Diagram to Python Code

I've been asked to document a piece of code using UML diagrams. The code models a situation like the following: a driver can be assigned to one or more routes. Each route has an upstream and a downstream direction. For each route the driver can drive in the upstream direction and/or in the downstream direction.
A simplified pseudo-code for the Driver class is the following:
class Driver:
HashMap<Route, Direction> upstream;
HashMap<Route, Direction> downstream;
HashMap<Route, Direction> assignedTo;
where the assignedTo map is actually a property returning a hashmap composed of the routes where the driver is assigned to both the upstream and downstream directions (think of it as a view on the other two hashmaps)
So far I've come up with the following UML representation.
----------- ---------
| CLASS | (assignedTo) | CLASS |
| DRIVER |----------------------------| ROUTE |
----------- * | * ---------
|
-------------
| CLASS |
| DIRECTION |
-------------
^ ^
| |
------------ --------------
| CLASS | | CLASS |
| UPSTREAM | | DOWNSTREAM |
------------ --------------
However, I'm a little puzzled by the fact that in the UML I'm using inheritance while the code uses no inheritance. What do you think?
I've changed it a little, but this is another sample of mine. I am not sure I understand the pseudo-code correctly, but the case where a driver is assigned to both directions could be a problem. In my opinion, my sample diagram would also be easier to implement.
Regarding the inheritance, the answer depends on what this UML is for: to show how to implement the code, or to explain the concepts. If the latter, there is no problem with using inheritance.
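For comparison, here is one way the Driver pseudo-code from the question might look in Python without any inheritance. Direction is modelled as an enum rather than as subclasses, and the two hash maps are simplified to one set of routes per direction; both choices are assumptions of this sketch, not the original code.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    UPSTREAM = "upstream"
    DOWNSTREAM = "downstream"


@dataclass(frozen=True)
class Route:
    name: str


class Driver:
    def __init__(self):
        # One set of routes per direction (a simplification of the two hash maps).
        self._routes = {Direction.UPSTREAM: set(), Direction.DOWNSTREAM: set()}

    def assign(self, route, direction):
        self._routes[direction].add(route)

    @property
    def upstream(self):
        return frozenset(self._routes[Direction.UPSTREAM])

    @property
    def downstream(self):
        return frozenset(self._routes[Direction.DOWNSTREAM])

    @property
    def assigned_to(self):
        """View over the other two collections: routes driven in both directions."""
        return self.upstream & self.downstream


driver = Driver()
r1, r2 = Route("R1"), Route("R2")
driver.assign(r1, Direction.UPSTREAM)
driver.assign(r1, Direction.DOWNSTREAM)
driver.assign(r2, Direction.UPSTREAM)
both_directions = driver.assigned_to
```

A diagram that documents this code would show Direction as an enumeration; a diagram that explains the concept can still use Upstream and Downstream subclasses.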

UML association class - clarifying

I am reading "UML distilled" by Martin Fowler, and during reading about association classes I got this quote:
What benefit do you gain with the association class to offset the
extra notation you have to
remember? The association class adds an extra constraint, in that
there can be only one instance of
the association class between any two participating objects.
Then there was an example, but I want to make sure I got this right, if for example I got:
--------- ---------
| |* *| |
| CLASS A |----------| CLASS B |
| | | | |
--------- | ---------
|
______|______
| |
| |
| CLASS C |
| |
|_____________|
then, for every distinct pair (instance of A,instance of B) there exists only one instance of class C.
So if I would take A1,A2,B1,B2-instances then for (A1,B1) (A1,B2) (A2,B1) (A2,B2) I would get 4 instances of C, nothing less, nothing more?
From the UML 2.5 specification:
Note that when one or more ends of the AssociationClass have
isUnique=false, it is possible to have several instances associating
the same set of instances of the end Classes.
Mr. Fowler may have gotten the facts wrong. There is no extra constraint, just the ability to store additional property values.
When isUnique=false, extra properties allow one to model multiple visits to the same doctor on different dates, or multiple purchases of the same products on different dates, for example.
That's correct. Without intending to mix concepts, it's similar to tables in a database where:
A 1-* C
B 1-* C
Here C can be seen as the result of breaking up a many-to-many relationship between A and B.
Each row of C is related to exactly one row of A and exactly one row of B.
Hence, for each pair of unique rows of A and B there can exist one row of C or none, because the * indicates 0 or more.
Your reasoning is correct: if an association class does not have one or both association ends annotated with {nonunique}, then it implies the constraint that there can be only one link between the same objects (as explained by Martin Fowler).
Notice, however, that the option of non-unique association ends has only been added in UML 2 (in 2005), and Martin Fowler's book (from 2003) refers to UML 1.x.
Some examples may help. For instance, the association LandPurchase between Person and PieceOfLand could be modeled as a UML association class (with default unique association ends), since there can be only one purchase link between a person and a piece of land. The association ProductPurchase between Person and Product can only be modeled as an association class if the association end at the Product side is annotated as {nonunique}, since there can be more than one purchase link between the same person and the same product (as a type). For instance, I could buy more than one Tesla Model S car (if I had the money).
Similarly, in the case of Appointment between Person and Doctor, since the same person can have more than one appointment with the same doctor, the association end at the Doctor side has to be annotated as {nonunique}.
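The difference between unique and {nonunique} ends can be sketched in Python as two small link stores. The class names UniqueLinks and NonUniqueLinks and the sample data are hypothetical; the point is only that the first allows at most one link per pair of objects, while the second allows several.

```python
class UniqueLinks:
    """Association class with unique ends: at most one link per (a, b) pair."""

    def __init__(self):
        self._links = {}

    def link(self, a, b, data):
        key = (a, b)
        if key in self._links:
            raise ValueError(f"a link between {a!r} and {b!r} already exists")
        self._links[key] = data

    def between(self, a, b):
        return self._links.get((a, b))


class NonUniqueLinks:
    """Association class with a {nonunique} end: several links per pair allowed."""

    def __init__(self):
        self._links = []

    def link(self, a, b, data):
        self._links.append((a, b, data))

    def between(self, a, b):
        return [d for (x, y, d) in self._links if (x, y) == (a, b)]


# LandPurchase: one purchase link per person and piece of land.
land_purchases = UniqueLinks()
land_purchases.link("alice", "plot-7", {"date": "2020-01-15"})

# Appointment: the same person can see the same doctor more than once.
appointments = NonUniqueLinks()
appointments.link("alice", "dr-jones", {"date": "2021-03-01"})
appointments.link("alice", "dr-jones", {"date": "2021-04-12"})
```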
An association in UML has a logical meaning (UML is not a tool for database modeling!). An association describes a possible logical fact. For example, two persons A and B could be married; we can draw this as an association, meaning "we know that a logical connection exists between person A and person B". If we know what that connection is, we draw an association class [marriage certificate] as the materialised fact.

UML Do you still show composition/aggregate objects as member variables?

If I had a class Airplane and a class Wing, if there was a composition relationship between the two, does Airplane have a member variable of type Wing in the class diagram, shown in the Airplane box?
ASCII art!
+-------------+ 1 1..* +----------+
| Airplane |<*>------------| Wing |
+-------------+ +----------+
where <*> represents a filled diamond, indicating composition. I used multiplicity 1..*, since it's possible to have aircraft that are essentially a single wing (such as the B-2), and although nobody builds them anymore AFAIK, you have biplanes (2 or 3 wings, depending on how you're counting), triplanes, etc.
No. Compositions and aggregations are kinds of associations and are shown like associations, i.e., with lines between classes (with solid and hollow diamonds, respectively, on the containing side). As a general rule, if you have an attribute whose type is a class, your model is wrong.
Implementation is a completely separate matter from analysis/design. You may implement associations in a variety of ways, including using member variables e.g. in C++.
I'm not 100% sure, but as far as I remember, no. It is just implicit that you will have a variable of type Wing.
No, it doesn't. But that doesn't mean that you can't have an attribute that is of class type. You just can't have both. It's a choice about what you want to emphasise.
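On the implementation side, a composition drawn as a line with a filled diamond often ends up as a member variable anyway. A minimal Python sketch, assuming the 1..* multiplicity from the ASCII diagram above and a made-up span_m attribute:

```python
class Wing:
    def __init__(self, span_m):
        self.span_m = span_m


class Airplane:
    """Owns its wings: they are created with the airplane and not shared."""

    def __init__(self, wing_spans_m):
        if not wing_spans_m:
            # Multiplicity 1..*: an airplane needs at least one wing.
            raise ValueError("an airplane needs at least one wing")
        self._wings = [Wing(span) for span in wing_spans_m]

    @property
    def wings(self):
        # Expose a read-only view so callers cannot detach or swap wings.
        return tuple(self._wings)


biplane = Airplane([8.5, 8.5])
flying_wing = Airplane([52.4])
```

The diagram shows the relationship only once, as the association line; the `_wings` attribute is how this particular implementation realizes it.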

SpecFlow/Cucumber/Gherkin - Using tables in a scenario outline

Hopefully I can explain my issue clearly enough for others to understand, here we go, imagine I have the two following hypothetical scenarios:
Scenario: Filter sweets by king size and nut content
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.KingSize | True |
| Filter.ContainsNuts | False |
Then I should see :
| Value |
| Yorkie King Size |
| Mars King Size |
Scenario: Filter sweets by make
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.Make | Haribo |
Then I should see :
| Value |
| Starmix |
These scenarios are useful because I can add as many When rows of Field/Value and Then Value entries as I like without changing the associated compiled test steps. However, copy/pasting scenarios for different filter tests will become repetitive and take up a lot of code - something I would like to avoid. Ideally I would like to create a scenario outline and keep the dynamic nature I have with the tests above. However, when I try to do that I run into a problem defining the Examples table: I can't add new rows as I see fit, because each row would be a new test instance. Currently I have this:
Scenario Outline: Filter Sweets
Given I am on the <page> Page
When I filter sweets by
| Field | Value |
| <filter> | <value> |
Then I should see :
| Output |
| <output> |
Examples:
| page | filter | value | output |
| Sweet/List | Filter.Make | Haribo | Starmix |
So my problem is being able to dynamically add rows to my filter and expected data when using a scenario outline. Is anyone aware of a way around this? Should I be approaching this from a different angle?
A workaround could be something like :
Then I should see :
| Output |
| <x> |
| <y> |
| <z> |
Examples:
| x | y | z |
But that's not very dynamic... hoping for a better solution? :)
I don't think what you're asking for is possible with SpecFlow, Gherkin, and out-of-the-box Cucumber. I can't speak for the authors, but I bet it purposely is not meant to be used this way because it goes against the overall "flow" of writing and implementing these specs. Among many things, the specs are meant to be readable to non-programmers, to give the programmer a guide to implement code that matches the specs, for integration testing, and to give a certain amount of flexibility when refactoring.
I think this is one of the situations where the pain you're feeling is a sign that there's a problem, but it may not be the one you think. You said:
"However, copy/pasting scenarios for different filter tests will become repetitive and take up a lot of code - something I would like to avoid."
First, I'd disagree that explaining yourself in writing is "repetitive," at least any more than it's repetitive to use specific words like "the, apple, car, etc." over and over again. The issue is: Are these words properly explaining what you're doing? If they are, and explaining your situation requires you to write out multiple scenarios, then that's just what it requires. Communication requires words, and sometimes the same ones.
In fact, what you call "repetitive" is one of the benefits of using Gherkin and a tool like Cucumber or SpecFlow. If you're able to use that sentence over and over and over and over, it means you're not having to write the test code over and over and over and over.
Second, are you sure you're writing a spec for the right thing? I ask only because if the number of scenarios gets out-of-hand, to the point where you have so many that a human can't follow what you write, it's possible that your spec isn't targeted at the right thing.
A possible example of this could be how you're testing the filtering and the pagination in this scenario. Yes, you want your specs to cover full features and your site will have pagination on the same page as your filtering, but at what cost? It takes experience and practice to know when giving up on the supposed "ideal" of no-mocking, full-integration tests will yield better results.
Third, don't think that specs are meant to be perfect coverage for every possible scenario. The scenarios are basically snapshots of state, which means that there are some features that could cover an infinitely-large set of scenarios, which is impossible. So what do you do? Write features that tell the story as best you can. Even let the story drive the development. However, details that don't translate to your specs or other cases are best left to straight-up TDD, done in addition to the specs.
In your example, it seems that you basically are telling a story about a site that lets a user create a dynamic search against sweets and candy. They enter one of a large set of possible search criteria, click a button, and get results. Just stick to that, writing only enough specs to fulfill the story. If you're not satisfied with your coverage, clean it up with more specs or unit tests.
Anyway, those are just my thoughts; I hope it helps.
Technically, I think you could try calling steps from within a step definition:
Calling Steps from Step Definitions
For example I think you could rewrite the
Then I should see :
| Output |
| <output> |
To be a custom step like
I should have output that contains <output>
Where output is a comma separated list of expected values. In the custom step you could break the comma separated list into an array and iterate over it calling
Then "I should see #{iterated_value}"
You could use a similar technique to pass in lists of filters and filter values. Your example row for the king size test might look like
| page | filter | value | output |
| Sweet/List | Filter.KingSize, Filter.ContainsNuts | True, False | Yorkie King Size, Mars King Size |
Or maybe
| page | filter-value-pairs | output |
| Sweet/List | Filter.KingSize:True, Filter.ContainsNuts:False | Yorkie King Size, Mars King Size |
That being said, you should perhaps take Darren's words to heart. I'm not really sure that this method would help the ultimate goal of having scenarios that are readable by non-developers.
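If you do go down this route, the parsing inside the custom step is straightforward. Here is a sketch in Python of splitting the comma-separated cells from the example rows above; the function names parse_pairs and parse_list are invented for illustration, and the real step definition would live in whatever language your SpecFlow or Cucumber bindings use.

```python
def parse_list(cell):
    """Split a comma-separated Examples cell into trimmed values."""
    return [value.strip() for value in cell.split(",") if value.strip()]


def parse_pairs(cell):
    """Split 'Field:Value, Field:Value' into a list of (field, value) tuples."""
    pairs = []
    for item in parse_list(cell):
        field, separator, value = item.partition(":")
        if not separator:
            raise ValueError(f"expected 'Field:Value' but got {item!r}")
        pairs.append((field.strip(), value.strip()))
    return pairs


filters = parse_pairs("Filter.KingSize:True, Filter.ContainsNuts:False")
expected = parse_list("Yorkie King Size, Mars King Size")
```

Note that this breaks down if a value can itself contain a comma or colon, which is one more reason to weigh it against plain, repeated scenarios.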
