How do I find a UML object if I only know the value of one of its attributes in OCL? - uml

I have a UML class Student that has an attribute id. I want to write a query in OCL to find a particular Student by only knowing its id. Should I use allInstances()? Or derive? I'm not very familiar with OCL.

Usually OCL is used to express constraints on a UML model in relation to a given class instance (the context, or self), and you start navigating from that specific instance.
But OCL was developed as a formal specification language that can specify more than constraints, in particular queries, as explained in section 7.1 of the OCL specification.
If you want to express something regarding all the possible instances of a class MyClass, you would then start your clause with:
MyClass.allInstances()
This is a set containing all the instances of the given class. Typically, you would then operate on this set to further specify some model features.
For example, to express uniqueness of an id, you would write a Boolean clause on this set (based on the example in section 7.5.10 of the OCL specification):
MyClass.allInstances()->forAll(e1, e2 | e1 <> e2 implies e1.id <> e2.id)
One of the operations you can perform on such a set is to create a subset by selecting the elements that match a certain condition, and this should answer your question:
MyClass.allInstances()->select(c|c.id='...')
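To make the semantics of the two OCL expressions above concrete, here is a minimal Python sketch. It is only an analogy, not how any OCL tool evaluates them: the `Student` class, its class-level registry, and the helper names `all_instances`, `select_by_id` and `ids_are_unique` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import ClassVar, List


@dataclass
class Student:
    """Hypothetical stand-in for the UML class; `id` is its only attribute."""
    id: str
    # Class-level registry playing the role of Student.allInstances().
    _instances: ClassVar[List["Student"]] = []

    def __post_init__(self) -> None:
        Student._instances.append(self)

    @classmethod
    def all_instances(cls) -> List["Student"]:
        return list(cls._instances)


def select_by_id(student_id: str) -> List[Student]:
    # Mirrors Student.allInstances()->select(s | s.id = student_id):
    # the result is a collection, possibly empty, not a single object.
    return [s for s in Student.all_instances() if s.id == student_id]


def ids_are_unique() -> bool:
    # Mirrors forAll(e1, e2 | e1 <> e2 implies e1.id <> e2.id).
    ids = [s.id for s in Student.all_instances()]
    return len(ids) == len(set(ids))
```

Note that select yields a collection. If the uniqueness constraint holds, that collection contains at most one element.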
Additional thoughts:
OCL is an abstract language. Nothing is said on how this expression will be implemented. It could be a large inefficient iteration over an in-memory collection, or a very effective SQL query. OCL will help you to define what the result should be, but not how to really get it. So performance should not be your concern at this level of abstraction.
Now, you didn’t state the purpose of your OCL query. If it is to explain how database queries will be performed, you could benefit from separation of concerns: identify the relevant classes and reusable queries, and enrich your model using the repository pattern. A repository is a (singleton) class MyClassRepository that acts as a container of all objects of a given MyClass in the database. You would then define operations for manipulating and querying the database. Typically, you’ll have a couple of getXxxx() operations, like getById(), that return one instance or a set of instances. First, this makes explicit what is to be implemented as database functionality; second, the new operations can be used to simplify some OCL expressions.

Yes. allInstances() is the easy solution, but potentially the least efficient and not necessarily flexible.
If your application and constraints (for one School) are expanded to multiple schools you may realize that actually you wanted the Student at a particular School, so it is often better to go to the logical context and invoke a method on the School to return a given Student. Perhaps this is what you meant by 'derive'. The implementation of School.students->at(id) might well use a Map giving multiple gains, through not searching the whole model for every access and through having a fast access to what you have already got.
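The idea of asking the School for one of its students, backed by a map, could look roughly like the following Python sketch. The class layout and the names `enroll` and `student_by_id` are assumptions for illustration; the answer above only mentions School.students and a Map.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Student:
    id: str
    name: str


@dataclass
class School:
    name: str
    # Map from student id to Student; a lookup does not scan the whole model.
    _students: Dict[str, Student] = field(default_factory=dict)

    def enroll(self, student: Student) -> None:
        self._students[student.id] = student

    def student_by_id(self, student_id: str) -> Optional[Student]:
        # Returns None when this school has no student with that id.
        return self._students.get(student_id)
```

Because each School owns its own map, the same id can refer to different students in different schools, which a global allInstances() query would not distinguish.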

Related

Normalization versus multiple inheritance

I have to model a situation where I would like to use specializations to ensure classes are somewhat normalized, but:
Risk multiple inheritance problems, especially in the long run
Will need to derive an XML-compliant UML model from it (a.o., only one superclass allowed)
The simplified situation is as follows (see also diagram below): we have Parts, like doors, bolts, wheels, etc., and Tools, like drills, ladders, and bigger machinery. All of these may be used in generic processes, like Orders, Shipments, etc. As such, I would like to have one superclass (Powertype, maybe?) that represents them, say Item.
Both Tools and Parts have specialized classes that carry a serial number. As such, I figured that a SerializedItem class with a SerialNumber, which both SerializedPart and SerializedTool inherit, would ensure that all serialized 'things' we have carry at least the same information. However, I also need these Serialized items to carry at least the same information as their more generic parts, and hence I would introduce multiple inheritance.
I have considered making the Item classes interfaces. This would at least mitigate some (many, all?) multiple inheritance problems. This is where another however comes in: aside from an attribute SerialNumber, I would also like to enforce that all Serialized specializations have an aggregation relation with a Manufacturer. Aggregation to an interface is not allowed, so I feel like I cannot with one relation to the superclass enforce this relation.
As such, I have the following considerations/problems:
Have two disjoint 'branches' of Item, with little to no technical governance on content of Serialized specializations
Item classes as Interfaces, but then little governance w.r.t. use of Manufacturer by Serialized specializations
All concrete classes, but then there exist multiple inheritance issues which must be solved when trying to derive XML classes from the model
Which option would you prefer, and why? Did I miss any considerations?
If you want to have a (platform-independent) information design model (similar in spirit to a conceptual model), then you should use multiple-inheritance if this reflects the concepts of your problem domain.
According to such a model-based engineering approach, your model is a pretty good design model that can be used as a basis for making (platform-specific) implementation models such as, e.g., a Java class model or an XML Schema model.
For making an XML Schema model, you would have to choose a certain mapping. In particular, you need to choose a mapping for resolving the multiple inheritance pattern, see also https://stackoverflow.com/a/27102169/2795909.
I just would not make SerializedItem a superclass. Generalization would mean that something is a serialized thing, and nothing is. Things can conform to a serialization protocol, which is the same as implementing an interface (maybe called Serializable). If you happen to deal with serializable things without bothering about their content, you would just deal with Serializable and only know the number.
Basically you should make your SerializedItem an interface (possibly renaming it to Serializable), remove the generalization upwards, and turn the two horizontal generalizations into realizations.
This is probably not an ultima ratio. But to me this approach sounds more reasonable.

UML and Implementation: Associating Classes through IDs

I was recently studying an online course. It was recommended that, to reduce coupling, we could simply pass the ID from the Customer object to the Order object. That way, the Order would not need a full reference to the Customer class.
The idea certainly seems simple and why pass a whole object if you don't need all its attributes?
1) What do you think of this idea?
2) How would I express the relationship between the Customer class and the Order class in UML if only an ID is passed? This isn't just an example of aggregation, is it? Don't composition and aggregation require more than just passing a value?
Thanks!
First of all you need to be clear about what UML actually is. On the one hand you have an idea and on the other side there is some code running on hardware. Ideally the latter supports the first in a way that brings added value to a user of the idea. Now, there are many possibilities to describe the way from idea to code. And UML is one of them. It is possible to describe each step on this way but for pragmatic reasons UML stops at the border of code, namely programming languages.
Now for your concrete question: Any object can be seen as an instance, that is, some concrete memory partition with a fixed address. Programming languages realize instances by allocating memory and using the start address as the reference. Since this reference does not change, the object can be identified by its address. Clearly then, an association will just be a pointer, and an association class will hold two (or more) such pointers.
Honestly, the very first time I started with OO I was also confused and thought that it's a waste of resource to pass those large objects. But since it's just a pointer it's really easy going.
Again, things can get more difficult if you need to persist objects. In that case you need an artificial key you can save along with the object and you will likely need tables to map artificial key to the concrete instance address.
The answer to this question depends on a number of factors, which I started listing in a comment attached to your question. I will assume that you are either using UML to create a Domain Model, or you are describing an implementation done using a statically typed language.
If you are using UML to create a Domain Model, you are obfuscating the semantics when you use an ID to "link" classes. Just draw and annotate the association and you're done.
If you are describing an implementation done using a statically typed language: types exist for a reason. Using generic IDs to link things means that the information the system needs most becomes more indirect, and therefore more opaque (which is bad). In your case, the Order object still must acquire a typed reference to a Customer object to do anything with it.
For example, the Order may acquire a reference to the Customer by invoking a lookup by the ID, but it must cast the reference to an appropriate type to invoke anything on the Customer object. So you haven't reduced the coupling from the Order to the Customer. You just buried it somewhere else.
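The point about buried coupling can be sketched as follows. This uses Python with type hints as a stand-in for a statically typed language; the `CUSTOMERS` lookup table, the `email` attribute and the `receipt_address` method are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Customer:
    id: int
    email: str


# Hypothetical lookup table standing in for whatever resolves ids to objects.
CUSTOMERS: Dict[int, Customer] = {}


@dataclass
class OrderWithReference:
    customer: Customer  # explicit, typed association

    def receipt_address(self) -> str:
        return self.customer.email


@dataclass
class OrderWithId:
    customer_id: int  # the association is hidden behind a bare integer

    def receipt_address(self) -> str:
        # The order still needs a Customer, and knowledge of its attributes,
        # to do anything useful; the dependency has only moved into the lookup.
        customer = CUSTOMERS[self.customer_id]
        return customer.email
```

Both variants depend on Customer; the second one additionally depends on the lookup mechanism.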

Should DDD entities compare by reference or by ID?

When I started using DDD, I created Equals() methods in my entities that compared the ID of the entity. So two entity objects with the same ID would be considered equal.
At some point I thought about that and found that two entities in different states should not be considered equal, even when they describe the same thing (i.e. have the same ID). So now I use reference equality for my entities.
I then stumbled over this answer by Mark Seemann, where he writes
Entities are equal if their IDs equal each other.
Now, of course, I'd like to know which approach is better.
Edit: Note that the question is not whether having two instances of the same entity at the same time is a good idea. I'm aware that in most situations it is probably not.
The question is twofold. First, what you really want to know is
How do I handle terms that the X language (or Y framework) imposes when I code a domain model with it?
C#, for example, requires that any new concept you define inherit a certain set of public methods. Java includes even more methods.
I've never heard a domain expert talking about hash codes or instance equality, but this is one of those situations when the (often misunderstood) quote "don't fight the framework" from Evans applies: just teach developers not to use them when they do not belong to the domain's interfaces.
Then, what you want to know is
What is an entity? How does it relate to its own identity?
Start with why! You know that entities are terms of the ubiquitous language that are identifiable.
But WHY?
Plain simple: entities describe concepts whose evolution in time is relevant in the context of the problem we are solving!
It is the relevance of the evolution that defines the entity, not the other way around! The identity is just a communication tool to keep track of the evolution, to talk about it.
As an example think about you: you are a person with a name; we use your name to communicate about your interactions with the rest of the world during your life; still, you are not that name.
Ask yourself: why do I need to compare domain entities? Is the domain expert talking this way? Or am I just using DDD parlance to describe a CRUD application that interacts with a relational database?
To me, the need to actually implement Equals(object) or GetHashCode() into an entity looks like a smell of an inadequate infrastructure.
Entities shouldn't be compared like that, in the first place. There is no valid use case (outside testing, but then again the assertion library should handle this for you) to see if 2 entities are equal using the object's Equals method.
What makes an Entity unique is its Id. The purpose of the id is to say 'this very entity is different from other entities despite having identical properties/values'.
That being said, in a Domain you might need to compare a concept instance with another instance. The comparison is done according to Bounded Context (or even Aggregate) specific Domain rules. It doesn't matter that an entity is involved; it could have been a value object as well.
Basically the 'comparison' should be a Domain use case which will be probably implemented as a service. This has no relation to an object's Equals method, which is a technical aspect.
When doing DDD, don't think like a programmer (i.e., about technical aspects); think like an architect (high level). Code, programming language, etc. are just implementation details.
I think it's a bad idea to have two separate instances of the same entity in different states. I can't think of a scenario where that would be desirable. Maybe there is one? I believe there should only ever be one instance of a given entity with a particular ID.
In general I'd compare their equality using their IDs.
But if you wanted to check if they are the same object reference then you could just use:
if (Object.ReferenceEquals(entityA, entityB))
{
    DoSomething();
}
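The two notions discussed in this thread, equality by ID and equality by reference, can coexist. The following Python sketch shows one possible arrangement, assuming a hypothetical `Entity` base class; it is an illustration of the mechanics, not a recommendation for either side of the debate.

```python
class Entity:
    """Hypothetical base class: equality by identifier, not by state."""

    def __init__(self, entity_id: int, name: str) -> None:
        self.id = entity_id
        self.name = name

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Entity):
            return NotImplemented
        # Same concrete type and same id means "the same entity".
        return type(self) is type(other) and self.id == other.id

    def __hash__(self) -> int:
        # Must agree with __eq__ so entities behave in sets and dicts.
        return hash((type(self), self.id))


old = Entity(1, "old name")
new = Entity(1, "new name")

same_entity = old == new   # compares ids, ignores state
same_object = old is new   # reference comparison, like Object.ReferenceEquals
```

Here `==` answers "do these describe the same entity?" while `is` answers "is this the very same object in memory?", so neither question has to be sacrificed for the other.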

UML - association or aggregation (simple code snippets)

It drives me crazy how many books contradict themselves.
class A {} class B { void UseA(A a) {} } // some say this is an association: no reference is held, but communication is possible
class A {} class B { A a; } // some say this is aggregation: a reference is held
But many say that holding a reference is still just an association, and for aggregation they use a list. IMHO this is the same; it is still a reference.
I am very confused, I would like to understand the problem.
E.g. here: http://aviadezra.blogspot.cz/2009/05/uml-association-aggregation-composition.html - what is the difference between Strong Association and Aggregation, in both cases the author uses a field to store the reference..
Another example:
This is said to be Association:
And this is said to be Aggregation:
public class Professor {
    // ...
}
public class Department {
    private List<Professor> professorList;
    // ...
}
Again, what is the difference? It is a reference in both cases.
This question has been, and will be, asked many times in many different variants, because many people, including many high-profile developers, are confused about the meaning of these terms, which have been defined in the UML. Since the question has been asked many times, it has also been answered many times. See, e.g. this answer. I'll try to summarize the UML definitions.
An association between two classes is not established via a method parameter, but rather via reference properties (class attributes), the range/type of which are the associated classes. If the type of a method parameter is a class, this does not establish an association, but a dependency relationship.
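The distinction drawn in the previous paragraph can be sketched in Python as follows. The class names `UsesA` and `HoldsA` and the `ping` method are made up for this illustration.

```python
class A:
    def ping(self) -> str:
        return "pong"


class UsesA:
    # Dependency: A appears only as a parameter type; nothing is stored.
    def use_a(self, a: A) -> str:
        return a.ping()


class HoldsA:
    # Association: a reference property whose type is the associated class.
    def __init__(self, a: A) -> None:
        self.a = a
```

After `use_a` returns, a `UsesA` object has no link to the `A` it was given, whereas a `HoldsA` object keeps its link for as long as the attribute is set.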
It's essential to understand the logical concept of associations first, before looking at how they are coded. An association between object types classifies relationships between objects of those types. For instance, the association Committee-has-ClubMember-as-chair, which is visualized as a connection line in the class diagram shown below, may classify the relationships FinanceCommittee-has-PeterMiller-as-chair, RecruitmentCommittee-has-SusanSmith-as-chair and AdvisoryCommittee-has-SarahAnderson-as-chair, where the objects PeterMiller, SusanSmith and SarahAnderson are of type ClubMember, and the objects FinanceCommittee, RecruitmentCommittee and AdvisoryCommittee are of type Committee.
An association is always encoded by means of reference properties, the range/type of which is the associated class. For instance:
class Committee { ClubMember chair; String name;}
In the UML, aggregation and composition are defined as special forms of associations with the intended meaning of classifying part-whole-relationships. In the case of aggregation, as opposed to composition, the parts of a whole can be shared with other wholes. This is illustrated in the following example of an aggregation, where a course can belong to many degree programs.
The defining characteristic of a composition is to have exclusive (or non-shareable) parts. A composition may come with a life-cycle dependency between the whole and its parts implying that when a whole is destroyed, all of its parts are destroyed with it. However, this only applies to some cases of composition, and not to others, and it is therefore not a defining characteristic. An example of a composition where the parts (components) can be detached from the whole (composite) and therefore survive its destruction, is the following:
See Superstructures 2.1.1:
An association may represent a composite aggregation (i.e., a whole/part relationship). Only binary associations can be aggregations. Composite aggregation is a strong form of aggregation that requires a part instance be included in at most one composite at a time. If a composite is deleted, all of its parts are normally deleted with it. Note that a part can (where allowed) be removed from a composite before the composite is deleted, and thus not be deleted as part of the composite. Compositions may be linked in a directed acyclic graph with transitive deletion characteristics; that is, deleting an element in one part of the graph will also result in the deletion of all elements of the subgraph below that element. Composition is represented by the isComposite attribute on the part end of the association being set to true.
Navigability means instances participating in links at runtime (instances of an association) can be accessed efficiently from instances participating in links at the other ends of the association. The precise mechanism by which such access is achieved is implementation specific. If an end is not navigable, access from the other ends may or may not be possible, and if it is, it might not be efficient. Note that tools operating on UML models are not prevented from navigating associations from non-navigable ends.
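The difference between shareable parts (aggregation) and exclusive parts (composition), as described above and in the quoted specification text, can be sketched in Python. Course and DegreeProgram come from the aggregation example mentioned above; Car and Engine are hypothetical names chosen here to illustrate a composition whose part can be detached.

```python
from typing import List, Optional


class Course:
    def __init__(self, title: str) -> None:
        self.title = title


class DegreeProgram:
    """Aggregation: the same Course may be part of several programs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.courses: List[Course] = []

    def add(self, course: Course) -> None:
        self.courses.append(course)


class Engine:
    def __init__(self) -> None:
        self.owner: Optional["Car"] = None


class Car:
    """Composition: an Engine is part of at most one Car at a time."""

    def __init__(self) -> None:
        self.engine: Optional[Engine] = None

    def install(self, engine: Engine) -> None:
        if engine.owner is not None:
            raise ValueError("engine is already part of a car")
        if self.engine is not None:
            raise ValueError("car already has an engine")
        self.engine, engine.owner = engine, self

    def remove_engine(self) -> Engine:
        # The part is detached and can outlive the composite.
        if self.engine is None:
            raise ValueError("car has no engine")
        engine, self.engine = self.engine, None
        engine.owner = None
        return engine
```

The exclusivity check in `install` is what distinguishes the composition here; nothing in the sketch forces the engine to be destroyed with the car.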
Your above examples are on different abstraction levels. Department/Course are concrete coding classes, while Department/Professor are at some abstract business level. Though there is no good source (that I know of) explaining this, composition and aggregation are concepts you will use only on the business level and almost never on the coding level (exception below). When you are on the code level, you live much better with an association having role names on both sides. Roles themselves are a different (and redundant!) rendering of properties of a class that refer to the opposite class.
Aggregation as a strong binding between classes is used, e.g., in database modeling. Here you can delete a master only if the aggregates have all been deleted previously (or vice versa: deleting the master will force deletion of the aggregates). The aggregate cannot live on its own. The composition as in your example is (from my POV) a silly construct, as it pretends to be some weak aggregation. But that's simply nonsense; use an association instead. Only on a business level can you try to model (e.g.) machine parts as composites. On a concrete level, a composition is a useless concept.
tl;dr;
If there is a relation between classes, show it as a simple association. Adding details like roles will help when discussing domain details. Use of composition/aggregation is encouraged only when modeling on the business level and discouraged on the code level.
I've written an article about the differences between UML Association vs Aggregation vs Composition based on the actual UML specification rather than interpretations of book authors.
The primary conclusion being that
In short, the Composition is a type of Association with real constraints and impact on development, whereas the Aggregation is purely a functional indication of the nature of the Association with no technical impact.
Navigability is a completely different property and independent of the AggregationKind.
For one thing, UML is a rich language, meaning there is more than one way to describe the same thing. That's one reason you find different ways described in different books (and conflicting answers on SO).
But a key issue is the huge disconnect between UML and source code. How a specific source code construct is represented in UML, and vice versa, is not part of the UML specification at all. To my knowledge, only one language (Java) has an official UML profile, and that's out of date.
So the representation of specific source-language constructs is left to the tool vendors, and therefore differs. If you intend to generate code from your model, you must follow the vendor's conventions. If, conversely, you wish to generate a model from existing source code, you get a model based on those same conventions. But if you transfer that model to a different tool (which is difficult at the best of times) and generate code out of that, you won't end up with the same code.
In language-and-tool-agnostic mode, my take on which relationships to use in which situations can be found here. One point there worth repeating is that I don't use undirected associations in source-code models, precisely because they have no obvious counterpart in actual code. If in the code class A has a reference to class B, and B also has one to A, then I draw two relationships instead.

Domain model or base type in find method?

While implementing a find (or search) method in a repository class, is it better to accept a domain model or is it better to implement specific find methods?
For example we have a Person class with the attributes name, id.
In the repository we could have a find method that accepts a person as a parameter. That method will use the given model to search for an existing person.
The other approach is to implement a find method per attribute (find_by_name, find_by_id).
Since I will be implementing this in Python I could also implement a method accepting keywords. This will resemble the accept-a-model approach: find(name='harry')
As a side question, when the find method concerns an indexed value (id), is it better to use get_by_id() (which implies indexes) or find_by_id() (which is more abstract)?
Personally I would implement specific find methods. A repository is a collection-like abstraction of the persistence mechanism and its interface should be written in the semantics of your domain.
Although queries like find_by_name or find_by_id are valid, very often one would need queries of the type find_vip_persons which could be a combination of several properties of the Person aggregate root (e.g. salary > 10000 & age > 21). Especially in cases like these I would avoid a generic query method, since the domain logic (i.e. what makes a person a VIP) could easily become scattered everywhere in your code base.
Although I'm not very familiar with Python's keyword arguments and could be wrong here, I would assume that the 'accept-a-model approach' you're considering doesn't allow for more complex conditions like the VIP example above anyway (i.e. comparison operators).
If you want to use a generic repository interface and reuse queries at different locations in the domain, then you should have a look at the 'Specification Pattern'.
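As a rough idea of what the Specification Pattern could look like in Python, here is a minimal sketch applied to the VIP example. The `Specification` class, the helper functions, `IS_VIP` and the in-memory `PersonRepository` are all assumptions for illustration, not an established library API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Person:
    name: str
    salary: float
    age: int


class Specification:
    """Wraps a predicate so domain rules can be named, combined and reused."""

    def __init__(self, predicate: Callable[[Person], bool]) -> None:
        self._predicate = predicate

    def is_satisfied_by(self, candidate: Person) -> bool:
        return self._predicate(candidate)

    def __and__(self, other: "Specification") -> "Specification":
        return Specification(
            lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c)
        )


def salary_above(amount: float) -> Specification:
    return Specification(lambda p: p.salary > amount)


def older_than(years: int) -> Specification:
    return Specification(lambda p: p.age > years)


# The rule for what makes a person a VIP lives in exactly one place.
IS_VIP = salary_above(10000) & older_than(21)


class PersonRepository:
    """In-memory stand-in for a real persistence mechanism."""

    def __init__(self, persons: Iterable[Person]) -> None:
        self._persons = list(persons)

    def find(self, spec: Specification) -> List[Person]:
        return [p for p in self._persons if spec.is_satisfied_by(p)]
```

The repository keeps a single generic `find`, while the domain logic stays in named specifications instead of being scattered across callers.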
Regarding your 'Find vs. Get' question I'd say it doesn't really matter. I would probably use 'Query' instead, but that's just a matter of personal preference.
Using findBy(attribute) methods is preferable semantically and more meaningful if there are not many specific query requirements.
personRepository.find_by_name(name); // easy to read
personRepository.find_by_age(age);
personRepository.find(person); // this one is odd and confusing
But if there are too many specific query methods on the repository, that is also a pain. In this case, you need a criteria object. It works pretty much the same way as your find_by_person now, but with more natural semantics.
criteria.nameEq = 'hippoom';
personRepository.findBy(criteria);
criteria.worksFor = 'XXX company';
criteria.ageGt = 25;
personRepository.findBy(criteria);
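Since the question mentions Python, the criteria idea above might be sketched there like this. The `PersonCriteria` fields mirror nameEq, worksFor and ageGt from the snippet; the `Person` attributes, the `matches` method and the in-memory repository are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Person:
    name: str
    employer: str
    age: int


@dataclass
class PersonCriteria:
    # None means "do not filter on this field".
    name_eq: Optional[str] = None
    works_for: Optional[str] = None
    age_gt: Optional[int] = None

    def matches(self, person: Person) -> bool:
        if self.name_eq is not None and person.name != self.name_eq:
            return False
        if self.works_for is not None and person.employer != self.works_for:
            return False
        if self.age_gt is not None and person.age <= self.age_gt:
            return False
        return True


class PersonRepository:
    """In-memory stand-in for a real persistence mechanism."""

    def __init__(self, persons: Iterable[Person]) -> None:
        self._persons = list(persons)

    def find_by(self, criteria: PersonCriteria) -> List[Person]:
        return [p for p in self._persons if criteria.matches(p)]
```

A call such as `repo.find_by(PersonCriteria(works_for="XXX company", age_gt=25))` reads close to the keyword-argument style mentioned in the question, while still allowing comparison operators like "greater than".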
