Are datatypes the enemy of abstraction? - alloy

Page 137 of the book Software Abstractions has these remarkable statements:
Integers are actually not very useful. If you think you need them,
think again; there is often a more abstract description that matches
the properties better. Just because integers appear in the problem
domain does not mean that they should be modeled as such. To figure
out whether integers are necessary, ask yourself what properties are
actually relied upon. For example, a communication protocol that
numbers its messages may rely only on the numbers being distinct; or
it may rely on them increasing; or perhaps even being totally ordered.
In none of these cases should integers be used.
Wow!
This is important. I want to deeply understand this.
To help me understand, I created two versions of the communication protocol.
The first version uses the Int datatype:
sig Message {
    number: Int
}
The second version does not:
sig Message {
    number: Number
}
sig Number {}
Is the second version more abstract? If so, how? Is it more abstract because Number is not tied to a built-in datatype, while the first version is less abstract because it commits to a specific datatype (Int)?
Are datatypes the enemy of abstraction?

No, the second version is no better. Suppose your messages aren't totally ordered, but only partially ordered. Then the point is that rather than assigning an index to each message, you'd do better to make the partial ordering on messages explicit:
sig Message {follows: set Message, ...}
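The same idea carries over to implementation languages. As a rough Java sketch (the MessageId type is hypothetical, purely for illustration): if the protocol relies only on message identifiers being distinct, an opaque token type captures exactly that property, whereas int would also smuggle in ordering and arithmetic that nothing relies on.

// Hypothetical sketch: an opaque id promises only distinctness.
// Each new MessageId() is distinct from every other; no ordering is exposed.
final class MessageId {}

final class Message {
    final MessageId id = new MessageId();
}

If the protocol later turns out to need ordering, that requirement becomes an explicit relation or Comparator, rather than an accident of the representation.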

Related

Always valid domain model entails prefixing a bunch of Value Objects. Doesn't that break the ubiquitous language?

The always-valid domain model principle dictates that value objects and entities should be self-validating, so that they can never be in an invalid state.
This sometimes requires creating a wrapper of some kind around primitive values. However, it seems to me that this might break the ubiquitous language.
For instance, say I have two entities: Hotel and House. Each of those entities has images associated with it which respect the following rules:
Hotels must have at least 5 images and no more than 20
Houses must have at least 1 image and no more than 10
To me this entails the following classes:
class House {
    HouseImages images;
    // ...
}
class Hotel {
    HotelImages images;
}
class HouseImages {
    final List<Image> images;
    HouseImages(this.images)
        : assert(images.length >= 1),
          assert(images.length <= 10);
}
class HotelImages {
    final List<Image> images;
    HotelImages(this.images)
        : assert(images.length >= 5),
          assert(images.length <= 20);
}
Doesn't that break the ubiquitous language a bit? It just feels a bit off to have all those classes that are essentially prefixed (HotelName vs HouseName, HotelImages vs HouseImages, and so on). In other words, my value object folder that once consisted of x, y, z, where x, y and z were also documented in a lexicon document, now has house_x, hotel_x, house_y, hotel_y, house_z, hotel_z, and it doesn't read quite as much like English as it did when it was just x, y, z.
Is this common, or is there something I've misunderstood here? I do like the assurance it gives, though, and it has actually caught some bugs too.
There is some reasoning you can apply that usually helps me when deciding whether to introduce a value object. There are two very good blog articles on this topic that I would like to recommend:
https://enterprisecraftsmanship.com/posts/value-objects-when-to-create-one/
https://enterprisecraftsmanship.com/posts/collections-primitive-obsession/
I would like to address your concrete example based on the heuristics taken from the mentioned articles:
Is there more than one primitive value that encapsulates a concept, i.e. things that always belong together?
For instance, a Coordinate value object would contain Latitude and Longitude, it would not make sense to have different places of your application knowing that these need to be instantiated and validated together as a whole. A Money value object with an amount and a currency identifier would be another example. On the other hand I would usually not have a separate value object for the amount field as the Money object would already take care of making sure it is a reasonable value (e.g. positive value).
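As a minimal Java sketch of that first heuristic (the class and method names are illustrative, not taken from the articles): the amount and currency are validated together at construction, so no other part of the application can ever observe an invalid combination.

import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;

// Illustrative sketch: amount and currency always travel and validate together.
public final class Money {
    private final BigDecimal amount;
    private final Currency currency;

    public Money(BigDecimal amount, Currency currency) {
        Objects.requireNonNull(amount, "amount");
        Objects.requireNonNull(currency, "currency");
        if (amount.signum() < 0) {
            throw new IllegalArgumentException("amount must not be negative");
        }
        this.amount = amount;
        this.currency = currency;
    }

    public BigDecimal amount() { return amount; }
    public Currency currency() { return currency; }
}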
Is there complexity and logic (like validation) that is worth hiding behind a value object?
For instance, your HotelImages value object that defines a specific collection type caught my attention. If HotelImages is not used in different spots and the logic is rather simple, as in your sample, I would not add such a collection type but rather do the validation inside the Hotel entity. Otherwise you would blow up your application with custom value objects for basically everything.
On the other hand, if there were some concept like an image collection which has its own meaning in the business domain and a set of business rules, and if that type were used in different places (for instance, an ImageCollection value object used by both Hotel and House), it could make sense to have such a value object.
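A rough sketch of that shared value object (the names and the bounds-as-parameters design are just one illustrative option): Hotel and House each configure the same ImageCollection with their own limits, so the validation lives in one place.

import java.util.List;

// Illustrative sketch: one reusable collection VO, bounds supplied per entity.
// Image is the domain type from the question.
public final class ImageCollection {
    private final List<Image> images;

    public ImageCollection(List<Image> images, int min, int max) {
        if (images.size() < min || images.size() > max) {
            throw new IllegalArgumentException(
                "expected between " + min + " and " + max + " images");
        }
        this.images = List.copyOf(images);
    }

    public List<Image> asList() { return images; }
}

// Usage: new ImageCollection(images, 5, 20) inside Hotel,
//        new ImageCollection(images, 1, 10) inside House.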
I would apply the same thinking to your question about HouseName and HotelName. If these have no special meaning or complexity outside of the Hotel and House entities, but are just simple properties of those entities, then in my opinion having value objects for them would be overkill. If there were something like a BuildingName with a set of rules the name has to follow, or if it even consisted of several primitive values, then it would again make sense to use a value object.
This relates to the third point:
Is there actual behaviour duplication that could be avoided with a value object?
Following on from the last point: extracting things into a custom value object to avoid actual duplication (not code duplication but behaviour duplication) can also make sense. But in this case you always have to be careful not to fall into the trap of incidental duplication.
Does your overall project complexity justify the additional work?
This needs to be answered from your side, of course, but I think it's good to always consider whether the benefits outweigh the costs. If you have a simple CRUD-like application that is not expected to change a lot and will not be long-lived, all the mentioned heuristics have to be weighed against the project's complexity.

Should equals() method of value objects strive to return true by using transformations of attributes?

Assume we have a value object Duration (with attributes numberOfUnits, unit). Would it be a good idea to treat these objects as equal (for example, by overriding Object.equals()) if they have the same duration but different units? Should 1 min be equal to 60 sec?
There are many contradictory examples. With Java's BigDecimal, compareTo() == 0 does not imply equals() == true (new BigDecimal("0").equals(new BigDecimal("0.0")) returns false). But Duration.ofHours(24).equals(Duration.ofDays(1)) returns true.
That's an unfortunately complicated question.
The simple answer is no: the important goal of value objects is to correctly model queries in your domain.
If it happens that equals in your domain has nice properties, then you should model them and everything is awesome. But if you are modeling something weird, then getting it right trumps following the rules used everywhere else.
Complications appear when your implementation language introduces contracts for equals that don't match the meaning in your domain. Likely, you will need to invent a different spelling for the domain meaning.
In Java, there are a number of examples where equals doesn't work as you would expect, because the hashCode contract prohibits it.
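As a rough sketch of keeping that contract intact (a hypothetical Duration value object, not java.time.Duration): if the domain really does say 1 min equals 60 sec, then equals and hashCode can both be computed from a normalized representation, so they stay consistent with each other.

import java.util.Objects;

// Hypothetical VO: normalizes to seconds so that equals and hashCode
// both agree with the domain meaning "1 min == 60 sec".
public final class Duration {
    public enum Unit { SECONDS, MINUTES, HOURS }

    private final long numberOfUnits;
    private final Unit unit;

    public Duration(long numberOfUnits, Unit unit) {
        this.numberOfUnits = numberOfUnits;
        this.unit = Objects.requireNonNull(unit);
    }

    private long inSeconds() {
        switch (unit) {
            case SECONDS: return numberOfUnits;
            case MINUTES: return numberOfUnits * 60;
            default:      return numberOfUnits * 3600; // HOURS
        }
    }

    @Override public boolean equals(Object o) {
        return o instanceof Duration && ((Duration) o).inSeconds() == inSeconds();
    }

    @Override public int hashCode() {
        return Long.hashCode(inSeconds()); // must match the normalized equals
    }
}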
Well... I upvoted @jonrsharpe's comment, because without knowing the context it is almost impossible to give you an answer.
An example of what @jonrsharpe means: if in your domain you are using the Duration VO to compare user input (where users choose numberOfUnits and unit in a UI), then it is obvious that a Duration in minutes is not equal to a Duration in seconds, even though 1 min = 60 sec, because you want to know whether the users entered the same thing, and in this case they did not.
Now, assume instead that you will use Duration only for things in which the format does not matter and it always means the same thing for the domain rules (e.g. expiring something):
Why do you need Duration.unit attribute if it gives you nothing of value in your domain?
Why can not you just work with one unit type internally?
If it is just because of different inputs/outputs in your system, you should transform to your internal/external (UI, REST API, etc.) representation before applying rules, persisting the VO (if needed) and/or showing it in a UI. So, separate input/output concerns from your domain. Maybe Duration (with a unit attribute) is not a VO at all, but just part of your ViewModel.

Is there an object-identity-based, thread-safe memoization library somewhere?

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.
I'm aware of several different 'off the shelf' memoization libraries for Haskell:
The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface. The use of unsafePerformIO internally results in better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures, rather than perhaps-slightly-more-efficient hash functions; but I think that if you find and replace Cmp for Hashable and Data.Map for Data.HashMap and add the appropriate imports, you get a hash-based version.)
However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!
Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.
The one detail which I can see one probably ought to worry about, but which they don't, is thread safety---their code is apparently not thread-safe at all.
My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?
Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?
UPDATE
Following #luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:
data Node = Node [Node] [Annotation]
This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.
I have a function
myTransform :: Node -> Node
the details of which need not concern us (or at least I think so; but again, I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations of its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function
recursiveTransform :: Node -> Node
whose output 'looks the same' as the data structure you would get by doing:
recursiveTransform (Node originalChildren annotations) =
    myTransform (Node recursivelyTransformedChildren annotations)
  where
    recursivelyTransformedChildren = map recursiveTransform originalChildren
except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.
I appreciate that this would all be easier if, say, the Nodes were numbered before I got them, or if I could otherwise change the definition of a Node. I can't (easily) do either of these things.
I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )
1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.
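To make the shape of the problem concrete in an imperative setting, here is a rough Java sketch of identity-keyed memoization (hypothetical Node and Transformer types mirroring the ones above; unlike a StableName-based table, it never evicts entries). java.util.IdentityHashMap compares keys with ==, not equals(), so each shared subgraph is transformed exactly once and the output retains the input's sharing.

import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class Node {
    final List<Node> children;
    final List<String> annotations; // stand-in for the real Annotation type

    Node(List<Node> children, List<String> annotations) {
        this.children = children;
        this.annotations = annotations;
    }
}

final class Transformer {
    // Keys are compared by reference identity; synchronizedMap gives basic
    // thread safety (two threads may redundantly compute the same entry,
    // but the table itself stays consistent).
    private final Map<Node, Node> memo =
        Collections.synchronizedMap(new IdentityHashMap<>());

    Node recursiveTransform(Node n) {
        Node cached = memo.get(n);
        if (cached != null) return cached;
        List<Node> transformedChildren = new ArrayList<>();
        for (Node child : n.children) {
            transformedChildren.add(recursiveTransform(child));
        }
        Node result = myTransform(new Node(transformedChildren, n.annotations));
        memo.put(n, result);
        return result;
    }

    private Node myTransform(Node n) {
        return n; // placeholder for the real annotation-rewriting step
    }
}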
If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.
For example, if you have a data structure like this
data Foo = Foo { ... }
expensive :: Foo -> Bar
then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.
data Foo = Foo { ..., memo :: Bar }
To make it easier to use, add a smart constructor to tie the knot.
makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo
Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.
It seems that stable-memo would be just what you needed (although I'm not sure if it can handle multiple threads):
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
stable-memo only evaluates keys to WHNF.
This can be more suitable for recursive functions over graphs with cycles.
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
Data.StableMemo.Weak provides an alternative set of combinators that also avoid retaining the results of the function, only reusing results if they have not yet been garbage collected.
There is no type class constraint on the function's argument.
stable-memo will not work for arguments which happen to have the same value but are not the same heap object. This rules out many candidates for memoization, such as the most common example, the naive Fibonacci implementation whose domain is machine Ints; it can still be made to work for some domains, though, such as the lazy naturals.
Ekmett just uploaded a library that handles this and more (produced at HacPhi): http://hackage.haskell.org/package/intern. He assures me that it is thread safe.
Edit: Actually, strictly speaking I realize this does something rather different. But I think you can use it for your purposes. It's really more of a stringtable-atom type interning library that works over arbitrary data structures (including recursive ones). It uses WeakPtrs internally to maintain the table. However, it uses Ints to index the values to avoid structural equality checks, which means packing them into the data type, when what you want are apparently actually StableNames. So I realize this answers a related question, but requires modifying your data type, which you want to avoid...

Maximum number of variables in classes

What is the maximum number? Will my program crash if it exceeds a certain number? Is there a standard, like the commonly cited limit of 5 for method parameters?
An answer to such a question would depend on the language you are using, but generally speaking there isn't any practical limit on the number of variables in a class or parameters of a method (implementation limits do exist, e.g. the JVM allows at most 65,535 fields per class, but readable code never gets anywhere near them).
There is a cap on the amount of data you can handle, and that's the amount of memory available to your system, but that's a cap on the size of the actual data held by the variables.
Having a high number of variables or methods inside a class is not recommended, because your code can become unmaintainable very quickly. That is due to the Single Responsibility Principle: your class should be responsible for one thing, and only one thing, and that one thing will rarely need that many variables to accurately represent its state. In the event that it does, use object composition: identify the small structures which have emerged inside the class and break them up into smaller classes, then add references to objects of those classes to the original class, effectively creating a "has a" relationship between the original class and the smaller classes.
For example, a car has an engine:
class Car {
    Engine engine;
};
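As a slightly fuller sketch of that refactoring (the names here are hypothetical): a class that has accumulated many related fields hands them off to a smaller class it composes.

// Before: Customer held street, city and postalCode directly,
// alongside all its other fields.
class Address {
    final String street;
    final String city;
    final String postalCode;

    Address(String street, String city, String postalCode) {
        this.street = street;
        this.city = city;
        this.postalCode = postalCode;
    }
}

class Customer {
    String name;
    Address address; // a "has a" relationship instead of a pile of fields
}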
Your code will become unreadable long before you reach any hard limits set by a programming language, both for variables and method parameters.
This is unlikely to be an issue, although I would guess that it depends on the language you are talking about.
And why don't you try to write your whole program in a single file, with only one function? :)
Because it would be unreadable and unmaintainable, and therefore full of bugs, and so it would not work very well.
That is the real kind of limit on the number of member variables, yes.
Although there is no hard limit, it is never recommended to use a large number of variables in a class or of method parameters. One can use the composition design pattern, or in some cases inheritance, for reuse; the latter should be used sparingly. I would rarely use more than 25 variables in a class or 5 method parameters.

Naming suggestion for a class containing a timestamp and a float value?

I need to name this structure:
struct NameNeeded
{
    DateTime timestamp;
    float value;
}
I will have arrays of this struct (a time-series).
I'd like a short and suggestive name. The data is financial (and Tick is not a good name).
The best one I can think of is DataPoint, but I feel that a better one exists :)
How would you name it?
Since you have a data value and an associated timestamp, the first thing that popped into my head was DataSample. I pictured a series of these, as if you were taking a digital sampling of an analog signal (the two values were like x- and y-coordinates on a graph).
My old scientist neurons are telling me that this is a Measurement. A measurement is an instrument reading associated with some context - time, position, experimental conditions, and so on.
The other metaphor that springs to mind is a Snapshot, or a moment in an evolving scene illuminated by a strobe light - an Instant, perhaps.
Given that we can't associate a specific concept with the float value structure member, only vague names such as "Value", "Number", "Float" or "Data" come to mind.
The DateTime timestamp member suggests to me that the name should have a time-related suffix such as "When", "AtTime", "Instant" or "Moment".
So, combining these name fragments, you could have
ValueWhen
ValueAtInstant
NumberWhen
DataAtTime
etc.
When stuck on a naming problem, consulting a dictionary or thesaurus can sometimes help. It's pleasing to see well chosen type names, and gratifying to come up with them - good luck with your quest.
I would personally include "Float" in the name, to leave open the possibility of providing other time-stamped types. For example, you could provide a timestamped int or enum for analyst recommendation.
If you want the time-stamping to be implicit, consider "FloatValue." Making it implicit might be desirable if other attributes might someday join the timestamp (e.g., source of data, confidence level, or uncertainty).
If you want to be explicit, one possibility would be "RecordedFloat."
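A minimal sketch of that "leave the door open" idea (a hypothetical type, using Java generics): make the payload type a parameter, and the timestamped-sample concept gets a single name regardless of what is being sampled.

import java.time.Instant;

// Illustrative sketch: a generic timestamped sample.
final class Timestamped<T> {
    final Instant timestamp;
    final T value;

    Timestamped(Instant timestamp, T value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

// e.g. Timestamped<Float> for prices, Timestamped<Recommendation> for ratings.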
