How to retrieve multilingual domain model? - domain-driven-design

I have a lot of entities with 3 language columns: DescriptionNL, DescriptionFR and DescriptionDE (Description, Info, Article, ... all in 3 languages).
My idea was to create a fourth property, Description, that returns the right value according to Thread.CurrentThread.CurrentCulture.TwoLetterISOLanguageName.
But a drawback is that when you have a GetAll() method in your repository, e.g. to fill a drop-down list, you return all three values to the application layer, which means extra network traffic.
Adding a language parameter to the domain services that retrieve data is also "not done" according to DDD experts, the reason being that the language is part of the UI, not the domain. So what is the best method for retrieving your models with the right description?
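A minimal sketch of the "fourth property" idea from the question (Python stand-in for the .NET property; `current_language`, the class, and the sample values are illustrative, not from the original code):

```python
# Sketch: the entity holds all three translations and resolves one
# through the ambient language, mimicking a property that reads
# Thread.CurrentThread.CurrentCulture.TwoLetterISOLanguageName.

current_language = "nl"  # stand-in for the ambient UI culture

class Article:
    def __init__(self, description_nl, description_fr, description_de):
        self.description_nl = description_nl
        self.description_fr = description_fr
        self.description_de = description_de

    @property
    def description(self):
        # Resolve against the ambient language; fall back to Dutch.
        return {
            "nl": self.description_nl,
            "fr": self.description_fr,
            "de": self.description_de,
        }.get(current_language, self.description_nl)

article = Article("Fiets", "Vélo", "Fahrrad")
print(article.description)  # with current_language = "nl" → "Fiets"
```

Note that this reproduces the drawback the question raises: the object still carries all three translations, whatever the repository returns.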

You are correct in stating that language has no bearing on a domain model. If you need to manipulate objects or data, you will need to use some canonical form of that data. This only applies to situations where the value has some meaning in your domain; anything that is there purely for classification may not interest your model, but it may still be useful to use a canonical value.
The added benefit of a canonical value is that you know what the value represents even across systems, because you can map between them.
A canonical approach used on one of my previous projects had data sets with descriptions in various languages, yet the keys were the same for each value. For instance, Mr is key 1 and Mrs is key 2; in French, M. would likewise be key 1 and Mme key 2. These keys are your organisational values. Now assume you have System A and System B, where Mr is value 67 in System A and value 22 in System B: you can map between these systems via your canonical values.
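The canonical-key mapping described above can be sketched as follows; all keys, codes, and names are illustrative, not from any real system:

```python
# One organisational (canonical) key per value, with per-language
# descriptions and per-system codes mapped onto it.

salutations = {
    1: {"en": "Mr", "fr": "M."},
    2: {"en": "Mrs", "fr": "Mme"},
}

# External systems use their own codes; map them to canonical keys once.
system_a_to_canonical = {67: 1}  # System A: Mr = 67
system_b_to_canonical = {22: 1}  # System B: Mr = 22

def translate(system_code, mapping, language):
    """Resolve a system-specific code to a localized description."""
    canonical = mapping[system_code]
    return salutations[canonical][language]

print(translate(67, system_a_to_canonical, "fr"))  # → M.
print(translate(22, system_b_to_canonical, "en"))  # → Mr
```

The point of the indirection is that the UI language only matters at the last step; everything upstream works with the canonical key.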
You wouldn't necessarily store these as entities in a repository but they should be in some read model that can be easily queried. The trip to the database should not be too big of a deal as you could cache the results along with a version number or expiry date.

SharePoint choice or lookup field representing other list's fields

The scenario is that I have a Projects list and there are a bunch of different SPFieldUser fields associated to it. I have another list representing the Project's Logbook (it contains a bunch of data about different milestones of the project). The relationship is like this: (1 project list item : 1 logbook list).
I have to store some metadata in a logbook's list item that points to a specific user, stored in Project's list item. For that I have to create a choice field which represents different SPFieldUser fields from the project's list.
The question is which is optimal way of representing such a structure?
I can just hard-code a choice option for every SPFieldUser in the Projects list, but then when I have to reference this data in code, I'll have to somehow transform the choice's value into the internal name of the associated project field.
I can also create a lookup of those fields and this way, accessing it is easy. I can show the Title to user and have the internal name stored in a lookup.
I was also thinking about defining some kind of custom FieldType, but I feel like it would require far more work than any of the other methods.
So which method do I choose? Can someone probably suggest a better way?
Let's check your options one by one, in terms of effort and scalability.
1. Hard-coding choice options: high effort [not recommended at all]
- The column needs to be updated whenever a new user joins or a user leaves the company.
- Once the format of the data is specified, it is difficult to change [e.g. FirstName+LastName vs. EmpId].
2. Lookup column (OOTB): very low effort [highly recommended]
- Configurable [please check whether you can change the format of the user data once it has been added as a lookup column].
3. Custom field type: will take coding effort.
My recommendation is the 2nd, out-of-the-box option. If you find flaws in it, let us know and we can look for a solution.
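As an aside on the lookup option: SharePoint serializes lookup field values in the form `ID;#Value`, so recovering the stored internal name in code is a small parsing step. A plain-Python sketch (outside the SharePoint object model, so the helper name is illustrative):

```python
# SharePoint lookup field values are serialized as "ID;#Value".
# Sketch of pulling the referenced item ID and the stored value
# (here, an internal field name) back out of the raw string.

def parse_lookup(raw):
    """Split a lookup value like '3;#AssignedTo' into (item_id, value)."""
    item_id, _, value = raw.partition(";#")
    return int(item_id), value

print(parse_lookup("3;#AssignedTo"))  # → (3, 'AssignedTo')
```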

What is Zone Hashing in Natural Language Processing?

Has anyone in the NLP field heard of the term Zone Hashing? From what I hear, zone hashing is the process of iterating through a document and extracting sentences. An accumulation of sentences is then hashed, and the process continues for the next n sentences...
I haven't found any references to this on Google, so I'm wondering if it goes by a different name. It should be related to measuring text similarity/nearness.
Perhaps it refers to locality sensitive hashing?
As far as I know, "zone hashing" is not a well-established concept in NLP as a discipline. It is just a simple notion used in some NLP-related software. The only system I know that uses it is the Sphinx search server, and there "zone hashing" simply means "hashing of objects called zones", where a "zone" is described as follows:
Zones can be formally defined as follows. Everything between an opening and a matching closing tag is called a span, and the aggregate of all spans sharing the same tag name is called a zone. For instance, everything between the occurrences of <H1> and </H1> in the document field belongs to the H1 zone.
Zone indexing, enabled by the index_zones directive, is an optional extension of the HTML stripper. So it will also require that the stripper is enabled (with html_strip = 1). The value of index_zones should be a comma-separated list of those tag names and wildcards (ending with a star) that should be indexed as zones.
Zones can nest and overlap arbitrarily. The only requirement is that every opening tag has a matching tag. You can also have an arbitrary number of both zones (as in unique zone names, such as H1) and spans (all the occurrences of those H1 tags) in a document. Once indexed, zones can then be used for matching with the ZONE operator; see Section 5.3, "Extended query syntax".
And hashing of these structures is used in the traditional sense to speed up search and lookup. I am not aware of any "deeper" meaning.
Perhaps it refers to locality sensitive hashing?
Locality sensitive hashing is a probabilistic method for multidimensional data; I do not see any deeper connection to zone hashing than the fact that both use hash functions.
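To make the contrast concrete, here is a minimal MinHash sketch, one classic locality sensitive hashing scheme for text similarity. This is not anything from Sphinx; the function names and parameters are illustrative:

```python
# MinHash: similar shingle sets produce similar signatures, so the
# fraction of matching signature positions approximates Jaccard
# similarity. (Ordinary "zone hashing" has no such property: it is
# plain hashing for exact lookup.)

import hashlib

def shingles(text, n=3):
    """Character n-grams of the text, as a set."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def minhash(items, num_hashes=64):
    """Signature: for each salt, the minimum hash over all items."""
    return [
        min(int(hashlib.md5(f"{salt}:{s}".encode()).hexdigest(), 16)
            for s in items)
        for salt in range(num_hashes)
    ]

def similarity(a, b):
    """Fraction of matching signature positions ≈ Jaccard similarity."""
    sa, sb = minhash(shingles(a)), minhash(shingles(b))
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

# Near-duplicates score high, unrelated text scores low:
print(similarity("the quick brown fox", "the quick brown dog"))
print(similarity("the quick brown fox", "completely different words"))
```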

Best approach for well-known rows in user-customizable table?

I write database applications for a living that the end user can customize.
Frequently this means that, leaving the database aside for a moment, some of my notional entity types have a universe or domain that is infinite.
Take name types. You could have a first name, last name, married name, legal name, salutation name, and so on. I am not going to put an upper bound on this universe.
But I do need to find and use certain well-known name types. Let's say display name and sort name, just to keep it simple.
I would also like to be able to query for all name types (i.e. the whole universe) and have my well-known name types returned as well.
There are several strategies for accomplishing this within a database:
Have one name_type table with an id column and a code column. ID values less than a certain amount are "reserved" for use by the system; ID values higher than this are deemed to be user types.
Add a column to the id/code pair that is some representation of a boolean or an int type that indicates what type of row this is (e.g. user-defined or system). Same thing, really; just uses another column to explicitly break out the information instead of overloading it in the id.
Have two tables with perhaps a naming convention: name_type and name_type_system. It is understood or enforced that name_type_system is off-limits to users; name_type is their domain. Queries do a UNION across these tables and applications just "know" to never update the system table.
What strategies do people use? Any war stories? Any particular reasons to pick one over the other? Huge pitfalls I'm not seeing?
Best,
Laird
Of your three ideas, the first is often called a Magic Number (http://en.wikipedia.org/wiki/Magic_number_(programming)) and is a Bad Thing, because any code that doesn't "know" about the convention can make mistakes. Plus, over time you end up realizing, "oops, I need to push the minimum value higher; now I need to resequence 10,000 existing rows." Headaches, headaches.
After that, either of the other two works. But the third one lets you use the DB server to deny insert/update/delete access on the system table to the account used by end users, simplifying your code.
The way to decide between option 2 and 3 is to ask, are they really 2 separate things? If they are, they will tend to have different security, different operations are performed on them, one is modified by upgrades, the other is not, etc. If they really are two different things, they go in two tables. If they are two flavors of one thing that are almost always treated the same, they go in one table with a "type" flag, option 2.
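A minimal sketch of option 2, using SQLite for a self-contained demo: a type flag plus a trigger so the database itself refuses changes to system rows. (A real deployment would also rely on account permissions, as the answer notes for option 3; names here are illustrative.)

```python
# Option 2 sketch: one name_type table with an is_system flag, and a
# trigger that protects system rows at the database level.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE name_type (
        id        INTEGER PRIMARY KEY,
        code      TEXT NOT NULL UNIQUE,
        is_system INTEGER NOT NULL DEFAULT 0   -- 1 = reserved for the system
    );

    -- Refuse deletes of system rows, regardless of which code runs them.
    CREATE TRIGGER protect_system_rows
    BEFORE DELETE ON name_type
    WHEN OLD.is_system = 1
    BEGIN
        SELECT RAISE(ABORT, 'system name types cannot be deleted');
    END;
""")
db.executemany("INSERT INTO name_type (code, is_system) VALUES (?, ?)",
               [("display", 1), ("sort", 1), ("married", 0)])

# Querying the whole universe returns system and user types together:
print([r[0] for r in db.execute("SELECT code FROM name_type ORDER BY id")])
# → ['display', 'sort', 'married']

try:
    db.execute("DELETE FROM name_type WHERE code = 'display'")
except sqlite3.IntegrityError as e:
    print(e)  # → system name types cannot be deleted
```

Option 3 (two tables plus a UNION view) gives the same protection through grants instead of triggers; the trade-off is the query-side complexity the question describes.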

When is the Data Vault model the right model for a data-warehouse?

I recently found a reference to 'Data Vault Modeling' as a model for data warehouses. The models I've seen before are Inmon and Kimball. The author refers to possible performance problems due to the joins needed. It looks like a nice model, but I wonder about the gotchas. Are there any experience reports online?
We have been using a home-grown modification of Data Vault for a number of years, called 'Link Modelling', which only has entities and links, drawing principles from neo4j but implementing them in a SQL database.
Both Link Modelling and Data Vault are very different ways of thinking to Kimball/Inmon models.
My comments below relate to a system built with the following structure: a temporary staging database, a DWH, then a number of marts built from the DWH. There are other ways to architect a DWH solution, but this is quite typical.
With Kimball/Inmon
Data is cleaned on the way into the DWH, though sometimes the cleaning is applied on the way into the staging database instead
Business rules and MDM are (generally) applied between the staging db and the DWH
The marts are often subject area specific
With Data Vault/Link Modelling
Data is landed unchanged in staging
These data are passed through to the DWH also uncleaned, but stored in an entity/link form
Data cleansing, MDM and business rules are applied between the DWH and the marts.
Marts are based on subject area specific needs (same as above).
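The entity/link (Data Vault style) storage described above might look roughly like the following; table and column names are illustrative, not from the author's system. SQLite is used only to keep the sketch self-contained:

```python
# Data Vault shapes: a hub per business key, a link for relationships
# (foreign keys only), and a satellite holding the raw, uncleansed
# attributes versioned by load date.

import sqlite3

dwh = sqlite3.connect(":memory:")
dwh.executescript("""
    CREATE TABLE hub_policy (
        policy_hk INTEGER PRIMARY KEY,   -- surrogate key
        policy_no TEXT NOT NULL UNIQUE,  -- business key, stored unchanged
        load_date TEXT NOT NULL
    );
    CREATE TABLE hub_broker (
        broker_hk INTEGER PRIMARY KEY,
        broker_no TEXT NOT NULL UNIQUE,
        load_date TEXT NOT NULL
    );
    -- Link: only foreign keys, so a new relationship is a cheap new row.
    CREATE TABLE link_policy_broker (
        policy_hk INTEGER REFERENCES hub_policy(policy_hk),
        broker_hk INTEGER REFERENCES hub_broker(broker_hk),
        load_date TEXT NOT NULL
    );
    -- Satellite: descriptive attributes, versioned, never overwritten.
    CREATE TABLE sat_policy (
        policy_hk INTEGER REFERENCES hub_policy(policy_hk),
        premium   REAL,
        load_date TEXT NOT NULL
    );
""")

# Two satellite versions: the raw value as loaded, then a later revision.
# Nothing is overwritten, so the original value can always be recovered.
dwh.execute("INSERT INTO hub_policy VALUES (1, 'POL-001', '2024-01-01')")
dwh.executemany("INSERT INTO sat_policy VALUES (?, ?, ?)",
                [(1, 120.0, '2024-01-01'), (1, 125.0, '2024-02-01')])
print(dwh.execute("SELECT COUNT(*) FROM sat_policy").fetchone()[0])  # → 2
```

This layout is what makes the "what was the original value?" and cheap-new-column points below possible: history accumulates as rows, and attributes live in narrow satellites rather than wide dimensions.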
For us, we would often (but not always) build Kimball Star Schema style Marts, as the end users understand the data structures of these easily.
The occasions where a Link Modelled DWH comes into its own are the following (using Kimball terminology to express the issues):
Occasionally, users ask "why does this specific number have this value?". In traditional Kimball/Inmon, data is cleansed on the way in, so there is no way to know what the original value was; a Link Model keeps the original data in the DWH.
When no transaction records exist linking a number of dimensions, but you still need to report on the full set of data, e.g. to ask questions like "How many insurance policies that were sold by a particular broker have no claim transactions paid?".
The application of MDM in a type 2 Kimball or Inmon DWH can cause massive numbers of type 2 change records to be written to Dimensions, which often contain all the data values, so there is a lot of duplication of data. With a Link Model/Data Vault, a new dimensional value will just cause new type 2 links to be created in a link table, which only have foreign keys to entity tables. This is often overcome in Kimball DWH by having a slowly changing dimension and a fast changing dimension, which is a fair workaround.
In insurance and other industries where 'as at date' reports must be produced, fact tables will be slowly changing as well, and tracking type 2 dimensions against type 2 fact records is a nightmare.
From a development point of view, adding a new column to a large Kimball dimension needs to be done carefully and consideration of back-populating is important, but with a Link Model, adding an extra column to an Entity is relatively trivial.
There are always ways around these in Kimball methodology, but they require some careful thought and sometimes some jumping through hoops.
From our perspective, there is little downside to the Link Modelling.
I am not connected with any of the companies marketing/producing Kimball/Inmon or Data Vault methodologies.
You can find a whole lot more information on my blog: http://danLinstedt.com, and on the forums at datavaultinstitute dot com
But to give you a quick/brief answer to your question:
The gotchas are as follows:
1) Have to accept the concept of loading raw data to the data warehouse
2) Understand that the Data Vault usually doesn't allow "end-users" direct access because of the model.
There may be a few more, but the benefits outweigh the drawbacks.
Feel free to check out the blog, it's free to register/follow.
Cheers,
Dan Linstedt

How are Value Objects stored in the database?

I haven't really seen any examples, but I assume they are saved inside the containing entity's table in the database.
I.e. if I have a Person entity/aggregate root with a corresponding Person table, and it had a Value Object called Address, the Address values would be saved inside this Person table!
Does that make sense for a domain where I have other entities such as Companies etc. that have an Address?
(I'm currently writing a project management application and trying to get into DDD)
It's OK to store Value Objects in a separate table, for the very reasons you've described. However, I think you're misunderstanding Entities vs. VOs: the distinction is not a persistence concern.
Here's an example:
Assume that a Company and a Person both have the same mailing Address. Which of these statements do you consider valid?
1. "If I modify Company.Address, I want Person.Address to automatically get those changes."
2. "If I modify Company.Address, it must not affect Person.Address."
If 1 is true, Address should be an Entity, and therefore has its own table.
If 2 is true, Address should be a Value Object. It could be stored as a component within the parent Entity's table, or it could have its own table (better database normalisation).
As you can see, how Address is persisted has nothing to do with Entity/VO semantics.
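The Entity/VO distinction can be sketched independently of any persistence code; the classes and fields here are illustrative:

```python
# A Value Object compares by its values; an Entity compares by its
# identity. How either one is stored is a separate, repository concern.

from dataclasses import dataclass

@dataclass(frozen=True)       # immutable, value-based equality
class Address:
    street: str
    city: str

class Person:                 # entity: identity-based equality
    def __init__(self, person_id, address):
        self.person_id = person_id
        self.address = address

    def __eq__(self, other):
        return isinstance(other, Person) and self.person_id == other.person_id

home = Address("1 Main St", "Springfield")
print(Address("1 Main St", "Springfield") == home)  # True: same values
print(Person(1, home) == Person(1, Address("2 Oak Ave", "Shelbyville")))
# True: same identity, even though the addresses differ
```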
Most developers tend to think database-first before anything else. DDD does not know how persistence is handled; that's up to the repository to deal with. You can persist to XML, SQL, a text file, etc. Entities, aggregates, and value objects are concepts that belong to the domain.
Explanation by Vijay Patel is perfect.
