Delete orphaned shared ValueObjects? - domain-driven-design

I'm still trying to understand DDD. Let's say I have an immutable VO PresentationSlide. Because it is immutable it can be easily shared and used by different entities.
But what happens if I try to map this to a database. In EntityFramework I could model the PresentationSlide as a ComplexType, meaning the properties of the shared PresentationSlide are mapped to the tables using it. This is fine, but a Slide might be quite large and therefore I'm wasting space, if it used/reference several times.
As an alternative approach I could map a PresentationSlide to a separate table and reference it. Because it is immutable this should also work. But now, if I modify a Presentation, I have to copy the content of the old PresentationSlide and create a new instance. If there are a lot of changes, I will soon have a lot of orphaned PresentationSlides in my database.
Isn't this a problem? Is there a solution?
Should I implement a custom periodical 'Cleanup orphaned PresentationSlides'-Task?

First you should think about ownership and life cycle of the PresentationSlide within your domain model. Always make sure you follow your model semantics when doing performance or storage optimization.
I'd start out with duplicating each PresentationSlide within its entity. This is just the natural way to do, because that's what you do in your domain model, too.
Put metrics in place that enable you to make an informed decision about the storage optimization. E.g. go with the duplication approach and calculate the space wasted due to duplicates after some time in production.
Only if you really have a problem, do the storage optimization. Otherwise you've become victim of Premature Optimization.
If you really need to make an optimization, I think the approach you mention is a reasonable one. Create a separate table, and regularly clean up PresentationSlides that are not referenced anymore.

Related

Data.HashMap performances outside of IO

I've discovered Helm (http://helm-engine.org/) the other day and I've been playing a bit with it. I like Elm, so Helm has been great to use thus far.
Simply put, the update function gets called every tick and get passed the model, and has to return an updated version of that model. Another function then gets called to render that model on screen.
For small games with not much in the model it seems ideal, but I've been thinking about something a bit bigger, for which a HashMap (or a lot of them) would be ideal, and I'm wondering about the performances of that.
I'm no expert but I believe using the Data.HashTable.IO should modify the hashtable in ram instead of creating a new copy on change, but that seems complicated to interface with Helm. That would mean using a Cmd for each lookups and each changes and returning that to Helmn and then get passed the result from a new call to update, a nightmare to use if you have more than one or two things to lookup I think.
Data.HashMap.Strict (or Lazy ?) would probably work better, but I imagine each change would create a new copy, and the GC would free up the old one at some future point. Is that correct ?
That would potentially mean hundreds of copy then free per frame, unless the whole thing is smart enough to realise I'm not using the old hashtable again after the change and just not copy it.
So how does this work in practice ? (I'm thinking of HashMap because it seems like the easier solution, but I guess this applies to regular lists too).
I support the comments about avoiding premature optimization and benchmarking, instead of guessing, to determine if performance is acceptable. That said, you had some specific questions too.
Data.HashMap.Strict (or Lazy ?) would probably work better, but I imagine each change would create a new copy, and the GC would free up the old one at some future point. Is that correct ?
Yes, the path to the modified node will consist of new nodes. Modulo balancing, the sub trees on the left and right of the path will all be shared (not copied) by the old and new tree structures.
That would potentially mean hundreds of copy then free per frame
I'm not sure where you get "hundreds" from. Are you saying there are hundreds of updates? For some structures there are rewrite rules that allow much of the intermediate values to be used in a mutating manner. See for example, this small examination of vector.
So how does this work in practice ?
In practice people implement what they want and rework parts that are too slow. I might reach for HashMap early, instead of assuming the containers Data.Map will suffice, but I don't go beyond that without evidence.

Scratch couchdb document

Is it possible to "scratch" a couchdb document? By that I mean to delete a document, and make sure that the document and its history is completely removed from the database.
I do not want to perform a database compaction, I just want to fully wipe out a single document. And I am looking for a solution that guarantees that there is no trace of the document in the database, without needing to wait for internal database processes to eventually remove the document.
(a python solution is appreciated)
When you delete a document in CouchDB, generally only the _id, _rev, and a deleted flag are preserved. These are preserved to allow for eventual consistency through replication. Forcing an immediate delete across an entire group of nodes isn't really consistent with the architecture.
The closest thing would be to use purge; once you do that, all traces of the doc will be gone after the next compaction. I realize this isn't exactly what you're asking for, but it's the closest thing off the top of my head.
Here's a nice article explaining the basis behind the various delete methods available.
Deleting anything from file system for sure is difficult, and usually quite expensive problem. Even more with databases in general. Depending of what for sure means to you, you may end up with custom db, custom os and custom hw. So it is kind of saying I want fault tolerant system, yes everyone would like to have one, but only few can afford it, but good news is that most can settle for less. Similar is for deleteing for sure, I assume you are trying to adress some security or privacy issue, so try to see if there is some other way to get what you need. Perhaps encrypting the document or sensitive parts of it.

Core Data: make object-by-object copy of database

I would like to make a backup copy of my Core Data database, without using either the File Manager to make a copy of the sqlite file, or using the Persistent Store Coordinator's migratePersistentStore method (for reasons that are too long to explain here). What I want to do is to open a new persistent store with the same MOMD as the original file, create a new Managed Object Context, then iterate over all the objects in my database and insert them into the new context.
This will work for simple entities, but the problem is that my model has about 20 entities, many of them with one-to-many and many-to-many relationships. The slightly more complex solution would be to insert every object into the new MOC, and then hold all the new Managed Objects in memory and use them to tie up all the relationships between the objects in a subsequent passes. But it seems like a really messy solution.
Is there a clean, elegant way to achieve this, that might work for any kind of data model, not just a customized solution for my own model, and without having to hold every object in memory at the same time?
Thanks.
Copying the persistent store is far and above the easiest way to do this – I'd suggest revisiting your reasons against it or explaining what they are.
Copying objects from one context to another – from one on-disk persistent store to another – doesn't necessarily hold them all in memory at the same time. Core Data can turn them into faults.

Transaction with Cassandra data model

According to the CAP theory, Cassandra can only have eventually consistency. To make things worse, if we have multiple reads and writes during one request without proper handling, we may even lose the logical consistency. In other words, if we do things fast, we may do it wrong.
Meanwhile the best practice to design the data model for Cassandra is to think about the queries we are going to have, and then add a CF to it. In this way, to add/update one entity means to update many views/CFs in many cases. Without atomic transaction feature, it's hard to do it right. But with it, we lose the A and P parts again.
I don't see this concerns many people, hence I wonder why.
Is this because we can always find a way to design our data model to avoid to do multiple reads and writes in one session?
Is this because we can just ignore the 'right' part?
In real practice, do we always have ACID feature somewhere in the middle? I mean maybe implement in application layer or add a middleware to handle it?
It does concern people, but presumably you are using cassandra because a single database server is unable to meet your needs due to scaling or reliability concerns. Because of this, you are forced to work around the limitations of a distributed system.
In real practice, do we always have ACID feature somewhere in the
middle? I mean maybe implement in application layer or add a
middleware to handle it?
No, you don't usually have acid somewhere else, as presumably that somewhere else must be distributed over multiple machines as well. Instead, you design your application around the limitations of a distributed system.
If you are updating multiple columns to satisfy queries, you can look at the eventually atomic section in this presentation for ideas on how to do that. Basically you write enough info about your update to cassandra before you do your write. That way if the write fails, you can retry it later.
If you can structure your application in such a way, using a co-ordination service like Zookeeper or cages may be useful.

How to avoid mutable state (when multithreading)

Multithreading is hard. The only this you can do is program very carefully and follow good advice. One great advice I got from the answers on this forum is to avoid mutable state. I understand this is even enforced in the Erlang language. However, I fail to see how this can be done without a severe performance hit and huge amounts of caches.
For example. You have a big list of objects, each containing quite a lot of properties; in other words: a large datastructure. Suppose you have got a bunch of threads and they all need to access and modify the list. How can this be done without shared memory without having to cache the whole datastructure in each of the threads?
Update: After reading the reactions so far, I would like to put some more emphasis on performance. Don't you think that copying the same data around will make the program slower than with shared memory?
Not each algorithm can be parallelized in a successful manner.
If your program doesn't exhibit any "parallel structure", then you're pretty doomed to use locking and shared, mutable structures.
If your algorithm exhibit structure, then you can express your computation in terms of some patterns or formalism (for ex., a macro dataflow graph) that makes the choice of an immutable datastruct trivial.
So: think in term of the structure of the algorithm and just not in term of the properties of the datastructure to use.
You can get a great start in thinking about immutable collections, where they are applicable, how they can actually work without requiring lots of copying, etc. by looking through Eric Lippert's articles tagged with immutability:
http://blogs.msdn.com/ericlippert/archive/tags/Immutability/default.aspx
I guess the first question is: why do they need to modify the list? Would it be possible for them to return their changes as a list of modifications rather than actually modifying the shared list? Could they work with a list which looks like it's a mutable version of the original list, but is actually only locally mutable? Are you changing which elements are in the list, or just the properties of those elements?
These are just questions rather than answers, but I'm trying to encourage you to think about your problem in a different way. Look at the bigger picture as the task you want to achieve instead of thinking about the way you'd tackle it in a normal imperative, mutable fashion. Changing the way you think about problems is very difficult, but you may find you get some great "aha!" moments :)
There are many pitfalls when working with multiple threads and large sets of data. The advice to avoid mutable state is meant to try to make life easier for you if you can manage to follow the guideline (i.e. if you have no mutable state then multi-threading will be much easier).
If you have a large amount of data that does need to be modified then you perhaps cannot avoid mutable state. An alternative though would be to partition the data into blocks, each of which is passed to a thread for manipulation. The block can be processed and then passed back, and the controller can then perform the updates where necessary. In this scenario you have removed the mutable state from out of the the thread.
If this cannot be done and each thread needs update access to the full list (i.e. it could update any item on the list at any time) then you are going to have a lot of fun trying to make sure you have got your locking strategies and concurrency issues sorted. I'm sure there are scenarios where this is required, and the design pattern of avoiding mutable state may not apply.
Just using immutable data-objects is a big help.
Modifying lists sounds like a constructed argument, but consider granular methods that are unaware of lists.
If you really need to update the structure one way to do this is have a single worker thread which picks up update requests from a fixed area prtected by a mutex.
If you are clever you can update the structure in place without affecting any "reading"
threads (e.g. If you are adding to the end of an array you do all the work to add the new structure but only as the very last instruction do you increment the NoOfMembers count -- the reading threads should not see the new entry until you do this - or - arrange your data as an array of references to structures -- when you want to update a structure you copy the current member, update it, then as the last operation replace the reference in the array)
The other threads then only need to check a single simple "update in progess" mutex only when they activly want to update.

Resources