Core Data "garbage collection"

Does Core Data provide something akin to garbage collection (or reference counting) so that entities can be automatically removed from the persistent store if they are not referenced by other entities?
Generally, entities are "root entities". Once created they will exist until explicitly removed.
However, I would like to label some entities as not being root entities. These should only exist as long as they are referenced by another entity that is itself ultimately referenced by a root entity.
As a concrete example, perhaps a User entity has many (possibly shared) Document entities. User is flagged root and exists until explicitly removed. Document is not flagged root, and only exists as long as one or more User entities reference it.
Thanks.

Core Data has delete rules. You can set a delete rule on the relationship between User and Document so that when a User is removed, any associated Document entities are deleted with it.
This is not "garbage collection", which implies releasing memory that is no longer in use. It is referential integrity management, which keeps the data store free of dangling entities. It is up to the developer to define the rules in the object model editor.
If the documents relationship on User specifies Cascade, and a Document is shared by several User entities, will it be deleted when any User referencing it is deleted? Or is it only deleted when the last User referencing it is deleted?
If you have a many-to-many relationship and set up a Cascade rule, then yes, the child is deleted as soon as the first parent is deleted. If you need to delete a child only when no references to it are left, you will need to handle that in code in a subclass of the parent entity. I would suggest looking at the -prepareForDeletion method.

Can I use a Lightweight Migration to move a Relationship in a Hierarchy?

I have a Core Data model that includes Document Entities and Quote Entities. There is a many-Quotes-to-one-Document Relationship in the model.
I am introducing a new type of Quote, so I would like to create a parent BaseQuote Entity, that will have TextQuote and ImageQuote 'child' Entities. The existing Quote will become a TextQuote.
So, I need to push the Quote side of the Relationship down the hierarchy into BaseQuote.
The lightweight migration documentation says that I can manage "changes to hierarchies" and "changes to relationships", but it is not clear whether it handles both at once.
If I check the mapping, Core Data thinks it is possible: inferredMappingModel does not throw an error:
NSMappingModel.inferredMappingModel(forSourceModel: lastVersion, destinationModel: thisVersion)
However, when I run the migration I get a crash with the message:
Validation error missing attribute values on mandatory destination relationship
It turns out the relationship is not being correctly populated by the migration - although structurally it seems to have worked.
Has anyone tried this before and got it working?

I think this is beyond lightweight migration. The page you link to explains that relationship changes include adding, deleting, renaming, and changing to-one to to-many or back. What you need is to move the relationship from one entity to a different one in the hierarchy, that is, take a relationship to Quote and move it to the new BaseQuote. It would probably be fine if you were changing the hierarchy and making one of those changes (renaming the relationship, for example). Lightweight migration doesn't cover re-targeting a relationship to a different part of the hierarchy, though.

A problem with understanding aggregates and aggregate roots in Domain Driven Design (DDD)

I've stumbled upon a problem: "I can't split my domain models into aggregate roots".
I'm a junior developer and novice at DDD. I really want to understand it, but sometimes it's really confusing.
From this point I want to describe my domain briefly.
My project lets users create any kind of document themselves. Users can create a new type of document. Each new type consists of its attributes. A user of the application can then create a concrete document based on a type. A user can also send the document for approval. The approval flow is different for each type.
So, we have the following models:
DocumentType / DocumentTemplate - acts as a template from which concrete documents are created. It has a one-to-many relationship with Document.
DocumentsAttribute - represents an attribute of a document. It has a many-to-many relationship with DocumentType.
AttributeValue - when a concrete document is created, it looks at its type and creates values for the attributes that the type has. Many-to-many relationship with Document and Attribute.
Document - represents a concrete document that is created by users.
There are other models, but I don't think they are relevant here.
As you can see, I apply the Entity Attribute Value (EAV) data model pattern here. You can see a diagram that shows relationships in the database.
And my problems are:
I have a lot of entities in my model besides the ones I have described.
I think that Document is definitely an aggregate root in my domain, because things such as ApprovalProcess, which is an aggregate, cannot live outside of it.
Here is the first question:
ApprovalProcess consists of its steps. Each step is an entity since it is mutable. A step has a state that can be changed. ApprovalProcess's state depends on its steps. Here we have a business invariant: "ApprovalProcess can be approved only if all its steps are approved".
I think that it is an aggregate root because it has the business invariant and contains entities that cannot live outside of it. And we don't want to allow direct access to its steps, in order to keep ApprovalProcess consistent.
Am I mistaken that ApprovalProcess is an aggregate root? Maybe it is just an aggregate?
Can one aggregate root exist within another one as its part? Does it mean that ApprovalProcess is just an aggregate because Document is responsible for access to its parts? But when an ApprovalProcess step is approved, Document delegates the operation to ApprovalProcess.
For example:
Document doc = new Document(...);
doc.SendForApproval(); // ApprovalProcess is created.
doc.ApproveStep(stepId); // Inside the method, Document delegates responsibility for the approval to ApprovalProcess.
Or should I keep Document and ApprovalProcess separate? Document would then refer to ApprovalProcess by identity, and we have the following scenario:
Document doc = documentRepository.Get(docId);
doc.SendForApproval(); // A domain event "DocumentCreatedEvent" is raised.
DocumentCreatedEventHandler:
ApprovalProcess approvalProcess = new ApprovalProcess(event.DocId); // ApprovalProcessCreatedEvent is raised
approvalProcessRepository.Add(approvalProcess);
approvalProcessRepository.UnitOfWork.Save(); // commit
But if ApprovalProcess's state changes, Document's state also changes. When ApprovalProcess is approved, Document is also approved. In other words, ApprovalProcess is a kind of part of Document's state. Only thanks to it can we know that Document is approved.
And the biggest problem that I'm experiencing:
DocumentType is also an aggregate root. It consists of its attributes and ApprovalScheme. I haven't mentioned ApprovalScheme until now on purpose, to keep my explanation as simple as possible. ApprovalScheme also consists of some entities. It's just an approval flow for DocumentType. ApprovalProcess is created according to the ApprovalScheme of the Document's DocumentType. ApprovalScheme cannot exist without DocumentType. It is a one-to-one relationship.
Document refers to its DocumentType by identity. Is that correct?
At the beginning of this task I thought that DocumentType should be a part of Document.
DocumentType has many Documents, but in my domain that doesn't make sense. It doesn't represent the state of DocumentType. DocumentType can be marked as deleted but can't be deleted.
Document and DocumentType are two different aggregate roots. Am I right?
Thank you so much if you have read this far. Thanks a lot for your attention and help!
Sorry for my terrible English.

Am I mistaken that ApprovalProcess is an aggregate root? Maybe it is
just an aggregate? Can one aggregate root exist within another one as
its part?
These questions don't make sense to me. An aggregate is a group of entities and value objects, where one of the entities is the parent of the group. The aggregate root is the parent entity of an aggregate. A particular case is when the aggregate is just one entity: the entity alone is the aggregate and is, of course, also its aggregate root.
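To make the definitions concrete, here is a minimal Java sketch that uses the names from the question. It assumes ApprovalProcess is modelled as the parent entity of a group of steps; the class ApprovalStep, the integer step ids and the method names are made up for illustration and are not the asker's actual design.

```java
import java.util.ArrayList;
import java.util.List;

// Child entity: only reachable through its aggregate root.
class ApprovalStep {
    private final int id;      // unique only within one ApprovalProcess
    private boolean approved;

    ApprovalStep(int id) { this.id = id; }

    int id() { return id; }
    boolean isApproved() { return approved; }
    void approve() { approved = true; }
}

// Aggregate root: the parent entity and the only entry point for changes.
class ApprovalProcess {
    private final List<ApprovalStep> steps = new ArrayList<>();

    ApprovalProcess(int stepCount) {
        for (int i = 1; i <= stepCount; i++) {
            steps.add(new ApprovalStep(i));
        }
    }

    // Callers name a step by its local id; they never hold the step itself.
    void approveStep(int stepId) {
        for (ApprovalStep step : steps) {
            if (step.id() == stepId) {
                step.approve();
                return;
            }
        }
        throw new IllegalArgumentException("Unknown step " + stepId);
    }

    // Invariant from the question: approved only if all steps are approved.
    boolean isApproved() {
        return !steps.isEmpty() && steps.stream().allMatch(ApprovalStep::isApproved);
    }
}
```

The steps list is never handed out, so the invariant can only be checked and changed through the root.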
I think that I would try to model your problem from another point of view: as a state machine.
I see ApprovalProcess as a flow that a document follows, not as an entity. I don't know the flow diagram of the process, but I guess that what you call "steps" would be the "states" a document can have during the process, with transitions between them. When you create a new document, it is at a starting step, and over its lifetime it passes from one step to another until it reaches a final step (e.g. document approved).
So the document entity would have behaviour that changes its state.
For example, in Java you can implement the state pattern (a state machine) with enums.
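A rough sketch of what that could look like. The state names and the allowed transitions below are assumptions, since the real approval flow is not described in the question.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical states; the question does not list the real approval steps.
enum DocumentState {
    DRAFT, IN_REVIEW, APPROVED, REJECTED;

    // Allowed transitions out of each state (assumed for illustration).
    Set<DocumentState> next() {
        switch (this) {
            case DRAFT:     return EnumSet.of(IN_REVIEW);
            case IN_REVIEW: return EnumSet.of(APPROVED, REJECTED);
            default:        return EnumSet.noneOf(DocumentState.class); // final states
        }
    }
}

class Document {
    private DocumentState state = DocumentState.DRAFT;

    DocumentState state() { return state; }

    void sendForApproval() { moveTo(DocumentState.IN_REVIEW); }
    void approve()         { moveTo(DocumentState.APPROVED); }
    void reject()          { moveTo(DocumentState.REJECTED); }

    // Every behaviour goes through this check, so illegal transitions fail fast.
    private void moveTo(DocumentState target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

With this shape, the document itself decides whether a transition is legal, and no separate ApprovalProcess entity is needed for the simple case.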

How to model an entity's current status in DDD

I am trying to get to grips with the ideas behind DDD and apply them to a pet project we have, and I am having some questions that I hope that someone here would be able to answer.
The project is a document management system. The particular problem we have regards two notions that our system handles: That of a Document and that of a DocumentStatus.
A Document has a number of properties (such as title, author, etc). Users can change any of the Document's properties throughout its lifetime.
A Document may, at any time, be in a particular state such as NEW, UNDER_REVISION, REVISED, APPROVED etc. For each state we need to know who made the change to that state.
We need to be able to query the system based on a document's status. Example queries would be "Get me all documents that are in the REVISED state" and "Get me all documents whose status has been changed by user X".
The only time that a Document and a DocumentStatus need to be changed in the same transaction is when the Document is created (create the document and at the same time assign it a status of NEW).
For all other times, the UI allows the update of either but not both (i.e. you may change a document's property such as the author, but not its state.) Or you can update its state (from NEW to UNDER_REVISION) but not its properties.
I think we are safe to consider that a Document is an Entity and an Aggregate Root.
We are baffled about what DocumentStatus is. One option is to make it a Value Object that is part of the Document's aggregate.
The other option is to make it an Entity and the root of its own aggregate.
We would also like to mention that we considered CQRS as described in various DDD documents, but we think it is too much of a hassle, especially given that we need to perform queries on the DocumentStatus.
Any pointers or ideas would be welcome.

Domain
You say you need to be able to see past status changes, so the status history becomes a domain concept. A simple solution would then be the following:
Define a StatusHistory within the Document entity.
The StatusHistory is a list of StatusUpdate value objects.
The first element in the StatusHistory always reflects the current state - make sure you add the initial state as StatusUpdate value object when creating Document entities.
Depending on how much additional logic you need for the status history, consider creating a dedicated value object (or even entity) for the history itself.
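The points above could be sketched in Java roughly as follows. The field names, the use of a plain String for the user, and the helper statusEverChangedBy are assumptions made for illustration only.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

enum Status { NEW, UNDER_REVISION, REVISED, APPROVED }

// Immutable value object: which status, who set it, and when.
final class StatusUpdate {
    final Status status;
    final String changedBy;
    final Instant changedAt;

    StatusUpdate(Status status, String changedBy, Instant changedAt) {
        this.status = status;
        this.changedBy = changedBy;
        this.changedAt = changedAt;
    }
}

class Document {
    private final String title;
    // The newest update is kept at index 0, so it always reflects the current status.
    private final List<StatusUpdate> statusHistory = new ArrayList<>();

    Document(String title, String author) {
        this.title = title;
        changeStatus(Status.NEW, author); // the initial state is recorded at creation
    }

    String title() { return title; }

    void changeStatus(Status status, String user) {
        statusHistory.add(0, new StatusUpdate(status, user, Instant.now()));
    }

    Status currentStatus() { return statusHistory.get(0).status; }

    List<StatusUpdate> statusHistory() {
        return Collections.unmodifiableList(statusHistory);
    }

    // In-memory version of "whose status has been changed by user X".
    boolean statusEverChangedBy(String user) {
        return statusHistory.stream().anyMatch(u -> u.changedBy.equals(user));
    }
}
```

In a real system the two example queries would run in the persistence layer rather than in memory; the methods here only show which data they need.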
Persistence
You don't really say what your persistence layer looks like, but I think querying against the first element of the StatusHistory list should be possible with any persistence mechanism. With a map-reduce data store, for example, create a view that is indexed by Document.StatusHistory[0] and use that view to implement the queries you need.

If you were only to record the current status, then that could well be a value object.
Since you're composing more qualifying (if not identifying) data into it, and you intend to query on that data as well, it sounds to me as if no DocumentStatus is like another, so a value object doesn't make much sense, does it?
It is identified by:
- the document
- the author
- the time it occurred
Furthermore, it makes even more sense in the context of the previous DocumentStatus (if you consider more states than just NEW and UNDER_REVISION).
To me, this clearly rules out modeling DocumentStatus as a value object.
In terms of the state as a property of DocumentStatus, and following the notion of everything is an object (currently reading David West's Object Thinking), then that could of course be modeled as a value object.

Domain Driven Design and local identity in an aggregate

In Domain Driven Design, an Aggregate Root holds references to internal entities.
The Aggregate Root is an entity with global identity (everyone is able to use its id). The Aggregate Root has links to local objects (entities).
Assume here that the entities are Hibernate @Entity classes (let's say).
Let's say we have an Aggregate Root "User" that contains an "Address" object (which is actually an entity as well).
The question is:
How is it possible to give local entities a local identity only? I mean, there are no barriers that prevent anyone from using local entities (like Address) by their IDs, so that identity is not local at all, but global. What, then, is the way to make it local?

Well, I don't think this is a matter of a public field or property, or of some access restriction mechanism. The way I see it, "local identity" means that objects outside of the aggregate boundary can't use that identity in a meaningful or useful way (e.g. they can't use it to retrieve that object, persist it to the database, or perform any other operation). That identity doesn't mean anything to the outside world and is only unique within that aggregate. As another example, what guarantees that objects outside of an aggregate boundary won't hold references to objects within it (which violates one of the principles of aggregates)? Nothing, unless those objects are VALUE OBJECTS, which might not be the case every time. To put it in a few words: don't create any public APIs that use the identities of objects within an aggregate. This makes it clear to developers that they should not use those IDs.

All entities, including the root, have an identity. The fact that only the identity of the aggregate root should be used "globally" is something that cannot be easily enforced by the code itself. In a relational database in particular, every table record will have some key, regardless of whether that record stores an aggregate root, an entity or a value object. As such, it is up to the developer to discern which database identities are part of the domain and which are not.

Entities within an aggregate root are supposed to have only local identity. For all intents and purposes the database table need not have a primary key. When the aggregate is hydrated, the entities within the AR should be fetched based on their link to the AR. But even that FK need not be represented in the local entity, since the connection is obvious from the containment of the local entities within the AR.
Most database systems will moan if there is no PK on a table, so you could add one for that reason alone and simply ignore it in your entity design. There would then be no property for the PK in the entity. The only way someone could get to that entity directly is by way of the DB, since there should be no way to do so in your code.
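As a sketch of that idea in plain Java (no Hibernate mapping shown), the local entity below has no id property at all and is addressed through a label that is unique only inside its User. The label scheme and the method names are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local entity: no id field, so code outside the aggregate has nothing to look it up by.
class Address {
    private String street;
    private String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    void change(String street, String city) {
        this.street = street;
        this.city = city;
    }

    String asLine() { return street + ", " + city; }
}

// Aggregate root: the only object with a globally usable identity.
class User {
    private final long id;
    // Labels such as "home" are unique only within this User (local identity).
    private final Map<String, Address> addresses = new LinkedHashMap<>();

    User(long id) { this.id = id; }

    long id() { return id; }

    void addAddress(String label, String street, String city) {
        addresses.put(label, new Address(street, city));
    }

    void changeAddress(String label, String street, String city) {
        Address address = addresses.get(label);
        if (address == null) {
            throw new IllegalArgumentException("No address labelled " + label);
        }
        address.change(street, city);
    }

    // Returns a plain value, never the Address entity itself.
    String addressLine(String label) {
        Address address = addresses.get(label);
        return address == null ? null : address.asLine();
    }
}
```

Any change to an address has to name the User first, which is what keeps the address identity local.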

Security question: how to secure Hibernate collections coming back from client to server?

I've got a simple POJO named "Parent" which contains a collection of "Child" objects.
In Hibernate/JPA, it's simply a one-to-many association. Children do not know their parent: these Child objects can have different types of Parent, so it is easier for them not to know the parent (think of a Child that represents a tag, where the parents can be different object types that have tags).
Now, I send my Parent object to the client view of my web site to allow the user to modify it.
For this, I use Hibernate/GWT/Gilead.
My user makes some changes and clicks the save button (ajax), which sends my Parent object to the server. Fields of my Parent have been modified but, more importantly, some Child objects have been added to or deleted from the collection.
To summarize, when the Parent object comes back to the server, its collection may now have:
- new "Child" objects whose id is null and which need to be persisted
- modified "Child" objects whose id is not null and which need to be merged
- potentially hacked "Child" objects whose id is not null but which were not originally owned by the Parent
- Child objects missing (deleted), which need to be deleted
How do you save the parent object (and its collection)? Do you load the parent's collection from the database and compare each object of the modified collection against it to make sure there is no hacked item?
Do you clear the old collection (to remove orphans) and re-add the new children (even though some Child objects have not been modified)?
thanks
PS: sorry for my English, I hope you have understood the concept ;)

Something in your stack has to supply the logic you are talking about, and given your circumstances it is probably you. You will have to get the current persisted state of the object by reading from your datasource so you can do the comparison. Bear in mind that, if several legitimate actions can update your parent object and its collection simultaneously you will have to take great care over defining your transaction grain and the thread-safe nature of your code.
This is not a simple problem by any means and there may well be framework features that can assist, but I am yet to find something which has solved this for any real world implementation I have encountered, especially where I have logic which tried to distinguish between legitimate and "hacked" data.
You may consider altering your architecture such that the parent and children are persisted in separate actions. It may not be appropriate in your case but you might be able to have a finer grain of transaction by splitting up the persistence actions and provide child-oriented security which makes your problem of hacking a little more manageable.
Good luck. I recommend you draw a detailed flow chart of your logic before you do too much coding.
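The comparison against the persisted state could be sketched like this. It is plain Java with no Hibernate calls: it assumes you have already loaded the ids of the children that the parent currently owns, and the Child and ChildDiff classes are simplified stand-ins invented for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal stand-in for the Child entity; id is null until it has been persisted.
class Child {
    final Long id;
    final String name;

    Child(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Result of comparing the client's collection with the ids stored for the parent.
class ChildDiff {
    final List<Child> toPersist = new ArrayList<>(); // new children (id == null)
    final List<Child> toMerge = new ArrayList<>();   // existing children owned by the parent
    final List<Child> rejected = new ArrayList<>();  // ids the parent never owned
    final Set<Long> toDelete = new HashSet<>();      // persisted ids missing from the client copy

    static ChildDiff compare(Set<Long> persistedIds, List<Child> incoming) {
        ChildDiff diff = new ChildDiff();
        diff.toDelete.addAll(persistedIds); // start by assuming everything was removed
        for (Child child : incoming) {
            if (child.id == null) {
                diff.toPersist.add(child);
            } else if (persistedIds.contains(child.id)) {
                diff.toMerge.add(child);
                diff.toDelete.remove(child.id); // still present, so keep it
            } else {
                diff.rejected.add(child);       // foreign id: possibly tampered with
            }
        }
        return diff;
    }
}
```

The load of persistedIds and the handling of the four result sets would need to happen inside one transaction, for the concurrency reasons mentioned above.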

The best solution I've found is to manage a manually created DTO. The DTO sends only the needed data to the client. For the fields I want to treat as read-only, I calculate a signature using a secret key, and I send that signature to the client with my DTO.
When my DTO comes back to the server, I check the signature to be sure that my read-only fields have not changed (I recalculate the signature from the returned fields and compare it to the signature that came back with the DTO).
This allows me to specify read-only fields and be sure that my objects have not been hacked.
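For illustration, a signature of this kind could be computed with an HMAC from the standard javax.crypto API. This is only a sketch under assumptions: the answer does not say which algorithm was used, and the class name ReadOnlyFieldSigner, the choice of HmacSHA256 and the length-prefix encoding are mine. The secret key stays on the server; only the signature travels with the DTO.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class ReadOnlyFieldSigner {
    private final SecretKeySpec key;

    ReadOnlyFieldSigner(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Signature over the read-only values, always passed in the same order.
    String sign(String... readOnlyValues) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            for (String value : readOnlyValues) {
                byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                // Length prefix so that ("ab", "c") and ("a", "bc") sign differently.
                mac.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                mac.update(bytes);
            }
            return Base64.getEncoder().encodeToString(mac.doFinal());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Recompute the signature from the returned values and compare it.
    boolean verify(String signature, String... readOnlyValues) {
        byte[] expected = sign(readOnlyValues).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }
}
```

On the way out the server would call sign(...) on the read-only values and attach the result to the DTO; on the way back it would call verify(...) with the returned values and reject the request if it returns false.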
