Core Data: make object-by-object copy of database

I would like to make a backup copy of my Core Data database, without either using the File Manager to make a copy of the sqlite file or using the Persistent Store Coordinator's migratePersistentStore method (for reasons that are too long to explain here). What I want to do is open a new persistent store with the same MOMD as the original file, create a new Managed Object Context, then iterate over all the objects in my database and insert them into the new context.
This will work for simple entities, but the problem is that my model has about 20 entities, many of them with one-to-many and many-to-many relationships. The slightly more complex solution would be to insert every object into the new MOC, hold all the new Managed Objects in memory, and use them to tie up all the relationships between the objects in a subsequent pass. But that seems like a really messy solution.
Is there a clean, elegant way to achieve this that would work for any kind of data model, not just as a customized solution for my own model, and without having to hold every object in memory at the same time?
Thanks.

Copying the persistent store is far and away the easiest way to do this; I'd suggest revisiting your reasons against it, or explaining what they are.
Copying objects from one context to another (from one on-disk persistent store to another) doesn't necessarily mean holding them all in memory at the same time. Core Data can turn them back into faults as you go.
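If you do go the object-copy route, here's a minimal, model-agnostic Swift sketch of that two-pass idea. It's written generically against the model, so it isn't tied to any particular schema; sourceContext and targetContext are placeholders, and ordered to-many relationships would need extra handling:

import CoreData

// A model-agnostic, two-pass copy. Assumes both contexts were created
// against the same NSManagedObjectModel. The objectID map keeps every
// new object registered until the final save; for very large stores
// you'd save and re-fault the target side in batches too.
func copyAllObjects(from sourceContext: NSManagedObjectContext,
                    to targetContext: NSManagedObjectContext,
                    model: NSManagedObjectModel) throws {
    var copies: [NSManagedObjectID: NSManagedObject] = [:]

    // Pass 1: copy attribute values only.
    for entity in model.entities where !entity.isAbstract {
        let request = NSFetchRequest<NSManagedObject>(entityName: entity.name!)
        request.includesSubentities = false  // so each object is copied exactly once
        for source in try sourceContext.fetch(request) {
            let copy = NSEntityDescription.insertNewObject(forEntityName: entity.name!,
                                                           into: targetContext)
            for name in source.entity.attributesByName.keys {
                copy.setValue(source.value(forKey: name), forKey: name)
            }
            copies[source.objectID] = copy
            sourceContext.refresh(source, mergeChanges: false)  // re-fault to save memory
        }
    }

    // Pass 2: wire up relationships via the objectID map.
    for (sourceID, copy) in copies {
        let source = try sourceContext.existingObject(with: sourceID)
        for (name, rel) in source.entity.relationshipsByName {
            if rel.isToMany {
                // (ordered to-many relationships would need NSOrderedSet handling)
                let targets = source.value(forKey: name) as? Set<NSManagedObject> ?? []
                copy.setValue(Set(targets.compactMap { copies[$0.objectID] }), forKey: name)
            } else if let target = source.value(forKey: name) as? NSManagedObject {
                copy.setValue(copies[target.objectID], forKey: name)
            }
        }
        sourceContext.refresh(source, mergeChanges: false)
    }

    try targetContext.save()
}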

Related

Core Data in-memory saving

I am building an app which uses a TabBarController and has multiple views showing the SAME data, but in different ways. One of the views is a TableView and the other is a Map view.
The data comes from a server, and I would like a way to store it so that it is accessible from multiple view controllers (a "single source of truth"). I believe that Core Data is a good choice, especially because I find the NSFetchedResultsController class rather convenient to work with when dealing with table views.
The data only needs to be around while the app is being used, so I am thinking about using Core Data without actually saving anything to the disk. I saw that there exists an In-memory store type which I believe is what I need. However, I found that just by inserting a new entity into my context (not yet calling context.save()), the NSFetchedResultsController can already detect the changes and update my UI.
Question 1:
Is it really necessary to call context.save() when using the in-memory store type?
I believe it might be necessary in the case of multiple contexts.
Question 2:
If it is not necessary to call context.save(), does it even matter what persistent store type I use?
Any help is appreciated!
Question 1: Is it really necessary to call context.save() when using the in-memory store type?
I believe it might be necessary in the case of multiple contexts.
That's correct. If you use the common pattern of having one context for the UI and a different one to handle incoming network data, you'll need to save changes for updates from one context to be available in the other.
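For instance, a minimal sketch of that pattern, assuming an NSPersistentContainer named container is already set up:

let viewContext = container.viewContext
viewContext.automaticallyMergesChangesFromParent = true  // pick up background saves

container.performBackgroundTask { backgroundContext in
    // ... insert/update managed objects from the network response here ...
    do {
        // Without this save, the view context (and any
        // NSFetchedResultsController on it) never sees the changes.
        try backgroundContext.save()
    } catch {
        print("Background save failed: \(error)")
    }
}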
Question 2: If it is not necessary to call context.save(), does it even matter what persistent store type I use?
If you use an in-memory store, your persistent store type is NSInMemoryStoreType. It matters in that choosing the right store type is how you get it to be an in-memory store.
Keep in mind that using an in-memory store means that users won't be able to use the app offline in any way. Whether that's important depends on your app, but it can be useful to let people view older data when they don't have a network connection.
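For reference, a minimal sketch of setting that up with NSPersistentContainer ("Model" stands in for your actual model name):

import CoreData

let container = NSPersistentContainer(name: "Model")
let description = NSPersistentStoreDescription()
description.type = NSInMemoryStoreType  // this is the part that matters
container.persistentStoreDescriptions = [description]
container.loadPersistentStores { _, error in
    if let error = error {
        fatalError("Failed to load in-memory store: \(error)")
    }
}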

Delete orphaned shared ValueObjects?

I'm still trying to understand DDD. Let's say I have an immutable value object (VO), PresentationSlide. Because it is immutable, it can easily be shared and used by different entities.
But what happens if I try to map this to a database? In Entity Framework I could model the PresentationSlide as a ComplexType, meaning the properties of the shared PresentationSlide are mapped into the tables using it. This is fine, but a slide might be quite large, and therefore I'm wasting space if it is used/referenced several times.
As an alternative approach I could map a PresentationSlide to a separate table and reference it. Because it is immutable this should also work. But now, if I modify a Presentation, I have to copy the content of the old PresentationSlide and create a new instance. If there are a lot of changes, I will soon have a lot of orphaned PresentationSlides in my database.
Isn't this a problem? Is there a solution?
Should I implement a custom periodic "clean up orphaned PresentationSlides" task?
First you should think about ownership and life cycle of the PresentationSlide within your domain model. Always make sure you follow your model semantics when doing performance or storage optimization.
I'd start out with duplicating each PresentationSlide within its entity. This is just the natural way to do it, because that's what you do in your domain model, too.
Put metrics in place that enable you to make an informed decision about the storage optimization. For example, go with the duplication approach and calculate the space wasted due to duplicates after some time in production.
Only if you really have a problem should you do the storage optimization. Otherwise you've become a victim of premature optimization.
If you really need to make an optimization, I think the approach you mention is a reasonable one. Create a separate table, and regularly clean up PresentationSlides that are not referenced anymore.

How can I execute a piece of code after doing an automatic lightweight migration?

I have a Core Data model, and in a new version of the model I want to delete an entity. When doing so (or afterwards), I would like to execute some code (say, saving the old entities to an XML file or similar, using an Objective-C snippet).
I know that a migration that only deletes an entity is easy and can be done automatically using lightweight migration, but I'm not sure how to execute code as part of it.
Perhaps I have no choice and have to create something more complex, like a mapping model using NSRemoveEntityMappingType, subclassing NSEntityMigrationPolicy, and overriding something like beginEntityMapping:manager:error: or maybe createDestinationInstancesForSourceInstance:entityMapping:manager:error:. I'm not sure about that last one, though, because there is no destination entity, as what I want is to remove it.
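Something like this Swift sketch is what I have in mind, although I haven't verified that a custom policy can be attached to a mapping whose entity is being removed (archiveToXML(_:) is a hypothetical helper):

import CoreData

// Attach this policy (by class name) to the mapping for the removed
// entity in a custom mapping model. archiveToXML(_:) is a hypothetical
// helper that writes the old entity's data out to a file.
class ExportThenRemovePolicy: NSEntityMigrationPolicy {
    override func createDestinationInstances(forSource sInstance: NSManagedObject,
                                             in mapping: NSEntityMapping,
                                             manager: NSMigrationManager) throws {
        archiveToXML(sInstance)
        // Deliberately no destination instance is created here:
        // the entity is being removed, so there is nothing to map to.
    }
}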
Any thoughts are welcome.
Thanks a lot for your time.
Ricardo.

Why do I have extraneous objects in my Core Data object?

I have an app which uses Core Data. I have gone through several iterations of defining the entities and their respective attributes. Now I pretty much have it finalized, looking like this:
I deleted the old SQLite database, re-ran the program (which creates a new SQLite database), and it looks like this (using SQLite Database Browser). The areas highlighted in yellow are the ones that don't belong there (IMHO)... how do I clear the old junk out of there when the SQLite database is rebuilt from Core Data?
The motivation is quite simple.
When you use entity inheritance, Core Data, under the hood, creates a single (relational) table that has all the properties of the parent entity as well as those of its child (or children).
Although this feature is very useful, you should be aware of such a mechanism to avoid performance penalties.
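For illustration, here is a hedged Swift sketch of what inheritance looks like when you build the model in code; the entity and attribute names are made up:

import CoreData

// Two entities built in code, with Employee as a sub-entity of Person.
let person = NSEntityDescription()
person.name = "Person"
let nameAttr = NSAttributeDescription()
nameAttr.name = "name"
nameAttr.attributeType = .stringAttributeType
person.properties = [nameAttr]

let employee = NSEntityDescription()
employee.name = "Employee"
let salaryAttr = NSAttributeDescription()
salaryAttr.name = "salary"
salaryAttr.attributeType = .doubleAttributeType
employee.properties = [salaryAttr]

// This line is what triggers the single-table mapping: the SQLite file
// gets one ZPERSON table with ZNAME and ZSALARY columns (plus a Z_ENT
// discriminator), so plain Person rows carry an unused ZSALARY column.
person.subentities = [employee]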
Anyway, you should not work with the database created for you. You should think only in terms of the object graph; it will simplify your life.
Hope that helps.

What is the best way to store and search through object transactions?

We have a decent-sized object-oriented application. Whenever an object in the app is changed, the object changes are saved back to the DB. However, this has become less than ideal.
Currently, transactions are stored as a transaction and a set of transactionLI's.
The transaction table has fields for who, what, when, why, foreignKey, and foreignTable. The first four are self-explanatory. ForeignKey and foreignTable are used to determine which object changed.
TransactionLI has timestamp, key, val, oldVal, and a transactionID. This is basically a key/value/oldValue storage system.
The problem is that these two tables are used for every object in the application, so they're pretty big tables now. Using them for anything is slow. Indexes only help so much.
So we're thinking about other ways to do something like this. Things we've considered so far:
- Sharding these tables by something like the timestamp.
- Denormalizing the two tables and merging them into one.
- A combination of the two above.
- Doing something along the lines of serializing each object after a change and storing it in Subversion.
- Probably something else, but I can't think of it right now.
The whole problem is that we'd like some mechanism for properly storing and searching through transactional data. Yeah, you can force-feed that into a relational database, but really, it's transactional data and should be stored accordingly.
What is everyone else doing?
We have taken the following approach:
All objects are serialised (using the standard XmlSerializer), but we have decorated our classes with serialisation attributes so that the resultant XML is much smaller (storing elements as attributes and dropping vowels from field names, for example). This could be taken a stage further by compressing the XML if necessary.
The object repository is accessed via a SQL view. The view fronts a number of tables that are identical in structure but have a GUID appended to the table name. A new table is generated when the previous table reaches critical mass (a pre-determined number of rows).
We run a nightly archiving routine that generates the new tables and modifies the views accordingly so that calling applications do not see any differences.
Finally, as part of the overnight routine we archive any old object instances that are no longer required to disk (and then tape).
I've never found a great end-all solution for this type of problem. One thing you can try, if your DB supports partitioning (and even if it doesn't, you can implement the same concept yourself), is to partition this log table by object type, and then further partition by date/time or by object ID (if your ID is numeric this works nicely; I'm not sure how a GUID would partition).
This will help keep the size of the table down and keep all the transactions for a single object instance together.
One idea you could explore: instead of storing each field in a name/value pair table, you could store the data as a blob (either text or binary). For example, serialize the object to XML and store it in a field.
The downside of this is that as your object changes, you have to consider how the change affects all your historical data. If you're using XML, there are easy ways to update the historical XML structures; if you're using binary, there are ways too, but you have to be more conscious of the effort.
I've had awesome success storing a rather complex object model that has tons of interrelations as a blob (the XML serializer in .NET didn't handle the relationships between the objects). I could very easily see myself storing the binary data. A huge downside of storing it as binary data is that to access it you have to take it out of the database; with XML, if you're using a modern database like MSSQL, you can access the data in place.
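As a rough illustration of the XML-versus-binary trade-off (in Swift here, using PropertyListEncoder as a stand-in for the .NET serializers; the Snapshot type is made up):

import Foundation

struct Snapshot: Codable {   // made-up stand-in for a real domain object
    var firstName: String
    var lastName: String
}

let encoder = PropertyListEncoder()
encoder.outputFormat = .xml      // readable, and queryable as text in the DB
// encoder.outputFormat = .binary // smaller, but opaque until pulled out
let blob = try encoder.encode(Snapshot(firstName: "Josh", lastName: "Box"))
// Store `blob` in a single blob/text column alongside who/what/when/why.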
One last approach is to split the difference between the two patterns: you could define a difference schema (and I assume more than one property changes at a time), so for example imagine storing this XML:
<objectDiff>
<field name="firstName" newValue="Josh" oldValue="joshua"/>
<field name="lastName" newValue="Box" oldValue="boxer"/>
</objectDiff>
This will help cut down the number of rows, and if you're using MSSQL you can define an XML schema and get some of the rich querying ability around the object. You can still partition the table.
Josh
Depending on the characteristics of your specific application an alternative approach is to keep revisions of the entities themselves in their respective tables, together with the who, what, why and when per revision. The who, what and when can still be foreign keys.
I would be very careful with this approach, though, since it is only viable for applications with a relatively small number of changes per entity/entity type.
If querying the data is important, I would use true partitioning, available in SQL Server 2005 and above if you have Enterprise Edition. We have millions of rows partitioned by year, down to day for the current month; you can be as granular as your application demands, with a maximum of 1,000 partitions.
Alternatively, if you are using SQL Server 2008 you could look into filtered indexes.
These are solutions that will enable you to retain the simplified structure you have whilst providing the performance you need to query that data.
Splitting off or archiving older changes should obviously also be considered.
