I have an scenario on which I am merging an object with a to-many relationship. My privacy policy is mergeByPropertyObjectTrumpMergePolicyType and I have also tried overwriteMergePolicyType. When the object gets merged, instead of taking the value of the relationships of the object in-memory and replacing the previous values, it just adds them to the set.
So far, I've managed to do some manual house-keeping to delete the old objects but this is far from ideal. Is there any way I can replace the set in disk exactly with the the set in memory and not have to clean it manually?
Related
I have created a webpage for doctors where there are doctors, patients, and diagnoses stored in the database. There is an option to delete either of these entries (e.g. delete a doctor instance), but in, order to avoid data loss, I want to archive the data instead of deleting it entirely. I have found 2 options to do these:
1 - Keep an archive collection (e.g. doctors-archive, patients-archive) to move the entry there from the original collection.
2 - Keep an attribute isDeleted inside the original collection, so when an entry is deleted, isDeleted will become true, and when fetching the entries with isDeleted=true will not be returned.
Both of these options have their drawbacks - 1st option makes it hard to keep relations, as the patient and doctor have relations and if one of them is deleted the relation will be lost. The second option makes the original collection too heavy as the data will never be removed from it.
Is there a better option than these 2 to store archived data? If not, which of these options are better?
It's a very good discussion. I had the same confusion in one of my projects, and after a long research I found a solution that fit well my needs.
First of all, I found that the suppression is not reputed, developers avoid this action to avoid data loss wrongly.
Furthermore, the suppression of data has another disadvantages:
It will lead to having more data into your system, because you will create new models & collections (e.g., doctors-archive model for doctors model).
This will have a significant impact on the performance of the system: the suppression will be done after 3 (and may be more) queries:
find(): to select the object to be deleted.
insert(): to create an archive instance for the object to be deleted.
delete(): to delete the object.
Note that we can not use the findOneAndDelete() method since we have to create an archive instance before !
Contrariwise, the second option can be done by 1 query: findAndModify().
But, the best solution is not the first one, nor the second one ! But a combination of both of them: it consists of cleaning your data after a specific period (every 4 months for example).
In other worlds, when deleting an instance, you apply the 2nd solution (i.e., isDeleted = true), but after 4 months, you can delete this instance from the collection and keep a copy in the archive (also called backup). This solution will prevent the original collection from being too heavy.
Note that you can also have a separately backup database to avoid use of the originale database.
Both of these options have their drawbacks - 1st option makes it hard to keep relations, as the patient and doctor have relations and if one of them is deleted the relation will be lost. The second option makes the original collection too heavy as the data will never be removed from it. => What kind of relationship are you talking about?
If you still need a document post it is archived there has to be a middle ground there and by the why in those scenarios the data isn't lost you still have the power to do $lookup.
Based on my experience the option 2 has way too many negatives, it leads often to range index scans and on top of it you are maintaining extra burden for the indexes too where isDeleted will become part of every index you create.
TLDR; Definitely option 1 and if you need to query on archived content then use $lookup.
There are some techniques for mongodb which you can use to manage data growth in MongoDB:
Using capped collection
Using TTLs
Using multiple collections for months( rename old one and keep data in new one only)
After adding a new Core Data model version to my app, I performed a lightweight migration, apparently successfully. The migrated file loaded fine, but upon the first attempt to access an attribute via a particular relationship, the app crashes with an NSRangeException: '*** -[__NSArrayM objectAtIndex:]: index 4294967295 beyond bounds [0 .. 35]'. This relationship worked fine prior to the migration. I know from other posts here that 4294967295 is really -1, but the only thing I can identify with 36 items in my app/data is that there are 36 total entities in the data model (for reference, the relationship that's being fetched has 58 items in its table).
The question:
My question is: based on the error I'm getting and the troubleshooting I've done below, is there a type of schema change that could pass the lightweight migration, but corrupt the data along the way, leading to the noted exception? I'm going to try breaking down the migration into smaller chunks over several versions to either isolate or avoid the issue, but it would be nice to be able to focus on specific schema changes that might be at fault.
The failure:
The failure occurs with the following code in "myobject":
[[self object2] text];
The object2 relationship is to-one, non-optional both ways and neither the forward nor inverse relationship was changed between data models. The text attribute is likely not relevant because when the error occurs, awakeFromFetch is not reached in object2. If I assign [self object2] to a variable prior to the above statement, the assignment is successful and reports data: <fault>.
The database:
Looking at the database in sqlite3, I notice the following:
The index values for the forward and inverse relationships appear to be correct in each table.
The object2 table has two columns for the inverse relationship instead of the one prior to migration (ZMYOBJECT as before and the additional Z2_MYOBJECT, which is empty for all rows). No other relationship were added to explain this column.
In the Z_PRIMARYKEY table, all entries post-migration show -1 for Z_MAX, whereas prior to migration they showed zero for empty tables and the maximum row number for populated tables. Manually updating Z_MAX to the proper values did not help with the exception. All Z_SUPER values were correct.
I set up a mapping model to see if anything looked awry with the automatic mappings, but everything looked fine.
Overall schema changes:
In the source version of the data model, there were fourteen entities, of which only four had been populated with data (the app is still in development). Seven were top-level entities and seven were sub-entities of three of the top-level entities.
In the target version of the data model, twenty-two entities were added, some top-level and some sub-entities, with dozens of relationships, including some added to existing entities.
Some attributes and relationships were removed from existing entities and others were added. No data types or relationship settings were changed, no attributes or relationships were renamed, and no special mappings were required.
Update (2/25/12): As I started working on a new intermediate model, I remembered that I had changed the class (representedClassName) for a number of entities from NSManagedObject to an NSManagedObject subclass, but hadn't generated the class files. I didn't suspect that would cause an issue and, indeed, creating all of the class files did not help with the exception. I just wanted to note that as another change between models.
Conclusions:
This is a wild guess, but if the 36 entity count is not a coincidence, it seems that when "myobject" attempts to fault in "object2" it does not have a valid reference for the table and is attempting to load table number -1, causing the exception. The fact that a simple assignment of [self object2] is successful, however, doesn't jibe with that conclusion.
Any ideas?
By working through several incremental migrations I was able to determine what is causing the issue, and a solution.
The problem:
One of the existing entities with data has no child entities in the current model. If I create a new model that simply adds a child entity, containing no attributes or relationships, and makes no other changes, the NSRangeException, Z_MAX observation, and doubling of the inverse relationship noted in my question all occur.
The solution:
After observing the failures following a "successful" lightweight migration for the case above, I created a mapping model. Since the only change was one additional entity, all but one of the entity mappings were straightforward. The question was what to do with the single added entity.
By default, the added entity with no attributes or relationships of its own was showing attribute and relationship mappings for all of the parent's properties. All of the mappings had empty value expressions by default, which I assumed meant that it would just skip them during the migration. Not true, apparently. By deleting all of the attribute and relationship mappings within the entity mapping and then turning off inferred mapping, the migration proceeded successfully.
I still have to tackle all of the remaining entities and will be trying this approach to do the rest in bulk, with all planned attributes and relationships intact.
Your posts were helpful when I encountered this problem. Thank you. [Have you reported the bug yet?]
Here are some more experimental results but, alas, not a great solution.
My schema change similarly added an entity subtype that has no additional attributes or relationships. The error message is the same as yours except the bounds are [0 .. 19]. That does correspond to 20 entity types, validating your hypothesis. Like your situation, the error happened when attempting to access an entity property after migration completed.
Adding a dummy attribute and a dummy self-relationship to the new entity type didn't avoid the post-migration crash. (However, I didn't test with that new entity type as the only schema change since I previously pushed that schema change to alpha testers.)
I observe the Z2_MYOBJECT column and Z_PRIMARYKEY.Z_MAX = -1 symptoms after successful migrations for other schema changes, so those may not be problematic at all. The -1 values get replaced lazily by the proper max values. The extra column might be used during migration.
In my case, the new entity's supertype has an ordered to-many relationship. In the very simple case where the entire data store contains just one object instance (an instance of that entity type with no outgoing relationship links), the schema migration succeeds. It does have the extra Z2_MYOBJECT column and Z_PRIMARYKEY.Z_MAX = -1 values and yet the resulting data store works fine when adding objects from there.
I tried creating a mapping model but was unsuccessful in getting Core Data to apply it. Turning off inferred mapping just made Core Data unable to migrate at all. Is there a trick to it? Do I have to write custom migration code to invoke a mapping model? This is Xcode 4.6.2 so the older bug is long gone.
When using git to roll the code & data model backwards or forwards to conduct an experiment, it seems to be necessary to (1) close & reopen the Xcode project and (2) do a clean build. Otherwise Xcode may crash and/or leave confounding state around.
To experimentally roll backwards, you must delete the .momd/ directory or the entire app from the target iOS simulator/device (or deploy the app via iTunes or TestFlight) since redeploying via Xcode won't remove obsolete files (like .mom and .omo data model definitions) which in turn lets the app do lightweight migrations that the actual deployed app can't do.
About the entity mapping to use for the added entity type, note that when Core Data applies a mapping model, it's copying entities from the old data store to a new one. It's not modifying the tables in place. You don't want it to "skip" properties (including inherited properties) unless you want to drop them.
However, since the schema change added an entity type, that entity has no instances to migrate so its custom mapping model rules do not matter.
Thus I wonder if something else caused your crashes to stop, like leftover experimental .mom files or custom migration code. Did your workaround hold up?
After 2 days of experimenting I decided my alpha testers would have to live without data migration this time. Fortunately this happened without production customers. But it doesn't give me confidence in Core Data.
I had the same sort of NSRangeException after adding a core data model version when accessing any instance of a particular entity after automatic lightweight migration. In my case also the range corresponded to the number of entities in my model.
I generated a mapping model with Xcode 4.6 (4H127) using File > New > File... and then selecting Core Data > Mapping Model. This caused the crash to (d)evolve into -[NSSymbolicExpression length]: unrecognized selector sent to instance...
Solution
The issue in my case was that my entity causing the original crash had a relationship named size, which is a reserved word listed in apple's Predicate Programming Guide. An examination of the mapping model revealed that the reserved word had been capitalized in the Value Expression for the relationship:
FUNCTION($manager, "destinationInstancesForEntityMappingNamed:sourceInstances:" , "PNSizeOptionToPNSizeOption", $source.SIZE)
I found the solution in Core Data Model Versioning and Data Migration Programming Guide:
Reserved words in custom value expressions: If you use a custom value
expression, you must escape reserved words such as SIZE, FIRST, and
LAST using a # (for example, $source.#size).
Unfortunately, Xcode's algorithm for generating the mapping model did not recognize the reserved word and I had to change the expression's key path in the Relationship Mapping inspector to $source.#size. This solved the problem. I assume that core data's inferred mapping model ran into a similar problem during lightweight migration.
There may be other causes of this kind crash and so this solution may not apply, but it may be worth checking the property names in your model against the list of reserved words in the Predicate Programming Guide.
I would like to make a backup copy of my Core Data database, without using either the File Manager to make a copy of the sqlite file, or using the Persistent Store Coordinator's migratePersistentStore method (for reasons that are too long to explain here). What I want to do is to open a new persistent store with the same MOMD as the original file, create a new Managed Object Context, then iterate over all the objects in my database and insert them into the new context.
This will work for simple entities, but the problem is that my model has about 20 entities, many of them with one-to-many and many-to-many relationships. The slightly more complex solution would be to insert every object into the new MOC, and then hold all the new Managed Objects in memory and use them to tie up all the relationships between the objects in a subsequent passes. But it seems like a really messy solution.
Is there a clean, elegant way to achieve this, that might work for any kind of data model, not just a customized solution for my own model, and without having to hold every object in memory at the same time?
Thanks.
Copying the persistent store is far and above the easiest way to do this – I'd suggest revisiting your reasons against it or explaining what they are.
Copying objects from one context to another – from one on-disk persistent store to another – doesn't necessarily hold them all in memory at the same time. Core Data can turn them into faults.
Does anyone have an example of how to efficiently provide a UITableView with data from a Core Data model, preferable including the use of sections (via a referenced property), without the use of NSFetchedResultsController?
How was this done before NSFetchedResultsController became available? Ideally the sample should only get the data that's being viewed and make extra requests when necessary.
Thanks,
Tim
For the record, I agree with CommaToast that there's at best a very limited set of reasons to implement an alternative version of NSFetchedResultsController. Indeed I'm unable to think of an occasion when I would advocate doing so.
That being said, for the purpose of education, I'd imagine that:
upon creation, NSFetchedResultsController runs the relevant NSFetchRequest against the managed object context to create the initial result set;
subsequently — if it has a delegate — it listens for NSManagedObjectContextObjectsDidChangeNotification from the managed object context. Upon receiving that notification it updates its result set.
Fetch requests sit atop predicates and predicates can't always be broken down into the keys they reference (eg, if you create one via predicateWithBlock:). Furthermore although the inserted and deleted lists are quite explicit, the list of changed objects doesn't provide clues as to how those objects have changed. So I'd imagine it just reruns the predicate supplied in the fetch request against the combined set of changed and inserted records, then suitably accumulates the results, dropping anything from the deleted set that it did previously consider a result.
There are probably more efficient things you could do whenever dealing with a fetch request with a fetch limit. Obvious observations, straight off the top of my head:
if you already had enough objects, none of those were deleted or modified and none of the newly inserted or modified objects have a higher sort position than the objects you had then there's obviously no changes to propagate and you needn't run a new query;
even if you've lost some of the objects you had, if you kept whichever was lowest then you've got an upper bound for everything that didn't change, so if the changed and inserted ones together with those you already had make more then enough then you can also avoid a new query.
The logical extension would seem to be that you need re-interrogate the managed object context only if you come out in a position where the deletions, insertions and changes modify your sorted list so that — before you chop it down to the given fetch limit — the bottom object isn't one you had from last time. The reasoning being that you don't already know anything about the stored objects you don't have hold of versus the insertions and modifications; you only know about how those you don't have hold of compare to those you previously had.
We have a decent sized object-oriented application. Whenever an object in the app is changed, the object changes are saved back to the DB. However, this has become less than ideal.
Currently, transactions are stored as a transaction and a set of transactionLI's.
The transaction table has fields for who, what, when, why, foreignKey, and foreignTable. The first four are self-explanatory. ForeignKey and foreignTable are used to determine which object changed.
TransactionLI has timestamp, key, val, oldVal, and a transactionID. This is basically a key/value/oldValue storage system.
The problem is that these two tables are used for every object in the application, so they're pretty big tables now. Using them for anything is slow. Indexes only help so much.
So we're thinking about other ways to do something like this. Things we've considered so far:
- Sharding these tables by something like the timestamp.
- Denormalizing the two tables and merge them into one.
- A combination of the two above.
- Doing something along the lines of serializing each object after a change and storing it in subversion.
- Probably something else, but I can't think of it right now.
The whole problem is that we'd like to have some mechanism for properly storing and searching through transactional data. Yeah you can force feed that into a relational database, but really, it's transactional data and should be stored accordingly.
What is everyone else doing?
We have taken the following approach:-
All objects are serialised (using the standard XMLSeriliser) but we have decorated our classes with serialisation attributes so that the resultant XML is much smaller (storing elements as attributes and dropping vowels on field names for example). This could be taken a stage further by compressing the XML if necessary.
The object repository is accessed via a SQL view. The view fronts a number of tables that are identical in structure but the table name appended with a GUID. A new table is generated when the previous table has reached critical mass (a pre-determined number of rows)
We run a nightly archiving routine that generates the new tables and modifies the views accordingly so that calling applications do not see any differences.
Finally, as part of the overnight routine we archive any old object instances that are no longer required to disk (and then tape).
I've never found a great end all solution for this type of problem. Some things you can try is if your DB supports partioning (or even if it doesn't you can implement the same concept your self), but partion this log table by object type and then you can further partion by date/time or by your object ID (if your ID is a numeric this works nicely not sure how a guid would partion).
This will help maintain the size of the table and keep all related transactions to a single instance of an object to itself.
One idea you could explore is instead of storing each field in a name value pair table, you could store the data as a blob (either text or binary). For example serialize the object to Xml and store it in a field.
The downside of this is that as your object changes you have to consider how this affects all historical data if your using Xml then there are easy ways to update the historical xml structures, if your using binary there are ways but you have to be more concious of the effort.
I've had awsome success storing a rather complex object model that has tons of interelations as a blob (the xml serializer in .net didn't handle the relationships btw the objects). I could very easily see myself storing the binary data. A huge downside of storing it as binary data is that to access it you have to take it out of the database with Xml if your using a modern database like MSSQL you can access the data.
One last approach is to split the two patterns, you could define a Difference Schema (and I assume more then one property changes at a time) so for example imagine storing this xml:
<objectDiff>
<field name="firstName" newValue="Josh" oldValue="joshua"/>
<field name="lastName" newValue="Box" oldValue="boxer"/>
</objectDiff>
This will help alleviate the number of rows, and if your using MSSQL you can define an XML Schema and get some of the rich querying ability around the object. You can still partition the table.
Josh
Depending on the characteristics of your specific application an alternative approach is to keep revisions of the entities themselves in their respective tables, together with the who, what, why and when per revision. The who, what and when can still be foreign keys.
Although I would be very careful to use this approach, since this is only viable for applications with a relatively small amount of changes per entity/entity type.
If querying the data is important I would use true Partitioning in SQL Server 2005 and above if you have enterprise edition of SQL Server. We have millions of rows partitioned by year down to day for the current month - you can be as granular as your application demands with a maximum number of 1000 partitions.
Alternatively , if you are using SQL 2008 you could look into filtered indexes.
These are solutions that will enable you to retain the simplified structure you have whilst providing the performance you need to query that data.
Splitting/Archiving older changes obviously should be considered.