Simply put, I'm trying to update a list of related entities; they have composite primary keys and an additional unique constraint on an ordinal column, something like RowNumber. Now, when making more complex edits to the collection, I can't just issue updates to each row, because the constraint would be violated along the way - when swapping two entities with consecutive RowNumbers, for example (let's put aside the validity of that constraint for now; that's not the point).
So I thought, well in this case it's easy, I'll just delete the entities and recreate them. Except that EF seems to detect that the same entity is inserted after it's deleted (based on its primary key which stays the same), so it transforms the delete+insert into an update. At least that's what I'm seeing with the profiler after making absolutely certain the old and new entities are in the deleted and added states, respectively.
Normally I'd say that's a pretty logical thing to do but... Is there any way to alter this behavior? Is this really what's happening or am I missing something?
In the end I was forced to assume this is indeed what's happening, and it sort of makes sense as an optimization... sort of. I changed my DB model to always use surrogate keys (that's what EF works best with anyway), and this prevents EF from identifying the deleted and inserted entities as being the exact same thing, which means the delete+insert is not replaced with an update. I suppose that's not too bad, even if it does mean the surrogate key will keep increasing with each such edit.
I have two aggregates, 'Notebook' and 'Note'.
When I follow the rule 'aggregates reference each other only by their ids', I think I have two options:
Notebook(List<NoteId>, [other properties])
Note([other properties])
or
Notebook([other properties])
Note(NotebookId, [other properties])
With the first option, I need two DB calls to show all notes of a notebook (one to get the list and the second to load the notes).
So my current favorite is the second option. Now I have a few ideas in mind for saving the order of the notes, each of which has some disadvantages.
What is a good approach to solve my problem? Or is the first option better and the two DB calls are negligible?
Can anybody help?
Big THX
It looks like the order of the Notes is important, at least in relation to the Notebook, so maybe it should be part of the domain. If so, I would suggest storing it together with the Note, or using some other information of the Note to derive an ordering when a list is loaded.
If not, why is the order relevant? I mean, the two entities have related but separate lifecycles, or at least so it seems: one aggregate - the Notebook - has a list that only references the other - the Note. Hence no direct interaction is planned. But, assuming the domain is correctly modelled (there's not enough information here to judge), somewhere you need an ordered list of Notes. The only way to have it as you need it is to store that information (or use something already stored); otherwise the hypothesis that the order is relevant no longer holds.
Update after the info about the number of Notes and their size
It looks like your domain is organized this way:
a root entity, the Notebook, which stores the order of each Note together with just its ID: any change to the order is made from here, not from the Note
another root entity, the Note, with its own lifecycle and its own 'actions' (operations that trigger a change in the entity)
Whenever you load the Notebook, you must also load the Notes and their order to show them correctly ordered. On the other side, when you change the order, this structure allows you to have a single action (or operation) on the Notebook, for example changeOrder(NoteId), that updates the order of the given Note and, if needed, changes the order of all the others. The trick here is that when you persist the Notebook you work just with the ID of the Note, so you don't have to load the whole entity, only a part of it, update it and save it again. So how big the Note entity is doesn't matter, because you don't use all of it. Hence, on every change you could trigger an update of all the (NoteId, order) pairs for that Notebook; you can't do differently. But to support this you need a single function in the repository that loads the ID of each Note with its order and saves it again; that should not be too expensive.
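To make that concrete, here is a minimal sketch in JavaScript of what such an aggregate and repository could look like. All the names (Notebook, changeOrder, NotebookRepository, the notebook_notes table and its columns) are made up for illustration, and the db object is assumed to be any SQL client exposing a query(text, params) method that resolves to { rows } (node-postgres has this shape); it is not a definitive implementation.

class Notebook {
  constructor(id, noteOrder) {
    this.id = id;
    // noteOrder: array of { noteId, order } pairs; no other Note data is loaded
    this.noteOrder = [...noteOrder].sort((a, b) => a.order - b.order);
  }

  // Move one Note to a new position and re-number the whole list.
  changeOrder(noteId, newPosition) {
    const idx = this.noteOrder.findIndex(e => e.noteId === noteId);
    if (idx === -1) throw new Error('unknown note: ' + noteId);
    const [entry] = this.noteOrder.splice(idx, 1);
    this.noteOrder.splice(newPosition, 0, entry);
    this.noteOrder.forEach((e, i) => { e.order = i + 1; });
  }
}

// The repository reads and writes only (noteId, order) pairs, never full Notes.
class NotebookRepository {
  constructor(db) { this.db = db; }

  async loadNoteOrder(notebookId) {
    const res = await this.db.query(
      'SELECT note_id, note_order FROM notebook_notes WHERE notebook_id = $1 ORDER BY note_order',
      [notebookId]);
    return res.rows.map(r => ({ noteId: r.note_id, order: r.note_order }));
  }

  async saveNoteOrder(notebook) {
    for (const { noteId, order } of notebook.noteOrder) {
      await this.db.query(
        'UPDATE notebook_notes SET note_order = $1 WHERE notebook_id = $2 AND note_id = $3',
        [order, notebook.id, noteId]);
    }
  }
}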
On the other side, all the actions that operate directly on the Note have to load it, so there you load everything. But in that case loading and saving the whole Note is required, because you are changing the Note itself.
Anyway, the way you persist the order is entirely delegated to the persistence layer, which is built on top of the domain. I mean, the domain just has a Notebook and a set of Notes with order 1, 2, 3, etc.
Even though I don't think this needs such a complex solution, you could store the order in a totally different way: for example, use steps of 100 (so 100, 200, 300, etc.); each new Note is put in the middle of its two neighbours and is the only one that has to be saved each time. Every once in a while you run a job, or something else, that normalizes all the values, restoring the 100-step gaps (or whatever step you use to persist the order). As I said, this looks like an overcomplicated solution to the problem, but it also shows that the domain entities can be totally different from the persistence ones.
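For what it's worth, a small sketch of that "steps of 100" trick in plain JavaScript (function names and the STEP value are illustrative, not from any library):

const STEP = 100;

// Order value for a Note inserted between two neighbours; only this Note has to be saved.
function orderBetween(prevOrder, nextOrder) {
  if (prevOrder == null && nextOrder == null) return STEP;   // first Note in the Notebook
  if (prevOrder == null) return Math.floor(nextOrder / 2);   // new head of the list
  if (nextOrder == null) return prevOrder + STEP;            // new tail of the list
  const mid = Math.floor((prevOrder + nextOrder) / 2);
  if (mid === prevOrder) throw new Error('no gap left, run normalize first');
  return mid;
}

// Periodic job: restore the 100-step gaps so future insertions keep finding room.
function normalize(notes) {
  return [...notes]
    .sort((a, b) => a.order - b.order)
    .map((note, i) => ({ ...note, order: (i + 1) * STEP }));
}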
After adding a new Core Data model version to my app, I performed a lightweight migration, apparently successfully. The migrated file loaded fine, but upon the first attempt to access an attribute via a particular relationship, the app crashes with an NSRangeException: '*** -[__NSArrayM objectAtIndex:]: index 4294967295 beyond bounds [0 .. 35]'. This relationship worked fine prior to the migration. I know from other posts here that 4294967295 is really -1, but the only thing I can identify with 36 items in my app/data is that there are 36 total entities in the data model (for reference, the relationship that's being fetched has 58 items in its table).
The question:
My question is: based on the error I'm getting and the troubleshooting I've done below, is there a type of schema change that could pass the lightweight migration, but corrupt the data along the way, leading to the noted exception? I'm going to try breaking down the migration into smaller chunks over several versions to either isolate or avoid the issue, but it would be nice to be able to focus on specific schema changes that might be at fault.
The failure:
The failure occurs with the following code in "myobject":
[[self object2] text];
The object2 relationship is to-one, non-optional both ways and neither the forward nor inverse relationship was changed between data models. The text attribute is likely not relevant because when the error occurs, awakeFromFetch is not reached in object2. If I assign [self object2] to a variable prior to the above statement, the assignment is successful and reports data: <fault>.
The database:
Looking at the database in sqlite3, I notice the following:
The index values for the forward and inverse relationships appear to be correct in each table.
The object2 table has two columns for the inverse relationship instead of the one prior to migration (ZMYOBJECT as before and the additional Z2_MYOBJECT, which is empty for all rows). No other relationships were added that would explain this column.
In the Z_PRIMARYKEY table, all entries post-migration show -1 for Z_MAX, whereas prior to migration they showed zero for empty tables and the maximum row number for populated tables. Manually updating Z_MAX to the proper values did not help with the exception. All Z_SUPER values were correct.
I set up a mapping model to see if anything looked awry with the automatic mappings, but everything looked fine.
Overall schema changes:
In the source version of the data model, there were fourteen entities, of which only four had been populated with data (the app is still in development). Seven were top-level entities and seven were sub-entities of three of the top-level entities.
In the target version of the data model, twenty-two entities were added, some top-level and some sub-entities, with dozens of relationships, including some added to existing entities.
Some attributes and relationships were removed from existing entities and others were added. No data types or relationship settings were changed, no attributes or relationships were renamed, and no special mappings were required.
Update (2/25/12): As I started working on a new intermediate model, I remembered that I had changed the class (representedClassName) for a number of entities from NSManagedObject to an NSManagedObject subclass, but hadn't generated the class files. I didn't suspect that would cause an issue and, indeed, creating all of the class files did not help with the exception. I just wanted to note that as another change between models.
Conclusions:
This is a wild guess, but if the 36 entity count is not a coincidence, it seems that when "myobject" attempts to fault in "object2" it does not have a valid reference for the table and is attempting to load table number -1, causing the exception. The fact that a simple assignment of [self object2] is successful, however, doesn't jibe with that conclusion.
Any ideas?
By working through several incremental migrations I was able to determine what is causing the issue, and a solution.
The problem:
One of the existing entities with data has no child entities in the current model. If I create a new model that simply adds a child entity, containing no attributes or relationships, and makes no other changes, the NSRangeException, Z_MAX observation, and doubling of the inverse relationship noted in my question all occur.
The solution:
After observing the failures following a "successful" lightweight migration for the case above, I created a mapping model. Since the only change was one additional entity, all but one of the entity mappings were straightforward. The question was what to do with the single added entity.
By default, the added entity with no attributes or relationships of its own was showing attribute and relationship mappings for all of the parent's properties. All of the mappings had empty value expressions by default, which I assumed meant that it would just skip them during the migration. Not true, apparently. By deleting all of the attribute and relationship mappings within the entity mapping and then turning off inferred mapping, the migration proceeded successfully.
I still have to tackle all of the remaining entities and will be trying this approach to do the rest in bulk, with all planned attributes and relationships intact.
Your posts were helpful when I encountered this problem. Thank you. [Have you reported the bug yet?]
Here are some more experimental results but, alas, not a great solution.
My schema change similarly added an entity subtype that has no additional attributes or relationships. The error message is the same as yours except the bounds are [0 .. 19]. That does correspond to 20 entity types, validating your hypothesis. Like your situation, the error happened when attempting to access an entity property after migration completed.
Adding a dummy attribute and a dummy self-relationship to the new entity type didn't avoid the post-migration crash. (However, I didn't test with that new entity type as the only schema change since I previously pushed that schema change to alpha testers.)
I observe the Z2_MYOBJECT column and Z_PRIMARYKEY.Z_MAX = -1 symptoms after successful migrations for other schema changes, so those may not be problematic at all. The -1 values get replaced lazily by the proper max values. The extra column might be used during migration.
In my case, the new entity's supertype has an ordered to-many relationship. In the very simple case where the entire data store contains just one object instance (an instance of that entity type with no outgoing relationship links), the schema migration succeeds. It does have the extra Z2_MYOBJECT column and Z_PRIMARYKEY.Z_MAX = -1 values and yet the resulting data store works fine when adding objects from there.
I tried creating a mapping model but was unsuccessful in getting Core Data to apply it. Turning off inferred mapping just made Core Data unable to migrate at all. Is there a trick to it? Do I have to write custom migration code to invoke a mapping model? This is Xcode 4.6.2 so the older bug is long gone.
When using git to roll the code & data model backwards or forwards to conduct an experiment, it seems to be necessary to (1) close & reopen the Xcode project and (2) do a clean build. Otherwise Xcode may crash and/or leave confounding state around.
To experimentally roll backwards, you must delete the .momd/ directory or the entire app from the target iOS simulator/device (or deploy the app via iTunes or TestFlight), since redeploying via Xcode won't remove obsolete files (like .mom and .omo data model definitions), which in turn lets the app do lightweight migrations that the actual deployed app can't do.
About the entity mapping to use for the added entity type, note that when Core Data applies a mapping model, it's copying entities from the old data store to a new one. It's not modifying the tables in place. You don't want it to "skip" properties (including inherited properties) unless you want to drop them.
However, since the schema change added an entity type, that entity has no instances to migrate so its custom mapping model rules do not matter.
Thus I wonder if something else caused your crashes to stop, like leftover experimental .mom files or custom migration code. Did your workaround hold up?
After 2 days of experimenting I decided my alpha testers would have to live without data migration this time. Fortunately this happened without production customers. But it doesn't give me confidence in Core Data.
I had the same sort of NSRangeException after adding a core data model version when accessing any instance of a particular entity after automatic lightweight migration. In my case also the range corresponded to the number of entities in my model.
I generated a mapping model with Xcode 4.6 (4H127) using File > New > File... and then selecting Core Data > Mapping Model. This caused the crash to (d)evolve into -[NSSymbolicExpression length]: unrecognized selector sent to instance...
Solution
The issue in my case was that my entity causing the original crash had a relationship named size, which is a reserved word listed in Apple's Predicate Programming Guide. An examination of the mapping model revealed that the reserved word had been capitalized in the Value Expression for the relationship:
FUNCTION($manager, "destinationInstancesForEntityMappingNamed:sourceInstances:", "PNSizeOptionToPNSizeOption", $source.SIZE)
I found the solution in Core Data Model Versioning and Data Migration Programming Guide:
Reserved words in custom value expressions: If you use a custom value expression, you must escape reserved words such as SIZE, FIRST, and LAST using a # (for example, $source.#size).
Unfortunately, Xcode's algorithm for generating the mapping model did not recognize the reserved word and I had to change the expression's key path in the Relationship Mapping inspector to $source.#size. This solved the problem. I assume that core data's inferred mapping model ran into a similar problem during lightweight migration.
There may be other causes of this kind of crash, so this solution may not apply, but it may be worth checking the property names in your model against the list of reserved words in the Predicate Programming Guide.
Does anyone have an example of how to efficiently provide a UITableView with data from a Core Data model, preferably including the use of sections (via a referenced property), without the use of NSFetchedResultsController?
How was this done before NSFetchedResultsController became available? Ideally the sample should only get the data that's being viewed and make extra requests when necessary.
Thanks,
Tim
For the record, I agree with CommaToast that there's at best a very limited set of reasons to implement an alternative version of NSFetchedResultsController. Indeed I'm unable to think of an occasion when I would advocate doing so.
That being said, for the purpose of education, I'd imagine that:
upon creation, NSFetchedResultsController runs the relevant NSFetchRequest against the managed object context to create the initial result set;
subsequently — if it has a delegate — it listens for NSManagedObjectContextObjectsDidChangeNotification from the managed object context. Upon receiving that notification it updates its result set.
Fetch requests sit atop predicates, and predicates can't always be broken down into the keys they reference (e.g., if you create one via predicateWithBlock:). Furthermore, although the inserted and deleted lists are quite explicit, the list of changed objects doesn't provide clues as to how those objects have changed. So I'd imagine it just reruns the predicate supplied in the fetch request against the combined set of changed and inserted records, then suitably accumulates the results, dropping anything from the deleted set that it did previously consider a result.
There are probably more efficient things you could do whenever dealing with a fetch request with a fetch limit. Obvious observations, straight off the top of my head:
if you already had enough objects, none of them were deleted or modified, and none of the newly inserted or modified objects sorts higher than the objects you had, then there are obviously no changes to propagate and you needn't run a new query;
even if you've lost some of the objects you had, if you kept whichever sorted lowest then you've got an upper bound for everything that didn't change, so if the changed and inserted objects together with those you still have add up to more than enough, you can also avoid a new query.
The logical extension would seem to be that you need to re-interrogate the managed object context only if the deletions, insertions and changes modify your sorted list so that, before you chop it down to the given fetch limit, the bottom object isn't one you had from last time. The reasoning being that you don't already know anything about the stored objects you don't have hold of versus the insertions and modifications; you only know how those you don't have hold of compare to those you previously had.
Since CouchDB does not have support for SQL-style AUTO_INCREMENT, what would be your approach to generating sequential, unique numeric ids for your documents?
I am using numeric ids for:
User-friendly IDs (e.g. TASK-123, RQ-001, etc.)
Integration with libraries/systems that require numeric primary key
I am aware of the problems with replication, etc. That's why I am interested in how people try to overcome this issue.
As Dominic Barnes says, auto-increment integers are not scalable, distributed-friendly, or cloud-friendly. It seems every app nowadays needs a mobile version with offline support, and that is not directly compatible with auto-increment integers. We all know this; but it's also true that auto-increment integers are necessary for legacy code and arguably other things.
In both scenarios (the two strategies described below), you are responsible for producing the auto-incrementing integer. A view emits the numeric id, emit(the_numeric_id, null); you could also add a "type" namespace, e.g. emit([doc.type, the_numeric_id], null). Query for the final row (e.g. with descending=true&limit=1, which starts from the highest key), increment the value returned, and that is your next id. The attempt to save runs in a loop which can retry if there was a collision.
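As a sketch, the map function could look like this (the view name, design doc and int_id field are made up; whether you key by a plain integer or by [type, integer] is up to you):

function (doc) {
  // Emit the numeric id so the highest one can be read back cheaply.
  if (doc.int_id != null) emit(doc.int_id, null);
}

and the "read the last row" query is just:

curl 'http://localhost:5984/mydb/_design/app/_view/by_int_id?descending=true&limit=1'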
You can also play tricks if you don't need 100% density in the list of IDs. For example, you can add timestamps to the emit() rows, estimate the document creation velocity, and increment by that velocity times your computation and transmission time. You could also simply increment by a random integer between 1 and N, so most of the time the first insert works, at the cost of non-homogeneous ID numbers.
As for where to store the integer, I think there are two approaches: the id strategy and the try-and-check strategy.
The id strategy is simpler and quicker in the short term. Document IDs are the integer (perhaps prefixed with a type to add a namespace). Since Couch guarantees uniqueness on the _id field, you just worry about the auto-incrementing. Do this in a loop: a 409 Conflict triggers a retry, a 201 Created means you're done.
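A rough sketch of that loop (Node.js 18+ so fetch is global; the database name, the "task-" prefix and the nextId helper are all made up, and nextId is assumed to re-read the view each time it is called):

const DB = 'http://localhost:5984/mydb';

async function createWithSequentialId(doc, nextId) {
  for (;;) {
    const id = await nextId();                         // e.g. highest key in the view + 1
    const res = await fetch(DB + '/task-' + id, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(doc),
    });
    if (res.ok) return 'task-' + id;                   // 201 Created: the id is ours
    if (res.status !== 409) throw new Error('unexpected status ' + res.status);
    // 409 Conflict: another writer claimed this _id first; loop and try again.
  }
}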
I think the major pain with this trick is that, if and when you get conflicts, you have two completely unrelated documents sharing the same _id, and one of them must be copied into a fresh document. If there were relationships with other documents, they must all be corrected. (The CouchDB 0.11 emit(key, {_id: some_foreign_doc_id}) trick comes to mind.)
The try-and-check strategy uses the default UUID as the doc._id, so every insert will succeed. Ideally, all or most of your inter-document relations are based on the immutable UUID _id, not the integer; the integer is just used for users and the UI. The auto-incrementing integer is simply a field in the document, {"int_id":20}. The view of course does emit(doc.int_id, null). (You can look up a document by integer id with ?key=23&include_docs=true on the view.)
Of course, after a replication, you might have id conflicts (not official CouchDB conflicts, but just documents using the same numeric id). The view which emits by ID would also have a reduce phase: simply _count should be enough. Next you must patrol the DB, querying this view with ?group=true and looking for any row (corresponding to an integer id) which has a count > 1. On the plus side, correcting the numeric id of a document is a minor change because it does not require new document creation.
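A possible shape for that view and the patrol query (the design doc and field names are made up for the example):

{
    "_id": "_design/intIds",
    "language": "javascript",
    "views": {
        "by_int_id": {
            "map": "function (doc) { if (doc.int_id != null) emit(doc.int_id, null); }",
            "reduce": "_count"
        }
    }
}

Then query it grouped by id and look for counts above 1; re-querying with ?reduce=false&key=N&include_docs=true fetches the offending documents so you can repair one of them:

curl 'http://localhost:5984/mydb/_design/intIds/_view/by_int_id?group=true'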
Those are my ideas. Now that I wrote them down, I feel like you must do relation-shepherding regardless of where the id is stored; so perhaps using _id is better after all. The only other downside I see is that you are permanently married to a fundamentally broken naming model—for some definition of "permanently."
Is there any particular reason you want to use numeric IDs over the UUIDs that CouchDB can generate for you? UUIDs are perfect for the distributed paradigm that CouchDB uses; stick with what is built in.
If you find yourself with any more than 1 CouchDB node in your architecture, you're going to get conflicting document IDs if you rely on something like "auto increment" when it comes time for replication. Even if you're only using 1 node now, that's probably not always going to be the case, especially since CouchDB works so well in a distributed and "offline" architecture.
I have had pretty good luck just using an ISO-formatted date as my key:
http://wiki.apache.org/couchdb/IsoFormattedDateAsDocId
It's pretty simple to do, human-readable and it basically builds in a few querying options by just existing. :-)
Keeping in mind the issues around replication and conflicts, you can use an update function to generate incrementing IDs that are guaranteed unique in a single master setup.
function(doc, req) {
    if (!doc) {
        doc = {
            _id: req.id,
            type: 'idGenerator',
            count: 0
        };
    }
    doc.count++;
    return [doc, toJSON(doc.count)];
}
Include this function in a design document like so:
{
    "_id": "_design/application",
    "language": "javascript",
    "updates": {
        "generateId": "function (doc, req) {\n\t\t\tif (!doc) {\n\t\t\t\tdoc = {\n\t\t\t\t\t_id: req.id,\n\t\t\t\t\ttype: 'idGenerator',\n\t\t\t\t\tcount: 0\n\t\t\t\t};\n\t\t\t}\n\n\t\t\tdoc.count++;\n\t\t\t\n\t\t\treturn [doc, toJSON(doc.count)];\n\t\t}"
    }
}
Then call it like so:
curl -XPOST http://localhost:5984/mydb/_design/application/_update/generateId/entityId
Replace entityId with whatever you like to create several independent ID sequences.
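For example (the database and sequence names here are just placeholders), each call returns the next value for that sequence, since the update function's response body is toJSON(doc.count):

curl -XPOST http://localhost:5984/mydb/_design/application/_update/generateId/invoiceNumber
1
curl -XPOST http://localhost:5984/mydb/_design/application/_update/generateId/invoiceNumber
2
curl -XPOST http://localhost:5984/mydb/_design/application/_update/generateId/taskNumber
1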
Not a perfect solution, but something that worked for me: create an independent service that generates auto-incremented ids. Yes, you'll probably say "this breaks the offline model of CouchDB", but what if you fetch a pool of N ids that you can then use whenever you need a new auto-incremented id? Every time you're online you fetch some more ids, and if you are running out of ids you tell your users: please go online. If the pool is big enough (say, a month's worth of traffic) this shouldn't happen. Again, not perfect, but maybe helpful to some people.
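A minimal sketch of the client side of that idea (the id service, its /reserve endpoint and the response shape are all hypothetical; Node.js 18+ is assumed for the global fetch):

class IdPool {
  constructor(serviceUrl, batchSize = 500) {
    this.serviceUrl = serviceUrl;
    this.batchSize = batchSize;
    this.ids = [];
  }

  // Call while online: ask the id service to reserve a contiguous block of ids.
  async refill() {
    const res = await fetch(this.serviceUrl + '/reserve?count=' + this.batchSize, { method: 'POST' });
    const { first, last } = await res.json();
    for (let id = first; id <= last; id++) this.ids.push(id);
  }

  // Call any time, online or offline; warn the user when the pool runs dry.
  next() {
    if (this.ids.length === 0) throw new Error('id pool exhausted, please go online');
    return this.ids.shift();
  }
}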
Instead of explicitly constructing an increasing integer key, you could use the implicit index CouchDB offers for paging.
The skip parameter accepts an integer that will effectively provide the auto-incrementing index you are used to.
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
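For example, this treats the 101st document (in _id order) as "number 101" without storing anything on the document itself (database name is a placeholder):

curl 'http://localhost:5984/mydb/_all_docs?skip=100&limit=1'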
The drawback is that it is not a viable solution for "User-friendly IDs". The index is not tied to the doc, and is subject to change if you are rewriting history.
If your only constraint is "integration with libraries/systems that require numeric primary key", this will bridge the gap without losing the benefits of CouchDB's key structure.
I write database applications for a living that the end user can customize.
Frequently, this means that--leaving the database aside for a moment--some of my notional entity types have a universe or domain that is infinite.
Take name types. You could have a first name, last name, married name, legal name, salutation name, and so on. I am not going to put an upper bound on this universe.
But I do need to find and use certain well-known name types. Let's say display name and sort name, just to keep it simple.
I would also like to be able to query for all name types (i.e. the whole universe) and have my well-known name types returned as well.
There are several strategies for accomplishing this within a database:
Have one name_type table with an id column and a code column. ID values less than a certain amount are "reserved" for use by the system; ID values higher than this are deemed to be user types.
Add a column to the id/code pair that is some representation of a boolean or an int type that indicates what type of row this is (e.g. user-defined or system). Same thing, really; just uses another column to explicitly break out the information instead of overloading it in the id.
Have two tables with perhaps a naming convention: name_type and name_type_system. It is understood or enforced that name_type_system is off-limits to users; name_type is their domain. Queries do a UNION across these tables and applications just "know" to never update the system table.
What strategies do people use? Any war stories? Any particular reasons to pick one over the other? Huge pitfalls I'm not seeing?
Best,
Laird
Of your three ideas, the first is often called a Magic Number (http://en.wikipedia.org/wiki/Magic_number_(programming)) and is a Bad Thing, because any code that doesn't "know" about it can make mistakes. Plus, over time you end up realizing "oops, I need to push the minimum value higher; now I need to resequence 10,000 existing rows." Headaches, headaches.
After that, either of the other two works. But the third one lets you use the DB server to deny insert/update/delete access to the account used by end-users, simplifying code.
The way to decide between option 2 and 3 is to ask, are they really 2 separate things? If they are, they will tend to have different security, different operations are performed on them, one is modified by upgrades, the other is not, etc. If they really are two different things, they go in two tables. If they are two flavors of one thing that are almost always treated the same, they go in one table with a "type" flag, option 2.