Can I use NSBatchDeleteRequest on entities with relationships that have delete rules? - core-data

I'm attempting to use NSBatchDeleteRequest to delete a pile of entities; many of them have cascade and/or nullify delete rules.
My first attempt to delete anything fails, and the NSError I get back includes the string "Delete rule is not supported for batch deletes". I had thought it was fine to delete such objects as long as I was responsible for making sure all the constraints were satisfied before saving.
Should I be able to batch delete these managed objects? (I want to keep the delete rules; other delete paths don't have an easy way to know which set of objects to delete.) Do some kinds of batch deletes work in this case but not others? (Say, predicates fail but a list of object IDs works?)

Batch deletes are problematic with relationships.
A batch delete goes directly to the database and deletes the records while suspending all object graph rules, including the delete rules. You have correctly identified that you would need to do all the constraint checking yourself again. (That by itself could be a deal-breaker.)
Even if you manage to delete the entities and all the necessary related entities correctly, you will still be left with lots of entries in the (opaque) join tables Core Data creates in the background. There is no obvious safe way to delete the entries in those join tables, and they have been reported to interfere with managing relationships in future operations.
IMO, the solution in this case is to still use the object graph rather than a batch delete, and to optimize for performance. There are many good answers on Stack Overflow on how to do this, but most of them can be summarized with these points (a minimal sketch follows the list):
find the right batch size for saving (typically 500 entities for creation and about 2000 for deletion, but this can vary with object size and relationship complexity - you have to experiment).
if you have memory constraints, use autoreleasepool blocks.
use a background context to free the UI for interaction. I prefer to do the saving to the database in the background after updating the UI.
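Here is a minimal sketch of that pattern, assuming a Department entity and an NSPersistentContainer named container; the entity name, predicate, and chunk size of 2000 are placeholders you would tune for your own model:

func deleteDepartments(named name: String, container: NSPersistentContainer) {
    // A background context keeps the UI responsive; because this goes through
    // the object graph, cascade/nullify delete rules are honored.
    container.performBackgroundTask { context in
        let fetch = NSFetchRequest<NSManagedObject>(entityName: "Department")
        fetch.predicate = NSPredicate(format: "name == %@", name)
        do {
            let objects = try context.fetch(fetch)
            // Delete and save in chunks, draining autoreleased memory per chunk.
            for start in stride(from: 0, to: objects.count, by: 2000) {
                autoreleasepool {
                    for object in objects[start..<min(start + 2000, objects.count)] {
                        context.delete(object)
                    }
                }
                try context.save()
            }
        } catch {
            // handle/log the error
        }
    }
}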

I just wrote a simple Department-Employee (one-to-many) demo project. The delete rule of Department's employees relationship is set to Cascade.
When I use a batch delete to delete a department with two employees, the number of deleted objects is only 1. So, for the time being, batch deletes disregard delete rules.
You can try it for yourself:
func deleteDepartment(named name: String) {
    let fetch = NSFetchRequest<NSFetchRequestResult>(entityName: "Department")
    fetch.predicate = NSPredicate(format: "name = %@", name)
    let req = NSBatchDeleteRequest(fetchRequest: fetch)
    req.resultType = .resultTypeCount
    do {
        let result = try self.persistentContainer.viewContext.execute(req) as? NSBatchDeleteResult
        print(result?.result as! Int) // number of objects deleted
    } catch {
        fatalError("Batch delete failed: \(error)")
    }
}
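A side note: if you do use a batch delete anyway, the deletions happen at the SQL level, so objects already loaded into a context will not notice them. A common pattern (a sketch, assuming the same persistentContainer and name as above) is to ask for the deleted object IDs and merge them into the view context:

let fetch = NSFetchRequest<NSFetchRequestResult>(entityName: "Department")
fetch.predicate = NSPredicate(format: "name = %@", name)
let req = NSBatchDeleteRequest(fetchRequest: fetch)
req.resultType = .resultTypeObjectIDs // ask for the IDs of the deleted rows
do {
    let context = persistentContainer.viewContext
    let result = try context.execute(req) as? NSBatchDeleteResult
    let deletedIDs = (result?.result as? [NSManagedObjectID]) ?? []
    // Propagate the deletions into in-memory contexts so fetched results
    // controllers and the UI stay consistent with the store.
    NSManagedObjectContext.mergeChanges(
        fromRemoteContextSave: [NSDeletedObjectsKey: deletedIDs],
        into: [context])
} catch {
    // handle the error
}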

In case anyone needs this:
You can use two NSBatchDeleteRequests, one for the child entities and one for the parent entities (children first):
let childFetchRequest: NSFetchRequest<NSFetchRequestResult> = NSFetchRequest(entityName: "ChildEntityName")
let childDeleteRequest = NSBatchDeleteRequest(fetchRequest: childFetchRequest)
do {
    try persistenceService.context().execute(childDeleteRequest)
    let parentFetchRequest: NSFetchRequest<NSFetchRequestResult> = NSFetchRequest(entityName: "ParentEntityName")
    let parentDeleteRequest = NSBatchDeleteRequest(fetchRequest: parentFetchRequest)
    do {
        try persistenceService.context().execute(parentDeleteRequest)
        persistenceService.saveContext()
        /// handle success
    } catch {
        persistenceService.context().reset() // for example
        /// handle error
    }
} catch {
    /// handle error
}

Related

How to avoid changing property values in an NSBatchInsertRequest?

I have a simple Core Data entity Story that I occasionally update with the latest data from a network call. This network call sometimes updates many, many Story instances, so I run an NSBatchInsertRequest, shown below. (The other reason I'm using a batch insert is that many stories might need to be added to the persistent store.)
The problem is that a user can have already marked a Story as a favorite. When they do, I set story.isFavorite = true on the main thread and save viewContext.
However, when the batch insert occurs, it overwrites story.isFavorite, setting it back to false, even though I'm using NSMergeByPropertyObjectTrumpMergePolicy on both the batch insert and view contexts. I am not touching story.isFavorite in the batch insert handler either, so I don't expect that property to be overwritten.
I thought the benefit of a batch insert with this merge policy was to avoid first fetching, then manually updating changed properties, then finally saving. What is the right way to avoid changing property values in an NSBatchInsertRequest?
Story
@objc(Story)
public class Story: NSManagedObject {
    @NSManaged public var title: String?
    @NSManaged public var storyURL: URL?
    @NSManaged public var updatedTime: Date?
    @NSManaged public var isFavorite: Bool // <- the problem property
}
Batch insert
container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
container.viewContext.automaticallyMergesChangesFromParent = false
let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
context.parent = container.viewContext
context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
context.perform {
    var index = 0
    let batchInsert = NSBatchInsertRequest(entity: Story.entity(), managedObjectHandler: { managedObject in
        guard index < downloadedStories.count else { return true } // true = finished
        let story = managedObject as! Story
        let storyResponse = downloadedStories[index]
        // Update story with latest response data BUT don't modify story.isFavorite.
        story.title = storyResponse.title
        story.storyURL = storyResponse.storyURL
        story.updatedTime = storyResponse.updatedTime
        // ...
        index += 1
        return false // false = keep providing objects
    })
    batchInsert.resultType = .objectIDs
    do {
        let result = try context.execute(batchInsert) as? NSBatchInsertResult
        if let insertedIDs = result?.result as? [NSManagedObjectID] {
            // Merge changes into parent context. Skip save() because not needed for batch insert.
            NSManagedObjectContext.mergeChanges(fromRemoteContextSave: [NSInsertedObjectsKey: insertedIDs], into: [container.viewContext])
        }
    } catch {
        // handle the error
    }
}
Edit
The Story entity does have a unique value constraint using attribute storyURL.
Update after Michael Tsai's answer
By making the Story entity attribute isFavorite a non-Optional Boolean without a default value (it was marked as Optional before, though I'm not sure that makes a difference here) and keeping the Use Scalar Type box checked, I can confirm that existing objects in the store are not modified (at all) with the following configuration of the batch insert context.
context.persistentStoreCoordinator = container.persistentStoreCoordinator
// HOWEVER, observe that regardless of the merge policy below,
// setting `context.parent = container.viewContext` will also
// overwrite the store data!
context.mergePolicy = NSMergeByPropertyStoreTrumpMergePolicy
// NSMergeByPropertyObjectTrumpMergePolicy ignores objects in the store
// (which have the same unique constraint value, here equal `storyURL`)
// and overwrites all properties.
// To confirm that the batch insert operation does not modify
// existing Story instances (at all), first delete all instances
// where isFavorite == false. Then load all the story data again and
// execute the NSBatchInsertRequest with this change to managedObjectHandler:
story.title = storyResponse.title + " (modified)"
You will see the missing stories get inserted back, this time with their titles carrying the suffix " (modified)"; but previously favorited stories do not get modified (basically, with this setup, the batch insert won't re-insert existing objects).
So the isFavorite property does not get overwritten, BUT neither do any properties that should be changed (because they received a new title, for example).
Therefore, if you don't want your existing objects to be updated, but you do want completely new objects to be inserted, you can use this approach.
However, if you are expecting your objects to require updates, here are some alternatives (a sketch of the first one follows this list):
you may opt to run a separate update operation, maybe an NSBatchUpdateRequest, after you run your batch insert in this way;
or, after the batch insert, you can update certain properties in a simple loop in a (possibly background/child) context without a batch operation, which could be fine if there isn't tons of data;
lastly, you might be able to first batch insert the new data into a temporary store, then somehow manually merge your choice of properties into the main store, and finally delete the temporary store.
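A minimal sketch of the NSBatchUpdateRequest alternative, assuming you captured the set of favorited storyURL values in a hypothetical favoriteURLs array before running the batch insert:

// Restore isFavorite for previously favorited stories after the insert.
let update = NSBatchUpdateRequest(entityName: "Story")
update.predicate = NSPredicate(format: "storyURL IN %@", favoriteURLs)
update.propertiesToUpdate = ["isFavorite": true]
update.resultType = .updatedObjectIDsResultType
do {
    let result = try context.execute(update) as? NSBatchUpdateResult
    if let updatedIDs = result?.result as? [NSManagedObjectID] {
        // Batch updates also bypass the contexts, so merge the changes back.
        NSManagedObjectContext.mergeChanges(
            fromRemoteContextSave: [NSUpdatedObjectsKey: updatedIDs],
            into: [container.viewContext])
    }
} catch {
    // handle the error
}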
A simpler approach: you could fetch all the properties you want to keep unchanged before you execute the batch insert (storing them in a dictionary keyed by your object's uniqueness constraint value), and then re-set each property during the batch insert.
For this approach, you will want to use a merge policy such as NSMergeByPropertyObjectTrumpMergePolicy so that the updated object gets re-inserted into the store (make sure to fetch every property that you don't want to lose in advance of the batch insert).
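A sketch of that idea, with hypothetical names (favoritesByURL, storyResponse); the dictionary is built on the same context before the insert runs:

// Capture the values to preserve, keyed by the unique constraint (storyURL).
var favoritesByURL: [URL: Bool] = [:]
let favoritesFetch = NSFetchRequest<Story>(entityName: "Story")
favoritesFetch.predicate = NSPredicate(format: "isFavorite == YES") // only favorites matter here
if let favorites = try? context.fetch(favoritesFetch) {
    for story in favorites {
        if let url = story.storyURL { favoritesByURL[url] = true }
    }
}

// Then, inside the batch insert's managedObjectHandler, re-apply it:
// story.isFavorite = favoritesByURL[storyResponse.storyURL] ?? false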
random idea: How to Save Data When Using One ManagedObjectContext and PersistentStoreCoordinator with Two Stores
I don't think it is actually possible to do a partial update with a batch insert request. It's hard to know for sure because I don't think any of this is documented except in WWDC sessions. When I first watched the 2019 session, I was excited because the presenter said:
Attributes that are optional or configured with default values can be omitted from the dictionary as well.
In the case of updating an object with unique constraint, the existing values will not be changed.
I took this to mean that:
You can omit values for new objects, and you'll get the defaults or NULL. That makes sense.
If there's an existing object and you omit a value, that value will not be changed. So you can purposely omit values to do a partial update, i.e. update other values while leaving your isFavorite alone.
But, after writing code to test this and looking at the output from com.apple.CoreData.SQLDebug, what actually seems to happen with NSMergeByPropertyObjectTrumpMergePolicy is:
If you omit a value that's required you get a validation error.
If you omit a value that's optional, it updates the row to NULL. For a Bool property in Swift, this will become false.
If you omit a value with a default value, it updates the row to the default.
This is a shame because it seems like partial updates could be implemented by having the ON CONFLICT clause only specify DO UPDATE SET for the attributes that you actually set. But (as of macOS 11) Core Data seems to always generate SQL to set all of the columns.
In summary, with batch inserts, NSMergeByPropertyObjectTrumpMergePolicy does not actually merge by property based on what's changed (like with a regular Core Data save). Rather, it either inserts a new row (if the object is absent) or overwrites all the columns but preserves the objectID (if the object was present).
NSMergeByPropertyStoreTrumpMergePolicy also doesn't merge by property. It just means to leave the stored object alone if it's already present.
Update (2021-06-24): I heard from DTS that Apple considers the current (iOS 14/macOS 11) behavior described above a bug, and that it should let you batch insert without changing omitted properties. The Radar number is 79747419.

Referencing external doc in CouchDB view

I am scraping a 90K-record database using JSON-RPC, and I am trying to put in some basic error checking. I want to start by scraping the database twice using two different settings and adding a prefix to the second scrape. This way I can check that the two settings are not producing different records (due to dropped updates, etc.). I wanted to implement the comparison using a view that compares each document from the first scrape with its twin produced by the second scrape and then emits the names of records with a difference between them.
However, I cannot quite figure out how to pull another doc into the view; everything I have read only discusses external docs using the emit() function, which is too late to permit me to compare them. In the example below, the lookup() function would grab the referenced document.
Is this just not possible?
function(doc) {
    if (doc._id.slice(0,1) !== '$' && doc._id.slice(0,1) !== '_') {
        var otherDoc = lookup('$test' + doc._id);
        if (otherDoc) {
            var keys = doc.value.keys();
            var same = true;
            keys.forEach(function(key) {
                if ((key.slice(0,1) !== '_') && (key.slice(0,1) !== '$') && (key !== 'expires')) {
                    if (!Object.equal(otherDoc[key], doc[key])) {
                        same = false;
                    }
                }
            });
            if (!same) {
                emit(doc._id, 1);
            }
        }
    }
}
Context
You are correct that this is not possible in CouchDB. The whole point of the map function is that it must be idempotent, otherwise you lose all the other nice benefits of a pre-calculated index.
This is why you cannot access external resources in the map function, whether they be other records or the clock. Any time you run a map, you must always get the same result if you put the same record into it. Since there are no relationships between records in CouchDB, there is no way to guarantee this.
Solution
However, you can still achieve your end goal, just by different means. Some possibilities...
Assuming there is some meaningful numeric value in each doc, you could use a view to take the sum of all those values and group them by which import you did ({key: <batch id>, value: <meaningful number>}). Then compare the two numbers in your client or the browser to see if they match.
A brute force approach would be to use a view to pair the docs that should match. Each doc is on a different row, but they're grouped by a common field. Then iterate through the entire index comparing the pairs. This would certainly be the quickest to code and doesn't depend on your application or data.
Implement a validation function to enforce a schema on your data. Just be warned that this will reduce your write throughput since each written record will be piped out of Erlang and into the JS engine. Also, this is only applicable if you're worried about properly formed records instead of their precise content, which might not be the case.
Instead of your different batch jobs creating different docs, have them place them into the same doc. The structure might look like this: { "_id": "something meaningful", "batch_one": { ..data.. }, "batch_two": { ..data.. } } Then your validation function could compare them or you could create a view that indexes all the docs that don't match. All depends on where in your pipeline you want to do the error checking and correction.
Personally, I like the last option better, but only if you don't plan to use the database as-is in production, i.e. you wouldn't want to carry around all that extra data in each record.
Hope that helps.
Cheers.

Azure table entity existence/synchronisation

I'm using an Azure Table query to retrieve all error entities assigned to a user.
After that, I change a property of the entity to state that the entity is in processing mode.
After I have processed the entity, I remove the entity from the table.
When I run parallel tests, it can happen that during the query an entity has already been processed and deleted by another thread. So I get the error 404 ResourceNotFound when I want to Replace the entity.
Is there a way to test whether the entity was changed outside of the thread, or whether it still exists? Is it better to catch error 404 and ignore it, or should I query for the entity again? (Neither seems quite right to me.)
TableQuery<ErrorObjectTableEntity> query = new TableQuery<ErrorObjectTableEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, user));
List<ErrorObjectTableEntity> queryResult = table.ExecuteQuery(query).OrderBy(x => x.action).ToList();

foreach (ErrorObjectTableEntity entity in queryResult)
{
    entity.inProcess = true;
    try
    {
        TableOperation updateOperation = TableOperation.Replace(entity);
        table.Execute(updateOperation);
    }
    catch
    {
        //..some logging here
        //catch error 404?
    }

    //do some action

    try
    {
        TableOperation deleteOperation = TableOperation.Delete(entity);
        table.Execute(deleteOperation);
    }
    catch {...}
}
There are a couple of issues here as far as best practice goes. Your code as written could simply ignore the exception on the assumption that another worker removed the entity, but this could end up masking other classes of errors. One solution would be to use Queues to insert a message per user query, and then have various workers retrieve a message and process the query for a specific user. This way, if a node goes down, the app absorbs the fault and continues on. Additionally, this keeps your workers from duplicating work, which optimizes the entire application. Lastly, if you don't care about the state of the entity and the keys are predictable, you can use the Merge semantic to simply update a given property of an entity without replacing the entire thing.
You should just catch the 404 error. Although they're represented as exceptions in .NET, HTTP 4xx error codes are more informational than exceptional. (5xx error codes are exceptional.)
Even if you checked that the entity existed before doing the replace, you would still need to catch the NotFound error in case it had been deleted between the check and the replace call. So you might as well skip the check.

Core Data Managed Object one-to-many relationship insert/update save

Let's assume I have a person->phones relationship (one-to-many, of course). The initial insert and save work correctly, where I do:
if (/* person does not exist */) {
    user = (Member *)[NSEntityDescription insertNewObjectForEntityForName:@"Person" inManagedObjectContext:ctx];
} else {
    user = [self searchObjectsForEntity:@"Person" withPredicate:pred andSortKey:nil andSortAscending:NO andContext:ctx];
}
Should this be replaced with just one insertNewObjectForEntityForName call that either inserts or fetches the existing object?
Next, I need to create my Phone objects and add them to my Person, which I do with something like this:
Phone *phone = (Phone *)[NSEntityDescription insertNewObjectForEntityForName:@"Phone"
                                                      inManagedObjectContext:ctx];
[mutableSetOfPhones addObject:phone];
user.phones = mutableSetOfPhones;
So I create a new Phone managed object, add it to a set, and assign the set to the person; after that I save.
All this is good, except when I use the same code to re-save a Person instance, i.e. to edit/insert/delete a phone or make any other modifications to the user data. The old Phone records remain in the DB and are no longer associated with any Person.
What is the right approach here? Do I need to iterate through user.phones to detect edits/deletes by some id? Should I just delete the older instances prior to saving the updated records (much simpler)? What is the recommended approach - maybe I am doing something completely incorrect?

Insert/update Doctrine object from Excel

On the project I am currently working on, I have to read an Excel file (with over 1000 rows), extract them all, and insert/update them into a database table.
In terms of performance, is it better to add all the records to a Doctrine_Collection and insert/update them afterwards using the fromArray() method? The other possible approach is to create a new object for each row (an Excel row becomes an object) and then save it, but I think that's worse in terms of performance.
Every time the Excel file is uploaded, it is necessary to compare its rows to the existing objects in the database. If a row does not exist as an object, it should be inserted; otherwise, updated. My first approach was to turn both the objects and the rows into arrays (or Doctrine_Collections) and then compare both arrays before performing the needed operations.
Can anyone suggest any other possible approach?
We did a bit of this in a project recently, with CSV data. It was fairly painless. There's a symfony plugin, tmCsvPlugin, but we've extended it quite a bit since, so the version in the plugin repo is pretty out of date. Must add that to the #TODO list :)
Question 1:
I don't explicitly know about performance, but I would guess that adding the records to a Doctrine_Collection and then calling Doctrine_Collection::save() would be the neatest approach. I'm sure it would be handy if an exception were thrown somewhere and you had to roll back your last save...
Question 2:
If you can use a row field as a unique identifier (let's assume a username), then you can search for an existing record. If you find one, and assuming your imported row is an array, use Doctrine_Record::synchronizeWithArray() to update the record; then add it to a Doctrine_Collection. When complete, just call Doctrine_Collection::save().
A fairly rough 'n' ready implementation:
// set up a new collection
$collection = new Doctrine_Collection('User');

// assuming $row is an associative
// array representing one imported row.
foreach ($importedRows as $row) {

    // try to find an existing record
    // based on a unique identifier.
    $user = Doctrine_Core::getTable('User')
        ->findOneByUsername($row['username']);

    // create a new user record if
    // no existing record is found.
    if (!$user instanceof User) {
        $user = new User();
    }

    // sync record with current data.
    $user->synchronizeWithArray($row);

    // add to collection.
    $collection->add($user);
}

// done. save collection.
$collection->save();
Pretty rough but something like this worked well for me. This is assuming that you can use your imported row data in some way to serve as a unique identifier.
NOTE: be wary of synchronizeWithArray() if you're using sf1.2/Doctrine 1.0 - if I remember correctly, it was not implemented correctly. It works fine in Doctrine 1.2, though.
I have never worked with Doctrine_Collections, but I can answer in terms of database queries and code logic in a broader sense. I would apply the following logic:
1. Fetch all the existing rows from the database in a single query and store them in an array $storedSheet.
2. Create a single array of all the rows of the uploaded Excel sheet; call it $uploadedSheet. I guess the structures of $uploadedSheet and $storedSheet will be similar (both two-dimensional - rows and cells can be identified and compared).
3. Run foreach loops over $uploadedSheet as follows, and only identify which rows need to be inserted and which need to be updated (do the actual queries later):
$rowsToBeUpdated = array();
$rowsToBeInserted = array();

foreach ($uploadedSheet as $row => $eachRow)
{
    if (is_array($storedSheet[$row]))
    {
        foreach ($eachRow as $column => $value)
        {
            if ($value != $storedSheet[$row][$column])
            {   // This is a representation of comparison.
                $rowsToBeUpdated[$row] = true;
                break; // No need to check this row anymore - one difference detected.
            }
        }
    }
    else
    {
        $rowsToBeInserted[$row] = true;
    }
}
4. This way you have two arrays. Now perform two database queries:
bulk insert all those rows of $uploadedSheet whose numbers are stored in the $rowsToBeInserted array;
bulk update all those rows of $uploadedSheet whose numbers are stored in the $rowsToBeUpdated array.
These bulk queries are the key to faster performance.
Let me know if this helped, or if you want to know anything else.
