How to avoid changing property values in an NSBatchInsertRequest? - core-data

I have a simple Core Data entity Story that I occasionally update with the latest data from a network call. This network call sometimes updates many, many Story instances, so I run an NSBatchInsertRequest, shown below. (The other reason I'm using a batch insert is that many stories might need to be added to the persistent store.)
The problem is a user can have already marked a Story as a favorite. When they do that, I set story.isFavorite = true on the main thread and save viewContext.
However, when the batch insert occurs it overwrites story.isFavorite, setting it back to false, even though I'm using NSMergeByPropertyObjectTrumpMergePolicy on both the batch insert and view contexts. I am not touching story.isFavorite in the batch insert handler either so I don't expect that property to be overwritten.
I thought the benefit of a batch insert with this merge policy was to avoid first fetching + then manually updating changed properties + finally saving. What is the right way to avoid changing property values in an NSBatchInsertRequest?
Story
@objc(Story)
public class Story: NSManagedObject {
    @NSManaged public var title: String?
    @NSManaged public var storyURL: URL?
    @NSManaged public var updatedTime: Date?
    @NSManaged public var isFavorite: Bool // <- the problem property
}
Batch insert
container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
container.viewContext.automaticallyMergesChangesFromParent = false

let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
context.parent = container.viewContext
context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy

context.perform {
    var index = 0
    let batchInsert = NSBatchInsertRequest(entity: Story.entity(), managedObjectHandler: { managedObject in
        // Return true once every downloaded story has been handed to the request.
        guard index < downloadedStories.count else { return true }
        let story = managedObject as! Story
        let storyResponse = downloadedStories[index]
        // Update story with latest response data BUT don't modify story.isFavorite.
        story.title = storyResponse.title
        story.storyURL = storyResponse.storyURL
        story.updatedTime = storyResponse.updatedTime
        // ...
        index += 1
        return false
    })
    // Ask for the inserted object IDs so they can be merged below.
    batchInsert.resultType = .objectIDs
    if let result = try? context.execute(batchInsert) as? NSBatchInsertResult,
       let insertedIDs = result.result as? [NSManagedObjectID] {
        // Merge changes into parent context. Skip save() because not needed for batch insert.
        NSManagedObjectContext.mergeChanges(fromRemoteContextSave: [NSInsertedObjectsKey: insertedIDs],
                                            into: [container.viewContext])
    }
}
Edit
The Story entity does have a unique value constraint using attribute storyURL.
Update after Michael Tsai's answer
By making the Story entity attribute isFavorite a non-Optional Boolean without a default value (it was marked Optional before, though I'm not sure that makes a difference here) and keeping the Use Scalar Type box checked, I can confirm that existing objects in the store are not modified at all with the following configuration of the batch insert context:
context.persistentStoreCoordinator = container.persistentStoreCoordinator
// HOWEVER, observe that regardless of the merge policy below,
// setting `context.parent = container.viewContext` will also
// overwrite the store data!
context.mergePolicy = NSMergeByPropertyStoreTrumpMergePolicy
// NSMergeByPropertyObjectTrumpMergePolicy ignores objects in the store
// (which have the same unique constraint value, here equal `storyURL`)
// and overwrites all properties.
// To confirm that the batch insert operation does not modify
// existing Story instances (at all), first delete all instances where
// isFavorite == false. Then load all the story data again and
// execute the NSBatchInsertRequest with this change to managedObjectHandler:
story.title = storyResponse.title + " (modified)"
You will see the missing stories get inserted back, this time with titles carrying the " (modified)" suffix; but previously favorited stories do not get modified (basically, with this setup, the batch insert won't re-insert existing objects).
So the isFavorite property does not get overwritten, BUT neither does any property that should be changed (a new title from the response, for example).
Therefore, if you don't want your objects to get updated, but you want completely new objects to be inserted, you can use this approach.
However, if you are expecting your objects to require updates, here are some alternatives:
you may opt to run a separate update operation, maybe an NSBatchUpdateRequest, after you run your batch insert in this way (a sketch follows this list);
or, after the batch insert, you can update certain properties in a simple loop in a (possibly background/child) context without a batch operation, which could be fine if there isn't a ton of data;
lastly, you might be able to first batch insert the new data into a temporary store before somehow manually merging your choice of properties back into the main store, then delete the temporary store.
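For illustration, here is a rough sketch of that first alternative: re-mark the favorites with an NSBatchUpdateRequest right after the insert. It assumes favoriteURLs is an array of the favorited stories' storyURL values captured before the batch insert ran, that context is a background context associated with the store, and that container is the NSPersistentContainer from the question:
// favoriteURLs: [URL] is assumed to hold the storyURLs of the favorited
// stories, captured from the store *before* the batch insert ran.
let update = NSBatchUpdateRequest(entityName: "Story")
update.predicate = NSPredicate(format: "storyURL IN %@", favoriteURLs)
update.propertiesToUpdate = ["isFavorite": true]
update.resultType = .updatedObjectIDsResultType
if let result = try? context.execute(update) as? NSBatchUpdateResult,
   let updatedIDs = result.result as? [NSManagedObjectID] {
    // Make the view context aware of rows that were changed behind its back.
    NSManagedObjectContext.mergeChanges(fromRemoteContextSave: [NSUpdatedObjectsKey: updatedIDs],
                                        into: [container.viewContext])
}
Like the batch insert, the batch update bypasses in-memory objects, hence the mergeChanges call at the end.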
A simpler approach: you could fetch all the properties you want to keep unchanged before you execute the batch insert (storing them in a dictionary keyed by your object's uniqueness constraint value), and then set those properties again during the batch insert; see the sketch below.
For this approach, you will want to use a different merge policy, such as NSMergeByPropertyObjectTrumpMergePolicy, so that the updated object gets re-inserted into the store (make sure to fetch every property you don't want to lose in advance of the batch insert).
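As a rough sketch of what that could look like (assuming storyURL is the uniqueness constraint, and reusing downloadedStories, the index bookkeeping, and the batch insert context from the question):
// 1. Before the insert, remember which storyURLs are currently favorited.
var isFavoriteByURL: [URL: Bool] = [:]
let favoritesFetch = NSFetchRequest<Story>(entityName: "Story")
favoritesFetch.predicate = NSPredicate(format: "isFavorite == YES")
for story in (try? context.fetch(favoritesFetch)) ?? [] {
    if let url = story.storyURL { isFavoriteByURL[url] = true }
}
// 2. During the insert, re-apply the remembered value to every story.
var index = 0
let batchInsert = NSBatchInsertRequest(entity: Story.entity(), managedObjectHandler: { managedObject in
    guard index < downloadedStories.count else { return true } // true = done
    let story = managedObject as! Story
    let storyResponse = downloadedStories[index]
    story.title = storyResponse.title
    story.storyURL = storyResponse.storyURL
    story.updatedTime = storyResponse.updatedTime
    // Re-apply the value that should survive the insert.
    story.isFavorite = story.storyURL.flatMap { isFavoriteByURL[$0] } ?? false
    index += 1
    return false // false = keep providing objects
})
With NSMergeByPropertyObjectTrumpMergePolicy the existing rows still get rewritten, but because the handler writes the old isFavorite value back, nothing is lost.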
random idea: How to Save Data When Using One ManagedObjectContext and PersistentStoreCoordinator with Two Stores

I don't think it is actually possible to do a partial update with a batch insert request. It's hard to know for sure because I don't think any of this is documented except in WWDC sessions. When I first watched the 2019 session, I was excited because the presenter said:
Attributes that are optional or configured with default values can be omitted from the dictionary as well.
In the case of updating an object with unique constraint, the existing values will not be changed.
I took this to mean that:
You can omit values for new objects, and you'll get the defaults or NULL. That makes sense.
If there's an existing object and you omit a value, that value will not be changed. So you can purposely omit values to do a partial update, i.e. update other values while leaving your isFavorite alone.
But, after writing code to test this and looking at the output from com.apple.CoreData.SQLDebug (a sketch of such a test follows the list below), what actually seems to happen with NSMergeByPropertyObjectTrumpMergePolicy is:
If you omit a value that's required you get a validation error.
If you omit a value that's optional, it updates the row to NULL. For a Bool property in Swift, this will become false.
If you omit a value with a default value, it updates the row to the default.
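A minimal sketch of such a test, assuming the Story entity and container from the question (the sample title and URL are just placeholders):
// Run with the launch argument -com.apple.CoreData.SQLDebug 1 (set it in the
// Xcode scheme) to see the SQL that the batch insert generates.
let context = container.newBackgroundContext()
context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
context.performAndWait {
    // isFavorite is deliberately omitted. Per the behavior above, an existing
    // row with the same storyURL still gets that column rewritten to the
    // default/NULL rather than left alone.
    let objects: [[String: Any]] = [
        ["title": "A story", "storyURL": URL(string: "https://example.com/a-story")!]
    ]
    let insert = NSBatchInsertRequest(entityName: "Story", objects: objects)
    _ = try? context.execute(insert)
}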
This is a shame because it seems like partial updates could be implemented by having the ON CONFLICT clause only specify DO UPDATE SET for the attributes that you actually set. But (as of macOS 11) Core Data seems to always generate SQL to set all of the columns.
In summary, with batch inserts, NSMergeByPropertyObjectTrumpMergePolicy does not actually merge by property based on what's changed (like with a regular Core Data save). Rather, it either inserts a new row (if the object is absent) or overwrites all the columns but preserves the objectID (if the object was present).
NSMergeByPropertyStoreTrumpMergePolicy also doesn't merge by property. It just means to leave the stored object alone if it's already present.
Update (2021-06-24): I heard from DTS that Apple considers the current (iOS 14/macOS 11) behavior described above a bug, and that it should let you batch insert without changing omitted properties. The Radar number is 79747419.

Related

Maximo automation script to change status of work order

I have created a non-persistent attribute in my WoActivity table named VDS_COMPLETE. It is a bool that gets changed by a checkbox in one of my applications.
I am trying to make an automation script in Python to change the status of every task of a work order that has been checked when I save the work order.
I don't know why it isn't working but I'm pretty sure I'm close to the answer...
Do you have an idea why it isn't working? I know that I have code in comments; I have done a few experiments...
from psdi.mbo import MboConstants
from psdi.server import MXServer
mxServer = MXServer.getMXServer()
userInfo = mxServer.getUserInfo(user)
mboSet = mxServer.getMboSet("WORKORDER")
#where1 = "wonum = :wonum"
#mboSet.setWhere(where1)
#mboSet.reset()
workorderSet = mboSet.getMbo(0).getMboSet("WOACTIVITY", "STATUS NOT IN ('FERME', 'ANNULE', 'COMPLETE', 'ATTDOC')")
#where2 = "STATUS NOT IN ('FERME', 'ANNULE', 'COMPLETE', 'ATTDOC')"
#workorderSet.setWhere(where2)
if workorderSet.count() > 0:
    for x in range(0, workorderSet.count()):
        if workorderSet.getString("VDS_COMPLETE") == 1:
            workorder = workorderSet.getMbo(x)
            workorder.changeStatus("COMPLETE", MXServer.getMXServer().getDate(), u"Script d'automatisation", MboConstants.NOACCESSCHECK)
    workorderSet.save()
workorderSet.close()
It looks like your two biggest mistakes here are: 1. trying to get your boolean field (VDS_COMPLETE) off the set (meaning off the collection of records, like the whole table) instead of off the MBO (meaning an actual record, one entry in the table), and 2. getting your set of data fresh from the database (via that MXServer call), which means using the previously saved data instead of getting your data set from the screen, where the pending changes have actually been made (and remember that non-persistent fields do not get saved to the database).
There are some other problems with this script too. Your use of count() in your for loop (or calling it more than once at all) is an expensive operation. And you are currently (though this may be a result of your debugging) not filtering the work order set before grabbing the first work order (meaning you get a random work order from the table), and then doing a dynamic relationship off of that record (instead of using a normal relationship, or skipping the relationship altogether and using just a "where" clause), even though that relationship likely already exists.
Here is a Stack Overflow post describing relationships and "where" clauses in Maximo in more detail: Describe relationship in maximo 7.5
This question also has some more information about getting data from the screen versus new from the database: Adding a new row to another table using java in Maximo

Guava cache not taking values from cache and refreshing for every call

I have created a cache in my Spark application to refresh a few values every 6 hours.
The code looks as below.
val cachedList: Cache[String, String] = CacheBuilder.newBuilder()
.maximumSize(10)
.expireAfterWrite(refreshTimeInSeconds.toLong, TimeUnit.SECONDS)
.build()
This method gets the items and caches them for me.
def prepareUnfilteredAccountList(): Array[String] = {
    logger.info("Refreshing the UNFILTERED Accounts List from API")
    val unfilteredAccounts = apiService.getElementsToCache().get.map(_.accountNumber).toArray
    logger.trace("cached list has been refreshed New List is " + unfilteredAccounts)
    unfilteredAccounts
}
This method is used to get the cached values into list.
def getUnfilteredList(): Array[String] = {
    unfilteredAccounts.get("cached_list", new Callable[String]() {
        override def call(): String = {
            prepareUnfilteredAccountList().mkString(",")
        }
    }).split(",")
}
However, I observed that the cache is getting refreshed on every call, instead of after the specified time period.
First off, if you want to store a list or an array in a Cache you can do so; there's no need to convert it into a string and then split it back into an array.
Second, maximumSize() configures how many entries are in your cache - by the looks of it your cache only ever has one entry (your list), so specifying a maximum size is meaningless.
Third, if you're only trying to cache one value you might prefer the Suppliers.memoizeWithExpiration() API, which is simpler and less expensive than a Cache.
Fourth, in prepareUnfilteredAccountList() unfilteredAccounts appears to be an array, but in getUnfilteredList() the same variable appears to be a cache. At best this is going to be confusing for you to work with. Use distinct variable names for distinct purposes. It's possible this is the cause of your problems.
All that said, calling Cache.get(K, Callable) should work as you expect - it will invoke the Callable only if there is no live (unexpired) entry for the given key in the cache. If that isn't the behavior you're seeing, the bug is likely elsewhere in your code. Perhaps your refreshTimeInSeconds is not what you expect, or you're not actually reading the value you're caching, or the cache is actually working as intended and you're misdiagnosing its behavior as erroneous.

How to load document out of database instead of memory

Using Raven client and server #30155. I'm basically doing the following in a controller:
public ActionResult Update(string id, EditModel model)
{
    var store = provider.StartTransaction(false);
    var document = store.Load<T>(id);
    model.UpdateEntity(document); // overwrite document property values with those of edit model.
    document.Update(store); // tell document to update itself if it passes some conflict checking
}
Then in document.Update, I try to do this:
var old = store.Load<T>(this.Id);
if (old.Date != this.Date)
{
    // Resolve conflicts that occur by moving document period
}
store.Update(this);
Now, I run into the problem that old gets loaded out of memory instead of the database and already contains the updated values. Thus, it never goes into the conflict check.
I tried working around the problem by changing the Controller.Update method into:
public ActionResult Update(string id, EditModel model)
{
    var store = provider.StartTransaction(false);
    var document = store.Load<T>(id);
    store.Dispose();
    model.UpdateEntity(document); // overwrite document property values with those of edit model.
    store = provider.StartTransaction(false);
    document.Update(store); // tell document to update itself if it passes some conflict checking
}
This results in me getting a Raven.Client.Exceptions.NonUniqueObjectException with the text: Attempted to associate a different object with id
Now, the questions:
Why would Raven care if I try and associate a new object with the id as long as the new object carries the proper e-tag and type?
Is it possible to load a document in its database state (overriding default behavior to fetch document from memory if it exists there)?
What is a good solution to getting the document.Update() to work (preferably without having to pass the old object along)?
Why would Raven care if I try and associate a new object with the id as long as the new object carries the proper e-tag and type?
RavenDB leans on being able to serve documents from memory (which is faster). By checking for persisted objects with the same id, hard-to-debug errors are prevented.
EDIT: See Rayen's comment below. If you enable concurrency checking / provide an e-tag in the Store, you can bypass the error.
Is it possible to load a document in its database state (overriding default behavior to fetch document from memory if it exists there)?
Apparently not.
What is a good solution to getting the document.Update() to work (preferably without having to pass the old object along)?
I went with refactoring the document.Update method to also have an optional parameter to receive the old date period, since #1 and #2 don't seem possible.
RavenDB supports optimistic concurrency out of the box. The only thing you need to do is turn it on:
session.Advanced.UseOptimisticConcurrency = true;
See:
http://ravendb.net/docs/article-page/3.5/Csharp/client-api/session/configuration/how-to-enable-optimistic-concurrency

Can I use NSBatchDeleteRequest on entities with relationships that have delete rules?

I'm attempting to use NSBatchDeleteRequest to delete a pile of entities; many of these entities have cascade and/or nullify delete rules.
My first attempt to delete anything fails, and the NSError I get back includes the string "Delete rule is not supported for batch deletes". I had thought it was fine to delete such things, but that I was responsible for making sure all the constraints were satisfied before I did a save.
Should I be able to batch delete these managed objects? (I want to keep the delete rules; other delete paths don't have an easy way to know what set of objects to delete.) Do some kinds of batch deletes work in this case, but others not? (Say predicates fail, but a list of object IDs works?)
Batch delete is problematic with relationships.
It goes directly to the database and deletes the records suspending all object graph rules, including the delete rules. You have correctly identified the requirement that you need to do all the constraint checking yourself again. (That by itself could be a deal-breaker.)
Even if you manage to delete the entities and all the necessary related entities correctly, you will still be left with lots of entries in the (opaque) join table Core Data creates in the background. There is no obvious safe way to delete the entries in the join tables and they have been reported to interfere with managing relationships in future operations.
IMO, the solution in this case is to still use the object graph rather than batch delete, and optimize for performance. There are many good answers on Stack Overflow on how to do this, but most of it can be summarized with these points (a sketch combining them follows the list):
find the right batch size for saving (typically 500 entities for creation, about 2000 for deletion, but this could vary according to object size and relationship complexity - you have to experiment).
if you have memory constraints, use autoreleasepools.
use a background context to free the UI for interaction. I prefer to do the saving to the database in the background after updating the UI.
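For illustration, here is a rough sketch that combines those points; container is an NSPersistentContainer, and the entity name and batch size are placeholders to tune for your own model:
let batchSize = 2000 // experiment; deletion usually tolerates larger batches
container.performBackgroundTask { context in
    // Fetch only object IDs to keep memory low.
    let fetch = NSFetchRequest<NSManagedObjectID>(entityName: "MyEntity")
    fetch.resultType = .managedObjectIDResultType
    guard let ids = try? context.fetch(fetch) else { return }
    for start in stride(from: 0, to: ids.count, by: batchSize) {
        autoreleasepool {
            for id in ids[start..<min(start + batchSize, ids.count)] {
                // Deleting through the context means delete rules
                // (cascade/nullify) and validation still apply.
                context.delete(context.object(with: id))
            }
            try? context.save()
            context.reset() // drop the deleted objects from memory between batches
        }
    }
}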
I just wrote a simple Department-Employee (one-to-many) demo project. The delete rule of Employee's department relationship is set to cascade.
When using batch deletes to delete a department with two employees, the number of deleted objects is only 1. So for the time being, batch deletes disregard delete rules.
You can try it for yourself:
func deleteDepartment(named name: String) {
    let fetch = NSFetchRequest<NSFetchRequestResult>(entityName: "Department")
    fetch.predicate = NSPredicate(format: "name = %@", name)
    let req = NSBatchDeleteRequest(fetchRequest: fetch)
    req.resultType = .resultTypeCount
    do {
        let result = try self.persistentContainer.viewContext.execute(req) as? NSBatchDeleteResult
        print(result?.result as! Int) // number of objects deleted
    } catch {
        fatalError("Error!!!!")
    }
}
If anyone would need this:
You can use two NSBatchDeleteRequest for parent and child entities.
let childFetchRequest: NSFetchRequest<NSFetchRequestResult> = NSFetchRequest(entityName: "ChildEntityName")
let childDeleteRequest = NSBatchDeleteRequest(fetchRequest: childFetchRequest)
do {
    try persistenceService.context().execute(childDeleteRequest)
    let parentFetchRequest: NSFetchRequest<NSFetchRequestResult> = NSFetchRequest(entityName: "ParentEntityName")
    let parentDeleteRequest = NSBatchDeleteRequest(fetchRequest: parentFetchRequest)
    do {
        try persistenceService.context().execute(parentDeleteRequest)
        persistenceService.saveContext()
        /// handle success
    } catch {
        persistenceService.context().reset() // for example
        /// handle error
    }
} catch {
    /// handle error
}

Insert/update Doctrine object from Excel

On the project I am currently working on, I have to read an Excel file (with over 1000 rows), extract all of them, and insert/update them into a database table.
In terms of performance, is it better to add all the records to a Doctrine_Collection and insert/update them afterwards using the fromArray() method? One other possible approach is to create a new object for each row (an Excel row will be an object) and then save it, but I think that's worse in terms of performance.
Every time the Excel file is uploaded, it is necessary to compare its rows to the existing objects in the database. If a row does not exist as an object, it should be inserted; otherwise updated. My first approach was to turn both the objects and the rows into arrays (or Doctrine_Collections) and then compare both arrays before performing the needed operations.
Can anyone suggest any other possible approach?
We did a bit of this in a project recently, with CSV data. It was fairly painless. There's a symfony plugin, tmCsvPlugin, but we have extended it quite a bit since, so the version in the plugin repo is pretty out of date. Must add that to the #TODO list :)
Question 1:
I don't explicitly know about performance, but I would guess that adding the records to a Doctrine_Collection and then calling Doctrine_Collection::save() would be the neatest approach. I'm sure it would be handy if an exception was thrown somewhere and you had to roll back your last save.
Question 2:
If you could use a row field as a unique identifier (let's assume a username), then you could search for an existing record. If you find a record, and assuming that your imported row is an array, use Doctrine_Record::synchronizeWithArray() to update this record; then add it to a Doctrine_Collection. When complete, just call Doctrine_Collection::save().
A fairly rough 'n' ready implementation:
// set up a new collection
$collection = new Doctrine_Collection('User');
// assuming $row is an associative
// array representing one imported row.
foreach ($importedRows as $row) {
    // try to find an existing record
    // based on a unique identifier.
    $user = Doctrine_Core::getTable('User')
        ->findOneByUsername($row['username']);
    // create a new user record if
    // no existing record is found.
    if (!$user instanceof User) {
        $user = new User();
    }
    // sync record with current data.
    $user->synchronizeWithArray($row);
    // add to collection.
    $collection->add($user);
}
// done. save collection.
$collection->save();
Pretty rough but something like this worked well for me. This is assuming that you can use your imported row data in some way to serve as a unique identifier.
NOTE: be wary of synchronizeWithArray() if you're using sf1.2/Doctrine 1.0 - if I remember correctly, it was not implemented correctly. It works fine in Doctrine 1.2 though.
I have never worked with Doctrine_Collections, but I can answer in terms of database queries and code logic in a broader sense. I would apply the following logic:
1. Fetch all the existing rows from the database in a single query and store them in an array, $storedSheet.
2. Create a single array of all the rows of the uploaded Excel sheet; call it $uploadedSheet. I guess the structures of $uploadedSheet and $storedSheet (arrays or Doctrine_Collections) will be similar (both two-dimensional - rows and cells can be identified and compared).
3. Run foreach loops on $uploadedSheet as follows, and only identify which rows need to be inserted and which need to be updated (do the actual queries later):
$rowsToBeUpdated = array();
$rowsToBeInserted = array();
foreach ($uploadedSheet as $row => $eachRow)
{
    if (is_array($storedSheet[$row]))
    {
        foreach ($eachRow as $column => $value)
        {
            if ($value != $storedSheet[$row][$column])
            {   //This is a representation of comparison
                $rowsToBeUpdated[$row] = true;
                break; //No need to check this row anymore - one difference detected.
            }
        }
    }
    else
    {
        $rowsToBeInserted[$row] = true;
    }
}
4. This way you have two arrays. Now perform two database queries:
bulk insert all those rows of $uploadedSheet whose numbers are stored in the $rowsToBeInserted array;
bulk update all those rows of $uploadedSheet whose numbers are stored in the $rowsToBeUpdated array.
These bulk queries are the key to faster performance.
Let me know if this helped, or if you wanted to know something else.
