I was wondering if anyone has done any perf tests around the effect calling EF Cores SaveChangesAsync() has on performance if there are no changes to be saved.
Essentially I am assuming it's basically nothing and therefore isn't a big deal to call it "just in case"?
(I am trying to do something with tracking user activity in middleware in asp net core and essentially on the way out I want to make sure save changes was called to persist the activity to the database. There is a chance that it has already been called on the context depending on the operation of the user and if that's the case I don't want to incur the cost of a second operation when the activity could be persisted as part of the normal transaction/round trip)
As you can see in implementation if there are no changes, nothing will be done. As far it has impact to performance, I don't know. But of course calling SaveChanges or SaveChangesAsync without any changes will have a performance impact in relation to don't call them.
That's the same behavior like EF6 has too.
Related
I'm writing my first 'serious' Node/Express application, and I'm becoming concerned about the number of O(n) and O(n^2) operations I'm performing on every request. The application is a blog engine, which indexes and serves up articles stored in markdown format in the file system. The contents of the articles folder do not change frequently, as the app is scaled for a personal blog, but I would still like to be able to add a file to that folder whenever I want, and have the app include it without further intervention.
Operations I'm concerned about
When /index is requested, my route is iterating over all files in the directory and storing them as objects
When a "tag page" is requested (/tag/foo) I'm iterating over all the articles, and then iterating over their arrays of tags to determine which articles to present in an index format
Now, I know that this is probably premature optimisation as the performance is still satisfactory over <200 files, but definitely not lightning fast. And I also know that in production, measures like this wouldn't be considered necessary/worthwhile unless backed by significant benchmarking results. But as this is purely a learning exercise/demonstration of ability, and as I'm (perhaps excessively) concerned about learning optimal habits and patterns, I worry I'm committing some kind of sin here.
Measures I have considered
I get the impression that a database might be a more typical solution, rather than filesystem I/O. But this would mean monitoring the directory for changes and processing/adding new articles to the database, a whole separate operation/functionality. If I did this, would it make sense to be watching that folder for changes even when a request isn't coming in? Or would it be better to check the freshness of the database, then retrieve results from the database? I also don't know how much this helps ultimately, as database calls are still async/slower than internal state, aren't they? Or would a database query, e.g. articles where tags contain x be O(1) rather than O(n)? If so, that would clearly be ideal.
Also, I am beginning to learn about techniques/patterns for caching results, e.g. a property on the function containing the previous result, which could be checked for and served up without performing the operation. But I'd need to check if the folder had new files added to know if it was OK to serve up the cached version, right? But more fundamentally (and this is the essential newbie query at hand) is it considered OK to do this? Everyone talks about how node apps should be stateless, and this would amount to maintaining state, right? Once again, I'm still a fairly raw beginner, and so reading the source of mature apps isn't always as enlightening to me as I wish it was.
Also have I fundamentally misunderstood how routes work in node/express? If I store a variable in index.js, are all the variables/objects created by it destroyed when the route is done and the page is served? If so I apologise profusely for my ignorance, as that would negate basically everything discussed, and make maintaining an external database (or just continuing to redo the file I/O) the only solution.
First off, the request and response objects that are part of each request last only for the duration of a given request and are not shared by other requests. They will be garbage collected as soon as they are no longer in use.
But, module-scoped variables in any of your Express modules last for the duration of the server. So, you can load some information in one request, store it in a module-level variable and that information will still be there when the next request comes along.
Since multiple requests can be "in-flight" at the same time if you are using any async operations in your request handlers, then if you are sharing/updating information between requests you have to make sure you have atomic updates so that the data is shared safely. In node.js, this is much simpler than in a multi-threaded response handler web server, but there still can be issues if you're doing part of an update to a shared object, then doing some async operation, then doing the rest of an update to a shared object. When you do an async operation, another request could run and see the shared object.
When not doing an async operation, your Javascript code is single threaded so other requests won't interleave until you go async.
It sounds like you want to cache your parsed state into a simple in-memory Javascript structure and then intelligently update this cache of information when new articles are added.
Since you already have the code to parse your set of files and tags into in-memory Javascript variables, you can just keep that code. You will want to package that into a separate function that you can call at any time and it will return a newly updated state.
Then, you want to call it when your server starts and that will establish the initial state.
All your routes can be changed to operate on the cached state and this should speed them up tremendously.
Then, all you need is a scheme to decide when to update the cached state (e.g. when something in the file system changed). There are lots of options and which to use depends a little bit on how often things will change and how often the changes need to get reflected to the outside world. Here are some options:
You could register a file system watcher for a particular directory of your file system and when it triggers, you figure out what has changed and update your cache. You can make the update function as dumb (just start over and parse everything from scratch) or as smart (figure out what one item changed and update only that part of the cache) as it is worth doing. I'd suggest you start simple and only invest more in it when you're sure that effort is needed.
You could just manually rebuild the cache once every hour. Updates would take an average of 30 minutes to show, but this would take 10 seconds to implement.
You could create an admin function in your server to instruct the server to update its cache now. This might be combined with option 2, so that if you added new content, it would automatically show within an hour, but if you wanted it to show immediately, you could hit the admin page to tell it to update its cache.
My iPhone app uses core data and things are fine for most part. But here is a problem:
after a certain amount of data, it stalls at first time execution (where core data entities must be loaded).
Some experimenting showed that things are OK up to a certain amount of data loaded in Core Data at start.
If I go over a critical amount the installation starts failing. The bigger the amount of data for start, the higher the probability that it fails.
By making separate tests I made sure the data themselves are not faulty.
I also can say this problem does not appear in the simulator.
It also does not happen when I connect the debugger to the device.
It looks like too much data loaded in core data in a short amount of time creates some kind of overload.
Is that true? Any idea on a possible solution?
At this point I made up a partial solution using a UIActionSheet object to kill some time (asking the user to push a button). But this is not very satisfactory, though for the time being it works.
Any comment or advice for a better way would be appreciated.
It is not quite clear what do you mean by "it fails".
However if you are using SQLite, by loading into CoreData, if you mean "create and save" entities at start up to populate CoreData, then remember to not call [managedObjectContext save...] only at the end especially with large amount of data, but create and save a reasonable set of NSManagedObject.
Otherwise, if you mean you have large amount of data that are retrieved as NSManagedObject, probably loaded into a UITableView consider using some kind of NSOperation for asynchronous loading.
If those two cases doesn't apply to you just tell us the error you are getting, or what you mean by "fails" os "stalls".
I'm returning A LOT (500k+) documents from a MongoDB collection in Node.js. It's not for display on a website, but rather for data some number crunching. If I grab ALL of those documents, the system freezes. Is there a better way to grab it all?
I'm thinking pagination might work?
Edit: This is already outside the main node.js server event loop, so "the system freezes" does not mean "incoming requests are not being processed"
After learning more about your situation, I have some ideas:
Do as much as you can in a Map/Reduce function in Mongo - perhaps if you throw less data at Node that might be the solution.
Perhaps this much data is eating all your memory on your system. Your "freeze" could be V8 stopping the system to do a garbage collection (see this SO question). You could Use V8 flag --trace-gc to log GCs & prove this hypothesis. (thanks to another SO answer about V8 and Garbage collection
Pagination, like you suggested may help. Perhaps even splitting up your data even further into worker queues (create one worker task with references to records 1-10, another with references to records 11-20, etc). Depending on your calculation
Perhaps pre-processing your data - ie: somehow returning much smaller data for each record. Or not using an ORM for this particular calculation, if you're using one now. Making sure each record has only the data you need in it means less data to transfer and less memory your app needs.
I would put your big fetch+process task on a worker queue, background process, or forking mechanism (there are a lot of different options here).
That way you do your calculations outside of your main event loop and keep that free to process other requests. While you should be doing your Mongo lookup in a callback, the calculations themselves may take up time, thus "freezing" node - you're not giving it a break to process other requests.
Since you don't need them all at the same time (that's what I've deduced from you asking about pagination), perhaps it's better to separate those 500k stuff into smaller chunks to be processed at the nextTick?
You could also use something like Kue to queue the chunks and process them later (thus not everything in the same time).
I have entities that are managed by Core Data and have several cases where, within a single method, I set some attribute values that will result graph changes that Core Data will enforce and perform additional actions that (logically) depend on uptodate state for the graph.
Is there any reason not to call processPendingChanges after each time a relationship is set, to ensure that the graph is always fully uptodate? Everything works as it should when I do this, but, clearly, it's a bit "noisy", and breaks up some processing that would otherwise be notifications (e.g, fetched results controllers that end up sending lots of controllerWillChangeContent/controllerDidChangeContent to their deligates when one would otherwise have happened).
ADDITION:
Will the graph always be up-to-date after a return from any method that makes changes to an entity?
I found it the hard way that you should call processPendingChanges before inspecting deletedObjects of NSManagedObjectContext. At least if some relationships have deleteRule set to NSCascadeDeleteRule.
If you don't call processPendingChanges then deletedObjects may not contain objects that will be deleted by cascade at the end of current event.
processPendingChanges is most often used on iOS with multiple context operating on seperate threads. It plays a bigger and more common role under MacOS.
You usually don't have to call it under iOS in most circumstances. Doing so doesn't really give you much of an advantage and it can cause lags in the UI when executed on the main thread if you have a complex graph.
I wouldn't bother with it unless testing reveals you are loosing graph integrity for some reason.
Right now whenever I need to access my data set size (and it can be quite frequently), I perform a countForFetchRequest on the managedObjectContext. Is this a bad thing to do? Should I manage the count locally instead? The reason I went this route is to ensure I am getting 100% correct answer. With Core Data being accessed from more than one places (for example, through NSFetchedResultsController as well), it's hard to keep an accurate count locally.
-countForFetchRequest: is always evaluated in the persistent store. When using the Sqlite store, this will result in IO being performed.
Suggested strategy:
Cache the count returned from -countForFetchRequest:.
Observe NSManagedObjectContextObjectsDidChangeNotification for your own context.
Observe NSManagedObjectContextDidSaveNotification for related contexts.
For the simple case (no fetch predicate) you can update the count from the information contained in the notification without additional IO.
Alternately, you can invalidate your cached count and refresh via -countForFetchRequest: as necessary.