Are there greenDAO thread safety best practices?

I'm having a go with greenDAO and so far it's going pretty well. One thing that doesn't seem to be covered by the docs or website (or anywhere :( ) is how it handles thread safety.
I know the basics mentioned elsewhere, like "use a single DAO session" (general practice for Android + SQLite), and I understand the Java memory model quite well. The library internals even appear thread-safe, or at least built with that intention. But nothing I've seen covers this:
greenDAO caches entities by default. This is excellent for a completely single-threaded program - transparent and a massive performance boost for most uses. But if I e.g. loadAll() and then modify one of the elements, I'm modifying the same object globally across my app. If I'm using it on the main thread (e.g. for display), and updating the DB on a background thread (as is right and proper), there are obvious threading problems unless extra care is taken.
Does greenDAO do anything "under the hood" to protect against common application-level threading problems? For example, modifying a cached entity in the UI thread while saving it in a background thread (better hope they don't interleave! especially when modifying a list!)? Are there any "best practices" to protect against them, beyond general thread safety concerns (i.e. something that greenDAO expects and works well with)? Or is the whole cache fatally flawed from a multithreaded-application safety standpoint?
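Concretely, the hazard I mean looks like this (a sketch with a hypothetical generated entity called Note):
// Both threads end up holding the SAME cached Note instance,
// because greenDAO's identity scope returns cached objects.
List<Note> notes = daoSession.getNoteDao().loadAll();
final Note note = notes.get(0);

// UI thread: mutate the cached instance, e.g. from an edit screen.
note.setTitle("edited title");

// Background thread: persist it. Which field values actually get
// written depends entirely on how the two threads interleave.
new Thread(new Runnable() {
    public void run() {
        daoSession.getNoteDao().update(note);
    }
}).start();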

I've no experience with greenDAO but the documentation here:
http://greendao-orm.com/documentation/queries/
Says:
If you use queries in multiple threads, you must call forCurrentThread() on the query to get a Query instance for the current thread. Starting with greenDAO 1.3, object instances of Query are bound to the owning thread that built the query. This lets you safely set parameters on the Query object while other threads cannot interfere. If other threads try to set parameters on the query or execute the query bound to another thread, an exception will be thrown. Like this, you don’t need a synchronized statement. In fact you should avoid locking because this may lead to deadlocks if concurrent transactions use the same Query object.
To avoid those potential deadlocks completely, greenDAO 1.3 introduced the method forCurrentThread(). This will return a thread-local instance of the Query, which is safe to use in the current thread. Every time forCurrentThread() is called, the parameters are set to the initial parameters at the time the query was built using its builder.
So far as I can see, the documentation doesn't explicitly say anything about multithreading beyond this, but it seems pretty clear that the case is handled. This passage is talking about multiple threads using the same Query object, so clearly multiple threads can access the same database. Certainly it's normal for databases and DAOs to handle concurrent access, and there are plenty of proven techniques for working with caches in this situation.
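In code, the forCurrentThread() pattern looks something like this (a sketch; Note and its generated NoteDao are hypothetical):
// Build the query once, e.g. on the thread that owns it.
Query<Note> query = noteDao.queryBuilder()
        .where(NoteDao.Properties.Title.eq("placeholder"))
        .build();

// Any other thread takes a thread-local copy before touching it.
Query<Note> threadQuery = query.forCurrentThread();
threadQuery.setParameter(0, "shopping list");
List<Note> result = threadQuery.list();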

By default, greenDAO caches and returns cached entity instances to improve performance. To prevent this behaviour, you need to call:
daoSession.clear()
to clear all cached instances. Alternatively you can call:
objectDao.detachAll()
to clear cached instances only for the specific DAO object.
You will need to call these methods every time you want to clear the cached instances, so if you want to disable all caching, I recommend calling them in your Session or DAO accessor methods.
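For example, an accessor along these lines (a sketch; NoteDao is a hypothetical generated DAO) hands out the DAO with its identity scope already cleared:
// Callers of this accessor always get fresh instances from the
// database, because the DAO's identity scope is cleared first.
public NoteDao getFreshNoteDao() {
    NoteDao dao = daoSession.getNoteDao();
    dao.detachAll();
    return dao;
}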
Documentation:
http://greenrobot.org/greendao/documentation/sessions/#Clear_the_identity_scope
Discussion: https://github.com/greenrobot/greenDAO/issues/776

Related

Does node-cache use locks

I'm trying to understand if the node-cache package uses locks for the cache object and can't find anything.
I tried to look at the source code and it doesn't look like it, but this answer suggests otherwise with the quote:
So there is Redis and node-cache for memory locks.
This cache is used in a CRUD server and I want to make sure that GET/UPDATE requests will not create a race condition on the data.
I don't see any evidence of locking in the code.
If two requests for the same key, which is not yet in the cache, are made one after the other, it will launch two separate fetch() operations, and whichever request comes back last is the one whose value will remain in the cache. This is probably not normally a problem, but an improved implementation could make only one request for that key and have the second caller just wait for the first request's value, which was already in flight.
Since the cache itself is all in-memory, all access to the cache is synchronous and thus regulated by JavaScript's single-threaded nature. So, the only place concurrency issues could affect things in the cache code itself is when it launches an asynchronous fetch() operation.
There are, of course, race conditions waiting to happen in how one uses the code that accesses the data, just as there are with a database interface, so the calling code has to be careful about how it uses the interface to avoid creating race conditions of its own.
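To make that improved implementation concrete, here is a minimal single-flight sketch (written in Java purely for illustration; node-cache itself is JavaScript, but the idea carries over):
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Concurrent misses on the same key share one in-flight fetch
// instead of racing to overwrite each other.
class SingleFlightCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> entries =
            new ConcurrentHashMap<>();

    CompletableFuture<V> get(K key, Function<K, CompletableFuture<V>> fetch) {
        // computeIfAbsent invokes fetch at most once per missing key;
        // later callers receive the same future and just wait on it.
        return entries.computeIfAbsent(key, fetch);
    }
}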
Unfortunately, no; you can write a unit test to confirm it.
I have written a library to fix that, and also added a read-through method to make it easier to use:
https://github.com/KhanhPham2411/node-cache-async-lock

How best to instantiate execution contexts?

I'm building a library that others (i.e. those uninterested in the internals) could use to pull data from our DBs. Internally, I want a couple of I/O calls to be performed in parallel for performance purposes. The trade-off here is that the client (who, again, might not care much about this whole threading thing) would need to provide an appropriate execution context. Therefore, I provide a suggested execution context in a helper object:
object ThreadPoolHelper {
  val cachedThreadPoolContext: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newCachedThreadPool())
}
The question is (assuming that someday I also provide other options, like, say, a fixed thread pool for the clients to optionally use) am I fine just leaving this (these) as a val? Or am I better off making it lazy? Or a def?
One way or another, lazy is the way to go.
Making them lazy vals would be a good all-purpose choice, as each could be initialized as needed (as they are accessed). Then, you would never initialize more thread pools than are needed. scala.concurrent.ExecutionContext.Implicits.global is an implicit lazy val.
Technically, singleton objects (like ThreadPoolHelper) are lazy by default, so they will not be initialized until they are first accessed. A val would be fine if you only had one ExecutionContext in an object. However, multiple ExecutionContexts as vals in the same object wouldn't make as much sense, because accessing one would initialize them all--which would use more resources than needed.
A def would not make sense, because then you would be creating a new ExecutionContext on each call and throwing it away when done. That could cause a lot of unwanted overhead, and defeat the purpose of having a thread pool in the first place.
Some ExecutionContexts that are a little more custom than yours are singleton objects that extend ExecutionContext and implement their own custom behavior. These would also be lazy.
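For comparison, the on-demand behavior of a lazy val is roughly what the classic holder idiom gives you in plain Java (an illustrative sketch only, not part of the question's library):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class LazyPools {
    private LazyPools() {}

    // The nested class (and hence the pool) is only initialized on
    // the first call to cachedPool(), mirroring a Scala lazy val.
    private static final class CachedHolder {
        static final ExecutorService POOL = Executors.newCachedThreadPool();
    }

    static ExecutorService cachedPool() {
        return CachedHolder.POOL;
    }
}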

Is calling a Lua function (as a callback) from another thread safe enough?

Actually I am using Visual C++ to try to bind Lua functions as callbacks for socket events (in another thread). I initialize the Lua stuff in one thread and the socket is in another thread, so every time the socket sends/receives a message, it will call the Lua function, and the Lua function determines what it should do according to the 'tag' within the message.
So my questions are:
Since I pass the same Lua state to the Lua functions, is that safe? Doesn't it need some kind of protection? The Lua functions are called from another thread, so I guess they might be called simultaneously.
If it is not safe, what's the solution for this case?
It is not safe to call back asynchronously into a Lua state.
There are many approaches to dealing with this. The most popular involve some kind of polling.
A recent generic synchronization library is DarkSideSync
A popular Lua binding to libev is lua-ev
This SO answer recommends Lua Lanes with LuaSocket.
It is not safe to call functions within one Lua state simultaneously from multiple threads.
I was dealing with the same problem, since in my application all the basics such as communication are handled by C++ and all the business logic is implemented in Lua. What I do is create a pool of Lua states that are created and initialised on an incremental basis (when there aren't enough states, create one and initialise it with the common functions/objects). It works like this:
Once a connection thread needs to call a Lua function, it checks out an instance of Lua state, initialises specific globals (I call it a thread / connection context) in a separate (proxy) global table that prevents polluting the original global, but is indexed by the original global
Call a Lua function
Check the Lua state back in to the pool, where it is restored to the "ready" state (dispose of the proxy global table)
I think this approach would be well suited to your case as well. The pool periodically checks when each state was last checked out. When the time difference is big enough, it destroys the state to preserve resources, adjusting the number of active states to the current server load. The state that is checked out is the most recently used among the available states.
There are some things you need to consider when implementing such a pool:
Each state needs to be populated with the same variables and global functions, which increases memory consumption.
Implementing an upper limit for state count in the pool
Ensuring all the globals in each state are in a consistent state, if they happen to change (here I would recommend prepopulating only static globals, while populating dynamic ones when checking out a state)
Dynamic loading of functions. In my case there are many thousands of functions / procedures that can be called in Lua. Having them constantly loaded in all states would be a huge waste. So instead I keep them byte code compiled on the C++ side and have them loaded when needed. It turns out not to impact performance that much in my case, but your mileage may vary. One thing to keep in mind is to load them only once. Say you invoke a script that needs to call another dynamically loaded function in a loop. Then you should load the function as a local once before the loop. Doing it otherwise would be a huge performance hit.
Of course this is just one idea, but one that turned out to be best suited for me.
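The check-out/check-in mechanics might be sketched generically like this (in Java for illustration; the original is C++ around lua_State*, and S stands for whatever type wraps one state):
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal pool skeleton: most-recently-used states are reused first,
// and the pool grows on demand when no idle state is available.
class StatePool<S> {
    private final Deque<S> idle = new ArrayDeque<>();
    private final Supplier<S> create; // builds and initialises a fresh state

    StatePool(Supplier<S> create) {
        this.create = create;
    }

    synchronized S checkOut() {
        S s = idle.pollFirst();
        return s != null ? s : create.get();
    }

    synchronized void checkIn(S s) {
        // Caller restores the state (e.g. drops the proxy global table)
        // before returning it here.
        idle.addFirst(s);
    }
    // Omitted, per the considerations above: the upper limit on pool
    // size and timed destruction of states that stay idle too long.
}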
It's not safe, as the others mentioned.
It depends on your use case.
Simplest solution is using a global lock using the lua_lock and lua_unlock macros. That would use a single Lua state, locked by a single mutex. For a low number of callbacks it might suffice, but for higher traffic it probably won't due to the overhead incurred.
Once you need better performance, the Lua state pool mentioned by W.B. is a nice way to handle this. The trickiest part here, I find, is synchronizing the global data across the multiple states.
DarkSideSync, mentioned by Doug, is useful in cases where the main application loop resides on the Lua side. I specifically wrote it for that purpose. In your case this doesn't seem to fit. Having said that, depending on your needs, you might consider changing your application so the main loop does reside on the Lua side. If you only handle sockets, then you can use LuaSocket and no synchronization is required at all. But obviously that depends on what else the application does.

Are Views in Domino Thread Safe?

As I understand it, if I open a view from a database using db.getView(), there's no point in doing this multiple times from different threads.
But suppose I have multiple threads searching the View using getAllDocumentsByKey(). Is it safe to do so and iterate over the DocumentCollections in parallel?
Also, Document.recycle() messes with the DocumentCollection; will two threads interfere with each other if they search for the same value and end up with the same documents in their collections?
Note: I'm just starting to research this in depth, but thought it'd be a good thing to have documented here, and maybe I'll get lucky and someone will have the answer.
The Domino Java API doesn't really like sharing objects across threads. If you recycle() one view in one thread, it will delete the backend JNI references for all objects that referenced that view.
So you will find your other threads are then broken.
Bob Balaban did a really good series of articles on how the Java API and recycling work. Here is a link to part of it.
http://www.bobzblog.com/tuxedoguy.nsf/dx/geek-o-terica-5-taking-out-the-garbage-java?opendocument&comments
Each thread will have its own copy of a DocumentCollection object returned by the getAllDocumentsByKey() method, so there won't be any threading issues. The recycle() method will free up memory on your object, not the Document itself, so again there wouldn't be any threading issues either.
Probably the most likely issue you'll have is if you delete a document in the collection in one thread, and then later try to access the document in another. You'll get a "document has been deleted" error. You'll have to prepare for those types of errors and handle them gracefully.
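As a sketch of the per-thread pattern this implies (each thread gets its own collection from the shared view; note also that threads using the Domino Java API need NotesThread.sinitThread()/stermThread() around their lifetime):
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;
import lotus.domino.View;

// Each worker thread runs this with its own DocumentCollection and
// recycles only the Documents it created, never the shared View.
static void scanByKey(View view, String key) throws NotesException {
    DocumentCollection dc = view.getAllDocumentsByKey(key, true);
    Document doc = dc.getFirstDocument();
    while (doc != null) {
        // ... read item values from doc here ...
        Document next = dc.getNextDocument(doc);
        doc.recycle(); // frees this Document's backend JNI handle only
        doc = next;
    }
    dc.recycle();
}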

Resolving an NSManagedObject conflict with multiple threads, relationships, and pointers

I'm having a conflict when saving a bunch of NSManagedObjects via an outside thread. For starters, I can tell you the following:
I'm using a separate MOC for each thread.
The MOCs share the same persistent store coordinator.
It's likely that an outside thread is modifying one or many of the records that I'm saving.
OK, so with that out of the way, here's what I'm doing.
In my outside thread, I'm doing some computation and updating a single value in a bunch of managed objects. I do this by looking up the object in the persistent store by my primary key, modifying the single decimal property, and then calling save on the bunch all at once.
In the meantime, I believe the main thread is doing some updating of its own.
When my outside thread does its big save on its managed object context, I get an exception thrown stating a large number of conflicts. All of the conflicts seem to be centered around a single relationship on each record. Though the managed object in the persistent store and my outside thread share the same ObjectID for this relationship, they don't share the same pointer. Based on what I see, that's the only thing that's different between the objects in my NSMergeConflict debug output.
It makes sense to me why the two objects have relationships with different pointers -- they're in different threads. However, as I understand it from Apple's documentation, the only thing cached when an object is first retrieved from the persistent store are the global IDs. So, one would think that when I run save on the outside thread MOC, it compares the ObjectIDs, sees they're the same, and lets it all through.
So, can anyone tell me why I'm getting a conflict?
Per the documentation in the Concurrency with Core Data chapter of The Core Data Programming Guide, the recommended configuration is for the contexts to share the same persistent store coordinator, not just the same persistent store.
Also, the section Track Changes in Other Threads Using Notifications of the same chapter states that if you're tracking updates with NSManagedObjectContextDidSaveNotification, then you send -mergeChangesFromContextDidSaveNotification to the main thread's context so it can merge the changes. But if you're tracking with NSManagedObjectContextDidChangeNotification, then the external thread should send the object IDs of the modified objects to the main thread, which will then send -refreshObject:mergeChanges: to its context for each modified object.
And really, you should know if the main thread is also performing updates through its controller, and propagate its changes in like manner but in the opposite direction.
You need to have all your contexts listening for NSManagedObjectContextDidSaveNotification from any context that makes changes. Otherwise, only the front context will be aware of changes made on the background threads, but the background contexts won't be aware of changes on the front thread.
So, if you have three threads and three contexts, each of which makes changes, all three contexts must register for notifications from the other two.
Unfortunately, it seems as though this bug was actually being caused by something else -- I was calling the operation causing the error more than once at the same time when I shouldn't have been. Although this doesn't answer the initial question as to why pointers matter in conflicts, updating my code to prevent this situation has resolved my issue.
