Suppose we have a worker thread which is accessed from a wide variety of objects. This worker object has some public slots, so anyone who connects their signals to the worker's slots can use emit to trigger the worker thread's useful tasks.
This worker thread needs to be almost global, in the sense that several different classes use it, some of them deep in the hierarchy (a child of a child of a child of the main application).
I guess there are two major ways of doing this:
1. All the methods of the child classes pass their messages up the hierarchy via their return values, and the main (e.g. the GUI) object handles all the emitting.
2. All the classes that require the services of the worker thread hold a pointer to the Worker object (which is a member of the main class), and they all connect() to it in their constructors. Every such class then does its own emitting. Basically, dependency injection.
Option 2 seems much cleaner and more flexible to me; I'm only worried that it will create a huge number of connections. For example, if I have an array of objects that need the thread, I will have a separate connection for each element of the array.
Is there an "official" way of doing this, as the creators of Qt intended it?
There is no magic silver bullet for this. You'll need to consider many factors, such as:
- Why do those objects emit the data in the first place? Is it because they need to do something, that is, the emission is a “command”? Then maybe they could call some sort of service to do the job, without even worrying about whether it's going to happen in another thread or not. Or is it because they inform about an event? In that case they should probably just emit signals but not connect them. It's up to the using code to decide what to do with events.
- How many objects are we talking about? Some performance tests are needed. Maybe it's not even an issue.
- If there is an array of objects, what purpose does it serve? Perhaps instead of a plain array some sort of “container” class is needed? Then the container could handle the connection and the emitting, and the objects could just do something like container()->handle(data). That way you'd only have one connection per container, as sketched below.
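Sketching that last point: the question is about Qt/C++, but the fan-in shape is language-neutral. Here it is in minimal Java, with a hypothetical WorkerClient interface standing in for the single connection to the worker thread:
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the one connection to the worker thread.
interface WorkerClient {
    void submit(String data);
}

// The container owns the single "connection"; elements never connect themselves.
class ElementContainer {
    private final WorkerClient worker;
    private final List<Element> elements = new ArrayList<>();

    ElementContainer(WorkerClient worker) { this.worker = worker; }

    Element newElement() {
        Element e = new Element(this);
        elements.add(e);
        return e;
    }

    // Single funnel: every element's request passes through here.
    void handle(String data) { worker.submit(data); }
}

class Element {
    private final ElementContainer container;
    Element(ElementContainer container) { this.container = container; }

    void doSomething(String data) {
        container.handle(data); // no per-element connection needed
    }
}
However many elements the container holds, there is exactly one route into the worker, so the connection count stays constant.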
I'm having a go with greenDAO and so far it's going pretty well. One thing that doesn't seem to be covered by the docs or website (or anywhere :( ) is how it handles thread safety.
I know the basics mentioned elsewhere, like "use a single dao session" (general practice for Android + SQLite), and I understand the Java memory model quite well. The library internals even appear threadsafe, or at least built with that intention. But nothing I've seen covers this:
greenDAO caches entities by default. This is excellent for a completely single-threaded program - transparent and a massive performance boost for most uses. But if I e.g. loadAll() and then modify one of the elements, I'm modifying the same object globally across my app. If I'm using it on the main thread (e.g. for display), and updating the DB on a background thread (as is right and proper), there are obvious threading problems unless extra care is taken.
Does greenDAO do anything "under the hood" to protect against common application-level threading problems? For example, modifying a cached entity in the UI thread while saving it in a background thread (better hope they don't interleave! especially when modifying a list!)? Are there any "best practices" to protect against them, beyond general thread safety concerns (i.e. something that greenDAO expects and works well with)? Or is the whole cache fatally flawed from a multithreaded-application safety standpoint?
I've no experience with greenDAO, but the documentation at
http://greendao-orm.com/documentation/queries/
says:
If you use queries in multiple threads, you must call forCurrentThread() on the query to get a Query instance for the current thread. Starting with greenDAO 1.3, object instances of Query are bound to the owning thread that built the query. This lets you safely set parameters on the Query object while other threads cannot interfere. If other threads try to set parameters on the query or execute the query bound to another thread, an exception will be thrown. Like this, you don’t need a synchronized statement. In fact you should avoid locking because this may lead to deadlocks if concurrent transactions use the same Query object.
To avoid those potential deadlocks completely, greenDAO 1.3 introduced the method forCurrentThread(). This will return a thread-local instance of the Query, which is safe to use in the current thread. Every time forCurrentThread() is called, the parameters are set to the initial parameters at the time the query was built using its builder.
As far as I can see, the documentation doesn't explicitly say anything else about multithreading, but this passage makes it pretty clear that it is handled. It talks about multiple threads using the same Query object, so clearly multiple threads can access the same database. It is certainly normal for databases and DAOs to handle concurrent access, and there are a lot of proven techniques for working with caches in this situation.
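To make the quoted behaviour concrete, here is a minimal sketch of the forCurrentThread() pattern. Note, NoteDao and the Title property are hypothetical entity names; queryBuilder(), forCurrentThread(), setParameter() and list() are the calls the documentation describes:
// Assumes a generated Note entity and its NoteDao (hypothetical names),
// plus org.greenrobot.greendao.query.Query and java.util.List imports.

// Built once, e.g. on the thread that sets up the DAO layer:
Query<Note> query = noteDao.queryBuilder()
        .where(NoteDao.Properties.Title.eq("placeholder"))
        .build();

// Later, on any other thread; never reuse `query` directly there:
new Thread(() -> {
    Query<Note> threadQuery = query.forCurrentThread(); // thread-local copy, parameters reset
    threadQuery.setParameter(0, "actual title");        // safe: other threads cannot interfere
    List<Note> notes = threadQuery.list();
    // ... use notes ...
}).start();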
By default, greenDAO caches and returns cached entity instances to improve performance. To prevent this behaviour, you need to call:
daoSession.clear()
to clear all cached instances. Alternatively you can call:
objectDao.detachAll()
to clear cached instances only for the specific DAO object.
You will need to call these methods every time you want to clear the cached instances, so if you want to disable all caching, I recommend calling them in your Session or DAO accessor methods.
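As a sketch, the accessor-method suggestion could look like this; the daoSession and noteDao fields are hypothetical, while clear() and detachAll() are the documented greenDAO calls:
// Hypothetical accessors that drop cached instances on every access,
// effectively disabling the identity scope for callers that use them.
public DaoSession getFreshSession() {
    daoSession.clear();   // drop every cached entity instance in the session
    return daoSession;
}

public NoteDao getFreshNoteDao() {
    noteDao.detachAll();  // drop cached instances for this DAO only
    return noteDao;
}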
Documentation:
http://greenrobot.org/greendao/documentation/sessions/#Clear_the_identity_scope
Discussion: https://github.com/greenrobot/greenDAO/issues/776
We want to be sure we're using the correct synchronization (and no more than necessary) when writing thread-safe code in JRuby; specifically, in a Puma-instantiated Rails app.
UPDATE: Extensively re-edited this question to be very clear and to use the latest code we are implementing. This code uses the atomic gem written by @headius (Charles Nutter) for JRuby, but we're not sure whether it is totally necessary, or in which ways it's necessary, for what we're trying to do here.
Here's what we've got; is this overkill (meaning, are we over/uber-engineering this), or perhaps incorrect?
ourgem.rb:
require 'atomic' # gem from @headius

SUPPORTED_SERVICES = %w(serviceABC anotherSvc andSoOnSvc).freeze

module Foo
  def self.included(cls)
    cls.extend(ClassMethods)
    cls.send :__setup
  end

  module ClassMethods
    def get(service_name, method_name, *args)
      __cached_client(service_name).send(method_name.to_sym, *args)
      # we also capture exceptions here, but leaving those out for brevity
    end

    private

    def __client(service_name)
      # obtain and return a client handle for the given service_name
      # we definitely want to cache the value returned from this method
      # **AND**
      # it is a requirement that this method ONLY be called *once PER service_name*.
    end

    def __cached_client(service_name)
      @@_clients.value[service_name]
    end

    def __setup
      @@_clients = Atomic.new({})
      @@_clients.update do |current_services|
        SUPPORTED_SERVICES.inject(Atomic.new({}).value) do |memo, service_name|
          if current_services[service_name]
            current_services[service_name]
          else
            memo.merge({service_name => __client(service_name)})
          end
        end
      end
    end
  end
end
client.rb:
require 'ourgem'

class GetStuffFromServiceABC
  include Foo

  def self.get_some_stuff
    result = get('serviceABC', 'method_bar', 'arg1', 'arg2', 'arg3')
    puts result
  end
end
Summary of the above: we have @@_clients (a mutable class variable holding a Hash of clients) which we only want to populate ONCE for all available services, which are keyed on service_name.
Since the hash is in a class variable (and hence threadsafe?), are we guaranteed that the call to __client will not get run more than once per service name (even if Puma is instantiating multiple threads with this class to service all the requests from different users)? If the class variable is threadsafe (in that way), then perhaps the Atomic.new({}) is unnecessary?
Also, should we be using an Atomic.new(ThreadSafe::Hash) instead? Or again, is that not necessary?
If not (meaning: you think we do need at least the Atomic.new calls, and perhaps also the ThreadSafe::Hash), then why couldn't a second (or third, etc.) thread interrupt between the Atomic.new(nil) and the @@_clients.update do ..., meaning the Atomic.new calls from EACH thread will EACH create two (separate) objects?
Thanks for any thread-safety advice; we haven't seen any questions on SO that directly address this issue.
Just a friendly piece of advice, before I attempt to tackle the issues you raise here:
This question, and the accompanying code, strongly suggests that you don't (yet) have a solid grasp of the issues involved in writing multi-threaded code. I encourage you to think twice before deciding to write a multi-threaded app for production use. Why do you actually want to use Puma? Is it for performance? Will your app handle many long-running, I/O-bound requests (like uploading/downloading large files) at the same time? Or (like many apps) will it primarily handle short, CPU-bound requests?
If the answer is "short/CPU-bound", then you have little to gain from using Puma. Multiple single-threaded server processes would be better. Memory consumption will be higher, but you will keep your sanity. Writing correct multi-threaded code is devilishly hard, and even experts make mistakes. If your business success, job security, etc. depends on that multi-threaded code working and working right, you are going to cause yourself a lot of unnecessary pain and mental anguish.
That aside, let me try to unravel some of the issues raised in your question. There is so much to say that it's hard to know where to start. You may want to pour yourself a cold or hot beverage of your choice before sitting down to read this treatise:
When you talk about writing "thread-safe" code, you need to be clear about what you mean. In most cases, "thread-safe" code means code which doesn't concurrently modify mutable data in a way which could cause data corruption. (What a mouthful!) That could mean that the code doesn't allow concurrent modification of mutable data at all (using locks), or that it does allow concurrent modification, but makes sure that it doesn't corrupt data (probably using atomic operations and a touch of black magic).
Note that when your threads are only reading data, not modifying it, or when working with shared stateless objects, there is no question of "thread safety".
Another definition of "thread-safe", which probably applies better to your situation, has to do with operations which affect the outside world (basically I/O). You may want some operations to only happen once, or to happen in a specific order. If the code which performs those operations runs on multiple threads, they could happen more times than desired, or in a different order than desired, unless you do something to prevent that.
It appears that your __setup method is only called when ourgem.rb is first loaded. As far as I know, even if multiple threads require the same file at the same time, MRI will only ever let a single thread load the file. I don't know whether JRuby is the same. But in any case, if your source files are being loaded more than once, that is symptomatic of a deeper problem. They should only be loaded once, on a single thread. If your app handles requests on multiple threads, those threads should be started up after the application has loaded, not before. This is the only sane way to do things.
Assuming that everything is sane, ourgem.rb will be loaded using a single thread. That means __setup will only ever be called by a single thread. In that case, there is no question of thread safety at all to worry about (as far as initialization of your "client cache" goes).
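As an aside: if you really did need a build-at-most-once-per-key guarantee under genuinely concurrent access, the standard JVM tool (which JRuby can also reach) is ConcurrentHashMap.computeIfAbsent. A minimal Java sketch, with hypothetical names, not your actual client code:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: computeIfAbsent applies the factory at most once
// per key, even when many threads ask for the same key concurrently.
public class ClientRegistry {
    private static final Map<String, Object> CLIENTS = new ConcurrentHashMap<>();

    public static Object clientFor(String serviceName) {
        // losers of the race block until the winner's factory call finishes
        return CLIENTS.computeIfAbsent(serviceName, ClientRegistry::buildClient);
    }

    private static Object buildClient(String serviceName) {
        // stand-in for the expensive, run-once-per-service construction
        return new Object();
    }
}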
Even if __setup were to be called concurrently by multiple threads, your atomic code won't do what you think it does. First of all, you use Atomic.new({}).value. This wraps a Hash in an atomic reference, then unwraps it, so you just get back the Hash. It's a no-op; you could just write {} instead.
Second, your Atomic#update call will not prevent the initialization code from running more than once. To understand this, you need to know what Atomic actually does.
Let me pull out the old, tired "increment a shared counter" example. Imagine the following code is running on 2 threads:
i += 1
We all know what can go wrong here. You may end up with the following sequence of events:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A writes its incremented value back to i.
Thread B writes its incremented value back to i.
So we lose an update, right? But what if we store the counter value in an atomic reference, and use Atomic#update? Then it would be like this:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A tries to write its incremented value back to i, and succeeds.
Thread B tries to write its incremented value back to i, and fails, because the value has already changed.
Thread B reads i again and increments it.
Thread B tries to write its incremented value back to i again, and succeeds this time.
Do you get the idea? Atomic never stops 2 threads from running the same code at the same time. What it does do, is force some threads to retry the #update block when necessary, to avoid lost updates.
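That retry loop is classic compare-and-swap. A minimal Java sketch of the same mechanism using AtomicReference (Ruby's Atomic#update behaves essentially like this; a real counter would just use AtomicInteger):
import java.util.concurrent.atomic.AtomicReference;

public class CasCounter {
    private final AtomicReference<Integer> i = new AtomicReference<>(0);

    public void increment() {
        while (true) {
            Integer current = i.get();           // read the current value
            Integer next = current + 1;          // compute; several threads may do this at once
            if (i.compareAndSet(current, next)) {
                return;                          // nobody changed i in the meantime: success
            }
            // CAS failed: another thread won the race, so loop and recompute
        }
    }
}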
If your goal is to ensure that your initialization code only ever runs once, using Atomic is a very inappropriate choice. If anything, it could make the code run more times, rather than fewer (due to retries).
So, that is that. But if you're still with me here, I am actually more concerned about whether your "client" objects are themselves thread-safe. Do they have any mutable state? Since you are caching them, it seems that initializing them must be slow. Be that as it may, if you use locks to make them thread-safe, you may not be gaining anything from caching and sharing them between threads. Your "multi-threaded" server may be reduced to what is effectively an unnecessarily complicated, single-threaded server.
If the client objects have no mutable state, good for you. You can be "free and easy" and share them between threads with no problems. If they do have mutable state, but initializing them is slow, then I would recommend caching one object per thread, so they are never shared. Thread#[] (thread-local storage) is your friend there.
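For illustration, a minimal Java analogue of that per-thread cache (Ruby's Thread#[] corresponds roughly to Java's ThreadLocal; the Client class here is hypothetical):
// One slow-to-build, mutable client per thread; never shared, so no locks needed.
public class PerThreadClients {
    private static final ThreadLocal<Client> CLIENT =
            ThreadLocal.withInitial(Client::new); // factory runs lazily, once per thread

    public static Client get() {
        return CLIENT.get();
    }
}

// Hypothetical stateful client that is expensive to initialize.
class Client {
    Client() { /* slow setup here */ }
}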
In my Play application I have several Jobs, and I have a singleton class.
What I would like to do is for each Job to store data in the singleton class and I would like to be able to retrieve from this singleton class the data that corresponds to the current Job via yet another class.
In other words I would like to have something like this:
- Job 1 stores "Job1Data" in the singleton class
- Job 2 stores "Job2Data" in the singleton class
- Another class asks the singleton class for the data of the currently executing Job (in the current thread, I guess) and uses it
To achieve this I assumed each Job runs on a different thread. What I did then is store the data from each Job in the singleton class in a Map that maps the current thread id to the data.
However, I'm not sure this is the way I should do it, because it may not be thread-safe (although Hashtable is said to be thread-safe), and maybe a new thread is created each time the Job executes, which would make my Map grow a lot and never clear itself.
I thought of another way to do what I want. Maybe I could use the ThreadLocal class in my singleton to be sure it's thread-safe and that I store thread-specific data. However, I don't know if it will work well if a different thread is used each time a Job executes. Furthermore, I read somewhere that ThreadLocal creates memory leaks if the data is not removed, and the problem is that I don't know when I can remove the data.
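For what it's worth, the usual way around that ThreadLocal leak is to call remove() in a finally block when the unit of work ends, so the value never outlives the Job even on a pooled thread. A minimal sketch with hypothetical names:
// Hypothetical per-job holder; remove() in finally is what prevents the leak
// when the framework reuses pooled threads for later Jobs.
public class JobContext {
    private static final ThreadLocal<String> DATA = new ThreadLocal<>();

    public static void set(String data) { DATA.set(data); }
    public static String get() { return DATA.get(); }
    public static void clear() { DATA.remove(); }
}

// Inside a Job's doJob():
// JobContext.set("Job1Data");
// try {
//     // ... work; any class called from here can read JobContext.get() ...
// } finally {
//     JobContext.clear(); // always runs, so the pooled thread starts clean next time
// }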
So, would anybody have a solution for my issue? I would like to be sure that the data I store during Job execution lives in a global class and can be accessed by another class (with access to the data of the correct Job, and thus the correct thread, I guess).
Thank you for your help
How can you use Core Data with GCD when the methods called on the background thread need many different NSManagedObjects, and you, as the caller, may not know which objects will be needed in the sub-calls?
Think of a complex download, parse and save procedure with many managed objects, helper methods for dates, statuses and so on. When you start your background thread with GCD, a new NSManagedObjectContext will be needed, that's for sure. But you are not able to tell which managed objects will be needed by every sub-method. So, do you need to pass the context to every single helper method, e.g. just to compute an NSDate difference?
Is there an easy approach that doesn't blow up the lines of code?
One approach that could fit the bill of not blowing up the code would be to receive your data on your various background threads and encode it all into dictionaries. If you use JSON as the transfer format, that would be very few lines of code.
Then you could pass it all to a block on the main thread to create the managed objects and insert them into the managed object context. Again, that would not carry much overhead compared to a single-threaded solution.