Spawning a new thread for each object load - multithreading

I have a system which runs multiple service (long-lived) and worker (short-lived) threads. They all share a state which contains objects. Any thread can request an object at any time, through a singleton-of-sorts class called ObjectManager. If the object is not available, it needs to be loaded.
Here's some pseudo-code of how object loading looks now:
class ObjectManager {
    getLoadingData(path) {
        if (hasLoadingDataFor(path))
            return whatWeHave()
        else {
            loadingData = createNewLoadingData();
            loadingData.path = path;
            pushLoadingTaskToLoadingThread(loadingData);
            return loadingData;
        }
    }
    // loads object and blocks until it's loaded
    loadObjectSync(path) {
        loadingData = getLoadingData(path);
        waitFor(loadingData.conditionVar);
        return loadingData.loadedObject;
    }
    // initiates a load and calls a callback when done
    loadObjectAsync(path, callback) {
        loadingData = getLoadingData(path);
        loadingData.callbacks.add(callback);
    }
    // dedicated loading thread
    loadingThread() {
        while (running) {
            loadingData = waitForLoadingData();
            object = readObjectFromDisk(loadingData.path);
            object.onLoaded(); // !!!!
            loadingData.loadedObject = object;
            // unblock cv waiters
            loadingData.conditionVar.notifyAll();
            // call callbacks
            loadingData.callbacks.callAll(object);
        }
    }
}
The problem is the line object.onLoaded(). I have no control over this function. Some objects might decide that they need other objects to be valid, so in their onLoaded method they might call loadObjectSync. Uh-oh! This (naturally) deadlocks: it blocks the loading loop until the loading loop makes more iterations.
What I could do to solve this is leave the onLoaded call to the initiating threads. This will change loadObjectSync to something like:
loadObjectSync(path) {
    loadingData = getLoadingData(path);
    waitFor(loadingData.conditionVar);
    if (loadingData.wasCreatedInThisThread()) {
        loadingData.loadedObject.onLoaded();
        loadingData.onLoadedConditionVar.notifyAll();
        loadingData.callbacks.callAll(loadingData.loadedObject);
    }
    else {
        // wait more
        waitFor(loadingData.onLoadedConditionVar);
    }
    return loadingData.loadedObject;
}
... but then the problem is that if there are no calls to loadObjectSync and only calls to loadObjectAsync, or if a loadObjectAsync call was simply the first to create the loading data, there will be no one to finalize the object. So to make this work, I have to introduce another thread that finalizes objects whose loadingData was created by loadObjectAsync.
It seems that it would work. But I have a simpler idea! What if I change getLoadingData instead. What if it does this:
getLoadingData(path) {
    if (hasLoadingDataFor(path))
        return whatWeHave()
    else {
        loadingData = createNewLoadingData();
        loadingData.path = path;
        ///
        thread = spawnLoadingThread(loadingData);
        thread.detach();
        ///
        return loadingData;
    }
}
Spawn a new thread for every object load. Thus there is no deadlock. Every loading thread can safely block until it's done. The rest of the code remains exactly as it is.
This means potentially tens (or, in certain edge cases, thousands) of active threads waiting on condition variables. I know that spawning threads has its overhead, but I think it would be negligible compared to the cost of the I/O in readObjectFromDisk.
So my question is: Is this terrible? Can this somehow backfire?
The target platform is conventional desktop machines. But this software is supposed to run for a long time without stopping: weeks, maybe months.
Alternatively... even though I have an idea how to solve this if the thread-per-load turns out to be terrible, can this be solved in another way?

Very interesting! This is a problem I have bumped into a couple of times, trying to add a synchronous interface to a fundamentally asynchronous operation (i.e. file load, or in my case, network write) that is performed by a service thread.
My own preference would be to not provide the synchronous interface. Why? Because it keeps the code simpler in design & implementation and easier to reason about -- always important for multi-threading.
The benefit of sticking to a single thread and an async-only interface is that you have only one service thread, so resource growth is not a concern. In addition, user callbacks are always invoked on that same thread, which simplifies thread-safety concerns for users of ObjectManager (with multiple callback threads, every user callback must be thread-safe, so it's an important choice to make). However, an async-only interface does mean the user of ObjectManager has more work to do.
But if you do want to keep the synchronous interface, another approach that I have taken could work for you. Stick to a single service thread, but inside loadObjectSync check the thread ID to determine whether the caller is the service thread or any other thread. If it is any other thread, queue the request and safely block. If it is the service thread, load the object immediately by calling a new function, say loadObjectImpl, which is basically just the body of the loop in loadingThread. You will need to grab the thread ID of the service thread during initialization and store it in the ObjectManager instance for this check.
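The thread-ID check described above could look roughly like the following in standard C++. This is only a sketch under several assumptions: Object, Request, demoOnLoaded and exampleUsage are hypothetical names, the disk read is replaced by a stand-in, caching of already loaded objects is left out, and cyclic dependencies (A needs B needs A) are not detected.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the real object type.
struct Object {
    std::string path;
    std::string dependency;
};

class ObjectManager {
public:
    using OnLoaded = std::function<void(ObjectManager&, Object&)>;

    explicit ObjectManager(OnLoaded onLoaded)
        : onLoaded_(std::move(onLoaded)), service_([this] { serviceLoop(); }) {}

    ~ObjectManager() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        cv_.notify_all();
        service_.join();
    }

    Object loadObjectSync(const std::string& path) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (std::this_thread::get_id() == serviceId_) {
            // Called from onLoaded on the service thread: load inline, never block.
            lock.unlock();
            return loadObjectImpl(path);
        }
        // Any other thread: queue the request and block until it is served.
        Request req;
        req.path = path;
        queue_.push(&req);
        cv_.notify_all();
        cv_.wait(lock, [&req] { return req.done; });
        return req.result;
    }

private:
    struct Request {
        std::string path;
        Object result;
        bool done = false;
    };

    Object loadObjectImpl(const std::string& path) {
        Object obj;
        obj.path = path;                       // stand-in for readObjectFromDisk(path)
        if (onLoaded_) onLoaded_(*this, obj);  // may re-enter loadObjectSync
        return obj;
    }

    void serviceLoop() {
        std::unique_lock<std::mutex> lock(mutex_);
        serviceId_ = std::this_thread::get_id();  // recorded under the lock
        for (;;) {
            cv_.wait(lock, [this] { return !running_ || !queue_.empty(); });
            if (queue_.empty()) return;           // shutting down, nothing left to serve
            Request* req = queue_.front();
            queue_.pop();
            lock.unlock();                        // do not hold the lock while loading
            Object obj = loadObjectImpl(req->path);
            lock.lock();
            req->result = std::move(obj);
            req->done = true;
            cv_.notify_all();
        }
    }

    OnLoaded onLoaded_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Request*> queue_;
    bool running_ = true;
    std::thread::id serviceId_;
    std::thread service_;  // declared last so it starts after the other members exist
};

// Hypothetical onLoaded hook: object "a" needs object "b" to be valid.
inline void demoOnLoaded(ObjectManager& mgr, Object& obj) {
    if (obj.path == "a") obj.dependency = mgr.loadObjectSync("b").path;
}

inline void exampleUsage() {
    ObjectManager mgr(demoOnLoaded);
    Object a = mgr.loadObjectSync("a");  // nested load of "b" runs on the service thread
    assert(a.dependency == "b");
}
```

The service thread records its own ID under the mutex rather than the constructor reading it from the std::thread member, which avoids a race while the thread object is still being constructed.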

Related

QT Multithreading Data Pass from Main Thread to Worker Thread

I am using multithreading in my Qt program. I need to pass data to the worker object that lives in the worker thread from the main GUI thread. I created a setData function in a QObject subclass to pass all the necessary data from the main GUI thread. However, I verified that the function is called from the main thread by looking at QThread::currentThreadId() in the setData function. Even though the worker object function is called from the main thread, does this ensure that the worker thread still has its own copy of the data, as is required for a reentrant class? Keep in mind this is happening before the worker thread is started.
Also, if basic data types are used in a class without dynamic memory and no static global variables, is that class reentrant as long as all of its other member data is reentrant? (It's got reentrant data members like QStrings, QLists etc. in addition to the basic ints, bools etc.)
Thanks for the help
Edited new content:
My main question was simply is it appropriate to call a QObject subclass method living in another thread from the main gui thread in order to pass my data to the worker thread to be worked on (in my case custom classes containing backup job information for long-pending file scans and copies for data backup). The data pass all happens before the thread is started so there's no danger of both threads modifying the data at once (I think but I'm no multithreading expert...) It sounds like the way to do this from your post is to use a signal from the main thread to a slot in the worker thread to pass the data. I have confirmed my data backup jobs are reentrant so all I need to do is assure that the worker thread works on its own instances of these classes. Also the transfer of data currently done by calling the QObject subclass method is done before the worker thread starts - does this prevent race conditions and is it safe?
Also here under the section "Accessing QObject Subclasses from Other Threads" it looks a little dangerous to use slots in the QObject subclass...
OK here's the code I've been busy recently...
Edited With Code:
void Replicator::advancedAllBackup()
{
    updateStatus("<font color = \"green\">Starting All Advanced Backups</font>");
    startBackup();
    worker = new Worker;
    worker->moveToThread(workerThread);
    setupWorker(normal);
    QList<BackupJob> jobList;
    for (int backupCount = 0; backupCount < advancedJobs.size(); backupCount++)
        jobList << advancedJobs[backupCount];
    worker->setData(jobList);
    workerThread->start();
}
The startBackup function sets some booleans and updates the GUI.
The setupWorker function connects all signals and slots for the worker thread and worker object.
The setData function sets the worker's job list to that of the backend and is called before the thread starts, so there is no concurrency.
Then we start the thread and it does its work.
And here's the worker code:
void setData(QList<BackupJob> jobs) { this->jobs = jobs; }
So my question is: is this safe?
There are some misconceptions in your question.
Reentrancy and multithreading are orthogonal concepts. Single-threaded code can be easily forced to cope with reentrancy - and is as soon as you reenter the event loop (thus you shouldn't).
The question you are asking, with correction, is thus: Are the class's methods thread-safe if the data members support multithreaded access? The answer is yes. But it's a mostly useless answer, because you're mistaken that the data types you use support such access. They most likely don't!
In fact, you're very unlikely to use multithread-safe data types unless you explicitly seek them out. POD types aren't, most of the C++ standard types aren't, most Qt types aren't either. Just so that there are no misunderstandings: a QString is not a multithread-safe data type! The following code has undefined behavior (it'll crash, burn and send an email to your spouse that appears to be from an illicit lover):
QString str{"Foo"};
for (int i = 0; i < 1000; ++i)
    QtConcurrent::run([&]{ str.append("bar"); });
The follow up questions could be:
Are my data members supporting multithreaded access? I thought they did.
No, they aren't unless you show code that proves otherwise.
Do I even need to support multithreaded access?
Maybe. But it's much easier to avoid the need for it entirely.
The likely source of your confusion in relation to Qt types is their implicit sharing semantics. Thankfully, their relation to multithreading is rather simple to express:
Any instance of a Qt implicitly shared class can be accessed from any one thread at a given time. Corollary: you need one instance per thread. Copy your object, and use each copy in its own thread - that's perfectly safe. These instances may share data initially, and Qt will make sure that any copy-on-writes are done thread-safely for you.
Sidebar: If you use iterators or internal pointers to data on non-const instances, you must forcibly detach() the object before constructing the iterators/pointers. The problem with iterators is that they become invalidated when an object's data is detached, and detaching can happen in any thread where the instance is non-const - so at least one thread will end up with invalid iterators. I won't talk any more of this, the takeaway is that implicitly shared data types are tricky to implement and use safely. With C++11, there's no need for implicit sharing anymore: they were a workaround for the lack of move semantics in C++98.
What does it mean, then? It means this:
// Unsafe: str1 potentially accessed from two threads at once
QString str1{"foo"};
QtConcurrent::run([&]{ str1.append("bar"); });
str1.append("baz");
// Safe: each instance is accessed from one thread only
QString str1{"foo"};
QString str2{str1};
QtConcurrent::run([&]{ str1.append("bar"); });
str2.append("baz");
The original code can be fixed thus:
QString str{"Foo"};
for (int i = 0; i < 1000; ++i)
    QtConcurrent::run([=]() mutable { str.append("bar"); });
This isn't to say that this code is very useful: the modified data is lost when the functor is destructed within the worker thread. But it serves to illustrate how to deal with Qt value types and multithreading. Here's why it works: copies of str are taken when each instance of the functor is constructed. This functor is then passed to a worker thread to execute, where its copy of the string is appended to. The copy initially shares data with the str instance in the originating thread, but QString will thread-safely duplicate the data. You could write out the functor explicitly to make it clear what happens:
QString str{"Foo"};
struct Functor {
    QString str;
    Functor(const QString & str) : str{str} {}
    void operator()() {
        str.append("bar");
    }
};
for (int i = 0; i < 1000; ++i)
    QtConcurrent::run(Functor(str));
How do we deal with passing data using Qt types in and out of a worker object? All communication with the object, when it is in the worker thread, must be done via signals/slots. Qt will automatically copy the data for us in a thread-safe manner so that each instance of a value is only ever accessed in one thread. E.g.:
class ImageSource : public QObject {
    Q_OBJECT  // required for the signal declared below
    QImage render() {
        QImage image{...};
        QPainter p{&image};  // QPainter takes a pointer to the paint device
        ...
        return image;
    }
public:
    Q_SIGNAL void newImage(const QImage & image);
    void makeImage() {
        QtConcurrent::run([this]{
            emit newImage(render());
        });
    }
};
int main(int argc, char ** argv) {
    QApplication app...;
    ImageSource source;
    QLabel label;
    label.show();
    // &label is the context object: the functor runs in the label's thread
    QObject::connect(&source, &ImageSource::newImage, &label, [&](const QImage & img){
        label.setPixmap(QPixmap::fromImage(img));
    });
    source.makeImage();
    return app.exec();
}
The connection between the source's signal and the label's thread context is automatic. The signal happens to be emitted in a worker thread in the default thread pool. At the time of signal emission, the source and target threads are compared, and if they differ, the functor is wrapped in an event, the event is posted to the label, and the label's QObject::event runs the functor that sets the pixmap. This is all thread-safe and leverages Qt to make it almost effortless. The target thread context &label is critically important: without it, the functor would run in the worker thread, not the UI thread.
Note that we didn't even have to move the object to a worker thread: in fact, moving a QObject to a worker thread should be avoided unless the object does need to react to events and does more than merely generate a piece of data. You'd typically want to move e.g. objects that deal with communications, or complex application controllers that are abstracted from their UI. Mere generation of data can be usually done using QtConcurrent::run using a signal to abstract away the thread-safety magic of extracting the data from the worker thread to another thread.
In order to use Qt's mechanisms for passing data between threads with queues, you cannot call the object's function directly. You need to either use the signal/slot mechanism, or you can use the QMetaObject::invokeMethod call:
QMetaObject::invokeMethod(myObject, "mySlotFunction",
                          Qt::QueuedConnection,
                          Q_ARG(int, 42));
This will only work if both the sending and receiving objects have event queues running - i.e. a main or QThread based thread.
For the other part of your question, see the Qt docs section on reentrancy:
http://doc.qt.io/qt-4.8/threads-reentrancy.html#reentrant
Many Qt classes are reentrant, but they are not made thread-safe,
because making them thread-safe would incur the extra overhead of
repeatedly locking and unlocking a QMutex. For example, QString is
reentrant but not thread-safe. You can safely access different
instances of QString from multiple threads simultaneously, but you
can't safely access the same instance of QString from multiple threads
simultaneously (unless you protect the accesses yourself with a
QMutex).

Synchronously request data within JavaFX thread from different thread

I've got a separate thread which needs to request some data that may change in the meantime within the JavaFX thread. I'd like to execute a blocking invocation in this separate thread that makes sure that the request is enqueued on the JavaFX thread.
The Swing-GUI testing framework, AssertJ, provides an easy to use API for this purpose:
List list = GuiActionRunner.execute(new GuiQuery<...>...);
The invocation blocks the current thread, executes the passed code within event dispatching thread and returns the required data.
How can this be implemented in production code for JavaFX applications? What would be the recommended approach for this requirement?
Here's an alternative solution, using a FutureTask. This avoids the explicit latch and managing the synchronized data in an AtomicReference. The code here is probably simple enough that it would make including this functionality in Platform redundant.
FutureTask<List<?>> task = new FutureTask<>( () -> {
    List<?> data = ... ; // access data
    return data ;
});
Platform.runLater(task);
List<?> data = task.get();
This technique is very useful if you want to pause a background thread to await user input.
Ok I think I got it now. You need to implement something like this yourself:
AtomicReference<List<?>> r = new AtomicReference<>();
CountDownLatch l = new CountDownLatch(1);
Platform.runLater( () -> {
    // access data
    r.set(...);
    l.countDown();
});
l.await();
System.err.println(r.get());

Passing a `Disposable` object safely to the UI thread with TPL

We recently adopted the TPL as the toolkit for running some heavy background tasks.
These tasks typically produce a single object that implements IDisposable. This is because it has some OS handles internally.
What I want to happen is that the object produced by the background thread will be properly disposed at all times, also when the handover coincides with application shutdown.
After some thinking, I wrote this:
private void RunOnUiThread(Object data, Action<Object> action)
{
    var t = Task.Factory.StartNew(action, data, CancellationToken.None, TaskCreationOptions.None, _uiThreadScheduler);
    t.ContinueWith(delegate(Task task)
    {
        if (!task.IsCompleted)
        {
            DisposableObject.DisposeObject(task.AsyncState);
        }
    });
}
The background Task calls RunOnUiThread to pass its result to the UI thread. The task t is scheduled on the UI thread and takes ownership of the data passed in. I was expecting that if t could not be executed because the UI thread's message pump was shut down, the continuation would run, I could see that the task had failed, and I could dispose the object myself. DisposeObject() is a helper that checks that the object is actually IDisposable and non-null prior to disposing it.
Sadly, it does not work. If I close the application after the background task t is created, the continuation is not executed.
I solved this problem before. At that time I was using the Threadpool and the WPF Dispatcher to post messages on the UI thread. It wasn't very pretty, but in the end it worked. I was hoping that the TPL was better at this scenario. It would even be better if I could somehow teach the TPL that it should Dispose all leftover AsyncState objects if they implement IDisposable.
So, the code is mainly to illustrate the problem. I want to learn about any solution that allows me to safely handover Disposable objects to the UI thread from background Tasks, and preferably one with as little code as possible.
When a process closes, all of its kernel handles are automatically closed. You shouldn't need to worry about this:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686722(v=vs.85).aspx
Have a look at the RX library. This may allow you to do what you want.
From MSDN:
IsCompleted will return true when the Task is in one of the three
final states: RanToCompletion, Faulted, or Canceled
In other words, your DisposableObject.DisposeObject will never be called, because the continuation is always scheduled after one of the above states has been reached. I believe what you meant to do was:
t.ContinueWith(task => DisposableObject.DisposeObject(task.AsyncState),
    TaskContinuationOptions.NotOnRanToCompletion);
(BTW you could have simply captured the data variable rather than using the AsyncState property)
However I wouldn't use a continuation for something that you want to ensure happens at all times. I believe a try-finally block will be more fitting here:
private void RunOnUiThread2(Object data, Action<Object> action)
{
    var t = Task.Factory.StartNew(() =>
    {
        try
        {
            action(data);
        }
        finally
        {
            // data is captured by the lambda, so AsyncState is not needed here
            DisposableObject.DisposeObject(data);
            //Or use a new *foreground* thread if the disposing is heavy
        }
    }, CancellationToken.None, TaskCreationOptions.None, _uiThreadScheduler);
}

Locking on an object?

I'm very new to Node.js and I'm sure there's an easy answer to this, I just can't find it :(
I'm using the filesystem to hold 'packages' (folders with a status extension, e.g. 'mypackage.idle'). Users can perform actions on these which cause the status to change to something like 'qa' or 'deploying'. If the server is accepting lots of requests and multiple requests come in for the same package, how would I check the status and then perform an action, which would change the status, while guaranteeing that another request doesn't alter it before or during the action?
so in c# something like this
lock (someLock) { checkStatus(); performAction(); }
Thanks :)
If checkStatus() and performAction() are synchronous functions called one after another, then, as others mentioned earlier, their execution will run uninterrupted to completion.
However, I suspect that in reality both of these functions are asynchronous, and the realistic case of composing them is something like:
function checkStatus(callback){
    doSomeIOStuff(function(something){
        callback(something == ok);
    });
}
checkStatus(function(status){
    if(status == true){
        performAction();
    }
});
The above code is subject to race conditions: while doSomeIOStuff is in progress, Node does not wait for it, so a new request can be served in the meantime.
You may want to check https://www.npmjs.com/package/rwlock library.
This is a bit misleading. There are many scripting languages that are supposed to be single-threaded, but sharing data from the same source still creates a problem. Node.js might be single-threaded when you are running a single request, but when multiple requests try to access the same data, it creates much the same problem as a multithreaded language would.
There is already an answer about this here : Locking on an object?
WATCH sentinel_key
GET value_of_interest
if (value_of_interest = FULL)
    MULTI
    SET sentinel_key = foo
    EXEC
    if (EXEC returned 1, i.e. succeeded)
        do_something();
    else
        do_nothing();
else
    UNWATCH
One thing you can do is lock on an external object, for instance, a sequence in a database such as Oracle or Redis.
http://redis.io/commands
For example, I am using cluster with node.js (I have 4 cores) and I have a node.js function and each time I run through it, I increment a variable. I basically need to lock on that variable so no two threads use the same value of that variable.
check this out How to create a distributed lock with Redis?
and this https://engineering.gosquared.com/distributed-locks-using-redis
I think you can run with this idea if you know what you are doing.
If you are making asynchronous calls with callbacks, this means multiple clients could potentially make the same, or related requests, and receive responses in different orders. This is definitely a case where locking is useful. You won't be 'locking a thread' in the traditional sense, but merely ensuring asynchronous calls, and their callbacks are made in a predictable order. The async-lock package looks like it handles this scenario.
https://www.npmjs.com/package/async-lock
Warning: Node.js changes semantics if you add a log entry, because logging is I/O bound.
If you change from
qa_action_performed = false
function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        qa_action_performed = true
        perform_action()
    }
}
to
qa_action_performed = false
function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        console.log("my log stuff");
        qa_action_performed = true
        perform_action()
    }
}
more than one thread can execute perform_action().
You don't have to worry about synchronization with Node.js since it's single-threaded with an event loop. This is one of the advantages of the architecture that Node.js uses.
Nothing will be executed between checkStatus() and performAction().
There are no locks in node.js -- because you shouldn't need them. There's only one thread (the event loop) and your code is never interrupted unless you perform an asynchronous action like I/O. Hence your code should never block. You can't do any parallel code execution.
That said, your code could look something like this:
qa_action_performed = false
function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        qa_action_performed = true
        perform_action()
    }
}
Between check_status() and perform_action() no other thread can interrupt because there is no I/O. As soon as you enter the if clause and set qa_action_performed = true, no other code will enter the if block and hence perform_action() is never executed twice, even if perform_action() takes time performing I/O.

Qt objects - am I overusing QMutexLocker?

I have a Qt object that's used by a GUI thread and a networking thread. It looks like:
QString User::Username()
{
    QMutexLocker locker(&mutex);
    return username;
}
void User::SetUsername(const QString &newUsername)
{
    QMutexLocker locker(&mutex);
    username = newUsername;
}
QString User::Password()
{
    QMutexLocker locker(&mutex);
    return password;
}
...
Both the GUI and networking thread may use the object (e.g. to display the username on the screen, and to get the username to send across the network).
I'm worried something is wrong, as every method in the object has a QMutexLocker line, to make it thread safe.
Is it acceptable to use QMutexLocker in this way, or is the code structured badly?
You should be using QReadWriteLock and QReadLocker or QWriteLocker respectively. So no threads will be locked if there are only reading threads.
If some fields of the class are accessed or changed very frequently, and changing them doesn't affect any other state of the class, you might want to give each of them its own dedicated lock.
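As a rough illustration of the reader/writer idea, here is a sketch that uses the C++17 standard library in place of Qt: std::shared_mutex stands in for QReadWriteLock, std::shared_lock for QReadLocker, and std::unique_lock for QWriteLocker. The class layout and the exampleUsage helper are assumptions made for the example, not the asker's actual code.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>

class User {
public:
    std::string username() const {
        // Shared lock: any number of readers may hold it at the same time.
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return username_;
    }

    void setUsername(const std::string& newUsername) {
        // Exclusive lock: waits until no reader or writer holds the mutex.
        std::unique_lock<std::shared_mutex> lock(mutex_);
        username_ = newUsername;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const getters can lock it
    std::string username_;
};

inline void exampleUsage() {
    User user;
    user.setUsername("alice");
    assert(user.username() == "alice");
}
```

Whether this beats a plain mutex depends on the read/write ratio; for short critical sections like these getters, the difference may be small.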
I think you may be going about things the wrong way. Serializing each method call will "sort of" work, but it won't reliably handle operations like adding or removing a User object. For example, if your main thread deletes the User object, it won't matter that the network thread is carefully locking a mutex, because after the mutex-lock operation returns, the network thread will then try to access the (now deleted) User object, and trying to read OR write freed memory will cause your app to crash (or worse, just mysteriously do the wrong thing sometimes).
Here's a better way to do it (assuming that the User objects are reasonably small): Instead of having the network thread and the I/O thread share the same User object, and trying to serialize all accesses to the object at the method level, you'd be better off giving a separate copy of each User object to the I/O thread. Then when one thread changes its local copy of the User object, it should send a message to the other thread containing a copy of the updated object, and when the other thread receives the message it can update its local copy to match again. That way each thread has exclusive read/write access to its own local set of User objects, and can read/write them without any locking. This also allows each thread to add or remove objects at will (as long as it sends an update-message to the other thread afterwards, so the other thread will follow suit).
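The copy-per-thread scheme can be sketched in plain C++ as follows. In a Qt program the "message" would more likely be a queued signal carrying a User by value; here a small mutex-protected queue plays that role. UserMailbox, applyUpdates and exampleUsage are hypothetical names introduced for the example, and std::optional requires C++17.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>

struct User {
    std::string username;
    std::string password;
};

// One-way channel carrying copies of User from one thread to another.
class UserMailbox {
public:
    void post(User copy) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(copy));
    }

    std::optional<User> tryTake() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_.empty()) return std::nullopt;
        User user = std::move(pending_.front());
        pending_.pop();
        return user;
    }

private:
    std::mutex mutex_;
    std::queue<User> pending_;
};

// Called by the receiving thread: bring its private copy up to date.
inline void applyUpdates(UserMailbox& inbox, User& localCopy) {
    while (auto update = inbox.tryTake()) localCopy = *update;
}

inline void exampleUsage() {
    UserMailbox toGui;
    User guiCopy{"old", "secret"};       // owned by the GUI thread only

    std::thread network([&toGui] {
        User netCopy{"old", "secret"};   // owned by the network thread only
        netCopy.username = "new";        // no lock needed on the private copy
        toGui.post(netCopy);             // send a copy, not a reference
    });
    network.join();

    applyUpdates(toGui, guiCopy);
    assert(guiCopy.username == "new");
}
```

Only the mailbox is shared between threads; the User objects themselves are never accessed from two threads, so their methods need no locking.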
I think a better and cleaner way would be to have a "safe section"
updateUser( User ) {
    User.acquireLock()
    User.SetUsername(newUsername)
    User.Password()
    < more operations here >
    User.releaseLock()
}
The advantage of this is that you lock the mutex only once (locking is an expensive operation).
