Multithreaded Game Loop Rendering/Updating (boost-asio)

So I have a single-threaded game engine class, which has separate functions for input, update and rendering, and I've just started learning to use the wonderful Boost library (the asio and thread components). I was thinking of separating my update and render functions into separate threads (and perhaps separating the input and update functions from each other as well). Of course these functions will sometimes access the same locations in memory, so I decided to use Boost.Asio's strand functionality to prevent them from executing at the same time.
Right now my main game loop looks like this:
void SDLEngine::Start()
{
    int update_time=0;
    quit=false;
    while(!quit)
    {
        update_time=SDL_GetTicks();
        DoInput();  //get user input and alter data based on it
        DoUpdate(); //update game data once per loop
        if(!minimized)
            DoRender(); //render graphics to screen
        update_time=SDL_GetTicks()-update_time;
        SDL_Delay(max(0,target_time-update_time)); //insert delay to run at desired FPS
    }
}
If I used separate threads it would look something like this:
void SDLEngine::Start()
{
    boost::asio::io_service io;
    boost::asio::io_service::strand strand_(io); //strand must be constructed from the io_service
    boost::asio::deadline_timer input(io,boost::posix_time::milliseconds(16));
    boost::asio::deadline_timer update(io,boost::posix_time::milliseconds(16));
    boost::asio::deadline_timer render(io,boost::posix_time::milliseconds(16));
    //
    input.async_wait(strand_.wrap(boost::bind(&SDLEngine::DoInput,this)));
    update.async_wait(strand_.wrap(boost::bind(&SDLEngine::DoUpdate,this)));
    render.async_wait(strand_.wrap(boost::bind(&SDLEngine::DoRender,this)));
    //
    io.run();
}
So as you can see, before, the loop went: Input->Update->Render->Delay->Repeat.
Each step ran one after the other. If I used multithreading, I would have to use strands so that updates and rendering wouldn't run at the same time. So is it still worth using multithreading here? They would still basically be running one at a time, just on separate cores. I basically have no experience with multithreaded applications, so any help is appreciated.
Oh, and another thing: I'm using OpenGL for rendering. Would multithreading like this affect the way OpenGL renders in any way?

You are using the same strand for all handlers, so there is no multithreading at all. Also, your deadline_timers are in the scope of Start() and you do not pass them anywhere, so you will not be able to restart them from the handlers (note that deadline_timer is not an "interval" timer, it's just a one-shot timer that fires once per async_wait).
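For reference, a minimal sketch (not from the question) of how a deadline_timer can be made periodic: keep the timer alive as a member and re-arm it from inside its own completion handler. Here update_timer_ is an assumed SDLEngine member, not something in the original code.

void SDLEngine::UpdateTick(const boost::system::error_code& ec)
{
    if (ec) return; // the timer was cancelled, stop rescheduling
    DoUpdate();
    // re-arm relative to the previous expiry so the tick rate stays stable
    update_timer_.expires_at(update_timer_.expires_at()
                             + boost::posix_time::milliseconds(16));
    update_timer_.async_wait(boost::bind(&SDLEngine::UpdateTick, this,
                                         boost::asio::placeholders::error));
}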
I see no point in this "revamp", since you are not getting any benefit from asio and/or threads at all in this example.
These methods (input, update, render) are too big and they do many things; you cannot call them without blocking. It's hard to say precisely because I don't know what the game is or how it works, but I'd prefer to take the following steps:
Try to revamp the network I/O so it becomes fully async
Try to use all CPU cores
About what you have tried: I think it's possible if you search your code for actions that really can run in parallel right now. For example: if you calculate something for each NPC that does not depend on the other characters, you can io_service.post() each of those calculations to make use of all the threads that are running io_service.run() at the moment (see the sketch below). So your program stays single-threaded, but you can use, say, 7 other threads for some "big" operations.
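A rough sketch of that idea, assuming a hypothetical Npc type and UpdateNpc function (neither is from the question): queue one job per NPC and let a small pool of threads drain the io_service queue.

#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <vector>

struct Npc { /* per-character state */ };

void UpdateNpc(Npc* npc)
{
    // expensive work that touches only this NPC
}

int main()
{
    boost::asio::io_service io;
    std::vector<Npc> npcs(100);

    // queue one task per NPC; each task is independent of the others
    for (std::size_t i = 0; i < npcs.size(); ++i)
        io.post(boost::bind(&UpdateNpc, &npcs[i]));

    // drain the queue on several worker threads; run() returns when it is empty
    boost::thread_group pool;
    for (int i = 0; i < 7; ++i)
        pool.create_thread(boost::bind(&boost::asio::io_service::run, &io));
    pool.join_all();
}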


How does NodeJS handle multi-core concurrency?

Currently I am working on a database that is updated by another Java application, but I need a NodeJS application to provide a RESTful API for website use. To maximize the performance of the NodeJS application, it is clustered and running on a multi-core processor.
However, from my understanding, a clustered NodeJS application has its own event loop on each CPU core. If so, does that mean that, with the cluster architecture, NodeJS will have to face traditional concurrency issues like in other multi-threading architectures, for example, writing to the same object which is not write-protected? Or even worse, since it is multiple processes running at the same time, not threads within a process blocked by another...
I have been searching the Internet, but it seems nobody cares about that at all. Can anyone explain the cluster architecture of NodeJS? Thanks very much.
Add on:
Just to clarify, I am using Express. It is not like running multiple instances on different ports; it is actually listening on the same port, but with one process on each CPU competing to handle requests...
The typical problem I am wondering about now is: a request to update Object A based on a given Object B (not yet finished), then another request to update Object A again with a given Object C (which finishes before the first request)... the result would then be based on Object B rather than C, because the first request actually finishes after the second one.
This would not be a problem in a real single-threaded application, because the second request would always be executed after the first...
The core of your question is:
NodeJS will have to face traditional concurrency issues like in other multi-threading architectures, for example, writing to the same object which is not write-protected?
The answer is that this scenario is usually not possible, because node.js processes don't share memory. ObjectA, ObjectB and ObjectC in process A are different from ObjectA, ObjectB and ObjectC in process B. And since each process is single-threaded, contention cannot happen. This is the main reason you find that there are no semaphore or mutex modules shipped with node.js. Also, there are no threading modules shipped with node.js.
This also explains why "nobody cares". Because they assume it can't happen.
The problem with node.js clusters is one of caching. Because ObjectA in process A and ObjectA in process B are completely different objects, they will have completely different data. The traditional solution to this is of course not to store dynamic state in your application but to store them in the database instead (or memcache). It's also possible to implement your own cache/data synchronization scheme in your code if you want. That's how database clusters work after all.
Of course node, being a program written in C/C++, can be easily extended in C/C++, and there are modules on npm that implement threads, mutexes and shared memory. If you deliberately choose to go against the node.js/javascript design philosophy, then it is your responsibility to ensure nothing goes wrong.
Additional answer:
a request to update Object A based on a given Object B (not yet finished), then another request to update Object A again with a given Object C (which finishes before the first request)... the result would then be based on Object B rather than C, because the first request actually finishes after the second one.
This would not be a problem in a real single-threaded application, because the second request would always be executed after the first...
First of all, let me clear up a misconception you're having: the idea that this would not be a problem in a real single-threaded application. Here's a single-threaded application in pseudocode:
function main () {
    timeout = FOREVER
    readFd = []
    writeFd = []
    databaseSock1 = socket(DATABASE_IP,DATABASE_PORT)
    send(databaseSock1,UPDATE_OBJECT_B)
    databaseSock2 = socket(DATABASE_IP,DATABASE_PORT)
    send(databaseSock2,UPDATE_OBJECT_C)
    push(readFd,databaseSock1)
    push(readFd,databaseSock2)
    while(1) {
        event = select(readFd,writeFd,timeout)
        if (event) {
            for (i=0; i<length(readFd); i++) {
                if (readable(readFd[i])) {
                    data = read(readFd[i])
                    if (data == OBJECT_B_UPDATED) {
                        update(objectA,objectB)
                    }
                    if (data == OBJECT_C_UPDATED) {
                        update(objectA,objectC)
                    }
                }
            }
        }
    }
}
As you can see, there are no threads in the program above, just asynchronous I/O using the select system call. The program above can easily be translated directly into single-threaded C or Java etc. (indeed, something similar to it is at the core of the javascript event loop).
However, if the response to UPDATE_OBJECT_C arrives before the response to UPDATE_OBJECT_B the final state would be that objectA is updated based on the value of objectB instead of objectC.
No asynchronous single-threaded program is immune to this in any language and node.js is no exception.
Note however that you don't end up in a corrupted state (though you do end up in an unexpected state). Multithreaded programs are worse off because without locks/semaphores/mutexes the call to update(objectA,objectB) can be interrupted by the call to update(objectA,objectC) and objectA will be corrupted. This is what you don't have to worry about in single-threaded apps and you won't have to worry about it in node.js.
If you need strictly temporally sequential updates you still need to either wait for the first update to finish, flag the first update as invalid, or generate an error for the second update. Typically for web apps (like Stack Overflow) an error would be returned (for example, if you try to submit a comment while someone else has already updated the comments).

How do I Yield() to another thread in a Win8 C++/Xaml app?

Note: I'm using C++, not C#.
I have a bit of code that does some computation, and several bits of code that use the result. The bits that use the result are already in tasks, but the original computation is not -- it's actually in the callstack of the main thread's App::App() initialization.
Back in the olden days, I'd use:
while (!computationIsFinished())
    std::this_thread::yield(); // or the like, depending on API
Yet this doesn't seem to exist for Windows Store apps (aka WinRT, pka Metro-style). I can't use a continuation because the bits that use the results are unconnected to where the original computation takes place -- in addition to that computation not being a task anyway.
Searching found Concurrency::Context::Yield(), but Context appears not to exist for Windows Store apps.
So... say I'm in a task on the background thread. How do I yield? Especially, how do I yield in a while loop?
First of all, doing expensive computations in a constructor is not usually a good idea. Even less so when it's the "App" class. Also, doing heavy work in the main (ASTA) thread is pretty much forbidden in the WinRT model.
You can use concurrency::task_completion_event<T> to interface code that isn't task-oriented with other pieces of dependent work.
E.g. in the long serial piece of code:
...
task_completion_event<ComputationResult> tce;
task<ComputationResult> computationTask(tce);
// This task is now tied to the completion event.
// Pass it along to interested parties.
try
{
    auto result = DoExpensiveComputations();
    // Successfully complete the task.
    tce.set(result);
}
catch(...)
{
    // On failure, propagate the exception to continuations.
    tce.set_exception(std::current_exception());
}
...
This should work well, but again, I recommend breaking the computation out into a task of its own, and I would probably start by not doing it during construction... surely an anti-pattern for a responsive UI. :)
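If you do break it out into its own task, a minimal sketch could look like the following (it reuses the DoExpensiveComputations and ComputationResult names assumed in the snippet above):

#include <ppltasks.h>

// Run the computation on the thread pool instead of the ASTA thread.
auto computationTask = concurrency::create_task([]()
{
    return DoExpensiveComputations();
});

// Interested parties attach continuations rather than polling or yielding.
computationTask.then([](ComputationResult result)
{
    // consume the result here
});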
Qt simply uses Sleep(0) in their WinRT yield implementation.

QThread execution freezes my GUI

I'm new to multithreaded programming. I wrote this simple multi-threaded program with Qt, but when I run it, it freezes my GUI, and when I click inside my window it reports that the program is not responding.
Here is my widget class. My thread starts counting an integer and emits it whenever the number is divisible by 1000. In my widget I simply catch this number with the signal-slot mechanism and show it in a label and a progress bar.
Widget::Widget(QWidget *parent) :
    QWidget(parent),
    ui(new Ui::Widget)
{
    ui->setupUi(this);
    MyThread *th = new MyThread;
    connect( th, SIGNAL(num(int)), this, SLOT(setNum(int)));
    th->start();
}

void Widget::setNum(int n)
{
    ui->label->setNum( n);
    ui->progressBar->setValue(n%101);
}
and here is my thread's run() function:
void MyThread::run()
{
    for( int i = 0; i < 10000000; i++){
        if( i % 1000 == 0)
            emit num(i);
    }
}
thanks!
The problem is that your thread code produces an event storm. The loop counts very fast -- so fast that the fact that you emit a signal only every 1000 iterations is pretty much immaterial. On modern CPUs, doing 1000 integer divisions takes on the order of 10 microseconds IIRC. If the loop were the only limiting factor, you'd be emitting signals at a peak rate of about 100,000 per second. This is not the case because the performance is limited by other factors, which we shall discuss below.
Let's understand what happens when you emit signals in a different thread from where the receiver QObject lives. The signals are packaged in a QMetaCallEvent and posted to the event queue of the receiving thread. An event loop running in the receiving thread -- here, the GUI thread -- acts on those events using an instance of QAbstractEventDispatcher. Each QMetaCallEvent results in a call to the connected slot.
The access to the event queue of the receiving GUI thread is serialized by a QMutex. On Qt 4.8 and newer, the QMutex implementation got a nice speedup, so the fact that each signal emission results in locking of the queue mutex is not likely to be a problem. Alas, the events need to be allocated on the heap in the worker thread, and then deallocated in the GUI thread. Many heap allocators perform quite poorly when this happens in quick succession if the threads happen to execute on different cores.
The biggest problem comes in the GUI thread. There seem to be a bunch of hidden O(n^2) complexity algorithms! The event loop has to process 10,000 events. Those events will most likely be delivered very quickly and end up in a contiguous block in the event queue. The event loop will have to deal with all of them before it can process further events. A lot of expensive operations happen when you invoke your slot. Not only is the QMetaCallEvent deallocated from the heap, but the label schedules an update() (repaint), and this internally posts a compressible event to the event queue. Compressible event posting has to, in the worst case, iterate over the entire event queue. That's one potential O(n^2) complexity action. Another such action, probably more important in practice, is the progress bar's setValue internally calling QApplication::processEvents(). This can recursively call your slot to deliver the subsequent signal from the event queue. You're doing way more work than you think you are, and this locks up the GUI thread.
Instrument your slot and see if it's called recursively. A quick-and-dirty way of doing it is
void Widget::setNum(int n)
{
    static int level = 0, maxLevel = 0;
    level ++;
    maxLevel = qMax(level, maxLevel);
    ui->label->setNum( n);
    ui->progressBar->setValue(n%101);
    if (level > 1 && level == maxLevel-1) {
        qDebug("setNum recursed up to level %d", maxLevel);
    }
    level --;
}
What is freezing your GUI thread is not QThread's execution, but the huge amount of work you make the GUI thread do. Even if your code looks innocuous.
Side Note on processEvents and Run-to-Completion Code
I think it was a very bad idea to have QProgressBar::setValue invoke processEvents(). It only encourages the broken way people code things (continuously running code instead of short run-to-completion code). Since the processEvents() call can recurse into the caller, setValue becomes persona non grata, and possibly quite dangerous.
If one wants to code in a continuous style yet keep the run-to-completion semantics, there are ways of dealing with that in C++. One is just by leveraging the preprocessor; for example code, see my other answer.
Another way is to use expression templates to get the C++ compiler to generate the code you want. You may want to leverage a template library here -- Boost Spirit has a decent starting point of an implementation that can be reused even though you're not writing a parser.
The Windows Workflow Foundation also tackles the problem of how to write sequential-style code yet have it run as short run-to-completion fragments. They resort to specifying the flow of control in XML. There's apparently no direct way of reusing standard C# syntax. They only provide it as a data structure, à la JSON. It'd be simple enough to implement both XML- and code-based WF in Qt, if one wanted to. All that in spite of .NET and C# providing ample support for programmatic generation of code...
The way you implemented your thread, it does not have its own event loop (because it does not call exec()). I'm not sure if your code within run() is actually executed within your thread or within the GUI thread.
Usually you should not subclass QThread. You probably did so because you read the Qt Documentation which unfortunately still recommends subclassing QThread - even though the developers long ago wrote a blog entry stating that you should not subclass QThread. Unfortunately, they still haven't updated the documentation appropriately.
I recommend reading "You're doing it wrong" on Qt Blog and then use the answer by "Kari" as an example of how to set up a basic multi-threaded system.
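For completeness, here is a minimal sketch of that worker-object approach (assuming Qt 4.8 or later; Counter is an illustrative class name, while setNum is the slot from the question). Note that the event-storm issue described in the first answer still applies regardless of which threading pattern you use.

// worker object: lives in the new thread after moveToThread()
class Counter : public QObject
{
    Q_OBJECT
public slots:
    void doCount()
    {
        for (int i = 0; i < 10000000; ++i)
            if (i % 1000 == 0)
                emit num(i);
        emit finished();
    }
signals:
    void num(int);
    void finished();
};

// in the Widget constructor, replacing the MyThread code:
QThread *thread = new QThread;
Counter *worker = new Counter;              // no parent, so it can be moved
worker->moveToThread(thread);
connect(thread, SIGNAL(started()),  worker, SLOT(doCount()));
connect(worker, SIGNAL(num(int)),   this,   SLOT(setNum(int)));
connect(worker, SIGNAL(finished()), thread, SLOT(quit()));
connect(worker, SIGNAL(finished()), worker, SLOT(deleteLater()));
connect(thread, SIGNAL(finished()), thread, SLOT(deleteLater()));
thread->start();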
But when I run this program it freezes my GUI and when I click inside my window,
it reports that the program is not responding.
Yes, because IMO you're doing so much work in the thread that it exhausts the CPU. Generally, the "program is not responding" message pops up when a process shows no progress in handling its application event queue. In your case, that is what happens.
So in this case you should find a way to divide the work. Just for the sake of example, say the thread runs in chunks of 100 iterations and repeats until it completes all 10000000 (see the sketch below).
Also, you should have a look at QCoreApplication::processEvents() when you're performing a lengthy operation.
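A hedged sketch of that chunking idea, reusing MyThread::run() from the question (the chunk size and the 1 ms pause are arbitrary illustrative values, not part of the original answer):

void MyThread::run()
{
    const int total = 10000000;
    const int chunk = 100;
    for (int start = 0; start < total; start += chunk) {
        // do a bounded amount of work per pass
        for (int i = start; i < start + chunk && i < total; ++i) {
            if (i % 1000 == 0)
                emit num(i);
        }
        msleep(1); // pause between chunks so the GUI event queue can drain
    }
}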

QPointer in multi-threaded programs

According to http://doc.qt.io/qt-5/qpointer.html, QPointer is very useful. But I found it could be inefficient in the following context:
If I want to show the label three times, or do something else, I have to use
if(label) label->show1();
if(label) label->show2();
if(label) label->show3();
instead of
if(label) { label->show1();label->show2();label->show3(); }
just because label might be destroyed in another thread after label->show1(); or label->show2();.
Is there a beautiful way other than three ifs to get the same functionality?
Another question is, when label is destroyed after if(label), is if(label) label->show1(); still wrong?
I don't have experience in multi-threaded programs. Any help is appreciated. ;)
I think the only safe way to do it is to make sure you only access your QWidgets from within the main/GUI thread (that is, the thread that is running Qt's event loop, inside QApplication::exec()).
If you have code that is running within a different thread, and that code wants the QLabels to be shown/hidden/whatever, then that code needs to create a QEvent object (or a subclass thereof) and call qApp->postEvent() to send that object to the main thread. Then when the Qt event loop picks up and handles that QEvent in the main thread, that is the point at which your code can safely do things to the QLabels.
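A minimal sketch of that postEvent() approach (ShowLabelEvent, MyWidget and myWidget are illustrative names, not from the question; label is the QPointer member being discussed):

// custom event type, posted from the worker thread
class ShowLabelEvent : public QEvent
{
public:
    static const QEvent::Type Type;
    ShowLabelEvent() : QEvent(Type) {}
};
const QEvent::Type ShowLabelEvent::Type =
        static_cast<QEvent::Type>(QEvent::registerEventType());

// worker thread: ownership of the event passes to Qt
qApp->postEvent(myWidget, new ShowLabelEvent);

// MyWidget lives in the GUI thread, so the event is handled there
void MyWidget::customEvent(QEvent *e)
{
    if (e->type() == ShowLabelEvent::Type && label)
        label->show();   // safe: we are now in the GUI thread
}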
Alternatively (and perhaps more simply), your thread's code could emit a cross-thread signal (as described here) and let Qt handle the event-posting internally. That might be better for your purpose.
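And a sketch of the cross-thread-signal alternative (Worker and requestShowLabel are illustrative names): because the sender lives in a worker thread and the QLabel lives in the GUI thread, Qt automatically uses a queued connection, so show() runs in the GUI thread.

class Worker : public QObject
{
    Q_OBJECT
signals:
    void requestShowLabel();
public:
    void doWork()
    {
        // ... background computation ...
        emit requestShowLabel();   // delivered via the GUI thread's event loop
    }
};

// set up once in the GUI thread:
// connect(worker, SIGNAL(requestShowLabel()), label, SLOT(show()));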
Neither of your approaches is thread-safe. It's possible that your first thread will execute the if statement, then the other thread will delete your label, and then you will be inside of your if statement and crash.
Qt provides a number of thread synchronization constructs, you'll probably want to start with QMutex and learn more about thread-safety before you continue working on this program.
Using a mutex, your function would look something like this:
mutex.lock();
label1->show();
label2->show();
label3->show();
mutex.unlock();
As long as your other thread locks that same mutex object before it deletes the labels, it will be prevented from deleting them while you're showing them.
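If you do go down the mutex route, a slightly safer variant is QMutexLocker, which releases the lock automatically even on early returns or exceptions (a sketch, assuming the same mutex and labels as above):

{
    QMutexLocker locker(&mutex);   // locks here
    label1->show();
    label2->show();
    label3->show();
}                                  // unlocked automatically at end of scope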

Updating physics engine in a separate thread, is this wise?

I'm 100% new to threading. As a start, I've decided I want to muck around with using it to update my physics in a separate thread. I'm using a third-party physics engine called Farseer; here's what I'm doing:
// class level declarations
System.Threading.Thread thread;
Stopwatch threadUpdate = new Stopwatch();

//In the constructor:
PhysicsEngine()
{
    (...)
    thread = new System.Threading.Thread(
        new System.Threading.ThreadStart(PhysicsThread));
    threadUpdate.Start();
    thread.Start();
}

public void PhysicsThread()
{
    int milliseconds = TimeSpan.FromTicks(111111).Milliseconds;
    while(true)
    {
        if (threadUpdate.Elapsed.Milliseconds > milliseconds)
        {
            world.Step(threadUpdate.Elapsed.Milliseconds / 1000.0f);
            threadUpdate.Stop();
            threadUpdate.Reset();
            threadUpdate.Start();
        }
    }
}
Is this an ok way to update physics or should there be some stuff I should look out for?
In a game you need to synchronise your physics update to the game's frame rate. This is because your rendering and gameplay will depend on the output of your physics engine each frame. And your physics engine will depend on user input and gameplay events each frame.
This means that the only advantage of calculating your physics on a separate thread is that it can run on a separate CPU core from the rest of your game logic and rendering. (Pretty safe for PC these days, and the mobile space is just starting to get dual-core.)
This allows physics and gameplay/rendering to run concurrently -- but the drawback is that you need some mechanism to prevent one thread from modifying data while the other thread is using that data. This is generally quite difficult to implement.
Of course, if your physics isn't dependent on user input - like Angry Birds or The Incredible Machine (ie: the user presses "play" and the simulation runs) - in that case it's possible for you to calculate your physics simulation in advance, recording its output for playback. But instead of blocking the main thread you could move this time-consuming operation to a background thread - which is a well understood problem. You could even go so far as to start playing your recording back in the main thread, even before it is finished recording!
There is nothing wrong with your approach in general. Moving time-consuming operations, such as physics engine calculations, to a separate thread is often a good idea. However, I am assuming that your application includes some sort of visual representation of your physics objects in the UI? If that is the case, you are going to run into problems.
The UI controls in Silverlight have thread affinity, i.e. you cannot update their state from within the thread you have created in the above example. In order to update their state you are going to have to invoke via the Dispatcher, e.g. TextBox.Dispatcher.Invoke(...).
Another alternative is to use a Silverlight BackgroundWorker. This is a useful little class that allows you to do time-consuming work. It will move your work onto a background thread, avoiding the need to create your own System.Threading.Thread. It will also provide events that marshal results back onto the UI thread for you.
Much simpler!
