I need images to be displayed at a constant frame rate, so I use two threads: one for rendering with VSYNC on, and one for computing with CUDA, which may take a long time. I want the computing thread to run in the rendering thread's idle interval (after the buffer swap, before the next frame starts rendering).
I have two problems here:
How can I know when the image is actually drawn on screen, so that I can wake the rendering thread? After glutSwapBuffers(), the image may not yet be displayed on screen, and I have not found an API that signals display completion.
How can I stop the computing thread when it's time to render? I have tried this_thread::yield(), but the computing thread often keeps running. I am not familiar with multithreaded programming.
I use C++11 threads, CUDA for computing, and OpenGL for rendering.
Update:
Computing takes a long time, but rendering must run at 60 Hz, so I have to separate the two into their own threads.
I have just resolved this problem using a condition_variable; it is similar to the classic producer-consumer problem. There is also no need to know when the image is actually drawn on screen: you can just let the solver compute all the time, and the CUDA computing thread does not seem to interrupt the OpenGL rendering thread on a single GPU; they run in parallel.
Here is the code:
Compute thread:
void update(){
    while(1){
        unique_lock<mutex> locker(buffer_mutex);
        // Wait until the renderer has consumed the previous batch.
        buffCond.wait(locker, []{ return !updateFlag; });
        runSolver(d_x);      // compute the next several images; d_x points to the image buffer
        updateFlag = true;   // mark the new batch as ready
        locker.unlock();
        buffCond.notify_one();
    }
}
Rendering thread:
void render(){
    initGL();
    glutMainLoop();
}
void display(){
    {
        unique_lock<mutex> locker(buffer_mutex);
        if(updateFlag){            // check the flag under the lock to avoid a data race
            updateBuffer(d_x);     // copy the freshly computed images
            updateFlag = false;
            buffCond.notify_one(); // let the compute thread start the next batch
        }
    }   // lock released before the (slow) GL work below
    ... /* OpenGL rendering */
    glutSwapBuffers();
}
Graphics rendering is generally done from a single thread, or at least from a single thread at a time. I believe the finer details depend on your software stack; for example, from the Xlib documentation:
Threaded applications: While Xlib does attempt to support multithreading, the API makes this difficult and error-prone.
More information from a rather opinionated but informed article:
[...] However, most real-life programs access Xlib through higher-level libraries, and the libraries do not initialize Xlib threading on their behalf. Today, most programs with multiple X11 connections and multiple threads are buggy.
With that said, multi-threaded CUDA should still be an option, so one thread can step into CUDA when it's done with OpenGL. That way CUDA can still make progress while another thread is rendering the last frame, yet during the critical period before the next frame no thread is left waiting. Of course, if you truly have to re-join before and after each frame is rendered, you're just paying the cost of context switches without the benefit of concurrency; in that case there is no need for threading.
You may also benefit from reading the documentation or some examples of <atomic>, especially atomic_bool, for flags shared between threads. Used correctly, they can signal state safely between threads at very little cost.
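For instance, a minimal sketch of that flag-based handoff, reusing the names from the code above (updateFlag, runSolver, updateBuffer, d_x; the declarations below are assumptions, since the question doesn't show them):

#include <atomic>

extern float* d_x;            // assumed float*; the image buffer from the question
void runSolver(float*);       // computes the next batch of images (from the question)
void updateBuffer(float*);    // copies the batch into the GL buffer (from the question)

std::atomic<bool> updateFlag{false};   // true = a fresh batch is ready

// Compute thread: publish a finished batch. (This busy-polls for brevity;
// a real version would sleep or wait instead of spinning.)
void update() {
    while (true) {
        if (!updateFlag.load(std::memory_order_acquire)) {
            runSolver(d_x);
            updateFlag.store(true, std::memory_order_release);  // hand off to renderer
        }
    }
}

// Render thread, inside display(): consume the batch without ever blocking.
void consumeIfReady() {
    if (updateFlag.load(std::memory_order_acquire)) {
        updateBuffer(d_x);
        updateFlag.store(false, std::memory_order_release);
    }
}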
Related
I am making a simulation program that uses compute shaders, and I ran into a problem. I am currently using an OpenGL context to render the GUI for controlling and watching the simulation, and I use the same context to call glDispatchCompute.
That can cause the program window to freeze, because the simulation may run at any UPS (anywhere from 0.1 to 10000 updates per second) while the window should update at a fixed FPS (the display refresh rate, commonly 60 FPS).
This becomes a problem when the simulation is slow and a single step takes, say, 600 ms to compute: the swap-buffers call waits for all compute shaders to finish, and so the FPS drops.
How can I make updates and renders independent of each other? On the CPU I could just spawn a second thread, but an OpenGL context is not multithreaded. Should I use Vulkan for this task?
Even with Vulkan, there is no way to just shove a giant blob of work at the GPU and guarantee that later graphics work will just interrupt the GPU's processing. The most reliable way to handle this is to break your compute work up into chunks of a size that you're reasonably sure will not break your framerate and interleave them with your rendering commands.
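As a rough, hedged sketch of that interleaving (simulationProgram, chunkOffsetLoc, groupsPerChunk, totalChunks, renderGui, and swapBuffers are all placeholders for your own setup):

// Cap the compute work per frame so the renderer is never starved.
const int chunksPerFrame = 4;          // tune so a frame stays under ~16 ms
int nextChunk = 0;

void frame() {
    glUseProgram(simulationProgram);   // bind your compute program
    for (int i = 0; i < chunksPerFrame && nextChunk < totalChunks; ++i, ++nextChunk) {
        glUniform1i(chunkOffsetLoc, nextChunk);      // tell the shader which slice to step
        glDispatchCompute(groupsPerChunk, 1, 1);
    }
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);  // make results visible to drawing
    renderGui();                       // normal GUI rendering
    swapBuffers();                     // now waits on at most chunksPerFrame chunks
}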
Vulkan offers ways that allow GPUs to execute interruptible work, but it does not require any particular interrupting functionality. You can create a compute queue with the lowest possible priority and a graphics queue with the highest priority (a minimal sketch of that queue setup follows the list below). But even that assumes:
The Vulkan implementation offers multiple queues at all. Many embedded ones do not.
Queue-priority implementations may preempt work in progress when higher-priority work is submitted; this can happen, but the specification offers no guarantees. Here is some documentation about the behavior of various GPUs, showing that some of them can handle it. It's a few years old, so more recent GPUs may be even better, but it should get you started.
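As a hedged illustration of that queue setup, the priorities are requested at device creation; graphicsFamily and computeFamily are assumed to have been picked beforehand via vkGetPhysicalDeviceQueueFamilyProperties:

// Request a high-priority graphics queue and a low-priority compute queue.
float gfxPriority  = 1.0f;   // highest
float compPriority = 0.0f;   // lowest
VkDeviceQueueCreateInfo queues[2] = {};
queues[0].sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queues[0].queueFamilyIndex = graphicsFamily;   // assumed: a graphics-capable family
queues[0].queueCount       = 1;
queues[0].pQueuePriorities = &gfxPriority;
queues[1].sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queues[1].queueFamilyIndex = computeFamily;    // assumed: a separate compute family
queues[1].queueCount       = 1;
queues[1].pQueuePriorities = &compPriority;

VkDeviceCreateInfo deviceInfo = {};
deviceInfo.sType                = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
deviceInfo.queueCreateInfoCount = 2;
deviceInfo.pQueueCreateInfos    = queues;
// vkCreateDevice(physicalDevice, &deviceInfo, nullptr, &device);

Whether the driver actually preempts in-flight compute work based on these priorities is, as noted above, implementation-defined.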
Overall, Vulkan can help, but you'll need to profile it, and you'll need a fallback if you care about implementations that don't have enough queues to do anything.
OpenGL is of course even less useful for this, as it has no explicit queueing system at all. So breaking the work up is really the only way to ensure that the compute task doesn't starve the renderer.
Suppose I use QGLWidget's paintGL() method to draw into the widget using OpenGL. After Qt calls paintGL(), it automatically triggers a buffer swap. In OpenGL, this buffer swap usually blocks the calling thread until rendering of the frame to the back buffer is completed, right? I wonder which Qt thread calls paintGL() as well as the buffer swap. Is it the main Qt UI thread? If it is, wouldn't that mean that blocking during the buffer swap also blocks the whole UI? I could not find any information about this process in general.
Thanks
I don't use QGLWidget very often, but consider that yes, if swapBuffers() is synchronous, the Qt GUI thread is stuck. This means that during that operation you'll be unable to process events.
Anyway, if you're experiencing difficulties while doing this, consider reading this article, which manages to overcome this difficulty with multithreaded OpenGL.
Even better, this article explains the situation well and introduces the new multithreaded OpenGL capabilities in Qt 4.8, which is now in release candidate.
In OpenGL, this buffer swap usually blocks the calling thread until rendering of the frame to the back buffer is completed, right?
It depends on how it is implemented, which means it varies from hardware to hardware and driver to driver.
If it is, wouldn't that mean that blocking during the buffer swap also blocks the whole UI?
Even if it does block, it will only do so for 1/60th of a second, maybe 1/30th if your game is slowing down, or 1/15th if you're really slow. The at most one keypress or mouse action the user produces in that time will still be waiting in the message queue.
The issue with blocking isn't really the UI; it will be responsive enough that the user won't notice. But if you have strict timing requirements (as you might for a game), I would suggest avoiding paintGL altogether: you should be rendering when you want to, not when Qt tells you to.
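If you do stay with QGLWidget, one small, hedged compromise is to at least own the cadence by driving updates from your own timer rather than waiting for Qt's paint events (glWidget stands in for your widget instance):

#include <QTimer>

QTimer *timer = new QTimer(glWidget);
QObject::connect(timer, SIGNAL(timeout()), glWidget, SLOT(updateGL()));
timer->start(16);   // ~60 Hz; each tick runs paintGL() and the buffer swap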
I am trying to come up with a synchronization model for the following scenario:
I have a GUI thread that is responsible for CPU-intensive animation plus blocking I/O. The GUI thread retrieves images from the network and puts them in a shared buffer; the images are processed (a CPU-intensive operation, done by a worker thread); and then the GUI thread animates them (again CPU-intensive).
The processing of images is done by a worker thread: it retrieves images from the shared buffer, processes them, and puts them in an output buffer.
There is only one CPU, and the GUI thread should not get scheduled out while it is animating the images (the animation has to be really smooth). This means the worker thread should get the CPU only when the GUI thread is waiting for an I/O operation to complete.
How do I go about achieving this? It looks like a classic producer-consumer problem, but I am not quite sure how to guarantee that the animation will be as smooth as possible (I am open to using more threads).
I would like to use QThreads (from the Qt framework) for platform independence, but I can consider pthreads for more control (currently we are only aiming for Linux).
Any ideas?
EDIT:
I guess the problem boils down to one thing: how do I ensure that the animation thread is not interrupted while it is animating the images? (The animation runs when the user goes from one page to the other; all the images on the new page are animated before being shown in their proper place. It is a small operation, but it must be really smooth.) The worker thread may only run when the animation is over.
Just thinking out loud here, but it sounds like you have two compute-intensive tasks, animation and processing, and you want animation to always have priority over processing. If that is correct, then maybe instead of putting these tasks in separate threads you could have a single thread that handles both animation and processing.
For instance, the thread could have two task queues, one for animation jobs and one for processing jobs, and it would only start a job from the processing queue when the animation queue is empty. But this will only work well if each individual processing job is relatively small and/or interruptible at arbitrary points (otherwise animation jobs will get delayed, which is not what you want).
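To make that concrete, here is a minimal, hedged sketch of the two-queue idea in C++11 (Job and the queue names are illustrative; a Qt version would have the same shape):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

using Job = std::function<void()>;
std::queue<Job> animationJobs, processingJobs;   // animation always has priority
std::mutex m;
std::condition_variable cv;

// Producers lock m, push into one of the queues, then call cv.notify_one().
void workerLoop() {
    for (;;) {
        Job job;
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [] { return !animationJobs.empty() || !processingJobs.empty(); });
            if (!animationJobs.empty()) {          // animation wins whenever present
                job = std::move(animationJobs.front());
                animationJobs.pop();
            } else {                               // processing only runs when idle
                job = std::move(processingJobs.front());
                processingJobs.pop();
            }
        }
        job();   // keep processing jobs small so animation is never delayed for long
    }
}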
The first big question is: do I really need threads? Qt's event system and network objects make it easy to avoid the technical burden of threads and all the snags that come with them.
Have a look at alternative ways to address these issues here and here. These techniques are great if you are sticking to pure Qt code and do not depend on a 3rd-party library. If you must use a 3rd-party lib that makes blocking calls, then sure, you can use threads.
Here is an example of a producer-consumer.
Also have a look at Advanced Qt Programming: Creating Great Software with C++ and Qt 4
My advice is to start without threads and see how it fares; you can always refactor to threads afterwards. So it is best to design your objects/architecture without too much coupling.
If you want, you can post some code to give more context.
In an application which is GPU bound, I am wondering at what point the CPU will wait on the GPU to complete rendering. Is it different between DirectX and OpenGL?
Running an example similar to the one below, the CPU obviously doesn't run away, and looking in Task Manager, CPU usage (if it were a single-core machine) would be below 100%.
while (running) {
    Clear();
    SetBuffers();      // vertex / index
    SetTexture();
    DrawPrimitives();
    Present();
}
The quick summary is that you will probably see the wait in Present(), but it really depends on what is inside the Present() call.
Generally, unless you specifically ask to be notified when the GPU is finished, you might end up waiting at the (random to you) point where the driver's input buffer fills up. Think of the GPU driver and card as a very long pipeline: you put work in at one end, and after a while it comes out to the display. You might be able to put several frames' worth of commands into the pipeline before it fills up. The card could be taking a long time drawing primitives, but you might see the CPU waiting at a point several frames later.
If your Present() call contains the equivalent of glFinish(), that entire pipeline must drain before that call can return. So, the CPU will wait there.
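One quick, hedged way to see this for yourself in OpenGL is to force a drain and time it:

#include <chrono>

// If (t1 - t0) is large, the driver had buffered several frames of commands
// and the GPU was still working through them when you asked it to drain.
auto t0 = std::chrono::steady_clock::now();
glFinish();   // blocks the CPU until the entire pipeline has drained
auto t1 = std::chrono::steady_clock::now();
double waitedMs = std::chrono::duration<double, std::milli>(t1 - t0).count();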
I hope the following can be helpful:
Clear();
Causes all the pixels in the current buffer to change color, so the GPU is doing work. Look up your GPU's clear rate (pixels/sec) to see how long this should take.
SetBuffers();
SetTexture();
The driver may do some work here, but generally it wants to wait until you actually draw using the new data. In any event, the GPU doesn't do much here.
DrawPrimitives();
Here is where the GPU should be doing most of the work. Depending on the primitive size, you'll be limited by vertices/sec or pixels/sec; if you have an expensive shader, you'll be limited by shader instructions/sec. However, you may not see this as the place the CPU waits: the driver may buffer the commands for you, and the CPU may be able to continue on.
Present();
At this point, the GPU work is minimal: it just changes a pointer to start displaying from a different buffer. However, this is probably the point that appears to the CPU to be where it waits on the GPU. Depending on your API, Present() may include something like glFlush() or glFinish(); if it does, then you'll likely wait here.
On Windows, the waits are in the video driver. They depend somewhat on the driver implementation, though in many cases the need for a wait is dictated by the requirements of the API you are using (whether calls are defined to be synchronous or not).
So yes, it would most likely be different between DirectX and OpenGL.
Where can I use multithreading in a simple 2D XNA game? Any suggestions would be appreciated.
Well, there are many options. Most games use multithreading for things such as:
Physics
Networking
Resource Loading
AI/Logical updates (if you have a lot of computation in the "update" phase of your game)
You really have to think about your specific game architecture, and decide where you'd benefit the most from using multithreading.
Some games use multithreaded renderers as a core design philosophy.
For instance, thread 1 calculates all of the game logic, then sends this information to thread 2. Thread 2 precalculates a display list and passes it to the GPU. Thread 1 ends up running two frames ahead of the GPU, and thread 2 runs one frame ahead of the GPU.
The advantage is that in theory you can do twice as much work in a frame. Skinning can be done on the CPU and become "free" in terms of CPU and GPU time. It does require double-buffering a large amount of data and careful construction of your engine flow so that all threads stall when (and only when) necessary.
Aside from this, a pretty common technique these days is to have a number of "worker threads" running. Tasks with a common interface can be added to a shared (thread-safe) queue and executed by the worker threads. The main game thread adds these tasks to the queue before the results are needed and continues with other processing. When the results are eventually required, the main thread can stall until the worker threads have finished processing all of the required tasks.
For instance, an expensive for loop can be changed to use tasks.
// Single-threaded method.
for (i = 0; i < numExpensiveThings; i++)
{
    ProcessExpensiveThings(expensiveThings[i]);
}

// Accomplishes the same work, using N worker threads.
for (i = 0; i < numExpensiveThings; i++)
{
    AddTask(ProcessExpensiveThingsTask, i);
}
WaitForAll(ProcessExpensiveThingsTask);
You can do this whenever you're guaranteed that ProcessExpensiveThings() is thread-safe with respect to other calls. If you have 80 things at 1 ms each and 8 worker threads, you've saved yourself roughly 70 ms. (Well, not exactly, but it's a good hand-wavy approximation.)
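In standard C++ (rather than a bespoke task system), the same shape might look like the hedged sketch below; Thing and ProcessExpensiveThings stand in for the pseudocode's names, and a real engine would use a persistent thread pool rather than letting std::async spawn per task:

#include <future>
#include <vector>

struct Thing { /* per-item data */ };
void ProcessExpensiveThings(Thing&);   // assumed safe to run concurrently per item

void processAll(std::vector<Thing>& things) {
    std::vector<std::future<void>> tasks;
    tasks.reserve(things.size());
    for (auto& t : things)
        tasks.push_back(std::async(std::launch::async, ProcessExpensiveThings, std::ref(t)));
    for (auto& task : tasks)   // the "WaitForAll" step: block until everything finishes
        task.get();
}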
There are lots of places to apply it: AI, object interaction, multiplayer gaming, etc. It depends on your concrete game.
Why do you want to use multi-threading?
If it is for practice, a reasonable and easy module to put in its own thread would be the sound system, since communication with it is primarily one-way.
Multi-threading with GameComponents is meant to be quite straightforward; see, e.g.:
http://roecode.wordpress.com/2008/02/01/xna-framework-gameengine-development-part-8-multi-threading-gamecomponents/