Flickering frames in immediate mode - rust

I am using Vulkan (ash) to render a scene.
The rendering algorithm first does a raytracing pass (using the traditional pipeline, not the NV extensions), then it renders a few frames normally.
On Nvidia it renders fine; on AMD I am experiencing flicker.
The draw structure is:
Raytrace
Render 6 meshes with 6 draw calls
If I render that in immediate mode I get flickering; if I do so in FIFO mode I don't. In immediate mode, putting the thread to sleep between the raytracing call and the mesh calls makes no difference: no matter how long the sleep is, even a full second, the flickering still occurs. The pattern of the flicker is 4 calls rendered normally, then 2 calls where the RT image is the only thing rendered, as if it had been issued after the 6 regular mesh calls, even though it's issued before them.
All of this suggests a synchronization bug. The trouble is that I have no idea what I forgot to do.
Each draw call draws to a swapchain image, and each call has this structure:
Set up uniforms, descriptor sets, pipelines...
Create a fence
Submit draw command to queue signaling fence
Wait for fence
Wait for device
Delete fence
I am currently doing that at the end of every single one of the 7 calls. What did I forget to synchronize? What barrier am I missing?
I am using the render-pass-less rendering extension (dynamic rendering), so I don't have any render passes. I might need an image barrier on the swapchain image as a consequence, but I don't know whether I do, or what it would need to be.
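For reference, here is the kind of barrier I imagine might be needed between the raytrace and the mesh draws, sketched with ash. The stage/access masks and the GENERAL layout are guesses on my part (they assume the raytrace writes the swapchain image as a storage image); `cmd` and `swapchain_image` are placeholders from my renderer:

use ash::vk;

// Guess at the missing barrier: make the raytrace pass's image writes
// visible to the subsequent mesh draws, and switch the swapchain image's
// layout. All masks/layouts here are assumptions, not a confirmed fix.
unsafe fn barrier_after_raytrace(
    device: &ash::Device,
    cmd: vk::CommandBuffer,
    swapchain_image: vk::Image,
) {
    let barrier = vk::ImageMemoryBarrier {
        src_access_mask: vk::AccessFlags::SHADER_WRITE, // raytrace output writes
        dst_access_mask: vk::AccessFlags::COLOR_ATTACHMENT_WRITE, // mesh draws
        old_layout: vk::ImageLayout::GENERAL, // assuming a storage-image write
        new_layout: vk::ImageLayout::COLOR_ATTACHMENT_OPTIMAL,
        src_queue_family_index: vk::QUEUE_FAMILY_IGNORED,
        dst_queue_family_index: vk::QUEUE_FAMILY_IGNORED,
        image: swapchain_image,
        subresource_range: vk::ImageSubresourceRange {
            aspect_mask: vk::ImageAspectFlags::COLOR,
            base_mip_level: 0,
            level_count: 1,
            base_array_layer: 0,
            layer_count: 1,
        },
        ..Default::default()
    };
    device.cmd_pipeline_barrier(
        cmd,
        vk::PipelineStageFlags::COMPUTE_SHADER, // where the raytrace writes happen
        vk::PipelineStageFlags::COLOR_ATTACHMENT_OUTPUT, // where mesh draws write
        vk::DependencyFlags::empty(),
        &[],
        &[],
        &[barrier],
    );
    // A matching barrier to PRESENT_SRC_KHR before vkQueuePresentKHR would
    // presumably also be needed when no render pass performs the transition.
}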

Related

Temporarily stop Qt5/QML from updating framebuffer (/dev/fb0)

On an embedded system, due to very specific hardware/software limitations, we need another program to be able to display info via the framebuffer (/dev/fb0) while keeping our Qt5/QML program running in the background. We display a custom QQuickItem-derived black rectangle (with nothing but a 'return' in its update()) in QML while the second program runs, but we still see flickering on our LCD display. We surmise that Qt is still painting the scene graph (possibly of other items layered beneath the rectangle) to /dev/fb0, causing flicker as both programs write to /dev/fb0 at the same time.
We cannot use a second framebuffer approach (/dev/fb1) because the compositing increases processor load dramatically, to the point that the system becomes unusable. One thought is to iterate through the scene graph tree, setting each node's 'ItemHasContents' flag to false so the scene graph renderer will not write to the framebuffer, then re-enable the flags when the secondary program finishes its task. Another thought is to turn off rendering via the top-level QWindow, but nothing in the documentation says this is even possible... Is this possible via Qt, or even through a shell script?
/dev/fb0 sounds like you're working on a Linux-based system.
You don't say whether you need the Qt application to really keep running, just without screen updates, or whether simply "freezing" it while your other app uses the framebuffer would suffice.
If you are fine with the latter, the easiest way to stop the Qt app from rendering is to simply send it a SIGSTOP signal: it will freeze and cease to update the framebuffer. Once you're done with the fb, send a SIGCONT signal. Sometimes the simplest approaches are the best...
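If you would rather drive this from a supervising program than from a shell (where `kill -STOP <pid>` / `kill -CONT <pid>` does the same thing), here is a minimal sketch via the libc crate, in Rust to match the first question on this page; `qt_pid` is a placeholder for the Qt app's PID:

use libc::{kill, SIGCONT, SIGSTOP};

// Freeze the Qt process so it stops touching /dev/fb0, and resume it later.
// Sketch only: error handling omitted.
fn freeze_qt_app(qt_pid: libc::pid_t) {
    unsafe { kill(qt_pid, SIGSTOP) }; // the whole process stops running
}

fn resume_qt_app(qt_pid: libc::pid_t) {
    unsafe { kill(qt_pid, SIGCONT) }; // it continues where it left off
}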

How to keep PyQt GUI responsive when blocking tasks are GUI related?

I'm writing an app that embeds a bunch of matplotlib figures into a PyQt GUI. The updating of these figures can take up to a few seconds, so I would like to introduce a waiting indicator to display while the plots are being drawn. I've moved all the data processing code into its own thread, but it seems the actual plotting calls are often making up the majority of processing time.
I have written a waiting indicator that uses a QTimer instance to trigger paintEvent on the widget. This works just fine when all the intensive processing can be pushed into another thread. The problem is that these calls to construct the matplotlib plots cannot be moved outside of the main thread due to the way Qt is designed, and so block the updating of the waiting indicator, rendering it kind of useless.
I've introduced some calls to QCoreApplication.processEvents() after the updating of each figure, which improves the performance a little. I've also toyed with the idea of monkeypatching a bunch of methods of matplotlib.axes.Axes to include calls to QCoreApplication.processEvents(), but I can see that getting messy. Is this the best I can do? Is there any way to interrupt the main thread at regular intervals and force it to process new events?
It should also help a great deal to do the actual drawing on a QPixmap in a thread. Drawing that pixmap with QPainter's drawPixmap() method is very fast, and you need to recreate the pixmap only when really needed (e.g. after zooming). In the meantime you just reuse the already-drawn pixmap. The actual paintEvents using drawPixmap() will cost close to nothing, and your GUI will be completely responsive.
Cluttering the code with processEvents() calls is not only ugly but can cause very nasty, hard-to-debug malfunctions. E.g. it might cause premature deletion of objects which are still in use but were scheduled for deletion using deleteLater().
This answer might also be of use: Python - matplotlib - PyQT: plot to QPixmap
I haven't used matplotlib yet. But if it directly uses QWidgets and cannot be used without them, it won't be as easy as described above. In that case you could do the drawing in another process started by your GUI, which uses matplotlib as in the link above and stores the pixmap to disk; your GUI then loads it whenever a new pixmap is ready. QFileSystemWatcher might help here to avoid polling.

Efficient Direct2D multithreading

I'm writing an ebook reader app for Windows Store. I'm using Direct2D + DXGI swap chains to render book pages on screen.
My book content is sometimes quite complex (geometry, bitmaps, masks, etc.), so it can take up to 100 ms to render. So I'm trying to do off-screen rendering to a bitmap in a separate thread, and then just show this bitmap in the main thread.
However, I can't figure out how to do this efficiently.
So far I've tried two approaches:
Use a single ID2D1Factory with the D2D1_FACTORY_TYPE_MULTI_THREADED flag, create an ID2D1BitmapRenderTarget, and use it in the background thread for off-screen rendering. (This additionally requires ID2D1Multithread::Enter/Leave around IDXGISwapChain::Present operations.) The problem is that the ID2D1RenderTarget::EndDraw operation in the background thread sometimes takes up to 100 ms, and main-thread rendering is blocked for that period due to internal Direct2D locking.
Use a separate ID2D1Factory in the background thread (as described in http://www.sdknews.com/ios/using-direct2d-for-server-side-rendering) and turn off internal Direct2D synchronization. There is no cross-locking between the two threads in this case. Unfortunately, I can't use the resulting bitmap with the main ID2D1Factory directly, because it belongs to a different factory. I have to move the bitmap data to CPU memory, then copy it into GPU memory owned by the main ID2D1Factory. This operation also introduces significant lag (I believe due to the large memory transfers, but I'm not sure).
Is there a way to do this efficiently?
P.S. All the timings here are for an Acer Switch 10 tablet. On a regular Core i7 PC both approaches work without any visible lag.
Ok, I've found a solution.
Basically, all I needed was to modify approach 2 to use DXGI resource sharing between the two DirectX factory sets. I'll skip the gory details (they can be found here: http://xboxforums.create.msdn.com/forums/t/66208.aspx), but the basic steps are:
1. Create two sets of DirectX resources: main (used for onscreen rendering) and secondary (for offscreen rendering).
2. Using the ID3D11Device2 from the main resource set, create a D3D 2D texture with CreateTexture2D, using the D3D11_BIND_RENDER_TARGET and D3D11_BIND_SHADER_RESOURCE bind flags and the D3D11_RESOURCE_MISC_SHARED_NTHANDLE and D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX misc flags.
3. Get a shared handle from it by casting it to IDXGIResource1 and calling CreateSharedHandle on it with DXGI_SHARED_RESOURCE_READ and DXGI_SHARED_RESOURCE_WRITE.
4. Open this shared texture in the secondary resource set, in the background thread, by calling ID3D11Device2::OpenSharedResource1.
5. Acquire the texture's keyed mutex (IDXGIKeyedMutex::AcquireSync), create a render target from it (ID2D1Factory2::CreateDxgiSurfaceRenderTarget), draw on it, and release the mutex (IDXGIKeyedMutex::ReleaseSync).
6. On the main thread, in the main resource set, acquire the mutex, create a shared bitmap from the texture created in step 2, draw this bitmap, then release the mutex.
Note that the mutex locking is necessary. Skipping it results in cryptic DirectX debug error messages, erroneous operation, or even crashes.
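For illustration, here is the background-thread half of that handshake compressed into Rust with the windows crate (my own sketch, not the code from the thread above; exact import paths vary with the crate version, and error handling is elided):

use windows::core::{Interface, Result};
use windows::Win32::Graphics::Direct3D11::ID3D11Texture2D;
use windows::Win32::Graphics::Dxgi::IDXGIKeyedMutex;

// Background thread: draw into the shared texture only while holding its
// keyed mutex, as in step 5 above.
fn draw_into_shared(shared_tex: &ID3D11Texture2D) -> Result<()> {
    // The shared texture exposes IDXGIKeyedMutex through QueryInterface.
    let mutex: IDXGIKeyedMutex = shared_tex.cast()?;
    unsafe {
        mutex.AcquireSync(0, u32::MAX)?; // block until we own the surface
        // ... create the D2D render target on the texture and draw here ...
        mutex.ReleaseSync(0)?; // hand ownership back to the main thread
    }
    Ok(())
}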
tl;dr: Render to bitmaps on background thread in software mode. Draw from bitmaps to render target on UI thread in hardware mode.
The best approach I've been able to find so far is to use background threads with software rendering (IWICImagingFactory::CreateBitmap and ID2D1Factory::CreateWicBitmapRenderTarget) and then copy it to a hardware bitmap back on the thread with the hardware render target via ID2D1RenderTarget::CreateBitmapFromWicBitmap. And then blit that using ID2D1RenderTarget::DrawBitmap.
This is how paint.net 4.0 does selection rendering. When you're drawing a selection with the lasso tool, it will use a background thread to draw the selection outline asynchronously (the UI thread does not wait for this to complete). You can end up with a very complicated polygon due to the stroke style and animations. I render it 4 times, where each animation frame has a slightly different offset for the dashed stroke style.
Obviously this rendering can take a while as the polygon becomes more complex (that is, if you keep scribbling for a while). I have a few other special optimizations for the Move Selection tool, which lets you apply transforms (rotate, translate, scale): if the background thread hasn't yet re-rendered the current polygon with the new transform, then I render the old bitmap (with the current polygon and old transform) with the new transform applied. The selection outline may be distorted (scaling) or clipped (translated outside the viewable area) while the background thread catches up, but it's a small price to pay for 60 fps responsiveness. This optimization works very well because you can't be modifying the polygon and the transform of a selection at the same time.

Multiple OpenGL contexts, multiple windows, multithreading, and vsync

I am creating a graphical user interface application using OpenGL in which there can be any number of windows - "multi-document interface" style.
If there were one window, the main loop could look like this:
handle events
draw()
swap buffers (vsync causes this to block until vertical monitor refresh)
However consider the main loop when there are 3 windows:
each window handle events
each window draw()
window 1 swap buffers (block until vsync)
(some time later) window 2 swap buffers (block until vsync)
(some time later) window 3 swap buffers (block until vsync)
Oops... now rendering one frame of the application is happening at 1/3 of the proper framerate.
Workaround: Utility Window
One workaround is to have only one of the windows with vsync turned on, and the rest of them with vsync turned off. Call swapBuffers() on the vsync window first and draw that one, then draw the rest of the windows and swapBuffers() on each one (a sketch of this setup appears below).
This workaround will probably look fine most of the time, but it's not without issues:
it is inelegant to have one window be special
a race condition could still cause screen tearing
some platforms ignore the vsync setting and force it to be on
I read that switching which OpenGL context is bound is an expensive operation and should be avoided.
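A sketch of this setup with the glfw crate (my illustration; in GLFW the swap interval applies to whichever context is current, hence the make_current calls):

use glfw::{Context, SwapInterval};

// Enable vsync only on the first ("special") window; all others swap
// unsynchronized. Sketch only; window creation is omitted.
fn apply_vsync_policy(glfw: &mut glfw::Glfw, windows: &mut [glfw::Window]) {
    for (i, w) in windows.iter_mut().enumerate() {
        w.make_current();
        glfw.set_swap_interval(if i == 0 {
            SwapInterval::Sync(1) // the special window syncs to the monitor
        } else {
            SwapInterval::None // the rest never wait for vsync
        });
    }
}

Each frame you would then swap the special window first, draw it, and only afterwards draw and swap the others.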
Workaround: One Thread Per Window
Since there can be one OpenGL context bound per thread, perhaps the answer is to have one thread per window.
I still want the GUI to be single threaded, however, so the main loop for a 3-window situation would look like this:
(for each window)
lock global mutex
handle events
draw()
unlock global mutex
swapBuffers()
Will this work? This other question indicates that it will not:
It turns out that the windows are 'fighting' each other: it looks like the SwapBuffers calls are synchronized and wait for each other, even though they are in separate threads. I'm measuring the frame-to-frame time of each window and with two windows, this drops to 30 fps, with three to 20 fps, etc.
To investigate this claim I created a simple test program. This program creates N windows and N threads, binds one window per thread, requests each window to have vsync on, and then reports the frame rate. So far the results are as follows:
Linux, X11, 4.4.0 NVIDIA 346.47 (2015-04-13)
frame rate is 60fps no matter how many windows are open.
OSX 10.9.5 (2015-04-13)
frame rate is not capped; swap buffers is not blocking.
Workaround: Only One Context, One Big Framebuffer
Another idea I thought of: have only one OpenGL context, and one big framebuffer, the size of all the windows put together.
Each frame, each window calls glViewport to set their respective rectangle of the framebuffer before drawing.
After all drawing is complete, swapBuffers() on the only OpenGL context.
I'm about to investigate whether this workaround will work or not. Some questions I have are:
Will it be OK to have such a big framebuffer?
Is it OK to call glViewport multiple times every frame?
Will the windowing library API that I am using even allow me to create OpenGL contexts independent of windows?
Wasted space in the framebuffer if the windows are all different sizes?
Camilla Berglund, maintainer of GLFW, says:
That's not how glViewport works. It's not how buffer swapping works either. Each window will have a framebuffer. You can't make them share one. Buffer swapping is per window framebuffer and a context can only be bound to a single window at a time. That is at OS level and not a limitation of GLFW.
Workaround: Only One Context
This question indicates that this algorithm might work:
Activate OpenGL context on window 1
Draw scene in to window 1
Activate OpenGL context on window 2
Draw scene in to window 2
Activate OpenGL context on window 3
Draw scene in to window 3
For all Windows
SwapBuffers
According to the question asker,
With V-Sync enabled, SwapBuffers will sync to the slowest monitor and
windows on faster monitors will get slowed down.
It looks like they only tested this on Microsoft Windows and it's not clear that this solution will work everywhere.
Also once again many sources tell me that makeContextCurrent() is too slow to have in the draw() routine.
It also looks like this is not spec-conformant with EGL. In order to allow another thread to call eglSwapBuffers(), you have to eglMakeCurrent(NULL), which means your eglSwapBuffers() call is now supposed to return EGL_BAD_CONTEXT.
The Question
So, my question is: what's the best way to solve the problem of having a multi-windowed application with vsync on? This seems like a common problem but I have not yet read a satisfying solution for it.
Similar Questions
Similar to this question: Synchronizing multiple OpenGL windows to vsync but I want a platform-agnostic solution - or at least a solution for each platform.
And this question: Using SwapBuffers() with multiple OpenGL canvases and vertical sync? but really this problem has nothing to do with Python.
swap buffers (vsync causes this to block until vertical monitor refresh)
No, it doesn't necessarily block. The buffer swap call may return immediately rather than blocking. What it does, however, is insert a synchronization point, so that execution of commands altering the back buffer is delayed until the buffer swap has happened. The OpenGL command queue is of limited length, so once the command queue is full, further OpenGL calls will block the program until more commands can be pushed into the queue.
Also, the buffer swap is not an OpenGL operation. It's a graphics/windowing-system-level operation and happens independently of the OpenGL context. Just look at the buffer swap functions: the only parameter they take is a handle to the drawable (= window). In fact, even if you have multiple OpenGL contexts operating on a single drawable, you swap the buffer only once, and you can do it without any OpenGL context being current on the drawable at all.
So the usual approach is:
' first do all the drawing operations
foreach w in windows:
foreach ctx in w.contexts:
ctx.make_current(w)
do_opengl_stuff()
glFlush()
' with all the drawing commands issued
' loop over all the windows and issue
' the buffer swaps.
foreach w in windows:
w.swap_buffers()
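The same ordering in Rust, as a sketch using the glfw and gl crates (my choice of layer, not part of the answer; assumes GL function pointers were already loaded with gl::load_with):

use glfw::Context;

fn draw_scene() { /* glClear, draw calls, ... */ }

// Issue every window's drawing first, then every window's swap, so no swap
// has to wait on another window's rendering.
fn render_all(windows: &mut [glfw::Window]) {
    for w in windows.iter_mut() {
        w.make_current();
        draw_scene();
        unsafe { gl::Flush() }; // corresponds to the glFlush() above
    }
    for w in windows.iter_mut() {
        w.swap_buffers();
    }
}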
Since the buffer swap does not block, you can issue all the buffer swaps for all the windows, without getting delayed by V-Sync. However the next OpenGL drawing command that addresses a back buffer issued for swapping will likely stall.
A workaround for that is to use an FBO into which the actual drawing happens, and combine this with a loop that blits each FBO to its window's back buffer just before the swap-buffer loop:
' first do all the drawing operations
foreach w in windows:
foreach ctx in w.contexts:
ctx.make_current(w)
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, ctx.master_fbo)
do_opengl_stuff()
glFlush()
' blit the FBOs' renderbuffers to the main back buffer
foreach w in windows:
foreach ctx in w.contexts:
ctx.make_current(w)
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0)
blit_renderbuffer_to_backbuffer(ctx.master_renderbuffer)
glFlush()
' with all the drawing commands issued
' loop over all the windows and issue
' the buffer swaps.
foreach w in windows:
w.swap_buffers()
Thanks @andrewrk for all this research. I personally do it like this:
Create the first window and its OpenGL context with double buffering.
Enable vsync on this window (swap interval 1).
Create the other windows and attach the first context to them, with double buffering.
Disable vsync on these other windows (swap interval 0).
For each frame:
For each window in reverse order (the one with vsync enabled last):
wglMakeCurrent(hdc, commonContext);
draw.
SwapBuffers
In this manner I achieve vsync, and all windows are driven by that same vsync.
But I encountered a problem without Aero: tearing...

"current vertex declaration does not include all the elements required"

"The current vertex declaration does not include all the elements required by the current vertex shader. TextureCoordinate0 is missing."
I get this error when I try to use a SpriteFont to draw my FPS on the screen, on the line where I call spriteBatch.End().
My effect doesn't even use texture coordinates.
But I have found the root of the problem, just not how to fix it.
I have a separate thread that builds the geometry (an LOD algorithm) and somehow this seems to be why I have the problem.
If I make it non-threaded and just update my mesh per frame I don't get an error.
And also if I keep it multithreaded but don't try to draw text on the screen it works fine.
I just can't do both.
To make it even stranger, it actually compiles and runs for a little bit, but it always crashes.
I put a Thread.Sleep in the method that builds the mesh so that it runs less often, and I saw that the longer this thread sleeps, and the less often it gets called, the longer the program runs on average before crashing.
If it sleeps for 1000 ms it runs for maybe a minute. If it sleeps for 10 ms it doesn't even show one frame before crashing. This makes me believe the crash has to do with a certain line of code being executed on the mesh-building thread at the same time as the text is drawn on the screen.
It seems like maybe I have to lock something when drawing the text, but I have no clue what.
Any ideas?
My information comes from the presentation "Understanding XNA Framework Performance" from GDC 2008. It says:
GraphicsDevice is somewhat thread-safe
Cannot render from more than one thread at a time
Can create resources and SetData while another thread renders
ContentManager is not thread-safe
Ok to have multiple instances, but only one per thread
My guess is that you're breaking one of these rules somewhere, or modifying a buffer that is being used for rendering without the appropriate locking.
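The locking discipline that last point describes is not XNA-specific. A minimal sketch of it (in Rust, the language of the first question on this page; the Mesh type and vertex data are hypothetical), showing a builder thread and a render loop that never touch the buffer at the same time:

use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical mesh data shared between a builder thread and the render loop.
struct Mesh { vertices: Vec<[f32; 3]> }

fn main() {
    let mesh = Arc::new(Mutex::new(Mesh { vertices: Vec::new() }));

    let builder = {
        let mesh = Arc::clone(&mesh);
        thread::spawn(move || {
            // Do the expensive LOD rebuild on a local value, outside the lock...
            let rebuilt = vec![[0.0f32, 0.0, 0.0]];
            // ...and take the lock only for the cheap swap-in.
            mesh.lock().unwrap().vertices = rebuilt;
        })
    };

    {
        // Render side: hold the lock while the data is being read for drawing.
        let m = mesh.lock().unwrap();
        println!("drawing {} vertices", m.vertices.len());
    }

    builder.join().unwrap();
}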
