DX11 Updating Shared Textures - multithreading

I have a shared DX11 texture that is being used with 2 different devices in separate threads.
Thread 1 (operating on device 1): called every frame; it updates the shared texture.
Thread 2 (operating on device 2): consumes the shared texture by copying it to another texture. It runs much less frequently than thread 1.
According to MSDN, "If a shared texture is updated on one device, ID3D11DeviceContext::Flush must be called on that device."
However, calling Flush on thread 1 every frame is very expensive and we see a massive performance hit. We can't flush device 1 from thread 2, because a device context is not thread safe.
Is there a way to efficiently make the shared texture's updates visible when thread 2 needs to consume them?
Thanks for your help! MSDN is not very helpful when dealing with shared textures.

To synchronize access to the shared resource between two threads (or between processes) you can use IDXGIKeyedMutex. It is described in detail here: https://msdn.microsoft.com/en-us/library/windows/desktop/ee913554(v=vs.85).aspx#dxgi_1.1_synchronized_shared_surfaces
You can also check the sample code provided there, although it only shows resource sharing between two DX10 devices. The approach is the same for DX11 devices.
The essential part is to QueryInterface the shared texture for IDXGIResource first and then for IDXGIKeyedMutex. After that, you use the mutex for synchronization via the AcquireSync and ReleaseSync functions.
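A minimal C++ sketch of that pattern, for illustration only: the texture is assumed to have been created on device 1 with the D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX misc flag, error handling is omitted, and all names (SetupSharing, ProducerFrame, ConsumerCopy, device2, ctx2, destTex) are hypothetical, not from the original answer.

#include <d3d11.h>
#include <dxgi.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// One-time setup: open the shared texture on device 2 and query a keyed mutex
// on each device's view of it (via IDXGIResource, as described above).
void SetupSharing(ID3D11Texture2D* sharedTex, ID3D11Device* device2,
                  ComPtr<ID3D11Texture2D>& sharedTex2,
                  ComPtr<IDXGIKeyedMutex>& mutex1,
                  ComPtr<IDXGIKeyedMutex>& mutex2)
{
    ComPtr<IDXGIResource> dxgiRes;
    sharedTex->QueryInterface(IID_PPV_ARGS(&dxgiRes));
    HANDLE sharedHandle = nullptr;
    dxgiRes->GetSharedHandle(&sharedHandle);
    device2->OpenSharedResource(sharedHandle, IID_PPV_ARGS(&sharedTex2));
    dxgiRes->QueryInterface(IID_PPV_ARGS(&mutex1));
    sharedTex2->QueryInterface(IID_PPV_ARGS(&mutex2));
}

// Thread 1 (device 1), every frame.
void ProducerFrame(IDXGIKeyedMutex* mutex1 /*, device 1 context, ... */)
{
    if (mutex1->AcquireSync(0, INFINITE) == S_OK)
    {
        // ... update the shared texture on device 1's context ...
        mutex1->ReleaseSync(0);
    }
}

// Thread 2 (device 2), only when it needs the latest contents.
void ConsumerCopy(IDXGIKeyedMutex* mutex2, ID3D11DeviceContext* ctx2,
                  ID3D11Texture2D* sharedTex2, ID3D11Texture2D* destTex)
{
    if (mutex2->AcquireSync(0, INFINITE) == S_OK)
    {
        ctx2->CopyResource(destTex, sharedTex2);
        mutex2->ReleaseSync(0);
    }
}

Because both sides acquire and release with the same key (0), the keyed mutex behaves like a cross-device mutex, which fits a producer that runs every frame and a consumer that runs only occasionally. With AcquireSync/ReleaseSync in place, the per-frame Flush described in the question should no longer be needed.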

Related

Am I allowed to simultaneously render from the same buffer object on multiple shared contexts in OpenGL 2.1?

In Apple's documentation, I read this:
1 — "Shared contexts share all texture objects, display lists, vertex programs, fragment programs, and buffer objects created before and after sharing is initiated."
2 — "Contexts that are on different threads can share object resources. For example, it is acceptable for one context in one thread to modify a texture, and a second context in a second thread to modify the same texture. The shared object handling provided by the Apple APIs automatically protects against thread errors."
So I expected to be able to create my buffer objects once, then use them to render simultaneously on multiple contexts. However, if I do that, I get crashes on my NVIDIA GeForce GT 650M with backtraces like this:
Crashed Thread: 10 Dispatch queue: com.apple.root.default-qos
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: EXC_I386_GPFLT
…
Thread 10 Crashed:: Dispatch queue: com.apple.root.default-qos
0 GLEngine 0x00007fff924111d7 gleLookupHashObject + 51
1 GLEngine 0x00007fff925019a9 gleBindBufferObject + 52
2 GLEngine 0x00007fff9243c035 glBindBuffer_Exec + 127
I've posted my complete code at https://gist.github.com/jlstrecker/9df10ef177c2a49bae3e. At the top, there's #define SHARE_BUFFERS — when commented out it works just fine, but uncommented it crashes.
I'm not looking to debate whether I should be using OpenGL 2.1 — it's a requirement of other software I'm interfacing with. Nor am I looking to debate whether I should use GLUT — my example code just uses that since it's included on Mac and doesn't have any external dependencies. Nor am I looking for feedback on performance/optimization.
I'd just like to know if I can expect to be able to simultaneously render from a single shared buffer object on multiple contexts — and if so, why my code is crashing.
We also ran into the 'gleLookupHashObject' crash and made a small repro-case (very similar to yours) which was posted in an 'incident' to Apple support. After investigation, an Apple DTS engineer came back with the following info, quoting:
"It came to my attention that glFlush() is being called on both the main thread and also a secondary thread that binds position data. This would indeed introduce issues and, while subtle, actually does indicate that the constraints we place on threads and GL contexts aren’t being fully respected.
At this point it behoves you to either further investigate your implementation to ensure that such situations are avoided or, better yet, extend your implementation with explicit synchronization mechanisms (such as what we offer with GCD). "
So if you run into this crash, you will need to do explicit synchronization on the application side (pending a fix on the driver side).
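For example, a coarse application-side lock around every GL call that touches the shared buffer object (including the glFlush that publishes the update to the other context) is one way to add that synchronization. This is only a sketch, assuming each thread already has its own shared context current; the names (glSharedMutex, sharedVbo, both functions) are hypothetical, and the drawing code mirrors the legacy GL 2.1 style of the question.

#include <OpenGL/gl.h>   // macOS OpenGL headers
#include <mutex>

std::mutex glSharedMutex;   // guards every GL call that touches shared objects

// Thread A: updates the shared buffer on its own context.
void UpdateSharedBuffer(GLuint sharedVbo, const float* data, GLsizeiptr bytes)
{
    std::lock_guard<std::mutex> lock(glSharedMutex);
    glBindBuffer(GL_ARRAY_BUFFER, sharedVbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, data);
    glFlush();   // publish the update to the other shared context
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}

// Thread B: renders from the same buffer on a second shared context.
void RenderFromSharedBuffer(GLuint sharedVbo, GLsizei vertexCount)
{
    std::lock_guard<std::mutex> lock(glSharedMutex);
    glBindBuffer(GL_ARRAY_BUFFER, sharedVbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, nullptr);   // offset 0 into the bound VBO
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}

As the Apple documentation quoted below notes, blocking commands such as fences do not synchronize threads, so an actual mutex (or a GCD serial queue) is required.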
Summary of relevant snippets related to "OpenGL, Contexts and Threading" from the official Apple Documentation:
[0] Section: "Use Multiple OpenGL Contexts"
If your application has multiple scenes that can be rendered in parallel, you can use a context for each scene you need to render. Create one context for each scene and assign each context to an operation or task. Because each task has its own context, all can submit rendering commands in parallel.
https://developer.apple.com/library/mac/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_threading/opengl_threading.html#//apple_ref/doc/uid/TP40001987-CH409-SW6
[1] Section: Guidelines for Threading OpenGL Applications
(a) Use only one thread per context. OpenGL commands for a specific context are not thread safe. You should never have more than one thread accessing a single context simultaneously.
(b) Contexts that are on different threads can share object resources. For example, it is acceptable for one context in one thread to modify a texture, and a second context in a second thread to modify the same texture. The shared object handling provided by the Apple APIs automatically protects against thread errors. And, your application is following the "one thread per context" guideline.
https://developer.apple.com/library/mac/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_threading/opengl_threading.html
[2] OpenGL Restricts Each Context to a Single Thread
Each thread in an OS X process has a single current OpenGL rendering context. Every time your application calls an OpenGL function, OpenGL implicitly looks up the context associated with the current thread and modifies the state or objects associated with that context.
OpenGL is not reentrant. If you modify the same context from multiple threads simultaneously, the results are unpredictable. Your application might crash or it might render improperly. If for some reason you decide to set more than one thread to target the same context, then you must synchronize threads by placing a mutex around all OpenGL calls to the context, such as gl* and CGL*. OpenGL commands that block, such as fence commands, do not synchronize threads.
https://developer.apple.com/library/mac/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_threading/opengl_threading.html

How to synchronize multiple threads to one?

I have a multithreaded application where I want to allow all but one of the threads to run concurrently. However, when a specific thread wakes up, I need the rest of the threads to block.
My current implementation is:
std::mutex mutex;
void ManyBackgroundThreadsDoingWork()
{
    std::lock_guard<std::mutex> lock(mutex);   // acquire the mutex for this scope
    DoTheBackgroundWork();
}                                              // mutex released here
void MainThread()
{
    std::lock_guard<std::mutex> lock(mutex);
    DoTheMainThreadWork();
}
This works, in that it does indeed keep the background threads from operating inside the critical block while the main thread is doing its work. However, there is a lot of contention for the mutex amongst the background threads even when they don't necessarily need it. The main thread runs intermittently, and the background threads are able to run concurrently with each other, just not with the main thread.
What I've effectively done is reduce a multithreaded architecture to a single-threaded one using locks... which is silly. What I really want is an architecture that is multithreaded most of the time, but waits while a small operation completes and then goes back to being multithreaded.
Edit: An explanation of the problem.
What I have is an application that displays multiple video feeds coming from PCIe capture cards. The PCIe capture card driver issues callbacks on threads it manages into what is effectively the ManyBackgroundThreadsDoingWork function above. In this function I copy the captured video frames into buffers for rendering. The main thread is the render thread, which runs intermittently. The copy threads need to block during the render to prevent tearing of the video.
My initial approach was to simply use double buffering, but that is not really an option, as the capture card driver won't allow me to buffer frames without pushing them through system memory. The technique being used is AMD's "DirectGMA", which allows the capture card to push video frames directly into GPU memory. The only method of synchronization is to put a glFence and a mutex around the actual rendering, as the capture card will be continuously streaming data to GPU memory. The driver offers no indication of when a frame transfer completes. The callback supplies enough information for me to know that a frame is ready to be transferred, at which point I trigger the transfer. However, I need to block transfers during the scene render to prevent tearing and artifacts in the video. The technique described above is the one suggested by the PCIe card manufacturer. It breaks down, however, when you want more than one video playing at a time. Thus, the question.
You need a lock that supports both shared and exclusive locking modes, sometimes called a readers/writer lock. This permits multiple threads to get read (shared) locks until one thread requests an exclusive (write) lock.
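A minimal C++17 sketch of that idea using std::shared_mutex; the function names mirror the pseudocode in the question, everything else is an assumption.

#include <mutex>
#include <shared_mutex>

void DoTheBackgroundWork();   // placeholders from the question
void DoTheMainThreadWork();

std::shared_mutex rwLock;

// Capture/copy threads: take a shared lock, so they all run concurrently
// with each other, but not with the render.
void ManyBackgroundThreadsDoingWork()
{
    std::shared_lock<std::shared_mutex> lock(rwLock);
    DoTheBackgroundWork();
}

// Render thread: takes an exclusive lock, blocking all shared holders
// until the render is finished.
void MainThread()
{
    std::unique_lock<std::shared_mutex> lock(rwLock);
    DoTheMainThreadWork();
}

Note that the standard does not specify whether writers are prioritized over new readers, so if the render thread can be starved by a continuous stream of capture callbacks, you may need to add a "render pending" flag that makes new background threads wait.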

Asynchronous threads drawing in Bitmaps Delphi

If many asynchronous threads draw on a global TBitmap, will it raise an error? Should I protect my code with a critical section? (From my searching on the internet I found that TBitmap.Draw is not thread safe.)
Another question: if many synchronized threads draw on a global TBitmap and a VCL timer asynchronously reads the content of the TBitmap, will this raise an error?
Thanks!
Yes, you do need to protect the TBitmap from concurrent access across multiple threads. A critical section is fine for serializing your drawing code, HOWEVER that is not enough by itself! The main thread caches GDI resources and performs cleanup on them periodically, which will affect your TBitmap. As such, you will ALSO need to Lock/Unlock() the TBitmap.Canvas whenever drawing/rendering to ensure the VCL does not rip out its resources behind your back.
Since your threads are all modifying the same bitmap, you need to serialize all access to that bitmap. That means reading its contents as well as writing to it.
Of course, this assumes that multiple threads drawing to a shared bitmap is the right solution to your problem. Without knowing what your actual problem is, I could not comment on that.
UPDATE
You must also use Lock/Unlock when drawing to the bitmap, because of the issue described in Remy's answer, which should be the accepted answer to this question.
Use monitors or semaphores to control your threads when they make changes to your TBitmap pixels!
Can you use the TThread.Synchronize method instead?
http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/Classes_TThread_Synchronize#TThread#TThreadMethod.html
According to the documentation for the TThread class:
Following are issues and recommendations to be aware of when using threads:
Keeping track of too many threads consumes CPU time; the recommended limit is 16 active threads per process on single processor systems.
When multiple threads update the same resources, they must be synchronized to avoid conflicts.
Most methods that access an object and update a form must only be called from within the main thread or use a synchronization object such as TMultiReadExclusiveWriteSynchronizer.

openGL volume rendering and display update in different thread

I am using OpenGL and the freeglut library for volume rendering and display. In the main thread I initialize the OpenGL window and then acquire volume data frame by frame; the volume rendering is done after each volume is acquired. This works well but takes a lot of time. Is it possible to keep initializing the OpenGL window in the main thread, but do the volume rendering and display in another thread? I have checked wglMakeCurrent, but it does not update the window initialized in the main thread.
Multithreaded OpenGL operation is a nasty beast. You can, however (and this is what I strongly suggest), map a Pixel Buffer Object into the program's address space; that region of address space is visible to all threads. So you can update the volume data from another thread (or, as in the case of the program I'm currently working on, on another GPU), then signal the main thread to update the texture from the new data in the PBO. You can also update only subportions of the volume from the PBO with glTexSubImage3D.
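A rough C++ sketch of that pattern, assuming an extension loader such as GLEW provides the PBO and 3D-texture entry points, a single-channel 8-bit volume, and some signalling mechanism between the threads; the names (pbo, volumeTexture, CreateAndMapPbo, FillVolume, UploadVolume) are hypothetical and error checking is omitted.

#include <GL/glew.h>
#include <cstring>

GLuint pbo = 0;
void* mappedPtr = nullptr;

// Main thread, once: create the PBO and map it so the worker can write into it.
void CreateAndMapPbo(size_t volumeBytes)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, volumeBytes, nullptr, GL_STREAM_DRAW);
    mappedPtr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}

// Worker thread: plain memory write, no GL context needed here.
void FillVolume(const unsigned char* acquiredVolume, size_t volumeBytes)
{
    std::memcpy(mappedPtr, acquiredVolume, volumeBytes);
    // ... signal the main thread (event, condition variable, message, ...) ...
}

// Main thread, when signalled: unmap and upload from the PBO into the 3D texture.
void UploadVolume(GLuint volumeTexture, GLsizei width, GLsizei height, GLsizei depth)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    glBindTexture(GL_TEXTURE_3D, volumeTexture);
    // Last argument is an offset into the bound PBO, not a client pointer.
    glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, 0, width, height, depth,
                    GL_RED, GL_UNSIGNED_BYTE, nullptr);
    // Re-map for the next frame's data.
    mappedPtr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}

The worker thread only touches the mapped pointer, so it never needs a GL context of its own; all GL calls stay on the main thread, which is the point of the PBO approach.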

How to perform an image stream preview to a Delphi 6 frame or form from a background thread efficiently?

I have a Delphi 6 application that receives and processes an image stream from an external camera. I have the code on a background thread since it is CPU heavy and I don't want it interfering with the user interface code that runs on the main thread. I want to update a rectangular area on a form or frame with the TBitmaps I create from the camera's JPEG frames, which are received at a rate of 25 frames per second.
I want to know what method will give me the best performance and what Windows API calls or Delphi calls to use to do it. I would guess I should not use a TImage or TPicture or similar VCL component because they run on the main thread and I'm pretty sure trying to get anything done via a Synchronize() call is going to be inefficient and has the potential to slow down the threads involved. I would also want a technique that provides a smooth video display like double buffered controls do without any "striping" effects. Also, any tips on proper Canvas locking or device context management, etc. would be appreciated, especially tips on avoiding common mistakes in freeing resources.
Of course, a link to a good code sample that does what I need would be great.
AFAIK TBitmap is thread-safe if you work only on its canvas. Synchronize is needed if you send GDI messages and need to refresh the screen, but from my experiments, TBitmap.Canvas is just a wrapper around thread-safe Windows APIs. If you process the bitmap with pixel arithmetic (e.g. using Scanline), with one unique bitmap per thread, you can do it in the background.
But I suspect using TBitmap is not the most efficient way. Give http://graphics32.org or http://aggpas.org a try; they are very fast ways to work with bitmaps.
If you can, as imajoosy proposed, the best way to process your input stream is to use DirectX's streaming processing abilities.
For thread-safe processing, if each thread is likely to consume 100% of its core (which is very likely for image processing), it is generally recommended to create NumberOfCPU-1 threads for the processing. For instance, you could create a pool of threads and let them consume the bitmaps from the input stream.
