I am using SDL2's SDL_Renderer to:
1. draw various shapes into textures
2. show those textures in the window
As the first step can be time-consuming due to the number of shapes, I am considering putting it into a separate thread. A texture that is not finished would simply not be shown (I can ensure that in my code), but the application wouldn't have to wait for step 1 to complete.
But I am not sure how to handle the renderer object: currently I am using a single global one for both steps 1 and 2. If I just do 1 and 2 in two threads with the same renderer object, it is going to fail miserably, right? And I am not sure whether I can create two separate renderer objects.
The answer to the question SDL2 Multiple renderers? suggests that renderers are tied to specific windows, and I am using just one.
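To make the split concrete, here is a minimal sketch of what I am after (assuming it is acceptable to rasterize on the worker in software; SDL_CreateSoftwareRenderer and the other calls are real SDL2 API, but the hand-off between the threads is only hinted at, and `renderer` is my existing global one):

// Worker thread: draw the shapes into a plain SDL_Surface through a
// software renderer -- this never touches the window's renderer or the GPU.
SDL_Surface* surf = SDL_CreateRGBSurfaceWithFormat(
    0, 512, 512, 32, SDL_PIXELFORMAT_RGBA8888);
SDL_Renderer* soft = SDL_CreateSoftwareRenderer(surf);
SDL_SetRenderDrawColor(soft, 255, 0, 0, 255);
SDL_Rect r = { 10, 10, 100, 100 };   // stands in for the many shapes
SDL_RenderFillRect(soft, &r);
SDL_DestroyRenderer(soft);
// ... hand `surf` to the main thread, e.g. via a mutex-guarded queue ...

// Main thread: the only place the window's SDL_Renderer is ever used.
SDL_Texture* tex = SDL_CreateTextureFromSurface(renderer, surf); // GPU upload
SDL_FreeSurface(surf);
SDL_RenderCopy(renderer, tex, nullptr, nullptr);
SDL_RenderPresent(renderer);

That would keep step 1 off the main thread while step 2 keeps its single global renderer.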
I am trying to build 2 applications running in separate processes. One application will display live video from a camera (the server) and the other will overlay a UI (the client) on top of that video. The solution requires low latency, so I would like to render both without going through the OS compositor.
The solution I am trying to implement involves creating a shared OpenGL context or texture so that the UI can render its part to some off-screen buffer/texture.
After every live video frame is rendered, the server can take the contents of the off-screen buffer/texture and render it on top.
This way there is no latency added due to synchronization of the processes. The server will take the latest image from the UI if one is ready. In case it is not ready, it shouldn't wait for it, and should use the previous image instead.
How can I pass a texture or context between processes?
The CreateContext function can take a pointer to another context and make it shared, but as far as I understand, that address will not be valid outside the process's address space.
These days the "cleanest" way to share GPU resources between processes is to create those resources using Vulkan, export them into file descriptors (POSIX) or HANDLEs (Win32) and import those into OpenGL contexts created at either side. The file descriptors you can pass by the usual methods (sendmsg with SCM_RIGHTS, or pidfd_getfd, or open("/proc/${PID}/fd/${FD}").
Exporting from Vulkan:
https://www.khronos.org/registry/vulkan/specs/1.2-khr-extensions/html/chap46.html#VK_KHR_external_memory_fd (ff.)
Importing into OpenGL:
https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_external_objects.txt
https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_external_objects_fd.txt
https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_external_objects_win32.txt
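To give a feel for the code involved, the core of the export/import looks roughly like this (a heavily trimmed sketch: instance/device setup, memory-type selection and all error handling are omitted; size, width and height are assumed known, and vkGetMemoryFdKHR has to be fetched via vkGetDeviceProcAddr):

// --- Vulkan side: allocate exportable memory, then export a POSIX fd ---
VkExportMemoryAllocateInfo exportInfo = {};
exportInfo.sType = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO;
exportInfo.handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT;

VkMemoryAllocateInfo allocInfo = {};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.pNext = &exportInfo;
allocInfo.allocationSize  = size;        // from vkGetImageMemoryRequirements
allocInfo.memoryTypeIndex = memoryType;  // must be an exportable memory type
VkDeviceMemory memory;
vkAllocateMemory(device, &allocInfo, nullptr, &memory);

VkMemoryGetFdInfoKHR fdInfo = {};
fdInfo.sType = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR;
fdInfo.memory = memory;
fdInfo.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT;
int fd = -1;
vkGetMemoryFdKHR(device, &fdInfo, &fd);  // hand this fd to the other process

// --- OpenGL side: import the fd (GL_EXT_memory_object_fd) ---
GLuint memObj = 0, tex = 0;
glCreateMemoryObjectsEXT(1, &memObj);
glImportMemoryFdEXT(memObj, size, GL_HANDLE_TYPE_OPAQUE_FD_EXT, fd);
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexStorageMem2DEXT(GL_TEXTURE_2D, 1, GL_RGBA8, width, height, memObj, 0);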
Doing with just "pure" OpenGL requires a lot of hacks. One way would be to force indirect contexts (at the cost of modern capabilities, due to the lack of GLX support for those) and sharing the X11 IDs for that. Another method is to use ptrace to access to mapped buffers in the other process. Either is quite daunting and a lot more work to properly implement (BT;DT.), than setting up a Vulkan instance, creating all the textures therein and then importing them to OpenGL.
I'm trying to offload as many Three.js computations as possible to a Web Worker. It seems to be relatively doable when just wanting the worker to create geometries. However, I still need to create a significant number of meshes, which implies a hefty loop on the main thread.
Is it possible to offload mesh creation to a web worker and just have the main thread add it to the scene (when ready)?
The idea would be to have the worker create an array of meshes, based on some data, and have it send them over to the main thread.
Many thanks
I am currently planning to tackle this problem in one of my projects. If you haven't started yours yet, I would suggest having a look at https://github.com/kripken/webgl-worker first. There are two examples (one simple, one a bit more complex) that could help you get started.
I will update this answer later with more details about how to integrate webgl-worker with three.js, which might require more setup than a plain webgl/worker implementation.
Unfortunately, Three.js 3D objects (classes) are too "heavy" to be used in workers (objects can't pass through the worker-thread/main-thread boundary, even after I patched the three.js lib to be usable inside a worker).
But I successfully use workers to load pretty large objects asynchronously.
I use Catiline.js for convenience.
The idea is to use the Three.js objects' native JSON format (with buffer geometry) and simply parse it into a plain JS object inside the worker. After that, you can use THREE.ObjectLoader in the main thread to get a real scene object. The benefit of such an approach is that the parsing (which can take quite a long time for a large object) moves to the background, minimizing freezes.
I use 6 workers, choose a worker randomly, pass the data URL to it, and additionally get the benefit of XMLHttpRequest caching.
Three.js objects can't be passed through postMessage.
Instead, we want to set up a connection back to the main page via WebSockets. This should let us freely pass whatever is needed.
This thread might be helpful to you... I recently had to do some SSR with Three.js, and the concepts are similar except that you are parsing buffer geometries with ObjectLoader in the worker.
https://discourse.threejs.org/t/error-with-ssr-three-js-objects/8643
I'm writing an ebook reader app for Windows Store. I'm using Direct2D + DXGI swap chains to render book pages on screen.
My book content is sometimes quite complex (geometry, bitmaps, masks, etc.), so it can take up to 100 ms to render. So I'm trying to do off-screen rendering to a bitmap in a separate thread, and then just show this bitmap on the main thread.
However, I can't figure out how to do it efficiently.
So far I've tried two approaches:
1. Use a single ID2D1Factory with the D2D1_FACTORY_TYPE_MULTI_THREADED flag, create an ID2D1BitmapRenderTarget and use it in the background thread for off-screen rendering. (This additionally requires ID2D1Multithread::Enter/Leave around IDXGISwapChain::Present operations.) The problem is, the ID2D1RenderTarget::EndDraw operation in the background thread sometimes takes up to 100 ms, and main-thread rendering is blocked for this period due to internal Direct2D locking.
2. Use a separate ID2D1Factory in the background thread (as described in http://www.sdknews.com/ios/using-direct2d-for-server-side-rendering) and turn off internal Direct2D synchronization. There is no cross-locking between the two threads in this case. Unfortunately, I then can't use the resulting bitmap in the main ID2D1Factory directly, because it belongs to a different factory. I have to move the bitmap data to CPU memory, then copy it into GPU memory of the main ID2D1Factory. This operation also introduces significant lag (I believe due to the large memory transfers, but I'm not sure).
Is there a way to do this efficiently?
P.S. All the timings here are given for an Acer Switch 10 tablet. On a regular Core i7 PC both approaches work without any visible lag.
Ok, I've found a solution.
Basically, all I needed was to modify approach 2 to use DXGI resource sharing between the two DirectX factory sets. I'll skip the gory details (they can be found here: http://xboxforums.create.msdn.com/forums/t/66208.aspx), but the basic steps are:
1. Create two sets of DirectX resources: main (which will be used for onscreen rendering) and secondary (for offscreen rendering).
2. Using the ID3D11Device2 from the main resource set, create a D3D 2D texture with CreateTexture2D, using the D3D11_BIND_RENDER_TARGET and D3D11_BIND_SHADER_RESOURCE bind flags and the D3D11_RESOURCE_MISC_SHARED_NTHANDLE and D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX misc flags.
3. Get a shared handle from it by casting it to IDXGIResource1 and calling CreateSharedHandle on it with DXGI_SHARED_RESOURCE_READ and DXGI_SHARED_RESOURCE_WRITE.
4. Open this shared texture in the secondary resource set on the background thread by calling ID3D11Device2::OpenSharedResource1.
5. Acquire the keyed mutex of this texture (IDXGIKeyedMutex::AcquireSync), create a render target from it (ID2D1Factory2::CreateDxgiSurfaceRenderTarget), draw on it, and release the mutex (IDXGIKeyedMutex::ReleaseSync).
6. On the main thread, in the main resource set, acquire the mutex, create a shared bitmap from the texture created in step 2, draw this bitmap, then release the mutex.
Note that the mutex locking is necessary. Not doing it results in cryptic DirectX debug error messages, and erroneous operation or even crashes.
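Condensed into code, the steps look roughly like this (a sketch: device creation and all HRESULT checking are omitted, and mainDevice/bgDevice stand for the D3D devices of the two resource sets):

#include <d3d11_2.h>
#include <dxgi1_2.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Steps 2-4: create the shareable texture on the main device, export an
// NT handle, and open it on the secondary (background-thread) device.
ComPtr<ID3D11Texture2D> OpenSharedTarget(ID3D11Device1* mainDevice,
                                         ID3D11Device1* bgDevice,
                                         UINT width, UINT height)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;   desc.Height = height;
    desc.MipLevels = 1;   desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags = D3D11_RESOURCE_MISC_SHARED_NTHANDLE |
                     D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX;
    ComPtr<ID3D11Texture2D> tex;
    mainDevice->CreateTexture2D(&desc, nullptr, &tex);

    ComPtr<IDXGIResource1> res;
    tex.As(&res);
    HANDLE shared = nullptr;
    res->CreateSharedHandle(nullptr,
        DXGI_SHARED_RESOURCE_READ | DXGI_SHARED_RESOURCE_WRITE,
        nullptr, &shared);

    ComPtr<ID3D11Texture2D> bgTex;
    bgDevice->OpenSharedResource1(shared, IID_PPV_ARGS(&bgTex));
    return bgTex;
}

// Steps 5/6: every access, on either thread, is bracketed by the keyed mutex:
//   ComPtr<IDXGIKeyedMutex> km;  bgTex.As(&km);
//   km->AcquireSync(0, INFINITE);
//   ... CreateDxgiSurfaceRenderTarget / BeginDraw / EndDraw ...
//   km->ReleaseSync(0);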
tl;dr: Render to bitmaps on background thread in software mode. Draw from bitmaps to render target on UI thread in hardware mode.
The best approach I've been able to find so far is to use background threads with software rendering (IWICImagingFactory::CreateBitmap and ID2D1Factory::CreateWicBitmapRenderTarget), then copy the result back to a hardware bitmap on the thread that owns the hardware render target via ID2D1RenderTarget::CreateBitmapFromWicBitmap, and finally blit that using ID2D1RenderTarget::DrawBitmap.
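The skeleton of that approach, as a sketch (error handling and factory creation omitted; wicFactory, d2dFactory and the hardware target hwRT are assumed to already exist):

#include <d2d1.h>
#include <wincodec.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Background thread: render into a CPU-side WIC bitmap in software mode.
ComPtr<IWICBitmap> wicBitmap;
wicFactory->CreateBitmap(width, height, GUID_WICPixelFormat32bppPBGRA,
                         WICBitmapCacheOnDemand, &wicBitmap);

ComPtr<ID2D1RenderTarget> softRT;
d2dFactory->CreateWicBitmapRenderTarget(
    wicBitmap.Get(),
    D2D1::RenderTargetProperties(
        D2D1_RENDER_TARGET_TYPE_SOFTWARE,
        D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM,
                          D2D1_ALPHA_MODE_PREMULTIPLIED)),
    &softRT);
softRT->BeginDraw();
// ... the expensive drawing goes here ...
softRT->EndDraw();

// UI thread (owns hwRT): upload the finished bitmap once, then blit it.
ComPtr<ID2D1Bitmap> hwBitmap;
hwRT->CreateBitmapFromWicBitmap(wicBitmap.Get(), nullptr, &hwBitmap);
hwRT->DrawBitmap(hwBitmap.Get());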
This is how paint.net 4.0 does selection rendering. When you're drawing a selection with the lasso tool, it will use a background thread to draw the selection outline asynchronously (the UI thread does not wait for this to complete). You can end up with a very complicated polygon due to the stroke style and animations. I render it 4 times, where each animation frame has a slightly different offset for the dashed stroke style.
Obviously this rendering can take a while as the polygon becomes more complex (that is, if you keep scribbling for a while). I have a few other special optimizations for when you use the Move Selection tool, which allows you to do transformations (rotate, translate, scale): if the background thread hasn't yet re-rendered the current polygon with the new transform, then I will render the old bitmap (with the current polygon and old transform) with the new transform applied. The selection outline may be distorted (scaling) or clipped (translated outside of the viewable area) while the background thread catches up, but that's a small price to pay for 60 fps responsiveness. This optimization works very well because you can't be modifying the polygon and the transform of a selection at the same time.
I am seeking to improve performance by reducing the scene graph traversal overhead before each render call. I am not very experienced with multi-threaded software design, so after reading a couple of articles about multi-threaded rendering I am unsure how to approach this issue:
My rendering engine is completely deterministic and renders frames based on incoming transformation instructions, sequentially at each new frame. I currently see the threaded scene graph update routine as something like this:
--------------CPU-------------------------|------GPU------|--Frame Number--|
Update Frame 0 Transforms (spawn thread)  | GL RenderCall | Frame 0
Update Frame 1 Transforms (spawn thread)  | GL RenderCall | Frame 1
Update Frame 2 Transforms (spawn thread)  | GL RenderCall | Frame 2
...
Before the first draw call I start updating the next frame (frame 1) in a separate thread and proceed with the render call. At the end of that call I start a new thread for the update of frame 2, check whether the thread for frame 1 is done, and if so, issue the next render call. And so on.
That is how I see this happening. I have 2 questions:
1. Is this a proper (simple) way to design this kind of system?
2. What is the likelihood of render-loop stalls because the scene graph update thread hasn't finished its update in sync with the start of the next render call?
I know some of the people here will say it depends on the specific scene graph tree's complexity, but I would like to know how it usually goes in practice and what the major drawbacks of such a design are.
As you probably know, you shouldn't render to a common OpenGL drawable from multiple threads, as this would result in a net slowdown. However, preparing the drawing, aka the frame setup, is a valid step to parallelize. It always boils down to generating a linear list of objects to draw, in order to maximize throughput and produce a correct result.
Of course the actual generation steps depend on the structure used. But for a multithreaded design it usually boils down to a map and reduce kind of approach. Creating and synchronizing threads has a certain overhead. Luckily those problems are addressed by systems like OpenMP. I also suggest you perform the frame setup phase during the SwapBuffers wait of the preceding frame.
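To make the overlap concrete, the loop shape I mean is roughly this (an illustrative sketch: Scene, DrawItem, applyFrameUpdate, submitGL and swapBuffers are stand-ins for your own scene graph and GL plumbing, not a real API):

#include <future>
#include <vector>

struct Scene;                       // your scene graph
struct DrawItem { /* ... */ };      // one flattened, sorted draw command

std::vector<DrawItem> buildDrawList(const Scene& s); // the parallelizable part
void applyFrameUpdate(Scene& s);    // apply the next frame's transforms
void submitGL(const std::vector<DrawItem>& items);   // GL calls, this thread only
void swapBuffers();

void renderLoop(Scene& scene) {
    auto pending = std::async(std::launch::async, buildDrawList, std::cref(scene));
    for (;;) {
        // If the worker isn't done yet, this wait is exactly the stall
        // your second question asks about.
        std::vector<DrawItem> list = pending.get();
        applyFrameUpdate(scene);                      // mutate scene for N+1
        pending = std::async(std::launch::async,      // setup for frame N+1...
                             buildDrawList, std::cref(scene));
        submitGL(list);                               // ...overlaps submission
        swapBuffers();                                // and the swap wait
    }
}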
I faced the need to use multi-threading to load an additional texture on-the-fly in order to reduce the memory footprint.
The example case is that I have 10 types of enemy to use in a single level, but the enemies come out type by type. "Type by type" means one type of enemy comes out and the player kills all of its instances, then it's time to call in another type. The process goes on like this until all types have come out, then the level is complete.
You can see it's better not to load all the enemies' textures at once at start-up (each is a pretty big 2048*2048 sheet with lots of animation frames inside, which I need to create at creation time for each type of enemy). So I turned to multi-threading to load an additional texture when I need it. But I knew that cocos2d-x is not thread-safe. I planned to use the CCSpriteFrameCache class to load a texture from a .plist + .png file, then re-create the animations there, and finally create a CCSprite from it to represent a new type of enemy instance. If I don't use multi-threading, I might suffer from the lag that would occur when loading a large texture.
So how can I load a texture in a separate thread in cocos2d-x, given my goal above? Any idea that avoids thread-safety issues but still accomplishes my goal is also appreciated.
Note: I'm developing on iOS platform.
I found that async loading of images is already there inside cocos2d-x.
You can build a testing project of cocos2d-x and look into "Texture2DTest", then tap on the left arrow to see what async loading looks like.
I have taken a look inside the code.
You can use the addImageAsync method of CCTextureCache to load an additional texture on the fly without interfering with or slowing down other parts, such as an animation that is currently running.
In fact, addImageAsync of CCTextureCache will load a CCTexture2D object for you and hand it back via the callback method you register. It is then up to you to make use of it.
Please note that CCSpriteFrameCache uses CCTextureCache to load frames, so this applies to my case as well: loading a spritesheet whose frames are used in animation creation. Unfortunately, no async method is provided for the CCSpriteFrameCache class. You have to manually load the texture object via CCTextureCache, then plug it into
void CCSpriteFrameCache::addSpriteFramesWithFile(const char *pszPlist, CCTexture2D *pobTexture)
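Putting that together for the spritesheet case looks roughly like this (a sketch against the cocos2d-x 2.x API; MyLayer, the file names and the frame name are made up):

// Kick off the async load; cocos2d-x invokes the callback on the main thread.
void MyLayer::loadNextEnemyType() {
    CCTextureCache::sharedTextureCache()->addImageAsync(
        "enemy_type2.png", this, callfuncO_selector(MyLayer::onTextureLoaded));
}

// addImageAsync hands the finished CCTexture2D back here.
void MyLayer::onTextureLoaded(CCObject* obj) {
    CCTexture2D* texture = (CCTexture2D*)obj;
    // Plug the already-loaded texture into the frame cache (main thread, safe).
    CCSpriteFrameCache::sharedSpriteFrameCache()->addSpriteFramesWithFile(
        "enemy_type2.plist", texture);
    CCSprite* enemy = CCSprite::createWithSpriteFrameName("enemy2_walk_0.png");
    addChild(enemy);
}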
There are 2 files in the testing project you can take a look at:
Texture2dTest.cpp
TextureCacheTest.cpp