I am new to multithread OpenGL. I don't want to use shared context in separated thread. I found a way that we can map memory to do asynchronous resource loading.
However I need to tell glBufferData or glTexImage2D to reserve the exact memory size for me. For BMP, we have information in the header. But to know the number of vertices in an obj file, we need to iterate through the whole file... How do commercial game engine do it? Design its own format?
A way to load resources asynchronously that worked for me is to push the inevitable read-from-file operations into designated threads. It can be easily combined with lazy resource management like so:
Set up your scene but don't immediately load your resources from file. Instead, flag it as unitialized and store the file paths for later use.
Start rendering your scene
As soon as you encounter a resource flagged as uninitialized, start a thread that asynchronously loads the data from file into a CPU buffer and store meta info (e.g. width and height of textures, number of vertices for meshes etc.) as well. For now, replace the requested resource by some default or simply "do nothing" (assuming your renderer can handle "empty" resources).
In subsequent rendering steps and requests for the resource, check if the associated loading thread has finished and, if so, create the necessary GPU buffers, upload the data and flag the resource as initialized. The stored metadata helps you to figure out the necessary size of the resulting buffers. You can now use the resource as intended.
Handling resources this way avoids sharing OpenGL contexts between multiple threads since you only handle the (presumably very heavy) CPU-bound load-from-file operations asynchronously. Of course, you will have to cope with mutual exclusion to safely check whether or not a loading thread has finished. Furthermore, you might consider defining and maintaining an upload budget for limiting the amount of data transferred from CPU to GPU per frame in order to avoid framerate drops.
Related
I switched to Vulkan from OpenGL to use multi-threading improvements.
In OpenGL, I was able to load dynamically object to the scene (buffer, textures, etc) while rendering by using a waiting system. I was loading all app-side stuffs in a thread, then when it was ready, just before a frame render in the main thread, I was sending everything into the video memory. That was fine.
With Vulkan, I know I can call some functions between threads without provoking the well known segfault from OpenGL. But, this doesn't works with vkQueueSubmit(). I already know, I tried the naive way. To me, it seems logical you can't bother a queue from multiple threads.
I came with some ideas, but I don't know which one is good or bad.
First, I would go the OpenGL way, I will prepare everything I can from the CPU/App side, then just before render a frame, I will submit buffers (with transfer queue) to the video memory. But I feel there is no a real improvement from OpenGL way...
Second, I will try to use the synchronization mechanism to be able to send buffers in a thread and render from an other. But I keep reading there is a lot of way to slow down everything by causing irrelevant locks or by using incorrectly semaphores and fences.
So my question, is basically what path to pick to solve this problem ? How can I load a buffer dynamically from an other thread while the main thread is rendering without making too much pain to performances ? How Vulkan can help ?
If you want to stream resources for immediate use (i.e. the main render cannot proceed without them), then you're pretty much going to either block the main thread waiting, or have it spin doing something visually interesting (e.g. an animated loading screen) waiting for the resources to load.
If you want to stream resources while the app is doing real rendering then the main trick here is to load resources asynchronously in the background and only switch to using those resources in the main thread once they are already loaded. If the main thread ever ends up actually blocked on a semaphore then you've probably already started dropping frames, so your "engine" design needs to ensure that never happens. A lot of game use simple low-detail proxy objects as stand-in versions while the high-detail version is loading in the background.
None of this is particularly related to the graphics API - both GL and Vulkan need the same macro-scale behavior. Vulkan API features don't particularly help because the major bottlenecks which cause problems here are storage/network/CPU which have nothing to do with the graphics part of the problem.
I decided to trust threads !
In the first place it seems to work, I get a lot of :
[MESSAGE:Validation Error: [ UNASSIGNED-Threading-MultipleThreads ] Object 0: handle = 0x56414228bad8, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x141cb623 | THREADING ERROR : vkQueueSubmit(): object of type VkQueue is simultaneously used in thread 0x7f6b977fe640 and thread 0x7f6bc2bcb740]
But it works !
So, the basic idea is to have a thread for loading objects while the engine is drawing. This thread takes care of creating the UBO for the location of the object, then when the geometry is loaded from RAM, it creates the VBO and IBO (I left material with image/UBO on hold for now), then creates the graphics pipeline (with layout, descriptor layout, shaders compiled with GLSLang on the fly) (The next idea is to reuse pipeline for similar needs) and finallly sets a flag to say the object is ready to use. In the other hand, I have my main thread rendering and takes new objects when they shows up ready.
I think it works because I have a gentle video card (GTX 1070) with multiple queues setup, I had one for graphics and an other one for transfer setup.
I'm pretty sure, this will crash or goes poorly with a GPU with a single queue, and this should be why the validation layers tolds me these messages.
I am working on a multi threaded OpenGL application with OpenTK 3 and WinForms.
I have 2 shared GraphicsContexts:
a "main" rendering context, used for scene drawing and synchronous load operations.
a "secondary" resource loader context, used to load resources during draw.
This secondary context is used to load video frames coming from a Windows Media Foundation session (with a custom media sink). However, i have no control on what thread this media sink is running on, so i need a way, after each loading operation, to "unbind" that secondary GraphicsContext, so that it can be bound in the next thread where it will be needed.
Do I have to P/Invoke wglMakeCurrent(NULL, NULL) or is there a proper OpenTK way of doing this?
Short answer
Use OpenTK feature:
mycontext.MakeCurrent(null);
Long answer
Today's wglMakeCurrent doc has eliminated this old comment:
If hglrc is NULL, the function makes the calling thread's current
rendering context no longer current, and releases the device context
that is used by the rendering context. In this case, hdc is ignored.
I would trust that comment is still valid, due to so many code relying on it.
Pay attention to "releases the device context". Perhaps OpenTK does some action related to the device context. Perhaps the hdc is private (by using window style flag CS_OWNDC) So, let OpenTK handles this "NULL" case.
Better approach
Be aware that even when you use several shared contexts, is the GPU (normally one unique card) that does the loading, and not many cards allow loading while doing other jobs. Thus, it isn't guaranteed you get better performance. But shared contexts exist to this purpose, somehow.
Why should you use the same context in different threads?
I'd use a different thread for load video frames (without any gl-call) and for upload them to the GPU. This last thread is permanent and has its own gl-context, so it doesn't need to set as current every time it works. It sleeps or waits until the other thread has finished loading data, and after that task is completed it uploads that data to the GPU.
I want to share my thoughts about how to keep memory barriers in sync in multi-threading rendering. Please let me know if my thoughts about Vulkan memory barrier is wrong or if my current plan makes any sense. I don't have anyone at work to discuss with, so I'll ask here for help.
For resources in Vulkan, when I set memory barriers for them among drawcalls, I need to set both srcAccessMask and dst AccessMask. This is simple for single threaded rendering. But for multi-threading rendering, it gets complicated. dst AccessMask is not a problem, since we always know what the resource is going to be used for. But for srcAccessMask, when one command buffer tries to read the current access mask of some resource, there might be other command buffers changing it to something else. So my current thoughts of solving it is:
Each resource keeps its own state, I'll only update the state right before submitting command buffers to command queue, I will describe it later. Each command buffer maintains tracking record of how the resource state changed inside it. Doing this way, within the same command buffer the access state of each resource is clear, the only problem is the beginning state of the resource for each command buffer.
When submitting multiple command buffers to execute, as the order of command buffers are fixed now, I check the tracking record of each resource among all command buffers, update resource's state based on the end state of the resource in each command buffer, and use that to correct the beginning state of the same resource in each command buffer's tracking record.
Then I need to either insert a new command buffer to have extra memory barrier to transition resource to correct state for the first command buffer, or insert memory barrier into previous command buffer for the rest command buffers.When all these are done, I can finally submit the command buffers together as a batch.
Do these make sense to you? Are there better solutions to solve it? Or do we even need to solve the "synchronization" issue of access state for each resource?
Thank you for your time
What you're talking about only makes sense in a world where none of these rendering operations have even the slightest idea what's going on elsewhere. Where the consumer of an image has no idea how the data in the image got there. Which probably means that it doesn't really know what that image means conceptually.
Vulkan is a low-level API. The idea is that you can connect the high-level concepts of your rendering system directly to Vulkan. So at a high level, you know that resource X has meaning Y and in this frame will have its data generated from operation Z. Not because of something stored in resource X but because it is resource X; that's what resource X is for. So both the operation generating it and the operation consuming it know what's going on and how it got there.
For example, if you're doing deferred rendering and SSAO, then your SSAO renderpass knows that the texture containing the depth buffer had its values generated by rendering. The depth buffer doesn't need something stored in it to say that; that's simply the nature of your rendering. It's hard-coded to work that way.
Most of your resource dependencies are (or ought to be) that way.
If you're doing some render-to-texture operation via the framebuffer, then the consumer probably doesn't even need to know about the dependency. You can just set an appropriate external dependency for the renderpass and the subpass that generates it. And you probably know why you did the render-to-texture op, and you probably know where it's going. If you're doing RTT for reflection, you know that the destination will be some kind of shader stage texture fetch. And if you don't know how it's going to be used, then you can just be safe and set all of the destination stage bits.
What you're talking about makes some degree of sense if you're dealing with streamed objects, where objects are popping into and outof memory with some regularity. But even then, that's not really a property of each individual resource.
When you load a streamed chunk, you upload its data by generating command buffer(s) and submitting them. And here's where we have an implementation-specific divergence. Your best bet for performance is to execute these CBs on a queue dedicated for transfer operations. But since Vulkan doesn't guarantee all implementations have those, you need to be able to deliver those transfer CBs to the main rendering queue.
So you need a way to communicate to rendering threads when they can expect to start being able to use the resources. But even that doesn't need to be on a per-resource basis; they can be told "stuff from block X is available", and then they can start using it.
Furthermore, that implementation divergence becomes important. See, if it's done on another queue, a barrier isn't the right synchronization primitive. Your rendering CBs now have to have their submitted batches wait on a semaphore. And that semaphore should handle all of the synchronization needs of the memory (ie: the destination bits being everything). So in the implementation where the transfer CBs are executed on the same queue as your rendering CBs, you may as well save yourself some trouble and issue a single barrier at the end of the transfer CB that makes all of the given resources available to all stages.
So as previously stated, this kind of automated system is only useful if you have no real control over the structure of rendering. This would principally be true if you're writing some kind of middleware, where the higher-level code defines the structure of rendering. However, if that's the case, Vulkan probably isn't the right tool for that job.
I'm making a networked computer game using Unity3D version 3.x on Mac. I have a game client and a game server. Whenever data arrives at the client from the server, the client would freeze for a little bit (~0.5 seconds), render the new data, and then continue. Is there any way I can optimize my game so that incoming data does not affect user's interaction with the game client?
Here's what I'm doing now:
I created a new thread to pull data from the server.
When the data arrives I put it in a buffer which is protected by a mutual exclusion lock.
In the Unity thread, on every frame, I check if the buffer is empty. If it is not empty, I wait on the mutual exclusion lock for permission to process the data. Once I got the permission, I parse the data and render it.
I'm doing this because Unity does not allow me to create new GameObjects in the network thread that I created. But I wonder if there's anything I can do to optimize user experience.
It's always best to first profile to know the source of the freezing, i.e. is it due to waiting on the lock? due to parsing? due to GameObject instantiation? all three?
Unity Profiler (Pro only)
General solutions:
If the pause is due to the lock, try splitting up your buffering into
different bins, so that the main thread can access the next non-empty
bin without having to acquire lock, or at least such a coarse-grained lock.
If parsing is the slowdown, you can still do all of that in a
background thread. Perform as much as you possibly can before you finally must instantiate GameObjects.
If it's the final step of instantiating GameObjects, then you
can try to preinstantiate objects you expect, simply reconfiguring
them upon new network data; or divide up the instantation
process into separate, incremental phases, with unnoticeably small
pauses, e.g. a root node first, then next phase its children, and so on, until fully reconstructed.
I have four threads in a C++/CLI GUI I'm developing:
Collects raw data
The GUI itself
A background processing thread which takes chunks of raw data and produces useful information
Acts as a controller which joins the other three threads
I've got the raw data collector working and posting results to the controller, but the next step is to store all of those results so that the GUI and background processor have access to them.
New raw data is fed in one result at a time at regular (frequent) intervals. The GUI will access each new item as it arrives (the controller announces new data and the GUI then accesses the shared buffer). The data processor will periodically read a chunk of the buffer (a seconds worth for example) and produce a new result. So effectively, there's one producer and two consumers which need access.
I've hunted around, but none of the CLI-supplied stuff sounds all that useful, so I'm considering rolling my own. A shared circular buffer which allows write-locks for the collector and read locks for the gui and data processor. This will allow multiple threads to read the data as long as those sections of the buffer are not being written to.
So my question is: Are there any simple solutions in the .net libraries which could achieve this? Am I mad for considering rolling my own? Is there a better way of doing this?
Is it possible to rephrase the problem so that:
The Collector collects a new data point ...
... which it passes to the Controller.
The Controller fires a GUI "NewDataPointEvent" ...
... and stores the data point in an array.
If the array is full (or otherwise ready for processing), the Controller sends the array to the Processor ...
... and starts a new array.
If the values passed between threads are not modified after they are shared, this might save you from needing the custom thread-safe collection class, and reduce the amount of locking required.