I am writing a small WebGL engine and I was wondering if there is any good reason to keep gl context resources that are created and then buffered or compiled and saved in another object? For example: I create buffers for positions, normals, uv and indices and then buffer it into an VAO. Is it good practice to delete the buffers (i.e., all buffers except VAO) after saved in the VAO (i.e., after initialization) or could I use the buffers at some other point (and delete resources when the corresponding object is destroyed explicitly)? Same goes for createShader and deleteShader calls: after compilation and saving into a program object, can I safely delete the vsh, fsh Shaders created or could I reuse them at some other point?
The VAO does not contain vertices data, it only stores, so to say, states of pointers configuration on bound buffers. Ressources deletion is however somewhat fuzzy, since ressources need to be unbound to be released. Anyway as long you need vertices data to draw, these data needs to be in memory, and these data are not (conceptually) referenced by the VAO, but by the VBO. In the "worst" case (i never tested), if you request a VBO deletion while the VAO still in use, maybe you simply lose access to the VBO, but data still in memory. The logical way is to keep your VBO references as long you use these data to draw something. This way you keep access and control to the data. Once you want to delete all stuff, you first delete the VAO, then the related VBO(s).
For shaders this is not the same, and here, you can detach and delete shader objects once the shader program is linked, as long you don't need to link another program with the same shader object, allowing you to save memory. Here is an explanation here, relative to OpenGL, but which is equally true for WebGL: Proper way to delete GLSL shader?
Related
I want to share my thoughts about how to keep memory barriers in sync in multi-threading rendering. Please let me know if my thoughts about Vulkan memory barrier is wrong or if my current plan makes any sense. I don't have anyone at work to discuss with, so I'll ask here for help.
For resources in Vulkan, when I set memory barriers for them among drawcalls, I need to set both srcAccessMask and dst AccessMask. This is simple for single threaded rendering. But for multi-threading rendering, it gets complicated. dst AccessMask is not a problem, since we always know what the resource is going to be used for. But for srcAccessMask, when one command buffer tries to read the current access mask of some resource, there might be other command buffers changing it to something else. So my current thoughts of solving it is:
Each resource keeps its own state, I'll only update the state right before submitting command buffers to command queue, I will describe it later. Each command buffer maintains tracking record of how the resource state changed inside it. Doing this way, within the same command buffer the access state of each resource is clear, the only problem is the beginning state of the resource for each command buffer.
When submitting multiple command buffers to execute, as the order of command buffers are fixed now, I check the tracking record of each resource among all command buffers, update resource's state based on the end state of the resource in each command buffer, and use that to correct the beginning state of the same resource in each command buffer's tracking record.
Then I need to either insert a new command buffer to have extra memory barrier to transition resource to correct state for the first command buffer, or insert memory barrier into previous command buffer for the rest command buffers.When all these are done, I can finally submit the command buffers together as a batch.
Do these make sense to you? Are there better solutions to solve it? Or do we even need to solve the "synchronization" issue of access state for each resource?
Thank you for your time
What you're talking about only makes sense in a world where none of these rendering operations have even the slightest idea what's going on elsewhere. Where the consumer of an image has no idea how the data in the image got there. Which probably means that it doesn't really know what that image means conceptually.
Vulkan is a low-level API. The idea is that you can connect the high-level concepts of your rendering system directly to Vulkan. So at a high level, you know that resource X has meaning Y and in this frame will have its data generated from operation Z. Not because of something stored in resource X but because it is resource X; that's what resource X is for. So both the operation generating it and the operation consuming it know what's going on and how it got there.
For example, if you're doing deferred rendering and SSAO, then your SSAO renderpass knows that the texture containing the depth buffer had its values generated by rendering. The depth buffer doesn't need something stored in it to say that; that's simply the nature of your rendering. It's hard-coded to work that way.
Most of your resource dependencies are (or ought to be) that way.
If you're doing some render-to-texture operation via the framebuffer, then the consumer probably doesn't even need to know about the dependency. You can just set an appropriate external dependency for the renderpass and the subpass that generates it. And you probably know why you did the render-to-texture op, and you probably know where it's going. If you're doing RTT for reflection, you know that the destination will be some kind of shader stage texture fetch. And if you don't know how it's going to be used, then you can just be safe and set all of the destination stage bits.
What you're talking about makes some degree of sense if you're dealing with streamed objects, where objects are popping into and outof memory with some regularity. But even then, that's not really a property of each individual resource.
When you load a streamed chunk, you upload its data by generating command buffer(s) and submitting them. And here's where we have an implementation-specific divergence. Your best bet for performance is to execute these CBs on a queue dedicated for transfer operations. But since Vulkan doesn't guarantee all implementations have those, you need to be able to deliver those transfer CBs to the main rendering queue.
So you need a way to communicate to rendering threads when they can expect to start being able to use the resources. But even that doesn't need to be on a per-resource basis; they can be told "stuff from block X is available", and then they can start using it.
Furthermore, that implementation divergence becomes important. See, if it's done on another queue, a barrier isn't the right synchronization primitive. Your rendering CBs now have to have their submitted batches wait on a semaphore. And that semaphore should handle all of the synchronization needs of the memory (ie: the destination bits being everything). So in the implementation where the transfer CBs are executed on the same queue as your rendering CBs, you may as well save yourself some trouble and issue a single barrier at the end of the transfer CB that makes all of the given resources available to all stages.
So as previously stated, this kind of automated system is only useful if you have no real control over the structure of rendering. This would principally be true if you're writing some kind of middleware, where the higher-level code defines the structure of rendering. However, if that's the case, Vulkan probably isn't the right tool for that job.
I am new to multithread OpenGL. I don't want to use shared context in separated thread. I found a way that we can map memory to do asynchronous resource loading.
However I need to tell glBufferData or glTexImage2D to reserve the exact memory size for me. For BMP, we have information in the header. But to know the number of vertices in an obj file, we need to iterate through the whole file... How do commercial game engine do it? Design its own format?
A way to load resources asynchronously that worked for me is to push the inevitable read-from-file operations into designated threads. It can be easily combined with lazy resource management like so:
Set up your scene but don't immediately load your resources from file. Instead, flag it as unitialized and store the file paths for later use.
Start rendering your scene
As soon as you encounter a resource flagged as uninitialized, start a thread that asynchronously loads the data from file into a CPU buffer and store meta info (e.g. width and height of textures, number of vertices for meshes etc.) as well. For now, replace the requested resource by some default or simply "do nothing" (assuming your renderer can handle "empty" resources).
In subsequent rendering steps and requests for the resource, check if the associated loading thread has finished and, if so, create the necessary GPU buffers, upload the data and flag the resource as initialized. The stored metadata helps you to figure out the necessary size of the resulting buffers. You can now use the resource as intended.
Handling resources this way avoids sharing OpenGL contexts between multiple threads since you only handle the (presumably very heavy) CPU-bound load-from-file operations asynchronously. Of course, you will have to cope with mutual exclusion to safely check whether or not a loading thread has finished. Furthermore, you might consider defining and maintaining an upload budget for limiting the amount of data transferred from CPU to GPU per frame in order to avoid framerate drops.
I faced the need to use multi-threading to load an additional texture on-the-fly in order to reduce the memory footprint.
The example case is that I have 10 types of enemy to use in the a single level but the enemies will come out type by type. The context of "type by type" means one type of enemy comes out and the player kills all of its instances, then it's time to call in another type. The process goes like this until all types come out, then the level is complete.
You can see it's better to not initially load all enemy's texture at once in the starting time (it's pretty big 2048*2048 with lots of animation frames inside which I need to create them in time of creation for each type of enemy). I turn this to multi-thread to load an additional texture when I need it. But I knew that cocos2d-x is not thread-safe. I planned to use CCSpriteFrameCache class to load a texture from .plist + .png file then re-create animation there and finally create a CCSprite from it to represent a new type of enemy instance. If I don't use multi-thread, I might suffer from delay of lag that would occur of loading a large size of texture.
So how can I load a texture in separate thread in cocos2d-x following my goal above? Any idea to avoid thread-safe issue but still can accomplish my goal is also appreciated.
Note: I'm developing on iOS platform.
I found that async-loading of image is already there inside cocos2d-x.
You can build a testing project of cocos2d-x and look into "Texture2DTest", then tap on the left arrow to see how async-loading look like.
I have taken a look inside the code.
You can use addImageAsync method of CCtextureCache to load additional texture on-the-fly without interfere or slow down other parts such as the current animation that is running.
In fact, addImageAsync of CCTextureCache will load CCTexture2D object for you and return back to its callback method to receive. You have additional task to make use of it on your behalf.
Please note that CCSpriteFrameCache uses CCTextureCache to load frames. So this applies to it as well for my case to load spritesheet consisting of frames to be used in animation creation. But unfortunately async type of method is not provided for CCSpriteFrameCache class. You have to manually load texture object via CCTextureCache then plug it in
void CCSpriteFrameCache::addSpriteFramesWithFile(const char *pszPlist, CCTexture2D *pobTexture)
There's 2 file in testing project you can take a look at.
Texture2dTest.cpp
TextureCacheTest.cpp
"The current vertex declaration does not include all the elements required by the current vertex shader. TextureCoordinate0 is missing."
I get this error when I try to use a spriteFont to draw my FPS on the screen, on the line where I call spriteBatch.End()
My effect doesn't even use texture coordinates.
But I have found the root of the problem, just not how to fix it.
I have a separate thread that builds the geometry (an LOD algorithm) and somehow this seems to be why I have the problem.
If I make it non-threaded and just update my mesh per frame I don't get an error.
And also if I keep it multithreaded but don't try to draw text on the screen it works fine.
I just can't do both.
To make it even more strange, it actually compiles and runs for a little bit. But always crashes.
I put a Thread.Sleep in the method that builds the mesh so that it happens less often and I saw that the more I make this thread sleep, and the less it gets called, the longer it will run on average before crashing.
If it sleeps for 1000ms it runs for maybe a minute. If it sleeps for 10ms it doesn't even show one frame before crashing. This makes me believe that it has to do with a certain line of code being executed on the mesh building thread at the same time you are drawing text on the screen.
It seems like maybe I have to lock something when drawing the text, but I have no clue.
Any ideas?
My information comes from the presentation "Understanding XNA Framework Performance" from GDC 2008. It says:
GraphicsDevice is somewhat thread-safe
Cannot render from more than one thread at a time
Can create resources and SetData while another thread renders
ContentManager is not thread-safe
Ok to have multiple instances, but only one per thread
My guess is that you're breaking one of these rules somewhere. Or you're modifying a buffer that is being used to render without the appropriate locking.
What is a good way to think about this? Are handles lost?
glCreateProgram()?
glCreateShader()?
glGenTextures()?
glGenBuffers()?
Im wondering if I am doing what's necessary (or doing too much and leaking memory)
How do you lose a context? The OpenGL context will exist until you destroy it (or the window/HDC).
However, all OpenGL objects are bound to the context(s) that they are created in. If you destroy the context, all objects are destroyed as well (unless you have shared objects with another context. In which case, only the sharable objects will remain). So you must reload them.
For example, do I call all 3 function calls: glCreateProgram() glAttachShader() glLinkProgram() or just the last two?
If the OpenGL context is destroyed, you must call whatever OpenGL functions you need to recreate your objects. Any OpenGL objects that you got from the old context are gone. They are invalid. They are deleted pointers, and using deleted pointers is always wrong.
A new OpenGL context is new. So you must create your objects as though it were a new context. Because it is.