Rendering to different framebuffers with multithreading rendering in Vulkan

I was trying to reproduce a result that is similar to this video: https://www.youtube.com/watch?v=21UsMuFTN0k Specifically, I want to render the whole scene to a different texture, and put the texture inside the UI like this screenshot of the video:
The author of the video was using OpenGL to do this, and I was trying to achieve it in Vulkan instead. However, since my current program uses a third attachment to enable MSAA and renders the whole scene using secondary command buffers, I have difficulty translating the approach in the video to Vulkan. I figured that I probably need not only more framebuffers but also multiple renderpasses. Simply put, this is what I have tried so far:
Begin the first renderpass, with the renderpass begin info set to a smaller render area, an offscreen renderpass, and a separate framebuffer.
Pass the offscreen renderpass and the framebuffer to an inheritance info, and pass it to secondary command buffers to do the drawing.
End the first renderpass.
Begin the second renderpass, with the renderpass begin info set to the actual size of the screen, the primary renderpass, and the primary framebuffer.
Pass the second renderpass and the framebuffer to an inheritance info, and pass it to secondary command buffers to do the drawing.
Do vkCmdExecuteCommands.
End the second renderpass.
End command buffer.
Nonetheless, when the program is executed, the validation layer shows that:
vkCmdExecuteCommands(): Cannot duplicate VkCommandBuffer 0x1c26226d2e8[] in pCommandBuffers without VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT set. The Vulkan spec states: If any element of pCommandBuffers was not recorded with the VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT flag, it must not appear more than once in pCommandBuffers
Does this mean I have to set VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT for all secondary command buffers? Or are there other ways to do it correctly? As far as I know, setting VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT has a performance cost.
Also, if I instead call vkCmdExecuteCommands in each renderpass, the validation layer reports that the command buffer has been destroyed.
I wonder what the correct way to reproduce a similar result in Vulkan is, and whether I have to separate them so that I will have to do vkQueueSubmit multiple times.

You are trying to do two RenderPasses with different Framebuffers in a single CommandBuffer.
RenderPass Execution
Execution of RenderPasses inside a graphics queue can be out of order, i.e. there is no guarantee that the RenderPasses will be executed in submission order. Since you are recording multiple passes in the same command buffer, this can be a problem.
Synchronization Using Semaphores
Record the First RenderPass and Second RenderPass in different Command Buffers and use Semaphores to synchronize when you submit it to the queue.
Use a Semaphore to signal in vkQueueSubmit for the First RenderPass. When you submit the Second RenderPass use the semaphore in the wait slot.
Look at the VkSubmitInfo structure passed to vkQueueSubmit. Its wait and signal semaphore fields are what provide this synchronization within the queue.
ImageLayout Transition
You also have to ensure that the Texture you write and read are in the correct Layouts at the appropriate RenderPasses.
First RenderPass: the texture has to be in COLOR_ATTACHMENT_OPTIMAL
Second RenderPass: the texture has to be in SHADER_READ_ONLY_OPTIMAL
You can use an ImageBarrier, or make the first renderpass transition the attachment to the read-only layout as its final layout, which prepares the texture early.
General Instructions
Record First Pass with a CommandBuffer, Change Final ImageLayout of Texture to SHADER_READ_ONLY_OPTIMAL(if not using ImageBarrier)
Record Second Pass with a CommandBuffer
Submit First Pass with a signal Semaphore
Submit Second Pass with a wait Semaphore (the same semaphore the First Pass signals)
If you use an ImageBarrier, you need one more semaphore and one more Command Buffer, submitted between the First and Second Pass.
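As a rough analogy for the signal/wait ordering in these steps, here is a toy Python sketch. It uses CPU threads and threading.Semaphore rather than Vulkan objects, so it is only a model: the function name run_two_passes and the log strings are made up for illustration, and in real code the ordering comes from the wait and signal semaphores given to vkQueueSubmit.

```python
import threading


def run_two_passes():
    """Toy model: the second pass waits on a semaphore the first pass signals."""
    log = []
    offscreen_done = threading.Semaphore(0)  # starts unsignaled

    def first_pass():
        log.append("offscreen pass: render scene to texture")
        offscreen_done.release()  # "signal" once the texture is written

    def second_pass():
        offscreen_done.acquire()  # "wait" until the first pass has signaled
        log.append("main pass: sample texture in UI")

    # Start the consumer first on purpose: the semaphore, not the start
    # order, decides which pass runs first.
    consumer = threading.Thread(target=second_pass)
    producer = threading.Thread(target=first_pass)
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return log
```

The point of the sketch is only that the waiting side cannot proceed until the signaling side is finished, regardless of which one was started first.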
Command_Buffer_Simultaneous_Use_Bit
The simultaneous-use bit is for when you need to store some commands that will be submitted multiple times to different queues or recorded into different primary buffers.
This is commonly used in secondary buffers to store a few commands (perhaps a procedure common to many processes) that are recorded into primary buffers as needed.
In your case it's not needed.

Related

Vulkan image layout transition using render pass

I have a Vulkan renderer where a new render pass is started at the beginning of each frame. Then the command buffer is passed to several modules, each of which submits some draw calls and other commands. Before submitting its draw calls, a module checks whether any additional textures have to be loaded and loads them; at the end of the loading process, an image layout transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL is performed.
The problem is that the layout transition uses a vkCmdPipelineBarrier command which only allows same format for input and output ( VUID-vkCmdPipelineBarrier-oldLayout-01181 ) which makes the transition useless for me.
My only solution at the moment is to move the layout transitions to a different command buffer and submit it to the command queue separately, then retrieve the result before continuing building my command buffer, but that means I have to wait for all previous commands in the queue to finish, right?
Can you imagine a cleaner solution for performing layout transitions between different layouts during an active render pass?
Regards

My only solution at the moment is to move the layout transitions to a different command buffer and submit it to the command queue separately, then retrieve the result before continuing building my command buffer, but that means I have to wait for all previous commands in the queue to finish, right?
You're not thinking fourth dimensionally.
The order you build the command buffers in is irrelevant. What matters is the order you submit them in.
The code which is building your rendering command buffer can also be building a transfer command buffer. You write your rendering command assuming the images are in the correct layout, and you write transfer commands which put those images into the right layout after the transfer is complete. The order you write these commands is irrelevant; the only order that matters is their submission order. Put one CB before the other in that order, and (barriers willing) they will execute appropriately in that order.
The CPU never needs to wait on anything. No "results" need to be retrieved; that's the GPU's job.
Being able to do this is why you have to tell Vulkan what the layout of each image is anytime you try to use it.
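To illustrate that recording order and submission order are independent, here is a toy model in Python. It is not Vulkan code: the tuples standing in for commands, the image name "enemy_tex", and the functions record_render, record_transfer and execute are all hypothetical, and the "queue" is just a loop that checks layouts.

```python
def record_render():
    # Recorded FIRST, assuming the image will already be in the right layout.
    return [("sample", "enemy_tex", "SHADER_READ_ONLY_OPTIMAL")]


def record_transfer():
    # Recorded SECOND, but meant to be submitted before the render commands.
    return [
        ("copy", "enemy_tex", None),  # upload step, ignored by the toy queue
        ("transition", "enemy_tex", "SHADER_READ_ONLY_OPTIMAL"),
    ]


def execute(submission, layouts=None):
    """Toy queue: runs command buffers in submission order and checks layouts."""
    layouts = {} if layouts is None else layouts
    for command_buffer in submission:
        for op, image, layout in command_buffer:
            if op == "transition":
                layouts[image] = layout
            elif op == "sample" and layouts.get(image) != layout:
                raise RuntimeError(
                    f"{image} is in {layouts.get(image)}, expected {layout}"
                )
    return layouts
```

Calling execute([record_transfer(), record_render()]) succeeds even though the render commands were written first, while the reverse submission order fails the layout check.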

vulkan barriers and multi-threading

I want to share my thoughts about how to keep memory barriers in sync in multi-threading rendering. Please let me know if my thoughts about Vulkan memory barrier is wrong or if my current plan makes any sense. I don't have anyone at work to discuss with, so I'll ask here for help.
For resources in Vulkan, when I set memory barriers for them between draw calls, I need to set both srcAccessMask and dstAccessMask. This is simple for single-threaded rendering, but for multi-threaded rendering it gets complicated. dstAccessMask is not a problem, since we always know what the resource is going to be used for. But for srcAccessMask, when one command buffer tries to read the current access mask of some resource, other command buffers might be changing it to something else. So my current idea for solving it is:
Each resource keeps its own state, which I only update right before submitting command buffers to the command queue (described below). Each command buffer maintains a tracking record of how each resource's state changed inside it. This way, within a single command buffer the access state of each resource is clear; the only problem is the beginning state of the resource for each command buffer.
When submitting multiple command buffers for execution, the order of the command buffers is now fixed, so I check the tracking record of each resource across all command buffers, update the resource's state based on its end state in each command buffer, and use that to correct the beginning state of the same resource in the next command buffer's tracking record.
Then I need to either insert a new command buffer with an extra memory barrier to transition the resource to the correct state for the first command buffer, or, for the remaining command buffers, insert a memory barrier into the previous command buffer. When all this is done, I can finally submit the command buffers together as a batch.
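The plan above could be sketched roughly as follows. This is a simplified Python model rather than Vulkan code: access masks are plain strings standing in for VkAccessFlags values, the names CommandBufferRecord and resolve_barriers are hypothetical, and it only reports a barrier when the tracked state changes (real hazards such as write-after-write with the same access would need more care).

```python
from dataclasses import dataclass, field


@dataclass
class CommandBufferRecord:
    """Per-command-buffer tracking record: first and last access per resource."""
    name: str
    first_use: dict = field(default_factory=dict)
    last_use: dict = field(default_factory=dict)

    def use(self, resource, access):
        # Remember the state this command buffer expects at its start...
        self.first_use.setdefault(resource, access)
        # ...and the state it leaves the resource in at its end.
        self.last_use[resource] = access


def resolve_barriers(global_state, records):
    """Walk records in submission order and list the barriers to patch in.

    Returns tuples (before_cb, resource, src_access, dst_access) and updates
    global_state to the end state of the whole batch.
    """
    barriers = []
    for record in records:
        for resource, needed in record.first_use.items():
            current = global_state.get(resource)
            if current is not None and current != needed:
                barriers.append((record.name, resource, current, needed))
        global_state.update(record.last_use)
    return barriers
```

Each recording thread would own one CommandBufferRecord, and resolve_barriers would run on the submitting thread once the batch order is fixed.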
Do these make sense to you? Are there better solutions to solve it? Or do we even need to solve the "synchronization" issue of access state for each resource?
Thank you for your time

What you're talking about only makes sense in a world where none of these rendering operations have even the slightest idea what's going on elsewhere. Where the consumer of an image has no idea how the data in the image got there. Which probably means that it doesn't really know what that image means conceptually.
Vulkan is a low-level API. The idea is that you can connect the high-level concepts of your rendering system directly to Vulkan. So at a high level, you know that resource X has meaning Y and in this frame will have its data generated from operation Z. Not because of something stored in resource X but because it is resource X; that's what resource X is for. So both the operation generating it and the operation consuming it know what's going on and how it got there.
For example, if you're doing deferred rendering and SSAO, then your SSAO renderpass knows that the texture containing the depth buffer had its values generated by rendering. The depth buffer doesn't need something stored in it to say that; that's simply the nature of your rendering. It's hard-coded to work that way.
Most of your resource dependencies are (or ought to be) that way.
If you're doing some render-to-texture operation via the framebuffer, then the consumer probably doesn't even need to know about the dependency. You can just set an appropriate external dependency for the renderpass and the subpass that generates it. And you probably know why you did the render-to-texture op, and you probably know where it's going. If you're doing RTT for reflection, you know that the destination will be some kind of shader stage texture fetch. And if you don't know how it's going to be used, then you can just be safe and set all of the destination stage bits.
What you're talking about makes some degree of sense if you're dealing with streamed objects, where objects are popping into and out of memory with some regularity. But even then, that's not really a property of each individual resource.
When you load a streamed chunk, you upload its data by generating command buffer(s) and submitting them. And here's where we have an implementation-specific divergence. Your best bet for performance is to execute these CBs on a queue dedicated for transfer operations. But since Vulkan doesn't guarantee all implementations have those, you need to be able to deliver those transfer CBs to the main rendering queue.
So you need a way to communicate to rendering threads when they can expect to start being able to use the resources. But even that doesn't need to be on a per-resource basis; they can be told "stuff from block X is available", and then they can start using it.
Furthermore, that implementation divergence becomes important. See, if it's done on another queue, a barrier isn't the right synchronization primitive. Your rendering CBs now have to have their submitted batches wait on a semaphore. And that semaphore should handle all of the synchronization needs of the memory (ie: the destination bits being everything). So in the implementation where the transfer CBs are executed on the same queue as your rendering CBs, you may as well save yourself some trouble and issue a single barrier at the end of the transfer CB that makes all of the given resources available to all stages.
So as previously stated, this kind of automated system is only useful if you have no real control over the structure of rendering. This would principally be true if you're writing some kind of middleware, where the higher-level code defines the structure of rendering. However, if that's the case, Vulkan probably isn't the right tool for that job.

How to make tkinter Canvas update only on-demand?

I'm writing a graphics program in Python and I would like to know how to make a Canvas update only on-demand; that is, stop a canvas from updating every run of the event loop and instead update only when I tell it to.
I want to do this because in my program I have a separate thread that reads graphics data from standard input to prevent blocking the event loop (given that there's no reliable, portable way to poll standard input in Python, and polling sucks anyway), but I want the screen to be updated only at intervals of a certain amount of time, not whenever the separate thread starts reading input.

You can't pause the update of the canvas without pausing the entire GUI.
A simple solution would be for you to not draw to the canvas until you're ready for the update. Instead of calling canvas commands, push those commands onto a queue. When you're ready to refresh the display, iterate over the commands and run them.
You could also do your own double-buffering, where you have two canvases. The one you are actively drawing would be behind the visible one. When you are ready to display the results, swap the stacking order of the canvases.
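The queue-based approach could be sketched as follows. This is a minimal sketch, not a definitive implementation: the class DeferredCanvas and the helper schedule_flush are hypothetical names, and it assumes the reader thread only calls defer() while flush() runs on the Tk main thread (scheduled with the widget's after method).

```python
import queue


class DeferredCanvas:
    """Collects canvas calls and replays them only when flush() is called."""

    def __init__(self, canvas):
        self._canvas = canvas
        self._pending = queue.Queue()  # thread-safe FIFO of pending calls

    def defer(self, method_name, *args, **kwargs):
        # Safe to call from the reader thread: nothing touches Tk here.
        self._pending.put((method_name, args, kwargs))

    def flush(self):
        # Call from the main thread: replay every queued canvas call in order.
        count = 0
        while True:
            try:
                method_name, args, kwargs = self._pending.get_nowait()
            except queue.Empty:
                break
            getattr(self._canvas, method_name)(*args, **kwargs)
            count += 1
        return count


def schedule_flush(widget, deferred, interval_ms):
    """Flush the deferred calls every interval_ms using the widget's after()."""
    def tick():
        deferred.flush()
        widget.after(interval_ms, tick)
    widget.after(interval_ms, tick)
```

For example, the reader thread might call deferred.defer("create_line", 0, 0, 10, 10) as data arrives, while schedule_flush(root, deferred, 100) makes the canvas change only every 100 ms.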

How can I load a texture in separate thread in cocos2d-x?

I faced the need to use multi-threading to load an additional texture on-the-fly in order to reduce the memory footprint.
The example case is that I have 10 types of enemy to use in a single level, but the enemies come out type by type. "Type by type" means one type of enemy comes out and the player kills all of its instances, then it's time to call in another type. The process goes on like this until all types have come out, then the level is complete.
You can see it's better not to load all the enemies' textures at once at startup (each is pretty big, 2048*2048, with lots of animation frames inside, which I need to create when each type of enemy is created). So I turned to multi-threading to load an additional texture when I need it. But I know that cocos2d-x is not thread-safe. I planned to use the CCSpriteFrameCache class to load a texture from a .plist + .png file, then re-create the animation there, and finally create a CCSprite from it to represent a new type of enemy instance. If I don't use multi-threading, I might suffer from lag when loading such a large texture.
So how can I load a texture in separate thread in cocos2d-x following my goal above? Any idea to avoid thread-safe issue but still can accomplish my goal is also appreciated.
Note: I'm developing on iOS platform.

I found that async loading of images is already available inside cocos2d-x.
You can build the cocos2d-x testing project and look at "Texture2DTest", then tap on the left arrow to see what async loading looks like.
I have taken a look inside the code.
You can use the addImageAsync method of CCTextureCache to load additional textures on the fly without interfering with or slowing down other parts, such as the animation that is currently running.
In fact, addImageAsync of CCTextureCache loads a CCTexture2D object for you and hands it to the callback method you supply. Making use of the texture from there is up to you.
Please note that CCSpriteFrameCache uses CCTextureCache to load frames, so this also applies to my case of loading a spritesheet of frames for animation creation. Unfortunately, CCSpriteFrameCache does not provide an async method. You have to load the texture object manually via CCTextureCache and then plug it into
void CCSpriteFrameCache::addSpriteFramesWithFile(const char *pszPlist, CCTexture2D *pobTexture)
There are two files in the testing project you can take a look at:
Texture2dTest.cpp
TextureCacheTest.cpp

How can I perform actions while waiting for vertical sync?

I want to be able to load/download a bunch of resources and notify the user of the file that's currently being loaded, but I can't just draw a frame after each file starts loading because v-sync will wait until it can draw a frame before it continues (bottle-necking the loads to less than 60/second).
Is there a way to check whether the device is ready to draw or not (without a hacky "has 1/60th of a second passed yet?" check), so I can perform actions until it's ready? I don't mind if the notification skips over files that finished before it's ready to draw, but I want to maximize the speed of the loads while still being able to notify the user.
Also, I'd like to avoid disabling v-sync even temporarily because I don't want to cause a graphic card crippling 300FPS rate if the computer loads really quickly.

You don't specify which version of Direct3D you're using. With D3D9 you can pass D3DPRESENT_DONOTWAIT to your Present() call and it will return D3DERR_WASSTILLDRAWING if the hardware is busy processing or waiting for a vertical sync interval. This means if you have vsync enabled, in your main loop you can just call Present with the DONOTWAIT flag whenever you've loaded a file and load another if it returns WASSTILLDRAWING.
Note that you need to get the swap chain and call Present() on the swap chain rather than calling Present() directly on the device to be able to pass this flag to present, or you can set it in the D3DPRESENT_PARAMETERS structure when you create the device or create an IDirect3DDevice9Ex instead of an IDirect3DDevice9 device and call PresentEx() instead of Present().
This doesn't solve the problem of files that take longer than a frame to load however - your frame rate will drop if you have a file that takes a long time to process. A better solution to this problem in my opinion would be to move as much of your IO as possible to another thread (in D3D9 you will still have to create D3D resources on your main thread however) and just pass the name of the file that's currently being processed to your main / render thread to display each time you present a frame.
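The threaded approach could be modelled roughly like this. It is a language-neutral sketch written in Python rather than D3D9 code: the names LoadStatus and load_all are hypothetical, load_one stands in for whatever actually reads and processes a file, and the render loop is assumed to call snapshot() once per presented frame.

```python
import threading


class LoadStatus:
    """Shared state: which file the loader thread is currently working on."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = None
        self._done = False

    def set_current(self, name):
        with self._lock:
            self._current = name

    def mark_done(self):
        with self._lock:
            self._done = True

    def snapshot(self):
        # Called by the render thread each frame; never blocks on file IO.
        with self._lock:
            return self._current, self._done


def load_all(files, load_one, status):
    """Runs on the loader thread: publish each file name, then load the file."""
    for name in files:
        status.set_current(name)
        load_one(name)
    status.mark_done()
```

The render thread would start threading.Thread(target=load_all, args=(files, load_one, status)) and keep presenting frames with vsync enabled, drawing whatever name snapshot() returns; names of files that finish between two frames are simply skipped.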
