Confused about render passes in the Vulkan API

Recently I started learning the Vulkan API, and there are some topics that confuse me. My question is: what is a render pass, and why is it used together with command buffer recording? And finally, what are subpasses, subpass dependencies, and attachments, which are commonly mentioned in relation to render passes?

It's the only way to get something drawn (draw commands can only be recorded inside a render pass). So don't overthink it. As a beginner you only need to create one render pass with one (mandatory) subpass, and that's it. You can explore the depths of it later.
Also, give all those videos and tutorials a chance; they are written at length and with more care than whatever someone will write here in the short SO format.
Give the spec a chance (it's not so bad, though it does avoid redundant semantic and conceptual information). Try reading an intro by AMD, vulkan-tutorial.com, Vulkan in 30 minutes (this one got me started, anyway; there was not much else available at the time), and API without Secrets, and watch e.g. the Vulkan GDC session, Part 1 and Part 2.
Now that you have heard from some of the people behind it and seen some of the commands, you should get back to us with more specific aspects you do not understand about it.
OK, I am just going to add a conceptual description here to formally answer the question.
A render pass is sort of a description, a map, or a scheme of a graphics job (one that revolves around a particular organization/use of Image resources). But it does not describe the actual commands nor the actual resources (that is done in command buffer recording for a render pass instance, between vkCmdBeginRenderPass() and vkCmdEndRenderPass()).
Maybe a "black box" or "C++ like declaration" for which you provide implementation later is a good analogy.
A render pass has a set of attachments. Think of them as descriptions of the needed frame image outputs and temporaries (but not as the specific frame images themselves).
A render pass has a set of subpasses. A subpass describes how each attachment will be treated during its execution (e.g. as a color buffer in a color image layout).
A render pass has a set of subpass dependencies. Dependencies describe the execution order between subpasses (they form a dependency DAG). A dependency is also the equivalent of a pipeline barrier between two subpasses, or between a subpass and the outside of the whole render pass (a VK_SUBPASS_EXTERNAL dependency). Except for what you describe in the dependencies (or otherwise synchronize), subpasses may execute in any order and may overlap, at the leisure of the driver.
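To make those three pieces concrete, here is a minimal sketch of creating such a description: one color attachment, one subpass, one external dependency. It assumes #include <vulkan/vulkan.h> and an already-created VkDevice device; the format and masks are illustrative choices, not requirements.

    /* One color attachment: a description of a needed output image,
       not the image itself. The format is an assumption. */
    VkAttachmentDescription colorAttachment = {
        .format         = VK_FORMAT_B8G8R8A8_UNORM,
        .samples        = VK_SAMPLE_COUNT_1_BIT,
        .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp        = VK_ATTACHMENT_STORE_OP_STORE,
        .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout    = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
    };

    /* One subpass: treats attachment 0 as a color buffer in a color layout. */
    VkAttachmentReference colorRef = {
        .attachment = 0,
        .layout     = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    };
    VkSubpassDescription subpass = {
        .pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS,
        .colorAttachmentCount = 1,
        .pColorAttachments    = &colorRef,
    };

    /* One dependency: the equivalent of a pipeline barrier between
       "outside the render pass" and subpass 0. */
    VkSubpassDependency dependency = {
        .srcSubpass    = VK_SUBPASS_EXTERNAL,
        .dstSubpass    = 0,
        .srcStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .dstStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .srcAccessMask = 0,
        .dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    };

    VkRenderPassCreateInfo rpInfo = {
        .sType           = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
        .attachmentCount = 1, .pAttachments  = &colorAttachment,
        .subpassCount    = 1, .pSubpasses    = &subpass,
        .dependencyCount = 1, .pDependencies = &dependency,
    };
    VkRenderPass renderPass;
    vkCreateRenderPass(device, &rpInfo, NULL, &renderPass);

Note that nothing in this code touches an actual image or records an actual draw; it is all "declaration".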
In a command buffer, using vkCmdBeginRenderPass(), you create a render pass instance (you provide the actual Images for the attachments via a VkFramebuffer, plus the actual commands that write to them).
The things that are part of the render pass description happen automagically (the image layout transitions, barriers, and MSAA resolves).
For the rest, you record the commands for the subpasses of the render pass instance into the current command buffer. You record them sequentially for subpass 0, 1, 2, 3, 4, ...; that is not necessarily the actual execution order, though. You have described that with the subpass dependencies (and otherwise it is at the leisure of the driver).
Then the command buffer with such render pass instance(s) is submitted to a queue and actually executed.
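Put together, the recording stage might look like this sketch (renderPass, framebuffer, pipeline, the command buffer cmdBuf, and the width/height of the framebuffer are all assumed to exist already):

    VkClearValue clearColor = { .color = { .float32 = {0.0f, 0.0f, 0.0f, 1.0f} } };
    VkRenderPassBeginInfo beginInfo = {
        .sType           = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
        .renderPass      = renderPass,   /* the "declaration" */
        .framebuffer     = framebuffer,  /* the actual Images for the attachments */
        .renderArea      = { {0, 0}, {width, height} },
        .clearValueCount = 1,
        .pClearValues    = &clearColor,
    };

    vkCmdBeginRenderPass(cmdBuf, &beginInfo, VK_SUBPASS_CONTENTS_INLINE);
    /* commands for subpass 0 */
    vkCmdBindPipeline(cmdBuf, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
    vkCmdDraw(cmdBuf, 3, 1, 0, 0);
    /* vkCmdNextSubpass(cmdBuf, VK_SUBPASS_CONTENTS_INLINE); for subpass 1, ... */
    vkCmdEndRenderPass(cmdBuf);
    /* later: vkEndCommandBuffer(cmdBuf) and vkQueueSubmit(...) */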
It is perhaps these indirections that make it harder to grasp: commands are recorded before they are executed, and the render pass is created before it is even recorded. :)

Related

Godot - How Are Scenes Handled Outside of the Viewport?

In the Godot Engine, I am wondering what happens when objects/scenes leave the viewport. For example, I am trying to make a large map with lots of scenes/entities (such as multiple moving enemies, as well as resource nodes), and I am trying to figure out the best way to handle the entities that no longer need to be loaded in memory.
My initial thought was that on every tile move, I would check the "map" array that holds all the tiles and load the new ones a little off-screen, and vice versa for the ones that are about to disappear. I assume this is horrible practice. I also thought of having "regions" that, once entered, load the upcoming sections, but that also gets super complicated.
I noticed that Godot is already handling part of this problem. As an example, when an object emitting particles leaves the viewport, it stops emitting particles.
Performance-wise, having multiple instances generally shouldn't be a problem, but if you have a lot of entities, you may want to execute code only when they are in the viewport.
For instance:

    if is_in_viewport:
        do_everything()
    else:
        pass  # do nothing but exist
For that purpose, the VisibilityNotifier2D class may be useful.

Vulkan: attachment synchronisation with implicit layout transitions

I have read almost everything that Google gave me on this topic and haven't been able to reach a satisfactory conclusion. It's essentially a follow-up question to this one:
Moving image layouts with barrier or renderpasses
Assume I have a color attachment which is written to in one render pass and sampled from in a second one. Let there be only one subpass in both render passes. One way to handle the layout transition and dependencies is to add a barrier between the two render passes, which changes the layout from VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
But Vulkan also offers implicit layout transitions (VkAttachmentDescription, initialLayout and finalLayout). I guess there is a performance advantage to using them, so let's simply try to get rid of our barrier. We set the initialLayout and finalLayout fields in the VkAttachmentDescription structure and remove the barrier. The problem is, we have lost the synchronisation provided by the barrier, so we need to get it back by other means. And this is the point where the confusion starts, leading to my questions:
1) What's the recommended way to synchronise the attachment between the two render passes? Obviously I could simply re-add the barrier and not change the layout, but wouldn't that defeat the purpose of the whole exercise, which was to get better performance by using implicit layout transitions and getting rid of the barrier? Or should I add a subpass dependency from the single subpass of render pass 1 to VK_SUBPASS_EXTERNAL? Are there any caveats to using VK_SUBPASS_EXTERNAL performance-wise?
2) What about synchronising the attachment backwards? It is the application's responsibility to transition the attachment to the correct initial layout, which can be done with a barrier, obviously. Can this barrier be replaced to get a performance advantage? The only way I can think of would be to express the dependency part with a subpass dependency from VK_SUBPASS_EXTERNAL to the single subpass of render pass 1, and to use a 'fast' barrier (one that doesn't sync) which only does the layout change. Does this make sense? What would that barrier look like? Or is the 'full' barrier unavoidable in this case?
The short version of my questions is simply: how do other people do attachment synchronisation in conjunction with implicit layout transitions?
Generally speaking, when Vulkan or a similar low-level API offers you multiple tools that can achieve what you want, you should give preference to the most specific tool that can solve your problem (without having to radically re-architect your code or fundamentally impact your design).
In your case, you have two options: barriers, or render pass mechanisms (subpass dependencies and layout transitions). Barriers work with anything; they don't care where the image comes from, what it was used for, or where it is going. Render pass mechanisms only work for stuff that happens in a render pass and primarily deal with images attached to render passes (implicit layout transitions only work on attachments).
Render pass mechanisms are more specific, so you should prefer to use those tools if they meet your needs.
This is also why, if you have two "separate" rendering operations that could be in the same render pass (if you're reading from an attachment in a way that can live within the limitations of input attachments), you should prefer to put them in the same render pass.
The short version of my questions is simply: how do other people do attachment synchronisation in conjunction with implicit layout transitions?
Render pass dependencies are what you are looking for. In the case of two render passes, you need to use the mentioned VK_SUBPASS_EXTERNAL value.
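For example, the external dependency on the first render pass (the one that writes the attachment) might look like this sketch, assuming the second render pass samples the attachment in a fragment shader:

    /* Replaces the explicit barrier: makes the color writes of subpass 0
       available to fragment-shader reads after the render pass, while the
       attachment's finalLayout performs the actual transition to
       VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. */
    VkSubpassDependency writeThenSample = {
        .srcSubpass    = 0,                   /* the single subpass of render pass 1 */
        .dstSubpass    = VK_SUBPASS_EXTERNAL, /* whatever comes after this render pass */
        .srcStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .dstStageMask  = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
    };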
It is the application's responsibility to transition the attachment to the correct initial layout, which can be done with a barrier, obviously. Can this barrier be replaced to get a performance advantage?
However you perform the layout transition, it doesn't matter; it is your responsibility to transition the image's layout to the one specified as the initial layout. But I think the best way would be to once again use the implicit layout transitions provided by render passes. If you are already using them, it should be possible to set them up such that the first render pass transitions the image to a layout that is the same as the initial layout of the second render pass, and the final layout of the second render pass is the same as the initial layout of the first render pass.
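For the scenario in the question (written as a color attachment in render pass 1, sampled afterwards), the attachment description of the first render pass might look like the following sketch. The format is an assumption, and starting from VK_IMAGE_LAYOUT_UNDEFINED assumes the previous contents may be discarded.

    VkAttachmentDescription sampledColor = {
        .format         = VK_FORMAT_R8G8B8A8_UNORM,   /* assumed */
        .samples        = VK_SAMPLE_COUNT_1_BIT,
        .loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp        = VK_ATTACHMENT_STORE_OP_STORE,
        .stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        /* UNDEFINED is fine when the old contents can be thrown away;
           otherwise specify the layout the image was actually left in. */
        .initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED,
        /* Implicit transition to the layout the next pass samples in. */
        .finalLayout    = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
    };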

Multithreaded shading pass Vulkan

I'm currently implementing a basic deferred renderer with multithreading in Vulkan. Since my G-buffer should have the same resolution as the final image, I want to do it in a single render pass with multiple subpasses, according to this presentation, slide 44 (page 138). It says:
vkBeginCommandBuffer
vkCmdBeginRenderPass
vkCmdExecuteCommands
vkCmdNextSubpass
vkCmdExecuteCommands
vkCmdEndRenderPass
vkEndCommandBuffer
I get that in the first subpass, you iterate the scene graph and record one secondary command buffer for each entity/mesh. What I don't get is how you are supposed to do the shading pass with secondary command buffers. Do you somehow split the screen into parts and render each part in a separate thread, or just record one secondary command buffer for the entire second subpass?
As you said, you may need to multithread command buffer recording for the G-buffer-building subpass. For the shading pass, however, it depends on how you are doing things. In my opinion, you do not need to multithread your shading subpass, but you should take into consideration that you can use a "by region" dependency between the two subpasses.
So I encourage you to proceed this way.
Before beginning your render pass, use a compute shader to splat all your lights onto the screen (this gives you a kind of array of quads).
By splatting I mean the following: take a point light, for example; the idea is to compute the screen-space quad affected by that light. That gives you 4 vertices (representing a quad) that you write into an SSBO, which you can then use as a vertex buffer in the shading subpass.
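The command-buffer side of that idea might look like this sketch (splatPipeline, splatLayout, splatSet, lightCount, and a compute workgroup size of 64 are all assumptions):

    /* Before the render pass: one compute invocation per light writes a
       screen-space quad (4 vertices) into the SSBO. */
    vkCmdBindPipeline(cmdBuf, VK_PIPELINE_BIND_POINT_COMPUTE, splatPipeline);
    vkCmdBindDescriptorSets(cmdBuf, VK_PIPELINE_BIND_POINT_COMPUTE,
                            splatLayout, 0, 1, &splatSet, 0, NULL);
    vkCmdDispatch(cmdBuf, (lightCount + 63) / 64, 1, 1);

    /* Make the compute writes visible to the vertex fetch of the shading
       subpass that later consumes the SSBO as a vertex buffer. */
    VkMemoryBarrier toVertex = {
        .sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,
    };
    vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, 0,
                         1, &toVertex, 0, NULL, 0, NULL);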
Now you begin the render pass.
Multithread the scene graph recording if needed, and call vkCmdExecuteCommands().
vkCmdNextSubpass().
Use the "array of quads" you created with the earlier compute shader (and do not forget a VK_SUBPASS_EXTERNAL dependency).
vkCmdNextSubpass(), and so on.
However, you said
you iterate the scene graph and record one secondary command buffer for each entity/mesh.
I am not sure I really understand what you meant, but if you intend to have one secondary command buffer per mesh, I really advise you to change that approach: use batching instead. Say you have 64,000 different meshes to draw. You could, for example, record 64 secondary command buffers (dispatched across 4 threads), each drawing 1,000 meshes. (The numbers are arbitrary, so profile your application.)
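A sketch of that batching scheme, assuming one VkCommandPool per thread (a pool must not be recorded from two threads at once), and with secondaries, meshes, batch, and recordMeshDraw() as hypothetical names:

    /* Worker thread: record one batch of ~1,000 meshes into a secondary
       command buffer that will execute inside subpass 0. */
    VkCommandBufferInheritanceInfo inherit = {
        .sType       = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO,
        .renderPass  = renderPass,
        .subpass     = 0,
        .framebuffer = framebuffer,
    };
    VkCommandBufferBeginInfo begin = {
        .sType            = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .flags            = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT,
        .pInheritanceInfo = &inherit,
    };
    vkBeginCommandBuffer(secondaries[batch], &begin);
    for (uint32_t m = batch * 1000; m < (batch + 1) * 1000; ++m)
        recordMeshDraw(secondaries[batch], &meshes[m]);  /* hypothetical helper */
    vkEndCommandBuffer(secondaries[batch]);

    /* Main thread, inside a render pass begun with
       VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS: */
    vkCmdExecuteCommands(primary, 64, secondaries);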
So, to answer your question about the shading subpass: I would not use secondary command buffers at all, or only very few (one per kind of light: punctual, directional).
What I don't get is how you are supposed to do the shading pass with secondary command buffers.
The shading pass (presumably the second subpass) would likely take the G-buffers created by the first subpass as input attachments. It would then draw an equally sized screen-size quad, using data from the G-buffers plus a set of lights (or whatever your deferred shader tries to defer).
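In render pass terms, the shading subpass might be declared like this sketch (the attachment indices and the G-buffer contents are assumptions):

    /* Subpass 1 (shading): reads the G-buffer written by subpass 0 as
       input attachments and writes the final color. */
    VkAttachmentReference gbufferInputs[] = {
        { .attachment = 0, .layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL }, /* albedo  */
        { .attachment = 1, .layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL }, /* normals */
    };
    VkAttachmentReference finalColor = {
        .attachment = 2, .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    };
    VkSubpassDescription shadingSubpass = {
        .pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS,
        .inputAttachmentCount = 2,
        .pInputAttachments    = gbufferInputs,
        .colorAttachmentCount = 1,
        .pColorAttachments    = &finalColor,
    };
    /* The by-region dependency between the two subpasses: */
    VkSubpassDependency gbufferToShading = {
        .srcSubpass      = 0,
        .dstSubpass      = 1,
        .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
        .dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
        .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
        .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
    };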
The presentation you linked hints at this structure starting at slide 13 (marked "Page 107").
The first step would be to get it working; use e.g. this SW example. Then the next step of optimizing it into a single render pass should be easier.

what is colorAttachment[n] in Metal?

Very new to graphics in general, starting off with Metal for immediate needs, will try out OpenGL soon enough.
I was wondering what this means in layman's terms. Also, what is the extent of n? I have just used it as 0 in the 2D triangle I made.
In general, the color attachment(s) are where rendered images are stored (at least temporarily) during a render pass. It is common to only use one color attachment at index 0, so what you're doing is fine. It is also possible to render to multiple color attachments simultaneously, which is why there's an array. It's an advanced technique that you don't have to worry about until you see the need, at which point it should be straightforward how to do it.
There are two places where colorAttachments[n] appears in Metal. First is in MTLRenderPipelineDescriptor. The other is in MTLRenderPassDescriptor.
In both cases, the extent is given in the Metal Implementation Limits table, in the "Render targets" section, in the row labelled "Maximum number of color render targets per render pass descriptor".
For MTLRenderPipelineDescriptor, colorAttachments[n] is a reference to a MTLRenderPipelineColorAttachmentDescriptor. Here, you configure the pixel format, write mask, and color blending operation.
For MTLRenderPassDescriptor, colorAttachments[n] is a reference to a MTLRenderPassColorAttachmentDescriptor. This is a subclass of MTLRenderPassAttachmentDescriptor, which is where most of its properties are defined. Here, you configure which part of which texture you will render to, what should happen to that texture's data when the render pass starts and ends, and, if it's to be cleared, what color it should be cleared to.
Information about color attachments is split across those two objects based on how expensive it is to change. The render pipeline state object is fairly expensive to create from the pipeline descriptor. You would typically create all of your pipeline state objects once in a run and reuse them for the rest of your app's lifetime.
By contrast, you will create render command encoders from render pass descriptors fairly often; at least once per frame. They are relatively inexpensive to create, so you can change the descriptor and create a new one to render elsewhere.
The render pipeline state's color attachment pixel format has to match the pixel format of the render command encoder's color attachment texture.

Does using glBindAttribLocation improve performance?

My understanding is that glBindAttribLocation allows you to set a custom handle for an attribute (before linking a shader program), which you can later use when rendering with glVertexAttribPointer.
But you don't have to use it, and may instead just rely on OpenGL assigning whatever handle it so chooses in its infinite wisdom. However, you would then need to query OpenGL to find out this handle by using glGetAttribLocation at some point before rendering with glVertexAttribPointer.
Now you could use glGetAttribLocation each time you render, which would seem wasteful since you can just use glGetAttribLocation once after building your program, then store the handle.
So essentially, you can store this handle by using either glBindAttribLocation or glGetAttribLocation. Is there any difference performance-wise, and what are the pros and cons of one over the other?
I cannot speak much to the direct performance difference, but it should be irrelevant either way: whether you use glBindAttribLocation or glGetAttribLocation, you do it at initialization time (and even then, calling glGetAttribLocation shouldn't hurt that much).
But the main difference and advantage of an explicit glBindAttribLocation over letting GL decide is that it allows you to establish your own attribute semantics and keep them consistent across each and every shader.
Say you have a whole bunch of objects and a whole bunch of shaders. Each shader has some notion of a position attribute (and normal, color, ...), and likewise each object has attribute data for positions, normals, ... With glBindAttribLocation you can bind your position attribute to location 0 in each and every shader. So when drawing your objects with different shaders, they can all use a single vertex format (i.e. the same glVertexAttribPointer calls for the individual attributes, and the same enable calls).
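For illustration, a minimal sketch of that convention; the attribute names (a_position, etc.) and the already-created program object are assumptions:

    /* One attribute convention, applied to every program before linking. */
    enum { ATTR_POSITION = 0, ATTR_NORMAL = 1, ATTR_COLOR = 2 };

    glBindAttribLocation(program, ATTR_POSITION, "a_position");
    glBindAttribLocation(program, ATTR_NORMAL,   "a_normal");
    glBindAttribLocation(program, ATTR_COLOR,    "a_color");
    glLinkProgram(program);  /* the bindings take effect at link time */

    /* Every object can now use the same vertex format, whatever the shader: */
    glEnableVertexAttribArray(ATTR_POSITION);
    glVertexAttribPointer(ATTR_POSITION, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);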
On the other hand, glGetAttribLocation doesn't give you any guarantees about which attributes get which indices (maybe one shader has some additional attribute and the compiler thinks it's a good idea to reorder them, who knows). So in that case you have a different vertex format (glVertexAttribPointer calls) for each object and shader combination.
This is even more important when using Vertex Array Objects (which encapsulate all the attribute state, especially the glVertexAttribPointer and glEnableVertexAttribArray calls). In this case you usually don't need (and don't want) to call glVertexAttribPointer each time you draw an object with another shader.
So the bottom line is: always use glBindAttribLocation. At best (in a large application) it saves you many object and shader management issues and many unnecessary glVertexAttribPointer calls each frame (which can well be a performance gain), and at the very least (in a very small application) it is good practice and keeps you open and flexible for extensions. As a side note, in desktop GL 3+ (or with the ARB_explicit_attrib_location extension) you can even assign attribute locations directly in the shader, without the need for any API call.
