I am writing an iOS/Android game and looking for the most performant way to render my vertex data with OpenGL ES 2.0. I have two different kinds of data: dynamic data whose attributes change every frame, for example the player or animated background objects, and static data such as the static background or the terrain. I have googled a lot since yesterday, but I could not find a clear and definitive answer to the question of what the best way to render such data is.
There are basically three options for rendering such data (if I have missed one, feel free to correct me):
Vertex Arrays Only:
Just fill your vertex arrays on the CPU every frame (including the dynamic data).
Vertex Buffer Objects Only:
Allocate a VBO on the GPU with GL_DYNAMIC_DRAW in which both the dynamic and the static data are stored. The dynamic data is then updated every frame via glBufferSubData.
Use both:
Static data is stored in and rendered from a VBO, and the dynamic data is rendered from a vertex array. With this option, we need two rendering passes: one for rendering the VBO and one for rendering the vertex array.
Since the first option does not exploit the immutability of the static data and since the third option requires two rendering passes, my guess is that I should go with the second option. However, I am absolutely not sure about this and I hope you can clarify my confusion.
Allocate two vertex buffer objects: one with the hint GL_DYNAMIC_DRAW for the data that will be updated frequently, and a second VBO for the immutable data with the hint GL_STATIC_DRAW. According to the API documentation, GL_STATIC_DRAW should be used for data that "will be modified once and used many times"; just what you need.
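For illustration, here is a minimal sketch of that setup with OpenGL ES 2.0 calls (staticVertices and dynamicVertices are placeholder arrays standing in for your actual vertex data):

// At load time: one buffer per usage pattern.
GLuint vbos[2];
glGenBuffers(2, vbos);

// Static geometry: uploaded once, never modified again.
glBindBuffer(GL_ARRAY_BUFFER, vbos[0]);
glBufferData(GL_ARRAY_BUFFER, sizeof(staticVertices), staticVertices, GL_STATIC_DRAW);

// Dynamic geometry: reserve the storage once, refill it every frame.
glBindBuffer(GL_ARRAY_BUFFER, vbos[1]);
glBufferData(GL_ARRAY_BUFFER, sizeof(dynamicVertices), NULL, GL_DYNAMIC_DRAW);

// Every frame, before drawing the dynamic objects:
glBindBuffer(GL_ARRAY_BUFFER, vbos[1]);
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(dynamicVertices), dynamicVertices);
// ...then set the attribute pointers and issue the draw calls for each buffer.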
Speaking of two rendering passes here is probably a misuse of the term: what you actually do is render your scene with two separate draw calls. Since draw calls are submitted asynchronously, you should not experience any performance hit by doing so.
A second rendering pass, on the other hand, is when you render the entire scene twice (see for example here) with different settings, or when you do some image processing effects on outputs of previous rendering passes.
After some searching I've learned that it is possible to create multiple vertex buffers, each for a specific 3D model, and set them in the Input Assembler to be read by my shaders, or at least that is what I could understand. But reading Microsoft's documentation has left me very confused about how to do this the right way; this is what I was reading. It says I can pass an array of vertex buffers to the IA stage, but it also says that the maximum number of vertex buffers the Input Assembler can take in D3D11 is 32. What would I do if I needed 50 different models rendered at the same time? It would also help if someone could clarify how pOffsets works in this situation with multiple models; as I understand it, it should always be 0 because the beginning of each buffer is always vertex data, but I may have understood that wrong. Lastly, I'll add that I've already rendered buffers containing multiple models together, but I don't know exactly how I should deal with many individual models.
The short answer is: You don't try to draw all your models in one Draw call.
You are free to organize rendering in many ways, but here is one approach:
A 'model' consists of one or more 'meshes'. Each mesh is a collection of vertices (in a VB), indices (in an IB), and some material information associated with each 'subset' of indices.
To draw:
foreach M in models
    foreach mesh in M
        foreach part in mesh
            Set shaders based on material
            Set VB/IB based on mesh
            DrawIndexed
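As a rough C++/D3D11 sketch of that loop (the Model/Mesh/MeshPart/Material structs and their fields are hypothetical, just to show where the API calls go; a real engine will organize this differently):

#include <d3d11.h>
#include <vector>

struct Material { ID3D11VertexShader* vs; ID3D11PixelShader* ps; };
struct MeshPart { Material material; UINT indexCount; UINT startIndex; INT baseVertex; };
struct Mesh     { ID3D11Buffer* vertexBuffer; ID3D11Buffer* indexBuffer; UINT vertexStride; std::vector<MeshPart> parts; };
struct Model    { std::vector<Mesh> meshes; };

void DrawModels(ID3D11DeviceContext* context, const std::vector<Model>& models)
{
    for (const Model& model : models)
    {
        for (const Mesh& mesh : model.meshes)
        {
            // Bind this mesh's vertex and index buffers once.
            UINT stride = mesh.vertexStride, offset = 0;
            context->IASetVertexBuffers(0, 1, &mesh.vertexBuffer, &stride, &offset);
            context->IASetIndexBuffer(mesh.indexBuffer, DXGI_FORMAT_R16_UINT, 0);

            for (const MeshPart& part : mesh.parts)
            {
                // Shaders (and, in practice, constant buffers/textures) come from the material.
                context->VSSetShader(part.material.vs, nullptr, 0);
                context->PSSetShader(part.material.ps, nullptr, 0);

                context->DrawIndexed(part.indexCount, part.startIndex, part.baseVertex);
            }
        }
    }
}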
Since this is a number of nested loops, there are several ways to improve the performance. For example, you might just queue up the information instead of actually calling DrawIndexed, then sort by material. Then call DrawIndexed from the sorted queue.
For alpha-blending to appear correct, you have to do at least two rendering passes: First to render opaque things, then the second to render alpha-blended things.
You may also want to combine all the content in a given model into one VB and one IB with offsets rather than use individual resources.
You may have the same model in multiple locations in the world, so you may have many model instances sharing the same mesh data. In this case, sorting by VB/IB as well as material could be useful. If you are drawing the same model in many locations (100s or 1000s), then you should look into hardware instancing.
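If you do go the instancing route, the draw changes shape roughly like this (a sketch reusing the hypothetical Mesh/MeshPart layout from above; InstanceData and instanceBuffer are assumed to hold per-instance world transforms, exposed to the input layout via D3D11_INPUT_PER_INSTANCE_DATA elements in slot 1):

// One call draws every visible instance of this mesh part.
UINT strides[2] = { mesh.vertexStride, sizeof(InstanceData) };
UINT offsets[2] = { 0, 0 };
ID3D11Buffer* buffers[2] = { mesh.vertexBuffer, instanceBuffer };
context->IASetVertexBuffers(0, 2, buffers, strides, offsets);
context->IASetIndexBuffer(mesh.indexBuffer, DXGI_FORMAT_R16_UINT, 0);
context->DrawIndexedInstanced(part.indexCount, instanceCount,
                              part.startIndex, part.baseVertex, 0);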
An example implementation of this can be found in DirectX Tool Kit as Model, ModelMesh, and ModelMeshPart.
I'm currently implementing a basic deferred renderer with multithreading in Vulkan. Since my G-Buffer should have the same resolution as the final image I want to do it in a single render-pass with multiple sub-passes, according to this presentation, on slide 44 (page 138). It says:
vkBeginCommandBuffer
vkCmdBeginRenderPass
vkCmdExecuteCommands
vkCmdNextSubpass
vkCmdExecuteCommands
vkCmdEndRenderPass
vkEndCommandBuffer
I get that in the first sub-pass you iterate the scene graph and record one secondary command buffer for each entity/mesh. What I don't get is how you are supposed to do the shading pass with secondary command buffers. Do you somehow split the screen into parts and render each part in a separate thread, or just record one secondary command buffer for the entire second sub-pass?
As you said, you may need to multithread the command buffer recording for the "build the G-buffer" subpass. For the shading pass, however, it depends on how you are doing things; in my opinion you do not need to multithread the shading subpass. You should, however, take into account that you can use a "by region" dependency between the subpasses.
So I encourage you to proceed as follows.
Before beginning your render pass, use a compute shader to splat all your lights onto the screen (this gives you a kind of array of "quads").
By splatting I mean this kind of thing: you have a point light (for example), and the idea is to compute the screen-space quad affected by that light. That gives you 4 vertices (representing the quad) which you write into an SSBO and can then use as a vertex buffer in the shading subpass.
Now you begin the render pass.
Multithread the scene-graph rendering if needed, and call vkCmdExecuteCommands().
NextSubpass
Use the "array of quads" you create from the earlier compute shader (do not forget a VK_SUBPASS_EXTERNAL dependency).
NextSubpass, and so on (a minimal sketch of this recording follows below).
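For illustration, recording the primary command buffer could look roughly like this (a sketch; beginInfo, renderPassBeginInfo, and the secondary command buffer arrays and counts stand in for whatever your renderer actually builds):

// The subpass contents are supplied by secondary command buffers that were
// recorded on worker threads.
vkBeginCommandBuffer(primaryCmd, &beginInfo);

vkCmdBeginRenderPass(primaryCmd, &renderPassBeginInfo,
                     VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);

// Subpass 0: fill the G-buffer; one secondary buffer per batch of meshes.
vkCmdExecuteCommands(primaryCmd, gbufferCmdCount, gbufferCmds);

vkCmdNextSubpass(primaryCmd, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);

// Subpass 1: shading; usually a single secondary buffer (or one per light type)
// that draws the light quads produced by the earlier compute pass.
vkCmdExecuteCommands(primaryCmd, 1, &shadingCmd);

vkCmdEndRenderPass(primaryCmd);
vkEndCommandBuffer(primaryCmd);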
However, you said
you iterate the scene graph and record one secondary command buffer for each entity/mesh.
I am not sure I fully understand what you meant, but if you intend to have one secondary command buffer per mesh, I really advise you to change that approach: you should batch. Let's say you have 64,000 different meshes to draw. You could, for example, record 64 secondary command buffers (dispatched across 4 threads), each drawing 1,000 meshes. (The numbers are arbitrary, so profile your application.)
So, to answer your question about the shading subpass: I would not use many secondary command buffers there, only one or very few (e.g. one per kind of light: punctual, directional).
What I don't get is how you are supposed to do the shading pass with secondary command buffers.
The shading pass (presumably the second subpass) would typically take the G-buffers created by the first subpass as input attachments. It would then draw a screen-sized quad, using data from the G-buffers plus a set of lights (or whatever your deferred shader tries to defer).
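At render pass creation time, that relationship is what input attachments and a by-region dependency express. A rough sketch (assuming attachments 0-2 are the G-buffer and attachment 3 is the final color image; depth is omitted for brevity):

// Written as color attachments in subpass 0...
VkAttachmentReference gbufferWrite[3] = {
    {0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL},
    {1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL},
    {2, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL},
};
// ...and read as input attachments in subpass 1, which writes the final image.
VkAttachmentReference gbufferRead[3] = {
    {0, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL},
    {1, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL},
    {2, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL},
};
VkAttachmentReference finalColor = {3, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL};

VkSubpassDescription subpasses[2] = {};
subpasses[0].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpasses[0].colorAttachmentCount = 3;
subpasses[0].pColorAttachments    = gbufferWrite;

subpasses[1].pipelineBindPoint    = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpasses[1].inputAttachmentCount = 3;
subpasses[1].pInputAttachments    = gbufferRead;
subpasses[1].colorAttachmentCount = 1;
subpasses[1].pColorAttachments    = &finalColor;

// Subpass 1 may only read a G-buffer texel after subpass 0 has written it.
VkSubpassDependency dep = {};
dep.srcSubpass      = 0;
dep.dstSubpass      = 1;
dep.srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dep.dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dep.srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
dep.dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;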
The presentation you link tries to hint at this structure style starting at page 13 (marked "Page 107").
The first step would be to make it work at all; use e.g. this SW example. The next step of optimizing it into a single render pass should then be easier.
Very new to graphics in general, starting off with Metal for immediate needs, will try out OpenGL soon enough.
I was wondering what colorAttachments[n] means in layman's terms. Also, what is the extent of 'n'? I have just used it as 0 in the 2D triangle I made.
In general, the color attachment(s) are where rendered images are stored (at least temporarily) during a render pass. It is common to only use one color attachment at index 0, so what you're doing is fine. It is also possible to render to multiple color attachments simultaneously, which is why there's an array. It's an advanced technique that you don't have to worry about until you see the need, at which point it should be straightforward how to do it.
There are two places where colorAttachments[n] appears in Metal. First is in MTLRenderPipelineDescriptor. The other is in MTLRenderPassDescriptor.
In both cases, the extent is given in the Metal Implementation Limits table, in the "Render targets" section, in the row labelled "Maximum number of color render targets per render pass descriptor".
For MTLRenderPipelineDescriptor, colorAttachments[n] is a reference to a MTLRenderPipelineColorAttachmentDescriptor. Here, you configure the pixel format, write mask, and color blending operation.
For MTLRenderPassDescriptor, colorAttachments[n] is a reference to a MTLRenderPassColorAttachmentDescriptor. This is a subclass of MTLRenderPassAttachmentDescriptor, which is where most of its properties are defined. Here, you configure which part of which texture you will render to, what should happen to that texture's data when the render pass starts and ends, and, if it's to be cleared, what color it should be cleared to.
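To make that concrete, here is a minimal sketch of configuring both descriptors using the metal-cpp C++ bindings (the same properties exist under the same names in Swift/Objective-C; the pixel format, drawableTexture, and clear color are just example values):

// Pipeline side: baked into the pipeline state object, created once.
MTL::RenderPipelineDescriptor* pipeDesc = MTL::RenderPipelineDescriptor::alloc()->init();
pipeDesc->colorAttachments()->object(0)->setPixelFormat(MTL::PixelFormatBGRA8Unorm);
pipeDesc->colorAttachments()->object(0)->setBlendingEnabled(false);

// Render pass side: cheap to change, typically set up every frame.
MTL::RenderPassDescriptor* passDesc = MTL::RenderPassDescriptor::alloc()->init();
MTL::RenderPassColorAttachmentDescriptor* att = passDesc->colorAttachments()->object(0);
att->setTexture(drawableTexture);                         // where this pass renders to
att->setLoadAction(MTL::LoadActionClear);                 // clear it when the pass starts
att->setClearColor(MTL::ClearColor(0.0, 0.0, 0.0, 1.0));  // ...to opaque black
att->setStoreAction(MTL::StoreActionStore);               // keep the result when the pass ends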
Information about color attachments is split across those two objects based on how expensive it is to change. The render pipeline state object is fairly expensive to create from the pipeline descriptor. You would typically create all of your pipeline state objects once in a run and reuse them for the rest of your app's lifetime.
By contrast, you will create render command encoders from render pass descriptors fairly often; at least once per frame. They are relatively inexpensive to create, so you can change the descriptor and create a new one to render elsewhere.
The render pipeline state's color attachment pixel format has to match the pixel format of the render command encoder's color attachment texture.
I have an application which renders many filled polygons with OpenGL, in 2D. Filling is done by tessellation, but performance is not optimal: 1900 polygons made up of 122,000 vertices (that is, about 64 vertices per polygon) are displayed in about 3 seconds.
Apparently, the CPU is not the bottleneck: if I replace the calls to gluTessVertex with calls to glColor (just to test where the bottleneck is), performance doubles.
I have the same problem with loading many small textures.
Now, what are the options to improve performance? It seems that most of the time is spent in the geometry subsystem; rendering itself is fast enough.
I already have a worker thread which does the loading (tessellation, texture binding) in one context, and another thread which does the drawing in another context. The two contexts share objects via wglShareLists and it works like a charm.
Can I have a third thread, with a third context, which would also handle tessellation for half of the polygons? Has anyone tried that? Is it safe? Any example of sharing objects between three contexts?
Forgot to say: I have an ATI Radeon HD 4550 graphics card; I suppose it can handle more than 39 kB/s of data.
Increase Performance
Sounds like you're using the old fixed-function pipeline.
If you're unsure of what that is, well, the following functions are a part of the fixed-function pipeline.
glBegin()
glEnd()
glVertex*()
glTexCoord*()
glNormal*()
glColor*()
etc.
Those functions are old and render geometry immediately. That means that each time you call the above functions, that geometry gets sent to the GPU. By doing that a lot of times, you can easily make the FPS drop way under 60 just by rendering simple things.
Instead, you need to use buffers, or to be more precise, VAOs with/or VBOs (and IBOs).
A VBO, or Vertex Buffer Object, is a buffer which stores vertices that you can then render. This is much, much faster than glBegin()/glEnd(). When you create a VBO you supply it with vertices, and they only need to be sent to the GPU once; that is basically why VBOs are fast: the data already lives on the GPU, and a single draw call replaces many immediate-mode calls.
The reason I said "with/or" is that in newer (core profile) versions you need to create a VAO which then references the VBO, whereas before you could simply render the VBOs directly.
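As a minimal sketch of that setup for desktop GL 3+ (the triangle data, attribute location 0, and shaderProgram are assumptions standing in for your own data and shaders):

// Three 2D positions for a single triangle (placeholder data).
const float vertices[] = { -0.5f, -0.5f,  0.5f, -0.5f,  0.0f, 0.5f };

GLuint vao = 0, vbo = 0;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);

// Upload the vertex data once; no per-frame glVertex*() calls needed.
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

// Describe the layout: attribute 0 = vec2 position, tightly packed.
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), (void*)0);

// Then, every frame:
glUseProgram(shaderProgram);
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, 3);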
Tessellation
There are multiple ways to do tessellation and things which look like/would give the effect of tessellation.
For instance, you could simply render different models according to the required LOD (level of detail): when you're up close to an object you render the model with all its details, which probably has a high vertex count, and the further you are from the model, the lower-vertex (and therefore lower-detail) version of it you render. You can't really do that on something like terrain, though, and you definitely shouldn't do it on dynamic and/or procedurally generated terrain.
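As a trivial illustration of picking a discrete LOD (the thresholds and the three-level scheme are made up for the example):

// Returns 0 for the highest-detail mesh, 2 for the lowest.
int SelectLod(float distanceToCamera)
{
    if (distanceToCamera < 20.0f)  return 0;  // close: render the high-poly mesh
    if (distanceToCamera < 100.0f) return 1;  // mid-range version
    return 2;                                 // far away: low-poly version
}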
You can also do actual geometry tessellation, which you do through shaders. Since tessellation is a really huge topic, I will provide you with two URLs which both explain it and have code.
Both of these articles use modern OpenGL 4.0+.
http://prideout.net/blog/?p=48
http://antongerdelan.net/opengl/tessellation.html
Texturing
Generating and binding textures are still the same.
Instead of using gluBuild2DMipmaps() you can use glGenerateMipmap(GL_TEXTURE_2D); it was added in OpenGL 3.0 if I remember correctly.
Again, you can (and should) replace all your glBegin()/glEnd() calls (and everything in between) with VAOs and VBOs. You can store everything you want inside a buffer: vertices, texture coordinates, normals, colors, etc. You can keep these in separate buffers, or store them together in a single buffer, usually called an interleaved buffer or interleaved VBO.
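For example, an interleaved layout with a vec3 position and a vec2 texture coordinate per vertex could be described like this (attribute locations 0 and 1, and interleavedVbo, are assumptions about your shader and buffer):

// Each vertex is x, y, z, u, v  ->  a stride of 5 floats.
const GLsizei stride = 5 * sizeof(float);

glBindBuffer(GL_ARRAY_BUFFER, interleavedVbo);  // assumed to be filled already

// Attribute 0: position, the first 3 floats of every vertex.
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (void*)0);

// Attribute 1: texture coordinate, starting 3 floats into every vertex.
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, stride, (void*)(3 * sizeof(float)));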
You won't need glEnable(GL_TEXTURE_2D) and glDisable(GL_TEXTURE_2D) anymore, because that is handled in the shader: you bind textures and sample them in the shader, and since you write the shader program you can make it behave however you want.
My understanding is that glBindAttribLocation allows you to custom set a handle to an attribute (before linking a shader program), which you can later use when rendering with glVertexAttribPointer.
But you don't have to use it, and may instead just rely on OpenGL assigning whatever handle it so chooses in its infinite wisdom. However, you would then need to query OpenGL to find out this handle by using glGetAttribLocation at some point before rendering with glVertexAttribPointer.
Now you could use glGetAttribLocation each time you render, which would seem wasteful since you can just use glGetAttribLocation once after building your program, then store the handle.
So essentially, you can store this handle by either using glBindAttribLocation or by using glGetAttribLocation so is there any difference performance-wise and what are the pros and cons of one over the other?
I cannot speak much about the direct performance difference, but it should be irrelevant anyway: no matter whether you use glBindAttribLocation or glGetAttribLocation, you do it at initialization time (and even then, calling glGetAttribLocation shouldn't hurt that much).
But the main difference and advantage of an explicit glBindAttribLocation over letting GL decide is that it allows you to establish your own attribute semantics and keep them consistent for each and every shader.
Say you have a whole bunch of objects and a whole bunch of shaders. But each shader has some notion of a position attribute (and normal, color, ...), likewise each object has attribute data for positions, normals, ... Now with glBindAttribLocation you can bind your position attribute to location 0 in each and every different shader. So when drawing your objects with different shaders, they can use a single vertex format (i.e. how you call glVertexAttribPointer for the individual attributes, and the individual enable calls).
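A minimal sketch of that convention (the attribute names, the location numbers, and the Vertex struct are just an example):

struct Vertex { float position[3]; float normal[3]; float texcoord[2]; };

// Fix the semantics before linking, identically for every program you build.
glBindAttribLocation(program, 0, "position");
glBindAttribLocation(program, 1, "normal");
glBindAttribLocation(program, 2, "texcoord");
glLinkProgram(program);

// Because every shader now agrees on locations 0/1/2, this single vertex
// format works no matter which program is bound when you draw.
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, position));
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, normal));
glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, texcoord));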
On the other hand glGetAttribLocation doesn't give you any guarantees about what attributes get which indices (maybe one shader has some additional attribute and the compiler thinks it's a good way to reorder them, who knows). So in this case you have a different vertex format (glVertexAttribPointer call) for each object and each shader.
This is even more important when using Vertex Array Objects (which encapsulate all the attribute state, especially the glVertexAttribPointer and glEnableVertexAttribArray calls). In this case you usually don't need (and don't want) to call glVertexAttribPointer each time you draw an object with another shader.
So the bottom line is: always use glBindAttribLocation. At best (in a large application) it saves you many object and shader management issues and many unnecessary glVertexAttribPointer calls each frame (which can well be a performance gain), and at the very least (in a very small application) it is good practice and keeps you open and flexible for extension. As a side note, in desktop GL 3+ (or with the ARB_explicit_attrib_location extension) you can even assign attribute locations directly in the shader without the need for any API call.