I am using Direct3D to display a number of I-sections used in steel construction. There could be hundreds of instances of these I-sections all over my scene.
I could do this two ways:
Using method A, I have fewer surfaces. However, with backface culling turned on, the surfaces will be visible from only one side. If backface culling is turned off, then the flanges (horizontal plates) and web (vertical plate) may be rendered in the wrong order.
Method B seems correct (and I could keep backface culling turned on), but in my model the thickness of plates in the I-section is of no importance and I would like to avoid having to create a separate triangle strip for each side of the plates.
Is there a better solution? Is there a way to switch off backface culling for only certain calls of DrawIndexedPrimitives? I would also like a platform-neutral answer to this, if there is one.
First off, backface culling doesn't have anything to do with the order in which objects are rendered. Other than that, I'd go for approach B for no particular reason other than that it'll probably look better. Also this object probably isn't more than a hand full of triangles; having hundreds in a scene shouldn't be an issue. If it is, try looking into hardware instancing.
In OpenGL you can switch of backface culling for each triangle you draw:
glEnable(GL_CULL_FACE);
glCullFace(GL_FRONT);
// or
glCullFace(GL_BACK);
I think something similar is also possible in Direct3D
If your I-sections don't change that often, load all the sections into one big vertex/index buffer and draw them with a single call. That's the most performant way to draw things, and the graphic card will do a fast job even if you push half a million triangle to it.
Yes, this requires that you duplicate the vertex data for all sections, but that's how D3D9 is intended to be used.
I would go with A as the distance you would be seeing the B from would be a waste of processing power to draw all those degenerate triangles.
Also I would simply fire them at a z-buffer and allow that to sort it all out.
If it get's too slow then I would start looking at optimizing, but even consumer graphics cards can draw millions of polygons per second.
Related
Question
Is it possible to tell OpenSceneGraph to use the Z-buffer but not update it when drawing semi-transparent objects?
Motivation
When drawing semitransparent objects, the order in which they are drawn is important, as surfaces that should be visible might be occluded if they are drawn in the wrong order. In some cases, OpenSceneGraph's own intuition about the order in which the objects should be drawn fails—semitransparent surfaces become occluded by other semitransparent surfaces, and "popping" (if that word can be used in this way) may occur, when OSG thinks the order of the object centers' distance to the camera has changed and decides to change the render order. It therefore becomes necessary to manually control the render order of semitransparent objects by manually specifying the render bin for each object using the setRenderBinDetails method on the state set.
However, this might still not always work either, as it in the general case, is impossible to choose a render order for the objects (even if the individual triangles in the scene were ordered) such that all fragments are drawn correctly (see e.g. the painter's problem), and one might still get occlusion. An alternative is to use depth peeling or some other order-independent transparency method but, frankly, I don't know how difiicult this is to implement in OpenSceneGraph or how much it would slow the application down.
In my case, as a trade-off between aestetics and algorithmic complexity and speed, I would ideally always want to draw a fragment of a semi-transparent surface, even though another fragment of another semi-transparent surface that (in that pixel) is closer to the camera has already been drawn. This would prevent both popping and occlusion of semi-transparent surfaces by other semi-transparent surfaces, and would effectivelly be achieved if—for every semi-transparent object that was rendered—the Z-buffer was used to test visibility but wasn't updated when the fragment was drawn.
You're totally on the right track.
Yes, it's possible to leave Z-test enabled but turn off Z-writes with setWriteMask() during drawing:
// Disable Z-writes
osg::ref_ptr<osg::Depth> depth = new osg::Depth;
depth->setWriteMask(false);
myNode->getOrCreateStateSet()->setAttributeAndModes(depth, osg::StateAttribute::ON)
// Enable Z-test (needs to be done after Z-writes are disabled, since the latter
// also seems to disable the Z-test)
myNode->getOrCreateStateSet()->setMode(GL_DEPTH_TEST, osg::StateAttribute::ON);
https://www.mail-archive.com/osg-users#openscenegraph.net/msg01119.html
http://public.vrac.iastate.edu/vancegroup/docs/OpenSceneGraphReferenceDocs-2.8/a00206.html#a2cef930c042c5d8cda32803e5e832dae
You may wish to check out the osgTransparencyTool nodekit we wrote for a CAD client a few years ago: https://github.com/XenonofArcticus/OSG-Transparency-Tool
It includes several transparency methods that you can test with your scenes and examine the source implementation of, including an Order Independent Transparency Depth Peeling implementation and a Delayed Blend method inspired by Open Inventor. Delayed Blend is a high performance single pass unsorted approximation that probably checks all the boxes you want if absolute transparency accuracy is not the most important criteria.
Here's a paper discussing the various approaches in excruciating detail, if you haven't read it:
http://lips.informatik.uni-leipzig.de/files/bathesis_cbluemel_digital_0.pdf
I have been working on an animated graphics project with very specific requirements, and after quite a bit of searching and test coding, I have figured that I could take several approaches, but the Khronos and MDN documentation I have been reading coupled with other posts I have seen here don't answer all of my questions regarding my particular project. In the meantime, I have written short test programs (setting infrastructure for testing).
Firstly, I should describe the project:
The main object drawn to the screen is a simple quad surrounded by a black outline (LINE_LOOP or LINES will do, probably, though I have had issues with z-fighting...that will be left for another question). When the user interacts with the program, exactly one new quad is created and immediately drawn, but for a set amount of time its vertices move around until the quad moves to its final destination. (Note that translations won't do.) Random black lines are also drawn, and sometimes those lines also move around.
Once one of the quads reaches its final spot, it never moves again.
A new quad is always atop old quads (closer to the screen). That means that I need to layer the quads and lines from oldest to newest.
*this also means that it would probably be best to assign z-values to each quad and line, even if the graphics are in pixel coordinates and use an orthographic matrix. Would everyone agree with this?
Given these parameters, I have a few options with varying levels of complexity:
1> Take the object-oriented approach and just assign a buffer to each quad, and the same goes for the random lines. --creation and destruction of buffers every frame for the one shape that is moving. I truthfully think that this is a terrible idea that might only work in a higher level library that does heavy optimization underneath. This approach also doesn't take advantage of the fact that almost every quad will stay the same.
[vertices0] ... , [verticesN]
Draw x N (many draws for many small-size buffers)
2> Assign a z-value to each quad, outline, and line (as mentioned above). Allocate a huge vertex buffer and element buffer to store all permanently-in-their-final-positions quads. Resize only in the very unlikely case someone interacts for long enough. Create a second tiny buffer to store the one temporary moving quad and use bufferSubData every frame. When the quad reaches its destination, bufferSubData it into the large buffer and overwrite the small buffer upon creation of the next quad...all on the same frame. The main questions I have here are: is it possible (safe?) to use bufferSubData and draw it on the same frame? Also, would I use DYNAMIC_DRAW on both buffers even though the larger one would see fewer updates?
[permanent vertices ... | uninitialized (keep a count)]
bufferSubData -> [tempVerticesForOneQuad]
Draw 2x
3> Still create the large and small buffers, but instead of using bufferSubData every frame, create a second shader program and add an attribute for the new/moving quad that explicitly sets the vertex positions for the animation (I would pass vertex index attributes). Only draw with the small buffer when the quad is moving. For the frame when the quad reaches its destination, draw both large and small buffer, but then bufferSubData the final coordinates into the large permanent buffer to be used in the next frame.
switchToShaderProgramA();
[permanent vertices...| uninitialized (keep a count)]
switchToShaderProgramB();
[temp vertices] <- shader B accepts indices for each vertex so we can do all animation in the vertex shader
---last frame of movement arrives : bufferSubData into the permanent vertices buffer for when the the next quad is created
I get the sense that the third option might be the best, but I would like to learn whether there are some other factors that I did not consider. For example, my assumption that a program switch, additional attributes, and vertex shader manipulation would be faster than just substituting the buffer values as in 2>. The advantage of approach 3> (I think) is that I can defer the buffer substitution to a time when nothing needs to be drawn.
Still, I am still not sure of how to work with the randomly-appearing lines. I can't take the "single quad vertex buffer" approach since the number of lines cannot be predicted. Might I also allocate a large buffer for the moving lines? Those also stay after the quad is finished moving, though I don't think that I could use the vertex shader trick because there would be too many attributes to set (as opposed to the 4 for the one quad). I suppose that I could create a large "permanent line data" buffer first, but what to do during the animation is tricky because the lines move. Maybe bufferSubData() + draw on the same frame is not terrible? Or it could be. This is where I need advise.
I understand that this question might not be too specific code-wise, but I don't believe that I would be allowed to show the core of the program. All I have is the typical WebGL boilerplate ready.
I am looking forward to hearing people's thoughts on how I might proceed and whether there are any trade-offs I might have missed when considering the three options above.
Thank you in advance, and please feel free to ask any additional questions if clarification is necessary.
Honestly, for what you're describing, it doesn't sound to me like it matters which you choose. On modern hardware, drawing a few hundred quads and a few thousand lines each frame would not really tax the hardware much.
Having said that, I agree that approach 1 seems very inefficient. Approach 2 sounds perfectly fine. You can safely draw a buffer on the same frame that you uploaded the data. I don't think it matters much whether you use DYNAMIC_DRAW or STATIC_DRAW for the buffer. I tend to think of dynamic buffers as being something you're updating every frame. If you only update it every few seconds or less, then static is fine. Approach 3 is also fine. Between 2 and 3, I'd say do whichever is easier for you to understand and program.
Likewise, for the lines, I would use a separate buffer. It sounds like that one changes per frame, so I would use DYNAMIC_DRAW for that. Allocating a single large buffer for it and performing a glBufferSubData() per frame is probably a fine strategy. As always, trying it and profiling it will tell you for sure.
I understand the concept of LOD but I am trying to find out the negative side of it and I see no reference to that from Googling around. The only pro I keep coming across is that it improves performance by omitting details when an object is far and displaying better graphics when the object is near.
Seriously that is the only pro and zero con? Please advice. Tnks.
There are several kinds of LOD based on camera distance. Geometric, animation, texture, and shading variations are the most common (there are also LOD changes that can occur based on image size and, for gaming, hardware capabilities and/or frame rate considerations).
At far distances, models can change tessellation or be replaced by simpler models. Animated details (say, fingers) may simplify or disappear. Textures may move to simpler textures, bump maps vanish, spec/diffuse maps combines, etc. And shaders may also swap-put to reduce the number of texture inputs or calculation (though this is less common and may be less profitable, since when objects are far away they already fill fewer pixels -- but it's important for screen-filling entities like, say, a mountain).
The upsides are that your game/app will have to render less data, and in some cases, the LOD down-rezzed model may actually look better when far away than the more-complex model (usually because the more detailed model will exhibit aliasing when far away, but the simpler one can be tuned for that distance). This frees-up resources for the nearer models that you probably care about, and lets you render overall larger scenes -- you might only be able to render three spaceships at a time at full-res, but hundreds if you use LODs.
The downsides are pretty obvious: you need to support asset swapping, which can mean both the real-time selection of different assets and switching them but also the management (at times of having both models in your memory pipeline (one to discard, one to load)); and those models don't come from the air, someone needs to create them. Finally, and this is really tricky for PC apps, less so for more stable platforms like console gaming: HOW DO YOU MEASURE the rendering benefit? What's the best point to flip from version A of a model to B, and B to C, etc? Often LODs are made based on some pretty hand-wavy specifications from an engineer or even a producer or an art director, based on hunches. Good measurement is important.
LOD has a variety of frameworks. What you are describing fits a distance-based framework.
One possible con is that you will have inaccuracies when you choose an arbitrary point within the object for every distance calculation. This will cause popping effects at times since the viewpoint can change depending on orientation.
I have an application which renders many filled polygons with OpenGL, in 2D. Filling is done by tesselation but performance is not optimal. 1900 polygons made up of 122000 vertex (that is, about 64 vertex per polygon) are displayed in about 3 seconds.
Apparently, the CPU is not the bottleneck, as if I replace calls to gluTessVertex by calls to glColor - just to test where is the bottleneck, performance is doubled.
I have the same problem with loading many small textures.
Now, which are the options to improve the performance? Seems that most time is spend in the geometry subsystem. Rendering is fast enough.
I already have a worker thread which does the load (so tesselation, texture binding) in one context, and another thread which does the draw in another context. The two contexts share objects via wglShareLists and it works like a charm.
Can I have a third thread in a third context which would handle also tesselation for half of the polygons? Anyone tried that? Is it safe? Any example of sharing objects between three contexts?
Forgot to say, I have an ATI Radeon HD 4550 graphics card, suppose it can handle more than 39kB/s of data.
Increase Performance
Sounds like you're using the old fixed-function pipeline.
If you're unsure of what that is, well, the following functions are a part of the fixed-function pipeline.
glBegin()
glEnd()
glVertex*()
glTexCoord*()
glNormal*()
glColor*()
etc.
Those functions are old and render geometry immediately. That means that each time you call the above functions, that geometry gets send to the GPU. By doing that a lot of times, you can easy make the FPS go way under 60 just by rendering simple things.
Now you need to use buffers and to be more precise VAOs with/or VBOs (and IBOs).
VBO or Vertex Buffer Object, is a buffer which can store vertices which you then can render. This is much much faster and better to use than glBegin() and glEnd(). When you create a VBO you supply it with vertices and they only require to be send to the GPU once, that's basically why they are fast, because they already are in the GPU and only require a single draw call instead of multiple.
The reason I said "with/or" is because in the newer versions you need to create a VAO which then would use a VBO, where before you could simply render the VBOs.
Tessellation
There are multiple ways to do tessellation and things which look like/would give the effect of tessellation.
For instance you could also simply render different models according to the required LOD (Level of Detail), thereby when you're up close to an object you then render the model with all it's details which probably would have a high vertices count. Then the further you're away from the model you simply render another version of that model but which have less vertices, which also equals less detail. Though if you can't really do that on something like terrain and definitely shouldn't do it on something like dynamic terrain and/or procedurally generated terrain.
You can also do actual geometry tessellation and you would do that through a Shader. Since tessellation is a really huge topic I will provide you with 2 urls which both explain and have code on them.
Both of these articles uses modern 4.0/4.0+ OpenGL.
http://prideout.net/blog/?p=48
http://antongerdelan.net/opengl/tessellation.html
Texturing
Generating and binding textures are still the same.
Instead of using gluBuild2DMipmaps() you can use glGenerateMipmap(GL_TEXTURE_2D); it was added in OpenGL version 3.0'ish if I remember correctly.
Again you can (and should) change all you glBegin() - glEnd() (and everything in between) calls out with VAOs and VBOs. You can store everything you want inside a buffer vertices, texture coordinates, normals, colors, etc. You can store the things in separate buffers or you can store them inside a single buffer, usually called an Interleaved Buffer or Interleaved VBO.
You wouldn't be needing glEnable(GL_TEXTURE_2D) and glDisable(GL_TEXTURE_2D) anymore, because you do that within a Shader, you bind textures and use them in a Shader, and since you create the Shader Program you can make it act however you want it to.
I'm moving a character (ellipsoid) around in my physics engine. The movement must be constrained by the static geometry, but should slide on the edges, so it won't be stuck.
My current approach is to move it a little and then push it back out of the geometry. It seems to work, but I think it's mostly because of luck. I fear there must be some corner cases where this method will go haywire. For example a sharp corner where two walls keeps pushing the character into each other.
How would a "state of the art" game engine solve this?
Consider using a 3rd party physics library such as Chipmunk-physics or Box2D. When it comes to game physics, anything beyond the most basic stuff can be quite complex, and there's no need to reinvent the wheel.
Usually the problem you mention is solved by determining the amount of overlap, contact points and surface normals (e.g., by using separating-axis theorem). Then impulses are calculated and applied, which change object velocities, so that in the next iteration the objects are moved apart in a physically realistic way.
I have not developed a state of the art game engine, but I once wrote a racing game where collision was simply handled by reversing the simulation time and calculate where the edge was crossed. Then the car was allowed to bounce back into the game field. The penalty was that the controls was disabled until the car stopped.
So my suggestion is that you run your physics engine to calculate exactly where the edge is hit (it might need some non-linear equation solving approach), then you change your velocity vector to either bounce off or follow the edge.
In the case of protecting against corner cases, one could always keep a history of the last valid position within the game and state of the physics engine. If the game gets stuck, the simulation can be restarted from that point but with a different condition (say by adding some randomization to the internal parameters).