After some search I've learned it is possible to create multiple Vertex Buffers, each for a specific 3D model, and set them in the Input Assembler to be read by my shaders, or at least this is what I could understand. But by reading Microsoft's documentation I've got very confused of how to do this the right way, this is what I was reading, and they say I can pass in an array of Vertex Buffers to the IA stage, but it also says that the maximum number of Vertex Buffers my Input Assembler can take in D3D11 is 32. What would I do if I needed 50 different models being rendered at the same time? And also if someone could clarify how the pOffset work in this situation with multiple models would also help, as I could understand it should always be assigned a 0 value as the beginning of my buffers is always the vertex data, but I could've understood wrong. And by last I want to add I've already rendered some buffers which consists of multiple models together, but I don't know exactly how could I deal with many individual models.
The short answer is: You don't try to draw all your models in one Draw call.
You are free to organize rendering in many ways, but here is one approach:
A 'model' consists of a one or more 'meshes'. Each mesh is collection of a vertices (in a VB), indices (in an IB), and some material information associated with each 'subset' of indices.
To draw:
foreach M in models
foreach mesh in M
foreach part in mesh
Set shaders based on material
Set VB/IB based on mesh
DrawIndexed
Since this is a number of nested loops, there are several ways to improve the performance. For example, you might just queue up the information instead of actually calling DrawIndexed, then sort by material. Then call DrawIndexed from the sorted queue.
For alpha-blending to appear correct, you have to do at least two rendering passes: First to render opaque things, then the second to render alpha-blended things.
You may also want to combine all the content in a given model into one VB and one IB with offsets rather than use individual resources.
You may have the same model in multiple locations in the world, so you may have many model instances sharing the same mesh data. In this case, sorting by VB/IB as well as material could be useful. If you are drawing the same model in many locations (100s or 1000s), then you should to look into hardware instancing.
An example implementation of this can be found in DirectX Tool Kit as Model, ModelMesh, and ModelMeshPart.
Related
I'm taking an introductory graphics course, and while I intuitively understand that converting a click or touch into object coordinates will make the math much cleaner, reduce the chances for human error, and potentially make debugging easier, none of these are actually a very good explanation, conceptually, of why object coordinate spaces are used in selection tests, as opposed to simply using world coordinates for the test - rather, they're just observations of what tends to happen when object coordinates are used. So I ask: why?
A selection test involves comparing the click coordinates, which you get in window coordinates, against lots and lots of object features, which are represented in object coordinates.
You need to transform them into the same coordinate system in order to do the checks, so you can EITHER transform the one simple click point OR you can transform all the various object features.
Transforming one point or line is just a lot easier that transforming a whole bunch of object features of various types.
There are cases where the location of a specific object or point may not be known within a world coordinate system, but is known relative to some other coordinate system.
To summarize an example from my course text, consider the idea of two different towns, one using a grid system for its layout, and the other using what I can only describe as the New England we-made-cow-trails-into-roads method. A government employee is tasked with creating a layout of the area which includes them, and in doing so has to convert the two coordinate systems into a third, which encompasses the other two.
Sometimes, using a world atlas just isn't practical to get across the street, and so something much more local (and relevant) is used instead, as it provides much more detail over a much smaller area.
The text also explains that it may be more than simply impractical to use a given coordinate system - it may yield results that are improbable or just plain wrong. This is evidenced in the evolution of the geocentric and heliocentric models of the universe - the distance of the stars from us was calculated with very different results using the two models.
Thinking of my own example, the best that comes to mind would be something like your own internal organs - from the outside, you don't know for sure exactly the shape, size, and structure of each of them, but your own body does. In order to be able to access that information, you need to look inside the body (ideally in a way that doesn't kill you). It's not something that is plainly observable from outside.
I have an application which renders many filled polygons with OpenGL, in 2D. Filling is done by tesselation but performance is not optimal. 1900 polygons made up of 122000 vertex (that is, about 64 vertex per polygon) are displayed in about 3 seconds.
Apparently, the CPU is not the bottleneck, as if I replace calls to gluTessVertex by calls to glColor - just to test where is the bottleneck, performance is doubled.
I have the same problem with loading many small textures.
Now, which are the options to improve the performance? Seems that most time is spend in the geometry subsystem. Rendering is fast enough.
I already have a worker thread which does the load (so tesselation, texture binding) in one context, and another thread which does the draw in another context. The two contexts share objects via wglShareLists and it works like a charm.
Can I have a third thread in a third context which would handle also tesselation for half of the polygons? Anyone tried that? Is it safe? Any example of sharing objects between three contexts?
Forgot to say, I have an ATI Radeon HD 4550 graphics card, suppose it can handle more than 39kB/s of data.
Increase Performance
Sounds like you're using the old fixed-function pipeline.
If you're unsure of what that is, well, the following functions are a part of the fixed-function pipeline.
glBegin()
glEnd()
glVertex*()
glTexCoord*()
glNormal*()
glColor*()
etc.
Those functions are old and render geometry immediately. That means that each time you call the above functions, that geometry gets send to the GPU. By doing that a lot of times, you can easy make the FPS go way under 60 just by rendering simple things.
Now you need to use buffers and to be more precise VAOs with/or VBOs (and IBOs).
VBO or Vertex Buffer Object, is a buffer which can store vertices which you then can render. This is much much faster and better to use than glBegin() and glEnd(). When you create a VBO you supply it with vertices and they only require to be send to the GPU once, that's basically why they are fast, because they already are in the GPU and only require a single draw call instead of multiple.
The reason I said "with/or" is because in the newer versions you need to create a VAO which then would use a VBO, where before you could simply render the VBOs.
Tessellation
There are multiple ways to do tessellation and things which look like/would give the effect of tessellation.
For instance you could also simply render different models according to the required LOD (Level of Detail), thereby when you're up close to an object you then render the model with all it's details which probably would have a high vertices count. Then the further you're away from the model you simply render another version of that model but which have less vertices, which also equals less detail. Though if you can't really do that on something like terrain and definitely shouldn't do it on something like dynamic terrain and/or procedurally generated terrain.
You can also do actual geometry tessellation and you would do that through a Shader. Since tessellation is a really huge topic I will provide you with 2 urls which both explain and have code on them.
Both of these articles uses modern 4.0/4.0+ OpenGL.
http://prideout.net/blog/?p=48
http://antongerdelan.net/opengl/tessellation.html
Texturing
Generating and binding textures are still the same.
Instead of using gluBuild2DMipmaps() you can use glGenerateMipmap(GL_TEXTURE_2D); it was added in OpenGL version 3.0'ish if I remember correctly.
Again you can (and should) change all you glBegin() - glEnd() (and everything in between) calls out with VAOs and VBOs. You can store everything you want inside a buffer vertices, texture coordinates, normals, colors, etc. You can store the things in separate buffers or you can store them inside a single buffer, usually called an Interleaved Buffer or Interleaved VBO.
You wouldn't be needing glEnable(GL_TEXTURE_2D) and glDisable(GL_TEXTURE_2D) anymore, because you do that within a Shader, you bind textures and use them in a Shader, and since you create the Shader Program you can make it act however you want it to.
I am writing an iOS/Android game and looking for the most performant way to render my vertex data with OpenGL ES 2.0. I have two different kinds of data: dynamic data that changes its attributes every frame, for example the player or animated background objects, and static data such as the static background or the terrain. I googled a lot since yesterday, but I could not find a clear and unique answer to the question of what is the best was to render such data.
There are basically three options for rendering such data (If I do not miss one. If so, feel free to correct me.):
Vertex Arrays Only:
Just fill your vertex every frame on the CPU (including the dynamic data).
Vertex Buffer Objects Only:
Allocate a VBO on the GPU with GL_DYNAMIC_DRAW where both, the dynamic and static data is stored. The dynamic data is then updated every frame via glBufferSubData.
Use both:
Static data is stored and render with a VBO and the dynamic data is rendered with a Vertex Array. With this option, we need two rendering passes, one for rendering the VBO and one for rendering the vertex array.
Since the first option does not exploit the immutability of the static data and since the third option requires two rendering passes, my guess is that I should go with the second option. However, I am absolutely not sure about this and I hope you can clarify my confusion.
Allocate two Vertex Buffer Objects. One with hint GL_DYNAMIC_DRAW that will be updated frequently. Allocate a second VBO for immutable data and use the hint GL_STATIC_DRAW. According to the API documentation, GL_STATIC_DRAW should be used for data that "will be modified once and used many times"; just what you need.
Speaking of two rendering passes here is probably a misuse of the term: what you do is to render your scene in two separate drawing commands. Since drawing commands run asynchronously, you should not expericence any performance hit by doing so.
A second rendering pass, on the other hand, is when you render the entire scene twice (see for example here) with different settings, or when you do some image processing effects on outputs of previous rendering passes.
I have a set of objects that can be scaled and translated.
Suppose the user selects an object and drag to some position.
I was thinking about implementing this in two different ways: either changing the coordinates of the objects given the mouse position, or changing the transformation matrix.
Is one of these implementations better than the other?
My main issues are:
Performance
Code organization
Scalability
Objects have certain coordinates, and the way you look at objects has a certain frame of reference. I think it is better not to mess with your coordinates, and instead to change just the matrix that takes you from "the object is here" to "I draw the object here". It is much cleaner. Performance wise you have to apply a transformation to each object being rendered, so you may as well do it just once. From. Code organization perspective it is better to keep things "relating to something physical"; and from a scalability perspective, not applying a transformation to all objects every time the user changes the view is clearly preferable - you only apply the transformation to objects when you render them, so if you can't keep up you skip a step; if you didn't rescale some of your objects during each step you would quickly get into trouble. Finally, applying multiple transformations to the same object would tend to accumulate errors.
Stream of conscience, but clear preference, I think!
For instance:
An approach to compute efficiently the first intersection between a viewing ray and a set of three objects: one sphere, one cone and one cylinder (other 3D primitives).
What you're looking for is a spatial partitioning scheme. There are a lot of options for dealing with this, and lots of research spent in this area as well. A good read would be Christer Ericsson's Real-Time Collision Detection.
One easy approach covered in that book would be to define a grid, assign all objects to all cells it intersects, and walk along the grid cells intersecting the line, front to back, intersecting with each object associated with that grid cell. Keep in mind that an object might be associated with more grid-cells, so the intersection point computed might actually not be in the current cell, but actually later on.
The next question would be how you define that grid. Unfortunately, there's no one good answer, and you need to consider what approach might fit your scenario best.
Other partitioning schemes of interest are different tree structures, such as kd-, Oct- and BSP-trees. You could even consider using trees combined with a grid.
EDIT
As pointed out, if your set is actually these three objects, you're definately better of just intersecting each one, and just pick the earliest one. If you're looking for ray-sphere, ray-cylinder, etc, intersection tests, these are not really hard and a quick google should supply all the math you might possibly need. :)
"computationally efficient" depends on how large the set is.
For a trivial set of three, just test each of them in turn, it's really not worth trying to optimise.
For larger sets, look at data structures which divide space (e.g. KD-Trees). Whole chapters (and indeed whole books) are dedicated to this problem. My favourite reference book is An Introduction to Ray Tracing (ed. Andrew. S. Glassner)
Alternatively, if I've misread your question and you're actually asking for algorithms for ray-object intersections for specific types of object, see the same book!
Well, it depends on what you're really trying to do. If you'd like to produce a solution that is correct for almost every pixel in a simple scene, an extremely quick method is to pre-calculate "what's in front" for each pixel by pre-rendering all of the objects with a unique identifying color into a background item buffer using scan conversion (aka the z-buffer). This is sometimes referred to as an item buffer.
Using that pre-computation, you then know what will be visible for almost all rays that you'll be shooting into the scene. As a result, your ray-environment intersection problem is greatly simplified: each ray hits one specific object.
When I was doing this many years ago, I was producing real-time raytraced images of admittedly simple scenes. I haven't revisited that code in quite a while but I suspect that with modern compilers and graphics hardware, performance would be orders of magnitude better than I was seeing then.
PS: I first read about the item buffer idea when I was doing my literature search in the early 90s. I originally found it mentioned in (I believe) an ACM paper from the late 70s. Sadly, I don't have the source reference available but, in short, it's a very old idea and one that works really well on scan conversion hardware.
I assume you have a ray d = (dx,dy,dz), starting at o = (ox,oy,oz) and you are finding the parameter t such that the point of intersection p = o+d*t. (Like this page, which describes ray-plane intersection using P2-P1 for d, P1 for o and u for t)
The first question I would ask is "Do these objects intersect"?
If not then you can cheat a little and check for ray collisions in order. Since you have three objects that may or may not move per frame it pays to pre-calculate their distance from the camera (e.g. from their centre points). Test against each object in turn, by distance from the camera, from smallest to largest. Although the empty space is the most expensive part of the render now, this is more effective than just testing against all three and taking a minimum value. If your image is high res then this is especially efficient since you amortise the cost across the number of pixels.
Otherwise, test against all three and take a minimum value...
In other situations you may want to make a hybrid of the two methods. If you can test two of the objects in order then do so (e.g. a sphere and a cube moving down a cylindrical tunnel), but test the third and take a minimum value to find the final object.