Is it possible to mix VBO and immediate rendering in OpenGL ES?

I'm developing an OpenGL ES application and I need to visualize very large meshes (around 700,000 triangles). The problem is that I don't have enough VBO space for these meshes, and if I use immediate rendering the frame rate drops by 60% or more (projected from experiments with fewer triangles). Is there an intermediate solution, where I use as much VBO space as I can for part of the mesh and then render the rest with immediate mode?

You can't possibly see ALL of the 700,000 vertices at the same time. Try pruning the ones you can't see and stick the rest in a VBO.
It doesn't even have to be precise; just find a quick way to get rid of most of the triangles that are outside your view (or behind the object, or too close together to matter, or otherwise invisible).

Is this on some kind of embedded or handheld platform? 700,000 verts is a big model, but it isn't that much memory... maybe 22 MB, depending on what your verts contain. Are you perhaps hitting the maximum size of a single VBO rather than running out of memory for the VBO?
You can split your model into multiple VBOs and render the pieces using one draw call per chunk. If you're using indexed or stripped triangles, you'll need to duplicate some verts between the chunks.
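Here is a minimal sketch of that chunked approach in C++ with OpenGL ES 2.0; the interleaved Vertex layout, the attribute locations, and the chunk size are assumptions for illustration, not anything the poster specified:

    // Rough sketch: upload the mesh in fixed-size chunks, one VBO per chunk,
    // then issue one glDrawArrays call per chunk. Attribute 0 = position,
    // attribute 1 = normal; CHUNK_TRIANGLES is tuned to your VBO budget.
    #include <GLES2/gl2.h>
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Vertex { GLfloat pos[3]; GLfloat normal[3]; };   // 24 bytes per vertex

    static const size_t CHUNK_TRIANGLES = 65536;
    static std::vector<GLuint>  chunkVbos;
    static std::vector<GLsizei> chunkVertexCounts;

    void uploadMesh(const Vertex* verts, size_t triangleCount) {
        for (size_t first = 0; first < triangleCount; first += CHUNK_TRIANGLES) {
            size_t tris = std::min(CHUNK_TRIANGLES, triangleCount - first);
            GLuint vbo = 0;
            glGenBuffers(1, &vbo);
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            glBufferData(GL_ARRAY_BUFFER, tris * 3 * sizeof(Vertex),
                         verts + first * 3, GL_STATIC_DRAW);
            chunkVbos.push_back(vbo);
            chunkVertexCounts.push_back((GLsizei)(tris * 3));
        }
    }

    void drawMesh() {
        glEnableVertexAttribArray(0);
        glEnableVertexAttribArray(1);
        for (size_t i = 0; i < chunkVbos.size(); ++i) {
            glBindBuffer(GL_ARRAY_BUFFER, chunkVbos[i]);
            glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const void*)0);
            glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                                  (const void*)offsetof(Vertex, normal));
            glDrawArrays(GL_TRIANGLES, 0, chunkVertexCounts[i]);
        }
    }

Chunk-level draw calls also make it easy to skip whole chunks that fail a quick visibility test, combining both suggestions above.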

Related

How to present to a different window using IDXGISwapChain and ID3D11Device/ID3D11DeviceContext?

Previously, when I've built tools, I've used D3D version 9, where the call to Present() can take a target window and rectangle, and you can thus draw from a single device into many different windows. This is great when using D3D to accelerate desktop applications, and/or building tools rather than games!
I've also built a game renderer with D3D11 before, which is also great, because the state management and threading interfaces are well designed, and you can even target D3D 9 level hardware that's still pretty common in the wild (as opposed to D3D 10, which can only target 10-and-better).
However, now I want to build a tool with D3D11. Unfortunately, the IDXGISwapChain that comes back from D3D11CreateDeviceAndSwapChain() seems to "remember" its HWND, and only wants to present to that window. This is highly inconvenient, because I may have a large number of windows that each need fairly simple graphics drawn to them, and only in response to a WM_PAINT (again, this is for a tool, not a game).
What I want to do is save back-buffer RAM. Specifically, I used to be able to create a single back buffer, the size of the desktop, that I knew could cover all rendering needs, and that would be the single copy allocated. Even if there are 10 overlapping windows, they all render through the same back buffer, so there's no waste of memory beyond the initial allocation. I can create textures that are not swap chains and use them as render targets, but I can't find a good way of presenting to an arbitrary rectangle of an arbitrary client window without reading back the bitmap and copying it into a DIBSection, which would be really inefficient. Also, there seems to be no way to create many swap chains and have them share the same back buffer.
The best I can do is to create one swap chain per window, and resize the back buffer of each swap chain to be really small, except when I render to the swap chain, at which point I resize it to match the window. However, this seems inefficient, because resizing the targets is not a "free" operation AFAICT. So, is there a better way?
The answer I ended up with was to create one swap chain (and back buffer) per separate display area, rather than sizing a single back buffer to the desktop. I imagine that, in a world where desktop composition and transparency can happen to "anything" behind my back, that's probably helpful to the system.
Learn to love the VVM system, I guess :-) (VVM for Virtual Video Memory)
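For reference, a minimal sketch of the one-swap-chain-per-window arrangement, assuming a single ID3D11Device created elsewhere; the function name and buffer format are illustrative and error handling is omitted:

    #include <d3d11.h>
    #include <dxgi.h>

    // Create one small windowed swap chain per tool window, all sharing the
    // same ID3D11Device. The factory is fetched back through DXGI so the swap
    // chain is created on the same adapter as the device.
    IDXGISwapChain* CreateSwapChainForWindow(ID3D11Device* device, HWND hwnd,
                                             UINT width, UINT height) {
        IDXGIDevice*  dxgiDevice = 0;
        IDXGIAdapter* adapter    = 0;
        IDXGIFactory* factory    = 0;
        device->QueryInterface(__uuidof(IDXGIDevice), (void**)&dxgiDevice);
        dxgiDevice->GetAdapter(&adapter);
        adapter->GetParent(__uuidof(IDXGIFactory), (void**)&factory);

        DXGI_SWAP_CHAIN_DESC desc = {};
        desc.BufferDesc.Width  = width;
        desc.BufferDesc.Height = height;
        desc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count  = 1;
        desc.BufferUsage       = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        desc.BufferCount       = 1;
        desc.OutputWindow      = hwnd;
        desc.Windowed          = TRUE;

        IDXGISwapChain* swapChain = 0;
        factory->CreateSwapChain(device, &desc, &swapChain);

        factory->Release();
        adapter->Release();
        dxgiDevice->Release();
        return swapChain;   // caller owns it; Present() it in response to WM_PAINT
    }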

Drawing millions of segments on screen

I would like to draw millions of line segments to the screen.
Most of the time the user will see only a certain area of the "universe", but the user should have the ability to "zoom" out to see all line segments at once.
My understanding is that the primitive is a "triangle", so I will have to express my line segments as triangles. Millions of triangles.
Is XNA the right tool for this job, or will it be too slow?
Additional Detail:
This is not for a game, but for a program that models some processes.
I do not care which language to use (XNA was recommended to me)
P.S.: Please let me know if you need additional detail.
My understanding is that the primitive is a "triangle", so I will have to express my line segments as triangles. Millions of triangles.
Incorrect: XNA can draw lines for you perfectly well, in the following manner:
GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.LineList, vertexOffset, 0, numVertices, startIndex, primitiveCount);
(Or PrimitiveType.LineStrip if the end vertex of line1 is the start vertex of line2).
Is XNA the right tool for this job, or will it be too slow?
XNA is "a tool", and if you're drawing a lot of lines this is definately going to be faster than GDI+ and easy to implement than C++ in combo with Unmannaged D3D. Drawing a line is a very cheap operation. I would advice you to just install XNA and do a quick prototype to see how many lines you can draw at the same time. (My guess is at least 1 million). Then see if you really need to use the advanced techniques described by the other posters.
Also the "Polyline simplification" technique suggested by Felice Pollano doesn't work for individual lines, only for models made up of triangles (you can exchange a lot of small triangles for a few bigger once to increase performance but decrease visuals, if you're zoomed out pritty far nobody will notice) It also won't work for "thickened up lines" because they will always consist of 2 triangles. (Unless if you allow bended lines).
When you zoom in to details, do a simple bounding-box check to see whether each segment is visible, so you avoid drawing invisible objects. When the user zooms all the way out, you should apply a polyline simplification algorithm (http://www.softsurfer.com/Archive/algorithm_0205/algorithm_0205.htm) to avoid having too many things to draw.
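A minimal sketch of the zoomed-in culling step, written in C++ for illustration (the struct names are made up; the same test translates directly to XNA):

    // Cull a line segment by testing its bounding box against the visible
    // rectangle of the "universe".
    #include <algorithm>

    struct Rect    { float minX, minY, maxX, maxY; };
    struct Segment { float x0, y0, x1, y1; };

    bool IsVisible(const Segment& s, const Rect& view) {
        const float minX = std::min(s.x0, s.x1), maxX = std::max(s.x0, s.x1);
        const float minY = std::min(s.y0, s.y1), maxY = std::max(s.y0, s.y1);
        // Reject if the segment's bounding box lies entirely outside the view.
        return !(maxX < view.minX || minX > view.maxX ||
                 maxY < view.minY || minY > view.maxY);
    }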
This guy had the same problem, and this might help (here are the sources).
Yes, as Felice alludes to, simplifying the problem-set is the name of the game. There are obvious limits to hardware and algorithms, so the only way to draw "more stuff" is actually to draw "less stuff".
You can use techniques such as dividing your scene into an octree so that you can do view-frustum culling. There are tons of techniques for scaling out what you're drawing. One of my favorites is the use of impostors to create a composite scene which is easier to draw. Here's a paper which explains the technique:
http://academic.research.microsoft.com/Paper/1241430.aspx
Impostors are image-based primitives commonly used to replace complex geometry in order to reduce the rendering time needed for displaying complex scenes. However, a big problem is the huge amount of memory required for impostors. This paper presents an algorithm that automatically places impostors into a scene so that a desired frame rate and image quality is always met, while at the same time not requiring enormous amounts of impostor memory. The low memory requirements are provided by a new placement method and through the simultaneous use of other acceleration techniques like visibility culling and geometric levels of detail.

Fast pixel drawing library

My application produces an "animation" in a per-pixel manner, so I need to draw the pixels efficiently. I've tried different strategies/libraries with unsatisfactory results, especially at higher resolutions.
Here's what I've tried:
SDL: ok, but slow;
OpenGL: inefficient pixel operations;
xlib: better, but still too slow;
svgalib, directfb, (other frame buffer implementations): they seem perfect but definitely too tricky to set up for the end user.
(NOTE: I may be wrong about these assertions; if so, please correct me.)
What I need is the following:
fast pixel drawing with performance comparable to OpenGL rendering;
it should work on Linux (cross-platform as a bonus feature);
it should support double buffering and vertical synchronization;
it should be portable for what concerns the hardware;
it should be open source.
Can you please give me some enlightenment/ideas/suggestions?
Are your pixels sparse or dense (e.g. a bitmap)? If you are creating dense bitmaps out of pixels, then another option is to convert the bitmap into an OpenGL texture and use OpenGL APIs to render at some framerate.
The basic problem is that graphics hardware will be very different on different hardware platforms. Either you pick an abstraction layer, which slows things down, or code more closely to the type of graphics hardware present, which isn't portable.
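As a rough illustration of the texture-upload approach suggested above, here's a C/C++ OpenGL sketch; the buffer size, pixel format, and setup details are assumptions:

    // Render into a plain CPU-side RGBA buffer, upload the whole frame in one
    // glTexSubImage2D call, then draw it as a full-screen textured quad.
    #include <GL/gl.h>
    #include <vector>

    static const int WIDTH = 1024, HEIGHT = 768;
    static std::vector<unsigned char> pixels(WIDTH * HEIGHT * 4);  // the app writes RGBA pixels here
    static GLuint tex = 0;

    void initTexture() {
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, WIDTH, HEIGHT, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, 0);   // allocate once, fill later
    }

    void uploadFrame() {
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT,
                        GL_RGBA, GL_UNSIGNED_BYTE, &pixels[0]);
        // ...then draw one full-screen textured quad and swap buffers.
    }

The key point is that the whole frame crosses to the GPU in a single transfer instead of one operation per pixel.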
I'm not totally sure what you're doing wrong, but it could be that you are writing pixels one at a time to the display surface.
Don't do that.
Instead, create a rendering surface in main memory, in the same format as the display surface, render to that, and then copy the whole rendered image to the display in a single operation. Modern GPUs are very slow per transaction, but can move lots of data very quickly in a single operation.
Looks like you are confusing the windowing library (SDL and Xlib) with the rendering library (OpenGL).
Just pick a windowing library (SDL, GLUT, or Xlib if you like a challenge), activate double-buffered mode, and make sure that you get direct rendering.
What kind of graphics card do you have? Most likely it can process the pixels on the GPU. Look up how to create pixel shaders (fragment shaders) in OpenGL; pixel shaders do their processing per pixel.

How to put textures in one file (power of 2)

I want to create a big texture whose size is a power of 2 and put a lot of smaller textures into this one file.
You know, I have several textures whose sizes are not powers of 2, so I can't load them into my program. I have to put them into one file (512x512, for example).
Do you know any program which can do it for me automatically?
Is there any limit on texture size? Can I use, for example, an 8192x8192 file? Or do I have to use a few smaller ones?
The keyword you're looking for is texture atlas.
The maximum texture size is GPU-dependent; 8k is fine on newer cards. Such a texture consumes, however, a vast amount of VRAM (and it gets worse if you count in the MIP levels). So it might be better to use several smaller textures and keep only those you really need in (hot) VRAM.
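You can query that limit at runtime rather than guessing; a short C/OpenGL snippet:

    // Ask the driver for the largest texture dimension it supports before
    // deciding how big to make the atlas.
    GLint maxSize = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxSize);
    // e.g. maxSize == 8192 means an 8192x8192 texture is the upper limit on this card.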
There are several applications that can help you pack images into a texture atlas. Zwoptex has both a free Flash version and a commercial Mac app. TexturePacker has a Mac and Windows app, and a command line version, some features are free and some require a paid license. These output the packed images into a single image and an associated data file with coordinates on the location of the packed images. They also can trim the transparent regions from around any given image, if any, saving even more space.
There are some more tools, some open source, listed in answers to this question.
The advantages of packing into one texture are the potential space savings, but also fewer OpenGL state changes when drawing several images from the same texture, which means better performance.
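For illustration, here's a small C++ sketch of how a packer's data file is typically turned into texture coordinates; the AtlasEntry layout and the 1024x1024 atlas size are assumptions, not any particular tool's format:

    // Convert a sub-image's pixel rectangle inside the atlas into [0,1]
    // texture coordinates for the quad that will display it.
    struct AtlasEntry { float x, y, w, h; };   // pixel rect inside the atlas
    struct UVRect     { float u0, v0, u1, v1; };

    UVRect ToUV(const AtlasEntry& e, float atlasW = 1024.0f, float atlasH = 1024.0f) {
        UVRect uv;
        uv.u0 = e.x / atlasW;
        uv.v0 = e.y / atlasH;
        uv.u1 = (e.x + e.w) / atlasW;
        uv.v1 = (e.y + e.h) / atlasH;
        return uv;   // use these as the quad's texture coordinates when drawing
    }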

When using Direct3D, how much math is being done on the CPU?

Context: I'm just starting out. I'm not even touching the Direct3D 11 API, and instead looking at understanding the pipeline, etc.
From looking at documentation and information floating around the web, it seems like some calculations are being handled by the application. That is, instead of simply handing the GPU matrices to multiply, the calculations are being done by a math library that operates on the CPU. I don't have any particular resources to point to, although I guess I can point to the XNA Math Library or the samples shipped in the February DX SDK. When you see code like mViewProj = mView * mProj;, that product is being calculated on the CPU. Or am I wrong?
If you were writing a program where you can have 10 cubes on the screen, and you can move or rotate the cubes as well as the viewpoint, what calculations would you do on the CPU? I think I would store the geometry for a single cube, plus transform matrices representing the actual instances. Then it seems I would use the XNA math library, or another of my choosing, to transform each cube in model space, get the coordinates in world space, and push the information to the GPU.
That's quite a bit of calculation on the CPU. Am I wrong?
Am I reaching conclusions based on too little information and understanding?
What terms should I Google for, if the answer is STFW?
Or if I am right, why aren't these calculations being pushed to the GPU as well?
EDIT: By the way, I am not using XNA, but the documentation notes that the XNA Math Library replaces the previous DX math library. (I see the XNA Math Library in the SDK as a pure template library.)
"Am I reaching conclusions based on too little information and understanding?"
Not as a bad thing, as we all do it, but in a word: Yes.
What is being done by the GPU is, generally, dependent on the GPU driver and your method of access. Most of the time you really don't care or need to know (other than curiosity and general understanding).
For mViewProj = mView * mProj;, this is most likely happening on the CPU. But it is not much of a burden (hundreds of cycles at the most). The real trick is the application of the new view matrix to the "world". Every vertex needs to be transformed, more or less, along with shading, textures, lighting, etc. All of this work is done on the GPU (if it were done on the CPU, things would slow down really fast).
Generally you make high level changes to the world, maybe 20 CPU bound calculations, and the GPU takes care of the millions or billions of calculations needed to render the world based on the changes.
In your 10-cube example: you supply a transform for each cube, and any math needed to create that transform is CPU-bound (with exceptions). You also supply a transform for the view; again, creating that transform matrix might be CPU-bound. Once you have your 11 new matrices, you apply them to the world. From a hardware point of view the 11 matrices need to be copied to the GPU... that will happen very, very fast... and once they're copied the CPU is done; the GPU recalculates the world based on the new data, renders it to a buffer and pops it onto the screen. So for your 10 cubes the CPU-bound calculations are trivial.
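To make the 10-cube bookkeeping concrete, here's a rough C++ sketch using the XNA Math library shipped in the DX SDK; the constant-buffer layout and function names are made up for illustration:

    #include <d3d11.h>
    #include <xnamath.h>   // DirectXMath.h on newer SDKs

    // One small constant buffer per draw; the CPU fills in 64 bytes, the GPU
    // then transforms every vertex of the cube with this matrix.
    struct PerObjectCB { XMFLOAT4X4 worldViewProj; };

    void UpdateCube(ID3D11DeviceContext* ctx, ID3D11Buffer* cb,
                    float angle, float x, float y, float z,
                    CXMMATRIX view, CXMMATRIX proj) {
        // A handful of CPU-side multiplies per cube -- cheap.
        XMMATRIX world = XMMatrixMultiply(XMMatrixRotationY(angle),
                                          XMMatrixTranslation(x, y, z));
        XMMATRIX wvp   = XMMatrixMultiply(XMMatrixMultiply(world, view), proj);

        PerObjectCB data;
        // Transpose because HLSL treats constant-buffer matrices as column-major by default.
        XMStoreFloat4x4(&data.worldViewProj, XMMatrixTranspose(wvp));
        ctx->UpdateSubresource(cb, 0, 0, &data, 0, 0);   // copy the 64 bytes to the GPU
    }

Call this once per cube (and compute view and proj once per frame); everything after the UpdateSubresource copy happens on the GPU.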
Look at some reflected code for an XNA project and you will see where your calculations end and XNA begins (XNA will do everything it possibly can on the GPU).