Is there any relation (preferably an equation) between the number of polygons in a 3D object and the rendering workload? I want to see how much the rendering workload would be increased if for instance the number of polygons doubles.
There is no clear connection between the raw number of polygons and a single "workload" figure.
See the following samples:
You render a cube with 6 faces composed of 12 triangles. You get, say, 1000 fps (without vsync). When you tessellate the cube into 120 triangles, the fps counter will most likely remain at 1000.
You render a single fullscreen-sized quad with a heavy fragment shader that does a lot of calculation. You get 0.5 fps (or more, but I hope you get the point).
Another extreme: you render a thousand similar cubes, each with a different texture. The render-state changes will take most of the time, not the actual rendering.
So, polygons may cover different screen areas, and they may not be rendered within a single primitive. If you're talking about one big vertex array with a large number of polygons, then for certain scenarios the performance change should be roughly linear. "Roughly" because the video card and the drivers clip the invisible polygons and perform early-out tests for each pixel being rendered.
Could you define 'workload'? – Erno
Well, I mean the computational work: I want to see how much the overhead (GPU, CPU, memory, ...) would increase. Actually I want to estimate the energy usage of the device. – user1196937
If that is the actual question, then to compare energy usage:
You will have to pick specific configurations and test them; energy usage differs greatly from GPU to GPU and from machine to machine.
Some GPU manufacturers publish very detailed performance figures for their processors, but to compare them you will still need an actual machine.
Related
When I made my rasterizer, I realized that each pixel needed to compare against all the triangles in the model to determine the depth value. But if there are, for example, a million of these triangles, does that mean each individual GPU core must test a million triangles? This would take an incredibly long time, so I would like to know how this problem is avoided. I heard that it is done in hardware, but I did not understand on what principle.
Depth sorting needs to sort all triangles by perpendicular distance to the camera, and even split intersecting triangles, in order to work correctly. That is a huge amount of work, scaling with the number of rendered entities at ~O(n log n), but it does not need much additional memory (unless there are too many splits)... That is why it was used in the past, when memory was scarce and CPUs were slow, so there were only a few entities to render, which kept it fast enough... In some edge cases the depth sorting can even be replaced by simple back-face culling, which is constant work per triangle (simple scenes with a single convex object, or non-intersecting objects too far apart to block each other's view)...
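For illustration, here is a minimal C++ sketch of just the sorting step (the Triangle type is hypothetical, and the splitting of intersecting triangles is omitted):

#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v[3]; };

// Painter's algorithm core: sort back-to-front by centroid depth, then draw
// in that order so nearer triangles overwrite farther ones. std::sort gives
// the O(n log n) cost mentioned above.
void depthSort(std::vector<Triangle>& tris)
{
    auto depth = [](const Triangle& t) {
        return (t.v[0].z + t.v[1].z + t.v[2].z) / 3.0f; // camera looks down +z
    };
    std::sort(tris.begin(), tris.end(),
              [&](const Triangle& a, const Triangle& b) {
                  return depth(a) > depth(b); // farthest (largest z) first
              });
}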
Nowadays the situation is different: we have very complex scenes with lots of entities, fast CPUs and GPUs, and lots of memory, so depth buffering is used instead. It is O(1) per pixel and pixel-perfect, but it needs a shadow screen buffer holding the depths, which can be a large chunk of memory... The rendering is done like this:
clear the depth buffer with the most distant value
this is the slowest operation, but it is done only once per frame and it is just memory filling... Usually done like this:
for (y=0;y<y_resolution;y++)
 for (x=0;x<x_resolution;x++)
  {
  depth[y][x]=z_max;            // most distant depth value
  color[y][x]=background_color; // clear color
  }
In case the buffers are stored as linear arrays, you can use memset, or even DMA on some platforms, for this.
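As a sketch of that trick (assuming a 16-bit unsigned integer depth buffer, which is not stated above): memset writes single bytes, so it only works when every byte of the clear value is identical, e.g. 0xFFFF as z_max, or zero for a black background.

#include <cstring>
#include <cstdint>

void clearBuffers(uint16_t* depth, uint32_t* color, size_t pixelCount)
{
    std::memset(depth, 0xFF, pixelCount * sizeof(uint16_t)); // every depth = 0xFFFF = z_max
    std::memset(color, 0x00, pixelCount * sizeof(uint32_t)); // every pixel = black
}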
add a condition to the pixel rendering and also store the rendered depth
so that a pixel is skipped if something closer was already rendered there, like:
void pixel(int x,int y,int z,int col)
{
if (depth[y][x]>z) // only if the new fragment is closer
 {
 depth[y][x]=z;    // store new depth value to buffer
 color[y][x]=col;  // render pixel
 }
}
As this is done by the hardware, no branch- or cache-unfriendly operation is involved...
This approach results in two output images: one holding the colors (the wanted image) and the depth buffer holding the rendered depths. So we still have the 3D info, which allows additional processing/effects like ray picking, lighting effects, shadows, scattering and much, much more...
There are also hybrid techniques using both approaches like this:
OpenGL - How to create Order Independent transparency?
I am trying to make a normal template-matching search more efficient by first doing the search on downscaled representations of the image. Basically I do a double pyrDown -> quarter resolution.
For most images and templates this works beautifully, but for some others I get really bad matching results. It seems to be especially bad for thin fonts or low contrast.
Look at this example:
And this template:
At 100% resolution I get a matching probability of 99.9%
At 50% resolution I get 90%
At 25% resolution I get 87%
I don't really know why it's so bad for some images/templates. I tried to recreate and test this in Photoshop by hiding/showing the 25% downscaled template on top of the 25% downscaled image, and as you can see, it's not 100% congruent:
https://giphy.com/gifs/coWDjcvHysKgn95IFa
I need a way to get more probability for those matchings at low resolution because it needs to be fast.
Any ideas on how to improve my algorithm?
Here are the original files:
https://www.dropbox.com/s/llbdj9bx5eprxbk/images.zip?dl=0
This is not unusual and those scores seem perfectly fine. However here are some ideas that might help you improve the situation:
You mentioned that it seems to be especially bad for thin fonts. This could be happening because some of the pixels in the lines are smoothed out or distorted by the Gaussian filter applied in pyrDown. It could also be an indication that you have reduced the resolution too much. Unfortunately, the pyrDown function in OpenCV reduces the resolution by a factor of 2, so it does not let you fine-tune the scale factor. Another thing you could try is resize() with interpolation set to INTER_LINEAR or INTER_CUBIC. The resize() function lets you use any scale factor, so you have more control over the performance vs. accuracy trade-off.
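A minimal OpenCV sketch of that alternative (the 0.35 scale factor is only an illustrative value to tune, not taken from the question):

#include <opencv2/imgproc.hpp>

cv::Mat downscale(const cv::Mat& src, double scale)
{
    cv::Mat dst;
    // Any scale factor works here; swap INTER_LINEAR for INTER_CUBIC to compare.
    cv::resize(src, dst, cv::Size(), scale, scale, cv::INTER_LINEAR);
    return dst;
}

// usage: cv::Mat small = downscale(image, 0.35);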
Use multiple templates of the same object. If you come to a scene and can only achieve an 87% score, create a template out of that scene, then add it to a database of templates to be utilized. Obviously, as the number of templates increases, so does the time it takes to complete the search.
The best way to deal with this scenario is to perform an exhaustive match on the highest level of the pyramid, then track it down to the lowest level using a reduced search space on the lower levels. By exhaustive I mean you search all rows and all columns across the entire top pyramid-level image. You keep track of the locations (row, col) of the highest matches on the highest level (you are probably already doing that). Then you multiply those locations by a factor of 2 and perform a restricted search on the next-lowest level (e.g., a 5 x 5 shift centered on the rough location). You keep doing this until you are at the bottom level. This gives you the best overall accuracy and performance, and it is also the way most industrial computer vision packages do it.
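Here is a hedged C++/OpenCV sketch of that coarse-to-fine search, simplified to a two-level pyramid (full and half resolution); the function and variable names are illustrative, not from the answer:

#include <algorithm>
#include <opencv2/imgproc.hpp>

cv::Point refineMatch(const cv::Mat& image, const cv::Mat& templ)
{
    // 1) Exhaustive match on the downscaled (top) pyramid level.
    cv::Mat smallImg, smallTpl, scores;
    cv::pyrDown(image, smallImg);
    cv::pyrDown(templ, smallTpl);
    cv::matchTemplate(smallImg, smallTpl, scores, cv::TM_CCOEFF_NORMED);
    cv::Point coarse;
    cv::minMaxLoc(scores, nullptr, nullptr, nullptr, &coarse);

    // 2) Multiply the best location by 2 and search only a small window
    //    (a 5 x 5 shift) around it at the next, full-resolution level.
    cv::Point base(coarse.x * 2, coarse.y * 2);
    cv::Rect roi(std::max(base.x - 2, 0), std::max(base.y - 2, 0),
                 templ.cols + 4, templ.rows + 4);
    roi &= cv::Rect(0, 0, image.cols, image.rows); // clamp to image bounds
    cv::matchTemplate(image(roi), templ, scores, cv::TM_CCOEFF_NORMED);
    cv::Point fine;
    cv::minMaxLoc(scores, nullptr, nullptr, nullptr, &fine);
    return cv::Point(roi.x + fine.x, roi.y + fine.y); // full-resolution match
}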
I'm a beginner to computer graphics and am trying to get a better understanding. My professor has discussed fixed function pipeline and shader based programming. How do these two compare to each other? What's the difference?
The fixed-function pipeline is as the name suggests: the functionality is fixed. Someone wrote a list of different ways you'd be permitted to transform and rasterise geometry, and that's everything available. In broad terms, you can do linear transformations and then rasterise by texturing, by interpolating a colour across a face, or by combinations and permutations of those things. But more than that, the fixed pipeline enshrines certain deficiencies.
For example, it was obvious at the time of design that there wasn't going to be enough power to compute lighting per pixel. So lighting is computed at vertices and linearly interpolated across the face.
There were some intermediate extensions related to specific effects — dot3 plus cubemaps for per-pixel lighting from a single source, for example — but the programmable pipeline lets you do whatever you want at each stage, giving you complete flexibility.
In the first place that allowed better lighting, then better general special effects (ripples on reflective water, imperfect glass, etc), and more recently has been used for things like deferred rendering that flip the pipeline on its end.
All support for the fixed-functionality pipeline is implemented by programming the programmable pipeline on hardware of the last decade or so. The programmable pipeline is an advance on its predecessor, afforded by hardware improvements.
Graphics Processing Units started off very simply with fixed functions that allowed for quick 3D maths (much faster than CPU maths), texture lookup, and some simple lighting and shading options (flat, Phong, etc.).
These were very basic but allowed the CPU to offload the very repetitive tasks of 3D rendering to the GPU. Once graphics was taken away from the CPU and given to the GPU, games made a massive leap forward.
It wasn't long before the fixed functions needed to give way to assembly programs, and soon there was demand for more than the simple shading, basic reflections, and single texture maps offered by fixed-function GPUs.
So the second breed of GPU was created. It had two distinct pipelines: one that processed vertex programs and moved vertices around in 3D space, and the shader programs that worked on pixels, allowing multiple textures to be merged and more lights and shades to be created.
Now, in the latest form of GPU, all the pipes in the card are generic and can run any type of GPU assembler code. This increased the number of uses for the pipes: they still do vertex mapping and pixel colour calculation, but they also run geometry and tessellation shaders, and even compute shaders (where the parallel processor is used to do a non-graphics job).
So fixed function is limited but easy, and is now in the past for all but the most limited devices. Programmable shaders using OpenGL (GLSL) or DirectX (HLSL) are the de facto standard for modern GPUs.
Essentially, the fixed-function pipeline is a hardwired implementation of a, well, fixed program, through which each piece of data a GPU processes traverses, without the ability to change the details of any step. The only things you can parameterize are the occasional branches that switch between hardcoded paths in the program (like enabling or disabling lighting, or using a separate specular) and some constants used (light colors and positions, texture-environment base-color modulation). Each and every step follows a specific formula.
In a programmable pipeline, however, the GPU is a clean slate. It's completely up to the programmer how the various stages of the rendering process (vertex transformation, tessellation, fragment processing) are carried out. And you can use whatever formula you see fit for the task.
Fixed-function pipeline GPUs have exactly one illumination mode: a Lambertian illumination model, implemented with Gouraud or Phong shading. There were a few tricks to slightly alter the illumination model, for example to make it anisotropic, but you had to somehow outsmart (or outdumb, to be honest) the GPU for this. With a programmable pipeline you simply do what you wanted to do in the first place.
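For reference, the Lambertian term mentioned above reduces to a clamped dot product; a minimal C++ sketch (evaluated per vertex on fixed-function hardware, per pixel in a shader):

#include <algorithm>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// N = surface normal, L = direction toward the light, both assumed unit length.
float lambert(Vec3 N, Vec3 L)
{
    return std::max(0.0f, dot(N, L)); // surfaces facing away from the light get 0
}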
I'm just learning OpenCL, and I'm at the point when trying to launch a kernel. Why is it that the GPU threads are managed in a grid?
I'm going to read more about this in detail, but it would be nice to have a simple explanation. Is it always like this when working with GPGPUs?
This is a common approach, used in CUDA, OpenCL and, I think, ATI Stream.
The idea behind the grid is to provide a simple, but flexible, mapping between the data being processed and the threads doing the data processing. In the simple version of the GPGPU execution model, one GPU thread is "allocated" for each output element in a 1D, 2D or 3D grid of data. To process this output element, the thread will read one (or more) elements from the corresponding location or adjacent locations in the input data grid(s). By organizing the threads in a grid, it's easier for the threads to figure out which input data elements to read and where to store the output data elements.
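A minimal kernel sketch in OpenCL C (the names are hypothetical) showing how each thread derives the element it owns from its grid position:

__kernel void brighten(__global const float* in,
                       __global float* out,
                       const int width)
{
    int x = get_global_id(0); // column index within the launch grid
    int y = get_global_id(1); // row index within the launch grid
    int i = y * width + x;    // flatten (x, y) into a linear buffer offset
    out[i] = in[i] * 1.1f;    // each thread processes exactly one element
}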
This contrasts with the common multi-core CPU threading model, where one thread is allocated per CPU core and each thread processes many input and output elements (e.g. 1/4 of the data in a quad-core system).
The simple answer is that GPUs are designed to process images and textures that are 2D grids of pixels. When you render a triangle in DirectX or OpenGL, the hardware rasterizes it into a grid of pixels.
I will invoke the classic analogy of putting a square peg in a round hole. Well, in this case the GPU is a very square hole and not as well rounded as GP (general purpose) would suggest.
The above explanations put forward the idea of 2D textures, etc. The architecture of the GPU is such that all processing is done in streams, with the pipeline being identical in each stream, so the data being processed needs to be segmented like that.
One reason why this is a nice API is that typically you are working with an algorithm that has several nested loops. If you have one, two or three loops then a grid of one, two or three dimensions maps nicely to the problem, giving you a thread for the value of each index.
So values that you need in your kernel (index values) are naturally expressed in the API.
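As a small sketch of that correspondence (placeholder names): the loop indices of the CPU version become the grid dimensions, and the loop body becomes the kernel run by one thread per index combination:

void processAll(int width, int height)
{
    for (int y = 0; y < height; ++y)      // -> grid dimension 1
        for (int x = 0; x < width; ++x)   // -> grid dimension 0
        {
            // ...the body here becomes the kernel, run once per (x, y) thread
        }
}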
An old Direct3D book says:
"...you can achieve an acceptable frame rate with hardware acceleration while displaying between 2000 and 4000 polygons per frame..."
What is one polygon in Direct3D? Do they mean one primitive (indexed or otherwise) or one triangle?
That book means triangles. Otherwise, what if I wanted 1000-sided polygons? Could I still achieve 2000-4000 such shapes per frame?
In practice, the only thing you'll want it to be is a triangle, because any polygon that is not a triangle is generally tessellated into triangles anyway (e.g., a quad consists of two triangles, et cetera). A basic triangulation (tessellation) algorithm for a convex polygon is really simple: anchor at the first vertex and loop through the remaining ones, turning each consecutive pair of vertices plus the anchor into a triangle (a triangle fan).
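A minimal C++ sketch of that fan triangulation (it assumes a convex polygon with vertices in order, and produces only index triples):

#include <vector>

struct Tri { int a, b, c; }; // indices into the polygon's vertex list

std::vector<Tri> triangulateFan(int vertexCount)
{
    std::vector<Tri> tris;
    for (int i = 1; i + 1 < vertexCount; ++i)
        tris.push_back({0, i, i + 1}); // anchor every triangle at vertex 0
    return tris;
}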
Here, a "polygon" refers to a triangle. All . However, as you point out, there are many more variables than just the number of triangles which determine performance.
Key issues that matter are:
The format of storage (indexed or not; list, fan, or strip)
The location of storage (host-memory vertex arrays, host-memory vertex buffers, or GPU-memory vertex buffers)
The mode of rendering (is the draw primitive command issued fully from the host, or via instancing)
Triangle size
Together, those variables can create much greater than a 2x variation in performance.
Similarly, the hardware on which the application is running may vary 10x or more in performance in the real world: a GPU (or integrated graphics processor) that was low-end in 2005 will perform 10-100x slower in any meaningful metric than a current top-of-the-line GPU.
All told, any recommendation that you use 2000-4000 triangles is so ridiculously outdated that it should be ignored entirely today. Even low-end hardware today can easily push 100,000 triangles in a frame under reasonable conditions. Further, most visually interesting applications today are dominated by pixel-shading performance, not triangle count.
General rules of thumb for achieving good triangle throughput today:
Use [indexed] triangle (or quad) lists
Store data in GPU-memory vertex buffers
Draw large batches with each draw primitives call (thousands of primitives)
Use triangles mostly >= 16 pixels on screen
Don't use the Geometry Shader (especially for geometry amplification)
Do all of those things, and any machine today should be able to render tens or hundreds of thousands of triangles with ease.
According to this page, a polygon is n-sided in Direct3D.
In C#:
public static Mesh Polygon(
    Device device, // device to associate with the mesh
    float length,  // length of each side of the polygon
    int sides      // number of sides
)
As others already said, polygons here means triangles.
The main advantage of triangles is that, since 3 points define a plane, triangles are planar by definition. This means that every point within the triangle is exactly defined as a linear combination of the triangle's vertices. Polygons with more vertices aren't necessarily coplanar, and they don't define a unique surface.
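To make "linear combination" concrete, a small sketch (with a hypothetical Vec3 type): any point of the triangle is a convex combination of its vertices, with barycentric weights u + v + w = 1 and u, v, w >= 0:

struct Vec3 { float x, y, z; };

// Returns u*A + v*B + w*C; the caller is assumed to pass weights summing to 1.
Vec3 barycentricPoint(Vec3 A, Vec3 B, Vec3 C, float u, float v, float w)
{
    return { u*A.x + v*B.x + w*C.x,
             u*A.y + v*B.y + w*C.y,
             u*A.z + v*B.z + w*C.z };
}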
An advantage that matters more in mechanical modeling than in graphics is that triangles are also undeformable: their shape is fully determined by their edge lengths.