I have been reading a lot about the potential use of sparse voxel octrees in future graphics engines. However, I have been unable to find technical information on them.
I understand what a voxel is; however, I don't know what sparse voxel octrees are or how they are any more efficient than the polygonal techniques in use now.
Could somebody explain or point me to an explanation for this?
Here's a snippet about id Software on this subject.
id Tech 6 will use a more advanced technique that builds upon the MegaTexture idea and virtualizes both the geometry and the textures to obtain unique geometry down to the equivalent of the texel: the Sparse Voxel Octree (SVO).
It works by raycasting the geometry represented by voxels (instead of triangles) stored in an octree.
The goal being to be able to stream parts of the octree into video memory, going further down along the tree for nearby objects to give them more details, and to use higher level, larger voxels for further objects, which give an automatic level of detail (LOD) system for both geometry and textures at the same time.
Also here's a paper on this.
Found more information in this great blog entry.
Well, voxels alone are not that interesting, because for any reasonably detailed model you would need an extremely large number of voxels (if using a uniform grid).
So, a hierarchical system is needed, which brings us to octrees. An octree is a very simple spatial data structure, which subdivides each node into 8 equally large subnodes.
A sparse octree is an octree where most of the nodes are empty, similar to the sparse matrices you get when discretizing differential equations.
An octree node has 8 children: if you imagine a square that was cut into 4 equal quarters, like so
 _____________
|      |      |
|      |      |
|______|______|
|      |      |
|      |      |
|______|______|
then it would be a "quad" (four) tree.
But in 3 dimensions you have yourself a cube rather than a square, so cutting it horizontally, vertically, and along the Z axis, you'll find 8 chunks rather than 4, like so
    _____________
   /     /     / |
  /-----/-----/  |
 /_____/_____/ | |
 |     |     | |/|
 |-----|-----|/| |
 |     |     | |/
 |_____|_____|/
Hope that makes sense.
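In C++, a node of such a sparse octree might look something like this (a minimal sketch; the names are mine, not from any particular engine):

#include <cstdint>
#include <memory>

// One node of a sparse voxel octree: the child mask records which of the
// 8 octants contain anything, so empty space costs a zero bit instead of
// an allocated subtree. That is the "sparse" part.
struct OctreeNode
{
    std::uint8_t childMask = 0;                // bit i set => octant i is occupied
    std::unique_ptr<OctreeNode> children[8];   // allocated only for occupied octants
};

Real SVO implementations typically pack the children of a node contiguously and store small relative offsets instead of full pointers to save memory, but the idea is the same.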
What makes the SVO unique is that it stores voxel information: a point in space with properties such as color, normal, etc.
The idea behind the SVO is to do away with triangles and the need for textures by merging them into a single SVO that contains the voxelized triangle hull (the model) and its surface textures in one object.
The reason an octree is needed here is that a uniform grid structure would otherwise require far too much memory for existing graphics cards to handle.
So using the SVO allows for a sort of mip-mapped 3D texture.
Mipmapping is basically the same image at different scales: the first has the most detail and the last has the least, but looks fairly similar from a distance.
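For example, the number of mip levels follows directly from halving the image until you reach 1x1; a quick sketch (the function name is mine):

#include <algorithm>
#include <cmath>

// Number of mip levels for a texture: keep halving until 1x1.
// A 1024x1024 image, for instance, has 11 levels (1024, 512, ..., 2, 1).
int mipLevelCount(int width, int height)
{
    return 1 + static_cast<int>(std::floor(std::log2(std::max(width, height))));
}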
That way near objects can stream from the SVO with greater detail, while farther objects stream with less detail, at least if you're using ray casting: the farther the ray travels from the camera, the less deeply we dig into our megatexture/SVO.
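The usual traversal rule looks something like this (illustrative only; the names and the one-pixel threshold are my assumptions):

// Stop descending the octree once a voxel would project to roughly one pixel:
// voxelSize / distance approximates the voxel's angular size from the camera.
bool coarseEnough(float voxelSize, float distanceToCamera, float pixelAngle)
{
    return voxelSize / distanceToCamera < pixelAngle;  // far away => stop early, bigger voxels
}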
But if you think outside the box like "Euclideon" with its "unlimited detail", you would just use frustum slicing and plane/AABB intersection, with the projected UV of the sliced billboard to find each texel's color on the screen, as opposed to shooting out rays for Width*Height pixels with NVIDIA's naive "beam optimizations".
PS (sorta off topic): for anyone who doesn't understand how Euclideon does their thing, I believe that's the most practical solution, and I have reason to back up my claim that they DO NOT use ray casting.
The biggest mystery they have isn't rendering but storing their data. RLE simply doesn't cut it, because some volume/voxel data can be fairly random and less "solid", where RLE is useless; compression like that, for me, typically needs at least 5 bytes per element to get anything smaller. They say their compression outputs roughly half of what is put in, so they're using about 2.5 bytes, which is about the same as a triangle nowadays.
An NVIDIA whitepaper named Efficient Sparse Voxel Octrees – Analysis, Extensions, and Implementation describes it in great detail here.
Actually, the 1.15 bits make me suspect they just store things sequentially, in some brilliantly simple way; that is, if they're only storing the volume data and not things like color or texture data as well.
Think about it like this: 1 voxel only needs to be 1 bit: is it there or is it not there? (To be or not to be, in other words :P.) The octree node it's in is made of 8 voxels plus one bit to store whether the node contains anything at all. That's one bit per voxel plus one per 8: 1 + 1/8 = 1.125. Add another parent node with 7 siblings and you get 1 + 1/8 + 1/8/8 = 1.140625. Suspiciously close to the 1.15 they mentioned. Although I'm probably way off, it may give someone some clue.
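If you want to check that estimate, the series is easy to reproduce (a quick sketch):

#include <cstdio>

// Each extra parent level adds 1/8 of the previous level's bits, so the
// bits-per-voxel figure converges to 1 / (1 - 1/8) = 8/7, about 1.1429.
int main()
{
    double bits = 1.0, levelCost = 1.0;
    for (int level = 1; level <= 10; ++level)
    {
        levelCost /= 8.0;   // one occupancy bit is shared by 8^level voxels
        bits += levelCost;
        std::printf("levels %2d: %.6f bits per voxel\n", level, bits);
    }
    return 0;
}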
You can even simply rasterize all the points; you needn't raytrace or raycast these days, since video cards can project obscene amounts of points. You use an octree because it's a cube shape, continually dividing into smaller and smaller cubes (voxels). I have an engine on the way now using a rasterized technique and it's looking good.

For those who say you can't animate voxels, I think they really haven't thought much about the topic; of course it's possible. As I see it, making the world is a lot like "infinite 3D-Coat", so look up 3D-Coat: the level design will be very similar to the way that program works.

The main drawbacks are streaming speed not being fast enough, the raytracing or rasterizing not quite making 60 fps, and plotting the actual voxel objects being very computationally expensive; at the moment I can plot a 1024x1024x1024 sphere in about 12 seconds. But all these problems could be remedied, and it's an exciting future. My maximum world size at the moment is a meg by a meg by a meg, but I might actually make it 8 times bigger than that.

The other problem, which is actually quite serious, is that it takes about 100 meg to store an 8192x8192x8192 character even after compression, so an environment will take even more than that. Then again, saying you're going to have characters at 8192x8192x8192 is completely absurd compared to what we see in games today... an entire world used to be 8192x8192x8192 :)
The way you get away with storing only a few bits per pointer is that the pointers are constructed at runtime in video memory... get your mind around that and you could have your own engine. :)
When I made my rasterizer, I realized that each pixel needed to compare against all the triangles in the model to determine the depth value. But if there are, for example, a million of these triangles, does that mean each individual GPU core must test a million triangles? That would take an incredibly long time, so I would like to know how this problem is avoided. I heard that it is done in hardware, but I didn't understand on what principle.
Depth sorting needs to sort all triangles by perpendicular distance to the camera, and even split intersecting triangles, in order to work correctly. That is a huge amount of work that scales with the number of entities rendered at ~O(n log n), but it does not need much additional memory (unless there are too many splits)... That is why it was used in the past, when memory was scarce and CPUs were slow, so there were only a few entities to render, making it still fast enough... Also, in some edge cases the depth sorting can be done by simple O(1) back-face culling (simple scenes with single convex, non-intersecting polygons, or ones too far from each other to block their view)...
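For reference, the sorting step looks something like this (a sketch; sorting by centroid distance is a common approximation of the perpendicular-distance sort, with the same caveats about intersecting triangles):

#include <algorithm>
#include <vector>

struct Tri { float cx, cy, cz; };   // centroid; real code carries the vertices too

// Painter's algorithm: render far-to-near, so sort triangles by squared
// distance from the camera to their centroids, O(n log n).
void sortBackToFront(std::vector<Tri> &tris, float camX, float camY, float camZ)
{
    auto dist2 = [&](const Tri &t)
    {
        float dx = t.cx - camX, dy = t.cy - camY, dz = t.cz - camZ;
        return dx * dx + dy * dy + dz * dz;
    };
    std::sort(tris.begin(), tris.end(),
              [&](const Tri &a, const Tri &b) { return dist2(a) > dist2(b); });
}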
Nowadays the situation is different: we have very complex scenes with lots of entities, fast CPUs and GPUs, and lots of memory, so depth buffering is used instead. It is O(1) per pixel and pixel perfect, but needs a shadow screen buffer holding the depths, which can be a large chunk of memory... The rendering is done like this:
clear depth buffer with most distant value
This is the slowest operation, but it is done only once per frame and it's just memory filling... Usually done like this:
for (y = 0; y < y_resolution; y++)
 for (x = 0; x < x_resolution; x++)
  {
  depth[y][x] = z_max;            // nothing rendered yet: farthest possible depth
  color[y][x] = background_color; // clear the color buffer as well
  }
in case the buffers are stored as linear arrays you can use memset or even DMA on some platforms for this.
add a condition to pixel rendering and also store the rendered depth
to skip pixels if something was already rendered in front of them, like:
void pixel(int x, int y, int z, int col)
 {
 if (depth[y][x] > z)   // is the new fragment closer than what is stored?
  {
  depth[y][x] = z;      // store new depth value to buffer
  color[y][x] = col;    // render pixel
  }
 }
As this is done by HW, no branch or cache-unfriendly operation is involved ...
This approach outputs 2 images: one holding the colors (the wanted image) and the depth buffer holding the rendered depths, so we still have 3D info. That allows additional processing/effects like ray picking, lighting effects, shadows, scattering and much, much more ...
There are also hybrid techniques using both approaches like this:
OpenGL - How to create Order Independent transparency?
For one of my classes, I made a 3D graphing application (using Visual Basic). It takes in a string (z=f(x,y)) as input, parses it into RPN notation, then evaluates and graphs the equation. While it did work, it took about 20 seconds to graph. I would have liked to add slide bars to rotate the graph vertically and horizontally, but it was definitely too slow to allow that.
Does anyone know what programming languages would be best for this type of thing? Ideally, I will be able to smoothly rotate the function once it is graphed.
Also, I'm trying to find a better way to rotate the function. Right now, I evaluate it at a bunch of points, and then plot the points to the screen. Every time it is rotated, it must be re-evaluated and all the new points plotted. This takes just as long as the original graphing process, as it basically treats the rotated function as a completely new one.
Lastly, I need a better way to display the graph. Currently (using VB with Visual Studio) I plot 200,000 points to a chart, but this does not look great by any means. Eventually, I would like to be able to change color based on height, and use other graphics manipulation to make it look better.
To be clear, I am not asking for someone to do any of this for me, but rather the means to go about coding this in an efficient way. I will greatly appreciate any advice anyone can give to help with any of these three concerns.
So I will explain how I would go about it using C++ and OpenGL. This doesn't mean those are the tools that you must use, it's just those are standard graphics tools.
Your function's surface is essentially a 2D manifold, which has the nice property of having an intuitive mapping to a 2D space. What is commonly referred to as UV mapping.
What you should do is pick the ranges for the rectangular domain you want to display (minimum x, maximum x, minimum y, maximum y) and make 2 nested for loops of the form:
// Pseudocode: sample f over the rectangular domain
for (x = min_x; x < max_x; x += step)
    for (y = min_y; y < max_y; y += step)
        points.push_back(Point3D(x, y, f(x, y)));
Store all of these points in a container (std::vector works fine in C++) and this will be your "mesh".
This is done once, prior to rendering. You then render those points using, for example GL_POINTS, and rotate your graph mesh using rotations on the GPU.
This will only show scattered points, not a surface.
If you also wish to show the surface of your function, and not just the points, you can triangulate that set of points fairly easily.
Group each 4 contiguous vertices (i.e. the vertices at indices <x,y>, <x+1,y>, <x,y+1>, <x+1,y+1>) and create the 2 triangles:
(<x,y>, <x+1,y>, <x,y+1>), (<x+1,y>, <x+1,y+1>, <x,y+1>)
This will fully triangulate the surface of your mesh.
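Index generation for that triangulation can be as simple as this sketch (assuming vertex (x, y) is stored at index y * W + x in your vector):

#include <vector>

// Two triangles per grid cell, exactly as described above.
std::vector<unsigned> gridIndices(unsigned W, unsigned H)
{
    std::vector<unsigned> idx;
    for (unsigned y = 0; y + 1 < H; ++y)
        for (unsigned x = 0; x + 1 < W; ++x)
        {
            unsigned i = y * W + x;
            idx.insert(idx.end(), { i, i + 1, i + W });          // (<x,y>, <x+1,y>, <x,y+1>)
            idx.insert(idx.end(), { i + 1, i + W + 1, i + W });  // (<x+1,y>, <x+1,y+1>, <x,y+1>)
        }
    return idx;
}

You can then hand the vertex and index buffers to OpenGL and draw with GL_TRIANGLES instead of GL_POINTS.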
Essentially you only need to build your mesh once, and this way rendering should hit 60 fps for something with 20,000 vertices, whether you render only the points or the triangles as well.
Programming language is mostly not relevant, so VB itself is probably not the issue. You can have the same issues in Python, C#, C++, etc. Of course you must master the programming language you choose.
One key aspect is using the right algorithms and data structures. Proper use of memory allocation and memory layout to maximize CPU (and GPU) cache usage is also key. Then you must take advantage of the platform and hardware capabilities (GPU and multithreading). For the last point you definitely need to use a graphics library such as OpenGL or Vulkan.
Disclaimer: I'm not 100% on whether this is a well-formed question, so please feel free to comment and suggest improvements. I'll be actively looking out for ways to improve this question.
I have a triangle mesh, let's say the Stanford Bunny. Now, I want to raycast a ray from a source point in 3D along a 3D direction vector, and identify just the first intersection of that ray with the triangle mesh.
I already have a naive implementation cooked up. However, I'm looking for a more advanced implementation. In particular, I'll be casting many millions of rays in many directions, so I'm looking for a multi-threaded or GPU-accelerated implementation.
I have to believe that there must be some pretty complete projects online, as raycasting triangle meshes is a fundamental part of 3D computer graphics. However, I can't find anything beyond personal projects, which leads me to believe that I am using the wrong search terms, or something pretty simple along those lines.
I am looking for suggestions on existing tools that can raytrace polygonal meshes.
If all you need to do is find the distance to the mesh for millions of rays, then it might be a good idea to look up a CUDA raytracing tutorial online. That will show you how to cast many millions of rays. In most tutorials, raytracing is used to render to the screen with the camera matrix; however, this is not necessary. Simply adjust each ray's starting parameters to what you need them to be, such as a 3D direction vector and position, then output the data back to the CPU. Be wary of the bandwidth between the GPU and CPU: sending millions of intersection points between the two can make the program run exceptionally slow.
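Whatever framework you end up with, the per-triangle test at its heart is usually the Möller–Trumbore algorithm; here is a minimal CPU-side sketch (the same function also works as a CUDA __device__ function):

#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Möller–Trumbore ray/triangle intersection: returns true and the distance t
// along the ray when ray (orig, dir) hits triangle (v0, v1, v2).
bool intersect(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float &t)
{
    const float EPS = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < EPS) return false;   // ray parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;   // outside barycentric range
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t >= 0.0f;                         // hit in front of the origin
}

The first hit is then just the minimum t over all candidate triangles, which an acceleration structure (BVH, kd-tree) narrows down for you.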
I am looking for an algorithm that given two meshes could clip one using another.
The simplest form of this is clipping a mesh using a plane. I've already implemented that by following something similar to what is described here.
What it does is basically inspect all mesh vertices and triangles with respect to the plane (the plane's normal and point are given). If a triangle is completely above the plane, it is left untouched. If it falls completely below the plane, it is discarded. If some of the triangle's edges intersect the plane, the intersection points with the plane are calculated and added as new vertices. Finally a cap is generated for the hole where the mesh was cut.
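For reference, the per-vertex classification behind that step is just a signed distance test (a sketch; names are illustrative):

struct Vec3 { float x, y, z; };

// Signed distance from vertex v to the plane through p0 with normal n:
// > 0 above the plane (keep), < 0 below (discard), ~0 on the plane.
float signedDistance(const Vec3 &v, const Vec3 &n, const Vec3 &p0)
{
    return (v.x - p0.x) * n.x + (v.y - p0.y) * n.y + (v.z - p0.z) * n.z;
}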
The problem is that the algorithm assumes that the plane is unlimited, therefore whatever is in its path is clipped. In the simplest form, I need an extension of this without the assumption of a plane of "infinite" size.
To clarify, imagine that we have a 3D model of a desk with 2 boxes on it. The boxes are adjacent (but not touching or stacked). The user will define a cutting plane of a limited width and height underneath the first box and performs the cut. We end up with a desk model (mesh) with a box on it and another box (mesh) that can be freely moved around/manipulated.
In the general form, I'd like the user to be able to define a bounding box for the box he/she wants to separate from the desk model and perform the cut using that bounding box.
If I could extend the algorithm I already have to an algorithm with limited-sized planes, that would be great for now.
What you're looking for are constructive solid geometry/boolean algorithms with arbitrary meshes. It's considerably more complex than slicing meshes by an infinite plane.
Among the earliest and simplest research in this area, and a good starting point, is Constructive Solid Geometry for Polyhedral Objects by Laidlaw, Trumbore, and Hughes.
http://cs.brown.edu/~jfh/papers/Laidlaw-CSG-1986/main.htm
More elaborate solutions extend upon this subject with a variety of data structures.
The real complexity of the operation lies in the slicing algorithm to slice one triangle against another. The nightmare of implementing robust CSG lies in numerical precision. It's easy when you involve objects far more complex than a cube to run into cases where a slice is made just barely next to a vertex (at which point you have the tough decision of merging the new split vertex or not prior to carrying out more splits), where polygons are coplanar (or almost), etc.
So I suggest initially erring on the side of using very high-precision floating point numbers, possibly even higher than double precision to focus on getting something working correctly and robustly. You can optimize later (first pass should be to use an accelerator like an octree/kd-tree/bvh), but you'll avoid many headaches this way in your first iteration.
This is vastly simpler to implement at render time if you're focusing on, e.g., a raytracer rather than modeling software. With raytracers, all you have to do for this kind of arbitrary clipping is pretend that an object used to subtract from another has its polygons flipped during culling. It's easy to solve robustly at the ray level, but quite a bit harder to do robustly at the geometric level.
Another thing you can do to make your life so much easier if you can afford it is to voxelize your object, find subtractions/additions/unions of voxels, and then translate the voxels back into a mesh. This is so much easier to make robust, but harder to do efficiently and the voxel->polygon conversion can get quite involved if you want better results than what marching cubes provide.
It's a really tough area to do extremely well and requires perseverance, and thus the reason for the existence of things like this: http://carve-csg.com/about.
If someone is interested, there is currently a solution for this problem in the CGAL library. It allows clipping one triangular mesh using another mesh as a bounding volume. A usage example can be found here.
Is there any relation (preferably an equation) between the number of polygons in a 3D object and the rendering workload? I want to see how much the rendering workload would be increased if for instance the number of polygons doubles.
There is no clear connection between the arbitrary number of polygons and the mythical "workload".
See the following samples:
You render a cube with 6 faces composed of 12 triangles. You get, say, 1000 fps (without vsync). When you tessellate the cube into 120 triangles, most likely the fps counter remains at 1000.
You render a single fullscreen-sized quad with a heavy fragment shader doing a lot of calculation. You get 0.5 fps (or more, but I hope you get the point).
Another extreme: you render a thousand similar cubes, each with a different texture. The render state changes will take most of the time, not the actual rendering.
So, polygons may cover different screen areas and they may not be rendered within a single primitive. If you're talking about one big vertex array with a large number of polygons, then for certain scenarios the performance change should be roughly linear. "Roughly" because the video card and the drivers clip the invisible polygons and perform early-out tests for each pixel being rendered.
Could you define 'workload'? – Erno
Well, I mean the actual calculations. I want to see how much overhead (for GPU, CPU, memory, ...) would be increased. Actually I want to infer the energy usage of the device. – user1196937
If that is the actual question, then for a comparison of energy usage you will have to pick specific configurations and test them. Energy usage differs greatly from GPU to GPU and machine to machine.
Some GPU manufacturers give very detailed information on the performance of their processors, but when you want to compare them you will need an actual machine.