Efficient way of computing pytorch gradients? - pytorch

Just for context, I am working on a nerf-like model for dynamic scenes that, for a given point in space, outputs how occupied the space is at this point in space for all the frames in the recorded scene. That means that, if the recorded scene is a set of 300-frames videos (from different viewpoints) then, doing inference on a given XYZ coordinate will return 300 scalar values
Now, here comes the problem. I need to get the normal vector of the surface at a given point in space. This is usually done by computing the gradient of the occupancy with respect to the coordinates.
This is simple when working with models that for each xyz coordinate produce only one scalar value (models for static scenes), but I can’t come up with a way of doing this for my current scenario.
What I would usually do is something like
normal_vectors=-1*torch.autograd.grad(occupancies.sum(), coordinates)
where occupancies is a tensor of size [batch_size,3] and coordinates is a tensor of size [batch_size,300]. It produces a tensor of the same size as coordinates. I would need it to produce a tensor of size [batch_size,300,3], as for each item in coordinates, 300 items are produced and I want the gradient of each one of those with respect to the 3 components of the given item.
The only approach I think may work (untested yet) is something like
normal_vectors=[]
for i in range(300):
normal_vectors.append(-1*torch.autograd.grad(occupancies[:,i].sum(),coordinates))
But this will be painfully slow.
Any alternative approach?

Related

What information is stored in the magnitudes of normal map vectors in UE5?

As I understand it, normal maps simply describe the direction that the surface "faces". As such, it does not make much sense to me for these vectors to have a magnitude other than 1. However, I have come across multiple learning resources suggesting that scaling a normal map is a good way to increase "bumpiness" of a surface.
I would expect the correct approach to be scaling up the bump map, and then generating a new normal map (with normalized vectors). Multiplying normals should not give the same result.
Additionally, I was looking at Unreal's "FlattenNormal" material function. This function lerps between a normal and (0, 0, 1). This clearly does not always result in a normalized vector. This suggests to me that either:
the magnitude of a normal vector does in fact matter;
or normal vectors get normalized before being used.
If 2. is the case, then scaling up normal vectors by a scalar should not achieve anything, right?

Why is a normal vector necessary for STL files?

STL is the most popular 3d model file format for 3d printing. It records triangular surfaces that makes up a 3d shape.
I read the specification the STL file format. It is a rather simple format. Each triangle is represented by 12 float point number. The first 3 define the normal vector, and the next 9 define three vertices. But here's one question. Three vertices are sufficient to define a triangle. The normal vector can be computed by taking the cross product of two vectors (each pointing from a vertex to another).
I know that a normal vector can be useful in rendering, and by including a normal vector, the program doesn't have to compute the normal vectors every time it loads the same model. But I wonder what would happen if the creation software include wrong normal vectors on purpose? Would it produce wrong results in the rendering software?
On the other hand, 3 vertices says everything about a triangle. Include normal vectors will allow logical conflicts in the information and increase the size of file by 33%. Normal vectors can be computed by the rendering software under reasonable amount of time if necessary. So why should the format include it? The format was created in 1987 for stereolithographic 3D printing. Was computing normal vectors to costly to computers back then?
I read in a thread that Autodesk Meshmixer would disregard the normal vector and graph triangles according to the vertices. Providing wrong normal vector doesn't seem to change the result.
Why do Stereolithography (.STL) files require each triangle to have a normal vector?
At least when using Cura to slice a model, the direction of the surface normal can make a difference. I have regularly run into STL files that look just find when rendered as solid objects in any viewer, but because some faces have the wrong direction of the surface normal, the slicer "thinks" that a region (typically concave) which should be empty is part of the interior, and the slicer creates a "top layer" covering up the details of the concave region. (And this was with an STL exported from a Meshmixer file that was imported from some SketchUp source).
FWIW, Meshmixer has a FlipSurfaceNormals tool to help deal with this.

Why a CNN learns different feature maps

I understand (and please correct me if my understanding is wrong) that the primary purpose of a CNN is to reduce the number of parameters from what you would need if you were to use a fully connected NN. And CNN achieves this by extracting "features" of images.
CNN can do this because in a natural image, there are small features such as lines and elementary curves that may occur in an "invariant" fashion, and constitute the image much like elementary building blocks.
My question is: when we create layers of feature maps, say, 5 of them, and we get these by using the sliding window of a size, say, 5x5 on an image that has pixels of, say, 100x100, Initially, these feature maps are initialized as random number weight matrices, and must progressively adjust the weights with gradient descent right? But then, if we are getting these feature maps by using the exactly same sized windows, sliding in exactly the same ways (sharing the same starting point and the same stride value), on the exactly same image, how can these maps learn different features of the image? Won't they all come out the same, say, a line or a curve?
Is it due to the different initial values of the weight matrices? (I.e. some weight matrices are more receptive to learning a certain particular feature than others?)
Thanks!! I wrote my 4 questions/opinions and indexed them, for the ease of addressing them separately!

Calculating average curvature of a polygonal chain with non-uniform segment length

I'm using a polygonal chain to approximate a curve. I want to approximate the average of a function of curvature of all points that lie on the curve. One function of curvature that I need is, for example, the square of curvature.
I can get near that by choosing some points on the chain, calculating the curvature in those points, applying the function on it (for example squaring it), and then averaging the calculated values.
I need both accuracy and speed. I appreciate both — fast, but approximate; as well as accurate, but slow solutions. I'm working in Java, but the answer doesn't need to be written in Java — it doesn't even need to contain any code at all.
Polygonal chain with uniform segment length
If the polygonal chain's segments all have equal length, I can just calculate the curvature in the vertices and then average that. I see two ways to get the curvature in a vertex.
One way is to get the circle that goes through the selected vertex, the vertex before it, and the one after it. The curvature is then 1/radius of the circle.
The other way is to calculate the external angle (in radians) of the two segments connected at the selected vertex and then divide its absolute value by the length of a segment. In the following image, φ marks the external angle:
I am not sure if this method is correct, as I haven't mathematically derived it, but I've noticed through experimentation that it gives similar results to the above method.
Polygonal chain with non-uniform segment length
Unfortunately, though, there's no guarantee that the segments have uniform length.
If I try using the first of the above methods, vertices connected to longer segments give lower curvatures, even if they are visibly sharper. I tried substituting previous and next vertices with a point x units before the selected vertex and a point x units after it. I don't know what to set the x constant to, to get accurate results. All the values I've tried seemed to give inaccurate results.
If I try using the second method, I don't know what length to divide the angle by. If I don't divide by anything at all, I actually get pretty good results for comparing two curves and determining which one is curvier, but I need to be able to determine the actual curvature in a point.
With both of these methods there's also the problem that parts with shorter segments (where points are denser) will affect the average more.
Another possible solution would be to ignore the vertices and instead use an array of points on the chain that are evenly spaced, treat them as a new polygonal chain (connect the points with straight lines), and then calculate curvatures on this new chain instead, using one of the methods I mentioned under the header titled "Polygonal chain with uniform segment length".
Finding such an array of points is not trivial, though, because I have to choose a segment length, and only after producing the points, I can see if the length of the resulting chain is divisible by the chosen segment length.
If you aren't short on space, the last solution you mentioned would be the best, because the "sphere" approximation, as you've perhaps realized, would give awful results in more extreme cases, especially if the curvature is large or changes sign quickly.
There are many ways to do interpolations, the simplest being quadratic and cubic splines. However if you have more pre-processing time, Lagrange polynomials produce very good results: https://en.wikipedia.org/wiki/Lagrange_polynomial.
Side note on your angle division method, consider this diagram:
(From simple geometry the inside angle there is also theta)
For a << l. So the curvature:
So your approximation is in fact correct for small curvatures.
An alternative is to use a local parabola approximation to estimate the curvature. Basically, to estimate the curvature at point P(i), you take P(i-1), P(i) and P(i+1) and construct a parabola from these 3 points. Then, you compute the curvature at P(i) from the parabola. Remember to use chord-length (or centripetal) parametrization when constructing the parabola.

Binary space partition tree for 3D map

I have a project which takes a picture of topographic map and makes it a 3D object.
When I draw the 3D rectangles of the object, it works very slowly. I read about BSP trees and I didn't really understand it. Can someone please explain how to use BSP in 3D (maybe give an example)? and how to use it in my case, when some mountains in the map cover other parts so I need to organize the rectangles in order to draw them well?
In n-D a BSP tree is a spatial partitioning data structure that recursively splits the space into cells using splitting n-D hyperplanes (or even n-D hypersurfaces).
In 2D, the whole space is recursively split with 2D lines (into (possibly infinite) convex polygons).
In 3D, the whole space is recursively split with 3D planes (into (possibly infinite) convex polytopes).
How to build a BSP tree in 3D (from a model)
The model is made of a list of primitives (triangles or quads which is I believe what you call rectangles).
Start with an initial root node in the BSP tree that represents a cell covering the whole 3D space and initially holding all the primitives of your model.
Compute an optimal splitting plane for the considered primitives.
The goal of this step is to find a plane that will split the primitives into two groups of primitives of approximately the same size (either the same spatial extents or the same count of primitives).
A simple splitting strategy could be to chose a direction at random (which will be the normal of your plane) for the splitting. Then sort all the primitives spatially along this axis. And traverse the sorted list of primitives to find the position that will split the primitives into two groups of roughly equal size (i.e. this simply finds the median position from the primitives along this axis). With this direction and this position, the splitting plane is defined.
One typically used splitting strategy is however:
Compute the centroid of all the considered primitives.
Compute the covariance matrix of all the considered primitives.
The centroid gives the position of the splitting plane.
The eigenvector for the largest eigenvalue of the covariance matrix gives the normal of the splitting plane, which is the direction where the primitives are the most spread (and where the current cell should be split).
Split the current node, create two child nodes and assign primitives to each of them or to the current node.
Having found a suitable splitting plane in 1., the 3D space can be now be divided into two half-spaces: one positive, pointed to by the plane normal, and one negative (on the other side of the splitting plane). The goal of this step is to cut in half the considered primitives by assigning the primitives to the half-space where they belong.
Test each primitive of the current node against the splitting plane and assign it to either the left or right child node depending on whether it in the positive half-space or in the negative half-space.
Some primitives may intersect the splitting plane. They can be clipped by the plane into smaller primitives (and maybe also triangulated) so that these smaller primitives are fully inside one of the half-spaces and only belong to one of the cells corresponding to the child nodes. Another option is to simply attach the overlapping primitives to the current node.
Apply recursively this splitting strategy to the created child nodes (and their respective child nodes), until some criterion to stop splitting is met (typically not having enough primitives in the current node).
How to use a BSP tree in 3D
In all use cases, the hierarchical structure of the BSP tree is used to discard irrelevant part of the model for the query.
Locating a point
Traverse the BSP tree with your query point. At each node, go left or right depending on where the query point is located w.r.t. to the splitting plane of the node.
Compute a ray / model intersection
To find all the triangles of your model intersecting a ray (you may need this for picking your map), do something similar to 1.. Traverse the BSP tree with your query ray. At each node, compute the intersection of the ray with the splitting plane. Also check the primitives stored at the node (if any) and report the ones that intersect the ray. Continue traversing the children of this node that whose cell intersect your ray.
Discarding invisible data
Another possible use is to discard pieces of your model that lie outside the view frustum of your camera (that's probably what you are interested in here). The view frustum is exactly bounded by six planes and has 6 quad faces. Like in 1. and 2., you can traverse the BSP tree, check recursively which cell overlaps with the view frustum and completely discard the ones (and the corresponding pieces of your model) that don't. For the plane / view frustum intersection test, you could check whether any of the 6 quads of the view frustum intersect the plane, or you could conservatively approximate the view frustum with a bounding volume (sphere / axis-aligned bounding box / oriented bounding box) or even do a combination of both.
That being said, the solution to your slow rendering problem might be elsewhere (you may not be able to discard a lot of data with a 3D BSP tree for your model):
62K squares is not that big: if you're using OpenGL, you should however not draw these squares individually or continously stream the geometry to the GPU. You can put all the vertices in a single static vertex buffer and draw the quads by preparing a static index buffer containing the list of indices for the squares with either triangles or (better) triangle strips primitives to draw the corresponding squares in a single draw call.
Your data is highly structured (a regular grid with elevation). If you happen to have much larger data sets (that don't even fit in memory anymore), then you need not only spatial partitioning (that exploits the 2.5D structure of your data and its regularity, like a quadtree) but perhaps LOD techniques as well (to replace pieces of your data by a cheaper representation instead of simply discarding the data). You should then investigate LOD techniques for terrain rendering. This page lists a few resources (papers + implementations). A simplified Chunked LOD could be used as a starting point.

Resources