What is a subspace of a dimension in pytorch?

The documentation of torch.Tensor.view says:
each new view dimension must either be a subspace of an original dimension, or only span across original dimensions ...
https://pytorch.org/docs/stable/tensors.html?highlight=view#torch.Tensor.view
What is a subspace of a dimension?

The 'subspace of an original dimension' dilemma
In order to use tensor.view(), the tensor must satisfy two conditions:
each new view dimension must either be a
subspace of an original dimension
or only span across original dimensions ...
Let's discuss these one by one.
First, regarding "subspace of an original dimension", you need to understand the concept of a subspace. Without going into mathematical detail, in short: a subspace is a subset of an n-dimensional vector space (R^n), and the vectors inside the subspace must follow two rules:
i) The subspace must contain the zero vector (0_n).
ii) It must be closed under scalar multiplication and addition.
To visualise this, consider the 2D plane, which contains infinitely many lines. The subspaces of that 2D vector space are the lines that pass through the origin; these lines satisfy the two conditions above.
Now there is a concept called projection of subspaces. Without digging into too much mathematical detail, you can think of it as a regular line projection, but for subspaces.
Now back to the point: if you have a tensor of size (4, 5), you can consider it as a 20-dimensional vector. In that 20D space the subspaces pass through the origin, and if you want to project a line l1 from a subspace onto any two of the axes, tensor.view() will output the projection of that line with shape (2, 10).
For the second part, you need to understand the concept of contiguous and non-contiguous memory allocation in PyTorch. As it is out of the scope of the question, I will explain it only briefly. If you view an n-element vector as (n/k, k) and run tensor.stride() on the new tensor, it will show you the stride of the memory layout along each dimension. Now, if you call view() again with different dimensions, then the following equation must hold for the conversion to succeed without copying (this is the contiguity-like condition stated in the view documentation): stride[i] = stride[i+1] * size[i+1].
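A small illustration of this stride bookkeeping (the shapes are illustrative):

import torch

x = torch.arange(20).view(4, 5)
print(x.stride())   # (5, 1): move 5 elements for one row, 1 for one column
y = x.view(2, 10)
print(y.stride())   # (10, 1): stride[0] == stride[1] * size[1] holds
z = x.t()           # transpose makes the tensor non-contiguous
print(z.stride())   # (1, 5)
# z.view(20) would raise an error because the condition no longer holds;
# z.contiguous().view(20) or z.reshape(20) works instead.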
I tried my best to explain it in brief, let me know if you have more questions.

After some thought, I think the sentence could be interpreted as follows (despite not being mathematically formal).
Suppose we have a tensor t of size (4, 6), which can be seen as an ordered set of four 6d row vectors residing in a vector space.
We can perform a tensor view v = t.view(4, 2, 3). v now has two new view dimensions, of sizes 2 and 3. We can see it as an ordered set containing four ordered sets of two 3d vectors (or of three 2d vectors, depending on how we group them).
Such new, smaller vectors can be mathematically seen as projections of the original 6-element vectors onto a vector subspace.
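A minimal illustration of that reading:

import torch

t = torch.arange(24).view(4, 6)   # four 6d row vectors
v = t.view(4, 2, 3)               # four ordered sets of two 3d vectors
print(t[0])                       # tensor([0, 1, 2, 3, 4, 5])
print(v[0])                       # tensor([[0, 1, 2], [3, 4, 5]])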

Related

Efficient way of computing pytorch gradients?

Just for context, I am working on a NeRF-like model for dynamic scenes that, for a given point in space, outputs how occupied that point is for all the frames in the recorded scene. That means that if the recorded scene is a set of 300-frame videos (from different viewpoints), then doing inference on a given XYZ coordinate will return 300 scalar values.
Now, here comes the problem. I need to get the normal vector of the surface at a given point in space. This is usually done by computing the gradient of the occupancy with respect to the coordinates.
This is simple when working with models that for each xyz coordinate produce only one scalar value (models for static scenes), but I can’t come up with a way of doing this for my current scenario.
What I would usually do is something like
normal_vectors = -torch.autograd.grad(occupancies.sum(), coordinates)[0]
where occupancies is a tensor of size [batch_size, 300] and coordinates is a tensor of size [batch_size, 3]. It produces a tensor of the same size as coordinates. I would need it to produce a tensor of size [batch_size, 300, 3], as for each item in coordinates, 300 items are produced and I want the gradient of each one of those with respect to the 3 components of the given item.
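For reference, a minimal sketch of the usual single-output (static scene) case; the occupancy_model stand-in and the shapes are illustrative:

import torch

# Illustrative stand-in for an occupancy network mapping xyz -> one scalar per point.
def occupancy_model(xyz):
    return (xyz ** 2).sum(dim=-1)

coordinates = torch.rand(8, 3, requires_grad=True)   # [batch_size, 3]
occupancies = occupancy_model(coordinates)            # [batch_size]

# Summing works here because each occupancy depends only on its own coordinate row,
# so d(sum)/d(coordinates) recovers the per-point gradient.
normal_vectors = -torch.autograd.grad(occupancies.sum(), coordinates)[0]  # [batch_size, 3]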
The only approach I think may work (untested yet) is something like
normal_vectors = []
for i in range(300):
    # retain_graph=True so the graph can be reused across iterations
    normal_vectors.append(-torch.autograd.grad(occupancies[:, i].sum(), coordinates, retain_graph=True)[0])
But this will be painfully slow.
Any alternative approach?

Random Index from a Tensor (Sampling with Replacement from a Tensor)

I'm trying to manipulate individual weights of different neural nets to see how their performance degrades. As part of these experiments, I'm required to sample randomly from their weight tensors, which I've come to understand as sampling with replacement (in the statistical sense). However, since it's high-dimensional, I've been stumped by how to do this in a fair manner. Here are the approaches and research I've put into considering this problem:
This was previously implemented by selecting a random layer and then selecting a random weight in that layer (ignore the implementation of picking a random weight). Since layers are different sizes, we discovered that weights were being sampled unevenly.
I considered what would happen if we sampled according to the numpy.shape of the tensor; however, I realize now that this encounters the same problem as above.
Consider what happens to a rank 2 tensor like this:
[[*, *, *],
[*, *, *, *]]
Selecting a row randomly and then a value from that row results in an unfair selection. This method could work if you're able to assert that this scenario never occurs, but it's far from a general solution.
Note that this possible duplicate actually implements it in this fashion.
I found people suggesting flattening the tensor and using numpy.random.choice to select randomly from a 1D array. That's a simple solution, except I have no idea how to invert the flattened tensor back into its original shape. Further, flattening millions of weights would be a somewhat slow implementation.
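For a regular (non-ragged) tensor, a minimal sketch of that idea; numpy.unravel_index maps the flat index back to a multi-dimensional index, so no actual reshaping is needed (the array name and shape are illustrative):

import numpy as np

weights = np.random.rand(4, 5, 3)                     # illustrative weight tensor
flat_index = np.random.choice(weights.size)           # uniform over all elements
multi_index = np.unravel_index(flat_index, weights.shape)
sampled_value = weights[multi_index]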
I found someone discussing tf.random.multinomial here, but I don't understand enough of it to know whether it's applicable or not.
I ran into this paper about reservoir sampling, but again, it went over my head.
I found another paper which specifically discusses tensors and sampling techniques, but it went even further over my head.
A teammate found this other paper which talks about random sampling from a tensor, but it's only for rank 3 tensors.
Any help understanding how to do this? I'm working in Python with Keras, but I'll take an algorithm in any form that it exists. Thank you in advance.
Before I forget to document the solution we arrived at, I'll talk about the two different paths I see for implementing this:
Use a total ordering on scalar elements of the tensor. This is effectively enumerating your elements, i.e. flattening them. However, you can do this while maintaining the original shape. Consider this pseudocode (in Python-like syntax):
from typing import Tuple

def sample_tensor(tensor, chosen_index: int) -> Tuple[int, ...]:
    """Maps a chosen random number to its index in the given tensor.

    Args:
        tensor: A ragged-array n-tensor.
        chosen_index: An integer in [0, num_scalar_elements_in_tensor).

    Returns:
        The index that accesses this element in the tensor.

    NOTE: Entirely untested, expect it to be fundamentally flawed.
    """
    remaining = chosen_index
    for i, sub_list in enumerate(tensor):
        if isinstance(sub_list, (list, tuple)):
            if len(sub_list) <= remaining:
                # len() only counts direct children, not scalar leaves;
                # see the caveat about measuring nested lists below.
                remaining -= len(sub_list)
            else:
                return (i,) + sample_tensor(sub_list, remaining)
        else:
            if remaining == 0:
                return (i,)
            remaining -= 1
First of all, I'm aware this isn't a sound algorithm. The idea is to count down until you reach your element, with bookkeeping for indices.
We need to make crucial assumptions here. 1) All lists will eventually contain only scalars. 2) By direct consequence, if a list contains lists, assume that it doesn't also contain scalars at the same level. (Stop and convince yourself of (2).)
We also need to make a critical note here too: we are unable to measure the number of scalars in any given list, unless the list consists homogeneously of scalars. In order to avoid measuring this magnitude at every point, my algorithm above should be refactored to descend first, and subtract later.
This algorithm has some consequences:
It's the fastest in its entire style of approaching the problem. If you want to write a function f: [0, total_elems) -> Tuple[int], you must know the number of preceding scalar elements along the total ordering of the tensor. This is effectively bound at Theta(l) where l is the number of lists in the tensor (since we can call len on a list of scalars).
It's slow. It's too slow compared to sampling nicer tensors that have a defined shape to them.
It begs the question: can we do better? See the next solution.
Use a probability distribution in conjunction with numpy.random.choice. The idea here is that if we know ahead of time what the distribution of scalars is already like, we can sample fairly at each level of descending the tensor. The hard problem here is building this distribution.
I won't write pseudocode for this, but lay out some objectives:
This can be called only once to build the data structure.
The algorithm needs to combine iterative and recursive techniques to a) build distributions for sibling lists and b) build distributions for descendants, respectively.
The algorithm will need to map indices to a probability distribution respective to sibling lists (note the assumptions discussed above). This does require knowing the number of elements in an arbitrary sub-tensor.
At lower levels where lists contain only scalars, we can simplify by just storing the number of elements in said list (as opposed to storing probabilities of selecting scalars randomly from a 1D array).
You will likely need 2-3 functions: one that utilizes the probability distribution to return an index, a function that builds the distribution object, and possibly a function that just counts elements to help build the distribution.
This is also faster at O(n) where n is the rank of the tensor. I'm convinced this is the fastest possible algorithm, but I lack the time to try to prove it.
You might choose to store the distribution as an ordered dictionary that maps a probability to either another dictionary or the number of elements in a 1D array. I think this might be the most sensible structure.
Note that (2) is truly the same as (1), but we pre-compute knowledge about the densities of the tensor.
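As an illustration only (not necessarily the exact structure described above), a minimal sketch of approach (2) for a ragged nested Python list, where subtree sizes are precomputed once and numpy.random.choice descends level by level:

import numpy as np

def build_dist(tensor):
    # Precompute, for each nested list, the number of scalar leaves under it.
    if not isinstance(tensor, list):
        return 1
    children = [build_dist(sub) for sub in tensor]
    total = sum(c if isinstance(c, int) else c[0] for c in children)
    return (total, children)

def sample_index(tensor, dist):
    if not isinstance(tensor, list):
        return ()
    _, children = dist
    sizes = np.array([c if isinstance(c, int) else c[0] for c in children], dtype=float)
    i = np.random.choice(len(tensor), p=sizes / sizes.sum())
    return (int(i),) + sample_index(tensor[i], children[i])

ragged = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
dist = build_dist(ragged)            # built once
idx = sample_index(ragged, dist)     # every scalar equally likely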
I hope this helps.

Binary space partition tree for 3D map

I have a project which takes a picture of topographic map and makes it a 3D object.
When I draw the 3D rectangles of the object, it works very slowly. I read about BSP trees and I didn't really understand them. Can someone please explain how to use a BSP tree in 3D (maybe with an example)? And how can I use it in my case, where some mountains in the map cover other parts, so I need to order the rectangles to draw them correctly?
In n-D a BSP tree is a spatial partitioning data structure that recursively splits the space into cells using splitting n-D hyperplanes (or even n-D hypersurfaces).
In 2D, the whole space is recursively split with 2D lines (into (possibly infinite) convex polygons).
In 3D, the whole space is recursively split with 3D planes (into (possibly infinite) convex polytopes).
How to build a BSP tree in 3D (from a model)
The model is made of a list of primitives (triangles or quads, which I believe is what you call rectangles).
Start with an initial root node in the BSP tree that represents a cell covering the whole 3D space and initially holding all the primitives of your model.
Compute an optimal splitting plane for the considered primitives.
The goal of this step is to find a plane that will split the primitives into two groups of primitives of approximately the same size (either the same spatial extents or the same count of primitives).
A simple splitting strategy could be to choose a direction at random (which will be the normal of your plane). Then sort all the primitives spatially along this axis, and traverse the sorted list to find the position that splits the primitives into two groups of roughly equal size (i.e. this simply finds the median position of the primitives along this axis). With this direction and this position, the splitting plane is defined.
One typically used splitting strategy, however, is the following (a small sketch follows this list):
Compute the centroid of all the considered primitives.
Compute the covariance matrix of all the considered primitives.
The centroid gives the position of the splitting plane.
The eigenvector for the largest eigenvalue of the covariance matrix gives the normal of the splitting plane, which is the direction where the primitives are the most spread (and where the current cell should be split).
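A hedged sketch of that strategy, assuming primitive_centers is an (N, 3) array of per-primitive centroids:

import numpy as np

def splitting_plane(primitive_centers):
    centroid = primitive_centers.mean(axis=0)          # a point on the plane
    cov = np.cov(primitive_centers, rowvar=False)      # 3x3 covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)    # symmetric matrix -> eigh
    normal = eigenvectors[:, np.argmax(eigenvalues)]   # direction of largest spread
    return centroid, normal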
Split the current node, create two child nodes and assign primitives to each of them or to the current node.
Having found a suitable splitting plane in 1., the 3D space can now be divided into two half-spaces: one positive, pointed to by the plane normal, and one negative (on the other side of the splitting plane). The goal of this step is to split the considered primitives into two halves by assigning each primitive to the half-space where it belongs.
Test each primitive of the current node against the splitting plane and assign it to either the left or right child node depending on whether it is in the positive or the negative half-space.
Some primitives may intersect the splitting plane. They can be clipped by the plane into smaller primitives (and maybe also triangulated) so that these smaller primitives are fully inside one of the half-spaces and only belong to one of the cells corresponding to the child nodes. Another option is to simply attach the overlapping primitives to the current node.
Apply this splitting strategy recursively to the created child nodes (and their respective child nodes), until some criterion to stop splitting is met (typically, not having enough primitives in the current node).
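Putting these steps together, a skeletal build function might look like this (splitting_plane is the helper sketched above; primitives are classified by their centroids for simplicity, and min_primitives is an illustrative stopping criterion):

import numpy as np

class BSPNode:
    def __init__(self):
        self.plane = None        # (point, normal) for internal nodes
        self.primitives = []     # primitives stored at this node
        self.front = None        # child for the positive half-space
        self.back = None         # child for the negative half-space

def build_bsp(primitives, centers, min_primitives=16):
    node = BSPNode()
    if len(primitives) <= min_primitives:
        node.primitives = list(primitives)
        return node
    point, normal = splitting_plane(centers)
    side = np.sign((centers - point) @ normal)    # +1 front, -1 back per primitive
    front = [p for p, s in zip(primitives, side) if s >= 0]
    back = [p for p, s in zip(primitives, side) if s < 0]
    if not front or not back:                     # degenerate split: make a leaf
        node.primitives = list(primitives)
        return node
    node.plane = (point, normal)
    node.front = build_bsp(front, centers[side >= 0], min_primitives)
    node.back = build_bsp(back, centers[side < 0], min_primitives)
    return node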
How to use a BSP tree in 3D
In all use cases, the hierarchical structure of the BSP tree is used to discard irrelevant part of the model for the query.
Locating a point
Traverse the BSP tree with your query point. At each node, go left or right depending on where the query point is located w.r.t. the splitting plane of the node.
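A minimal sketch, using the BSPNode structure sketched above:

import numpy as np

def locate(node, point):
    # Descend to the leaf cell containing the query point.
    while node.plane is not None:
        plane_point, normal = node.plane
        node = node.front if (np.asarray(point) - plane_point) @ normal >= 0 else node.back
    return node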
Compute a ray / model intersection
To find all the triangles of your model intersecting a ray (you may need this for picking on your map), do something similar to 1. Traverse the BSP tree with your query ray. At each node, compute the intersection of the ray with the splitting plane. Also check the primitives stored at the node (if any) and report the ones that intersect the ray. Continue traversing the children of this node whose cells intersect your ray.
Discarding invisible data
Another possible use is to discard pieces of your model that lie outside the view frustum of your camera (that's probably what you are interested in here). The view frustum is exactly bounded by six planes and has 6 quad faces. Like in 1. and 2., you can traverse the BSP tree, check recursively which cell overlaps with the view frustum and completely discard the ones (and the corresponding pieces of your model) that don't. For the plane / view frustum intersection test, you could check whether any of the 6 quads of the view frustum intersect the plane, or you could conservatively approximate the view frustum with a bounding volume (sphere / axis-aligned bounding box / oriented bounding box) or even do a combination of both.
That being said, the solution to your slow rendering problem might be elsewhere (you may not be able to discard a lot of data with a 3D BSP tree for your model):
62K squares is not that big: if you're using OpenGL, you should however not draw these squares individually or continuously stream the geometry to the GPU. You can put all the vertices in a single static vertex buffer and draw the quads by preparing a static index buffer containing the list of indices for the squares, with either triangle or (better) triangle strip primitives, so that the corresponding squares are drawn in a single draw call (a small index-generation sketch follows below).
Your data is highly structured (a regular grid with elevation). If you happen to have much larger data sets (that don't even fit in memory anymore), then you need not only spatial partitioning (that exploits the 2.5D structure of your data and its regularity, like a quadtree) but perhaps LOD techniques as well (to replace pieces of your data by a cheaper representation instead of simply discarding the data). You should then investigate LOD techniques for terrain rendering. This page lists a few resources (papers + implementations). A simplified Chunked LOD could be used as a starting point.
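Regarding the first point, a hedged sketch of generating one triangle-list index buffer for a width x height elevation grid, so the terrain can be drawn with a single indexed draw call (the grid layout is assumed row-major):

import numpy as np

def grid_indices(width, height):
    indices = []
    for y in range(height - 1):
        for x in range(width - 1):
            i = y * width + x
            indices += [i, i + 1, i + width]              # first triangle of the quad
            indices += [i + 1, i + width + 1, i + width]  # second triangle of the quad
    return np.asarray(indices, dtype=np.uint32)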

find orthonormal basis for a planar 3D ( possibly degenerate) polygon

Given a general planar 3D polygon, is there a general way to find the orthonormal basis for that planar polygon?
The most straightforward way is to take the first 3 points of the polygon and form two vectors from them; these give the two basis vectors we are looking for (after orthonormalisation). The problem with this approach is that these 3 points may lie on the same line in the polygon, and hence instead of getting two independent vectors, we get only one.
Another approach to find the second basis vector is to loop through the polygon and find another point that forms a vector different from the first one, but this approach is susceptible to numerical errors (e.g. what if the second vector is almost parallel to the first vector? The numerical errors can be significant).
Is there any other better approach?
You can use the cross product of any two edges formed by the polygon's vertices. If the cross product is too small, then you're in degenerate territory.
You can also take the centroid (the average of the points, which is guaranteed to lie on the same plane) and pick the largest cross product among the vectors from the centroid to the vertices. This will give the most accurate normal. Please note that if the largest cross product is small, you may have an inaccurate normal.
If you can't find any cross product that isn't close to 0, your original poly is degenerate and a normal will be hard to find. You could use arbitrary precision or adaptive precision algebra in this case, but, of course, the round-off error is already significant in the source data, so this may not help. If possible, remove degenerate polys first, and if you have to, sew the mesh back up :).
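A hedged sketch of the centroid / largest-cross-product idea, assuming points is an (N, 3) array of the polygon's vertices:

import numpy as np

def polygon_normal(points):
    centroid = points.mean(axis=0)
    spokes = points - centroid                   # vectors from centroid to each vertex
    best_normal, best_norm = None, 0.0
    for i in range(len(spokes)):
        for j in range(i + 1, len(spokes)):
            n = np.cross(spokes[i], spokes[j])
            norm = np.linalg.norm(n)
            if norm > best_norm:
                best_normal, best_norm = n, norm
    if best_norm < 1e-12:                        # all points (nearly) collinear
        return None                              # degenerate polygon
    return best_normal / best_norm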
It's a bit over the top, but one way would be to compute the covariance matrix of the points and then diagonalise it. If the points are indeed planar, then one of the eigenvalues of the covariance matrix will be zero (or rather very small, due to finite precision arithmetic) and the corresponding eigenvector will be a normal to the plane; the other two eigenvectors will span the plane of the polygon.
If you have N points, and the i'th coordinate of the k'th point is p[k,i], then the mean (vector) and (3x3) covariance matrix can be computed by
m[i] = Sum{ k | p[k,i]}/N (i=1..3)
C[i,j] = Sum{ k | (p[k,i]-m[i])*(p[k,j]-m[j]) }/N (i,j=1..3)
Note that C is symmetric, so to find out how to diagonalise it you might want to look up the "symmetric eigenvalue problem".
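A hedged numpy sketch of this approach, again assuming points is an (N, 3) array of the polygon's vertices:

import numpy as np

def planar_basis(points):
    m = points.mean(axis=0)
    centered = points - m
    C = centered.T @ centered / len(points)          # 3x3 covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)    # eigenvalues in ascending order
    normal = eigenvectors[:, 0]                      # smallest eigenvalue ~ 0 for planar points
    u, v = eigenvectors[:, 1], eigenvectors[:, 2]    # orthonormal vectors spanning the plane
    return u, v, normal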

process 35 x 35 kernel using convolution method

Dear all, I would like to do a convolution using a 35 x 35 kernel. Any suggestions, or is there a method already in OpenCV that I can use? Currently cvFilter2D only supports kernels up to 10 x 10.
If you just need a quick-and-dirty solution due to OpenCV's size limitation, you can divide the 35x35 kernel into a 5x5 set of 7x7 "kernel tiles", apply each "kernel tile" to the image to get a partial output, then shift the partial outputs and sum them to get the final result.
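A hedged sketch of that tiling idea, using SciPy as a stand-in for the size-limited filter; it accumulates each tile's full convolution at the tile's offset within the kernel:

import numpy as np
from scipy.signal import convolve2d

def tiled_convolve(image, kernel, tile=7):
    kh, kw = kernel.shape                        # e.g. 35 x 35
    h, w = image.shape
    out = np.zeros((h + kh - 1, w + kw - 1))     # "full" convolution result
    for r0 in range(0, kh, tile):
        for c0 in range(0, kw, tile):
            part = convolve2d(image, kernel[r0:r0 + tile, c0:c0 + tile], mode="full")
            out[r0:r0 + h + tile - 1, c0:c0 + w + tile - 1] += part
    return out

# Sanity check against a direct full convolution:
img = np.random.rand(64, 64)
ker = np.random.rand(35, 35)
assert np.allclose(tiled_convolve(img, ker), convolve2d(img, ker, mode="full"))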
General suggestions for convolution with large 2D kernels:
Try to use kernels that are separable, i.e. a kernel that is the outer product of a column vector and a row vector. In other words, the matrix that represents the kernel is rank-1.
Try the FFT method. Convolution in the spatial domain is equivalent to elementwise multiplication in the frequency domain.
If the kernel is full-rank and for the application's purpose it cannot be modified, then consider using SVD to decompose the kernel into a set of 35 rank-1 matrices (each of which can be expressed as the outer product of a column vector and a row vector), and perform convolution only with the matrices associated with the largest singular values. This introduces errors into the results, but the error can be estimated based on the singular values. (a.k.a. the MATLAB method)
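A hedged sketch of the SVD idea, keeping only the top-k rank-1 terms and applying each as a separable (column then row) convolution; SciPy is used for the passes and the names are illustrative:

import numpy as np
from scipy.signal import convolve2d

def lowrank_convolve(image, kernel, k=3):
    U, S, Vt = np.linalg.svd(kernel)
    out = np.zeros(image.shape)
    for i in range(k):
        col = (np.sqrt(S[i]) * U[:, i])[:, None]    # column filter (35 x 1)
        row = (np.sqrt(S[i]) * Vt[i, :])[None, :]   # row filter (1 x 35)
        out += convolve2d(convolve2d(image, col, mode="same"), row, mode="same")
    return out  # the discarded singular values bound the approximation error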
Other special cases:
Convolution with kernels that can be expressed as a sum of overlapping rectangular blocks can be computed using the integral image (the method used in Viola-Jones face detection).
Kernels that are smooth and modal (with a small number of peaks) can be approximated by a sum of 2D Gaussians.

Resources