Dear all, I would like to do a convolution using a 35 x 35 kernel. Any suggestion? or any method already in opencv i can use? Because now the cvfilter2d can only support until 10 x 10 kernel.
If you just need quick-and-dirty solution due to OpenCV's size limitation, then you can divide the 35x35 kernel into a 5x5 set of 7x7 "kernel tiles", apply each "kernel tile" to the image to get an output, then shift the result and combine them to get the final sum.
General suggestions for convolution with large 2D kernels:
Try to use kernels that are separable, i.e. a kernel that is the outer product of a column vector and a row vector. In other words, the matrix that represents the kernel is rank-1.
Try use the FFT method. Convolution in the spatial domain is the same as elementwise conjugate multiplication in the frequency domain.
If the kernel is full-rank and for the application's purpose it cannot be modified, then consider using SVD to decompose the kernel into a set of 35 rank-1 matrices (each of which can be expressed as the outer product of a column vector and a row vector), and perform convolution only with the matrices associated with the largest singular values. This introduces errors into the results, but the error can be estimated based on the singular values. (a.k.a. the MATLAB method)
Other special cases:
Kernels that can be expressed as sum of overlapping rectangular blocks can be computed using the integral image (the method used in Viola-Jones face detection).
Kernels that are smooth and modal (with a small number of peaks) can be approximated by sum of 2D Gaussians.
Related
I have a 5D input tensor and I also know what 5D tensor I need as the output. How can I find out if I would need padding or not? Also, how can I calculate the kernel size and stride size? There are some formulations but stride size and kernel sizes would contain 3 elements like (a, b, c). When it comes to calculate these multiple elements, solving those equations would be more complicated. How can I compute thos elements?
For example, if I have a tensor of size [16,1024,16,14,14], how can I calculate the stride and kernel sizes in a conv3d if I want an output with the size of [16,32,16,3,5]? How can I know that if padding is needed?
Is there any website that automatically can calculate these parameters?
The documentation of torch.Tensor.view says:
each new view dimension must either be a subspace of an original dimension, or only span across original dimensions ...
https://pytorch.org/docs/stable/tensors.html?highlight=view#torch.Tensor.view
What is a subspace of a dimension?
The 'subspace of an original dimension' dilemma
In order to use tensor.view() the tensor must satisfy two conditions-
each new view dimension must either be a
subspace of an original dimension
or only span across original dimensions ...
Lets discuss this one by one,
First, regarding subspace of an original dimension you need to understand the concept behind subspace. Not going into mathematical detail but in short - subspace is a subset of infinite number of n dimensional vectors(Rn) and the vectors inside the subspace must follow 2 rules -
i) Sub space will contain the Zero vector(0n)
ii) Must satisfy closure under Multiplication and addition
To visualise this in mind you can consider a 2D plane containing infinite lines. So the subspace of that 2D vector space will be set of lines which will pass through origin. These lines satisfies above two conditions.
Now there is a concept called projection of subspaces. Without digging into too much mathematical detail you can consider it as a regular line projection but for subspaces.
Now back to the point, lets assume if you have a tensor of size (4,5), you can actually consider it as a 20 dimensional vector. Assume you have a 20D space, and the subspaces will pass through the origin, and if you want to make projection of any line l1 from subspace with respect to any 2 axes, tensor.view() will output projection of that line with (2,10).
For the second part, you need to understand the concept of contiguous and non-contiguous memory allocation in pytorch. As its out of scope of the question I am going to explain it in very brief. If you view a n dimensional vector as (n/k, k) and if you run tensor.stride() on the new vector it will show you the stride for memory allocation in x and y direction. Now if you run view() again with different dimensions then this following equation must hold true for successful conversion due to non-contiguous memory allocation.
I tried my best to explain it in brief, let me know if you have more questions.
After some thought, I think the sentence could be interpreted as follows (despite not being mathematically formal).
Suppose to have a tensor t of size (4, 6), that can be seen as an ordered set of four 6d row vectors residing in a vector space.
We can perform a tensor view v = t.view(4, 2, 3). v has now two new view dimensions: 2 and 3. We can see it as an ordered set containing four ordered sets of two 3d vectors, or three 2d vectors, depending on how we consider them.
Such new, smaller vectors can be mathematically seen as projections of the original 6-element vectors onto a vector subspace.
I'm trying to manipulate individual weights of different neural nets to see how their performance degrades. As part of these experiments, I'm required to sample randomly from their weight tensors, which I've come to understand as sampling with replacement (in the statistical sense). However, since it's high-dimensional, I've been stumped by how to do this in a fair manner. Here are the approaches and research I've put into considering this problem:
This was previously implemented by selecting a random layer and then selecting a random weight in that layer (ignore the implementation of picking a random weight). Since layers are different sizes, we discovered that weights were being sampled unevenly.
I considered what would happen if we sampled according to the numpy.shape of the tensor; however, I realize now that this encounters the same problem as above.
Consider what happens to a rank 2 tensor like this:
[[*, *, *],
[*, *, *, *]]
Selecting a row randomly and then a value from that row results in an unfair selection. This method could work if you're able to assert that this scenario never occurs, but it's far from a general solution.
Note that this possible duplicate actually implements it in this fashion.
I found people suggesting flattening the tensor and use numpy.random.choice to select randomly from a 1D array. That's a simple solution, except I have no idea how to invert the flattened tensor back into its original shape. Further, flattening millions of weights would be a somewhat slow implementation.
I found someone discussing tf.random.multinomial here, but I don't understand enough of it to know whether it's applicable or not.
I ran into this paper about resevoir sampling, but again, it went over my head.
I found another paper which specifically discusses tensors and sampling techniques, but it went even further over my head.
A teammate found this other paper which talks about random sampling from a tensor, but it's only for rank 3 tensors.
Any help understanding how to do this? I'm working in Python with Keras, but I'll take an algorithm in any form that it exists. Thank you in advance.
Before I forget to document the solution we arrived at, I'll talk about the two different paths I see for implementing this:
Use a total ordering on scalar elements of the tensor. This is effectively enumerating your elements, i.e. flattening them. However, you can do this while maintaining the original shape. Consider this pseudocode (in Python-like syntax):
def sample_tensor(tensor, chosen_index: int) -> Tuple[int]:
"""Maps a chosen random number to its index in the given tensor.
Args:
tensor: A ragged-array n-tensor.
chosen_index: An integer in [0, num_scalar_elements_in_tensor).
Returns:
The index that accesses this element in the tensor.
NOTE: Entirely untested, expect it to be fundamentally flawed.
"""
remaining = chosen_index
for (i, sub_list) in enumerate(tensor):
if type(sub_list) is an iterable:
if |sub_list| > remaining:
remaining -= |sub_list|
else:
return i joined with sample_tensor(sub_list, remaining)
else:
if len(sub_list) <= remaining:
return tuple(remaining)
First of all, I'm aware this isn't a sound algorithm. The idea is to count down until you reach your element, with bookkeeping for indices.
We need to make crucial assumptions here. 1) All lists will eventually contain only scalars. 2) By direct consequence, if a list contains lists, assume that it also doesn't contain scalars at the same level. (Stop and convince yourself for (2).)
We also need to make a critical note here too: We are unable to measure the number of scalars in any given list, unless the list is homogeneously consisting of scalars. In order to avoid measuring this magnitude at every point, my algorithm above should be refactored to descend first, and subtract later.
This algorithm has some consequences:
It's the fastest in its entire style of approaching the problem. If you want to write a function f: [0, total_elems) -> Tuple[int], you must know the number of preceding scalar elements along the total ordering of the tensor. This is effectively bound at Theta(l) where l is the number of lists in the tensor (since we can call len on a list of scalars).
It's slow. It's too slow compared to sampling nicer tensors that have a defined shape to them.
It begs the question: can we do better? See the next solution.
Use a probability distribution in conjunction with numpy.random.choice. The idea here is that if we know ahead of time what the distribution of scalars is already like, we can sample fairly at each level of descending the tensor. The hard problem here is building this distribution.
I won't write pseudocode for this, but lay out some objectives:
This can be called only once to build the data structure.
The algorithm needs to combine iterative and recursive techniques to a) build distributions for sibling lists and b) build distributions for descendants, respectively.
The algorithm will need to map indices to a probability distribution respective to sibling lists (note the assumptions discussed above). This does require knowing the number of elements in an arbitrary sub-tensor.
At lower levels where lists contain only scalars, we can simplify by just storing the number of elements in said list (as opposed to storing probabilities of selecting scalars randomly from a 1D array).
You will likely need 2-3 functions: one that utilizes the probability distribution to return an index, a function that builds the distribution object, and possibly a function that just counts elements to help build the distribution.
This is also faster at O(n) where n is the rank of the tensor. I'm convinced this is the fastest possible algorithm, but I lack the time to try to prove it.
You might choose to store the distribution as an ordered dictionary that maps a probability to either another dictionary or the number of elements in a 1D array. I think this might be the most sensible structure.
Note that (2) is truly the same as (1), but we pre-compute knowledge about the densities of the tensor.
I hope this helps.
Background so I don't throw out an XY problem -- I'm trying to check the goodness of fit of a GMM because I want statistical back-up for why I'm choosing the number of clusters I've chosen to group these samples. I'm checking AIC, BIC, entropy, and root mean squared error. This question is about entropy.
I've used kmeans to cluster a bunch of samples, and I want an entropy greater than 0.9 (stats and psychology are not my expertise and this problem is both). I have 59 samples; each sample has 3 features in it. I look for the best covariance type via
for cv_type in cv_types:
for n_components in n_components_range:
# Fit a Gaussian mixture with EM
gmm = mixture.GaussianMixture(n_components=n_components,
covariance_type=cv_type)
gmm.fit(data3)
where the n_components_range is just [2] (later I'll check 2 through 5).
Then I take the GMM with the lowest AIC or BIC, saved as best_eitherAB, (not shown) of the four. I want to see if the label assignments of the predictions are stable across time (I want to run for 1000 iterations), so I know I then need to calculate the entropy, which needs class assignment probabilities. So I predict the probabilities of the class assignment via gmm's method,
probabilities = best_eitherAB.predict_proba(data3)
all_probabilities.append(probabilities)
After all the iterations, I have an array of 1000 arrays, each contains 59 rows (sample size) by 2 columns (for the 2 classes). Each inner row of two sums to 1 to make the probability.
Now, I'm not entirely sure what to do regarding the entropy. I can just feed the whole thing into scipy.stats.entropy,
entr = scipy.stats.entropy(all_probabilities)
and it spits out numbers - as many samples as I have, I get a 2 item numpy matrix for each. I could feed just one of the 1000 tests in and just get 1 small matrix of two items; or I could feed in just a single column and get a single values back. But I don't know what this is, and the numbers are between 1 and 3.
So my questions are -- am I totally misunderstanding how I can use scipy.stats.entropy to calculate the stability of my classes? If I'm not, what's the best way to find a single number entropy that tells me how good my model selection is?
I understand (and please correct me if my understanding is wrong) that the primary purpose of a CNN is to reduce the number of parameters from what you would need if you were to use a fully connected NN. And CNN achieves this by extracting "features" of images.
CNN can do this because in a natural image, there are small features such as lines and elementary curves that may occur in an "invariant" fashion, and constitute the image much like elementary building blocks.
My question is: when we create layers of feature maps, say, 5 of them, and we get these by using the sliding window of a size, say, 5x5 on an image that has pixels of, say, 100x100, Initially, these feature maps are initialized as random number weight matrices, and must progressively adjust the weights with gradient descent right? But then, if we are getting these feature maps by using the exactly same sized windows, sliding in exactly the same ways (sharing the same starting point and the same stride value), on the exactly same image, how can these maps learn different features of the image? Won't they all come out the same, say, a line or a curve?
Is it due to the different initial values of the weight matrices? (I.e. some weight matrices are more receptive to learning a certain particular feature than others?)
Thanks!! I wrote my 4 questions/opinions and indexed them, for the ease of addressing them separately!