Check if PyTorch tensors are equal within epsilon - pytorch

How do I check if two PyTorch tensors are semantically equal?
Given floating-point errors, I want to know whether the elements differ only by a small epsilon value.

torch.allclose() returns a single boolean indicating whether every element-wise difference is within a tolerance, i.e. whether |input - other| <= atol + rtol * |other| holds for all elements (defaults: rtol=1e-05, atol=1e-08).
At the time of writing, this is an undocumented function in the latest stable release (0.4.1), but its documentation is in the master (unstable) branch.
Additionally, there's the undocumented isclose(), which performs the same comparison element-wise:
>>> torch.isclose(torch.Tensor([1]), torch.Tensor([1.00000001]))
tensor([1], dtype=torch.uint8)
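To make the tolerance rule concrete, here is a dependency-free sketch that mirrors the element-wise test described above on plain Python lists. The helper names isclose and allclose here are hypothetical stand-ins; with real tensors you would simply call torch.allclose(a, b, rtol=..., atol=...). Note that the test is asymmetric in its two arguments, because the relative term uses |b|.

```python
from typing import List, Sequence


def isclose(a: Sequence[float], b: Sequence[float],
            rtol: float = 1e-05, atol: float = 1e-08) -> List[bool]:
    """Element-wise |a - b| <= atol + rtol * |b| (NaN never compares close)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)]


def allclose(a: Sequence[float], b: Sequence[float],
             rtol: float = 1e-05, atol: float = 1e-08) -> bool:
    """True only if every element pair passes the isclose test."""
    return all(isclose(a, b, rtol, atol))


print(isclose([1.0, 1.1], [1.0 + 1e-9, 1.0]))  # [True, False]
print(allclose([1.0, 2.0], [1.0, 2.0 + 1e-6]))  # True
```

Because atol dominates near zero, two tiny values such as 0.0 and 1e-7 are not considered close with the defaults; loosen atol if that matters for your data.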

Related

Does equal probabilities not summing to one in torch.utils.data.WeightedRandomSampler still make it uniform?

In PyTorch there is a sampler class called WeightedRandomSampler (https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler). Its 'weights' parameter expects probabilities for N samples. For a uniform distribution, I believe it expects an array in which every value is 1/N.
But if I use, say, 0.5 for each sample, so that N*0.5 is not equal to 1, is the sampling still uniform, given that every sample has the same weight?
Yes, the sampling will still be uniform. Only the relative magnitude of the weights with respect to the other weights is important, not the absolute magnitude, as pytorch normalizes the weights.
If we look under the hood of WeightedRandomSampler, it calls torch.multinomial, whose documentation states that the input does not need to sum to one: the values are used as weights (they must be non-negative, finite and have a non-zero sum), so they are effectively normalized.
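A quick way to convince yourself that only relative weights matter: normalizing equal weights gives 1/N whatever their common value is. The sketch below uses the standard library's random.choices, which also accepts unnormalized weights, as a stand-in for WeightedRandomSampler; the normalize helper is hypothetical. In PyTorch the equivalent would be WeightedRandomSampler([0.5] * N, num_samples).

```python
import random
from collections import Counter
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to one."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


# Equal weights normalize to 1/N regardless of their common value.
print(normalize([0.5] * 4))   # [0.25, 0.25, 0.25, 0.25]
print(normalize([0.25] * 4))  # [0.25, 0.25, 0.25, 0.25]

# Sampling with the unnormalized weights is still (approximately) uniform.
rng = random.Random(0)
counts = Counter(rng.choices(range(4), weights=[0.5] * 4, k=40_000))
print(counts)  # each index appears roughly 10,000 times
```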

Keras loss functions: how to round?

I'm trying to recognize turning points in sequences, the points after which some process behaves differently. I use a Keras model to do this. The input is the sequence (always the same length), and the output should be 0 before the turning point and 1 after it.
I want the loss function to depend on the distance between the actual turning point and the predicted turning point.
I tried rounding (to obtain the label 0 or 1) and then summing the number of 1's to get the "index" of the turning point. This assumes the model predicts just one turning point, since the (synthetically produced) data also has just one turning point. What I tried:
from keras import backend as K

def dist_loss(yTrue, yPred):
    turningPointTrue = K.sum(yTrue)
    turningPointPred = K.sum(K.round(yPred))
    return K.abs(turningPointTrue - turningPointPred)
This does not work, the following error is given:
ValueError: An operation has None for gradient. Please make sure
that all of your ops have a gradient defined (i.e. are
differentiable). Common ops without gradient: K.argmax, K.round,
K.eval.
I think this means that K.round(yPred) returns a single value instead of a vector/tensor. Does anyone know how to solve this issue?
The round operation has no defined gradient, so it cannot be used inside a loss function at all. Training a neural network requires the gradient of the loss with respect to the weights, which means every part of the network and the loss must be differentiable (or have a differentiable approximation available).
In your case you should try to find a differentiable approximation to round, but unfortunately I don't know whether there is one. An example of such an approximation in another setting is the softmax function as a smooth approximation of the (one-hot) argmax.
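One possible differentiable surrogate, which is my assumption and not something the answer above proposes, is a steep sigmoid centred at 0.5. It approaches a hard step as its slope k grows, and in Keras it could be written as K.sigmoid(k * (yPred - 0.5)). The pure-Python sketch below shows the idea on plain lists, including the nonzero gradient that round() lacks; soft_round, soft_round_grad, dist_loss and k=50 are hypothetical choices.

```python
import math
from typing import Sequence


def soft_round(x: float, k: float = 50.0) -> float:
    """Differentiable stand-in for round() on values in [0, 1]: a steep
    sigmoid centred at 0.5 (larger k means closer to a hard step)."""
    return 1.0 / (1.0 + math.exp(-k * (x - 0.5)))


def soft_round_grad(x: float, k: float = 50.0) -> float:
    """Derivative of soft_round with respect to x; round() has none."""
    s = soft_round(x, k)
    return k * s * (1.0 - s)


def dist_loss(y_true: Sequence[float], y_pred: Sequence[float], k: float = 50.0) -> float:
    """|number of ones in y_true - soft count of ones in y_pred|."""
    return abs(sum(y_true) - sum(soft_round(p, k) for p in y_pred))


print(dist_loss([0, 0, 1, 1], [0.05, 0.1, 0.9, 0.95]))  # close to 0
print(soft_round_grad(0.5))  # 12.5: steepest at the 0.5 threshold
```

A simpler option may be to drop rounding entirely: if the model's outputs already lie in [0, 1] (e.g. a sigmoid output layer), K.sum(yPred) is itself a differentiable soft count of the ones.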

Random Index from a Tensor (Sampling with Replacement from a Tensor)

I'm trying to manipulate individual weights of different neural nets to see how their performance degrades. As part of these experiments, I'm required to sample randomly from their weight tensors, which I've come to understand as sampling with replacement (in the statistical sense). However, since it's high-dimensional, I've been stumped by how to do this in a fair manner. Here are the approaches and research I've put into considering this problem:
This was previously implemented by selecting a random layer and then selecting a random weight in that layer (ignore the implementation of picking a random weight). Since layers are different sizes, we discovered that weights were being sampled unevenly.
I considered what would happen if we sampled according to the numpy.shape of the tensor; however, I realize now that this encounters the same problem as above.
Consider what happens to a rank 2 tensor like this:
[[*, *, *],
[*, *, *, *]]
Selecting a row randomly and then a value from that row results in an unfair selection. This method could work if you're able to assert that this scenario never occurs, but it's far from a general solution.
Note that this possible duplicate actually implements it in this fashion.
I found people suggesting flattening the tensor and using numpy.random.choice to select randomly from the resulting 1D array. That's a simple solution, except I have no idea how to map a position in the flattened tensor back to the original shape. Furthermore, flattening millions of weights would be a somewhat slow implementation.
I found someone discussing tf.random.multinomial here, but I don't understand enough of it to know whether it's applicable or not.
I ran into this paper about reservoir sampling, but again, it went over my head.
I found another paper which specifically discusses tensors and sampling techniques, but it went even further over my head.
A teammate found this other paper which talks about random sampling from a tensor, but it's only for rank 3 tensors.
Any help understanding how to do this? I'm working in Python with Keras, but I'll take an algorithm in any form that it exists. Thank you in advance.
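For the common case where each layer's weights form a regular (non-ragged) array, no physical flattening is needed: draw one integer uniformly over the total number of weights, find the layer it falls in, and map the remainder back to a multi-dimensional index with the row-major inverse of flattening (what numpy.unravel_index computes). This sketch assumes you pass in each layer's weight shapes (for instance from the arrays a Keras layer's get_weights() returns); sample_weight and unravel_index here are hypothetical plain-Python helpers.

```python
import random
from collections import Counter
from math import prod
from typing import List, Tuple

Shape = Tuple[int, ...]


def unravel_index(flat: int, shape: Shape) -> Shape:
    """Row-major (C-order) inverse of flattening, like numpy.unravel_index."""
    index = []
    for dim in reversed(shape):
        flat, rem = divmod(flat, dim)
        index.append(rem)
    return tuple(reversed(index))


def sample_weight(shapes: List[Shape], rng: random.Random) -> Tuple[int, Shape]:
    """Pick one scalar uniformly over all layers; return (layer, index)."""
    sizes = [prod(shape) for shape in shapes]
    k = rng.randrange(sum(sizes))  # one draw over every weight in the model
    for layer, size in enumerate(sizes):
        if k < size:
            return layer, unravel_index(k, shapes[layer])
        k -= size  # skip past this layer's weights
    raise AssertionError("unreachable: k is always < sum(sizes)")


# Layer 0 holds 4 weights and layer 1 holds 12, so layer 1 should be
# picked about three times as often.
rng = random.Random(0)
shapes = [(2, 2), (3, 4)]
layer_counts = Counter(sample_weight(shapes, rng)[0] for _ in range(16_000))
print(layer_counts)
```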
Before I forget to document the solution we arrived at, I'll talk about the two different paths I see for implementing this:
Use a total ordering on scalar elements of the tensor. This is effectively enumerating your elements, i.e. flattening them. However, you can do this while maintaining the original shape. Consider this pseudocode (in Python-like syntax):
def sample_tensor(tensor, chosen_index: int) -> Tuple[int, ...]:
    """Maps a chosen random number to its index in the given tensor.
    Args:
        tensor: A ragged-array n-tensor.
        chosen_index: An integer in [0, num_scalar_elements_in_tensor).
    Returns:
        The index that accesses this element in the tensor.
    NOTE: Entirely untested, expect it to be fundamentally flawed.
    """
    remaining = chosen_index
    for (i, sub_list) in enumerate(tensor):
        if type(sub_list) is an iterable:
            if |sub_list| <= remaining:
                # the chosen element lies after this sub-list: skip past it
                remaining -= |sub_list|
            else:
                # the chosen element lies inside this sub-list: descend
                return i joined with sample_tensor(sub_list, remaining)
        else:
            # by assumption (2) below, this level holds only scalars
            return (remaining,)
First of all, I'm aware this isn't a sound algorithm. The idea is to count down until you reach your element, with bookkeeping for indices.
We need to make two crucial assumptions here. 1) All lists eventually contain only scalars. 2) By direct consequence, a list that contains lists does not also contain scalars at the same level. (Stop and convince yourself of (2).)
We also need to note a critical limitation: we cannot measure the number of scalars in a given list unless the list consists only of scalars. To avoid measuring this magnitude at every point, my algorithm above should be refactored to descend first and subtract later.
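As a runnable illustration of that refactoring, here is a minimal plain-Python sketch that descends first (counting the scalars beneath each sub-list) and subtracts later. It assumes the tensor is a nested Python list whose leaves are scalars, per the assumptions above; count_scalars and index_of are hypothetical names. Because it recounts subtrees at every level, it also exhibits the slowness discussed below.

```python
import random
from typing import Tuple


def count_scalars(tensor) -> int:
    """Number of scalar leaves in a (possibly ragged) nested list."""
    if not isinstance(tensor, list):
        return 1
    return sum(count_scalars(item) for item in tensor)


def index_of(tensor: list, chosen_index: int) -> Tuple[int, ...]:
    """Map chosen_index in [0, count_scalars(tensor)) to the index path of
    that scalar, with leaves ordered depth-first (row-major)."""
    if chosen_index < 0:
        raise IndexError("chosen_index must be non-negative")
    remaining = chosen_index
    for i, item in enumerate(tensor):
        size = count_scalars(item)  # descend first: count what is beneath
        if remaining < size:
            if isinstance(item, list):
                return (i,) + index_of(item, remaining)
            return (i,)
        remaining -= size  # then subtract: the target lies further on
    raise IndexError("chosen_index is out of range")


ragged = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
rng = random.Random(0)
path = index_of(ragged, rng.randrange(count_scalars(ragged)))
print(path)
```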
This algorithm has some consequences:
It's the fastest algorithm within this style of approach. If you want to write a function f: [0, total_elems) -> Tuple[int], you must know the number of scalar elements that precede each position in the tensor's total ordering. This is effectively bounded by Theta(l), where l is the number of lists in the tensor (since we can call len on a list of scalars).
It's slow compared to sampling nicer tensors that have a defined shape.
It raises the question: can we do better? See the next solution.
Use a probability distribution in conjunction with numpy.random.choice. The idea here is that if we know ahead of time what the distribution of scalars is already like, we can sample fairly at each level of descending the tensor. The hard problem here is building this distribution.
I won't write pseudocode for this, but lay out some objectives:
The data structure only needs to be built once.
The algorithm needs to combine iterative and recursive techniques to a) build distributions for sibling lists and b) build distributions for descendants, respectively.
The algorithm will need to map indices to a probability distribution relative to sibling lists (note the assumptions discussed above). This does require knowing the number of elements in an arbitrary sub-tensor.
At lower levels where lists contain only scalars, we can simplify by just storing the number of elements in said list (as opposed to storing probabilities of selecting scalars randomly from a 1D array).
You will likely need 2-3 functions: one that utilizes the probability distribution to return an index, a function that builds the distribution object, and possibly a function that just counts elements to help build the distribution.
This is also faster: it needs one draw per level, i.e. O(n) draws, where n is the rank of the tensor. I'm convinced this is the fastest possible algorithm, but I lack the time to try to prove it.
You might choose to store the distribution as an ordered dictionary that maps a probability to either another dictionary or the number of elements in a 1D array. I think this might be the most sensible structure.
Note that (2) is truly the same as (1), but we pre-compute knowledge about the densities of the tensor.
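A minimal sketch of the second approach, assuming the same nested-list representation: the counts are computed once into a small tree (a plain class here rather than the ordered dictionary suggested above), and sampling then descends one level at a time, choosing each child with probability proportional to the number of scalars beneath it. random.choices stands in for numpy.random.choice, and CountTree is a hypothetical name.

```python
import random
from collections import Counter
from typing import Tuple


class CountTree:
    """Scalar counts for a (possibly ragged) nested list, computed once."""

    def __init__(self, tensor):
        if isinstance(tensor, list):
            self.children = [CountTree(item) for item in tensor]
            self.size = sum(child.size for child in self.children)
        else:
            self.children = None  # a scalar leaf
            self.size = 1

    def sample(self, rng: random.Random) -> Tuple[int, ...]:
        """Return the index path of one scalar chosen uniformly at random."""
        if self.size == 0:
            raise ValueError("cannot sample from a tensor with no scalars")
        path = []
        node = self
        while node.children is not None:
            # Pick a child with probability proportional to its scalar count;
            # empty sub-lists have weight 0 and are never chosen.
            weights = [child.size for child in node.children]
            i = rng.choices(range(len(weights)), weights=weights)[0]
            path.append(i)
            node = node.children[i]
        return tuple(path)


tree = CountTree([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]])
rng = random.Random(0)
hits = Counter(tree.sample(rng) for _ in range(70_000))
print(hits)  # each of the 7 index paths appears roughly 10,000 times
```

The per-level probabilities multiply out to 1/7 for every scalar here (e.g. 3/7 for the first row times 1/3 within it), which is what makes the descent fair.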
I hope this helps.

Computation in fixed point or int

I am using fixed-point numbers in my network, which is built on the Keras framework. My concern is that when there are multiplication operations on Theano variables in the network, the result is float32 (even if the numbers supplied are in fixed point). Is there any intrinsic way to get the result in fixed-point format, or even as int?
If not, what can be alternative approaches?

standard error of addition, subtraction, multiplication and ratio

Let's say I have two random variables, x and y, each with n observations. I've used a forecasting method to estimate x_{n+1} and y_{n+1}, and I also have the standard error of each forecast. What would the formula be for the standard error of x_{n+1} + y_{n+1}, x_{n+1} - y_{n+1}, x_{n+1} * y_{n+1} and x_{n+1} / y_{n+1}, so that I can calculate prediction intervals for these 4 combinations? Any thoughts would be much appreciated. Thanks.
Well, the general topic you need to look at is called "change of variables" in mathematical statistics.
The density function for a sum of random variables is the convolution of the individual densities (but only if the variables are independent). Likewise for the difference. In special cases, that convolution is easy to find. For example, for Gaussian variables the density of the sum is also a Gaussian.
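For the sum and difference, the standard errors follow from the variance identity below, which holds for any two variables with finite variances. When x and y are independent the covariance term vanishes, and if both forecasts are Gaussian the combination is Gaussian too, so a 95% interval is the combined point forecast plus or minus 1.96 standard errors. Here sigma_X and sigma_Y denote the two forecast standard errors and \hat{x}, \hat{y} the forecasts (my notation, not the question's).

```latex
\operatorname{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\operatorname{Cov}(X, Y), \qquad
\operatorname{Var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\operatorname{Cov}(X, Y)

% independent X, Y: Cov(X, Y) = 0, so both combinations share one standard error
\operatorname{SE} = \sqrt{\sigma_X^2 + \sigma_Y^2}, \qquad
\text{95\% interval: } (\hat{x} + \hat{y}) \pm 1.96\,\operatorname{SE}
\ \text{ or } \ (\hat{x} - \hat{y}) \pm 1.96\,\operatorname{SE}
```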
For products and quotients there aren't any simple results except in special cases, so you might as well compute the result directly, for example by sampling or other numerical methods.
If your variables x and y are not independent, that complicates the situation. But even then, I think sampling is straightforward.
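A minimal Monte Carlo sketch of the sampling approach, assuming the two forecast errors are independent and Gaussian (an assumption the question does not state). mc_intervals and the example numbers are hypothetical. It returns an empirical standard error and a central interval for each of the four combinations; when y's standard error is large relative to its forecast, the ratio has heavy tails (y can come close to zero), so the interval is more informative than the standard error.

```python
import math
import random
from typing import Dict, List, Tuple


def _sd(values: List[float]) -> float:
    """Sample standard deviation."""
    mean = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / (len(values) - 1))


def mc_intervals(x_hat: float, se_x: float, y_hat: float, se_y: float,
                 n: int = 50_000, level: float = 0.95,
                 seed: int = 0) -> Dict[str, Tuple[float, float, float]]:
    """Monte Carlo (standard error, lower, upper) for x+y, x-y, x*y and x/y,
    assuming x and y are independent Gaussians with the given means and SEs."""
    rng = random.Random(seed)
    xs = [rng.gauss(x_hat, se_x) for _ in range(n)]
    ys = [rng.gauss(y_hat, se_y) for _ in range(n)]
    combos = {
        "sum": [a + b for a, b in zip(xs, ys)],
        "diff": [a - b for a, b in zip(xs, ys)],
        "product": [a * b for a, b in zip(xs, ys)],
        "ratio": [a / b for a, b in zip(xs, ys)],
    }
    tail = (1 - level) / 2
    result = {}
    for name, values in combos.items():
        values.sort()
        lower = values[int(tail * (n - 1))]  # empirical quantiles
        upper = values[int((1 - tail) * (n - 1))]
        result[name] = (_sd(values), lower, upper)
    return result


intervals = mc_intervals(x_hat=10.0, se_x=1.0, y_hat=5.0, se_y=2.0)
for name, (se, lower, upper) in intervals.items():
    print(f"{name:8s} SE={se:.3f}  95% interval=({lower:.3f}, {upper:.3f})")
```

If x and y are correlated, you would draw each (x, y) pair jointly, for example from a bivariate normal with the estimated covariance, instead of drawing them independently as this sketch does.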
