Calculate Batch Pairwise Sinkhorn Distance in PyTorch

I have two tensors, both of the same shape. I want to calculate the pairwise Sinkhorn distance between them using GeomLoss.
What I have tried:
import torch
import geomloss # pip install git+https://github.com/jeanfeydy/geomloss
a = torch.rand((8,4))
b = torch.rand((8,4))
geomloss.SamplesLoss('sinkhorn')(a,b)
# ^ input shape [batch, feature_dim]
# will return a scalar value
geomloss.SamplesLoss('sinkhorn')(a.unsqueeze(1),b.unsqueeze(1))
# ^ input shape [batch, n_points, feature_dim]
# will return a tensor of size [batch] of distances between a[i] and b[i] for each i
However I would like to compute pairwise distance where the resultant tensor should be of size [batch, batch]. To achieve this, I tried the following to use broadcasting:
geomloss.SamplesLoss('sinkhorn')(a.unsqueeze(0), b.unsqueeze(1))
But I got this error message:
ValueError: Samples x and y should have the same batchsize.

Since the documentation doesn't give examples of how to use the distance's forward function, here's a way to do it, which requires calling the distance function batch times.
We will construct the distance matrix line by line. Line i corresponds to the distances a[i]<->b[0], a[i]<->b[1], through to a[i]<->b[batch-1]. To do so we need to construct, for each line i, a (batch x feature_dim) repeated version of the tensor a[i], here (8x4).
This will do:
a_i = torch.stack(8*[a[i]], dim=0)
Then we calculate the distance between a[i] and each batch in b:
dist(a_i.unsqueeze(1), b.unsqueeze(1))
Having a total of batch lines, we can stack them to construct our final tensor.
Here's the complete code:
batch = a.shape[0]
dist = geomloss.SamplesLoss('sinkhorn')
distances = [dist(torch.stack(batch*[a[i]]).unsqueeze(1), b.unsqueeze(1)) for i in range(batch)]
D = torch.stack(distances)
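If the per-row loop becomes a bottleneck, the same [batch, batch] matrix can be built in a single call by tiling both inputs. A sketch of this variant (the tiling layout is my own assumption, worth verifying against the loop version above):
# Fully vectorized variant: tile a row-wise and b column-wise so that
# row i*batch + j of the tiled inputs holds the pair (a[i], b[j]),
# then reshape the [batch*batch] result back into a [batch, batch] matrix.
batch = a.shape[0]
dist = geomloss.SamplesLoss('sinkhorn')
a_rep = a.repeat_interleave(batch, dim=0).unsqueeze(1)  # [batch*batch, 1, feature_dim]
b_rep = b.repeat(batch, 1).unsqueeze(1)                 # [batch*batch, 1, feature_dim]
D = dist(a_rep, b_rep).reshape(batch, batch)            # D[i, j] = sinkhorn(a[i], b[j])
Note that the tiled inputs grow as batch squared, so for large batches the line-by-line loop may still be preferable.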

Related

Pytorch: Custom thresholding activation function - gradient

I created an activation function class Threshold that should operate on one-hot-encoded image tensors.
The function performs min-max feature scaling on each channel followed by thresholding.
import torch
import torch.nn as nn

class Threshold(nn.Module):
    def __init__(self, threshold=.5):
        super().__init__()
        if threshold < 0.0 or threshold > 1.0:
            raise ValueError("Threshold value must be in [0,1]")
        else:
            self.threshold = threshold

    def min_max_fscale(self, input):
        r"""
        Applies min-max feature scaling to input. Each channel is treated individually.
        Input is assumed to be N x C x H x W (one-hot-encoded prediction).
        """
        for i in range(input.shape[0]):      # N
            for j in range(input.shape[1]):  # C
                min = torch.min(input[i][j])
                max = torch.max(input[i][j])
                input[i][j] = (input[i][j] - min) / (max - min)
        return input

    def forward(self, input):
        assert len(input.shape) == 4, f"input has wrong number of dims. Must have dim = 4 but has dim {input.shape}"
        input = self.min_max_fscale(input)
        return (input >= self.threshold) * 1.0
When I use the function I get the following error, presumably because the gradients are not calculated automatically:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I already had a look at How to properly update the weights in PyTorch? but could not get a clue how to apply it to my case.
How is it possible to calculate the gradients for this function?
Thanks for your help.
The issue is that you are manipulating and overwriting elements; this type of operation can't be tracked by autograd. Instead, you should stick with built-in functions. Your example is not that tricky to tackle: you are looking to retrieve the minimum and maximum values along input.shape[0] x input.shape[1]. Then you scale your whole tensor in one go, i.e. in vectorized form. No for loops involved!
One way to compute min/max along multiple axes is to flatten those:
>>> x_f = x.flatten(2)
Then, find the min-max on the flattened axis while retaining all shapes:
>>> x_min = x_f.min(axis=-1, keepdim=True).values
>>> x_max = x_f.max(axis=-1, keepdim=True).values
The resulting min_max_fscale function would look something like:
class Threshold(nn.Module):
    def min_max_fscale(self, x):
        r"""
        Applies min-max feature scaling to input. Each channel is treated individually.
        Input is assumed to be N x C x H x W (one-hot-encoded prediction).
        """
        x_f = x.flatten(2)
        x_min, x_max = x_f.min(-1, True).values, x_f.max(-1, True).values
        x_f = (x_f - x_min) / (x_max - x_min)
        return x_f.reshape_as(x)
Important note:
You would notice that you can now backpropagate on min_max_fscale... but not on forward. This is because you are applying a boolean condition which is not a differentiable operation.
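If gradients are needed through forward as well, one common workaround is to replace the hard comparison with a steep sigmoid, which approximates the step function while remaining differentiable. This is a sketch of that relaxation, not part of the original question; the temperature value is an assumption to tune:
import torch
import torch.nn as nn

class SoftThreshold(nn.Module):
    """Differentiable approximation of the hard threshold (illustrative variant)."""
    def __init__(self, threshold=0.5, temperature=50.0):
        super().__init__()
        self.threshold = threshold
        self.temperature = temperature  # larger -> closer to a hard step

    def forward(self, x):
        x_f = x.flatten(2)
        x_min = x_f.min(-1, keepdim=True).values
        x_max = x_f.max(-1, keepdim=True).values
        x_f = (x_f - x_min) / (x_max - x_min)
        # sigmoid((x - t) * T) approximates (x >= t) but has nonzero gradients everywhere
        return torch.sigmoid((x_f.reshape_as(x) - self.threshold) * self.temperature)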

To calculate euclidean distance between vectors in a torch tensor with multiple dimensions

There is a random initialized torch tensor of the shape as below.
Inputs
tensor1 = torch.rand((4,2,3,100))
tensor2 = torch.rand((4,2,3,100))
tensor1 and tensor2 are torch tensors that each contain 24 100-dimensional vectors.
I want to get a tensor with a shape of torch.size([4,2,3]) by obtaining the Euclidean distance between vectors with the same index of two tensors.
I tried dist = torch.nn.functional.pairwise_distance(tensor1, tensor2) to get the result I wanted.
However, pairwise_distance calculates the Euclidean distance over the second dimension of the tensor, so dist has shape torch.Size([4, 3, 100]).
I have performed transpose several times to solve these problems. My code is as follows.
tensor1 = tensor1.transpose(1,3)
tensor2 = tensor2.transpose(1,3)
dist = torch.nn.functional.pairwise_distance(tensor1, tensor2)
dist = dist.transpose(1,2)
Is there a simpler or easier way to get the result I want?
Here ya go
dist = (tensor1 - tensor2).pow(2).sum(3).sqrt()
That's basically what the Euclidean distance is:
subtract -> raise to the power of 2 -> sum along the axis you want to eliminate -> square root
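If you prefer a built-in, the same reduction can be written with torch.linalg.norm over the last axis (equivalent up to numerical precision):
# Equivalent, using the built-in vector norm along the last dimension
dist = torch.linalg.norm(tensor1 - tensor2, dim=-1)   # shape: torch.Size([4, 2, 3])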

Attention weighted aggregation

Let the tensor shown below be the representation of two sentences (batch_size = 2), each composed of 3 words (max_length = 3), with each word represented by a vector of dimension 5 (hidden_size = 5), obtained as the output of a neural network:
net_output
# tensor([[[0.7718, 0.3856, 0.2545, 0.7502, 0.5844],
# [0.4400, 0.3753, 0.4840, 0.2483, 0.4751],
# [0.4927, 0.7380, 0.1502, 0.5222, 0.0093]],
# [[0.5859, 0.0010, 0.2261, 0.6318, 0.5636],
# [0.0996, 0.2178, 0.9003, 0.4708, 0.7501],
# [0.4244, 0.7947, 0.5711, 0.0720, 0.1106]]])
Also consider the following attention scores:
att_scores
# tensor([[0.2425, 0.5279, 0.2295],
# [0.2461, 0.4789, 0.2751]])
Which efficient approach allows obtaining the aggregation of vectors in net_output weighted by att_scores resulting in a vector of shape (2, 5)?
This should work:
weighted = (net_output * att_scores[..., None]).sum(axis = 1)
This uses broadcasting to multiply each word vector elementwise by its attention weight, then sums over the word dimension to aggregate the vectors of each sentence.
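The same weighted sum can also be written as a batch matrix product, which some may find more explicit; a small equivalent sketch:
# att_scores: (batch, max_length) -> (batch, 1, max_length)
# net_output: (batch, max_length, hidden_size)
# bmm result: (batch, 1, hidden_size) -> squeeze to (batch, hidden_size)
weighted = torch.bmm(att_scores.unsqueeze(1), net_output).squeeze(1)

# or, equivalently, with einsum
weighted = torch.einsum('bn,bnd->bd', att_scores, net_output)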

Torch - Interpolate missing values

I have a stack of image tensors of shape NumOfImages x H x W that contains zeros. I am looking for a way to interpolate the missing values (zeros) using only the information in the same image (no connection between the images). Is there a way to do it using pytorch?
F.interpolate seems to work only for reshaping. I need to fill the zeros while keeping the dimensions and the gradients of the tensor.
Thanks.
EDIT: Turns out the below does not answer the OP as it does not provide a solution to track gradients for back-propagation. Still leaving it as it can be used as part of a solution.
One way is to convert the tensor to a numpy array and use scipy interpolation, e.g. scipy.interpolate.LinearNDInterpolator, or one of the other numpy-based interpolation options. Not sure how much this helps, as it is not pytorch and may involve copying the tensor around.
As scipy interpolation may be slow, one possible solution is to only use pixels adjacent to missing values for interpolation (these can be easily obtained by dilating the missing-values mask). I think this might speed things up by an order of magnitude, depending on the tensor dimensions and the number of missing values.
Edit: implemented it, seems to give a speedup of two orders of magnitude in my case.
import cv2
import numpy as np
import scipy.interpolate

def fillMissingValues(target_for_interp, copy=True,
                      interpolator=scipy.interpolate.LinearNDInterpolator):
    if copy:
        target_for_interp = target_for_interp.copy()

    def getPixelsForInterp(img):
        """
        Calculates a mask of pixels neighboring invalid values -
        to use for interpolation.
        """
        # mask invalid pixels
        invalid_mask = np.isnan(img) + (img == 0)
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        # dilate to mark borders around invalid regions
        dilated_mask = cv2.dilate(invalid_mask.astype('uint8'), kernel,
                                  borderType=cv2.BORDER_CONSTANT, borderValue=int(0))
        # pixelwise "and" with valid pixel mask (~invalid_mask)
        masked_for_interp = dilated_mask * ~invalid_mask
        return masked_for_interp.astype('bool'), invalid_mask

    # Mask pixels for interpolation
    mask_for_interp, invalid_mask = getPixelsForInterp(target_for_interp)

    # Interpolate only holes, only using these pixels
    points = np.argwhere(mask_for_interp)
    values = target_for_interp[mask_for_interp]
    interp = interpolator(points, values)

    target_for_interp[invalid_mask] = interp(np.argwhere(invalid_mask))
    return target_for_interp
# For the target tensor:
target_filled = fillMissingValues(target.numpy().squeeze())
# transform back to tensor etc..
Note that interpolated values will be np.nan outside of the convex hull of valid points, as provided to LinearNDInterpolator.
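For the N x H x W stack from the question, the function can be applied image by image; a minimal usage sketch (assuming images is that stack as a CPU tensor):
import torch

# Fill each image independently, then restack. Gradients are not tracked here
# (see the edit above); this only produces the filled values.
filled = torch.stack([
    torch.from_numpy(fillMissingValues(img.detach().cpu().numpy()))
    for img in images
])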
If you only want nearest neighbor interpolation, you can make Yuri Feldman's answer above differentiable by returning the index mapping instead of the interpolated image.
What I did is to create a new class from scipy.interpolate.NearestNDInterpolator and override its __call__ method. It's just returning indices instead of values.
import numpy as np
from scipy.interpolate import NearestNDInterpolator
from scipy.interpolate.interpnd import _ndim_coords_from_arrays

class NearestNDInterpolatorIndex(NearestNDInterpolator):
    def __init__(self, x, y, rescale=False, tree_options=None):
        NearestNDInterpolator.__init__(self, x, y, rescale=rescale, tree_options=tree_options)
        self.points = np.asarray(x)

    def __call__(self, *args):
        """
        Evaluate interpolator at given points.

        Parameters
        ----------
        xi : ndarray of float, shape (..., ndim)
            Points where to interpolate data at.
        """
        xi = _ndim_coords_from_arrays(args, ndim=self.points.shape[1])
        xi = self._check_call_shape(xi)
        xi = self._scale_x(xi)
        dist, i = self.tree.query(xi)
        return self.points[i]
Then, in fillMissingValues, instead of returning target_for_interp, we return these:
source_indices = np.argwhere(invalid_mask)
target_indices = interp(source_indices)
return source_indices, target_indices
Pass the new interpolator to fillMissingValues, then we can get the nearest neighbor interpolation of the image by
img[..., source_indices[:, 0], source_indices[:, 1]] = img[..., target_indices[:, 0], target_indices[:, 1]]
assuming that the image size is on the last two dimensions.
EDIT: This is not differentiable, as I just tested. The problem lies in how the index mapping is applied: we need to use masking instead of the in-place operation, and then the problem is solved.
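To make that concrete, here is a hedged sketch of an out-of-place, gather-based fill (assuming source_indices / target_indices come from NearestNDInterpolatorIndex above and img is a (C, H, W) tensor):
import torch

H, W = img.shape[-2:]
src = torch.as_tensor(source_indices).long()   # (K, 2) coordinates of the holes
tgt = torch.as_tensor(target_indices).long()   # (K, 2) coordinates of their nearest valid pixels

# For every pixel, store the flat index it should copy from: itself for valid
# pixels, its nearest valid neighbour for holes. This index map carries no gradient,
# so the in-place write below does not break autograd.
copy_from = torch.arange(H * W).view(H, W)
copy_from[src[:, 0], src[:, 1]] = tgt[:, 0] * W + tgt[:, 1]

# A gather is differentiable w.r.t. img, so backprop works through the fill.
img_filled = img.flatten(-2)[..., copy_from.flatten()].view_as(img)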

how can I insert a Tensor into another Tensor in pytorch

I have a pytorch Tensor with shape (batch_size, step, vec_size), for example a Tensor(32, 64, 128); let's call it A.
I have another Tensor(batch_size, vec_size), e.g. Tensor(32, 128); let's call it B.
I want to insert B at a certain position along axis 1 of A. The insert positions are given in a Tensor(batch_size), named P.
I understand there is no empty tensor (like an empty list) in pytorch, so I initialize A as zeros and add B at a certain position along axis 1 of A.
A = Variable(torch.zeros(batch_size, step, vec_size))
What I'm doing is something like:
for i in range(batch_size):
    pos = P[i]
    A[i][pos] = A[i][pos] + B[i]
But I get an Error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Then I tried making a clone of A inside the loop:
for i in range(batch_size):
    A_clone = A.clone()
    pos = P[i]
    A_clone[i][pos] = A_clone[i][pos] + B[i]
This is very slow for autograd. I wonder if there are any better solutions? Thank you.
You can use a mask instead of cloning.
See the code below
# setup
batch, step, vec_size = 64, 10, 128
A = torch.rand((batch, step, vec_size))
B = torch.rand((batch, vec_size))
pos = torch.randint(step, (batch,)).long()

# computations
# create a mask where pos is 0 if it is to be replaced
mask = torch.ones((batch, step)).view(batch, step, 1).float()
mask[torch.arange(batch), pos] = 0

# expand B to have the same dimensions as A and compute the result
result = A * mask + B.unsqueeze(dim=1).expand([-1, step, -1]) * (1 - mask)
This way you avoid using for loops and cloning as well.
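Since A starts as zeros in the question, an equivalent out-of-place formulation (a sketch, not from the original answer) is to scatter B with a one-hot mask over the step axis, which adds B at the given positions and keeps autograd happy:
import torch
import torch.nn.functional as F

# one_hot: (batch, step, 1) with a single 1 at each insert position
one_hot = F.one_hot(pos, num_classes=step).unsqueeze(-1).to(A.dtype)

# Out-of-place: adds B[i] at A[i, pos[i]] and leaves all other positions untouched
result = A + one_hot * B.unsqueeze(1)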
