What is the most efficient way to calculate the Mahalanobis distance in PyTorch?
Based on SciPy's implementation of the Mahalanobis distance, you would do it like this in PyTorch, assuming u and v are 1D tensors and cov is the 2D covariance matrix:
import torch

def mahalanobis(u, v, cov):
    delta = u - v
    # delta^T . cov^{-1} . delta
    m = torch.dot(delta, torch.matmul(torch.inverse(cov), delta))
    return torch.sqrt(m)
Note: scipy.spatial.distance.mahalanobis takes the inverse of the covariance matrix as its argument, whereas the function above takes the covariance matrix itself and inverts it.
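As a quick sanity check, here is a hypothetical usage sketch comparing against SciPy (the input values are made up):

import numpy as np
import torch
from scipy.spatial import distance

u = torch.tensor([1.0, 2.0])
v = torch.tensor([3.0, 1.0])
cov = torch.tensor([[2.0, 0.5], [0.5, 1.0]])

# SciPy wants the inverse covariance matrix; the PyTorch version inverts internally.
d_scipy = distance.mahalanobis(u.numpy(), v.numpy(), np.linalg.inv(cov.numpy()))
print(mahalanobis(u, v, cov).item(), d_scipy)  # the two values should agree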
I am implementing the paper Deep multiscale convolutional feature learning for weakly supervised localization of chest pathologies in X-ray images.
According to my understanding, the layer relevance weights belong to the last layer of each dense block.
I tried implementing the weight constraints as shown below:
def weight_constraints(self):
    weights = {'feat1': self.model.features.denseblock2.denselayer12.conv2.weight.data,
               'feat2': self.model.features.denseblock3.denselayer24.conv2.weight.data,
               'feat3': self.model.features.denseblock4.denselayer16.conv2.weight.data}
    sum(weights.values()) == 1
    for i in weights.keys():
        w = weights[i]
        w1 = w.clamp(min=0)
        weights[i] = w1
    return weights
weights = self.weight_constraints()
for i in weights.keys():
    w = weights[i]
    l = logits[i]
    p = torch.matmul(w, l[0])
    sum = sum + p
where logits is a dictionary which contains the outputs of the FC layer from each block, as shown in the diagram.
logits = {'feat1': [tensor([[-0.0630]], ...ackward0>)], 'feat2': [tensor([[-0.0323]], ...ackward0>)], 'feat3': [tensor([[-8.2897e-06...ackward0>)]}
I get the following error:
mat1 and mat2 shapes cannot be multiplied (12288x3 and 1x1)
Is this the right approach?
The paper states:
"The logit response from all the layers have same dimension (equal to the number of category for classification) and now can be combined using class specific convex combination to obtain the probability score for the class pc."
The function matmul you used performs matrix multiplication; it requires mat1.shape[-1] == mat2.shape[-2].
If you assume sum(w) == 1 and torch.all(w > 0), you can compute the convex combination of l as (w * l).sum(-1), i.e. multiply w and l element-wise, broadcasting over the batch dimensions of l, and requiring w.shape[-1] == l.shape[-1] (presumably 3).
If you want to stick with matmul, you can add one dimension to w and l and perform the vector product as a matrix multiplication: torch.matmul(w[..., None, :], l[..., :, None]).
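Here is a small sketch of both variants; the sizes are made up (three per-layer weights, a batch of four samples with one logit per layer):

import torch

w = torch.softmax(torch.randn(3), dim=0)  # convex weights: non-negative, sum to 1
l = torch.randn(4, 3)                     # batch of 4 logit vectors

combo1 = (w * l).sum(-1)                                           # element-wise form
combo2 = torch.matmul(w[..., None, :], l[..., :, None]).squeeze()  # matmul form

print(torch.allclose(combo1, combo2))  # True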
I want to sample a tensor of probability distributions with shape (N, C, H, W), where dimension 1 (size C) contains normalized probability distributions with C possibilities. Is there a PyTorch function to efficiently sample all the distributions in the tensor in parallel? I just need to sample each distribution once, so the result could either be a one-hot tensor with the same shape or a tensor of indices with shape (N, 1, H, W).
I did not see a single function that does this, but I was able to sample the tensor in several steps: compute the cumulative probabilities, sample each point independently, and then pick the first point that sampled a 1 along the distribution dimension:
# probabilities: (N, C, H, W), normalized along dim 1
reverse_cumulative = torch.flip(torch.cumsum(torch.flip(probabilities, [1]), dim=1), [1])
# conditional probability of picking class c given that classes 0..c-1 were not picked
cumulative = probabilities / reverse_cumulative
sampled = torch.rand_like(cumulative) <= cumulative
# class index at sampled positions, sentinel C everywhere else
# (C and the arange tensor stand in for the original `self.tile_count` and `one_hot`)
C = probabilities.shape[1]
idxs = sampled * torch.arange(C, device=probabilities.device).view(1, C, 1, 1)
idxs[~sampled] = C
sampled_idxs = idxs.min(dim=1).indices  # first sampled index along dim 1
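To sanity-check the procedure, you can wrap the steps in a helper and compare empirical frequencies against the input distribution; a rough sketch (the probabilities and sample count below are made up):

import torch

def sample_first_success(probabilities):
    # the steps above, wrapped for reuse; probabilities is (N, C, H, W)
    C = probabilities.shape[1]
    rev = torch.flip(torch.cumsum(torch.flip(probabilities, [1]), dim=1), [1])
    sampled = torch.rand_like(probabilities) <= probabilities / rev
    idxs = sampled * torch.arange(C, device=probabilities.device).view(1, C, 1, 1)
    idxs[~sampled] = C
    return idxs.min(dim=1).indices

# one fixed distribution repeated 10000 times; frequencies should approach it
p = torch.tensor([0.1, 0.2, 0.3, 0.4]).view(1, 4, 1, 1).expand(10000, 4, 1, 1)
counts = torch.bincount(sample_first_success(p).flatten(), minlength=4)
print(counts / 10000.0)  # roughly tensor([0.1000, 0.2000, 0.3000, 0.4000])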
I have a neural network. For simplicity, there's only one layer and the weight matrix is of shape 2-by-2. I need the output of the network to be the rotated version of the input, i.e., the weight matrix should be a valid rotation matrix. I have tried the following:
import tensorflow as tf

cos45 = sin45 = 2 ** 0.5 / 2  # cos(45°) == sin(45°)

def rotate(val):
    w1 = tf.constant_initializer([[cos45, -sin45], [sin45, cos45]])
    return tf.layers.dense(inputs=val, units=2, kernel_initializer=w1, activation=tf.nn.tanh)
While training, I do not want to lose the properties of the rotation matrix. In other words, I need the layer(s) to estimate only the angle (argument) of the trigonometric functions in the matrix.
I read that kernel_constraint can help in this aspect by normalizing the values. But applying kernel_constraint does not guarantee that the diagonal entries are equal and that the off-diagonal entries are negatives of each other (in this case). In general, the two properties that need to be satisfied are that the determinant should be 1 and R^T R = I.
Is there any other way to achieve this?
You could define your own custom Keras layer, something along the lines of:
from tensorflow.keras.layers import Layer
import tensorflow as tf

class Rotate(Layer):
    def build(self, input_shape):
        sh = input_shape[-1]
        shape = [sh, sh]
        # Learnable weight matrix plus a single learnable diagonal value.
        self.w = self.add_weight(shape=shape, initializer='random_uniform')
        self.diag_w = self.add_weight(shape=(1,), initializer='random_uniform')

    def call(self, inputs, **kwargs):
        sh = self.w.shape[0]
        # Keep only the lower-triangular part of w (including the diagonal).
        mask = tf.cast(tf.linalg.band_part(tf.ones([sh, sh]), -1, 0), tf.float32)
        w = mask * self.w
        # Upper-triangular entries become the negatives of the lower-triangular
        # ones; this also zeroes the diagonal.
        w = w - tf.transpose(w)
        # Put the same learnable weight on every diagonal entry. Composing the
        # kernel here in call (rather than in build) keeps it trainable.
        kernel = w + tf.linalg.diag(tf.ones(sh)) * self.diag_w
        return tf.matmul(inputs, kernel)
Note that the effective weight matrix kernel has this structure: [[D, -L], [L, D]]
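A minimal usage sketch for the layer above (the input values are made up, just to show the wiring):

import tensorflow as tf

layer = Rotate()
x = tf.constant([[1.0, 0.0]])
y = layer(x)  # builds the 2x2 kernel on first call and applies it
print(y.shape)                       # (1, 2)
print(len(layer.trainable_weights))  # 2: the matrix w and the shared diagonal weight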
I can use torch.sparse.mm() or torch.spmm() to do multiplication between sparse matrix and dense matrix directly, but which function should I choose to do element-wise multiplication?
You can implement this multiplication yourself:
import torch

def sparse_dense_mul(s, d):
    i = s._indices()
    v = s._values()
    # gather the entries of the dense matrix at the sparse matrix's nonzero positions
    dv = d[i[0, :], i[1, :]]
    return torch.sparse.FloatTensor(i, v * dv, s.size())
Note that, thanks to the linearity of the multiplication operation, you do not need to worry about whether s is coalesced or not.
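A quick usage sketch with a small hand-built sparse matrix:

import torch

i = torch.tensor([[0, 1], [1, 0]])  # row and column indices of the nonzeros
v = torch.tensor([2.0, 3.0])        # their values
s = torch.sparse.FloatTensor(i, v, torch.Size([2, 2]))
d = torch.tensor([[10.0, 20.0], [30.0, 40.0]])

print(sparse_dense_mul(s, d).to_dense())
# tensor([[ 0., 40.],
#         [90.,  0.]])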
I have two functions that are supposed to produce equal results: f1(x, theta) = f2(x, theta).
Given input x, I need to find the parameters theta that make this equality hold as well as possible.
Initially I was thinking of using squared loss, minimizing (f1(x, theta) - f2(x, theta))^2 and solving via SGD.
However, I was thinking of making the loss more robust by using the Huber (or absolute) loss of the difference.
The Huber loss is a piecewise function (i.e., it is quadratic near zero and becomes linear beyond a threshold).
How can I take the gradient of my Huber loss in Theano?
A pretty simple implementation of the Huber loss in Theano can be found here.
Here is a code snippet:
import theano.tensor as T

delta = 0.1

def huber(target, output):
    d = target - output
    a = .5 * d ** 2                    # quadratic branch, for |d| <= delta
    b = delta * (abs(d) - delta / 2.)  # linear branch, for |d| > delta
    l = T.switch(abs(d) <= delta, a, b)
    return l.sum()
The function huber returns a symbolic representation of the loss, which you can then plug into theano.tensor.grad to get the gradient and minimize using SGD.
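A minimal sketch of that last step, using the huber function above; theta here is a hypothetical shared parameter vector, and f1/f2 are stand-ins for your real expressions:

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')
theta = theano.shared(np.zeros(3), name='theta')

# stand-ins for the real f1(x, theta) and f2(x, theta)
f1 = (x * theta).sum()
f2 = (x ** 2).sum()

loss = huber(f1, f2)
grad = T.grad(loss, wrt=theta)  # symbolic gradient of the Huber loss
sgd_step = theano.function([x], loss, updates=[(theta, theta - 0.01 * grad)])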