Calculating pairwise distances between entries in a `torch.tensor`


I'm trying to implement a manifold alignment type of loss illustrated here.
Given a tensor embs
tensor([[ 0.0178, 0.0004, -0.0217, ..., -0.0724, 0.0698, -0.0180],
[ 0.0160, 0.0002, -0.0217, ..., -0.0725, 0.0655, -0.0207],
[ 0.0155, -0.0010, -0.0153, ..., -0.0750, 0.0688, -0.0253],
...,
[ 0.0130, -0.0113, -0.0078, ..., -0.0805, 0.0634, -0.0241],
[ 0.0120, -0.0047, -0.0135, ..., -0.0846, 0.0722, -0.0230],
[ 0.0120, -0.0048, -0.0142, ..., -0.0843, 0.0734, -0.0246]],
grad_fn=<AddmmBackward0>)
of shape (256, 64), which is a batch of embeddings produced by a network, I want to compute all the pairwise distances between its rows. I've tried torch.nn.PairwiseDistance, but it is not clear to me whether it does what I'm looking for.

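I thought it was strange that there was no such function. There is one: it is called torch.cdist, but it is somewhat "hidden" at the top level of the torch namespace. torch.nn.functional.pairwise_distance, by contrast, computes the distance between corresponding entries of its two inputs rather than between all pairs of rows.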
>>> a = torch.rand((5,3))
>>> a
tensor([[0.0215, 0.0843, 0.3414],
[0.9878, 0.5835, 0.3052],
[0.0903, 0.7347, 0.0711],
[0.9774, 0.8202, 0.7721],
[0.7877, 0.9891, 0.4619]])
>>> torch.cdist(a,a)
tensor([[0.0000, 1.0883, 0.7077, 1.2809, 1.1918],
[1.0883, 0.0000, 0.9398, 0.5236, 0.4787],
[0.7077, 0.9398, 0.0000, 1.1339, 0.8390],
[1.2809, 0.5236, 1.1339, 0.0000, 0.4010],
[1.1918, 0.4787, 0.8390, 0.4010, 0.0000]])
>>> torch.nn.functional.pairwise_distance(a[0], a[2])
tensor(0.7077)
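Applied to the batch from the question, this could look like the sketch below. The random `embs` tensor is only a stand-in for the network output, and keeping just the upper triangle is an assumption about what the loss needs (each unordered pair once, without the zero diagonal).

```python
import torch

torch.manual_seed(0)
# Stand-in for the (256, 64) batch of embeddings produced by the network
embs = torch.randn(256, 64, requires_grad=True)

# dists[i, j] is the Euclidean distance between embs[i] and embs[j]
dists = torch.cdist(embs, embs, p=2)

# Keep each unordered pair once by indexing the strict upper triangle
i, j = torch.triu_indices(256, 256, offset=1)
pair_dists = dists[i, j]

print(dists.shape)       # full (256, 256) distance matrix
print(pair_dists.shape)  # one entry per unordered pair
```

The result stays attached to the autograd graph, so it can be used inside a loss term.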

Related

Concat two tensors with different dimensions

I have two tensors, a and b, with different numbers of dimensions: a has shape [100, 100] and b has shape [100, 3, 10]. I want to concatenate these two tensors.
For example:
a = torch.randn(100,100)
tensor([[ 1.3236, 2.4250, 1.1547, ..., -0.7024, 1.0758, 0.2841],
[ 1.6699, -1.2751, -0.0120, ..., -0.2290, 0.9522, -0.4066],
[-0.3429, -0.5260, -0.7748, ..., -0.5235, -1.8952, 1.2944],
...,
[-1.3465, 1.2641, 1.6785, ..., 0.5144, 1.7024, -1.0046],
[-0.7652, -1.2940, -0.6964, ..., 0.4661, -0.3998, -1.2428],
[-0.4720, -1.0981, -2.3715, ..., 1.6423, 0.0560, 1.0676]])
The tensor b is as follows:
tensor([[[ 0.4747, -1.9529, -0.0448, ..., -0.9694, 0.8009, -0.0610],
[ 0.5160, 0.0810, 0.1037, ..., -1.7519, -0.3439, 1.2651],
[-0.5975, -0.2000, -1.6451, ..., 1.3082, -0.4023, -0.3105]],
...,
[[ 0.4747, -1.9529, -0.0448, ..., -0.9694, 0.8009, -0.0610],
[ 0.1939, 1.0365, -0.0927, ..., -2.4948, -0.2278, -0.2390],
[-0.5975, -0.2000, -1.6451, ..., 1.3082, -0.4023, -0.3105]]],
dtype=torch.float64, grad_fn=<CopyBackwards>)
I want to concatenate them so that the first row of a, of size [100], is concatenated with the first row of b, of size [3, 10], and likewise for every other row. In other words, the output should have size [100, 130], and its first row should look as follows:
[ 1.3236, 2.4250, 1.1547, ..., -0.7024, 1.0758, 0.2841, 0.4747, -1.9529, -0.0448, ..., -0.9694, 0.8009, -0.0610, 0.5160, 0.0810, 0.1037, ..., -1.7519, -0.3439, 1.2651, -0.5975, -0.2000, -1.6451, ..., 1.3082, -0.4023, -0.3105]
To do this, I called unsqueeze on a so that both tensors have the same number of dimensions:
a = a.unsqueeze(1)
When I call torch.cat([a, b]), I still get an error. Can somebody help me solve this?
Thanks in advance.

Reshape b to two dimensions, flattening its last two dimensions into one, and then concatenate it with a (the original [100, 100] tensor, without the unsqueeze) along dim 1:
torch.cat((a, b.reshape(100, -1)), dim=1)
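A minimal runnable sketch of this, using random tensors as stand-ins for a and b (both with the default dtype here, unlike the float64 b in the question):

```python
import torch

a = torch.randn(100, 100)    # stand-in for a
b = torch.randn(100, 3, 10)  # stand-in for b

# Flatten each [3, 10] block of b into a row of 30 values
b_flat = b.reshape(100, -1)  # shape [100, 30]

# Concatenate along the column dimension
out = torch.cat((a, b_flat), dim=1)
print(out.shape)  # torch.Size([100, 130])
```

Each output row holds the 100 values from a followed by the three rows of the matching b block laid end to end, which is the layout described in the question.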

Why does LedoitWolf return zeros for off-diagonal elements?

import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf
a = np.array([[ 0.6278, -1.1273 ],
[ 0.2323, 0.4533 ],
[ 0.3234, 1.5356 ],
[1.7473 , -0.3113 ],
[-0.3525 , 0.2577 ]])
Empirical_cov = EmpiricalCovariance().fit(a)
Empirical_cov.covariance_
array([[ 0.48009421, -0.23144627],
[-0.23144627, 0.77341954]])
LedoitWolf_cov = LedoitWolf().fit(a)
LedoitWolf_cov.covariance_
array([[ 1., -0.],
[-0., 1.]])
Why is LedoitWolf giving me zeros for the off-diagonal elements? I have a few larger datasets in which this happens.
Is this a bug? The empirical covariance is non-zero, so shouldn't LedoitWolf also be non-zero? I read that using LedoitWolf would be slightly more accurate than empirical covariance, but what is the point if most covariance elements turn out to be zero?

PyTorch: why does torch.where not work like numpy.where?

To replace positive values with one number and negative values with another in a random array, one can do the following with NumPy:
npy_p = np.random.randn(4,6)
quant = np.where(npy_p>0, c_plus , np.where(npy_p<0, c_minus , npy_p))
However, the where method in PyTorch raises the following error:
expected scalar type double but found float
Can you help me with that?

I can't reproduce this error. It would help if you could share a specific example where it fails (the cause might be the values you use to fill the tensor):
import torch
x = torch.rand(4,6)
res = torch.where(x > 0.3, torch.tensor(0.), torch.where(x < 0.1, torch.tensor(-1.), x))
Here x has dtype float32 and is:
tensor([[0.1391, 0.4491, 0.2363, 0.3215, 0.7740, 0.4879],
[0.3051, 0.0870, 0.2869, 0.2575, 0.8825, 0.8201],
[0.4419, 0.1138, 0.0825, 0.9489, 0.1553, 0.6505],
[0.8376, 0.7639, 0.9291, 0.0865, 0.5984, 0.3953]])
And res is:
tensor([[ 0.1391, 0.0000, 0.2363, 0.0000, 0.0000, 0.0000],
[ 0.0000, -1.0000, 0.2869, 0.2575, 0.0000, 0.0000],
[ 0.0000, 0.1138, -1.0000, 0.0000, 0.1553, 0.0000],
[ 0.0000, 0.0000, 0.0000, -1.0000, 0.0000, 0.0000]])
The error arises because you mix data types in torch.where. If you explicitly give your constants the same dtype as the tensor, it works fine.
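The sketch below mirrors the NumPy snippet from the question, with the constants created in the tensor's own dtype. The values of `c_plus` and `c_minus` are hypothetical, since the question does not give them. A tensor built from `np.random.randn` is float64, whereas `torch.tensor(1.0)` defaults to float32, which is one way such a mismatch can come about.

```python
import numpy as np
import torch

c_plus, c_minus = 1.0, -1.0     # hypothetical fill values

npy_p = np.random.randn(4, 6)   # NumPy default dtype: float64
p = torch.from_numpy(npy_p)     # float64 tensor

# Build the constants with the same dtype as the tensor
plus = torch.tensor(c_plus, dtype=p.dtype)
minus = torch.tensor(c_minus, dtype=p.dtype)

quant = torch.where(p > 0, plus, torch.where(p < 0, minus, p))
print(quant.dtype)  # torch.float64
```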

Extract sub tensor in PyTorch

For this tensor in PyTorch,
tensor([[ 0.7646, 0.5573, 0.4000, 0.2188, 0.7646, 0.5052, 0.2042, 0.0896,
0.7667, 0.5938, 0.3167, 0.0917],
[ 0.4271, 0.1354, 0.5000, 0.1292, 0.4260, 0.1354, 0.4646, 0.0917,
-1.0000, -1.0000, -1.0000, -1.0000],
[ 0.7208, 0.5656, 0.3000, 0.1688, 0.7177, 0.5271, 0.1521, 0.0667,
0.7198, 0.5948, 0.2438, 0.0729],
[ 0.6292, 0.8250, 0.4000, 0.2292, 0.6271, 0.7698, 0.2083, 0.0812,
0.6281, 0.8604, 0.3604, 0.0917]], device='cuda:0')
How can I extract the following values into a new tensor?
0.7646, 0.5573, 0.4000, 0.2188
0.4271, 0.1354, 0.5000, 0.1292
That is, how do I get the first 4 values of the first two rows into a new tensor?

The question was already answered by @zihaozhihao in the comments, but in case you are wondering where the answer comes from, it helps to lay out your tensor like this:
x = torch.Tensor([
[ 0.7646, 0.5573, 0.4000, 0.2188, 0.7646, 0.5052, 0.2042, 0.0896, 0.7667, 0.5938, 0.3167, 0.0917],
[ 0.4271, 0.1354, 0.5000, 0.1292, 0.4260, 0.1354, 0.4646, 0.0917, -1.0000, -1.0000, -1.0000, -1.0000],
[ 0.7208, 0.5656, 0.3000, 0.1688, 0.7177, 0.5271, 0.1521, 0.0667, 0.7198, 0.5948, 0.2438, 0.0729],
[ 0.6292, 0.8250, 0.4000, 0.2292, 0.6271, 0.7698, 0.2083, 0.0812, 0.6281, 0.8604, 0.3604, 0.0917]
])
Now it is clearer that the tensor has shape (4, 12). You can think of it like an Excel sheet with 4 rows and 12 columns. What you want is the first 4 columns of the first two rows, so the solution is:
x[:2, :4]  # :2 takes rows 0 and 1, :4 takes columns 0 to 3; x[0:2, 0:4] gives the same result
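One detail worth knowing if a genuinely new tensor is needed: basic slicing returns a view that shares memory with the original, so `.clone()` makes an independent copy. A small sketch, using a simple `arange` tensor as a stand-in for the data in the question:

```python
import torch

# Stand-in (4, 12) tensor with values 0..47
x = torch.arange(48.0).reshape(4, 12)

view = x[:2, :4]          # shares memory with x
sub = x[:2, :4].clone()   # independent copy

x[0, 0] = 100.0           # modify the original
print(view[0, 0])         # the view sees the change
print(sub[0, 0])          # the clone keeps the old value
```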

Sklearn KNN + mahalanobis on python

I am trying to use the NearestNeighbors class in scikit-learn. I wrote an example to understand how it works.
from sklearn.neighbors import NearestNeighbors
samples = [[0.2, 0], [0.5, 0.1], [0.4,0.4]]
neigh = NearestNeighbors(n_neighbors=2,metric='mahalanobis')
neigh.fit(samples)
print(neigh.kneighbors([[272,7522752]])) # use any point to test
The code above works and correctly finds the 2 nearest points.
But when I use my own dataset, an error occurs. The dataset is a 9959 x 384 matrix, which I call training_data and print below:
[[ 0.069915 0.020142 0.070054 ..., 0.333937 0.477351 0.055993]
[ 0.131826 0.038203 0.131573 ..., 0.353589 0.426197 0.048557]
[ 0.130338 0.02595 0.130351 ..., 0.315951 0.32355 0.098884]
...,
[ 0.053331 0.023395 0.0534 ..., 0.366064 0.404756 0.066217]
[ 0.063554 0.021197 0.063671 ..., 0.235945 0.439595 0.105366]
[ 0.123632 0.045492 0.12322 ..., 0.308702 0.437344 0.040144]]
When I use training_data in the code above, changing only samples to training_data, I get this error:
LinAlgError: 0-dimensional array given. Array must be at least two-dimensional
Please help me solve this problem, thanks a lot!
