all pairwise dot product pytorch - pytorch

Is there a built in function to calculate efficiently all pairwaise dot products of two tensors in Pytorch?
e.g.
input - tensor A (shape NxD)
tensor B (shape NxD)
output - tensor C (shape NxN) such that C_i,j = torch.dot(A_i, B_j) ?

Isn't it simply
C = torch.mm(A, B.T) # same as C = A # B.T
BTW,
A very flexible tool for matrix/vector/tensor dot products is torch.einsum:
C = torch.einsum('id,jd->ij', A, B)

Related

Retrieve elements from a 3D tensor with a 2D index tensor

I am playing around with GPT2 and I have 2 tensors:
O: An output tensor of shaped (B, S-1, V) where B is the batch size S is the the number of timestep and V is the vocabulary size. This is the output of a generative model and is softmaxed along the 2nd dimension.
L: A 2D tensor shaped (B, S-1) where each element is the index of the correct token for each timestep for each sample. This is basically the labels.
I want to extract the predicted probability of the corresponding correct token from tensor O based on tensor L such that I will end up with a 2D tensor shaped (B, S). Is there an efficient way of doing this apart from using loops?
For reference, I based my answer on this Medium article.
Essentially, your answer lies in torch.gather, assuming that both of your tensors are just regular torch.Tensors (or can be converted to one).
import torch
# Specify some arbitrary dimensions for now
B = 3
V = 6
S = 4
# Make example reproducible
torch.manual_seed(42)
# L necessarily has to be a torch.LongTensor, otherwise indexing will fail.
L = torch.randint(0, V, size=[B, S])
O = torch.rand([B, S, V])
# Now collect the results. L needs to have similar dimension,
# except in the axis you want to collect along.
X = torch.gather(O, dim=2, index=L.unsqueeze(dim=2))
# Make sure X has no "unnecessary" dimension
X = X.squeeze(dim=2)
It is a bit difficult to see whether this produces the exact correct results, which is why I included a random seed which makes the example deterministic in the result, and you an easily verify that it gets you the desired results. However, for clarification, one could also use a lower-dimensional tensor, for which this becomes clearer what exactly torch.gather does.
Note that torch.gather also allows you to index multiple indexes in the same row theoretically. Meaning if you instead got a multiclass example for which multiple values are correct, you could similarly use a tensor L of shape [B, S, number_of_correct_samples].

How to train a parameter outside the model?

I am implementing the following architecture in Tensorflow 2.0 Dual Encoder LSTM
C and R are sentences encoded into a fixed dimension by the two LSTM's. Then they are passed through a function sigmoid(CMR). We can assume that R and C are both 256 dimensional matrices and M is a 256 * 256 matrix. The matrix M is learned during training.
Since I want to train M, I declared M = tf.Variable(shape,trainable = True).
But after fitting the model, the values of M are still not changing. How to tell tensorflow to compute the gradients for M automatically ? Below is my code.
Code

sklearn customized standarization of data

Suppose I have a 2D numpy array:
X = np.array[
[..., ...],
[..., ...]]
And I want to standardize the data either with:
X = StandardScaler().fit_transform(X)
or:
X = (X - X.mean())/X.std()
The results are different. Why are they different?
Assuming X is a feature matrix of shape (n x m) (n instances and m features). We want to scale each feature so its instances are distributed with a mean of zero and with unit variance.
To do this you need to calculate the mean and standard deviation of each feature for the provided instances (column of X) and then calculate the scaled feature vectors. Currently you are calculating the mean and standard deviation of the whole dataset and scaling the data using these values: this will give you meaningless results in all but a few special cases (i.e., X = np.ones((100,2)) is such a special case).
Practically, to calculate these statistics for each feature you will need to set the axis parameter of the .mean() or .std() methods to 0. This will perform the calculations along the columns and return a (1 x m) shaped array (actually a (m,) array, but thats another story), where each value is the mean or standard deviation for the given column. You can then use numpy broadcasting to correctly scale the feature vectors.
The below example shows how you can correctly implement it manually. x1 and x2 are 2 features with 100 training instances. We store them in a feature matrix X.
x1 = np.linspace(0, 100, 100)
x2 = 10 * np.random.normal(size=100)
X = np.c_[x1, x2]
# scale the data using the sklearn implementation
X_scaled = StandardScaler().fit_transform(X)
# scale the data taking mean and std along columns
X_scaled_manual = (X - X.mean(axis=0)) / X.std(axis=0)
If you print the two you will see they match exactly, explicitly:
print(np.sum(X_scaled-X_scaled_manual))
returns 0.0.

Defining new kernels to use in approximate_kernel.py

I am trying to test a new kernel method in Kernel Ridge Regression and want to do this by implementing the Fastfood transformation (https://arxiv.org/abs/1408.3060). I can write a function which computes this transform but it isn't playing nicely with the kernel ridge regression function in sklearn. As a result I have gone to the source code for sklearn kernel ridge regression (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py) and approximate_kernel.py (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_approximation.py) in order to try and define this new kernel as a class definition in approximate_kernel.py. The problem is that I have no idea how to convert my construction to something which will work in the approximate_kernel KernelRidge programs. Would anybody be able to advise how best to do this please?
My construction for the fastfood transform is:
def fastfood_product(d):
'''
Constructs the fastfood matrix composition V = const*S*H*G*Pi*B where
S is a scaling matrix
H is Hadamard transform
G is a diagonal random Gaussian
Pi is a permutation matrix
B is a diagonal Rademacher matrix.
Inputs: n - dimensionality of the feature vectors for the kernel.
must be a power of two and be divisible by d. If not then can
pad the matrix with zeros but for simplicity assume this condition
is always met.
Output: V'''
S = np.zeros(shape=(d,d))
G = np.zeros_like(S)
B = np.zeros_like(S)
H = hadamard(d)
Pi = np.eye(d)
np.random.shuffle(Pi) # Permutation matrix
# Construct the simple matrices
np.fill_diagonal(B, 2*np.random.randint(low=0,high=2,size=(d,1)).flatten() - 1)
np.fill_diagonal(G, np.random.randn(G.shape[0],1)) # May want to change standard normal to arbitrary which will affect the scaling for V
np.fill_diagonal(S, np.linalg.norm(G,'fro')**(-0.5))
#print('Shapes of B {}, S {}, G {}, H{}, Pi {}'.format(B.shape, S.shape, G.shape, H.shape, Pi.shape))
V = d**(-0.5)*S.dot(H).dot(G).dot(Pi).dot(H).dot(B)
return V
def fastfood_feature_map(X, n):
'''Given a matrix X of data compute the fastfood transformation and feature mapping.
Input: X data of dimension d by m, n = the number of nonlinear basis functions to choose (power of 2)
Outputs: Phi - matrix of random features for fastfood kernel approximation.
Usage: Phi must be transposed for computation in the kernel ridge regression.
i.e solve ||Phi.T * w - b || + regulariser
Comments: This only uses a standard normal distribution but this could
be altered with different hyperparameters.'''
d,m = A.shape
V = fastfood_product(d)
Phi = n**(-0.5)*np.exp(1j*np.dot(V, X))
return Phi
I think the imports numpy as np and from linalg import hadamard will be necessary for the above.

Creating square matrix with eigenvalues on the diagonal in theano

I would like to create a square matrix with the eigenvalues on the diagonal:
eigen_values, eigen_vectors = theano.tensor.nlinalg.eig(covariance_matrix)
D = T.nlinalg.AllocDiag(eigen_values)
However apparently theano does not treat the D matrix I created as a standard matrix, thus I cannot use it in the succeding computations.
theano.tensor.var.AsTensorError: ('Cannot convert <theano.tensor.nlinalg.AllocDiag object at 0x7face5708450> to TensorType', <class 'theano.tensor.nlinalg.AllocDiag'>)
You're using an operation class as if it were an operation function.
Instead of
D = T.nlinalg.AllocDiag(eigen_values)
try
D = T.nlinalg.AllocDiag()(eigen_values)
or
D = T.nlinalg.alloc_diag(eigen_values)

Resources