Rebuild n SV-decomposed matrices without loops

I need to enforce rank 2 on n 3x3 matrices, and I don't want to use loops for it.
So I computed the SVD of all matrices at once and set the smallest singular value to 0:
import numpy as np
from numpy import linalg as la
u, s, vT = la.svd(A)
s[:,2] = 0
When I had just one matrix, I would do the following:
sD = np.diag(s) # no idea how to solve this for n matrices
Ar2 = u.dot(sD).dot(vT) # doing the reconstruction as two dot products
# using np.einsum for n matrices?
So my problem is building the diagonal matrices from an (n,3) array. I tried np.diag after some reshapes, but I don't think that function can handle the two dimensions of the s array. A loop would be a possible solution, but it is too slow. What is the cleanest and quickest way to build my s matrices into diagonal form, or alternatively to compute the two dot products directly from the given arrays?
# dimensions of arrays:
# A -> (n,3,3)
# u -> (n,3,3)
# s -> (n,3)
# vT -> (n,3,3)

As you suspected, you can use einsum:
Ar2 = np.einsum('ijk,ik,ikl->ijl', u, s, vT)
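If you prefer to avoid einsum, the same reconstruction can also be written with broadcasting and batched matrix multiplication. A minimal sketch under the stated shapes (the random A here is just a stand-in for your data):
import numpy as np
from numpy import linalg as la

n = 5
A = np.random.rand(n, 3, 3)        # assumed (n, 3, 3) input
u, s, vT = la.svd(A)
s[:, 2] = 0                        # drop the smallest singular value

# Scale the columns of each u by the corresponding singular values,
# then batch-multiply by vT; equivalent to u @ diag(s) @ vT per matrix.
Ar2 = (u * s[:, None, :]) @ vT

print(np.linalg.matrix_rank(Ar2))  # -> array of 2s, one per matrix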

Related

How to get a 2D output from linear layer in pytorch?

I would like to project a tensor into a space with an additional dimension.
I tried
torch.nn.Linear(
    in_features=num_inputs,
    out_features=(num_inputs, num_additional),
)
But this results in an error.
A workaround would be to use
torch.nn.Linear(
    in_features=num_inputs,
    out_features=num_inputs*num_additional,
)
and then reshape the output with a view:
output.view(batch_size, num_inputs, num_additional)
But I imagine this workaround will get tricky to read, especially when a projection into more than one additional dimension is desired.
Is there a more direct way to code this operation?
Perhaps the source code for Linear could be changed
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
to accept more dimensions for the weight and bias initialization, and F.linear would then need to be replaced with a different function.
IMO the workaround you provided is already clear enough. However, if you want to express this as a single operation, you can always write your own module by subclassing torch.nn.Linear:
import numpy as np
import torch

class MultiDimLinear(torch.nn.Linear):
    def __init__(self, in_features, out_shape, **kwargs):
        self.out_shape = out_shape
        out_features = np.prod(out_shape)
        super().__init__(in_features, out_features, **kwargs)

    def forward(self, x):
        out = super().forward(x)
        return out.reshape((len(x), *self.out_shape))

if __name__ == '__main__':
    tmp = torch.empty((32, 10))
    linear = MultiDimLinear(in_features=10, out_shape=(10, 10))
    out = linear(tmp)
    print(out.shape)  # (32, 10, 10)
Another way would be to use torch.einsum
https://pytorch.org/docs/stable/generated/torch.einsum.html
torch.einsum can prevent summation across dimensions in tensor-to-tensor multiplication operations, which allows separate multiplication operations to happen in parallel. (I do not know whether this necessarily results in GPU efficiency if the operations still occur in the same kernel; in fact, it may be slower: https://github.com/pytorch/pytorch/issues/32591 )
How this would work is to directly initialize the weight and bias tensors yourself (look at the source code for the torch Linear layer for that).
Say that the input (X) has dimensions (a, b), where a is the batch size.
Say that you want to pass this input through a series of classifiers, represented by a single weight tensor (W) with dimensions (c, d, e), where c is the number of classifiers and e is the number of classes per classifier.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 6)
torch.einsum('ab, cbe -> ace', x, w)
In the last line, a and b are the dimensions of the input, as mentioned above. The potentially tricky part is that c, b, and e are the dimensions of the classifier weight tensor; I used b instead of d because the vector multiplication happens along that dimension for both the input tensor and the weight tensor. That's why the left side of the einsum equation is ab, cbe. The right side of the einsum equation simply lists the dimensions that should not be summed away.
The final dimensions we want are (a, c, e): a is the batch size, c is the number of classifiers, and e is the number of classes for each classifier. We do not want to sum over those dimensions, so to preserve their separation the output side of the equation is ace.
For those unfamiliar with einsum, this will be harder to read than the workaround above (though I highly recommend learning it, because it becomes easy and intuitive quickly, even if it's a bit tricky at first: https://www.youtube.com/watch?v=pkVwUVEHmfI ).
However, for parallelizing certain operations (especially on GPU), it seems that einsum is the only way to do it. For example, suppose that in my previous example I didn't want to use a classification head yet, but just wanted to project to multiple dimensions.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*4).view(5, 4, 4)
y = torch.einsum('ab, cbe -> ace', x, w)
And say I do a few other operations to y, perhaps some non linear operations, activations, etc.
z = f(y)
z will still have the dimensions 2, 5, 4. Batch size two, 5 hidden states per batch, and the dimension of those hidden states are 4.
And then I want to apply a classifier to each separate tensor.
w2 = torch.arange(4*2).view(4, 2)
final = torch.einsum('fgh, hj -> fgj', z, w2)
Quick refresher: 2 is the batch size, 5 is the number of classifiers, and 2 is the number of outputs for each classifier.
The output dimensions f, g, j (2, 5, 2) are not summed across and are therefore preserved in the output.
As noted in the GitHub issue linked above, this may be slower than just using regular linear layers, although it may pay off when a very large number of operations can run in parallel.
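Putting the two einsum steps together, here is a minimal end-to-end sketch; the random shapes and the ReLU standing in for the unspecified f are my own assumptions, not part of the question:
import torch

x = torch.rand(2, 4)                          # (batch, features)
w = torch.rand(5, 4, 4)                       # (heads, features, hidden)
y = torch.einsum('ab, cbe -> ace', x, w)      # (2, 5, 4): one projection per head

z = torch.relu(y)                             # stand-in for the "other operations" f

w2 = torch.rand(4, 2)                         # shared classifier: (hidden, classes)
final = torch.einsum('fgh, hj -> fgj', z, w2)
print(final.shape)                            # torch.Size([2, 5, 2])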

Multiply every element of matrix with a vector to obtain a matrix whose elements are vectors themselves

I need help in speeding up the following block of code:
import numpy as np
x = 100
pp = np.zeros((x, x))
M = np.ones((x,x))
arrayA = np.random.uniform(0,5,2000)
arrayB = np.random.uniform(0,5,2000)
for i in range(x):
    for j in range(x):
        y = np.multiply(arrayA, np.exp(-1j*(M[j,i])*arrayB))
        p = np.trapz(y, arrayB)  # numerical integration of y over arrayB
        pp[j,i] = abs(p**2)
Is there a function in numpy, or another method, that lets me rewrite this piece of code so that the nested for-loops can be omitted? My idea would be a function that multiplies every element of M with the vector arrayB, so we get a 100 x 100 matrix in which each element is itself a vector. Then each of those vectors gets multiplied elementwise by arrayA (the np.multiply step), again giving a 100 x 100 matrix in which each element is a vector. Finally, numerical integration with np.trapz over each of those vectors yields a 100 x 100 matrix in which each element is a scalar.
My problem is that I don't know which functions would perform this.
Thanks in advance for your help!
Edit:
Using broadcasting with
M = np.asarray(M)[..., None]
y = 1000*arrayA*np.exp(-1j*M*arrayB)
return np.trapz(y, arrayB)
works and I can omit the for-loops. However, this is not faster but instead a little bit slower in my case. This might be a memory issue.
y = np.multiply(arrayA, np.exp(-1j*(M[j,i])*arrayB))
can be written as
y = arrayA * np.exp(-1j*M[:,:,None]*arrayB)
producing a (x, x, 2000) array.
But the next step may need adjustment; I'm not familiar with np.trapz.
np.trapz(y, arrayB)
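For completeness, a minimal fully vectorized sketch of the whole loop, using the same array names as in the question; np.trapz accepts an axis argument, so the integration can be done along the broadcast axis in one call:
import numpy as np

x = 100
M = np.ones((x, x))
arrayA = np.random.uniform(0, 5, 2000)
arrayB = np.random.uniform(0, 5, 2000)

# Broadcast (x, x, 1) against (2000,) to get a (x, x, 2000) array
y = arrayA * np.exp(-1j * M[:, :, None] * arrayB)

# Integrate along the last axis, then take abs(p**2) elementwise, as in the loop
pp = np.abs(np.trapz(y, arrayB, axis=-1) ** 2)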

Retrieve elements from a 3D tensor with a 2D index tensor

I am playing around with GPT2 and I have 2 tensors:
O: An output tensor of shape (B, S-1, V), where B is the batch size, S is the number of timesteps and V is the vocabulary size. This is the output of a generative model and is softmaxed along the last dimension.
L: A 2D tensor of shape (B, S-1), where each element is the index of the correct token for each timestep for each sample. These are basically the labels.
I want to extract the predicted probability of the corresponding correct token from tensor O based on tensor L, so that I end up with a 2D tensor of shape (B, S-1). Is there an efficient way of doing this apart from using loops?
For reference, I based my answer on this Medium article.
Essentially, your answer lies in torch.gather, assuming that both of your tensors are just regular torch.Tensors (or can be converted to one).
import torch
# Specify some arbitrary dimensions for now
B = 3
V = 6
S = 4
# Make example reproducible
torch.manual_seed(42)
# L necessarily has to be a torch.LongTensor, otherwise indexing will fail.
L = torch.randint(0, V, size=[B, S])
O = torch.rand([B, S, V])
# Now collect the results. L needs to have similar dimension,
# except in the axis you want to collect along.
X = torch.gather(O, dim=2, index=L.unsqueeze(dim=2))
# Make sure X has no "unnecessary" dimension
X = X.squeeze(dim=2)
It is a bit difficult to see at a glance whether this produces exactly the right results, which is why I included a random seed to make the example deterministic, so you can easily verify that it gets you the desired result. For clarity, one could also work through a lower-dimensional example, where it becomes more obvious what exactly torch.gather does.
Note that torch.gather also lets you gather multiple indices along the same row. So if you instead had a multi-label example in which several values are correct, you could similarly use a tensor L of shape [B, S, number_of_correct_samples].
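To convince yourself that the gathered values are the right ones, here is a small sanity check against an explicit loop; it is my own addition and reuses B, S, O, L and X from the snippet above:
# Naive reference implementation with loops
X_ref = torch.empty(B, S)
for b in range(B):
    for s in range(S):
        X_ref[b, s] = O[b, s, L[b, s]]

# The gather-based result matches the loop exactly
assert torch.equal(X, X_ref)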

How to filter unwanted values in arrays for plotting? ValueError in matplotlib using numpy arrays

I am working on a new routine inside some codes based on OOP, and encountered a problem while modifying the array of the data (short example of the code is below).
Basically, this routine takes the array R, transposes it, sorts it, and then filters out the data below the pre-determined value thres. Then I re-transpose this array back into its original dimensions and plot each of its rows against the first element of T.
import numpy as np
import matplotlib.pyplot as plt
R = np.random.rand(3,8)
R = R.transpose() # transpose the random matrix
R = R[R[:,0].argsort()] # sort this matrix
print(R)
T = ([i for i in np.arange(1,9,1.0)],"temps (min)")
thres = float(input("Define the threshold of coherence: "))
if thres >= 0.0 and thres <= 1.0 :
    R = R[R[:, 0] >= thres] # how to filter unwanted values? changing to NaN / zeros ?
else :
    print("The coherence value is absurd or you're not giving a number!")
print("The final results are ")
print(R)
print(R.transpose())
R.transpose() # re-transpose this matrix
ax = plt.subplot2grid( (4,1),(0,0) )
ax.plot(T[0],R[0])
ax.set_ylabel('Coherence')
ax = plt.subplot2grid( (4,1),(1,0) )
ax.plot(T[0],R[1],'.')
ax.set_ylabel('Back-azimuth')
ax = plt.subplot2grid( (4,1),(2,0) )
ax.plot(T[0],R[2],'.')
ax.set_ylabel('Velocity\nkm/s')
ax.set_xlabel('Time (min)')
However, I encounter an error:
ValueError: x and y must have same first dimension, but have shapes (8,) and (3,)
I commented the part where I think the problem might reside (how to filter unwanted values?), but the question remains.
How can I plot these two arrays (R and T) while still being able to filter out unwanted values below thres? Can I transform these unwanted values to zero or NaN and then successfully plot them? If yes, how can I do that?
Your help would be much appreciated.
With the help of a techie friend, the problem is simply resolved by keeping this part
R = R[R[:, 0] >= thres]
because removing unwanted elements is preferable to changing them to NaN or zero. The plotting problem is then fixed by a slight modification in this part
ax.plot(T[0][:len(R[0])],R[0])
and likewise for the subsequent plotting calls. This slices T down to the same length as R.
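If you do want to keep the array shapes aligned instead of dropping rows (the NaN option raised in the question), here is a minimal sketch of that alternative; it assumes the sorted but unfiltered (8, 3) R from above and an existing subplot ax, and relies on matplotlib simply skipping NaN points:
import numpy as np

R_masked = R.copy()                        # R is already float, so NaN can be assigned
R_masked[R_masked[:, 0] < thres] = np.nan  # whole rows below the threshold become NaN

R_masked = R_masked.transpose()            # back to shape (3, 8), matching T[0]
ax.plot(T[0], R_masked[0], '.')            # NaN entries are simply not drawn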

Defining new kernels to use in approximate_kernel.py

I am trying to test a new kernel method in kernel ridge regression, and want to do this by implementing the Fastfood transform (https://arxiv.org/abs/1408.3060). I can write a function which computes this transform, but it isn't playing nicely with the kernel ridge regression function in sklearn. As a result I have gone to the source code for sklearn's kernel ridge regression (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py) and kernel_approximation.py (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_approximation.py) in order to try to define this new kernel as a class in kernel_approximation.py. The problem is that I have no idea how to convert my construction into something that will work with the kernel approximation and KernelRidge code. Would anybody be able to advise how best to do this, please?
My construction for the fastfood transform is:
def fastfood_product(d):
    '''
    Constructs the fastfood matrix composition V = const*S*H*G*Pi*H*B where
    S is a scaling matrix,
    H is the Hadamard transform,
    G is a diagonal random Gaussian,
    Pi is a permutation matrix,
    B is a diagonal Rademacher matrix.
    Input: d - dimensionality of the feature vectors for the kernel.
        Must be a power of two; if not, the matrix can be padded with
        zeros, but for simplicity assume this condition is always met.
    Output: V'''
    S = np.zeros(shape=(d, d))
    G = np.zeros_like(S)
    B = np.zeros_like(S)
    H = hadamard(d)
    Pi = np.eye(d)
    np.random.shuffle(Pi)  # permutation matrix
    # Construct the simple matrices
    np.fill_diagonal(B, 2*np.random.randint(low=0, high=2, size=(d, 1)).flatten() - 1)
    np.fill_diagonal(G, np.random.randn(G.shape[0], 1))  # may want to change standard normal to arbitrary, which will affect the scaling for V
    np.fill_diagonal(S, np.linalg.norm(G, 'fro')**(-0.5))
    # print('Shapes of B {}, S {}, G {}, H {}, Pi {}'.format(B.shape, S.shape, G.shape, H.shape, Pi.shape))
    V = d**(-0.5)*S.dot(H).dot(G).dot(Pi).dot(H).dot(B)
    return V

def fastfood_feature_map(X, n):
    '''Given a matrix X of data, compute the fastfood transformation and feature mapping.
    Input: X - data of dimension d by m, n - the number of nonlinear basis functions to choose (power of 2)
    Output: Phi - matrix of random features for the fastfood kernel approximation.
    Usage: Phi must be transposed for computation in the kernel ridge regression,
        i.e. solve ||Phi.T * w - b|| + regulariser
    Comments: This only uses a standard normal distribution, but this could
        be altered with different hyperparameters.'''
    d, m = X.shape
    V = fastfood_product(d)
    Phi = n**(-0.5)*np.exp(1j*np.dot(V, X))
    return Phi
I think the imports numpy as np and from scipy.linalg import hadamard will be necessary for the above.
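One way to make a construction like this work with scikit-learn without editing its source is to wrap the feature map in a small transformer class and fit a plain linear model on the explicit features, in the spirit of the classes in kernel_approximation.py such as RBFSampler. Below is a rough sketch under that assumption; FastfoodSampler is a hypothetical name (not part of sklearn), it reuses fastfood_feature_map from above, and it glosses over details such as drawing the random matrices once in fit and handling the complex features properly:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

class FastfoodSampler(BaseEstimator, TransformerMixin):
    '''Hypothetical wrapper exposing the fastfood feature map through the
    fit/transform interface, so it can be combined with any linear model.'''
    def __init__(self, n_components=256):
        self.n_components = n_components

    def fit(self, X, y=None):
        # Nothing is learned here; a more careful version would draw the
        # random matrices once in fit and reuse them in transform.
        return self

    def transform(self, X):
        # fastfood_feature_map expects data as d x m, hence the transposes;
        # only the real part is kept, as a simplification, so that
        # downstream estimators receive real-valued features.
        Phi = fastfood_feature_map(X.T, self.n_components)
        return np.real(Phi).T

# Usage sketch: kernel ridge regression is approximated as random features
# followed by ordinary ridge regression on those features.
model = make_pipeline(FastfoodSampler(n_components=256), Ridge(alpha=1.0))
# model.fit(X_train, y_train); model.predict(X_test)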
