I am trying to test a new kernel method in Kernel Ridge Regression by implementing the Fastfood transformation (https://arxiv.org/abs/1408.3060). I can write a function which computes this transform, but it isn't playing nicely with the kernel ridge regression function in sklearn. As a result I have gone to the source code for sklearn's kernel ridge regression (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py) and kernel_approximation.py (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_approximation.py) in order to try to define this new kernel as a class definition in kernel_approximation.py. The problem is that I have no idea how to convert my construction into something that will work with the kernel_approximation and KernelRidge code. Would anybody be able to advise how best to do this, please?
My construction for the fastfood transform is:
def fastfood_product(d):
    '''
    Constructs the fastfood matrix composition V = const*S*H*G*Pi*H*B where
        S is a scaling matrix
        H is a Hadamard transform
        G is a diagonal random Gaussian matrix
        Pi is a permutation matrix
        B is a diagonal Rademacher matrix.
    Input: d - dimensionality of the feature vectors for the kernel;
        must be a power of two. If not, the data can be padded with zeros,
        but for simplicity assume this condition is always met.
    Output: V'''
    S = np.zeros(shape=(d, d))
    G = np.zeros_like(S)
    B = np.zeros_like(S)
    H = hadamard(d)
    Pi = np.eye(d)
    np.random.shuffle(Pi)  # permutation matrix
    # Construct the simple matrices
    np.fill_diagonal(B, 2*np.random.randint(low=0, high=2, size=(d, 1)).flatten() - 1)
    np.fill_diagonal(G, np.random.randn(G.shape[0], 1))  # may want to change the standard normal to something else, which will affect the scaling for V
    np.fill_diagonal(S, np.linalg.norm(G, 'fro')**(-0.5))
    # print('Shapes of B {}, S {}, G {}, H {}, Pi {}'.format(B.shape, S.shape, G.shape, H.shape, Pi.shape))
    V = d**(-0.5)*S.dot(H).dot(G).dot(Pi).dot(H).dot(B)
    return V
def fastfood_feature_map(X, n):
    '''Given a matrix X of data, compute the fastfood transformation and feature mapping.
    Inputs: X - data of dimension d by m; n - the number of nonlinear basis functions to use (a power of 2).
    Output: Phi - matrix of random features for the fastfood kernel approximation.
    Usage: Phi must be transposed for computation in the kernel ridge regression,
        i.e. solve ||Phi.T * w - b|| + regulariser.
    Comments: This only uses a standard normal distribution, but this could
        be altered with different hyperparameters.'''
    d, m = X.shape  # the data matrix is X
    V = fastfood_product(d)
    Phi = n**(-0.5)*np.exp(1j*np.dot(V, X))
    return Phi
I think the imports import numpy as np and from scipy.linalg import hadamard will be necessary for the above.
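To make the target concrete, here is a rough sketch of the kind of transformer interface I think is needed, modeled on how RBFSampler in kernel_approximation.py works: fit draws the random matrix V, and transform produces real-valued random features (cos/sin instead of the complex exponential) that could then be fed to a plain linear Ridge model instead of KernelRidge. The class name and details are my own guesses rather than scikit-learn's API, which is exactly what I'd like checked:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class FastfoodSampler(BaseEstimator, TransformerMixin):
    '''Sketch of a fastfood feature map as a scikit-learn style transformer.
    Assumes X has shape (n_samples, n_features) with n_features a power of two.'''
    def fit(self, X, y=None):
        d = X.shape[1]
        self.V_ = fastfood_product(d)  # random projection, fixed once fitted
        return self

    def transform(self, X):
        Z = X @ self.V_.T  # project each sample through V
        d = self.V_.shape[0]
        # real-valued random Fourier features so that a linear model can use them
        return d**(-0.5)*np.hstack([np.cos(Z), np.sin(Z)])
The idea would then be to use something like make_pipeline(FastfoodSampler(), Ridge()) in place of KernelRidge with a custom kernel, the way the scikit-learn examples pair RBFSampler with a linear model, but I don't know whether that is the intended way to plug a new approximate kernel in.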
I would like to project a tensor into a space with an additional dimension.
I tried
torch.nn.Linear(
in_features=num_inputs,
out_features=(num_inputs, num_additional),
)
But this results in an error
A workaround would be to
torch.nn.Linear(
in_features=num_inputs,
out_features=num_inputs*num_additional,
)
and then change the view of the output
output.view(batch_size, num_inputs, num_additional)
But I imagine this workaround will get tricky to read, especially when a projection into more than one additional dimension is desired.
Is there a more direct way to code this operation?
Perhaps the source code for linear can be changed
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
to accept more dimensions for the weight and bias initialization; F.linear seems like it would need to be replaced with a different function.
IMO the workaround you provided is already clear enough. However, if you want to express this as a single operation, you can always write your own module by subclassing torch.nn.Linear:
import numpy as np
import torch

class MultiDimLinear(torch.nn.Linear):
    def __init__(self, in_features, out_shape, **kwargs):
        self.out_shape = out_shape
        out_features = np.prod(out_shape)
        super().__init__(in_features, out_features, **kwargs)

    def forward(self, x):
        out = super().forward(x)
        return out.reshape((len(x), *self.out_shape))

if __name__ == '__main__':
    tmp = torch.empty((32, 10))
    linear = MultiDimLinear(in_features=10, out_shape=(10, 10))
    out = linear(tmp)
    print(out.shape)  # (32, 10, 10)
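For the case of more than one additional dimension mentioned in the question, the same class works unchanged, since out_shape can be any tuple; a small illustrative check, assuming MultiDimLinear as defined above:
tmp = torch.empty((32, 10))
linear = MultiDimLinear(in_features=10, out_shape=(4, 5, 6))
print(linear(tmp).shape)  # torch.Size([32, 4, 5, 6])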
Another way would be to use torch.einsum
https://pytorch.org/docs/stable/generated/torch.einsum.html
torch.einsum can prevent summation across chosen dimensions in tensor-to-tensor multiplication operations. This can allow separate multiplication operations to happen in parallel. [I do not know if this would necessarily result in GPU efficiency if the operations still occur in the same kernel; in fact, it may be slower: https://github.com/pytorch/pytorch/issues/32591]
How this would work is to directly initialize the weight and bias tensors yourself (look at the source code for the torch linear layer to see how that is done).
Say that the input (X) has dimensions (a, b), where a is the batch size.
Say that you want to pass this input through a series of classifiers, represented as a single weight tensor (W) with dimensions (c, d, e), where c is the number of classifiers and e is the number of classes per classifier.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 6)  # 5 classifiers, contracting dimension 4, 6 classes each
torch.einsum('ab, cbe -> ace', x, w)   # result has shape (2, 5, 6)
In the last line, a and b are the dimensions of the input, as mentioned above. The slightly tricky part is that c, b and e are the dimensions of the classifier weight tensor; I used b instead of d because the vector multiplication happens along that dimension for both the input tensor and the weight tensor. That is why the left side of the einsum equation is ab, cbe. The right side of the einsum equation simply states which dimensions to keep, i.e. which to exclude from summation.
The final dimensions we want are (a, c, e): a is the batch size, c is the number of classifiers, and e is the number of classes for each classifier. We do not want to sum across those values, so to preserve their separation, the right side of the equation is ace.
For those unfamiliar with einsum, this will be harder to read than the workaround above (though I highly recommend learning it, because it becomes easy and intuitive quickly, even though it's a bit tricky at first: https://www.youtube.com/watch?v=pkVwUVEHmfI).
However, for parallelizing certain operations (especially on GPU), it seems that einsum is the only way to do it. For example, say that in my previous example I didn't want to use a classification head yet and just wanted to project to multiple dimensions.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*4).view(5, 4, 4)  # 5 projections, contracting dimension 4, output dimension 4
y = torch.einsum('ab, cbe -> ace', x, w)
And say I do a few other operations to y, perhaps some nonlinear operations, activations, etc.
z = f(y)
z will still have the dimensions (2, 5, 4): batch size 2, 5 hidden states per batch, and the dimension of those hidden states is 4.
And then I want to apply a classifier to each separate tensor.
w2 = torch.arange(4*2).view(4, 2)
final = torch.einsum('fgh, hj -> fgj', z, w2)
Quick refresher: 2 is the batch size, 5 is the number of classifiers, and 2 is the number of outputs for each classifier.
The output dimensions f, g, j (2, 5, 2) are not summed across and are thus preserved in the output.
As noted in the GitHub issue linked above, this may be slower than just using regular linear layers; any efficiency gains would come from running a very large number of parallel operations.
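For completeness, here is a rough sketch of what "directly initialize the weight and bias tensors" could look like wrapped in a module, using the same einsum as above; the class name and the exact initialization are illustrative choices, not the only way to do it:
import math
import torch

class EinsumProjection(torch.nn.Module):
    # Projects (batch, in_features) to (batch, num_heads, out_features) with a
    # single einsum; one "head" per parallel classifier/projection.
    def __init__(self, in_features, num_heads, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(num_heads, in_features, out_features))
        self.bias = torch.nn.Parameter(torch.zeros(num_heads, out_features))
        # same initializer call that nn.Linear uses for its 2D weight
        torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, x):
        # 'ab, cbe -> ace': contract over b (in_features); keep batch a,
        # heads c and outputs e separate, then add a per-head bias.
        return torch.einsum('ab, cbe -> ace', x, self.weight) + self.bias

proj = EinsumProjection(in_features=4, num_heads=5, out_features=6)
print(proj(torch.randn(2, 4)).shape)  # torch.Size([2, 5, 6])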
In the EM algorithm for Gaussian Mixture Models we go through the steps above: (9.23) is the E step and (9.24), (9.25) and (9.26) make up the M step.
However, implementing (9.25) always gave $\Sigma_k$ matrices that were singular. No matter which mean or sigma I chose, the algorithm would produce a singular $\Sigma_k$ after the first iteration.
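For reference, the M-step updates being implemented (presumably Bishop's (9.24)-(9.26), restated here from memory, with $N_k = \sum_{n} \gamma(z_{nk})$) are:
$$\mu_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})\,x_n,\qquad \Sigma_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})\,(x_n-\mu_k^{\text{new}})(x_n-\mu_k^{\text{new}})^{\mathsf T},\qquad \pi_k^{\text{new}} = \frac{N_k}{N}.$$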
My implementation of the M step:
for k in range(K):
    mu[:, k] = np.sum(X*gamma[:, k][:, None], axis=0)/N[k]
for k in range(K):
    A = X - mu[:, k]
    SigmaK = A.T @ (A*gamma[:, k][:, None])  # weighted scatter matrix for component k
    SigmaK /= N[k]
    sigma[k] = SigmaK
pi = N*(1/N.sum())
Is there something particularly wrong with this implementation that would cause covariance to always be singular?
Suppose I have a 2D numpy array:
X = np.array([
    [..., ...],
    [..., ...]])
And I want to standardize the data either with:
X = StandardScaler().fit_transform(X)
or:
X = (X - X.mean())/X.std()
The results are different. Why are they different?
Assuming X is a feature matrix of shape (n x m) (n instances and m features). We want to scale each feature so its instances are distributed with a mean of zero and with unit variance.
To do this you need to calculate the mean and standard deviation of each feature over the provided instances (each column of X) and then scale the feature vectors. Currently you are calculating the mean and standard deviation of the whole dataset and scaling the data with these values: this will give you meaningless results in all but a few special cases (e.g., X = np.ones((100, 2)) is such a special case).
Practically, to calculate these statistics for each feature you need to set the axis parameter of the .mean() or .std() methods to 0. This performs the calculations along the columns and returns a (1 x m) shaped array (actually a (m,) array, but that's another story), where each value is the mean or standard deviation of the given column. You can then use numpy broadcasting to correctly scale the feature vectors.
The below example shows how you can correctly implement it manually. x1 and x2 are 2 features with 100 training instances. We store them in a feature matrix X.
import numpy as np
from sklearn.preprocessing import StandardScaler

x1 = np.linspace(0, 100, 100)
x2 = 10 * np.random.normal(size=100)
X = np.c_[x1, x2]

# scale the data using the sklearn implementation
X_scaled = StandardScaler().fit_transform(X)

# scale the data taking mean and std along columns
X_scaled_manual = (X - X.mean(axis=0)) / X.std(axis=0)
If you print the two you will see that they match exactly:
print(np.sum(X_scaled-X_scaled_manual))
returns 0.0.
I've just started to use Python for scientific plotting, to plot numerical solutions of differential equations. I know how to use modules to solve and plot single differential equations, but I have no idea how to handle systems of them. How can I plot the following coupled system?
My system of differential equations is:
dw/dx = y
dy/dx = -a - 3*H*y
dz/dx = -H*(1+z)
where a = 0.1 and H = sqrt((1+z)**3 + w + u**2/(2*a))
And my code is:
import numpy as N
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def model(w,y,z,x,H):
    dwdx=y
    dydx=-a-3*H*y
    dzdx=-H*(1+z)
    a=0.1
    H=sqrt((1+z)**3+w+u**2/(2*a))
    return [w,y,z,H]
z0=1100 #initial condition
w0=-2.26e-8
y0=-.38e-4
H0=36532.63
b=0
c=10000
x=N.arange(b,c,0.01)
y=odeint(model,y0,x) #f=Function name that returns derivative values at requested y and t values as dydt = f(y,t)
w=odeint(model,w0,x)
z=odeint(model,z0,x)
plt.plot(w,x)
plt.plot(y,x)
plt.plot(z,x)
plt.legend(loc='best')
plt.show()
General purpose ODE integrators expect the dynamical system reduced to an abstract first order system. Such a system has a state vector space and the differential equation provides velocity vectors for that space. Here the state has 3 scalar components, which gives a 3D vector as state. If you want to use the components separately, the first step in the ODE function is to extract these components from the state vector, and the last step is to compose the return vector from the derivatives of the components in the correct order.
Also, you need to arrange the computation steps in order of dependence
def model(u,t):
    w, y, z = u
    a=0.1
    # note: u is the full state vector here; the u**2/(2*a) term is copied
    # verbatim from the question and probably needs to refer to a single component
    H=N.sqrt((1+z)**3+w+u**2/(2*a))
    dwdx=y
    dydx=-a-3*H*y
    dzdx=-H*(1+z)
    return [dwdx, dydx, dzdx]
and then call the integrator once with the combined initial state
u0 = [ w0, y0, z0]
u = odeint(model, u0, x)
w,y,z = u.T
Please also check the arguments of the plot function, the general scheme is plot(x,y).
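Putting it together, a minimal sketch of the corrected call-and-plot sequence, assuming the model function, the grid x and the initial conditions w0, y0, z0 from above:
u0 = [w0, y0, z0]
u = odeint(model, u0, x)
w, y, z = u.T  # unpack the solution columns

plt.plot(x, w, label='w')
plt.plot(x, y, label='y')
plt.plot(x, z, label='z')
plt.legend(loc='best')
plt.show()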
I have the problem that I need to enforce rank 2 on n 3x3 matrices, and I don't want to use loops for it.
So I computed the SVD of all the matrices and set the smallest singular value to 0.
import numpy as np
from numpy import linalg as la
[u, s, vT] = la.svd(A)
s[:,2] = 0
So when I just had one matrix I would do the following:
sD = np.diag(s) # no idea how to solve this for n matrices
Ar2 = u.dot(sD).dot(vT) # doing the dot products as two dot products
# using np.einsum for n matrices
OK, so my problem is to build the diagonal matrices from the (n,3) array s. I tried to use np.diag after some reshapes, but I don't think that function can handle the two dimensions of the s array. A loop would be a possible solution, but it is too slow. So, what is the cleanest and quickest way to build my s matrices into diagonal form and to calculate the two dot products with the given information?
# dimensions of arrays:
# A -> (n,3,3)
# u -> (n,3,3)
# s -> (n,3)
# vT -> (n,3,3)
As you suspected, you can use einsum:
Ar2 = np.einsum('ijk,ik,ikl->ijl', u, s, vT)
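A quick check on random test data (just for illustration): after zeroing the smallest singular value, the einsum reconstruction gives matrices of rank 2:
import numpy as np
from numpy import linalg as la

n = 5
A = np.random.randn(n, 3, 3)
u, s, vT = la.svd(A)         # batched SVD: u, vT are (n,3,3), s is (n,3)
s[:, 2] = 0                  # zero the smallest singular value of each matrix
Ar2 = np.einsum('ijk,ik,ikl->ijl', u, s, vT)
print(la.matrix_rank(Ar2))   # array of 2s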