I would like to project a tensor into a space with an additional dimension.
I tried
torch.nn.Linear(
in_features=num_inputs,
out_features=(num_inputs, num_additional),
)
But this results in an error
A workaround would be to
torch.nn.Linear(
in_features=num_inputs,
out_features=num_inputs*num_additional,
)
and then change the view the output
output.view(batch_size, num_inputs, num_additional)
But I imagine this workaround will get tricky to read, especially when a projection into more than one additional dimension is desired.
Is there a more direct way to code this operation?
Perhaps the source code for linear can be changed
https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear
To accept more dimensions for the weight and bias initialization, and F.linear seems like it would need to be replaced with a different function.
IMO the workaround you provided is already clear enough. However, if you want to express this as a single operation, you can always write your own module by subclassing torch.nn.Linear:
import numpy as np
import torch
class MultiDimLinear(torch.nn.Linear):
def __init__(self, in_features, out_shape, **kwargs):
self.out_shape = out_shape
out_features = np.prod(out_shape)
super().__init__(in_features, out_features, **kwargs)
def forward(self, x):
out = super().forward(x)
return out.reshape((len(x), *self.out_shape))
if __name__ == '__main__':
tmp = torch.empty((32, 10))
linear = MultiDimLinear(in_features=10, out_shape=(10, 10))
out = linear(tmp)
print(out.shape) # (32, 10, 10)
Another way would be to use torch.einsum
https://pytorch.org/docs/stable/generated/torch.einsum.html
torch.einsum can prevent summation across dimensions in tensor to tensor multiplication operations. This can allow separate multiplication operations to happen in parallel. [ I do not know if this would necessarily result in GPU efficiency; if the operations are still occurring in the same kernel. In fact, it may be slower https://github.com/pytorch/pytorch/issues/32591 ]
How this would work is to directly initialize the weight and bias tensors (look at source code for the torch linear layer for that code)
Say that the input (X) has dimensions (a, b), where a is the batch size.
Say that you want to pass this input through a series of classifiers, represented in a single weight tensor (W) with dimensions (c, d, e), where c is the number of classifiers, and e is the number of classes for the classifier
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 2)
torch.einsum('ab, cbe -> ace', x, w)
in the last line, a and b are the dimensions of the input as mentioned above. What might be the tricky part is c, b, and e are the dimensions of the classifiers weight tensor; I didn't use d, I used b instead. That is because the vector multiplication is happening along that dimension for the inputs tensor and the weight tensor. So that's why the left side of the einsum equation is ab, cbe. The right side of the einsum equation is simply what dimensions to exclude from summation.
The final dimensions we want is (a, c, e). a is the batch size, c is the number of classifiers, and e is the number of classes for each classifier. We do not want to add those values, so to preserve their separation, the left side of the equation is ace.
For those unfamiliar with einsum, this will be harder to read than the word around I created (though I highly recommend learning it, because it gets very easy and intuitive very fast even though it's a bit tricky at first https://www.youtube.com/watch?v=pkVwUVEHmfI )
However, for paralyzing certain operations (especially on GPU), it seems that einsum is the only way to do it. For example so that in my previous example, I didn't want to use a classification head yet, I just wanted to project to multiple dimensions.
import torch
x = torch.arange(2*4).view(2, 4)
w = torch.arange(5*4*6).view(5, 4, 4)
y = torch.einsum('ab, cbe -> ace', x, w)
And say I do a few other operations to y, perhaps some non linear operations, activations, etc.
z = f(y)
z will still have the dimensions 2, 5, 4. Batch size two, 5 hidden states per batch, and the dimension of those hidden states are 4.
And then I want to apply a classifier to each separate tensor.
w2 = torch.arange(4*2).view(4, 2)
final = torch.einsum('fgh, hj -> fgj', z, w2)
Quick refresh, 2 is the batch size, 5 is the number of classifier, and 2 is the number of outputs for each classifier.
The output dimensions, f, g, j (2, 5, 2) will not be summed across, and thus will be preserved in the output.
As cited in the github link, this may be slower than just using regular linear layers. There may be efficiencies in a very large number of parallel operations.
I want to fit a f(x,y) function using lmfit. Dataset is small and there are many fitting parameters (6 points on x-axis, 11 points on y-axis and 16 unconstrained fitting parameters). Using all defaults from Model.fit I cannot obtain covariance matrix and during the fitting process the values of free parameters are not being changed at all.
I tried to change initial values for the parameters. However, when I set the same kind of problem in OriginPro Surface Fitting functionality, the Levenberg-Marquardt algorithm manages to fit data and estimate the errors (although quite large-valued for certain parameters). This means that there has to be some problem with my code. I can't find where the problem lies. I'm not Python master.
The MWE is as below.
import numpy as np
from lmfit import Model, Parameters
import numdifftools # not calling this doesn't change anything
x, y = np.array([226.5, 361.05, 404.41, 589, 632.8, 1013.98]), np.linspace(0,100,11)
X, Y = np.meshgrid(x, y)
Z = np.array([[1.3945, 1.34896, 1.34415, 1.33432, 1.33306, 1.32612],\
[1.39422, 1.3487, 1.34389, 1.33408, 1.33282, 1.32591],\
[1.39336, 1.34795, 1.34315, 1.33336, 1.33211, 1.32524],\
[1.39208, 1.34682, 1.34205, 1.3323, 1.33105, 1.32424],\
[1.39046, 1.3454, 1.34065, 1.33095, 1.32972, 1.32296],\
[1.38854, 1.34373, 1.33901, 1.32937, 1.32814, 1.32145],\
[1.38636, 1.34184, 1.33714, 1.32757, 1.32636, 1.31974],\
[1.38395, 1.33974, 1.33508, 1.32559, 1.32438, 1.31784],\
[1.38132, 1.33746, 1.33284, 1.32342, 1.32223, 1.31576],\
[1.37849, 1.33501, 1.33042, 1.32109, 1.31991, 1.31353],\
[1.37547, 1.33239, 1.32784, 1.31861, 1.31744, 1.31114]])
#This has to be defined beforehand (otherwise parameters names are not defined error)
a1,a2,a3,a4 = 1.3208, -1.2325E-5, -1.8674E-6, 5.0233E-9
b1,b2,b3,b4 = 5208.2413, -0.5179, -2.284E-2, 6.9608E-5
c1,c2,c3,c4 = -2.5551E8, -18341.336, -920, 2.7729
d1,d2,d3,d4 = 9.3495, 2E-3, 3.6733E-5, -1.2932E-7
# Function to fit
def model(x, y, *args):
return a1+a2*y+a3*np.power(y,2)+a4*np.power(y,3)+\
(b1+b2*y+b3*np.power(y,2)+b4*np.power(y,3))/np.power(x,2)+\
(c1+c2*y+c3*np.power(y,2)+c4*np.power(y,3))/np.power(x,4)+\
(d1+d2*y+d3*np.power(y,2)+d4*np.power(y,3))/np.power(x,6)
# This is the callable that is passed to Model.fit. M is a (2,N) array
# where N is the total number of data points in Z, which will be ravelled
# to one dimension.
def _model(M, **args):
x, y = M
arr = model(x, y, params)
return arr
# We need to ravel the meshgrids of X, Y points to a pair of 1-D arrays.
xdata = np.vstack((X.ravel(), Y.ravel()))
# Fitting parameters.
fmodel = Model(_model)
params = Parameters()
params.add_many(('a1',1.3208,True,1,np.inf,None,None),\
('a2',-1.2325E-5,True,-np.inf,np.inf,None,None),\
('a3',-1.8674E-6,True,-np.inf,np.inf,None,None),\
('a4',5.0233E-9,True,-np.inf,np.inf,None,None),\
('b1',5208.2413,True,-np.inf,np.inf,None,None),\
('b2',-0.5179,True,-np.inf,np.inf,None,None),\
('b3',-2.284E-2,True,-np.inf,np.inf,None,None),\
('b4',6.9608E-5,True,-np.inf,np.inf,None,None),\
('c1',-2.5551E8,True,-np.inf,np.inf,None,None),\
('c2',-18341.336,True,-np.inf,np.inf,None,None),\
('c3',-920,True,-np.inf,np.inf,None,None),\
('c4',2.7729,True,-np.inf,np.inf,None,None),\
('d1',9.3495,True,-np.inf,np.inf,None,None),\
('d2',2E-3,True,-np.inf,np.inf,None,None),\
('d3',3.6733E-5,True,-np.inf,np.inf,None,None),\
('d4',-1.2932E-7,True,-np.inf,np.inf,None,None))
result = fmodel.fit(Z.ravel(), params, M=xdata)
fit = model(X, Y, result.params)
print(result.covar)
This code results in covariance being NoneType. I expect that it will after all be calculated, because Origin can somehow manage. If it is needed I can provide all parameters from Origin Surface Fitting Parameters.
When plotting Z-fit difference, there is quite large discrepancy for low x-values (not happening in Origin).
You are not defining your model function in a way that can be used sensibly by lmfit. You have:
def _model(M, **args):
x, y = M
arr = model(x, y, params)
return arr
def model(x, y, *args):
return a1+a2*y+a3*np.power(y,2)+a4*np.power(y,3)+\
(b1+b2*y+b3*np.power(y,2)+b4*np.power(y,3))/np.power(x,2)+\
(c1+c2*y+c3*np.power(y,2)+c4*np.power(y,3))/np.power(x,4)+\
(d1+d2*y+d3*np.power(y,2)+d4*np.power(y,3))/np.power(x,6)
model = Model(_model)
Which has a few problems:
args is not used in _model, and params is not defined in the function so will be module-level.
Similarly in model, args is not used and a1, a2, etc will be taken from the module-level (programming) variables and (importantly!!) these will not be updated in the fit.
In short, your model function never sees varying values for the parameters.
lmfit.Model takes the named function arguments and turns those into parameter names. It does not turn **kws or *position_args into parameter names. So I think that what you want to do is write a model function like this:
def model(x, y, a1, a2, a2, a4, b1, b2, b3 ,b3, c1, c2, c3, c4,
d1, d2, d3, d4):
return a1+a2*y+a3*np.power(y,2)+a4*np.power(y,3)+\
(b1+b2*y+b3*np.power(y,2)+b4*np.power(y,3))/np.power(x,2)+\
(c1+c2*y+c3*np.power(y,2)+c4*np.power(y,3))/np.power(x,4)+\
(d1+d2*y+d3*np.power(y,2)+d4*np.power(y,3))/np.power(x,6)
Then create a model from that with:
# Note: don't give a function and Model instance the same name!!
my_model = Model(model, independent_vars=('x', 'y'))
With that model defined you can run the fit, and without having to unravel your data (the independent data in lmfit can be of almost any data type, and data arrays can be multi-dimensional):
result = my_model.fit(Z, params, x=X, y=Y)
For what it is worth, making such changes works for me in the sense that the fit runs to completion. The fit still gets stuck with some of the parameters not updating from their initial values, but that is sort of a separate question from the mechanics of setting up and running the fit, and is probably due to polynomials being pretty unstable or poor initial estimates.
As an aside: np.power(y,n) can be spelled y**n and readability counts. Also, numerical stability is sometimes improved with replaced
a + b*x + c*x**2 + d*x**3
with
a + x*(b + x*(c + x*d))
Though I do not know if that would help in your case.
I want to make use of Theano's logistic regression classifier, but I would like to make an apples-to-apples comparison with previous studies I've done to see how deep learning stacks up. I recognize this is probably a fairly simple task if I was more proficient in Theano, but this is what I have so far. From the tutorials on the website, I have the following code:
def errors(self, y):
# check if y has same dimension of y_pred
if y.ndim != self.y_pred.ndim:
raise TypeError(
'y should have the same shape as self.y_pred',
('y', y.type, 'y_pred', self.y_pred.type)
)
# check if y is of the correct datatype
if y.dtype.startswith('int'):
# the T.neq operator returns a vector of 0s and 1s, where 1
# represents a mistake in prediction
return T.mean(T.neq(self.y_pred, y))
I'm pretty sure this is where I need to add the functionality, but I'm not certain how to go about it. What I need is either access to y_pred and y for each and every run (to update my confusion matrix in python) or to have the C++ code handle the confusion matrix and return it at some point along the way. I don't think I can do the former, and I'm unsure how to do the latter. I've done some messing around with an update function along the lines of:
def confuMat(self, y):
x=T.vector('x')
classes = T.scalar('n_classes')
onehot = T.eq(x.dimshuffle(0,'x'),T.arange(classes).dimshuffle('x',0))
oneHot = theano.function([x,classes],onehot)
yMat = T.matrix('y')
yPredMat = T.matrix('y_pred')
confMat = T.dot(yMat.T,yPredMat)
confusionMatrix = theano.function(inputs=[yMat,yPredMat],outputs=confMat)
def confusion_matrix(x,y,n_class):
return confusionMatrix(oneHot(x,n_class),oneHot(y,n_class))
t = np.asarray(confusion_matrix(y,self.y_pred,self.n_out))
print (t)
But I'm not completely clear on how to get this to interface with the function in question and give me a numpy array I can work with.
I'm quite new to Theano, so hopefully this is an easy fix for one of you. I'd like to use this classifer as my output layer in a number of configurations, so I could use the confusion matrix with other architectures.
I suggest using a brute force sort of a way. You need an output for a prediction first. Create a function for it.
prediction = theano.function(
inputs = [index],
outputs = MLPlayers.predicts,
givens={
x: test_set_x[index * batch_size: (index + 1) * batch_size]})
In your test loop, gather the predictions...
labels = labels + test_set_y.eval().tolist()
for mini_batch in xrange(n_test_batches):
wrong = wrong + int(test_model(mini_batch))
predictions = predictions + prediction(mini_batch).tolist()
Now create confusion matrix this way:
correct = 0
confusion = numpy.zeros((outs,outs), dtype = int)
for index in xrange(len(predictions)):
if labels[index] is predictions[index]:
correct = correct + 1
confusion[int(predictions[index]),int(labels[index])] = confusion[int(predictions[index]),int(labels[index])] + 1
You can find this kind of an implementation in this repository.