How to code Pytorch to fit a different polynomal to every column/row in an image? - pytorch

Fitting a single polynomial to a bunch of data is pretty easy in Pytorch using an nn.Linear layer. I've included a trivial example at the end of this post. But suppose I have tons of data split into groups, and I want to fit a different polynomial to each group. As an example, find the particular quadratic coefficients that fit each column in this image:
In other words, I want to simultaneously find the coefficients for N polynomials of order n, given m data per set to be fit:
In the image above, there are m=80 points per dataset, and N=100 sets to fit.
This perfectly lends itself to tensor manipulation and Pytorch on a gpu should make this blindingly fast by fitting all N at once. Problem is, I'm having a terrible brain fart, and haven't been able to wrap my head around the right layer configuration. Basically I need N nn.Linear layers, each operating on its own dataset. If this were convolution, I'd use a depthwise layer...
Example network to fit one polynomial where X are the m x p abscissa data, y are the m ordinate data, and we want to find the p coefficients.
class polyfit(torch.nn.Module):
def __init__(self,n=2):
super(polyfit, self).__init__()
self.poly = torch.nn.Linear(n,1,bias=False,)
def forward(self, x):
print(x.shape,self.poly)
return self.poly(x)
model = polyfit(n)
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for epoch in range(100): # or however I want to run the loops
output = model(X)
mse = loss(output, y)
optimizer.zero_grad()
mse.backward()
optimizer.step()

Figured it out after thinking about my Depthwise Convolution comment. A Conv1D with just 3 parameters times a tensor with values [1,x,x**2] is a quadratic, same as with a Linear layer with n=3. So the layer needs to be:
self.poly = torch.nn.Conv1d(N,N,n+1,bias=False,groups=N)
Just have to make sure the X,y tensors are the right dimensions of [m, N, n] and [m, N, 1] respectively.

Related

Custom loss for single-label, multi-class problem

I have a single-label, multi-class classification problem, i.e., a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay to not penalise the model that heavily.
For example, the ground truth for 1 sample is [0,1,1,0,1] of 5 classes, instead of a one-hot vector. This implies that, the model predicting any one (not necessarily all) of the above classes (2,3 or 5) is fine.
For every batch, the predicted output dimension is of the shape bs x n x nc, where bs is the batch size, n is the number of samples per point and nc is the number of classes. The ground truth is also of the same shape as the predicted tensor.
For every batch, I'm expecting my loss function to compare n tensors across nc classes and then average it across n.
Eg: When dimensions are 32 x 8 x 5000. There are 32 batch points in a batch (for bs=32). Each batch point has 8 vector points, and each vector point has 5000 classes. For a given batch point, I wish to compute loss across all (8) vector points, compute their average and do so for the rest of the batch points (32). Final loss would be loss over all losses from each batch point.
How can I approach designing such a loss function? Any help would be deeply appreciated
P.S.: Let me know if the question is ambiguous
One way to go about this was to use a sigmoid function on the network output, which removes the implicit interdependency between class scores that a softmax function has.
As for the loss function, you can then calculate the loss based on the highest prediction for any of your target classes and ignore all other class predictions. For your example:
# your model output
y_out = torch.tensor([[0.1, 0.2, 0.95, 0.1, 0.01]], requires_grad=True)
# class labels
y = torch.tensor([[0,1,1,0,1]])
since we only care about the highest class probability, we set all other class scores to the maximum value achieved for one of the classes:
class_mask = y == 1
max_class_score = torch.max(y_out[class_mask])
y_hat = torch.where(class_mask, max_class_score, y_out)
From which we can use a regular Cross-Entropy loss function
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(y_hat, y.float())
loss.backward()
when inspecting the gradients, we see that this only updates the prediction that achieved the highest score as well ass all predictions outside of any of the classes.
>>> y_out.grad
tensor([[ 0.3326, 0.0000, -0.6653, 0.3326, 0.0000]])
Predictions for other target classes do not receive a gradient update. Note that if you have a very high ratio of possible classes, this might slow down your convergence.

Fit a Gaussian curve with a neural network using Pytorch

Suppose the following model :
import torch.nn as nn
class PGN(nn.Module):
def __init__(self, input_size):
super(PGN, self).__init__()
self.linear = nn.Sequential(
nn.Linear(in_features=input_size, out_features=128),
nn.ReLU(),
nn.Linear(in_features=128, out_features=1)
)
def forward(self, x):
return self.linear(x)
I figure I have to modify the model to fit a 2-dimensional curve.
Is there a way to fit a Gaussian curve with mu=0 and sigma=0 using Pytorch? If so, can you show me?
A neural network can approximate an arbitrary function of any number of parameters to a space of any dimension.
To fit a 2 dimensional curve your network should be fed with vectors of size 2, that is a vector of x and y coordinates. The output is a single value of size 1.
For training you must generate ground truth data, that is a mapping between coordinates (x and y) and the value (z). The loss function should compare this ground truth value with the estimate of your network.
If it is just a tutorial to learn Pytorch and not a real application, you can define a function that for a given x and y output the gaussian value according to your parameters.
Then during training you randomly choose a x and y and feed this to the networks then do backprop with the true value.
For a function y = a*exp(-((x-b)^2)/2c^2),
Create this mathematical equation, for some values of x, (and a,b,c), get the outputs y. This will be your training set with x values as inputs and y values as output labels. Since this is not a linear equation, you will have to experiment with no of layers/neurons and other stuff, but it will give you a good enough approximation. For different values of a,b,c, generate your data for that and maybe try different things like adding those as inputs with x.

Keras custom loss function that depends on the input features

I have a multilabel classification problem with K labels and also I have a function, let's call it f that for each example in the dataset takes in two matrices, let's call them H and P. Both matrices are part of the input data.
For each vector of labels y (for one example), i.e. y is a vector with dimension (K \times 1), I compute a scalar value f_out = f(H, P, y).
I want to define a loss function that minimizes the mean absolute percentage error between the two vectors formed by the values f_out_true = f(H, P, y_true) and f_out_pred = f(H, P, y_pred) for all examples.
Seeing the documentation of Keras, I know that customized loss function comes in the form custmLoss(y_pred, y_true), however, the loss function I want to define depends on the input data and these values f_out_true and f_out_pred need to be computed example by example to form the two vectors that I want to minimize the mean absolute percentage error.
As far as I am aware, there is no way to make a loss function that takes anything other than the model output and the corresponding ground truth. So, the only way to do what you want is to make the input part of your model's output. To do this, simply build your model with the functional API, and then add the input tensor to the list of outputs:
input = Input(input_shape)
# build the rest of your model with the standard functional API here
# this example model was taken from the Keras docs
x = Dense(100, activation='relu')(input)
x = Dense(100, activation='relu')(x)
x = Dense(100, activation='relu')(x)
output = Dense(10, activation='softmax')(x)
model = Model(inputs=[input], outputs=[output, input])
Then, make y_true a combination of your input data and the original ground truth.
I don't have a whole lot of experience with the functional API so it's hard to be more specific, but hopefully this points you in the right direction.

Multi-label classification with class weights in Keras

I have a 1000 classes in the network and they have multi-label outputs. For each training example, the number of positive output is same(i.e 10) but they can be assigned to any of the 1000 classes. So 10 classes have output 1 and rest 990 have output 0.
For the multi-label classification, I am using 'binary-cross entropy' as cost function and 'sigmoid' as the activation function. When I tried this rule of 0.5 as the cut-off for 1 or 0. All of them were 0. I understand this is a class imbalance problem. From this link, I understand that, I might have to create extra output labels.Unfortunately, I haven't been able to figure out how to incorporate that into a simple neural network in keras.
nclasses = 1000
# if we wanted to maximize an imbalance problem!
#class_weight = {k: len(Y_train)/(nclasses*(Y_train==k).sum()) for k in range(nclasses)}
inp = Input(shape=[X_train.shape[1]])
x = Dense(5000, activation='relu')(inp)
x = Dense(4000, activation='relu')(x)
x = Dense(3000, activation='relu')(x)
x = Dense(2000, activation='relu')(x)
x = Dense(nclasses, activation='sigmoid')(x)
model = Model(inputs=[inp], outputs=[x])
adam=keras.optimizers.adam(lr=0.00001)
model.compile('adam', 'binary_crossentropy')
history = model.fit(
X_train, Y_train, batch_size=32, epochs=50,verbose=0,shuffle=False)
Could anyone help me with the code here and I would also highly appreciate if you could suggest a good 'accuracy' metric for this problem?
Thanks a lot :) :)
I have a similar problem and unfortunately have no answer for most of the questions. Especially the class imbalance problem.
In terms of metric there are several possibilities: In my case I use the top 1/2/3/4/5 results and check if one of them is right. Because in your case you always have the same amount of labels=1 you could take your top 10 results and see how many percent of them are right and average this result over your batch size. I didn't find a possibility to include this algorithm as a keras metric. Instead, I wrote a callback, which calculates the metric on epoch end on my validation data set.
Also, if you predict the top n results on a test dataset, see how many times each class is predicted. The Counter Class is really convenient for this purpose.
Edit: If found a method to include class weights without splitting the output.
You need a numpy 2d array containing weights with shape [number classes to predict, 2 (background and signal)].
Such an array could be calculated with this function:
def calculating_class_weights(y_true):
from sklearn.utils.class_weight import compute_class_weight
number_dim = np.shape(y_true)[1]
weights = np.empty([number_dim, 2])
for i in range(number_dim):
weights[i] = compute_class_weight('balanced', [0.,1.], y_true[:, i])
return weights
The solution is now to build your own binary crossentropy loss function in which you multiply your weights yourself:
def get_weighted_loss(weights):
def weighted_loss(y_true, y_pred):
return K.mean((weights[:,0]**(1-y_true))*(weights[:,1]**(y_true))*K.binary_crossentropy(y_true, y_pred), axis=-1)
return weighted_loss
weights[:,0] is an array with all the background weights and weights[:,1] contains all the signal weights.
All that is left is to include this loss into the compile function:
model.compile(optimizer=Adam(), loss=get_weighted_loss(class_weights))

Tensorflow- How to display accuracy rate for a linear regression model

I have a linear regression model that seems to work. I first load the data into X and the target column into Y, after that I implement the following...
X_train, X_test, Y_train, Y_test = train_test_split(
X_data,
Y_data,
test_size=0.2
)
rng = np.random
n_rows = X_train.shape[0]
X = tf.placeholder("float")
Y = tf.placeholder("float")
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
pred = tf.add(tf.multiply(X, W), b)
cost = tf.reduce_sum(tf.pow(pred-Y, 2)/(2*n_rows))
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)
init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
sess.run([init, init_local])
for epoch in range(FLAGS.training_epochs):
avg_cost = 0
for (x, y) in zip(X_train, Y_train):
sess.run(optimizer, feed_dict={X:x, Y:y})
# display logs per epoch step
if (epoch + 1) % FLAGS.display_step == 0:
c = sess.run(
cost,
feed_dict={X:X_train, Y:Y_train}
)
print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))
print("Optimization Finished!")
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
print(sess.run(accuracy))
I cannot figure out how to print out the model's accuracy. For example, in sklearn, it is simple, if you have a model you just print model.score(X_test, Y_test). But I do not know how to do this in tensorflow or if it is even possible.
I think I'd be able to calculate the Mean Squared Error. Does this help in any way?
EDIT
I tried implementing tf.metrics.accuracy as suggested in the comments but I'm having an issue implementing it. The documentation says it takes 2 arguments, labels and predictions, so I tried the following...
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
print(sess.run(accuracy))
But this gives me an error...
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value accuracy/count
[[Node: accuracy/count/read = IdentityT=DT_FLOAT, _class=["loc:#accuracy/count"], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
How exactly does one implement this?
Turns out, since this is a multi-class Linear Regression problem, and not a classification problem, that tf.metrics.accuracy is not the right approach.
Instead of displaying the accuracy of my model in terms of percentage, I instead focused on reducing the Mean Square Error (MSE) instead.
From looking at other examples, tf.metrics.accuracy is never used for Linear Regression, and only classification. Normally tf.metric.mean_squared_error is the right approach.
I implemented two ways of calculating the total MSE of my predictions to my testing data...
pred = tf.add(tf.matmul(X, W), b)
...
...
Y_pred = sess.run(pred, feed_dict={X:X_test})
mse = tf.reduce_mean(tf.square(Y_pred - Y_test))
OR
mse = tf.metrics.mean_squared_error(labels=Y_test, predictions=Y_pred)
They both do the same but obviously the second approach is more concise.
There's a good explanation of how to measure the accuracy of a Linear Regression model here.
I didn't think this was clear at all from the Tensorflow documentation, but you have to declare the accuracy operation, and then initialize all global and local variables, before you run the accuracy calculation:
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
# ...
init_global = tf.global_variables_initializer
init_local = tf.local_variables_initializer
sess.run([init_global, init_local])
# ...
# run accuracy calculation
I read something on Stack Overflow about the accuracy calculation using local variables, which is why the local variable initializer is necessary.
After reading the complete code you posted, I noticed a couple other things:
In your calculation of pred, you use
pred = tf.add(tf.multiply(X, W), b). tf.multiply performs element-wise multiplication, and will not give you the fully connected layers you need for a neural network (which I am assuming is what you are ultimately working toward, since you're using TensorFlow). To implement fully connected layers, where each layer i (including input and output layers) has ni nodes, you need separate weight and bias matrices for each pair of successive layers. The dimensions of the i-th weight matrix (the weights between the i-th layer and the i+1-th layer) should be (ni, ni + 1), and the i-th bias matrix should have dimensions (ni + 1, 1). Then, going back to the multiplication operation - replace tf.multiply with tf.matmul, and you're good to go. I assume that what you have is probably fine for a single-class linear regression problem, but this is definitely the way you want to go if you plan to solve a multiclass regression problem or implement a deeper network.
Your weight and bias tensors have a shape of (1, 1). You give the variables the initial value of np.random.randn(), which according to the documentation, generates a single floating point number when no arguments are given. The dimensions of your weight and bias tensors need to be supplied as arguments to np.random.randn(). Better yet, you can actually initialize these to random values in Tensorflow: W = tf.Variable(tf.random_normal([dim0, dim1], seed = seed) (I always initialize random variables with a seed value for reproducibility)
Just a note in case you don't know this already, but non-linear activation functions are required for neural networks to be effective. If all your activations are linear, then no matter how many layers you have, it will reduce to a simple linear regression in the end. Many people use relu activation for hidden layers. For the output layer, use softmax activation for multiclass classification problems where the output classes are exclusive (i.e., where only one class can be correct for any given input), and sigmoid activation for multiclass classification problems where the output classes are not exlclusive.

Resources