Fit a Gaussian curve with a neural network using Pytorch - python-3.x

Suppose the following model :
import torch.nn as nn
class PGN(nn.Module):
def __init__(self, input_size):
super(PGN, self).__init__()
self.linear = nn.Sequential(
nn.Linear(in_features=input_size, out_features=128),
nn.ReLU(),
nn.Linear(in_features=128, out_features=1)
)
def forward(self, x):
return self.linear(x)
I figure I have to modify the model to fit a 2-dimensional curve.
Is there a way to fit a Gaussian curve with mu=0 and sigma=0 using Pytorch? If so, can you show me?

A neural network can approximate an arbitrary function of any number of parameters to a space of any dimension.
To fit a 2 dimensional curve your network should be fed with vectors of size 2, that is a vector of x and y coordinates. The output is a single value of size 1.
For training you must generate ground truth data, that is a mapping between coordinates (x and y) and the value (z). The loss function should compare this ground truth value with the estimate of your network.
If it is just a tutorial to learn Pytorch and not a real application, you can define a function that for a given x and y output the gaussian value according to your parameters.
Then during training you randomly choose a x and y and feed this to the networks then do backprop with the true value.

For a function y = a*exp(-((x-b)^2)/2c^2),
Create this mathematical equation, for some values of x, (and a,b,c), get the outputs y. This will be your training set with x values as inputs and y values as output labels. Since this is not a linear equation, you will have to experiment with no of layers/neurons and other stuff, but it will give you a good enough approximation. For different values of a,b,c, generate your data for that and maybe try different things like adding those as inputs with x.

Related

Custom loss for single-label, multi-class problem

I have a single-label, multi-class classification problem, i.e., a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay to not penalise the model that heavily.
For example, the ground truth for 1 sample is [0,1,1,0,1] of 5 classes, instead of a one-hot vector. This implies that, the model predicting any one (not necessarily all) of the above classes (2,3 or 5) is fine.
For every batch, the predicted output dimension is of the shape bs x n x nc, where bs is the batch size, n is the number of samples per point and nc is the number of classes. The ground truth is also of the same shape as the predicted tensor.
For every batch, I'm expecting my loss function to compare n tensors across nc classes and then average it across n.
Eg: When dimensions are 32 x 8 x 5000. There are 32 batch points in a batch (for bs=32). Each batch point has 8 vector points, and each vector point has 5000 classes. For a given batch point, I wish to compute loss across all (8) vector points, compute their average and do so for the rest of the batch points (32). Final loss would be loss over all losses from each batch point.
How can I approach designing such a loss function? Any help would be deeply appreciated
P.S.: Let me know if the question is ambiguous
One way to go about this was to use a sigmoid function on the network output, which removes the implicit interdependency between class scores that a softmax function has.
As for the loss function, you can then calculate the loss based on the highest prediction for any of your target classes and ignore all other class predictions. For your example:
# your model output
y_out = torch.tensor([[0.1, 0.2, 0.95, 0.1, 0.01]], requires_grad=True)
# class labels
y = torch.tensor([[0,1,1,0,1]])
since we only care about the highest class probability, we set all other class scores to the maximum value achieved for one of the classes:
class_mask = y == 1
max_class_score = torch.max(y_out[class_mask])
y_hat = torch.where(class_mask, max_class_score, y_out)
From which we can use a regular Cross-Entropy loss function
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(y_hat, y.float())
loss.backward()
when inspecting the gradients, we see that this only updates the prediction that achieved the highest score as well ass all predictions outside of any of the classes.
>>> y_out.grad
tensor([[ 0.3326, 0.0000, -0.6653, 0.3326, 0.0000]])
Predictions for other target classes do not receive a gradient update. Note that if you have a very high ratio of possible classes, this might slow down your convergence.

How to code Pytorch to fit a different polynomal to every column/row in an image?

Fitting a single polynomial to a bunch of data is pretty easy in Pytorch using an nn.Linear layer. I've included a trivial example at the end of this post. But suppose I have tons of data split into groups, and I want to fit a different polynomial to each group. As an example, find the particular quadratic coefficients that fit each column in this image:
In other words, I want to simultaneously find the coefficients for N polynomials of order n, given m data per set to be fit:
In the image above, there are m=80 points per dataset, and N=100 sets to fit.
This perfectly lends itself to tensor manipulation and Pytorch on a gpu should make this blindingly fast by fitting all N at once. Problem is, I'm having a terrible brain fart, and haven't been able to wrap my head around the right layer configuration. Basically I need N nn.Linear layers, each operating on its own dataset. If this were convolution, I'd use a depthwise layer...
Example network to fit one polynomial where X are the m x p abscissa data, y are the m ordinate data, and we want to find the p coefficients.
class polyfit(torch.nn.Module):
def __init__(self,n=2):
super(polyfit, self).__init__()
self.poly = torch.nn.Linear(n,1,bias=False,)
def forward(self, x):
print(x.shape,self.poly)
return self.poly(x)
model = polyfit(n)
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for epoch in range(100): # or however I want to run the loops
output = model(X)
mse = loss(output, y)
optimizer.zero_grad()
mse.backward()
optimizer.step()
Figured it out after thinking about my Depthwise Convolution comment. A Conv1D with just 3 parameters times a tensor with values [1,x,x**2] is a quadratic, same as with a Linear layer with n=3. So the layer needs to be:
self.poly = torch.nn.Conv1d(N,N,n+1,bias=False,groups=N)
Just have to make sure the X,y tensors are the right dimensions of [m, N, n] and [m, N, 1] respectively.

What does PyTorch classifier output?

So i am new to deep learning and started learning PyTorch. I created a classifier model with following structure.
class model(nn.Module):
def __init__(self):
super(model, self).__init__()
resnet = models.resnet34(pretrained=True)
layers = list(resnet.children())[:8]
self.features1 = nn.Sequential(*layers[:6])
self.features2 = nn.Sequential(*layers[6:])
self.classifier = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 3))
def forward(self, x):
x = self.features1(x)
x = self.features2(x)
x = F.relu(x)
x = nn.AdaptiveAvgPool2d((1,1))(x)
x = x.view(x.shape[0], -1)
return self.classifier(x)
So basically I wanted to classify among three things {0,1,2}. While evaluating, I passed the image it returned a Tensor with three values like below
(tensor([[-0.1526, 1.3511, -1.0384]], device='cuda:0', grad_fn=<AddmmBackward>)
So my question is what are these three numbers? Are they probability ?
P.S. Please pardon me If I asked something too silly.
The final layer nn.Linear (fully connected layer) of self.classifier of your model produces values, that we can call a scores, for example, it may be: [10.3, -3.5, -12.0], the same you can see in your example as well: [-0.1526, 1.3511, -1.0384] which are not normalized and cannot be interpreted as probabilities.
As you can see it's just a kind of "raw unscaled" network output, in other words these values are not normalized, and it's hard to use them or interpret the results, that's why the common practice is converting them to normalized probability distribution by using softmax after the final layer, as #skinny_func has already described. After that you will get the probabilities in the range of 0 and 1, which is more intuitive representation.
So after training what you would want to do is to apply softmax to the output tensor to extract the probability of each class, then you choose the maximal value (highest probability).
in your case:
prob = torch.nn.functional.softmax(model(x), dim=1)
_, pred_class = torch.max(prob, dim=1)

Keras custom loss function that depends on the input features

I have a multilabel classification problem with K labels and also I have a function, let's call it f that for each example in the dataset takes in two matrices, let's call them H and P. Both matrices are part of the input data.
For each vector of labels y (for one example), i.e. y is a vector with dimension (K \times 1), I compute a scalar value f_out = f(H, P, y).
I want to define a loss function that minimizes the mean absolute percentage error between the two vectors formed by the values f_out_true = f(H, P, y_true) and f_out_pred = f(H, P, y_pred) for all examples.
Seeing the documentation of Keras, I know that customized loss function comes in the form custmLoss(y_pred, y_true), however, the loss function I want to define depends on the input data and these values f_out_true and f_out_pred need to be computed example by example to form the two vectors that I want to minimize the mean absolute percentage error.
As far as I am aware, there is no way to make a loss function that takes anything other than the model output and the corresponding ground truth. So, the only way to do what you want is to make the input part of your model's output. To do this, simply build your model with the functional API, and then add the input tensor to the list of outputs:
input = Input(input_shape)
# build the rest of your model with the standard functional API here
# this example model was taken from the Keras docs
x = Dense(100, activation='relu')(input)
x = Dense(100, activation='relu')(x)
x = Dense(100, activation='relu')(x)
output = Dense(10, activation='softmax')(x)
model = Model(inputs=[input], outputs=[output, input])
Then, make y_true a combination of your input data and the original ground truth.
I don't have a whole lot of experience with the functional API so it's hard to be more specific, but hopefully this points you in the right direction.

how to use model.fit in spatial pyramid pooling net with keras

SPP net is used for variable size input images. SPP Net implementation in keras in here uses two model.fit for two sizes of images. I have 278 images of all different sizes, so how to use model.fit in this case? and how keras calculating efficiency and other performance parameter after two model.fit uses? I am quoting some lines from spp paper where author write that
For a single network to accept variable input sizes, we approximate it by multiple networks that share all parameters, while each of these networks is trained using a fixed input size. In each epoch we train the network with a given input size, and switch to another input size for the next epoch. Experiments show that this multi-size training converges just as the traditional single-size training,and leads to better testing accuracy.
Should we have to use as many epoch as we have no. of variable size images?
You can try generator (see stanford.edu).
If all images are different sizes, then the batch size should be 1. Pay attention to the X and y shapes.
class WordImageGeneratorSPP(object):
def __init__(self, shuffle=True):
self.shuffle = shuffle
def generate(self, items):
while 1:
if self.shuffle:
random.shuffle(items)
for item in items:
X = item[0]
y = item[1]
yield X, y
train_gen = WordImageGeneratorSPP().generate(train_data)
model.fit_generator(generator=train_gen, ...)
You can plot the generator output, but you should also pay attention to the X and y shapes.
import matplotlib.pyplot as plt
train_gen = WordImageGeneratorSPP().generate(train_data)
for X, y in train_gen:
plt.imshow(X)
plt.title(y)
plt.show()

Resources