I would like to create a loss function that takes the predictions from my network and maps the values onto the normal distribution before performing some more calculations. Below is a snippet of what I'm doing:
@staticmethod
def gaussian_preds(preds):
    # argsort returns sorting indices; applying it twice turns them into 0-based ranks
    ranks = torch.argsort(torch.argsort(preds))
    fractional_ranks = (ranks + 1) / (ranks.shape[0] + 1)
    gaussian_preds = torch.special.ndtri(fractional_ranks)
    return gaussian_preds
However, the torch.argsort function obviously kills the gradient, so I cannot use this in the loss function. Does anyone have any ideas or workarounds or approximations that would get me there?
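One possible approximation, offered only as a sketch: replace the hard ranks with "soft" ranks built from pairwise sigmoids, which stay differentiable. The function name soft_gaussian_preds and the temperature parameter are hypothetical, not part of the original snippet; smaller temperatures approach the hard ranks but give weaker gradients, and the pairwise matrix costs O(n^2) memory.

```python
import torch

def soft_gaussian_preds(preds, temperature=0.1):
    # Hypothetical helper: differentiable stand-in for rank -> normal quantile.
    # diff[i, j] = preds[i] - preds[j]
    diff = preds.unsqueeze(1) - preds.unsqueeze(0)
    # sigmoid(diff / T) softly counts how many elements lie below preds[i];
    # the diagonal contributes 0.5, so adding 0.5 yields 1-based soft ranks.
    soft_ranks = torch.sigmoid(diff / temperature).sum(dim=1) + 0.5
    # Same scaling as the hard-rank version; values stay strictly inside (0, 1).
    fractional_ranks = soft_ranks / (preds.shape[0] + 1)
    return torch.special.ndtri(fractional_ranks)
```

The output preserves the ordering of preds, so it can be dropped in where the hard-rank version was used, at the cost of an extra hyperparameter to tune.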
I'm working on a regression problem in PyTorch. I get good results on my evaluation set, but I want to make sure this isn't just because I have many small elements and few large ones. Therefore, I would like to check whether I get a similar loss for the large elements (e.g. elements > 0.01). I use MSE loss.
Can anyone please suggest a way of doing so?
Thanks!
You can zero out the loss for the smaller elements (assuming the size of an element is determined by your regression target). You can implement your own loss function like this:
import torch

class CustomMSE:
    def __init__(self, threshold=0.01, reduction=torch.mean):
        self.threshold = threshold
        self.reduction = reduction

    def __call__(self, output, target):
        # Do not reduce, so you get per-element loss
        loss = torch.nn.functional.mse_loss(output, target, reduction="none")
        loss[target < self.threshold] = 0
        return self.reduction(loss)

criterion = CustomMSE()
You can use it just like torch.nn.MSELoss; this should give you the overall idea.
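Note that with the default torch.mean reduction, the zeroed entries still count in the denominator, so the value is diluted by the number of small elements. If the goal is to compare the loss on large elements directly, a variant that averages only over the kept elements may be closer to what is wanted. The sketch below assumes that reading; the name masked_mse is hypothetical.

```python
import torch

def masked_mse(output, target, threshold=0.01):
    # Hypothetical helper: MSE averaged only over elements whose target
    # is at or above the threshold (the ones CustomMSE does not zero out).
    mask = target >= threshold
    if not bool(mask.any()):
        # No large elements in this batch: there is nothing to average.
        return output.new_tensor(float("nan"))
    return torch.nn.functional.mse_loss(output[mask], target[mask])
```

Evaluating this alongside the plain MSE on the same evaluation set shows whether the good overall score is carried by the small elements.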
I'm writing a custom loss function for a (sort of) semantic segmentation task where I compute binary crossentropy (using the Keras backend) between the target (a 2-D array) and the predictions. The overall loss is a sum of 4 different loss functions.
In two of those functions I need to build a customized target array and compute binary crossentropy between these targets and the predictions. Here I would like to ignore, and not calculate, the loss wherever the label is 1 (foreground) in the target array.
PyTorch's nll_loss() has an 'ignore_index' parameter, which is similar to what I'm trying to do.
I'm trying to achieve something like:
def binary_crossentropy(y_true, y_pred, ignore_label=1):
    if ignore_label == 1:
        return -(1-y_true)*log(1-y_pred)
    if ignore_label == 0:
        return -y_true*log(y_pred)
    return -(y_true*log(y_pred) + (1-y_true)*log(1-y_pred))
But for a keras custom loss function.
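One way this could look, as a sketch rather than a confirmed solution: compute the per-pixel binary crossentropy with plain TensorFlow ops, multiply by a mask that is 0 wherever the target equals the ignored label, and average over the remaining pixels. The function name masked_binary_crossentropy and the eps clipping constant are assumptions made for illustration.

```python
import tensorflow as tf

def masked_binary_crossentropy(y_true, y_pred, ignore_label=1, eps=1e-7):
    # Hypothetical helper: BCE that skips pixels whose target equals ignore_label.
    y_true = tf.cast(y_true, y_pred.dtype)
    # Clip predictions so the logarithms stay finite.
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    bce = -(y_true * tf.math.log(y_pred)
            + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
    # 1.0 for pixels that count, 0.0 for ignored pixels.
    mask = tf.cast(tf.not_equal(y_true, float(ignore_label)), y_pred.dtype)
    # Average over the kept pixels only; guard against an all-ignored batch.
    return tf.reduce_sum(bce * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)
```

Because the function takes (y_true, y_pred), it can be passed as a custom loss to model.compile, or wrapped in a small closure to fix a different ignore_label.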
I am working on a project using Keras with the TensorFlow backend. For business reasons, I need to give a higher weight to certain kinds of errors, so I implemented a customized loss function. For example:
error = np.abs(y_true - predict)
if error > low_limit:
    error = error * 10
I found that this customized loss function really does change the error value displayed during training. What I am wondering is whether this new loss function can also change the behavior of backpropagation during training, because I did not see much difference in the weights of my model.
Short answer: yes, this loss function does change the behavior of backpropagation, but I am quite surprised that it works...
Anyway, I feel the following loss function makes more sense
import keras
from keras import backend as K

def my_mae(y_true, y_pred, low_bound=1e-3, coef=10.):
    raw_mae = keras.losses.mae(y_true, y_pred)
    # 1.0 where the sample loss is at or below low_bound, 0.0 elsewhere
    mask = K.cast(raw_mae <= low_bound, dtype='float32')
    # equivalent to: mask * raw_mae * coef + (1 - mask) * raw_mae
    new_mae = (1 + mask * (coef - 1)) * raw_mae
    return new_mae
This multiplies the loss by coef whenever a sample's loss is at or below low_bound, i.e. it gives extra loss when a sample loss is too low.
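The snippet in the question scales errors above a limit, whereas my_mae scales losses at or below low_bound. As a hedged sketch of the question's variant, written with plain TensorFlow ops (the name weighted_abs_error is hypothetical), the weight can be chosen per element with tf.where. Since the weight multiplies the error, it also multiplies the gradient, which is how the loss changes backpropagation.

```python
import tensorflow as tf

def weighted_abs_error(y_true, y_pred, low_limit=1e-3, coef=10.0):
    # Hypothetical helper: absolute error, scaled by coef where it exceeds low_limit.
    err = tf.abs(y_true - y_pred)
    # The comparison itself carries no gradient, so the weights act as constants.
    weights = tf.where(err > low_limit, coef, 1.0)
    return tf.reduce_mean(weights * err)

# Inspect the gradient with respect to the predictions.
y_true = tf.constant([0.0, 0.0])
y_pred = tf.Variable([0.5, 0.0005])
with tf.GradientTape() as tape:
    loss = weighted_abs_error(y_true, y_pred)
grad = tape.gradient(loss, y_pred)
```

Here the first prediction has an error above low_limit and the second does not, so the first entry of grad is coef times larger than the second.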
For diagnostic purposes, I am grabbing the gradients of the network periodically. One way to do this is to return the gradients as output of the theano function. However, copying the gradients from the GPU to CPU memory every time may be costly so I would prefer to do it only periodically. At the moment, I am achieving this by creating two function objects, one which returns the gradient and one which doesn't.
However, I do not know whether this is optimal and am looking for a more elegant way to achieve the same thing.
Your first function obviously executes a training step and updates all your parameters.
The second function must return the gradients of your parameters.
The fastest way to do what you are asking is to add the training-step updates to the second function as well. Then, on the iterations where you log the gradients, call only the second function instead of the first.
gradients = [ ... ]
train_f = theano.function([x, y], [], updates=updates)
train_grad_f = theano.function([x, y], gradients, updates=updates)
num_iters = 1000
grad_array = []
for i in range(num_iters):
    # every 10 training steps keep log of gradients
    if i % 10 == 0:
        grad_array.append(train_grad_f(...))
    else:
        train_f(...)
Update
If you wish to have a single function that does this, you can do the following:
import theano.tensor as T
from theano.ifelse import ifelse

no_grad = T.iscalar('no_grad')
example_gradient = T.grad(example_cost, example_variable)
# if no_grad is > 0 then return the gradient, otherwise return zeros array
out_grad = ifelse(T.gt(no_grad, 0), example_gradient, T.zeros_like(example_variable))
train_f = theano.function([x, y, no_grad], [out_grad], updates=updates)
So when you want to retrieve the gradients you call
train_f(x_data, y_data, 1)
otherwise
train_f(x_data, y_data, 0)
I want to make use of Theano's logistic regression classifier, but I would like to make an apples-to-apples comparison with previous studies I've done to see how deep learning stacks up. I recognize this would probably be a fairly simple task if I were more proficient in Theano, but this is what I have so far. From the tutorials on the website, I have the following code:
def errors(self, y):
    # check if y has same dimension of y_pred
    if y.ndim != self.y_pred.ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', self.y_pred.type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(self.y_pred, y))
I'm pretty sure this is where I need to add the functionality, but I'm not certain how to go about it. What I need is either access to y_pred and y for each and every run (to update my confusion matrix in python) or to have the C++ code handle the confusion matrix and return it at some point along the way. I don't think I can do the former, and I'm unsure how to do the latter. I've done some messing around with an update function along the lines of:
def confuMat(self, y):
    x = T.vector('x')
    classes = T.scalar('n_classes')
    onehot = T.eq(x.dimshuffle(0, 'x'), T.arange(classes).dimshuffle('x', 0))
    oneHot = theano.function([x, classes], onehot)
    yMat = T.matrix('y')
    yPredMat = T.matrix('y_pred')
    confMat = T.dot(yMat.T, yPredMat)
    confusionMatrix = theano.function(inputs=[yMat, yPredMat], outputs=confMat)

    def confusion_matrix(x, y, n_class):
        return confusionMatrix(oneHot(x, n_class), oneHot(y, n_class))

    t = np.asarray(confusion_matrix(y, self.y_pred, self.n_out))
    print(t)
But I'm not completely clear on how to get this to interface with the function in question and give me a numpy array I can work with.
I'm quite new to Theano, so hopefully this is an easy fix for one of you. I'd like to use this classifier as my output layer in a number of configurations, so I could use the confusion matrix with other architectures.
I suggest a brute-force approach. You first need an output for the predictions, so create a function for it.
prediction = theano.function(
    inputs=[index],
    outputs=MLPlayers.predicts,
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size]})
In your test loop, gather the predictions...
labels = labels + test_set_y.eval().tolist()
for mini_batch in range(n_test_batches):
    wrong = wrong + int(test_model(mini_batch))
    predictions = predictions + prediction(mini_batch).tolist()
Now create the confusion matrix this way:
import numpy

correct = 0
# outs: number of classes (size of the output layer)
confusion = numpy.zeros((outs, outs), dtype=int)
for index in range(len(predictions)):
    # compare values with ==; "is" tests object identity and is unreliable here
    if labels[index] == predictions[index]:
        correct = correct + 1
    # rows are predicted classes, columns are true labels
    confusion[int(predictions[index]), int(labels[index])] += 1
You can find this kind of an implementation in this repository.
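Once the predictions and labels have been gathered into Python lists, the counting loop can also be written without an explicit loop. This is a minimal NumPy sketch that keeps the same convention (rows are predictions, columns are labels); the function name confusion_matrix and the n_classes parameter are chosen for illustration.

```python
import numpy as np

def confusion_matrix(predictions, labels, n_classes):
    # Hypothetical helper: rows index the predicted class, columns the true label.
    predictions = np.asarray(predictions, dtype=int)
    labels = np.asarray(labels, dtype=int)
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(confusion, (predictions, labels), 1)
    return confusion

cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)
correct = int(np.trace(cm))  # the diagonal holds the correct predictions
```

The result is a plain NumPy array, so it can be accumulated across test batches or reused with other architectures.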