Slicing a Keras Variable in a custom objective function - Theano

I've been trying to implement a custom objective function in Keras (the negative log likelihood of the normal distribution).
Keras expects one argument for the ground-truth tensor and one for the predictions tensor; for y_pred, I'm passing a tensor that should represent an n x 2 matrix where the first column is the mean of the distribution and the second the precision.
My problem is that I haven't been able to get a clear idea of how to properly slice y_pred before passing it into the likelihood function without triggering the error
'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?'
While I understand that I'm feeding l_func arguments of the Variable type when it expects an array, I don't seem to be able to grok how to properly split the input y_pred variable into its mean and precision components to plug into the likelihood function. Here are some attempts; if someone could enlighten me about how to proceed, I would greatly appreciate it.
import numpy as np
import theano.tensor as T
from theano import function

def log_likelihood(y_true, y_pred):
    mu = T.vector('mu')
    beta = T.vector('beta')
    x = T.vector('x')
    likelihood = .5 * (beta * (x - mu) ** 2) - T.log(beta / (2 * np.pi))
    l_func = function([mu, beta, x], likelihood)
    return l_func(y_pred[:, 0], y_pred[:, 1], y_true)

def log_likelihood(y_true, y_pred):
    likelihood = .5 * (y_pred[:, 1] * (y_true - y_pred[:, 0]) ** 2) - T.log(y_pred[:, 1] / (2 * np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)

def log_likelihood(y_true, y_pred):
    mu = y_pred[:, 0]
    beta = y_pred[:, 1]
    x = y_true
    mu_function = function([y_pred], mu)
    beta_function = function([y_pred], beta)
    id_function = function([y_true], x)
    likelihood = .5 * (beta_function(y_pred) * (id_function(y_true) - mu_function(y_pred)) ** 2) - T.log(beta_function(y_pred) / (2 * np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)
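For reference, a purely symbolic version of the slicing (no compiled theano function, which is the approach that avoids this error) might look like the sketch below; the formula is copied from the attempts above, and the y_true slicing is only an assumption about its shape:

import numpy as np
import theano.tensor as T

def log_likelihood(y_true, y_pred):
    # Slice the symbolic y_pred directly and return a symbolic expression;
    # Keras builds and compiles the graph itself, so theano.function is not needed.
    mu = y_pred[:, 0]
    beta = y_pred[:, 1]
    x = y_true[:, 0]  # assumption: y_true arrives as an n x 1 tensor; drop the slice if it is flat
    return .5 * (beta * (x - mu) ** 2) - T.log(beta / (2 * np.pi))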

Related

Custom loss function in pytorch 1.10.1

I am struggling to define a custom loss function for PyTorch 1.10.1. My model outputs a float ranging from -1 to +1. The target values are floats of arbitrary range. The loss should be the sum of the products of output and target wherever their signs differ.
I have searched the internet for quite some hours, but it seems there have been some changes to pytorch throughout the last versions, so I don't really know which example would best fit to my use case and pytorch 1.10.1.
Here is my approach so far:
import torch
from torch import Tensor

class Loss(torch.nn.Module):
    #staticmethod
    def forward(self, output, target) -> Tensor:
        loss = 0.0
        for i in range(len(target)):
            o = output[i, 0]
            t = target[i]
            l = o * t
            if l < 0:  # if different sign
                loss -= l
        return loss
Questions:
Should I subclass torch.nn.Module or torch.autograd.Function?
Do I need to define #staticmethod?
In some examples, I saw ctx instead of self being used and invocations of ctx.save_for_backward etc. Do I need this? What is its purpose?
When subclassing torch.nn.Module, my code complains: 'Tensor' object has no attribute 'children'. What am I missing?
When subclassing torch.autograd.Function, my code complains about not having a backward function defined. What should my backward function look like?
Custom loss functions can be as simple as a python function. You can simplify this a bit:
def custom_loss(output, target):
    prod = output[:, 0] * target
    return -prod[prod < 0].sum()

Multiclass semantic segmentation model evaluation

I am doing a project on multiclass semantic segmentation. I have a model that outputs pretty decent segmented images as the loss decreases, but I cannot evaluate the model's performance with metrics such as mean IoU or the Dice coefficient.
In the case of binary semantic segmentation it was easy to just set a threshold of 0.5 to classify outputs as object or background, but that does not work for multiclass semantic segmentation. Could you please tell me how to obtain model performance on the aforementioned metrics? Any help will be highly appreciated!
By the way, I am using PyTorch framework and CamVid dataset.
If anyone is interested in this answer, please also look at this issue. The author of the issue points out that mIoU can be computed in a different way (and that method is more accepted in literature). So, consider that before using the implementation for any formal publication.
Basically, the other method suggested by the issue-poster is to separately accumulate the intersections and unions over the entire dataset and divide them at the final step. The method in the below original answer computes intersection and union for a batch of images, then divides them to get IoU for the current batch, and then takes a mean of the IoUs over the entire dataset.
However, the original method given below is problematic because the final mean IoU varies with the batch size. With the method mentioned in the issue, the mIoU does not vary with the batch size, since the separate accumulation makes the batch size irrelevant (though a larger batch size can certainly speed up the evaluation).
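For reference, a rough sketch of that accumulation-based variant (this is not the code from the linked issue; model, loader, and device are placeholders):

import torch

def dataset_mIoU(model, loader, num_classes=19, device='cpu'):
    # Accumulate intersections and unions over the whole dataset,
    # divide only once at the end.
    intersections = torch.zeros(num_classes)
    unions = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = torch.argmax(model(images.to(device)), dim=1).cpu().view(-1)
            labels = labels.view(-1)
            for c in range(num_classes):
                pred_c = (preds == c)
                label_c = (labels == c)
                inter = (pred_c & label_c).sum()
                intersections[c] += inter
                unions[c] += pred_c.sum() + label_c.sum() - inter
    valid = unions > 0  # skip classes that never appear in predictions or labels
    return (intersections[valid] / unions[valid]).mean().item()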
Original answer:
Given below is an implementation of mean IoU (Intersection over Union) in PyTorch.
import numpy as np
import torch
import torch.nn.functional as F

def mIOU(label, pred, num_classes=19):
    pred = F.softmax(pred, dim=1)
    pred = torch.argmax(pred, dim=1).squeeze(1)
    iou_list = list()
    present_iou_list = list()

    pred = pred.view(-1)
    label = label.view(-1)
    # Note: Following for loop goes from 0 to (num_classes-1)
    # and ignore_index is num_classes, thus ignore_index is
    # not considered in computation of IoU.
    for sem_class in range(num_classes):
        pred_inds = (pred == sem_class)
        target_inds = (label == sem_class)
        if target_inds.long().sum().item() == 0:
            iou_now = float('nan')
        else:
            intersection_now = (pred_inds[target_inds]).long().sum().item()
            union_now = pred_inds.long().sum().item() + target_inds.long().sum().item() - intersection_now
            iou_now = float(intersection_now) / float(union_now)
            present_iou_list.append(iou_now)
        iou_list.append(iou_now)
    return np.mean(present_iou_list)
Your model's prediction will have one channel per class, so first take a softmax (if your model doesn't already apply one) followed by an argmax to get the index with the highest probability at each pixel. Then, we calculate the IoU for each class (and take the mean over them at the end).
We can reshape both the prediction and the label as 1-D vectors (I read that it makes the computation faster). For each class, we first identify the indices of that class using pred_inds = (pred == sem_class) and target_inds = (label == sem_class). The resulting pred_inds and target_inds will have 1 at pixels labelled as that particular class while 0 for any other class.
Then, there is a possibility that the target does not contain that particular class at all. This makes that class's IoU calculation invalid, as the class is not present in the target. So, you assign such classes a NaN IoU (so you can identify them later) and do not involve them in the calculation of the mean.
If the particular class is present in the target, then pred_inds[target_inds] will give a vector of 1s and 0s where indices with 1 are those where prediction and target are equal and zero otherwise. Taking the sum of all elements of this will give us the intersection.
If we add all the elements of pred_inds and target_inds, we get the union plus the intersection of pixels of that particular class. So, we subtract the already-calculated intersection to get the union. Then, we divide the intersection by the union to get the IoU of that particular class and add it to a list of valid IoUs.
At the end, you take the mean of the entire list to get the mIoU. If you want the Dice Coefficient, you can calculate it in a similar fashion.
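For instance, inside the same per-class loop, a Dice score could be computed from the same masks (a hedged sketch, not part of the original answer):

# Dice = 2 * |intersection| / (|pred| + |target|), reusing pred_inds and target_inds
intersection_now = (pred_inds & target_inds).long().sum().item()
denominator = pred_inds.long().sum().item() + target_inds.long().sum().item()
dice_now = 2.0 * intersection_now / denominator if denominator > 0 else float('nan')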

Why torch.dot(a,b) makes requires_grad=False

I compute some losses in a loop and store them in a tensor loss. Now I want to multiply a weight tensor with the loss tensor to get the final loss, but after torch.dot() the resulting scalar, ll_new, has requires_grad=False. The following is my code.
loss_vector = torch.FloatTensor(total_loss_q)
w_norm = F.softmax(loss_vector, dim=0)
ll_new = torch.dot(loss_vector,w_norm)
How can I get requires_grad=True for ll_new after doing the above?
I think the issue is in the line loss_vector = torch.FloatTensor(total_loss_q), as requires_grad for loss_vector is False (the default value). So you could instead build it as:
loss_vector = torch.tensor(total_loss_q, requires_grad=True)
The issue most likely lies within this part:
I have some losses in a loop storing them in a tensor loss
You are most likely losing requires_grad somewhere in the process before torch.dot. E.g. if you use something like .item() on individual losses when constructing total_loss_q tensor.
What type is your total_loss_q? If it is a list of integers then there is no way your gradients will propagate through that. You need to construct total_loss_q in such a way that it is a tensor which knows how each individual loss was constructed (i.e. can propagate gradients to your trainable weights).
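For example, one way to keep the graph intact is to collect the individual losses in a python list and torch.stack them instead of wrapping them in torch.FloatTensor (a sketch with placeholder names; queries and compute_loss are not from the question):

import torch
import torch.nn.functional as F

losses = []
for q in queries:                        # placeholder loop over whatever produces your losses
    losses.append(compute_loss(q))       # each element is a 0-d tensor attached to the graph
loss_vector = torch.stack(losses)        # stays differentiable, unlike torch.FloatTensor(...)
w_norm = F.softmax(loss_vector, dim=0)
ll_new = torch.dot(loss_vector, w_norm)  # ll_new.requires_grad is now True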

How to correctly implement a batch-input LSTM network in PyTorch?

This release of PyTorch seems to provide PackedSequence for variable-length inputs to recurrent neural networks. However, I found it a bit hard to use correctly.
Using pad_packed_sequence to recover the output of an RNN layer that was fed with pack_padded_sequence, we get a T x B x N tensor outputs, where T is the max number of time steps, B is the batch size, and N is the hidden size. I found that for the short sequences in the batch, the subsequent outputs are all zeros.
Here are my questions.
For a single-output task where one needs the last output of every sequence, a simple outputs[-1] will give a wrong result, since this tensor contains lots of zeros for the short sequences. One needs to construct indices from the sequence lengths to fetch the individual last output of each sequence. Is there a simpler way to do that?
For a multiple-output task (e.g. seq2seq), one usually adds a linear layer N x O, reshapes the batch outputs T x B x O into TB x O, and computes the cross-entropy loss against the true targets of length TB (usually integers, as in a language model). In this situation, do the zeros in the batch output matter?
Question 1 - Last Timestep
This is the code that I use to get the output of the last timestep. I don't know if there is a simpler solution; if there is, I'd like to know it. I followed this discussion and grabbed the relevant code snippet for my last_timestep method. This is my forward.
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class BaselineRNN(nn.Module):
    def __init__(self, **kwargs):
        ...

    def last_timestep(self, unpacked, lengths):
        # Index of the last output for each sequence.
        idx = (lengths - 1).view(-1, 1).expand(unpacked.size(0),
                                               unpacked.size(2)).unsqueeze(1)
        return unpacked.gather(1, idx).squeeze()

    def forward(self, x, lengths):
        embs = self.embedding(x)

        # pack the batch
        packed = pack_padded_sequence(embs, list(lengths.data),
                                      batch_first=True)
        out_packed, (h, c) = self.rnn(packed)
        out_unpacked, _ = pad_packed_sequence(out_packed, batch_first=True)

        # get the outputs from the last *non-masked* timestep for each sentence
        last_outputs = self.last_timestep(out_unpacked, lengths)

        # project to the classes using a linear layer
        logits = self.linear(last_outputs)
        return logits
Question 2 - Masked Cross Entropy Loss
Yes, by default the zero padded timesteps (targets) matter. However, it is very easy to mask them. You have two options, depending on the version of PyTorch that you use.
PyTorch 0.2.0: PyTorch now supports masking directly in CrossEntropyLoss, with the ignore_index argument. For example, in language modeling or seq2seq, where I add zero padding, I mask the zero-padded words (targets) simply like this:
loss_function = nn.CrossEntropyLoss(ignore_index=0)
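For the seq2seq shapes from the question, the masked loss can then be applied after flattening (a small sketch; logits and targets are assumed to be T x B x O and T x B, with 0 as the padding index):

# logits: (T, B, O), targets: (T, B); padded positions hold index 0
loss = loss_function(logits.view(-1, logits.size(-1)), targets.view(-1))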
PyTorch 0.1.12 and older: In the older versions of PyTorch, masking was not supported, so you had to implement your own workaround. The solution that I used was masked_cross_entropy.py, by jihunchoi. You may also be interested in this discussion.
A few days ago, I found this method which uses indexing to accomplish the same task with a one-liner.
I have my dataset batch first ([batch size, sequence length, features]), so for me:
unpacked_out = unpacked_out[np.arange(unpacked_out.shape[0]), lengths - 1, :]
where unpacked_out is the output of torch.nn.utils.rnn.pad_packed_sequence.
I have compared it with the method described here, which looks similar to the last_timestep() method Christos Baziotis is using above (also recommended here), and the results are the same in my case.

pymc3 theano function usage

I'm trying to define a complex custom likelihood function using pymc3. The likelihood function involves a lot of iteration, and therefore I'm trying to use theano's scan method to define iteration directly within theano. Here's a greatly simplified example that illustrates the challenge that I'm facing. The (fake) likelihood function I'm trying to define is simply the sum of two pymc3 random variables, p and theta. Of course, I could simply return p+theta, but the actual likelihood function I'm trying to write is more complicated, and I believe I need to use theano.scan since it involves a lot of iteration.
import pymc3 as pm
from pymc3 import Model, Uniform, DensityDist
import theano.tensor as T
import theano
import numpy as np

### theano test
theano.config.compute_test_value = 'raise'

X = np.asarray([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])

### pymc3 implementation
with Model() as bg_model:
    p = pm.Uniform('p', lower=0, upper=1)
    theta = pm.Uniform('theta', lower=0, upper=.2)

    def logp(X):
        f = p + theta
        print("f", f)
        get_ll = theano.function(name='get_ll', inputs=[p, theta], outputs=f)
        print("p keys ", p.__dict__.keys())
        print("theta keys ", theta.__dict__.keys())
        print("p name ", p.name, " p.type ", p.type, " type(p) ", type(p), " p.tag ", p.tag)
        result = get_ll(p, theta)
        print("result", result)
        return result

    y = pm.DensityDist('y', logp, observed=X)  # Nx4 y = f(f,x,tx,n | p, theta)
When I run this, I get the error:
TypeError: ('Bad input argument to theano function with name "get_ll" at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')
I understand that the issue occurs in line
result=get_ll(p, theta)
because p and theta are of type pymc3.TransformedRV, and the input to a theano function needs to be a scalar number or a simple numpy array. However, a pymc3 TransformedRV does not seem to have any obvious way of obtaining the current value of the random variable itself.
Is it possible to define a log likelihood function that involves the use of a theano function that takes as input a pymc3 random variable?
The problem is that get_ll is a compiled theano function, which takes numerical arrays as input. Instead, pymc3 is sending it a symbolic variable (a theano tensor). That's why you're getting the error.
As to your solution, you're right in saying that just returning p+theta is the way to go. If you have scans and whatnot in your logp, then you would return the scan variable of interest; there is no need to compile a theano function here. For example, if you wanted to add 1 to each element of a vector (as an impractical toy example), you would do:
def logp(X):
    the_sum, the_sum_upd = theano.scan(lambda x: x + 1, sequences=[X])
    return the_sum
That being said, if you need gradients, you would need to calculate your the_sum variable in a theano Op and provide a grad() method along with it (you can see a toy example of that on the answer here). If you do not need gradients, you might be better off doing everything in python (or C, numba, cython, for performance) and using the as_op decorator.
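For completeness, a rough sketch of the as_op route (no gradient is provided, and the scalar input/output types are assumptions for this toy example):

import numpy as np
import theano.tensor as tt
from theano.compile.ops import as_op

@as_op(itypes=[tt.dscalar, tt.dscalar], otypes=[tt.dscalar])
def add_scalars(a, b):
    # arbitrary python/numpy code; theano wraps it as an Op without a grad()
    return np.asarray(a + b)

def logp(X):
    # p and theta are the pymc3 random variables from the model context above
    return add_scalars(p, theta)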
