What could I be doing wrong? I am trying to implement the discriminator loss for a GAN, but when I use the function during training I get wrong values. I don't understand much about GANs; I have been following some tutorials.
import torch
import numpy as np
from torch.nn.functional import binary_cross_entropy_with_logits as bce_loss

def discriminator_loss(logits_real, logits_fake):
    """
    Computes the discriminator loss.
    You should use the stable torch.nn.functional.binary_cross_entropy_with_logits.
    Inputs:
    - logits_real: PyTorch Tensor of shape (N,) giving scores for the real data.
    - logits_fake: PyTorch Tensor of shape (N,) giving scores for the fake data.
    Returns:
    - loss: PyTorch Tensor containing (scalar) the loss for the discriminator.
    """
    loss = None
    ones = np.ones(logits_real.shape) + np.random.uniform(-0.1, 0.1)
    zeros = np.zeros(logits_real.shape) + np.random.uniform(0, 0.2)
    ones = torch.from_numpy(ones).float()
    zeros = torch.from_numpy(zeros).float()
    loss_real = bce_loss(logits_real, ones)
    loss_fake = bce_loss(logits_fake, zeros)
    loss = loss_fake + loss_real
    return loss

def generator_loss(logits_fake):
    """
    Computes the generator loss.
    Inputs:
    - logits_fake: PyTorch Tensor of shape (N,) giving scores for the fake data.
    Returns:
    - loss: PyTorch Tensor containing the (scalar) loss for the generator.
    """
    loss = None
    ones = torch.ones_like(logits_fake)
    loss = bce_loss(logits_fake, ones)
    return loss
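For reference, a plain (non-smoothed) version of the discriminator loss, as a sketch assuming logits of shape (N,), would build the targets directly from the logits:

def plain_discriminator_loss(logits_real, logits_fake):
    # Targets built with ones_like/zeros_like always match the logits' shape, dtype and device
    loss_real = bce_loss(logits_real, torch.ones_like(logits_real))
    loss_fake = bce_loss(logits_fake, torch.zeros_like(logits_fake))
    return loss_real + loss_fake

With this form the only randomness comes from the data itself; any label smoothing, as in the version above, shifts the targets and therefore the reported loss values away from what tutorials without smoothing show.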
I would like to code in tf.keras a neural network with a couple of loss functions. One is a standard MSE (mean squared error) scaled by a factor, while the other is basically a regularization term on the output of a hidden layer. This second loss is added through self.add_loss() in a user-defined class inheriting from tf.keras.layers.Layer. I have a couple of questions (the first is more important, though).
1) The error I get when trying to combine the two losses together is the following:
ValueError: Shapes must be equal rank, but are 0 and 1
From merging shape 0 with other shapes. for '{{node AddN}} = AddN[N=2, T=DT_FLOAT](loss/weighted_loss/value, model/new_layer/mul_1)' with input shapes: [], [100].
So it comes from the fact that the tensors which should add up to make one unique loss value have different shapes (and ranks). Still, when I try to print the losses during training, I clearly see that the vectors returned as losses have shape batch_size and rank 1. Could it be that when the two losses are summed I have to provide them (or at least the loss from add_loss) as a scalar? I know the MSE is usually returned as a vector where each entry is the MSE of one sample in the batch, hence having batch_size as its shape. I think I did the same with the "regularization" loss. Do you have an explanation for this behavior?
The sample code which gives me the error is the following:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input

def rate_mse(rate=1e5):
    # @tf.function  # also needed for printing
    def loss(y_true, y_pred):
        tmp = rate * K.mean(K.square(y_pred - y_true), axis=-1)
        # tf.print('shape %s and rank %s output in mse' % (K.shape(tmp), tf.rank(tmp)))
        tf.print('shape and rank output in mse', [K.shape(tmp), tf.rank(tmp)])
        tf.print('mse loss:', tmp)  # prints when I add @tf.function
        return tmp
    return loss

class newLayer(tf.keras.layers.Layer):
    def __init__(self, rate=5e-2, **kwargs):
        super(newLayer, self).__init__(**kwargs)
        self.rate = rate

    # @tf.function  # to be commented out for NN training
    def call(self, inputs):
        tmp = self.rate * K.mean(inputs * inputs, axis=-1)
        tf.print('shape and rank output in regularizer', [K.shape(tmp), tf.rank(tmp)])
        tf.print('regularizer loss:', tmp)
        self.add_loss(tmp, inputs=True)
        return inputs

tot_n = 10000
xx = np.random.rand(tot_n, 1)
yy = np.pi * xx
train_size = int(0.9 * tot_n)
xx_train = xx[:train_size]; xx_val = xx[train_size:]
yy_train = yy[:train_size]; yy_val = yy[train_size:]

reg_layer = newLayer()
input_layer = Input(shape=(1,))  # input
hidden = Dense(20, activation='relu', input_shape=(2,))(input_layer)  # hidden layer
hidden = reg_layer(hidden)
output_layer = Dense(1, activation='linear')(hidden)

model = Model(inputs=[input_layer], outputs=[output_layer])
model.compile(optimizer='Adam', loss=rate_mse(), experimental_run_tf_function=False)
# model.compile(optimizer='Adam', loss=None, experimental_run_tf_function=False)
model.fit(xx_train, yy_train, epochs=100, batch_size=100,
          validation_data=(xx_val, yy_val), verbose=1)

# new_xx = np.random.rand(10,1); new_yy = np.pi*new_xx
# model.evaluate(new_xx, new_yy)
print(model.predict(np.array([[1]])))
2) I also have a secondary question related to this code. I noticed that printing with tf.print inside the function rate_mse only works when the @tf.function decorator is present. Similarly, the call method of newLayer is only taken into account during training if that same decorator is commented out. Can someone explain why this is the case, or point me to a possible solution?
Thanks in advance to whoever can help. I am currently using TensorFlow 2.2.0 and the Keras version is 2.3.0-tf.
I was stuck on the same problem for a few days. The "standard" loss is already a scalar at the moment we add it to the loss coming from add_loss. The only way I got it to work is to reduce over one more axis when calculating the mean, so that we get a scalar, and then it works:
tmp = self.rate*K.mean(inputs*inputs, axis=[0, -1])
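In context, the layer from the question would then look like this (a sketch reusing the imports from the code above):

class newLayer(tf.keras.layers.Layer):
    def __init__(self, rate=5e-2, **kwargs):
        super(newLayer, self).__init__(**kwargs)
        self.rate = rate

    def call(self, inputs):
        # Reducing over both the batch axis and the last axis yields a rank-0 tensor,
        # so the value passed to add_loss matches the rank of the compiled (already reduced) loss.
        tmp = self.rate * K.mean(inputs * inputs, axis=[0, -1])
        self.add_loss(tmp, inputs=True)
        return inputs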
I have a question regarding the computation done by the categorical cross-entropy loss from PyTorch.
I have made this simple code snippet, and because I use the argmax of the output tensor as the targets, I cannot understand why the loss is still high.
import torch
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()
outputs = torch.randn(3, 5, requires_grad=True)
targets = torch.argmax(outputs, dim=1)
loss = ce_loss(outputs, targets)
print(loss)
Thanks for the help in understanding it.
Best regards
Jerome
So here is sample data from your code, with the outputs, labels, and loss having the following values:
outputs = tensor([[ 0.5968, -0.8249,  1.5018,  2.7888, -0.6125],
                  [-1.1534, -0.4921,  1.0688,  0.2241, -0.0257],
                  [ 0.3747,  0.8957,  0.0816,  0.0745,  0.2695]], requires_grad=True)
labels = tensor([3, 2, 1])
loss = tensor(0.7354, grad_fn=<NllLossBackward>)
So let's examine the values.
If you compute the softmax of your logits (outputs), using something like torch.softmax(outputs, dim=1), you will get
probs = tensor([[0.0771, 0.0186, 0.1907, 0.6906, 0.0230],
                [0.0520, 0.1008, 0.4801, 0.2063, 0.1607],
                [0.1972, 0.3321, 0.1471, 0.1461, 0.1775]], grad_fn=<SoftmaxBackward>)
So these will be your prediction probabilities.
Now cross-entropy loss is nothing but a combination of softmax and negative log likelihood loss. Hence, your loss can simply be computed using
loss = (torch.log(1/probs[0,3]) + torch.log(1/probs[1,2]) + torch.log(1/probs[2,1])) / 3
which is the average of the negative log of the probabilities of your true labels. The expression above evaluates to 0.7354, which is equal to the value returned by the nn.CrossEntropyLoss module.
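To make the equivalence concrete, here is a small self-contained check (a sketch with fresh random logits, so the printed numbers will differ from those above):

import torch
import torch.nn as nn

outputs = torch.randn(3, 5, requires_grad=True)
labels = torch.tensor([3, 2, 1])

# Built-in cross-entropy: softmax + negative log likelihood, averaged over the batch
builtin = nn.CrossEntropyLoss()(outputs, labels)

# Manual computation: average of -log(softmax probability of the true class)
probs = torch.softmax(outputs, dim=1)
manual = -torch.log(probs[torch.arange(3), labels]).mean()

print(builtin, manual)  # the two values match up to floating-point error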
I am using the input gradient as feature importance and want to compare the feature importance of a training data point with the human-annotated feature importance. I would like to make this comparison differentiable such that it can be learned through backpropagation. For that, I am writing a custom loss function that, in addition to the regular loss (e.g. MSE on the prediction vs. the true labels), also checks whether the input gradient is correct (e.g. MSE of the input gradient vs. the human-annotated feature importance).
With the following code I am able to get the input gradient:
from keras import backend as K
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

def normalize(x):
    # utility function to normalize a tensor by its L2 norm
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)

# Number of training samples
N = 1000
input_dim = 10

# Generate the training set; make the features at indices 1 and 2 equal to the target
X = np.random.standard_normal(size=(N, input_dim))
y = np.random.randint(low=0, high=2, size=(N, 1))
X[:, 1] = y[:, 0]
X[:, 2] = y[:, 0]

# Create a simple model
inputs = Input(shape=(input_dim,))
x = Dense(10, name="dense1")(inputs)
output = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[inputs], outputs=[output])

# Compile and fit the model
model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
model.fit([X], y, epochs=100, batch_size=64)

# Build a function that returns the input gradients
gradients = K.gradients(model.output, model.input)[0]
gradient_function = K.function([model.input], [normalize(gradients)])

# Get the input gradient values of the training set
grads_val = gradient_function([X])[0]
print(grads_val[:2])
This prints the following (you can see that the features at indices 1 and 2 have the highest importance):
[[ 1.2629046e-02 2.2765596e+00 2.1479919e+00 2.1558853e-02
4.5277486e-03 2.9851785e-03 9.5279224e-04 -1.0903150e-02
-1.2230731e-02 2.1960819e-02]
[ 1.1318034e-02 2.0402350e+00 1.9250139e+00 1.9320872e-02
4.0577268e-03 2.6752844e-03 8.5390132e-04 -9.7713526e-03
-1.0961102e-02 1.9681118e-02]]
How can I write a custom loss function in which the input gradients are differentiable?
I started with the following loss function.
from keras.losses import mean_squared_error

def custom_loss():
    # Human-annotated feature importance.
    # Let's say it tells us to only look at the feature at index 2.
    human_feature_importance = []
    for i in range(N):
        human_feature_importance.append([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

    def loss(y_true, y_pred):
        # Get the regular loss
        regular_loss_value = mean_squared_error(y_true, y_pred)
        # Somehow get the input gradient of each training sample as a tensor.
        # It should be differentiable w.r.t. all of the weights.
        gradients = ??
        feature_importance_loss_value = mean_squared_error(gradients, human_feature_importance)
        # Combine the two losses
        return regular_loss_value + feature_importance_loss_value
    return loss
I also found an implementation in TensorFlow to make the input gradient differentiable: https://github.com/dtak/rrr/blob/master/rrr/tensorflow_perceptron.py#L18
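One possible direction is sketched below, using the symbolic Keras backend already imported above. The loss closure and the constant human_feature_importance mask are assumptions of mine, not code from the linked repository:

import numpy as np
from keras import backend as K
from keras.losses import mean_squared_error

def gradient_penalty_loss(model, human_feature_importance):
    # human_feature_importance: array of shape (input_dim,) turned into a constant tensor
    target_grads = K.constant(human_feature_importance)

    def loss(y_true, y_pred):
        regular_loss_value = mean_squared_error(y_true, y_pred)
        # Symbolic gradient of the summed output w.r.t. the input; since it is part of
        # the graph, the optimizer can backpropagate through it (double backprop).
        input_grads = K.gradients(K.sum(model.output), model.input)[0]
        feature_importance_loss_value = K.mean(K.square(input_grads - target_grads), axis=-1)
        return regular_loss_value + feature_importance_loss_value
    return loss

# Hypothetical usage with the model defined earlier:
# model.compile(optimizer='adam',
#               loss=gradient_penalty_loss(model, np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])))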
According to the Keras code, Keras computes the loss value taking optional weights and masks into account:
for i in range(len(self.outputs)):
    if i in skip_target_indices:
        continue
    y_true = self.targets[i]
    y_pred = self.outputs[i]
    weighted_loss = weighted_losses[i]
    sample_weight = sample_weights[i]
    mask = masks[i]
    loss_weight = loss_weights_list[i]
    with K.name_scope(self.output_names[i] + '_loss'):
        output_loss = weighted_loss(y_true, y_pred,
                                    sample_weight, mask)
    if len(self.outputs) > 1:
        self.metrics_tensors.append(output_loss)
        self.metrics_names.append(self.output_names[i] + '_loss')
    if total_loss is None:
        total_loss = loss_weight * output_loss
    else:
        total_loss += loss_weight * output_loss
On the other hand, in the Keras documentation I see that the basic loss function is passed to the compile method, and then sample or class weights can be passed to the fit method.
I am not sure how to relate the 'weights and masks' in the first code snippet to the 'sample and class weights' in the documentation. Can anybody explain this in more detail?
My application is actually a convolutional LSTM network, where I input a series of images and want the network to produce an output map (with the same size as the input maps) of pixel classes, but some pixels don't have valid labels during training. Should I use a weight or a mask, sample or class?
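For reference, a minimal sketch of where these arguments go in the standard API (the tiny model and random data here are hypothetical, only to illustrate the calls):

import numpy as np
from tensorflow.keras import layers, models

# Tiny hypothetical classifier, only to show where each argument is passed
model = models.Sequential([layers.Dense(3, activation='softmax', input_shape=(4,))])

# The basic loss function (and optional per-output loss_weights) goes into compile()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x_train = np.random.rand(100, 4)
y_train = np.random.randint(0, 3, size=(100,))

# Per-sample weights go into fit() via sample_weight (one weight per sample);
# a weight of 0 effectively masks that sample out of the loss.
sample_weight = np.ones(100)
sample_weight[:10] = 0.0

# Alternatively, per-class weights can be passed via class_weight, e.g.
# model.fit(x_train, y_train, class_weight={0: 1.0, 1: 2.0, 2: 1.0}, epochs=1)
model.fit(x_train, y_train, sample_weight=sample_weight, epochs=1, verbose=0)

The mask argument in the source snippet above, by contrast, comes from layers that support masking (for example Masking, or Embedding with mask_zero=True), not from fit.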
I tried to run linear regression on the ForestFires dataset.
The dataset is available on Kaggle and the gist of my attempt is here:
https://gist.github.com/Chandrak1907/747b1a6045bb64898d5f9140f4cf9a37
I am facing two problems:
The output from prediction is of shape 32x1, while the target data is of shape 32.
input and target shapes do not match: input [32 x 1], target [32]
Using view, I reshaped the predictions tensor:
y_pred = y_pred.view(inputs.shape[0])
Why is there a mismatch between the shapes of the predicted tensor and the actual tensor?
SGD in PyTorch never converges. I tried to compute the MSE manually using
print(torch.mean((y_pred - labels)**2))
This value does not match the one computed by
loss = criterion(y_pred,labels)
Can someone point out where the mistake in my code is?
Thank you.
Problem 1
This is the reference for MSELoss from the PyTorch docs: https://pytorch.org/docs/stable/nn.html#torch.nn.MSELoss
Shape:
- Input: (N, *) where * means any number of additional dimensions
- Target: (N, *), same shape as the input
So you need to expand the dims of labels from (32) to (32, 1), by using torch.unsqueeze(labels, 1) or labels.view(-1, 1).
https://pytorch.org/docs/stable/torch.html#torch.unsqueeze
torch.unsqueeze(input, dim, out=None) → Tensor
Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor.
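A quick illustration of the reshape (a sketch, assuming labels is the 1-D batch of targets from the question):

import torch

labels = torch.randn(32)                # shape: torch.Size([32])
labels_2d = torch.unsqueeze(labels, 1)  # shape: torch.Size([32, 1])
# labels.view(-1, 1) and labels.unsqueeze(1) produce the same (32, 1) view
print(labels.shape, labels_2d.shape)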
Problem 2
After reviewing your code, I realized that you have added the size_average parameter to MSELoss:
criterion = torch.nn.MSELoss(size_average=False)
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
That's why the two computed values do not match. Here is sample code:
import torch
import torch.nn as nn
loss1 = nn.MSELoss()
loss2 = nn.MSELoss(size_average=False)
inputs = torch.randn(32, 1, requires_grad=True)
targets = torch.randn(32, 1)
output1 = loss1(inputs, targets)
output2 = loss2(inputs, targets)
output3 = torch.mean((inputs - targets) ** 2)
print(output1) # tensor(1.0907)
print(output2) # tensor(34.9021)
print(output3) # tensor(1.0907)
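Note that size_average is deprecated in current PyTorch; the equivalent today is the reduction argument, as in this small sketch:

import torch
import torch.nn as nn

inputs = torch.randn(32, 1, requires_grad=True)
targets = torch.randn(32, 1)

# reduction='mean' (the default) averages over all elements;
# reduction='sum' matches the old size_average=False behaviour
mean_loss = nn.MSELoss(reduction='mean')(inputs, targets)
sum_loss = nn.MSELoss(reduction='sum')(inputs, targets)

print(mean_loss, sum_loss, sum_loss / inputs.numel())  # the last value equals mean_loss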