How can I calculate the loss without the weight decay in Keras? - keras

I defined a convolutional layer and also use L2 weight decay in Keras.
When I call model.fit(), does the reported loss already include the weight decay loss? If it does, how can I get the loss without the weight decay during training?
I want to monitor the loss without the weight decay, while still keeping the weight decay active during training.

Yes, weight decay losses are included in the loss value printed on the screen.
The value you want to monitor is the total loss minus the sum of regularization losses.
The total loss is just model.total_loss.
The regularization losses are collected in the list model.losses.
The following lines can be found in the source code of model.compile():
# Add regularization penalties
# and other layer-specific losses.
for loss_tensor in self.losses:
    total_loss += loss_tensor
To get the loss without weight decay, you can reverse the above operations. I.e., the value to be monitored is model.total_loss - sum(model.losses).
Now, how to monitor this value is a bit tricky. Fortunately, the list of metrics used by a Keras model is not fixed until model.fit() is called. So you can append this value to the list, and it'll be printed on the screen during model fitting.
Here's a simple example:
import numpy as np
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model
from keras.regularizers import l2

input_tensor = Input(shape=(64, 64, 3))
hidden = Conv2D(32, 1, kernel_regularizer=l2(0.01))(input_tensor)
hidden = GlobalAveragePooling2D()(hidden)
out = Dense(1)(hidden)
model = Model(input_tensor, out)
model.compile(loss='mse', optimizer='adam')

# The loss without the regularization penalties, exposed as an extra metric
loss_no_weight_decay = model.total_loss - sum(model.losses)
model.metrics_tensors.append(loss_no_weight_decay)
model.metrics_names.append('loss_no_weight_decay')
When you run model.fit(), something like this will be printed to the screen:
Epoch 1/1
100/100 [==================] - 0s - loss: 0.5764 - loss_no_weight_decay: 0.5178
You can also verify whether this value is correct by computing the L2 regularization manually:
# L2 penalty = 0.01 * sum of squared kernel weights of the Conv2D layer
conv_kernel = model.layers[1].get_weights()[0]
print(np.sum(0.01 * np.square(conv_kernel)))
In my case, the printed value is 0.0585, which is indeed the difference between loss and loss_no_weight_decay (with some rounding error).

Related

Pytorch - Repeating Loss

I am new to PyTorch and I found a problem when displaying the loss of my model.
Pytorch Adam Optimizer - Model Loss Figure
Pytorch SGD Optimizer - Model Loss Figure
As you can see, the loss seems to go up and down multiple times, with a recurring pattern (the pattern starts to repeat at the beginning of every epoch).
The full code can be found at: https://github.com/19valentin99/Kaggle/tree/main/Iris%20Flowers
in main_test.py (the # lines are the ones that I used to debug the code and the answer should be below).
When we just take the loss of the last element (or the loss over the whole epoch) we will see a smooth decrease in loss.
The reason your loss is smooth is that you are looking at the loss of the exact same batch on every iteration. Indeed, your train data loader isn't shuffling your instances:
train2 = DataLoader(flowers_data_train, batch_size=BATCH_SIZE)
This means the same batch will appear last in every epoch. That's all there is to it: the learning itself is no different, you are just looking at the loss on one part of the complete dataset. The fix is sketched below.
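If you want each epoch to see the data in a different order, a minimal sketch of the fix (reusing the same flowers_data_train dataset and BATCH_SIZE) is to enable shuffling in the loader:
from torch.utils.data import DataLoader

# Reshuffle the dataset at the start of every epoch, so the batch that
# happens to come last differs from epoch to epoch.
train2 = DataLoader(flowers_data_train, batch_size=BATCH_SIZE, shuffle=True)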
The difference between "not working" and "working" is based on when the loss is recorded.
The idea is that: overall, the loss converges, but in this time until it converges it jumps up and down.
While it jumps up and down, we might see a pattern if we are sampling too often. The pattern is given by the data we use for training (as the data we use to train is the same every epoch - in batches).
As a result:
For the not-working version: I was recording the loss after every batch of every epoch.
For the working version: I was recording only the last batch loss of each epoch.
Pytorch Adam Optimizer - Model Loss (working)
Pytorch SGD Optimizer - Model Loss (working)
Furthermore, I will attach the code which generates the non-working version:
loss_list = []
for epoch in range(EPOCHS):
    for idx, (x, y) in enumerate(train_load):
        x, y = x.to(device), y.to(device)
        # Compute error
        prediction = model(x)
        # print(prediction, y)
        loss = loss_fn(prediction, y)
        # Debugging: record the loss after every batch
        loss_list.append(loss.item())
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
plt.plot(loss_list)
plt.show()
The working code:
loss_list2 = np.zeros((EPOCHS,))
for epoch in range(EPOCHS):
    for batch, (x, y) in enumerate(train_load):
        x = x.to(device=device)
        y = y.to(device=device)
        y_pred = model(x)
        loss = loss_fn(y_pred, y)
        # Only the loss of the last batch in each epoch survives
        loss_list2[epoch] = loss.item()
        # Zero gradients, backpropagate, and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
plt.plot(loss_list2)
plt.show()
In the end, I would like to mention that I know there are a couple of other threads out there suggesting how to solve this problem (e.g. clip the gradients, remove the last batch, the model is too simple to capture the data), but what I discovered is that it wasn't actually a problem; it was more about when the recording of the loss is done.
I hope that this will help other people as well.

keras and shape of input and losses

In all code examples for Keras I see that the input shape is passed directly, and it is implied that the batch size is the first dimension, e.g.:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)
However when it comes to custom losses I see that the last axis (axis=-1) is used.
def loss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
When writing the loss, should one think of y_true and y_pred as batches or as single samples?
I'm assuming it's the former, but if that's the case I can't understand why the last axis is specified.
In your custom loss function, you treat y_true and y_pred as batches, and the returned value is batch-sized as well (one loss value per sample). If you only calculate one loss for your network, you could also get rid of the specified axis, since you only want a single value for your loss in the end.
But if you have multiple outputs in your network and you want to calculate the total loss, where each output might use its own loss functions, things begin to change.
Please check out: https://github.com/keras-team/keras/blob/master/keras/engine/training.py#L658
where the function to calculate the total loss, _prepare_total_loss, is called.
In this function, the following code is executed:
output_loss = loss_fn(y_true, y_pred, sample_weight=sample_weight)
which returns the loss for a single output of your network. This is also where your custom loss function gets called. If there are multiple outputs, all of them are calculated, weighted and added to the total loss: total_loss += loss_weight * output_loss
In the end, _prepare_total_loss returns K.mean(total_loss). So in the simplest case, if your custom loss function returned a vector with its length equal to the batch size, and there is only one output with loss in your network, the final loss will be the mean of the output-vector returned by your custom loss.
But in case of multiple outputs and multiple losses, you first want to calculate the loss vector of a batch for each output and therefore loss function, take their weighted sum and then calculate the final loss by taking the mean of the resulting vector.
If each of your loss functions returned a single loss value instead of a batch-sized vector, the final loss would be the mean of those per-output mean losses, which differs from the mean loss over the whole batch.
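As a quick illustration, here is a minimal sketch (with made-up shapes, calling the Keras backend directly) of how a batch-sized loss vector ends up as a single scalar:
import numpy as np
from keras import backend as K

def loss(y_true, y_pred):
    # One loss value per sample in the batch, shape (batch_size,)
    return K.mean(K.square(y_pred - y_true), axis=-1)

y_true = K.constant(np.random.rand(4, 16))  # batch of 4 samples, 16 values each
y_pred = K.constant(np.random.rand(4, 16))

per_sample = K.eval(loss(y_true, y_pred))   # shape (4,)
print(per_sample.shape, per_sample.mean())  # the mean is the single reported loss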

Different loss function for validation set in Keras

I have an unbalanced training dataset, that's why I built a custom weighted categorical cross-entropy loss function. But the problem is that my validation set is balanced, and I want to use the regular categorical cross-entropy loss. So can I pass a different loss function for the validation set within Keras? I mean the weighted one for the training set and the regular one for the validation set?
def weighted_loss(y_true, y_pred):
    ...
    return loss

model.compile(loss=weighted_loss, metrics=['accuracy'])
You can try the backend function K.in_train_phase(), which is used by the Dropout and BatchNormalization layers to implement different behaviors in training and validation.
def custom_loss(y_true, y_pred):
    weighted_loss = ...  # your implementation of weighted crossentropy loss
    unweighted_loss = K.sparse_categorical_crossentropy(y_true, y_pred)
    return K.in_train_phase(weighted_loss, unweighted_loss)
The first argument of K.in_train_phase() is the tensor used in the training phase, and the second is the one used in the test phase.
For example, if we set weighted_loss to 0 (just to verify the effect of K.in_train_phase() function):
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def custom_loss(y_true, y_pred):
    weighted_loss = 0 * K.sparse_categorical_crossentropy(y_true, y_pred)
    unweighted_loss = K.sparse_categorical_crossentropy(y_true, y_pred)
    return K.in_train_phase(weighted_loss, unweighted_loss)

model = Sequential([Dense(100, activation='relu', input_shape=(100,)),
                    Dense(1000, activation='softmax')])
model.compile(optimizer='adam', loss=custom_loss)
model.outputs[0]._uses_learning_phase = True  # required if no dropout or batch norm in the model

X = np.random.rand(1000, 100)
y = np.random.randint(1000, size=1000)
model.fit(X, y, validation_split=0.1)
Epoch 1/10
900/900 [==============================] - 1s 868us/step - loss: 0.0000e+00 - val_loss: 6.9438
As you can see, the loss in training phase is indeed the one multiplied by 0.
Note that if there's no dropout or batch norm in your model, you'll need to manually "turn on" the _uses_learning_phase boolean switch, otherwise K.in_train_phase() will have no effect by default.
The validation loss function is just a metric and is actually not needed for training. It's there because it makes sense to compare it against the metric which your network is actually optimizing.
So you can add any other loss function as a metric during compilation and you'll see it during training, as sketched below.
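For instance, a minimal sketch (reusing the weighted_loss function from the question) that reports the unweighted cross-entropy alongside the loss being optimized could look like this:
from keras.losses import categorical_crossentropy

# The model is optimized on weighted_loss; categorical_crossentropy is only reported.
model.compile(optimizer='adam',
              loss=weighted_loss,
              metrics=['accuracy', categorical_crossentropy])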

Vector regression with Keras

Suppose, for example, a regression problem with five scalars as output, where each output has approximately the same range. In Keras, we can model this using a 5-output dense layer without activation function (vector regression):
output_layer = layers.Dense(5, activation=None)(previous_layer)
model = models.Model(input_layer, output_layer)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is the total loss (metric) simply the sum of the individual losses (metrics)? Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? In my experiments, I haven't observed any significant differences but want to make sure that I didn't miss anything fundamental.
output_layer_list = []
for _ in range(5):
    output_layer_list.append(layers.Dense(1, activation=None)(previous_layer))
model = models.Model(input_layer, output_layer_list)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is there an easy way to attach weights to the outputs in the first solution similar to specifying loss_weights in case of multi-output models?
Those models are the same. To answer your questions let's look at the mse loss:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
Is the total loss (metric) simply the sum of the individual losses (metrics)? Yes, because the mse loss applies K.mean over the last axis, so up to a constant factor it is the sum of the squared errors of all elements in the output vector.
Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? Yes, because subtraction and squaring are done element-wise, so five scalar outputs produce the same errors as a single vector output, and a multi-output model's loss is the sum of the losses of its individual outputs.
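As a quick numeric sanity check (made-up values), the single-vector mse and the sum of the five per-output mses differ only by a constant factor, which does not change which weights minimize the loss:
import numpy as np

y_true = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
y_pred = np.array([[1.1, 1.9, 3.2, 3.8, 5.3]])

squared_errors = np.square(y_pred - y_true)      # element-wise, shape (1, 5)
vector_mse = squared_errors.mean(axis=-1)        # single 5-dimensional output
multi_output_sum = squared_errors.sum(axis=-1)   # five scalar outputs, summed
print(vector_mse, multi_output_sum / 5)          # identical values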
Yes, both are equivalent. To replicate the loss_weights functionality with your first model, you can define your own custom loss function. Something along these lines:
import numpy as np
import tensorflow as tf
from keras import backend as K

weights = K.variable(value=np.array([[0.1, 0.1, 0.1, 0.1, 0.6]]))  # one weight per output dimension
def custom_loss(y_true, y_pred):
    return tf.matmul(K.square(y_true - y_pred), tf.transpose(weights))
and pass this function to the loss argument upon compiling:
model.compile(optimizer='rmsprop', loss=custom_loss, metrics=['mse'])

How does Keras evaluate loss on test set?

I am implementing a neural network classifier. To print the loss and accuracy of this NN I'm using:
score = model.evaluate(x_test, y_test, verbose=False)
model.metrics_names
print('Test score: ', score[0]) #Loss on test
print('Test accuracy: ', score[1])
I would like to know how Keras calculates the loss of a model, especially whether it is evaluated only on the first (and only) step of the test set.
I have searched on keras.io, but I didn't find anything about it.
From the documentation:
evaluate
Computes the loss on some input data, batch by batch.
Returns
Scalar test loss (if the model has no metrics) or list of scalars (if the model computes other metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.
So it returns either a single value that represents the loss, or a list of values that correspond to the different metrics added to your model. These values are calculated on the whole test set, i.e. all values in x_test and y_test.
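As a small sketch (reusing x_test and y_test from the question), evaluate iterates over the data in batches and aggregates one value per metric over all samples, which you can label using model.metrics_names:
# One aggregated value per compiled metric, computed batch by batch over the whole test set
score = model.evaluate(x_test, y_test, batch_size=32, verbose=0)
print(dict(zip(model.metrics_names, score)))  # assumes the model was compiled with at least one metric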
Firstly, print model.metrics_names. Assume the output is ['loss', 'accuracy']
Then you can do:
[test_loss, test_acc] = model.evaluate(X_test, y_test)
print('test accuracy: ', test_acc * 100, '%')
print('test loss: ', test_loss)
