Tensorflow 1.15 / Keras 2.3.1 Model.train_on_batch() returns more values than there are outputs/loss functions - python-3.x

I am trying to train a model that has more than one output and as a result, also has more than one loss function attached to it when I compile it.
I haven't done something similar in the past (not from scratch at least).
Here's some code I am using to figure out how this works.
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

batch_size = 50
input_size = 10
output_size = 5  # not defined in the original post; any positive size works

i = Input(shape=(input_size,))
x = Dense(100)(i)
x_1 = Dense(output_size)(x)
x_2 = Dense(output_size)(x)
model = Model(i, [x_1, x_2])
model.compile(optimizer='adam', loss=["mse", "mse"])

# Data creation
x = np.random.random_sample([batch_size, input_size]).astype('float32')
y = np.random.random_sample([batch_size, output_size]).astype('float32')
loss = model.train_on_batch(x, [y, y])
print(loss)  # sample output [0.8311912, 0.3519104, 0.47928077]
I would expect the variable loss to have two entries (one for each loss function), however, I get back three. I thought maybe one of them is the weighted average but that does not look to be the case.
Could anyone explain how passing in multiple loss functions works, because obviously, I am misunderstanding something.

I believe the three outputs are the sum of all the losses, followed by the individual losses on each output.
For example, if you look at the sample output you've printed there:
0.3519104 + 0.47928077 = 0.83119117 ≈ 0.8311912

Your assumption that there should be two losses is incorrect. You have a model with two outputs, and you specified one loss for each output, but the model has to be trained on a single loss, so Keras trains the model on a new loss that is the weighted sum of the per-output losses.
You can control how these losses are mixed using the loss_weights parameter in model.compile. By default, every weight is 1.0.
So in the end what train_on_batch returns is the total loss, the MSE of output one, and the MSE of output two. That is why you get three values.
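As a rough sketch of the arithmetic (plain Python, not Keras internals), the three reported values can be reproduced by summing the per-output losses with their weights. The helper name `combine_losses` and the second set of weights are made up for illustration.

```python
def combine_losses(per_output_losses, loss_weights=None):
    """Return [total, loss_1, loss_2, ...], mirroring the order of the reported values."""
    if loss_weights is None:
        # Assumed default: every output contributes with weight 1.0.
        loss_weights = [1.0] * len(per_output_losses)
    total = sum(w * l for w, l in zip(loss_weights, per_output_losses))
    # In this sketch the per-output entries are returned unweighted.
    return [total] + list(per_output_losses)

# Per-output MSE values taken from the sample output in the question.
reported = combine_losses([0.3519104, 0.47928077])
print(reported)

# With unequal (hypothetical) weights, only the first entry changes.
reweighted = combine_losses([0.3519104, 0.47928077], loss_weights=[1.0, 0.5])
print(reweighted)
```

The first entry of `reported` agrees with the 0.8311912 printed in the question up to float32 rounding.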

Related

mse loss function not compatible with regularization loss (add_loss) on hidden layer output

I would like to code in tf.Keras a Neural Network with a couple of loss functions. One is a standard mse (mean squared error) with a factor loading, while the other is basically a regularization term on the output of a hidden layer. This second loss is added through self.add_loss() in a user-defined class inheriting from tf.keras.layers.Layer. I have a couple of questions (the first is more important though).
1) The error I get when trying to combine the two losses together is the following:
ValueError: Shapes must be equal rank, but are 0 and 1
From merging shape 0 with other shapes. for '{{node AddN}} = AddN[N=2, T=DT_FLOAT](loss/weighted_loss/value, model/new_layer/mul_1)' with input shapes: [], [100].
So it comes from the fact that the tensors which should add up to make one unique loss value have different shapes (and ranks). Still, when I try to print the losses during the training, I clearly see that the vectors returned as losses have shape batch_size and rank 1. Could it be that when the 2 losses are summed I have to provide them (or at least the loss of add_loss) as scalar? I know the mse is usually returned as a vector where each entry is the mse from one sample in the batch, hence having batch_size as shape. I think I tried to do the same with the "regularization" loss. Do you have an explanation for this behavio(u)r?
The sample code which gives me error is the following:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input

def rate_mse(rate=1e5):
    #@tf.function # also needed for printing
    def loss(y_true, y_pred):
        tmp = rate*K.mean(K.square(y_pred - y_true), axis=-1)
        # tf.print('shape %s and rank %s output in mse'%(K.shape(tmp), tf.rank(tmp)))
        tf.print('shape and rank output in mse',[K.shape(tmp), tf.rank(tmp)])
        tf.print('mse loss:',tmp) # print when I put tf.function
        return tmp
    return loss

class newLayer(tf.keras.layers.Layer):
    def __init__(self, rate=5e-2, **kwargs):
        super(newLayer, self).__init__(**kwargs)
        self.rate = rate

    # #@tf.function # to be commented for NN training
    def call(self, inputs):
        tmp = self.rate*K.mean(inputs*inputs, axis=-1)
        tf.print('shape and rank output in regularizer',[K.shape(tmp), tf.rank(tmp)])
        tf.print('regularizer loss:',tmp)
        self.add_loss(tmp, inputs=True)
        return inputs

tot_n = 10000
xx = np.random.rand(tot_n,1)
yy = np.pi*xx
train_size = int(0.9*tot_n)
xx_train = xx[:train_size]; xx_val = xx[train_size:]
yy_train = yy[:train_size]; yy_val = yy[train_size:]

reg_layer = newLayer()
input_layer = Input(shape=(1,)) # input
hidden = Dense(20, activation='relu', input_shape=(2,))(input_layer) # hidden layer
hidden = reg_layer(hidden)
output_layer = Dense(1, activation='linear')(hidden)
model = Model(inputs=[input_layer], outputs=[output_layer])
model.compile(optimizer='Adam', loss=rate_mse(), experimental_run_tf_function=False)
#model.compile(optimizer='Adam', loss=None, experimental_run_tf_function=False)
model.fit(xx_train, yy_train, epochs=100, batch_size = 100,
          validation_data=(xx_val,yy_val), verbose=1)
#new_xx = np.random.rand(10,1); new_yy = np.pi*new_xx
#model.evaluate(new_xx,new_yy)
print(model.predict(np.array([[1]])))
2) I would also have a secondary question related to this code. I noticed that printing with tf.print inside the function rate_mse only works with tf.function. Similarly, the call method of newLayer is only taken into consideration if the same decorator is commented during training. Can someone explain why this is the case or reference me to a possible solution?
Thanks in advance to whoever can provide me help. I am currently using Tensorflow 2.2.0 and keras version is 2.3.0-tf.
I was stuck on the same problem for a few days. The "standard" loss is already a scalar by the time it is added to the loss from add_loss. The only way I got it working was to reduce over the batch axis as well when calculating the mean, so that the regularization loss is also a scalar:
tmp = self.rate*K.mean(inputs*inputs, axis=[0, -1])
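To see why the extra axis matters, here is a small NumPy sketch of the two reductions. NumPy stands in for the Keras backend, and the array sizes follow the question (batch size 100, 20 hidden units). Reducing over the last axis only leaves one value per sample, while reducing over both axes yields a scalar that can be added to a scalar main loss.

```python
import numpy as np

rate = 5e-2
hidden = np.random.rand(100, 20)  # stand-in for the hidden-layer output

# Mean over the feature axis only: one value per sample, rank 1.
per_sample = rate * np.mean(hidden * hidden, axis=-1)

# Mean over batch and feature axes: a single scalar, rank 0.
scalar = rate * np.mean(hidden * hidden, axis=(0, -1))

print(per_sample.shape, per_sample.ndim)
print(scalar.shape, scalar.ndim)
```

The scalar is simply the batch average of the per-sample values, so the regularization strength is unchanged.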

Is it possible to predict a certain numerical value given a DNA sequence using LSTM?

I have a DNA sequence of 16 letters. Each 16-letter sequence has an associated output value, the so-called 'inhibition value', which ranges from 0 to 100. When I tried using an LSTM, the prediction was always the same constant. Does the problem lie in the code, or is this simply not a suitable task for an LSTM, or RNNs in general?
I have tried increasing the batch size and number of epochs, making the LSTM deeper, and changing the number of LSTM units, but none of these works.
I was also wondering whether the labeling method matters. I tried a one-hot encoder at first, but it didn't work. Then I changed it to LabelEncoder, but that doesn't work either. The same constant output is produced.
Below here is the code for my model structure
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

def create_model():
    input1 = Input(shape=(16, 1))
    classifier = LSTM(64, input_shape=(16, 1), return_sequences=True)(input1)
    for i in range(2):
        classifier = LSTM(32, return_sequences=True)(classifier)
    classifier = LSTM(32)(classifier)
    classifier = Dense(1, activation='relu')(classifier)
    model = Model(inputs=[input1], outputs=classifier)
    adam = keras.optimizers.Adam(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=adam)
    return model
If anyone is wondering why I use the functional API instead of Sequential, it is because of a possible modification where I need two input variables that are processed independently before being concatenated at the end.
Thank you in advance.
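For reference, the one-hot encoding mentioned in the question can be sketched in plain Python as follows. The alphabet order `ACGT` and the example sequence are assumptions made for illustration. Note that a one-hot encoded 16-letter sequence has shape (16, 4), whereas an integer-label encoding can be reshaped to the (16, 1) input shape used in the model above.

```python
ALPHABET = "ACGT"  # assumed nucleotide order

def one_hot_encode(sequence):
    """Map each nucleotide to a 4-element indicator vector."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0.0] * len(ALPHABET)
        row[index[base]] = 1.0
        encoded.append(row)
    return encoded

example = one_hot_encode("ACGTACGTACGTACGT")  # hypothetical 16-letter sequence
print(len(example), len(example[0]))
```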

Multi-label classification with class weights in Keras

I have 1000 classes in the network and the outputs are multi-label. For each training example, the number of positive outputs is the same (i.e. 10), but they can be assigned to any of the 1000 classes. So 10 classes have output 1 and the remaining 990 have output 0.
For the multi-label classification, I am using binary cross-entropy as the cost function and sigmoid as the activation function. When I applied 0.5 as the cut-off between 1 and 0, all of the outputs were 0. I understand this is a class imbalance problem. From this link, I understand that I might have to create extra output labels. Unfortunately, I haven't been able to figure out how to incorporate that into a simple neural network in Keras.
nclasses = 1000
# if we wanted to maximize an imbalance problem!
#class_weight = {k: len(Y_train)/(nclasses*(Y_train==k).sum()) for k in range(nclasses)}
inp = Input(shape=[X_train.shape[1]])
x = Dense(5000, activation='relu')(inp)
x = Dense(4000, activation='relu')(x)
x = Dense(3000, activation='relu')(x)
x = Dense(2000, activation='relu')(x)
x = Dense(nclasses, activation='sigmoid')(x)
model = Model(inputs=[inp], outputs=[x])
adam = keras.optimizers.adam(lr=0.00001)
# Note: the string 'adam' below selects the default settings; the `adam` object above is not used.
model.compile('adam', 'binary_crossentropy')
history = model.fit(
    X_train, Y_train, batch_size=32, epochs=50, verbose=0, shuffle=False)
Could anyone help me with the code here and I would also highly appreciate if you could suggest a good 'accuracy' metric for this problem?
Thanks a lot :) :)
I have a similar problem and unfortunately have no answer for most of the questions, especially the class imbalance problem.
In terms of metrics there are several possibilities. In my case I take the top 1/2/3/4/5 results and check whether one of them is right. Because in your case the number of positive labels is always the same, you could take your top 10 results, compute the fraction of them that are right, and average this over the batch. I didn't find a way to include this algorithm as a Keras metric. Instead, I wrote a callback which calculates the metric on my validation data set at the end of each epoch.
Also, if you predict the top n results on a test dataset, see how many times each class is predicted. The Counter class is really convenient for this purpose.
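A minimal plain-Python sketch of that top-k metric and the Counter check might look like this. The function names and the tiny example data (4 classes, k=2 instead of 1000 classes, k=10) are invented for illustration, and this is not the callback from the answer.

```python
from collections import Counter

def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def top_k_precision(y_true, y_score, k=10):
    """Fraction of the top-k predictions that are true labels, averaged over samples."""
    per_sample = []
    for labels, scores in zip(y_true, y_score):
        hits = sum(labels[i] for i in top_k_indices(scores, k))
        per_sample.append(hits / k)
    return sum(per_sample) / len(per_sample)

# Hypothetical data: 2 samples, 4 classes, 2 positive labels per sample.
y_true = [[1, 0, 1, 0], [0, 1, 1, 0]]
y_score = [[0.9, 0.1, 0.8, 0.2], [0.7, 0.6, 0.1, 0.2]]

precision = top_k_precision(y_true, y_score, k=2)

# Count how often each class appears among the top-k predictions.
counts = Counter(i for scores in y_score for i in top_k_indices(scores, 2))
print(precision, counts)
```

A class that dominates `counts` regardless of the input is a hint that the model has collapsed onto the most frequent labels.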
Edit: I found a method to include class weights without splitting the output.
You need a 2D NumPy array containing weights with shape [number of classes to predict, 2 (background and signal)].
Such an array could be calculated with this function:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def calculating_class_weights(y_true):
    number_dim = np.shape(y_true)[1]
    weights = np.empty([number_dim, 2])
    for i in range(number_dim):
        # keyword arguments are required by recent scikit-learn versions
        weights[i] = compute_class_weight('balanced', classes=np.array([0., 1.]), y=y_true[:, i])
    return weights
The solution is now to build your own binary crossentropy loss function in which you multiply your weights yourself:
def get_weighted_loss(weights):
    def weighted_loss(y_true, y_pred):
        return K.mean((weights[:,0]**(1-y_true))*(weights[:,1]**(y_true))*K.binary_crossentropy(y_true, y_pred), axis=-1)
    return weighted_loss
weights[:,0] is an array with all the background weights and weights[:,1] contains all the signal weights.
All that is left is to include this loss into the compile function:
model.compile(optimizer=Adam(), loss=get_weighted_loss(class_weights))
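The exponent trick in the weighted loss can be checked with a small NumPy sketch. NumPy is used in place of the Keras backend, and the weight values are invented for illustration. Because `y_true` is either 0 or 1, `w0**(1 - y_true) * w1**y_true` selects the background weight for negatives and the signal weight for positives.

```python
import numpy as np

# Hypothetical weights for 3 classes: column 0 = background, column 1 = signal.
weights = np.array([[0.5, 10.0],
                    [0.6, 5.0],
                    [0.7, 2.0]])

y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.8, 0.1, 0.3],
                   [0.2, 0.4, 0.6]])

# Per-element weight picked by the exponent trick.
selected = (weights[:, 0] ** (1 - y_true)) * (weights[:, 1] ** y_true)

# Element-wise binary cross-entropy, then the weighted mean over classes.
bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
weighted_loss = np.mean(selected * bce, axis=-1)

print(selected)
print(weighted_loss)
```

Each row of `selected` holds the signal weight where the label is 1 and the background weight elsewhere, and the result keeps one loss value per sample.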

how to implement Loss function of paper 'Semantic Image Inpainting with Deep Generative Models' in keras

I have trained a GAN on the celebA dataset. After that I separated G and D. Then I pick one image from the celebA training dataset, say yTrue, and I want to find the closest image to yTrue that G can generate, say yPred. So the loss at the output of G is ||yTrue - yPred||_2^{2}, and I minimize it w.r.t. the generator input (a latent variable drawn from a normal distribution). The code below gives good results. The problem is that I also want to add the prior loss log(1-D(G(z))) to the first line, but I do not see how to do it, as D is no longer connected to G, and if I directly add K.mean(K.log(1-D.predict(G.output))) to the first line it returns a NumPy array rather than a tensor, which is not allowed.
loss = K.mean(K.square(yTrue - gf.output))
grad = K.gradients(loss, [gf.input])[0]
fn = K.function([gf.input], [grad])
generator_input = np.random.normal(0, 1, [1, 100])
for i in range(5000):
    grads = fn([generator_input])
    generator_input -= grads[0] * .01
recovered = gf.predict(generator_input)
In keras, you get the final output to create loss functions. Then, you will have to train the full network to achieve that loss. (Train G+D joined as a single model).
In the loss function, you will have y_true and y_pred, and you use them to compare:
PS: if the MSE is not taking the output of the discriminator, please detail your question better.
import keras.backend as K

def customLoss(yTrue, yPred):
    mse = K.mean(K.square(yTrue - yPred))
    prior = K.mean(K.log(1 - yPred))
    return mse + prior
Pass this function when compiling the model
discriminator.compile(loss=customLoss,optimizer=.....)
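The latent-space search described in the question (gradient descent on the generator input rather than on the weights) can be illustrated with a toy NumPy example. The linear map `W` below is a made-up stand-in for the trained generator, chosen so the gradient can be written by hand. With a real Keras generator the gradient would come from `K.gradients` as in the question. The prior term is left out of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "generator": G(z) = W @ z, with a 3-dimensional latent space.
W = rng.normal(size=(8, 3))
y_true = W @ rng.normal(size=3)   # a target the generator can reproduce
z = rng.normal(size=3)            # initial latent guess

def loss_and_grad(z):
    """Mean squared error between G(z) and y_true, and its gradient w.r.t. z."""
    residual = W @ z - y_true
    loss = np.mean(residual ** 2)
    grad = 2.0 * W.T @ residual / residual.size
    return loss, grad

initial_loss, _ = loss_and_grad(z)
for _ in range(5000):
    _, grad = loss_and_grad(z)
    z -= 0.01 * grad              # same step size as in the question
final_loss, _ = loss_and_grad(z)

print(initial_loss, final_loss)
```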

Strange behaviour sequence to sequence learning for variable length sequences

I am training a sequence to sequence model for variable length sequences with Keras, but I am running into some unexpected problems. It is unclear to me whether the behaviour I am observing is the desired behaviour of the library and why it would be.
Model Creation
I've made a recurrent model with an embeddings layer and a GRU recurrent layer that illustrates the problem. I used mask_zero=0.0 for the embeddings layer instead of a masking layer, but changing this doesn't seem to make a difference (nor does adding a masking layer before the output):
import numpy
from keras.layers import Embedding, GRU, TimeDistributed, Dense, Input
from keras.models import Model
import keras.preprocessing.sequence
numpy.random.seed(0)
input_layer = Input(shape=(3,), dtype='int32', name='input')
embeddings = Embedding(input_dim=20, output_dim=2, input_length=3, mask_zero=True, name='embeddings')(input_layer)
recurrent = GRU(5, return_sequences=True, name='GRU')(embeddings)
output_layer = TimeDistributed(Dense(1), name='output')(recurrent)
model = Model(input=input_layer, output=output_layer)
output_weights = model.layers[-1].get_weights()
output_weights[1] = numpy.array([0.2])
model.layers[-1].set_weights(output_weights)
model.compile(loss='mse', metrics=['mse'], optimizer='adam', sample_weight_mode='temporal')
I use masking and the sample_weight parameter to exclude the padding values from the training/evaluation. I will test this model on one input/output sequence which I pad using the Keras padding function:
X = [[1, 2]]
X_padded = keras.preprocessing.sequence.pad_sequences(X, dtype='float32', maxlen=3)
Y = [[[1], [2]]]
Y_padded = keras.preprocessing.sequence.pad_sequences(Y, maxlen=3, dtype='float32')
Output Shape
Why is the output expected to be formatted in this way? Why can I not use input/output sequences that have exactly the same dimensionality? model.evaluate(X_padded, Y_padded) gives me a dimensionality error.
Then, when I run model.predict(X_padded) I get the following output (with numpy.random.seed(0) before generating the model):
[[[ 0.2 ]
[ 0.19946882]
[ 0.19175649]]]
Why isn't the first input masked for the output layer? Is the output value computed anyway (and equal to the bias, as the hidden layer values are 0)? This does not seem desirable. Adding a Masking layer before the output layer does not solve this problem.
MSE calculation
Then, when I evaluate the model (model.evaluate(X_padded, Y_padded)), this returns the Mean Squared Error (MSE) of the entire sequence (1.3168) including this first value, which I suppose is to be expected when it isn't masked, but not what I would want.
From the Keras documentation I understand I should use the sample_weight parameter to solve this problem, which I tried:
sample_weight = numpy.array([[0, 1, 1]])
model_evaluation = model.evaluate(X_padded, Y_padded, sample_weight=sample_weight)
print model.metrics_names, model_evaluation
The output I get is
['loss', 'mean_squared_error'] [2.9329459667205811, 1.3168648481369019]
This leaves the metric (MSE) unaltered, it is still the MSE over all values, including the one that I wanted masked. Why? This is not what I want when I evaluate my model. It does cause a change in the loss value, which appears to be the MSE over the last two values normalised to not give more weight to longer sequences.
Am I doing something wrong with the sample weights? Also, I really cannot figure out how this loss value came about. What should I do to exclude the padded values from both training and evaluation (I assume the sample_weight parameter works the same in the fit function)?
It was indeed a bug in the library; the issue is resolved in Keras 2.
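The numbers in the question can be checked by hand. The sketch below uses the predictions printed above and assumes the targets are `[0, 1, 2]`, which is what `pad_sequences` produces with its default pre-padding. It shows that the reported metric is the MSE over all three timesteps, and what the MSE over the two non-padded timesteps would be. The final comparison with the reported loss is only an observation about these particular numbers, not a statement about how Keras normalises internally.

```python
predictions = [0.2, 0.19946882, 0.19175649]   # from model.predict(X_padded)
targets = [0.0, 1.0, 2.0]                     # Y_padded, assuming default pre-padding
sample_weight = [0.0, 1.0, 1.0]               # first timestep is padding

squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]

# Unmasked MSE over all timesteps (compare with the reported metric, 1.3168648).
mse_all = sum(squared_errors) / len(squared_errors)

# MSE over the non-padded timesteps only (what the question is after).
mse_masked = (sum(w * e for w, e in zip(sample_weight, squared_errors))
              / sum(sample_weight))

# Observation: the reported loss is close to mse_masked scaled by 3/2.
ratio = 2.9329459667205811 / mse_masked

print(mse_all, mse_masked, ratio)
```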

Resources