Keras: How to define the input shape for the first Dense layer?

I am new to deep learning and Keras; please refer to the code below.
I want to confirm the terminology. Before the TimeseriesGenerator creates batches, there are 10 samples. After processing by the generator, am I correct to say there are 8 samples in 1 batch?
I don't understand why yhat differs when I define the 1st layer's input shape via 'input_shape' vs 'input_dim'. yhat should be (1, 1): a single value.
If instead I use a SimpleRNN layer as my 1st layer, what should the input shape be?
Thank you
# univariate one step problem with mlp
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.preprocessing.sequence import TimeseriesGenerator
# define dataset
series = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) # 10 samples before processing by the Generator
# define generator
timestep = 2
generator = TimeseriesGenerator(series, series, length=timestep, batch_size=8)
# number of batches
print('Batches: %d' % len(generator))
# OUT --> Batches: 1
# print each batch
for i in range(len(generator)):
    x, y = generator[i]
    print('%s => %s' % (x, y))
#OUT:
[[1 2]
 [2 3]
 [3 4]
 [4 5]
 [5 6]
 [6 7]
 [7 8]
 [8 9]] => [ 3  4  5  6  7  8  9 10]
#After processing by the Generator, there are 8 samples in 1 batch.
x, y = generator[0]
print(x.shape)
# define model
model = Sequential()
# TensorFlow assumes the first dimension is the batch_size, which can have any
# size, so you don't need to define it. The 2nd dimension is the number of
# time steps, and the 3rd is the number of features.
#1st LAYER with input shape defined by input_shape
#model.add(Dense(100, activation='relu', input_shape= (timestep,1)))
#1st LAYER with input shape defined by input_dim
model.add(Dense(100, activation='relu', input_dim=timestep))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit_generator(generator, steps_per_epoch=1, epochs=200, verbose=0)
# make a one step prediction out of sample
x_input = array([9, 10]).reshape((1, timestep))
print(x_input.shape)
yhat = model.predict(x_input, verbose=0)
print(yhat)
# OUT: [[9.3066435, 10.239568]] if 1st layer's shape is input_shape
# OUT: [[11.545249]] if 1st layer's shape is input_dim
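For reference, the difference comes from how Dense treats a 3D input: with input_shape=(timestep, 1) the Dense layers are applied to each timestep independently, so the final Dense(1) emits one value per timestep, whereas input_dim=timestep treats each sample as a flat 2-feature vector and emits a single value. A minimal sketch of the resulting output shapes (and of the 3D input a SimpleRNN first layer would expect):

# Sketch: Dense acts on the last axis only, so a 3D input keeps its time axis.
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN

m3d = Sequential([Dense(100, activation='relu', input_shape=(2, 1)), Dense(1)])
print(m3d.output_shape)  # (None, 2, 1): one prediction per timestep

m2d = Sequential([Dense(100, activation='relu', input_dim=2), Dense(1)])
print(m2d.output_shape)  # (None, 1): a single prediction per sample

# A SimpleRNN first layer expects 3D input, i.e. input_shape=(timestep, n_features),
# so x_input would be reshaped to (1, 2, 1) instead of (1, 2).
rnn = Sequential([SimpleRNN(100, input_shape=(2, 1)), Dense(1)])
print(rnn.output_shape)  # (None, 1)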

Related

Multiclass predictions with a categorical_crossentropy loss

Let's say this example implements a simple binary classification.
X = array([[1,2,3],[2,3,4],[3,4,5]])
y = array([[0],[1],[0]])
...
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=50, verbose=0)
# new instance where we do not know the answer
Xnew = array([[4, 5, 6]])
# make a prediction
ynew = model.predict(Xnew)
#show the inputs and predicted outputs
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
...
results
X=[4, 5, 6], Predicted=[0 or 1]
And this one implements multiclass classification.
X = array([[1,2,3],[2,3,4],[3,4,5]])
y = array([[4],[5],[6]])
...
model.compile(loss='categorical_crossentropy', optimizer='adam')
# fit model
model.fit(X, y, epochs=50, verbose=2)
model.reset_states()
# evaluate model on new data
yhat = model.predict(X)
...
results decoded
X=[4, 5, 6], Predicted=[4, 5, 6]
How can I implement multiclass classification with a single output to get something like this (similar to forecasting a time series)?
X = array([[1,2,3],[2,3,4],[3,4,5]])
y = array([[4],[5],[6]])
# new instance where we do not know the answer
Xnew = array([[4, 5, 6]])
yhat = model.predict_classes(Xnew)
results decoded
X=[4, 5, 6], Predicted=[7]
What you are looking for is loss='sparse_categorical_crossentropy', which assumes that the integer targets are class labels. So if your model has 7 outputs and you give target 2, sparse_categorical_crossentropy will convert 2 into [0,0,1,0,0,0,0] as the target and apply categorical_crossentropy as usual.
In this case, your output layer's activation function should be softmax and the number of outputs should equal the number of classes, most likely something like Dense(num_classes, activation='softmax').
If your integer classes are just [4,5,6], then you need to shift them to [0,1,2] to satisfy the condition max(Y_targets) < num_classes.
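A minimal sketch of that recipe (the hidden layer size is illustrative; only the shifted labels, the softmax output, and the sparse loss come from the answer):

# Sketch: integer targets with sparse_categorical_crossentropy; labels are
# shifted from [4, 5, 6] down to [0, 1, 2] so that max(y) < num_classes.
from numpy import array
from keras.models import Sequential
from keras.layers import Dense

X = array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
y = array([4, 5, 6]) - 4  # shifted integer class labels: [0, 1, 2]
num_classes = 3

model = Sequential()
model.add(Dense(100, activation='relu', input_dim=3))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=50, verbose=0)

yhat = model.predict_classes(array([[4, 5, 6]])) + 4  # shift predictions back
print(yhat)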

How to change the Keras sample code model.fit to generator manner?

I'm trying to implement a CNN+RNN+LSTM structure(1) with Keras.
And I found a related Keras sample code.
How can I convert the model.fit to model.fit_generator correctly?
Original code:
from keras.models import Sequential
from keras.layers import Activation, MaxPooling2D, Dropout, LSTM, Flatten, Merge, TimeDistributed
import numpy as np
from keras.layers import Concatenate
from keras.layers.convolutional import Conv2D
# Generate fake data
# Assumed to be 1730 grayscale video frames
x_data = np.random.random((1730, 1, 8, 10))
sequence_lengths = None
Izda=Sequential()
Izda.add(TimeDistributed(Conv2D(40,(3,3),padding='same'), input_shape=(sequence_lengths, 1,8,10)))
Izda.add(Activation('relu'))
Izda.add(TimeDistributed(MaxPooling2D(data_format="channels_first", pool_size=(2, 2))))
Izda.add(Dropout(0.2))
Dcha=Sequential()
Dcha.add(TimeDistributed(Conv2D(40,(3,3),padding='same'), input_shape=(sequence_lengths, 1,8,10)))
Dcha.add(Activation('relu'))
Dcha.add(TimeDistributed(MaxPooling2D(data_format="channels_first", pool_size=(2, 2))))
Dcha.add(Dropout(0.2))
Frt=Sequential()
Frt.add(TimeDistributed(Conv2D(40,(3,3),padding='same'), input_shape=(sequence_lengths, 1,8,10)))
Frt.add(Activation('relu'))
Frt.add(TimeDistributed(MaxPooling2D(data_format="channels_first", pool_size=(2, 2))))
Frt.add(Dropout(0.2))
merged=Merge([Izda, Dcha,Frt], mode='concat', concat_axis=2)
#merged=Concatenate()([Izda, Dcha, Frt], axis=2)
# Output from merge is (batch_size, sequence_length, 120, 4, 5)
# We want to get this down to (batch_size, sequence_length, 120*4*5)
model=Sequential()
model.add(merged)
model.add(TimeDistributed(Flatten()))
model.add(LSTM(240, return_sequences=True))
model.compile(loss='mse', optimizer='adam')
model.summary()
After my modification:
from keras.models import Sequential
from keras.layers import Activation, MaxPooling2D, Dropout, LSTM, Flatten, Merge, TimeDistributed
import numpy as np
from keras.layers import Concatenate
from keras.layers.convolutional import Conv2D
# Generate fake data
# Assumed to be 1730 grayscale video frames
x_data = np.random.random((1730, 1, 8, 10))
sequence_lengths = None
def defModel():
    model = Sequential()
    model.add(TimeDistributed(Conv2D(40, (3, 3), padding='same'), input_shape=(sequence_lengths, 1, 8, 10)))
    model.add(Activation('relu'))
    model.add(TimeDistributed(MaxPooling2D(data_format="channels_first", pool_size=(2, 2))))
    model.add(Dropout(0.2))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(240, return_sequences=True))
    model.compile(loss='mse', optimizer='adam')
    model.summary()
    return model

def gen():
    for i in range(1730):
        x_train = np.random.random((1, 8, 10))
        y_train = np.ones((15, 240))
        yield (x_train, y_train)

def main():
    model = defModel()
    # Slice our long, single sequence up into shorter sequences of images
    # Let's make 50 examples of 15-frame videos
    x_train = []
    seq_len = 15
    for i in range(50):
        x_train.append(x_data[i*5:i*5+seq_len, :, :, :])
    x_train = np.asarray(x_train, dtype='float32')
    print(x_train.shape)
    # >> (50, 15, 1, 8, 10)
    model.fit_generator(
        generator=gen(),
        steps_per_epoch=1,
        epochs=2)

if __name__ == "__main__":
    main()
How can I resolve this error produced by my modification?
ValueError: Error when checking input: expected
time_distributed_1_input to have 5 dimensions, but got array with
shape (1, 8, 10)
(1) Wang, S., Clark, R., Wen, H., & Trigoni, N. (2017). DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks. Proceedings - IEEE International Conference on Robotics and Automation, 2043–2050.
Update: Concatenate CNN and LSTM as sample code
model.add(TimeDistributed(Conv2D(16, (7, 7),padding='same'),input_shape=(None, 540, 960, 1)))
model.add(Activation('relu'))
model.add(TimeDistributed(Conv2D(32, (5, 5), padding='same')))
model.add(Activation('relu'))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(num_classes, return_sequences=True))
I got this error:
ValueError: Error when checking target: expected lstm_1 to have 3 dimensions, but got array with shape (4, 3)
Update2
The goal is to extract image feature by CNN, then combine 3 feature from 3 images and feed into LSTM.
Goal
#Input image
(540, 960, 1) ==> (x,y,ch) ==> CNN ==> (m,n,k)┐
(540, 960, 1) ==> (x,y,ch) ==> CNN ==> (m,n,k)---> (3, m,n,k) --flatten--> (3, mnk)
(540, 960, 1) ==> (x,y,ch) ==> CNN ==> (m,n,k)」
(3, mnk) => LSTM => predict three regression values
Model
model = Sequential()
model.add(TimeDistributed(Conv2D(16, (7, 7), padding='same'),input_shape=(None, 540, 960, 1)))
model.add(Activation('relu'))
model.add(TimeDistributed(Conv2D(32, (5, 5), padding='same')))
model.add(Activation('relu'))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(num_classes, return_sequences=True))
model.compile(loss='mean_squared_error', optimizer='adam')
The generator
a = readIMG(filenames[start]) # (540, 960, 1)
b = readIMG(filenames[start + 1]) # (540, 960, 1)
c = readIMG(filenames[start + 2]) # (540, 960, 1)
x_train = np.array([[a, b, c]]) # (1, 3, 540, 960, 1)
Then I still got the error:
ValueError: Error when checking target: expected lstm_1 to have 3 dimensions, but got array with shape (1, 3)
This is a plain shape-mismatch problem.
You defined input_shape=(sequence_lengths, 1,8,10), so your model is expecting five dimensions as input: (batch_size, sequence_lengths, 1, 8, 10)
All you need is to make your generator output the correct shapes with 5 dimensions.
def gen():
    x_data = np.random.random((numberOfVideos, videoLength, 1, 8, 10))
    y_data = np.ones((numberOfVideos, videoLength, 240))
    for video in range(numberOfVideos):
        x_train = x_data[video:video+1]
        y_train = y_data[video:video+1]
        yield (x_train, y_train)
Here is the working example of CNNLSTM using generator: https://gist.github.com/HTLife/25c0cd362faa91477b8f28f6033adb45
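For completeness, a minimal sketch of wiring this generator into training; numberOfVideos and videoLength are placeholders, so the values below are purely illustrative:

# Sketch: each yield is one 5-D batch of shape (1, videoLength, 1, 8, 10),
# matching input_shape=(sequence_lengths, 1, 8, 10) with sequence_lengths=None.
numberOfVideos, videoLength = 50, 15

model = defModel()  # the model from the question
# the generator yields once per video, so one epoch consumes it exactly
model.fit_generator(gen(), steps_per_epoch=numberOfVideos, epochs=1)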

Concatenation of Keras parallel layers changes wanted target shape

I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper, but when compiling the first model (without the LSTMs) I get the following error:
"ValueError: Error when checking target: expected dense_3 to have shape (None, 120, 40) but got array with shape (8, 40, 1)"
The description of the model is this:
Input (length T is appliance specific window size)
Parallel 1D convolution with filter size 3, 5, and 7
respectively, stride=1, number of filters=32,
activation type=linear, border mode=same
Merge layer which concatenates the output of
parallel 1D convolutions
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim=T , activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
# the window sizes (seq_length?) are 40, 1075, 465, 72 and 1246 for the kettle, dish washer,
# fridge, microwave, oven and washing machine, respectively.
def ae_net(T):
    input_layer = Input(shape=(T,))
    branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
    branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
    branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
    merge_layer = layers.concatenate([branch_a, branch_b, branch_c], axis=1)
    dense_1 = layers.Dense(128, activation='relu')(merge_layer)
    dense_2 = layers.Dense(128, activation='relu')(dense_1)
    output_dense = layers.Dense(T, activation='linear')(dense_2)
    model = Model(input_layer, output_dense)
    return model
model = ae_net(40)
model.compile(loss= 'mean_absolute_error', optimizer='rmsprop')
model.fit(X, y, batch_size= 8)
where X and y are numpy arrays of 8 sequences of 40 values each, so X.shape and y.shape are both (8, 40, 1). It's actually one batch of data. The thing is, I cannot understand why the expected output shape is (None, 120, 40) and what these sizes mean.
As you noted, your shapes contain batch_size, length and channels: (8, 40, 1).
Each of your three convolutions creates a tensor like (8, 40, 32).
Your concatenation along axis=1 creates a tensor like (8, 120, 32), where 120 = 3*40.
Now, the dense layers only work on the last dimension (the channels in this case), leaving the length (now 120) untouched.
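You can check this directly; a small sketch of the shape arithmetic:

# Sketch: Dense leaves every axis except the last one untouched.
from keras import layers, Input
from keras import backend as K

t = Input(shape=(120, 32))                # (batch, length, channels)
print(K.int_shape(layers.Dense(128)(t)))  # (None, 120, 128): length preserved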
Solution
Now, it seems you do want to keep the length dimension at the end, so you won't need any flatten or reshape layers; but you will need to keep the length at 40.
You're probably doing the concatenation in the wrong axis. Instead of the length axis (1), you should concatenate in the channels axis (2 or -1).
So, this should be your concatenate layer:
merge_layer = layers.Concatenate()([branch_a, branch_b, branch_c])
#or layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
This will output (8, 40, 96), and the dense layers will transform the 96 into something else.
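A sketch verifying both concatenation axes with the question's numbers (note that Conv1D needs a channels axis, so the input here is (40, 1)):

# Sketch: axis=1 stacks along length (3*40 = 120); axis=-1 stacks along channels (3*32 = 96).
from keras import layers, Input
from keras import backend as K

inp = Input(shape=(40, 1))
a = layers.Conv1D(32, 3, padding='same')(inp)
b = layers.Conv1D(32, 5, padding='same')(inp)
c = layers.Conv1D(32, 7, padding='same')(inp)
print(K.int_shape(layers.Concatenate(axis=1)([a, b, c])))   # (None, 120, 32)
print(K.int_shape(layers.Concatenate(axis=-1)([a, b, c])))  # (None, 40, 96)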

Shape must be rank 1 but is rank 2 tflearn error

I am using a DNN provided by tflearn to learn from some data. My data variable has a shape of (6605, 32) and my labels data has a shape of (6605,) which I reshape in the code below to (6605, 1)...
# Target label used for training
labels = np.array(data[label], dtype=np.float32)
# Reshape target label from (6605,) to (6605, 1)
labels = tf.reshape(labels, shape=[-1, 1])
# Data for training minus the target label.
data = np.array(data.drop(label, axis=1), dtype=np.float32)
# DNN
net = tflearn.input_data(shape=[None, 32])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 1, activation='softmax')
net = tflearn.regression(net)
# Define model.
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
This gives me a couple of errors, the first is...
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 2 for 'strided_slice' (op: 'StridedSlice') with input shapes: [6605,1], [1,16], [1,16], [1].
...and the second is...
During handling of the above exception, another exception occurred:
ValueError: Shape must be rank 1 but is rank 2 for 'strided_slice' (op: 'StridedSlice') with input shapes: [6605,1], [1,16], [1,16], [1].
I have no idea what rank 1 and rank 2 are, so I don't know how to fix this issue.
In TensorFlow, the rank of a tensor is its number of dimensions (not to be confused with matrix rank). As an example, the following tensor has a rank of 2.
t1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(t1.shape) # prints (3, 3)
Moreover, the following tensor has a rank of 3.
t2 = np.array([[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]])
print(t2.shape) # prints (2, 2, 3)
Since tflearn is built on top of TensorFlow, the inputs should be NumPy arrays, not tensors. I have modified your code as follows and commented where necessary.
# Target label used for training
labels = np.array(data[label], dtype=np.float32)
# Reshape target label from (6605,) to (6605, 1)
labels = np.reshape(labels, (-1, 1))  # make sure the labels have shape (?, 1)
# Data for training minus the target label.
data = np.array(data.drop(label, axis=1), dtype=np.float32)
data = np.reshape(data, (-1, 32))  # make sure the data has shape (?, 32)
# DNN
net = tflearn.input_data(shape=[None, 32])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 1, activation='softmax')
net = tflearn.regression(net)
# Define model.
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
Hope this helps.

How do I mask a loss function in Keras with the TensorFlow backend?

I am trying to implement a sequence-to-sequence task using LSTM by Keras with the TensorFlow backend. The inputs are English sentences with variable lengths. To construct a dataset with 2-D shape [batch_number, max_sentence_length], I add EOF at the end of the line and pad each sentence with enough placeholders, e.g. #. And then each character in the sentence is transformed into a one-hot vector, so that the dataset has 3-D shape [batch_number, max_sentence_length, character_number]. After LSTM encoder and decoder layers, softmax cross-entropy between output and target is computed.
To eliminate the padding effect in model training, masking could be used on the input and on the loss function. Masking inputs in Keras can be done with layers.core.Masking. In TensorFlow, masking the loss function can be done as described in: custom masked loss function in TensorFlow.
However, I can't find a way to realize it in Keras, since a user-defined loss function in Keras only accepts the parameters y_true and y_pred. So how can I pass the true sequence_lengths to the loss function and mask?
Besides, I found a function _weighted_masked_objective(fn) in keras/engine/training.py. Its docstring reads:
Adds support for masking and sample-weighting to an objective function.
But it seems that the function can only accept fn(y_true, y_pred). Is there a way to use this function to solve my problem?
To be specific, I modified Yu-Yang's example.
from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)
max_sentence_length = 5
character_number = 3 # valid character 'a, b' and placeholder '#'
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)
model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
              [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
# the batch is ['##abb', '#babb'], with padding '#'
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
                   [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))
# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits=tf.constant(y_pred, dtype=tf.float32)
target=tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits),axis=2))
losses = -tf.reduce_sum(target * tf.log(logits),axis=2)
sequence_lengths=tf.constant([3,4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths,maxlen=max_sentence_length),[0,1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
    c_e = sess.run(cross_entropy)
    m_c_e = sess.run(masked_loss)
    print("tf unmasked_loss:", c_e)
    print("tf masked_loss:", m_c_e)
The outputs from Keras and TensorFlow are compared as follows (the original screenshot is omitted). As shown there, masking is disabled after some kinds of layers. So how can I mask the loss function in Keras when those layers are added?
If there's a mask in your model, it'll be propagated layer-by-layer and eventually applied to the loss. So if you're padding and masking the sequences in a correct way, the loss on the padding placeholders would be ignored.
Some Details:
It's a bit involved to explain the whole process, so I'll just break it down to several steps:
In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are ignored for clarity).
weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]
# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
    masks = [None for _ in self.outputs]
if not isinstance(masks, list):
    masks = [masks]
# Compute total loss.
total_loss = None
with K.name_scope('loss'):
    for i in range(len(self.outputs)):
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        weighted_loss = weighted_losses[i]
        sample_weight = sample_weights[i]
        mask = masks[i]
        with K.name_scope(self.output_names[i] + '_loss'):
            output_loss = weighted_loss(y_true, y_pred,
                                        sample_weight, mask)
Inside Model.compute_mask(), run_internal_graph() is called.
Inside run_internal_graph(), the masks in the model are propagated layer by layer, from the model's inputs to its outputs, by calling Layer.compute_mask() for each layer iteratively.
So if you're using a Masking layer in your model, you shouldn't worry about the loss on the padding placeholders. The loss on those entries will be masked out as you've probably already seen inside _weighted_masked_objective().
A Small Example:
max_sentence_length = 5
character_number = 2
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
              [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[-0.11980877 0.05803877 0.07880752]
[-0.00429189 0.13382857 0.19167568]
[ 0.06817091 0.19093043 0.26219055]]
[[ 0. 0. 0. ]
[ 0.0651961 0.10283815 0.12413475]
[-0.04420842 0.137494 0.13727818]
[ 0.04479844 0.17440712 0.24715884]
[ 0.11117355 0.21645413 0.30220413]]]
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(model.evaluate(X, y_true))
0.881977558136
print(masked_loss)
0.881978
print(unmasked_loss)
0.917384
As can be seen from this example, the loss on the masked part (the zeroes in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.
EDIT:
If there's a recurrent layer with return_sequences=False, the mask stops propagating (i.e., the returned mask is None). In RNN.compute_mask():
def compute_mask(self, inputs, mask):
    if isinstance(mask, list):
        mask = mask[0]
    output_mask = mask if self.return_sequences else None
    if self.return_state:
        state_mask = [None for _ in self.states]
        return [output_mask] + state_mask
    else:
        return output_mask
In your case, if I understand correctly, you want a mask that's based on y_true, and whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#") you want the loss to be masked. If so, you need to mask the loss values in a somewhat similar way to Daniel's answer.
The main difference is the final average. The average should be taken over the number of unmasked values, which is just K.sum(mask). And also, y_true can be compared to the one-hot encoded vector [0, 0, 1] directly.
def get_loss(mask_value):
    mask_value = K.variable(mask_value)
    def masked_categorical_crossentropy(y_true, y_pred):
        # find out which timesteps in `y_true` are not the padding character '#'
        mask = K.all(K.equal(y_true, mask_value), axis=-1)
        mask = 1 - K.cast(mask, K.floatx())
        # multiply categorical_crossentropy with the mask
        loss = K.categorical_crossentropy(y_true, y_pred) * mask
        # take the average w.r.t. the number of unmasked entries
        return K.sum(loss) / K.sum(mask)
    return masked_categorical_crossentropy
masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')
The output of the above code then shows that the loss is computed only on the unmasked values:
model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339
The value is different from yours because I've changed the axis argument in tf.reverse from [0,1] to [1].
If you're not using masks as in Yu-Yang's answer, you can try this.
If your target data Y has a length dimension and is padded with the mask value, you can:
import keras.backend as K

def custom_loss(yTrue, yPred):
    # find which values in yTrue (target) are the mask value
    isMask = K.equal(yTrue, maskValue)  # true for all mask values
    # since y is shaped as (batch, length, features), we need all features to be mask values
    isMask = K.all(isMask, axis=-1)  # the entire output vector must be true
    # this second line is only necessary if the output features are more than 1
    # transform to float (0 or 1) and invert
    isMask = K.cast(isMask, dtype=K.floatx())
    isMask = 1 - isMask  # now mask values are zero, and others are 1
    # multiply this by the inputs:
    # maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
    yTrue = yTrue * isMask
    yPred = yPred * isMask
    return someLossFunction(yTrue, yPred)
If you have padding only for the input data, or if Y has no length dimension, you can keep your own mask outside the function:
masks = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0]
]
#shape (samples, length). If it fails, make it (samples, length, 1).
import keras.backend as K
masks = K.constant(masks)
Since masks depend on your input data, you can use your mask value to know where to put zeros, such as:
masks = np.array((X_train == maskValue).all(axis=-1), dtype='float64')  # reduce the feature axis
masks = 1 - masks
#here too, if you have a problem with dimensions in the multiplications below
#expand masks dimensions by adding a last dimension = 1.
And make your function take masks from outside of it (you must recreate the loss function if you change the input data):
def customLoss(yTrue, yPred):
    yTrue = masks * yTrue
    yPred = masks * yPred
    return someLossFunction(yTrue, yPred)
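To make the pieces concrete, here is a minimal sketch that assumes a mask value of 0, mean absolute error as someLossFunction, and full-batch training so that the precomputed masks line up with yTrue (all of these choices are illustrative, not from the answer):

# Sketch: build the mask from the padded inputs, then compile with customLoss.
# Recreate `masks` (and recompile) whenever the training data changes.
import numpy as np
import keras.backend as K

maskValue = 0
mask_np = 1 - np.array((X_train == maskValue).all(axis=-1), dtype='float32')
masks = K.constant(np.expand_dims(mask_np, -1))  # shape (samples, length, 1)

def someLossFunction(yTrue, yPred):
    return K.mean(K.abs(yTrue - yPred))

model.compile(loss=customLoss, optimizer='adam')
# full-batch training keeps the batch axis of yTrue aligned with `masks`
model.fit(X_train, Y_train, batch_size=len(X_train), epochs=10)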
Does anyone know if Keras automatically masks the loss function?
Since it provides a Masking layer and says nothing about the outputs, maybe it does it automatically?
I took both answers and improvised a way to handle multiple timesteps with single missing target values: a loss for an LSTM (or other recurrent NN) with return_sequences=True.
Daniel's answer would not suffice for multiple targets, due to isMask = K.all(isMask, axis=-1). Removing this aggregation probably made the function non-differentiable. I do not know for sure, since I never ran the pure function and cannot tell whether it is able to fit a model.
I fused Yu-Yang's and Daniel's answers together, and it worked.
from tensorflow.keras.layers import Layer, Input, LSTM, Dense, TimeDistributed
from tensorflow.keras import Model, Sequential
import tensorflow.keras.backend as K
import numpy as np
mask_Value = -2
def get_loss(mask_value):
    mask_value = K.variable(mask_value)
    def masked_loss(yTrue, yPred):
        # find which values in yTrue (target) are the mask value
        isMask = K.equal(yTrue, mask_value)  # true for all mask values
        # transform to float (0 or 1) and invert
        isMask = K.cast(isMask, dtype=K.floatx())
        isMask = 1 - isMask  # now mask values are zero, and others are 1
        # multiply this by the inputs:
        # maybe you might need K.expand_dims(isMask) to add the extra dimension removed by K.all
        yTrue = yTrue * isMask
        yPred = yPred * isMask
        # perform a root mean squared error, where the mean is taken with respect to the mask
        mean_loss = K.sum(K.square(yPred - yTrue)) / K.sum(isMask)
        loss = K.sqrt(mean_loss)
        return loss
    return masked_loss
# define timeseries data
n_sample = 10
timesteps = 5
feat_inp = 2
feat_out = 2
X = np.random.uniform(0,1, (n_sample, timesteps, feat_inp))
y = np.random.uniform(0,1, (n_sample,timesteps, feat_out))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu',return_sequences=True, input_shape=(timesteps, feat_inp)))
model.add(Dense(feat_out))
model.compile(optimizer='adam', loss=get_loss(mask_Value))
model.summary()
# %%
model.fit(X, y, epochs=50, verbose=0)
Note that Yu-Yang's answer does not appear to work on TensorFlow Keras 2.7.0.
Surprisingly, model.evaluate does not compute masked_loss or unmasked_loss. Instead, it assumes that the loss from all masked input steps is zero (but still includes those steps in the mean() calculation). This means that every masked timestep actually reduces the calculated error!
#%% Yu-yang's example
# https://stackoverflow.com/a/47060797/3580080
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
# Fix the random seed for repeatable results
np.random.seed(5)
tf.random.set_seed(5)
max_sentence_length = 5
character_number = 2
input_tensor = keras.Input(shape=(max_sentence_length, character_number))
masked_input = keras.layers.Masking(mask_value=0)(input_tensor)
output = keras.layers.LSTM(3, return_sequences=True)(masked_input)
model = keras.Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
              [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(f"model.evaluate= {model.evaluate(X, y_true)}")
print(f"masked loss= {masked_loss}")
print(f"unmasked loss= {unmasked_loss}")
Prints:
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[ 0.05340272 -0.06415359 -0.11803789]
[ 0.08775083 0.00600774 -0.10454659]
[ 0.11212641 0.07632366 -0.04133942]]
[[ 0. 0. 0. ]
[ 0.05394626 0.08956442 0.03843312]
[ 0.09092357 -0.02743799 -0.10386454]
[ 0.10791279 0.04083341 -0.08820333]
[ 0.12459432 0.09971555 -0.02882453]]]
1/1 [==============================] - 1s 658ms/step - loss: 0.6865
model.evaluate= 0.6864957213401794
masked loss= 0.9807082414627075
unmasked loss= 0.986495852470398
(This is intended as a comment rather than an answer).
