Merge a forward LSTM and a backward LSTM in Keras

I would like to merge a forward LSTM and a backward LSTM in Keras. The input array of the backward LSTM is different from that of a forward LSTM. Thus, I cannot use keras.layers.Bidirectional.
The forward input is (10, 4).
The backward input is (12, 4) and it is reversed before being fed into the model. I would like to reverse it again after the LSTM and merge it with the forward output.
The simplified model is as follows.
from lambdawithmask import Lambda as MaskLambda
def reverse_func(x, mask=None):
    return tf.reverse(x, [False, True, False])
forward = Sequential()
backward = Sequential()
model = Sequential()
forward.add(LSTM(input_shape = (10, 4), output_dim = 4, return_sequences = True))
backward.add(LSTM(input_shape = (12, 4), output_dim = 4, return_sequences = True))
backward.add(MaskLambda(function=reverse_func, mask_function=reverse_func))
model.add(Merge([forward, backward], mode = "concat", concat_axis = 1))
When I run this, the error message is:
Tensors in list passed to 'values' of 'ConcatV2' Op have types [bool, float32] that don't all match.
Could anyone help me? I coded in Python 3.5.2 with Keras (2.0.5) and the backend is tensorflow (1.2.1).

First of all, if you have two different inputs, you cannot use a Sequential model. You must use the functional API Model:
from keras.models import Model
The first two models can be Sequential, no problem, but the junction must be a regular Model. For the concatenation I also use the functional approach (create the layer, then call it on the inputs):
from keras.layers import Concatenate
junction = Concatenate(axis=1)([forward.output, backward.output])
Why axis=1? You can only concatenate things with the same shape. Since you have 10 and 12, they're not compatible unless you use this exact axis for the merge, which is the second axis, considering you have (BatchSize, TimeSteps, Units)
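To make the axis argument concrete, here is a small plain-Python sketch of the shape rule that concatenation follows. Note that concat_shape is a hypothetical helper written only for illustration, not a Keras function; it mirrors the rule that every dimension except the concatenation axis must match.

```python
def concat_shape(shape_a, shape_b, axis):
    """Return the shape obtained by concatenating two tensors along `axis`.

    Shapes are tuples such as (batch, timesteps, units); None marks an
    unknown batch size. Every dimension except `axis` must match.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("rank mismatch: %r vs %r" % (shape_a, shape_b))
    out = []
    for i, (a, b) in enumerate(zip(shape_a, shape_b)):
        if i == axis:
            # Sizes add up along the concatenation axis.
            out.append(None if a is None or b is None else a + b)
        elif a != b:
            raise ValueError("dimension %d differs: %r vs %r" % (i, a, b))
        else:
            out.append(a)
    return tuple(out)


forward_shape = (None, 10, 4)   # forward LSTM output: (BatchSize, TimeSteps, Units)
backward_shape = (None, 12, 4)  # backward LSTM output

merged = concat_shape(forward_shape, backward_shape, axis=1)
print(merged)  # (None, 22, 4)
```

Calling concat_shape(forward_shape, backward_shape, axis=2) raises a ValueError, because the time dimensions 10 and 12 would then have to be equal.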
For creating the final model, use the Model, specify the inputs and outputs:
model = Model([forward.input,backward.input], junction)
In the model to be reversed, simply use a Lambda layer. A MaskLambda does more than just the function you want. I also suggest you use the Keras backend instead of TensorFlow functions:
import keras.backend as K
from keras.layers import Lambda
#instead of the MaskLambda:
backward.add(Lambda(lambda x: K.reverse(x, axes=[1]), output_shape=(12, ?)))
Here, the ? is the amount of units your LSTM layers have. See PS at the end.
PS: I'm not sure output_dim is useful in the LSTM layer. It's necessary in Lambda layers, but I never use it anywhere else. Shapes are natural consequences of the amount of "units" you put in your layers. Strangely, you didn't specify the amount of units.
PS2: How exactly do you want to concatenate two sequences with different sizes?

As said in the answer above, the functional API gives you much more flexibility for multi-input/multi-output models. You can simply set the go_backwards argument to True to make the LSTM layer traverse its input sequence in reverse.
I have defined the smart_merge function below, which merges the forward and backward LSTM outputs and also handles the single-traversal case.
from keras.models import Model
from keras.layers import Input, LSTM, merge
def smart_merge(vectors, **kwargs):
    return vectors[0] if len(vectors)==1 else merge(vectors, **kwargs)
input1 = Input(shape=(10,4), dtype='int32')
input2 = Input(shape=(12,4), dtype='int32')
LtoR_LSTM = LSTM(56, return_sequences=False)
LtoR_LSTM_vector = LtoR_LSTM(input1)
RtoL_LSTM = LSTM(56, return_sequences=False, go_backwards=True)
RtoL_LSTM_vector = RtoL_LSTM(input2)
BidireLSTM_vector = [LtoR_LSTM_vector]
BidireLSTM_vector.append(RtoL_LSTM_vector)
BidireLSTM_vector= smart_merge(BidireLSTM_vector, mode='concat')
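As a rough illustration of what reversing the traversal means, the sketch below replaces the LSTM with a toy recurrence (state = decay * state + x). The names run_recurrence and decay are made up for this example and are not part of Keras; only the order in which the timesteps are visited is the point.

```python
def run_recurrence(sequence, go_backwards=False, decay=0.5):
    """Toy stand-in for a recurrent layer: returns the state after each step."""
    steps = reversed(sequence) if go_backwards else sequence
    state = 0.0
    outputs = []
    for x in steps:
        state = decay * state + x
        outputs.append(state)
    return outputs


seq = [1.0, 2.0, 4.0]

forward_outputs = run_recurrence(seq)
backward_outputs = run_recurrence(seq, go_backwards=True)

print(forward_outputs)   # [1.0, 2.5, 5.25]
print(backward_outputs)  # [4.0, 4.0, 3.0]

# The last element is the analogue of return_sequences=False.
print(forward_outputs[-1], backward_outputs[-1])  # 5.25 3.0

# Flip the backward outputs to line them up with the original time order.
realigned = backward_outputs[::-1]
print(realigned)  # [3.0, 4.0, 4.0]
```

With return_sequences=False, as in the code above, only the final state is returned, so no re-reversal is needed. With return_sequences=True, a layer created with go_backwards=True returns its outputs in reversed time order, so they would have to be flipped again before being aligned step by step with the forward outputs.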

Related

Concatenate outputs of LSTM in Keras

I intend to feed the outputs of all timesteps from an LSTM to a fully-connected layer. However, the following code fails. How can I reduce the 3D output of the LSTM to 2D by concatenating the output of each timestep?
X = LSTM(units=128,return_sequences=True)(input_sequence)
X = Dropout(rate=0.5)(X)
X = LSTM(units=128,return_sequences=True)(X)
X = Dropout(rate=0.5)(X)
X = Concatenate()(X)
X = Dense(n_class)(X)
X = Activation('softmax')(X)
You can use the Flatten layer to flatten the 3D output of LSTM layer to a 2D shape.
As a side note, it is better to use dropout and recurrent_dropout arguments of LSTM layer instead of using Dropout layer directly with recurrent layers.
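To show what flattening does to the data, here is a plain-Python sketch using nested lists in place of tensors. The helper flatten_timesteps is hypothetical and only mimics the reshaping from (batch, timesteps, units) to (batch, timesteps * units); it is not the Keras layer itself.

```python
def flatten_timesteps(batch):
    """Concatenate the per-timestep vectors of every sample into one vector."""
    return [[value for step in sample for value in step] for sample in batch]


# One sample, 3 timesteps, 2 units per timestep: shape (1, 3, 2).
lstm_output = [[[1, 2], [3, 4], [5, 6]]]

flat = flatten_timesteps(lstm_output)
print(flat)  # [[1, 2, 3, 4, 5, 6]]  -> shape (1, 6)
```

Because the flattened width is timesteps * units, a Dense layer placed after it needs the number of timesteps to be fixed and known in advance.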
In addition to @today's answer:
It seems you are using return_sequences only so that you can concatenate the outputs into a Dense layer. If you have not already tried return_sequences=False, I would recommend doing so. The main purpose of return_sequences is to stack LSTMs or to make seq2seq predictions. In your case it should be enough to use just the LSTM's final output.

How to use the result of embedding with mask_zero=True in keras

In keras, I want to calculate the mean of nonzero embedding output.
I wonder what is the difference between mask_zero=True or False in Embedding Layer.
I tried the code below :
input_data = Input(shape=(5,), dtype='int32', name='input')
embedding_layer = Embedding(1000, 24, input_length=5,mask_zero=True,name='embedding')
out = embedding_layer(input_data)
def antirectifier(x):
    x = K.mean(x, axis=1, keepdims=True)
    return x
def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    return tuple(shape)
out = Lambda(antirectifier, output_shape=antirectifier_output_shape,name='lambda')(out)
But it seems that the result is the mean of all the elements. How can I calculate the mean of only the nonzero inputs?
From the function's doc :
If this is True then all subsequent layers in the model need to support masking
Your lambda function doesn't support masking. For example Recurrent layers in Keras support masking. If you set mask_zero=True in your embeddings, then all the 0 indices that you feed to the embedding layer will be propagated as "masked" and the following layers that are able to understand the "masked" information will use them.
Basically, if you build a "mean" layer that grabs the mask and computes the average only for non-masked values, then you will get the desired results.
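The idea of a mean that respects the mask can be sketched in plain Python, with lists standing in for tensors. The function masked_mean is a hypothetical illustration, not a Keras layer; it assumes, as with mask_zero=True, that token id 0 marks a padded position.

```python
def masked_mean(embedded, token_ids):
    """Average the embedding vectors of the positions whose token id is not 0.

    embedded:  list of timesteps, each a list of floats
    token_ids: list of ints, same length as `embedded`; 0 means padding
    """
    kept = [vec for vec, tok in zip(embedded, token_ids) if tok != 0]
    if not kept:
        # Every position is padding: return a zero vector.
        return [0.0] * len(embedded[0])
    return [sum(column) / len(kept) for column in zip(*kept)]


token_ids = [0, 0, 3, 7, 5]   # the first two positions are padding
embedded = [
    [9.0, 9.0],               # embedding of the padding index (should be ignored)
    [9.0, 9.0],
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

print(masked_mean(embedded, token_ids))  # [3.0, 4.0]
```

A plain mean over axis 1 would also average in the two padding rows, which is the behaviour described in the question.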
You can find here a way to build your lambda layers that support masking
I hope it helps.

Generative Adversarial Networks (GANs) in Keras - creating the combined model

I'm trying to create a pretty simple GANs model, and not sure how to combine the generator and the discriminator for training the generator
from keras import optimizers
from keras.layers import Input, Dense
from keras.models import Sequential, Model
import numpy as np
def build_generator(input_dim=10, output_dim=40, hidden_dim=28):
    model = Sequential()
    model.add(Dense(hidden_dim, input_dim=input_dim, activation='sigmoid', kernel_initializer="random_uniform"))
    model.add(Dense(output_dim, activation='sigmoid', kernel_initializer="random_uniform"))
    return model
def build_discriminator(input_dim=40, hidden_dim=28, output_dim=50):
    input_d = Input(shape=(input_dim,))
    encoded = Dense(hidden_dim, activation='sigmoid', kernel_initializer="random_uniform")(input_d)
    decoded = Dense(output_dim, activation='sigmoid', kernel_initializer="random_uniform")(encoded)
    x = Dense(1, activation='relu')(encoded)
    y = Dense(1, activation='sigmoid')(encoded)
    model = Model(inputs=input_d, outputs=[decoded, x, y])
    return model
sgd = optimizers.SGD(lr=0.1)
generator = build_generator(10, 100, 70)
discriminator = build_discriminator(100, 60, 80)
generator.compile(loss='mean_squared_error', optimizer=sgd)
discriminator.trainable = True
discriminator.compile(loss='mean_squared_error', optimizer=sgd)
discriminator.trainable = False
Now I'm not sure how to combine the two, so that the discriminator receives the generator's output and then passes the back-propagation data back to the generator.
For this, the best approach is to use the functional Model API. It is suited to more complex models and accepts branches, concatenations, etc.
(It's still possible, in this specific case to use the sequential models, but using the functional API always sounded better to me, for freedom and further experiments on the models)
So, you may preserve your two sequential models. All you have to do is to build a third model that contains these two.
generator = build_generator(....) #don't create a new generator, use the one you have.
discriminator = build_discriminator(....)
Now, a functional API model has its input shape defined like this:
inputTensor = Input(inputShape) #inputShape must be the same as in generator
And we work by passing inputs to layers and getting outputs:
#Getting the output of the generator given our input tensor:
genOut = generator(inputTensor) #you call a model just like you call a layer
#and we pass the generator's output to the discriminator, getting its output:
discOut = discriminator(genOut)
Finally, we create the actual model by defining its start and end points:
GAN = Model(inputTensor, discOut)
Use the model.layers[i].trainable parameter before compile to define which layers will be trainable or not in each of the models.
Combining the Generator & Discriminator models can, indeed, sometimes be quite confusing. I found this repository in the link below, which demonstrates quite well with a detailed code of how to construct multiple architectures of GANs in keras:
https://github.com/kochlisGit/Keras-GAN

Dimensions not matching in keras LSTM model

I want to use an LSTM neural Network with keras to forecast groups of time series and I am having troubles in making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (The intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# Add the second LSTM layer (the input dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, nb_epoch=epoch_number, batch_size=batch_size, verbose=1)
This throws:
Exception: Error when checking model target: expected timedistributed_1 to have shape (None, 4, 24) but got array with shape (100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), so (100, 3, 24) but the network is producing an output shape of (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes you may need padding or a network node that removes said timestep.
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, while it's not possible. This dimension is what it considers as the time dimension: it is preserved.
The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
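A plain-Python sketch of that behaviour, with nested lists instead of tensors, may help. The functions dense and time_distributed are hypothetical stand-ins written for illustration (no bias, no activation), not the Keras classes; the point is only that the number of steps along axis 1 is unchanged while the last dimension becomes the number of Dense units.

```python
def dense(vector, weights):
    """Apply a bias-free dense layer; `weights` has one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def time_distributed(sample, weights):
    """Apply the same dense layer independently to every timestep."""
    return [dense(step, weights) for step in sample]


# One sample with 4 steps along axis 1 and 2 features per step.
sample = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Dense layer with 3 output units.
weights = [[1, 0], [0, 1], [1, 1]]

output = time_distributed(sample, weights)
print(output)  # [[1, 2, 3], [3, 4, 7], [5, 6, 11], [7, 8, 15]]
print(len(output), len(output[0]))  # 4 3
```

In the question's model, axis 1 has size input_dim = 4, so the output keeps 4 along that axis no matter how many units the wrapped Dense layer has.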
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue was that output_dim is never used when creating the model, it only appears in the validation data. While it's only a code smell (there might not be anything wrong with this observation), it's something worth checking.

Strange behaviour sequence to sequence learning for variable length sequences

I am training a sequence to sequence model for variable length sequences with Keras, but I am running into some unexpected problems. It is unclear to me whether the behaviour I am observing is the desired behaviour of the library and why it would be.
Model Creation
I've made a recurrent model with an embeddings layer and a GRU recurrent layer that illustrates the problem. I used mask_zero=0.0 for the embeddings layer instead of a masking layer, but changing this doesn't seem to make a difference (nor does adding a masking layer before the output):
import numpy
from keras.layers import Embedding, GRU, TimeDistributed, Dense, Input
from keras.models import Model
import keras.preprocessing.sequence
numpy.random.seed(0)
input_layer = Input(shape=(3,), dtype='int32', name='input')
embeddings = Embedding(input_dim=20, output_dim=2, input_length=3, mask_zero=True, name='embeddings')(input_layer)
recurrent = GRU(5, return_sequences=True, name='GRU')(embeddings)
output_layer = TimeDistributed(Dense(1), name='output')(recurrent)
model = Model(input=input_layer, output=output_layer)
output_weights = model.layers[-1].get_weights()
output_weights[1] = numpy.array([0.2])
model.layers[-1].set_weights(output_weights)
model.compile(loss='mse', metrics=['mse'], optimizer='adam', sample_weight_mode='temporal')
I use masking and the sample_weight parameter to exclude the padding values from the training/evaluation. I will test this model on one input/output sequence which I pad using the Keras padding function:
X = [[1, 2]]
X_padded = keras.preprocessing.sequence.pad_sequences(X, dtype='float32', maxlen=3)
Y = [[[1], [2]]]
Y_padded = keras.preprocessing.sequence.pad_sequences(Y, maxlen=3, dtype='float32')
Output Shape
Why is the output expected to be formatted in this way? Why can I not use input/output sequences that have exactly the same dimensionality? model.evaluate(X_padded, Y_padded) gives me a dimensionality error.
Then, when I run model.predict(X_padded) I get the following output (with numpy.random.seed(0) before generating the model):
[[[ 0.2 ]
[ 0.19946882]
[ 0.19175649]]]
Why isn't the first input masked for the output layer? Is the output value computed anyway (and equal to the bias, as the hidden layer values are 0)? This does not seem desirable. Adding a Masking layer before the output layer does not solve this problem.
MSE calculation
Then, when I evaluate the model (model.evaluate(X_padded, Y_padded)), this returns the Mean Squared Error (MSE) of the entire sequence (1.3168) including this first value, which I suppose is to be expected when it isn't masked, but not what I would want.
From the Keras documentation I understand I should use the sample_weight parameter to solve this problem, which I tried:
sample_weight = numpy.array([[0, 1, 1]])
model_evaluation = model.evaluate(X_padded, Y_padded, sample_weight=sample_weight)
print model.metrics_names, model_evaluation
The output I get is
['loss', 'mean_squared_error'] [2.9329459667205811, 1.3168648481369019]
This leaves the metric (MSE) unaltered, it is still the MSE over all values, including the one that I wanted masked. Why? This is not what I want when I evaluate my model. It does cause a change in the loss value, which appears to be the MSE over the last two values normalised to not give more weight to longer sequences.
Am I doing something wrong with the sample weights? Also, I can really not figure out how this loss value came about. What should I do to exclude the padded values from both training and evaluation (I assume the sample_weight parameter works the same in the fit function).
It was indeed a bug in the library. In Keras 2 this issue is resolved.
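For anyone wondering where the two reported numbers could come from, the arithmetic below reproduces them from the predictions printed in the question. It assumes the padded target is [0, 1, 2] (pad_sequences pads at the front by default), and the double rescaling by the fraction of non-zero entries is only a guess that happens to match the reported loss, not a confirmed description of the library internals.

```python
predictions = [0.2, 0.19946882, 0.19175649]  # model.predict(X_padded) from the question
targets = [0.0, 1.0, 2.0]                    # Y_padded, padded at the front
weights = [0.0, 1.0, 1.0]                    # sample_weight from the question

squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
n = len(squared_errors)

# Plain MSE over all three timesteps, including the padded one.
plain_mse = sum(squared_errors) / n

# MSE over the two unpadded timesteps only (what the question asks for).
masked_mse = sum(w * e for w, e in zip(weights, squared_errors)) / sum(weights)

# Assumption: the weighted mean is rescaled by the fraction of non-zero
# entries twice, once for the mask and once for the sample weights.
nonzero_fraction = sum(1 for w in weights if w != 0) / n
weighted_mean = sum(w * e for w, e in zip(weights, squared_errors)) / n
rescaled_twice = weighted_mean / nonzero_fraction / nonzero_fraction

print(round(plain_mse, 4))       # 1.3169, close to the reported metric
print(round(masked_mse, 4))      # 1.9553
print(round(rescaled_twice, 4))  # 2.9329, close to the reported loss
```

Under these assumptions, the reported metric is the unweighted MSE over all three positions, and the reported loss is the masked MSE scaled up by a further factor of 1.5.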
