I intend to feed the outputs of all timesteps from an LSTM to a fully-connected layer. However, the following code fails. How can I reduce the 3D output of the LSTM to 2D by concatenating the output of each timestep?
X = LSTM(units=128,return_sequences=True)(input_sequence)
X = Dropout(rate=0.5)(X)
X = LSTM(units=128,return_sequences=True)(X)
X = Dropout(rate=0.5)(X)
X = Concatenate()(X)
X = Dense(n_class)(X)
X = Activation('softmax')(X)
You can use the Flatten layer to flatten the 3D output of the LSTM layer to a 2D shape.
As a side note, it is better to use the dropout and recurrent_dropout arguments of the LSTM layer instead of using a Dropout layer directly with recurrent layers.
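Putting both points together, a minimal sketch (input_sequence and n_class are the names from the question; exact sizes are up to you):
from keras.layers import LSTM, Flatten, Dense, Activation

X = LSTM(units=128, return_sequences=True, dropout=0.5, recurrent_dropout=0.5)(input_sequence)
X = LSTM(units=128, return_sequences=True, dropout=0.5, recurrent_dropout=0.5)(X)
X = Flatten()(X)  # (batch, timesteps, 128) -> (batch, timesteps * 128); needs a fixed number of timesteps
X = Dense(n_class)(X)
X = Activation('softmax')(X)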
In addition to the answer above:
It seems like you want to use return_sequences just so you can concatenate the result into a dense layer. If you have not already tried it with return_sequences=False, I would recommend you do so. The main purpose of return_sequences is to stack LSTMs or to make seq2seq predictions. In your case it should be enough to just use the last output of the LSTM, as sketched below.
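A minimal sketch of that variant, with the same names as the question:
X = LSTM(units=128, return_sequences=True)(input_sequence)
X = LSTM(units=128, return_sequences=False)(X)  # only the last timestep's output, already 2D: (batch, 128)
X = Dense(n_class)(X)
X = Activation('softmax')(X)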
What would be the most straightforward way to implement a hypernetwork in Keras? That is, one where one leg of the network creates the weights for another. In particular, I would like to do template matching where I feed the template into a CNN leg that generates a convolutional kernel for a leg that operates on the main image. The part I'm unsure about is having a CNN layer that is fed weights externally while the gradients still flow through properly for training.
The weights leg:
For the weights leg, just create a regular network as you would with Keras.
Be sure that its output(s) have shape like (spatial_kernel_size1, spatial_kernel_size2, input_channels, output_channels)
Using the functional API you can create a few weight tensors, for instance:
inputs = Input((imgSize1, imgSize2, imgChannels))
w1 = Conv2D(desired_channels, ....)(inputs)
w2 = Conv2D(desired_channels2, ....)(inputs)  # or (w1)
....
You should apply some kind of pooling here, since your outputs will have a huge size and you probably want filters with small sizes such as 3, 5, etc.
w1 = GlobalAveragePooling2D()(w1) #maybe GlobalMaxPooling2D
w2 = GlobalAveragePooling2D()(w2)
If you're using fixed image sizes, you could also use other kinds of pooling or flatten and dense, etc.
Make sure you reshape the weights to the correct kernel shape. For the Reshape to work, the pooled vector must contain exactly size1 * size2 * input_channels * output_channels elements, so pick the number of channels in the weight-generating convolutions accordingly.
w1 = Reshape((size1,size2,input_channels, output_channels))(w1)
w2 = Reshape((sizeA, sizeB, input_channels2, output_channels2))(w2)
....
The choice of the number of channels is up to you to optimize
The convolutional leg:
Now, this leg will only use "non-trainable" convolutions; they can be found directly in the backend and used in Lambda layers:
out1 = Lambda(lambda x: K.conv2d(x[0], x[1]))([inputs,w1])
out2 = Lambda(lambda x: K.conv2d(x[0], x[1]))([out1,w2])
Now, how you're going to interleave the layers, how many weights, etc., is also something you should optimize for yourself.
Create the model:
model = Model(inputs, out2)
Interleaving
You may take an output from this leg as input for the weight generator leg too:
w3 = Conv2D(filters, ...)(out2)
w3 = GlobalAveragePooling2D()(w3)
w3 = Reshape((sizeI, sizeII, inputC, outputC))(w3)
out3 = Lambda(lambda x: K.conv2d(x[0], x[1]))([out2,w3])
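For reference, a minimal end-to-end sketch under assumed sizes (64x64 images, 3x3 kernels, 8 output channels; all numbers are placeholders). Note that K.conv2d expects a kernel without a batch dimension, so the Lambda below uses the kernel generated for the first sample, effectively assuming batch_size=1:
import keras.backend as K
from keras.models import Model
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Reshape, Lambda

k, in_ch, out_ch = 3, 3, 8                       # assumed kernel size and channel counts
inputs = Input((64, 64, in_ch))                  # assumed image size

# weights leg: generate k*k*in_ch*out_ch values, pool, then shape them as a kernel
w = Conv2D(k * k * in_ch * out_ch, 3, padding='same')(inputs)
w = GlobalAveragePooling2D()(w)
w = Reshape((k, k, in_ch, out_ch))(w)

# convolutional leg: apply the generated kernel with a non-trainable backend conv
out = Lambda(lambda x: K.conv2d(x[0], x[1][0], padding='same'))([inputs, w])

model = Model(inputs, out)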
I am trying to understand the following snippet from the Keras documentation. What is the logic of specifying y as a Dense statement followed by (x)? I am not sure about the purpose of this statement.
# this is a logistic regression in Keras
x = Input(shape=(32,))
y = Dense(16, activation='softmax')(x)
model = Model(x, y)
It is quite simple: in this case y is the output of a Dense layer with 16 units and a softmax activation, given the input x.
This structure is the functional API: layers are callable objects that take tensors and return tensors, so you can specify models with multiple inputs and outputs easily in code.
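To make the call syntax explicit, here is the same snippet with the two steps separated (the compile line is just an illustration):
from keras.layers import Input, Dense
from keras.models import Model

x = Input(shape=(32,))
dense = Dense(16, activation='softmax')  # step 1: construct the layer object
y = dense(x)                             # step 2: call it on a tensor to get a tensor
model = Model(x, y)
model.compile(optimizer='sgd', loss='categorical_crossentropy')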
I would like to merge a forward LSTM and a backward LSTM in Keras. The input array of the backward LSTM is different from that of a forward LSTM. Thus, I cannot use keras.layers.Bidirectional.
The forward input is (10, 4).
The backward input is (12, 4) and it is reversed before being fed into the model. I would like to reverse it again after the LSTM and merge it with the forward output.
The simplified model is as follows.
from lambdawithmask import Lambda as MaskLambda
def reverse_func(x, mask=None):
    return tf.reverse(x, [False, True, False])
forward = Sequential()
backward = Sequential()
model = Sequential()
forward.add(LSTM(input_shape = (10, 4), output_dim = 4, return_sequences = True))
backward.add(LSTM(input_shape = (12, 4), output_dim = 4, return_sequences = True))
backward.add(MaskLambda(function=reverse_func, mask_function=reverse_func))
model.add(Merge([forward, backward], mode = "concat", concat_axis = 1))
When I run this, the error message is:
Tensors in list passed to 'values' of 'ConcatV2' Op have types [bool, float32] that don't all match.
Could anyone help me? I coded in Python 3.5.2 with Keras (2.0.5) and the backend is tensorflow (1.2.1).
First of all, if you have two different inputs, you cannot use a Sequential model. You must use the functional API Model:
from keras.models import Model
The first two models can be sequential, no problem, but the junction must be a regular model. When concatenating, I also use the functional approach (create the layer, then pass the inputs):
junction = Concatenate(axis=1)([forward.output,backward.output])
Why axis=1? You can only concatenate things with the same shape. Since you have 10 and 12 timesteps, they are only compatible if you merge along exactly that axis, the time axis, given that the shapes are (BatchSize, TimeSteps, Units).
For creating the final model, use the Model, specify the inputs and outputs:
model = Model([forward.input,backward.input], junction)
In the model to be reversed, simply use a Lambda layer. A MaskLambda does more than just the function you want. I also suggest you use the Keras backend instead of TensorFlow functions:
import keras.backend as K
#instead of the MaskLambda:
backward.add(Lambda(lambda x: K.reverse(x, axes=[1]), output_shape=(12, ?)))
Here, the ? is the amount of units your LSTM layers have. See PS at the end.
PS: I'm not sure output_dim is useful in the LSTM layer. It's necessary in Lambda layers, but I never use it anywhere else. Shapes are natural consequences of the amount of "units" you put in your layers. Strangely, you didn't specify the amount of units.
PS2: How exactly do you want to concatenate two sequences with different sizes?
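If time-axis concatenation really is what you want, here is a minimal functional sketch putting the pieces together (4 units assumed, matching output_dim=4 in the question):
import keras.backend as K
from keras.models import Model
from keras.layers import Input, LSTM, Lambda, Concatenate

fwd_in = Input((10, 4))
bwd_in = Input((12, 4))  # already reversed before being fed in

fwd = LSTM(4, return_sequences=True)(fwd_in)
bwd = LSTM(4, return_sequences=True)(bwd_in)
bwd = Lambda(lambda x: K.reverse(x, axes=1), output_shape=(12, 4))(bwd)

junction = Concatenate(axis=1)([fwd, bwd])  # shape: (batch, 10 + 12, 4)
model = Model([fwd_in, bwd_in], junction)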
As said in the answer above, using the functional API offers you much more flexibility for multi-input/output models. Here you can simply set the go_backwards argument to True to reverse the traversal of the input sequence by the LSTM layer.
I have defined the smart_merge function below which merges the forward and backward LSTM layers together along with handling the single traversal case.
from keras.models import Model
from keras.layers import Input, merge
def smart_merge(vectors, **kwargs):
    return vectors[0] if len(vectors) == 1 else merge(vectors, **kwargs)
input1 = Input(shape=(10, 4))
input2 = Input(shape=(12, 4))
LtoR_LSTM = LSTM(56, return_sequences=False)
LtoR_LSTM_vector = LtoR_LSTM(input1)
RtoL_LSTM = LSTM(56, return_sequences=False, go_backwards=True)
RtoL_LSTM_vector = RtoL_LSTM(input2)
BidireLSTM_vector = [LtoR_LSTM_vector, RtoL_LSTM_vector]
BidireLSTM_vector = smart_merge(BidireLSTM_vector, mode='concat')
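To actually train with this, the merged tensor still needs to be wrapped in a Model; a minimal completion (the Dense head and the loss are placeholders):
from keras.layers import Dense

output = Dense(1, activation='sigmoid')(BidireLSTM_vector)
model = Model([input1, input2], output)
model.compile(optimizer='adam', loss='binary_crossentropy')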
I need to feed variable length sequences into my model.
My model is Embedding + LSTM + Conv1d + Maxpooling + softmax.
When I set mask_zero=True in the Embedding layer, compilation fails at the Conv1D layer.
How can I pass the mask to Conv1D, or is there another solution?
The Masking layer expects every downstream layer to support masking, which is not the case for the Conv1D layer. Fortunately, there is another way to apply the mask, using the functional API:
inputs = Input(...)
mask = Masking().compute_mask(inputs) # <= Compute the mask
embed = Embedding(...)(inputs)
lstm = LSTM(...)(embed, mask=mask) # <= Apply the mask
conv = Conv1D(...)(lstm)
...
model = Model(inputs=[inputs], outputs=[...])
The Conv1D layer does not support masking at this time; there is an open issue about this on the Keras repo.
Depending on the task you might be able to get away with embedding the mask_value just like the other values in the sequence and apply global pooling (as you're doing now).
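A minimal sketch of that second option (vocabulary size, dimensions and class count are all assumed): the padding index is embedded like any other token, and global pooling reduces its influence on the output.
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Conv1D, GlobalMaxPooling1D, Dense

inputs = Input(shape=(None,), dtype='int32')
x = Embedding(input_dim=10000, output_dim=64)(inputs)  # mask_zero left as False
x = LSTM(64, return_sequences=True)(x)
x = Conv1D(64, 3, activation='relu')(x)
x = GlobalMaxPooling1D()(x)  # reduces over timesteps, including padded ones
outputs = Dense(5, activation='softmax')(x)
model = Model(inputs, outputs)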
My question is quite closely related to this question but also goes beyond it.
I am trying to implement the following LSTM in Keras where
the number of timesteps is nb_tsteps=10
the number of input features is nb_feat=40
the number of LSTM cells at each time step is 120
the LSTM layer is followed by TimeDistributedDense layers
From the question referenced above I understand that I have to present the input data as
(nb_samples, 10, 40)
where I get nb_samples by rolling a window of length nb_tsteps=10 across the original timeseries of shape (5932720, 40). The code is hence
model = Sequential()
model.add(LSTM(120, input_shape=(X_train.shape[1], X_train.shape[2]),
return_sequences=True, consume_less='gpu'))
model.add(TimeDistributed(Dense(50, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(10, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(3, activation='relu')))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
Now to my question (assuming the above is correct so far):
The binary responses (0/1) are heavily imbalanced and I need to pass a class_weight dictionary like cw = {0: 1, 1: 25} to model.fit(). However I get an exception class_weight not supported for 3+ dimensional targets. This is because I present the response data as (nb_samples, 1, 1). If I reshape it into a 2D array (nb_samples, 1) I get the exception Error when checking model target: expected timedistributed_5 to have 3 dimensions, but got array with shape (5932720, 1).
Thanks a lot for any help!
I think you should use sample_weight with sample_weight_mode='temporal'.
From the Keras docs:
sample_weight: Numpy array of weights for the training samples, used for scaling the loss function (during training only). You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().
In your case you would need to supply a 2D array of shape (samples, sequence_length), i.e. (nb_samples, 1), holding a weight for every timestep of every sample.
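A minimal sketch of building that array from the class weights in the question (y_train is assumed to hold the 0/1 targets with shape (nb_samples, 1, 1)):
import numpy as np

cw = {0: 1, 1: 25}
# sample_weight must be 2D: (samples, sequence_length) = (nb_samples, 1)
sample_weight = np.where(y_train[:, :, 0] == 1, cw[1], cw[0]).astype('float32')

model.compile(loss='binary_crossentropy', optimizer='adam', sample_weight_mode='temporal')
model.fit(X_train, y_train, sample_weight=sample_weight)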
If this is still an issue: I think the TimeDistributed layer expects and returns a 3D array (similar to setting return_sequences=True in a regular LSTM layer). Try adding a Flatten() layer or another LSTM layer at the end, before the prediction layer.
d = TimeDistributed(Dense(10))(input_from_previous_layer)
lstm_out = Bidirectional(LSTM(10))(d)
output = Dense(1, activation='sigmoid')(lstm_out)
Using sample_weight_mode='temporal' is a workaround; see the related Stack Overflow discussion. The issue is also documented on GitHub.