Dimension error building LSTM with keras - keras

I am trying to builda LSTM network using Keras.
My time seriese example is of size 492. And I want to use the 3 previous examples to predict the next example. So, the input is converted to size (num_samples,3*492),and the output size is (num_samples,492).
According to this blog, I firstly convert my data size into of form (num_samples,timesteps,features)
#convert trainning data to 3D LSTM shape
train_origin_x = train_origin_x.reshape((train_origin_x.shape[0],3,492))
test_origin_x = test_origin_x.reshape((test_origin_x.shape[0],3,492))
print(train_origin_x.shape,test_origin_x.shape)
(216, 3, 492) (93, 3, 492)
print(train_origin_y,test_origin_y)
(216, 492) (93, 492)
And below is the my code to build the LSTM network
#building network
model = Sequential()
model.add(LSTM(hidden_units,return_sequences=True,input_shape=(train_origin_x.shape[1],train_origin_x.shape[2])))
model.add(Dense(492))
model.compile(loss='mse',optimizer='adam')
print('model trainning begins...')
history = model.fit(train_origin_x,train_origin_y,epochs = num_epochs,batch_size = num_batchs,
validation_data=(test_origin_x,test_origin_y))
However I got an error in the process, saying
ValueError: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (216, 492)
Anyone knows the what's the problem?
Any comments or suggestions is welcome and appreciated!!
Below is the result of model.summary()
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 3, 50) 108600
_________________________________________________________________
dense_1 (Dense) (None, 3, 492) 25092
=================================================================

add return_sequences to your LSTM code:
model.add(LSTM(hidden_units, return_sequences = False,input_shape=(train_origin_x.shape[1],train_origin_x.shape[2])))

Related

How to access intermediate layers in a pretrained model after retraining it on a new image dataset

I trained a VGG16 model on a labeled image dataset using the categorical crossentropy. I removed the fully connected layer and replaced it with new layers, as follows:
vgg16_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
model = Sequential()
model.add(vgg16_model)
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(70, activation='softmax'))
Then I trained the full model on my dataset. I want to use it now as a feature extraction model by extracting image features using any of the intermediate layers that belong to vgg16_model.
After training and saving the model, I can only access the layers that I added Dense and Dropout using pop() function to remove them and only keep the trained feature extractor model (vgg16).
i = 0
while i < 4:
model.pop()
This keeps only VGG16 model. However, the layers inside are not accessible, I tried:
new_model = Model(model.input,model.layers[-1].output)
But I get this error:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name='input_9'), name='input_9', description="created by layer 'input_9'") at layer "block1_conv1". The following previous layers were accessed without issue: []
How can I modify my model to consider early k layers at a given time, then use the model for prediction?
Defining the keras model in the way you did, unfortunately has many complications when we tried to features from intermediate layer. I've raised a ticket, see here.
As you want to extract image features using any of the intermediate layers that belong to vgg16_model, you can try the following approach.
# [good practice]: check first to know name and shape
# for layer in model.layers:
# print(layer.name, layer.output_shape)
# vgg16 (None, 7, 7, 512)
# flatten (None, 25088)
# dense (None, 256)
# dropout (None, 256)
# dense_1 (None, 70)
# get the trained model first
trained_vgg16 = keras.Model(
inputs=model.get_layer(name="vgg16").inputs,
outputs=model.get_layer(name="vgg16").outputs,
)
x = tf.ones((1, 224, 224, 3))
y = trained_vgg16(x)
y.shape
TensorShape([1, 7, 7, 512])
Next, use this trained_vgg16 model to build the target model. For example,
# extract only 1 intermediate layer
feature_extractor_block3_pool = keras.Model(
inputs=trained_vgg16.inputs,
outputs=trained_vgg16.get_layer(name="block3_pool").output,
)
# or, 2 based on purpose.
feature_extractor_block3_pool_block4_conv3 = keras.Model(
inputs=trained_vgg16.inputs,
outputs=[
trained_vgg16.get_layer(name="block3_pool").output,
trained_vgg16.get_layer(name="block4_conv3").output,
],
)
# or, all
feature_extractor = keras.Model(
inputs=trained_vgg16.inputs,
outputs=[layer.output for layer in trained_vgg16.layers],
)

Enquiry about the input & output shape of LSTMs in Keras

I am trying to predict a time series with LSTM and am writing my code in Python by using Keras.
I have 30 features as input (continuous value) and a binary output.
I would like to use the 20 previous timesteps (t-20, t-19, .. , t-1) of each input feature in order to predict the output of next timestep (t+1).
My batch size is fixed at 52. What does this exactly mean?
I don't understand how to define the shape of the input layer.
The stacked LSTM example in the Keras documentation says that the last dimension of the 3D tensor will be 'data_dim'.
Is it input dimension or output dimension?
If this is output dimension, then I can't use more than one input feature as in my case the input_shape will be (batch_size=52,time_step=20,data_dim=1).
Also, in case data_dim is input shape, then I have tried to define a four layers-LSTM and the model shape results to be like this.
Layer (type) Output Shape Param #
================================================================= input_2 (InputLayer) (52, 20, 30) 0
_________________________________________________________________ lstm_3 (LSTM) (52, 20, 128) 81408
_________________________________________________________________ lstm_4 (LSTM) (52, 128) 131584
_________________________________________________________________ dense_2 (Dense) (52, 1) 129
================================================================= Total params: 213,121 Trainable params: 213,121 Non-trainable params: 0
Does this architecture make sense? Am I making some obvious mistakes?
My snippet of code is the one below:
input_layer=Input(batch_shape=(batch_size,input_timesteps,input_dims))
lstm1=LSTM(num_neurons,activation = 'relu',dropout=0.0,stateful=False,return_sequences=True)(input_layer)
lstm2=LSTM(num_neurons,activation = 'relu',dropout=0.0,stateful=False,return_sequences=False)(lstm1)
output_layer=Dense(1, activation='sigmoid')(lstm2)
model=Model(inputs=input_layer,outputs=output_layer)
I am getting very poor results and thus trying to debug each step.
If you want to use deep learning techniques you should try to overfit first and then reduce the complexity till you reach a break even point in terms of both neural complexity, training error and test error.
You are actually using a larger feature space in the hidden layer, are you sure your data are able to fit this?
Do you have enough rows to let the model learn this complex representation?
Otherwise I would suggest you something like this, in order to extrapolate the most important dimensions:
num_neurons1 = int(input_dims/2)
num_neurons2 = int(input_dims/4)
input_layer=Input(batch_shape=(batch_size, input_timesteps, input_dims))
lstm1=LSTM(num_neurons, activation = 'relu', dropout=0.0, stateful=False, return_sequences=True, kernel_initializer="he_normal")(input_layer)
lstm2=LSTM(num_neurons2, activation = 'relu', dropout=0.0, stateful=False, return_sequences=False, kernel_initializer="he_normal")(lstm1)
output_layer=Dense(1, activation='sigmoid')(lstm2)
model=Model(inputs=input_layer,outputs=output_layer)
Also, you are using relu as activation function.
Does it fit your data? Would be better you have only positive data after rescaling & normalization.
In case it does fit, you can also use a proper kernel initialization.
To better understand the problem, you could also post the optimizer parameters and the behaviour while training during epochs.

In Keras, how to get the layer name associated with a "Model" object contained in my model?

I built a Sequential model with the VGG16 network at the initial base, for example:
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
# do not include the top, fully-connected Dense layers
include_top=False,
input_shape=(150, 150, 3))
from keras import models
from keras import layers
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
# the 3 corresponds to the three output classes
model.add(layers.Dense(3, activation='sigmoid'))
My model looks like this:
model.summary()
Layer (type) Output Shape Param #
=================================================================
vgg16 (Model) (None, 4, 4, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_7 (Dense) (None, 256) 2097408
_________________________________________________________________
dense_8 (Dense) (None, 3) 771
=================================================================
Total params: 16,812,867
Trainable params: 16,812,867
Non-trainable params: 0
_________________________________________________________________
Now, I want to get the layer names associated with the vgg16 Model portion of my network. I.e. something like:
layer_name = 'block3_conv1'
filter_index = 0
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])
However, since the vgg16 convolutional is shown as a Model and it's layers are not being exposed, I get the error:
ValueError: No such layer: block3_conv1
How do I do this?
The key is to first do .get_layer on the Model object, then do another .get_layer on that specifying the specific vgg16 layer, THEN do .output:
layer_output = model.get_layer('vgg16').get_layer('block3_conv1').output
To get the name of the layer from the VGG16 instance use the following code.
for layer in conv_base.layers:
print(layer.name)
the name should be the same inside your model. to show this you could do the following.
print([layer.name for layer in model.get_layer('vgg16').layers])
like Ryan showed us. to call the vgg16 layer you must call it from the model first using the get_layer method.
One can simply store the name of layers in the list for further usage
layer_names=[layer.name for layer in base_model.layers]
This worked for me :)
for idx in range(len(model.layers)):
print(model.get_layer(index = idx).name)
Use the layer's summary:
model.get_layer('vgg16').summary()

Passing initial_state to Bidirectional RNN layer in Keras

I'm trying to implement encoder-decoder type network in Keras, with Bidirectional GRUs.
The following code seems to be working
src_input = Input(shape=(5,))
ref_input = Input(shape=(5,))
src_embedding = Embedding(output_dim=300, input_dim=vocab_size)(src_input)
ref_embedding = Embedding(output_dim=300, input_dim=vocab_size)(ref_input)
encoder = Bidirectional(
GRU(2, return_sequences=True, return_state=True)
)(src_embedding)
decoder = GRU(2, return_sequences=True)(ref_embedding, initial_state=encoder[1])
But when I change the decode to use Bidirectional wrapper, it stops showing encoder and src_input layers in the model.summary(). The new decoder looks like:
decoder = Bidirectional(
GRU(2, return_sequences=True)
)(ref_embedding, initial_state=encoder[1:])
The output of model.summary() with the Bidirectional decoder.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 5) 0
_________________________________________________________________
embedding_2 (Embedding) (None, 5, 300) 6610500
_________________________________________________________________
bidirectional_2 (Bidirection (None, 5, 4) 3636
=================================================================
Total params: 6,614,136
Trainable params: 6,614,136
Non-trainable params: 0
_________________________________________________________________
Question: Am I missing something when I pass initial_state in Bidirectional decoder? How can I fix this? Is there any other way to make this work?
It's a bug. The RNN layer implements __call__ so that tensors in initial_state can be collected into a model instance. However, the Bidirectional wrapper did not implement it. So topological information about the initial_state tensors is missing and some strange bugs happen.
I wasn't aware of it when I was implementing initial_state for Bidirectional. It should be fixed now, after this PR. You can install the latest master branch on GitHub to fix it.

Understanding SimpleRNN process

I have created the following SimpleRNN using Keras:
X = X.reshape((X.shape[0], X.shape[1], 1))
tr_X, ts_X, tr_y, ts_y = train_test_split(X, y, train_size=.8)
batch_size = 1000
print('RNN model...')
model = Sequential()
model.add(SimpleRNN(64, activation='relu', batch_input_shape=(batch_size, X.shape[1], 1)))
model.add(Dense(1, activation='relu'))
print('Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print (model.summary())
print ('\n')
model.fit(tr_X, tr_y,
batch_size=batch_size, epochs=1,
shuffle=True, validation_data=(ts_X, ts_y))
For the model summary, I get the following:
Layer (type) Output Shape Param #
=================================================================
simple_rnn_1 (SimpleRNN) (1000, 64) 4224
_________________________________________________________________
dense_1 (Dense) (1000, 1) 65
=================================================================
Total params: 4,289
Trainable params: 4,289
Non-trainable params: 0
_________________________________________________________________
Given that I have a dataset of 10,000 samples and 64 features. My goal is to generate a classification model by training it using this dataset (class labels are binary 0 and 1). Now, I am trying to understand what is going on here. As seen in 'Output Shape' column, the simple_rnn_1 has (1000, 64). I interpret it as 1000 rows (which is the batch) and 64 features. Assuming the code above is logically correct, my questions is:
How does RNN handle this matrix (i.e., (1000,64))? Does it input
each column something like this figure?
Should SimpleRNN() units always be equal to the number of features?
Thank you
In the code, you defined batch_input_shape to be with shape: (batch_size, X.shape[1], 1)
which means that you will insert to the RNN, batch_size examples, each example contains X.shape[1] time-stamps (number of pink boxes in your image) and each time-stamp is shape 1 (scalar).
So yes, input shape of (1000,64,1) will be exactly like you said - each column will be input to the RNN.
No! units will be your output dim. Usually more units means more complex network (just like in regular neural network) -> more parameters to learn.
Units will be the shape of the RNN's internal state.
(So, in your example, if you declare units=2000 your output will be (1000,2000).)

Resources