SOLVED! (I had to set trainable=True on the Embedding layer in the functional model.)
I am currently converting my Keras model from Sequential to the functional API. While the Sequential model reaches an accuracy of 1 after about 10 epochs, the functional API model does not even reach 0.7 and stops improving. Apart from the Input layer, both nets should be the same.
model = Sequential()
model.add(Embedding(20000, 256,input_length = 30))
model.add(LSTM(256, dropout=0.3, recurrent_dropout=0.3))
model.compile(loss = 'binary_crossentropy', optimizer=Adam(lr=0.0001),metrics = ['accuracy'])
Output is:
Layer (type) Output Shape Param #
embedding_6 (Embedding) (None, 30, 256) 5120000
spatial_dropout1d_5 (Spatial (None, 30, 256) 0
lstm_5 (LSTM) (None, 256) 525312
dense_6 (Dense) (None, 1) 257
Total params: 5,645,569
Trainable params: 5,645,569
Non-trainable params: 0
For the functional API:
inputs = Input(shape=(31,))
embed = Embedding(20000, 256, trainable=False)(inputs)
drop = (SpatialDropout1D(0.4))(embed)
lstm = LSTM(256, dropout=0.3, recurrent_dropout=0.3)(drop)
acti = Dense(1,activation='sigmoid')(lstm)
model = Model(inputs=inputs, outputs=acti)
model.compile(loss = 'binary_crossentropy', optimizer=Adam(lr=0.0001),metrics = ['accuracy'])
Model: "model_5"
Layer (type) Output Shape Param #
input_8 (InputLayer) (None, 31) 0
embedding_7 (Embedding) (None, 31, 256) 5120000
spatial_dropout1d_6 (Spatial (None, 31, 256) 0
lstm_6 (LSTM) (None, 256) 525312
dense_7 (Dense) (None, 1) 257
Total params: 5,645,569
Trainable params: 525,569
Non-trainable params: 5,120,000
Have I overlooked something, or can someone explain my results?
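The two summaries differ only in their trainable counts. A quick arithmetic check, using the usual parameter formulas for Embedding, LSTM and Dense layers, suggests that the 5,120,000 non-trainable parameters are exactly the embedding matrix, which the functional version freezes with trainable=False. The helper names below are illustrative and not part of Keras.

```python
# Parameter counts for the layers shown in the summaries above.
# Helper names are hypothetical; only the formulas are standard.

def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # Kernel plus bias.
    return input_dim * units + units

emb = embedding_params(20000, 256)
lstm = lstm_params(256, 256)
dense = dense_params(256, 1)

total = emb + lstm + dense
trainable_when_embedding_frozen = lstm + dense

print(emb, lstm, dense)
print(total, trainable_when_embedding_frozen)
```

With the embedding frozen, only the LSTM and Dense weights are updated, which matches the "Trainable params: 525,569" line of the functional model.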
I am experimenting/fiddling/learning with some small ML problems.
I have a loaded model based on a pre-trained convolution base with some self-trained dense layers (for model details see below).
I wanted to apply some visualizations, such as activations and the Grad-CAM visualization, to the model, but I was not able to do so.
I tried to create a new model based on mine (like in the article) with
grad_model = tf.keras.models.Model(model.inputs,
but this already fails with the error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_5_12:0", shape=(None, None, None, 3), dtype=float32) at layer "block1_conv1". The following previous layers were accessed without issue: []
I do not understand what this means. The model surely works (I can evaluate it and make predictions with it).
The call does not fail if I omit model.get_layer('vgg16').output from the outputs list, but of course this is required for the visualization.
What am I doing wrong?
In a model that I constructed and trained from scratch, I was able to create a similar model with the activations as outputs, but here I get these errors.
My model's details
The model was created with the following code and then trained and saved.
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers
conv_base = keras.applications.vgg16.VGG16(
conv_base.trainable = False
data_augmentation = keras.Sequential(
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
Later it was loaded:
model = keras.models.load_model("myModel.keras")
Model: "functional_3"
Layer (type) Output Shape Param #
input_6 (InputLayer) [(None, 180, 180, 3)] 0
sequential (Sequential) (None, 180, 180, 3) 0
vgg16 (Functional) (None, None, None, 512) 14714688
flatten_1 (Flatten) (None, 12800) 0
dense_2 (Dense) (None, 256) 3277056
dropout_1 (Dropout) (None, 256) 0
dense_3 (Dense) (None, 1) 257
Total params: 17,992,001
Trainable params: 10,356,737
Non-trainable params: 7,635,264
Model: "sequential"
Layer (type) Output Shape Param #
random_flip (RandomFlip) (None, 180, 180, 3) 0
random_rotation (RandomRotat (None, 180, 180, 3) 0
random_zoom (RandomZoom) (None, 180, 180, 3) 0
Total params: 0
Trainable params: 0
Non-trainable params: 0
Model: "vgg16"
Layer (type) Output Shape Param #
input_5 (InputLayer) [(None, None, None, 3)] 0
block1_conv1 (Conv2D) multiple 1792
block1_conv2 (Conv2D) multiple 36928
block1_pool (MaxPooling2D) multiple 0
block2_conv1 (Conv2D) multiple 73856
block2_conv2 (Conv2D) multiple 147584
block2_pool (MaxPooling2D) multiple 0
block3_conv1 (Conv2D) multiple 295168
block3_conv2 (Conv2D) multiple 590080
block3_conv3 (Conv2D) multiple 590080
block3_pool (MaxPooling2D) multiple 0
block4_conv1 (Conv2D) multiple 1180160
block4_conv2 (Conv2D) multiple 2359808
block4_conv3 (Conv2D) multiple 2359808
block4_pool (MaxPooling2D) multiple 0
block5_conv1 (Conv2D) multiple 2359808
block5_conv2 (Conv2D) multiple 2359808
block5_conv3 (Conv2D) multiple 2359808
block5_pool (MaxPooling2D) multiple 0
Total params: 14,714,688
Trainable params: 7,079,424
Non-trainable params: 7,635,264
You can achieve what you want in the following way. First, define your model as follows:
inputs = tf.keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs, training=True)
x = keras.applications.VGG16(input_tensor=x,
x.trainable = False
x = layers.Flatten()(x.output)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, x)
for i, layer in enumerate(model.layers):
    print(i, layer.name, layer.output_shape, layer.trainable)
17 block5_conv2 (None, 11, 11, 512) False
18 block5_conv3 (None, 11, 11, 512) False
19 block5_pool (None, 5, 5, 512) False
20 flatten_2 (None, 12800) True
21 dense_4 (None, 256) True
22 dropout_2 (None, 256) True
23 dense_5 (None, 1) True
Now, build the grad-cam model with desired output layer as follows:
grad_model = keras.models.Model(
image = np.random.rand(1, 180, 180, 3).astype(np.float32)
with tf.GradientTape() as tape:
    convOutputs, predictions = grad_model(tf.cast(image, tf.float32))
    loss = predictions[:, tf.argmax(predictions[0])]
grads = tape.gradient(loss, convOutputs)
[[[[ 9.8454033e-04 3.6991197e-03 ... -1.2012678e-02
-1.7934230e-03 2.2925171e-03]
[ 1.6165405e-03 -1.9513096e-03 ... -2.5789393e-03
1.2443252e-03 -1.3931725e-03]
[-2.0554627e-04 1.2232144e-03 ... 5.2324748e-03
3.1955825e-04 3.4566019e-03]
[ 2.3650150e-03 -2.5699558e-03 ... -2.4103196e-03
5.8940407e-03 5.3285398e-03]
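In the usual Grad-CAM recipe, gradients like the ones printed above are averaged over the spatial dimensions to give one weight per channel, and the heatmap is the ReLU of the weighted sum of the feature maps. The following is a minimal pure-Python sketch of that step on a made-up 2x2 feature map with 2 channels. The function name and the numbers are hypothetical, and in practice one would apply the same operations to convOutputs and grads with tensor ops.

```python
def grad_cam_heatmap(feature_maps, grads):
    """feature_maps and grads are nested lists indexed [row][col][channel]."""
    height = len(feature_maps)
    width = len(feature_maps[0])
    channels = len(feature_maps[0][0])

    # One weight per channel: mean gradient over all spatial positions.
    weights = [
        sum(grads[r][c][k] for r in range(height) for c in range(width))
        / (height * width)
        for k in range(channels)
    ]

    # Weighted sum of the channels at each position, followed by ReLU.
    return [
        [
            max(0.0, sum(weights[k] * feature_maps[r][c][k] for k in range(channels)))
            for c in range(width)
        ]
        for r in range(height)
    ]

# Made-up values, for illustration only.
feature_maps = [
    [[1.0, 0.0], [2.0, 1.0]],
    [[0.0, 3.0], [1.0, 1.0]],
]
grads = [
    [[0.5, -1.0], [0.5, -1.0]],
    [[0.5, -1.0], [0.5, -1.0]],
]

heatmap = grad_cam_heatmap(feature_maps, grads)
print(heatmap)
```

Only positions where the positively weighted channel dominates survive the ReLU; the resulting map is then typically normalized and resized to the input image size.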
I am training a Keras model for a sentence classification task. Although it reports an accuracy of 94%, it does not seem to learn anything: when I give it a new sentence (not present in the dataset), it returns the same probability every time (in the model.predict step). I can't figure out why this is happening.
Here is my model
model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(Dense(2, activation='sigmoid'))
Here max_words = 2000 and max_len=300
Here is the model summary
Model: "sequential_3"
Layer (type) Output Shape Param #
embedding_3 (Embedding) (None, 300, 30) 60000
batch_normalization_5 (Batch (None, 300, 30) 120
activation_5 (Activation) (None, 300, 30) 0
dropout_3 (Dropout) (None, 300, 30) 0
bidirectional_3 (Bidirection (None, 64) 16128
batch_normalization_6 (Batch (None, 64) 256
activation_6 (Activation) (None, 64) 0
dropout_4 (Dropout) (None, 64) 0
dense_3 (Dense) (None, 2) 130
Total params: 76,634
Trainable params: 76,446
Non-trainable params: 188
And here is the training code. My dataset has 20k samples, with 10% held out for testing.
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'], optimizer = 'adam')
history = model.fit(X_train, Y_train, batch_size=256, epochs=50, validation_split=0.1)
Try changing the activation function of the last layer from sigmoid to softmax. Sigmoid doesn't quite match the loss you are using (sparse categorical cross-entropy). If you use sigmoid, you only need one unit and should use the binary cross-entropy loss.
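The mismatch can be illustrated without Keras. In the sketch below, two sigmoid outputs are scored independently and need not sum to 1, whereas softmax produces a probability distribution over the two classes, which is what a categorical cross-entropy assumes. The logits are made-up numbers and the function names are illustrative; this shows the math only and does not reproduce Keras internals.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_categorical_crossentropy(probs, label):
    # The label is an integer class index; the loss is -log of its probability.
    return -math.log(probs[label])

logits = [2.0, 1.0]  # hypothetical pre-activation outputs of the last Dense(2) layer

independent = [sigmoid(z) for z in logits]
distribution = softmax(logits)

print(sum(independent))   # greater than 1: not a distribution
print(sum(distribution))  # sums to 1
print(sparse_categorical_crossentropy(distribution, 0))
```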
I have a problem with a hierarchical LSTM in Keras. It works well when the data is two-dimensional, but when I changed it to three dimensions it stopped working. My data has shape (25, 10, 2).
I want to build a hierarchical LSTM: the first-level LSTM converts each item of shape (10, 2) into a vector, and the resulting 25 vectors are fed into the second-level LSTM. I use two embeddings and multiply them. I would appreciate any help.
def H_LSTM():
    single_input = Input(shape=(10,2),dtype='int32')
    in_sentence = Lambda(lambda x: single_input[:,:, 0:1], output_shape=(maxlen,))(single_input)
    in_sentence = Reshape((maxlen,), input_shape = (maxlen,1))(in_sentence)
    in_drug = Lambda(lambda x: single_input[:, :, 1:1], output_shape=(maxlen,))(single_input)
    in_drug = Reshape((maxlen,), input_shape = (maxlen,1))(in_drug)
    embedded_sentence = Embedding(len(word_index) + 1, embedding_dim, weights=[embedding_matrix],
                                  input_length=maxlen, trainable=True, mask_zero=False)(in_sentence)
    embedded_drug = Embedding(len(word_index) + 1, embedding_dim, weights=[embedding_matrix],
                              input_length=maxlen, trainable=True, mask_zero=False)(in_drug)
    embedded_sequences = Multiply()([embedded_sentence, embedded_drug])
    lstm_sentence = LSTM(100)(embedded_sequences)
    encoded_model = Model(inputs = single_input, outputs = lstm_sentence)
    sequence_input = Input(shape=(25,10,2),dtype='int32')
    seq_encoded = TimeDistributed(encoded_model)(sequence_input)
    seq_encoded = Dropout(0.2)(seq_encoded)
    # Encode entire sentence
    seq_encoded = LSTM(100)(seq_encoded)
    # Prediction
    prediction = Dense(2, activation='softmax')(seq_encoded)
    model = Model(inputs = sequence_input, outputs = prediction)
    return model
Model Summary:
Layer (type) Output Shape Param # Connected to
input_3 (InputLayer) (None, 10, 2) 0
lambda_3 (Lambda) (None, 10) 0 input_3[0][0]
lambda_4 (Lambda) (None, 10) 0 input_3[0][0]
reshape_3 (Reshape) (None, 10) 0 lambda_3[0][0]
reshape_4 (Reshape) (None, 10) 0 lambda_4[0][0]
embedding_3 (Embedding) (None, 10, 128) 4895744 reshape_3[0][0]
embedding_4 (Embedding) (None, 10, 128) 4895744 reshape_4[0][0]
multiply_2 (Multiply) (None, 10, 128) 0 embedding_3[0][0]
lstm_3 (LSTM) (None, 100) 91600 multiply_2[0][0]
Total params: 9,883,088
Trainable params: 9,883,088
Non-trainable params: 0
Model: "model_4"
Layer (type) Output Shape Param #
input_4 (InputLayer) (None, 25, 10, 2) 0
time_distributed_2 (TimeDist (None, 25, 100) 9883088
dropout_2 (Dropout) (None, 25, 100) 0
lstm_4 (LSTM) (None, 100) 80400
dense_2 (Dense) (None, 2) 202
Total params: 9,963,690
Trainable params: 9,963,690
Non-trainable params: 0
Error Message:
InvalidArgumentError: You must feed a value for placeholder tensor 'input_3' with dtype int32 and shape [?,10,2]
[[node input_3 (defined at D:\Users\Jinhe.Shi\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\backend\ ]] [Op:__inference_keras_scratch_graph_6214]
Function call stack:
Update: the framework is shown in the figure below. The differences are that there is no attention layer and that I added two embeddings in the lower-level LSTM.
[image: framework diagram]
Model fit:
The error happens during model fitting.
model2 = H_LSTM()
print("model fitting - Hierarchical network")
model2.fit(X_train, Y_train, nb_epoch=3, batch_size=100, validation_data=(X_test, Y_test))
The input data looks like this:
[image: sample of the input data]
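Two details in H_LSTM may be worth checking. Both Lambda functions slice single_input instead of their own argument x, so when the encoder is reapplied through TimeDistributed the slice may still refer to the original input_3 placeholder, which would be consistent with the error above. In addition, the slice 1:1 is empty, while 1:2 selects the second column. The plain-Python sketch below shows both effects with lists standing in for tensors; the variable names are hypothetical and this is not Keras code.

```python
# `outer` plays the role of single_input; lists stand in for tensors.
outer = [[1, 9], [2, 8]]

# Closes over `outer` and ignores whatever it is called with.
ignores_argument = lambda x: [row[0:1] for row in outer]
# Slices the value it actually receives.
uses_argument = lambda x: [row[0:1] for row in x]

new_batch = [[5, 7], [6, 4]]
print(ignores_argument(new_batch))  # still the values from `outer`
print(uses_argument(new_batch))     # values from `new_batch`

# Slice bounds: 1:1 selects nothing, 1:2 selects the second element.
row = [5, 7]
print(row[1:1])
print(row[1:2])
```

If this is indeed the cause, writing the Lambda bodies in terms of x (for example x[:, :, 0] and x[:, :, 1]) would be the corresponding change in the model code.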
I trained and loaded a CNN + dense model:
# load model
cnn_model = load_model('my_cnn_model.h5')
The model summary is as follows (my images have dimensions 2 x 3600):
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 2, 3600, 32) 128
conv2d_2 (Conv2D) (None, 2, 1800, 32) 3104
max_pooling2d_1 (MaxPooling2 (None, 2, 600, 32) 0
conv2d_3 (Conv2D) (None, 2, 600, 64) 6208
conv2d_4 (Conv2D) (None, 2, 300, 64) 12352
max_pooling2d_2 (MaxPooling2 (None, 2, 100, 64) 0
conv2d_5 (Conv2D) (None, 2, 100, 128) 24704
conv2d_6 (Conv2D) (None, 2, 50, 128) 49280
max_pooling2d_3 (MaxPooling2 (None, 2, 16, 128) 0
flatten_1 (Flatten) (None, 4096) 0
dense_1 (Dense) (None, 1024) 4195328
dense_2 (Dense) (None, 1024) 1049600
dense_3 (Dense) (None, 3) 3075
Total params: 5,343,779
Trainable params: 5,343,779
Non-trainable params: 0
Now I want to keep the weights up to the Flatten layer and replace the dense layers with LSTM layers, training only the added LSTM part.
I just wrote:
# freeze model
base_model = cnn_model(input_shape=(2, 3600, 1))
#base_model = cnn_model
base_model.trainable = False
# Adding the first lstm layer
x = LSTM(1024,activation='relu',return_sequences='True')(base_model.output)
# Adding the second lstm layer
x = LSTM(1024, activation='relu',return_sequences='False')(x)
# Adding the output
output = Dense(3,activation='linear')(x)
# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
But I obtained:
base_model = cnn_model(input_shape=(2, 3600, 1))
TypeError: __call__() missing 1 required positional argument: 'inputs'
I know I ideally have to add TimeDistributed at the Flatten layer, but I do not know how to do it.
Moreover, I'm not sure whether base_model.trainable = False does exactly what I want.
Can you please help me get this done?
Thank you very much!
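One detail in the snippet above: return_sequences is passed as the strings 'True' and 'False'. In Python any non-empty string is truthy, so wherever the value is only tested for truthiness, 'False' would behave like True. Passing the booleans True and False avoids the ambiguity. A quick check:

```python
string_true = bool('True')
string_false = bool('False')   # non-empty string, so this is True
real_false = bool(False)

print(string_true, string_false, real_false)
```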
You can't feed the output of Flatten() directly into an LSTM: the LSTM needs 2-D features per sample (time, filters), so you have to reshape your tensors.
You can take the output of the layer before Flatten (the max-pooling layer). If that layer has index i in the model, take its output, reshape it as needed, and pass it to the LSTM.
before_flatten = base_model.layers[i].output # i is the index of the layer from which you want to take the model output
conv2lstm_reshape = Reshape((-1, 2))(before_flatten) # you have to select it, the temporal dim and filters
# Adding the first lstm layer
x = LSTM(1024, activation='relu', return_sequences=True)(conv2lstm_reshape)
# Adding the second lstm layer (returns only the last step)
x = LSTM(1024, activation='relu', return_sequences=False)(x)
# Adding the output on top of the LSTM features
output = Dense(3, activation='linear')(x)
# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
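The Reshape target decides what the LSTM sees as (timesteps, features). For the max_pooling2d_3 output of (None, 2, 16, 128) in the question, the small helper below works out what a target containing -1 resolves to. The helper name is hypothetical and only mimics how a reshape infers the missing dimension; the second target is shown purely for comparison.

```python
from math import prod

def reshape_target(input_shape, target):
    """Resolve a single -1 in `target`, as a reshape would infer it."""
    known = prod(d for d in target if d != -1)
    total = prod(input_shape)
    if total % known != 0:
        raise ValueError(f"cannot reshape {input_shape} into {target}")
    return tuple(total // known if d == -1 else d for d in target)

# max_pooling2d_3 output from the summary, without the batch axis.
before_flatten_shape = (2, 16, 128)

as_in_answer = reshape_target(before_flatten_shape, (-1, 2))
channels_as_features = reshape_target(before_flatten_shape, (-1, 128))

print(as_in_answer)          # timesteps, features
print(channels_as_features)  # timesteps, features
```

With (-1, 2) the LSTM would process 2048 steps of 2 features each, while (-1, 128) would give 32 steps of 128 features; which one is appropriate depends on what should count as the temporal axis.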
I have a problem similar to "Keras replacing input layer"; however, I also need to remove the next layer, which requires a different input shape.
Here is a simplification of what I'm trying to do:
a = Input(shape=(64,))
b = Dense(32)(a)
c = Dense(16)(b)
d = Dense(8)(c)
model = Model(inputs=a, outputs=d)
print('input shape = ' + str(model.input_shape))
print('input shape = ' + str(model.input_shape))
new_input = Input(shape=(32,))
new_output = model(new_input)
new_model = Model(new_input, new_output)
But the input shape of the model remains the same:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 64) 0
dense_1 (Dense) (None, 32) 2080
dense_2 (Dense) (None, 16) 528
dense_3 (Dense) (None, 8) 136
Total params: 2,744
Trainable params: 2,744
Non-trainable params: 0
input shape = (None, 64)
Layer (type) Output Shape Param #
dense_2 (Dense) (None, 16) 528
dense_3 (Dense) (None, 8) 136
Total params: 664
Trainable params: 664
Non-trainable params: 0
input shape = (None, 64)
And that prevents me from creating the new model, so the code above fails with:
ValueError: Dimensions must be equal, but are 32 and 64 for 'model_1/dense_1/MatMul' (op: 'MatMul') with input shapes: [?,32], [64,32].
Any ideas on how to do that?
It might not be possible to do it the way you describe. The accepted answer on this post explains it a little.
Their solution was to rebuild the layer with the correct input shape, then load the pre-trained weights for that specific layer.
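The MatMul error in the question can be read as a shape check: a (?, 32) input met the (64, 32) kernel of dense_1, because calling model(new_input) still runs every layer of the original model. The sketch below replays that check with plain tuples standing in for the Dense kernels. The names are illustrative and this is not Keras code.

```python
# Each Dense layer is reduced to its kernel shape: (expected inputs, units).
dense_layers = {
    "dense_1": (64, 32),
    "dense_2": (32, 16),
    "dense_3": (16, 8),
}

def output_dim(input_dim, layer_names):
    """Propagate a feature size through the named layers, checking each step."""
    for name in layer_names:
        expected, units = dense_layers[name]
        if input_dim != expected:
            raise ValueError(f"{name} expects {expected} inputs, got {input_dim}")
        input_dim = units
    return input_dim

# Calling the whole model on a 32-feature input still starts at dense_1.
try:
    output_dim(32, ["dense_1", "dense_2", "dense_3"])
    whole_model_error = None
except ValueError as err:
    whole_model_error = str(err)
print(whole_model_error)

# Starting the chain at dense_2 fits the new input.
truncated_output = output_dim(32, ["dense_2", "dense_3"])
print(truncated_output)
```

This is consistent with the suggestion above: the new model has to be built from the remaining layers on an input of the new shape, reusing their trained weights, rather than by calling the full original model.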