Keras model optimization with bayesian optimization. Model summary is different - keras

I am trying to tune my CNN model with Keras Tuner. After running the Bayesian search, the output of model.summary() is totally different from the hyperparameters found by the Bayesian optimization.
I define my model_builder function as follows:
def model_builder2(hp):
    model = Sequential()
    for i in range(hp.Int('num_blocks', 1,5)):
        hp_padding=hp.Choice('padding_'+ str(i), values=['valid', 'same'])
        hp_filters=hp.Choice('filters_'+ str(i), values=[32, 64])
        model.add(Conv2D(hp_filters, (3, 3), padding=hp_padding, activation='relu', kernel_initializer='he_uniform', input_shape=(50, 50, 3)))
        model.add(MaxPooling2D((2, 2)))
        model.add(Dropout(hp.Choice('dropout_'+ str(i), values=[0.2, 0.5])))
    hp_units = hp.Int('units', min_value=25, max_value=512, step=25)
    model.add(Dense(hp_units, activation='relu', kernel_initializer='he_uniform'))
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-5])
    #hp_optimizer=hp.Choice('Optimizer', values=['Adam', 'SGD'])
    #hp_optimizer=hp.Choice('Optimizer', values=['Adam'])
    #model.compile(loss=keras.losses.binary_crossentropy, optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate), metrics=['accuracy'])
    return model
After running the algorithm, I get the following best hyperparameters:
Best Hyper-parameters
{'dropout_0': 0.5,
'filters_0': 32,
'learning_rate': 0.01,
'num_blocks': 5,
'padding_0': 'same',
'units': 250}
But, when I do model.summary(), I have:
Model: "sequential_1"
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 50, 50, 32) 896
max_pooling2d_1 (MaxPooling (None, 25, 25, 32) 0
dropout_1 (Dropout) (None, 25, 25, 32) 0
flatten_1 (Flatten) (None, 20000) 0
dense_2 (Dense) (None, 250) 5000250
dense_3 (Dense) (None, 2) 502
Total params: 5,001,648
Trainable params: 5,001,648
Non-trainable params: 0
It seems I don't have 5 blocks. Any idea why?
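One way to confirm what the summary is actually showing is to recompute its parameter counts by hand. The sketch below is only arithmetic, not a fix: the helper names are made up for illustration, and it assumes a 3x3 kernel on the 50x50x3 input with 'same' padding, as in the builder above. The numbers match exactly one Conv2D/MaxPooling2D/Dropout block followed by Dense(250) and Dense(2), so the summarized model contains a single block.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter (hypothetical helper)."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    """Weight matrix plus one bias per unit (hypothetical helper)."""
    return in_features * units + units


conv = conv2d_params(3, 3, 3, 32)        # conv2d_1 in the summary
flat = 25 * 25 * 32                      # flatten_1: 50x50 halved by pooling
dense_hidden = dense_params(flat, 250)   # dense_2
dense_out = dense_params(250, 2)         # dense_3
total = conv + dense_hidden + dense_out

print(conv, flat, dense_hidden, dense_out, total)
```

If five blocks had been stacked, the summary would list five Conv2D layers and the flattened size would be much smaller than 20000.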


Keras model not learning after training

I am training a Keras model for a sentence classification task. The problem is that although it reaches an accuracy of 94%, it does not seem to be learning anything: when I give it a new sentence (not present in the dataset), it returns the same probability every time (in the model.prediction step). I can't figure out why this is happening.
Here is my model
model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(Dense(2, activation='sigmoid'))
Here max_words = 2000 and max_len=300
Here is the model summary
Model: "sequential_3"
Layer (type) Output Shape Param #
embedding_3 (Embedding) (None, 300, 30) 60000
batch_normalization_5 (Batch (None, 300, 30) 120
activation_5 (Activation) (None, 300, 30) 0
dropout_3 (Dropout) (None, 300, 30) 0
bidirectional_3 (Bidirection (None, 64) 16128
batch_normalization_6 (Batch (None, 64) 256
activation_6 (Activation) (None, 64) 0
dropout_4 (Dropout) (None, 64) 0
dense_3 (Dense) (None, 2) 130
Total params: 76,634
Trainable params: 76,446
Non-trainable params: 188
And here is the code, the size of my dataset is 20k, with 10% in testing.
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'], optimizer = 'adam')
history =, Y_train, batch_size=256, epochs=50, validation_split=0.1)
Try changing the activation function of the last layer from sigmoid to softmax. Sigmoid doesn't quite match the loss you are using (sparse categorical cross-entropy). If you want to keep sigmoid, you only need one output unit and should use binary cross-entropy loss.
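To illustrate the mismatch, here is a minimal NumPy sketch (not the Keras implementation; the logits and labels are made-up values). Softmax turns the two outputs into a probability distribution over the classes, which is what a sparse categorical cross-entropy picks from, whereas two independent sigmoid outputs generally do not sum to 1.

```python
import numpy as np


def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))


def sparse_categorical_crossentropy(probs, labels):
    # Pick the predicted probability of the true class for each sample.
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)


logits = np.array([[2.0, -1.0], [0.5, 0.5]])  # made-up network outputs
labels = np.array([0, 1])                     # integer class ids

soft = softmax(logits)
sig = sigmoid(logits)
loss = sparse_categorical_crossentropy(soft, labels)

print(soft.sum(axis=1))  # rows sum to 1
print(sig.sum(axis=1))   # rows generally do not sum to 1
print(loss)
```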

logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]

I am trying to create a CNN with TensorFlow. My images are 64x64x1, and I have an array of 3662 images that I am using for training. I have 5 labels in total, which I have one-hot encoded. I get this error every time:
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]
[[{{node loss_2/dense_5_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
my neural network structure is this:
def cnn_model():
    model = models.Sequential()
    # model.add(layers.Dense(128, activation='relu', ))
    model.add(layers.Conv2D(128, (3, 3), activation='relu',input_shape=(64, 64, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu',padding = 'same'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(5, activation='softmax'))
    return model
My model summary is this:
Model: "sequential_3"
Layer (type) Output Shape Param #
conv2d_9 (Conv2D) (None, 62, 62, 128) 1280
max_pooling2d_6 (MaxPooling2 (None, 31, 31, 128) 0
conv2d_10 (Conv2D) (None, 31, 31, 64) 73792
max_pooling2d_7 (MaxPooling2 (None, 15, 15, 64) 0
conv2d_11 (Conv2D) (None, 15, 15, 64) 36928
dense_4 (Dense) (None, 15, 15, 64) 4160
flatten_2 (Flatten) (None, 14400) 0
dense_5 (Dense) (None, 5) 72005
Total params: 188,165
Trainable params: 188,165
Non-trainable params: 0
My output array has the shape (3662,5,1). I have seen other answers to the same question, but I can't figure out the problem with mine. Where am I going wrong?
Edit: My labels are stored in one hot encoded form using these:
df = pd.get_dummies(df)
diag = np.array(df)
diag = np.reshape(diag,(3662,5,1))
I have tried passing the labels both as a NumPy array and after converting them to a tensor (same for the input, as per the documentation).
The problem lies in the choice of the loss function, tf.keras.losses.SparseCategoricalCrossentropy(). Given what you are trying to achieve, you should use tf.keras.losses.CategoricalCrossentropy(). Namely, the documentation of tf.keras.losses.SparseCategoricalCrossentropy() states:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
On the other hand, the documentation of tf.keras.losses.CategoricalCrossentropy() states:
We expect labels to be provided in a one_hot representation.
And because your labels are encoded as one-hot, you should use tf.keras.losses.CategoricalCrossentropy().
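The two label formats can be compared with a small NumPy sketch. The labels here are randomly generated stand-ins, not the asker's data. It also suggests where the 18310 in the error message likely comes from: flattening a (3662, 5, 1) one-hot array gives 3662 * 5 = 18310 entries, which the sparse loss then tries to treat as one integer label each.

```python
import numpy as np

num_samples, num_classes = 3662, 5
rng = np.random.default_rng(0)
class_ids = rng.integers(0, num_classes, size=num_samples)  # made-up labels

# One-hot labels stored with a trailing singleton axis, as in the question.
one_hot = np.eye(num_classes)[class_ids].reshape(num_samples, num_classes, 1)
print(one_hot.size)  # total number of entries once flattened

# Shape expected with CategoricalCrossentropy: (samples, classes).
one_hot_2d = one_hot.squeeze(-1)

# Shape expected with SparseCategoricalCrossentropy: (samples,) integer ids.
sparse_ids = one_hot_2d.argmax(axis=1)

print(one_hot_2d.shape, sparse_ids.shape)
```

Either form should work as long as it is paired with the matching loss; dropping the trailing axis of size 1 is advisable in both cases.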

Tensorflow invalid shape (InvalidArgumentError) produces exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot update variable with shape [] using a Tensor with shape [32], shapes must be equal.
[[{{node metrics/accuracy/AssignAddVariableOp}}]]
[[loss/dense_loss/categorical_crossentropy/weighted_loss/broadcast_weights/assert_broadcastable/AssertGuard/pivot_f/_50/_63]] [Op:__inference_keras_scratch_graph_1408]
Model definition:
model = tf.keras.Sequential()
input_shape=(360, 7)
model.add(tf.keras.layers.Conv1D(32, 1, activation='relu', input_shape=(360, 7)))
model.add(tf.keras.layers.Conv1D(32, 1, activation='relu'))
model.add(tf.keras.layers.Conv1D(512, 1, activation='relu'))
model.add(tf.keras.layers.Conv1D(1048, 1, activation='relu'))
model.add(tf.keras.layers.Dense(32, activation='softmax'))
Input Features Shape
(105, 360, 7)
Input Labels Shape
(105, 32, 1)
Compile statement
metrics=['accuracy']) statement,
Any help would be much appreciated
You can use model.summary() to see your model architecture.
Layer (type) Output Shape Param #
conv1d (Conv1D) (None, 360, 32) 256
conv1d_1 (Conv1D) (None, 360, 32) 1056
max_pooling1d (MaxPooling1D) (None, 120, 32) 0
conv1d_2 (Conv1D) (None, 120, 512) 16896
conv1d_3 (Conv1D) (None, 120, 1048) 537624
global_average_pooling1d (Gl (None, 1048) 0
dropout (Dropout) (None, 1048) 0
dense (Dense) (None, 32) 33568
Total params: 589,400
Trainable params: 589,400
Non-trainable params: 0
Your output layer produces shape (None, 32), but your labels have shape (105, 32, 1), so you need to reshape the labels to (105, 32). The np.squeeze() function removes single-dimensional entries from the shape of an array.
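A minimal sketch of that reshape, using a zero-filled placeholder array in place of the real labels. Passing axis=-1 is a precaution I would suggest rather than a requirement: without it, np.squeeze() also drops any other axis of length 1, such as a batch of size 1.

```python
import numpy as np

labels = np.zeros((105, 32, 1), dtype=np.float32)  # placeholder for the real labels
squeezed = np.squeeze(labels, axis=-1)              # drop only the trailing axis
print(squeezed.shape)

# With a single-sample batch, an unrestricted squeeze removes the batch axis too.
single = np.zeros((1, 32, 1), dtype=np.float32)
print(np.squeeze(single).shape, np.squeeze(single, axis=-1).shape)
```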
Use Flatten() before the Dense Layers.

convolutional autoencoder to analyse long 1-D sequences

I have a dataset of 1-D vectors each 3001 digits long. I have used a simple convolutional network to perform binary classification on these sequences:
model = Sequential()
model.add(Conv1D(75,3,strides=1, input_shape=shape, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
The network achieves ~60% accuracy.
I would now like to create an autoencoder to discover the regular pattern that distinguishes samples where the label is '1' from those where it is '0', i.e. to generate an exemplary sequence that is representative of the '1'-labeled samples.
Based on previous blogs and posts I have tried to put together an autoencoder that can achieve this:
input_sig = Input(batch_shape=(None,3001,1))
x = Conv1D(64,3, activation='relu', padding='same')(input_sig)
x1 = MaxPooling1D(2)(x)
x2 = Conv1D(32,3, activation='relu', padding='same')(x1)
x3 = MaxPooling1D(2)(x2)
flat = Flatten()(x3)
encoded = Dense(1,activation = 'relu')(flat)
x2_ = Conv1D(32, 3, activation='relu', padding='same')(x3)
x1_ = UpSampling1D(2)(x2_)
x_ = Conv1D(64, 3, activation='relu', padding='same')(x1_)
upsamp = UpSampling1D(2)(x_)
decoded = Conv1D(1, 3, activation='sigmoid', padding='same')(upsamp)
autoencoder = Model(input_sig, decoded)
autoencoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
This looks as follows:
Layer (type) Output Shape Param #
input_57 (InputLayer) (None, 3001, 1) 0
conv1d_233 (Conv1D) (None, 3001, 64) 256
max_pooling1d_115 (MaxPoolin (None, 1500, 64) 0
conv1d_234 (Conv1D) (None, 1500, 32) 6176
max_pooling1d_116 (MaxPoolin (None, 750, 32) 0
conv1d_235 (Conv1D) (None, 750, 32) 3104
up_sampling1d_106 (UpSamplin (None, 1500, 32) 0
conv1d_236 (Conv1D) (None, 1500, 64) 6208
up_sampling1d_107 (UpSamplin (None, 3000, 64) 0
conv1d_237 (Conv1D) (None, 3000, 64) 12352
Total params: 28,096
Trainable params: 28,096
Non-trainable params: 0
Hence everything seems to be going smoothly until I train the network,train_y,epochs=3,batch_size=100,validation_data=(test_X, test_y))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/", line 1630, in fit
File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/", line 1480, in _standardize_user_data
File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/", line 113, in _standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking target: expected conv1d_237 to have 3 dimensions, but got array with shape (32318, 1)
Hence I have tried adding a 'Reshape' layer before the last one.
upsamp = UpSampling1D(2)(x_)
flat = Flatten()(upsamp)
reshaped = Reshape((3000,64))(flat)
decoded = Conv1D(1, 3, activation='sigmoid', padding='same')(reshaped)
in which case the network looks as follows:
Layer (type) Output Shape Param #
input_59 (InputLayer) (None, 3001, 1) 0
conv1d_243 (Conv1D) (None, 3001, 64) 256
max_pooling1d_119 (MaxPoolin (None, 1500, 64) 0
conv1d_244 (Conv1D) (None, 1500, 32) 6176
max_pooling1d_120 (MaxPoolin (None, 750, 32) 0
conv1d_245 (Conv1D) (None, 750, 32) 3104
up_sampling1d_110 (UpSamplin (None, 1500, 32) 0
conv1d_246 (Conv1D) (None, 1500, 64) 6208
up_sampling1d_111 (UpSamplin (None, 3000, 64) 0
flatten_111 (Flatten) (None, 192000) 0
reshape_45 (Reshape) (None, 3000, 64) 0
conv1d_247 (Conv1D) (None, 3000, 1) 193
Total params: 15,937
Trainable params: 15,937
Non-trainable params: 0
But the same error results:
Error when checking target: expected conv1d_247 to have 3 dimensions, but got array with shape (32318, 1)
My questions are:
1) Is this a feasible way of finding the pattern that is distinguishing samples with label '1' vs '0'?
2) How can I make the final layer accept the final output of the last upsampling layer?
original = Sequential()
original.add(Conv1D(75,repeat_length,strides=stride, input_shape=shape, activation='relu',padding='same'))
original.add(Dense(1, activation='sigmoid'))
original.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
mod.add(Conv1D(75,window, activation='relu', padding='same'))
mod.add(Conv1D(1, 1, activation='sigmoid', padding='same'))
mod.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
decoded_imgs = mod.predict(test_X)
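On the shape error in the question: the message refers to the target array, which has shape (32318, 1), i.e. one label per sample, whereas an autoencoder is normally trained to reproduce its own input. Separately, the summaries show the decoder ending at length 3000 while the input has length 3001. The sketch below traces the sequence length through the pooling and upsampling stages; the helper name is made up, and it assumes 'same'-padded convolutions (which keep the length) and MaxPooling1D with its default flooring behavior.

```python
def trace_lengths(length, pool_sizes):
    """Follow the sequence length through pooling, then mirrored upsampling."""
    steps = [length]
    for p in pool_sizes:             # MaxPooling1D(p) floors the division
        length = length // p
        steps.append(length)
    for p in reversed(pool_sizes):   # UpSampling1D(p) multiplies the length by p
        length = length * p
        steps.append(length)
    return steps


odd_input = trace_lengths(3001, [2, 2])
even_input = trace_lengths(3000, [2, 2])
print(odd_input)   # ends one step short of the input length
print(even_input)  # ends at the input length
```

Under these assumptions, trimming or padding the sequences to a multiple of 4 would let the decoder output line up with the input.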

Keras to Pytorch model translation and input size

I am following a Keras tutorial and want to shadow it in Pytorch, so am translating. I'm not strongly familiar with either and am coming unstuck on the input size parameter especially, but also the final layer - do I need another Linear layer? Can anyone translate the following to a Pytorch sequential definition?
visible = Input(shape=(64,64,1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
hidden1 = Dense(10, activation='relu')(pool2)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
This is the output of the model:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 64, 64, 1) 0
conv2d_1 (Conv2D) (None, 61, 61, 32) 544
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 32) 0
conv2d_2 (Conv2D) (None, 27, 27, 16) 8208
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16) 0
dense_1 (Dense) (None, 13, 13, 10) 170
dense_2 (Dense) (None, 13, 13, 1) 11
Total params: 8,933
Trainable params: 8,933
Non-trainable params: 0
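The output shapes and parameter counts in this summary can be reproduced with the usual output-size formula, which may help when choosing sizes for the PyTorch layers. This is a sketch with made-up helper names. It assumes no padding and, for pooling, a stride equal to the pool size, which is the default in both Keras MaxPooling2D and torch.nn.MaxPool2d.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after an unpadded ('valid') convolution by default."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Spatial size after max pooling; stride defaults to the kernel size."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


size = 64
trace = [size]
for layer, kernel in [(conv_out, 4), (pool_out, 2), (conv_out, 4), (pool_out, 2)]:
    size = layer(size, kernel)
    trace.append(size)

conv1_params = 4 * 4 * 1 * 32 + 32    # conv2d_1: 1 input channel, 32 filters
conv2_params = 4 * 4 * 32 * 16 + 16   # conv2d_2: 32 input channels, 16 filters
flattened = 16 * size * size          # features if the last feature map is flattened

print(trace, conv1_params, conv2_params, flattened)
```

Note that the second convolution receives 32 input channels, which is what produces the 8208 parameters in the summary.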
What I have worked out lacks a specification for the shape of the input, and I am also a bit perplexed at the translation of stride in the specified Keras model as it uses stride 2 in the MaxPooling2D but doesn't specify this elsewhere - it is perhaps a toy example.
model = nn.Sequential(
nn.Conv2d(1, 32, 4),
nn.MaxPool2d(2, 2),
nn.Conv2d(1, 16, 4),
nn.MaxPool2d(2, 2),
nn.Linear(10, 1),
