Keras model optimization with bayesian optimization. Model summary is different - keras

I am trying to tune my cnn model with tuner keras. After running the bayesian search, when I display model.summary() it is totally different from the the bayesian optimization parameters found by the algorithm.
I define my function model_builder as follow:
def model_builder2(hp):
model = Sequential()
#model.add(Input(shape=(50,50,3)))
for i in range(hp.Int('num_blocks', 1,5)):
hp_padding=hp.Choice('padding_'+ str(i), values=['valid', 'same'])
hp_filters=hp.Choice('filters_'+ str(i), values=[32, 64])
model.add(Conv2D(hp_filters, (3, 3), padding=hp_padding, activation='relu', kernel_initializer='he_uniform', input_shape=(50, 50, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(hp.Choice('dropout_'+ str(i), values=[0.2, 0.5])))
model.add(Flatten())
hp_units = hp.Int('units', min_value=25, max_value=512, step=25)
model.add(Dense(hp_units, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(2,activation="sigmoid"))
hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-5])
#hp_optimizer=hp.Choice('Optimizer', values=['Adam', 'SGD'])
#hp_optimizer=hp.Choice('Optimizer', values=['Adam'])
#model.compile(loss=keras.losses.binary_crossentropy, optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate), metrics=['accuracy'])
model.compile(optimizer=tf.optimizers.Adam(learning_rate=hp_learning_rate),loss=keras.losses.binary_crossentropy,metrics=['accuracy'])
return model
After running the algorith, I have as best hyperparameters:
Best Hyper-parameters
{'dropout_0': 0.5,
'filters_0': 32,
'learning_rate': 0.01,
'num_blocks': 5,
'padding_0': 'same',
'units': 250}
But, when I do model.summary(), I have:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 50, 50, 32) 896
max_pooling2d_1 (MaxPooling (None, 25, 25, 32) 0
2D)
dropout_1 (Dropout) (None, 25, 25, 32) 0
flatten_1 (Flatten) (None, 20000) 0
dense_2 (Dense) (None, 250) 5000250
dense_3 (Dense) (None, 2) 502
=================================================================
Total params: 5,001,648
Trainable params: 5,001,648
Non-trainable params: 0
It seems I don't have 5 blocks. Any idea ?

Related

Keras model not learning after training

I am training a keras model for a sentence classification task. The problem is although it is giving an accuracy of 94%, it is not learning anything. When I give a new sentence (not present in the dataset), it gives the same probability for it (in the model.prediction step). I can't figure out why is this happening.
Here is my model
model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='sigmoid'))
model.summary()
Here max_words = 2000 and max_len=300
Here is the model summary
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_3 (Embedding) (None, 300, 30) 60000
_________________________________________________________________
batch_normalization_5 (Batch (None, 300, 30) 120
_________________________________________________________________
activation_5 (Activation) (None, 300, 30) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 300, 30) 0
_________________________________________________________________
bidirectional_3 (Bidirection (None, 64) 16128
_________________________________________________________________
batch_normalization_6 (Batch (None, 64) 256
_________________________________________________________________
activation_6 (Activation) (None, 64) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 64) 0
_________________________________________________________________
dense_3 (Dense) (None, 2) 130
=================================================================
Total params: 76,634
Trainable params: 76,446
Non-trainable params: 188
And here is the code, the size of my dataset is 20k, with 10% in testing.
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'], optimizer = 'adam')
history = model.fit(sequences_matrix, Y_train, batch_size=256, epochs=50, validation_split=0.1)
Try changing activation function of the last layer from sigmoid to softmax. It doesn't quite match the loss you are using (categorical cross-entropy). If you use sigmoid, then you only need one unit and should use binary cross-entropy loss.

logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]

I am trying to create a CNN with tensorflow, my images are 64x64x1 images and I have an array of 3662 images which I am using for training. I have total 5 labels which I have one-hot encoded. I am getting this error everytime:
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]
[[{{node loss_2/dense_5_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
my neural network structure is this:
def cnn_model():
model = models.Sequential()
# model.add(layers.Dense(128, activation='relu', ))
model.add(layers.Conv2D(128, (3, 3), activation='relu',input_shape=(64, 64, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu',padding = 'same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'])
print(model.summary())
return model
My model summary is this:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_9 (Conv2D) (None, 62, 62, 128) 1280
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 31, 31, 128) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 31, 31, 64) 73792
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 15, 15, 64) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 15, 15, 64) 36928
_________________________________________________________________
dense_4 (Dense) (None, 15, 15, 64) 4160
_________________________________________________________________
flatten_2 (Flatten) (None, 14400) 0
_________________________________________________________________
dense_5 (Dense) (None, 5) 72005
=================================================================
Total params: 188,165
Trainable params: 188,165
Non-trainable params: 0
my output array is of the shape (3662,5,1). I have seen other answers to same questions but I can't figure out the problem with mine. Where am I wrong?
Edit: My labels are stored in one hot encoded form using these:
df = pd.get_dummies(df)
diag = np.array(df)
diag = np.reshape(diag,(3662,5,1))
I have tried as numpy array and after converting to tensor(same for input as per documentation)
The problem lines within the choice of the loss function tf.keras.losses.SparseCategoricalCrossentropy(). According to what you are trying to achieve you should use tf.keras.losses.CategoricalCrossentropy(). Namely, the documentation of tf.keras.losses.SparseCategoricalCrossentropy() states:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
On the other hand, the documentation of tf.keras.losses.CategoricalCrossentropy() states:
We expect labels to be provided in a one_hot representation.
And because your labels are encoded as one-hot, you should use tf.keras.losses.CategoricalCrossentropy().

Tensorflow invalid shape (InvalidArgumentError)

model.fit produces exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot update variable with shape [] using a Tensor with shape [32], shapes must be equal.
[[{{node metrics/accuracy/AssignAddVariableOp}}]]
[[loss/dense_loss/categorical_crossentropy/weighted_loss/broadcast_weights/assert_broadcastable/AssertGuard/pivot_f/_50/_63]] [Op:__inference_keras_scratch_graph_1408]
Model definition:
model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(
input_shape=(360, 7)
))
model.add(tf.keras.layers.Conv1D(32, 1, activation='relu', input_shape=(360, 7)))
model.add(tf.keras.layers.Conv1D(32, 1, activation='relu'))
model.add(tf.keras.layers.MaxPooling1D(3))
model.add(tf.keras.layers.Conv1D(512, 1, activation='relu'))
model.add(tf.keras.layers.Conv1D(1048, 1, activation='relu'))
model.add(tf.keras.layers.GlobalAveragePooling1D())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(32, activation='softmax'))
Input Features Shape
(105, 360, 7)
Input Labels Shape
(105, 32, 1)
Compile statement
model.compile(optimizer='adam',
loss=tf.keras.losses.CategoricalCrossentropy(),
metrics=['accuracy'])
Model.fit statement
model.fit(features,
labels,
epochs=50000,
validation_split=0.2,
verbose=1)
Any help would be much appreciated
You can use model.summary() to see your model architecture.
print(model.summary())
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d (Conv1D) (None, 360, 32) 256
_________________________________________________________________
conv1d_1 (Conv1D) (None, 360, 32) 1056
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 120, 32) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 120, 512) 16896
_________________________________________________________________
conv1d_3 (Conv1D) (None, 120, 1048) 537624
_________________________________________________________________
global_average_pooling1d (Gl (None, 1048) 0
_________________________________________________________________
dropout (Dropout) (None, 1048) 0
_________________________________________________________________
dense (Dense) (None, 32) 33568
=================================================================
Total params: 589,400
Trainable params: 589,400
Non-trainable params: 0
_________________________________________________________________
None
The shape of your output layer is required to be (None,32), but the shape of your labels is (105,32,1). So you need to change the shape to (105,32). np.squeeze() function is used when we want to remove single-dimensional entries from the shape of an array.
Use Flatten() before the Dense Layers.

convolutional autoencoder to analyse long 1-D sequences

I have a dataset of 1-D vectors each 3001 digits long. I have used a simple convolutional network to perform binary classification on these sequences:
shape=train_X.shape[1:]
model = Sequential()
model.add(Conv1D(75,3,strides=1, input_shape=shape, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
The network achieves ~60% accuracy.
I now would like to create an autoencoder to discover the regular pattern that is distinguishing samples where the label is '1' vs those where it is '0'. i.e. to generate an exemplary sequence- that is representative of the '1' labeled samples.
Based on previous blogs and posts I have tried to put together an autoencoder that can achieve this:
input_sig = Input(batch_shape=(None,3001,1))
x = Conv1D(64,3, activation='relu', padding='same')(input_sig)
x1 = MaxPooling1D(2)(x)
x2 = Conv1D(32,3, activation='relu', padding='same')(x1)
x3 = MaxPooling1D(2)(x2)
flat = Flatten()(x3)
encoded = Dense(1,activation = 'relu')(flat)
x2_ = Conv1D(32, 3, activation='relu', padding='same')(x3)
x1_ = UpSampling1D(2)(x2_)
x_ = Conv1D(64, 3, activation='relu', padding='same')(x1_)
upsamp = UpSampling1D(2)(x_)
decoded = Conv1D(1, 3, activation='sigmoid', padding='same')(upsamp)
autoencoder = Model(input_sig, decoded)
autoencoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
This looks as follows:
autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_57 (InputLayer) (None, 3001, 1) 0
_________________________________________________________________
conv1d_233 (Conv1D) (None, 3001, 64) 256
_________________________________________________________________
max_pooling1d_115 (MaxPoolin (None, 1500, 64) 0
_________________________________________________________________
conv1d_234 (Conv1D) (None, 1500, 32) 6176
_________________________________________________________________
max_pooling1d_116 (MaxPoolin (None, 750, 32) 0
_________________________________________________________________
conv1d_235 (Conv1D) (None, 750, 32) 3104
_________________________________________________________________
up_sampling1d_106 (UpSamplin (None, 1500, 32) 0
_________________________________________________________________
conv1d_236 (Conv1D) (None, 1500, 64) 6208
_________________________________________________________________
up_sampling1d_107 (UpSamplin (None, 3000, 64) 0
_________________________________________________________________
conv1d_237 (Conv1D) (None, 3000, 64) 12352
=================================================================
Total params: 28,096
Trainable params: 28,096
Non-trainable params: 0
hence everything seems to be going smoothly until I train the netowrk
autoencoder.fit(train_X,train_y,epochs=3,batch_size=100,validation_data=(test_X, test_y))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1630, in fit
batch_size=batch_size)
File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1480, in _standardize_user_data
exception_prefix='target')
File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 113, in _standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking target: expected conv1d_237 to have 3 dimensions, but got array with shape (32318, 1)
Hence I have tried adding a 'Reshape' layer before the last one.
upsamp = UpSampling1D(2)(x_)
flat = Flatten()(upsamp)
reshaped = Reshape((3000,64))(flat)
decoded = Conv1D(1, 3, activation='sigmoid', padding='same')(reshaped)
in which case the network looks as follows:
autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_59 (InputLayer) (None, 3001, 1) 0
_________________________________________________________________
conv1d_243 (Conv1D) (None, 3001, 64) 256
_________________________________________________________________
max_pooling1d_119 (MaxPoolin (None, 1500, 64) 0
_________________________________________________________________
conv1d_244 (Conv1D) (None, 1500, 32) 6176
_________________________________________________________________
max_pooling1d_120 (MaxPoolin (None, 750, 32) 0
_________________________________________________________________
conv1d_245 (Conv1D) (None, 750, 32) 3104
_________________________________________________________________
up_sampling1d_110 (UpSamplin (None, 1500, 32) 0
_________________________________________________________________
conv1d_246 (Conv1D) (None, 1500, 64) 6208
_________________________________________________________________
up_sampling1d_111 (UpSamplin (None, 3000, 64) 0
_________________________________________________________________
flatten_111 (Flatten) (None, 192000) 0
_________________________________________________________________
reshape_45 (Reshape) (None, 3000, 64) 0
_________________________________________________________________
conv1d_247 (Conv1D) (None, 3000, 1) 193
=================================================================
Total params: 15,937
Trainable params: 15,937
Non-trainable params: 0
But the same error results:
Error when checking target: expected conv1d_247 to have 3 dimensions, but got array with shape (32318, 1)
My questions are:
1) Is this a feasible way of finding the pattern that is distinguishing samples with label '1' vs '0'?
2) how can I make the final layer accept the final output of the last upsampling layer?
original = Sequential()
original.add(Conv1D(75,repeat_length,strides=stride, input_shape=shape, activation='relu’,padding=‘same’))
original.add(MaxPooling1D(repeat_length))
original.add(Flatten())
original.add(Dense(1, activation='sigmoid'))
original.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
calculate_roc(original......)
mod=Sequential()
mod.add(original.layers[0])
mod.add(original.layers[1])
mod.add(Conv1D(75,window, activation='relu', padding='same'))
mod.add(UpSampling1D(window))
mod.add(Conv1D(1, 1, activation='sigmoid', padding='same'))
mod.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
mod.layers[0].trainable=False
mod.layers[1].trainable=False
mod.fit(train_X,train_X,epochs=1,batch_size=100)
decoded_imgs = mod.predict(test_X)
x=decoded_imgs.mean(axis=0)
plt.plot(x)

Keras to Pytorch model translation and input size

I am following a Keras tutorial and want to shadow it in Pytorch, so am translating. I'm not strongly familiar with either and am coming unstuck on the input size parameter especially, but also the final layer - do I need another Linear layer? Can anyone translate the following to a Pytorch sequential definition?
visible = Input(shape=(64,64,1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
hidden1 = Dense(10, activation='relu')(pool2)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
This is the output of the model:
Layer (type) Output Shape Param #
_________________________________________________________________
input_1 (InputLayer) (None, 64, 64, 1) 0
conv2d_1 (Conv2D) (None, 61, 61, 32) 544
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 32) 0
conv2d_2 (Conv2D) (None, 27, 27, 16) 8208
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16) 0
dense_1 (Dense) (None, 13, 13, 10) 170
dense_2 (Dense) (None, 13, 13, 1) 11
Total params: 8,933
Trainable params: 8,933
Non-trainable params: 0
What I have worked out lacks a specification for the shape of the input, and I am also a bit perplexed at the translation of stride in the specified Keras model as it uses stride 2 in the MaxPooling2D but doesn't specify this elsewhere - it is perhaps a toy example.
model = nn.Sequential(
nn.Conv2d(1, 32, 4),
nn.ReLU(),
nn.MaxPool2d(2, 2),
nn.Conv2d(1, 16, 4),
nn.ReLU(),
nn.MaxPool2d(2, 2),
nn.Linear(10, 1),
nn.Sigmoid(),
)

Resources