Keras model not learning after training - python-3.x

I am training a Keras model for a sentence classification task. The problem is that although it reaches an accuracy of 94%, it is not learning anything: when I give it a new sentence (not present in the dataset), it returns the same probability every time (in the model.predict step). I can't figure out why this is happening.
Here is my model
model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='sigmoid'))
model.summary()
Here max_words = 2000 and max_len=300
Here is the model summary
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_3 (Embedding) (None, 300, 30) 60000
_________________________________________________________________
batch_normalization_5 (Batch (None, 300, 30) 120
_________________________________________________________________
activation_5 (Activation) (None, 300, 30) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 300, 30) 0
_________________________________________________________________
bidirectional_3 (Bidirection (None, 64) 16128
_________________________________________________________________
batch_normalization_6 (Batch (None, 64) 256
_________________________________________________________________
activation_6 (Activation) (None, 64) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 64) 0
_________________________________________________________________
dense_3 (Dense) (None, 2) 130
=================================================================
Total params: 76,634
Trainable params: 76,446
Non-trainable params: 188
And here is the training code; the size of my dataset is 20k, with 10% held out for testing.
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'], optimizer = 'adam')
history = model.fit(sequences_matrix, Y_train, batch_size=256, epochs=50, validation_split=0.1)

Try changing the activation function of the last layer from sigmoid to softmax; sigmoid doesn't match the (sparse) categorical cross-entropy loss you are using. If you want to keep sigmoid, you only need one output unit and should use binary cross-entropy loss instead.
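A minimal sketch of both fixes, assuming model is the Sequential model from the question with its final Dense layer left off; use one option or the other:
# Option 1: keep two output units and integer labels, switch to softmax
model.add(Dense(2, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

# Option 2: a single sigmoid unit with binary cross-entropy
# (Y_train must then contain 0/1 labels)
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')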

Related

Keras model optimization with bayesian optimization. Model summary is different

I am trying to tune my CNN model with Keras Tuner. After running the Bayesian search, the model.summary() I display is totally different from the best hyperparameters found by the algorithm.
I define my model-building function model_builder2 as follows:
def model_builder2(hp):
    model = Sequential()
    #model.add(Input(shape=(50,50,3)))
    for i in range(hp.Int('num_blocks', 1, 5)):
        hp_padding = hp.Choice('padding_' + str(i), values=['valid', 'same'])
        hp_filters = hp.Choice('filters_' + str(i), values=[32, 64])
        model.add(Conv2D(hp_filters, (3, 3), padding=hp_padding, activation='relu', kernel_initializer='he_uniform', input_shape=(50, 50, 3)))
        model.add(MaxPooling2D((2, 2)))
        model.add(Dropout(hp.Choice('dropout_' + str(i), values=[0.2, 0.5])))
    model.add(Flatten())
    hp_units = hp.Int('units', min_value=25, max_value=512, step=25)
    model.add(Dense(hp_units, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(2, activation="sigmoid"))
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-5])
    #hp_optimizer=hp.Choice('Optimizer', values=['Adam', 'SGD'])
    #hp_optimizer=hp.Choice('Optimizer', values=['Adam'])
    #model.compile(loss=keras.losses.binary_crossentropy, optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate), metrics=['accuracy'])
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=hp_learning_rate), loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
    return model
After running the algorithm, I get the following best hyperparameters:
Best Hyper-parameters
{'dropout_0': 0.5,
'filters_0': 32,
'learning_rate': 0.01,
'num_blocks': 5,
'padding_0': 'same',
'units': 250}
But, when I do model.summary(), I have:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 50, 50, 32) 896
max_pooling2d_1 (MaxPooling (None, 25, 25, 32) 0
2D)
dropout_1 (Dropout) (None, 25, 25, 32) 0
flatten_1 (Flatten) (None, 20000) 0
dense_2 (Dense) (None, 250) 5000250
dense_3 (Dense) (None, 2) 502
=================================================================
Total params: 5,001,648
Trainable params: 5,001,648
Non-trainable params: 0
It seems I don't have 5 blocks. Any ideas?
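No answer was posted here. For reference, a minimal sketch of the usual Keras Tuner pattern for rebuilding a model from the reported best hyperparameters (tuner, x_train and y_train are placeholder names; calling summary() on a model obtained any other way may not reflect those values):
import keras_tuner as kt

tuner = kt.BayesianOptimization(model_builder2, objective='val_accuracy', max_trials=20)
tuner.search(x_train, y_train, epochs=10, validation_split=0.2)

best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hps.values)                    # the dict shown above
model = tuner.hypermodel.build(best_hps)  # rebuild the architecture from those values
model.summary()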

Keras functional API slower than Sequential / Not improving

SOLVED! (Had to set trainable=True on the Embedding layer of the functional model.)
I am currently changing my Keras model from the Sequential to the functional API. While the Sequential model improves to an accuracy of 1 after about 10 epochs, the functional API model does not even reach 0.7 and does not improve further. Apart from the Input layer, both nets should be the same.
Sequential:
model = Sequential()
model.add(Embedding(20000, 256,input_length = 30))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(256, dropout=0.3, recurrent_dropout=0.3))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer=Adam(lr=0.0001),metrics = ['accuracy'])
print(model.summary())
Output is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_6 (Embedding) (None, 30, 256) 5120000
_________________________________________________________________
spatial_dropout1d_5 (Spatial (None, 30, 256) 0
_________________________________________________________________
lstm_5 (LSTM) (None, 256) 525312
_________________________________________________________________
dense_6 (Dense) (None, 1) 257
=================================================================
Total params: 5,645,569
Trainable params: 5,645,569
Non-trainable params: 0
_________________________________________________________________
None
For the functional API:
inputs = Input(shape=(31,))
embed = Embedding(20000, 256, trainable=False)(inputs)
drop = (SpatialDropout1D(0.4))(embed)
lstm = LSTM(256, dropout=0.3, recurrent_dropout=0.3)(drop)
acti = Dense(1,activation='sigmoid')(lstm)
model = Model(inputs=inputs, outputs=acti)
model.compile(loss = 'binary_crossentropy', optimizer=Adam(lr=0.0001),metrics = ['accuracy'])
print(model.summary())
Result
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) (None, 31) 0
_________________________________________________________________
embedding_7 (Embedding) (None, 31, 256) 5120000
_________________________________________________________________
spatial_dropout1d_6 (Spatial (None, 31, 256) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 256) 525312
_________________________________________________________________
dense_7 (Dense) (None, 1) 257
=================================================================
Total params: 5,645,569
Trainable params: 525,569
Non-trainable params: 5,120,000
_________________________________________________________________
None
Have I overlooked something, or can someone explain my results?
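For reference, the line that froze the embedding in the functional model, and the fix described in the SOLVED note (trainable=True is the default):
# original: the embedding's 5,120,000 parameters are frozen, so only 525,569 train
embed = Embedding(20000, 256, trainable=False)(inputs)

# fix: drop trainable=False so the embedding weights train, as in the Sequential model
embed = Embedding(20000, 256)(inputs)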

Tensorflow invalid shape (InvalidArgumentError)

model.fit produces an exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot update variable with shape [] using a Tensor with shape [32], shapes must be equal.
[[{{node metrics/accuracy/AssignAddVariableOp}}]]
[[loss/dense_loss/categorical_crossentropy/weighted_loss/broadcast_weights/assert_broadcastable/AssertGuard/pivot_f/_50/_63]] [Op:__inference_keras_scratch_graph_1408]
Model definition:
model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(360, 7)))
model.add(tf.keras.layers.Conv1D(32, 1, activation='relu', input_shape=(360, 7)))
model.add(tf.keras.layers.Conv1D(32, 1, activation='relu'))
model.add(tf.keras.layers.MaxPooling1D(3))
model.add(tf.keras.layers.Conv1D(512, 1, activation='relu'))
model.add(tf.keras.layers.Conv1D(1048, 1, activation='relu'))
model.add(tf.keras.layers.GlobalAveragePooling1D())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(32, activation='softmax'))
Input Features Shape
(105, 360, 7)
Input Labels Shape
(105, 32, 1)
Compile statement
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])
Model.fit statement
model.fit(features,
          labels,
          epochs=50000,
          validation_split=0.2,
          verbose=1)
Any help would be much appreciated
You can use model.summary() to see your model architecture.
print(model.summary())
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d (Conv1D) (None, 360, 32) 256
_________________________________________________________________
conv1d_1 (Conv1D) (None, 360, 32) 1056
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 120, 32) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 120, 512) 16896
_________________________________________________________________
conv1d_3 (Conv1D) (None, 120, 1048) 537624
_________________________________________________________________
global_average_pooling1d (Gl (None, 1048) 0
_________________________________________________________________
dropout (Dropout) (None, 1048) 0
_________________________________________________________________
dense (Dense) (None, 32) 33568
=================================================================
Total params: 589,400
Trainable params: 589,400
Non-trainable params: 0
_________________________________________________________________
None
The shape of your output layer is (None, 32), but the shape of your labels is (105, 32, 1), so you need to reshape the labels to (105, 32). The np.squeeze() function removes single-dimensional entries from the shape of an array; see the sketch below.
Use Flatten() before the Dense Layers.
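A minimal sketch of that fix, assuming labels is the NumPy array from the question:
import numpy as np

labels = np.squeeze(labels, axis=-1)  # (105, 32, 1) -> (105, 32)
model.fit(features, labels, epochs=50000, validation_split=0.2, verbose=1)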

keras parameters for multilabel text classification

I am using Keras for multiclass text classification; the dataset contains 25,000 Arabic tweets with 10 class labels.
I use this code:
model = Sequential()
model.add(Dense(512, input_shape=(10902,)))#10902
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()
#categorical_crossentropy
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
..
history = model.fit(X_train, y_train,
                    batch_size=100,
                    epochs=30,
                    verbose=1,
                    validation_split=0.5)
Summary:
Layer (type) Output Shape Param #
=================================================================
dense_23 (Dense) (None, 512) 5582336
_________________________________________________________________
activation_22 (Activation) (None, 512) 0
_________________________________________________________________
dropout_15 (Dropout) (None, 512) 0
_________________________________________________________________
dense_24 (Dense) (None, 512) 262656
_________________________________________________________________
activation_23 (Activation) (None, 512) 0
_________________________________________________________________
dropout_16 (Dropout) (None, 512) 0
_________________________________________________________________
dense_25 (Dense) (None, 10) 5130
_________________________________________________________________
activation_24 (Activation) (None, 10) 0
=================================================================
Total params: 5,850,122
Trainable params: 5,850,122
Non-trainable params: 0
But I get the error:
could not convert string to float: 'food'
where 'food' is a class name.
When I change the loss to categorical_crossentropy I get the error:
Error when checking target: expected activation_24 to have shape (10,) but got array with shape (1,)
Update
nd=data.replace(['ads', 'Politic', 'eco', 'food', 'health', 'porno', 'religion', 'sports', 'tech', 'tv'],
                [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
model = Sequential()
model.add(Dense(512, input_shape=(10902,10)))#no. of words
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()
#categorical_crossentropy
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
y_train=keras.utils.to_categorical(y_train)
history = model.fit(X_train, y_train,
                    batch_size=100,
                    epochs=30,
                    verbose=1,
                    validation_split=0.5)
You correctly used Dense(10) at the end, in order to produce ten results, one for each class.
But your targets y_train must also be shaped with 10 classes, i.e. shape (numberOfTweets, 10).
For this you should:
- If you have an array of indices, transform it with the Keras function y_train = to_categorical(y_train).
- If you have the labels as strings, first transform them into indices, then use to_categorical, as in the sketch below.
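A minimal sketch of the string case, assuming y_train holds class-name strings (class_names and label_to_index are illustrative names):
from tensorflow.keras.utils import to_categorical

class_names = ['ads', 'Politic', 'eco', 'food', 'health',
               'porno', 'religion', 'sports', 'tech', 'tv']
label_to_index = {name: i for i, name in enumerate(class_names)}  # indices 0..9

y_train = to_categorical([label_to_index[s] for s in y_train], num_classes=10)
print(y_train.shape)  # (numberOfTweets, 10)
Note that the indices start at 0: mapping the classes to 1..10 as in the Update makes to_categorical produce 11 columns (classes 0..10), which again mismatches Dense(10).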

convolutional autoencoder to analyse long 1-D sequences

I have a dataset of 1-D vectors each 3001 digits long. I have used a simple convolutional network to perform binary classification on these sequences:
shape=train_X.shape[1:]
model = Sequential()
model.add(Conv1D(75,3,strides=1, input_shape=shape, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
The network achieves ~60% accuracy.
I now would like to create an autoencoder to discover the regular pattern that distinguishes samples labeled '1' from those labeled '0', i.e. to generate an exemplary sequence that is representative of the '1'-labeled samples.
Based on previous blogs and posts I have tried to put together an autoencoder that can achieve this:
input_sig = Input(batch_shape=(None,3001,1))
x = Conv1D(64,3, activation='relu', padding='same')(input_sig)
x1 = MaxPooling1D(2)(x)
x2 = Conv1D(32,3, activation='relu', padding='same')(x1)
x3 = MaxPooling1D(2)(x2)
flat = Flatten()(x3)
encoded = Dense(1,activation = 'relu')(flat)
x2_ = Conv1D(32, 3, activation='relu', padding='same')(x3)
x1_ = UpSampling1D(2)(x2_)
x_ = Conv1D(64, 3, activation='relu', padding='same')(x1_)
upsamp = UpSampling1D(2)(x_)
decoded = Conv1D(1, 3, activation='sigmoid', padding='same')(upsamp)
autoencoder = Model(input_sig, decoded)
autoencoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
This looks as follows:
autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_57 (InputLayer) (None, 3001, 1) 0
_________________________________________________________________
conv1d_233 (Conv1D) (None, 3001, 64) 256
_________________________________________________________________
max_pooling1d_115 (MaxPoolin (None, 1500, 64) 0
_________________________________________________________________
conv1d_234 (Conv1D) (None, 1500, 32) 6176
_________________________________________________________________
max_pooling1d_116 (MaxPoolin (None, 750, 32) 0
_________________________________________________________________
conv1d_235 (Conv1D) (None, 750, 32) 3104
_________________________________________________________________
up_sampling1d_106 (UpSamplin (None, 1500, 32) 0
_________________________________________________________________
conv1d_236 (Conv1D) (None, 1500, 64) 6208
_________________________________________________________________
up_sampling1d_107 (UpSamplin (None, 3000, 64) 0
_________________________________________________________________
conv1d_237 (Conv1D) (None, 3000, 64) 12352
=================================================================
Total params: 28,096
Trainable params: 28,096
Non-trainable params: 0
Hence everything seems to be going smoothly until I train the network:
autoencoder.fit(train_X,train_y,epochs=3,batch_size=100,validation_data=(test_X, test_y))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1630, in fit
    batch_size=batch_size)
  File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1480, in _standardize_user_data
    exception_prefix='target')
  File "/home/bsxcto/miniconda3/lib/python3.6/site-packages/keras/engine/training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected conv1d_237 to have 3 dimensions, but got array with shape (32318, 1)
Hence I have tried adding a 'Reshape' layer before the last one.
upsamp = UpSampling1D(2)(x_)
flat = Flatten()(upsamp)
reshaped = Reshape((3000,64))(flat)
decoded = Conv1D(1, 3, activation='sigmoid', padding='same')(reshaped)
in which case the network looks as follows:
autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_59 (InputLayer) (None, 3001, 1) 0
_________________________________________________________________
conv1d_243 (Conv1D) (None, 3001, 64) 256
_________________________________________________________________
max_pooling1d_119 (MaxPoolin (None, 1500, 64) 0
_________________________________________________________________
conv1d_244 (Conv1D) (None, 1500, 32) 6176
_________________________________________________________________
max_pooling1d_120 (MaxPoolin (None, 750, 32) 0
_________________________________________________________________
conv1d_245 (Conv1D) (None, 750, 32) 3104
_________________________________________________________________
up_sampling1d_110 (UpSamplin (None, 1500, 32) 0
_________________________________________________________________
conv1d_246 (Conv1D) (None, 1500, 64) 6208
_________________________________________________________________
up_sampling1d_111 (UpSamplin (None, 3000, 64) 0
_________________________________________________________________
flatten_111 (Flatten) (None, 192000) 0
_________________________________________________________________
reshape_45 (Reshape) (None, 3000, 64) 0
_________________________________________________________________
conv1d_247 (Conv1D) (None, 3000, 1) 193
=================================================================
Total params: 15,937
Trainable params: 15,937
Non-trainable params: 0
But the same error results:
Error when checking target: expected conv1d_247 to have 3 dimensions, but got array with shape (32318, 1)
My questions are:
1) Is this a feasible way of finding the pattern that is distinguishing samples with label '1' vs '0'?
2) How can I make the final layer accept the output of the last upsampling layer?
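The answer below was posted as code only. In short, it reuses the first two layers of the trained classifier as a frozen encoder and trains only a decoder to reconstruct the inputs; note the fit target is train_X, not train_y, which avoids the shape error above. repeat_length, stride and window are parameters from the answerer's setup that are not defined in the post: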
original = Sequential()
original.add(Conv1D(75, repeat_length, strides=stride, input_shape=shape, activation='relu', padding='same'))
original.add(MaxPooling1D(repeat_length))
original.add(Flatten())
original.add(Dense(1, activation='sigmoid'))
original.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

calculate_roc(original......)

# reuse the trained Conv1D + MaxPooling1D layers as a frozen encoder
mod = Sequential()
mod.add(original.layers[0])
mod.add(original.layers[1])
# decoder: conv, upsample back towards the input length, map down to 1 channel
mod.add(Conv1D(75, window, activation='relu', padding='same'))
mod.add(UpSampling1D(window))
mod.add(Conv1D(1, 1, activation='sigmoid', padding='same'))
# freeze the encoder before compiling (trainable changes only take effect at compile time)
mod.layers[0].trainable = False
mod.layers[1].trainable = False
mod.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
# train the decoder to reconstruct the inputs: the target is train_X, not train_y
mod.fit(train_X, train_X, epochs=1, batch_size=100)
# average the reconstructions to get an exemplary sequence
decoded_imgs = mod.predict(test_X)
x = decoded_imgs.mean(axis=0)
plt.plot(x)
