Problem with fine-tuning pretrained models like VGG16, ResNet50 - stuck at exactly 50% - keras

My siamese network works with my custom layers and I get +90% accuracy, but when I try to use pretrained models like VGG16 or ResNet50 I always get exactly 50% after every epoch. I even copied the VGG16 architecture, trained it from scratch, and got good results. However, when I use the pretrained model I always get 50%. I guess I have done something wrong with loading the pretrained model, because everything else is fine; as I said, it works with my custom networks. I also tried some solutions from other posts, like changing the learning rate, but nothing really helps.
Here's my code:
def build_siamese_vgg_model():
    inputs = Input((64, 64, 3))
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))(inputs)
    flat = Flatten()(base_model)
    outputs = Dense(256)(flat)
    model = Model(inputs, outputs)
    return model
# Read data
(X_train, Y_train) = functions.read_data("data/train")
(X_val, Y_val) = functions.read_data("data/test")
# Generate pairs
(X_train_pairs, Y_train_pairs) = functions.generate_pairs(X_train, Y_train)
(X_val_pairs, Y_val_pairs) = functions.generate_pairs(X_val, Y_val)
imgA = Input(shape=(64, 64, 3))
imgB = Input(shape=(64, 64, 3))
featureExtractor = build_siamese_vgg_model()
featsA = featureExtractor(imgA)
featsB = featureExtractor(imgB)
distance = Lambda(functions.euclidean_distance)([featsA, featsB])
outputs = Dense(1, activation="sigmoid")(distance)
model_1 = Model(inputs=[imgA, imgB], outputs=distance)
model_1.compile(loss="binary_crossentropy", optimizer=tf.keras.optimizers.Adam(), metrics=["accuracy"])
his = model_1.fit([X_train_pairs[:, 0], X_train_pairs[:, 1]], Y_train_pairs[:],
validation_data=([X_val_pairs[:, 0], X_val_pairs[:, 1]], Y_val_pairs[:]), batch_size=64, epochs=20)
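Two things worth checking in a setup like this: (1) that the pretrained backbone gets the preprocessing it was trained with (raw 0-255 pixels fed straight into ImageNet weights often leave the network stuck at chance), and (2) that the model is compiled on the sigmoid head rather than the raw distance (the snippet above passes outputs=distance to Model). A minimal, hedged sketch of the feature extractor with VGG16's own preprocessing and a frozen base, assuming the images arrive as 0-255 RGB arrays and the same imports as above plus preprocess_input:
from tensorflow.keras.applications.vgg16 import preprocess_input
def build_siamese_vgg_model():
    inputs = Input((64, 64, 3))
    # apply the same mean subtraction / channel ordering VGG16 was trained with
    x = Lambda(preprocess_input)(inputs)
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))
    base_model.trainable = False  # freeze the pretrained weights, at least initially
    x = base_model(x)
    x = Flatten()(x)
    outputs = Dense(256)(x)
    return Model(inputs, outputs)
# ...build featsA, featsB and distance as above, then compile on the sigmoid output:
# outputs = Dense(1, activation="sigmoid")(distance)
# model_1 = Model(inputs=[imgA, imgB], outputs=outputs)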

Related

Pre-trained CNN for multi-output models

How to use Keras Applications for a multi-output model?
I am trying to use VGG16 for a multi-output model but get a ValueError: Input 0 of layer block1_conv1 is incompatible with the layer: expected axis -1 of input shape to have value 3 but received input with shape (None, 64, 64, 1). I am a bit lost and can't seem to find anything on multi-output models for VGG16 or any other Keras pre-trained application.
I am using this as a guide: Transfer Learning in Keras with Computer Vision
# load model without classifier layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(IMG_HEIGHT, IMG_WIDTH,3))
# add new classifier layers
sex_x = base_model.output
sex_x = GlobalAveragePooling2D()(sex_x)
sex_x = Dropout(0.5)(sex_x)
output_sex = Dense(2, activation='sigmoid')(sex_x)
weight_x = base_model.output
weight_x = GlobalAveragePooling2D()(weight_x)
weight_x = Dropout(0.5)(weight_x)
output_weight = Dense(1, activation='linear')(weight_x)
# define new model
model = Model(inputs=base_model.inputs, outputs=[output_sex, output_weight])
model.compile(optimizer = 'adam', loss =['binary_crossentropy','mae',],metrics=['accuracy'])
history = model.fit(x_train,[y_train[:,0],y_train[:,1]],validation_data=(x_test,[y_test[:,0],y_test[:,1]]),epochs = 10, batch_size=128,shuffle = True)
Any ideas as to what I am doing wrong?
According to this error, your model expects 3-channel RGB input but your data seems to have 1 channel (grayscale). Make sure your data consists of RGB images.
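If the data is currently single-channel, one hedged way to feed it to an RGB-only backbone such as VGG16 is to repeat the channel axis (x_gray below is a placeholder name for a grayscale batch of shape (N, IMG_HEIGHT, IMG_WIDTH, 1)):
import numpy as np
import tensorflow as tf
# repeat the single grayscale channel three times along the last axis
x_rgb = np.repeat(x_gray, 3, axis=-1)  # -> (N, IMG_HEIGHT, IMG_WIDTH, 3)
# or, on tensors inside a tf.data pipeline:
# x_rgb = tf.image.grayscale_to_rgb(x_gray)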

How to apply triplet loss function in resnet50 for the purpose of deepranking

I am trying to create image embeddings for the purpose of deep ranking using a triplet loss function. The idea is that we can take a pretrained CNN (e.g. ResNet50 or VGG16), remove the FC layers and add an L2 normalization function to retrieve unit vectors, which can then be compared via a distance metric (e.g. cosine similarity). As far as I understand, the vectors that come out of a pretrained CNN are not optimal, but they are a good start. By adding the triplet loss function we can re-train the network to keep similar pictures 'close' to each other and different pictures 'far' apart in the feature space. Inspired by this notebook, I tried to set up the following code, but I get an error: ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
# Anchor, Positive and Negative are numpy arrays of size (200, 256, 256, 3), same for the test images
pic_size=256
def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size),
                          input_tensor=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x, axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x
anchor_input = Input((3, pic_size,pic_size ), name='anchor_input')
positive_input = Input((3, pic_size,pic_size ), name='positive_input')
negative_input = Input((3, pic_size,pic_size ), name='negative_input')
encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)
merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input,positive_input, negative_input], outputs=merged_vector)
#ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor,Positive,Negative],
y=Y_dummy,
validation_data=([Anchor_test,Positive_test,Negative_test],Y_dummy2), batch_size=512, epochs=500)
I am new to Keras and I am not quite sure how to solve this. The author in the link above creates his own CNN from scratch, but I would like to build it on top of ResNet50 (or VGG16). How can I configure ResNet50 to use a triplet loss function? (In the link above you can also find the source code for the triplet loss function.)
In your ResNet50 definition, you've written
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size), input_tensor=inp)
Remove the input_tensor argument and change it to input_shape=inp.
If you're using the TF backend, then, as you mentioned, the input should be (256, 256, 3), i.e. your input shape should be (pic_size, pic_size, 3) rather than (3, pic_size, pic_size).
def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x, axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x
img_shape=(256, 256, 3)
anchor_input = Input(img_shape, name='anchor_input')
positive_input = Input(img_shape, name='positive_input')
negative_input = Input(img_shape, name='negative_input')
encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)
merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input,positive_input, negative_input], outputs=merged_vector)
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor,Positive,Negative],
y=Y_dummy,
validation_data=([Anchor_test,Positive_test,Negative_test],Y_dummy2), batch_size=512, epochs=500)
The model plot is as follows: [model plot image]
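Note that input_shape normally expects a plain shape tuple such as img_shape rather than the tensor inp, and instantiating a separate ResNet50 per branch means the three branches do not actually share weights. A hedged sketch of the usual shared-backbone variant, reusing the imports above and the question's triplet_loss and adam_optim:
img_shape = (256, 256, 3)
# build the backbone once so all three branches share the same layers and weights
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=img_shape)
base_model.trainable = False  # or selectively unfreeze layers, as in the question
def embed(inp):
    x = base_model(inp)
    x = Flatten()(x)
    return Lambda(lambda t: K.l2_normalize(t, axis=1))(x)
anchor_input = Input(img_shape, name='anchor_input')
positive_input = Input(img_shape, name='positive_input')
negative_input = Input(img_shape, name='negative_input')
merged_vector = concatenate([embed(anchor_input), embed(positive_input), embed(negative_input)],
                            axis=-1, name='merged_layer')
model = Model([anchor_input, positive_input, negative_input], merged_vector)
model.compile(loss=triplet_loss, optimizer=adam_optim)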

Image Classification - Movie posters by genre. Keras model overfitting

Please note that this is a multi-label problem (as opposed to plain multi-class): each poster can belong to several genres at once. I am trying to classify movie posters by genre from the images alone. I have over 22000 images overall. I have one-hot encoded the genres so the Keras model can predict multiple genres per movie (sigmoid activation in the final dense layer). I need a recommendation for a proper batch size and number of epochs. I am using VGG16. I would also like to know how to choose which layers to freeze.
I have tried 25 epochs using VGG16 with a batch size of 10, but the model tends to overfit.
Here is my training-image count by genre (note: a movie can belong to more than one genre, which is more often than not the case):
{'Action': 2226.0,
'Thriller': 2788.0,
'Drama': 9283.0,
'Romance': 2184.0,
'Horror': 2517.0,
'Sci-Fi': 756.0,
'Mystery': 918.0,
'Adventure': 1105.0,
'Animation': 583.0,
'Crime': 1369.0,
'Comedy': 5524.0,
'Fantasy': 735.0,
'Family': 991.0,
'Music': 319.0,
'History': 359.0,
'War': 177.0,
'Musical': 191.0,
'Biography': 484.0,
'Sport': 190.0}
conv_base = VGG16(weights = 'imagenet', include_top = False, input_shape = (224,224,3))
for layer in conv_base.layers[6:]:
    layer.trainable = False
early = EarlyStopping(monitor='acc', patience=7, mode='auto')
classifier = Sequential()
classifier.add(conv_base)
from keras.layers import Dropout
classifier.add(Dense(units = 128, activation = "relu"))
classifier.add(Dropout(0.2))
classifier.add(Dense(units = 128, activation = "relu"))
classifier.add(Dropout(0.1))
classifier.add(Dense(units = 19, activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ["accuracy"])
classifier.fit(X_train, y_train, epochs=15, batch_size=10, callbacks = [early])
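A common starting point for a multi-label setup like this is to freeze the whole convolutional base, pool its feature maps before the dense head (the snippet above feeds the 4D VGG16 output straight into Dense layers), and let early stopping on validation loss pick the number of epochs instead of fixing it in advance. A minimal sketch along those lines, assuming the same X_train/y_train and 19 genre labels; the batch size of 32 and patience of 3 are untuned defaults:
from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense, Dropout
from keras.callbacks import EarlyStopping
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
conv_base.trainable = False  # freeze the pretrained base entirely at first
classifier = Sequential([
    conv_base,
    GlobalAveragePooling2D(),          # pool the 7x7x512 feature maps to a vector
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(19, activation='sigmoid'),   # one independent probability per genre
])
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
early = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
classifier.fit(X_train, y_train, validation_split=0.1,
               epochs=30, batch_size=32, callbacks=[early])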

Keras gets None gradient error when connecting models

I’m trying to implement a Visual Storytelling model using Keras with a hierarchical RNN model, basically Neural Image Captioner style but over a sequence of photos with a bidirectional RNN on top of the decoder RNNs.
I implemented and tested the three parts of this model, CNN, BRNN and decoder RNN separately but got this error when trying to connect them:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My code is as follows:
#vgg16 model with the fc2 layer as output
cnn_base_model = self.cnn_model.base_model
brnn_model = self.brnn_model.model
rnn_model = self.rnn_model.model
cnn_part = TimeDistributed(cnn_base_model)
img_input = Input((self.story_length,) + self.cnn_model.input_shape, name='brnn_img_input')
extracted_feature = cnn_part(img_input)
#[None, 5, 512], a 512 length vector for each picture in the story
brnn_feature = brnn_model(extracted_feature)
#[None, 5, 25], input groundtruth word indices fed as input when training
decoder_input = Input((self.story_length, self.max_length), name='brnn_decoder_input')
decoder_outputs = []
for i in range(self.story_length):
    # separate timesteps for decoding
    decoder_input_i = Lambda(lambda x: x[:, i, :])(decoder_input)
    brnn_feature_i = Lambda(lambda x: x[:, i, :])(brnn_feature)
    # the problem persists when using Dense instead of the Lambda layers above
    # decoder_input_i = Dense(25)(Reshape((125,))(decoder_input))
    # brnn_feature_i = Dense(512)(Reshape((5 * 512,))(brnn_feature))
    decoder_output_i = rnn_model([decoder_input_i, brnn_feature_i])
    decoder_outputs.append(decoder_output_i)
decoder_output = Concatenate(axis=-2, name='brnn_decoder_output')(decoder_outputs)
self.model = Model([img_input, decoder_input], decoder_output)
And codes for the BRNN:
image_feature = Input(shape=(self.story_length, self.img_feature_dim,))
image_emb = TimeDistributed(Dense(self.lstm_size))(image_feature)
brnn = Bidirectional(LSTM(self.lstm_size, return_sequences=True), merge_mode='concat')(image_emb)
brnn_emb = TimeDistributed(Dense(self.lstm_size))(brnn)
self.model = Model(inputs=image_feature, outputs=brnn_emb)
And RNN:
#[None, 512], the vector to be decoded
initial_input = Input(shape=(self.input_dim,), name='rnn_initial_input')
#[None, 25], the groundtruth word indices fed as input when training
decoder_inputs = Input(shape=(None,), name='rnn_decoder_inputs')
decoder_input_masking = Masking(mask_value=0.0)(decoder_inputs)
decoder_input_embeddings = Embedding(self.vocabulary_size, self.emb_size,
embeddings_regularizer=l2(regularizer))(decoder_input_masking)
decoder_input_dropout = Dropout(.5)(decoder_input_embeddings)
initial_emb = Dense(self.emb_size,
kernel_regularizer=l2(regularizer))(initial_input)
initial_reshape = Reshape((1, self.emb_size))(initial_emb)
initial_masking = Masking(mask_value=0.0)(initial_reshape)
initial_dropout = Dropout(.5)(initial_masking)
decoder_lstm = LSTM(self.hidden_dim, return_sequences=True, return_state=True,
recurrent_regularizer=l2(regularizer),
kernel_regularizer=l2(regularizer),
bias_regularizer=l2(regularizer))
_, initial_hidden_h, initial_hidden_c = decoder_lstm(initial_dropout)
decoder_outputs, decoder_state_h, decoder_state_c = decoder_lstm(decoder_input_dropout,
initial_state=[initial_hidden_h, initial_hidden_c])
decoder_output_dense_layer = TimeDistributed(Dense(self.vocabulary_size, activation='softmax',
kernel_regularizer=l2(regularizer)))
decoder_output_dense = decoder_output_dense_layer(decoder_outputs)
self.model = Model([decoder_inputs, initial_input], decoder_output_dense)
I’m using adam as optimizer and sparse_categorical_crossentropy as loss.
At first I thought the problem was with the Lambda layers used for splitting the timesteps, but the problem persists when I replace them with Dense layers (which are guaranteed to have gradients).
I had a similar error and it turned out I was supposed to build the layers (in my custom layer or model) in __init__() like so:
self.lstm_custom_1 = keras.layers.LSTM(128,batch_input_shape=batch_input_shape, return_sequences=False,stateful=True)
self.lstm_custom_1.build(batch_input_shape)
self.dense_custom_1 = keras.layers.Dense(32, activation = 'relu')
self.dense_custom_1.build(input_shape=(batch_size, 128))
The issue is actually with the Embedding layer, I think. Gradients can't pass through an Embedding layer, so unless it's the first layer in the model it won't work.
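To illustrate that point: the embedding lookup is not differentiable with respect to its integer indices, so gradients only flow to the layers after it (and into its own weight table), and it is usually placed directly on the index Input rather than behind other layers such as the Masking applied to decoder_inputs above. A hedged sketch of that arrangement, with vocabulary_size, emb_size and regularizer standing in for the question's self.* attributes:
from keras.layers import Input, Embedding, Dropout
from keras.regularizers import l2
# integer word indices go straight into the Embedding; mask_zero=True replaces the
# separate Masking layer, and no gradient is needed for the indices themselves
decoder_inputs = Input(shape=(None,), dtype='int32', name='rnn_decoder_inputs')
decoder_input_embeddings = Embedding(vocabulary_size, emb_size,
                                     mask_zero=True,
                                     embeddings_regularizer=l2(regularizer))(decoder_inputs)
decoder_input_dropout = Dropout(0.5)(decoder_input_embeddings)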

Is it possible to train using the same model with two inputs?

Hello, I have a question about Keras.
I currently want to implement a network that uses the same CNN model on two input images and feeds the two CNN outputs into a Dense model. For example:
def cnn_model():
    input = Input(shape=(None, None, 3))
    x = Conv2D(8, (3, 3), strides=(1, 1))(input)
    x = GlobalAvgPool2D()(x)
    model = Model(input, x)
    return model
def fc_model(cnn1, cnn2):
    input_1 = cnn1.output
    input_2 = cnn2.output
    input = concatenate([input_1, input_2])
    x = Dense(1, input_shape=(None, 16))(input)
    x = Activation('sigmoid')(x)
    model = Model([cnn1.input, cnn2.input], x)
    return model
def main():
    cnn1 = cnn_model()
    cnn2 = cnn_model()
    model = fc_model(cnn1, cnn2)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x=[image1, image2], y=[1.0, 1.0], batch_size=1, epochs=1)
I want to implement a model something like this and train it, but I get an error message like the one below:
'All layer names should be unique'
Actually, I want to use only one CNN model as a feature extractor and finally use the two features to predict one float value in the range 0.0 ~ 1.0.
So the whole system: extract features from two images with the same CNN model, and feed the features to a Dense model to get one floating-point value.
Please help me implement this system and how to train it.
Thank you
See the section of the Keras documentation on shared layers:
https://keras.io/getting-started/functional-api-guide/
A code snippet from the documentation above demonstrating this:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
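Applied to the question's code, the same idea means building cnn_model() once and calling that single instance on both image inputs, so there is only one set of layer names and one set of shared weights. A hedged sketch reusing the question's cnn_model(); image1, image2 and labels are placeholders for the actual training arrays:
def main():
    cnn = cnn_model()                      # build the CNN once
    input_1 = Input(shape=(None, None, 3))
    input_2 = Input(shape=(None, None, 3))
    feat_1 = cnn(input_1)                  # same model instance -> shared weights,
    feat_2 = cnn(input_2)                  # no duplicate layer names
    x = concatenate([feat_1, feat_2])
    x = Dense(1, activation='sigmoid')(x)  # one value in the range 0.0 ~ 1.0
    model = Model([input_1, input_2], x)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x=[image1, image2], y=labels, batch_size=1, epochs=1)  # labels: placeholder target array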
