Stacking fully connected layers on top of two autoencoders for classification - keras

I'm training autoencoders on 2D images using convolutional layers and would like to put fully connected layers on top of the encoder part for classification. My autoencoder is defined as follows (just a simple one for illustration):
def encoder(input_img):
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = BatchNormalization()(conv2)
    return conv2

def decoder(conv2):
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv2)
    conv3 = BatchNormalization()(conv3)
    up1 = UpSampling2D((2, 2))(conv3)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up1)
    return decoded

autoencoder = Model(input_img, decoder(encoder(input_img)))
My input images are of size (64,80,1). Now when stacking fully connected layers on top of the encoder I'm doing the following:
def fc(enco):
    flat = Flatten()(enco)
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out

encode = encoder(input_img)
full_model = Model(input_img, fc(encode))
for l1, l2 in zip(full_model.layers[:19], autoencoder.layers[0:19]):
    l1.set_weights(l2.get_weights())
For only one autoencoder this works but the problem now is that I have 2 autoencoders trained on sets of images all of size (64, 80, 1).
For every label I have as input two images of size (64, 80, 1) and one label (0 or 1). I need to feed image 1 into the first autoencoder and image 2 into the second autoencoder. But how can I combine both autoencoders in the full_model in the above code?
Another problem is also the input to the fit() method. Until now with only one autoencoder the input consisted just of numpy arrays of images (e.g. (1000,64,80,1)) but with two autoencoders I would have two sets of images as input. How can I feed this into the fit() method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?

Q: How can I combine both autoencoders in full_model?
A: You could concatenate the bottleneck layers enco_1 and enco_2 of both autoencoders within fc:
def fc(enco_1, enco_2):
    flat_1 = Flatten()(enco_1)
    flat_2 = Flatten()(enco_2)
    # Concatenate the flattened vectors, not the raw bottleneck tensors
    flat = Concatenate()([flat_1, flat_2])
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out

encode_1 = encoder_1(input_img_1)
encode_2 = encoder_2(input_img_2)
full_model = Model([input_img_1, input_img_2], fc(encode_1, encode_2))
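Note that the last part, where you manually set the weights of the encoder, is unnecessary if the classifier is built on the same encoder layers that were trained in the autoencoder, because shared layers share their weights - see https://keras.io/getting-started/functional-api-guide/#shared-layers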
Q: How can I feed this into the fit method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?
A: In the code above, note that the two encoders are fed with different inputs (one for each image set). Now, provided that the model is defined in this way, you can call full_model.fit as follows:
full_model.fit(x=[images_set_1, images_set_2],
               y=label,
               ...)
NOTE: Not tested.
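To make the answer above concrete, here is a minimal sketch, not the asker's actual models. It assumes tf.keras is available, uses small filter counts and random placeholder data so it runs quickly, and wraps each encoder in its own Model (the helper build_encoder and the names 'encoder_1', 'image_1', etc. are made up for illustration). In practice encoder_1 and encoder_2 would be the already trained encoder parts of the two autoencoders.

```python
import numpy as np
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPooling2D, Flatten, Dense, Concatenate)
from tensorflow.keras.models import Model

num_classes = 2              # assumption: labels are 0 or 1
input_shape = (64, 80, 1)


def build_encoder(name):
    """Hypothetical helper: a small conv encoder wrapped in its own Model so it can be reused."""
    inp = Input(shape=input_shape)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(inp)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    return Model(inp, x, name=name)


# Stand-ins for the encoders taken from the two trained autoencoders.
encoder_1 = build_encoder('encoder_1')
encoder_2 = build_encoder('encoder_2')

input_img_1 = Input(shape=input_shape, name='image_1')
input_img_2 = Input(shape=input_shape, name='image_2')

# Flatten each bottleneck, then concatenate the flattened vectors.
flat_1 = Flatten()(encoder_1(input_img_1))
flat_2 = Flatten()(encoder_2(input_img_2))
merged = Concatenate()([flat_1, flat_2])
den = Dense(32, activation='relu')(merged)
out = Dense(num_classes, activation='softmax')(den)

full_model = Model([input_img_1, input_img_2], out)
full_model.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])

# Random placeholder data: 8 image pairs and 8 integer labels.
images_set_1 = np.random.rand(8, 64, 80, 1).astype('float32')
images_set_2 = np.random.rand(8, 64, 80, 1).astype('float32')
labels = np.random.randint(0, num_classes, size=(8,))

# The list order must match the order of the model inputs.
full_model.fit(x=[images_set_1, images_set_2], y=labels,
               batch_size=4, epochs=1, verbose=0)
probs = full_model.predict([images_set_1, images_set_2], verbose=0)
print(probs.shape)
```

Because the classifier calls the encoder models themselves, it uses their weights directly and no set_weights loop is needed.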

Related

Getting the output from a specific layer in a CNN

I'm trying to understand how to work with CNNs. I would like to be able to extract the "encoded" version of an image from this model. The encoded version should be the output of the Dense(4096) layer.
def get_siamese_model(input_shape):
    # Define the tensors for the two input images
    left_input = Input(input_shape)
    right_input = Input(input_shape)
    # Convolutional Neural Network
    model = Sequential()
    model.add(Conv2D(64, (10,10), activation='relu', input_shape=input_shape,
                     kernel_initializer=initialize_weights, kernel_regularizer=l2(2e-4)))
    model.add(MaxPooling2D())
    model.add(Conv2D(128, (7,7), activation='relu',
                     kernel_initializer=initialize_weights,
                     bias_initializer=initialize_bias, kernel_regularizer=l2(2e-4)))
    model.add(MaxPooling2D())
    model.add(Conv2D(128, (4,4), activation='relu', kernel_initializer=initialize_weights,
                     bias_initializer=initialize_bias, kernel_regularizer=l2(2e-4)))
    model.add(MaxPooling2D())
    model.add(Conv2D(256, (4,4), activation='relu', kernel_initializer=initialize_weights,
                     bias_initializer=initialize_bias, kernel_regularizer=l2(2e-4)))
    model.add(Flatten())
    model.add(Dense(4096, activation='sigmoid',
                    kernel_regularizer=l2(1e-3),
                    kernel_initializer=initialize_weights, bias_initializer=initialize_bias))
    # Generate the encodings (feature vectors) for the two images
    encoded_l = model(left_input)
    encoded_r = model(right_input)
    # Add a customized layer to compute the absolute difference between the encodings
    L1_layer = Lambda(lambda tensors: K.abs(tensors[0] - tensors[1]))
    L1_distance = L1_layer([encoded_l, encoded_r])
    # Add a dense layer with a sigmoid unit to generate the similarity score
    prediction = Dense(1, activation='sigmoid', bias_initializer=initialize_bias)(L1_distance)
    # Connect the inputs with the outputs
    siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)
    # Return the model
    return siamese_net
Moreover, can I give as input only one image and get the encoding of that image?
Many thanks!
Edit: I have tried doing this:
layer_name = 'sequential_3'
model2 = Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
but I get this error:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 105, 105, 1), dtype=tf.float32, name='conv2d_12_input'), name='conv2d_12_input', description="created by layer 'conv2d_12_input'") at layer "conv2d_12". The following previous layers were accessed without issue: []
And I don't know how to fix it.
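One possible way around this, sketched under assumptions rather than tested against the model above: the shared Sequential inside the siamese network is itself a model with its own input, so it can be retrieved with get_layer and called on a single image. The error probably arises because the Sequential's own output tensor is wired to its own input (conv2d_12_input), not to the two inputs of the siamese model. The sketch below assumes tf.keras, uses a tiny stand-in network with a 32x32 input, gives the Sequential the explicit (made-up) name 'encoder', and replaces the absolute-difference Lambda with a plain Subtract layer for brevity.

```python
import numpy as np
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten,
                                     Dense, Subtract)
from tensorflow.keras.models import Model, Sequential

input_shape = (32, 32, 1)    # small stand-in for (105, 105, 1)

# Shared encoder with an explicit name so it can be looked up later.
encoder = Sequential([
    Input(shape=input_shape),
    Conv2D(4, (3, 3), activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(16, activation='sigmoid'),   # stand-in for the Dense(4096) encoding
], name='encoder')

left_input = Input(shape=input_shape)
right_input = Input(shape=input_shape)

# Simplified head: plain difference instead of the absolute difference.
diff = Subtract()([encoder(left_input), encoder(right_input)])
prediction = Dense(1, activation='sigmoid')(diff)
siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)

# The nested Sequential appears as one layer of the siamese model.
single_encoder = siamese_net.get_layer('encoder')

# A single image (with a batch dimension) is enough to get its encoding.
image = np.random.rand(1, *input_shape).astype('float32')
encoding = single_encoder.predict(image, verbose=0)
print(encoding.shape)
```

With the model from the question, the same idea would be siamese_net.get_layer('sequential_3').predict(image), assuming 'sequential_3' is the name shown in the model summary.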

How to use tensorflow's image classification tutorial for classifying unseen images not in the original training or validation dataset?

TensorFlow has this tutorial, which I am able to run.
Then I define this function for using the model:
def predict1(model, img):
    img = img.resize((150, 150))
    im = np.array(img)
    im = im.reshape(-1, np.shape(im)[0], np.shape(im)[1], 3)  # add the batch dimension
    preds = model.predict(im)  # predict
    return preds
Then actually use it:
img = Image.open('MYIMAGE.jpg')
predict1(model,img)
The output is:
array([[3415.0505]], dtype=float32)
After several tries I observed that a positive number (here 3415.0505) corresponds to one of the categories and a negative number to the other. This is workable: I can write a function that returns the string 'dog' or 'cat' based on the sign of the returned value.
However, I think I am missing the point. What is a better way of actually getting the 'dog' or 'cat' prediction?
My method based on the sign would fail if there were numerous categories. I'll use this classifier to classify into many categories, which is why I need a better method.
This answer is based on the tutorial you are following.
I have made some modifications to the tutorial code, since what you are trying to do is categorical classification, not binary classification.
The changes, for the sake of demonstration, are as follows.
For the train_gen:
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='categorical')  # changed from binary to categorical
For the validation_gen:
val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
                                                              directory=validation_dir,
                                                              target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                              class_mode='categorical')  # changed from binary to categorical
I also updated the output layer, since the problem is now categorical:
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(2)  # changed to 2 units: one output per class, e.g. [0.333, -0.1333]
])
For the prediction part, I used the tf.math.argmax method. It returns the index of the highest value in each prediction.
# The order must match train_data_gen.class_indices (folder names sorted alphabetically)
listofLabels = ['cat', 'dog']
x = model.predict(sample_training_images)
labels = tf.math.argmax(x, axis=1)
print(labels)
for label in labels:
    print(listofLabels[label])
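The same argmax idea extends to any number of categories. The following is a small NumPy-only sketch; the class names and the raw outputs are made-up values, not results from the tutorial model. It assumes the model returns one raw score (logit) per class, as the Dense(2) layer above does, and that the model is trained with a categorical loss that accepts logits. The softmax step is only needed if probabilities are wanted; argmax on the raw scores gives the same index.

```python
import numpy as np

# Hypothetical class names; the order must match the class indices used in training.
class_names = ['cat', 'dog', 'bird']

# Hypothetical raw model outputs (logits) for a batch of three images.
logits = np.array([[2.0, 0.5, -1.0],
                   [-0.3, 1.7, 0.2],
                   [0.1, 0.0, 3.2]])


def softmax(z):
    """Convert each row of logits into probabilities that sum to 1."""
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


probs = softmax(logits)
indices = probs.argmax(axis=1)                 # index of the most likely class per image
predicted = [class_names[i] for i in indices]
print(predicted)
```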

Keras multi-class prediction only returning 1 prediction with softmax and categorical_crossentropy

I'm using Keras and TensorFlow to train a model that predicts a matching font based on an image of some letters. My data folder contains separate subfolders, each holding images of a letter in varying forms. My code for training the model looks like this:
LETTER_IMAGES_FOLDER = "datasets"
MODEL_FILENAME = "fonts_model.hdf5"
MODEL_LABELS_FILENAME = "model_labels.dat"
data = pd.read_csv('annotations.csv')
paths = list(data['Path'].values)
Y = list(data['Font'].values)
encoder = LabelEncoder()
encoder.fit(Y)
Y = encoder.transform(Y)
Y = np_utils.to_categorical(Y)
data = []
# Loop over the input images
for image_file in paths:
    # Load the image and convert it to grayscale
    image = cv2.imread(image_file)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Add a third channel dimension to the image to make Keras happy
    image = np.expand_dims(image, axis=2)
    # Add the letter image to our training data (the labels are already in Y)
    data.append(image)
data = np.array(data, dtype="float") / 255.0
train_x, test_x, train_y, test_y = model_selection.train_test_split(data,Y,test_size = 0.1, random_state = 0)
# Save the mapping from labels to one-hot encodings.
# We'll need this later when we use the model to decode what its predictions mean
with open(MODEL_LABELS_FILENAME, "wb") as f:
    pickle.dump(encoder, f)
# Build the neural network!
model = Sequential()
# First convolutional layer with max pooling
model.add(Conv2D(20, (5, 5), padding="same", input_shape=(100, 100, 1), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Second convolutional layer with max pooling
model.add(Conv2D(50, (5, 5), padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500, activation="relu"))
print (len(encoder.classes_))
model.add(Dense(len(encoder.classes_), activation="softmax"))
# Ask Keras to build the TensorFlow model behind the scenes
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# Train the neural network
model.fit(train_x, train_y, validation_data=(test_x, test_y), batch_size=32, epochs=2, verbose=1)
# Save the trained model to disk
model.save(MODEL_FILENAME)
Once the model has been created I'm predicting with it as follows:
predictions = model.predict(letter_image)
print (predictions) # this has the length of 1
The problem is that "predictions" is always an array of size 1 and I'm not sure why. I'm using softmax and categorical_crossentropy, and the last Dense layer has more than one unit. Could someone please tell me why I'm not getting the top n predictions here?
I've also tried sigmoid with binary_crossentropy but get the same result. I think there's something more to it that I'm missing.
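A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: model.predict returns one row per input image, so for a single image the outer array has length 1, and the per-class probabilities are in predictions[0]. The NumPy example below uses made-up probabilities and font names (stand-ins for the real model output and for encoder.classes_) to show how the top n classes could be read out of such an array.

```python
import numpy as np

# Hypothetical output of model.predict for ONE image and five font classes:
# shape (1, 5), one row per input image and one column per class.
predictions = np.array([[0.05, 0.60, 0.10, 0.20, 0.05]])

# Hypothetical stand-in for encoder.classes_ (same order as the output columns).
class_names = np.array(['Arial', 'Courier', 'Georgia', 'Helvetica', 'Times'])

print(len(predictions))        # number of images in the batch
print(predictions.shape)       # (images, classes)

top_n = 3
# argsort sorts ascending, so reverse it and keep the first top_n indices.
top_indices = np.argsort(predictions[0])[::-1][:top_n]
top_fonts = class_names[top_indices]
top_probs = predictions[0][top_indices]

for font, prob in zip(top_fonts, top_probs):
    print(font, prob)
```

With the real model, encoder.inverse_transform(top_indices) should give the font names for the selected indices.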

Dimension errors in neural network in Keras

I am trying to implement a neural network where I merge/concatenate a fully connected neural network with a convolution neural network. But when I fit the model, I get the following error:
ValueError: All input arrays (x) should have the same number of
samples. Got array shapes: [(1, 100, 60, 4500), (100, 4500)]
I have two different inputs:
image (dimensions: 1,100,60,4500), where 1 is the number of channels, 100 is the number of samples, and 60*4500 is the dimension of each image. This goes to my convolutional neural network.
positions (dimensions: 100,4500), where 100 is the number of samples.
Dimension for my output is 100,2.
The code for my neural network is:
###Convolution neural network
b1 = Sequential()
b1.add(Conv2D(128*2, kernel_size=3,activation='relu',data_format='channels_first',
input_shape=(100,60,4500)))
b1.add(Conv2D(128*2, kernel_size=3, activation='relu'))
b1.add(Dropout(0.2))
b1.add(Conv2D(128*2, kernel_size=4, activation='relu'))
b1.add(Dropout(0.2))
b1.add(Flatten())
b1.summary()
###Fully connected feed forward neural network
b2 = Sequential()
b2.add(Dense(64, input_shape = (4500,), activation='relu'))
b2.add(Dropout(0.1))
b2.summary()
model = Sequential()
###Concatenating the two networks
concat = concatenate([b1.output, b2.output], axis=-1)
x = Dense(256, activation='relu', kernel_initializer='normal')(concat)
x = Dropout(0.25)(x)
output = Dense(2, activation='softmax')(x)
model = Model([b1.input, b2.input], [output])
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adam(),
metrics=['accuracy'])
history = model.fit([image, positions], Ytest, batch_size=10,
epochs=1,
verbose=1)
Also, the reason why my 'image' array is 4 dimensional is because in the beginning it was just (100,60,4500) but then I ran into the following error:
ValueError: Error when checking input: expected conv2d_10_input to
have 4 dimensions, but got array with shape (100, 60, 4500)
Upon googling I found out that it expects the number of channels as part of the input too. After I added the channel dimension this error went away, but then I ran into the other error that I mentioned in the beginning.
So can someone tell me how to solve the error (the one I specified in the beginning)? Help would be appreciated.
It is not good practice to mix the Sequential and Functional APIs.
You can implement the model like this:
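# Imports assume tf.keras; adjust if you use standalone Keras
from tensorflow.keras.layers import Input, Conv2D, Dropout, Flatten, Dense, concatenate
from tensorflow.keras.models import Model

i1 = Input(shape=(1, 60, 4500))
c1 = Conv2D(128*2, kernel_size=3, activation='relu', data_format='channels_first')(i1)
# Keep the same data format in every conv layer (the default is channels_last)
c1 = Conv2D(128*2, kernel_size=3, activation='relu', data_format='channels_first')(c1)
c1 = Dropout(0.2)(c1)
c1 = Conv2D(128*2, kernel_size=4, activation='relu', data_format='channels_first')(c1)
c1 = Dropout(0.2)(c1)
c1 = Flatten()(c1)
i2 = Input(shape=(4500, ))
c2 = Dense(64, activation='relu')(i2)
c2 = Dropout(0.2)(c2)
c = concatenate([c1, c2])
x = Dense(256, activation='relu', kernel_initializer='normal')(c)
x = Dropout(0.25)(x)
output = Dense(2, activation='softmax')(x)
model = Model([i1, i2], [output])
model.summary()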
Note that the shape of i1 is shape=(1, 60, 4500). You have set data_format='channels_first' in the Conv2D layer, hence the 1 (the channel dimension) must come first.
Compile the model like this:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
Placeholder data:
import numpy as np
X_img = np.zeros((100, 1, 60, 4500))
X_pos = np.ones((100, 4500))
Y = np.zeros((100, 2))
Training:
history = model.fit([X_img, X_pos], Y, batch_size=1,
                    epochs=1,
                    verbose=1)
The sample (batch) axis should always be the first dimension. So your data should have shape (100, 1, 60, 4500) for image and (100, 4500) for positions. The argument channels_first for the Conv2D layer means that the channel axis is the first non-batch dimension.
You also need to change the input shape to (1, 60, 4500) in the first Conv2D layer.
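As a quick illustration of the reshaping described above, here is a NumPy-only sketch. It uses small stand-in dimensions (6 x 45 instead of 60 x 4500) so it stays cheap to run; the variable names are illustrative. It shows two ways to arrive at a samples-first, channels-first array: swapping the first two axes of the 4-D array from the question, or inserting the channel axis into the original 3-D array.

```python
import numpy as np

# Small stand-ins for the shapes in the question: (1, 100, 60, 4500) and (100, 4500).
n_samples, height, width = 100, 6, 45
image = np.zeros((1, n_samples, height, width))   # channel axis first, samples second
positions = np.ones((n_samples, width))

# Move the sample axis to the front: (1, 100, 6, 45) -> (100, 1, 6, 45).
image_fixed = np.transpose(image, (1, 0, 2, 3))

# Alternatively, start from the original 3-D array and insert the channel axis at position 1.
image_3d = np.zeros((n_samples, height, width))
image_from_3d = np.expand_dims(image_3d, axis=1)

print(image_fixed.shape, image_from_3d.shape, positions.shape)
```

Both inputs now agree on the number of samples in their first dimension, which is what the error message was complaining about.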

Add LSTM layer after Conv2D layers and add some other inputs

I'm working on a racing game that uses reinforcement learning. I'm facing an issue implementing the neural network used to train the model. I found some examples that use a CNN, but it seems that adding an extra LSTM layer would increase the model's efficiency. I found the following example:
https://team.inria.fr/rits/files/2018/02/ICRA18_EndToEndDriving_CameraReady.pdf
The network I need to implement
The problem is that I'm not sure how to implement the LSTM layer here. How can I give the following inputs to the LSTM layer?
Processed image output
current speed
last action
Here is the code I'm currently using. I want to add the LSTM layer after Conv2D.
self.__nb_actions = 28
self.__gamma = 0.99
#Define the model
activation = 'relu'
pic_input = Input(shape=(59,255,3))
img_stack = Conv2D(16, (3, 3), name='convolution0', padding='same', activation=activation, trainable=train_conv_layers)(pic_input)
img_stack = MaxPooling2D(pool_size=(2,2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution1', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Conv2D(32, (3, 3), activation=activation, padding='same', name='convolution2', trainable=train_conv_layers)(img_stack)
img_stack = MaxPooling2D(pool_size=(2, 2))(img_stack)
img_stack = Flatten()(img_stack)
img_stack = Dropout(0.2)(img_stack)
img_stack = Dense(128, name='rl_dense', kernel_initializer=random_normal(stddev=0.01))(img_stack)
img_stack=Dropout(0.2)(img_stack)
output = Dense(self.__nb_actions, name='rl_output', kernel_initializer=random_normal(stddev=0.01))(img_stack)
opt = Adam()
self.__action_model = Model(inputs=[pic_input], outputs=output)
self.__action_model.compile(optimizer=opt, loss='mean_squared_error')
self.__action_model.summary()
Thanks
There are various ways to do that. One is to reshape the output of the conv layers and feed it to the LSTM layer. An explained example covering several methods is given in "Shaping data for LSTM, and feeding output of dense layers to LSTM".
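One of several possible layouts is sketched below; it is an illustration under assumptions, not the architecture from the linked paper. It assumes tf.keras and that the model receives a short sequence of frames (timesteps = 4 is an arbitrary choice), applies the same conv stack to every frame with TimeDistributed, concatenates the per-frame features with the speed and the one-hot last action at each step, and feeds the result to an LSTM. The input names 'frames', 'speed' and 'last_action' are made up for this example.

```python
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten, Dense,
                                     TimeDistributed, Concatenate, LSTM)
from tensorflow.keras.models import Model

timesteps = 4        # assumption: number of consecutive frames per sample
nb_actions = 28      # from the question

# One sample = a short sequence of frames plus speed and last action per step.
frames = Input(shape=(timesteps, 59, 255, 3), name='frames')
speed = Input(shape=(timesteps, 1), name='speed')
last_action = Input(shape=(timesteps, nb_actions), name='last_action')

# TimeDistributed applies the same conv layers to every frame in the sequence.
x = TimeDistributed(Conv2D(16, (3, 3), padding='same', activation='relu'))(frames)
x = TimeDistributed(MaxPooling2D(pool_size=(2, 2)))(x)
x = TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu'))(x)
x = TimeDistributed(MaxPooling2D(pool_size=(2, 2)))(x)
x = TimeDistributed(Flatten())(x)
x = TimeDistributed(Dense(128, activation='relu'))(x)   # shape: (batch, timesteps, 128)

# Append speed and last action to the image features at every time step.
merged = Concatenate(axis=-1)([x, speed, last_action])  # (batch, timesteps, 128 + 1 + 28)

# The LSTM consumes the sequence and returns its final state.
x = LSTM(64)(merged)
output = Dense(nb_actions, name='rl_output')(x)

model = Model(inputs=[frames, speed, last_action], outputs=output)
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()
```

If only a single frame is available per step, an alternative is to reshape the conv feature map itself into a sequence (for example with a Reshape layer) before the LSTM, as the answer above suggests.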
