Feeding images along with labels to LSTM - Keras

I am trying to build an LSTM model in Keras that takes image sequences along with the label corresponding to each image. The LSTM should predict the label of the image at time t.
Each sequence has 5 images of size 224 x 224 with 3 channels.
I am extracting features with a CNN model and feeding the LSTM the features and the labels at each time point.
Questions-
How do I add the labels? For instance, for the first sequence the labels are [0, 1, 1, 2, 3], where each element corresponds to one image in the sequence.
What is the best way to feed in the labels of each sequence?
from keras.applications import VGG16
from keras.layers import Input, Dense, LSTM, TimeDistributed, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import Nadam

seqs = 5        # images per sequence
channels = 3
rows = 224
columns = 224

inp = Input(shape=(seqs, rows, columns, channels))

# frame-level feature extractor
cnn_base = VGG16(input_shape=(rows, columns, channels), weights="imagenet", include_top=False)
cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(inputs=cnn_base.input, outputs=cnn_out)

# apply the CNN to every frame, then encode the sequence
encoded_frames = TimeDistributed(cnn)(inp)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(2, activation="softmax")(hidden_layer)

model = Model([inp], outputs)
optimizer = Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08, schedule_decay=0.004)
model.compile(loss="categorical_crossentropy",
              optimizer=optimizer,
              metrics=["categorical_accuracy"])
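The question above asks how to feed a label for every image in the sequence. One possible way (a minimal sketch, not the only option) is to set return_sequences=True on the LSTM and wrap the classifier in TimeDistributed, so the model emits one prediction per frame; the labels are then one-hot encoded to shape (num_sequences, seqs, num_classes). Here num_classes=4 is an assumption based on the example labels [0, 1, 1, 2, 3].

from keras.applications import VGG16
from keras.layers import Input, Dense, LSTM, TimeDistributed, GlobalAveragePooling2D
from keras.models import Model
from keras.utils import to_categorical
import numpy as np

seqs, rows, columns, channels = 5, 224, 224, 3
num_classes = 4   # assumption: labels 0..3 as in the example [0, 1, 1, 2, 3]

inp = Input(shape=(seqs, rows, columns, channels))
cnn_base = VGG16(input_shape=(rows, columns, channels), weights="imagenet", include_top=False)
cnn = Model(cnn_base.input, GlobalAveragePooling2D()(cnn_base.output))

frames = TimeDistributed(cnn)(inp)
seq_out = LSTM(256, return_sequences=True)(frames)               # one hidden state per frame
per_frame = TimeDistributed(Dense(num_classes, activation="softmax"))(seq_out)

model = Model(inp, per_frame)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# labels: one integer per frame, one-hot encoded to (num_sequences, seqs, num_classes)
y = np.array([[0, 1, 1, 2, 3]])                                  # first sequence from the question
y_onehot = to_categorical(y, num_classes=num_classes)            # shape (1, 5, 4)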

Related

Keras, how to feed an Embedding layer with a random sampling of a Softmax layer

In the model I am constructing, I have the following layer:
y = layers.Dense(10, activation="softmax")(x)
I want the next layer of this model to be an Embedding layer that "represents" the choice made by the Dense layer.
I.e., I want to:
1. sample a choice from y (based on the probabilities represented by the values of the softmax), and
2. turn this choice into an Embedding layer with vocabulary size 10.
Any idea how to do this?
Regards
Initial answer
To propagate the most likely category label, add a layer that takes the argmax of the Dense layer's output before feeding it into the Embedding layer:
import tensorflow as tf

# generate some data
BATCH_SIZE, INPUT_DIM = (4, 2)
x = tf.random.uniform([BATCH_SIZE, INPUT_DIM])

# model
NUM_CLASSES = 10
EMBEDDING_DIM = 10
dense = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
# take the index of the most likely class for each sample
argmax = tf.keras.layers.Lambda(lambda t: tf.argmax(t, axis=-1))(dense)
emb = tf.keras.layers.Embedding(NUM_CLASSES, EMBEDDING_DIM)(argmax)
Updated answer
If you want to propagate a randomly sampled category label instead of the most likely one, you can use tf.random.categorical. Note that tf.random.categorical takes logits as input, so you don't need the softmax activation at the end of the Dense layer.
NUM_CLASSES = 10
EMBEDDING_DIM = 10
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)          # no softmax: categorical() expects logits
# draw one class index per sample, then drop the trailing dimension
sample = tf.keras.layers.Lambda(
    lambda t: tf.squeeze(tf.random.categorical(t, 1), axis=-1))(logits)
emb = tf.keras.layers.Embedding(NUM_CLASSES, EMBEDDING_DIM)(sample)
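For reference, here is a hedged sketch of how the sampling and embedding layers could be assembled into a functional Model with an Input layer instead of a concrete tensor; the constants mirror the snippet above. Note that both argmax and categorical sampling are non-differentiable, so gradients will not flow back through that step during training.

import tensorflow as tf

INPUT_DIM, NUM_CLASSES, EMBEDDING_DIM = 2, 10, 10

inputs = tf.keras.Input(shape=(INPUT_DIM,))
logits = tf.keras.layers.Dense(NUM_CLASSES)(inputs)
# non-differentiable sampling step
sample = tf.keras.layers.Lambda(
    lambda t: tf.squeeze(tf.random.categorical(t, 1), axis=-1))(logits)
emb = tf.keras.layers.Embedding(NUM_CLASSES, EMBEDDING_DIM)(sample)
model = tf.keras.Model(inputs, emb)

print(model(tf.random.uniform([4, INPUT_DIM])).shape)   # (4, 10)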

Sentiment Analysis using LSTM (model does not generate good output)

I made a sentiment analysis model using an LSTM, but it gives very bad predictions.
Here is the complete code
Dataset for Amazon reviews
My LSTM model looks like this:
from keras.layers import Input, LSTM, Dropout, Dense, Activation
from keras.models import Model
# pretrained_embedding_layer and sentences_to_indices are helper functions defined elsewhere in the notebook

def ltsm_model(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the ltsm_model model's graph.

    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph; it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape, dtype='int32')
    # Create the embedding layer pretrained with GloVe vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through the embedding layer to get the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with a 128-dimensional hidden state.
    # Be careful: the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with a 128-dimensional hidden state.
    # Be careful: the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer to get back a batch of 2-dimensional vectors
    X = Dense(2, activation='relu')(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=[sentence_indices], outputs=X)
    ### END CODE HERE ###

    return model
Here is what my training dataset looks like:
This is my testing data:
x_test = np.array(['amazing!: this soundtrack is my favorite music..'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+ str(np.argmax(model.predict(X_test_indices))))
I got the following output for this:
amazing!: this soundtrack is my favorite music.. 0
But the sentiment should be positive, i.e. the prediction should be 1.
Also, this is my fit model output:
How can I improve my model's performance? It is a pretty bad model, I suppose.
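One thing worth checking (an observation, not a verified fix for this dataset): the final Dense layer uses a relu activation before the softmax, which zeroes out negative scores before they reach the softmax. A commonly suggested variant leaves the last Dense layer linear:

# hedged variant of the last layers inside ltsm_model(): drop the intermediate relu
X = Dense(2)(X)               # linear scores for the two sentiment classes
X = Activation('softmax')(X)  # convert scores to probabilities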

create Dense layers in a loop

I need to create multiple Dense layers in a for loop; the number of iterations depends on the number of labels. I want to create one Dense stack for each label. Each label has a different set of features, so I want to predict each label separately with its corresponding feature set in each Dense layer. Is that possible? The following code is my attempt.
from keras.layers import Dense, concatenate
from keras.models import Model

layers = []
for i in range(num_labels):
    # per-label stack of Dense layers (inputs, num_labels and num_genes_per are defined elsewhere)
    h1 = Dense(num_genes_per + 10, kernel_initializer='normal', input_dim=num_genes_per, activation='relu')(inputs)
    h2 = Dense(int(num_genes_per / 2), kernel_initializer='normal', activation='relu')(h1)
    output = Dense(1, kernel_initializer='normal', activation='linear')(h2)
    layers.append(output)
merged_output = concatenate(layers, axis=1)
model = Model(inputs, merged_output)
The output built on each h2 will have shape [batch, 1], and merged_output will have shape [batch, num_labels]. Is there any error in the above code?
I know it is not efficient, but if I concatenated the different sets of features into one input tensor and used only one Dense layer to predict all labels at the same time, would that harm the prediction accuracy?
It depends on how you defined the features and labels. If features 1, 2 and 3 are used to predict label 1 and have no relation to label 2, it does not make sense to include them when inferring that label.
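To make the comparison concrete, here is a minimal sketch of the single shared-input alternative the question asks about (num_labels, num_genes_per and the combined feature width are assumptions chosen only for illustration):

from keras.layers import Input, Dense
from keras.models import Model

num_labels = 3          # assumption for illustration
num_genes_per = 20      # features per label (assumption)

# single input holding the concatenated feature sets of all labels
inputs = Input(shape=(num_labels * num_genes_per,))
h1 = Dense(num_labels * num_genes_per + 10, activation='relu')(inputs)
h2 = Dense((num_labels * num_genes_per) // 2, activation='relu')(h1)
# one Dense layer predicts all labels at once
outputs = Dense(num_labels, activation='linear')(h2)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')

As the answer notes, whether this helps or hurts depends on whether each label's feature set carries any information about the other labels.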

How do I convert BERT embeddings into a tensor for feeding into an LSTM?

I am trying to replace the Word2Vec word embeddings with BERT sentence embeddings in a siamese LSTM network (https://github.com/eliorc/Medium/blob/master/MaLSTM.ipynb). However, my BERT embeddings are (1, 768)-shaped matrices, not tensors that can be fed to a Keras layer. I wanted to know if it is possible to convert them.
I found a way to replace the word embeddings with Universal Sentence Encoder embeddings (http://hunterheidenreich.com/blog/google-universal-sentence-encoder-in-keras/), and I tried to modify the LSTM code to use BERT sentence embeddings from the following service (https://github.com/hanxiao/bert-as-service#what-is-it).
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, LSTM
from bert_serving.client import BertClient

bc = BertClient()   # assumes a running bert-as-service server

# Model variables for LSTM
n_hidden = 50
gradient_clipping_norm = 1.25
batch_size = 64
n_epoch = 25

def BERTEmbedding(x):
    # x is an input tensor
    encoded = bc.encode(tf.squeeze(tf.cast(x, tf.string)))
    return encoded

def exponent_neg_manhattan_distance(left, right):
    ''' Helper function for the similarity estimate of the LSTMs outputs '''
    return K.exp(-K.sum(K.abs(left - right), axis=1, keepdims=True))

left_input_text = Input(shape=(1,), dtype=tf.string)
right_input_text = Input(shape=(1,), dtype=tf.string)
encoded_left = Lambda(BERTEmbedding, output_shape=(768,))(left_input_text)
encoded_right = Lambda(BERTEmbedding, output_shape=(768,))(right_input_text)

# Since this is a siamese network, both sides share the same LSTM
shared_lstm = LSTM(n_hidden)
left_output = shared_lstm(encoded_left)
right_output = shared_lstm(encoded_right)
I am getting the following error message:
TypeError: "Tensor("lambda_3/Squeeze:0", dtype=string)" must be , but received class 'tensorflow.python.framework.ops.Tensor'
An LSTM takes three-dimensional input [batch_size, sequence_length, feature_dim].
From BERT you can get two types of embeddings:
a token representation for each token in the sequence
the 'CLS' token representation (where 'CLS' stands for classification)
If you take the 'CLS' token representation, it will be [1, 768]; if you take the output for the whole sequence, it will be [len_of_sequence, 768].
If you then train the model in batches, the input becomes [batch_size, len_of_sequence, 768], which is what the LSTM encoder takes.
Alternatively, you can add one extra dimension to get [batch_size, 768, 1] and feed that to the LSTM.
Adding the extra dimension as the sequence length (i.e. a sequence of length 1) doesn't make much sense, because the LSTM unrolls along the length of the sequence.
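To illustrate the shapes described above (a hedged sketch that uses random arrays in place of real bert-as-service output): per-token BERT embeddings already have the [batch, seq_len, 768] layout an LSTM expects, while a batch of 'CLS' vectors of shape [batch, 768] needs an extra axis first.

import numpy as np
import tensorflow as tf

BATCH, SEQ_LEN, DIM = 8, 25, 768   # assumptions for illustration

# per-token embeddings: already [batch, seq_len, 768], ready for an LSTM
token_embs = np.random.rand(BATCH, SEQ_LEN, DIM).astype("float32")
out_tokens = tf.keras.layers.LSTM(50)(token_embs)
print(out_tokens.shape)            # (8, 50)

# 'CLS' sentence embeddings: [batch, 768], need an extra axis before an LSTM
cls_embs = np.random.rand(BATCH, DIM).astype("float32")
cls_as_seq = tf.expand_dims(cls_embs, axis=-1)     # [batch, 768, 1]
out_cls = tf.keras.layers.LSTM(50)(cls_as_seq)
print(out_cls.shape)               # (8, 50)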

Change label format for training

Normally, if you train with Keras, model.fit expects the training data to have shape (samples, timesteps, input) and the labels to have shape (samples, outputs). Is there a way to change the matching labels to (samples*timesteps, output) or (samples, timesteps, output), so that one sample matches timesteps labels and not only one label?
Yes, you can have whatever shape you want for the output layer. For instance, auto-encoders have the same output shape as the input shape.
A toy example:
from keras.layers import Input, LSTM, RepeatVector, Dense
from keras.models import Model

sequence_length = 20
n_features = 4

def make_model():
    inp = Input(shape=(sequence_length, n_features,))
    encoder = LSTM(16, return_sequences=True)(inp)
    vector = LSTM(32)(encoder)
    decoder_in = RepeatVector(sequence_length)(vector)
    decoder = LSTM(16, return_sequences=True)(decoder_in)
    out = Dense(4)(decoder)
    model = Model(inp, out)
    model.compile('adam', 'mse')
    return model

model = make_model()
model.summary()
In this case the vector layer has shape (32,) (i.e. there is a dimensionality reduction compared to the input) and the output layer has the same dimensions as the input.
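A short usage sketch with random data, purely for illustration: for this auto-encoder, model.fit accepts targets with the same (samples, timesteps, features) shape as the input, which is the per-timestep labelling the question asks about.

import numpy as np

n_samples = 100
X = np.random.rand(n_samples, sequence_length, n_features)
# targets have shape (samples, timesteps, features), matching the model's output layer
model.fit(X, X, epochs=2, batch_size=16)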
