Keras: how to feed an Embedding layer with a random sample from a softmax layer

In the model I am constructing, I have the following layer:
y = layers.Dense(10, activation="softmax")(x)
I want the next layer of this model to be an Embedding layer that "represents" the choice made by the Dense layer.
That is, I want:
to sample a choice from y (using the probabilities given by the softmax values), and
to turn this choice into an embedding via an Embedding layer with vocabulary size 10.
Any idea how to do this?
Regards

Initial answer
Add a layer that takes the argmax of the output of the dense layer before feeding it into the embedding layer to propagate the most likely category label:
import tensorflow as tf
# generate some data
BATCH_SIZE, INPUT_DIM = 4, 2
x = tf.random.uniform([BATCH_SIZE, INPUT_DIM])
# model
NUM_CLASSES = 10
EMBEDDING_DIM = 10
dense = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
# argmax over the class axis yields one integer label per example
argmax = tf.keras.layers.Lambda(lambda probs: tf.argmax(probs, axis=-1))(dense)
emb = tf.keras.layers.Embedding(NUM_CLASSES, EMBEDDING_DIM)(argmax)
Updated answer
If you want to propagate a randomly sampled category label instead of the most likely category label, you can do so by using tf.random.categorical. Note that tf.random.categorical takes logits as inputs, so you don't need the softmax activation at the end of the dense layer.
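NUM_CLASSES = 10
EMBEDDING_DIM = 10
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)
# tf.random.categorical returns shape (batch, 1); squeeze only the last axis so a batch of size 1 keeps its batch dimension
sample = tf.keras.layers.Lambda(lambda logits: tf.squeeze(tf.random.categorical(logits, 1), axis=-1))(logits)
emb = tf.keras.layers.Embedding(NUM_CLASSES, EMBEDDING_DIM)(sample)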
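To make the two steps concrete (sample a class from the distribution, then look up its embedding), here is a minimal NumPy sketch of what the layers compute. It is not the Keras code itself: the random `logits` and `embedding_table` are hypothetical stand-ins for the Dense output and the Embedding weights.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, EMBEDDING_DIM, BATCH_SIZE = 10, 4, 3

# Hypothetical stand-ins for the Dense layer's logits and the Embedding weight table.
logits = rng.normal(size=(BATCH_SIZE, NUM_CLASSES))
embedding_table = rng.normal(size=(NUM_CLASSES, EMBEDDING_DIM))

# Softmax turns each row of logits into a probability distribution.
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Draw one class index per row according to that row's probabilities.
sampled = np.array([rng.choice(NUM_CLASSES, p=p) for p in probs])

# An embedding lookup is plain row indexing into the table.
embedded = embedding_table[sampled]
print(sampled.shape, embedded.shape)
```

Note that drawing an index is a discrete operation, so gradients from the embedding do not flow back into the layer that produced the probabilities.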

Related

Sentiment Analysis using LSTM (model does not generate good output)

I built a sentiment analysis model using an LSTM, but my model gives very bad predictions.
Here is the complete code
Dataset for amazon review
My LSTM model looks like this:
def ltsm_model(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the ltsm_model model's graph.
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
    Returns:
    model -- a model instance in Keras
    """
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape, dtype='int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with relu activation to get back a batch of 2-dimensional vectors.
    X = Dense(2, activation='relu')(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=[sentence_indices], outputs=X)
    ### END CODE HERE ###
    return model
Here is what my training dataset looks like:
This is my testing data:
x_test = np.array(['amazing!: this soundtrack is my favorite music..'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+ str(np.argmax(model.predict(X_test_indices))))
I got the following output for this:
amazing!: this soundtrack is my favorite music.. 0
But this is a positive review, so the prediction should be 1.
This is the output of fitting my model:
How can I improve my model's performance? This is a pretty bad model, I suppose.

How to apply triplet loss function in resnet50 for the purpose of deepranking

I am trying to create image embeddings for the purpose of deep ranking using a triplet loss function. The idea is that we can take a pretrained CNN (e.g. resnet50 or vgg16), remove the FC layers and add an L2 normalization function to retrieve unit vectors which can then be compared via a distance metric (e.g. cosine similarity). As far as I understand, the predicted vectors that come out of a pretrained CNN are not optimal, but are a good start. By adding the triplet loss function we can re-train the network to keep similar pictures 'close' to each other and different pictures 'far' apart in the feature space. Inspired by this notebook, I tried to set up the following code, but I get the error ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
# Anchor, Positive and Negative are numpy arrays of size (200, 256, 256, 3), same for the test images
pic_size=256
def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size),
                          input_tensor=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x,axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x
anchor_input = Input((3, pic_size,pic_size ), name='anchor_input')
positive_input = Input((3, pic_size,pic_size ), name='positive_input')
negative_input = Input((3, pic_size,pic_size ), name='negative_input')
encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)
merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input,positive_input, negative_input], outputs=merged_vector)
#ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor,Positive,Negative],
y=Y_dummy,
validation_data=([Anchor_test,Positive_test,Negative_test],Y_dummy2), batch_size=512, epochs=500)
I am new to keras and I am not quite sure how to solve this. The author in the link above creates his own CNN from scratch, but I would like to build it upon resnet (or vgg16). How can I configure ResNet50 to use a triplet loss function (in the link above you find also the source code for the triplet loss function).
In your ResNet50 definition, you've written
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size), input_tensor=inp)
Remove the input_tensor argument and change input_shape to input_shape=inp.
You mentioned that your arrays have shape (256, 256, 3), which is the channels-last layout used by the TF backend, so your input shape should be (pic_size, pic_size, 3).
def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x,axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x
img_shape=(256, 256, 3)
anchor_input = Input(img_shape, name='anchor_input')
positive_input = Input(img_shape, name='positive_input')
negative_input = Input(img_shape, name='negative_input')
encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)
merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input,positive_input, negative_input], outputs=merged_vector)
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor,Positive,Negative],
y=Y_dummy,
validation_data=([Anchor_test,Positive_test,Negative_test],Y_dummy2), batch_size=512, epochs=500)
The model plot is as follows:
model_plot
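Another way to avoid duplicate layer names, sketched here as an alternative rather than as the answer's own code, is to build ResNet50 once and call that single model on all three inputs. The three branches then share one set of weights and one set of layer names. The small `img_shape` and `weights=None` are assumptions chosen only to keep the sketch light and download-free; the question uses (256, 256, 3) and 'imagenet'.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

img_shape = (64, 64, 3)  # assumption: small size for a quick sketch

# Build the backbone exactly once. weights=None avoids a download here;
# use weights='imagenet' to start from pretrained features.
base_model = ResNet50(weights=None, include_top=False, input_shape=img_shape)
for layer in base_model.layers[15:]:
    layer.trainable = False  # same freezing rule as in the question

def shared_dnn(inp):
    x = base_model(inp)  # calling the same model instance shares its weights
    x = layers.Flatten()(x)
    return layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(x)

anchor_input = layers.Input(img_shape, name="anchor_input")
positive_input = layers.Input(img_shape, name="positive_input")
negative_input = layers.Input(img_shape, name="negative_input")

merged_vector = layers.concatenate(
    [shared_dnn(anchor_input), shared_dnn(positive_input), shared_dnn(negative_input)],
    axis=-1, name="merged_layer")
model = Model(inputs=[anchor_input, positive_input, negative_input], outputs=merged_vector)
print(model.output_shape)
```

The resulting model can be compiled with the triplet loss exactly as in the code above; the loss then splits `merged_vector` back into its three embeddings.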

Attribute error: None type has no attribute summary in keras

I have been trying to deepen my understanding of word embeddings and NLP in Keras by implementing (and partly copying) code that creates a Keras model with the functional API. When I call model.summary() I get AttributeError: 'NoneType' object has no attribute 'summary'.
I made many attempts at reducing the number of layers and the dimension of the word embedding matrix, but nothing changed. I don't know what to do.
def pretrained_embedding_layer(word_to_vec, word_to_index):
    vocab_len = len(word_to_index) + 1
    emb_dim = word_to_vec["sole"].shape[0]
    emb_matrix = np.zeros((vocab_len,emb_dim))
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec[word]
    print(emb_matrix.shape)
    embedding_layer = Embedding(vocab_len,emb_dim,trainable =False)
    embedding_layer.build((None,))
    embedding_layer.set_weights([emb_matrix])
    return embedding_layer
def Chatbot_V1(input_shape, word_to_vec, word_to_index):
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(input_shape, dtype='int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=True)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of vocab_dim dimensional vectors.
    X = Dense(vocab_dim)(X)
    # Add a softmax activation
    preds = Activation('softmax')(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(sentence_indices, preds)
model = Chatbot_V1((maxLen,), word_to_vec, word_to_index)
model.summary()
Launching model.summary:
AttributeError: 'NoneType' object has no attribute 'summary'
Why? What is wrong in layers definition?
The function Chatbot_V1 does not return anything. In Python, a function without a return statement returns None, so None is what gets assigned to model. Just add return model at the end of Chatbot_V1.
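The behaviour is easy to reproduce without Keras. In this small sketch the dictionary is a hypothetical stand-in for the model object, and both function names are made up for illustration.

```python
def build_without_return():
    model = {"layers": 3}  # stand-in for a Keras Model; built but never returned

def build_with_return():
    model = {"layers": 3}
    return model  # hands the object back to the caller

missing = build_without_return()
present = build_with_return()
print(missing)  # None
print(present)  # {'layers': 3}
```

Calling any method on `missing` raises the same kind of AttributeError as in the question, because the name is bound to None rather than to a model.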

How to apply a different dense layer for each timestep in Keras

I know that applying a TimeDistributed(Dense()) applies the same dense layer over all the timesteps but I wanted to know how to apply different dense layers for each timestep. The number of timesteps is not variable.
P.S.: I have seen the following link and can't seem to find an answer
You can use a LocallyConnected layer.
The LocallyConnected layer works like a separate Dense layer applied to each window of kernel_size time steps (1 in this case).
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
sequence_length = 10
n_features = 4
def make_model():
    inp = Input((sequence_length, n_features))
    h1 = LocallyConnected1D(8, 1, 1)(inp)
    out = Flatten()(h1)
    model = Model(inp, out)
    model.compile('adam', 'mse')
    return model
model = make_model()
model.summary()
According to the summary, the number of parameters used by the LocallyConnected layer is
(output_dims * (input_dims + bias)) * time_steps or (8 * (4 + 1)) * 10 = 400.
Wording it another way: the locally connected layer above behaves as 10 different Dense layers, each connected to its own time step (because we chose kernel_size as 1). Each of these blocks of 40 variables is a weight matrix of shape (input_dims, output_dims) plus a bias vector of size (output_dims).
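The "one Dense layer per time step" computation and its parameter count can be checked with a short NumPy sketch. It reproduces the arithmetic only, not the Keras layer; the random weights are hypothetical.

```python
import numpy as np

sequence_length, n_features, output_dims = 10, 4, 8
rng = np.random.default_rng(0)

# One weight matrix and one bias vector per time step.
weights = rng.normal(size=(sequence_length, n_features, output_dims))
biases = rng.normal(size=(sequence_length, output_dims))

x = rng.normal(size=(2, sequence_length, n_features))  # (batch, time, features)

# Output step t only sees input step t and uses its own weights.
out = np.einsum("btf,tfo->bto", x, weights) + biases

# Same result written as an explicit loop over independent dense layers.
looped = np.stack(
    [x[:, t] @ weights[t] + biases[t] for t in range(sequence_length)], axis=1)

n_params = weights.size + biases.size
print(out.shape, n_params)  # (2, 10, 8) 400
```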
Also note that given an input_shape of (sequence_len, n_features), Dense(output_dims) and Conv1D(output_dims, 1, 1) are equivalent.
i.e. this model:
def make_model():
    inp = Input((sequence_length, n_features))
    h1 = Conv1D(8, 1, 1)(inp)
    out = Flatten()(h1)
    model = Model(inp, out)
    return model
and this model:
def make_model():
    inp = Input((sequence_length, n_features))
    h1 = Dense(8)(inp)
    out = Flatten()(h1)
    model = Model(inp, out)
    return model
are the same.
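The equivalence can be illustrated in NumPy: a convolution with kernel size 1 and stride 1 applies one shared weight matrix at every time step, which is what a Dense layer does on the last axis. The helper `conv1d_valid` below is a hypothetical hand-written convolution, not a Keras function.

```python
import numpy as np

def conv1d_valid(x, kernel, bias):
    """'valid' 1-D convolution, stride 1. x: (batch, time, in), kernel: (k, in, out)."""
    k = kernel.shape[0]
    steps = x.shape[1] - k + 1
    # Collect every window of k consecutive time steps: (batch, steps, k, in)
    windows = np.stack([x[:, t:t + k] for t in range(steps)], axis=1)
    return np.einsum("btki,kio->bto", windows, kernel) + bias

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 10, 4))        # (batch, sequence_length, n_features)
kernel = rng.normal(size=(1, 4, 8))    # kernel_size 1, 4 inputs, 8 outputs
bias = rng.normal(size=(8,))

conv_out = conv1d_valid(x, kernel, bias)
dense_out = x @ kernel[0] + bias       # Dense applied to the last axis

print(np.allclose(conv_out, dense_out), kernel.size + bias.size)  # True 40
```

Both variants use 40 parameters shared across all time steps, compared with 400 for the locally connected layer.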

Is it possible to train using same model with two inputs?

Hello, I have a question about Keras.
Currently I want to implement a network
that uses the same CNN model on two input images
and feeds the two CNN outputs to a Dense model.
For example:
def cnn_model():
    input = Input(shape=(None, None, 3))
    x = Conv2D(8, (3, 3), strides=(1, 1))(input)
    x = GlobalAvgPool2D()(x)
    model = Model(input, x)
    return model
def fc_model(cnn1, cnn2):
    input_1 = cnn1.output
    input_2 = cnn2.output
    input = concatenate([input_1, input_2])
    x = Dense(1, input_shape=(None, 16))(input)
    x = Activation('sigmoid')(x)
    model = Model([cnn1.input, cnn2.input], x)
    return model
def main():
    cnn1 = cnn_model()
    cnn2 = cnn_model()
    model = fc_model(cnn1, cnn2)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x=[image1, image2], y=[1.0, 1.0], batch_size=1, epochs=1)
I want to implement a model like this and train it,
but I get the following error message:
'All layer names should be unique'
What I actually want is to use a single CNN model as the feature extractor and then use the two feature vectors to predict one float value between 0.0 and 1.0.
So the whole system is:
take two images, extract features from both with the same CNN model, and pass the features to a Dense model to get one floating-point value.
Please help me implement this system and show me how to train it.
Thank you
See the section of the Keras documentation on shared layers:
https://keras.io/getting-started/functional-api-guide/
A code snippet from the documentation above demonstrating this:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
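Applied to the two-image case from the question, the shared-layer idea might look like the sketch below: one CNN instance is called on both inputs, so there is a single set of layer names and weights. The name `siamese_model`, the 32x32 random images and the labels are assumptions made up for illustration.

```python
import numpy as np
from tensorflow.keras import layers, Model

def cnn_model():
    inp = layers.Input(shape=(None, None, 3))
    x = layers.Conv2D(8, (3, 3), strides=(1, 1))(inp)
    x = layers.GlobalAveragePooling2D()(x)
    return Model(inp, x)

def siamese_model(cnn):
    image_1 = layers.Input(shape=(None, None, 3), name="image_1")
    image_2 = layers.Input(shape=(None, None, 3), name="image_2")
    # Calling the same model twice reuses its weights for both images.
    features_1 = cnn(image_1)
    features_2 = cnn(image_2)
    merged = layers.concatenate([features_1, features_2])
    score = layers.Dense(1, activation="sigmoid")(merged)
    return Model([image_1, image_2], score)

cnn = cnn_model()
model = siamese_model(cnn)
model.compile(optimizer="adam", loss="mean_squared_error")

# Made-up data: two pairs of random images with one target value per pair.
image1 = np.random.rand(2, 32, 32, 3).astype("float32")
image2 = np.random.rand(2, 32, 32, 3).astype("float32")
targets = np.array([[1.0], [0.0]], dtype="float32")

model.fit([image1, image2], targets, batch_size=1, epochs=1, verbose=0)
preds = model.predict([image1, image2], verbose=0)
print(preds.shape)
```

Because the feature extractor appears only once in the graph, gradients from both branches update the same convolution weights.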
