Create unsupervised embedding model in keras? - keras

I want to create an autoencoder with the following architecture:
path_source_token_input = Input(shape=(MAX_CONTEXTS,), dtype=tf.int32, name='source_token_input')
path_input = Input(shape=(MAX_CONTEXTS,), dtype=tf.int32, name='path_input')
path_target_token_input = Input(shape=(MAX_CONTEXTS,), dtype=tf.int32, name='target_token_input')
paths_embedded = Embedding(PATH_SIZE, DEFAULT_EMBEDDINGS_SIZE, name='path_embedding')(path_input)
token_embedding_shared_layer = Embedding(TOKEN_SIZE, DEFAULT_EMBEDDINGS_SIZE, name='token_embedding')
path_source_token_embedded = token_embedding_shared_layer(path_source_token_input)
path_target_token_embedded = token_embedding_shared_layer(path_target_token_input)
context_embedded = Concatenate()([path_source_token_embedded, paths_embedded, path_target_token_embedded]) # --> this up to now, is the output of the STANDALONE embedding model
-------- SPLIT HERE? ------
context_after_dense = TimeDistributed(Dense(CODE_VECTOR_SIZE, use_bias=False, activation='tanh'))(context_embedded) # in short, this layer probably has to stay
encoded = LSTM(100, activation='relu', input_shape=context_after_dense.shape)(context_after_dense)
decoded = RepeatVector(MAX_CONTEXTS)(encoded)
decoded = LSTM(100, activation='relu', return_sequences=True)(decoded)
result = TimeDistributed(Dense(1), name='PROBLEM_is_here')(decoded) # this seems to be some trick according to https://github.com/keras-team/keras/issues/10753, so probably don't remove
inputs = (path_source_token_input, path_input, path_target_token_input)
model = tf.keras.Model(inputs=inputs, outputs=result)
So far I have learned that it is impossible to implement an inversed embedding layer in the decoder, so naturally my conclusion is to split my network in two: one to generate concatenated embeddings of input, and the second part would be the autoencoder itself with input as concatenated embeddings (output of first part). Now my question is, is it possible to create an unsupervised embedding model in keras? or anywhere else for that matter. My data is unlabeled and the point of my final neural network would be to create clusters of said unlabeled data.

Related

How do I decode the output of my seq-to-seq model if I'm using an embedding layer?

I have a seq to seq model trained of some clever bot data:
justphrases_X is a list of sentences and justphrases_Y is a list of responses to those sentences.
maxlen = 62
#low is a list of all the unique words.
def Convert_To_Encoding(just_phrases):
encodings = []
for sentence in just_phrases:
onehotencoded = one_hot(sentence, len(low))
encodings.append(np.array(onehotencoded))
encodings_padded = pad_sequences(encodings, maxlen=maxlen, padding='post', value = 0.0)
return encodings_padded
encodings_X_padded = Convert_To_Encoding(just_phrases_X)
encodings_y_padded = Convert_To_Encoding(just_phrases_y)
model = Sequential()
embedding_layer = Embedding(len(low), output_dim=8, input_length=maxlen)
model.add(embedding_layer)
model.add(GRU(128)) # input_shape=(None, 496)
model.add(RepeatVector(numberofwordsoutput)) #number of characters?
model.add(GRU(128, return_sequences = True))
model.add(Flatten())
model.add(Dense(62, activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer= 'adam', metrics=['accuracy'])
model.summary()
model.fit(encodings_X_padded, encodings_y_padded, batch_size = 1, epochs=1) #, validation_data = (testX, testy)
model.save("cleverbottheseq-uel.h5")
When I use this model for prediction, the output will be between 0 and 1 because of my use of softmax. However as I have around 3000 unique words, each with a separate integer assigned to it, how do I essentially repeat what the model did during training and convert the output back to an integer which has a word assigned to it?
I dont think it is possible to create seq2seq with Sequential API. Try to create encoder and decoder separately with Functional API. You need two inputs - first for encoder, second - for decoder.

How to stack same RNN for every layer?

I would like to know how to stack many layers of RNN but every layer are the same RNN. I want every layer share the same weight. I have read stack LSTM and RNN, but I found that each layer was not the same.
1 layer code:
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
first_layer_output = first_layer(Emb_output)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(first_layer_output )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
enter image description here
RNN 1 layer
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=True,stateful =True)
first_layer_output = first_layer(Emb_output)
first_layer_state = first_layer.states
second_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
second_layer_set_state = second_layer(first_layer_output, initial_state=first_layer_state)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(second_layer_set_state )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
enter image description here
Stack RNN 2 layer.
For example, I want to build two layers RNN, but the first layer and the second must have the same weight, such that when I update the weight in the first layer the second layer must be updated and share the same value. As far as I know, TF has RNN.state. It returns the value from the previous layer. However, when I use this, it seems that each layer is treated independently. The 2-layer RNN that I want should have trainable parameters equal to the 1-layer since they shared the same weight, but this did not work.
You can view the layer object as a container for the weights that knows how to apply the weights. You can use the layer object as many times as you want. Assuming the embedding and the RNN dimension are the same, you can do:
states = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden, use_bias=True, return_sequences=True)
for _ in range(10):
states = first_layer(states)
There is no reason to set stateful to true. This is used when you split long sequences into multiple batches and what the RNN to remember the state between batches, so you do not have yo manually set initial states. You can get the final state of the RNN (that you wany you want to use for classification) by simply indexing the last position from states.

Keras gets None gradient error when connecting models

I’m trying to implement a Visual Storytelling model using Keras with a hierarchical RNN model, basically Neural Image Captioner style but over a sequence of photos with a bidirectional RNN on top of the decoder RNNs.
I implemented and tested the three parts of this model, CNN, BRNN and decoder RNN separately but got this error when trying to connect them:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My code are as follows:
#vgg16 model with the fc2 layer as output
cnn_base_model = self.cnn_model.base_model
brnn_model = self.brnn_model.model
rnn_model = self.rnn_model.model
cnn_part = TimeDistributed(cnn_base_model)
img_input = Input((self.story_length,) + self.cnn_model.input_shape, name='brnn_img_input')
extracted_feature = cnn_part(img_input)
#[None, 5, 512], a 512 length vector for each picture in the story
brnn_feature = brnn_model(extracted_feature)
#[None, 5, 25], input groundtruth word indices fed as input when training
decoder_input = Input((self.story_length, self.max_length), name='brnn_decoder_input')
decoder_outputs = []
for i in range(self.story_length):
#separate timesteps for decoding
decoder_input_i = Lambda(lambda x: x[:, i, :])(decoder_input)
brnn_feature_i = Lambda(lambda x: x[:, i, :])(brnn_feature)
#the problem persists when using Dense instead of the Lambda layers above
#decoder_input_i = Dense(25)(Reshape((125,))(decoder_input))
#brnn_feature_i = Dense(512)(Reshape((5 * 512,))(brnn_feature))
decoder_output_i = rnn_model([decoder_input_i, brnn_feature_i])
decoder_outputs.append(decoder_output_i)
decoder_output = Concatenate(axis=-2, name='brnn_decoder_output')(decoder_outputs)
self.model = Model([img_input, decoder_input], decoder_output)
And codes for the BRNN:
image_feature = Input(shape=(self.story_length, self.img_feature_dim,))
image_emb = TimeDistributed(Dense(self.lstm_size))(image_feature)
brnn = Bidirectional(LSTM(self.lstm_size, return_sequences=True), merge_mode='concat')(image_emb)
brnn_emb = TimeDistributed(Dense(self.lstm_size))(brnn)
self.model = Model(inputs=image_feature, outputs=brnn_emb)
And RNN:
#[None, 512], the vector to be decoded
initial_input = Input(shape=(self.input_dim,), name='rnn_initial_input')
#[None, 25], the groundtruth word indices fed as input when training
decoder_inputs = Input(shape=(None,), name='rnn_decoder_inputs')
decoder_input_masking = Masking(mask_value=0.0)(decoder_inputs)
decoder_input_embeddings = Embedding(self.vocabulary_size, self.emb_size,
embeddings_regularizer=l2(regularizer))(decoder_input_masking)
decoder_input_dropout = Dropout(.5)(decoder_input_embeddings)
initial_emb = Dense(self.emb_size,
kernel_regularizer=l2(regularizer))(initial_input)
initial_reshape = Reshape((1, self.emb_size))(initial_emb)
initial_masking = Masking(mask_value=0.0)(initial_reshape)
initial_dropout = Dropout(.5)(initial_masking)
decoder_lstm = LSTM(self.hidden_dim, return_sequences=True, return_state=True,
recurrent_regularizer=l2(regularizer),
kernel_regularizer=l2(regularizer),
bias_regularizer=l2(regularizer))
_, initial_hidden_h, initial_hidden_c = decoder_lstm(initial_dropout)
decoder_outputs, decoder_state_h, decoder_state_c = decoder_lstm(decoder_input_dropout,
initial_state=[initial_hidden_h, initial_hidden_c])
decoder_output_dense_layer = TimeDistributed(Dense(self.vocabulary_size, activation='softmax',
kernel_regularizer=l2(regularizer)))
decoder_output_dense = decoder_output_dense_layer(decoder_outputs)
self.model = Model([decoder_inputs, initial_input], decoder_output_dense)
I’m using adam as optimizer and sparse_categorical_crossentropy as loss.
At first I thought the problem is with the Lambda layers used for splitting the timesteps but the problem persists when I replaced them with Dense layers (which are guarantee
I had a similar error and it turned out I was suppose to build the layers (in my custom layer or model) in the init() like so:
self.lstm_custom_1 = keras.layers.LSTM(128,batch_input_shape=batch_input_shape, return_sequences=False,stateful=True)
self.lstm_custom_1.build(batch_input_shape)
self.dense_custom_1 = keras.layers.Dense(32, activation = 'relu')
self.dense_custom_1.build(input_shape=(batch_size, 128))```
The issue is actually with the Embedding layer, I think. Gradients can't pass through an Embedding layer, so unless it's the first layer in the model it won't work.

Is it possible to train using same model with two inputs?

Hello I have a some question for keras.
currently i want implement some network
using same cnn model, and use two images as input of cnn model
and use two result of cnn model, provide to Dense model
for example
def cnn_model():
input = Input(shape=(None, None, 3))
x = Conv2D(8, (3, 3), strides=(1, 1))(input)
x = GlobalAvgPool2D()(x)
model = Model(input, x)
return model
def fc_model(cnn1, cnn2):
input_1 = cnn1.output
input_2 = cnn2.output
input = concatenate([input_1, input_2])
x = Dense(1, input_shape=(None, 16))(input)
x = Activation('sigmoid')(x)
model = Model([cnn1.input, cnn2.input], x)
return model
def main():
cnn1 = cnn_model()
cnn2 = cnn_model()
model = fc_model(cnn1, cnn2)
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x=[image1, image2], y=[1.0, 1.0], batch_size=1, ecpochs=1)
i want to implement model something like this, and train models
but i got error message like below :
'All layer names should be unique'
Actually i want use only one CNN model as feature extractor and finally use two features to predict one float value as 0.0 ~ 1.0
so whole system -->>
using two images and extract features from same CNN model, and features are provided to Dense model to get one floating value
Please, help me implement this system and how to train..
Thank you
See the section of the Keras documentation on shared layers:
https://keras.io/getting-started/functional-api-guide/
A code snippet from the documentation above demonstrating this:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)

Classify a sequence using LSTM in keras

I am working on a binary classification problem, where the network takes two inputs and output the label of this input pair.
Basically, I use an encoder layer to do embedding first and concatenate the embedding results. Next, I am going to use RNN structure to classify the concatenated result. But I can't figure out a proper way to write the code. I attach my code below.
input_size = n_feature # the number of features
encoder_size = 2000 # output dim for each encoder
dropout_rate = 0.5
X1 = Input(shape=(input_size, ), name='input_1')
X2 = Input(shape=(input_size, ), name='input_2')
encoder = Sequential()
encoder.add(Dropout(dropout_rate, input_shape=(input_size, )))
encoder.add(Dense(encoder_size, activation='relu'))
encoded_1 = encoder(X1)
encoded_2 = encoder(X2)
merged = concatenate([encoded_1, encoded_2])
#----------Need Help---------------#
comparer = Sequential()
comparer.add(LSTM(512, input_shape=(encoder_size*2, ), return_sequences=True))
comparer.add(Dropout(dropout_rate))
comparer.add(TimeDistributed(Dense(1)))
comparer.add(Activation('sigmoid'))
#----------Need Help---------------#
Y = comparer(merged)
model = Model(inputs=[X1, X2], outputs=Y)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
It seems for the LSTM layer, the input should be (None, encoder_size*2). I tried to use Y = comparer(K.transpose(merged)) to reshape the input for the LSTM layer but I failed. BTW, for this network, the input shape is (input_size,) and output shape is (1,).
If the idea is to transform the input vector in a time series, you can simply reshape it:
comparer = Sequential()
#reshape the vector into a time series form: (None, timeSteps, features)
comparer.add(Reshape((2 * encoder_size,1), input_shape=(2*encoder_size,))
#don't return sequences, you don't want a sequence as result:
comparer.add(LSTM(512, return_sequences=False))
comparer.add(Dropout(dropout_rate))
#Don't use a TimeDistributed, you're not dealing with a series anymore
comparer.add(Dense(1))
comparer.add(Activation('sigmoid'))

Resources