Inference model creation using a Keras model - keras

The decoder inference model is giving a "graph disconnected" error during inference.
# TRAINING WITH TEACHER FORCING
# Define an input sequence and process it.
encoder_inputs = Input(shape=(n_timesteps_in, n_features))
encoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
# We keep `LSTM_outputs` for attention and the states for the decoder.
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
attention = BahdanauAttention(LSTMoutputDimension, verbose=1)
decoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, name='decoder_lstm')
decoder_lstm1 = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='decoder_lstm1')
# Set up the decoder, using the encoder states as its initial state.
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=encoder_states)
context_vector, weights = attention(decoder_outputs, LSTM_outputs)
#context_vector = tf.expand_dims(context_vector, 1)
decoder_outputs2 = tf.concat([context_vector, decoder_outputs], axis=-1)
decoder_outputs, _, _ = decoder_lstm1(decoder_outputs2)
# Complete the decoder with a Dense softmax layer that predicts the next
# output. It emits a one-hot encoded representation, as for the input,
# so it has n_features neurons.
decoder_dense = Dense(n_features, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
The inference setup below is not working:
decoder_state_input_h = Input(shape=(LSTMoutputDimension,))
decoder_state_input_c = Input(shape=(LSTMoutputDimension,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
context_vector, weights = attention(decoder_outputs, LSTM_outputs)
decoder_outputs2 = tf.concat([context_vector, decoder_outputs], axis=-1)
decoder_outputs3, dh, dc = decoder_lstm1(decoder_outputs2)
deStates = [dh, dc]
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs3] + deStates)
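A possible repair, sketched under assumptions (the encoder_outputs_input placeholder below is introduced for this sketch, not part of the question): LSTM_outputs belongs to the encoder's graph, so a decoder model that consumes it without also receiving encoder_inputs is disconnected. Feeding the encoder's output sequence in through a fresh Input removes that hidden dependency:
# Encoder inference model: emit the full output sequence plus the states.
encoder_model = Model(encoder_inputs, [LSTM_outputs] + encoder_states)

# Fresh placeholders for everything the decoder needs at decode time.
encoder_outputs_input = Input(shape=(n_timesteps_in, LSTMoutputDimension),
                              name='encoder_outputs_input')
decoder_state_input_h = Input(shape=(LSTMoutputDimension,))
decoder_state_input_c = Input(shape=(LSTMoutputDimension,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

# Rebuild the decoder path against the placeholders only.
dec_out = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
context_vector, weights = attention(dec_out, encoder_outputs_input)
dec_out2 = tf.concat([context_vector, dec_out], axis=-1)
dec_out3, dh, dc = decoder_lstm1(dec_out2)
dec_out3 = decoder_dense(dec_out3)

decoder_model = Model(
    [decoder_inputs, encoder_outputs_input] + decoder_states_inputs,
    [dec_out3, dh, dc])
At decode time the encoder is run once and its output sequence is passed to the decoder on every step, so no tensor from the encoder graph is referenced directly.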

Related

Keras: how to see inside the black box of a model

I want to see tensor dimensions while the model is training, because the model trains as a black box.
Specifically, I want to see the dimension of the decoder input at each step as the decoder moves through the time series (it is a teacher-forcing decoder). But when I fit the model it only shows me accuracy, loss, epochs, and iterations, even if I set verbose to 1 or 2. The Keras code is below.
# TRAINING WITH TEACHER FORCING
# Define an input sequence and process it.
encoder_inputs = Input(shape=(n_timesteps_in, n_features))
encoder_lstm = LSTM(LSTMoutputDimension, return_state=True)
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
# We discard `LSTM_outputs` and only keep the other states.
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
decoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='decoder_lstm')
# Set up the decoder, using `context vector` as initial state.
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
# Complete the decoder by adding a Dense layer with softmax activation
# to predict the next output. The Dense layer outputs a one-hot encoded
# representation, as for the input, so it has n_features neurons.
decoder_dense = Dense(n_features, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
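fit() will only ever report the compiled metrics. Two lightweight ways to see shapes, sketched against the code above (the Model wiring is assumed, since the snippet stops before building one): model.summary() prints each layer's static shape, and K.print_tensor can be spliced in with a Lambda to print values on every batch.
from keras import backend as K
from keras.layers import Lambda
from keras.models import Model

# K.print_tensor returns a tensor identical to its input that prints
# whenever it is evaluated, so splicing it in with a Lambda makes the
# decoder output visible on every training batch.
decoder_outputs = Lambda(
    lambda t: K.print_tensor(t, message='decoder output = '))(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()  # static (batch, timesteps, features) shape of every layer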

Jointly optimizing autoencoder and fully connected network for classification

I have a large set of unlabeled data and a smaller set of labeled data. Thus, I would like to first train a variational autoencoder on the unlabeled data and then use the encoder for classification of three classes (with a fully connected layer attached) on the labeled data. For optimization of the hyperparameters I would like to use Optuna.
One possibility would be to first optimize the autoencoder and then optimize the fully connected network (classification) but then the autoencoder might learn an encoding which is meaningless for the classification.
Is there a possibility to jointly optimize the autoencoder and the fully connected network?
My autoencoder looks as follows (params is just a dictionary holding the parameters):
inputs = Input(shape=image_size, name='encoder_input')
x = inputs
for i in range(len(params["conv_filter_encoder"])):
    x, _ = convolutional_unit(x, params["conv_filter_encoder"][i],
                              params["conv_kernel_size_encoder"][i], params["strides_encoder"][i],
                              batchnorm=params["batchnorm"][i], dropout=params["dropout"][i],
                              maxpool=params["maxpool"][i], deconv=False)
shape = K.int_shape(x)
x = Flatten()(x)
x = Dense(params["inner_dim"], activation='relu')(x)
z_mean = Dense(params["latent_dim"], name='z_mean')(x)
z_log_var = Dense(params["latent_dim"], name='z_log_var')(x)

# use reparameterization trick to push the sampling out as input
# note that "output_shape" isn't necessary with the TensorFlow backend
z = Lambda(sampling, output_shape=(params["latent_dim"],), name='z')([z_mean, z_log_var])

# instantiate encoder model
encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')

# build decoder model
latent_inputs = Input(shape=(params["latent_dim"],), name='z_sampling')
x = Dense(params["inner_dim"], activation='relu')(latent_inputs)
x = Dense(shape[1] * shape[2] * shape[3], activation='relu')(x)
x = Reshape((shape[1], shape[2], shape[3]))(x)

len_batchnorm = len(params["batchnorm"])
len_dropout = len(params["dropout"])
for i in range(len(params["conv_filter_decoder"])):
    x, _ = convolutional_unit(x, params["conv_filter_decoder"][i],
                              params["conv_kernel_size_decoder"][i], params["strides_decoder"][i],
                              batchnorm=params["batchnorm"][len_batchnorm-i-1],
                              dropout=params["dropout"][len_dropout-i-1],
                              maxpool=None, deconv=True,
                              activity_regularizer=params["activity_regularizer"])

outputs = Conv2DTranspose(filters=1,
                          kernel_size=params["conv_kernel_size_decoder"][len(params["conv_kernel_size_decoder"])-1],
                          activation='sigmoid',
                          padding='same')(x)

# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')

# instantiate VAE model
outputs = decoder(encoder(inputs)[2])
vae = Model(inputs, outputs, name='vae')
vae.higgins_beta = K.variable(value=params["beta"])
loss = config["loss"].value

def vae_loss(x, x_decoded_mean):
    """VAE loss function"""
    # VAE loss = (mse_loss or xent_loss) + kl_loss
    if loss == Loss.mse.value:
        reconstruction_loss = mse(K.flatten(x), K.flatten(x_decoded_mean))
    elif loss == Loss.bce.value:
        reconstruction_loss = binary_crossentropy(K.flatten(x),
                                                  K.flatten(x_decoded_mean))
    else:
        raise ValueError("Loss unknown")
    reconstruction_loss *= image_size[0] * image_size[1]
    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    # kl_loss *= -0.5
    kl_loss *= -vae.higgins_beta
    vae_loss = K.mean(reconstruction_loss + kl_loss)
    return vae_loss

batch_size = params["batch_size"]
optimizer = keras.optimizers.Adam(lr=params["learning_rate"], beta_1=0.9, beta_2=0.999,
                                  epsilon=1e-08, decay=params["learning_rate_decay"])
vae.compile(loss=vae_loss, optimizer=optimizer)
vae.fit(train_X, train_X,
        epochs=config.CONFIG["n_epochs"],
        batch_size=batch_size,
        verbose=0,
        callbacks=get_callbacks(config.CONFIG, autoencoder_path, encoder, decoder, vae),
        shuffle=shuffle,
        validation_data=(valid_X, valid_X))
My fully connected network attached to the encoder looks as follows:
latent = vae.predict(images)[0]
inputs = Input(shape=(input_shape,), name='fc_input')
den = inputs
for i in range(len(self.params["units"])):
    den = Dense(self.params["units"][i])(den)
    den = Activation('relu')(den)
out = Dense(self.num_classes, activation='softmax')(den)
model = Model(inputs, out, name='fcnn')
optimizer = keras.optimizers.Adam(lr=self.mc.CONFIG["fcnn"]["learning_rate"], beta_1=0.9, beta_2=0.999,
                                  epsilon=1e-08, decay=self.mc.CONFIG["fcnn"]["learning_rate_decay"])
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model.fit(latent, y,
          epochs=self.params["n_epochs"],
          batch_size=self.params["batch_size"],
          verbose=0,
          shuffle=True)
y_prob = model.predict(latent)
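One way to optimize both jointly, sketched under assumptions and reusing the names from the VAE snippet above (num_classes, x_labeled, y_labeled, and the Optuna trial object are assumptions, not code from the question): attach the classifier head directly to z_mean and train a single two-output model, so the encoder is optimized for reconstruction and classification at once, with Optuna tuning the loss trade-off.
# Classifier head on the latent mean, one common choice.
clf = z_mean
for units in params["units"]:
    clf = Dense(units, activation='relu')(clf)
clf_out = Dense(num_classes, activation='softmax', name='clf')(clf)

# Two outputs: reconstruction and class probabilities. Optuna tunes the
# weight that trades the two losses off against each other.
joint = Model(inputs, [outputs, clf_out], name='vae_clf')
clf_weight = trial.suggest_float('clf_weight', 0.1, 10.0, log=True)
joint.compile(optimizer=optimizer,
              loss=[vae_loss, 'categorical_crossentropy'],
              loss_weights=[1.0, clf_weight])

# Trained on the labeled subset; the unlabeled data can still be used
# for a VAE-only pre-training phase beforehand.
joint.fit(x_labeled, [x_labeled, y_labeled],
          epochs=params["n_epochs"], batch_size=params["batch_size"])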

Add attention layer to Seq2Seq model

I have built an encoder-decoder Seq2Seq model and want to add an attention layer to it. I tried adding an attention layer through this but it didn't help.
Here is my initial code without attention:
# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
And this is the code after I added the attention layer to the decoder (the encoder is the same as in the initial code):
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
attention = dot([decoder_lstm, encoder_lstm], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_lstm], axes=[2,1])
decoder_combined_context = concatenate([context, decoder_lstm])
decoder_outputs, _, _ = decoder_combined_context(dec_emb,
                                                 initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
While doing this, I got an error:
Layer dot_1 was called with an input that isn't a symbolic tensor. Received type: <class 'keras.layers.recurrent.LSTM'>. Full input: [<keras.layers.recurrent.LSTM object at 0x7f8f77e2f3c8>, <keras.layers.recurrent.LSTM object at 0x7f8f770beb70>]. All inputs to the layer should be tensors.
Can someone please help in fitting an attention layer in this architecture?
The dot products need to be computed on tensor outputs, not on the layer objects. In the encoder you correctly define encoder_outputs; in the decoder you likewise have to add decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states).
The dot products are now:
attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_outputs], axes=[2, 1])
The concatenation doesn't take an initial_state; you pass that to your RNN layer instead, as in the decoder_lstm call above.
Here is the full example.
ENCODER + DECODER
# dummy variables
num_encoder_tokens = 30
num_decoder_tokens = 10
latent_dim = 100
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens, latent_dim, mask_zero = True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
DECODER w/ ATTENTION
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
attention = Activation('softmax')(attention)
context = dot([attention, encoder_outputs], axes=[2,1])
decoder_outputs = concatenate([context, decoder_outputs])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_dense)
model.summary()
Marco's answer above works, but one has to change the lines that involve the dot function in the second chunk: the Dot layer takes its axes in the constructor and is then called on the list of tensors as a single positional argument, as in TensorFlow's example here.
Finally, the chunk below includes the correction and will work:
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(num_decoder_tokens, latent_dim, mask_zero = True)
dec_emb = dec_emb_layer(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, state_h, state_c = decoder_lstm(dec_emb, initial_state=encoder_states)
attention = Dot(axes=[2, 2])([decoder_outputs, encoder_outputs])
attention = Activation('softmax')(attention)
context = Dot(axes=[2,1])([attention, encoder_outputs])
decoder_outputs = concatenate([context, decoder_outputs])
decoder_dense = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_dense)
model.summary()

LSTM Encoder-Decoder Inference Model

Many tutorials for a seq2seq encoder-decoder architecture based on LSTM (for example English-French translation) define the model as follows:
encoder_inputs = Input(shape=(None,))
en_x = Embedding(num_encoder_tokens, embedding_size)(encoder_inputs)

# Encoder LSTM
encoder = LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder(en_x)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
# French word embeddings
dex = Embedding(num_decoder_tokens, embedding_size)
final_dex = dex(decoder_inputs)

# Decoder LSTM
decoder_lstm = LSTM(50, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(final_dex,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# While training, the model takes English and French words and outputs
# the translated French word.
fullmodel = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# rmsprop is preferred for NLP tasks
fullmodel.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
fullmodel.fit([encoder_input_data, decoder_input_data], decoder_target_data,
              batch_size=128,
              epochs=100,
              validation_split=0.20)
Then, for prediction, they define inference models as follows:
# Define the encoder model.
encoder_model = Model(encoder_inputs, encoder_states)
encoder_model.summary()

# Redefine the decoder model; at prediction time the decoder receives
# these state inputs from the encoder.
decoder_state_input_h = Input(shape=(50,))
decoder_state_input_c = Input(shape=(50,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
final_dex2 = dex(decoder_inputs)
decoder_outputs2, state_h2, state_c2 = decoder_lstm(final_dex2, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2)

# The sampling model takes the encoder states and decoder_input (the seed,
# initially) and outputs the predictions (French word indices) plus the
# updated states for the next step.
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs2] + decoder_states2)
Then predict using:
# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate an empty target sequence of length 1.
    target_seq = np.zeros((1, 1))
    # Populate the first character of the target sequence with the start character.
    target_seq[0, 0] = target_token_index['START_']
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)
        # Sample a token.
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += ' ' + sampled_char
        # Exit condition: either hit max length
        # or find the stop character.
        if (sampled_char == '_END' or
                len(decoded_sentence) > 52):
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index
        # Update states.
        states_value = [h, c]
    return decoded_sentence
My question is: they trained the model named fullmodel to get the best weights, but in the prediction part they use the inference models named encoder_model and decoder_model, so they never seem to use any weights from fullmodel?!
I don't understand how they benefit from the trained model!
The trick is that everything is in the same variable scope, so the variables get reused.
If you look carefully, the trained layer weights are being reused.
For example, while creating decoder_model we use the decoder_lstm layer, which was defined as a part of the full model:
decoder_outputs2, state_h2, state_c2 = decoder_lstm(final_dex2, initial_state=decoder_states_inputs)
and the encoder model likewise uses the encoder_inputs and encoder_states defined previously:
encoder_model = Model(encoder_inputs, encoder_states)
Due to the architecture of the encoder-decoder model, we need to perform these implementation hacks.
Also, as the Keras documentation mentions: "With the functional API, it is easy to reuse trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just reusing the architecture of the model, you are also reusing its weights." For more details refer to https://keras.io/getting-started/functional-api-guide/#all-models-are-callable-just-like-layers
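A tiny standalone sketch (not from the question) that makes the reuse concrete: calling the same layer object on two different inputs creates two graph nodes that share a single set of weights.
from keras.layers import Dense, Input
from keras.models import Model

shared = Dense(4, name='shared')  # one layer object, one set of weights
a = Input(shape=(8,))
b = Input(shape=(8,))
model_a = Model(a, shared(a))
model_b = Model(b, shared(b))

# Both models hold the very same layer, so training model_a also changes
# what model_b predicts; this is exactly how training fullmodel trains
# the weights that encoder_model and decoder_model later use.
assert model_a.get_layer('shared') is model_b.get_layer('shared')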

Graph disconnect in inference in Keras RNN + Encoder/Decoder + Attention

I've successfully trained a model in Keras using an encoder/decoder structure + attention + GloVe, following several examples, most notably this one and this one. It's based on a modification of machine translation. This is a chatbot, so the input is words and so is the output. However, I've struggled to set up inference (prediction) properly and can't figure out how to get past a graph disconnect. My bidirectional RNN encoder/decoder with embedding and attention trains fine. I've tried modifying the decoder, but feel there is something obvious that I'm not seeing.
Here is the basic model:
from keras.models import Model
from keras.layers.recurrent import LSTM
from keras.layers import Dense, Input, Embedding, Bidirectional, RepeatVector, concatenate, Concatenate

## PARAMETERS
HIDDEN_UNITS = 100
encoder_max_seq_length = 1037  # maximum size of input sequence
decoder_max_seq_length = 187   # maximum size of output sequence
num_encoder_tokens = 6502      # a.k.a. the size of the input vocabulary
num_decoder_tokens = 4802      # a.k.a. the size of the output vocabulary

## ENCODER
encoder_inputs = Input(shape=(encoder_max_seq_length, ), name='encoder_inputs')
encoder_embedding = Embedding(input_dim=num_encoder_tokens,
                              output_dim=HIDDEN_UNITS,
                              input_length=encoder_max_seq_length,
                              weights=[embedding_matrix],
                              name='encoder_embedding')(encoder_inputs)
encoder_lstm = Bidirectional(LSTM(units=HIDDEN_UNITS,
                                  return_sequences=True,
                                  name='encoder_lstm'))(encoder_embedding)

## ATTENTION
attention = AttentionL(encoder_max_seq_length)(encoder_lstm)
attention = RepeatVector(decoder_max_seq_length)(attention)

## DECODER
decoder_inputs = Input(shape=(decoder_max_seq_length, num_decoder_tokens),
                       name='decoder_inputs')
merge = concatenate([attention, decoder_inputs])
decoder_lstm = Bidirectional(LSTM(units=HIDDEN_UNITS*2,
                                  return_sequences=True,
                                  name='decoder_lstm'))(merge)
decoder_dense = Dense(units=num_decoder_tokens,
                      activation='softmax',
                      name='decoder_dense')(decoder_lstm)
decoder_outputs = decoder_dense

## Configure the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.load_weights('trained_models/viable-2/word-weights.h5')
encoder_model = Model(encoder_inputs, attention)
model.compile(loss='categorical_crossentropy', optimizer='adam')
Here is where I run into trouble:
## INFERENCE decoder setup
decoder_inputs_2 = Concatenate()([decoder_inputs, attention])
decoder_lstm = Bidirectional(LSTM(units=HIDDEN_UNITS*2, return_state = True, name='decoder_lstm'))
decoder_outputs, forward_h, forward_c, backward_h, backward_c = decoder_lstm(decoder_inputs_2)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
decoder_states = [state_h, state_c]
decoder_dense = Dense(units=num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs, attention], [decoder_outputs] + decoder_states)
This generates a graph disconnection error: Graph disconnected: cannot obtain value for tensor Tensor("encoder_inputs_61:0", shape=(?, 1037), dtype=float32) at layer "encoder_inputs". The following previous layers were accessed without issue: []
It should be possible to do inference like this, but I can't get past this error. It isn't possible for me to simply add decoder_output and attention together because they're of different shapes.
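One way past the disconnect, sketched under assumptions (the placeholder shape below is an assumption: AttentionL is presumed to emit a 2*HIDDEN_UNITS vector before RepeatVector): `attention` is an intermediate tensor that still depends on `encoder_inputs`, so it cannot serve as a Model input itself. A fresh Input of the same shape can stand in for it, with encoder_model's output fed in at prediction time:
# Placeholder standing in for the attention tensor at inference time.
attention_input = Input(shape=(decoder_max_seq_length, HIDDEN_UNITS*2),
                        name='attention_input')
decoder_inputs_2 = Concatenate()([decoder_inputs, attention_input])
decoder_outputs, forward_h, forward_c, backward_h, backward_c = decoder_lstm(decoder_inputs_2)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs, attention_input],
                      [decoder_outputs, state_h, state_c])

# At prediction time, run the encoder first and feed its output in:
# attn = encoder_model.predict(input_seq)
# out, h, c = decoder_model.predict([target_seq, attn])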
