Keras how to see in the black box of a model - keras

I wanted to see dimensions while model is training because model trains in a black box.
I wanted to see dimension of decoder input iteratively as decoder moves in time series. It is a Teacher forcing decoder. But when i fit model it just show me accuracy, loss, epochs and iterations even if i appy verbose 1 or 2. Code is below in keras.
# TRAINING WITH TEACHER FORCING
# Define an input sequence and process it.
encoder_inputs= Input(shape=(n_timesteps_in, n_features))
encoder_lstm=LSTM(LSTMoutputDimension, return_state=True)
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
# We discard `LSTM_outputs` and only keep the other states.
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
decoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='decoder_lstm')
# Set up the decoder, using `context vector` as initial state.
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
initial_state=encoder_states)
#complete the decoder model by adding a Dense layer with Softmax activation function
#for prediction of the next output
#Dense layer will output one-hot encoded representation as we did for input
#Therefore, we will use n_features number of neurons
decoder_dense = Dense(n_features, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)

Related

Inferece model creation using model keras

Decoder model inference is giving error graph disconnected error during inference Decoder model inference is giving error graph disconnected error during inference Decoder model inference is giving error graph disconnected error during inference
# TRAINING WITH TEACHER FORCING
# Define an input sequence and process it.
encoder_inputs= Input(shape=(n_timesteps_in, n_features))
encoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
# We discard `LSTM_outputs` and only keep the other states.
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
attention= BahdanauAttention(LSTMoutputDimension, verbose = 1)
decoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, name='decoder_lstm')
decoder_lstm1 = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='decoder_lstm1')
# Set up the decoder, using `context vector` as initial state.
decoder_outputs= decoder_lstm(decoder_inputs,initial_state=encoder_states)
context_vector, weights = attention(decoder_outputs, LSTM_outputs)
#context_vector = tf.expand_dims(context_vector, 1)
decoder_outputs2 = tf.concat([context_vector, decoder_outputs], axis=-1)
decoder_outputs, s, h = decoder_lstm1(decoder_outputs2)
#complete the decoder model by adding a Dense layer with Softmax activation function
#for prediction of the next output
#Dense layer will output one-hot encoded representation as we did for input
#Therefore, we will use n_features number of neurons
decoder_dense = Dense(n_features, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
Inference given under is not working
decoder_state_input_h = Input(shape=(LSTMoutputDimension,))
decoder_state_input_c = Input(shape=(LSTMoutputDimension,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
context_vector, weights = attention(decoder_outputs, LSTM_outputs)
decoder_outputs2 = tf.concat([context_vector, decoder_outputs], axis=-1)
decoder_outputs3, dh, dc = decoder_lstm1(decoder_outputs2)
deStates = [dh, dc]
decoder_model = Model( [decoder_inputs] + decoder_states_inputs, [decoder_outputs3] +deStates)

Negative loss values - Seq2seq model Keras

I'm trying to build a chatbot using the seq2seq model in Keras. I have used the standard seq2seq model specified in the Keras Blog. I have used Word2vec for word embeddings. My problem is that I am getting negative values for loss while training.
Why is this happening and how can I fix it ? Thanks.
from keras.models import Model
from keras.layers import Input, LSTM, Dense
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
batch_size=batch_size,
epochs=epochs,
validation_split=0.2)
I found the reason of the negative values.
mainly because the vector representation from word2vec contains negative values so the loss function does not calculate the loss correctly
Instead of using the vector representation in the decoder_target_data, use one hot encoder of the vocabulary of all the words in the word2vec model

LSTM Encoder-Decoder Inference Model

Many tutorials for seq2seq encoder-decoder architecture based on LSTM, (for example English-French translation), define the model as follow:
encoder_inputs = Input(shape=(None,))
en_x= Embedding(num_encoder_tokens, embedding_size)(encoder_inputs)
# Encoder lstm
encoder = LSTM(50, return_state=True)
encoder_outputs, state_h, state_c = encoder(en_x)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
# french word embeddings
dex= Embedding(num_decoder_tokens, embedding_size)
final_dex= dex(decoder_inputs)
# decoder lstm
decoder_lstm = LSTM(50, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(final_dex,
initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# While training, model takes eng and french words and outputs #translated french word
fullmodel = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# rmsprop is preferred for nlp tasks
fullmodel.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
fullmodel.fit([encoder_input_data, decoder_input_data], decoder_target_data,
batch_size=128,
epochs=100,
validation_split=0.20)
Then for prediction, they define infernce models as follow:
# define the encoder model
encoder_model = Model(encoder_inputs, encoder_states)
encoder_model.summary()
# Redefine the decoder model with decoder will be getting below inputs from encoder while in prediction
decoder_state_input_h = Input(shape=(50,))
decoder_state_input_c = Input(shape=(50,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
final_dex2= dex(decoder_inputs)
decoder_outputs2, state_h2, state_c2 = decoder_lstm(final_dex2, initial_state=decoder_states_inputs)
decoder_states2 = [state_h2, state_c2]
decoder_outputs2 = decoder_dense(decoder_outputs2)
# sampling model will take encoder states and decoder_input(seed initially) and output the predictions(french word index) We dont care about decoder_states2
decoder_model = Model(
[decoder_inputs] + decoder_states_inputs,
[decoder_outputs2] + decoder_states2)
Then predict using:
# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
(i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
(i, char) for char, i in target_token_index.items())
def decode_sequence(input_seq):
# Encode the input as state vectors.
states_value = encoder_model.predict(input_seq)
# Generate empty target sequence of length 1.
target_seq = np.zeros((1,1))
# Populate the first character of target sequence with the start character.
target_seq[0, 0] = target_token_index['START_']
# Sampling loop for a batch of sequences
# (to simplify, here we assume a batch of size 1).
stop_condition = False
decoded_sentence = ''
while not stop_condition:
output_tokens, h, c = decoder_model.predict(
[target_seq] + states_value)
# Sample a token
sampled_token_index = np.argmax(output_tokens[0, -1, :])
sampled_char = reverse_target_char_index[sampled_token_index]
decoded_sentence += ' '+sampled_char
# Exit condition: either hit max length
# or find stop character.
if (sampled_char == '_END' or
len(decoded_sentence) > 52):
stop_condition = True
# Update the target sequence (of length 1).
target_seq = np.zeros((1,1))
target_seq[0, 0] = sampled_token_index
# Update states
states_value = [h, c]
return decoded_sentence
My question is, they trained the model with the name 'fullmodel' to get best weights ... in prediction part, they used the inference models with names (encoder_model & decoder_model) ... so they didn't use any weights from the 'fullmodel' ?!
I don't understand how they benefit from the trained model!
The trick is that everything is in the same variable scope, so the variables got reused.
If you notice carefully, the trained layer weights are being reused.
For example, while creating decoder_model we use decoder_lstm layer which was defined as a part of full model,
decoder_outputs2, state_h2, state_c2 = decoder_lstm(final_dex2, initial_state=decoder_states_inputs),
and encoder model too uses, encoder_inputs and encoder_states layer defined previously.
encoder_model = Model(encoder_inputs, encoder_states)
Due to the architecture of the encoder-decoder model, we need to perform these implementations hacks.
Also, as the keras documentation mentions, With the functional API, it is easy to reuse trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just reusing the architecture of the model, you are also reusing its weights. For more details refer - https://keras.io/getting-started/functional-api-guide/#all-models-are-callable-just-like-layers

Is it possible to make longer output length with keras?

I want to predict some index related to space weather(kp, Dst,etc..) with RNN or LSTM. It was possible to build many to one model although it shows poor accuracy. However, my goal is to predict 7 days in the future with last 3 days observation.
The question is, is it functionally possible to build RNN which has longer output length(or timestep?) than input?
Any help would be greatly appreciated! Please help me.
You can. One way of doing it is using the so called "sequence to sequence" architecture. It enables you to input a sequence of data points and predict a sequence of values. It works by encoding the input with a fixed size vector (usually the last hidden state of an LSTM) and then using it as starting state for another LSTM that is unrolled for n time steps.
If you have labels for all the 7 steps you want to predict then you can use a model as the following one where the decoder takes as input the labels as input for each time step 0 <= t < n:
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
As you can see the first LSTM encode the sequence with len num_encoder_token and we only keep its last state [encoder_states = [state_h, state_c]] and use it as initialization for the second LSTM (decoder_lstm) that is then fed with the labels and try to predict the label vector right shifted of 1.
For example if you have as input sequence day1, day2, day3 and you want to predict day4, day5, ..., day10, then you will feed the encoder with day1 to day3 and use day4 ... day9 as input for the decoder. The decoder label then will be day5 ... day10.
You can read more about this kind of model at https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html.

Keras seq2seq padding

I am working on seq2seq chatbot. I would ask you, how to ignore PAD symbols in chatbots responses while val_acc is counting.
For example, my model generates response: [I, am, reading, a, book, PAD, PAD, PAD, PAD, PAD]
But, right response should be: [My, brother, is, playing, fotball,PAD, PAD, PAD, PAD, PAD].
In this case, chatbot responded totally wrong, but val_acc is 50% because of padding symbols.
I use Keras, encoder-decoder model (https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html) with teacher forcing
My code is here:
encoder_inputs = Input(shape=(sentenceLength,), name="Encoder_input")
encoder = LSTM(n_units, return_state=True, name='Encoder_lstm')
Shared_Embedding = Embedding(output_dim=embedding, input_dim=vocab_size, name="Embedding", mask_zero='True')
word_embedding_context = Shared_Embedding(encoder_inputs)
encoder_outputs, state_h, state_c = encoder(word_embedding_context)
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None,), name="Decoder_input")
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True, name="Decoder_lstm")
word_embedding_answer = Shared_Embedding(decoder_inputs)
decoder_outputs, _, _ = decoder_lstm(word_embedding_answer, initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax', name="Dense_layer")
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
Encoder input is sentence where each word is integer and 0 is padding: [1,2,5,4,3,0,0,0] -> User question
Decoder input is also sentence where each word is integer, 0 is padding and 100 is symbol GO: [100,8,4,2,0,0,0,0,0]] ->chatbot response shifted one timestamp
decoder output is sentence, where words are integers, and these integers are one hot encoded: [8,4,2,0,0,0,0,0, 0]] ->chatbot response (integers are one hot encoded.)
Problem is, that val_acc is too hight, also whan model predicts totaly wrong sentences. I think that it is caused because of paddings. Is there something wrong with my model? Should I add some another mask to my decoder?
Here is my graphs:
You are correct, it is because that tutorial doesn't use Masking (documentation) to ignore those padding values and shows examples of equal input output length. In your case, the model will still input output PAD but the mask will ignore them. For example, to mask the encoder:
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_inputs = Masking()(encoder_inputs) # Assuming PAD is zeros
encoder = LSTM(latent_dim, return_state=True)
# Now the LSTM will ignore the PADs when encoding
# by skipping those timesteps that are masked
encoder_outputs, state_h, state_c = encoder(encoder_inputs)

Resources