I am using 9 features and 18 time steps in the past to forecast 3 values in the future:
lookback = 18
forecast = 3
n_features_X = 9
n_features_Y = 1
My code is:
# Encoder
past_inputs = tf.keras.Input(shape=(lookback, n_features_X), name='past_inputs')
encoder = tf.keras.layers.LSTM(128, return_state=True)
encoder_outputs, state_h, state_c = encoder(past_inputs)
# Decoder
future_inputs = tf.keras.Input(shape=(forecast, n_features_Y), name='future_inputs')
decoder_lstm = tf.keras.layers.LSTM(128, return_sequences=True)
x = decoder_lstm(future_inputs, initial_state=[state_h, state_c])
output = tf.keras.layers.Dense(1, activation='linear')(x)
# Create the model
model = tf.keras.models.Model(inputs=[past_inputs, future_inputs], outputs=output)
The model looks like this
I am afraid the problem is with this line:
future_inputs = tf.keras.Input(shape=(forecast, n_features_Y), name='future_inputs')
The error I am getting is:
AssertionError: Could not compute output Tensor("dense_23/Identity:0", shape=(None, 3, 1), dtype=float32)
Any ideas on how to correctly implement this?
Related
Decoder model inference is giving error graph disconnected error during inference Decoder model inference is giving error graph disconnected error during inference Decoder model inference is giving error graph disconnected error during inference
# TRAINING WITH TEACHER FORCING
# Define an input sequence and process it.
encoder_inputs= Input(shape=(n_timesteps_in, n_features))
encoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
# We discard `LSTM_outputs` and only keep the other states.
encoder_states = [state_h, state_c]
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
attention= BahdanauAttention(LSTMoutputDimension, verbose = 1)
decoder_lstm = LSTM(LSTMoutputDimension, return_sequences=True, name='decoder_lstm')
decoder_lstm1 = LSTM(LSTMoutputDimension, return_sequences=True, return_state=True, name='decoder_lstm1')
# Set up the decoder, using `context vector` as initial state.
decoder_outputs= decoder_lstm(decoder_inputs,initial_state=encoder_states)
context_vector, weights = attention(decoder_outputs, LSTM_outputs)
#context_vector = tf.expand_dims(context_vector, 1)
decoder_outputs2 = tf.concat([context_vector, decoder_outputs], axis=-1)
decoder_outputs, s, h = decoder_lstm1(decoder_outputs2)
#complete the decoder model by adding a Dense layer with Softmax activation function
#for prediction of the next output
#Dense layer will output one-hot encoded representation as we did for input
#Therefore, we will use n_features number of neurons
decoder_dense = Dense(n_features, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
Inference given under is not working
decoder_state_input_h = Input(shape=(LSTMoutputDimension,))
decoder_state_input_c = Input(shape=(LSTMoutputDimension,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
context_vector, weights = attention(decoder_outputs, LSTM_outputs)
decoder_outputs2 = tf.concat([context_vector, decoder_outputs], axis=-1)
decoder_outputs3, dh, dc = decoder_lstm1(decoder_outputs2)
deStates = [dh, dc]
decoder_model = Model( [decoder_inputs] + decoder_states_inputs, [decoder_outputs3] +deStates)
This is my model
engine1 = tf.keras.applications.Xception(
# Freezing the weights of the top layer in the InceptionResNetV2 pre-traiined model
include_top = False,
# Use Imagenet weights
weights = 'imagenet',
# Define input shape to 224x224x3
input_shape = (256, 256 , 3)
)
x1 = tf.keras.layers.GlobalAveragePooling2D(name = 'avg_pool')(engine1.output)
x1 =tf.keras.layers.Dropout(0.75)(x1)
x1 = tf.keras.layers.BatchNormalization(
axis=-1,
momentum=0.99,
epsilon=0.01,
center=True,
scale=True,
beta_initializer="zeros",
gamma_initializer="ones",
moving_mean_initializer="zeros",
moving_variance_initializer="ones",
)(x1)
out1 = tf.keras.layers.Dense(3, activation = 'softmax', name = 'dense_output')(x1)
# Build the Keras model
model1 = tf.keras.models.Model(inputs = engine1.input, outputs = out1)
# Compile the model
model1.compile(
# Set optimizer to Adam(0.0001)
optimizer = tf.keras.optimizers.Adam(learning_rate= 3e-4),
#optimizer= SGD(lr=0.001, decay=1e-6, momentum=0.99, nesterov=True),
# Set loss to binary crossentropy
#loss = tf.keras.losses.SparseCategoricalCrossentropy(),
loss = 'categorical_crossentropy',
# Set metrics to accuracy
metrics = ['accuracy']
)
I want logits so I wrote this
logits = model1(X_test)
probs = tf.nn.softmax(logits)
Getting error as
ResourceExhaustedError: OOM when allocating tensor with shape[1288,64,125,125] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Conv2D]
How to fix this and get the logits? I want to apply the distillation method after getting the logits. My test set consists of 3 classes and 60 samples.
so logit matrix should be a matrix of 60 * 3.
Edit
To get the logits(1288 * 3) I made a change in the output layer of my model
out1 = tf.keras.layers.Dense(3, activation = 'linear', name = 'dense_output')(x1)
Now I am getting logits,
y_pred_logits = model1.predict(X_test)
I want to apply softmax on this, My softmax function looks like this,
def softmax(x):
"""Compute softmax values for each sets of scores in x."""
e_x = np.exp(x)
return e_x / e_x.sum(axis=1)
But when I am doing this
y_pred_logits_activated = softmax(y_pred_logits)
Getting errors as
How to fix this and is this method correct? Further, I want to apply this on logits
I have a keras model which is trained on 5 classes,The final layers of the model look like so
dr_steps = Dropout(0.25)(Dense(128, activation = 'relu')(gap_dr))
out_layer = Dense(5, activation = 'softmax')(dr_steps)
model = Model(inputs = [in_lay], outputs = [out_layer])
What I want to do is fine tune this model on an 8 class multilabel problem but I am not sure how to achieve this. This is what I have tried:
dr_steps = Dropout(0.25)(Dense(128, activation = 'relu')(gap_dr))
out_layer = Dense(t_y.shape[-1], activation = 'softmax')(dr_steps)
model = Model(inputs = [in_lay], outputs = [out_layer])
weights_path = 'weights.best.hdf5'
retina_model.load_weights(weights_path)
model.layers.pop()
output = Dense(8, activation = 'sigmoid')(model.layers[-1].output)
model = Model(inputs = [in_lay], outputs = [output])
loss = 'binary_crossentropy'
model.compile(optimizer = RAdam(), loss = FocalLoss,
metrics = ["binary_accuracy",precision, recall,auc])
but this will raise an error like this
raise ValueError(str(e))
ValueError: Dimension 1 in both shapes must be equal, but are 8 and 5. Shapes are [128,8] and [128,5]. for 'Assign_390' (op: 'Assign') with input shapes: [128,8], [128,5].
Any suggestions on how to fine tune this model will be very helpful,Thanks in advance.
Here,
model = Model(inputs = [in_lay], outputs = [out_layer])
weights_path = 'weights.best.hdf5'
this out_layer should have the same dimension(5 classes) described inside weights.best.hdf5.
So, t_y.shape[-1] should be 5 dimensional, not 8.
I've successfully trained a model in Keras using an encoder/decoder structure + attention + glove following several examples, most notably this one and this one. It's based on a modification of machine translation. This is a chatbot, so the input is words and so is the output. However, I've struggled to setup inference (prediction) properly and can't figure it out how to get past a graph disconnect. My bidirectional RNN encoder/decoder with embedding and attention is training fine. I've tried modifying the decoder, but feel there is something obvious that I'm not seeing.
Here is the basic model:
from keras.models import Model
from keras.layers.recurrent import LSTM
from keras.layers import Dense, Input, Embedding, Bidirectional, RepeatVector, concatenate, Concatenate
## PARAMETERS
HIDDEN_UNITS = 100
encoder_max_seq_length = 1037 # maximum size of input sequence
decoder_max_seq_length = 187 # maximum size of output sequence
num_encoder_tokens = 6502 # a.k.a the size of the input vocabulary
num_decoder_tokens = 4802 # a.k.a the size of the output vocabulary
## ENCODER
encoder_inputs = Input(shape=(encoder_max_seq_length, ), name='encoder_inputs')
encoder_embedding = Embedding(input_dim = num_encoder_tokens,
output_dim = HIDDEN_UNITS,
input_length = encoder_max_seq_length,
weights = [embedding_matrix],
name='encoder_embedding')(encoder_inputs)
encoder_lstm = Bidirectional(LSTM(units = HIDDEN_UNITS,
return_sequences=True,
name='encoder_lstm'))(encoder_embedding)
## ATTENTION
attention = AttentionL(encoder_max_seq_length)(encoder_lstm)
attention = RepeatVector(decoder_max_seq_length)(attention)
## DECODER
decoder_inputs = Input(shape = (decoder_max_seq_length, num_decoder_tokens),
name='decoder_inputs')
merge = concatenate([attention, decoder_inputs])
decoder_lstm = Bidirectional(LSTM(units = HIDDEN_UNITS*2,
return_sequences = True,
name='decoder_lstm'))(merge)
decoder_dense = Dense(units=num_decoder_tokens,
activation='softmax',
name='decoder_dense')(decoder_lstm)
decoder_outputs = decoder_dense
## Configure the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.load_weights('trained_models/viable-2/word-weights.h5')
encoder_model = Model(encoder_inputs, attention)
model.compile(loss='categorical_crossentropy', optimizer='adam')
It looks like this:
enter image description here
Here is where I run into trouble:
## INFERENCE decoder setup
decoder_inputs_2 = Concatenate()([decoder_inputs, attention])
decoder_lstm = Bidirectional(LSTM(units=HIDDEN_UNITS*2, return_state = True, name='decoder_lstm'))
decoder_outputs, forward_h, forward_c, backward_h, backward_c= decoder_lstm(decoder_inputs_2)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
decoder_states = [state_h, state_c]
decoder_dense = Dense(units=num_decoder_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs, attention], [decoder_outputs] + decoder_states)
This generates a graph disconnection error: Graph disconnected: cannot obtain value for tensor Tensor("encoder_inputs_61:0", shape=(?, 1037), dtype=float32) at layer "encoder_inputs". The following previous layers were accessed without issue: []
It should be possible to do inference like this, but I can't get past this error. It isn't possible for me to simply add decoder_output and attention together because they're of different shapes.
I’m trying to implement a Visual Storytelling model using Keras with a hierarchical RNN model, basically Neural Image Captioner style but over a sequence of photos with a bidirectional RNN on top of the decoder RNNs.
I implemented and tested the three parts of this model, CNN, BRNN and decoder RNN separately but got this error when trying to connect them:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My code are as follows:
#vgg16 model with the fc2 layer as output
cnn_base_model = self.cnn_model.base_model
brnn_model = self.brnn_model.model
rnn_model = self.rnn_model.model
cnn_part = TimeDistributed(cnn_base_model)
img_input = Input((self.story_length,) + self.cnn_model.input_shape, name='brnn_img_input')
extracted_feature = cnn_part(img_input)
#[None, 5, 512], a 512 length vector for each picture in the story
brnn_feature = brnn_model(extracted_feature)
#[None, 5, 25], input groundtruth word indices fed as input when training
decoder_input = Input((self.story_length, self.max_length), name='brnn_decoder_input')
decoder_outputs = []
for i in range(self.story_length):
#separate timesteps for decoding
decoder_input_i = Lambda(lambda x: x[:, i, :])(decoder_input)
brnn_feature_i = Lambda(lambda x: x[:, i, :])(brnn_feature)
#the problem persists when using Dense instead of the Lambda layers above
#decoder_input_i = Dense(25)(Reshape((125,))(decoder_input))
#brnn_feature_i = Dense(512)(Reshape((5 * 512,))(brnn_feature))
decoder_output_i = rnn_model([decoder_input_i, brnn_feature_i])
decoder_outputs.append(decoder_output_i)
decoder_output = Concatenate(axis=-2, name='brnn_decoder_output')(decoder_outputs)
self.model = Model([img_input, decoder_input], decoder_output)
And codes for the BRNN:
image_feature = Input(shape=(self.story_length, self.img_feature_dim,))
image_emb = TimeDistributed(Dense(self.lstm_size))(image_feature)
brnn = Bidirectional(LSTM(self.lstm_size, return_sequences=True), merge_mode='concat')(image_emb)
brnn_emb = TimeDistributed(Dense(self.lstm_size))(brnn)
self.model = Model(inputs=image_feature, outputs=brnn_emb)
And RNN:
#[None, 512], the vector to be decoded
initial_input = Input(shape=(self.input_dim,), name='rnn_initial_input')
#[None, 25], the groundtruth word indices fed as input when training
decoder_inputs = Input(shape=(None,), name='rnn_decoder_inputs')
decoder_input_masking = Masking(mask_value=0.0)(decoder_inputs)
decoder_input_embeddings = Embedding(self.vocabulary_size, self.emb_size,
embeddings_regularizer=l2(regularizer))(decoder_input_masking)
decoder_input_dropout = Dropout(.5)(decoder_input_embeddings)
initial_emb = Dense(self.emb_size,
kernel_regularizer=l2(regularizer))(initial_input)
initial_reshape = Reshape((1, self.emb_size))(initial_emb)
initial_masking = Masking(mask_value=0.0)(initial_reshape)
initial_dropout = Dropout(.5)(initial_masking)
decoder_lstm = LSTM(self.hidden_dim, return_sequences=True, return_state=True,
recurrent_regularizer=l2(regularizer),
kernel_regularizer=l2(regularizer),
bias_regularizer=l2(regularizer))
_, initial_hidden_h, initial_hidden_c = decoder_lstm(initial_dropout)
decoder_outputs, decoder_state_h, decoder_state_c = decoder_lstm(decoder_input_dropout,
initial_state=[initial_hidden_h, initial_hidden_c])
decoder_output_dense_layer = TimeDistributed(Dense(self.vocabulary_size, activation='softmax',
kernel_regularizer=l2(regularizer)))
decoder_output_dense = decoder_output_dense_layer(decoder_outputs)
self.model = Model([decoder_inputs, initial_input], decoder_output_dense)
I’m using adam as optimizer and sparse_categorical_crossentropy as loss.
At first I thought the problem is with the Lambda layers used for splitting the timesteps but the problem persists when I replaced them with Dense layers (which are guarantee
I had a similar error and it turned out I was suppose to build the layers (in my custom layer or model) in the init() like so:
self.lstm_custom_1 = keras.layers.LSTM(128,batch_input_shape=batch_input_shape, return_sequences=False,stateful=True)
self.lstm_custom_1.build(batch_input_shape)
self.dense_custom_1 = keras.layers.Dense(32, activation = 'relu')
self.dense_custom_1.build(input_shape=(batch_size, 128))```
The issue is actually with the Embedding layer, I think. Gradients can't pass through an Embedding layer, so unless it's the first layer in the model it won't work.