LSTM Model in Keras with Auxiliary Inputs - keras

I have a dataset with two columns, each containing a set of documents. I have to match each document in Col A with the documents provided in Col B. This is a supervised classification problem, so my training data contains a label column indicating whether the documents match or not.
To solve the problem, I have created a set of features, say f1-f25 (by comparing the 2 documents), and then trained a binary classifier on these features. This approach works reasonably well, but now I would like to evaluate deep learning models on this problem (specifically, LSTM models).
I am using the keras library in Python. After going through the keras documentation and other tutorials available online, I have managed to do the following:
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to transform
# each doc to a fixed length seq
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')
# Next I add a word embedding layer (embed_matrix is separately created
# for each word in my vocabulary by reading from a pre-trained embedding model)
x = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input1)
y = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input2)
# Next separately pass each layer thru a lstm layer to transform seq of
# vectors into a single vector
lstm_out_x1 = LSTM(32)(x)
lstm_out_x2 = LSTM(32)(y)
# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
# generate intermediate output
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)
# add auxiliary input - auxiliary inputs contains 25 features for each document pair
auxiliary_input = Input(shape=(25,), name='aux_input')
# merge aux output with aux input and stack dense layer on top
main_input = keras.layers.concatenate([auxiliary_output, auxiliary_input])
x = Dense(64, activation='relu')(main_input)
x = Dense(64, activation='relu')(x)
# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
model = Model(inputs=[main_input1, main_input2, auxiliary_input], outputs= main_output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([x1, x2, aux_input], y,
          epochs=3, batch_size=32)
However, when I score this on the training data, I get the same probability score for all cases. The issue seems to be with the way the auxiliary input is fed in (because the model generates meaningful output when I remove the auxiliary input).
I also tried inserting the auxiliary input at different places in the network, but I couldn't get this to work.
Any pointers?

Well, this has been open for several months and people keep voting it up.
I did something very similar recently using this dataset, which can be used to forecast credit card defaults. It contains categorical customer data (gender, education level, marital status, etc.) as well as payment history as time series, so I had to merge time series with non-series data. My solution was very similar to yours, combining an LSTM with dense layers, and I will try to adapt the approach to your problem. What worked for me was adding dense layer(s) on the auxiliary input.
Furthermore in your case a shared layer would make sense so the same weights are used to "read" both documents. My proposal for testing on your data:
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to transform
# each doc to a fixed length seq
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')
# Next I add a word embedding layer (embed_matrix is separately created
# for each word in my vocabulary by reading from a pre-trained embedding model)
x1 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input1)
x2 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input2)
# Next separately pass each layer thru a lstm layer to transform seq of
# vectors into a single vector
# Comment Manngo: Here I changed to shared layer
# Also renamed y as input as it was confusing
# Now x and y are x1 and x2
lstm_reader = LSTM(32)
lstm_out_x1 = lstm_reader(x1)
lstm_out_x2 = lstm_reader(x2)
# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
x = Dense(32, activation='relu')(x)
# generate intermediate output
# Comment Manngo: This is created as a dead-end
# It will not be used as an input of any layers below
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)
# add auxiliary input - auxiliary inputs contains 25 features for each document pair
# Comment Manngo: Dense branch on the comparison features
# (the Dense outputs get their own name so the Input tensor stays available for Model)
auxiliary_input = Input(shape=(25,), name='aux_input')
aux_dense = Dense(64, activation='relu')(auxiliary_input)
aux_dense = Dense(32, activation='relu')(aux_dense)
# OLD: merge aux output with aux input and stack dense layer on top
# Comment Manngo: actually this is merging the aux output preparation dense with the aux input processing dense
main_input = keras.layers.concatenate([x, aux_dense])
main = Dense(64, activation='relu')(main_input)
main = Dense(64, activation='relu')(main)
# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(main)
# Build the model: 3 inputs, 2 outputs
model = Model(inputs=[main_input1, main_input2, auxiliary_input],
              outputs=[main_output, auxiliary_output])
# Compile
# Comment Manngo: also define weighting of outputs, main as 1, auxiliary as 0.5
# The dictionary keys must match the output layer names given above
model.compile(optimizer='adam',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.5},
              metrics=['accuracy'])
# Train model on main_output and on auxiliary_output as a support
# Comment Manngo: Unknown information marked with placeholders ____
# We have 3 inputs: x1 and x2: the 2 strings
# aux_in: the 25 features
# We have 2 outputs: main and auxiliary; both have the same targets -> (binary)y
# The dictionary keys must match the layer names ('aux_input', 'aux_output')
model.fit({'main_input1': __x1__, 'main_input2': __x2__, 'aux_input': __aux_in__},
          {'main_output': __y__, 'aux_output': __y__},
          epochs=1000,
          batch_size=__,
          validation_split=0.1,
          callbacks=[____])
I don't know how much this can help since I don't have your data so I can't try. Nevertheless this is my best shot.
I didn't run the above code for obvious reasons.
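To make the effect of the output weighting concrete, here is a small NumPy sketch of how two binary cross-entropy terms can be combined into one objective with weights 1.0 and 0.5, as in the compile call above. The helper name binary_crossentropy and the sample predictions are made up for illustration; Keras has its own internal implementation, so treat this as a rough picture rather than the exact computation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean binary cross-entropy; clipping avoids log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))

# Both outputs are trained against the same binary labels.
y_true = np.array([1.0, 0.0, 1.0, 0.0])
main_pred = np.array([0.9, 0.2, 0.8, 0.1])  # illustrative main_output predictions
aux_pred = np.array([0.6, 0.4, 0.7, 0.3])   # illustrative aux_output predictions

main_loss = binary_crossentropy(y_true, main_pred)
aux_loss = binary_crossentropy(y_true, aux_pred)

# Weighted sum: the auxiliary head contributes half as much as the main head.
total_loss = 1.0 * main_loss + 0.5 * aux_loss
print(main_loss, aux_loss, total_loss)
```

With this weighting the auxiliary head still pushes the LSTM branch to be predictive on its own, but mistakes on the main output dominate the gradient.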

I found answers in this thread: https://datascience.stackexchange.com/questions/17099/adding-features-to-time-series-model-lstm
Philippe Remy wrote a library (cond_rnn) to condition an RNN on auxiliary inputs. I used his library and found it very helpful.
# 10 stations
# 365 days
# 3 continuous variables A and B => C is target.
# 2 conditions dim=5 and dim=1. First cond is one-hot. Second is continuous.
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from cond_rnn import ConditionalRNN
stations = 10 # 10 stations.
time_steps = 365 # 365 days.
continuous_variables_per_station = 3 # A,B,C where C is the target.
condition_variables_per_station = 2 # 2 variables of dim 5 and 1.
condition_dim_1 = 5
condition_dim_2 = 1
np.random.seed(123)
continuous_data = np.random.uniform(size=(stations, time_steps, continuous_variables_per_station))
condition_data_1 = np.zeros(shape=(stations, condition_dim_1))
condition_data_1[:, 0] = 1 # dummy.
condition_data_2 = np.random.uniform(size=(stations, condition_dim_2))
window = 50 # we split series in 50 days (look-back window)
x, y, c1, c2 = [], [], [], []
for i in range(window, continuous_data.shape[1]):
    x.append(continuous_data[:, i - window:i])
    y.append(continuous_data[:, i])
    c1.append(condition_data_1)  # just replicate.
    c2.append(condition_data_2)  # just replicate.
# now we have (batch_dim, station_dim, time_steps, input_dim).
x = np.array(x)
y = np.array(y)
c1 = np.array(c1)
c2 = np.array(c2)
print(x.shape, y.shape, c1.shape, c2.shape)
# let's collapse the station_dim in the batch_dim.
x = np.reshape(x, [-1, window, x.shape[-1]])
y = np.reshape(y, [-1, y.shape[-1]])
c1 = np.reshape(c1, [-1, c1.shape[-1]])
c2 = np.reshape(c2, [-1, c2.shape[-1]])
print(x.shape, y.shape, c1.shape, c2.shape)
model = Sequential(layers=[
    ConditionalRNN(10, cell='GRU'),  # num_cells = 10
    Dense(units=1, activation='linear')  # regression problem.
])
model.compile(optimizer='adam', loss='mse')
model.fit(x=[x, c1, c2], y=y, epochs=2, validation_split=0.2)
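For comparison, a common library-free way to combine static features with a sequence is to repeat them at every time step and concatenate them with the per-step features. The NumPy sketch below shows only that reshaping. It is a different technique from the library used above, and the array sizes and names (cond, x_aug) are arbitrary illustrative choices.

```python
import numpy as np

batch, window, n_features, cond_dim = 4, 50, 3, 6

x = np.random.uniform(size=(batch, window, n_features))  # sequence input
cond = np.random.uniform(size=(batch, cond_dim))         # static features per sample

# Repeat the static features along a new time axis: (batch, window, cond_dim).
cond_tiled = np.repeat(cond[:, np.newaxis, :], window, axis=1)

# Append them to the per-step features: (batch, window, n_features + cond_dim).
x_aug = np.concatenate([x, cond_tiled], axis=-1)
print(x_aug.shape)
```

The augmented array can be fed to an ordinary recurrent layer, at the cost of presenting the same static values at every step.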

Related

Keras, how to feed an Embedding layer with a random sampling of a Softmax layer

In the model I am constructing, I have the following layer:
y = layers.Dense(10, activation="softmax")(x)
And I want the next layer of this model to be an Embedding layer that "represents" the choice made by the Dense layer.
That is, I want to:
1. sample a choice from y (based on the probabilities "represented" by the softmax values);
2. turn this choice into an embedding, using an Embedding layer with vocabulary size 10.
Any idea how to do this?
Regards
Initial answer
Add a layer that takes the argmax of the output of the dense layer before feeding it into the embedding layer to propagate the most likely category label:
import tensorflow as tf
# generate some data
BATCH_SIZE, INPUT_DIM = (4, 2)
x = tf.random.uniform([BATCH_SIZE, INPUT_DIM])
# model
NUM_CLASSES = 10
EMBEDDING_DIM = 10
dense = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
# take the index of the most likely class for each sample
argmax = tf.keras.layers.Lambda(lambda t: tf.argmax(t, axis=-1))(dense)
emb = tf.keras.layers.Embedding(NUM_CLASSES, EMBEDDING_DIM)(argmax)
Updated answer
If you want to propagate a randomly sampled category label instead of the most likely category label, you can do so by using tf.random.categorical. Note that tf.random.categorical takes logits as inputs, so you don't need the softmax activation at the end of the dense layer.
NUM_CLASSES = 10
EMBEDDING_DIM = 10
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)
sample = tf.keras.layers.Lambda(lambda logits: tf.squeeze(tf.random.categorical(logits, 1), axis=-1))(logits)
emb = tf.keras.layers.Embedding(NUM_CLASSES,EMBEDDING_DIM)(sample)
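To show what the sample-then-embed step computes, here is a minimal NumPy sketch that does the same thing outside of Keras: turn logits into probabilities, draw one class index per row, and look up the corresponding row of an embedding table. The random logits and the embedding_table array are stand-ins for the Dense output and the Embedding weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_classes, embedding_dim = 4, 10, 10

# Stand-in for the output of the Dense layer (unnormalised scores).
logits = rng.normal(size=(batch_size, num_classes))

# Numerically stable softmax over the class axis.
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Draw one class index per row according to its probabilities.
samples = np.array([rng.choice(num_classes, p=p) for p in probs])

# An embedding lookup is just row indexing into a (num_classes, dim) table.
embedding_table = rng.normal(size=(num_classes, embedding_dim))
emb = embedding_table[samples]
print(samples.shape, emb.shape)
```

Note that neither argmax nor sampling is differentiable with respect to the logits, so the Dense layer receives no gradient through the embedding path in either variant.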

Sentiment Analysis using LSTM (Model does not generate good output)

I built a sentiment analysis model using an LSTM, but my model gives very bad predictions.
Here is the complete code
Dataset for amazon review
My LSTM model looks like this:
def ltsm_model(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the ltsm_model model's graph.
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
    Returns:
    model -- a model instance in Keras
    """
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape, dtype='int32')
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer to get back a batch of 2-dimensional vectors.
    X = Dense(2, activation='relu')(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=[sentence_indices], outputs=X)
    ### END CODE HERE ###
    return model
Here is what my training dataset looks like:
This is my testing data:
x_test = np.array(['amazing!: this soundtrack is my favorite music..'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+ str(np.argmax(model.predict(X_test_indices))))
I got the following output for this:
amazing!: this soundtrack is my favorite music.. 0
But this is a positive sentiment, so the prediction should be 1.
Also, this is my model's fit output:
How can I improve my model's performance? This is a pretty bad model, I suppose.

How to use the input gradients as variables within a custom loss function in Keras?

I am using the input gradient as a measure of feature importance and want to compare the feature importance of a training data point with the human-annotated feature importance. I would like to make this comparison differentiable so that it can be learned through backpropagation. For that, I am writing a custom loss function that, in addition to the regular loss (e.g. m.s.e. on the prediction vs. true labels), also checks whether the input gradient is correct (e.g. m.s.e. of the input gradient vs. the human-annotated feature importance).
With the following code I am able to get the input gradient:
from keras import backend as K
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense
def normalize(x):
    # utility function to normalize a tensor by its root mean square
    # (a scaled version of its L2 norm)
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)
# Amount of training samples
N = 1000
input_dim = 10
# Generate training set; make the features at index 1 and 2 equal to the target
X = np.random.standard_normal(size=(N, input_dim))
y = np.random.randint(low=0, high=2, size=(N, 1))
X[:, 1] = y[:, 0]
X[:, 2] = y[:, 0]
# Create simple model
inputs = Input(shape=(input_dim,))
x = Dense(10, name="dense1")(inputs)
output = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[inputs], outputs=output)
# Compile and fit model
model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
model.fit([X], y, epochs=100, batch_size=64)
# Get function to get input gradients
gradients = K.gradients(model.output, model.input)[0]
gradient_function = K.function([model.input], [normalize(gradients)])
# Get input gradient values of the training-set
grads_val = gradient_function([X])[0]
print(grads_val[:2])
This prints the following (you can see that the 1st and the 2nd features have the highest importance):
[[ 1.2629046e-02 2.2765596e+00 2.1479919e+00 2.1558853e-02
4.5277486e-03 2.9851785e-03 9.5279224e-04 -1.0903150e-02
-1.2230731e-02 2.1960819e-02]
[ 1.1318034e-02 2.0402350e+00 1.9250139e+00 1.9320872e-02
4.0577268e-03 2.6752844e-03 8.5390132e-04 -9.7713526e-03
-1.0961102e-02 1.9681118e-02]]
How can I write a custom loss function in which the input gradients are differentiable?
I started with the following loss function.
from keras.losses import mean_squared_error
def custom_loss():
    # human annotated feature importance
    # Let's say that it says to only look at the second feature
    human_feature_importance = []
    for i in range(N):
        human_feature_importance.append([0,0,1,0,0,0,0,0,0,0])
    def loss(y_true, y_pred):
        # Get regular loss
        regular_loss_value = mean_squared_error(y_true, y_pred)
        # Somehow get the input gradient of each training sample as a tensor
        # It should be differentiable w.r.t. all of the weights
        gradients = ??
        feature_importance_loss_value = mean_squared_error(gradients, human_feature_importance)
        # Combine both losses
        return regular_loss_value + feature_importance_loss_value
    return loss
I also found an implementation in TensorFlow that makes the input gradient differentiable: https://github.com/dtak/rrr/blob/master/rrr/tensorflow_perceptron.py#L18
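The objective described in this question can be written out explicitly for a model small enough to differentiate by hand. The NumPy sketch below assumes a single sigmoid unit (not the two-layer Keras model above), for which the input gradient of the prediction has a closed form. All function names are hypothetical, and the sketch only evaluates the combined loss; in Keras the same quantity would have to be built from tensor operations so that the framework can differentiate through it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b):
    # Single sigmoid unit: y_hat = sigmoid(x . w + b)
    return sigmoid(X @ w + b)

def input_gradients(X, w, b):
    # d y_hat / d x = y_hat * (1 - y_hat) * w, one row per sample.
    y_hat = predict(X, w, b)
    return (y_hat * (1.0 - y_hat))[:, None] * w[None, :]

def combined_loss(X, y, w, b, importance):
    # Regular m.s.e. plus m.s.e. between input gradients and annotations.
    regular = np.mean((y - predict(X, w, b)) ** 2)
    gradient_term = np.mean((input_gradients(X, w, b) - importance) ** 2)
    return regular + gradient_term

rng = np.random.default_rng(0)
n_samples, input_dim = 8, 10
X = rng.normal(size=(n_samples, input_dim))
y = rng.integers(0, 2, size=n_samples).astype(float)
w = rng.normal(size=input_dim)
b = 0.1

# Annotation: only the feature at index 2 should matter.
importance = np.zeros((n_samples, input_dim))
importance[:, 2] = 1.0

loss_value = combined_loss(X, y, w, b, importance)
print(loss_value)
```

Because the gradient term depends on w through both y_hat and the explicit factor w, minimising it changes the weights themselves, which is the behaviour the question is after.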

Signal to signal prediction using RNN and Keras

I am trying to reproduce the nice work here and adapt it so that it reads real data from a file.
I started by generating random signals (instead of using the generating methods provided in the above link). Unfortunately, I could not generate signals in a form that the model accepts.
Here is the code:
import numpy as np
import keras
from keras.utils import plot_model
input_sequence_length = 15 # Length of the sequence used by the encoder
target_sequence_length = 15 # Length of the sequence predicted by the decoder
import random
def getModel():
    learning_rate = 0.01
    num_input_features = 1
    lambda_regulariser = 0.000001  # Will not be used if regulariser is None
    regulariser = None  # Possible regulariser: keras.regularizers.l2(lambda_regulariser)
    layers = [35, 35]
    num_output_features = 1
    decay = 0  # Learning rate decay
    loss = "mse"  # Other loss functions are possible, see Keras documentation.
    optimiser = keras.optimizers.Adam(lr=learning_rate, decay=decay)  # Other possible optimiser "sgd" (Stochastic Gradient Descent)
    # Define an input sequence.
    encoder_inputs = keras.layers.Input(shape=(None, num_input_features))
    # Create a list of RNN Cells, these are then concatenated into a single layer
    # with the RNN layer.
    encoder_cells = []
    for hidden_neurons in layers:
        encoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))
    encoder = keras.layers.RNN(encoder_cells, return_state=True)
    encoder_outputs_and_states = encoder(encoder_inputs)
    # Discard encoder outputs and only keep the states.
    # The outputs are of no interest to us, the encoder's
    # job is to create a state describing the input sequence.
    encoder_states = encoder_outputs_and_states[1:]
    # The decoder input will be set to zero (see random_sine function of the utils module).
    # Do not worry about the input size being 1, I will explain that in the next cell.
    decoder_inputs = keras.layers.Input(shape=(None, 1))
    decoder_cells = []
    for hidden_neurons in layers:
        decoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))
    decoder = keras.layers.RNN(decoder_cells, return_sequences=True, return_state=True)
    # Set the initial state of the decoder to be the output state of the encoder.
    # This is the fundamental part of the encoder-decoder.
    decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)
    # Only select the output of the decoder (not the states)
    decoder_outputs = decoder_outputs_and_states[0]
    # Apply a dense layer with linear activation to set output to correct dimension
    # and scale (tanh is default activation for GRU in Keras, our output sine function can be larger than 1)
    decoder_dense = keras.layers.Dense(num_output_features,
                                       activation='linear',
                                       kernel_regularizer=regulariser,
                                       bias_regularizer=regulariser)
    decoder_outputs = decoder_dense(decoder_outputs)
    # Create a model using the functional API provided by Keras.
    # The functional API is great, it gives an amazing amount of freedom in architecture of your NN.
    # A read worth your time: https://keras.io/getting-started/functional-api-guide/
    model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)
    model.compile(optimizer=optimiser, loss=loss)
    print(model.summary())
    return model
def getXY():
    X, y = list(), list()
    for _ in range(100):
        x = [random.random() for _ in range(input_sequence_length)]
        y = [random.random() for _ in range(target_sequence_length)]
        X.append([x,[0 for _ in range(input_sequence_length)]])
        y.append(y)
    return np.array(X), np.array(y)
X,y = getXY()
print(X,y)
model = getModel()
model.fit(X,y)
The error message I got is:
ValueError: Error when checking model input: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 2 array(s), but instead got the following list of 1
arrays:
What is the correct shape of the input data for the model?
If you read the source of your inspiration carefully, you will find that he talks about the "decoder_input" data.
He describes the "teacher forcing" technique, which consists of feeding the decoder with some delayed data. But he also says that it didn't really work well in his case, so he sets the decoder input to all zeros, as this line shows:
decoder_input = np.zeros((decoder_output.shape[0], decoder_output.shape[1], 1))
In his design of the encoder-decoder, the two parts are separate networks with different inputs, which he then ties together by passing the RNN states from one to the other.
I can see that you have tried doing the same thing, but you have appended np.array([x_encoder, x_decoder]) where you should have done [np.array(x_encoder), np.array(x_decoder)]. Each input to the network should be a numpy array that you put in a list of inputs, not one big numpy array.
I also found some typos in your code: you are appending y to itself, where you should instead create a separate Y variable:
def getXY():
    X_encoder, X_decoder, Y = list(), list(), list()
    for _ in range(100):
        x_encoder = [random.random() for _ in range(input_sequence_length)]
        # the decoder input is a sequence of 0's same length as target seq
        x_decoder = [0] * target_sequence_length
        y = [random.random() for _ in range(target_sequence_length)]
        X_encoder.append(x_encoder)
        # Not really optimal but will work
        X_decoder.append(x_decoder)
        Y.append(y)
    return [np.array(X_encoder), np.array(X_decoder)], np.array(Y)
now when you do :
X, Y = getXY()
you receive X which is a list of 2 numpy arrays (as your model requests) and Y which is a single numpy array.
I hope this helps
EDIT
Indeed, in the code that generates the dataset, you can see that they build 3-dimensional numpy arrays for the input. An RNN needs 3-dimensional inputs :-)
The following code should address the shape issue:
def getXY():
    X_encoder, X_decoder, Y = list(), list(), list()
    for _ in range(100):
        x_encoder = [random.random() for _ in range(input_sequence_length)]
        # the decoder input is a sequence of 0's same length as target seq
        x_decoder = [0] * target_sequence_length
        y = [random.random() for _ in range(target_sequence_length)]
        X_encoder.append(x_encoder)
        # Not really optimal but will work
        X_decoder.append(x_decoder)
        Y.append(y)
    # Make them numpy arrays
    X_encoder = np.array(X_encoder)
    X_decoder = np.array(X_decoder)
    Y = np.array(Y)
    # Make them 3-dimensional arrays with a third dimension of size 1,
    # i.e. shape (samples, timesteps) becomes (samples, timesteps, 1)
    X_encoder = np.expand_dims(X_encoder, axis=2)
    X_decoder = np.expand_dims(X_decoder, axis=2)
    Y = np.expand_dims(Y, axis=2)
    return [X_encoder, X_decoder], Y
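As a quick check of the shapes involved, the same kind of data can also be generated directly in NumPy without Python loops. This is a sketch that should be equivalent in shape to getXY above; the name n_samples is introduced here for the 100 iterations of the loop, and the sequence lengths are the ones from the question.

```python
import numpy as np

input_sequence_length = 15
target_sequence_length = 15
n_samples = 100

# (samples, timesteps, features) with a single feature per timestep.
X_encoder = np.random.random((n_samples, input_sequence_length, 1))
# The decoder input is all zeros, one value per target timestep.
X_decoder = np.zeros((n_samples, target_sequence_length, 1))
Y = np.random.random((n_samples, target_sequence_length, 1))

# The model has two Input layers, so it expects a list of two arrays.
X = [X_encoder, X_decoder]
print([a.shape for a in X], Y.shape)
```

Passing X as a list of two 3-dimensional arrays is what matches the two Input layers with shape (None, 1) in getModel.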

Keras: feed output as input at next timestep

The goal is to predict a timeseries Y of 87601 timesteps (10 years) and 9 targets. The input features X (exogenous input) are 11 timeseries of 87600 timesteps. The output has one more timestep, as this is the initial value.
The output Yt at timestep t depends on the input Xt and on the previous output Yt-1.
Hence, the model should look like this: Model layout
I could only find this thread on this: LSTM: How to feed the output back to the input? #4068.
I tried to implement this with Keras as follows:
import numpy as np
from keras import layers
from keras.models import Model
from keras.optimizers import Adam
# Note: `features`, `targets` and `normalise` are assumed to be defined elsewhere
def build_model():
    # Input layers
    input_x = layers.Input(shape=(features,), name='input_x')
    input_y = layers.Input(shape=(targets,), name='input_y-1')
    # Merge two inputs
    merge = layers.concatenate([input_x, input_y], name='merge')
    # Normalise input
    norm = layers.Lambda(normalise, name='scale')(merge)
    # Hidden layers
    x = layers.Dense(128, input_shape=(features,))(norm)
    # Output layer
    output = layers.Dense(targets, activation='relu', name='output')(x)
    model = Model(inputs=[input_x, input_y], outputs=output)
    model.compile(loss='mean_squared_error', optimizer=Adam())
    return model
def make_prediction(model, X, y):
    # Start from the known initial value and feed each prediction back in
    y_pred = [y[0,None,:]]
    for i in range(len(X)):
        y_pred.append(model.predict([X[i,None,:], y_pred[i]]))
    y_pred = np.asarray(y_pred)
    y_pred = y_pred.reshape(y_pred.shape[0], y_pred.shape[2])
    return y_pred
# Fit
model = build_model()
model.fit([X_train, y_train[:-1]], [y_train[1:]], epochs=200,
          batch_size=24, shuffle=False)
# Predict
y_hat = make_prediction(model, X_train, y_train)
This works, but it is not what I want to achieve, as there is no connection between input and output during training. Hence, the model doesn't learn how to correct for an error in the fed-back output, which results in poor accuracy when predicting, as the error on the output accumulates at every timestep.
Is there a way in Keras to implement the output-input feed-back during training stage?
Also, as the initial value of Y is always known, I want to feed this to the network as well.
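The difference between the two regimes in this question can be made explicit with a small NumPy sketch: during fitting, each step sees the true previous output, while during prediction it sees its own previous prediction. The function toy_step is a hypothetical stand-in for model.predict, and the sizes are shortened for illustration (only the 11 features and 9 targets are taken from the question).

```python
import numpy as np

timesteps, features, targets = 6, 11, 9
X = np.random.uniform(size=(timesteps, features))
Y = np.random.uniform(size=(timesteps + 1, targets))  # one extra row: the initial value

# Training pairs as used in model.fit above: (X[t], Y[t]) -> Y[t + 1].
y_prev = Y[:-1]
y_next = Y[1:]

def rollout(step_fn, X, y0):
    # Closed loop: every step receives the previous *prediction*, not the truth.
    preds = [y0]
    for x_t in X:
        preds.append(step_fn(x_t, preds[-1]))
    return np.stack(preds)

# Hypothetical one-step model: a fixed linear map standing in for model.predict.
W_x = np.random.uniform(-0.1, 0.1, size=(features, targets))

def toy_step(x_t, y_prev_t):
    return 0.5 * y_prev_t + x_t @ W_x

Y_hat = rollout(toy_step, X, Y[0])
print(y_prev.shape, y_next.shape, Y_hat.shape)
```

Only the first predicted step is conditioned on a true value (the known initial Y), which is why errors made early in the rollout propagate to every later step.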
