TensorFlow | Store gradient in-between runs - python-3.x

For illustration purposes, suppose I have a simple LSTM network and an input sequence X = (X1, ..., XT)
input Xt = (x1,...,xn) --> [LSTM] --> [output_layer] --> output(y1,...,yk)
Is there a way I can feed the network individual timestep inputs and then invoke the training_op at the end? A pseudocode of what I want to achieve:
# Define computational graph
x = tf.placeholder(tf.float32, [batch_size, num_features])
y = tf.placeholder(tf.float32, [batch_size, output_size])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
state = tf.placeholder(tf.float32, [batch_size, lstm.state_size])
lstm_output, state = lstm(x, state)
output = tf.nn.dense(lstm_output, units=units)
loss = tf.losses.mean_squared_error(y, output)
train_op = tf.train.AdamOptimizer(lr).minimize(loss)
# Train loop
with tf.Session() as sess:
for batch in batches:
state = np.zeros(...)
for timestep in batch:
feed_dict = construct_feed_dict(timestep, state)
out, _ = sess.run([output, loss], feed_dict)
# Defer the weight update until the end of sequence
sess.run(train_op, feed_dict=???)
My understanding is, that the returned values are basic numpy arrays and therefore if I later again fed them to the network as part of the input the information about the computation of that value is lost.
I am well aware I can feed the input in the shape [total_timesteps, batch_size, num_features]. However I've found myself in situations where I couldn't adopt this approach:
1) The next timestep input is created from the network output f(y_t-1).
2) Hidden state of LSTM cell is being fed as input to another layer at each timestep.

I successfully achieved this by implementing my own raw_rnn cell.

Related

Training many-to-many stateful LSTM with and without final dense layer

I am trying to train a recurrent model in Keras containing an LSTM for regression purposes.
I would like to use the model online and, as far as I understood, I need to train a stateful LSTM.
Since the model has to output a sequence of values, I hope it computes the loss on each of the expected output vector.
However, I fear my code is not working this way and I would be grateful if anyone would help me to understand if I am doing right or if there is some better approach.
The input to the model is a sequence of 128-dimensional vectors. Each sequence in the training set has a different lenght.
At each time, the model should output a vector of 3 elements.
I am trying to train and compare two models:
A) a simple LSTM with 128 inputs and 3 outputs;
B) a simple LSTM with 128 inputs and 100 outputs + a dense layer with 3 outputs;
For model A) I wrote the following code:
# Model
model = Sequential()
model.add(LSTM(3, batch_input_shape=(1, None, 128), return_sequences=True, activation = "linear", stateful = True))`
model.compile(loss='mean_squared_error', optimizer=Adam())
# Training
for i in range(n_epoch):
for j in np.random.permutation(n_sequences):
X = data[j] # j-th sequences
X = X[np.newaxis, ...] # X has size 1 x NTimes x 128
Y = dataY[j] # Y has size NTimes x 3
history = model.fit(X, Y, epochs=1, batch_size=1, verbose=0, shuffle=False)
model.reset_states()
With this code, model A) seems to train fine because the output sequence approaches the ground-truth sequence on the training set.
However, I wonder if the loss is really computed by considering all NTimes output vectors.
For model B), I could not find any way to get the entire output sequence due to the dense layer. Hence, I wrote:
# Model
model = Sequential()
model.add(LSTM(100, batch_input_shape=(1, None, 128), , stateful = True))
model.add(Dense(3, activation="linear"))
model.compile(loss='mean_squared_error', optimizer=Adam())
# Training
for i in range(n_epoch):
for j in np.random.permutation(n_sequences):
X = data[j] #j-th sequence
X = X[np.newaxis, ...] # X has size 1 x NTimes x 128
Y = dataY[j] # Y has size NTimes x 3
for h in range(X.shape[1]):
x = X[0,h,:]
x = x[np.newaxis, np.newaxis, ...] # h-th vector in j-th sequence
y = Y[h,:]
y = y[np.newaxis, ...]
loss += model.train_on_batch(x,y)
model.reset_states() #After the end of the sequence
With this code, model B) does not train fine. It seems to me the training does not converge and loss values increase and decrease cyclically
I have also tried to use as Y only the last vector and them calling the fit function on the Whole training sequence X, but no improvements.
Any idea? Thank you!
If you want to still have three outputs per step of your sequence, you need to TimeDistribute your Dense layer like so:
model.add(TimeDistributed(Dense(3, activation="linear")))
This applies the dense layer to each timestep independently.
See https://keras.io/layers/wrappers/#timedistributed

How to obtain gradients with respect to parameters when using Lambda layer as output

I am trying to implement a neural network with one hidden layer that can represent the solution to a PDE (let's say the Laplace equation). The objective function therefore depends on the gradient of the neural network w.r.t its input.
Now, I have implemented the calculation of the second derivatives using Lambda layers. However when I try to compute the gradient of the output with respect to the parameters of the model, I get an error.
def grad(y, x, nameit):
return Lambda(lambda z: K.gradients(z[0], z[1]), output_shape = [1], name = nameit)([y,x])
def network(i):
m = Dense(100, activation='sigmoid')(i)
j = Dense(1, name="networkout")(m)
return j
x1 = Input(shape=(1,))
a = network(x1)
b = grad(a, x1, "dudx1")
c = grad(b, x1, "dudx11")
model = Model(inputs = [x1], outputs=[c])
model.compile(optimizer='rmsprop',
loss='mean_squared_error',
metrics=['accuracy'])
x1_data = np.random.random((20, 1))
labels = np.zeros((20,1))
model.fit(x1_data,labels)
This is the error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Why can't Keras compute the gradients w.r.t the trainable parameters?
Problem is in networkout layer. It maintains a linear activation which prevents gradients from pass through it, and therefore, return 'None' gradients error. In this case you need to add any activation function except linear to networkout layer.
def network(i):
m = layers.Dense(100)(i)
j = layers.Dense(1, name="networkout", activation='relu')(m)
return j
However, previous layer can have a linear activation.

signal to signal pediction using RNN and Keras

I am trying to reproduce the nice work here and adapte it so that it reads real data from a file.
I started by generating random signals (instead of the generating methods provided in the above link). Unfortoutanyl, I could not generate the proper signals that the model can accept.
here is the code:
import numpy as np
import keras
from keras.utils import plot_model
input_sequence_length = 15 # Length of the sequence used by the encoder
target_sequence_length = 15 # Length of the sequence predicted by the decoder
import random
def getModel():# Define an input sequence.
learning_rate = 0.01
num_input_features = 1
lambda_regulariser = 0.000001 # Will not be used if regulariser is None
regulariser = None # Possible regulariser: keras.regularizers.l2(lambda_regulariser)
layers = [35, 35]
num_output_features=1
decay = 0 # Learning rate decay
loss = "mse" # Other loss functions are possible, see Keras documentation.
optimiser = keras.optimizers.Adam(lr=learning_rate, decay=decay) # Other possible optimiser "sgd" (Stochastic Gradient Descent)
encoder_inputs = keras.layers.Input(shape=(None, num_input_features))
# Create a list of RNN Cells, these are then concatenated into a single layer
# with the RNN layer.
encoder_cells = []
for hidden_neurons in layers:
encoder_cells.append(keras.layers.GRUCell(hidden_neurons, kernel_regularizer=regulariser,recurrent_regularizer=regulariser,bias_regularizer=regulariser))
encoder = keras.layers.RNN(encoder_cells, return_state=True)
encoder_outputs_and_states = encoder(encoder_inputs)
# Discard encoder outputs and only keep the states.
# The outputs are of no interest to us, the encoder's
# job is to create a state describing the input sequence.
encoder_states = encoder_outputs_and_states[1:]
# The decoder input will be set to zero (see random_sine function of the utils module).
# Do not worry about the input size being 1, I will explain that in the next cell.
decoder_inputs = keras.layers.Input(shape=(None, 1))
decoder_cells = []
for hidden_neurons in layers:
decoder_cells.append(keras.layers.GRUCell(hidden_neurons,
kernel_regularizer=regulariser,
recurrent_regularizer=regulariser,
bias_regularizer=regulariser))
decoder = keras.layers.RNN(decoder_cells, return_sequences=True, return_state=True)
# Set the initial state of the decoder to be the ouput state of the encoder.
# This is the fundamental part of the encoder-decoder.
decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)
# Only select the output of the decoder (not the states)
decoder_outputs = decoder_outputs_and_states[0]
# Apply a dense layer with linear activation to set output to correct dimension
# and scale (tanh is default activation for GRU in Keras, our output sine function can be larger then 1)
decoder_dense = keras.layers.Dense(num_output_features,
activation='linear',
kernel_regularizer=regulariser,
bias_regularizer=regulariser)
decoder_outputs = decoder_dense(decoder_outputs)
# Create a model using the functional API provided by Keras.
# The functional API is great, it gives an amazing amount of freedom in architecture of your NN.
# A read worth your time: https://keras.io/getting-started/functional-api-guide/
model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)
model.compile(optimizer=optimiser, loss=loss)
print(model.summary())
return model
def getXY():
X, y = list(), list()
for _ in range(100):
x = [random.random() for _ in range(input_sequence_length)]
y = [random.random() for _ in range(target_sequence_length)]
X.append([x,[0 for _ in range(input_sequence_length)]])
y.append(y)
return np.array(X), np.array(y)
X,y = getXY()
print(X,y)
model = getModel()
model.fit(X,y)
The error message i got is:
ValueError: Error when checking model input: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 2 array(s), but instead got the following list of 1
arrays:
what is the correct shape of the input data for the model?
If you read carefully the source of your inspiration, you will find that he talks about the "decoder_input" data.
He talks about the "teacher forcing" technique that consists of feeding the decoder with some delayed data. But also says that it didn't really work well in his case so he puts that initial state of the decoder to a bunch of 0 as this line shows:
decoder_input = np.zeros((decoder_output.shape[0], decoder_output.shape[1], 1))
in his design of the auto-encoder, they are two separate models that have different inputs, then he ties them with RNN stats from each other.
I can see that you have tried doing the same thing but you have appended np.array([x_encoder, x_decoder]) where you should have done [np.array(x_encoder), np.array(x_decoder)]. Each input to the network should be a numpy array that you put in a list of inputs, not one big numpy array.
I also found some typos in your code, you are appending y to itself, where you should instead create a Y variable
def getXY():
X_encoder, X_decoder, Y = list(), list(), list()
for _ in range(100):
x_encoder = [random.random() for _ in range(input_sequence_length)]
# the decoder input is a sequence of 0's same length as target seq
x_decoder = [0]*len(target_sequence_length)
y = [random.random() for _ in range(target_sequence_length)]
X_encoder.append(x_encoder)
# Not really optimal but will work
X_decoder.append(x_decoder)
Y.append(y)
return [np.array(X_encoder), np.array(X_decoder], np.array(Y)
now when you do :
X, Y = getXY()
you receive X which is a list of 2 numpy arrays (as your model requests) and Y which is a single numpy array.
I hope this helps
EDIT
Indeed, in the code that generates the dataset, you can see that they build 3 dimensions np arrays for the input. RNN needs 3 dimensional inputs :-)
The following code should address the shape issue:
def getXY():
X_encoder, X_decoder, Y = list(), list(), list()
for _ in range(100):
x_encoder = [random.random() for _ in range(input_sequence_length)]
# the decoder input is a sequence of 0's same length as target seq
x_decoder = [0]*len(target_sequence_length)
y = [random.random() for _ in range(target_sequence_length)]
X_encoder.append(x_encoder)
# Not really optimal but will work
X_decoder.append(x_decoder)
Y.append(y)
# Make them as numpy arrays
X_encoder = np.array(X_encoder)
X_decoder = np.array(X_decoder)
Y = np.array(Y)
# Make them 3 dimensional arrays (with third dimension being of size 1) like the 1d vector: [1,2] can become 2 de vector [[1,2]]
X_encoder = np.expand_dims(X_encoder, axis=2)
X_decoder = np.expand_dims(X_decoder, axis=2)
Y = np.expand_dims(Y, axis=2)
return [X_encoder, X_decoder], Y

Time prediction using specialised setup in Keras

I'm working on a project where I have to predict the future states of a 1D vector with y entries. I'm trying to do this using an ANN setup with LSTM units in combination with a convolution layer. The method I'm using is based on the method they used in a (pre-release paper). The suggested setup is as follows:
In the picture c is the 1D vector with y entries. The ANN gets the n previous states as an input and produces o next states as an output.
Currently, my ANN setup looks like this:
inputLayer = Input(shape = (n, y))
encoder = LSTM(200)(inputLayer)
x = RepeatVector(1)(encoder)
decoder = LSTM(200, return_sequences=True)(x)
x = Conv1D(y, 4, activation = 'linear', padding = 'same')(decoder)
model = Model(inputLayer, x)
Here n is the length of the input sequences and y is the length of the state array. As can be seen I'm repeating the d vector only 1 time, as I'm trying to predict only 1 time step in the future. Is this the way to setup the above mentioned network?
Furthermore, I have a numpy array (data) with a shape of (Sequences, Time Steps, State Variables) to train with. I was trying to divide this in randomly selected batches with a generator like this:
def BatchGenerator(batch_size, n, y, data):
# Infinite loop.
while True:
# Allocate a new array for the batch of input-signals.
x_shape = (batch_size, n, y)
x_batch = np.zeros(shape=x_shape, dtype=np.float16)
# Allocate a new array for the batch of output-signals.
y_shape = (batch_size, 1, y)
y_batch = np.zeros(shape=y_shape, dtype=np.float16)
# Fill the batch with random sequences of data.
for i in range(batch_size):
# Select a random sequence
seq_idx = np.random.randint(data.shape[0])
# Get a random start-index.
# This points somewhere into the training-data.
start_idx = np.random.randint(data.shape[1] - n)
# Copy the sequences of data starting at this
# Each batch inside x_batch has a shape of [n, y]
x_batch[i,:,:] = data[seq_idx, start_idx:start_idx+n, :]
# Each batch inside y_batch has a shape of [1, y] (as we predict only 1 time step in advance)
y_batch[i,:,:] = data[seq_idx, start_idx+n, :]
yield (x_batch, y_batch)
The problem is that it gives an error if I'm using a batch_size of more than 1. Could anyone help me to set this data up in a way that it can be used optimally to train my neural network?
The model is now trained using:
generator = BatchGenerator(batch_size, n, y, data)
model.fit_generator(generator = generator, steps_per_epoch = steps_per_epoch, epochs = epochs)
Thanks in advance!

Recurrent neural network architecture

I'm working on a RNN architecture which does speech enhancement. The dimensions of the input is [XX, X, 1024] where XX is the batch size and X is the variable sequence length.
The input to the network is positive valued data and the output is masked binary data(IBM) which is later used to construct enhanced signal.
For instance, if the input to network is [10, 65, 1024] the output will be [10,65,1024] tensor with binary values. I'm using Tensorflow with mean squared error as loss function. But I'm not sure which activation function to use here(which keeps the outputs either zero or one), Following is the code I've come up with so far
tf.reset_default_graph()
num_units = 10 #
num_layers = 3 #
dropout = tf.placeholder(tf.float32)
cells = []
for _ in range(num_layers):
cell = tf.contrib.rnn.LSTMCell(num_units)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob = dropout)
cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
X = tf.placeholder(tf.float32, [None, None, 1024])
Y = tf.placeholder(tf.float32, [None, None, 1024])
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
out_size = Y.get_shape()[2].value
logit = tf.contrib.layers.fully_connected(output, out_size)
prediction = (logit)
flat_Y = tf.reshape(Y, [-1] + Y.shape.as_list()[2:])
flat_logit = tf.reshape(logit, [-1] + logit.shape.as_list()[2:])
loss_op = tf.losses.mean_squared_error(labels=flat_Y, predictions=flat_logit)
#adam optimizier as the optimization function
optimizer = tf.train.AdamOptimizer(learning_rate=0.001) #
train_op = optimizer.minimize(loss_op)
#extract the correct predictions and compute the accuracy
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Also my reconstruction isn't good. Can someone suggest on improving the model?
If you want your outputs to be either 0 or 1, to me it seems a good idea to turn this into a classification problem. To this end, I would use a sigmoidal activation and cross entropy:
...
prediction = tf.nn.sigmoid(logit)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logit))
...
In addition, from my point of view the hidden dimensionality (10) of your stacked RNNs seems quite small for such a big input dimensionality (1024). However this is just a guess, and it is something that needs to be tuned.

Resources