How to correctly implement backpropagation for machine learning the MNIST dataset? - python-3.x

I'm using Michael Nielsen's neural networks book as a reference for my code (mine is basically identical): http://neuralnetworksanddeeplearning.com/chap1.html
The code in question:
def backpropagate(self, image, image_value) :
    # declare two new numpy arrays for the updated weights & biases
    new_biases = [np.zeros(bias.shape) for bias in self.biases]
    new_weights = [np.zeros(weight_matrix.shape) for weight_matrix in self.weights]
    # -------- feed forward --------
    # store all the activations in a list
    activations = [image]
    # declare empty list that will contain all the z vectors
    zs = []
    for bias, weight in zip(self.biases, self.weights) :
        print(bias.shape)
        print(weight.shape)
        print(image.shape)
        z = np.dot(weight, image) + bias
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # -------- backward pass --------
    # transpose() returns the numpy array with the rows as columns and columns as rows
    delta = self.cost_derivative(activations[-1], image_value) * sigmoid_prime(zs[-1])
    new_biases[-1] = delta
    new_weights[-1] = np.dot(delta, activations[-2].transpose())
    # l = 1 means the last layer of neurons, l = 2 is the second-last, etc.
    # this takes advantage of Python's ability to use negative indices in lists
    for l in range(2, self.num_layers) :
        z = zs[-1]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        new_biases[-l] = delta
        new_weights[-l] = np.dot(delta, activations[-l-1].transpose())
    return (new_biases, new_weights)
My algorithm only gets as far as the first round of backpropagation before this error occurs:
File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 97, in stochastic_gradient_descent
self.update_mini_batch(mini_batch, learning_rate)
File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 117, in update_mini_batch
delta_biases, delta_weights = self.backpropagate(image, image_value)
File "D:/Programming/Python/DPUDS/DPUDS_Projects/Fall_2017/MNIST/network.py", line 160, in backpropagate
z = np.dot(weight, activation) + bias
ValueError: shapes (30,50000) and (784,1) not aligned: 50000 (dim 1) != 784 (dim 0)
I get why it's an error. The number of columns in weights doesn't match the number of rows in the pixel image, so I can't do matrix multiplication. Here's where I'm confused -- there are 30 neurons used in the backpropagation, each with 50,000 images being evaluated. My understanding is that each of the 50,000 should have 784 weights attached, one for each pixel. But when I modify the code accordingly:
count = 0
for bias, weight in zip(self.biases, self.weights) :
    print(bias.shape)
    print(weight[count].shape)
    print(image.shape)
    z = np.dot(weight[count], image) + bias
    zs.append(z)
    activation = sigmoid(z)
    activations.append(activation)
    count += 1
I still get a similar error:
ValueError: shapes (50000,) and (784,1) not aligned: 50000 (dim 0) != 784 (dim 0)
I'm just really confuzzled by all the linear algebra involved and I think I'm just missing something about the structure of the weight matrix. Any help at all would be greatly appreciated.

It looks like the issue is in your changes to the original code.
I've downloaded the example from the link you provided and it works without any errors.
Here is the full source code I used:
import pickle
import gzip
import numpy as np
import random


def load_data():
    """Return the MNIST data as a tuple containing the training data,
    the validation data, and the test data.

    The ``training_data`` is returned as a tuple with two entries.
    The first entry contains the actual training images. This is a
    numpy ndarray with 50,000 entries. Each entry is, in turn, a
    numpy ndarray with 784 values, representing the 28 * 28 = 784
    pixels in a single MNIST image.

    The second entry in the ``training_data`` tuple is a numpy ndarray
    containing 50,000 entries. Those entries are just the digit
    values (0...9) for the corresponding images contained in the first
    entry of the tuple.

    The ``validation_data`` and ``test_data`` are similar, except
    each contains only 10,000 images.

    This is a nice data format, but for use in neural networks it's
    helpful to modify the format of the ``training_data`` a little.
    That's done in the wrapper function ``load_data_wrapper()``, see
    below.
    """
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    # The pickle file was written by Python 2, so Python 3 needs encoding='latin1'.
    training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    f.close()
    return (training_data, validation_data, test_data)


def load_data_wrapper():
    """Return a tuple containing ``(training_data, validation_data,
    test_data)``. Based on ``load_data``, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, ``training_data`` is a list containing 50,000
    2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray
    containing the input image. ``y`` is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for ``x``.

    ``validation_data`` and ``test_data`` are lists containing 10,000
    2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional
    numpy.ndarray containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to ``x``.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data. These formats
    turn out to be the most convenient for use in our neural network
    code."""
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    # list() is needed in Python 3, where zip() returns an iterator
    # that supports neither len() nor random.shuffle().
    training_data = list(zip(training_inputs, training_results))
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = list(zip(validation_inputs, va_d[1]))
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = list(zip(test_inputs, te_d[1]))
    return (training_data, validation_data, test_data)


def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere. This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e


class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network. For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron. The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1. Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent. The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs. The other non-optional parameters are
        self-explanatory. If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out. This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x. ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book. Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on. It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        r"""Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)


#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))


training_data, validation_data, test_data = load_data_wrapper()
net = Network([784, 30, 10])
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
Additional info:
That said, I would recommend using an existing framework such as Keras rather than reinventing the wheel.
Also, this was checked with Python 3.6.

Kudos on digging into Nielsen's code. It's a great resource to develop thorough understanding of NN principles. Too many people leap ahead to Keras without knowing what goes on under the hood.
Each training example doesn't get its own weights. Each of the 784 features does. If each example got its own weights then each weight set would overfit to its corresponding training example. Also, if you later used your trained network to run inference on a single test example, what would it do with 50,000 sets of weights when presented with just one handwritten digit? Instead, each of the 30 neurons in your hidden layer learns a set of 784 weights, one for each pixel, that offers high predictive accuracy when generalized to any handwritten digit.
Import network.py and instantiate a Network class like this without modifying any code:
net = network.Network([784, 30, 10])
This gives you a network with 784 input neurons, 30 hidden neurons and 10 output neurons. Your weight matrices will have dimensions [30, 784] and [10, 30], respectively. When you feed the network an input array of dimensions [784, 1], the matrix multiplication that gave you an error is valid because dim 1 of the weight matrix equals dim 0 of the input array (both 784).
Your problem is not the implementation of backprop but rather setting up a network architecture appropriate for the shape of your input data. If memory serves, Nielsen leaves backprop as a black box in chapter 1 and doesn't dive into it until chapter 2. Keep at it, and good luck!
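To make the shapes concrete, here is a minimal NumPy sketch. It assumes the weights are built the way Nielsen's Network.__init__ builds them (one matrix of shape (neurons_out, neurons_in) per layer). The helper init_weights is made up for this illustration, and the [50000, 30, 10] call is only a guess at how a (30, 50000) weight matrix like the one in the traceback could come about.

```python
import numpy as np


def init_weights(sizes, seed=0):
    """One weight matrix per layer, shaped (neurons_out, neurons_in)."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]


image = np.zeros((784, 1))  # one flattened 28 x 28 image as a column vector

# sizes lists neurons per layer, so the first entry is the pixel count.
good = init_weights([784, 30, 10])
good_shapes = [w.shape for w in good]
print(good_shapes)           # [(30, 784), (10, 30)]
hidden = np.dot(good[0], image)
print(hidden.shape)          # (30, 1)

# Assumed mistake: passing the number of training images as the input size.
bad = init_weights([50000, 30, 10])
try:
    np.dot(bad[0], image)
    mismatch = False
except ValueError as err:
    mismatch = True
    print(err)
```

The weights never depend on how many training images there are; the 50,000 images are fed through the same [30, 784] and [10, 30] matrices one at a time.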

Related

How to do weighted pooling in MXNet?

I want to do a 2D convolution that uses the same 1x2x4 weight on every channel.
(Note: the input height and width are bigger than the kernel, so I can't just use a dot product.)
How can I do this in MXNet?
I tried to use the same instance of a single 2D conv layer on every channel and concatenate the results, but it is incredibly slow.
import mxnet as mx
from mxnet.gluon import nn

def Concat(*args, axis=1, **kwargs):
    net = nn.HybridConcatenate(axis=axis,**kwargs)
    net.add(*args)
    return net
def Seq(*args):
    net = nn.HybridSequential()
    net.add(*args)
    return net
class Trim_D1(nn.HybridBlock):
    def __init__(self, from_, to, **kwargs):
        super(Trim_D1, self).__init__(**kwargs)
        self.from_ = from_
        self.to = to
    def forward(self, x):
        # keep only channels from_ .. to-1
        return x[:,self.from_:self.to]
PooPool = nn.Conv2D(kernel_size=(2,4), strides=(2, 4), channels=1, activation=None, use_bias=False, weight_initializer=mx.init.Constant(1/8))
conc = ()
for i in range(40):
    # the trailing comma makes this a one-element tuple that is appended to conc
    conc += Seq(
        Trim_D1(i,i+1),
        PooPool
    ),
WeightedPool= Concat(*conc)
Ideally I would also want my kernel weights to sum up to 1 in order to resemble the weighted average pooling.
Edit: I think I know how to do this. I'm going to edit the Conv2D and _Conv source code so that instead of creating a weight of C x H x W dimensions it creates a weight of 1 x H x W dimensions and uses broadcasting during the convolution. For the weights to sum to 1, a softmax additionally has to be applied to them.
OK, apparently the weight has dimensions out_channels x (in_channels / num_group) x H x W, and broadcasting is not allowed during the convolution. We can fix the second dimension to 1 by setting num_group equal to the number of channels; for the first dimension, we can simply repeat the same weight once per channel.
In _Conv.__init__ I discard the first two dimensions during initialization, so our kernel is only H x W now:
self.weight = Parameter('weight', shape=wshapes[1][2:],
                        init=weight_initializer,
                        allow_deferred_init=True)
In _Conv.hybrid_forward I am flattening our weight to 1D in order to perform softmax and then restore to the original 2D shape. Then I expand first two dimensions and repeat the first dimension as mentioned above:
orig_shape = weight.shape
act = getattr(F, self._op_name)(x, mx.nd.softmax(weight.reshape(-1)).reshape(orig_shape)[None,None,:].repeat(self._kwargs['num_group'],axis=0), name='fwd', **self._kwargs)
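The idea of the edit above (one softmax-normalized H x W kernel shared by every channel) can be sketched in plain NumPy, independent of MXNet. This is only an illustration under the assumption that the stride equals the kernel size, as in the (2, 4) / (2, 4) layer from the question; weighted_pool and softmax are names made up here, not MXNet functions.

```python
import numpy as np


def softmax(v):
    """Numerically stable softmax of a 1D array."""
    e = np.exp(v - v.max())
    return e / e.sum()


def weighted_pool(x, raw_kernel):
    """Weighted average pooling with one kernel shared by all channels.

    x: array of shape (N, C, H, W).
    raw_kernel: unnormalized weights of shape (kh, kw); stride equals (kh, kw).
    """
    kh, kw = raw_kernel.shape
    n, c, h, w = x.shape
    assert h % kh == 0 and w % kw == 0, "input must tile evenly"
    # Normalize so the kernel weights sum to 1.
    kernel = softmax(raw_kernel.reshape(-1)).reshape(kh, kw)
    # Split H and W into (blocks, offset within block) pairs.
    windows = x.reshape(n, c, h // kh, kh, w // kw, kw)
    # Weight each offset (i, j) and sum within every block.
    return np.einsum('nchiwj,ij->nchw', windows, kernel)


x = np.arange(2 * 3 * 4 * 8, dtype=float).reshape(2, 3, 4, 8)
out = weighted_pool(x, np.zeros((2, 4)))  # all-zero raw weights give a uniform kernel
print(out.shape)  # (2, 3, 2, 2)
```

With a uniform kernel this reduces to ordinary average pooling, which is a convenient sanity check before letting the raw weights be learned.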

using transforms.LinearTransformation to apply whitening in PyTorch

I need to apply ZCA whitening in PyTorch. I think I have found a way this can be done by using transforms.LinearTransformation and I have found a test in the PyTorch repo which gives some insight into how this is done (see final code block or link below)
https://github.com/pytorch/vision/blob/master/test/test_transforms.py
I am struggling to work out how I apply something like this myself.
Currently I have transforms along the lines of:
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(np.array([125.3, 123.0, 113.9]) / 255.0,
                         np.array([63.0, 62.1, 66.7]) / 255.0),
])
The documentation says the way to use LinearTransformation is as follows:
torchvision.transforms.LinearTransformation(transformation_matrix, mean_vector)
whitening transformation: Suppose X is a column vector zero-centered
data. Then compute the data covariance matrix [D x D] with
torch.mm(X.t(), X), perform SVD on this matrix and pass it as
transformation_matrix.
I can see from the test I linked above and copied below that they use torch.mm to calculate what they call principal_components:
def test_linear_transformation(self):
    num_samples = 1000
    x = torch.randn(num_samples, 3, 10, 10)
    flat_x = x.view(x.size(0), x.size(1) * x.size(2) * x.size(3))
    # compute principal components
    sigma = torch.mm(flat_x.t(), flat_x) / flat_x.size(0)
    u, s, _ = np.linalg.svd(sigma.numpy())
    zca_epsilon = 1e-10  # avoid division by 0
    d = torch.Tensor(np.diag(1. / np.sqrt(s + zca_epsilon)))
    u = torch.Tensor(u)
    principal_components = torch.mm(torch.mm(u, d), u.t())
    mean_vector = (torch.sum(flat_x, dim=0) / flat_x.size(0))
    # initialize whitening matrix
    whitening = transforms.LinearTransformation(principal_components, mean_vector)
    # estimate covariance and mean using weak law of large number
    num_features = flat_x.size(1)
    cov = 0.0
    mean = 0.0
    for i in x:
        xwhite = whitening(i)
        xwhite = xwhite.view(1, -1).numpy()
        cov += np.dot(xwhite, xwhite.T) / num_features
        mean += np.sum(xwhite) / num_features
    # if rtol for std = 1e-3 then rtol for cov = 2e-3 as std**2 = cov
    assert np.allclose(cov / num_samples, np.identity(1), rtol=2e-3), "cov not close to 1"
    assert np.allclose(mean / num_samples, 0, rtol=1e-3), "mean not close to 0"
    # Checking if LinearTransformation can be printed as string
    whitening.__repr__()
How do I apply something like this? Do I use it where I define my transforms, or do I apply it in my training loop where I am iterating over my training data?
Thanks in advance.
ZCA whitening is typically a preprocessing step, like centering and scaling, which basically aims at making your data more NN-friendly (additional info below). As such, it is supposed to be applied once, right before training.
So right before you start training your model with a given dataset X, compute the whitened dataset Z, which is simply the multiplication of X with the ZCA matrix W_zca that you can learn to compute here. Then train your model on the whitened dataset.
In the end, you should have something that looks like this:
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super(MyModule,self).__init__()
        # Feel free to use something more useful than a simple linear layer
        self._network = torch.nn.Linear(...)
        # Do your stuff
        ...
    def fit(self, inputs, labels):
        """ Trains the model to predict the right label for a given input """
        # Compute the whitening matrix and inputs.
        # compute_zca is a helper you have to provide; inputs is assumed to be
        # (num_samples, num_features), the ZCA matrix (num_features, num_features).
        self._zca_mat = compute_zca(inputs)
        whitened_inputs = torch.mm(inputs, self._zca_mat)
        # Apply training on the whitened data
        outputs = self._network(whitened_inputs)
        loss = torch.nn.MSELoss()(outputs, labels)
        loss.backward()
        # optimizer is assumed to be created elsewhere
        optimizer.step()
    def forward(self, input):
        # You always need to apply the zca transform before forwarding,
        # because your network has been trained with whitened data
        whitened_input = torch.mm(input, self._zca_mat)
        predicted_label = self._network.forward(whitened_input)
        return predicted_label
Additional info
Whitening your data means decorrelating its dimensions so that the covariance matrix of the whitened data is the identity matrix. It is a rotation-scaling operation (thus linear), and there are actually infinitely many possible whitening transforms; ZCA is the one that keeps the whitened data as close as possible to the original data. To understand the maths behind ZCA, read this
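As a rough illustration of the computation, here is a NumPy sketch that follows the same recipe as the torchvision test quoted in the question (covariance, SVD, then U diag(1/sqrt(s + eps)) U^T). The function name fit_zca, its return values, and the toy data are assumptions made for this example, not part of PyTorch or torchvision.

```python
import numpy as np


def fit_zca(x, eps=1e-10):
    """Return (mean, w) for data x of shape (num_samples, num_features).

    Whitened data is obtained as (x - mean) @ w.
    """
    mean = x.mean(axis=0)
    xc = x - mean                                   # zero-center each feature
    sigma = xc.T @ xc / xc.shape[0]                 # feature covariance, (D, D)
    u, s, _ = np.linalg.svd(sigma)
    w = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T   # symmetric ZCA matrix
    return mean, w


rng = np.random.default_rng(0)
# Toy correlated data: independent Gaussians mixed by a fixed, well-conditioned matrix.
mix = np.eye(5) + 0.3 * np.ones((5, 5))
x = rng.standard_normal((2000, 5)) @ mix

mean, w = fit_zca(x)
z = (x - mean) @ w
cov_z = z.T @ z / z.shape[0]
print(np.round(cov_z, 3))
```

The mean and matrix are fitted once on the training set and then reused unchanged for validation and test data, which matches the "apply once, before training" advice above.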

Signal to signal prediction using RNN and Keras

I am trying to reproduce the nice work here and adapt it so that it reads real data from a file.
I started by generating random signals (instead of using the generating methods provided in the above link). Unfortunately, I could not generate signals in a form that the model accepts.
Here is the code:
import numpy as np
import keras
from keras.utils import plot_model
input_sequence_length = 15 # Length of the sequence used by the encoder
target_sequence_length = 15 # Length of the sequence predicted by the decoder
import random
def getModel():# Define an input sequence.
    learning_rate = 0.01
    num_input_features = 1
    lambda_regulariser = 0.000001 # Will not be used if regulariser is None
    regulariser = None # Possible regulariser: keras.regularizers.l2(lambda_regulariser)
    layers = [35, 35]
    num_output_features=1
    decay = 0 # Learning rate decay
    loss = "mse" # Other loss functions are possible, see Keras documentation.
    optimiser = keras.optimizers.Adam(lr=learning_rate, decay=decay) # Other possible optimiser "sgd" (Stochastic Gradient Descent)
    encoder_inputs = keras.layers.Input(shape=(None, num_input_features))
    # Create a list of RNN Cells, these are then concatenated into a single layer
    # with the RNN layer.
    encoder_cells = []
    for hidden_neurons in layers:
        encoder_cells.append(keras.layers.GRUCell(hidden_neurons, kernel_regularizer=regulariser,recurrent_regularizer=regulariser,bias_regularizer=regulariser))
    encoder = keras.layers.RNN(encoder_cells, return_state=True)
    encoder_outputs_and_states = encoder(encoder_inputs)
    # Discard encoder outputs and only keep the states.
    # The outputs are of no interest to us, the encoder's
    # job is to create a state describing the input sequence.
    encoder_states = encoder_outputs_and_states[1:]
    # The decoder input will be set to zero (see random_sine function of the utils module).
    # Do not worry about the input size being 1, I will explain that in the next cell.
    decoder_inputs = keras.layers.Input(shape=(None, 1))
    decoder_cells = []
    for hidden_neurons in layers:
        decoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))
    decoder = keras.layers.RNN(decoder_cells, return_sequences=True, return_state=True)
    # Set the initial state of the decoder to be the ouput state of the encoder.
    # This is the fundamental part of the encoder-decoder.
    decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)
    # Only select the output of the decoder (not the states)
    decoder_outputs = decoder_outputs_and_states[0]
    # Apply a dense layer with linear activation to set output to correct dimension
    # and scale (tanh is default activation for GRU in Keras, our output sine function can be larger then 1)
    decoder_dense = keras.layers.Dense(num_output_features,
                                       activation='linear',
                                       kernel_regularizer=regulariser,
                                       bias_regularizer=regulariser)
    decoder_outputs = decoder_dense(decoder_outputs)
    # Create a model using the functional API provided by Keras.
    # The functional API is great, it gives an amazing amount of freedom in architecture of your NN.
    # A read worth your time: https://keras.io/getting-started/functional-api-guide/
    model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)
    model.compile(optimizer=optimiser, loss=loss)
    print(model.summary())
    return model
def getXY():
    X, y = list(), list()
    for _ in range(100):
        x = [random.random() for _ in range(input_sequence_length)]
        y = [random.random() for _ in range(target_sequence_length)]
        X.append([x,[0 for _ in range(input_sequence_length)]])
        y.append(y)
    return np.array(X), np.array(y)
X,y = getXY()
print(X,y)
model = getModel()
model.fit(X,y)
The error message I got is:
ValueError: Error when checking model input: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 2 array(s), but instead got the following list of 1
arrays:
What is the correct shape of the input data for the model?
If you read the source of your inspiration carefully, you will find that the author talks about the "decoder_input" data.
He describes the "teacher forcing" technique, which consists of feeding the decoder with delayed data, but also says that it didn't work well in his case, so he sets the decoder input to all zeros, as this line shows:
decoder_input = np.zeros((decoder_output.shape[0], decoder_output.shape[1], 1))
In his design, the encoder and the decoder are two separate models with different inputs, which he ties together by passing the encoder's RNN states to the decoder.
I can see that you have tried doing the same thing, but you have appended np.array([x_encoder, x_decoder]) where you should have done [np.array(x_encoder), np.array(x_decoder)]. Each input to the network should be its own numpy array, and those arrays go in a list of inputs, not in one big numpy array.
I also found a typo in your code: you are appending y to itself, where you should instead create a separate Y variable.
def getXY():
    X_encoder, X_decoder, Y = list(), list(), list()
    for _ in range(100):
        x_encoder = [random.random() for _ in range(input_sequence_length)]
        # the decoder input is a sequence of 0's same length as target seq
        x_decoder = [0]*target_sequence_length
        y = [random.random() for _ in range(target_sequence_length)]
        X_encoder.append(x_encoder)
        # Not really optimal but will work
        X_decoder.append(x_decoder)
        Y.append(y)
    return [np.array(X_encoder), np.array(X_decoder)], np.array(Y)
Now when you do:
X, Y = getXY()
you receive X which is a list of 2 numpy arrays (as your model requests) and Y which is a single numpy array.
I hope this helps
EDIT
Indeed, in the code that generates the dataset, you can see that they build 3-dimensional numpy arrays for the input. An RNN needs 3-dimensional inputs (samples, time steps, features) :-)
The following code should address the shape issue:
def getXY():
    X_encoder, X_decoder, Y = list(), list(), list()
    for _ in range(100):
        x_encoder = [random.random() for _ in range(input_sequence_length)]
        # the decoder input is a sequence of 0's same length as target seq
        x_decoder = [0]*target_sequence_length
        y = [random.random() for _ in range(target_sequence_length)]
        X_encoder.append(x_encoder)
        # Not really optimal but will work
        X_decoder.append(x_decoder)
        Y.append(y)
    # Make them numpy arrays
    X_encoder = np.array(X_encoder)
    X_decoder = np.array(X_decoder)
    Y = np.array(Y)
    # Make them 3-dimensional arrays by adding a trailing dimension of size 1,
    # e.g. a (100, 15) array becomes a (100, 15, 1) array
    X_encoder = np.expand_dims(X_encoder, axis=2)
    X_decoder = np.expand_dims(X_decoder, axis=2)
    Y = np.expand_dims(Y, axis=2)
    return [X_encoder, X_decoder], Y
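The difference between the two input layouts can be checked with NumPy alone, without building the model. The sketch below uses the sample count and sequence length from the question; the variable names are only illustrative.

```python
import numpy as np

num_samples, seq_len = 100, 15
x_encoder = np.random.random((num_samples, seq_len))
x_decoder = np.zeros((num_samples, seq_len))

# Layout from the question: every sample holds [encoder_seq, decoder_seq],
# so everything ends up in ONE array and Keras sees a single input.
packed = np.array([[e, d] for e, d in zip(x_encoder, x_decoder)])
print(packed.shape)  # (100, 2, 15)

# Layout the two-input model expects: a list of TWO arrays,
# each shaped (samples, time steps, features).
inputs = [np.expand_dims(x_encoder, axis=2), np.expand_dims(x_decoder, axis=2)]
input_shapes = [a.shape for a in inputs]
print(input_shapes)  # [(100, 15, 1), (100, 15, 1)]
```

The list has one entry per keras.layers.Input in the model, in the same order as the inputs=[encoder_inputs, decoder_inputs] argument.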

Time prediction using specialised setup in Keras

I'm working on a project where I have to predict the future states of a 1D vector with y entries. I'm trying to do this using an ANN setup with LSTM units in combination with a convolution layer. The method I'm using is based on the one used in a pre-release paper. The suggested setup is as follows:
In the picture c is the 1D vector with y entries. The ANN gets the n previous states as an input and produces o next states as an output.
Currently, my ANN setup looks like this:
from keras.layers import Input, LSTM, RepeatVector, Conv1D
from keras.models import Model
inputLayer = Input(shape = (n, y))
encoder = LSTM(200)(inputLayer)
x = RepeatVector(1)(encoder)
decoder = LSTM(200, return_sequences=True)(x)
x = Conv1D(y, 4, activation = 'linear', padding = 'same')(decoder)
model = Model(inputLayer, x)
Here n is the length of the input sequences and y is the length of the state array. As can be seen, I'm repeating the d vector only once, as I'm trying to predict only one time step into the future. Is this the right way to set up the above-mentioned network?
Furthermore, I have a numpy array (data) with a shape of (Sequences, Time Steps, State Variables) to train with. I was trying to divide this into randomly selected batches with a generator like this:
def BatchGenerator(batch_size, n, y, data):
    # Infinite loop.
    while True:
        # Allocate a new array for the batch of input-signals.
        x_shape = (batch_size, n, y)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        # Allocate a new array for the batch of output-signals.
        y_shape = (batch_size, 1, y)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        # Fill the batch with random sequences of data.
        for i in range(batch_size):
            # Select a random sequence
            seq_idx = np.random.randint(data.shape[0])
            # Get a random start-index.
            # This points somewhere into the training-data.
            start_idx = np.random.randint(data.shape[1] - n)
            # Copy the sequences of data starting at this index.
            # Each batch inside x_batch has a shape of [n, y]
            x_batch[i,:,:] = data[seq_idx, start_idx:start_idx+n, :]
            # Each batch inside y_batch has a shape of [1, y] (as we predict only 1 time step in advance)
            y_batch[i,:,:] = data[seq_idx, start_idx+n, :]
        yield (x_batch, y_batch)
The problem is that I get an error if I use a batch_size of more than 1. Could anyone help me set this data up in a way that it can be used optimally to train my neural network?
The model is now trained using:
generator = BatchGenerator(batch_size, n, y, data)
model.fit_generator(generator = generator, steps_per_epoch = steps_per_epoch, epochs = epochs)
Thanks in advance!
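One way to narrow such an error down is to check the batch shapes outside Keras first. The sketch below is a vectorized NumPy variant of the sampling step, written only for illustration: sample_batch, its arguments and the toy data are assumptions, and it does not attempt to diagnose the Keras error itself.

```python
import numpy as np


def sample_batch(data, batch_size, n, rng):
    """Draw random windows from data of shape (sequences, time_steps, state_vars).

    Returns inputs of shape (batch_size, n, y) and targets of shape
    (batch_size, 1, y), where each target is the step right after its window.
    """
    seq_idx = rng.integers(data.shape[0], size=batch_size)
    start_idx = rng.integers(data.shape[1] - n, size=batch_size)
    # Time indices per sample: n input steps followed by the target step.
    t = start_idx[:, None] + np.arange(n + 1)[None, :]
    window = data[seq_idx[:, None], t, :]      # (batch_size, n + 1, y)
    return window[:, :n, :], window[:, n:, :]


rng = np.random.default_rng(0)
sequences, time_steps, y = 7, 50, 3
# Toy data in which consecutive time steps differ by exactly y in every variable.
data = np.arange(sequences * time_steps * y, dtype=float).reshape(sequences, time_steps, y)

x_batch, y_batch = sample_batch(data, batch_size=16, n=10, rng=rng)
print(x_batch.shape, y_batch.shape)  # (16, 10, 3) (16, 1, 3)
```

If the shapes here match the model's input shape (None, n, y) and output shape (None, 1, y), the data pipeline is consistent for any batch size and the cause of the error has to be looked for elsewhere.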

Theano MLP with 2 hidden layers throws Shape Mismatch error

I'm getting started with neural network implementations and am trying to build a working MLP using Theano. Following the tutorial, I tried to enhance the net by adding a layer, for a total of two hidden layers, each with the same number of units (250). The problem is that when I run the script I get a "Shape mismatch" ValueError. My code is a modified version of the tutorial code, which can be found here: http://deeplearning.net/tutorial/mlp.html.
The part I modified is the snippet-2, namely the MLP object, as follows:
class MLP(object):
    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron
        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights
        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)
        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie
        :type n_hidden: int
        :param n_hidden: number of hidden units
        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie
        """
        self.hiddenLayer1 = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )
        #try second hidden layer
        self.hiddenLayer2 = HiddenLayer(
            rng=rng,
            input=self.hiddenLayer1.output,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )
        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer2.output,
            n_in=n_hidden,
            n_out=n_out
        )
        # end-snippet-2 start-snippet-3
        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = (
            abs(self.hiddenLayer1.W).sum()
            + abs(self.hiddenLayer2.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )
        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (
            (self.hiddenLayer1.W ** 2).sum()
            + (self.hiddenLayer2.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )
        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors
        # the parameters of the model are the parameters of the two layer it is
        # made out of
        self.params = self.hiddenLayer1.params + self.hiddenLayer2.params + self.logRegressionLayer.params
        # end-snippet-3
        # keep track of model input
        self.input = input
I also removed some comments for readability. The output error I get is:
ValueError: Shape mismatch: x has 250 cols (and 20 rows) but y has 784
rows (and 250 cols) Apply node that caused the error:
Dot22(Elemwise{Composite{tanh((i0 + i1))}}[(0, 0)].0, W) Inputs types:
[TensorType(float64, matrix), TensorType(float64, matrix)] Inputs
shapes: [(20, 250), (784, 250)] Inputs strides: [(2000, 8), (2000, 8)]
Inputs values: ['not shown', 'not shown']
The size of the input to layer 2 needs to be the same as the size of the output from layer 1.
hiddenLayer2 takes the output of hiddenLayer1 as its input, and hiddenLayer1.n_out == n_hidden, but hiddenLayer2.n_in == n_in. In this case n_hidden == 250 and n_in == 784. They should match but don't, hence the error.
The solution is to make hiddenLayer2.n_in == hiddenLayer1.n_out, i.e. to pass n_in=n_hidden when constructing hiddenLayer2.
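The mismatch can be reproduced with plain NumPy arrays of the shapes from the error message. This is only a shape sketch: it assumes, as in the tutorial's HiddenLayer, that a layer computes tanh(dot(input, W) + b) with W of shape (n_in, n_out), and it leaves out the biases.

```python
import numpy as np

batch_size, n_in, n_hidden = 20, 784, 250

x = np.zeros((batch_size, n_in))
w1 = np.zeros((n_in, n_hidden))
h1 = np.tanh(x @ w1)                       # output of hidden layer 1
print(h1.shape)                            # (20, 250)

w2_wrong = np.zeros((n_in, n_hidden))      # built with n_in=n_in, i.e. (784, 250)
w2_right = np.zeros((n_hidden, n_hidden))  # built with n_in=n_hidden, i.e. (250, 250)

try:
    h1 @ w2_wrong
    failed = False
except ValueError as err:
    failed = True
    print(err)

h2 = np.tanh(h1 @ w2_right)                # output of hidden layer 2
print(h2.shape)                            # (20, 250)
```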

Resources