Error in compiling CNN LSTM neural network - python-3.x

I am trying to create a neural network with keras that will have as input N multivariate time-series and as target output N time-series. I converted the time-series to a supervised problem with the window (lag) method. As input I have a 4D matrix (samples, variables, sequence, lag) and as output a 2D matrix (samples, sequence). I found similar examples that used CNN+LSTM models, but I am having difficulty applying them. In case it helps, my train_X, train_y, test_X, test_y have shapes (112, 5, 7998, 2), (112, 7998), (29, 5, 7998, 2), and (29, 7998) respectively.
I have tried applying the TimeDistributed Keras wrapper to only the CNN part and to the whole model, and also removing it. The relevant part of the code is below.
model = Sequential()
model.add(TimeDistributed(Conv2D(filters=32, kernel_size=(1, 80), activation='relu', padding='same', input_shape=(train_X.shape[1], train_X.shape[2], train_X.shape[3]))))
model.add(TimeDistributed(MaxPool2D(pool_size=(1, 2),strides=1)))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(LSTM(100, return_sequences=True)))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(Dense(units=1)))
model.compile(loss='mean_squared_error', optimizer='adam')
I get an index error.
IndexError: list index out of range
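For reference, TimeDistributed(Conv2D) expects 5D input of shape (samples, timesteps, height, width, channels), and in the common CNN+LSTM pattern the wrapper is applied only to the convolutional layers while the LSTM is left unwrapped, with input_shape given on the wrapper itself. A minimal sketch along those lines (assumptions: the 4D input is given an explicit channel axis, and the 5 variables are treated as the timesteps):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, Conv2D, MaxPool2D, Dropout, Flatten, LSTM, Dense

# train_X = train_X[..., np.newaxis]  # add a channel axis: (112, 5, 7998, 2, 1)
timesteps, height, width, channels = 5, 7998, 2, 1

model = Sequential()
# TimeDistributed wraps only the CNN layers; input_shape goes on the wrapper
model.add(TimeDistributed(Conv2D(32, (1, 80), activation='relu', padding='same'),
                          input_shape=(timesteps, height, width, channels)))
# In practice you would likely want much stronger pooling/striding here to keep
# the flattened per-timestep feature vector small
model.add(TimeDistributed(MaxPool2D(pool_size=(2, 1))))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(Flatten()))
# The LSTM consumes one feature vector per timestep and is NOT wrapped
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(7998))  # one output per step of the target sequence
model.compile(loss='mean_squared_error', optimizer='adam')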

Related

Tensorflow: Incompatible shapes: [32,12] vs. [32,4]

I've got myself into a bit of trouble. I have 4 features, and I want to predict each one of them at the same time. My lookback is 12 and I want to predict 12 timesteps ahead. Is it possible to predict all 4 targets in parallel?
I have the following piece of code. The shape of train_df is (40000, 4) and of val_df is (8000, 4).
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.preprocessing.sequence import TimeseriesGenerator

win_length=12
batch=32
n_features=4
train_generator = TimeseriesGenerator(train_df, train_df, length=win_length, sampling_rate=1, batch_size=batch)
val_generator = TimeseriesGenerator(val_df, val_df, length=win_length, sampling_rate=1, batch_size=batch)
model = Sequential()
model.add(LSTM(128, activation='tanh', input_shape=(win_length, n_features), return_sequences=True))
model.add(LSTM(128, activation='tanh', return_sequences=True))
model.add(LSTM(64, activation='tanh', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam')
model.summary()
model.fit_generator(train_generator, validation_data=val_generator)
I get the following error from the fit_generator call, and I can't seem to figure out why. Any ideas?
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [32,12] vs. [32,4]
I don't quite understand what you expect TimeseriesGenerator to produce here, but it seems like you want the data and targets to be two sequences of the same shape (hence the mse). I've checked, and TimeseriesGenerator doesn't produce that: it yields data with shape (32, 12, 4) and targets with shape (32, 4). My best bet right now is to implement the generator manually, as sketched below. Also, I think model.add(TimeDistributed(Dense(1))) should be changed to model.add(TimeDistributed(Dense(4))).
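A rough sketch of such a manual windowing function (not from the original answer; the function name is illustrative), producing targets that cover all 4 features over the full 12-step horizon:

import numpy as np

def make_windows(data, lookback, horizon):
    # data: array of shape (timesteps, n_features)
    # returns X: (samples, lookback, n_features), y: (samples, horizon, n_features)
    X, y = [], []
    for i in range(len(data) - lookback - horizon + 1):
        X.append(data[i:i + lookback])
        y.append(data[i + lookback:i + lookback + horizon])
    return np.array(X), np.array(y)

X_train, y_train = make_windows(train_df.values, lookback=12, horizon=12)
# With return_sequences=True and TimeDistributed(Dense(4)), the model output is
# (batch, 12, 4), which matches y_train's (samples, 12, 4), so model.fit(X_train, y_train)
# can replace fit_generator.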

Converting .npz model from ChainerRL to Keras model, or alternative methods?

I have a DQN reinforcement learning model which was trained using ChainerRL's built-in DQN experiment on the Ms Pacman Atari game environment; let's call this file model.npz. I have some analysis software written in Keras, which builds a Keras network and loads a model into it.
I am having trouble getting the .npz exported from ChainerRL to play nice with the Keras network.
I have figured out how to load the weights from the .npz file. I think I have also figured out how to make sure the Keras model matches the ChainerRL model in terms of kernel size, stride, and activation.
Here is the code which calls the function that builds the network in ChainerRL:
return links.Sequence(
    links.NatureDQNHead(),
    L.Linear(512, n_actions),
    DiscreteActionValue)
And the code which gets called by this, and builds a Chainer DQN network, is:
class NatureDQNHead(chainer.ChainList):
    """DQN's head (Nature version)"""

    def __init__(self, n_input_channels=4, n_output_channels=512,
                 activation=F.relu, bias=0.1):
        self.n_input_channels = n_input_channels
        self.activation = activation
        self.n_output_channels = n_output_channels

        layers = [
            # L.Convolution2D(n_input_channels, out_channel=32, ksize=8, stride=4, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(n_input_channels, 32, 8, stride=4,
                            initial_bias=bias),
            # L.Convolution2D(n_input_channels=32, out_channel=64, ksize=4, stride=2, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(32, 64, 4, stride=2, initial_bias=bias),
            # L.Convolution2D(n_input_channels=64, out_channel=64, ksize=3, stride=1, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(64, 64, 3, stride=1, initial_bias=bias),
            # L.Linear(in_size=3136, out_size=n_output_channels, nobias=False, initialW=None, initial_bias=bias),
            L.Linear(3136, n_output_channels, initial_bias=bias),
        ]

        super(NatureDQNHead, self).__init__(*layers)

    def __call__(self, state):
        h = state
        for layer in self:
            h = self.activation(layer(h))
        return h
So I wrote the following Keras code to build an equivalent network in Keras:
# Keras Model
hidden = 512
#bias initializer to match the chainerRL one
initial_bias = tf.keras.initializers.Constant(0.1)
#matches default "channels_last" data format for Keras layers
inputs = Input(shape=(84, 84, 4))
#First call to Conv2D including all defaults for easy reference
x = Conv2D(filters=32, kernel_size=(8, 8), strides=4, padding='valid', data_format=None, dilation_rate=(1, 1), activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, name='deepq/q_func/convnet/Conv')(inputs)
x1 = Conv2D(filters=64, kernel_size=(4, 4), strides=2, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_1')(x)
x2 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_2')(x1)
#Flatten for move to linear layers
conv_out = Flatten()(x2)
action_out = Dense(hidden, activation='relu', name='deepq/q_func/action_value/fully_connected')(conv_out)
action_scores = Dense(units = 9, name='deepq/q_func/action_value/fully_connected_1', activation='linear', use_bias=True, kernel_initializer="glorot_uniform", bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None,)(action_out) # num_actions in {4, .., 18}
#Now create model using the above-defined layers
modelArchitecture = Model(inputs, action_scores)
I have examined the structure of the initial weights for the Keras model and found them to be as follows:
Layer 0: no weights
Layer 1: (8,8,4,32)
Layer 2: (4,4,32,64)
Layer 3: (4,4,64,64)
Layer 4: no weights
Layer 5: (3136,512)
Layer 6: (9,512)
Then, I examined the weights in the .npz model which I am trying to import and found them to be as follows:
Layer 0: (32,4,8,8)
Layer 1: (64,32,4,4)
Layer 2: (64,64,4,4)
Layer 3: (512,3136)
Layer 4: (9,512)
So, I reshaped the weights from Layer 0 of model.npz with numpy.reshape and applied them to Layer 1 of the Keras network. I did the same with the model.npz weights for Layer 1, and applied them to Layer 2 of the Keras network. Then, I reshaped the weights from Layer 2 of model.npz, and applied them to Layer 3 of the Keras network. I transposed the weights of Layer 3 from model.npz, and applied them to Layer 5 of the Keras model. Finally, I transposed the weights of Layer 4 of model.npz and applied them to Layer 6 of the Keras model.
I saved the model in .h5 format and then tried to run it with the evaluation code in the Ms Pacman Atari environment, which produces a video. When I do this, Pacman follows the exact same, short path, runs face-first into a wall, and then keeps trying to walk through the wall until a ghost kills it.
It seems, therefore, like I am doing something wrong in my translation between the Chainer DQN network and the Keras DQN network. I am not sure if maybe they process color channels in a different order or something?
I also attempted to export the ChainerRL model.npz file to ONNX, but got several errors to the point where it didn't seem possible without rewriting a lot of the ChainerRL code base.
Any help would be appreciated.
I am the author of ChainerRL. I have no experience with Keras, but the weight parameter formats are apparently different between Chainer and Keras. You should check the meaning of each dimension of the weight parameters for each deep learning framework. In Chainer, as you can find in the documentation (https://docs.chainer.org/en/stable/reference/generated/chainer.functions.convolution_2d.html#chainer.functions.convolution_2d), the weight parameter of Convolution2D is stored as (c_O, c_I, h_K, w_K).
Once you find the meaning of each dimension, I guess what you need is always numpy.transpose, not numpy.reshape, to re-order dimensions to match the order of Keras.
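To make that concrete, here is a small sketch of the re-ordering this suggests (the .npz key names are purely illustrative; the real ones depend on how the ChainerRL model was saved):

import numpy as np

params = np.load('model.npz')

# Chainer Convolution2D weights: (out_channels, in_channels, kernel_h, kernel_w)
# Keras Conv2D (channels_last) kernels: (kernel_h, kernel_w, in_channels, out_channels)
chainer_conv_w = params['conv1/W']                       # illustrative key name
keras_conv_w = np.transpose(chainer_conv_w, (2, 3, 1, 0))

# Chainer Linear weights: (out_features, in_features); Keras Dense expects the transpose
chainer_fc_w = params['fc/W']                            # illustrative key name
keras_fc_w = chainer_fc_w.T

# then e.g. keras_layer.set_weights([keras_conv_w, bias_vector])

Note also that the first Linear/Dense layer after the flatten may need its 3136 input weights re-ordered as well, since Chainer flattens channels-first (NCHW) feature maps while Keras flattens channels-last (NHWC) ones.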

Image Regression: Width of Squares

I have a dataset with lots of pictures and each of these pictures shows me a rectangle with a certain width. My task now is to automatically detect the width of these rectangles by image recognition, and I have trained a CNN for an image regression like in the code below.
However, this CNN gives me very bad values, i.e. MSE losses in the range of 4,000,000 and a very imprecise estimate of the actual widths. During my experiments I even used the training data set as the test data set for the time being, but even there the CNN doesn't seem to learn anything useful.
Do you have an idea what I could be doing wrong? Is it possible that I somehow distort the images themselves while reading them in?
I'm rather new to Machine Learning, so I'm happy about every input you give me! :-)
This is the model:
from keras.models import Sequential
from keras.layers import Convolution2D, Flatten, Dense

def create_model():
    model = Sequential()
    model.add(Convolution2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(64, activation="relu"))
    model.add(Dense(1))
    model.compile(loss="mse", optimizer="adam")
    return model
And this is the training code:
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator

classifier = create_model()

# Getting the image id and its corresponding square width
data = pd.read_csv('../data/data.csv')
id_width = data[['id', 'width']]

# Training the model
train_datagen = ImageDataGenerator()
training_set = train_datagen.flow_from_dataframe(dataframe=id_width, directory='../data/images',
                                                 x_col="id", y_col="width", has_ext=True,
                                                 class_mode="raw", target_size=(64, 64),
                                                 batch_size=32)
classifier.fit_generator(
    training_set,
    epochs=50,
    validation_data=training_set)
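One way to test the "distorted while reading in" suspicion is to pull a batch straight out of the generator and look at it (a quick sketch using the training_set defined above):

import matplotlib.pyplot as plt

batch_x, batch_y = training_set[0]       # first batch: images (32, 64, 64, 3), widths (32,)
print(batch_x.min(), batch_x.max())      # pixel values stay in 0-255 unless rescale= is set
plt.imshow(batch_x[0].astype('uint8'))
plt.title('width label: %s' % batch_y[0])
plt.show()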

Repeated error for every activation_N (e.g. activation_9, activation_45, etc.)

"ValueError: Error when checking target: expected activation_81 to have shape (1,) but got array with shape (7,)"
I am performing a multiclass classification of 7 classes for speech emotion classification using a neural network, but it fails at this point
cnnhistory = model.fit(x_traincnn,
                       y_train,
                       batch_size=16,
                       epochs=700,
                       validation_data=(x_testcnn, y_test),
                       callbacks=[mcp_save, lr_reduce])
at the line callbacks=[mcp_save, lr_reduce]
mcp_save being
mcp_save = ModelCheckpoint('model/aug_noiseNshift_2class2_np.h5',
                           save_best_only=True, monitor='val_loss', mode='min')
and lr_reduce being
lr_reduce = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=20, min_lr=0.000001)
Final layer of NN
Dense(7) for 7 classes
model.add(Dense(7))
model.add(Activation('softmax'))
opt = keras.optimizers.SGD(lr=0.0001, momentum=0.0, decay=0.0, nesterov=False)
compiled model using
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy', fscore])
I have already transformed the dataset with normalised values and changed the loss function from 'categorical_crossentropy' to 'sparse_categorical_crossentropy'. Nothing has worked; it just pushed the error from activation_9 to activation_18 to activation_45 to activation_54 and now to activation_81, but the error is still there.
Any help would be highly appreciated!
I am new to neural networks.
TIA
If you have labels as numbers, that means y_train has shape (samples, 1) and you should use 'sparse_categorical_crossentropy'.
If you have labels as one-hot encodings, that means y_train has shape (samples, 7) and you should use 'categorical_crossentropy'.
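A quick sketch of how to check which case applies, and how to convert integer labels to one-hot ones if you want to keep 'categorical_crossentropy' (assuming the labels are integers 0-6):

from keras.utils import to_categorical

print(y_train.shape)   # (samples,) or (samples, 1) -> use 'sparse_categorical_crossentropy'
                       # (samples, 7)               -> use 'categorical_crossentropy'

# Convert integer labels to one-hot labels, then compile with 'categorical_crossentropy'
y_train_onehot = to_categorical(y_train, num_classes=7)
y_test_onehot = to_categorical(y_test, num_classes=7)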

Dimensions not matching in keras LSTM model

I want to use an LSTM neural network with keras to forecast groups of time series and I am having trouble making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (The intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# add the second LSTM layer (the input dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, nb_epoch=epoch_number, batch_size=batch_size, verbose=1)
This throws:
Exception: Error when checking model target: expected timedistributed_1 to have shape (None, 4, 24) but got array with shape (100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), so (100, 3, 24) but the network is producing an output shape of (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes you may need padding or a network node that removes said timestep.
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, while it's not possible. This dimension is what it considers as the time dimension: it is preserved.
The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue was that output_dim is never used when creating the model; it only appears in the target data trainY. While that is only a code smell (there might not be anything wrong with it), it's something worth checking.
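For example, one way to get an output of shape (output_dim, look_ahead) with the dummy data above is to collapse the sequence and reshape a Dense output (a sketch, not a claim about the best architecture for the task):

from keras.models import Sequential
from keras.layers import Dense, LSTM, Reshape

model = Sequential()
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
model.add(LSTM(10))                            # -> (batch, 10), sequence collapsed
model.add(Dense(output_dim * look_ahead))      # -> (batch, 3 * 24)
model.add(Reshape((output_dim, look_ahead)))   # -> (batch, 3, 24), matches trainY
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=epoch_number, batch_size=batch_size, verbose=1)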
