When I train my model it has a two-dimensional output of shape (None, 1), corresponding to the time series I'm trying to predict. But whenever I load the saved model to make predictions, it has a three-dimensional output of shape (None, 40, 1), where 40 corresponds to the n_steps required to fit the Conv1D network. What is wrong?
Here is the code:
df = np.load('Principal.npy')
# Conv1D
#model = load_model('ModeloConv1D.h5')
model = autoencoder_conv1D((2, 20, 17), n_steps=40)
model.load_weights('weights_35067.hdf5')
# summarize model.
model.summary()
# load dataset
df = df
# split into input (X) and output (Y) variables
X = f.separar_interface(df, n_steps=40)
# THE X INPUT SHAPE (59891, 17) length and attributes, respectively ##
# conv1D input format
X = X.reshape(X.shape[0], 2, 20, X.shape[2])
# Make predictions
test_predictions = model.predict(X)
## test_predictions.shape = (59891, 40, 1)
test_predictions = model.predict(X).flatten()
##test_predictions.shape = (2395640, 1)
plt.figure(3)
plt.plot(test_predictions)
plt.legend('Prediction')
plt.show()
In the plot below you can see that it is plotting the input format.
Here is the network architecture:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_70 (TimeDis (None, 1, 31, 24) 4104
_________________________________________________________________
time_distributed_71 (TimeDis (None, 1, 4, 24) 0
_________________________________________________________________
time_distributed_72 (TimeDis (None, 1, 4, 48) 9264
_________________________________________________________________
time_distributed_73 (TimeDis (None, 1, 1, 48) 0
_________________________________________________________________
time_distributed_74 (TimeDis (None, 1, 1, 64) 12352
_________________________________________________________________
time_distributed_75 (TimeDis (None, 1, 1, 64) 0
_________________________________________________________________
time_distributed_76 (TimeDis (None, 1, 64) 0
_________________________________________________________________
lstm_17 (LSTM) (None, 100) 66000
_________________________________________________________________
repeat_vector_9 (RepeatVecto (None, 40, 100) 0
_________________________________________________________________
lstm_18 (LSTM) (None, 40, 100) 80400
_________________________________________________________________
time_distributed_77 (TimeDis (None, 40, 1024) 103424
_________________________________________________________________
dropout_9 (Dropout) (None, 40, 1024) 0
_________________________________________________________________
dense_18 (Dense) (None, 40, 1) 1025
=================================================================
As I've found my mistake, and as I think it may be useful for someone else, I'll reply to my own question:
In fact, the network output has the same shape as the training labels: the saved model produces an output of shape (None, 40, 1) because that is exactly the shape I gave the training labels.
The output looks different during training than during prediction because you are most probably using a method such as train_test_split while training, which shuffles the data; what you see at the end of training is therefore the output for that shuffled batch.
To correct the problem, change the shape of the dataset labels from (None, 40, 1) to (None, 1), since this is a regression problem on a time series. To make the network above produce that shape, add a Flatten layer before the final Dense output layer; that gives the result you (I) were looking for.
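As a minimal sketch of that fix (this is not the original autoencoder_conv1D code; the InputLayer simply stands in for the (None, 40, 1024) tensor coming out of dropout_9 above):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Flatten, Dense

head = Sequential([
    InputLayer(input_shape=(40, 1024)),  # stand-in for the dropout_9 output above
    Flatten(),                           # (None, 40 * 1024)
    Dense(1),                            # (None, 1), matching labels of shape (n_samples, 1)
])
head.summary()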
Related
I trained and loaded a CNN+Dense model:
# load model
cnn_model = load_model('my_cnn_model.h5')
cnn_model.summary()
The output is this (my images have dimensions 2 x 3600):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 2, 3600, 32) 128
_________________________________________________________________
conv2d_2 (Conv2D) (None, 2, 1800, 32) 3104
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 2, 600, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 2, 600, 64) 6208
_________________________________________________________________
conv2d_4 (Conv2D) (None, 2, 300, 64) 12352
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 100, 64) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 2, 100, 128) 24704
_________________________________________________________________
conv2d_6 (Conv2D) (None, 2, 50, 128) 49280
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 16, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 4096) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 4195328
_________________________________________________________________
dense_2 (Dense) (None, 1024) 1049600
_________________________________________________________________
dense_3 (Dense) (None, 3) 3075
=================================================================
Total params: 5,343,779
Trainable params: 5,343,779
Non-trainable params: 0
Now, what I want is to keep the weights up to the Flatten layer, replace the dense layers with LSTM layers, and train only the added LSTM part.
I just wrote:
# freeze model
base_model = cnn_model(input_shape=(2, 3600, 1))
#base_model = cnn_model
base_model.trainable = False
# Adding the first lstm layer
x = LSTM(1024,activation='relu',return_sequences='True')(base_model.output)
# Adding the second lstm layer
x = LSTM(1024, activation='relu',return_sequences='False')(x)
# Adding the output
output = Dense(3,activation='linear')(x)
# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
But I obtained:
base_model = cnn_model(input_shape=(2, 3600, 1))
TypeError: __call__() missing 1 required positional argument: 'inputs'
I know I should ideally wrap the Flatten layer in TimeDistributed, but I do not know how to do it.
Moreover, I'm not sure whether base_model.trainable = False does exactly what I want.
Can you please help me to do the job?
Thank you very much!
You can't take the output of Flatten() directly: an LSTM needs 2-D features per sample, (time, features), so you have to reshape your tensors. (As an aside, the TypeError in your snippet comes from calling cnn_model(input_shape=(2, 3600, 1)); a loaded Model is called on a tensor, not rebuilt from an input shape, so you can simply keep base_model = cnn_model.)
You can instead take the output of the layer before Flatten (the max-pooling layer). Say this layer has index i in the model: take its output, reshape it based on your needs, and pass it to the LSTM.
before_flatten = base_model.layers[i].output  # i is the index of the layer whose output you want
conv2lstm_reshape = Reshape((-1, 2))(before_flatten)  # choose the (temporal dim, filters) split yourself

# Adding the first LSTM layer (return_sequences must be a boolean, not a string)
x = LSTM(1024, activation='relu', return_sequences=True)(conv2lstm_reshape)
# Adding the second LSTM layer
x = LSTM(1024, activation='relu', return_sequences=False)(x)
# Adding the output (applied to x, the LSTM output, not to before_flatten)
output = Dense(3, activation='linear')(x)

# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
model.summary()
I want to predict a time series X1 of n points based on two other time series, X2 and X3, also of n points each. These time series interact, so I was hoping to use methods similar to those used for combining images to produce another image.
So far I have successfully implemented an autoencoder that learns and returns all three time series (X1, X2, X3). But when I tried to set up a neural network that uses only X2 and X3 to predict X1 (of 3000 units), the model doesn't compile and I get an error:
Error when checking target: expected sequential_9 to have 2 dimensions, but got array with shape (61, 3000, 1, 1)
In different combinations it breaks at flatten_x or dense_x.
It works if my output is of only one unit and not 3000.
The network I tried would have the following layers:
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) (None, 3000, 2, 1) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 3000, 2, 32) 96
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 1500, 2, 32) 0
_________________________________________________________________
flatten_7 (Flatten) (None, 96000) 0
_________________________________________________________________
dense_6 (Dense) (None, 3000, 1, 32)
Here is the code that I'm using to create the net:
network = Sequential((
Conv2D(filters=32, kernel_size=(1,2), activation='relu', input_shape=(x, y, inChannel)),
MaxPooling2D(pool_size = (2, 1)),
Flatten(),
Dense(3000, activation='relu'),
))
network.compile(loss='mean_squared_error', optimizer = RMSprop())
The input has shape (61, 3000, 2, 1).
Should I specify the expected inputs/outputs somewhere and I'm not doing that? Make some data transformations on the way? Maybe use a different architecture?
Thanks for all suggestions!
The network you've coded does not have the architecture you intended
If you print out
network.summary()
you get
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 3000, 1, 32) 96
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 1500, 1, 32) 0
_________________________________________________________________
flatten_6 (Flatten) (None, 48000) 0
_________________________________________________________________
dense_6 (Dense) (None, 3000) 144003000
=================================================================
Total params: 144,003,096
Trainable params: 144,003,096
Non-trainable params: 0
The Dense layer therefore outputs (None, 3000), a 2-D tensor, while your target array has shape (61, 3000, 1, 1), which is what triggers the error. So you have to change your architecture (or reshape your targets) to get the desired output shape.
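If the goal is simply to map each of the 61 samples to a 3000-point series, one minimal option (a sketch under that assumption; X and y stand for the input and target arrays from the question, the names being mine, and the fit arguments are illustrative) is to squeeze the trailing singleton axes out of the target so it matches the (None, 3000) Dense output:

import numpy as np

# X has shape (61, 3000, 2, 1); y has shape (61, 3000, 1, 1).
y = np.squeeze(y, axis=(2, 3))    # -> (61, 3000), matching the Dense(3000) output

network.fit(X, y, epochs=10, batch_size=8)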
I am trying to create a CNN with TensorFlow. My images are 64x64x1 and I have an array of 3662 images which I am using for training. I have a total of 5 labels, which I have one-hot encoded. I am getting this error every time:
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]
[[{{node loss_2/dense_5_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
my neural network structure is this:
def cnn_model():
    model = models.Sequential()
    # model.add(layers.Dense(128, activation='relu', ))
    model.add(layers.Conv2D(128, (3, 3), activation='relu', input_shape=(64, 64, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(5, activation='softmax'))
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    print(model.summary())
    return model
My model summary is this:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_9 (Conv2D) (None, 62, 62, 128) 1280
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 31, 31, 128) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 31, 31, 64) 73792
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 15, 15, 64) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 15, 15, 64) 36928
_________________________________________________________________
dense_4 (Dense) (None, 15, 15, 64) 4160
_________________________________________________________________
flatten_2 (Flatten) (None, 14400) 0
_________________________________________________________________
dense_5 (Dense) (None, 5) 72005
=================================================================
Total params: 188,165
Trainable params: 188,165
Non-trainable params: 0
My output array has the shape (3662, 5, 1). I have seen other answers to the same question but I can't figure out the problem with mine. Where am I going wrong?
Edit: My labels are stored in one hot encoded form using these:
df = pd.get_dummies(df)
diag = np.array(df)
diag = np.reshape(diag,(3662,5,1))
I have tried passing the labels as a numpy array and after converting them to a tensor (and the same for the input, as per the documentation).
The problem lies in the choice of the loss function, tf.keras.losses.SparseCategoricalCrossentropy(). Given what you are trying to achieve, you should use tf.keras.losses.CategoricalCrossentropy(). Namely, the documentation of tf.keras.losses.SparseCategoricalCrossentropy() states:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
On the other hand, the documentation of tf.keras.losses.CategoricalCrossentropy() states:
We expect labels to be provided in a one_hot representation.
And because your labels are encoded as one-hot, you should use tf.keras.losses.CategoricalCrossentropy().
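A minimal sketch of the change, assuming model is the network returned by cnn_model() above, df is the label DataFrame from the question, and train_images is your (3662, 64, 64, 1) image array (that last name is mine):

import numpy as np
import pandas as pd
import tensorflow as tf

# Keep the one-hot labels 2-D, shape (3662, 5), to match the (None, 5) softmax output;
# do not reshape them to (3662, 5, 1).
diag = np.array(pd.get_dummies(df))

# Re-compile with the categorical (one-hot) variant of the loss.
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(train_images, diag, epochs=10)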
I have a problem applying a masking layer to the CNNs in an RNN/LSTM model.
My data is not an original image, but I converted it into a shape of (16, 34, 4) (channels_first). The data is sequential, and the longest sequence length is 22, so to keep a fixed size I set the timestep to 22. Since a sequence may be shorter than 22 steps, I pad the remaining steps with np.zeros. However, the zero padding makes up about half of the whole dataset, so the training cannot reach a very good result with so much useless data. Therefore I want to add a mask to cancel out these zero-padded steps.
Here is my code.
mask = np.zeros((16,34,4), dtype = np.int8)
input_shape = (22, 16, 34, 4)
model = Sequential()
model.add(TimeDistributed(Masking(mask_value=mask), input_shape=input_shape, name = 'mask'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name = 'conv1'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn1'))
model.add(Dropout(0.5, name = 'drop1'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv2'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn2'))
model.add(Dropout(0.5, name = 'drop2'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv3'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn3'))
model.add(Dropout(0.5, name = 'drop3'))
model.add(TimeDistributed(Flatten(), name = 'flatten'))
model.add(GRU(256, activation='tanh', return_sequences=True, name = 'gru'))
model.add(Dropout(0.4, name = 'drop_gru'))
model.add(Dense(35, activation = 'softmax', name = 'softmax'))
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['acc'])
Here's the model structure.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mask (TimeDist (None, 22, 16, 34, 4) 0
_________________________________________________________________
conv1 (TimeDistributed) (None, 22, 100, 30, 3) 16100
_________________________________________________________________
bn1 (TimeDistributed) (None, 22, 100, 30, 3) 12
_________________________________________________________________
drop1 (Dropout) (None, 22, 100, 30, 3) 0
_________________________________________________________________
conv2 (TimeDistributed) (None, 22, 100, 26, 2) 100100
_________________________________________________________________
bn2 (TimeDistributed) (None, 22, 100, 26, 2) 8
_________________________________________________________________
drop2 (Dropout) (None, 22, 100, 26, 2) 0
_________________________________________________________________
conv3 (TimeDistributed) (None, 22, 100, 22, 1) 100100
_________________________________________________________________
bn3 (TimeDistributed) (None, 22, 100, 22, 1) 4
_________________________________________________________________
drop3 (Dropout) (None, 22, 100, 22, 1) 0
_________________________________________________________________
flatten (TimeDistributed) (None, 22, 2200) 0
_________________________________________________________________
gru (GRU) (None, 22, 256) 1886976
_________________________________________________________________
drop_gru (Dropout) (None, 22, 256) 0
_________________________________________________________________
softmax (Dense) (None, 22, 35) 8995
=================================================================
Total params: 2,112,295
Trainable params: 2,112,283
Non-trainable params: 12
_________________________________________________________________
For mask_value, I tried both 0 and this mask array, but neither works: the model still trains on all the data, half of which is zero padding.
Can anyone help me?
By the way, I used TimeDistributed here to connect the CNN to the RNN, and I know there is also ConvLSTM2D. Does anyone know the difference? ConvLSTM2D uses many more parameters and trains much more slowly than the TimeDistributed approach...
Unfortunately, masking is not yet supported by the Keras Conv layers. There have been several issues posted about this on the Keras GitHub page; here is the one with the most substantial conversation on the topic. It appears there were some hang-ups over implementation details and the issue was never resolved.
The workaround proposed in that discussion is to give the padding character in the sequences an explicit embedding and do global pooling. Here is another workaround I found (not helpful for my use case, but maybe helpful to you): keeping a mask array and merging it in through multiplication.
You can also check out the conversation around this question which is similar to yours.
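Here is a minimal functional-API sketch of that multiplication workaround, assuming the same (22, 16, 34, 4) inputs as above; the second input, the 0/1 per-timestep flag, and the layer sizes are my own placeholders. Note that it only zeroes out the features of padded steps before the GRU; the GRU still iterates over those steps, so this is a workaround rather than true masking:

from tensorflow.keras.layers import (Input, TimeDistributed, Conv2D, Flatten,
                                     Lambda, GRU, Dense)
from tensorflow.keras.models import Model

frames = Input(shape=(22, 16, 34, 4))   # zero-padded sequences, as in the question
step_mask = Input(shape=(22, 1))        # 1.0 for real timesteps, 0.0 for padded ones

x = TimeDistributed(Conv2D(100, (5, 2), data_format='channels_first',
                           activation='relu'))(frames)
x = TimeDistributed(Flatten())(x)
x = Lambda(lambda t: t[0] * t[1])([x, step_mask])  # zero out features of padded steps
x = GRU(256, activation='tanh', return_sequences=True)(x)
out = Dense(35, activation='softmax')(x)

model = Model(inputs=[frames, step_mask], outputs=out)
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['acc'])
model.summary()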
Hi everyone,
I have a question about how to modify the pre-trained VGG16 network in Keras. I am trying to remove the max-pooling layers at the end of the last three convolutional blocks and to add a batch normalization layer after each convolutional layer, while keeping the pre-trained parameters. This means that the modification involves not only removing some middle layers and adding some new ones, but also reconnecting the modified layers with the remaining layers.
I'm still very new in Keras. The only way I can find is as shown in
Removing then Inserting a New Middle Layer in a Keras Model
So the codes I edited are as below:
from keras import applications
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers.normalization import BatchNormalization
vgg_model = applications.VGG16(weights='imagenet',
                               include_top=False,
                               input_shape=(160, 80, 3))
# Disassemble layers
layers = [l for l in vgg_model.layers]
# Defining new convolutional layer.
# Important: the number of filters should be the same!
# Note: the receiptive field of two 3x3 convolutions is 5x5.
layer_dict = dict([(layer.name, layer) for layer in vgg_model.layers])
x = layer_dict['block3_conv3'].output
for i in range(11, len(layers)-5):
    # layers[i].trainable = False
    x = layers[i](x)
for j in range(15, len(layers)-1):
    # layers[j].trainable = False
    x = layers[j](x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Flatten()(x)
x = Dense(50, activation='softmax')(x)
custom_model = Model(inputs=vgg_model.input, outputs=x)
for layer in custom_model.layers[:16]:
    layer.trainable = False
custom_model.summary()
However, the output shapes of the convolutional layers in block 4 and block 5 are shown as multiple. I tried to correct this by adding a MaxPool2D(pool_size=(1, 1), strides=None) layer, but the output shapes are still multiple. Just like this:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 160, 80, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 160, 80, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 160, 80, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 80, 40, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 80, 40, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 80, 40, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 40, 20, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 40, 20, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 40, 20, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 40, 20, 256) 590080
_________________________________________________________________
block4_conv1 (Conv2D) multiple 1180160
_________________________________________________________________
block4_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block4_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv1 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
conv2d_1 (Conv2D) (None, 40, 20, 128) 65664
_________________________________________________________________
batch_normalization_1 (Batch (None, 40, 20, 128) 512
_________________________________________________________________
conv2d_2 (Conv2D) (None, 40, 20, 128) 16512
_________________________________________________________________
batch_normalization_2 (Batch (None, 40, 20, 128) 512
_________________________________________________________________
conv2d_3 (Conv2D) (None, 40, 20, 128) 16512
_________________________________________________________________
batch_normalization_3 (Batch (None, 40, 20, 128) 512
_________________________________________________________________
flatten_1 (Flatten) (None, 102400) 0
_________________________________________________________________
dense_1 (Dense) (None, 50) 5120050
=================================================================
Total params: 19,934,962
Trainable params: 5,219,506
Non-trainable params: 14,715,456
_________________________________________________________________
Can anyone provide some suggestions about how to reach my goal?
Thanks very much.
The multiple output shape is there because these layers were called twice, so they have two output shapes.
You can see here that when calling layer.output_shape raises an AttributeError, the printed output shape will be 'multiple'.
If you call custom_model.layers[10].output_shape, you will get this error :
AttributeError: The layer "block4_conv1 has multiple inbound nodes, with different output shapes. Hence the notion of "output shape" is ill-defined for the layer. Use `get_output_shape_at(node_index)` instead.
And if you then call custom_model.layers[10].get_output_shape_at(0), you will get the output shape corresponding to the initial network, and for custom_model.layers[10].get_output_shape_at(1), you will get the output shape that you are expecting.
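For example:

layer = custom_model.get_layer('block4_conv1')
print(layer.get_output_shape_at(0))  # output shape in the original, unmodified VGG16 graph
print(layer.get_output_shape_at(1))  # output shape in the modified graph you built above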
Let me also say that I have doubts about the intention behind this modification: if you remove the MaxPooling layer and apply the next layer (number 11) to the output that came before the MaxPooling layer, the learnt filters are "expecting" an input with half the resolution, so they probably won't work.
Imagine one filter is "looking" for eyes that are usually 10 pixels wide; you would now need a 20-pixel-wide eye to trigger the same activation in that layer.
My example is obviously over-simplistic and not accurate, but it shows why the original idea is wrong: you should either retrain the top of the model, keep the MaxPooling layer, or define a brand-new model on top of layer block3_conv3.
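A rough sketch of that last option (a brand-new top defined on block3_conv3; the added layers and sizes here are placeholders of my own choosing, not a recommendation):

from keras import applications
from keras.models import Model
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense

vgg_model = applications.VGG16(weights='imagenet', include_top=False,
                               input_shape=(160, 80, 3))

# Reuse VGG16 only up to block3_conv3 and add a fresh, trainable top.
x = vgg_model.get_layer('block3_conv3').output
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Flatten()(x)
x = Dense(50, activation='softmax')(x)

custom_model = Model(inputs=vgg_model.input, outputs=x)
for layer in vgg_model.layers:
    layer.trainable = False  # train only the newly added layers
custom_model.summary()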