I have a small sample CNN implemented in both Keras and PyTorch. When I print the summaries of both networks, the number of trainable parameters is the same, but the total number of parameters and the number of Batch Normalization parameters don't match.
Here is the CNN implementation in Keras:
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Flatten, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(64, 64, 1))  # Channels last: (NHWC)
model = Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu')(inputs)
model = BatchNormalization(momentum=0.15, axis=-1)(model)
model = Flatten()(model)
dense = Dense(100, activation="relu")(model)
head_root = Dense(10, activation='softmax')(dense)
model = Model(inputs=inputs, outputs=head_root)
And the summary printed for the above model is:
Model: "model_8"
Layer (type) Output Shape Param #
input_9 (InputLayer) (None, 64, 64, 1) 0
conv2d_10 (Conv2D) (None, 64, 64, 32) 320
batch_normalization_2 (Batch (None, 64, 64, 32) 128
flatten_3 (Flatten) (None, 131072) 0
dense_11 (Dense) (None, 100) 13107300
dense_12 (Dense) (None, 10) 1010
Total params: 13,108,758
Trainable params: 13,108,694
Non-trainable params: 64
Here's the implementation of the same model architecture in PyTorch:
import torch.nn as nn

# Image format: Channel first (NCHW) in PyTorch
class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=32),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=131072, out_features=100)
        self.fc2 = nn.Linear(in_features=100, out_features=10)

    def forward(self, x):
        output = self.layer1(x)
        output = self.flatten(output)
        output = self.fc1(output)
        output = self.fc2(output)
        return output
And the following is the summary of the above model:
Layer (type) Output Shape Param #
Conv2d-1 [-1, 32, 64, 64] 320
ReLU-2 [-1, 32, 64, 64] 0
BatchNorm2d-3 [-1, 32, 64, 64] 64
Flatten-4 [-1, 131072] 0
Linear-5 [-1, 100] 13,107,300
Linear-6 [-1, 10] 1,010
Total params: 13,108,694
Trainable params: 13,108,694
Non-trainable params: 0
Input size (MB): 0.02
Forward/backward pass size (MB): 4.00
Params size (MB): 50.01
Estimated Total Size (MB): 54.02
As you can see in the results above, the Batch Normalization layer in Keras has more parameters than in PyTorch (exactly twice as many). So what's the difference between the two CNN architectures? If they are equivalent, what am I missing?
Keras treats as parameters (weights) many things that are saved and loaded with the layer.
While both implementations keep the accumulated "mean" and "variance" of the batches, these values are not trainable with backpropagation.
Nevertheless, these values are updated on every training batch. Keras reports them as non-trainable weights, while PyTorch registers them as buffers (running_mean and running_var) rather than parameters, so they don't appear in its parameter count (they are still saved in the state_dict). The term "non-trainable" here means "not trainable by backpropagation"; it doesn't mean the values are frozen.
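One practical consequence when porting between the two frameworks: both update the moving mean with an exponential moving average, but they define momentum in opposite ways. Keras computes new = old * momentum + batch * (1 - momentum), while PyTorch computes new = (1 - momentum) * old + momentum * batch. The NumPy sketch below uses random stand-in batch means (not real activations) to check that the question's Keras momentum=0.15 corresponds to PyTorch momentum=0.85:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_means = rng.normal(size=(20, 32))   # stand-in per-batch means, 32 channels

keras_momentum = 0.15                     # value used in the question's Keras model
torch_momentum = 1.0 - keras_momentum     # the equivalent PyTorch setting (0.85)

keras_moving = np.zeros(32)               # both frameworks start the mean at 0
torch_running = np.zeros(32)
for batch_mean in batch_means:
    # Keras: new = old * momentum + batch * (1 - momentum)
    keras_moving = keras_moving * keras_momentum + batch_mean * (1.0 - keras_momentum)
    # PyTorch: new = (1 - momentum) * old + momentum * batch
    torch_running = (1.0 - torch_momentum) * torch_running + torch_momentum * batch_mean

print(np.allclose(keras_moving, torch_running))  # True
```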
In total, a BatchNormalization layer has four groups of "weights", each with one value per channel along the selected axis (default -1; size 32 for your layer):
scale, gamma (32) - trainable
offset, beta (32) - trainable
moving mean (32) - non-trainable, but updated every batch
moving variance (32) - non-trainable, but updated every batch
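To make the four groups concrete, here is a minimal NumPy sketch of a channels-last batch normalization layer. It is not Keras's or PyTorch's actual implementation; the attribute names simply mirror Keras's weight names (gamma, beta, moving_mean, moving_variance), and the defaults momentum=0.99 and eps=1e-3 are Keras's. Counting the arrays reproduces the 64 trainable plus 64 non-trainable values reported for your layer:

```python
import numpy as np


class BatchNormSketch:
    """Toy channels-last batch normalization holding Keras's four weight groups."""

    def __init__(self, channels, momentum=0.99, eps=1e-3):
        # Trainable by backpropagation
        self.gamma = np.ones(channels)             # scale
        self.beta = np.zeros(channels)             # offset
        # Not trainable by backpropagation, but updated on every training batch
        self.moving_mean = np.zeros(channels)
        self.moving_variance = np.ones(channels)
        self.momentum = momentum
        self.eps = eps

    def trainable_count(self):
        return self.gamma.size + self.beta.size

    def non_trainable_count(self):
        return self.moving_mean.size + self.moving_variance.size

    def __call__(self, x, training):
        axes = tuple(range(x.ndim - 1))            # all axes except the channel axis
        if training:
            mean, var = x.mean(axis=axes), x.var(axis=axes)
            m = self.momentum                      # Keras-style momentum
            self.moving_mean = self.moving_mean * m + mean * (1 - m)
            self.moving_variance = self.moving_variance * m + var * (1 - m)
        else:
            mean, var = self.moving_mean, self.moving_variance
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta


bn = BatchNormSketch(channels=32, momentum=0.15)
print(bn.trainable_count(), bn.non_trainable_count())  # 64 64

x = np.random.default_rng(0).normal(size=(4, 8, 8, 32))  # (N, H, W, C) stand-in batch
y = bn(x, training=True)
print(y.shape)  # (4, 8, 8, 32)
```

Only gamma and beta would receive gradients; the moving statistics change solely through the update in the training branch, which is what "non-trainable, but updated every batch" means.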
The advantage of this approach in Keras is that when you save the layer, the mean and variance values are saved automatically, the same way as all the other weights in the layer, and when you load the layer, they are loaded together with them.
I am a newbie trying out LSTM.
I am using an LSTM to determine the action type (5 different actions, such as running, dancing, etc.). My input is 60 frames per action, and I have roughly 120 such videos.
train_x.shape = (120, 192, 192, 60)
where 120 is the number of sample videos for training, 192x192 is the frame size, and 60 is the number of frames.
train_y.shape = (120, 5), one-hot encoded: [1 0 0 0 0 ..... 0 0 0 0 1]
I am not clear on how to pass 3D input (timesteps and features) to the LSTM:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(Dense(100, activation='relu'))
model.add(Dense(len(uniquesegments), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=100, batch_size=batch_size, verbose=1)
I get the following error:
Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 192, 192, 60)
Training data algorithm:
Loop through videos
    Loop through each frame of a video
        Append the frame to an array
    Convert the array to a numpy array
    Roll the axis to convert (60, 192, 192) to (192, 192, 60)
    Add it to the training list
Convert the training list to a numpy array
Training list shape: (120, 192, 192, 60)
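A rough NumPy sketch of the loading steps above, using small stand-in sizes (3 videos of 6 frames at 5x4 pixels) and random placeholder pixels instead of real frames, shows that the result is a 4-D array, while the LSTM layer expects 3-D input:

```python
import numpy as np

# Stand-in sizes; the real data is 120 videos x 60 frames x 192 x 192
num_videos, num_frames, height, width = 3, 6, 5, 4
rng = np.random.default_rng(0)

training_list = []
for v in range(num_videos):                          # loop through videos
    frames = []
    for f in range(num_frames):                      # loop through each frame
        frames.append(rng.random((height, width)))   # placeholder frame data
    video = np.array(frames)                         # (frames, height, width)
    video = np.moveaxis(video, 0, -1)                # roll axis -> (height, width, frames)
    training_list.append(video)

train_x = np.array(training_list)
print(train_x.shape, train_x.ndim)  # (3, 5, 4, 6) 4
```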
First, you should know that a video classification task is better suited to a convolutional RNN than to an LSTM or any other plain RNN cell, just as an image classification task is better suited to a CNN than to an MLP.
RNN cells such as LSTM and GRU expect inputs with shape (samples, timesteps, channels). Since you are dealing with inputs of shape (samples, timesteps, width, height, channels), you should use tf.keras.layers.ConvLSTM2D instead.
The following example code shows how to build a model that can handle your video classification task:
import tensorflow as tf
from tensorflow.keras import models, layers

timesteps = 60
width = 192
height = 192
channels = 1
action_num = 5

model = models.Sequential([
    layers.Input(shape=(timesteps, width, height, channels)),
    layers.ConvLSTM2D(filters=64, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1),
    layers.MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"),
    layers.BatchNormalization(),
    layers.ConvLSTM2D(filters=32, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1),
    layers.MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"),
    layers.BatchNormalization(),
    # return_sequences=False keeps only the last timestep: (None, 48, 48, 16)
    layers.ConvLSTM2D(filters=16, kernel_size=(3, 3), padding="same", return_sequences=False, dropout=0.1, recurrent_dropout=0.1),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="same"),
    layers.BatchNormalization(),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(action_num, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model: "sequential"
Layer (type) Output Shape Param #
conv_lst_m2d (ConvLSTM2D) (None, 60, 192, 192, 64) 150016
max_pooling3d (MaxPooling3D) (None, 60, 96, 96, 64) 0
batch_normalization (BatchNo (None, 60, 96, 96, 64) 256
conv_lst_m2d_1 (ConvLSTM2D) (None, 60, 96, 96, 32) 110720
max_pooling3d_1 (MaxPooling3 (None, 60, 48, 48, 32) 0
batch_normalization_1 (Batch (None, 60, 48, 48, 32) 128
conv_lst_m2d_2 (ConvLSTM2D) (None, 48, 48, 16) 27712
max_pooling2d (MaxPooling2D) (None, 24, 24, 16) 0
batch_normalization_2 (Batch (None, 24, 24, 16) 64
flatten (Flatten) (None, 9216) 0
dense (Dense) (None, 256) 2359552
dense_1 (Dense) (None, 5) 1285
Total params: 2,649,733
Trainable params: 2,649,509
Non-trainable params: 224
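The parameter counts in this summary can be reproduced by hand. The sketch below uses hypothetical helper functions and assumes the standard Keras layouts: a ConvLSTM2D layer has four gates, each with an input kernel (kh x kw x in_channels x filters), a recurrent kernel (kh x kw x filters x filters) and a bias; BatchNormalization stores four values per channel, two of which are non-trainable:

```python
def conv_lstm2d_params(in_channels, filters, kernel=(3, 3)):
    kh, kw = kernel
    input_kernel = kh * kw * in_channels * filters
    recurrent_kernel = kh * kw * filters * filters
    bias = filters
    # 4 gates: input, forget, cell candidate, output
    return 4 * (input_kernel + recurrent_kernel + bias)


def batchnorm_params(channels):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * channels


def dense_params(in_features, units):
    return in_features * units + units


counts = [
    conv_lstm2d_params(1, 64),
    batchnorm_params(64),
    conv_lstm2d_params(64, 32),
    batchnorm_params(32),
    conv_lstm2d_params(32, 16),
    batchnorm_params(16),
    dense_params(24 * 24 * 16, 256),   # Flatten output: 9216 features
    dense_params(256, 5),
]
non_trainable = 2 * (64 + 32 + 16)     # moving mean + moving variance per BN channel

print(counts)          # [150016, 256, 110720, 128, 27712, 64, 2359552, 1285]
print(sum(counts))     # 2649733
print(non_trainable)   # 224
```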
Note that you should reorder your data to the shape (samples, timesteps, width, height, channels) before feeding it to the model above (i.e. not with np.reshape, but with np.moveaxis). In your case the shape should be (120, 60, 192, 192, 1); then you can split your 120 videos into batches and feed them to the model.
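A small NumPy sketch of the difference between the two, with stand-in sizes (2 videos, 6 frames of 5x4 pixels); on the real data the same moveaxis call would turn (120, 192, 192, 60) into (120, 60, 192, 192) before the channel axis is added:

```python
import numpy as np

# Stand-in for the question's array: (samples, height, width, frames)
samples, height, width, frames = 2, 5, 4, 6
train_x = np.arange(samples * height * width * frames, dtype=np.float32).reshape(
    samples, height, width, frames
)

# Move the frame axis next to the sample axis, then add a channel axis
moved = np.moveaxis(train_x, -1, 1)[..., np.newaxis]
print(moved.shape)  # (2, 6, 5, 4, 1)

# A plain reshape gives the same shape but does not keep frames intact
reshaped = train_x.reshape(samples, frames, height, width, 1)

# Frame 0 of video 0 should equal the original [:, :, 0] slice
print(np.array_equal(moved[0, 0, :, :, 0], train_x[0, :, :, 0]))     # True
print(np.array_equal(reshaped[0, 0, :, :, 0], train_x[0, :, :, 0]))  # False
```

np.moveaxis keeps each frame's pixels together, whereas reshape only reinterprets the existing memory order, so each "frame" it produces mixes pixels from different original frames.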
From the docs, the LSTM layer expects 3D input. That makes sense, because typically you should be feeding it a 1D feature vector per timestep. That's why the docs say:
inputs: A 3D tensor with shape [batch, timesteps, feature]
What you're trying to do won't work (I've also left you a comment explaining why you probably shouldn't be trying to do it that way).
I am currently working on a question answering system. I created a synthetic dataset whose answers contain multiple words. However, the answers are not spans of the given context.
Initially, I plan to test it with a deep learning-based model, but I have some problems building the model.
This is how I vectorized the data:
from keras.preprocessing.sequence import pad_sequences

def vectorize(data, word2idx, story_maxlen, question_maxlen, answer_maxlen):
    """ Create the story and question vectors and the label """
    Xs, Xq, Y = [], [], []
    for story, question, answer in data:
        xs = [word2idx[word] for word in story]
        xq = [word2idx[word] for word in question]
        y = [word2idx[word] for word in answer]
        #y = np.zeros(len(word2idx) + 1)
        #y[word2idx[answer]] = 1
        Xs.append(xs)
        Xq.append(xq)
        Y.append(y)
    return (pad_sequences(Xs, maxlen=story_maxlen),
            pad_sequences(Xq, maxlen=question_maxlen),
            pad_sequences(Y, maxlen=answer_maxlen))
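To see what Y_train looks like after vectorize, here is a self-contained sketch with a hypothetical toy vocabulary and two hypothetical (story, question, answer) triples. pad_pre is a hypothetical stand-in for Keras's pad_sequences with its default padding='pre' and truncating='pre', so the example runs with NumPy alone:

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper mimicking pad_sequences' defaults (pre-padding, pre-truncation)."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        seq = seq[-maxlen:]                 # keep the last maxlen items
        if len(seq):
            out[i, -len(seq):] = seq        # left-pad with `value`
    return out


# Hypothetical vocabulary; index 0 is reserved for padding
word2idx = {"mary": 1, "went": 2, "to": 3, "the": 4, "kitchen": 5,
            "where": 6, "is": 7, "garden": 8}
data = [
    (["mary", "went", "to", "the", "kitchen"], ["where", "is", "mary"], ["the", "kitchen"]),
    (["mary", "went", "to", "the", "garden"], ["where", "is", "mary"], ["garden"]),
]

Xs, Xq, Y = [], [], []
for story, question, answer in data:
    Xs.append([word2idx[w] for w in story])
    Xq.append([word2idx[w] for w in question])
    Y.append([word2idx[w] for w in answer])

Xs, Xq, Y = pad_pre(Xs, 6), pad_pre(Xq, 5), pad_pre(Y, 2)
print(Y.tolist())  # [[4, 5], [0, 8]]
print(Y.shape)     # (2, 2)
```

Each row of Y holds answer_maxlen integer word indices rather than a one-hot vector, which is consistent with the (1000, 2) target shape in the error below (presumably answer_maxlen is 2).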
Below is how I create the model:
# story encoder. Output dim: (None, story_maxlen, EMBED_HIDDEN_SIZE)
story_encoder = Sequential()
# question encoder. Output dim: (None, question_maxlen, EMBED_HIDDEN_SIZE)
question_encoder = Sequential()
# episodic memory (facts): story * question
# Output dim: (None, question_maxlen, story_maxlen)
facts_encoder = Sequential()
facts_encoder.add(Merge([story_encoder, question_encoder],
mode="dot", dot_axes=[2, 2]))
facts_encoder.add(Permute((2, 1)))
## combine response and question vectors and do logistic regression
answer = Sequential()
answer.add(Merge([facts_encoder, question_encoder],
mode="concat", concat_axis=-1))
answer.add(LSTM(LSTM_OUTPUT_SIZE, return_sequences=True))
answer.add(Dense(vocab_size,activation= "softmax"))
answer.compile(optimizer="rmsprop", loss="categorical_crossentropy",
answer.fit([Xs_train, Xq_train], Y_train,
batch_size=BATCH_SIZE, nb_epoch=NBR_EPOCHS,
validation_data=([Xs_test, Xq_test], Y_test))
And this is the summary of the model:
Layer (type) Output Shape Param #
merge_46 (Merge) (None, 5, 616) 0
lstm_23 (LSTM) (None, 5, 32) 83072
dropout_69 (Dropout) (None, 5, 32) 0
flatten_9 (Flatten) (None, 160) 0
dense_22 (Dense) (None, 37) 5957
Total params: 93,765.0
Trainable params: 93,765.0
Non-trainable params: 0.0
It gives the following error.
ValueError: Error when checking model target: expected dense_22 to have shape (None, 37) but got array with shape (1000, 2)
I think the error is related to Y_train and Y_test. I probably need to encode them as categorical values, but the answers are sequences of words rather than spans of the text, and I don't know how to do that.
How can I fix it? Any ideas?
When I use sparse_categorical_crossentropy as the loss and add Reshape((2, -1)), the summary is:
Layer (type) Output Shape Param #
merge_94 (Merge) (None, 5, 616) 0
lstm_65 (LSTM) (None, 5, 32) 83072
dropout_139 (Dropout) (None, 5, 32) 0
reshape_22 (Reshape) (None, 2, 80) 0
dense_44 (Dense) (None, 2, 37) 2997
Total params: 90,805.0
Trainable params: 90,805.0
Non-trainable params: 0.0
The model after the modifications:
# story encoder. Output dim: (None, story_maxlen, EMBED_HIDDEN_SIZE)
story_encoder = Sequential()
# question encoder. Output dim: (None, question_maxlen, EMBED_HIDDEN_SIZE)
question_encoder = Sequential()
# episodic memory (facts): story * question
# Output dim: (None, question_maxlen, story_maxlen)
facts_encoder = Sequential()
facts_encoder.add(Merge([story_encoder, question_encoder],
mode="dot", dot_axes=[2, 2]))
facts_encoder.add(Permute((2, 1)))
## combine response and question vectors and do logistic regression
answer = Sequential()
answer.add(Merge([facts_encoder, question_encoder],
mode="concat", concat_axis=-1))
answer.add(LSTM(LSTM_OUTPUT_SIZE, return_sequences=True))
answer.add(keras.layers.Reshape((2, -1)))
answer.add(Dense(vocab_size,activation= "softmax"))
answer.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy",
answer.fit([Xs_train, Xq_train], Y_train,
batch_size=BATCH_SIZE, nb_epoch=NBR_EPOCHS,
validation_data=([Xs_test, Xq_test], Y_test))
It still gives:
ValueError: Error when checking model target: expected dense_46 to have 3 dimensions, but got array with shape (1000, 2)
As far as I understand, Y_train and Y_test consist of indices (not one-hot vectors). If so, change the loss to sparse_categorical_crossentropy:
answer.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy",
As far as I understand, Y_train and Y_test also have a sequence dimension, and the length of the questions (5) doesn't equal the length of the answers (2). Flatten() removes this dimension, so try replacing Flatten() with Reshape():
# answer.add(Flatten())
answer.add(tf.keras.layers.Reshape((2, -1)))
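The shape bookkeeping behind this suggestion can be checked with NumPy stand-ins (a batch of 4; the other sizes come from the summaries above). The last lines also show one possible way to address the remaining "expected ... to have 3 dimensions" error: adding a trailing axis to the integer targets. That assumes the older Keras version used here (note Merge and nb_epoch) requires sparse targets to have the same rank as the 3-D model output; treat it as something to try, not a confirmed fix:

```python
import numpy as np

batch, question_maxlen, lstm_units, answer_maxlen, vocab_size = 4, 5, 32, 2, 37

lstm_out = np.zeros((batch, question_maxlen, lstm_units))   # like (None, 5, 32)

# Flatten() merges everything into one vector per sample: (None, 160)
flat = lstm_out.reshape(batch, -1)
# Reshape((2, -1)) keeps an answer-length axis instead: (None, 2, 80)
per_step = lstm_out.reshape(batch, answer_maxlen, -1)
print(flat.shape, per_step.shape)  # (4, 160) (4, 2, 80)

# A Dense(vocab_size) applied to the last axis then yields (None, 2, 37)
dense_out_shape = per_step.shape[:-1] + (vocab_size,)
print(dense_out_shape)  # (4, 2, 37)

# Integer targets as produced by pad_sequences: (samples, answer_maxlen)
Y_train = np.random.default_rng(0).integers(1, vocab_size, size=(batch, answer_maxlen))
# Adding a trailing axis gives the 3-D shape the error message asks for
Y_train_3d = np.expand_dims(Y_train, -1)
print(Y_train.shape, Y_train_3d.shape)  # (4, 2) (4, 2, 1)
```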
My input images have 8 channels, my output (label) has 1 channel, and my CNN in Keras looks like this:
from keras.models import Sequential
from keras.layers import Conv2D

def set_model(ks1=5, ks2=5, nf1=64, nf2=1):
    model = Sequential()
    model.add(Conv2D(nf1, padding="same", kernel_size=(ks1, ks1),
                     activation='relu', input_shape=(62, 62, 8)))
    model.add(Conv2D(nf2, padding="same", kernel_size=(ks2, ks2)))
    return model
The filter I have here is the same for all 8 channels. What I would like is a 3D filter, something like (8, 5, 5), so that every channel has a separate filter, because these channels do not have the same importance.
Below is the summary of the model implemented above:
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 62, 62, 64) 12864
conv2d_2 (Conv2D) (None, 62, 62, 1) 1601
Total params: 14,465
Trainable params: 14,465
Non-trainable params: 0
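The counts in this summary show where the 8 input channels go. Assuming Keras's Conv2D kernel layout of (kernel_h, kernel_w, in_channels, filters), the first layer has 5*5*8*64 weights plus 64 biases = 12,864, and the second has 5*5*64*1 + 1 = 1,601. The sketch below reproduces these numbers with hypothetical helpers; the random kernel is only a placeholder for trained weights:

```python
import numpy as np


def conv2d_kernel_shape(kernel_size, in_channels, filters):
    # Keras layout for a Conv2D kernel: (kernel_h, kernel_w, in_channels, filters)
    return (kernel_size, kernel_size, in_channels, filters)


def conv2d_params(kernel_size, in_channels, filters):
    kh, kw, cin, cout = conv2d_kernel_shape(kernel_size, in_channels, filters)
    return kh * kw * cin * cout + cout   # weights + one bias per filter


print(conv2d_kernel_shape(5, 8, 64), conv2d_params(5, 8, 64))  # (5, 5, 8, 64) 12864
print(conv2d_kernel_shape(5, 64, 1), conv2d_params(5, 64, 1))  # (5, 5, 64, 1) 1601

# Every filter has its own 5x5 slice for each of the 8 input channels
kernel = np.random.default_rng(0).normal(size=conv2d_kernel_shape(5, 8, 64))
print(kernel[:, :, 0, 0].shape)                               # (5, 5)
print(np.allclose(kernel[:, :, 0, 0], kernel[:, :, 1, 0]))    # False
```

So the (5, 5, 64, 1) shape printed below matches the second layer's kernel, while the first layer's kernel would be (5, 5, 8, 64), with a separate 5x5 slice for every input channel in every filter.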
And when I get the shape of weights for the first layer I have the following results:
import numpy as np

for layer in model.layers:
    weights = layer.get_weights()
    a = np.array(weights[0])
(5, 5, 64, 1)
And I am wondering: where is the 8 in the shape of the first layer's weights?
I'm trying to build a Q&A model based on the bAbI Task 8 example, and I am having trouble merging two of my input layers into one layer. Here is my current model architecture:
story_input = Input(shape=(story_maxlen,vocab_size), name='story_input')
story_input_proc = Embedding(vocab_size, latent_dim, name='story_input_embed', input_length=story_maxlen)(story_input)
story_input_proc = Reshape((latent_dim,story_maxlen), name='story_input_reshape')(story_input_proc)
query_input = Input(shape=(query_maxlen,vocab_size), name='query_input')
query_input_proc = Embedding(vocab_size, latent_dim, name='query_input_embed', input_length=query_maxlen)(query_input)
query_input_proc = Reshape((latent_dim,query_maxlen), name='query_input_reshape')(query_input_proc)
story_query = dot([story_input_proc, query_input_proc], axes=(1, 1), name='story_query_merge')
encoder = LSTM(latent_dim, return_state=True, name='encoder')
encoder_output, state_h, state_c = encoder(story_query)
encoder_output = RepeatVector(3, name='encoder_3dim')(encoder_output)
encoder_states = [state_h, state_c]
decoder = LSTM(latent_dim, return_sequences=True, name='decoder')(encoder_output, initial_state=encoder_states)
answer_output = Dense(vocab_size, activation='softmax', name='answer_output')(decoder)
model = Model([story_input, query_input], answer_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
And here is the output of model.summary():
Layer (type) Output Shape Param # Connected to
story_input (InputLayer) (None, 358, 38) 0
query_input (InputLayer) (None, 5, 38) 0
story_input_embed (Embedding) (None, 358, 64) 2432 story_input[0][0]
query_input_embed (Embedding) (None, 5, 64) 2432 query_input[0][0]
story_input_reshape (Reshape) (None, 64, 358) 0 story_input_embed[0][0]
query_input_reshape (Reshape) (None, 64, 5) 0 query_input_embed[0][0]
story_query_merge (Dot) (None, 358, 5) 0 story_input_reshape[0][0]
encoder (LSTM) [(None, 64), (None, 17920 story_query_merge[0][0]
encoder_3dim (RepeatVector) (None, 3, 64) 0 encoder[0][0]
decoder (LSTM) (None, 3, 64) 33024 encoder_3dim[0][0]
answer_output (Dense) (None, 3, 38) 2470 decoder[0][0]
Total params: 58,278
Trainable params: 58,278
Non-trainable params: 0
where vocab_size = 38, story_maxlen = 358, query_maxlen = 5, latent_dim = 64, and the batch size = 64.
When I try to train this model I get the error:
Input to reshape is a tensor with 778240 values, but the requested shape has 20480
Here is the formula for those two values:
input_to_reshape = batch_size * latent_dim * query_maxlen * vocab_size
requested_shape = batch_size * latent_dim * query_maxlen
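These two numbers can be reproduced with NumPy, assuming only that an embedding layer acts as a row lookup for every integer in its input (which is how Keras's Embedding works). Feeding it one-hot vectors of shape (batch, 5, 38) performs 5*38 lookups per sample instead of 5, which adds an extra vocab_size axis. The random matrix and indices below are placeholders for the learned weights and real queries:

```python
import numpy as np

batch_size, query_maxlen, vocab_size, latent_dim = 64, 5, 38, 64
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, latent_dim))  # placeholder weights

# Integer word indices, the input an embedding layer is designed for: (64, 5)
index_query = rng.integers(0, vocab_size, size=(batch_size, query_maxlen))
# One-hot version of the same queries, as the model currently receives: (64, 5, 38)
one_hot_query = np.eye(vocab_size, dtype=np.int64)[index_query]

# Row lookup: every integer in the input selects one row of the matrix
looked_up_one_hot = embedding_matrix[one_hot_query]
looked_up_indices = embedding_matrix[index_query]

print(looked_up_one_hot.shape, looked_up_one_hot.size)  # (64, 5, 38, 64) 778240
print(looked_up_indices.shape, looked_up_indices.size)  # (64, 5, 64) 20480
```

This suggests that query_input_embed receives a 4-D tensor at run time, and that feeding integer index sequences of shape (batch, query_maxlen) to the Embedding layers, rather than one-hot vectors, would remove the extra axis.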
Where I'm At
I believe the error message is saying the shape of the tensor inputted into the query_input_reshape layer is (?, 5, 38, 64) but it is expecting a tensor of shape (?, 5, 64) (see formulas above), but I could be wrong on that.
When I change the target_shape input of Reshape to be 3D (i.e. Reshape((latent_dim,query_maxlen,vocab_size)) I get the error total size of new array must be unchanged, which doesn't make any sense to me because the input is 3D. You would think that Reshape((latent_dim,query_maxlen)) would give me that error because it'd be changing a 3D tensor into a 2D tensor, but it compiles fine, so I've no clue what's going on there.
The only reason I'm using Reshape is because I need to merge the two tensors as an input into the LSTM encoder. When I try to get rid of the Reshape layers I just get dimension mismatch errors when I try to compile the model. The model architecture above at least compiles but I can't train it.
Can someone please help me figure out how I can merge the story_input and query_input layers? Thanks!
I was trying to port the CRNN model to Keras,
but I got stuck connecting the output of the Conv2D layer to the LSTM layer.
The output of the CNN layers has the shape (batch_size, 512, 1, width_dash), where the first dimension depends on the batch size and the last one depends on the width of the input (this model accepts variable-width input).
For example, an input with shape [2, 1, 32, 829] results in an output with shape (2, 512, 1, 208).
In the PyTorch model, we then do squeeze(2) followed by permute(2, 0, 1),
which results in a tensor with shape [208, 2, 512].
I tried to implement this in Keras, but I couldn't, because in Keras we cannot alter the batch_size dimension in a keras.models.Sequential model.
Can someone please guide me on how to port this part of the model to Keras?
Current state of ported CNN layer
You don't need to permute the batch axis in Keras. In a PyTorch model you need to do it because a PyTorch LSTM expects an input shape of (seq_len, batch, input_size) by default (unless batch_first=True). In Keras, however, the LSTM layer expects (batch, seq_len, input_size).
So after defining the CNN and squeezing out axis 2, you just need to permute the last two axes. As a simple example (in 'channels_first' Keras image format),
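from keras.models import Sequential
from keras.layers import Conv2D, Reshape, Permute, LSTM

model = Sequential()
model.add(Conv2D(512, 3, strides=(32, 4), padding='same', data_format='channels_first', input_shape=(1, 32, None)))
model.add(Reshape((512, -1)))  # squeeze out the height axis: (batch, 512, width)
model.add(Permute((2, 1)))     # (batch, width, 512) = (batch, seq_len, input_size)
model.add(LSTM(32))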
You can verify the shapes with model.summary():
Layer (type) Output Shape Param #
conv2d_4 (Conv2D) (None, 512, 1, None) 5120
reshape_3 (Reshape) (None, 512, None) 0
permute_4 (Permute) (None, None, 512) 0
lstm_3 (LSTM) (None, 32) 69760
Total params: 74,880
Trainable params: 74,880
Non-trainable params: 0
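The same axis bookkeeping can be checked without either framework. In this NumPy sketch (random values standing in for the CNN output), np.squeeze and np.transpose play the roles of PyTorch's squeeze(2)/permute(2, 0, 1) and of Keras's Reshape((512, -1))/Permute((2, 1)). Note that Keras's Permute dimensions are 1-based and exclude the batch axis, so (2, 1) corresponds to NumPy axes (0, 2, 1):

```python
import numpy as np

batch, channels, width = 2, 512, 208
cnn_out = np.random.default_rng(0).normal(size=(batch, channels, 1, width))  # NCHW, H == 1

# PyTorch CRNN: squeeze(2) then permute(2, 0, 1) -> (seq_len, batch, features)
torch_style = np.transpose(np.squeeze(cnn_out, axis=2), (2, 0, 1))

# Keras: Reshape((512, -1)) then Permute((2, 1)) -> (batch, seq_len, features)
keras_style = np.transpose(cnn_out.reshape(batch, channels, -1), (0, 2, 1))

print(torch_style.shape, keras_style.shape)  # (208, 2, 512) (2, 208, 512)
# Same values; only the batch and time axes are swapped
print(np.array_equal(torch_style, np.transpose(keras_style, (1, 0, 2))))  # True
```

Both layouts hold the same values; they differ only in whether the batch axis or the time axis comes first, which is exactly the difference between the two LSTM input conventions.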