I have created the model below (Keras with the Theano backend). When I run it on my CPU, it gives me a memory error. I have 8 GB of DDR3 RAM, and before calling model1.fit, 2.3 GB is already in use. I could see the RAM usage climb to 7.5 GB before the program crashed. I also tried running it on a GPU (NVIDIA GeForce GTX 860M, 4 GB), but I still got a memory error.
import numpy as np
import keras
from keras.layers import Conv2D
from keras.optimizers import SGD

def get_model_convolutional():
    model = keras.models.Sequential()
    model.add(Conv2D(128, (3, 3), activation='relu', strides=(1, 1), input_shape=(1028, 1028, 3)))
    model.add(Conv2D(3, (3, 3), strides=(1, 1), activation=None))
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd)
    return model

if __name__ == "__main__":
    model1 = get_model_convolutional()
    # Shapes match the model's input (1028) and output (1024) sizes
    train_x = np.ones((108, 1028, 1028, 3), dtype=np.uint8)
    train_y = np.ones((108, 1024, 1024, 3), dtype=np.uint8)
    model1.fit(train_x, train_y, verbose=2, epochs=20, batch_size=4)
The output of model1.summary() is:
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 1026, 1026, 128) 3584
conv2d_2 (Conv2D) (None, 1024, 1024, 3) 3459
Total params: 7,043
Trainable params: 7,043
Non-trainable params: 0
Why is so much memory required? I tried to calculate it, and I think only about 1.5 GB should be needed. This is my first model.
The memory is needed to store the intermediate outputs (activations), which become very large here because there is no pooling to shrink them. Some solutions are to reduce the number of filters, reduce the image size (i.e. train on cropped images and stitch the results together later), or reduce the batch size.
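To see where the memory goes, a rough back-of-the-envelope estimate helps. The sketch below is not from the original post: it assumes float32 activations (4 bytes per value), takes the output shapes from the model summary in the question, and uses a made-up helper name, activation_bytes.

```python
# Rough estimate of activation memory, assuming float32 (4 bytes per value).
# Shapes are taken from the model summary in the question.

def activation_bytes(shape, batch_size, bytes_per_value=4):
    """Bytes needed to hold one layer's output for a whole batch."""
    count = batch_size
    for dim in shape:
        count *= dim
    return count * bytes_per_value

batch_size = 4
conv1 = activation_bytes((1026, 1026, 128), batch_size)  # first Conv2D output
conv2 = activation_bytes((1024, 1024, 3), batch_size)    # second Conv2D output

print(f"conv2d_1 output: {conv1 / 2**30:.2f} GiB per batch")  # 2.01 GiB
print(f"conv2d_2 output: {conv2 / 2**20:.0f} MiB per batch")  # 48 MiB
```

This counts only one copy of each forward output. Backpropagation also needs gradients with respect to these activations, and the backend may allocate temporary buffers for the convolution itself, so the real footprint is likely a multiple of this figure. The 7,043 weights, by contrast, are negligible.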
I have a tiny sample CNN implemented in both Keras and PyTorch. When I print the summary of both networks, the number of trainable parameters is the same, but the total number of parameters and the number of parameters for Batch Normalization don't match.
Here is the CNN implementation in Keras:
inputs = Input(shape=(64, 64, 1))  # Channel Last: (NHWC)
model = Conv2D(filters=32, kernel_size=(3, 3), padding='SAME', activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))(inputs)
model = BatchNormalization(momentum=0.15, axis=-1)(model)
model = Flatten()(model)
dense = Dense(100, activation = "relu")(model)
head_root = Dense(10, activation = 'softmax')(dense)
And the summary printed for above model is:
Model: "model_8"
Layer (type) Output Shape Param #
input_9 (InputLayer) (None, 64, 64, 1) 0
conv2d_10 (Conv2D) (None, 64, 64, 32) 320
batch_normalization_2 (Batch (None, 64, 64, 32) 128
flatten_3 (Flatten) (None, 131072) 0
dense_11 (Dense) (None, 100) 13107300
dense_12 (Dense) (None, 10) 1010
Total params: 13,108,758
Trainable params: 13,108,694
Non-trainable params: 64
Here's the implementation of the same model architecture in PyTorch:
import torch.nn as nn

# Image format: Channel first (NCHW) in PyTorch
class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), padding=1),
            nn.ReLU(),                        # listed as ReLU-2 in the summary
            nn.BatchNorm2d(num_features=32),  # listed as BatchNorm2d-3 in the summary
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=131072, out_features=100)
        self.fc2 = nn.Linear(in_features=100, out_features=10)

    def forward(self, x):
        output = self.layer1(x)
        output = self.flatten(output)
        output = self.fc1(output)
        output = self.fc2(output)
        return output
And following is the output of summary of the above model:
Layer (type) Output Shape Param #
Conv2d-1 [-1, 32, 64, 64] 320
ReLU-2 [-1, 32, 64, 64] 0
BatchNorm2d-3 [-1, 32, 64, 64] 64
Flatten-4 [-1, 131072] 0
Linear-5 [-1, 100] 13,107,300
Linear-6 [-1, 10] 1,010
Total params: 13,108,694
Trainable params: 13,108,694
Non-trainable params: 0
Input size (MB): 0.02
Forward/backward pass size (MB): 4.00
Params size (MB): 50.01
Estimated Total Size (MB): 54.02
As you can see in the results above, Batch Normalization in Keras has more parameters than in PyTorch (2x, to be exact). So what is the difference between the two CNN architectures? If they are equivalent, what am I missing here?
Keras counts as parameters (weights) everything that is saved and loaded with the layer.
While both implementations naturally have the accumulated "mean" and "variance" of the batches, these values are not trainable with backpropagation.
Nevertheless, these values are updated every batch, and Keras treats them as non-trainable weights, while PyTorch simply hides them. The term "non-trainable" here means "not trainable by backpropagation", but doesn't mean the values are frozen.
In total there are 4 groups of "weights" in a BatchNormalization layer. For the selected axis (default = -1, size = 32 for your layer):
scale (32) - trainable
offset (32) - trainable
accumulated means (32) - non-trainable, but updated every batch
accumulated variances (32) - non-trainable, but updated every batch
The advantage of having it like this in Keras is that when you save the layer, you also save the mean and variance values the same way you save all other weights in the layer automatically. And when you load the layer, these weights are loaded together.
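The counts in both summaries can be reproduced by hand, which makes the 64-parameter gap easy to see. This is only an arithmetic sketch: the layer sizes come from the question, and the function names are made up for illustration.

```python
# Reproduce the parameter counts from both summaries by hand.
# Layer sizes are taken from the question; function names are illustrative.

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    return kernel_h * kernel_w * in_channels * filters + filters  # weights + biases

def dense_params(in_features, units):
    return in_features * units + units  # weights + biases

def batchnorm_params(channels):
    trainable = 2 * channels      # scale (gamma) and offset (beta)
    non_trainable = 2 * channels  # moving mean and moving variance
    return trainable, non_trainable

conv = conv2d_params(3, 3, 1, 32)          # 320
bn_train, bn_fixed = batchnorm_params(32)  # 64 and 64
fc1 = dense_params(64 * 64 * 32, 100)      # 13,107,300
fc2 = dense_params(100, 10)                # 1,010

trainable = conv + bn_train + fc1 + fc2
keras_total = trainable + bn_fixed  # Keras also lists the moving statistics
pytorch_total = trainable           # the PyTorch summary omits them

print(trainable, keras_total, pytorch_total)  # 13108694 13108758 13108694
```

In PyTorch the moving statistics are stored as buffers (running_mean and running_var) rather than as parameters, which is why a parameter-based summary does not count them. They are still part of the module's state_dict when the model is saved.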
I am currently working on a question answering system. I created a synthetic dataset in which the answers contain multiple words, but the answers are not spans of the given context.
Initially, I am planning to test it using a deep learning-based model. But I have some problems building the model.
This is how I vectorized data.
from keras.preprocessing.sequence import pad_sequences

def vectorize(data, word2idx, story_maxlen, question_maxlen, answer_maxlen):
    """ Create the story and question vectors and the label """
    Xs, Xq, Y = [], [], []
    for story, question, answer in data:
        xs = [word2idx[word] for word in story]
        xq = [word2idx[word] for word in question]
        y = [word2idx[word] for word in answer]
        #y = np.zeros(len(word2idx) + 1)
        #y[word2idx[answer]] = 1
        # collect the index sequences so they can be padded below
        Xs.append(xs)
        Xq.append(xq)
        Y.append(y)
    return (pad_sequences(Xs, maxlen=story_maxlen),
            pad_sequences(Xq, maxlen=question_maxlen),
            pad_sequences(Y, maxlen=answer_maxlen))
below is how I create the model.
# story encoder. Output dim: (None, story_maxlen, EMBED_HIDDEN_SIZE)
story_encoder = Sequential()
# question encoder. Output dim: (None, question_maxlen, EMBED_HIDDEN_SIZE)
question_encoder = Sequential()
# episodic memory (facts): story * question
# Output dim: (None, question_maxlen, story_maxlen)
facts_encoder = Sequential()
facts_encoder.add(Merge([story_encoder, question_encoder],
mode="dot", dot_axes=[2, 2]))
facts_encoder.add(Permute((2, 1)))
## combine response and question vectors and do logistic regression
answer = Sequential()
answer.add(Merge([facts_encoder, question_encoder],
mode="concat", concat_axis=-1))
answer.add(LSTM(LSTM_OUTPUT_SIZE, return_sequences=True))
answer.add(Dense(vocab_size,activation= "softmax"))
answer.compile(optimizer="rmsprop", loss="categorical_crossentropy")
answer.fit([Xs_train, Xq_train], Y_train,
           batch_size=BATCH_SIZE, nb_epoch=NBR_EPOCHS,
           validation_data=([Xs_test, Xq_test], Y_test))
and this is the summary of the model
Layer (type) Output Shape Param #
merge_46 (Merge) (None, 5, 616) 0
lstm_23 (LSTM) (None, 5, 32) 83072
dropout_69 (Dropout) (None, 5, 32) 0
flatten_9 (Flatten) (None, 160) 0
dense_22 (Dense) (None, 37) 5957
Total params: 93,765.0
Trainable params: 93,765.0
Non-trainable params: 0.0
It gives the following error.
ValueError: Error when checking model target: expected dense_22 to have shape (None, 37) but got array with shape (1000, 2)
I think the error is related to Y_train and Y_test. I probably need to encode them as categorical values, but the answers are sequences of words rather than spans of text, and I don't know how to do that.
How can I fix it? Any ideas?
When I use sparse_categorical_crossentropy as the loss and add Reshape((2, -1)), the summary becomes:
Layer (type) Output Shape Param #
merge_94 (Merge) (None, 5, 616) 0
lstm_65 (LSTM) (None, 5, 32) 83072
dropout_139 (Dropout) (None, 5, 32) 0
reshape_22 (Reshape) (None, 2, 80) 0
dense_44 (Dense) (None, 2, 37) 2997
Total params: 90,805.0
Trainable params: 90,805.0
Non-trainable params: 0.0
The model after modifications
# story encoder. Output dim: (None, story_maxlen, EMBED_HIDDEN_SIZE)
story_encoder = Sequential()
# question encoder. Output dim: (None, question_maxlen, EMBED_HIDDEN_SIZE)
question_encoder = Sequential()
# episodic memory (facts): story * question
# Output dim: (None, question_maxlen, story_maxlen)
facts_encoder = Sequential()
facts_encoder.add(Merge([story_encoder, question_encoder],
mode="dot", dot_axes=[2, 2]))
facts_encoder.add(Permute((2, 1)))
## combine response and question vectors and do logistic regression
answer = Sequential()
answer.add(Merge([facts_encoder, question_encoder],
mode="concat", concat_axis=-1))
answer.add(LSTM(LSTM_OUTPUT_SIZE, return_sequences=True))
answer.add(keras.layers.Reshape((2, -1)))
answer.add(Dense(vocab_size,activation= "softmax"))
answer.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
answer.fit([Xs_train, Xq_train], Y_train,
           batch_size=BATCH_SIZE, nb_epoch=NBR_EPOCHS,
           validation_data=([Xs_test, Xq_test], Y_test))
It still gives
ValueError: Error when checking model target: expected dense_46 to have 3 dimensions, but got array with shape (1000, 2)
As far as I understand, Y_train and Y_test consist of indices (not one-hot vectors). If so, change the loss to sparse_categorical_crossentropy:
answer.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
As far as I understand, Y_train and Y_test also have a sequence dimension, and the length of the questions (5) does not equal the length of the answers (2). This dimension is removed by Flatten(). Try replacing Flatten() with Reshape():
# answer.add(Flatten())
answer.add(tf.keras.layers.Reshape((2, -1)))
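If the "expected ... to have 3 dimensions" error persists after these changes, the target array may also need a trailing axis. This is an assumption about the older Keras version used in the question, where sparse_categorical_crossentropy expected targets shaped like the model output with the last axis replaced by 1. A minimal NumPy sketch with placeholder data (the real Y_train comes from vectorize):

```python
import numpy as np

# Placeholder standing in for Y_train: 1000 answers, 2 word indices each.
Y_train = np.zeros((1000, 2), dtype="int32")

# LSTM output (5, 32) reshaped to 2 time steps: 5 * 32 / 2 = 80 features each.
lstm_steps, lstm_units, answer_len = 5, 32, 2
features_per_step = lstm_steps * lstm_units // answer_len

# Give the sparse targets a trailing axis: (1000, 2) -> (1000, 2, 1).
Y_train_sparse = np.expand_dims(Y_train, axis=-1)

print(features_per_step)     # 80
print(Y_train_sparse.shape)  # (1000, 2, 1)
```

Y_test, which is passed as validation data, would need the same treatment.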
I have a model that was kinda working on some data. I've added some tokenized word data to the dataset (somewhat truncated for brevity):
comment_texts = df.comment_text.values
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(comment_texts)  # build the vocabulary before converting texts
comment_seq = tokenizer.texts_to_sequences(comment_texts)
maxtrainlen = max_length(comment_seq)
comment_train = pad_sequences(comment_seq, maxlen=maxtrainlen, padding='post')
vocab_size = len(tokenizer.word_index) + 1
df.comment_text = comment_train
x = df.drop('label', axis=1)  # the thing I'm training
labels = df['label'].values  # Also known as Y
x_train, x_test, y_train, y_test = train_test_split(
    x, labels, test_size=0.2, random_state=1337)
n_cols = x_train.shape[1]
embedding_dim = 100  # TODO: why?
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_shape=(n_cols,)),
    LSTM(32),  # listed in the summary below
    Dense(32, activation='relu'),
    Dense(512, activation='relu'),
    Dense(12, activation='softmax'),  # for an unknown type, we don't account for that while training
])
# convert the y_train to a one hot encoded variable
encoder = LabelEncoder()
encoder.fit(labels)  # fit on all the labels
encoded_Y = encoder.transform(y_train)  # encode on y_train
one_hot_y = np_utils.to_categorical(encoded_Y)
model.fit(x_train, one_hot_y, epochs=10, batch_size=16)
Now, I get this error:
Model: "sequential"
Layer (type) Output Shape Param #
embedding (Embedding) (None, 12, 100) 4040500
lstm (LSTM) (None, 32) 17024
dense (Dense) (None, 32) 1056
dense_1 (Dense) (None, 512) 16896
dense_2 (Dense) (None, 12) 6156
Total params: 4,081,632
Trainable params: 4,081,632
Non-trainable params: 0
Train on 4702 samples
Epoch 1/10
2020-03-04 22:37:59.499238: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[0,0] = -4 is not in [0, 40405)
I think this must be coming from my comment_text column since that is the only thing I added.
Here is what comment_text looks like before I make the substitution:
And here is after:
My full code (before I made the change) is here:
You should be training with comment_train, not with x, which contains whatever else is in df (unknown to us).
The value embedding_dim=100 is a free choice, like the number of units in a hidden layer. You can tune it to find what works best for your model, just as you can tune the number of units in the hidden layers.
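To illustrate why feeding arbitrary columns into the embedding fails, here is a toy stand-in for an embedding layer written in plain Python. It is a sketch, not the Keras implementation: the table sizes and function names are made up, and the only point is that every input value must be a valid row index in [0, vocab_size).

```python
# Toy stand-in for an Embedding layer: a table with one row per word index.
# Names and sizes here are illustrative, not taken from the original code.

def make_embedding(vocab_size, embedding_dim):
    return [[0.0] * embedding_dim for _ in range(vocab_size)]

def lookup(table, indices):
    """Return one vector per index; reject indices outside [0, vocab_size)."""
    vocab_size = len(table)
    vectors = []
    for idx in indices:
        if not 0 <= idx < vocab_size:
            raise ValueError(f"index {idx} is not in [0, {vocab_size})")
        vectors.append(table[idx])
    return vectors

table = make_embedding(vocab_size=10, embedding_dim=4)
print(len(lookup(table, [1, 5, 9])))  # 3 vectors, one per word index

try:
    lookup(table, [-4])  # a non-token value, e.g. from another column
except ValueError as err:
    print(err)

# Parameter count of the real layer in the summary: vocab_size * embedding_dim
print(40405 * 100)  # 4040500
```

A value such as -4 cannot be a token index produced by the tokenizer, which suggests that it comes from one of the other columns in x.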
In your case, you will need a model with two or more inputs:
One input for the comments, passing through the embedding and processing text
Another input for the rest of the data, probably passing through a standard network.
At some point you will concatenate these two branches and keep on going.
This link has a good tutorial about the functional API models and shows a model that has two text inputs and an extra input: https://www.tensorflow.org/guide/keras/functional
I didn't find a clear answer to this question online (sorry if one exists).
I would like to understand the differences between the two layers (SeparableConv2D and Conv2D), step by step, using for example an input of shape (3,3,3) (like an RGB image).
Running this script based on Keras with TensorFlow:
import numpy as np
from keras.layers import Conv2D, SeparableConv2D
from keras.models import Model
from keras.layers import Input
red = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue = np.array([10000]*9).reshape((3,3))
img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)
inputs = Input((3,3,3))
conv1 = SeparableConv2D(filters=1,
conv2 = Conv2D(filters=1,
model1 = Model(inputs,conv1)
model2 = Model(inputs,conv2)
print("Model 1 prediction: ")
print("Model 2 prediction: ")
print("Model 1 summary: ")
print("Model 2 summary: ")
I have the following output :
Model 1 prediction:
Model 2 prediction:
Model 1 summary:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 3, 3, 3) 0
separable_conv2d_1 (Separabl (None, 2, 2, 1) 16
Total params: 16
Trainable params: 16
Non-trainable params: 0
Model 2 summary:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 3, 3, 3) 0
conv2d_1 (Conv2D) (None, 2, 2, 1) 13
Total params: 13
Trainable params: 13
Non-trainable params: 0
I understand how Keras computes the Conv2D prediction of model 2 thanks to this post, but can someone please explain the SeparableConv2D computation behind the model 1 prediction, and its number of parameters (16)?
Since Keras uses TensorFlow, you can check the difference in the TensorFlow API documentation.
Conv2D is the traditional convolution: you have an image, with or without padding, and a filter that slides across the image with a given stride.
SeparableConv2D, on the other hand, is a variation of the traditional convolution that was proposed to make it cheaper to compute.
It performs a depthwise spatial convolution (each input channel is convolved separately with its own kernel) followed by a pointwise (1x1) convolution that mixes the resulting channels together. MobileNet, for example, uses this operation to compute its convolutions faster.
I could explain both operations here, however, this post has a very good explanation using images and videos that I strongly recommend you to read.
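The two steps can also be traced by hand on the 3x3 RGB image from the question. The sketch below is plain Python rather than Keras, and it rests on assumptions: the kernel size (2x2, stride 1, no padding) is inferred from the (2, 2, 1) output shape and the parameter counts, and all weights are set to 1 with zero bias purely for illustration, since the original initializers are not shown.

```python
# Pure-Python sketch of the SeparableConv2D steps on a 3x3 RGB image.
# Assumptions: 2x2 kernels, stride 1, no padding, all weights 1, bias 0.

def conv_valid(channel, kernel):
    """2D 'valid' cross-correlation of one channel with one kernel."""
    k = len(kernel)
    out_size = len(channel) - k + 1
    return [[sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_size)]
            for i in range(out_size)]

red = [[1] * 3 for _ in range(3)]
green = [[100] * 3 for _ in range(3)]
blue = [[10000] * 3 for _ in range(3)]
channels = [red, green, blue]
ones = [[1, 1], [1, 1]]

# Step 1 (depthwise): each channel is convolved with its own 2x2 kernel.
depthwise = [conv_valid(ch, ones) for ch in channels]

# Step 2 (pointwise): a 1x1 convolution mixes the channels at every pixel.
pointwise_weights = [1, 1, 1]
separable_out = [[sum(w * dw[i][j] for w, dw in zip(pointwise_weights, depthwise))
                  for j in range(2)] for i in range(2)]

# Parameter counts for kernel size k, C input channels, F filters.
k, C, F = 2, 3, 1
separable_params = k * k * C + C * F + F  # depthwise + pointwise + bias
conv_params = k * k * C * F + F           # full kernel + bias

print(separable_out)                  # [[40404, 40404], [40404, 40404]]
print(separable_params, conv_params)  # 16 13
```

With a single filter the separable version actually has more parameters (16 vs 13). The saving appears as the number of filters F grows, because the k*k*C depthwise weights are shared across all filters instead of being multiplied by F.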
I have not coded in years, forgive me. I am trying to do something that may be impossible. I have 38 videos of people performing the same basic movement. I want to train the model to identify those doing it correctly vs. incorrectly.
I am using color now, because grayscale did not work either and I wanted to stay close to the example I used. I used the model as defined in an example, link.
Python3.5 in Anaconda 64,
Tensorflow backend,
on Windows 10 (64bit)
I was hoping to try different models on the problem and use grayscale to reduce memory, but I can't get past the first step!
Here is my code:
import time
import numpy as np
import sys
import os
import cv2
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv3D, Conv2D, MaxPooling2D, GRU, ConvLSTM2D, TimeDistributed
y_cat = np.zeros(40, float)  # np.float was removed in recent NumPy versions
good = "Good"
bad = "Bad"
batch_size = 32
num_classes = 1
epochs = 1
nvideos = 38
nframes = 130
nrows = 240
ncols = 320
nchan = 3
x_learn = np.zeros((nvideos,nframes,nrows,ncols,nchan),np.int32)
x_learn = np.load(".\\train\\datasetcolor.npy")
with open(".\\train\\tags.txt") as ft:
    y_learn = ft.readlines()
y_learn = [x.strip() for x in y_learn]
# transform string tags to numeric.
for i in range(0, len(y_learn)):
    if y_learn[i] == good:
        y_cat[i] = 1
    elif y_learn[i] == bad:
        y_cat[i] = 0
#build model
# duplicating from https://github.com/fchollet/keras/blob/master/examples/conv_lstm.py
model = Sequential()
model.image_dim_ordering = 'tf'
# input_shape and the BatchNormalization layers follow the summary printed below
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                     input_shape=(nframes, nrows, ncols, nchan),
                     padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                     padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                     padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
                     padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
                 padding='same', data_format='channels_last'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')
# fit with first 3 videos because I don't have the horsepower yet
history = model.fit(x_learn[:3], y_learn[:3],
                    batch_size=batch_size,
                    epochs=epochs)
print(history)
Layer (type) Output Shape Param #
conv_lst_m2d_5 (ConvLSTM2D) (None, 130, 240, 320, 40) 62080
batch_normalization_5 (Batch (None, 130, 240, 320, 40) 160
conv_lst_m2d_6 (ConvLSTM2D) (None, 130, 240, 320, 40) 115360
batch_normalization_6 (Batch (None, 130, 240, 320, 40) 160
conv_lst_m2d_7 (ConvLSTM2D) (None, 130, 240, 320, 40) 115360
batch_normalization_7 (Batch (None, 130, 240, 320, 40) 160
conv_lst_m2d_8 (ConvLSTM2D) (None, 130, 240, 320, 40) 115360
batch_normalization_8 (Batch (None, 130, 240, 320, 40) 160
conv3d_1 (Conv3D) (None, 130, 240, 320, 1) 1081
Total params: 409,881.0
Trainable params: 409,561
Non-trainable params: 320.0
ValueError Traceback (most recent call last)
<ipython-input-3-d909d285f474> in <module>()
82 history = model.fit(x_learn[:3], y_learn[:3],
83 batch_size=batch_size,
---> 84 epochs=epochs)
86 print (history)
ValueError: Error when checking model target: expected conv3d_1 to have 5 dimensions, but got array with shape (3, 1)
"Target" means that the problem is in the output of your model versus the format of y_learn.
The array y_learn should have exactly the same shape as the model's output, because the model outputs a "guess", while y_learn is the "correct answer". The system can only compare the guess with the correct answer if they have the same dimensions.
See the difference:
Model Output (seen in the summary): (None,130,240,320,1)
y_learn: (None,1)
Where "None" is the batch size. You gave y_learn[:3], then your batch size is 3 for this training session.
In order to correct it properly, we need to understand what y_learn is.
If I understood well, you've got only a number, 0 or 1, for each video. If that's so, your y_learn is totally ok, and what you need is for your model to output things like (None,1).
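A very simple way to do that (perhaps not the best, and I couldn't be of more help here...) is to flatten the model's output and add a final Dense layer with just one neuron. Dense only acts on the last axis, so without the Flatten the output would still have 5 dimensions:
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
Now, when you do model.summary(), you will see the final output as (None,1)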
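Before spending time on training, it can help to trace the output shapes by hand and confirm they match the targets. The sketch below only does shape bookkeeping and does not use Keras. The function names are made up, and it relies on two facts about Keras layers: Dense transforms only the last axis, and Flatten merges every axis except the batch axis.

```python
# Shape bookkeeping only; no Keras required. Function names are illustrative.

def dense_shape(input_shape, units):
    """Dense acts on the last axis only and keeps all leading axes."""
    return input_shape[:-1] + (units,)

def flatten_shape(input_shape):
    """Flatten merges every axis except the batch axis into one."""
    size = 1
    for dim in input_shape[1:]:
        size *= dim
    return (input_shape[0], size)

conv3d_out = (None, 130, 240, 320, 1)  # last layer in the summary

print(dense_shape(conv3d_out, 1))                 # (None, 130, 240, 320, 1)
print(flatten_shape(conv3d_out))                  # (None, 9984000)
print(dense_shape(flatten_shape(conv3d_out), 1))  # (None, 1)
```

Flattening leaves 9,984,000 features per video, so the final Dense layer alone would hold about ten million weights. Reducing the spatial and temporal size first, for example with pooling, may be a lighter alternative.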