I am trying to develop a model for denoising images. I've been reading up on how to calculate memory usage of a neural network and the standard approach seems to be:
params = depth_n x (kernel_width x kernel_height) x depth_(n-1) + depth_n
By summing all the parameters in my network, I get 1,038,097, which comes to roughly 4.2 MB. I seem to have made a slight miscalculation in the last layer, since Keras reports 1,038,497 params. Nevertheless, this is a small difference. The 4.2 MB covers only the parameters, and I've seen somewhere that one should multiply by 3 to account for backprop and other required calculations, which would come to roughly 13 MB.
I have approximately 11 GB of GPU memory to work with, yet this model exhausts it. Where does all the extra memory requirement come from? What am I missing? I know this post might be labeled as a duplicate, but none of the others seem to address what I am asking about.
My model:
def network(self):
    weights = RandomUniform(minval=-0.05, maxval=0.05, seed=None)
    input_img = Input(shape=(self.img_rows, self.img_cols, self.channels))
    conv1 = Conv2D(1024, (3,3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(input_img)
    conv2 = Conv2D(64, (3,3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv1)
    conv3 = Conv2D(64, (3,3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv2)
    conv4 = Conv2D(64, (3,3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv3)
    conv5 = Conv2D(64, (7,7), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv4)
    conv6 = Conv2D(64, (5,5), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv5)
    conv7 = Conv2D(32, (5,5), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv6)
    conv8 = Conv2D(32, (3,3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv7)
    conv9 = Conv2D(16, (3,3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv8)
    decoded = Conv2D(1, (5,5), kernel_initializer=weights,
                     padding='same', activation='sigmoid', use_bias=True)(conv8)
    return input_img, decoded

def compiler(self):
    self.model.compile(optimizer='RMSprop', loss='mse')
    self.model.summary()
I assume my model is silly in a lot of ways and that there are multiple things to improve (dropout, other filter sizes and numbers, optimizers, etc.), and all suggestions are gladly received, but the actual question still remains: why does this model consume so much memory? Is it due to the extremely high depth of conv1?
Model summary:
Using TensorFlow backend.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 1751, 480, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 1751, 480, 1024) 10240
_________________________________________________________________
conv2d_2 (Conv2D) (None, 1751, 480, 64) 589888
_________________________________________________________________
conv2d_3 (Conv2D) (None, 1751, 480, 64) 36928
_________________________________________________________________
conv2d_4 (Conv2D) (None, 1751, 480, 64) 36928
_________________________________________________________________
conv2d_5 (Conv2D) (None, 1751, 480, 64) 200768
_________________________________________________________________
conv2d_6 (Conv2D) (None, 1751, 480, 64) 102464
_________________________________________________________________
conv2d_7 (Conv2D) (None, 1751, 480, 32) 51232
_________________________________________________________________
conv2d_8 (Conv2D) (None, 1751, 480, 32) 9248
_________________________________________________________________
conv2d_10 (Conv2D) (None, 1751, 480, 1) 801
=================================================================
Total params: 1,038,497
Trainable params: 1,038,497
Non-trainable params: 0
_________________________________________________________________
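As a cross-check on the Param # column, here is a minimal sketch of the per-layer count for a Conv2D layer with bias: kernel_height x kernel_width x input_channels x output_filters + output_filters. The helper name conv2d_params and the layer list are illustrative; the list is transcribed by hand from the summary above and is not produced by Keras.

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w, use_bias=True):
    """Trainable parameters of one 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    biases = out_channels if use_bias else 0
    return weights + biases

# (input channels, output filters, kernel height, kernel width),
# one entry per Conv2D row in the summary above.
layers = [
    (1, 1024, 3, 3),
    (1024, 64, 3, 3),
    (64, 64, 3, 3),
    (64, 64, 3, 3),
    (64, 64, 7, 7),
    (64, 64, 5, 5),
    (64, 32, 5, 5),
    (32, 32, 3, 3),
    (32, 1, 5, 5),   # final layer reads 32 channels
]

per_layer = [conv2d_params(*layer) for layer in layers]
total = sum(per_layer)
param_bytes = total * 4  # assuming float32 weights

print(per_layer)
print(total, param_bytes)
```

The total matches the 1,038,497 reported by Keras. The 400-parameter gap to the hand count of 1,038,097 is consistent with the final layer alone: 801 parameters with 32 input channels versus 401 with 16.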
You are correct, this is due to the number of filters in conv1. What you must compute is the memory required to store the activations.
As shown by your model.summary(), the output shape of this layer is (None, 1751, 480, 1024). For a single image, that is 1751*480*1024 values. Assuming the activations are stored as float32, each value takes 4 bytes, so the output of this layer requires 1751*480*1024*4 bytes, which is about 3.4 GB (3.2 GiB) per image for this layer alone.
If you were to change the number of filters to, say, 64, you would need only around 215 MB (205 MiB) per image for this layer.
Either reduce the number of filters or reduce the batch size, down to 1 if necessary.
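The arithmetic can be sketched in a few lines of Python. This is only a rough lower bound under stated assumptions: float32 activations, one stored output tensor per layer, and no allowance for gradients, optimizer state, or framework workspace, all of which add to the real footprint during training. The helper name activation_bytes is illustrative.

```python
BYTES_PER_FLOAT32 = 4

def activation_bytes(height, width, channels, batch_size=1,
                     bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed to hold one layer's output tensor."""
    return batch_size * height * width * channels * bytes_per_value

# Output of conv1 with 1024 filters, and with a hypothetical 64 filters.
conv1_bytes = activation_bytes(1751, 480, 1024)
conv1_64_bytes = activation_bytes(1751, 480, 64)

# Output channels of every Conv2D row in the summary, for one image.
channels_per_layer = [1024, 64, 64, 64, 64, 64, 32, 32, 1]
forward_total = sum(activation_bytes(1751, 480, c) for c in channels_per_layer)

print(f"conv1, 1024 filters: {conv1_bytes / 2**30:.2f} GiB")
print(f"conv1, 64 filters:   {conv1_64_bytes / 2**20:.0f} MiB")
print(f"all layer outputs:   {forward_total / 2**30:.2f} GiB per image")
```

Multiplying by the batch size shows why even a handful of images per batch can exceed 11 GB with this architecture.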
Related
I am trying to tune my CNN model with Keras Tuner. After running the Bayesian search, model.summary() shows a model that is completely different from the hyperparameters found by the Bayesian optimization.
I define my function model_builder2 as follows:
def model_builder2(hp):
    model = Sequential()
    #model.add(Input(shape=(50,50,3)))
    for i in range(hp.Int('num_blocks', 1,5)):
        hp_padding=hp.Choice('padding_'+ str(i), values=['valid', 'same'])
        hp_filters=hp.Choice('filters_'+ str(i), values=[32, 64])
        model.add(Conv2D(hp_filters, (3, 3), padding=hp_padding, activation='relu', kernel_initializer='he_uniform', input_shape=(50, 50, 3)))
        model.add(MaxPooling2D((2, 2)))
        model.add(Dropout(hp.Choice('dropout_'+ str(i), values=[0.2, 0.5])))
    model.add(Flatten())
    hp_units = hp.Int('units', min_value=25, max_value=512, step=25)
    model.add(Dense(hp_units, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(2,activation="sigmoid"))
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-5])
    #hp_optimizer=hp.Choice('Optimizer', values=['Adam', 'SGD'])
    #hp_optimizer=hp.Choice('Optimizer', values=['Adam'])
    #model.compile(loss=keras.losses.binary_crossentropy, optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate), metrics=['accuracy'])
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=hp_learning_rate),loss=keras.losses.binary_crossentropy,metrics=['accuracy'])
    return model
After running the algorithm, I get the following best hyperparameters:
Best Hyper-parameters
{'dropout_0': 0.5,
'filters_0': 32,
'learning_rate': 0.01,
'num_blocks': 5,
'padding_0': 'same',
'units': 250}
But, when I do model.summary(), I have:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 50, 50, 32) 896
max_pooling2d_1 (MaxPooling (None, 25, 25, 32) 0
2D)
dropout_1 (Dropout) (None, 25, 25, 32) 0
flatten_1 (Flatten) (None, 20000) 0
dense_2 (Dense) (None, 250) 5000250
dense_3 (Dense) (None, 2) 502
=================================================================
Total params: 5,001,648
Trainable params: 5,001,648
Non-trainable params: 0
It seems I don't have 5 blocks. Any idea why?
I am trying to create a CNN with TensorFlow. My images are 64x64x1, and I have an array of 3662 images that I am using for training. I have 5 labels in total, which I have one-hot encoded. I get this error every time:
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]
[[{{node loss_2/dense_5_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
My neural network structure is this:
def cnn_model():
    model = models.Sequential()
    # model.add(layers.Dense(128, activation='relu', ))
    model.add(layers.Conv2D(128, (3, 3), activation='relu',input_shape=(64, 64, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu',padding = 'same'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(5, activation='softmax'))
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    print(model.summary())
    return model
My model summary is this:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_9 (Conv2D) (None, 62, 62, 128) 1280
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 31, 31, 128) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 31, 31, 64) 73792
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 15, 15, 64) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 15, 15, 64) 36928
_________________________________________________________________
dense_4 (Dense) (None, 15, 15, 64) 4160
_________________________________________________________________
flatten_2 (Flatten) (None, 14400) 0
_________________________________________________________________
dense_5 (Dense) (None, 5) 72005
=================================================================
Total params: 188,165
Trainable params: 188,165
Non-trainable params: 0
My output array has shape (3662,5,1). I have seen other answers to the same question, but I can't figure out the problem with mine. Where am I going wrong?
Edit: My labels are stored in one-hot encoded form using this:
df = pd.get_dummies(df)
diag = np.array(df)
diag = np.reshape(diag,(3662,5,1))
I have tried passing the labels both as a NumPy array and after converting them to a tensor (same for the input, as per the documentation).
The problem lies in the choice of the loss function tf.keras.losses.SparseCategoricalCrossentropy(). Given what you are trying to achieve, you should use tf.keras.losses.CategoricalCrossentropy(). Namely, the documentation of tf.keras.losses.SparseCategoricalCrossentropy() states:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
On the other hand, the documentation of tf.keras.losses.CategoricalCrossentropy() states:
We expect labels to be provided in a one_hot representation.
And because your labels are encoded as one-hot, you should use tf.keras.losses.CategoricalCrossentropy().
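The two label layouts can be illustrated with NumPy alone. This is a small sketch with made-up data (6 samples instead of 3662; the variable names are illustrative), showing that a one-hot array stored as (N, 5, 1) flattens to N*5 values, which matches the 18310 = 3662*5 in the error message, and how to get either layout from it.

```python
import numpy as np

num_samples, num_classes = 6, 5  # stand-ins for 3662 samples, 5 labels

rng = np.random.default_rng(0)
class_ids = rng.integers(0, num_classes, size=num_samples)

# One-hot labels stored with a trailing axis, as in the question: (N, 5, 1).
labels_3d = np.eye(num_classes)[class_ids].reshape(num_samples, num_classes, 1)

# Layout expected by a one-hot loss such as CategoricalCrossentropy: (N, 5).
labels_2d = labels_3d.reshape(num_samples, num_classes)

# Layout expected by a sparse loss such as SparseCategoricalCrossentropy: (N,).
sparse_labels = labels_2d.argmax(axis=1)

print(labels_3d.size)       # N * 5 values when flattened
print(labels_2d.shape)
print(sparse_labels.shape)
```

Either pairing should be consistent: (N, 5) one-hot labels with the one-hot loss, or (N,) integer labels with the sparse loss.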
I am trying to define a model that takes a sequence of images and predicts a sequence in turn. My problem is with RepeatVector: first, is this the right use of it, and second, how do I resolve the exception that I get?
input_frames=Input(shape=(None, 128, 64, 1))
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=True)(input_frames)
x=BatchNormalization()(x)
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=False)(x)
x=BatchNormalization()(x)
x=Conv2D(filters=1, kernel_size=(5, 5), activation='sigmoid',padding='same')(x)
x=RepeatVector(10)(x)
model=Model(inputs=input_frames,outputs=x)
Specifically, I am trying to forecast 10 frames into the future. The above code throws the following exception:
assert_input_compatibility
str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer repeat_vector_5: expected ndim=2, found ndim=4
According to the documentation, RepeatVector only accepts 2D inputs, which is what the error message is telling you.
The following is a workaround using a Lambda layer:
from keras.layers import Input, ConvLSTM2D, BatchNormalization, RepeatVector, Conv2D
from keras.models import Model
from keras.backend import expand_dims, repeat_elements
from keras.layers import Lambda
input_frames=Input(shape=(None, 128, 64, 1))
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=True)(input_frames)
x=BatchNormalization()(x)
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=False)(x)
x=BatchNormalization()(x)
x=Conv2D(filters=1, kernel_size=(5, 5), activation='sigmoid',padding='same')(x)
#x=RepeatVector(10)(x)
x=Lambda(lambda x: repeat_elements(expand_dims(x, axis=1), 10, 1))(x)
model=Model(inputs=input_frames,outputs=x)
model.summary()
"""
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_15 (InputLayer) (None, None, 128, 64, 1) 0
_________________________________________________________________
conv_lst_m2d_5 (ConvLSTM2D) (None, None, 128, 64, 64) 416256
_________________________________________________________________
batch_normalization_5 (Batch (None, None, 128, 64, 64) 256
_________________________________________________________________
conv_lst_m2d_6 (ConvLSTM2D) (None, 128, 64, 64) 819456
_________________________________________________________________
batch_normalization_6 (Batch (None, 128, 64, 64) 256
_________________________________________________________________
conv2d_3 (Conv2D) (None, 128, 64, 1) 1601
_________________________________________________________________
lambda_1 (Lambda) (None, 10, 128, 64, 1) 0
=================================================================
Total params: 1,237,825
Trainable params: 1,237,569
Non-trainable params: 256
_________________________________________________________________
"""
Note that I use expand_dims here because, according to the documentation of repeat_elements, it does not create a new axis.
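The shape logic of the Lambda layer can be checked outside Keras. The sketch below uses NumPy's expand_dims and repeat as stand-ins for the backend functions (repeat_elements is documented as behaving like np.repeat); the batch size of 2 is an arbitrary choice for illustration.

```python
import numpy as np

# A batch of 2 predicted frames, shaped like the Conv2D output: (batch, 128, 64, 1).
rng = np.random.default_rng(0)
frame = rng.random((2, 128, 64, 1)).astype(np.float32)

# Step 1: insert a time axis of length 1 -> (batch, 1, 128, 64, 1).
with_time_axis = np.expand_dims(frame, axis=1)

# Step 2: repeat along that axis 10 times -> (batch, 10, 128, 64, 1).
repeated = np.repeat(with_time_axis, 10, axis=1)

print(with_time_axis.shape)
print(repeated.shape)
```

Every one of the 10 time steps is a copy of the same frame, so this produces a constant forecast repeated 10 times, not 10 distinct predictions.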
I have built a Keras ConvLSTM neural network, and I want to predict one frame ahead based on a sequence of 10 time steps:
model = Sequential()
model.add(ConvLSTM2D(filters=128, kernel_size=(3, 3),
input_shape=(None, img_size, img_size, Channels),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=64, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=64, kernel_size=(3, 3),
padding='same', return_sequences=False))
model.add(BatchNormalization())
model.add(Conv2D(filters=1, kernel_size=(3, 3),
activation='sigmoid',
padding='same', data_format='channels_last', name='conv2d'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')
Training:
data_train_x:(10, 10, 62, 62, 12)
data_train_y:(10, 1, 62, 62, 1)
model.fit(data_train_x, data_train_y, batch_size=10, epochs=1,
validation_split=0.05)
But I get the following error:
ValueError: Error when checking target: expected conv2d to have 4 dimensions, but got array with shape (10, 1, 62, 62, 1)
And this is the result of model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d_4 (ConvLSTM2D) (None, None, 62, 62, 128) 645632
_________________________________________________________________
batch_normalization_3 (Batch (None, None, 62, 62, 128) 512
_________________________________________________________________
conv_lst_m2d_5 (ConvLSTM2D) (None, None, 62, 62, 64) 442624
_________________________________________________________________
batch_normalization_4 (Batch (None, None, 62, 62, 64) 256
_________________________________________________________________
conv_lst_m2d_6 (ConvLSTM2D) (None, 62, 62, 64) 295168
_________________________________________________________________
batch_normalization_5 (Batch (None, 62, 62, 64) 256
_________________________________________________________________
conv2d (Conv2D) (None, 62, 62, 1) 577
=================================================================
Total params: 1,385,025
Trainable params: 1,384,513
Non-trainable params: 512
_________________________________________________________________
This model is a revised version of another model that compiled without error; the only change from the previous model is the last two layers. Previously they were:
model.add(ConvLSTM2D(filters=64, kernel_size=(3, 3),
padding='same', return_sequences=True))
model.add(BatchNormalization())
model.add(Conv3D(filters=1, kernel_size=(3, 3, 3),
activation='sigmoid',
padding='same', data_format='channels_last', name='conv3d'))
I made this change because I want to get a 4-dimensional output of the form (samples, output_row, output_col, filters).
The error message is clear. The model expects targets of rank four, but you are passing a target of rank five. Squeeze the second dimension of data_train_y before feeding it to the model.
data_train_y = tf.squeeze(data_train_y, axis=1)
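If the targets are held as a NumPy array, the same operation is available as np.squeeze. A minimal sketch with a zero-filled array of the shape given in the question:

```python
import numpy as np

# Targets shaped as in the question: (samples, 1, rows, cols, channels).
data_train_y = np.zeros((10, 1, 62, 62, 1), dtype=np.float32)

# Remove only the time axis; the trailing channel axis must stay.
squeezed = np.squeeze(data_train_y, axis=1)

# Without axis=..., every length-1 axis is dropped, including channels.
squeezed_all = np.squeeze(data_train_y)

print(squeezed.shape)
print(squeezed_all.shape)
```

Passing axis=1 explicitly matters here because the channel axis also has length 1 and would otherwise be removed too, leaving a rank-3 array.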
I am following a Keras tutorial and want to mirror it in PyTorch, so I am translating it. I'm not very familiar with either and am coming unstuck on the input size parameter especially, but also on the final layer: do I need another Linear layer? Can anyone translate the following into a PyTorch Sequential definition?
visible = Input(shape=(64,64,1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
hidden1 = Dense(10, activation='relu')(pool2)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
This is the output of the model:
Layer (type) Output Shape Param #
_________________________________________________________________
input_1 (InputLayer) (None, 64, 64, 1) 0
conv2d_1 (Conv2D) (None, 61, 61, 32) 544
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 32) 0
conv2d_2 (Conv2D) (None, 27, 27, 16) 8208
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16) 0
dense_1 (Dense) (None, 13, 13, 10) 170
dense_2 (Dense) (None, 13, 13, 1) 11
Total params: 8,933
Trainable params: 8,933
Non-trainable params: 0
What I have worked out lacks a specification for the shape of the input. I am also a bit perplexed about how to translate the stride in the Keras model, as it uses stride 2 in MaxPooling2D but doesn't specify a stride elsewhere (it is perhaps a toy example).
model = nn.Sequential(
nn.Conv2d(1, 32, 4),
nn.ReLU(),
nn.MaxPool2d(2, 2),
nn.Conv2d(1, 16, 4),
nn.ReLU(),
nn.MaxPool2d(2, 2),
nn.Linear(10, 1),
nn.Sigmoid(),
)
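The output shapes and parameter counts in the Keras summary can be reproduced with plain arithmetic, which helps when choosing channel and feature sizes for a translation. This is a sketch using the usual formula for unpadded convolution and pooling, floor((size - kernel) / stride) + 1; Keras's MaxPooling2D defaults its stride to the pool size, which is where the stride of 2 comes from. The helper name out_size is illustrative.

```python
def out_size(size, kernel, stride=1):
    """Spatial output size of an unpadded ('valid') conv or pooling layer."""
    return (size - kernel) // stride + 1

size = 64
conv1 = out_size(size, kernel=4)             # Conv2D(32, kernel_size=4)
pool1 = out_size(conv1, kernel=2, stride=2)  # MaxPooling2D(pool_size=(2, 2))
conv2 = out_size(pool1, kernel=4)            # Conv2D(16, kernel_size=4)
pool2 = out_size(conv2, kernel=2, stride=2)  # MaxPooling2D(pool_size=(2, 2))

# Parameter counts: kernel_h * kernel_w * in_channels * out_channels + biases.
conv1_params = 4 * 4 * 1 * 32 + 32
conv2_params = 4 * 4 * 32 * 16 + 16
dense1_params = 16 * 10 + 10   # Dense acts on the last axis (16 channels)
dense2_params = 10 * 1 + 1

print(conv1, pool1, conv2, pool2)
print(conv1_params, conv2_params, dense1_params, dense2_params)
```

The 8,208 parameters of conv2d_2 correspond to 32 input channels, and dense_1's 170 parameters correspond to 16 input features, since Keras applies Dense along the last (channel) axis of the 13x13x16 tensor.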