How is the number of parameters associated with a BatchNormalization layer 2048?

I have the following code.
x = keras.layers.Input(batch_shape = (None, 4096))
hidden = keras.layers.Dense(512, activation = 'relu')(x)
hidden = keras.layers.BatchNormalization()(hidden)
hidden = keras.layers.Dropout(0.5)(hidden)
predictions = keras.layers.Dense(80, activation = 'sigmoid')(hidden)
mlp_model = keras.models.Model(input = [x], output = [predictions])
mlp_model.summary()
And this is the model summary:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_3 (InputLayer) (None, 4096) 0
____________________________________________________________________________________________________
dense_1 (Dense) (None, 512) 2097664 input_3[0][0]
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 512) 2048 dense_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 512) 0 batchnormalization_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 80) 41040 dropout_1[0][0]
====================================================================================================
Total params: 2,140,752
Trainable params: 2,139,728
Non-trainable params: 1,024
____________________________________________________________________________________________________
The size of the input to the BatchNormalization (BN) layer is 512. According to the Keras documentation, the output shape of a BN layer is the same as its input shape, which is 512.
So how can the number of parameters associated with the BN layer be 2048?

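These 2048 parameters are in fact [gamma weights, beta weights, moving_mean (non-trainable), moving_variance (non-trainable)], each having 512 elements (the number of features entering the BN layer, not the 4096 of the input layer). That is also why the summary reports 1,024 non-trainable parameters.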

The batch normalization layer in Keras implements this paper.
As you can read there, for batch normalization to work during training, the layer needs to keep track of the distribution of each normalized dimension. To do so, since you are in mode=0 by default, it stores 4 parameters per feature of the previous layer's output. Those parameters make sure that information is properly propagated and backpropagated.
So 4 * 512 = 2048, which should answer your question.
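The arithmetic can be checked by hand against the summary in the question. The sketch below uses plain Python rather than Keras; the variable names are illustrative only, and the Dense formula (weights plus biases) is the standard one for a fully connected layer.

```python
# Parameter counts for the layers shown in the model summary.
features = 512  # width of the tensor entering BatchNormalization

# BatchNormalization keeps four vectors with one value per feature each.
bn_trainable = 2 * features      # gamma and beta
bn_non_trainable = 2 * features  # moving_mean and moving_variance
bn_total = bn_trainable + bn_non_trainable

# Dense layers: one weight per (input, output) pair plus one bias per output.
dense_1 = 4096 * 512 + 512
dense_2 = 512 * 80 + 80

total = dense_1 + bn_total + dense_2
print(bn_total)                  # 2048
print(bn_non_trainable)          # 1024
print(total)                     # 2140752
print(total - bn_non_trainable)  # 2139728
```

The last three numbers match the "Non-trainable params", "Total params" and "Trainable params" lines of the summary.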

Related

Keras-rl ValueError: "Model has more than one output. DQN expects a model that has a single output"

Is there any way to get around this error? I have a model with a 15x15 input grid, which leads to two outputs. Each output has 15 possible values, which are x or y coordinates. I did this because it is significantly simpler than having 225 separate outputs for every location on the grid.
The problem is that when I try to train the model using this code:
import numpy as np
import matplotlib.pyplot as plt
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy
from tensorflow.keras.optimizers import Adam

def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=100000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy, nb_actions=actions, nb_steps_warmup=100, target_model_update=1e-2)
    return dqn

dqn = build_agent(model, np.array([15, 15]))
dqn.compile(Adam(learning_rate=0.01), metrics=['mae'])
dqn.fit(env, nb_steps=10000, action_repetition=1, visualize=False, verbose=1, nb_max_episode_steps=10000)
plt.show()
I get the error: "Model has more than one output. DQN expects a model that has a single output".
The model summary is below so you can see there are 2 output layers.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 1, 15, 15)] 0 []
conv2d_2 (Conv2D) (None, 12, 13, 13) 120 ['input_2[0][0]']
conv2d_3 (Conv2D) (None, 10, 11, 3) 354 ['conv2d_2[0][0]']
flatten_1 (Flatten) (None, 330) 0 ['conv2d_3[0][0]']
dropout_1 (Dropout) (None, 330) 0 ['flatten_1[0][0]']
dense_2 (Dense) (None, 15) 4965 ['dropout_1[0][0]']
dense_3 (Dense) (None, 15) 4965 ['dropout_1[0][0]']
==================================================================================================
Total params: 10,404
Trainable params: 10,404
Non-trainable params: 0
__________________________________________________________________________________________________
Standard Keras allows a model with multiple outputs using the functional API, but from the error message I assume that this feature is just not supported by Keras-rl. If that's true, is there any way to get around this issue?
The solution was to use a single output of size 225. This didn't work great, but it was the best I could find. Two different outputs will not work with keras-rl, so this was all I could think of. Another possibility would be to use a different library such as stable baselines2, but that would mean rewriting the code that was already built.
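With a single 225-way output, each action index still has to be translated back into grid coordinates. A minimal sketch of one possible mapping, assuming row-major order on the 15x15 grid; the helper names `encode_action` and `decode_action` are hypothetical and not part of keras-rl:

```python
GRID_SIZE = 15  # the grid in the question is 15x15


def encode_action(x, y, size=GRID_SIZE):
    """Map grid coordinates to a single action index in [0, size * size)."""
    return y * size + x


def decode_action(action, size=GRID_SIZE):
    """Map a flat action index back to (x, y) grid coordinates."""
    y, x = divmod(action, size)
    return x, y


print(decode_action(0))      # (0, 0)
print(decode_action(224))    # (14, 14)
print(encode_action(3, 2))   # 33
```

The environment's step function would then call something like `decode_action` on the integer the agent picks, so the network itself only ever sees one output layer.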

model.summary() and plot_model() showing nothing from the built model in tensorflow.keras

I am testing something that involves building an FCNN dynamically. The idea is to build the layers and their neuron counts from a given list. The dummy code is:
neurons = [10,20,30] # First Dense has 10 neurons, 2nd has 20 and third has 30
inputs = keras.Input(shape=(1024,))
x = Dense(10,activation='relu')(inputs)
for n in neurons:
    x = Dense(n,activation='relu')(x)
out = Dense(1,activation='sigmoid')(x)
model = Model(inputs,out)
model.summary()
keras.utils.plot_model(model,'model.png')
for layer in model.layers:
    print(layer.name)
To my surprise, it shows nothing. I even recompiled and ran the functions again, and nothing came out.
model.summary() always shows the number of trainable and non-trainable params, but not the model structure or the layer names. Why is this happening? Or is this normal?
About model.summary(): don't mix TF 2.x and standalone Keras in the same program. If I run your model in TF 2.x, I get the expected results.
from tensorflow.keras.layers import *
from tensorflow.keras import Model
from tensorflow import keras
# your code ...
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 1024)] 0
_________________________________________________________________
dense (Dense) (None, 10) 10250
_________________________________________________________________
dense_1 (Dense) (None, 10) 110
_________________________________________________________________
dense_2 (Dense) (None, 20) 220
_________________________________________________________________
dense_3 (Dense) (None, 30) 630
_________________________________________________________________
dense_4 (Dense) (None, 1) 31
=================================================================
Total params: 11,241
Trainable params: 11,241
Non-trainable params: 0
_________________________________________________________________
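The parameter counts in this summary can be reproduced without Keras, which is a quick way to confirm that the loop built the layers you expected. A small sketch in plain Python; `dense_params` is an illustrative helper, not a Keras function:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Input width, the initial Dense(10), the three Dense layers from the
# `neurons` list, and the final Dense(1).
widths = [1024, 10, 10, 20, 30, 1]
counts = [dense_params(a, b) for a, b in zip(widths, widths[1:])]

print(counts)       # [10250, 110, 220, 630, 31]
print(sum(counts))  # 11241
```

Note that the summary contains two 10-unit layers, because the code creates one Dense(10) before the loop and the list starts with 10 as well.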
About plotting the model: there are a couple of options that can be used when you plot your Keras model. Here is one example:
keras.utils.plot_model(model, show_dtype=True,
show_layer_names=True, show_shapes=True,
to_file='model.png')

How to train a siamese neural network for image matching?

I need to identify whether two fingerprints (one from an ID card and one from a sensor) match or not. Below are some examples from my database (3000 pairs of images):
Example of matching images
Example of non-matching images
I am trying to train a siamese network that receives a pair of images and outputs [1, 0] if they don't match and [0, 1] if they match. I created my model with Keras:
image_left = Input(shape=(200, 200, 1))
image_right = Input(shape=(200, 200, 1))
vector_left = conv_base(image_left)
vector_right = conv_base(image_right)
merged_features = concatenate([vector_left, vector_right], axis=-1)
fc1 = Dense(64, activation='relu')(merged_features)
fc1 = Dropout(0.2)(fc1)
# # fc2 = Dense(128, activation='relu')(fc1)
pred = Dense(2, activation='softmax')(fc1)
model = Model(inputs=[image_left, image_right], outputs=pred)
Here conv_base is a convolutional architecture. I have tried ResNet, LeNet, MobileNetV2 and NASNet from keras.applications, but none of them work.
conv_base = NASNetMobile(weights = None,
include_top=True,
classes=256)
My model summary is similar to the one shown below (depending on the network used):
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) (None, 200, 200, 1) 0
__________________________________________________________________________________________________
input_3 (InputLayer) (None, 200, 200, 1) 0
__________________________________________________________________________________________________
NASNet (Model) (None, 256) 4539732 input_2[0][0]
input_3[0][0]
__________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 512) 0 NASNet[1][0]
NASNet[2][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 64) 32832 concatenate_5[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 64) 0 dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 2) 130 dropout_1[0][0]
==================================================================================================
Total params: 4,572,694
Trainable params: 4,535,956
Non-trainable params: 36,738
In addition to changing the convolutional architecture, I've tried using pre-trained weights, setting all layers as trainable, setting only the last convolutional layers as trainable, data augmentation, the categorical_crossentropy and contrastive_loss loss functions, and different learning rates, but they all behave the same way: training and validation accuracy are always 0.5.
Does anybody have an idea about what I am missing/doing wrong?
Thank you.
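For reference, the contrastive loss mentioned in the question is usually applied to the distance between the two embedding vectors rather than to a softmax output. The sketch below is a plain-Python, single-pair version under stated assumptions: label 1 means "matching", label 0 means "non-matching", and the margin of 1.0 is an arbitrary illustrative value. Some formulations use the opposite label convention, so this is not necessarily the version used in the question.

```python
def contrastive_loss(distance, is_match, margin=1.0):
    """Contrastive loss for a single pair of embeddings.

    distance: non-negative distance between the two embedding vectors.
    is_match: 1 if the pair matches, 0 otherwise (assumed convention).
    margin:   non-matching pairs further apart than this cost nothing.
    """
    pull_together = is_match * distance ** 2
    push_apart = (1 - is_match) * max(margin - distance, 0.0) ** 2
    return pull_together + push_apart


print(contrastive_loss(0.0, 1))  # matching pair at distance 0 costs nothing
print(contrastive_loss(2.0, 0))  # non-matching pair beyond the margin costs nothing
print(contrastive_loss(0.5, 0))  # non-matching pair inside the margin is penalized
```

With this loss the network would output a distance (one number per pair), so accuracy has to be computed by thresholding that distance instead of taking an argmax over two classes.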

Convert keras model in DL4J model

I have to save and load a Keras model in Java, so I thought I could use DL4J. The problem is that when I save my model, it does not include the Embedding layer with its own weights.
I have the same problem when reloading the model in Keras, but in that case I can recreate the same architecture and load only the weights of my model.
In particular, I start from an architecture like this:
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 300, 300) 219184200
_________________________________________________________________
lstm_1 (LSTM) (None, 300, 256) 570368
_________________________________________________________________
dropout_1 (Dropout) (None, 300, 256) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 128) 197120
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 2) 258
=================================================================
And after save and load I get this (both in keras and in DL4J):
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, None, 300) 219184200
_________________________________________________________________
lstm_1 (LSTM) (None, None, 256) 570368
_________________________________________________________________
dropout_1 (Dropout) (None, None, 256) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 128) 197120
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 2) 258
=================================================================
Is there a solution or a workaround to get this working in Java?
1) Is it possible to save and load the structure and the weights correctly in Keras?
2) Is it possible to create a model of this type in Java with DL4J or another library?
3) Is it possible to implement the word-to-embedding conversion in a function and then feed the neural network with input that has already been converted to embeddings?
4) Can I load the weights into the embedding layer in Java with DL4J?
This is the code for my network:
sentence_indices = Input(shape=input_shape, dtype=np.int32)
emb_dim = 300 # 300-dimensional embeddings of Italian words
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim)
embeddings = embedding_layer(sentence_indices)
X = LSTM(256, return_sequences=True)(embeddings)
X = Dropout(0.15)(X)
X = LSTM(128)(X)
X = Dropout(0.15)(X)
X = Dense(num_activation, activation='softmax')(X)
model = Model(sentence_indices, X)
sequentialModel = Sequential(model.layers)
Thanks in advance.
I found out that the difference between the Keras neural network and the DL4J neural network was due to different parsing of the word2Vec (or GloVe) file.
In particular, I load word2Vec and then parse it to create 3 dictionaries:
- word2Index
- index2Word
- word2EmbeddingVec
from gensim.models import Word2Vec
modelW2V = Word2Vec.load('C:/Users/Alessio/Desktop/emoji_ita/embedding/glove_WIKI') # glove model
I discovered that two separate parsing runs (using the same code) produce different "index - word" and "word - index" pairings. Saving the dictionaries to a JSON file and then loading the data from that file solved it for me.
Hope this can help others too.
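A minimal sketch of that approach, using only the Python standard library. The function names, the toy word list and the file name are illustrative assumptions; the actual vocabulary would come from the word2Vec or GloVe model. Only word2index is written to disk, because JSON turns integer keys into strings, and index2word is rebuilt from it after loading.

```python
import json
import os
import tempfile


def build_dictionaries(words):
    """Build word/index mappings in a deterministic (sorted) order."""
    vocab = sorted(set(words))
    word2index = {word: i for i, word in enumerate(vocab)}
    index2word = {i: word for word, i in word2index.items()}
    return word2index, index2word


def save_word2index(word2index, path):
    """Write the mapping once, so every consumer reads the same pairing."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word2index, f, ensure_ascii=False)


def load_dictionaries(path):
    """Load word2index and rebuild index2word with integer keys."""
    with open(path, encoding="utf-8") as f:
        word2index = json.load(f)
    index2word = {i: word for word, i in word2index.items()}
    return word2index, index2word


words = ["ciao", "mondo", "ciao", "gatto"]
word2index, index2word = build_dictionaries(words)

path = os.path.join(tempfile.mkdtemp(), "word2index.json")
save_word2index(word2index, path)
loaded_w2i, loaded_i2w = load_dictionaries(path)

print(loaded_w2i == word2index)  # True
print(loaded_i2w == index2word)  # True
```

The same JSON file can then be read on the Java side, so the Keras and DL4J models agree on which index belongs to which word.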
You can probably get this answered on the DL4J Gitter chat: https://gitter.im/deeplearning4j/deeplearning4j

Keras replacing input of network

I have a problem similar to "Keras replacing input layer"; however, I also need to remove the next layer, and that requires a different input shape.
Here is a simplification of what I'm trying to do:
a = Input(shape=(64,))
b = Dense(32)(a)
c = Dense(16)(b)
d = Dense(8)(c)
model = Model(inputs=a, outputs=d)
print(model.summary())
print('input shape = ' + str(model.input_shape))
model.layers.pop(0)
model.layers.pop(0)
print(model.summary())
print('input shape = ' + str(model.input_shape))
new_input = Input(shape=(32,))
new_output = model(new_input)
new_model = Model(new_input, new_output)
print(new_model.summary())
But the input shape of the model remains the same:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, 32) 2080
_________________________________________________________________
dense_2 (Dense) (None, 16) 528
_________________________________________________________________
dense_3 (Dense) (None, 8) 136
=================================================================
Total params: 2,744
Trainable params: 2,744
Non-trainable params: 0
_________________________________________________________________
None
input shape = (None, 64)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 16) 528
_________________________________________________________________
dense_3 (Dense) (None, 8) 136
=================================================================
Total params: 664
Trainable params: 664
Non-trainable params: 0
_________________________________________________________________
None
input shape = (None, 64)
And that prevents me from creating new model, so the code above fails with:
ValueError: Dimensions must be equal, but are 32 and 64 for 'model_1/dense_1/MatMul' (op: 'MatMul') with input shapes: [?,32], [64,32].
Any ideas how to do that?
It might not be possible to do this in the way that you describe. The accepted answer on this post explains it a little:
how-to-change-input-shape-in-sequential-model-in-keras
Their solution was to rebuild the layer with the correct input shape and then load the pre-trained weights for that specific layer.
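Applied to the simplified model from the question, that idea could look roughly like the sketch below. It assumes TensorFlow 2.x with tensorflow.keras and is not taken from the linked answer: each remaining layer is recreated from its config, connected to a new 32-feature input, and given the weights of the original layer. Popping from model.layers is avoided entirely.

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

# Original model from the question: 64 -> 32 -> 16 -> 8.
a = Input(shape=(64,))
b = Dense(32)(a)
c = Dense(16)(b)
d = Dense(8)(c)
model = Model(inputs=a, outputs=d)

# New model that skips the input layer and the first Dense layer.
new_input = Input(shape=(32,))
x = new_input
for old_layer in model.layers[2:]:  # layers[0] is the InputLayer, layers[1] is Dense(32)
    new_layer = type(old_layer).from_config(old_layer.get_config())
    x = new_layer(x)  # calling the layer builds it for the new input shape
    new_layer.set_weights(old_layer.get_weights())  # copy the trained weights

new_model = Model(new_input, x)
print(new_model.input_shape)   # (None, 32)
print(new_model.output_shape)  # (None, 8)
```

Copying works here because the first kept layer originally received 32 features as well, so its kernel already has shape (32, 16). If the new input had a different width, that layer's weights could not be reused as they are.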
