What does this custom metric function in Keras mean?

I was looking at this CapsNet code on GitHub,
and I can't work out what line 116 means:
metrics={'capsnet': 'accuracy'})
Can someone please explain this line? I can't find any reference to it in the Keras documentation.
Thanks in advance!

Document
From Keras model functional API: https://keras.io/models/model/
See Method > compile > metrics
metrics: List of metrics to be evaluated by the model
during training and testing.
Typically you will use metrics=['accuracy'].
To specify different metrics for different outputs of a
multi-output model, you could also pass a dictionary,
such as metrics={'output_a': 'accuracy'}.
(source: https://github.com/keras-team/keras/blob/master/keras/models.py#L786-L791)
What Does it Do?
The line tells Keras to evaluate the accuracy metric on the output named capsnet, which is a layer defined in the same file (it appears in the summary excerpt below). Everything else behaves as described in the documentation quoted above.
.... (The above omitted)
____________________________________________________________________________________________________
mask_1 (Mask)         (None, 160)         0         digitcaps[0][0]
                                                    input_2[0][0]
____________________________________________________________________________________________________
capsnet (Length)      (None, 10)          0         digitcaps[0][0]
____________________________________________________________________________________________________
decoder (Sequential)  (None, 28, 28, 1)   1411344   mask_1[0][0]
====================================================================================================
Total params: 8,215,568
Trainable params: 8,215,568
Non-trainable params: 0
____________________________________________________________________________________________________
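To make the list-versus-dictionary distinction concrete, here is a minimal pure-Python sketch of how a metrics argument could be spread across named outputs. It is an imitation for illustration, not the actual Keras code: the helper name metrics_per_output is hypothetical, and the output names 'capsnet' and 'decoder' are assumed from the summary above.

```python
def metrics_per_output(metrics, output_names):
    """Return {output_name: [metric, ...]} for a `metrics` argument.

    Illustrative only: a simplified imitation of how a list or dict
    passed to compile(metrics=...) is spread across named outputs.
    """
    if not metrics:
        return {name: [] for name in output_names}
    if isinstance(metrics, dict):
        result = {}
        for name in output_names:
            chosen = metrics.get(name, [])
            # A single metric such as 'accuracy' is wrapped in a list.
            if isinstance(chosen, (list, tuple)):
                result[name] = list(chosen)
            else:
                result[name] = [chosen]
        return result
    # A plain list applies to every output.
    return {name: list(metrics) for name in output_names}


# Output names assumed from the summary above (hypothetical for this sketch).
outputs = ["capsnet", "decoder"]
print(metrics_per_output({"capsnet": "accuracy"}, outputs))
print(metrics_per_output(["accuracy"], outputs))
```

With the dictionary form, only the capsnet output gets an accuracy metric and the decoder (reconstruction) output gets none, which is presumably why the CapsNet code uses it.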

Related

Why are these LSTM parameter counts different?

I'm building an LSTM, for a report, and would like to summarize things about it. However, I've seen two different ways to build an LSTM in Keras that yield two different values for the number of parameters.
I'd like to understand why the parameters differ in this way.
This question correctly shows why this code
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(256, input_dim=4096, input_length=16))
model.summary()
results in 4457472 parameters.
From what I can tell, the following two LSTMs should be the same
m2 = Sequential()
m2.add(LSTM(1, input_dim=5, input_length=1))
m2.summary()
m3 = Sequential()
m3.add(LSTM((1),batch_input_shape=(None,5,1)))
m3.summary()
However, the m2 results in 28 parameters, but the m3 results in 12 parameters. Why?
How is 12 being calculated for a 1 unit LSTM with a 5-dim input?
Included the warning message. Hope it is helpful.
Output
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 256) 4457472
=================================================================
Total params: 4,457,472
Trainable params: 4,457,472
Non-trainable params: 0
_________________________________________________________________
Warning (from warnings module):
File "difparam.py", line 11
m2.add(LSTM(1, input_dim=5, input_length=1))
UserWarning: The `input_dim` and `input_length` arguments in recurrent layers are deprecated. Use `input_shape` instead.
Warning (from warnings module):
File "difparam.py", line 11
m2.add(LSTM(1, input_dim=5, input_length=1))
UserWarning: Update your `LSTM` call to the Keras 2 API: `LSTM(1, input_shape=(1, 5))`
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 1) 28
=================================================================
Total params: 28
Trainable params: 28
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_3 (LSTM) (None, 1) 12
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
m2 was built based on info from the Stack Overflow question, and m3 was built based on this video from YouTube.
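Because the two models have different input shapes. m3 uses batch_input_shape=(None, 5, 1), i.e. 5 timesteps with 1 feature each, which corresponds to input_dim = 1 and input_length = 5. m2 uses input_dim = 5 and input_length = 1, the other way round.
The warning you received even spells this out: the input shape it suggests for m2 is different from the one used in m3:
UserWarning: Update your LSTM call to the Keras 2 API: LSTM(1, input_shape=(1, 5))
It's highly recommended that you follow the suggestion in the warning and pass input_shape explicitly.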
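The three parameter counts above can be checked by hand. The sketch below uses the standard formula for an LSTM layer with biases; the function name lstm_param_count is made up for this example. Note that the number of timesteps does not enter the formula, only the feature dimension does.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters of a standard LSTM layer with biases.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units) and a bias (units).
    The sequence length does not appear in the formula.
    """
    return 4 * (units * (input_dim + units) + units)


print(lstm_param_count(256, 4096))  # 4457472, the first model
print(lstm_param_count(1, 5))       # 28, m2: 5 features per timestep
print(lstm_param_count(1, 1))       # 12, m3: 1 feature per timestep
```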

How to use a GlobalAveragePool layer as the output of a model

I am receiving this error when I am trying to build a model using the output from another model.
Output tensors to a Model must be the output of a Keras `Layer` (thus holding past layer metadata). Found: <keras.layers.pooling.GlobalAveragePooling2D object at 0x7Somthing or another
What I am trying to do is use a fine-tuned model as the base model and retrain the whole thing using a different method (an SCNN instead of a CNN).
This is how I am declaring the model and where it goes wrong.
pair_base_model = Model(inputs = base_model.input, outputs=base_model.get_layer('glb_avg_pool'))
And this is how I load my previous model
base_model = load_model('../input/base-model-reid/0.ckpt')
print(base_model.summary())
Which gives me this
......... Whole bunch of other stuff
block14_sepconv2_act (Activatio (None, 7, 7, 2048) 0 block14_sepconv2_bn[0][0]
__________________________________________________________________________________________________
glb_avg_pool (GlobalAveragePool (None, 2048) 0 block14_sepconv2_act[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1502) 3077598 glb_avg_pool[0][0]
==================================================================================================
Total params: 23,939,078
Trainable params: 10,980,862
Non-trainable params: 12,958,216
__________________________________________________________________________________________________
None
Silly me: I was working on this a little too late and messed up something simple.
I needed to use .output on the layer to get its output tensor. Calling .get_layer(xxxxxxx) on its own returns the layer object, not a tensor, so it won't work as a model output.
pair_base_model = Model(inputs = base_model.input, outputs=base_model.get_layer('glb_avg_pool').output)
This is the working code.
Thanks to anyone who looked at this.

What does the "[0][0]" of the layers connected to in keras model.summary mean?

As is depicted in the following table, what does the [0][0] of the input_1[0][0] mean?
___________________________________________________________________
Layer (type)          Output Shape   Param #   Connected to
===================================================================
input_1 (InputLayer)  (None, 1)      0
___________________________________________________________________
dropout_1 (Dropout)   (None, 1)      0         input_1[0][0]
___________________________________________________________________
dropout_2 (Dropout)   (None, 1)      0         input_1[0][0]
===================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
___________________________________________________________________
That's a good question, but to answer it we must dive into the internals of how layers are connected to each other in Keras. So let's start:
0) What is a tensor?
A tensor is a data structure that represents data; tensors are essentially n-dimensional arrays. All data and information passed between layers must be tensors.
1) What is a layer?
In the simplest sense, a layer is a computation unit that takes one or more input tensors, applies a set of operations (e.g. multiplication, addition, etc.) to them, and returns the result as one or more output tensors. When you apply a layer to some input tensors, a Node is created under the hood.
2) So what is a Node?
To represent the connectivity between two layers, Keras internally uses an object of the Node class. When a layer is applied to some new input, a node is created and added to the _inbound_nodes property of that layer. Further, when the output of a layer is used by another layer, a new node is created and added to the _outbound_nodes property of that layer. So essentially, this data structure lets Keras find out how layers are connected to each other, using the following properties of a Node object:
input_tensors: a list containing the input tensors of the node.
output_tensors: a list containing the output tensors of the node.
inbound_layers: a list containing the layers that the input_tensors come from.
outbound_layer: the consumer layer, i.e. the layer that takes input_tensors and turns them into output_tensors.
node_indices: a list of integers containing the node indices of input_tensors (explained further in the answer to the next question).
tensor_indices: a list of integers containing the indices of input_tensors within their corresponding inbound layer (explained further in the answer to the next question).
3) Fine! Now tell me what those values in "Connected to" column of model summary mean?
To better understand this, let's create a simple model. First, let's create two input layers:
inp1 = Input((10,))
inp2 = Input((20,))
Next, we create a Lambda layer that has two output tensors, the first output is the input tensor divided by 2, and the second output is the input tensor multiplied by 2:
lmb_layer = Lambda(lambda x: [x/2, x*2])
Let's apply this lambda layer on inp1 and inp2:
a1, b1 = lmb_layer(inp1)
a2, b2 = lmb_layer(inp2)
After doing this, two nodes have been created and added to the _inbound_nodes property of lmb_layer:
>>> lmb_layer._inbound_nodes
[<keras.engine.base_layer.Node at 0x7efb9a105588>,
<keras.engine.base_layer.Node at 0x7efb9a105f60>]
The first node corresponds to the connectivity of lmb_layer with the first input layer (inp1), and the second node corresponds to its connectivity with the second input layer (inp2). Further, each of those nodes has two output tensors (corresponding to a1, b1 and a2, b2):
>>> lmb_layer._inbound_nodes[0].output_tensors
[<tf.Tensor 'lambda_1/truediv:0' shape=(?, 10) dtype=float32>,
<tf.Tensor 'lambda_1/mul:0' shape=(?, 10) dtype=float32>]
>>> lmb_layer._inbound_nodes[1].output_tensors
[<tf.Tensor 'lambda_1_1/truediv:0' shape=(?, 20) dtype=float32>,
<tf.Tensor 'lambda_1_1/mul:0' shape=(?, 20) dtype=float32>]
Now, let's create four different Dense layers and apply them to the four output tensors we obtained:
d1 = Dense(10)(a1)
d2 = Dense(20)(b1)
d3 = Dense(30)(a2)
d4 = Dense(40)(b2)
model = Model(inputs=[inp1, inp2], outputs=[d1, d2, d3, d4])
model.summary()
The model summary would look like this:
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 10)           0
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 20)           0
__________________________________________________________________________________________________
lambda_1 (Lambda)               multiple             0           input_1[0][0]
                                                                 input_2[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 10)           110         lambda_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 20)           220         lambda_1[0][1]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 30)           630         lambda_1[1][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 40)           840         lambda_1[1][1]
==================================================================================================
Total params: 1,800
Trainable params: 1,800
Non-trainable params: 0
__________________________________________________________________________________________________
In the "Connected to" column, each value has the format layer_name[x][y]. The layer_name is the layer that this layer's input tensors come from. For example, all the Dense layers are connected to lmb_layer and therefore get their inputs from it. The [x][y] part gives the node index (i.e. node_indices) and the tensor index (i.e. tensor_indices) of the input tensors, respectively. For example:
The dense_1 layer is applied on a1, which is the first (i.e. index: 0) output tensor of the first (i.e. index: 0) inbound node of lmb_layer, therefore the connectivity is displayed as: lambda_1[0][0].
The dense_2 layer is applied on b1, which is the second (i.e. index: 1) output tensor of the first (i.e. index: 0) inbound node of lmb_layer, therefore the connectivity is displayed as: lambda_1[0][1].
The dense_3 layer is applied on a2, which is the first (i.e. index: 0) output tensor of the second (i.e. index: 1) inbound node of lmb_layer, therefore the connectivity is displayed as: lambda_1[1][0].
The dense_4 layer is applied on b2, which is the second (i.e. index: 1) output tensor of the second (i.e. index: 1) inbound node of lmb_layer, therefore the connectivity is displayed as: lambda_1[1][1].
That's it! If you want to know more about how the summary method works, take a look at the print_summary function. And if you want to find out how the connections are printed, take a look at the print_layer_summary_with_connections function.
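The bookkeeping described above can be imitated in a few lines of plain Python. This is only a toy model of the idea, not Keras code: ToyLayer and ToyTensor are hypothetical stand-ins, and the only rule they implement is that each call to a layer creates one new inbound node, whose index becomes the [x] in layer_name[x][y].

```python
class ToyTensor:
    """Stand-in for a tensor that remembers which layer call produced it."""

    def __init__(self, layer, node_index, tensor_index):
        self.layer = layer
        self.node_index = node_index
        self.tensor_index = tensor_index

    def label(self):
        # Same layout as the "Connected to" column: layer_name[x][y]
        return "{}[{}][{}]".format(
            self.layer.name, self.node_index, self.tensor_index)


class ToyLayer:
    """Stand-in for a layer; every call creates one new inbound node."""

    def __init__(self, name, n_outputs=1):
        self.name = name
        self.n_outputs = n_outputs
        self.inbound_nodes = []  # one list of input tensors per call

    def __call__(self, *inputs):
        node_index = len(self.inbound_nodes)
        self.inbound_nodes.append(list(inputs))
        return [ToyTensor(self, node_index, i) for i in range(self.n_outputs)]

    def connected_to(self):
        return [t.label() for node in self.inbound_nodes for t in node]


# Rebuild the example: two inputs, one shared two-output layer, four consumers.
inp1 = ToyLayer("input_1")()[0]
inp2 = ToyLayer("input_2")()[0]
lmb = ToyLayer("lambda_1", n_outputs=2)
a1, b1 = lmb(inp1)  # first call  -> node index 0
a2, b2 = lmb(inp2)  # second call -> node index 1

print("lambda_1", lmb.connected_to())
for name, tensor in [("dense_1", a1), ("dense_2", b1),
                     ("dense_3", a2), ("dense_4", b2)]:
    dense = ToyLayer(name)
    dense(tensor)
    print(name, dense.connected_to())
```

The printed labels match the "Connected to" column of the summary above, e.g. dense_3 is connected to lambda_1[1][0].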

In Keras, how to get the layer name associated with a "Model" object contained in my model?

I built a Sequential model with the VGG16 network at the initial base, for example:
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                  # do not include the top, fully-connected Dense layers
                  include_top=False,
                  input_shape=(150, 150, 3))
from keras import models
from keras import layers
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
# the 3 corresponds to the three output classes
model.add(layers.Dense(3, activation='sigmoid'))
My model looks like this:
model.summary()
Layer (type) Output Shape Param #
=================================================================
vgg16 (Model) (None, 4, 4, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_7 (Dense) (None, 256) 2097408
_________________________________________________________________
dense_8 (Dense) (None, 3) 771
=================================================================
Total params: 16,812,867
Trainable params: 16,812,867
Non-trainable params: 0
_________________________________________________________________
Now, I want to get the layer names associated with the vgg16 Model portion of my network. I.e. something like:
layer_name = 'block3_conv1'
filter_index = 0
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])
However, since the vgg16 convolutional base is shown as a Model and its layers are not exposed, I get the error:
ValueError: No such layer: block3_conv1
How do I do this?
The key is to first do .get_layer on the Model object, then do another .get_layer on that specifying the specific vgg16 layer, THEN do .output:
layer_output = model.get_layer('vgg16').get_layer('block3_conv1').output
To get the names of the layers from the VGG16 instance, use the following code.
for layer in conv_base.layers:
    print(layer.name)
The names should be the same inside your model. To show this, you could do the following.
print([layer.name for layer in model.get_layer('vgg16').layers])
As Ryan showed us, to reach a layer inside vgg16 you must first fetch vgg16 from the model using the get_layer method.
One can simply store the names of the layers in a list for later use
layer_names=[layer.name for layer in base_model.layers]
This worked for me :)
for idx in range(len(model.layers)):
    print(model.get_layer(index = idx).name)
Use the layer's summary:
model.get_layer('vgg16').summary()
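When the name of the nested model is not known in advance, the two-step lookup from the answers above can be generalised into a small recursive search. The sketch below is a hypothetical helper (find_layer is not part of Keras); it assumes only that models expose .layers and that every layer has a .name, which is what the snippets above rely on. The demo uses stand-in objects so that it runs without Keras installed.

```python
from types import SimpleNamespace


def find_layer(model, layer_name):
    """Depth-first search for a layer by name, descending into nested models.

    Relies only on the `.name` and `.layers` attributes; returns None
    when no layer with that name exists.
    """
    for layer in getattr(model, "layers", []):
        if layer.name == layer_name:
            return layer
        found = find_layer(layer, layer_name)
        if found is not None:
            return found
    return None


# Stand-in objects mimicking the structure of the model in the question.
block3_conv1 = SimpleNamespace(name="block3_conv1")
vgg16 = SimpleNamespace(
    name="vgg16",
    layers=[SimpleNamespace(name="block1_conv1"), block3_conv1])
model = SimpleNamespace(
    name="sequential_1",
    layers=[vgg16, SimpleNamespace(name="flatten_1")])

print(find_layer(model, "block3_conv1").name)
print(find_layer(model, "no_such_layer"))
```

With a real model, find_layer(model, 'block3_conv1') should return the same layer object as model.get_layer('vgg16').get_layer('block3_conv1').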

Passing initial_state to Bidirectional RNN layer in Keras

I'm trying to implement an encoder-decoder network in Keras, with Bidirectional GRUs.
The following code seems to be working
src_input = Input(shape=(5,))
ref_input = Input(shape=(5,))
src_embedding = Embedding(output_dim=300, input_dim=vocab_size)(src_input)
ref_embedding = Embedding(output_dim=300, input_dim=vocab_size)(ref_input)
encoder = Bidirectional(
GRU(2, return_sequences=True, return_state=True)
)(src_embedding)
decoder = GRU(2, return_sequences=True)(ref_embedding, initial_state=encoder[1])
But when I change the decoder to use the Bidirectional wrapper, the encoder and src_input layers no longer show up in model.summary(). The new decoder looks like:
decoder = Bidirectional(
GRU(2, return_sequences=True)
)(ref_embedding, initial_state=encoder[1:])
The output of model.summary() with the Bidirectional decoder:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 5) 0
_________________________________________________________________
embedding_2 (Embedding) (None, 5, 300) 6610500
_________________________________________________________________
bidirectional_2 (Bidirection (None, 5, 4) 3636
=================================================================
Total params: 6,614,136
Trainable params: 6,614,136
Non-trainable params: 0
_________________________________________________________________
Question: Am I missing something when I pass initial_state in Bidirectional decoder? How can I fix this? Is there any other way to make this work?
It's a bug. The RNN layer implements __call__ so that tensors in initial_state can be collected into a model instance. However, the Bidirectional wrapper did not implement it. So topological information about the initial_state tensors is missing and some strange bugs happen.
I wasn't aware of it when I was implementing initial_state for Bidirectional. It should be fixed now, after this PR. You can install the latest master branch on GitHub to fix it.
