When building a Sequential model, I noticed there is a difference between adding a relu layer and a LeakyReLU layer.
test = Sequential()
test.add(Dense(1024, activation="relu"))
test.add(LeakyReLU(0.2))
Why can't we add a layer with activation="LeakyReLU"? (LeakyReLU is not a string that Keras can work with.)
When adding the relu layer, we set the number of units (1024 in my example).
Why can't we do the same for LeakyReLU?
I was sure that the only difference between relu and LeakyReLU was the behavior of the function, but it seems to be more than that.
You can specify the activation function in the Dense layer itself by using a string alias such as activation='relu', which uses the default Keras parameters for relu. The Keras version discussed here has no such alias for the LeakyReLU activation function, so you have to use tf.keras.layers.LeakyReLU or tf.nn.leaky_relu instead.
You cannot set the number of units in a relu layer; it just takes the previous output tensor and applies the relu function to it. The number of units you specified belongs to the Dense layer, not to relu. Dense(1024, activation="relu") multiplies the inputs by the weights, adds the biases and applies relu to the result, all in a single line. With a separate LeakyReLU layer, the same process is written in two stages: first a Dense layer multiplies by the weights and adds the biases, then the LeakyReLU layer applies the activation (two lines).
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

test = Sequential()
test.add(Dense(1024, input_dim=784, activation="relu", name="First"))
# A LeakyReLU layer instance can be passed as the activation of a Dense layer
test.add(Dense(512, activation=tf.keras.layers.LeakyReLU(alpha=0.01), name="middle"))
test.add(Dense(1, activation='sigmoid', name="Last"))
test.compile(loss='binary_crossentropy', optimizer="adam")
test.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
First (Dense) (None, 1024) 803840
_________________________________________________________________
middle (Dense) (None, 512) 524800
_________________________________________________________________
Last (Dense) (None, 1) 513
=================================================================
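As a rough sanity check on the explanation above, here is a plain-Python sketch (no Keras needed). The helper names relu, leaky_relu and dense_params are made up for illustration and are not Keras functions. It shows that an activation is just an element-wise function with no units or weights of its own, and that the parameter counts in the summary come entirely from the Dense layers.

```python
def relu(x):
    """Element-wise ReLU: negative inputs become 0."""
    return [max(0.0, v) for v in x]


def leaky_relu(x, alpha=0.2):
    """Element-wise LeakyReLU: negative inputs are scaled by alpha."""
    return [v if v > 0 else alpha * v for v in x]


def dense_params(input_dim, units):
    """Trainable parameters of a Dense layer: weights plus one bias per unit."""
    return input_dim * units + units


# An activation only transforms the values it receives; the output has the
# same length as the input, whatever that length is.
pre_activation = [-2.0, 0.0, 3.0]
print(relu(pre_activation))        # [0.0, 0.0, 3.0]
print(leaky_relu(pre_activation))  # [-0.4, 0.0, 3.0]

# The counts below match the "Param #" column of the summary above.
print(dense_params(784, 1024))   # 803840  ("First")
print(dense_params(1024, 512))   # 524800  ("middle", LeakyReLU adds nothing)
print(dense_params(512, 1))      # 513     ("Last")
```

The "middle" layer has exactly the parameters of a plain Dense(512), which is consistent with LeakyReLU being only a function applied to the Dense output.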
I trained a VGG16 model on a labeled image dataset using the categorical crossentropy. I removed the fully connected layer and replaced it with new layers, as follows:
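from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Sequential

vgg16_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
model = Sequential()
model.add(vgg16_model)
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(70, activation='softmax'))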
Then I trained the full model on my dataset. I want to use it now as a feature extraction model by extracting image features using any of the intermediate layers that belong to vgg16_model.
After training and saving the model, I can only access the layers that I added myself (Dense and Dropout), so I used the pop() function to remove them and keep only the trained feature extractor (vgg16):
# remove the four layers added on top of vgg16
for _ in range(4):
    model.pop()
This keeps only the VGG16 model. However, the layers inside it are not accessible. I tried:
new_model = Model(model.input,model.layers[-1].output)
But I get this error:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name='input_9'), name='input_9', description="created by layer 'input_9'") at layer "block1_conv1". The following previous layers were accessed without issue: []
How can I modify my model so that it uses only the first k layers at a given time, and then use it for prediction?
Defining the Keras model the way you did unfortunately causes complications when you try to extract features from an intermediate layer. I've raised a ticket, see here.
As you want to extract image features using any of the intermediate layers that belong to vgg16_model, you can try the following approach.
# [good practice]: check the layer names and shapes first
# for layer in model.layers:
#     print(layer.name, layer.output_shape)
# vgg16 (None, 7, 7, 512)
# flatten (None, 25088)
# dense (None, 256)
# dropout (None, 256)
# dense_1 (None, 70)
import tensorflow as tf
from tensorflow import keras

# get the trained model first
trained_vgg16 = keras.Model(
    inputs=model.get_layer(name="vgg16").inputs,
    outputs=model.get_layer(name="vgg16").outputs,
)
x = tf.ones((1, 224, 224, 3))
y = trained_vgg16(x)
y.shape
TensorShape([1, 7, 7, 512])
Next, use this trained_vgg16 model to build the target model. For example,
# extract only 1 intermediate layer
feature_extractor_block3_pool = keras.Model(
    inputs=trained_vgg16.inputs,
    outputs=trained_vgg16.get_layer(name="block3_pool").output,
)
# or 2, depending on the purpose
feature_extractor_block3_pool_block4_conv3 = keras.Model(
    inputs=trained_vgg16.inputs,
    outputs=[
        trained_vgg16.get_layer(name="block3_pool").output,
        trained_vgg16.get_layer(name="block4_conv3").output,
    ],
)
# or all of them
feature_extractor = keras.Model(
    inputs=trained_vgg16.inputs,
    outputs=[layer.output for layer in trained_vgg16.layers],
)
I have been trying to compute the number of parameters in an LSTM cell in Keras. I created two models, one with LSTM and the other with CuDNNLSTM.
Partial summaries of the two models are as follows.
CuDNNLSTM model:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, None, 300)         192000
_________________________________________________________________
bidirectional (Bidirectional (None, None, 600)         1444800
LSTM model:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, None, 300)         192000
_________________________________________________________________
bidirectional (Bidirectional (None, None, 600)         1442400
The number of parameters in the LSTM model follows the formula for LSTM parameter computation that is available all over the internet. However, the CuDNNLSTM model has 2400 extra parameters.
What is the cause of these extra parameters?
Code:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from tensorflow.compat.v1.keras.models import Sequential
from tensorflow.compat.v1.keras.layers import CuDNNLSTM, Bidirectional, Embedding, LSTM
model = Sequential()
model.add(Embedding(640, 300))
model.add(Bidirectional(<LSTM type>(300, return_sequences=True)))
LSTM parameters can be grouped in 3 categories: input weight matrices (W), recurrent weight matrices (R), biases (b). Part of the LSTM cell's computation is W*x + b_i + R*h + b_r where b_i are input biases and b_r are recurrent biases.
If you let b = b_i + b_r, you could rewrite the above expression as W*x + R*h + b. In doing so, you've eliminated the need to keep two separate bias vectors (b_i and b_r) and instead, you only need to store one vector (b).
cuDNN sticks with the original mathematical formulation and stores b_i and b_r separately. Keras does not; it only stores b. That's why cuDNN's LSTM has more parameters than Keras.
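The arithmetic behind this can be checked with a short sketch. The function name lstm_params and its separate_biases flag are made up for illustration; the formula assumes the standard LSTM with 4 gates, an input size of 300 (the embedding dimension) and 300 units per direction, as in the question.

```python
def lstm_params(input_dim, units, separate_biases=False):
    """Parameter count of one LSTM layer (4 gates).

    separate_biases=False: one bias vector b per gate (Keras LSTM).
    separate_biases=True:  b_i and b_r stored separately (cuDNN layout).
    """
    gates = 4
    bias_vectors = 2 if separate_biases else 1
    per_gate = units * input_dim + units * units + bias_vectors * units
    return gates * per_gate


# Bidirectional wraps two LSTMs (forward and backward), hence the factor 2.
keras_count = 2 * lstm_params(300, 300)
cudnn_count = 2 * lstm_params(300, 300, separate_biases=True)

print(keras_count)                # 1442400
print(cudnn_count)                # 1444800
print(cudnn_count - keras_count)  # 2400 = 2 directions * 4 gates * 300 units
```

Both totals match the summaries in the question, so the 2400 extra parameters are exactly one additional bias vector per gate and per direction.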
I am trying to predict a time series with LSTM and am writing my code in Python by using Keras.
I have 30 features as input (continuous value) and a binary output.
I would like to use the 20 previous timesteps (t-20, t-19, .. , t-1) of each input feature in order to predict the output of next timestep (t+1).
My batch size is fixed at 52. What does this exactly mean?
I don't understand how to define the shape of the input layer.
The stacked LSTM example in the Keras documentation says that the last dimension of the 3D tensor will be 'data_dim'.
Is that the input dimension or the output dimension?
If it is the output dimension, then I can't use more than one input feature, because in my case the input shape would be (batch_size=52, time_step=20, data_dim=1).
If data_dim is the input dimension instead, I tried to define a four-layer model with two LSTM layers, and the model summary looks like this:
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (52, 20, 30)              0
_________________________________________________________________
lstm_3 (LSTM)                (52, 20, 128)             81408
_________________________________________________________________
lstm_4 (LSTM)                (52, 128)                 131584
_________________________________________________________________
dense_2 (Dense)              (52, 1)                   129
=================================================================
Total params: 213,121
Trainable params: 213,121
Non-trainable params: 0
Does this architecture make sense? Am I making some obvious mistakes?
My snippet of code is the one below:
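from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# values taken from the description and the summary above
batch_size, input_timesteps, input_dims, num_neurons = 52, 20, 30, 128
input_layer = Input(batch_shape=(batch_size, input_timesteps, input_dims))
lstm1 = LSTM(num_neurons, activation='relu', dropout=0.0, stateful=False, return_sequences=True)(input_layer)
lstm2 = LSTM(num_neurons, activation='relu', dropout=0.0, stateful=False, return_sequences=False)(lstm1)
output_layer = Dense(1, activation='sigmoid')(lstm2)
model = Model(inputs=input_layer, outputs=output_layer)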
I am getting very poor results and thus trying to debug each step.
If you want to use deep learning techniques, you should try to overfit first and then reduce the complexity until you reach a break-even point between network complexity, training error and test error.
You are actually using a larger feature space in the hidden layers than in the input (128 units versus 30 features); are you sure your data can fill it?
Do you have enough rows to let the model learn such a complex representation?
Otherwise I would suggest something like this, in order to extract the most important dimensions:
# shrink the hidden size layer by layer instead of expanding it
num_neurons1 = int(input_dims/2)
num_neurons2 = int(input_dims/4)
input_layer=Input(batch_shape=(batch_size, input_timesteps, input_dims))
lstm1=LSTM(num_neurons1, activation = 'relu', dropout=0.0, stateful=False, return_sequences=True, kernel_initializer="he_normal")(input_layer)
lstm2=LSTM(num_neurons2, activation = 'relu', dropout=0.0, stateful=False, return_sequences=False, kernel_initializer="he_normal")(lstm1)
output_layer=Dense(1, activation='sigmoid')(lstm2)
model=Model(inputs=input_layer,outputs=output_layer)
Also, you are using relu as the activation function.
Does it fit your data? It would be better if your data were strictly positive after rescaling and normalization.
If it does fit, you can also use a suitable kernel initializer (such as he_normal, as above).
To better understand the problem, you could also post the optimizer parameters and how the training behaves over the epochs.
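On the shape question itself, a small plain-Python sketch may help. It rests on assumptions: the helper names make_windows and lstm_params are hypothetical, the toy data is all zeros, and each window of 20 timesteps is paired with the label of the timestep right after it (adjust the offset if your target is further ahead). It builds inputs of shape (samples, 20, 30) and checks that the parameter counts in the summary above are consistent with the last axis being the number of input features.

```python
def make_windows(rows, labels, window=20):
    """Turn a (timesteps, features) series into (samples, window, features).

    rows:   list of per-timestep feature lists
    labels: list of per-timestep binary labels
    Each sample is `window` consecutive timesteps; its target is the label
    of the timestep immediately after the window (an assumption, see above).
    """
    X, y = [], []
    for start in range(len(rows) - window):
        X.append(rows[start:start + window])
        y.append(labels[start + window])
    return X, y


def lstm_params(input_dim, units):
    """Keras LSTM parameter count: 4 gates, one bias vector per gate."""
    return 4 * (units * input_dim + units * units + units)


# Toy data: 100 timesteps with 30 features each.
n_steps, n_features = 100, 30
rows = [[0.0] * n_features for _ in range(n_steps)]
labels = [t % 2 for t in range(n_steps)]

X, y = make_windows(rows, labels, window=20)
print(len(X), len(X[0]), len(X[0][0]))  # 80 20 30 -> (samples, timesteps, features)

# With 30 input features the counts match the summary, so the last axis of
# the input (data_dim) is the number of input features, not the output size.
print(lstm_params(30, 128))   # 81408  (lstm_3)
print(lstm_params(128, 128))  # 131584 (lstm_4)
```

The batch size of 52 is then simply how many of these (20, 30) windows are processed per gradient update.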
I built a Sequential model with the VGG16 network at the initial base, for example:
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                  # do not include the top, fully-connected Dense layers
                  include_top=False,
                  input_shape=(150, 150, 3))
from keras import models
from keras import layers
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
# the 3 corresponds to the three output classes
model.add(layers.Dense(3, activation='sigmoid'))
My model looks like this:
model.summary()
Layer (type) Output Shape Param #
=================================================================
vgg16 (Model) (None, 4, 4, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_7 (Dense) (None, 256) 2097408
_________________________________________________________________
dense_8 (Dense) (None, 3) 771
=================================================================
Total params: 16,812,867
Trainable params: 16,812,867
Non-trainable params: 0
_________________________________________________________________
Now, I want to get the layer names associated with the vgg16 Model portion of my network. I.e. something like:
layer_name = 'block3_conv1'
filter_index = 0
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])
However, since the vgg16 convolutional base is shown as a single Model and its layers are not exposed, I get the error:
ValueError: No such layer: block3_conv1
How do I do this?
The key is to first do .get_layer on the Model object, then do another .get_layer on that specifying the specific vgg16 layer, THEN do .output:
layer_output = model.get_layer('vgg16').get_layer('block3_conv1').output
To get the names of the layers from the VGG16 instance, use the following code.
for layer in conv_base.layers:
    print(layer.name)
The names should be the same inside your model. To show this, you could do the following.
print([layer.name for layer in model.get_layer('vgg16').layers])
As Ryan showed, to reach a vgg16 layer you must first get the vgg16 model from the outer model using the get_layer method.
One can simply store the layer names in a list for further use:
layer_names=[layer.name for layer in base_model.layers]
This worked for me :)
for idx in range(len(model.layers)):
    print(model.get_layer(index=idx).name)
Use the layer's summary:
model.get_layer('vgg16').summary()
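If you don't know in advance which sub-model holds a layer, the nested lookup from the answers above can be generalized. The following is only a sketch: find_layer is a hypothetical helper, not a Keras function, and it is demonstrated on stand-in objects rather than a real network. It relies only on the .layers and .name attributes, which Keras models and layers expose, so it should carry over to the real model, but that is untested here.

```python
from types import SimpleNamespace


def find_layer(model, name):
    """Depth-first search for a layer by name, descending into nested models."""
    for layer in model.layers:
        if layer.name == name:
            return layer
        # A nested model (like 'vgg16' here) has its own .layers list.
        if hasattr(layer, "layers"):
            found = find_layer(layer, name)
            if found is not None:
                return found
    return None


# Stand-ins that mimic the structure of the model in the question.
conv_base = SimpleNamespace(
    name="vgg16",
    layers=[SimpleNamespace(name="block1_conv1"),
            SimpleNamespace(name="block3_conv1")],
)
model = SimpleNamespace(
    name="sequential",
    layers=[conv_base,
            SimpleNamespace(name="flatten_1"),
            SimpleNamespace(name="dense_7"),
            SimpleNamespace(name="dense_8")],
)

print(find_layer(model, "block3_conv1").name)  # block3_conv1
print(find_layer(model, "dense_7").name)       # dense_7
print(find_layer(model, "missing"))            # None
```

For a single known sub-model, the explicit model.get_layer('vgg16').get_layer('block3_conv1') shown above is simpler and says exactly where the layer lives.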
I have created the following SimpleRNN using Keras:
X = X.reshape((X.shape[0], X.shape[1], 1))
tr_X, ts_X, tr_y, ts_y = train_test_split(X, y, train_size=.8)
batch_size = 1000
print('RNN model...')
model = Sequential()
model.add(SimpleRNN(64, activation='relu', batch_input_shape=(batch_size, X.shape[1], 1)))
model.add(Dense(1, activation='relu'))
print('Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()  # summary() prints the table itself
print('\n')
model.fit(tr_X, tr_y,
          batch_size=batch_size, epochs=1,
          shuffle=True, validation_data=(ts_X, ts_y))
For the model summary, I get the following:
Layer (type) Output Shape Param #
=================================================================
simple_rnn_1 (SimpleRNN) (1000, 64) 4224
_________________________________________________________________
dense_1 (Dense) (1000, 1) 65
=================================================================
Total params: 4,289
Trainable params: 4,289
Non-trainable params: 0
_________________________________________________________________
I have a dataset of 10,000 samples and 64 features. My goal is to train a classification model on this dataset (the class labels are binary, 0 and 1). Now I am trying to understand what is going on here. As seen in the 'Output Shape' column, simple_rnn_1 has shape (1000, 64). I interpret this as 1000 rows (the batch) and 64 features. Assuming the code above is logically correct, my questions are:
How does the RNN handle this matrix (i.e., (1000, 64))? Does it input each column, something like in this figure?
Should the SimpleRNN() units always be equal to the number of features?
Thank you
In the code, you defined batch_input_shape with shape (batch_size, X.shape[1], 1),
which means that you feed the RNN batch_size examples, where each example contains X.shape[1] timesteps (the number of pink boxes in your image) and each timestep has shape 1 (a scalar).
So yes, with an input shape of (1000, 64, 1) it works exactly as you said: each column is fed to the RNN as one timestep.
No! units is your output dimension. Usually more units means a more complex network (just like in a regular neural network), and therefore more parameters to learn.
units is also the size of the RNN's internal state.
(So, in your example, if you declare units=2000, your output shape will be (1000, 2000).)
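To illustrate the two points above, here is a plain-Python sketch of what a SimpleRNN does with a single example. It is a simplified stand-in, not the Keras implementation: the function names are made up, the weights are random, and a small number of units is used to keep it readable. The sequence is consumed one timestep (one scalar) at a time, and the size of the result depends only on units, not on the number of timesteps.

```python
import random


def simple_rnn_forward(sequence, w_in, w_rec, bias):
    """Run one example through a SimpleRNN-like cell with relu activation.

    sequence: list of scalars, one per timestep (input_dim = 1)
    w_in:     input weights, length `units`
    w_rec:    recurrent weights, `units` x `units`
    bias:     biases, length `units`
    Returns the final hidden state (length `units`).
    """
    units = len(bias)
    h = [0.0] * units
    for x_t in sequence:  # one timestep at a time
        h = [
            max(0.0, w_in[j] * x_t
                + sum(w_rec[i][j] * h[i] for i in range(units))
                + bias[j])
            for j in range(units)
        ]
    return h


def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + biases."""
    return units * input_dim + units * units + units


random.seed(0)
units, timesteps = 4, 64
w_in = [random.uniform(-0.1, 0.1) for _ in range(units)]
w_rec = [[random.uniform(-0.1, 0.1) for _ in range(units)] for _ in range(units)]
bias = [0.0] * units
sample = [random.random() for _ in range(timesteps)]

state = simple_rnn_forward(sample, w_in, w_rec, bias)
print(len(state))                # 4 -> output size equals units, not timesteps

# With input_dim=1 and units=64, as in the question:
print(simple_rnn_params(1, 64))  # 4224, matching simple_rnn_1 in the summary
```

So the 64 in the output shape (1000, 64) comes from units=64; it only happens to coincide with the 64 features that are fed in as timesteps.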