Efficientnet normalization layer - keras

Question 1
First 2 layers of efficientnet are Rescaling and Normalization Layers. I was reading how the normalization layer works and before normalization, adapt() function should be called. I looked efficientnet source code and there was no adapt() function here is the link : https://github.com/keras-team/keras/blob/07e13740fd181fc3ddec7d9a594d8a08666645f6/keras/applications/efficientnet.py#L326
Do I have to call adapt() function before train my dataset?

Related

Dropout, Regularization and batch normalization

I have a couple of questions about LSTM layers in Keras library
In LSTM layer we have two kind of dropouts: dropout and recurrent-dropout. According to my understanding the first one will drop randomly some features from input (set them to zero) and the second one will do it on hidden units (features of h_t). Since we have different time steps in a LSTM network, is dropping applied seperately to each time step or only one time and will be the same for every step?
My second question is about regularizers in LSTM layer in keras. I know that for example the kernel regularizer will regularize weights corresponding to inputs. but we have different weights for inputs.
For example input gate, update gate and output gates use different weights for input (aslo different weights for h_(t-1)) . So will they be regularized in the same time ? What if I want to regularize only one of them? For example if I want to regularize only the weights used in the formula for input gate.
The last question is about activation functions in keras. In LSTM layer I have activation and recurrent activations. activation is tanh by default and I know in LSTM architecture tanh is used two times (for h_t and candidate of memory cell) and sigmoid is used 3 times (for gates). So does that mean if I change tanh (in LSTM layer in keras) to another function say Relu then it will change for both of h_t and memory cell candidate?
It would be perfect if any of those question could be answered. Thank you very much for your time.

Keras Embedding layer activation function?

In the fully connected hidden layer of Keras embedding, what is the activation function leveraged? I'm either misunderstanding the concept of this class or unable to find documentation. I understand that it is encoding from word to real-valued vector of dimension d via answers like the below on stackoverflow:
Embedding layers in Keras are trained just like any other layer in your network architecture: they are tuned to minimize the loss function by using the selected optimization method. The major difference with other layers, is that their output is not a mathematical function of the input. Instead the input to the layer is used to index a table with the embedding vectors [1]. However, the underlying automatic differentiation engine has no problem to optimize these vectors to minimize the loss function...
In my network, I have a word embedding portion that is then linked to a larger network that is predicting a binary outcome (e.g., click yes/no). I understand that this Keras embedding is not operating like word2vec because here my embedding is being trained and updated against my end cross-entropy function. But, there is no mention of how the embedding fully-connected layer is activated. Thanks!

Initializing the weights of a model from the output of another model in keras for transfer learning

I trained a LeNet architecture on a first dataset. I want to train a VGG architecture on an other dataset by initializing the weights of VGG with weights obtained from LeNet.
All initialization functions in keras are predefined and I do not find how to customize them. For example :
keras.initializers.Zeros()
Any idea how I can set the weights?
https://keras.io/layers/about-keras-layers/
According to the Keras documentation above:
layer.set_weights(weights) sets the weights of the layer from a list of Numpy arrays
layer.get_weights() returns the weights of the layer as a list of Numpy arrays
So, you can do this as follows:
model = Sequential()
model.add(Dense(32))
... building the model's layers ...
# access any nth layer by calling model.layers[n]
model.layers[0].set_weights( your_weights_here )
Of course, you'll need to make sure you are setting the weights of each layer to the appropriate shape they should be.

How to use MC Dropout on a variational dropout LSTM layer on keras?

I'm currently trying to set up a (LSTM) recurrent neural network with Keras (tensorflow backend).
I would like to use variational dropout with MC Dropout on it.
I believe that variational dropout is already implemented with the option "recurrent_dropout" of the LSTM layer but I don't find any way to set a "training" flag to put on to true like a classical Dropout layer.
This is quite easy in Keras, first you need to define a function that takes both model input and the learning_phase:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[-1].output])
For a Functional API model with multiple inputs/outputs you can use:
f = K.function([model.inputs, K.learning_phase()],
[model.outputs])
Then you can call this function like f([input, 1]) and this will tell Keras to enable the learning phase during this call, executing Dropout. Then you can call this function multiple times and combine the predictions to estimate uncertainty.
The source code for "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (2015) is located at https://github.com/yaringal/DropoutUncertaintyExps/blob/master/net/net.py. They also use Keras and the code is quite easy to understand. The Dropout layers are used without the Sequential api in order to pass the training parameter. This is a different approach to the suggestion from Matias:
inter = Dropout(dropout_rate)(inter, training=True)

Zero gradients for CuDNNLSTM according to Keras TensorBoard callback

I would like to visualize gradients of a seq2seq model using Keras Tensorboard callback. If I'm using a regular LSTM cell in my encoder and decoder, I get nice non-zero gradients:
However if I change the rnn cell to CuDNNLSTM some gradients turn to zero, which seem to be incorrect:
The both models seem to train correctly.
So, what's wrong with visualisations of CuDNNLSTM gradients? Is there a bug in Keras Tensorboard callback?
Code that I am running is a slightly modified Keras lstm_seq2seq example: https://gist.github.com/nicolas-ivanov/1818d6502d5f1496e5fbe14889eddca1

Resources