Does the GaussianDropout Layer in Keras retain probability like the Dropout Layer?

I was wondering whether the GaussianDropout layer in Keras scales by the retain probability the way the Dropout layer does.
The Dropout layer is implemented as inverted dropout, which scales the surviving activations by the inverse of the retain probability during training, so that no rescaling is needed at test time.
If you aren't aware of the issue, have a look at the discussion, and specifically at linxihui's answer.
The crucial point that makes the Dropout layer account for the retain probability is the call to K.dropout, which a GaussianDropout layer never makes.
Is there any reason why the GaussianDropout layer does not do this?
Or does it handle this in another way that I'm not seeing?
Similar question, referring to the Dropout layer: Is the Keras implementation of dropout correct?

Gaussian dropout doesn't need to rescale by the retain probability, because it is derived by applying the Central Limit Theorem to inverted dropout: the (already rescaled) Bernoulli mask is replaced by multiplicative Gaussian noise with mean 1, so the expected activation is unchanged without any extra scaling. The details can be found here, in the 2nd and 3rd paragraphs of the Multiplicative Gaussian Noise chapter (chapter 10, p. 1951).
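For intuition, here is a minimal sketch (assuming TensorFlow 2.x / tf.keras): during training both layers multiply the activations by mean-1 noise, so the expectation is already preserved and neither layer needs a correction at inference time.

import tensorflow as tf

x = tf.ones((10000, 4))
rate = 0.5

inverted = tf.keras.layers.Dropout(rate)          # Bernoulli mask scaled by 1 / (1 - rate)
gaussian = tf.keras.layers.GaussianDropout(rate)  # multiplicative Gaussian noise with mean 1

# With training=True the noise is active; both outputs still average to roughly 1.0.
print(float(tf.reduce_mean(inverted(x, training=True))))  # ~1.0
print(float(tf.reduce_mean(gaussian(x, training=True))))  # ~1.0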

Related

Keras constraint for whole layer unit norm

https://keras.io/constraints/
It seems the built-in constraints in Keras constrain each node in the layer individually. What I need is to constrain the weights in a Dense layer to be non-negative (which I see in the docs) but also for the whole layer to have unit norm, i.e. the sum of all the weights in the layer should be 1. Is there a way to do this that I'm not seeing in the docs?
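One possible approach, shown as a minimal sketch assuming tf.keras (the class name NonNegUnitSum is purely illustrative), is a custom Constraint that zeroes out negative weights and renormalizes the whole kernel so its entries sum to 1:

import tensorflow as tf

class NonNegUnitSum(tf.keras.constraints.Constraint):
    # Hypothetical constraint: non-negative weights whose sum over the whole kernel is 1.
    def __call__(self, w):
        w = w * tf.cast(w >= 0.0, w.dtype)                           # zero out negative weights
        return w / (tf.reduce_sum(w) + tf.keras.backend.epsilon())   # renormalize so the weights sum to 1

# The constraint is applied to the kernel after every weight update:
layer = tf.keras.layers.Dense(4, kernel_constraint=NonNegUnitSum())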

Few questions about Keras documentation

In the Keras documentation file activations.md, it says "Activations can either be used through an Activation layer, or through the activation argument supported by all forward layers.". What is meant by "forward layers"? Some layers don't have an activation parameter (e.g. the Dropout layer).
It also says "Activations that are more complex than a simple TensorFlow/Theano/CNTK function (eg. learnable activations, which maintain a state) are available as Advanced Activation layers, and can be found in the module keras.layers.advanced_activations. These include PReLU and LeakyReLU.". What is meant by "state" in this case?
I am not sure there is a strict definition of "forward layers" in this context, but basically it means that the "classic", Keras-built-in types of layers, which comprise one or more sets of weights used to transform an input matrix into an output one, have an activation argument. Typically, Dense layers have one, as do the various kinds of RNN and CNN layers.
It would not make sense for Dropout layers to have an activation function: they simply add a mechanism applied during training to (hopefully) improve the convergence rate and reduce the chance of overfitting.
As for the idea of "maintaining a state", it refers to activation functions that do not simply apply a fixed function to each fed-in sample, but instead retain some learnable information (the so-called state). Typically, for a PReLU activation, the leak parameter is adjusted through training (and it would, in the documentation's terminology, be referred to as a state of this activation function).
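A short sketch (assuming tf.keras) of the three ways activations show up, with PReLU being the one that carries a trainable "state" (its slope is a weight learned during training):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(10,)),  # activation argument of a "forward" layer
    layers.Dense(64),
    layers.Activation("tanh"),                                # standalone Activation layer
    layers.Dense(64),
    layers.PReLU(),                                           # advanced activation with a learnable slope (its "state")
    layers.Dropout(0.5),                                      # no activation argument here
    layers.Dense(1),
])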

Where to implement Layer normalization?

I am trying to implement layer normalization in my LSTM model, but I am unsure of how many LayerNormalization layers I need and exactly where to place them:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, LayerNormalization, Dense

def build_model():
    # timestep and feature are assumed to be defined elsewhere
    model = Sequential()
    layers = [100, 200, 2]
    model.add(Bidirectional(LSTM(layers[0],
                                 input_shape=(timestep, feature),
                                 dropout=0.4,
                                 recurrent_dropout=0.4,
                                 return_sequences=True)))
    model.add(LayerNormalization())
    model.add(Bidirectional(LSTM(layers[1],
                                 dropout=0.4,
                                 recurrent_dropout=0.4,
                                 return_sequences=False)))
    model.add(LayerNormalization())
    model.add(Dense(layers[2]))
    return model
Normalization layers apply their effect to the output of the preceding layer, so a LayerNormalization should be placed directly after the layer whose outputs you want normalized.
Usually every layer is normalized except the output layer; the configuration shown in the question already does this, so it can be considered good practice.
In general you do not have to normalize every layer, and finding which layers benefit from normalization is a matter of experimentation (trial and error).

Keras Dropout Layer Model Predict

The dropout layer is only supposed to be active during training of the model, not during testing.
If I have a dropout layer in my Keras Sequential model, do I need to do something to remove or silence it before I call model.predict()?
No, you don't need to silence it or remove it; Keras automatically takes care of it.
It is clearly mentioned in the documentation: a Keras model has two modes, training and testing, and regularization mechanisms such as Dropout and L1/L2 weight regularization are turned off at testing time.
Note: in my opinion, Batch Normalization is a much preferable regularization technique compared to Dropout; consider using it.
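As a quick sanity check, here is a minimal sketch (assuming tf.keras) that calls a Dropout layer with and without training=True to show it is a no-op at inference:

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 8))

print(drop(x, training=False).numpy())  # identical to x: dropout is inactive
print(drop(x, training=True).numpy())   # roughly half the units zeroed, the rest scaled by 2

# model.predict() and model.evaluate() run the layer with training=False by default;
# only model.fit() runs it with training=True.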

Keras: the difference between LSTM dropout and LSTM recurrent dropout

From the Keras documentation:
dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
Can anyone point to where on the image below each dropout happens?
I suggest taking a look at (the first part of) this paper. Regular dropout is applied on the inputs and/or the outputs, meaning the vertical arrows from x_t and to h_t. In your case, if you add it as an argument to your layer, it will mask the inputs; you can add a Dropout layer after your recurrent layer to mask the outputs as well. Recurrent dropout masks (or "drops") the connections between the recurrent units; that would be the horizontal arrows in your picture.
This picture is taken from the paper above. On the left, regular dropout on inputs and outputs. On the right, regular dropout PLUS recurrent dropout:
(Ignore the colour of the arrows in this case; in the paper they are making a further point of keeping the same dropout masks at each timestep)
The answer above highlights one of the recurrent dropout methods, but that method is NOT the one used by TensorFlow and Keras (see the TensorFlow doc).
Keras/TF implements the recurrent dropout method proposed by Semeniuta et al. Also, see the image below comparing the different recurrent dropout methods: the Gal and Ghahramani method mentioned in the answer above is in the second position, and the Semeniuta method is the rightmost.
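Whichever variant the backend implements, this is how the two arguments plus an extra output mask are wired up in practice; a minimal sketch assuming tf.keras:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.LSTM(64,
                dropout=0.3,             # drops units in the linear transformation of the inputs (x_t)
                recurrent_dropout=0.3,   # drops units in the linear transformation of the recurrent state
                return_sequences=True,
                input_shape=(None, 16)),
    layers.Dropout(0.3),                 # masks the LSTM outputs
    layers.Dense(1),
])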

Resources