Dropout in custom LSTM in pytorch - pytorch

I have built a custom peephole LSTM, and I want to imitate the dropout behavior of the built-in nn.LSTM. How do I add dropout the way the initialization nn.LSTM(input_size, hidden_size, dropout=0.3) does? My idea is to apply an ordinary dropout just before returning the output, like this:
# init method
self.dropout = nn.Dropout(0.3)
# forward method
hidden_seq = self.dropout(hidden_seq)
return hidden_seq, (h_t, c_t)
I just want to make sure that this is the right way. If not, what should I do?

nn.LSTM(..., dropout=0.3) applies a Dropout layer to the outputs of each LSTM layer except the last one. You get multiple stacked layers by passing num_layers > 1. If you want dropout on the output of the final layer (or if the LSTM has only one layer), you have to add it yourself, as you are doing now.
If you want to replicate what the LSTM dropout does (which only takes effect with multiple layers), you can stack LSTM layers manually and add a dropout layer between them.
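A minimal sketch of the manual stacking described above. The class name StackedLSTM is hypothetical, and nn.LSTM stands in for the custom peephole layer, assuming that layer returns (hidden_seq, (h_t, c_t)) like nn.LSTM does:

```python
import torch
import torch.nn as nn


class StackedLSTM(nn.Module):
    """Stack single-layer LSTMs with dropout between layers, not after the last."""

    def __init__(self, input_size, hidden_size, num_layers=2, dropout=0.3):
        super().__init__()
        in_sizes = [input_size] + [hidden_size] * (num_layers - 1)
        # nn.LSTM is a stand-in here; swap in the custom peephole LSTM.
        self.layers = nn.ModuleList(
            [nn.LSTM(in_size, hidden_size) for in_size in in_sizes]
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (seq_len, batch, input_size), the default nn.LSTM layout
        states = []
        last = len(self.layers) - 1
        for i, layer in enumerate(self.layers):
            x, state = layer(x)
            states.append(state)
            if i < last:
                x = self.dropout(x)  # skip dropout on the final layer's output
        return x, states


model = StackedLSTM(input_size=8, hidden_size=16, num_layers=3, dropout=0.3)
seq = torch.randn(5, 4, 8)  # seq_len=5, batch=4, features=8
out, states = model(seq)
print(out.shape)  # torch.Size([5, 4, 16])
```

Because self.dropout is an nn.Dropout module, calling model.eval() disables it, just as with the built-in dropout argument.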

Related

Find top layers for a fine-tuned model

I want to use a fine-tuned model based on MobileNetV2 (pre-trained, from Keras), but I need to add top layers in order to classify my images into 2 classes. How do I choose the "architecture" of the layers I need?
In some examples, people use an SVM classifier or a series of Dense layers with a specific number of neurons as top layers.
The following default code works:
self.base_model = base_model
x = self.base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
Is there any methodology for finding the best solution?
I'd recommend adding either Dropout or BatchNormalization. A Dense layer overfits easily because it has so many parameters. Both layers regularize the model well. GlobalAveragePooling2D is a good choice because it also acts as a regularizer itself.
I'd also suggest that, for a binary classification problem, you change the output layer to Dense(1, activation='sigmoid') so that it predicts only P(class1); you can then calculate P(class2) as 1 - P(class1). The loss to use in this case is binary_crossentropy instead of categorical_crossentropy.
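To see why one sigmoid unit is enough for two classes, here is a small plain-Python check (not Keras code). The logit value z is made up for illustration, and the helper functions are written out by hand rather than taken from any library:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_crossentropy(y_true, p):
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_crossentropy(one_hot, probs):
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


z = 0.8  # hypothetical logit of the single output unit
p_class1 = sigmoid(z)
p_class2 = 1.0 - p_class1

# A two-unit softmax over the logits [z, 0] yields the same distribution.
probs = softmax([z, 0.0])

bce = binary_crossentropy(1, p_class1)
cce = categorical_crossentropy([1, 0], probs)
print(p_class1, probs[0])
print(bce, cce)
```

Both pairs of printed values agree up to floating-point error, so the single-unit head loses no expressive power while using fewer weights.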

Keras lstm and dense layer

How does a Dense layer change the output coming from an LSTM layer? How is it that from an output of size 50 from the previous layer I get an output of size 1 from the Dense layer used for prediction?
Let's say I have this basic model:
model = Sequential()
model.add(LSTM(50,input_shape=(60,1)))
model.add(Dense(1, activation="softmax"))
Is the Dense layer taking the values coming from the previous layer, assigning a probability (using the softmax function) to each of the 50 inputs, and then returning that as the output?
No, Dense layers do not work like that. The input has 50 dimensions, and the output has as many dimensions as the layer has neurons, one in this case. Before the activation is applied, each output is a weighted linear combination of the inputs plus a bias.
Note that softmax makes no sense on a one-neuron layer: because softmax normalizes across the layer's outputs, the only possible output is a constant 1.0. That's probably not what you want.
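The arithmetic can be sketched in plain Python. The random values below are placeholders for the LSTM output and the Dense weights, not anything taken from a trained model:

```python
import math
import random

random.seed(0)

x = [random.uniform(-1, 1) for _ in range(50)]  # LSTM output for one sample
w = [random.uniform(-1, 1) for _ in range(50)]  # Dense kernel, shape (50, 1)
b = 0.1                                         # Dense bias, shape (1,)

# One neuron: 50 inputs are reduced to a single pre-activation value.
z = sum(wi * xi for wi, xi in zip(w, x)) + b


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Softmax over a single value normalizes it against itself.
print(z)
print(softmax([z]))  # [1.0], whatever z is
```

For a single output, an activation that does not normalize across units (for example linear for regression or sigmoid for a binary label) avoids this collapse.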

How to use MC Dropout on a variational dropout LSTM layer on keras?

I'm currently trying to set up an LSTM recurrent neural network with Keras (TensorFlow backend).
I would like to use variational dropout with MC Dropout on it.
I believe that variational dropout is already implemented through the "recurrent_dropout" option of the LSTM layer, but I can't find any way to set a "training" flag to true, as you can with a classical Dropout layer.
This is quite easy in Keras. First, you need to define a function that takes both the model input and the learning_phase:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[-1].output])
For a Functional API model with multiple inputs/outputs, note that model.inputs and model.outputs are already lists, so you can use:
f = K.function(model.inputs + [K.learning_phase()],
model.outputs)
Then you can call this function like f([input, 1]). This tells Keras to enable the learning phase during the call, so Dropout is executed. You can then call the function multiple times and combine the predictions to estimate uncertainty.
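A sketch of the "call multiple times and combine" step, using NumPy only. The stochastic_forward function is a hypothetical stand-in for f([x, 1])[0]: a tiny fixed-weight network that resamples its dropout mask on every call. The weights, the dropout rate and the number of passes T are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the Keras function f with learning phase 1.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 1))
RATE = 0.5


def stochastic_forward(x):
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) >= RATE   # keep each unit with prob 1 - RATE
    h = h * mask / (1.0 - RATE)          # inverted dropout scaling
    return h @ W2


x = rng.normal(size=(3, 4))  # 3 samples, 4 features
T = 100                      # number of stochastic forward passes

# With the real model this would be: np.stack([f([x, 1])[0] for _ in range(T)])
samples = np.stack([stochastic_forward(x) for _ in range(T)])  # (T, 3, 1)

mean = samples.mean(axis=0)  # predictive mean per sample
std = samples.std(axis=0)    # spread across passes, a rough uncertainty estimate
print(mean.shape, std.shape)
```

The mean serves as the prediction and the standard deviation as a simple per-sample uncertainty score. More passes give a smoother estimate at a proportionally higher inference cost.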
The source code for "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (2015) is located at https://github.com/yaringal/DropoutUncertaintyExps/blob/master/net/net.py. It also uses Keras, and the code is quite easy to understand. The Dropout layers are used without the Sequential API so that the training parameter can be passed. This is a different approach from the one Matias suggested:
inter = Dropout(dropout_rate)(inter, training=True)

Keras Model - Functional API - adding layers to existing model

I am trying to learn to use the Keras Model API for modifying a trained model for the purpose of fine-tuning it on the go:
A very basic model:
inputs = Input((x_train.shape[1:]))
x = BatchNormalization(axis=1)(inputs)
x = Flatten()(x)
outputs = Dense(10, activation='softmax')(x)
model1 = Model(inputs, outputs)
model1.compile(optimizer=Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
The architecture of it is
InputLayer -> BatchNormalization -> Flatten -> Dense
After I do some training batches on it I want to add some extra Dense layer between the Flatten one and the outputs:
x = Dense(32,activation='relu')(model1.layers[-2].output)
outputs = model1.layers[-1](x)
However, when I run it, I get this:
ValueError: Input 0 is incompatible with layer dense_1: expected axis -1 of input shape to have value 784 but got shape (None, 32)
Could someone please explain what is going on, and how (or whether) I can add layers to an already trained model?
Thank you
A Dense layer is made strictly for a certain input dimension. That dimension cannot be changed after you define it (it would need a different number of weights).
So, if you really want to add layers before a Dense layer that is already in use, you need to make sure that the output of the last new layer has the same shape as the Flatten output. (The error says it expects 784, so your last new Dense layer needs 784 units.)
Another approach
Since you're adding intermediate layers, it's pointless to keep the last layer: it was trained specifically for a certain input, if you change the input, then you need to train it again.
Well... since you need to train it again anyway, why keep it? Just create a new one that will be suited to the shapes of your new previous layers.
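The shape constraint behind the error can be illustrated with NumPy matrices standing in for the Dense kernels. This is not Keras code, and all values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

flat = rng.normal(size=(2, 784))            # Flatten output for a batch of 2
old_kernel = rng.normal(size=(784, 10))     # trained output Dense: expects 784 inputs

hidden_kernel = rng.normal(size=(784, 32))  # new intermediate Dense(32)
hidden = np.maximum(flat @ hidden_kernel, 0.0)  # shape (2, 32)

try:
    hidden @ old_kernel  # (2, 32) times (784, 10): shapes do not match
    reused = True
except ValueError:
    reused = False

new_kernel = rng.normal(size=(32, 10))      # fresh output layer sized for 32 inputs
logits = hidden @ new_kernel                # shape (2, 10)
print(reused, logits.shape)
```

In Keras terms, this corresponds to creating a new Dense(10, activation='softmax') on top of the new Dense(32) instead of calling model1.layers[-1] again.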

LSTM with variable sequences & return full sequences

How can I set up a keras model such that the final LSTM layer outputs a prediction for each time step while having variable sequence lengths as input?
I'd then like to provide labels for each of the timesteps after a dense layer with linear activation.
When I try to add a reshape or a dense layer to the LSTM model that is returning the full sequence and has a masking layer to take care of variable sequence lengths, it says:
The reshape and the dense layers do not support masking.
Would this be possible to do?
You can use the TimeDistributed layer wrapper for this. It applies the layer you wrap to each timestep. In older Keras versions you could also just use TimeDistributedDense; that layer was removed in Keras 2, where TimeDistributed(Dense(...)) is the equivalent.
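What "applies the layer to each timestep" means can be sketched with NumPy. This is not Keras code; the shapes (batch of 2, 7 timesteps, 50 LSTM units) are assumptions chosen for illustration, and masking is left out:

```python
import numpy as np

rng = np.random.default_rng(0)

seq = rng.normal(size=(2, 7, 50))  # (batch, timesteps, units) from a sequence-returning LSTM
kernel = rng.normal(size=(50, 1))  # a single Dense(1) kernel shared by all timesteps
bias = np.zeros(1)

# Apply the same dense transform to each timestep separately...
per_step = np.stack(
    [seq[:, t, :] @ kernel + bias for t in range(seq.shape[1])], axis=1
)

# ...which matches one batched matmul over the last axis.
batched = seq @ kernel + bias

print(per_step.shape)  # (2, 7, 1)
```

The output keeps the time axis, so each timestep gets its own prediction and can be matched against a per-timestep label.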
