Why return sequences in stacked RNNs? - keras

When stacking RNNs in Keras, it is mandatory to set the return_sequences parameter to True on every recurrent layer that feeds another recurrent layer.
For instance:
lstm1 = LSTM(1, return_sequences=True)(inputs1)
lstm2 = LSTM(1)(lstm1)
It is somewhat intuitive to preserve the dimensionality of the input space for each stacked RNN layer; however, I am not thoroughly convinced.
Can someone (mathematically) explain the reason?
Thanks.

The input shape for recurrent layers is:
(number_of_sequences, time_steps, input_features).
This is strictly required for recurrent layers, because recurrence is only possible when there are time steps.
Now, compare the "outputs" of the recurrent layers in each case:
with return_sequences=True - (number_of_sequences, time_steps, output_features)
with return_sequences=False - (number_of_sequences, output_features)
Without return_sequences=True, the time steps are eliminated, so the output cannot be fed into another recurrent layer: it has too few dimensions, and the most important one, time_steps, is missing.
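The shape argument can be checked without Keras. Below is a minimal plain-Python sketch of a toy recurrent layer; the names simple_rnn and shape, the fixed weights, and the tanh update are illustrative assumptions, not the Keras implementation. It only shows what happens to the time axis.

```python
import math

def simple_rnn(sequence, units, return_sequences=False):
    """Toy recurrent layer over ONE sequence, given as a list of feature lists."""
    n_features = len(sequence[0])   # raises TypeError if the time axis is missing
    w_in, w_rec = 0.1, 0.05         # hypothetical fixed weights shared by all connections
    h = [0.0] * units
    outputs = []
    for x_t in sequence:            # one iteration per time step
        pre = w_in * sum(x_t[i] for i in range(n_features)) + w_rec * sum(h)
        h = [math.tanh(pre)] * units
        outputs.append(h)
    # Either every intermediate state, or only the last one.
    return outputs if return_sequences else h

def shape(nested):
    """Shape of a rectangular nested list, e.g. (batch, time_steps, features)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# 2 sequences, 5 time steps, 3 input features
batch = [[[1.0, 2.0, 3.0] for _ in range(5)] for _ in range(2)]

with_seq = [simple_rnn(s, 4, return_sequences=True) for s in batch]
without_seq = [simple_rnn(s, 4) for s in batch]
print(shape(with_seq))      # (2, 5, 4)
print(shape(without_seq))   # (2, 4)

# A second recurrent layer can consume the 3D output...
stacked = [simple_rnn(s, 1) for s in with_seq]
print(shape(stacked))       # (2, 1)

# ...but not the 2D one, because there is nothing left to iterate over in time.
try:
    [simple_rnn(s, 1) for s in without_seq]
    stacking_failed = False
except TypeError:
    stacking_failed = True
print(stacking_failed)      # True
```

The second layer needs one feature vector per time step; once the first layer returns only its last state, each sample is a flat vector and the loop over time has no steps to run on.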

Related

What is the difference between density of input layer and input_dim in Keras library?

I have a question about the terms of MLP in Keras.
What does the density of a layer mean?
Is it the same as the number of neurons? If it is, what is the role of input_dim?
I have never heard of the "density" of a layer in the context of vanilla feed-forward networks. I would assume it refers to the number of neurons, but it really depends on context.
An Input layer with a given dimension and a first hidden layer with the input_dim argument are equivalent ways to specify the input in Keras.

Keras lstm and dense layer

How does the Dense layer change the output coming from the LSTM layer? How come that from an output of size 50 from the previous layer I get an output of size 1 from the Dense layer that is used for prediction?
Let's say I have this basic model:
model = Sequential()
model.add(LSTM(50,input_shape=(60,1)))
model.add(Dense(1, activation="softmax"))
Is the Dense layer taking the values coming from the previous layer, assigning a probability (using the softmax function) to each of the 50 inputs, and then returning that as the output?
No, Dense layers do not work like that. The input has 50 dimensions, and the output has as many dimensions as the layer has neurons, one in this case. Each output is a weighted linear combination of the inputs plus a bias, passed through the activation function.
Note that softmax makes no sense on a one-neuron layer: because softmax normalizes its outputs to sum to 1, the only possible output is the constant 1.0. That's probably not what you want.
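Both points can be reproduced in a few lines of plain Python. This is a minimal sketch, not Keras code: the helper names dense and softmax, the weight values, and the stand-in LSTM output are all made up for illustration.

```python
import math

def dense(inputs, weights, biases):
    """One output per neuron: weighted sum of all inputs plus that neuron's bias."""
    return [sum(x * w for x, w in zip(inputs, neuron_w)) + b
            for neuron_w, b in zip(weights, biases)]

def softmax(logits):
    """Numerically stable softmax; the outputs always sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

lstm_output = [0.01 * i for i in range(50)]  # stand-in for the 50 LSTM features
weights = [[0.02] * 50]                      # one neuron with 50 hypothetical weights
biases = [0.1]

logit = dense(lstm_output, weights, biases)
print(len(logit))       # 1: fifty inputs collapse into a single number
print(softmax(logit))   # [1.0]

# A completely different input still gives 1.0 after softmax.
other = softmax(dense([-3.0] * 50, weights, biases))
print(other)            # [1.0]
```

With a single neuron there is nothing to normalize against, so the softmax output carries no information; a sigmoid (for a probability) or a linear activation (for regression) is the usual choice for a one-unit output.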

multi layer LSTM net with stateful=True

My question is: does this code make sense? And if it does, what would its purpose be?
model.add(LSTM(18, return_sequences=True,batch_input_shape=(batch_size,look_back,dim_x), stateful=True))
model.add(Dropout(0.3))
model.add(LSTM(50,return_sequences=False,stateful=False))
model.add(Dropout(0.3))
model.add(Dense(1, activation='linear'))
Because if my first LSTM layer carries its state from one batch to the next, why shouldn't my second LSTM layer do the same?
I'm having a hard time understanding the LSTM mechanics in Keras, so I'm very thankful for any kind of help :)
And if you downvote this post, could you tell me why in the comments? Thanks.
Your model addresses a regression problem: it consists of two LSTM layers with 18 and 50 units respectively, followed by a Dense layer that outputs the regression value.
An LSTM requires a 3D input. Since the output of your first LSTM layer is the input to the second LSTM layer, that output must also be 3D. That is why return_sequences is set to True on the first layer: it then returns a 3D output that the second LSTM can consume.
Your second LSTM does not return a sequence because it is followed by a Dense layer, which does not need a 3D input.
[update]
In Keras, LSTM states are reset after each batch of training data by default, so if you don't want the states to be reset after each batch, set stateful=True. When an LSTM is stateful, the final state of each sample in a batch is used as the initial state for the sample at the same index in the next batch.
You can later reset the states by calling reset_states().
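The effect of stateful can be illustrated with a toy recurrent layer in plain Python. This is a sketch under simplifying assumptions, not the Keras implementation: the class ToyRecurrentLayer, its scalar state, and its fixed weights are hypothetical, and it tracks a single sequence rather than one state per sample in the batch.

```python
import math

class ToyRecurrentLayer:
    """Hypothetical stand-in for a recurrent layer with a single scalar state."""

    def __init__(self, stateful=False):
        self.stateful = stateful
        self.state = 0.0

    def reset_states(self):
        self.state = 0.0

    def process_batch(self, sequence):
        # Stateless: always start from zero. Stateful: continue from the last batch.
        h = self.state if self.stateful else 0.0
        for x_t in sequence:
            h = math.tanh(0.5 * x_t + 0.5 * h)
        self.state = h
        return h

chunk = [1.0, 2.0]

stateless = ToyRecurrentLayer(stateful=False)
a1 = stateless.process_batch(chunk)
a2 = stateless.process_batch(chunk)
print(a1 == a2)   # True: the second batch starts from a fresh state

stateful = ToyRecurrentLayer(stateful=True)
b1 = stateful.process_batch(chunk)
b2 = stateful.process_batch(chunk)
print(b1 == b2)   # False: the second batch starts from the first batch's final state

stateful.reset_states()
b3 = stateful.process_batch(chunk)
print(b1 == b3)   # True: after a reset the layer behaves like a fresh one
```

In this picture, a stateful first layer followed by a stateless second layer means only the first layer remembers anything across batch boundaries; the second layer restarts from zero on every batch.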

Keras: the difference between LSTM dropout and LSTM recurrent dropout

From the Keras documentation:
dropout: Float between 0 and 1. Fraction of the units to drop for the
linear transformation of the inputs.
recurrent_dropout: Float between 0 and 1. Fraction of the units to
drop for the linear transformation of the recurrent state.
Can anyone point to where on the image below each dropout happens?
I suggest taking a look at (the first part of) this paper. Regular dropout is applied on the inputs and/or the outputs, meaning the vertical arrows from x_t and to h_t. In your case, if you add it as an argument to your layer, it will mask the inputs; you can add a Dropout layer after your recurrent layer to mask the outputs as well. Recurrent dropout masks (or "drops") the connections between the recurrent units; that would be the horizontal arrows in your picture.
This picture is taken from the paper above. On the left, regular dropout on inputs and outputs. On the right, regular dropout PLUS recurrent dropout:
(Ignore the colour of the arrows in this case; in the paper they are making a further point of keeping the same dropout masks at each timestep)
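The distinction in the documentation excerpt (inputs versus recurrent state) can be sketched for a single step of a plain tanh RNN. This is an illustrative toy, not how Keras implements its LSTM: make_mask, rnn_step, the weights, and the use of inverted-dropout scaling are all assumptions made for the example.

```python
import math
import random

def make_mask(size, rate, rng):
    """Inverted-dropout mask: 0.0 with probability `rate`, else 1/(1-rate). Needs rate < 1."""
    keep = 1.0 - rate
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(size)]

def rnn_step(x_t, h_prev, w_in, w_rec, input_mask, recurrent_mask):
    """One time step: h_t = tanh(W_in . drop(x_t) + W_rec . drop(h_prev))."""
    # `dropout` acts on the input x_t (the vertical arrow into the cell).
    x_dropped = [x * m for x, m in zip(x_t, input_mask)]
    # `recurrent_dropout` acts on the previous state (the horizontal arrow).
    h_dropped = [h * m for h, m in zip(h_prev, recurrent_mask)]
    units = len(h_prev)
    return [
        math.tanh(
            sum(x_dropped[i] * w_in[i][j] for i in range(len(x_t)))
            + sum(h_dropped[k] * w_rec[k][j] for k in range(units))
        )
        for j in range(units)
    ]

rng = random.Random(0)                           # seeded so runs are repeatable
x_t = [1.0, 2.0, 3.0]                            # 3 input features
h_prev = [0.5, -0.5]                             # 2 recurrent units
w_in = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]      # hypothetical input weights
w_rec = [[0.1, 0.0], [0.0, 0.1]]                 # hypothetical recurrent weights

input_mask = make_mask(len(x_t), 0.5, rng)       # plays the role of dropout=0.5
recurrent_mask = make_mask(len(h_prev), 0.5, rng)  # plays the role of recurrent_dropout=0.5
h_t = rnn_step(x_t, h_prev, w_in, w_rec, input_mask, recurrent_mask)
print(input_mask, recurrent_mask, h_t)
```

Setting one mask to all ones and the other to all zeros isolates each path: with the input fully dropped, the new state depends only on h_prev, and with the recurrent path fully dropped, it depends only on x_t.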
The answer above highlights one recurrent dropout method, but that one is NOT the one used by TensorFlow and Keras (see the TensorFlow documentation).
Keras/TF refers to the recurrent dropout method proposed by Semeniuta et al. Also, see the image below comparing different recurrent dropout methods: the Gal and Ghahramani method mentioned in the answer above is in second position, and the Semeniuta method is the rightmost.

LSTM with variable sequences & return full sequences

How can I set up a keras model such that the final LSTM layer outputs a prediction for each time step while having variable sequence lengths as input?
I'd then like to provide labels for each of the timesteps after a dense layer with linear activation.
When I try to add a reshape or a dense layer to the LSTM model that is returning the full sequence and has a masking layer to take care of variable sequence lengths, it says:
The reshape and the dense layers do not support masking.
Would this be possible to do?
You can use the TimeDistributed layer wrapper for this; it applies the wrapped layer to each timestep. In older Keras versions (1.x) you could also just use TimeDistributedDense, which was removed in Keras 2 in favour of TimeDistributed(Dense(...)).
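Conceptually, the wrapper applies one and the same layer independently at every time step. The plain-Python sketch below shows only that idea; time_distributed, dense_unit, and the way padded steps are marked with None are hypothetical simplifications, not how Keras handles masks internally.

```python
def dense_unit(features, weights, bias):
    """Single linear output: weighted sum of the features plus a bias."""
    return sum(x * w for x, w in zip(features, weights)) + bias

def time_distributed(layer_fn, sequence, mask=None):
    """Apply layer_fn to every time step; masked (padded) steps yield None."""
    if mask is None:
        mask = [True] * len(sequence)
    return [layer_fn(x_t) if keep else None for x_t, keep in zip(sequence, mask)]

# Stand-in for an LSTM output with return_sequences=True: 3 time steps, 2 features.
lstm_outputs = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
mask = [True, True, False]   # the last step is padding

# The same weights are reused at every step (hypothetical values).
per_step = time_distributed(
    lambda x_t: dense_unit(x_t, weights=[1.0, 2.0], bias=0.5),
    lstm_outputs,
    mask,
)
print(per_step)   # one prediction per real time step, None for the padded one
```

Because the output keeps one entry per time step, labels can be supplied per time step as well, which is what the question asks for.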
