multi layer LSTM net with stateful=True - keras

My question is: does this code make sense? And if it makes sense, what would be its purpose?
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(18, return_sequences=True, batch_input_shape=(batch_size, look_back, dim_x), stateful=True))
model.add(Dropout(0.3))
model.add(LSTM(50, return_sequences=False, stateful=False))
model.add(Dropout(0.3))
model.add(Dense(1, activation='linear'))
Because if my first LSTM layer carries its state from one batch to the next, why shouldn't my second LSTM layer do the same?
I'm having a hard time understanding the LSTM mechanics in Keras, so I'm very thankful for any kind of help :)
And if you downvote this post, could you tell me why in the comments? Thanks.

Your program is a regression problem where your model consists of two LSTM layers with 18 and 50 units respectively, followed by a dense layer that outputs the regression value.
An LSTM requires a 3D input. Since the output of your first LSTM layer goes into the input of the second LSTM layer, the input of the second LSTM layer must also be 3D. So we set return_sequences=True in the first layer, as it then returns a 3D output which can be used as the input for the second LSTM.
Your second LSTM does not return a sequence, because after it you have a dense layer which does not need a 3D input.
[update]
In Keras, LSTM states are reset after each batch of training data by default, so if you don't want the states to be reset after each batch you can set stateful=True. If an LSTM is made stateful, the final state of a batch will be used as the initial state for the next batch.
You can later reset the states by calling reset_states().
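To make the stateful behaviour concrete, here is a minimal sketch (not from the original post; the batch size, sequence length, feature count and layer size are made-up values): state is carried from one batch to the next within an epoch, and reset manually once the whole sequence has been seen.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, look_back, dim_x = 32, 10, 4

model = Sequential()
model.add(LSTM(18, batch_input_shape=(batch_size, look_back, dim_x), stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')

# dummy data; for stateful training the number of samples must be a multiple of batch_size
X = np.random.rand(batch_size * 8, look_back, dim_x)
y = np.random.rand(batch_size * 8, 1)

for epoch in range(5):
    # shuffle=False keeps the batches in order, so the final state of batch i
    # really is a sensible initial state for batch i+1
    model.fit(X, y, batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()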

Related

Dropout, Regularization and batch normalization

I have a couple of questions about LSTM layers in the Keras library.
In the LSTM layer we have two kinds of dropout: dropout and recurrent_dropout. According to my understanding, the first one randomly drops some features of the input (sets them to zero) and the second one does the same for the hidden units (the features of h_t). Since we have different time steps in an LSTM network, is the dropping applied separately at each time step, or only once, so that the same mask is used for every step?
My second question is about the regularizers in the Keras LSTM layer. I know that, for example, the kernel regularizer regularizes the weights corresponding to the inputs, but we have different weights for the inputs.
For example, the input gate, update gate and output gate use different weights for the input (and also different weights for h_(t-1)). So will they all be regularized at the same time? What if I want to regularize only one of them, for example only the weights used in the formula for the input gate?
The last question is about activation functions in Keras. In the LSTM layer I have activation and recurrent_activation. activation is tanh by default, and I know that in the LSTM architecture tanh is used twice (for h_t and for the memory cell candidate) while sigmoid is used three times (for the gates). So does that mean that if I change tanh (in the Keras LSTM layer) to another function, say ReLU, it will change for both h_t and the memory cell candidate?
It would be perfect if any of these questions could be answered. Thank you very much for your time.
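For reference, the sketch below just lists the LSTM constructor arguments these questions are about; the unit count and regularization strengths are made-up values. As far as I know, Keras stores the input kernel and the recurrent kernel each as one matrix covering all the gates, so these settings apply to all gates at once rather than per gate.
from keras.layers import LSTM
from keras import regularizers

layer = LSTM(
    64,
    dropout=0.2,                # dropout applied to the inputs x_t
    recurrent_dropout=0.2,      # dropout applied to the recurrent state h_(t-1)
    kernel_regularizer=regularizers.l2(1e-4),     # regularizes the input weights
    recurrent_regularizer=regularizers.l2(1e-4),  # regularizes the recurrent weights
    activation='tanh',               # used for the cell candidate and the output
    recurrent_activation='sigmoid',  # used for the gates
)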

Keras lstm and dense layer

How does the Dense layer change the output coming from the LSTM layer? How come that from the 50-dimensional output of the previous layer I get an output of size 1 from the Dense layer, which is then used for prediction?
Let's say I have this basic model:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(50, input_shape=(60, 1)))
model.add(Dense(1, activation="softmax"))
Is the Dense layer taking the values coming from the previous layer, assigning a probability (using the softmax function) to each of the 50 inputs, and then giving that out as the output?
No, Dense layers do not work like that: the input has 50 dimensions, and the output will have a dimension equal to the number of neurons, one in this case. The output is a weighted linear combination of the inputs plus a bias.
Note that it makes no sense to use the softmax activation with a one-neuron layer: since the softmax is normalized, the only possible output is a constant 1.0. That's probably not what you want.
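A quick numerical sketch of that point (the numbers are random and purely illustrative): the Dense(1) output is just a weighted sum of its inputs plus a bias, and softmax over a single value always normalizes to 1.0.
import numpy as np

x = np.random.rand(50)      # stands in for the 50-dimensional LSTM output
w = np.random.rand(50, 1)   # Dense kernel
b = np.random.rand(1)       # Dense bias

z = x.dot(w) + b            # the Dense layer's pre-activation output, shape (1,)

softmax = np.exp(z) / np.exp(z).sum()
print(softmax)              # always [1.], no matter what z is

# For a single output you would typically use activation='linear' (regression)
# or 'sigmoid' (binary classification) instead.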

I need to understand this LSTM and Masking layers result

I'm new to Keras LSTMs. Could you please explain this model.summary() from Rasa Core training?
(image: model summary after training)
Also, what is the Masking layer doing and what does the value -1 in it mean?
A Masking layer is meant to "ignore steps" in sequences.
Your LSTM is working with sequences of 5 steps and 42 features per step.
If all features in a step have the same value defined in Masking (-1 in the example), that step will be ignored during training.
The idea is to simulate variable length sequences.
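A minimal sketch of that setup (the (5, 42) input shape and the -1 mask value come from the post above; the other layer sizes are assumptions):
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

model = Sequential()
model.add(Masking(mask_value=-1.0, input_shape=(5, 42)))  # skip steps where all 42 features are -1
model.add(LSTM(32))
model.add(Dense(10, activation='softmax'))
model.summary()   # prints the per-layer output shapes and parameter counts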
Not sure exactly what you don't understand, but model.summary() prints a summary representation of your model (keras.io).
It lists all the layers used in the given model with their respective sizes.
This particular model obviously starts with a masking layer for input sequences (I guess because of padding) and is followed by the simplest LSTM model possible.

Inner workings of Keras LSTM

I am working on a multi-class classification task: the goal is to identify what is the correct language of origin of a certain surname. For this, I am using a Keras LSTM.
So far, I have only worked with PyTorch and I am very surprised by the "black box" character of Keras. For this classification task, my understanding is that I need to retrieve the output of the last time step for a given input sequence in the LSTM and then apply the softmax on it to get the probability distribution over all classes.
Interestingly, without me explicitly telling it to do so, the LSTM seems to automatically do the right thing and chooses the last time step's output, and not e.g. the hidden state, to apply the softmax on (good training & validation results so far). How is that possible? Does the choice of the appropriate loss function, categorical_crossentropy, indicate to the model that it should use the last time step's output to do the classification?
Code:
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.optimizers import Adam
from keras import regularizers

model = Sequential()
model.add(Dense(100, input_shape=(max_len, len(alphabet)), kernel_regularizer=regularizers.l2(0.00001)))
model.add(Dropout(0.85))
model.add(LSTM(100, input_shape=(100,)))
model.add(Dropout(0.85))
model.add(Dense(num_output_classes, activation='softmax'))
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=1e-6)
model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
history = model.fit(train_data, train_labels,
                    epochs=5000,
                    batch_size=num_train_examples,
                    validation_data=(valid_data, valid_labels))
No, returning the last time step's output is just what every Keras RNN layer does by default. See the documentation for return_sequences, which causes it to return every time step's output instead (this is necessary for stacking RNN layers). There is no automatic intuition based on what kinds of layers you're hooking together; you just got what you wanted by default, presumably because the designers figured that to be the most common case.
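A quick way to see this is to compare the output shapes with and without return_sequences (the timestep count and feature size below are arbitrary):
from keras.models import Sequential
from keras.layers import LSTM

last_only = Sequential([LSTM(100, input_shape=(20, 26))])
print(last_only.output_shape)      # (None, 100)      -> only the last step's output

full_seq = Sequential([LSTM(100, input_shape=(20, 26), return_sequences=True)])
print(full_seq.output_shape)       # (None, 20, 100)  -> one output per time step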

LSTM with variable sequences & return full sequences

How can I set up a keras model such that the final LSTM layer outputs a prediction for each time step while having variable sequence lengths as input?
I'd then like to provide labels for each of the timesteps after a dense layer with linear activation.
When I try to add a reshape or a dense layer to the LSTM model that is returning the full sequence and has a masking layer to take care of variable sequence lengths, it says:
The reshape and the dense layers do not support masking.
Would this be possible to do?
You can use the TimeDistributed layer wrapper for this. This applies the layer you want to each timestep. In your case, you could also just use TimeDistributedDense.
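A minimal sketch of that suggestion (all sizes are placeholders): a Masking layer for padded, variable-length sequences, an LSTM that returns the full sequence, and TimeDistributed wrapping a linear Dense layer so that each timestep gets its own prediction.
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, TimeDistributed

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(None, 8)))   # None = variable sequence length
model.add(LSTM(64, return_sequences=True))                  # output shape: (batch, timesteps, 64)
model.add(TimeDistributed(Dense(1, activation='linear')))   # output shape: (batch, timesteps, 1)
model.compile(loss='mse', optimizer='adam')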
