Keras lstm and dense layer - keras

How is dense layer changing the output coming from LSTM layer? How come that from 50 shaped output from previous layer i get output of size 1 from dense layer that is used for prediction?
Lets say i have this basic model:
model = Sequential()
model.add(LSTM(50,input_shape=(60,1)))
model.add(Dense(1, activation="softmax"))
Is the Dense layer taking the values coming from previous layer and assigning the probablity(using softmax function) of each of the 50 inputs and then taking it out as an output?

No, Dense layers do not work like that, the input has 50-dimensions, and the output will have dimensions equal to the number of neurons, one in this case. The output is a weighted linear combination of the input plus a bias.
Note that with the softmax activation, it makes no sense to use it with a one neuron layer, as the softmax is normalized, the only possible output will be constant 1.0. That's probably now what you want.

Related

Dropout, Regularization and batch normalization

I have a couple of questions about LSTM layers in Keras library
In LSTM layer we have two kind of dropouts: dropout and recurrent-dropout. According to my understanding the first one will drop randomly some features from input (set them to zero) and the second one will do it on hidden units (features of h_t). Since we have different time steps in a LSTM network, is dropping applied seperately to each time step or only one time and will be the same for every step?
My second question is about regularizers in LSTM layer in keras. I know that for example the kernel regularizer will regularize weights corresponding to inputs. but we have different weights for inputs.
For example input gate, update gate and output gates use different weights for input (aslo different weights for h_(t-1)) . So will they be regularized in the same time ? What if I want to regularize only one of them? For example if I want to regularize only the weights used in the formula for input gate.
The last question is about activation functions in keras. In LSTM layer I have activation and recurrent activations. activation is tanh by default and I know in LSTM architecture tanh is used two times (for h_t and candidate of memory cell) and sigmoid is used 3 times (for gates). So does that mean if I change tanh (in LSTM layer in keras) to another function say Relu then it will change for both of h_t and memory cell candidate?
It would be perfect if any of those question could be answered. Thank you very much for your time.

Can I use a 3D input on a Keras Dense Layer?

As an exercise I need to use only dense layers to perform text classifications. I want to leverage words embeddings, the issue is that the dataset then is 3D (samples,words of sentence,embedding dimension). Can I input a 3D dataset into a dense layer?
Thanks
As stated in the keras documentation you can use 3D (or higher rank) data as input for a Dense layer but the input gets flattened first:
Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.
This means that if your input has shape (batch_size, sequence_length, dim), then the dense layer will first flatten your data to shape (batch_size * sequence_length, dim) and then apply a dense layer as usual. The output will have shape (batch_size, sequence_length, hidden_units). This is actually the same as applying a Conv1D layer with kernel size 1, and it might be more explicit to use a Conv1D layer instead of a Dense layer.

Keras Dense function

Can anyone explain me this:
model.add(Dense(units=64, activation='relu', input_dim=100))
Is it dense adding layers in the model and units in the number of neurons in the particular layer and what is reason behind the activation parameter?
Yes the through Dense you add an regular densely-connected layer to the network. And as you said units specifies the number of neurons in this layer. Maybe see https://keras.io/ for more information.
The activation parameter defines which activation function is used for the units in this layer. This function takes the sum of all inputs to the neuron as input and computes through that the output for the unit (which then goes into the next unit/layer). For 'relu' it would be basically f(x)=max(x,0) where x is the sum of all inputs and f(x) the output.

multi layer LSTM net with stateful=True

My question is does the this code make sense? And if this makes sense what should be the purpose?
model.add(LSTM(18, return_sequences=True,batch_input_shape=(batch_size,look_back,dim_x), stateful=True))
model.add(Dropout(0.3))
model.add(LSTM(50,return_sequences=False,stateful=False))
model.add(Dropout(0.3))
model.add(Dense(1, activation='linear'))
Because if my first LSTM layer returns my state from one batch to the next, why shouldn't do my second LSTM layer the same?
I'm having a hard time to understand the LSTM mechanics in Keras so I'm very thankful for any kind of help :)
And if you down vote this post could you tell me why in the commands? thanks.
Your program is a regression problem where your model consists of 2 lstm layers with 18 and 50 layers each and finally a dense layer to show the regression value.
LSTM requires a 3D input.Since the output of your first LSTM layer is going to the input for the second LSTM layer.The input of the Second LSTM layer should also be in 3D. so we set the retrun sequence as true in 1st as it will return a 3D output which can then be used as an input for the second LSTM.
Your second LSTMs value does not return a sequence because after the second LSTM you have a dense layer which does not need a 3D value as input.
[update]
In keras by default LSTM states are reset after each batch of training data,so if you don't want the states to be reset after each batch you can set the stateful=True. If LSTM is made stateful final state of a batch will be used as an initial state for the next batch.
You can later reset the states by calling reset_states()

LSTM with variable sequences & return full sequences

How can I set up a keras model such that the final LSTM layer outputs a prediction for each time step while having variable sequence lengths as input?
I'd then like to provide labels for each of the timesteps after a dense layer with linear activation.
When I try to add a reshape or a dense layer to the LSTM model that is returning the full sequence and has a masking layer to take care of variable sequence lengths, it says:
The reshape and the dense layers do not support masking.
Would this be possible to do?
You can use the TimeDistributed layer wrapper for this. This applies the layer you want to each timestep. In your case, you could also just use TimeDistributedDense.

Resources