Can anyone explain this to me:
model.add(Dense(units=64, activation='relu', input_dim=100))
Does Dense add a layer to the model, is units the number of neurons in that layer, and what is the purpose of the activation parameter?

Yes, through Dense you add a regular densely-connected layer to the network. And as you said, units specifies the number of neurons in this layer. See the Keras documentation for more information.
The activation parameter defines which activation function is used for the units in this layer. This function takes the weighted sum of all inputs to the neuron (plus the bias) as input and computes from it the output of the unit (which then goes into the next unit/layer). For 'relu' this is f(x)=max(x,0), where x is that weighted sum and f(x) is the output.
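As a minimal sketch of what a single unit of a Dense layer computes, assuming the usual formulation (weighted sum of the inputs plus a bias, followed by the activation), here is a plain-Python version. The names relu and dense_unit and all numbers are made up for illustration and are not part of Keras.

```python
def relu(x):
    """ReLU activation: f(x) = max(x, 0)."""
    return max(x, 0.0)


def dense_unit(inputs, weights, bias, activation=relu):
    """Output of one neuron: activation(sum_i w_i * x_i + b)."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)


# Illustrative values only (hypothetical, not taken from the question).
inputs = [1.0, -2.0, 3.0]
weights = [0.5, 1.0, -1.0]

# Weighted sum is 0.5 - 2.0 - 3.0 = -4.5 before the bias is added.
positive = dense_unit(inputs, weights, bias=6.0)  # pre-activation 1.5 passes through
negative = dense_unit(inputs, weights, bias=0.0)  # pre-activation -4.5 is clipped to 0
```

A Dense layer with units=64 would hold 64 such units, each with its own weights and bias, all reading the same inputs.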


PyTorch: how to use a linear activation function

In Keras, I can create any network layer with a linear activation function as follows (taking a fully-connected layer as an example):
model.add(keras.layers.Dense(outs, input_shape=(160,), activation='linear'))
But I can't find the linear activation function in the PyTorch documentation. ReLU is not suitable, because there are negative values in my sample. How do I create a layer with a linear activation function in PyTorch?
If you take a look at the Keras documentation, you will see that tf.keras.layers.Dense's activation='linear' corresponds to the function a(x) = x, which means no non-linearity.
So in PyTorch, you just define the linear function without adding any activation layer:
torch.nn.Linear(160, outs)
activation='linear' is equivalent to no activation at all.
As can be seen here, it is also called "passthrough", meaning that it does nothing.
So in PyTorch you can simply not apply any activation at all to get the same behavior.
However, as already pointed out by #Minsky, a hidden layer without a real, i.e. non-linear, activation is useless. Two linear layers stacked without a non-linearity collapse into a single linear map, so the extra layer only changes the effective weights, which training does anyway.
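The point about hidden layers without a non-linearity can be checked numerically. The following is a small plain-Python sketch (no PyTorch involved; linear_layer, compose and all weights are hypothetical names and values chosen for illustration) showing that two stacked linear layers give the same result as one linear layer with W = W2 W1 and b = W2 b1 + b2.

```python
def linear_layer(x, weights, bias):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def compose(w1, b1, w2, b2):
    """Fold layer 2 applied after layer 1 into one layer: W = W2 W1, b = W2 b1 + b2."""
    cols = list(zip(*w1))  # columns of W1
    w = [[sum(r * c for r, c in zip(row, col)) for col in cols] for row in w2]
    b = [sum(r * v for r, v in zip(row, b1)) + b2_i for row, b2_i in zip(w2, b2)]
    return w, b


# Illustrative weights: 2 inputs -> 2 hidden units -> 1 output.
w1, b1 = [[1.0, 2.0], [0.0, -1.0]], [0.5, 1.0]
w2, b2 = [[3.0, 1.0]], [-2.0]

x = [2.0, -1.0]
stacked = linear_layer(linear_layer(x, w1, b1), w2, b2)  # two layers, no activation
w, b = compose(w1, b1, w2, b2)
collapsed = linear_layer(x, w, b)                        # one equivalent layer
```

Here stacked and collapsed are the same list, which is why a linear "activation" is usually only kept on the output layer, for example for regression targets that can be negative.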
As already answered, you don't need a linear activation layer in PyTorch. But if you need to include one, you can write a custom module that passes its input through unchanged, as follows.
import torch

class linear(torch.nn.Module):
    # a linear (identity) activation function based on y = x
    def forward(self, output):
        return output
Then you can call it like any other activation function.
linear()(torch.tensor([1,2,3])) == torch.nn.ReLU()(torch.tensor([1,2,3]))

What is the difference between density of input layer and input_dim in Keras library?

I have a question about MLP terminology in Keras.
What does the density of a layer mean?
Is it the same as the number of neurons? If it is, what is the role of input_dim?
I have never heard of the "density" of a layer in the context of vanilla feed-forward networks. I would assume it refers to the number of neurons, but really it depends on context.
An Input layer with a given dimension and a first hidden layer with the input_dim argument are two equivalent ways to specify the input in Keras.

Dropout, Regularization and batch normalization

I have a couple of questions about LSTM layers in the Keras library.
In an LSTM layer we have two kinds of dropout: dropout and recurrent_dropout. According to my understanding, the first one randomly drops some features of the input (sets them to zero) and the second one does the same on the hidden units (features of h_t). Since an LSTM network has multiple time steps, is the dropping applied separately at each time step, or is it sampled once and kept the same for every step?
My second question is about regularizers in the LSTM layer in Keras. I know that, for example, the kernel regularizer regularizes the weights applied to the inputs, but there are several different sets of input weights.
For example, the input gate, update gate and output gate each use different weights for the input (and also different weights for h_(t-1)). So will they all be regularized at the same time? What if I want to regularize only one of them, for example only the weights used in the formula for the input gate?
The last question is about activation functions in Keras. In the LSTM layer I have activation and recurrent_activation. activation is tanh by default, and I know that in the LSTM architecture tanh is used twice (for h_t and for the memory cell candidate) and sigmoid is used three times (for the gates). So does that mean that if I change tanh (in the LSTM layer in Keras) to another function, say ReLU, it will change for both h_t and the memory cell candidate?
It would be perfect if any of those questions could be answered. Thank you very much for your time.

Keras LSTM and Dense layer

How does a Dense layer change the output coming from an LSTM layer? How is it that from an output of size 50 from the previous layer I get an output of size 1 from the Dense layer used for prediction?
Let's say I have this basic model:
model = Sequential()
model.add(Dense(1, activation="softmax"))
Is the Dense layer taking the values coming from the previous layer, assigning a probability (using the softmax function) to each of the 50 inputs, and then returning that as the output?
No, Dense layers do not work like that. The input has 50 dimensions, and the output has as many dimensions as the layer has neurons, one in this case. Each output is a weighted linear combination of the inputs plus a bias, passed through the activation function.
Note that it makes no sense to use the softmax activation with a one-neuron layer: since softmax normalizes its outputs to sum to 1, the only possible output is the constant 1.0. That's probably not what you want.
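Both points can be illustrated with a short plain-Python sketch. The weights, bias and inputs below are arbitrary illustrative values, and the helper names are hypothetical. The sigmoid is included only as a common alternative for a single output unit; whether it fits depends on what the model is supposed to predict, which the question does not say.

```python
import math


def softmax(values):
    """Numerically stable softmax: exp(v_i) / sum_j exp(v_j)."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    """Logistic function, a common choice for a single output unit."""
    return 1.0 / (1.0 + math.exp(-value))


# A Dense(1) layer maps 50 inputs to one number: sum_i w_i * x_i + b.
inputs = [0.1 * i for i in range(50)]  # illustrative 50-dimensional input
weights = [0.01] * 50                  # illustrative weights
bias = -1.0
logit = sum(w * x for w, x in zip(weights, inputs)) + bias

single_softmax = softmax([logit])  # normalizing a single value always gives [1.0]
single_sigmoid = sigmoid(logit)    # varies with the input, stays strictly between 0 and 1
```

Whatever the 50 inputs are, single_softmax is [1.0], so a model ending in Dense(1, activation="softmax") cannot learn anything from its output.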

LSTM with variable sequences & return full sequences

How can I set up a Keras model such that the final LSTM layer outputs a prediction for each time step while accepting variable sequence lengths as input?
I'd then like to provide labels for each of the timesteps after a dense layer with linear activation.
When I try to add a reshape or a dense layer to the LSTM model that is returning the full sequence and has a masking layer to take care of variable sequence lengths, it says:
The reshape and the dense layers do not support masking.
Would this be possible to do?
You can use the TimeDistributed layer wrapper for this. It applies the wrapped layer to each timestep. In your case, you could also just use TimeDistributedDense, which exists only in older Keras versions; later versions use TimeDistributed(Dense(...)) instead.
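Conceptually, wrapping a layer this way means the same weights are applied independently at every timestep, so the output has one prediction per step regardless of sequence length. The sketch below shows only that idea in plain Python. It is not the Keras implementation, it does not handle masking, and dense, time_distributed, shared_dense and all values are hypothetical.

```python
def dense(features, weights, bias):
    """One linear output unit: sum_i w_i * x_i + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def time_distributed(layer, sequence):
    """Apply the same layer (same weights) independently to every timestep."""
    return [layer(step) for step in sequence]


# Illustrative shared weights for a layer with 3 input features and 1 output.
weights, bias = [1.0, 0.5, -1.0], 0.25


def shared_dense(step):
    return dense(step, weights, bias)


# Two sequences of different lengths, e.g. per-timestep LSTM outputs.
short_seq = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
long_seq = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]

short_out = time_distributed(shared_dense, short_seq)  # one value per timestep
long_out = time_distributed(shared_dense, long_seq)
```

The output length follows the input length (2 and 3 here), which is what makes per-timestep labels possible.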
