How to Calculate Output from Conv2D and Conv2DTranspose - keras

This is a follow-up to an earlier question about calculating the number of parameters:
Keras CNN model parameters calculation
Regarding the formula for total parameters: does it mean that each input channel has 64 sets of learnable weights (filters) but only one common bias, and that a single output channel is generated by simply adding up the 64 filtered results and then adding the common bias?
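One way to sanity-check the bias question is to count the parameters by hand. The sketch below assumes the usual Keras Conv2D layout, in which each of the `filters` output channels owns one kernel slice per input channel plus a single bias of its own. The function name and the sizes used (3x3 kernel, 3 input channels, 64 filters) are illustrative and not taken from the original question.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Count trainable parameters of a standard 2D convolution layer."""
    # One kernel_h x kernel_w slice per (input channel, output filter) pair.
    weights = kernel_h * kernel_w * in_channels * filters
    # One bias per output filter, shared across all input channels.
    biases = filters if use_bias else 0
    return weights + biases


print(conv2d_param_count(3, 3, 3, 64))  # (3*3*3 + 1) * 64 = 1792
```

Under this assumption the per-input-channel results are summed within each filter, and the bias count equals the number of output channels rather than the number of input channels.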

Related

Tensorflow Keras: Problems to handle variable length input, using generator?

We want to train our model on inputs of varying dimensions. The inputs differ in size both within a batch and across batches.
We cannot resize the inputs, since that would destroy their fine-grained (microscopic) features, so stacking them directly into batches of NumPy arrays is impossible. For now I keep the inputs in a list, where each element has shape (height, width, 1). The height varies and the width is constant.
Sometimes my input is excessively large, so I plan to use model.fit_generator(). For each batch, we find the maximum height and width and pad every other input with zeros so that all inputs in the batch have equal dimensions. The batch can then be converted to a NumPy array or a tensor and passed to fit_generator(). The idea is that the model learns to ignore the zeros and picks up features from the non-padded portion of the input. This way all inputs within a batch have equal dimensions, but each batch can have a different shape (because the maximum height and width differ across batches).
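The per-batch padding step described above could be sketched roughly as follows. This is a minimal NumPy example, not the asker's actual code: `pad_batch` is a hypothetical helper name, and the sample shapes are made up for illustration.

```python
import numpy as np


def pad_batch(samples):
    """Zero-pad a list of (height, width, channels) arrays to a common shape."""
    max_h = max(s.shape[0] for s in samples)
    max_w = max(s.shape[1] for s in samples)
    channels = samples[0].shape[2]
    # Start from zeros so that everything outside each sample stays padded.
    batch = np.zeros((len(samples), max_h, max_w, channels), dtype=samples[0].dtype)
    for i, s in enumerate(samples):
        batch[i, :s.shape[0], :s.shape[1], :] = s
    return batch


# Two samples with different heights and the same width (illustrative sizes).
samples = [np.ones((5, 4, 1), dtype=np.float32),
           np.ones((8, 4, 1), dtype=np.float32)]
batch = pad_batch(samples)
print(batch.shape)  # (2, 8, 4, 1)
```

A generator would call such a helper once per batch and yield the padded array together with the matching labels.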
So far I have described what I have learned and what I plan to do with variable-size input data. I am stuck on the following points:
1- I plan to use a CNN first and then an LSTM on top of it. I am using TensorFlow Keras, which provides padding and masking. As far as I know, an LSTM can use masking to ignore zero-padded values. However, I am concerned about the CNN (does a CNN ignore zero-padded values?), because my padded input is fed to the CNN first. I have seen some discussion in the following links:
How to apply masking layer to sequential CNN model in Keras?
https://github.com/keras-team/keras/issues/411
These links say that, unfortunately, masking is not yet supported by the Keras Conv layers. However, TensorFlow Keras has seen a lot of development since then, so I am wondering whether it now supports masked input for these layers.
2- To feed the data, we can write a custom Keras generator. I went through a very good tutorial on this and have decided to use that approach. But I am wondering whether TensorFlow Keras has a more advanced built-in facility that would save me from writing a custom generator.

Keras regression - Should my first/last layer have an activation function?

I keep seeing examples floating around the internet where the input and/or output layer have either no activation function, a linear activation function, or None. What I'm confused about is when to use one, and how to know if you should? I also am confused about what the number of nodes should be for the input layer.
Right now I have a regression problem, I'm trying to predict a real value based on an array of inputs (about 54). Should I be using relu in my activation function for the input layer? Should I have linear as my output activation? My data is linearly scaled from 0 to 1 for each feature independently as they're different units. I was also unsure of the number of nodes I should use for my input layer as I see some examples pick an arbitrary number not related to their input shape, and other examples saying to specifically set it to the number of inputs, or number of inputs plus one for a bias. But none of the examples so far have explained their reasoning behind their choices.
Since my model isn't performing very well, I thought asking what the architecture should be could help me fine tune it more.

Calculation of Keras layers output dimensions

I am trying to implement the GoogLeNet architecture (InceptionV1) in Keras with the Theano backend, because I want to generate features for the CUB dataset using the GoogLeNet model.
I found an implementation in Keras here.
However, it is based on an earlier version of Keras, and I had to change the layers to match Keras version 2.
Now the model builds correctly, but the predict() function fails with the error:
ValueError: CorrMM images and kernel must have the same stack size
So I started looking at the original paper and comparing the layers it describes with the implemented ones.
I found that the first layer produces the expected 112x112x64 output for a 224x224x3 input.
However, when I calculate the expected output dimensions with the formula from the Stanford University tutorial page, the result differs from the output I actually get from the Keras code, even though the Keras output matches the GoogLeNet paper. The formula on the Stanford page is:
Output height or length = ((Input height or length - filter size + 2 * Padding) / Stride) + 1
With this equation the output dimension comes out fractional, which is not valid. To get the expected dimension from the formula, the input would need to have shape 227x227x3. In Keras, however, that input gives a 114x114x64 output.
Does Keras calculate the output dimensions in some different way, or am I missing something?
I managed to make it work yesterday by removing a few lines of code from the model that were changing the dimensions (possibly they were required by the earlier versions of Keras and Theano).
Also, contrary to what the paper describes, I changed the patch size of the MaxPooling2D() function from 3x3 to 2x2, which was the only way I found to achieve the desired output dimensions in the GoogLeNet architecture. With a 224x224 input, max pooling with a 2x2 patch and a 2x2 stride halves the dimensions, which gives the desired output shape.
I am not sure why the output-dimension equation based on input size, filter size, padding and stride does not apply here.
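The numbers in the question can be reproduced with a short calculation. The sketch below assumes the TensorFlow-style conventions for the Keras `padding` argument, where 'same' gives ceil(input / stride) and 'valid' gives floor((input - kernel) / stride) + 1. Behaviour under the Theano backend or older Keras versions may differ. The function names are hypothetical. The 7x7, stride-2 first convolution is taken from the GoogLeNet paper, not from the question.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one spatial axis for a conv or pooling layer."""
    if padding == "same":
        # The kernel size does not matter: padding is added as needed.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding; windows that do not fit completely are dropped.
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


def explicit_padding_output_size(input_size, kernel_size, stride, pad):
    """Tutorial-style formula with explicit padding, flooring the division."""
    return (input_size - kernel_size + 2 * pad) // stride + 1


print(conv_output_size(224, 7, 2, "same"))         # 112
print(conv_output_size(227, 7, 2, "same"))         # 114
print(conv_output_size(227, 7, 2, "valid"))        # 111
print(conv_output_size(112, 3, 2, "same"))         # 56
print(explicit_padding_output_size(224, 7, 2, 3))  # 112
```

Under these assumptions the 114x114 result for a 227x227 input is what 'same' padding would give, and a 3x3 pooling window with stride 2 and 'same' padding also halves 112 to 56.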

Keras LSTM: first argument

In Keras, if you want to add an LSTM layer with 10 units, you use model.add(LSTM(10)). I've heard that number 10 referred to as the number of hidden units here and as the number of output units (line 863 of the Keras code here).
My question is, are those two things the same? Is the dimensionality of the output the same as the number of hidden units? I've read a few tutorials (like this one and this one), but none of them state this explicitly.
The other answers seem to refer to multi-layer perceptrons (MLPs), in which the hidden layer can be of a different size and often is. For LSTMs, the hidden dimension is the same as the output dimension by construction.
The h is the output for a given timestep, and the cell state c is bound to the hidden size because of the element-wise multiplication. The addition of terms to compute the gates requires that both the input kernel W and the recurrent kernel U map to the same dimension. This is certainly the case for the Keras LSTM as well, and it is why you only provide a single units argument.
To get a good intuition for why this makes sense, remember that an LSTM's job is to encode a sequence into a vector (maybe a gross oversimplification, but it's all we need). The size of that vector is specified by hidden_units; the output is:
seq vector           RNN weights
(1 X input_dim)  *  (input_dim X hidden_units),
which has shape 1 X hidden_units (a row vector representing the encoding of your input sequence). Thus the two names are used synonymously in this case.
Of course, RNNs require more than one multiplication, and Keras implements RNNs as a sequence of matrix-matrix multiplications instead of the vector-matrix product shown above.
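The shape argument above can be checked with a small NumPy sketch of one LSTM step. It is not the Keras implementation. It only assumes the same weight layout that Keras uses, with an input kernel W of shape (input_dim, 4 * units) and a recurrent kernel U of shape (units, 4 * units). The sizes and the random weights are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; gates are stacked as [input, forget, candidate, output]."""
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b           # shape (batch, 4 * units)
    i = sigmoid(z[:, 0 * units:1 * units])
    f = sigmoid(z[:, 1 * units:2 * units])
    g = np.tanh(z[:, 2 * units:3 * units])
    o = sigmoid(z[:, 3 * units:4 * units])
    c_t = f * c_prev + i * g               # element-wise, so c keeps size `units`
    h_t = o * np.tanh(c_t)                 # the layer output is the hidden state
    return h_t, c_t


batch, timesteps, input_dim, units = 2, 5, 3, 10
rng = np.random.default_rng(0)
x = rng.normal(size=(batch, timesteps, input_dim))
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros((batch, units))
c = np.zeros((batch, units))
for t in range(timesteps):
    h, c = lstm_step(x[:, t, :], h, c, W, U, b)

print(h.shape)  # (2, 10)
```

Whatever the number of timesteps, the final h has `units` columns, which is why a single units argument fixes both the hidden size and the output size.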
The number of hidden units is not the same as the number of output units.
The number 10 controls the dimension of the output hidden state (source code for the LSTM constructor method can be found here; 10 specifies the units argument). In one of the tutorials you have linked to (colah's blog), the units argument would control the dimension of the vectors h_{t-1}, h_t, and h_{t+1}: RNN image.
If you want to control the number of LSTM blocks in your network, you need to specify this through the input to the LSTM layer. The input shape to the layer is (nb_samples, timesteps, input_dim) (see the Keras documentation). timesteps controls how many LSTM blocks your network contains. Referring to the tutorial on colah's blog again, in the RNN image, timesteps would control how many green blocks the network contains.

Input shape for Keras LSTM/GRU for floats

I'm sorry for asking such a basic question, but I can't apply the answers from other questions to my task.
Currently I get this well-known error:
expected lstm_input_1 to have 3 dimensions, but got array with shape (7491, 1025)
My data:
a matrix with 7491 rows and 1025 float numbers per row
So how do I make it 3D? Or am I trying to use the wrong layer type?
You need to have an explicit batch dimension and an explicit time dimension. You always have a batch dimension (of size 1 if the batch contains only one sequence), and for recurrent models you need a time dimension as well, because these are sequential models that operate over time.
Reshape your data to (1,7491,1025) for a batch of 1 sequence of length 7491 with 1025 features per time-step.
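The reshape suggested above can be done in NumPy by adding a leading axis. In this minimal sketch the zero-filled array is only a placeholder for the real 7491 x 1025 matrix, and the variable names are illustrative.

```python
import numpy as np

# Placeholder standing in for the real data: 7491 rows of 1025 floats.
data = np.zeros((7491, 1025), dtype=np.float32)

# Add a leading batch axis: (batch, timesteps, features).
sequence_batch = data[np.newaxis, :, :]
print(sequence_batch.shape)  # (1, 7491, 1025)
```

With this shape the recurrent layer sees a single sequence of 7491 time-steps, each with 1025 features.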

Resources