Input shape for Keras LSTM/GRU for floats - keras

I'm sorry for asking that stupid thing. I can't apply answers from other questions to my task.
Currently I got well-known error:
expected lstm_input_1 to have 3 dimensions, but got array with shape (7491, 1025)
My data:
matrix - 1025 float numbers in row. 7491 rows
So how to make it 3d? Or am I trying to use wrong layer model?

You need to have an explicit time dimension and a batch dimension. You always have a batch dimension (1 if you are using only one batch) and for recurrent models you need a time dimension as well, as these are sequential models and they operate over time.
Reshape your data to (1,7491,1025) for 1 batch and a sequence of length 7491 with 1025 features per time-step.

Related

Mutli variate time series prediction - Conv1D and loss function - pytorch

I have a couple of questions.
I have data of the following shape:
(32, 64, 11)
where 32 is the batch size, 64 is a sequence length and 11 is the number of features. each sample of mine is 64X11, and has a label of 0 or 1.
I’d like to predict when a sequence has a label of “1”.
I’m trying to use a simple architecture with
conv1D → ReLU → flatten → linear → sigmoid.
For the Conv1D I thought that since it is a multi variate time series prediction, and each row in my data is a second, I think that the number of in channels should be the number of features, since that way it will process all of the features concurrently, (I don’t have any spatial things in my data, it doesn’t matter if a column is in index 0 or 9, as it is important in image with pixels.
I can't get to decide how to “initialize” the conv1D parameters. Currently I think the number of channels should be the number of features and not 1, as the reason I just explained, but unsure of it.
Secondly, should the loss function be BCELOSS or something else? assuming that my labels are 0 or 1, and the prediction for me is I want the model to provide a probability of belonging to class with label 1.
A lot of thanks.

Concept of mini batch in deep generative model using pyro

I am new to probabilistic programming and ML. I am following a code on deep Markov model given on pyro's website. The link to the github page to that code is:
https://github.com/pyro-ppl/pyro/blob/dev/examples/dmm/dmm.py
I understand most part of the code. The part I don't understand is mini batch idea they are using from line 175.
Question 1:
Could someone explain what are they doing there when they are using mini-batch?
In pyro documentation they say
mini_batch is a three dimensional tensor, with the first dimension being the batch dimension, the second dimension being the temporal dimension, and the final dimension being the features (88-dimensional in our case)'
Question 2:
What does temporal dimension means here?
Because I want to use this code on my dataset which is a sequential data. I have done one hot encoding of my data such that it's dimension is (10000,500,20) where 10000 is the number of examples/Sequences, 500 is the length of each of these sequences and 20 is the number of features.
Question 3:
How can I use my one hot encoded data as mini batch here?
I'm sorry if it is a really basic question but, insights will be appreciated.
Link to that documentation is:
https://pyro.ai/examples/dmm.html
Question 1: Could someone explain what are they doing there when they are using mini-batch?
To optimize most of the deep learning models, we use mini-batch gradient descent. Here, A mini_batch refers to a small number of examples. Let's say, we have 10,000 training examples and we want to create mini-batches of 50 examples. So, in total there will be 200 mini-batches and we will perform 200 parameter updates during one iteration over the entire dataset.
Question 2: What does the temporal dimension mean here?
In your data: (10000, 500, 20), the second dimension refers to the temporal dimension. You can consider you have examples with 500 timesteps (t1, t2, ..., t500).
Question 3: How can I use my one-hot encoded data as mini-batch here?
In your scenario, you can split your data (10000, 500, 20) into 200 small batches of size (50, 500, 20) where 50 is the number of examples/Sequences in the mini-batch, 500 is the length of each of these sequences and 20 is the number of features.
How do we decide the mini-batch size? Basically, we can tune the batch size just like any other hyperparameters of our model.

How to Calculate Output from Conv2D and Conv2DTranspose

Regarding a question about calculating parameter numbers, I have the follow-up questions.
Keras CNN model parameters calculation
Regarding the formula for total parameters, does it mean that, for each input channel, there are 64 sets of learnable weights (filters) but only 1 common bias applied to it, then one single output channel is generated in such a way that 64 filtered results are simply added up followed by adding the common bias?

Keras : order in which dimensions of an input tensor is specified

Consider an input layer in keras as:
model.add(layers.Dense(32, input_shape=(784,)))
What this says is input is a 2D tensor where axix=0 (batch dimension) is not specified while axis=1 is 784. Axis=0 can take any value.
My question is: isnt this style confusing?
Ideally, should it not be
input_shape=(?,784)
This reflects axis=0 is wildcard while axis=1 should be 784
Any particular reason why it is so ? Am I missing something here ?
The consistency in this case is between the sizes of the layers and the size of the input. In general, the shapes are assumed to represent the nature of the data; in that sense, the batch dimension is not part of the data itself, but rather how you group it for training or evaluation. So, in your code snippet, it is quite clear that you have inputs with 784 features and a first layer producing a vector of 32 features. If you want to explicitly include the batch dimension, you can use instead batch_input_shape=(None, 784) (this is sometimes necessary, for example if you want to give batches of a fixed size but with an additional time dimension of unknown size). This is explained in the Sequential model guide, but also matches the documentation of the Input layer, where you can give a shape or batch_shape parameter (analogous to input_shape or batch_input_shape).

Calculation of Keras layers output dimensions

I am currently trying to implement GoogLeNet architecture (InceptionV1) in Keras using theano backend, as I want to generate features for CUB dataset using GoogLeNet model.
I found an implementation in Keras here.
However, it is based on the earlier version of Keras and I had to make changes in the layers as per Keras version 2.
Now, the model is getting built correctly. However, the predict() function is failing with the error as
ValueError: CorrMM images and kernel must have the same stack size
So, I started looking at the original paper and correlating the layers mentioned in the paper with the implemented one.
So, here I found first layer to have output as expected as 112x112x64 with the input as 224x224x3.
However, when I tried to calculate the expected output dimensions as per the formula given in Stanford University tutorial page, it is different from the actual output which I received from the Keras code, though this is what is the expected output as per the GoogLeNet paper. i.e. as per the formula mentioned on the Stanford page Output height or length = ((Input height or length - filter size + 2 * Padding) / Stride) + 1
As per above equation, the output dimension comes in fraction which is not valid and to get the expected dimension as per the formula, input needs to be of shape 227x227x3. However, in Keras, with this input, output comes as 114x114x64.
Does Keras calculate the output dimensions in some different way or am I missing out on something?
Somehow I could make it work yesterday by removing few lines of code from the model which was making it to change the dimensions. (Possibly it was required by earlier version of Keras and Theano)
Also, contrary to the one mentioned in the paper, I changed patch size of MaxPooling2D() function from 3x3 to 2x2 which is the only way to achieve the desired output dimensions in GoogLeNet architecture. With input shape 224x224 and applying max pooling with patch size 2x2 and stride 2x2, its dimensions gets halved and we can get the desired output shape.
I am not sure why equation of output dimensions based on input, filter, padding and stride as parameters are not applicable here.

Resources