Multivariate time series prediction - Conv1d and loss function - PyTorch

I have a couple of questions.
I have data of the following shape:
(32, 64, 11)
where 32 is the batch size, 64 is the sequence length, and 11 is the number of features. Each sample is 64x11 and has a label of 0 or 1.
I’d like to predict when a sequence has a label of “1”.
I’m trying to use a simple architecture with
conv1D → ReLU → flatten → linear → sigmoid.
For the Conv1d, since this is multivariate time series prediction and each row of my data is one second, I think the number of input channels should be the number of features, so that all of the features are processed concurrently. (There is nothing spatial in my data; it doesn't matter whether a column is at index 0 or 9, unlike pixels in an image.)
I can't decide how to "initialize" the Conv1d parameters. Currently I think the number of channels should be the number of features and not 1, for the reason I just explained, but I'm unsure of it.
Secondly, should the loss function be BCELoss or something else, given that my labels are 0 or 1 and I want the model to output the probability of belonging to the class with label 1?
Many thanks.
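A minimal sketch of that architecture in PyTorch, assuming in_channels equals the number of features; the out_channels=16 and kernel_size=3 values are illustrative placeholders, not from the question. Note that nn.Conv1d expects input of shape (batch, channels, length), so the (32, 64, 11) batch has to be permuted first:

import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    def __init__(self, n_features=11, seq_len=64, out_channels=16, kernel_size=3):
        super().__init__()
        # in_channels = number of features, as reasoned above
        self.conv = nn.Conv1d(n_features, out_channels, kernel_size)
        self.fc = nn.Linear(out_channels * (seq_len - kernel_size + 1), 1)

    def forward(self, x):                  # x: (batch, seq_len, features)
        x = x.permute(0, 2, 1)             # -> (batch, features, seq_len)
        x = torch.flatten(torch.relu(self.conv(x)), start_dim=1)
        return torch.sigmoid(self.fc(x))   # probability of label 1

model = SeqClassifier()
probs = model(torch.randn(32, 64, 11)).squeeze(1)
loss = nn.BCELoss()(probs, torch.randint(0, 2, (32,)).float())

With a sigmoid output, BCELoss is the matching loss; equivalently, the sigmoid can be dropped and BCEWithLogitsLoss applied to the raw logits, which is more numerically stable.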

Related

Masking pixels or doing convolutional LSTM classification with Keras

I have a sequence of multi-band images; each sample is a tensor of size (50, 6, 30, 30), where 50 is the number of image frames in the sequence, 6 is the number of bands per pixel, and 30x30 is the spatial dimension of the image. The ground truth map is of size 30x30, but it is one-hot encoded (to use cross-entropy loss) over 7 classes, so it is a tensor of size (1, 7, 30, 30). I want to use a combination of convolutional and LSTM layers (or an integrated ConvLSTM2D layer) for my classification task, but there are the following problems:
1- Not every point has a valid label in the output map (i.e. some one-hot vectors are all-zero),
2- Not every pixel has a valid value at every time stamp. So, at any given time stamp, some pixels may have the value zero (meaning invalid) for all of their bands.
I have read many Q&As on how to handle this and I think I should use the sample_weights option to mask the invalid points and classes, but I am really uncertain how to do it. Sample weights would have to be applied to every pixel and every timestamp independently. I think I could manage it if I didn't have the convolution part (a 2-D approach), but I don't understand how it works when convolution is in place, because some pixel values in a convolution window are valid and some are invalid. And if I do mask those invalid pixels at a specific time (which I still don't know how to do), what happens to the chain of forward and backward propagation and the loss calculation? I think it will be ruined!
Looking for comments and help.
Possible solution:
Problem 1: For pixels that have no class at all, you can introduce a new class, labeled for example "noise".
This means your one-hot encoding has a value for that class as well, and weights will be generated accordingly for those pixels.
This is an indirect way of achieving the same thing as sample weighting, since with the sample_weight technique you tell Keras or scikit-learn how much weight each sample carries.
Problem 2: Consider the possible cases: for these invalid values, is there a class value in the one-hot vector, or is it all zeros?
You can preprocess these and add them to the noise class as well; then problem 2 is handled by problem 1 automatically.
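A hypothetical NumPy sketch of the problem-1 preprocessing, folding unlabeled pixels into an extra "noise" channel (shapes follow the question; the variable names are made up):

import numpy as np

y_onehot = np.zeros((1, 7, 30, 30), dtype=np.float32)  # toy one-hot ground truth
# pixels whose one-hot vector is all zeros have no valid label
noise = (y_onehot.sum(axis=1, keepdims=True) == 0).astype(np.float32)
y_with_noise = np.concatenate([y_onehot, noise], axis=1)  # (1, 8, 30, 30); channel 7 = noise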

How is a filter assigned to a feature in a CNN? (Or is it assigned at all?)

Let's say the first conv layer has 32 filters of size 5x5 with a stride of 1.
model.add(Conv2D(32, (5, 5), input_shape=input_shape))
Let's say the image is of size 32x32x3 (channels). So when a filter convolves with a part of the image, is it already looking for a specific feature? I understand that the filter matrix is initialized with random numbers. But do the filters already have some sort of purpose, something they are looking for? Could you explain how features are detected in a CNN?
The goal of a convolutional layer is filtering. As we move over an image, we effectively check for patterns in that section of the image. This works because of filters: stacks of weights, represented as vectors, which are multiplied by the values output by the convolution. When training on images, these weights change, and when it is time to evaluate an image, they return high values if the filter thinks it is seeing a pattern it has seen before. The combination of high responses from various filters lets the network predict the content of an image.
So, when a filter convolves with a part of an image, it does not at first know whether that part is a feature or not; through training and changing the weights, the filters adapt to the features in the images so that the loss function against the ground truth is minimized. The initialization is random simply because we are going to change the weights anyway, so that the predicted value ends up as close as possible to the given label.
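A small Keras sketch of that point, with the layer sizes from the question: the kernels start out as random numbers with no assigned purpose, and only training shapes them into feature detectors:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([Conv2D(32, (5, 5), input_shape=(32, 32, 3))])
kernels, biases = model.layers[0].get_weights()
print(kernels.shape)                     # (5, 5, 3, 32): 32 filters of size 5x5x3
print(np.round(kernels[:, :, 0, 0], 2))  # one untrained filter slice: just random numbers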

Keras LSTM: first argument

In Keras, if you want to add an LSTM layer with 10 units, you use model.add(LSTM(10)). I've heard the number 10 referred to both as the number of hidden units and as the number of output units (line 863 of the Keras source code).
My question is, are those two things the same? Is the dimensionality of the output the same as the number of hidden units? I've read a few tutorials (like this one and this one), but none of them state this explicitly.
The answers seem to refer to multi-layer perceptrons (MLPs), in which the hidden layer can be of a different size and often is. For LSTMs, the hidden dimension is the same as the output dimension by construction:
h is the output for a given timestep, and the cell state c is bound to the hidden size by the element-wise multiplications. Adding the terms that compute the gates requires that both the input kernel W and the recurrent kernel U map to the same dimension. This is the case for the Keras LSTM as well, and is why you provide only a single units argument.
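Spelled out with the standard LSTM update equations (σ is the sigmoid, ⊙ is element-wise multiplication):

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t ⊙ tanh(c_t)

Because c_t and h_t are built purely from element-wise operations on the gate outputs, every gate, the cell state, and the output h_t must share one dimension (the hidden size): each W maps input_dim to hidden_units, and each U maps hidden_units to hidden_units.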
To get a good intuition for why this makes sense, remember that the LSTM's job is to encode a sequence into a vector (maybe a gross oversimplification, but it's all we need). The size of that vector is specified by hidden_units, and the output is:
seq vector times RNN weights:
(1 X input_dim) * (input_dim X hidden_units),
which gives a 1 X hidden_units result: a row vector representing the encoding of your input sequence. Thus the two names are used synonymously in this case.
Of course RNNs involve more than one multiplication, and Keras implements RNNs as a sequence of matrix-matrix multiplications rather than the vector-matrix form shown above.
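A quick sanity check in Keras (a minimal sketch; the input shape is arbitrary):

import numpy as np
from tensorflow.keras.layers import LSTM

x = np.random.rand(1, 5, 8).astype("float32")  # (batch=1, timesteps=5, input_dim=8)
out = LSTM(10)(x)                              # units = 10
print(out.shape)                               # (1, 10): output dim equals the hidden size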
The number of hidden units is not the same as the number of output units.
The number 10 controls the dimension of the output hidden state (in the source code for the LSTM constructor, 10 specifies the units argument). In one of the tutorials you have linked to (colah's blog), the units argument would control the dimension of the vectors h_{t-1}, h_t, and h_{t+1} in the RNN diagram.
If you want to control the number of LSTM blocks in your network, you specify it through the input to the LSTM layer. The input shape to the layer is (nb_samples, timesteps, input_dim) (see the Keras documentation); timesteps controls how many LSTM blocks your network contains. Referring to colah's blog again, timesteps would control how many green blocks the unrolled network in the RNN diagram contains.

Input shape for Keras LSTM/GRU for floats

I'm sorry for asking such a basic thing; I can't apply the answers from other questions to my task.
Currently I get the well-known error:
expected lstm_input_1 to have 3 dimensions, but got array with shape (7491, 1025)
My data:
a matrix with 1025 float numbers per row and 7491 rows
So how do I make it 3-D? Or am I trying to use the wrong layer model?
You need an explicit time dimension and a batch dimension. You always have a batch dimension (1 if you are using only one batch), and recurrent models need a time dimension as well, since they are sequential models that operate over time.
Reshape your data to (1,7491,1025) for 1 batch and a sequence of length 7491 with 1025 features per time-step.
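For example, with NumPy (a minimal sketch using the shapes from the question; the variable name is made up):

import numpy as np

data = np.zeros((7491, 1025), dtype=np.float32)  # stand-in for your 2-D matrix
data_3d = data.reshape(1, 7491, 1025)            # (batch, timesteps, features)
print(data_3d.shape)                             # (1, 7491, 1025)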

Keras SimpleRNN input shape and masking

Newbie to Keras alert!!!
I've got some questions related to Recurrent Layers in Keras (over theano)
How is the input supposed to be formatted with regard to timesteps? Say, for instance, I want a layer with 3 timesteps: one in the past, one current, and one in the future. I have seen answers and the API proposing padding and using the Embedding layer, or shaping the input using a time window (3 in this case), but in any case I can't make heads or tails of the API, and SimpleRNN examples are scarce and don't seem to agree.
How would the input time window formatting work with a masking layer?
Some related answers propose performing masking with an Embedding layer. What does masking have to do with embedding layers anyway? Aren't embedding layers basically one-hot word embeddings? (My application would use phonemes or characters as input.)
I can start an answer, but this question is very broad so I would appreciate suggestions on improvement to my answer.
Keras SimpleRNN expects an input of size (num_training_examples, num_timesteps, num_features).
For example, suppose I have sequences of counts of numbers of cars driving by an intersection per hour (small example just to illustrate):
import numpy as np
X = np.array([[10, 14, 2, 5], [12, 15, 1, 4], [13, 10, 0, 0]])
Aside: Notice that I was taking observations over four hours, and the last two hours had no cars driving by. That's an example of zero-padding the input, which means making all of the sequences the same length by adding 0s to the end of shorter sequences to match the length of the longest sequence.
Keras would expect the following input shape: (X.shape[0], X.shape[1], 1), which means I could do this:
X_train = np.reshape(X, (X.shape[0], X.shape[1], 1))
And then I could feed that into the RNN:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN
model = Sequential()
model.add(SimpleRNN(units=10, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
You'd add more layers, or add regularization, etc., depending on the nature of your task.
For your specific application, I would think you would need to reshape your input to have 3 elements per row (last time step, current, next), as sketched below.
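A hypothetical sketch of that windowing for a single-feature sequence (the names are made up):

import numpy as np

seq = np.arange(10, dtype=np.float32)  # toy feature sequence
# one row per position: (previous, current, next) time steps
windows = np.stack([seq[i - 1:i + 2] for i in range(1, len(seq) - 1)])
windows = windows[..., None]           # (n_examples, 3 timesteps, 1 feature)
print(windows.shape)                   # (8, 3, 1)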
I don't know much about the masking layers, but here is a good place to start.
As far as I know, embeddings are independent of masking, but you can mask an embedding.
Hope that provides a good starting point!
