Calculation of Keras layers output dimensions - keras

I am currently trying to implement GoogLeNet architecture (InceptionV1) in Keras using theano backend, as I want to generate features for CUB dataset using GoogLeNet model.
I found an implementation in Keras here.
However, it is based on the earlier version of Keras and I had to make changes in the layers as per Keras version 2.
Now, the model is getting built correctly. However, the predict() function is failing with the error as
ValueError: CorrMM images and kernel must have the same stack size
So, I started looking at the original paper and correlating the layers mentioned in the paper with the implemented one.
So, here I found first layer to have output as expected as 112x112x64 with the input as 224x224x3.
However, when I tried to calculate the expected output dimensions as per the formula given in Stanford University tutorial page, it is different from the actual output which I received from the Keras code, though this is what is the expected output as per the GoogLeNet paper. i.e. as per the formula mentioned on the Stanford page Output height or length = ((Input height or length - filter size + 2 * Padding) / Stride) + 1
As per above equation, the output dimension comes in fraction which is not valid and to get the expected dimension as per the formula, input needs to be of shape 227x227x3. However, in Keras, with this input, output comes as 114x114x64.
Does Keras calculate the output dimensions in some different way or am I missing out on something?

Somehow I could make it work yesterday by removing few lines of code from the model which was making it to change the dimensions. (Possibly it was required by earlier version of Keras and Theano)
Also, contrary to the one mentioned in the paper, I changed patch size of MaxPooling2D() function from 3x3 to 2x2 which is the only way to achieve the desired output dimensions in GoogLeNet architecture. With input shape 224x224 and applying max pooling with patch size 2x2 and stride 2x2, its dimensions gets halved and we can get the desired output shape.
I am not sure why equation of output dimensions based on input, filter, padding and stride as parameters are not applicable here.

Related

Tensorflow Keras: Problems to handle variable length input, using generator?

We want to train our model on varying input dimensions. Every input in a given batch and across batches has different dimensions.
We cannot resize our input (since we’ll lose our microscopic features). Now, since we cannot resize our input, converting them into batches of numpy array becomes impossible. In order to handle this now I have made the list for the input and each list of element contained (height, width, 1). Height is variable size and width is constant.
Sometime my input excessively large. In order to do that I have plan to use model.fit_generator(). In this, We find the max height and width of input in a batch and pad every other input with zeros so that every input in the batch has an equal dimension. Now we can easily convert it to a numpy array or a tensor and pass it to the fit_generator(). The model automatically learns to ignore the zeros and learns features from the intended portion from the padded input. This way we have a batch with equal input dimensions but every batch has a different shape (due to difference in max height and width of input across batches).
Now until here, I described the things what I have learned and what I have plan to do with variable input data. But I am stuck with the following confusions:
1- I have plan to use CNN first and then LSTM on that. I am using tensorflow keras. There, we have the facility of padding and masking . However, As for as I know that LSTM can work on masking and padding ignore 0-padded values. However, I am concerned about the CNN (does CNN ignores 0-padded values), because my padded input will first feed to CNN. I have seen some discussion in the following links:
How to apply masking layer to sequential CNN model in Keras?
https://github.com/keras-team/keras/issues/411
In these link, they mentioned that Unfortunately masking is not yet supported by the Keras Conv layers. However, now we can see alot of development and advancements specifically in the form of tensorflow Keras. So I am wondering that now tensorflow keras can support masking input?
2- To use the generator, we can use custom keras generator. For that I went through a vary good tutorial. I made the mind to use this. But I am wondering is there any advance built-in facility in tensorflow keras to use generator and save me to write custom keras generator?

Extracting hidden representations for each token - PyTorch LSTM

I am currently working on a NLP project involving recurrent neural networks. I implemented a LSTM with PyTorch, following the tutorial here.
For my project, I need to extract the hidden representation for every token of an input text. I thought that the easiest way would be to test using a batch size and sequence length of 1, but when I do that the loss gets orders of magnitude larger than in training phase (during training I used a batch size of 64 and a sequence length of 35).
Is there any other way I can easily access these word-level hidden representations? Thank you.
Yes, that is possible with nn.LSTM as long as it is a single layer LSTM. If u check the documentation (here), for the output of an LSTM, you can see it outputs a tensor and a tuple of tensors. The tuple contains the hidden and cell for the last sequence step. What each dimension means of the output depends on how u initialized your network. Either the first or second dimension is the batch dimension and the rest is the sequence of word embeddings you want.
If u use a packed sequence as input, it is a bit of a different story.

LSTM input longer than output

I am not sure if I understand how exactly Keras version of LSTM works.
Let's say I have vector of len=20 as input and I specify keras.layers.LSTM(units=10)
So in this example does the network finish after processing 50% of input or it precess the rest from start (I mean from first cell)?
Units are never related to the input size.
Units are related only to the output size (units = output features or channels).
An LSTM layer will always process the entire data and optionally return either the "same length (all steps)" or "no length (only last step)".
In terms of shapes
You must have an input tensor with shape (batch, len=20, input_features).
And it will output:
For return_sequences=False: (batch, output_features=10) - no length
For return_sequences=True: (batch, len=20, output_features=10) - same length
Output features is always equal to units.
See a full comprehension of the LSTM layers here: Understanding Keras LSTMs

Masking pixels or doing convolutional LSTM classification with Keras

I have a sequence of multi-band images, say each sample is a tensor of size (50, 6, 30, 30) where 50 is the number of image frames in sequence, 6 is number of bands per pixel, and 30x30 is the spatial dimension of the image. The ground truth map is of size 30x30, but it is one-hot encoded (to use crossentropy loss) o 7 classes, so it is a tensor of size (1, 7, 30, 30).I want to use a combination of convolutional and LSTM (or use an integrated ConvLSTM2D layer) for my classification task, but there are below problems:
1- Not every point has a valid label at the output map (i.e. some one-hot vectors are all-zero),
2- Not every pixel has a valid value in every time stamp. So, at every given time stamp, some of the pixels may have zero value (means invalid) for all of their band values.
I read many Q&As on how to handle this issue and I think I should use sample_weights option to mask the invalid points and classes but I am really uncertain how to do it. Sample_weights should be applied to every pixel and each timestamp independently. I think I can manage it if I didn't have the convolution part (a 2D approach). But don't understand how it works when convolution is in place, because some pixel values in convolution window are valid and some are invalid.If I mask those invalid pixels at a specific time (that still I don't know how to do it), what will happen to the chain of forward and backward propagation and loss calculation? I think it will be ruined!
Looking for comments and help.
Possible solution:
Problem 1- For pixels where do not have class at all you can introduce a new class with a label for example noise,
it means not in your one hot encode you have value for that as well and weights will be generated accordingly for those pixels for noise class
this is an indirect way to achieve the same thing you do with sample weight
cause in the sample_weight technique you tell keras or sklearn that what is the weightage of the parameter or sample ratio of the weights.
Problem 2- To answer part 2 consider the possible use cases for example for these invalid values class value can be there in hot encode vector or it will be all zeros?
or you can preprocess and add these to the noise class as well then point 2 will be handled by point 1 automatically.

Exclude zero-padded regions from network loss - Keras 2.0 Theno Backend (segmentation network)

I'm training a segmentation network in Keras with Theano backend and I'm using ImageDataGenerator with flow_from_directory.
My images have flexible size. In order to use flow_from_directory though you have to specify a fixed size (target_size) and while reading the images, the function automatically fills the points outside the boundaries of the original image.
Currently I'm setting this specified size a value larger than my largest image- say may largest image is 300x400, I fix the target_size to 400x400 and use fill_mode='constant' and cval=0 to pad the points outside original image with zero.
Now my problem is as follows. During training, I do not wish these padded regions to contribute to my loss function at all. Does anyone has an idea how to do that?
Masking could help you.
Masks a sequence by using a mask value to skip timesteps.
For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
If any downstream layer does not support masking yet receives such an input mask, an exception will be raised.
It's technically made for time series, but it should work for images too with some tweaking. Here you can find some attempts to do so (and some alternatives).

Resources