Pytorch adaptive_avg_pool2d algorithm - pytorch

Does anyone know the algorithm behind PyTorch's adaptive_avg_pool2d, for example,
adaptive_avg_pool2d(image, [14, 14])
So the question:
I want to do the same in a Keras neural network: for any given input, get a 14x14 output. Any suggestions?

I don't think this exists in Keras. You could take the spatial dimensions of your input and divide them by 14 to get the desired pool_size.
For example, if your inputs are 28x28 you can use:
keras.layers.AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)
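As a minimal sketch of that idea (assuming the input height and width are known in advance and divide evenly by 14, which is the only case where a fixed pool size matches adaptive_avg_pool2d exactly):
from tensorflow import keras

input_h, input_w = 28, 28            # example input spatial dimensions
target_h, target_w = 14, 14          # desired output size
pool = keras.layers.AveragePooling2D(
    pool_size=(input_h // target_h, input_w // target_w))
If the input size is not a multiple of 14, there is no exact fixed-kernel equivalent, since adaptive pooling uses variable-sized windows per output cell.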

Related

How to use Conv1D as an input layer

I am trying to use Conv1D as the first layer. I read https://keras.io/api/layers/convolution_layers/convolution1d/ but failed to understand it.
If I use the first layer as dense like
model.add(keras.layers.Dense(12, input_dim=232, activation='relu'))
it works fine but if try to use the first layer as Conv1d like
model.add(keras.layers.Conv1D(32, 5, activation = 'relu'))
I get an error as:
Input 0 of layer "conv1d" is incompatible with the layer: expected min_ndim=3, found ndim=2. Full shape received: (1, 232)
My input size is 3000x232, and I am trying to learn whether a particular vector is present or not, so my output is either 0 or 1: if the vector is present in the input the output is 1, and if it is absent the output is 0. So I am learning a simple two-class classifier.
Can anyone help? Is it that there must be at least 2 spatial dimensions to use Conv1D?
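For what it's worth, Conv1D expects a 3D input of the form (samples, steps, channels), so 2D data of shape (3000, 232) needs an explicit channel axis. A minimal sketch, assuming each 232-long row is treated as a 1-channel sequence (the layer sizes and the sigmoid head below are placeholders, not taken from the question):
import numpy as np
from tensorflow import keras

x = np.random.rand(3000, 232).astype("float32")   # stand-in for the real data
x = x[..., np.newaxis]                            # (3000, 232, 1): samples, steps, channels

model = keras.Sequential([
    keras.layers.Conv1D(32, 5, activation='relu', input_shape=(232, 1)),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(1, activation='sigmoid'),  # two-class output as a single probability
])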

Correct use of Cross-entropy as a loss function for sequence of elements

I have a sequence labeling task.
As input, I have a sequence of elements with shape [batch_size, sequence_length], where each element of the sequence should be assigned to some class.
As a loss function during training of the neural net, I use cross-entropy.
How should I correctly use it?
My variable target_predictions has shape [batch_size, sequence_length, number_of_classes] and target has shape [batch_size, sequence_length].
The documentation describes a K-dimensional case where the input has extra dimensions d_1, ..., d_K after the class dimension.
I know that if I use CrossEntropyLoss(target_predictions.permute(0, 2, 1), target), everything will work fine. But I have concerns that torch is interpreting my sequence_length as the d_1 dimension from the documentation and will think that it is a multi-dimensional loss, which is not the case.
How should I correctly do it?
Using the CE loss will give you a loss value instead of labels. By default the mean is taken, which is probably what you are after, and the snippet with permute will be fine (using this loss you can train your network via backward()).
To get the predicted class, just take the argmax across the appropriate dimension; in the case without permutation it would be:
labels = torch.argmax(target_predictions, dim=-1)
This will give you (batch, sequence_length) output containing classes.
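Put together, a minimal sketch of the whole pattern (the tensor sizes below are placeholders):
import torch
import torch.nn as nn

batch_size, sequence_length, number_of_classes = 4, 10, 5   # placeholder sizes
target_predictions = torch.randn(batch_size, sequence_length, number_of_classes,
                                 requires_grad=True)
target = torch.randint(number_of_classes, (batch_size, sequence_length))

criterion = nn.CrossEntropyLoss()                              # mean reduction by default
loss = criterion(target_predictions.permute(0, 2, 1), target)  # (N, C, d_1) vs (N, d_1)
loss.backward()

labels = torch.argmax(target_predictions, dim=-1)              # (batch_size, sequence_length)
The permuted call is exactly the K-dimensional case from the documentation with d_1 = sequence_length; with the default mean reduction the loss is simply averaged over that extra dimension as well, so the per-element labeling semantics are preserved.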

ResNet50 and VGG16 for data with 2 channels

Is there any way that I can modify ResNet50 and VGG16 when my data (spectrograms) has the shape (64, 256, 2)?
I understand that I can take some layers out and modify them (output, dense), but I am not really sure about the input channels.
Can anyone suggest a way to accommodate 2 channels in the models? Help is much appreciated!
You can use a different number of channels in the input (and a different height and width), but in this case you cannot use the pretrained ImageNet weights; you have to train from scratch. You can create the models as follows:
from tensorflow import keras # or just import keras
vggnet = keras.applications.vgg16.VGG16(input_shape=(64,256,2), include_top=False, weights=None)
Note the weights=None argument: it means the weights are initialized randomly. If the number of channels were 3, you could use weights='imagenet', but in your case you have 2 channels, so that won't work and you have to set it to None. The include_top=False is there so you can add the final classification layers, with your own categories, yourself. You could also create vgg19.VGG19 in the same way. For ResNet, you could similarly create it as follows:
resnet = keras.applications.resnet50.ResNet50(input_shape=(64, 256, 2), weights=None, include_top=False)
For other models and versions of vgg and resnet, please check here.
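As a minimal sketch of one way to add the final classification layers on top of the include_top=False model (the pooling layer and the class count of 10 are placeholders, not taken from the question):
from tensorflow import keras

vggnet = keras.applications.vgg16.VGG16(input_shape=(64, 256, 2),
                                        include_top=False, weights=None)
x = keras.layers.GlobalAveragePooling2D()(vggnet.output)
outputs = keras.layers.Dense(10, activation='softmax')(x)   # 10 is a placeholder class count
model = keras.Model(vggnet.input, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')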

Understanding choice of loss and activation in deep autoencoder?

I am following this keras tutorial to create an autoencoder using the MNIST dataset. Here is the tutorial: https://blog.keras.io/building-autoencoders-in-keras.html.
However, I am confused by the choice of activation and loss for the simple one-layer autoencoder (which is the first example in the link). Is there a specific reason sigmoid activation was used for the decoder part as opposed to something such as relu? I am trying to understand whether this is a choice I can play around with, or if it should indeed be sigmoid, and if so why? Similarly, I understand the loss is computed by comparing each of the original and predicted digits on a pixel-by-pixel level, but I am unsure why the loss is binary_crossentropy as opposed to something like mean squared error.
I would love clarification on this to help me move forward! Thank you!
MNIST images are generally normalized in the range [0, 1], so the autoencoder should output images in the same range, for easier learning. This is why a sigmoid activation is used at the output.
The mean squared error loss has a non-linear penalty, with big errors penalized much more than small ones, which tends to make the model converge to the mean of the targets instead of a more accurate solution. Binary cross-entropy does not have this problem, and is thus preferred. It works because both the output of the model and the labels are in the [0, 1] range, and the loss is applied to every pixel.
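For reference, a minimal sketch in the spirit of the tutorial's single-layer autoencoder, showing where the sigmoid and binary_crossentropy come in (the optimizer choice is a placeholder):
from tensorflow import keras

inputs = keras.Input(shape=(784,))                                 # flattened 28x28 MNIST digits
encoded = keras.layers.Dense(32, activation='relu')(inputs)
decoded = keras.layers.Dense(784, activation='sigmoid')(encoded)   # sigmoid keeps outputs in [0, 1]

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # per-pixel cross-entropy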

How to change batch size of an intermediate layer in Keras?

My problem is to take all hidden outputs from an LSTM and use them as training examples for a single dense layer. Flattening the output of the hidden layers and feeding them to a dense layer is not what I am looking to do. I have tried the following things:
I have considered the TimeDistributed wrapper for the dense layer (https://keras.io/layers/wrappers/). But this seems to apply the same layer to every time slice, which is not what I want. In other words, the TimeDistributed wrapper takes a 3D tensor as input (number of samples, number of timesteps, number of features) and produces another 3D tensor of the same type: (number of samples, number of timesteps, number of features). Instead, what I want is a 2D tensor as output, which looks like (number of samples * number of timesteps, number of features).
There was a pull request for an AdvancedReshapeLayer: https://github.com/fchollet/keras/pull/36 on GitHub. This seems to be exactly what I am looking for. Unfortunately, it appears like that pull request was closed with no conclusive outcome.
I tried to build my own lambda layer to accomplish what I want as follows:
A). model.add(LSTM(NUM_LSTM_UNITS, return_sequences=True, activation='tanh'))
B). model.add(Lambda(lambda x: x, output_shape=lambda x: (x[0]*x[1], x[2])))
C). model.add(Dense(NUM_CLASSES, input_dim=NUM_LSTM_UNITS))
model.output_shape after (A) prints: (BATCH_SIZE, NUM_TIME_STEPS, NUM_LSTM_UNITS) and model.output_shape after (B) prints: (BATCH_SIZE*NUM_OF_TIMESTEPS, NUM_LSTM_UNITS)
Which is exactly what I am trying to achieve.
Unfortunately, when I try to run step (C). I get the following error:
Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=3
This is baffling since when I print model.output_shape after (B), I do indeed see (BATCH_SIZE*NUM_OF_TIMESTEPS, NUM_LSTM_UNITS), which is of ndim=2.
Really appreciate any help with this.
EDIT: When I try to use the functional API instead of a sequential model, I still get the same error on step (C)
You can use the backend reshape, which includes the batch_size dimension.
from keras import backend

def backend_reshape(x):
    return backend.reshape(x, (-1, NUM_LSTM_UNITS))

model.add(Lambda(backend_reshape, output_shape=(NUM_LSTM_UNITS,)))
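For context, a minimal sketch of how this could sit in the full model (the size constants are placeholders, and tf.keras is assumed):
from tensorflow import keras
from tensorflow.keras import backend

NUM_TIME_STEPS, NUM_FEATURES, NUM_LSTM_UNITS, NUM_CLASSES = 20, 8, 64, 10   # placeholder sizes

def backend_reshape(x):
    # merge the batch and time axes: (batch, time, units) -> (batch * time, units)
    return backend.reshape(x, (-1, NUM_LSTM_UNITS))

model = keras.Sequential()
model.add(keras.layers.LSTM(NUM_LSTM_UNITS, return_sequences=True, activation='tanh',
                            input_shape=(NUM_TIME_STEPS, NUM_FEATURES)))
model.add(keras.layers.Lambda(backend_reshape, output_shape=(NUM_LSTM_UNITS,)))
model.add(keras.layers.Dense(NUM_CLASSES))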
