import tensorflow as tf

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
This is the code from the Deep MNIST for Experts tutorial on the TensorFlow website.
I have three questions:
1) The documentation says ksize is a list of integers of length >= 4 that gives the size of the max-pool window. Shouldn't that just be [2, 2], considering it's a 2x2 window? I mean, why is it [1, 2, 2, 1] instead of [2, 2]?
2) If we are taking stride steps of size one, why do we need a vector of 4 values? Wouldn't a single value suffice, e.g.
strides = [1]
3) If padding='SAME', why does the image size decrease by half (from 28x28 to 14x14 in the first convolution/pooling stage)?
I'm not sure which documentation you're referring to in this question. The max-pool window is indeed 2x2; ksize is specified per input dimension, i.e. [batch, height, width, channels], so [1, 2, 2, 1] means a 2x2 window over height and width and no pooling across the batch or channel dimensions.
The stride can be different along each dimension. The 4-element vector is the most general case: in principle you could skip images in the batch, use different strides along height and width, and even skip channels. This is hardly ever used, but the option has been left in.
If you have a stride of 2 along each spatial direction, then you skip every other position at which the pooling window could be placed, which halves the output size. If you set the strides to [1, 1, 1, 1] with padding 'SAME', then you would indeed get a result of the same size. Padding 'SAME' means the input is zero-padded just enough that the output size equals the input size divided by the stride, rounded up; it does not prevent downsampling when the stride is greater than 1.
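To make the interaction between strides and 'SAME' padding concrete, here is a minimal sketch that only looks at shapes (assuming TensorFlow 2-style eager execution for the prints; the tensor values don't matter):
import tensorflow as tf

x = tf.zeros([1, 28, 28, 1])  # one 28x28 single-channel image

# Stride 2 halves the spatial size even with 'SAME' padding: ceil(28 / 2) = 14.
y = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print(y.shape)  # (1, 14, 14, 1)

# Stride 1 with 'SAME' padding keeps the spatial size: ceil(28 / 1) = 28.
y = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 1, 1, 1], padding='SAME')
print(y.shape)  # (1, 28, 28, 1)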
Related
I have a naive question.
For example, I have a tensor of size torch.Size([2, 1, 80, 64]).
I need to turn it into another tensor of size torch.Size([2, 1, 80, 16]).
Is there a right way to achieve that?
There are many ways to achieve this dimensionality reduction; the following are some examples:
randomly select 16 out of the 64 features
take the mean of every four features (64/4=16)
use a dimensionality reduction technique like PCA
apply a linear transformation (a sketch follows after this answer)
apply a convolution function
To give a satisfying answer, more information about why and what you want to do is necessary.
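As a concrete illustration of the linear-transformation option listed above, a minimal PyTorch sketch (the projection is randomly initialized here; in practice it would be learned as part of a model):
import torch
import torch.nn as nn

x = torch.randn(2, 1, 80, 64)   # example tensor with the shape from the question
proj = nn.Linear(64, 16)        # acts on the last dimension
y = proj(x)
print(y.shape)                  # torch.Size([2, 1, 80, 16])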
Answered by: #ptrblck_de
slice the tensor
import torch
import torch.nn as nn  # nn is used by the pooling examples below

x = torch.randn(2, 1, 80, 64)  # example input with the shape from the question
y = x[..., :16]
print(y.shape)
# torch.Size([2, 1, 80, 16])
index it with a stride of 4
y = x[..., ::4]
print(y.shape)
# torch.Size([2, 1, 80, 16])
use any pooling (max, avg, etc.) layer (the same would also work using adaptive pooling layers)
pool = nn.MaxPool2d((1, 2), (1, 4))
y = pool(x)
print(y.shape)
# torch.Size([2, 1, 80, 16])
pool = nn.AdaptiveAvgPool2d(output_size=(80, 16))
y = pool(x)
print(y.shape)
# torch.Size([2, 1, 80, 16])
or manually reduce the last dimension with any reduction op (sum, mean, max, etc.)
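For example, a minimal sketch of that manual reduction, averaging every group of four adjacent features (64 / 4 = 16):
y = x.reshape(2, 1, 80, 16, 4).mean(dim=-1)
print(y.shape)
# torch.Size([2, 1, 80, 16])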
How would I go about blacking out a portion of an image or feature map such that AutoGrad can backprop through the operation?
Specifically I want to black out everything except for n layers of border pixels. So if we consider a single channel of the feature map which looks like:
[
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
]
I set a constant n=1 so my operation does the following to the input:
[
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 1, 1, 1],
]
In my case I'd be doing it to a multi channel feature map and all channels would be treated the same way.
If possible, I want to do it in a functional manner.
Considering the comments you added, i.e. that you don't need the output to be differentiable w.r.t. the mask (said differently, the mask is constant), you could just store the indices of the 1s in the mask and act only on the corresponding elements of whatever Tensor you're considering.
Or, if you don't want to deal with fancy indexing, you could keep the mask as a Tensor of 0s and 1s and do an element-wise multiplication of it with whatever Tensor you're considering.
Or, if you truly just need to compute a loss along the border pixels, just extract the first and last row and the first and last column, avoiding double-counting the corners. This latter solution is essentially the first solution recast in a special case.
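A minimal sketch of the element-wise multiplication option, assuming n = 1 and a (channels, H, W) feature map; the mask is built once and treated as a constant:
import torch

n = 1
x = torch.ones(3, 4, 4, requires_grad=True)   # example multi-channel feature map

mask = torch.zeros_like(x)
mask[..., :n, :] = 1    # top border rows
mask[..., -n:, :] = 1   # bottom border rows
mask[..., :, :n] = 1    # left border columns
mask[..., :, -n:] = 1   # right border columns

y = x * mask            # interior is zeroed, gradients still flow to x through the kept pixels
y.sum().backward()      # works: x.grad equals the mask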
To address the question in your comment to my answer:
import torch

x = torch.tensor([[1.0, 2, 3], [4, 5, 6]], requires_grad=True)
print(x[:,0])
gives
tensor([1., 4.], grad_fn=<SelectBackward>)
so we see that slicing does not mess with the autograd engine (it's still tracking the contribution to the gradient). It is not too surprising that this works automatically; slicing can be viewed as the (mathematical) function of projecting onto a subspace of R^n, for which it's easy to compute the gradient.
I have binary images (as the one below) at the output of my net. I need the '1's to be further from each other (not connected), so that they would form a sparse binary image (without white blobs). Something like salt-and-pepper noise. I am looking for a way to define a loss (in pytorch) that would punish based on the density of the '1's.
Thanks.
It depends on how you're generating said image. Since neural networks have to be trained by backpropagation, I'm fairly sure your binary image is not the direct output of your neural network (i.e. not the thing you're applying the loss to), because gradients can't flow through binary (discrete) variables. I suspect you do something like pixel-wise binary cross-entropy or similar and then threshold.
I assume your code works like this: you densely regress real-valued numbers and then apply thresholding, likely using a sigmoid to map from [-inf, inf] to [0, 1]. If so, you can do the following. Build a convolution kernel which is 0 in the center and 1 elsewhere, with a size related to how big you want your "sparsity gaps" to be.
kernel = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
Then you apply sigmoid to your real-valued output to squash it to [0, 1]:
squashed = torch.sigmoid(nn_output)
then you convolve squashed with kernel, which gives you the relaxed number of non-zero neighbors.
neighborhood = nn.functional.conv2d(squashed, kernel, padding=2)
and your loss will be the product of each pixel's value in squashed with the corresponding value in neighborhood:
sparsity_loss = (squashed * neighborhood).mean()
If you think of this loss applied to your binary image, for a given pixel p its contribution is non-zero if and only if both p and at least one of its neighbors have value 1 (and it grows with the number of such neighbors), and it is 0 otherwise. Since we apply it to non-binary numbers in the [0, 1] range, it is a differentiable approximation of that.
Please note that I left out some of the details from the code above (like correctly reshaping kernel to work with nn.functional.conv2d).
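To fill in those details, here is a minimal runnable sketch, assuming a single-channel output of shape (N, 1, H, W); nn_output here is a stand-in for your network's real-valued output:
import torch
import torch.nn as nn

nn_output = torch.randn(1, 1, 32, 32, requires_grad=True)  # placeholder for the real network output

kernel = torch.ones(5, 5)
kernel[2, 2] = 0                      # 0 in the center, 1 elsewhere
kernel = kernel.view(1, 1, 5, 5)      # reshape to (out_channels, in_channels, kH, kW) for conv2d

squashed = torch.sigmoid(nn_output)                               # squash to [0, 1]
neighborhood = nn.functional.conv2d(squashed, kernel, padding=2)  # relaxed count of non-zero neighbors
sparsity_loss = (squashed * neighborhood).mean()
sparsity_loss.backward()              # gradients flow back to nn_output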
I'm working on a small project to better understand RNNs, in particular LSTMs and GRUs. I'm not at all an expert, so please bear that in mind.
The data for the problem I'm facing has the form:
>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame([[1, 2, 3],[1, 2, 1], [1, 3, 2],[2, 3, 1],[3, 1, 1],[3, 3, 2],[4, 3, 3]], columns=['person', 'interaction', 'group'])
person interaction group
0 1 2 3
1 1 2 1
2 1 3 2
3 2 3 1
4 3 1 1
5 3 3 2
6 4 3 3
This is just for explanation. We have different persons interacting with different groups in different ways. I've already encoded the various features. The last interaction of a user is always a 3, which means selecting a certain group. In the short example above, person 1 chooses group 2, person 2 chooses group 1, and so on.
My whole data set is much bigger, but I would like to understand the conceptual part first before throwing models at it. The task I would like to learn is: given a sequence of interactions, which group is chosen by the person? A bit more concretely, I would like the output to be a list of all groups (there are 3 groups: 1, 2, 3) sorted by the most likely choice, followed by the second and third most likely group. The loss function would therefore be the mean reciprocal rank.
I know that in Keras GRUs/LSTMs can handle variable-length input. So my three questions are:
The input is of the format:
(samples, timesteps, features)
Writing high-level code:
import keras.layers as L
import keras.models as M
model_input = L.Input(shape=(?, None, 2))
timesteps=None should allow for the varying sequence length, and 2 is for the two features interaction and group. But what about the samples? How do I define the batches?
For the output I'm a bit puzzled about how this should look in this example. I think for each person's last interaction I would like to have a list of length 3. Assuming I've set up the output as
model_output = L.LSTM(3, return_sequences=False)
I then want to compile it. Is there a way of using the mean reciprocal rank?
model.compile('adam', '?')
I know the questions are fairly high level, but I would like to understand first the big picture and start to play around. Any help would therefore be appreciated.
The concept you've drawn in your question is a pretty good start already. I'll add a few things to make it work, as well as a code example below:
You can specify LSTM(n_hidden, input_shape=(None, 2)) directly, instead of inserting an extra Input layer; the batch dimension is to be omitted for the definition.
Since your model is going to perform some kind of classification (based on time series data), the final layer is what we'd expect from "normal" classification as well, a Dense(num_classes, activation='softmax'). Chaining the LSTM and the Dense layer together will first pass the time series input through the LSTM layer and then feed its output (determined by the number of hidden units) into the Dense layer. activation='softmax' allows us to compute a class score for each class (we're going to use one-hot encoding in a data preprocessing step, see the code example below). This means class scores are not ordered, but you can always order them via np.argsort (or pick the top class via np.argmax).
Categorical crossentropy loss is suited for comparing the classification score, so we'll use that one: model.compile(loss='categorical_crossentropy', optimizer='adam').
Since the number of interactions, i.e. the length of the model input, varies from sample to sample, we'll use a batch size of 1 and feed in one sample at a time.
The following is a sample implementation with respect to the above considerations. Note that I modified your sample data a bit, in order to provide more "reasoning" behind the group choices. Also, each person needs to perform at least one interaction before choosing a group (i.e. the input sequence cannot be empty); if this is not the case for your data, then introducing an additional no-op interaction (e.g. 0) can help.
import pandas as pd
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(10, input_shape=(None, 2))) # LSTM for arbitrary length series.
model.add(tf.keras.layers.Dense(3, activation='softmax')) # Softmax for class probabilities.
model.compile(loss='categorical_crossentropy', optimizer='adam')
# Example interactions:
# * 1: Likes the group,
# * 2: Dislikes the group,
# * 3: Chooses the group.
df = pd.DataFrame([
    [1, 1, 3],
    [1, 1, 3],
    [1, 2, 2],
    [1, 3, 3],
    [2, 2, 1],
    [2, 2, 3],
    [2, 1, 2],
    [2, 3, 2],
    [3, 1, 1],
    [3, 1, 1],
    [3, 1, 1],
    [3, 2, 3],
    [3, 2, 2],
    [3, 3, 1]],
    columns=['person', 'interaction', 'group']
)
data = [person[1][['interaction', 'group']].values for person in df.groupby('person')]
x_train = [x[:-1] for x in data]
y_train = tf.keras.utils.to_categorical([x[-1, 1]-1 for x in data]) # Expects class labels from 0 to n (-> subtract 1).
print(x_train)
print(y_train)
class TrainGenerator(tf.keras.utils.Sequence):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # Need to expand arrays to have batch size 1.
        return self.x[index][None, :, :], self.y[index][None, :]
model.fit_generator(TrainGenerator(x_train, y_train), epochs=1000)
pred = [model.predict(x[None, :, :]).ravel() for x in x_train]
for p, y in zip(pred, y_train):
    print(p, y)
And the corresponding sample output:
[...]
Epoch 1000/1000
3/3 [==============================] - 0s 40ms/step - loss: 0.0037
[0.00213619 0.00241093 0.9954529 ] [0. 0. 1.]
[0.00123938 0.99718493 0.00157572] [0. 1. 0.]
[9.9632275e-01 7.5039308e-04 2.9268670e-03] [1. 0. 0.]
Using a custom generator expression: According to the documentation, we can use any generator to yield the data. The generator is expected to yield batches of the data and to loop over the whole data set indefinitely. When using tf.keras.utils.Sequence we do not need to specify the parameter steps_per_epoch, as it defaults to len(train_generator). Hence, when using a custom generator, we have to provide this parameter as well:
import itertools as it

model.fit_generator(((x_train[i % len(x_train)][None, :, :],
                      y_train[i % len(y_train)][None, :]) for i in it.count()),
                    epochs=1000,
                    steps_per_epoch=len(x_train))
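Since the original question asked about mean reciprocal rank: ranking is not differentiable, so it isn't directly usable as a Keras training loss, but you can compute it after training from the predicted class scores. A minimal sketch, assuming pred and y_train as produced in the example above:
import numpy as np

# Rank of the true class within the descending-sorted predictions, then average 1/rank.
ranks = [1 + list(np.argsort(-p)).index(np.argmax(y)) for p, y in zip(pred, y_train)]
mrr = np.mean([1.0 / r for r in ranks])
print(mrr)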
If I have a layer with 32 5x5 RGB convolution kernels in it, I would expect the shape to be (32, 5, 5, 3), i.e. (count, h, w, rgb), but instead it is (5, 5, 3, 32). This messes up iteration, since
for kern in kernels:
does not work as expected: I get a series of (5, 3, 32) ndarrays instead of each of the 5x5 RGB kernels.
Am I just doing this wrong?
Strange that the kernel is stored in the shape (h, w, channels, filters), as the implementation suggests otherwise:
kernel_shape = self.kernel_size + (self.filters, input_dim)
self.kernel = self.add_weight(shape=kernel_shape, ...)
...
However, if this is what you are seeing, and if you need to iterate over each filter, why not just move the axis with np.moveaxis:
kernel = np.moveaxis(kernel, -1, 0)
to get the desired (kernels, h, w, channels).
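A minimal sketch of the whole round trip, assuming a tf.keras Conv2D layer; the layer size and input shape here are illustrative, not taken from the question's code:
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Conv2D(32, (5, 5))
layer.build(input_shape=(None, 28, 28, 3))   # RGB input assumed

kernels = layer.get_weights()[0]             # shape (5, 5, 3, 32): (h, w, channels, filters)
kernels = np.moveaxis(kernels, -1, 0)        # shape (32, 5, 5, 3): (filters, h, w, channels)

for kern in kernels:                         # each kern is now one 5x5 RGB kernel
    print(kern.shape)                        # (5, 5, 3)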