This ended up being a different issue from the one in the question.
I have a very simple Keras model that accepts time series data. I want to use a recurrent layer to predict a new sequence of the same dimensions, with a softmax on the end to provide a normalised result at each time step.
This is how my model looks.
x = GRU(256, return_sequences=True)(x)
x = TimeDistributed(Dense(3, activation='softmax'))(x)
Imagine the input is something like:
[
[0.25, 0.25, 0.5],
[0.3, 0.3, 0.4],
[0.2, 0.7, 0.1],
[0.1, 0.1, 0.8]
]
I'd expect the output to be the same shape and normalised at each step, like:
[
[0.15, 0.35, 0.5],
[0.35, 0.35, 0.3],
[0.1, 0.6, 0.3],
[0.1, 0.2, 0.7]
]
But what I actually get is a result where the elements in each row sum to a quarter (or, more generally, to 1 divided by the number of rows), not to 1.
Put simply, I thought the idea of TimeDistributed was to apply the Dense layer to each time step, so effectively the Dense with softmax activation would be applied repeatedly to each timestep. But I seem to be getting a result that looks like it is normalized across all elements in the output matrix of time steps.
Since I seem to understand incorrectly, is there a way to get a Dense softmax result for each time step (normalized to 1 at each step) without having to predict each time step sequentially?
It appears that the issue wasn't with the handling of softmax under a TimeDistributed wrapper, but with an error in my predictions function, which was summing over the whole matrix rather than row by row.
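For reference, here is a minimal numpy sketch of that mistake, using the example values from above (the actual predictions function is not shown in the question):

import numpy as np

# Model output for 4 timesteps and 3 classes (already normalised per row by the softmax)
preds = np.array([
    [0.15, 0.35, 0.50],
    [0.35, 0.35, 0.30],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
])

# Buggy check: normalising by the sum over the whole matrix makes each row
# appear to sum to 1 / n_rows (0.25 here), exactly the symptom described above.
buggy = preds / preds.sum()
print(buggy.sum(axis=1))   # [0.25 0.25 0.25 0.25]

# Correct check: sum (or normalise) row by row, i.e. along the last axis.
print(preds.sum(axis=-1))  # [1. 1. 1. 1.]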
Related
I want to implement a custom sampler in PyTorch. I have a classification problem and I want to sample data in each batch by sampling classes non-uniformly, with probabilities that change batch-to-batch, and then sampling data uniformly within each class.
So for instance, if I'm using CIFAR10 with a batch size 32 and initial probabilities [0.5, 0.5, 0, ..., 0], I want to sample only the first 2 classes with equal probability e.g. [0, 1, 1, 0, 1...0], and then sample 32 points uniformly from within each sampled class. After the first batch, I may want the probabilities to shift to [0.25, 0.5, 0.25, 0, ..., 0], meaning I want to sample from the first three classes with those probabilities.
How can I implement this in a custom PyTorch Sampler?
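No answer is included here, but one possible sketch, assuming the dataset's labels are available up front (e.g. CIFAR10's targets) and that the per-class probabilities are updated from outside between batches; the class name ClassProbabilitySampler and the method set_probs are made up for illustration:

import numpy as np
from torch.utils.data import Sampler

class ClassProbabilitySampler(Sampler):
    """Draws each batch by first sampling classes with given probabilities,
    then sampling uniformly within each sampled class."""

    def __init__(self, targets, batch_size, class_probs, num_batches):
        self.targets = np.asarray(targets)
        self.batch_size = batch_size
        self.class_probs = np.asarray(class_probs, dtype=float)
        self.num_batches = num_batches
        # Precompute the dataset indices that belong to each class.
        self.class_indices = [np.where(self.targets == c)[0]
                              for c in range(len(self.class_probs))]

    def set_probs(self, class_probs):
        # Call between batches to change the class distribution,
        # e.g. from [0.5, 0.5, 0, ...] to [0.25, 0.5, 0.25, 0, ...].
        self.class_probs = np.asarray(class_probs, dtype=float)

    def __iter__(self):
        for _ in range(self.num_batches):
            # Non-uniform draw over classes, one class per slot in the batch.
            classes = np.random.choice(len(self.class_probs),
                                       size=self.batch_size,
                                       p=self.class_probs)
            # Uniform draw within each chosen class.
            for c in classes:
                yield int(np.random.choice(self.class_indices[c]))

    def __len__(self):
        return self.num_batches * self.batch_size

Passing this sampler to a DataLoader with batch_size=32 groups the yielded indices into batches; because __iter__ is a lazy generator, probabilities changed via set_probs take effect on the next batch (at least with num_workers=0, where no prefetching happens in worker processes).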
Given logits like
# each row is a record of data
logits = np.array([ [0.1, 0.3, 0.5], [0.3, 0.1, 0.5], [0.1, 0.3, 0.0] ])
How can I use PyTorch to sample the index for the logits of each row? The current distribution APIs do not seem to support such a function.
What I want is, for example
distribution = Categorical(logits=logits)
labels = distribution.sample(dim=1)
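In case it helps, torch.distributions.Categorical already treats the leading dimension of a 2-D logits tensor as a batch dimension, so .sample() draws one index per row and no dim argument is needed. A small sketch using the example logits:

import numpy as np
import torch
from torch.distributions import Categorical

# each row is a record of data
logits = torch.tensor(np.array([[0.1, 0.3, 0.5],
                                [0.3, 0.1, 0.5],
                                [0.1, 0.3, 0.0]]))

# The leading dimension is treated as a batch dimension,
# so .sample() draws one class index per row.
distribution = Categorical(logits=logits)
labels = distribution.sample()          # shape (3,)

# Equivalent with torch.multinomial on row-wise softmax probabilities:
probs = torch.softmax(logits, dim=1)
labels2 = torch.multinomial(probs, num_samples=1).squeeze(1)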
I've been implementing an autoencoder which receives as inputs vectors that consist only of 0 and 1, such as [1, 0, 1, 0, 1, 0, ...].
Likewise, another autoencoder that receives as inputs vectors that consist in values between 0 and 1, such as [0.123, 1, 0.9, 0.01, 0.9, ...]. In both cases each vector element is the input value of a node. The activation function of the hidden layers is relu and for the output layer is sigmoid.
I've seen some examples of autoencoders where adam/adadelta are used as optimizer and binary_crossentropy is used as a loss function. For that reason I implemented in both adadelta and binary_crossentropy, but I'm not sure if for both cases it's the correct configuration.
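For concreteness, a minimal Keras sketch of the configuration the question describes (relu hidden layer, sigmoid output, adadelta with binary_crossentropy); the sizes 100 and 32 are made up:

from keras.models import Sequential
from keras.layers import Dense

input_dim = 100      # made-up input vector size
encoding_dim = 32    # made-up bottleneck size

autoencoder = Sequential()
autoencoder.add(Dense(encoding_dim, activation='relu', input_dim=input_dim))
autoencoder.add(Dense(input_dim, activation='sigmoid'))   # outputs constrained to [0, 1]

# binary_crossentropy treats each output unit as an independent [0, 1] target,
# which is why it is commonly paired with a sigmoid output layer.
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=256)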
I have an LSTM NN that has 1 output after the last Dense (softmax) layer. I noticed that if I smooth the predicted Y by applying a numpy convolution, I get much better accuracy.
The issue is that I choose the values of the convolution kernel manually. I'd like the NN to be able to learn the convolution kernel values, so I need to add a convolution as the last layer, after the softmax Dense. If I understand Keras Conv1D correctly, it can only convolve along the feature axis, but I need to convolve along the outputs of different samples (axis 0). Thus, if the NN produces
Y = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
...and the kernel_size of the convolution layer is 3, it should convolve the vector Y with another, trained, convolution vector C (for example [0.1, 0.5, 1]):
>>> np.convolve([0.1, 0.5, 1],[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], mode='same')
array([ 0.07, 0.23, 0.39, 0.55, 0.71, 0.87, 0.95])
So, the goal is to convolve the output along the samples axis, while letting the NN train the convolution kernel to find the best one.
Is it possible to do this in Keras?
A convolution layer will require an input shape like (samples, length, channels).
To convolve along the samples, you simply reorganize your tensor so that it meets the convolution layer's input requirements.
It looks like you want the old samples to become the new length dimension, and you have only one channel in any case. I'm not sure whether this is exactly what you intend, but as a consequence we are left with only one new sample.
So, we reshape your tensor from (samples,) to (1, samples, 1).
To reshape across the first (batch) dimension, we need a Lambda layer:
from keras.layers import Lambda, Conv1D
from keras import backend as K

model.add(Lambda(lambda x: K.reshape(x, (1, -1, 1)), output_shape=(None, 1)))
model.add(Conv1D(1, 3, padding='same'))
# It's very important to reshape back to the same number of original samples,
# or Keras will not accept your model:
model.add(Lambda(lambda x: K.reshape(x, (-1, 1)), output_shape=(1,)))
The final shape may need adjustment to fit your training data, depending on whether your numpy arrays are (samples,) or (samples, 1).
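Putting the answer together, a complete sketch could look like the following; the small base network in front of the convolution is made up, and only the reshape / Conv1D / reshape idea comes from the answer above:

from keras.models import Sequential
from keras.layers import Dense, Lambda, Conv1D
from keras import backend as K

model = Sequential()
# Made-up base network producing one value per sample.
model.add(Dense(16, activation='relu', input_dim=8))
model.add(Dense(1, activation='sigmoid'))

# (samples, 1) -> (1, samples, 1): the samples become the "length" axis.
model.add(Lambda(lambda x: K.reshape(x, (1, -1, 1)), output_shape=(None, 1)))
# Trainable kernel of size 3, convolving across the former sample axis.
model.add(Conv1D(1, 3, padding='same'))
# Back to (samples, 1) so Keras sees the original number of samples.
model.add(Lambda(lambda x: K.reshape(x, (-1, 1)), output_shape=(1,)))

model.compile(optimizer='adam', loss='mse')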
For example, I have a CNN which tries to predict digits from the MNIST dataset (code written using Keras). It has 10 outputs, which form a softmax layer. Only one of the outputs can be true (there is one output for each digit from 0 to 9):
Real: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Predicted: [0.02, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
The sum of the predicted values is equal to 1.0, by the definition of softmax.
Let's say I have a task where I need to classify some objects that can fall in several categories:
Real: [0, 1, 0, 1, 0, 1, 0, 0, 0, 1]
So I need to normalize in some other way. I need a function which gives values in the range [0, 1] and whose sum can be larger than 1.
I need something like this:
Predicted: [0.1, 0.9, 0.05, 0.9, 0.01, 0.8, 0.1, 0.01, 0.2, 0.9]
Each number is the probability that the object falls into the given category. After that, I can use some threshold, like 0.5, to decide which categories a given object falls into.
The following questions arise:
Which activation function can be used for this?
Maybe this function already exists in Keras?
Maybe you can suggest some other way to handle prediction in this case?
Your problem is one of multi-label classification, and in the context of Keras it is discussed, for example, here: https://github.com/fchollet/keras/issues/741
In short, the suggested solution in Keras is to replace the softmax layer with a sigmoid layer and use binary_crossentropy as your cost function.
An example adapted from that thread:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from sklearn.metrics import f1_score

# Build a classifier optimized for maximizing f1_score (uses class_weights)
clf = Sequential()
clf.add(Dropout(0.3, input_shape=(xt.shape[1],)))
clf.add(Dense(1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(yt.shape[1], activation='sigmoid'))
clf.compile(optimizer=Adam(), loss='binary_crossentropy')
clf.fit(xt, yt, batch_size=64, epochs=300, validation_data=(xs, ys), class_weight=W, verbose=0)
preds = clf.predict(xs)
preds[preds >= 0.5] = 1
preds[preds < 0.5] = 0
print(f1_score(ys, preds, average='macro'))