Keras autoencoder optimizer and loss function

I've been implementing an autoencoder which receives as inputs vectors that consist only of 0 and 1, such as [1, 0, 1, 0, 1, 0, ...].
Likewise, I've implemented another autoencoder that receives as inputs vectors consisting of values between 0 and 1, such as [0.123, 1, 0.9, 0.01, 0.9, ...]. In both cases each vector element is the input value of a node. The activation function of the hidden layers is relu, and for the output layer it is sigmoid.
I've seen some examples of autoencoders where adam/adadelta are used as the optimizer and binary_crossentropy is used as the loss function. For that reason I used adadelta and binary_crossentropy in both, but I'm not sure whether that's the correct configuration for both cases.
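For reference, a minimal sketch of the setup described (the input dimension of 100 and the hidden layer of 32 units are placeholder sizes, not values from the question):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder sizes: 100-dimensional inputs, one 32-unit hidden layer.
autoencoder = Sequential([
    Dense(32, activation='relu', input_shape=(100,)),  # relu hidden layer (encoder)
    Dense(100, activation='sigmoid'),                  # sigmoid output in [0, 1] (decoder)
])
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# autoencoder.fit(X, X, epochs=50, batch_size=256)  # train to reconstruct the inputs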

Related

How do I solve a ValueError in TensorFlow?

I am running a CNN in Google Colab using TensorFlow and Keras. However, I received this error:
Negative dimension size caused by subtracting 3 from 2 for '{{node conv2d_11/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](Placeholder, conv2d_11/Conv2D/ReadVariableOp)' with input shapes: [?,2,2,394], [3,3,394,394].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 2, 2, 394), dtype=float32)
Does this have to do with my input data or with my parameters? Thanks.
Try inserting a number instead of using None when specifying the shape. As noted in the documentation, you run into not-fully-specified shapes when using None.
In this case you have defined a model that contains many MaxPool or AvgPool layers, so as the images pass through the model's layers their size keeps shrinking. I think it would help to set the padding parameter of the convolution layers to 'same'; for more details, you can read about the strides and padding parameters of convolutional layers.
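To illustrate that suggestion, here is a small sketch using the shapes from the error message (a 2x2 feature map with 394 channels); with padding='same' a 3x3 kernel no longer produces a negative dimension:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D

# With padding='valid' (the default), a 3x3 kernel cannot fit on a 2x2 feature map.
# padding='same' pads the input so the spatial size is preserved.
inputs = Input(shape=(2, 2, 394))
outputs = Conv2D(394, (3, 3), padding='same', activation='relu')(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()  # output shape stays (None, 2, 2, 394)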

PyTorch: Custom Sampler for sampling classes with changing probabilities?

I want to implement a custom sampler in PyTorch. I have a classification problem and I want to sample data in each batch by sampling classes non-uniformly, with probabilities that change batch-to-batch, and then sampling data uniformly within each class.
So for instance, if I'm using CIFAR10 with a batch size 32 and initial probabilities [0.5, 0.5, 0, ..., 0], I want to sample only the first 2 classes with equal probability e.g. [0, 1, 1, 0, 1...0], and then sample 32 points uniformly from within each sampled class. After the first batch, I may want the probabilities to shift to [0.25, 0.5, 0.25, 0, ..., 0], meaning I want to sample from the first three classes with those probabilities.
How can I implement this as a custom PyTorch Sampler?
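In case it helps to make the scheme concrete, here is one possible sketch of such a sampler (the class name ClassProbabilitySampler and the set_probs method are hypothetical; it assumes integer class labels, e.g. dataset.targets for CIFAR10):
import numpy as np
from torch.utils.data import Sampler

class ClassProbabilitySampler(Sampler):
    """Draws each sample's class from class_probs, then picks an index
    uniformly from that class. class_probs can be changed between batches."""

    def __init__(self, labels, class_probs, batch_size, num_batches):
        self.labels = np.asarray(labels)
        self.class_probs = np.asarray(class_probs, dtype=float)
        self.batch_size = batch_size
        self.num_batches = num_batches
        # Pre-compute the dataset indices belonging to each class.
        self.class_indices = {c: np.where(self.labels == c)[0]
                              for c in np.unique(self.labels)}

    def set_probs(self, class_probs):
        # Call between batches to change the class probabilities.
        self.class_probs = np.asarray(class_probs, dtype=float)

    def __iter__(self):
        for _ in range(self.num_batches):
            # Classes are re-drawn each batch, so updated probabilities take effect.
            classes = np.random.choice(len(self.class_probs),
                                       size=self.batch_size,
                                       p=self.class_probs)
            for c in classes:
                yield int(np.random.choice(self.class_indices[c]))

    def __len__(self):
        return self.num_batches * self.batch_size

# Usage sketch: DataLoader(dataset, batch_size=32, sampler=sampler) then groups
# the yielded indices into batches of 32.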

Multilabel classification of a sequence, how to do it?

I am quite new to the deep learning field, especially Keras. Here I have a simple classification problem and I don't know how to solve it. What I don't understand is the general process of classification, such as converting the input data into tensors, encoding the labels, etc.
Let's say we have three classes, 1, 2, 3.
There is a sequence of classes that need to be classified as one of those classes. The dataset is for example
Sequence 1, 1, 1, 2 is labeled 2
Sequence 2, 1, 3, 3 is labeled 1
Sequence 3, 1, 2, 1 is labeled 3
and so on.
This means the input dataset will be
[[1, 1, 1, 2],
[2, 1, 3, 3],
[3, 1, 2, 1]]
and the label will be
[[2],
[1],
[3]]
Now one thing that I do understand is to one-hot encode the class. Because we have three classes, every 1 will be converted into [1, 0, 0], 2 will be [0, 1, 0] and 3 will be [0, 0, 1]. Converting the example above will give a dataset of 3 x 4 x 3, and a label of 3 x 1 x 3.
Another thing that I understand is that the last layer should be a softmax layer. This way, if a test sample (e.g. [1, 2, 3, 4]) comes in, it will be passed through softmax and the probabilities of this sequence belonging to class 1, 2, or 3 will be calculated.
Am I right? If so, can you give me an explanation/example of the process of classifying these sequences?
Thank you in advance.
Here are a few clarifications that you seem to be asking about.
If your input data has the shape (4), then your input tensor will have the shape (batch_size, 4).
Softmax is the correct activation for your prediction (last) layer, given your desired output, because you have a classification problem with multiple classes. This will yield output of shape (batch_size, 3). These will be the probabilities of each potential classification, summing to one across all classes. For example, if the classification is class 0, then a single prediction might look something like [0.9714, 0.01127, 0.01733].
Batch size isn't hard-coded to the network, hence it is represented in model.summary() as None. E.g. the network's last-layer output shape can be written (None, 3).
Unless you have an applicable alternative, a softmax prediction layer requires a categorical_crossentropy loss function.
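For completeness, a small sketch of how the labels from the question could be one-hot encoded for that loss; to_categorical is the standard Keras utility, and the -1 shift is only because the classes above are 1-based:
import numpy as np
from tensorflow.keras.utils import to_categorical

y = np.array([2, 1, 3])                          # labels from the example above
y_onehot = to_categorical(y - 1, num_classes=3)  # shape (3, 3)
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]]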
The architecture of a network remains up to you, but you'll at least need a way in and a way out. In Keras (as you've tagged), there are a few ways to do this. Here are some examples:
Example with Keras Sequential
# Imports used by all three examples below
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import InputLayer, Input, Dense

model = Sequential()
model.add(InputLayer(input_shape=(4,)))   # sequence of length four
model.add(Dense(3, activation='softmax')) # three possible classes
Example with Keras Functional
input_tensor = Input(shape=(4,))
x = Dense(3, activation='softmax')(input_tensor)
model = Model(input_tensor, x)
Example specifying the input tensor shape in the first layer instead of a separate input layer (possible with either the Sequential or the Functional API):
model = Sequential()
model.add(Dense(666, activation='relu', input_shape=(4,)))
model.add(Dense(3, activation='softmax'))
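Whichever way you build it, compiling could then look like the following sketch ('adam' is just one common optimizer choice here, not something mandated above):
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # matches the softmax output layer
              metrics=['accuracy'])
# model.fit(...) can then be called with the one-hot encoded data and labels.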
Hope that helps!

Keras convolution along samples

I have an LSTM network that has one output after the last Dense (softmax) neuron. I saw that if I smooth the predicted Y by applying a numpy convolution, I get much better accuracy.
The issue is that I choose the values for the convolution kernel manually. I'd like the network to be able to train the convolution kernel values itself. So, I need to add a convolution as the last layer, after the softmax Dense layer. If I understand Keras Conv1D correctly, it can convolve along features only, but I need to convolve along the outputs of different samples (axis 0). Thus, if the network produces
Y = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
...and the kernel_size of the convolution layer is 3, it should convolve the vector Y with another trained convolution kernel C (for example [0.1, 0.5, 1]):
>>> np.convolve([0.1, 0.5, 1],[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], mode='same')
array([ 0.07, 0.23, 0.39, 0.55, 0.71, 0.87, 0.95])
So the goal is to convolve the output along the samples axis, but let the network train the convolution kernel to choose the best one.
Is it possible to do this in Keras?
A convolution layer will require an input shape like (samples, length, channels).
To make a convolution along the samples, you simply reorganize your tensor so that it meets the convolutional input requirements.
It looks like you want the old samples to become the new length, and you have only one channel in any case. I'm not sure whether this is exactly what you intend to do, but as a consequence we will be left with only one new sample.
So, we reshape your tensor from (samples,) to (1, samples, 1).
To reshape across the first (batch) dimension, we need a Lambda layer:
from tensorflow.keras.layers import Lambda, Conv1D
from tensorflow.keras import backend as K

model.add(Lambda(lambda x: K.reshape(x, (1, -1, 1)), output_shape=(None, 1)))
model.add(Conv1D(1, 3, padding='same'))
# It's very important to reshape back to the same number of original samples,
# or Keras will not accept your model:
model.add(Lambda(lambda x: K.reshape(x, (-1, 1)), output_shape=(1,)))
The final shape may need adjustment to fit your training data, depending on whether your numpy arrays are shaped (samples,) or (samples, 1).

What is the replace for softmax layer in case more than one output can be activated?

For example, I have a CNN which tries to predict digits from the MNIST dataset (code written using Keras). It has 10 outputs, which form a softmax layer. Only one of the outputs can be true (one output for each digit from 0 to 9):
Real: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Predicted: [0.02, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
The sum of the predicted values equals 1.0, by definition of softmax.
Let's say I have a task where I need to classify some objects that can fall in several categories:
Real: [0, 1, 0, 1, 0, 1, 0, 0, 0, 1]
So I need to normalize in some other way. I need a function that gives values in the range [0, 1] and whose sum can be larger than 1.
I need something like that:
Predicted: [0.1, 0.9, 0.05, 0.9, 0.01, 0.8, 0.1, 0.01, 0.2, 0.9]
Each number is the probability that the object falls into the given category. After that, I can use a threshold like 0.5 to decide which categories the given object falls into.
This raises the following questions:
Which activation function can be used for this?
Does this function already exist in Keras?
Can you propose some other way to make predictions in this case?
Your problem is one of multi-label classification, and in the context of Keras it is discussed, for example, here: https://github.com/fchollet/keras/issues/741
In short, the suggested solution in Keras is to replace the softmax layer with a sigmoid layer and use binary_crossentropy as your cost function.
An example adapted from that thread:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import f1_score

# Build a classifier optimized for maximizing f1_score (uses class_weights)
clf = Sequential()
clf.add(Dropout(0.3, input_shape=(xt.shape[1],)))
clf.add(Dense(1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(yt.shape[1], activation='sigmoid'))
clf.compile(optimizer=Adam(), loss='binary_crossentropy')
clf.fit(xt, yt, batch_size=64, epochs=300, validation_data=(xs, ys), class_weight=W, verbose=0)

preds = clf.predict(xs)
preds[preds >= 0.5] = 1
preds[preds < 0.5] = 0
print(f1_score(ys, preds, average='macro'))
