I have an LSTM network with a single output from the last Dense (softmax) layer. I noticed that if I smooth the predicted Y by applying a numpy convolution, I get much better accuracy.
The issue is that I choose the convolution kernel values manually. I'd like the network to learn the kernel values itself, so I need to add a convolution as the last layer, after the softmax Dense. If I understand Keras Conv1D correctly, it can only convolve along the feature axis, but I need to convolve the outputs of different samples (axis 0). Thus, if the network produces
Y = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
...and the convolution layer's kernel_size is 3, it should convolve the vector Y with another, trained convolution vector C (for example [0.1, 0.5, 1]):
>>> np.convolve([0.1, 0.5, 1],[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], mode='same')
array([ 0.07, 0.23, 0.39, 0.55, 0.71, 0.87, 0.95])
So, the goal is to convolve the output along the samples axis, while letting the network train the kernel values to find the best one.
Is it possible to do this in Keras?
A convolution layer will require an input shape like (samples, length, channels).
To make a convolution along the samples, you simply reorganize your tensor to meet that input requirement.
It looks like you want the old samples to become the new length, and that you have only one channel in any case. I'm not sure whether this is exactly what you intend, but as a consequence we will be left with only one new sample.
So, we reshape your tensor from (samples,) to (1, samples, 1).
Because the reshape touches the first (batch) dimension, we need a Lambda layer:
model.add(Lambda(lambda x: K.reshape(x, (1, -1, 1)), output_shape=(None, 1)))
model.add(Conv1D(1, 3, padding='same'))
# It's very important to reshape back to the original number of samples,
# or Keras will not accept your model:
model.add(Lambda(lambda x: K.reshape(x, (-1, 1)), output_shape=(1,)))
The final shapes may need adjustment to fit your training data, depending on whether your numpy arrays are (samples,) or (samples, 1).
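Putting the pieces together, a minimal end-to-end sketch (the Dense layer is a stand-in for the original LSTM stack, and the input size 8 is made up):

import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Lambda, Conv1D

model = Sequential()
# Placeholder for the original LSTM/Dense stack: one prediction per sample.
model.add(Dense(1, activation='sigmoid', input_shape=(8,)))
# Fold the batch into the length axis: (samples, 1) -> (1, samples, 1).
model.add(Lambda(lambda x: K.reshape(x, (1, -1, 1)), output_shape=(None, 1)))
# Trainable smoothing kernel of size 3; 'same' padding preserves the length.
model.add(Conv1D(1, 3, padding='same'))
# Restore (samples, 1) so the output matches the targets again.
model.add(Lambda(lambda x: K.reshape(x, (-1, 1)), output_shape=(1,)))

model.compile(optimizer='adam', loss='mse')
# Because the convolution mixes predictions of neighbouring samples,
# train with the whole (ordered) set as a single batch.
x, y = np.random.rand(32, 8), np.random.rand(32, 1)
model.fit(x, y, batch_size=32, epochs=1)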
I am running a CNN in Google Colab using TensorFlow/Keras. However, I received this error:
Negative dimension size caused by subtracting 3 from 2 for '{{node conv2d_11/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](Placeholder, conv2d_11/Conv2D/ReadVariableOp)' with input shapes: [?,2,2,394], [3,3,394,394].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 2, 2, 394), dtype=float32)
Does this have to do with my input data or my parameters? Thanks.
Try inserting a number instead of using None when specifying the shape. As the documentation here says, you run into not-fully-specified shapes when using None.
In this case you have defined a model that contains many MaxPool or AvgPool layers, so as the images pass through the model's layers, their size keeps decreasing. I think it would help to set the padding parameter of the convolution layers to 'same'; for more details, read about the conv layer parameters strides and padding.
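For illustration, a hedged sketch (the 16x16 input and filter counts are made up) of how padding='same' keeps a shrinking feature map from falling below the kernel size:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(16, 16, 3)),
    layers.Conv2D(32, 3, padding='same', activation='relu'),   # 16x16
    layers.MaxPooling2D(2),                                    # 8x8
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(2),                                    # 4x4
    layers.Conv2D(128, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(2),                                    # 2x2
    # With the default padding='valid', a 3x3 conv on a 2x2 map would raise
    # the "Negative dimension size" error; padding='same' keeps it at 2x2.
    layers.Conv2D(128, 3, padding='same', activation='relu'),
])
model.summary()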
I'm trying to model a Keras-based network using a set of 1D CNN and LSTM layers. Most of the available examples on the web use data shaped like (1, 30, 50) (1 sample containing 30 time steps with 50 features each).
However, each time step in my dataset is composed of a number of 1D arrays. A 10-time-step sample would be (1, 10, 100, 384) (1 batch of a single sample, with 10 time steps each containing 100 arrays of 384 features). So, how should I define a model for such a shape?
I could flatten each time step's data (100*384), but that seems quite inadequate, as it could void all the CNN processing... Plus, each time step's data is really 1D: it is not spatial data.
I have already defined a simple model like the one below, but I think it's using the batch_size of the input shape incorrectly. I think it's trying to learn from "482 samples" rather than from a single sample with "482 time steps"...
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

data_input_shape = (482, 100, 384)

model = Sequential()
# input_shape excludes the batch dimension, so (100, 384) treats each of
# the 482 entries as a separate sample of 100 steps x 384 features.
model.add(Conv1D(300, 1, activation="relu", input_shape=(100, 384)))
model.add(MaxPooling1D(4))
model.add(Conv1D(256, 1, activation="relu"))
model.add(MaxPooling1D(4))
model.add(Conv1D(128, 1, activation="relu"))
model.add(MaxPooling1D(5))
model.add(LSTM(200, return_sequences=True))
model.add(LSTM(200, return_sequences=True))
model.add(LSTM(200, return_sequences=True))
model.add(Dense(1, activation='sigmoid'))
Any suggestions?
Let's consider the following two cases, given that, as you have already mentioned, the 100 arrays are not spatially correlated:
The 384 values of each feature are spatially independent.
The 384 values of each feature are spatially dependent. For example, they are values across a frequency range after some FFT or similar operation.
In case 1, you basically have 100x384 independent features, so flattening seems to be the option to go with.
In case 2 though, it might make sense to apply a 2D convolution across the features. Here is how:
First, you should prepare your data in the right format. Assuming your data has 482 time steps, you should decide how many time steps you'd like in each sample. For example, you could choose 10 time steps per sample, which, with no overlap between samples, gives you about 48 samples, so the data would then have shape (48, 10, 100, 384). In addition, we should add an extra dimension as the channel so we can apply a 2D convolution in Keras, making the final shape (48, 10, 100, 384, 1).
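A hedged numpy sketch of that preparation (using a random dummy array in place of the real data):

import numpy as np

data = np.random.rand(482, 100, 384).astype('float32')  # dummy stand-in

steps = 10
n_samples = data.shape[0] // steps                  # 48 samples
data = data[:n_samples * steps]                     # drop the 2 leftover steps
data = data.reshape(n_samples, steps, 100, 384)     # (48, 10, 100, 384)
data = data[..., np.newaxis]                        # (48, 10, 100, 384, 1)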
Next, you can decide on the architecture. We will apply a Conv2D to each array at each time step, using a kernel size of (1, x) or (100, x) since your arrays are not spatially related. Here is an example architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Flatten, LSTM, Dense)

model = Sequential()
model.add(TimeDistributed(Conv2D(16, (1, 5), activation="relu"),
                          input_shape=(10, 100, 384, 1)))
model.add(TimeDistributed(MaxPooling2D((1, 2))))
model.add(TimeDistributed(Conv2D(32, (100, 9), activation="relu")))
model.add(TimeDistributed(MaxPooling2D((1, 4))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(16, return_sequences=True))
model.add(Dense(1, activation='sigmoid'))
A few additional notes:
You can certainly add more layers of each type.
TimeDistributed is new above. You can read about it here.
If you have images to begin with, consider using a CNN/LSTM hybrid or Conv3D from the beginning instead of extracting 100 arrays from each image.
Take a look at ConvLSTM2D here for a combined CNN and LSTM layer.
This ended up being a different issue from the one in the question.
I have a very simple Keras model that accepts time series data. I want to use a recurrent layer to predict a new sequence of the same dimensions, with a softmax on the end to provide a normalised result at each time step.
This is how my model looks.
x = GRU(256, return_sequences=True)(x)
x = TimeDistributed(Dense(3, activation='softmax'))(x)
Imagine the input is something like:
[
[0.25, 0.25, 0.5],
[0.3, 0.3, 0.4],
[0.2, 0.7, 0.1],
[0.1, 0.1, 0.8]
]
I'd expect the output to be the same shape and normalised at each step, like:
[
[0.15, 0.35, 0.5],
[0.35, 0.35, 0.3],
[0.1, 0.6, 0.3],
[0.1, 0.2, 0.7]
]
But what I actually get is a result where the sum of the elements in each row is a quarter (or, in general, one over the number of rows), not 1.
Put simply, I thought the idea of TimeDistributed was to apply the Dense layer to each time step, so the Dense with softmax activation would effectively be applied repeatedly to each time step. But I seem to be getting a result that is normalized across all elements of the output matrix of time steps.
Since I seem to understand incorrectly, is there a way to get a Dense softmax result for each time step (normalized to 1 at each step) without having to predict each time step sequentially?
It appears that the issue wasn't with the handling of softmax under a TimeDistributed wrapper, but with an error in my prediction function, which was summing over the whole matrix rather than row by row.
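For reference, a numpy sketch of that kind of bug, using the expected output matrix from the question:

import numpy as np

preds = np.array([[0.15, 0.35, 0.5],
                  [0.35, 0.35, 0.3],
                  [0.1, 0.6, 0.3],
                  [0.1, 0.2, 0.7]])

# Buggy: sums over the whole matrix, so each row ends up summing to 0.25.
buggy = preds / preds.sum()
# Fixed: normalize each time step (row) independently, so each row sums to 1.
fixed = preds / preds.sum(axis=-1, keepdims=True)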
I've been implementing an autoencoder which receives as inputs vectors that consist only of 0 and 1, such as [1, 0, 1, 0, 1, 0, ...].
Likewise, another autoencoder receives as inputs vectors consisting of values between 0 and 1, such as [0.123, 1, 0.9, 0.01, 0.9, ...]. In both cases each vector element is the input value of one node. The activation function of the hidden layers is relu, and for the output layer it is sigmoid.
I've seen some examples of autoencoders where adam/adadelta is used as the optimizer and binary_crossentropy as the loss function. For that reason I used adadelta and binary_crossentropy in both, but I'm not sure whether that's the correct configuration in both cases.
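For reference, a minimal sketch of the configuration described (the layer sizes and the input dimension 128 are placeholders):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(64, activation='relu', input_shape=(128,)),  # encoder
    Dense(32, activation='relu'),                      # bottleneck
    Dense(64, activation='relu'),                      # decoder
    Dense(128, activation='sigmoid'),                  # reconstruction in [0, 1]
])
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')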
For example, I have a CNN which tries to predict numbers from the MNIST dataset (code written using Keras). It has 10 outputs, which form a softmax layer. Only one of the outputs can be true (independently for each digit from 0 to 9):
Real: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Predicted: [0.02, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
The sum of the predicted values is equal to 1.0, by the definition of softmax.
Let's say I have a task where I need to classify some objects that can fall in several categories:
Real: [0, 1, 0, 1, 0, 1, 0, 0, 0, 1]
So I need to normalize in some other way. I need a function that gives values in the range [0, 1] and whose sum can be larger than 1.
I need something like that:
Predicted: [0.1, 0.9, 0.05, 0.9, 0.01, 0.8, 0.1, 0.01, 0.2, 0.9]
Each number is the probability that the object falls into the given category. After that I can use some threshold, like 0.5, to decide which categories the given object belongs to.
This raises the following questions:
Which activation function can be used for this?
Maybe this function already exists in Keras?
Maybe you can propose some other way to predict in this case?
Your problem is one of multi-label classification, and in the context of Keras it is discussed, for example, here: https://github.com/fchollet/keras/issues/741
In short, the suggested solution in Keras is to replace the softmax layer with a sigmoid layer and use binary_crossentropy as your cost function.
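To see why this gives independently normalized scores, a small numpy illustration (the logits are made up): softmax makes the scores compete so they sum to 1, while the sigmoid squashes each score independently into [0, 1].

import numpy as np

logits = np.array([-2.0, 2.2, -3.0, 2.2, -4.6, 1.4, -2.2, -4.6, -1.4, 2.2])

softmax = np.exp(logits) / np.exp(logits).sum()   # competes: sums to 1.0
sigmoid = 1.0 / (1.0 + np.exp(-logits))           # independent: each in [0, 1]

print(softmax.sum())    # 1.0
print(sigmoid > 0.5)    # per-category decision with a 0.5 threshold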
An example from that thread (updated to the current Keras API):
# Build a classifier optimized for maximizing f1_score (uses class_weights)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import f1_score

clf = Sequential()
clf.add(Dropout(0.3, input_shape=(xt.shape[1],)))
clf.add(Dense(1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, activation='relu'))
clf.add(Dropout(0.6))
# Sigmoid outputs: one independent probability per label.
clf.add(Dense(yt.shape[1], activation='sigmoid'))
clf.compile(optimizer=Adam(), loss='binary_crossentropy')
clf.fit(xt, yt, batch_size=64, epochs=300, validation_data=(xs, ys), class_weight=W, verbose=0)

preds = clf.predict(xs)
# Threshold the per-label probabilities at 0.5.
preds[preds >= 0.5] = 1
preds[preds < 0.5] = 0
print(f1_score(ys, preds, average='macro'))