Array mismatch in Keras classifier model output layer - python-3.x

I am designing a classifier that takes a 10-valued signal as input (obtained by processing the pixels of the MNIST dataset, normalized to 0-1) and outputs the digit class. The 10-valued signal is unique for each digit, so classification is possible.
from keras.utils import to_categorical

num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
The array shapes are:
x_train.shape == (60000, 10, 1, 1)
y_train.shape == (60000, 10)
x_test.shape == (10000, 10, 1, 1)
y_test.shape == (10000, 10)
The code is given as:
from keras.layers import Input, Flatten, Dense
from keras.models import Model

input_img = Input(shape=(10, 1, 1))
x = Flatten()(input_img)
x = Dense(100, activation='relu')(x)
x = Dense(100, activation='relu')(x)
decoded = Dense(10, activation='softmax')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = autoencoder.fit(x_train, y_train,
                          epochs=30,
                          batch_size=32,
                          verbose=1,
                          shuffle=True,
                          validation_data=(x_test, y_test))
Please suggest what changes can be made.

I think you should use the tf.keras.losses.CategoricalCrossentropy loss function, because you encoded your targets as one-hot vectors using to_categorical. According to the docs:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation. If you want to provide labels as integers, please use SparseCategoricalCrossentropy loss.
However, IMO, without reproducible code it's hard to give a specific answer.
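For what it's worth, a minimal sketch of passing that loss as an object instead of the 'categorical_crossentropy' string you already use (functionally equivalent, but explicit, and it lets you set options such as label_smoothing):
import tensorflow as tf

autoencoder.compile(optimizer='adam',
                    loss=tf.keras.losses.CategoricalCrossentropy(),
                    metrics=['accuracy'])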

Related

LSTM Autoencoder producing poor results in test data

I'm applying an LSTM autoencoder for anomaly detection. Since anomalies are very few compared to normal data, only normal instances are used for training. The test data consists of both anomalies and normal instances. During training, the model loss looks good. However, on the test data the model produces poor accuracy, i.e. anomaly and normal points are not well separated.
The snippet of my code is below:
# ...
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
# ...
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features / 2)

# Encoder
lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()

adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)

cp = ModelCheckpoint(filepath="lstm_classifier.h5",
                     save_best_only=True,
                     verbose=0)
tb = TensorBoard(log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=True)

lstm_model_history = lstm_model.fit(X_train, X_train,
                                    epochs=epochs,
                                    batch_size=batch,
                                    shuffle=False,
                                    verbose=1,
                                    validation_data=(X_valid, X_valid),
                                    callbacks=[cp, tb]).history
# ...
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions), 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
                         'True_class': y_test})

# Confusion matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class, pred_y)
plt.figure(figsize=(5, 5))
sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
Please suggest what can be done in the model to improve the accuracy.
If your model is not performing well on the test set, I would check a few things:
Make sure the training set is not contaminated with anomalies or with any information from the test set. If you use scaling, make sure you did not fit the scaler on the training and test sets combined.
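A minimal sketch of the correct scaling workflow (assuming a scikit-learn StandardScaler; adapt to whatever scaler you actually use):
from sklearn.preprocessing import StandardScaler

# Fit on training data only; the 3-D LSTM input is flattened to 2-D per feature
n_samples, lookback, n_features = X_train.shape
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(-1, n_features)).reshape(n_samples, lookback, n_features)
# Transform (never fit) validation and test sets with the same scaler
X_test = scaler.transform(X_test.reshape(-1, n_features)).reshape(X_test.shape[0], lookback, n_features)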
Based on my experience: if an autoencoder has low training loss but cannot discriminate well on the test data, then, provided your training set is pure, it means the autoencoder learned the fine details of the training set but not the generalized idea.
Your threshold value might be off and you may need to come up with a better thresholding procedure. One example can be found here: https://dl.acm.org/citation.cfm?doid=3219819.3219845
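As an illustration only (the linked paper describes a more robust procedure), a common baseline is to derive the threshold from the reconstruction errors on the training data, e.g. a high percentile:
import numpy as np

# Hypothetical baseline: threshold = 99th percentile of training reconstruction error
train_predictions = lstm_model.predict(X_train)
train_mse = np.mean(np.power(X_train - train_predictions, 2), axis=(1, 2))
threshold = np.percentile(train_mse, 99)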
If the problem is the second one, the solution is to improve generalization. With autoencoders, one of the most effective generalization tools is the dimension of the bottleneck. Again, based on my experience with anomaly detection in flight radar data: lowering the bottleneck dimension significantly increased my multi-class classification accuracy. I was using 14 features with an encoding_dim of 7, but an encoding_dim of 4 provided even better results. The value of the training loss was not important in my case, because I was only comparing reconstruction errors; but since you are classifying with a threshold on the reconstruction error, a more robust thresholding procedure could improve accuracy, just as in the paper I've shared.
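Concretely, in your code that would just mean shrinking the bottleneck; the exact divisor is something to tune, not a rule:
# Hypothetical change: a tighter bottleneck than n_features / 2
encoding_dim = max(1, n_features // 4)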

Keras - Issues using pre-trained word embeddings

I'm following Keras tutorials on word embeddings and replicated the code (with a few modifications) from this particular one:
Using pre-trained word embeddings in a Keras model
It's a topic classification problem in which they load pre-trained word vectors and use them via a fixed embedding layer.
When using the pre-trained embedding vectors I can, in fact, achieve their 95% accuracy. This is the code:
from keras.layers import Input, Embedding, Conv1D, MaxPooling1D, Dropout, Flatten, Dense
from keras.models import Model

embedding_layer = Embedding(len(embed_matrix), len(embed_matrix.columns),
                            weights=[embed_matrix],
                            input_length=data.shape[1:], trainable=False)
sequence_input = Input(shape=(MAXLEN,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Dropout(0.2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(35)(x)  # global max pooling
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
output = Dense(target.shape[1], activation='softmax')(x)

model = Model(sequence_input, output)
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=2, batch_size=128)
The issue happens when I remove the embedding vectors and use completely random vectors, surprisingly achieving higher accuracy: 96.5%.
The code is the same, with one modification: weights=[random_matrix]. That's a matrix with the same shape as embed_matrix, but using random values. So this is the embedding layer now:
embedding_layer = Embedding(len(embed_matrix),
                            len(embed_matrix.columns), weights=[random_matrix],
                            input_length=data.shape[1:], trainable=False)
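(The original post does not show how random_matrix was built; one plausible construction, purely for illustration, would be:)
import numpy as np

# Hypothetical: random values in a small range, same shape as embed_matrix
random_matrix = np.random.uniform(-0.05, 0.05, size=embed_matrix.shape)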
I experimented many times with random weights and the result is always similar. Notice that even though those weights are random, the trainable parameter is still False, so the NN is not updating them.
After that, I fully removed the embedding layer and used word sequences as the input, expecting that those weights were not contributing to the model's accuracy. With that, I got nothing more than 16% accuracy.
So, what is going on? How could random embeddings achieve the same or better performance than pre-trained ones?
And why does using word indexes (normalized, of course) as inputs result in such poor accuracy?

Errors while fine tuning InceptionV3 in Keras

I am trying to fine-tune the InceptionV3 model on my own dataset. Unfortunately, when calling model.fit to train, I get the error below:
ValueError: Error when checking target: expected dense_6 to have shape (4,) but got array with shape (1,)
First, I load my own dataset as training_data, which contains pairs of images and their corresponding labels. Then I use the code below to convert them into arrays (img_new and label_new) so that they are compatible with Keras's expected inputs for data and labels.
for img, label in training_data:
    img_new[i, :, :, :] = img
    label_new[i, :] = label
    i = i + 1
Second, I fine-tune the InceptionV3 model as below.
import keras
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

InceptionV3_model = keras.applications.inception_v3.InceptionV3(include_top=False,
                                                                weights='imagenet',
                                                                input_tensor=None,
                                                                input_shape=None,
                                                                pooling=None,
                                                                classes=1000)
# InceptionV3_model.summary()

# add a global spatial average pooling layer
x = InceptionV3_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 4 classes
predictions = Dense(4, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=InceptionV3_model.input, outputs=predictions)

# Transfer learning: freeze the base model, train only the new top layers
for layer in model.layers[:311]:
    layer.trainable = False
for layer in model.layers[311:]:
    layer.trainable = True

from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.001, momentum=0.9), loss='categorical_crossentropy')
model.fit(x=X_train, y=y_train, batch_size=3, epochs=3, validation_split=0.2)
model.save_weights('first_try.h5')
Does anyone have an idea of what is going wrong when training with model.fit?
Thanks in advance for your kind help.
The error was caused because my labels are integers; I had to compile with sparse_categorical_crossentropy, which is meant for integer labels, instead of categorical_crossentropy, which is used for one-hot encoded labels.
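For reference, a minimal sketch of that change:
# Integer labels -> use the sparse variant of the loss
model.compile(optimizer=SGD(lr=0.001, momentum=0.9),
              loss='sparse_categorical_crossentropy')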
Many thanks to @Amir for the help. :-)

Accuracy goes to 0.0000 when training RNN with Keras?

I'm trying to use custom word-embeddings from Spacy for training a sequence -> label RNN query classifier. Here's my code:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

word_vector_length = 300
dictionary_size = v.num_tokens + 1
word_vectors = v.get_word_vector_dictionary()
embedding_weights = np.zeros((dictionary_size, word_vector_length))
max_length = 186

for word, index in dictionary._get_raw_id_to_token().items():
    if word in word_vectors:
        embedding_weights[index, :] = word_vectors[word]

model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
                    input_length=max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation='relu', return_sequences=False)))
model.add(Dense(v.num_labels, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=200, nb_epoch=20)
Here the word_vectors are taken from spacy.vectors and have length 300. The input is a NumPy array that looks like [0,0,12,15,0...] of length 186, where the integers are the token IDs in the input, and I've constructed the embedding weight matrix accordingly. The target for each training sample is a vector like [0,0,1,0,...0] of length 26, indicating the label that goes with that piece of vectorized text.
This looks like it should work, but during the first epoch the training accuracy is continually decreasing... and by the end of the first epoch/for the rest of training, it's exactly 0 and I'm not sure why this is happening. I've trained plenty of models with keras/TF before and never encountered this issue.
Any idea what might be happening here?
Are the labels always one-hot? Meaning only one of the elements of the label vector is one and the rest are zero.
If so, then maybe try using a softmax activation with a categorical crossentropy loss like in the following official example:
https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py#L202
This will help constrain the network to output a probability distribution in the last layer (i.e. the softmax outputs sum to 1).
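Applied to the model in the question, the suggested change would look roughly like this (assuming the labels really are one-hot):
# Softmax output with categorical crossentropy instead of sigmoid + binary crossentropy
model.add(Dense(v.num_labels, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])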

Training only one output of a network in Keras

I have a network in Keras with many outputs, however, my training data only provides information for a single output at a time.
At the moment my method for training has been to run a prediction on the input in question, change the value of the particular output that I am training, and then do a single batch update. If I'm right, this is the same as setting the loss to zero for all outputs except the one I'm trying to train.
Is there a better way? I've tried class weights, where I set a zero weight for all but the output I'm training, but it doesn't give me the results I expect.
I'm using the Theano backend.
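For context, the workaround described in the question might look roughly like this (a hypothetical sketch; the index k and the variable names are invented for illustration):
# Sketch of the asker's workaround: predict all outputs, replace only the
# k-th output's values with the true labels, then do one batch update.
current_outputs = model.predict(x_batch)       # one array per output head
current_outputs[k] = y_true_k                  # real labels only for head k
model.train_on_batch(x_batch, current_outputs)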
Outputting multiple results and optimizing only one of them
Let's say you want to return output from multiple layers, maybe from some intermediate layers, but you need to optimize only one target output. Here's how you can do it:
Let's start with this model:
inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
# you want to extract these values
useful_info = Dense(32, activation='relu', name='useful_info')(x)
# final output. used for loss calculation and optimization
result = Dense(1, activation='softmax', name='result')(useful_info)
Compile with multiple outputs, set loss as None for extra outputs:
Give None for outputs that you don't want to use for loss calculation and optimization
model = Model(inputs=inputs, outputs=[result, useful_info])
model.compile(optimizer='rmsprop',
              loss=['categorical_crossentropy', None],
              metrics=['accuracy'])
Provide only target outputs when training, skipping the extra outputs:
model.fit(my_inputs, {'result': train_labels}, epochs=.., batch_size=...)
# this also works:
# model.fit(my_inputs, [train_labels], epochs=.., batch_size=...)
One predict to get them all
Having one model, you can run predict only once to get all the outputs you need:
predicted_labels, useful_info = model.predict(new_x)
In order to achieve this I ended up using the Functional API. You basically create multiple models, sharing the same input and hidden layers but with different output layers.
For example:
https://keras.io/getting-started/functional-api-guide/
from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions_A = Dense(1, activation='softmax')(x)
predictions_B = Dense(1, activation='softmax')(x)

# Each model includes the Input layer and three Dense layers
modelA = Model(inputs=inputs, outputs=predictions_A)
modelA.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])

modelB = Model(inputs=inputs, outputs=predictions_B)
modelB.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])
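You would then fit each model on its own labels; since the input and hidden layers are shared, both fit calls update the common weights. A hypothetical usage (data names invented for illustration):
# Each head is trained separately; the shared layers receive updates from both
modelA.fit(x_train, y_train_A, epochs=10, batch_size=32)
modelB.fit(x_train, y_train_B, epochs=10, batch_size=32)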
