Keras - Issues using pre-trained word embeddings

I'm following Keras tutorials on word embeddings and replicated the code (with a few modifications) from this particular one:
Using pre-trained word embeddings in a Keras model
It's a topic classification problem in which they load pre-trained word vectors and use them via a fixed (non-trainable) embedding layer.
When using the pre-trained embedding vectors I can, in fact, achieve their 95% accuracy. This is the code:
embedding_layer = Embedding(len(embed_matrix), len(embed_matrix.columns), weights=[embed_matrix],
input_length=data.shape[1:], trainable=False)
sequence_input = Input(shape=(MAXLEN,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Dropout(0.2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(35)(x) # global max pooling
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
output = Dense(target.shape[1], activation='softmax')(x)
model = Model(sequence_input, output)
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=2,
batch_size=128)
The issue appears when I replace the pre-trained embedding vectors with completely random vectors: surprisingly, accuracy goes up to 96.5%.
The code is the same, with one modification: weights=[random_matrix]. That's a matrix with the same shape as embed_matrix, but filled with random values. So this is the embedding layer now:
embedding_layer = Embedding(len(embed_matrix),
len(embed_matrix.columns), weights=[random_matrix],
input_length=data.shape[1:], trainable=False)
I experimented many times with random weights and the result is always similar. Notice that even though those weights are random, the trainable parameter is still False, so the NN is not updating them.
After that, I removed the embedding layer entirely and used the word sequences directly as the input, on the assumption that the embedding weights were not contributing to the model's accuracy. With that, I got no more than 16% accuracy.
So, what is going on? How could random embeddings achieve the same or better performance than pre-trained ones?
And why does using word indexes (normalized, of course) as inputs result in such poor accuracy?

Related

Why my neural network always predicts the same class?

I have the following neural network for binary classification. The problem is that it always predicts the same class (class 1, the positive class). I tried oversampling the negative class so that the positive class makes up about 43% of the data, but the model still outputs 1. Basically, it is not learning anything.
tf.reset_default_graph()
sess = tf.InteractiveSession()
input1 = Input(shape=(10,100)) #number of time steps and number of features
lstm1 = LSTM(units=10)(input1)
dense_1 = Dense(8, activation='relu')(lstm1)
dense_2 = Dense(4, activation='relu')(dense_1)
dense_3 = Dense(1, activation='softmax')(dense_2)
model = Model(inputs=[input1],outputs=[dense_3])
# compile the model
opt = Adam(lr=1e-06)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model.summary()
batch_size = 32
epochs = 100
callbacks = [ModelCheckpoint(filepath='best_Doc2Vec_LSTM.h5', monitor='val_loss', save_best_only=True)]
train_history = model.fit([Data_x_train],
[Data_y_train], batch_size=batch_size, epochs=epochs, validation_data=(Data_x_val, Data_y_val), callbacks = callbacks, verbose = 2)
One possible cause is your dataset: if the data you used for training is unbalanced, the test error will increase, so you need a balanced dataset. Balancing helps improve generalisation and avoid overfitting.
Another option is a grid search over the model's hyperparameters (batch size, epochs, activation functions, learning rate, optimizer and loss function) to find the best combination.
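One more thing that may be worth checking, separate from the points above: the output layer in the question is Dense(1, activation='softmax'). Softmax normalizes across the units of a layer, so with a single unit its output is 1.0 whatever the input. The sketch below shows this in plain Python; the softmax and sigmoid helpers are illustrative stand-ins written for this example, not Keras code. If this turns out to be the cause, a sigmoid output with binary_crossentropy is the usual pairing for a one-unit binary classifier.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit):
    """Logistic function, the usual activation for a single binary output."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical pre-activation values of a one-unit output layer.
single_unit_logits = [-3.0, 0.0, 4.2]

# Softmax over one unit: exp(z) / exp(z), so the result never depends on z.
softmax_outputs = [softmax([z])[0] for z in single_unit_logits]

# Sigmoid on the same values still varies with the input.
sigmoid_outputs = [sigmoid(z) for z in single_unit_logits]

print(softmax_outputs)
print(sigmoid_outputs)
```

With a constant output of 1.0 the gradient through the activation is zero, so training cannot move the predictions regardless of the learning rate.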

Array mismatch in Keras classifier model output layer

I am designing a classifier which takes a 10-value signal as input (acquired by processing the pixels of the MNIST dataset and normalized to 0-1) and outputs the class of the digit. The 10-value signal is unique for each digit, so classification should be possible.
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
# Array shapes:
# x_train: (60000, 10, 1, 1)
# y_train: (60000, 10)
# x_test: (10000, 10, 1, 1)
# y_test: (10000, 10)
The code is given as
input_img = Input(shape=(10,1,1))
x = Flatten()(input_img)
x = Dense(100, activation='relu')(x)
x = Dense(100, activation='relu')(x)
decoded = Dense(10, activation='softmax')(x)
autoencoder=Model(input_img,decoded)
autoencoder.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
history=autoencoder.fit(x_train, y_train,
epochs=30,
batch_size=32,
verbose=1,
shuffle=True,
validation_data=(x_test, y_test))
Please suggest what changes can be made.
I think you should probably use the tf.keras.losses.CategoricalCrossentropy loss function, because you encoded your targets as one-hot vectors using to_categorical. According to the docs:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation. If you want to provide labels as integers, please use SparseCategoricalCrossentropy loss.
However, IMO, without reproducible code it's really hard to give a specific answer.
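To make the quoted distinction concrete, here is a minimal plain-Python sketch of the two label formats. The helper functions only mirror the idea behind the two Keras losses and are not the library implementations; the probabilities and the label are made-up values for illustration.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy when the label is an integer class index."""
    return -math.log(probs[class_index])


def to_one_hot(class_index, num_classes):
    """Plain-Python analogue of one-hot encoding a single label."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


# Hypothetical softmax output for a 3-class problem and its true label.
probs = [0.1, 0.7, 0.2]
label = 1

loss_one_hot = categorical_crossentropy(to_one_hot(label, 3), probs)
loss_sparse = sparse_categorical_crossentropy(label, probs)

print(loss_one_hot, loss_sparse)
```

Both calls measure the same quantity; what has to match is the label format and the loss, so one-hot targets go with the categorical variant and integer targets with the sparse one.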

LSTM Autoencoder producing poor results in test data

I'm applying an LSTM autoencoder for anomaly detection. Since anomalies are very few compared to normal data, only normal instances are used for training. The test data consists of both anomalies and normal instances. During training, the model loss seems good. However, on the test data the model produces poor accuracy, i.e. anomalies and normal points are not well separated.
The snippet of my code is below:
.............
.............
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
.....................
......................
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features/2)
lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()
adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)
cp = ModelCheckpoint(filepath="lstm_classifier.h5",
save_best_only=True,
verbose=0)
tb = TensorBoard(log_dir='./logs',
histogram_freq=0,
write_graph=True,
write_images=True)
lstm_model_history = lstm_model.fit(X_train, X_train,
epochs=epochs,
batch_size=batch,
shuffle=False,
verbose=1,
validation_data=(X_valid, X_valid),
callbacks=[cp, tb]).history
.........................
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions), 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
'True_class': y_test})
# Confusion Matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class, pred_y)
plt.figure(figsize=(5, 5))
sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
Please suggest what can be done in the model to improve the accuracy.
If your model is not performing well on the test set, I would check a few things:
1. The training set is not contaminated with anomalies or with any information from the test set. If you use scaling, make sure you did not fit the scaler on the training and test sets combined.
2. Based on my experience, if an autoencoder cannot discriminate well enough on the test data but has low training loss, and your training set is pure, it means the autoencoder learned the specific details of the training set but not the general pattern.
3. Your threshold value might be off, and you may need to come up with a better thresholding procedure. One example can be found here: https://dl.acm.org/citation.cfm?doid=3219819.3219845
If the problem is the 2nd one, the solution is to increase generalization. With autoencoders, one of the most effective generalization tools is the dimension of the bottleneck. Again, based on my experience with anomaly detection in flight radar data, lowering the bottleneck dimension significantly increased my multi-class classification accuracy. I was using 14 features with an encoding_dim of 7, but an encoding_dim of 4 gave even better results. The value of the training loss was not important in my case because I was only comparing reconstruction errors, but since you are classifying with a threshold on the reconstruction error (RE), a more robust thresholding procedure may improve accuracy, as in the paper I've shared.
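As a rough illustration of the thresholding point, here is one simple rule: take a high percentile of the reconstruction errors measured on held-out normal data. This is a generic percentile rule, not the method from the linked paper, and the error values, the 95th percentile and the helper names are all assumptions made for the sketch.

```python
def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def classify(errors, threshold):
    """Label a sample 1 (anomaly) when its reconstruction error exceeds the threshold."""
    return [1 if e > threshold else 0 for e in errors]


# Made-up reconstruction errors on a validation set of normal samples only.
valid_errors = [0.010, 0.012, 0.011, 0.015, 0.013,
                0.014, 0.016, 0.012, 0.011, 0.020]

# Made-up reconstruction errors on test samples (normal and anomalous mixed).
test_errors = [0.012, 0.090, 0.014, 0.300]

# Threshold derived from normal data only, so no test labels leak into it.
threshold = percentile(valid_errors, 95)
pred_y = classify(test_errors, threshold)

print(threshold, pred_y)
```

The percentile controls the expected false-positive rate on normal data, so it is a knob to tune on a validation set rather than a fixed constant.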

Emotion detection on text

I am a newbie in ML and have been experimenting with emotion detection on text.
I have the ISEAR dataset, which contains tweets labeled with their emotion.
My current accuracy is 63%, and I want to increase it to at least 70%, or more if possible.
Here's the code:
inputs = Input(shape=(MAX_LENGTH, ))
embedding_layer = Embedding(vocab_size,
64,
input_length=MAX_LENGTH)(inputs)
# x = Flatten()(embedding_layer)
x = LSTM(32, input_shape=(32, 32))(embedding_layer)
x = Dense(10, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['acc'])
model.summary()
filepath="weights-simple.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.1,
shuffle=True, epochs=10, callbacks=[checkpointer])
That's a pretty general question; optimizing the performance of a neural network may require tuning many factors.
For instance:
The optimizer chosen: in NLP tasks rmsprop is also a popular optimizer
Tweaking the learning rate
Regularization - e.g. dropout, recurrent_dropout, batch norm. This may help the model generalize better
More units in the LSTM
More dimensions in the embedding
You can try grid search, e.g. trying different optimizers and evaluating each on a validation set.
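A minimal sketch of what such a grid search loop could look like. The parameter names and the fake_validation_accuracy function are hypothetical stand-ins with made-up numbers; in practice that function would build the model with the given settings, fit it, and return the accuracy measured on the validation set.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fake_validation_accuracy(params):
    """Hypothetical scorer with invented numbers; replace with real training."""
    score = 0.60
    score += 0.03 if params["optimizer"] == "rmsprop" else 0.0
    score += 0.02 if params["lstm_units"] == 64 else 0.0
    score -= 0.01 if params["dropout"] == 0.5 else 0.0
    return score


# Assumed search space; the values are only examples.
param_grid = {
    "optimizer": ["adam", "rmsprop"],
    "lstm_units": [32, 64],
    "dropout": [0.2, 0.5],
}

best_params, best_score = grid_search(param_grid, fake_validation_accuracy)
print(best_params, best_score)
```

The number of runs grows multiplicatively with each added parameter, so it usually pays to keep the grid small or to search a few parameters at a time.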
The data may also need some tweaking, such as:
Text normalization - a better representation of the tweets - remove unnecessary tokens (for example @ and #)
Shuffle the data before the fit - Keras validation_split creates the validation set from the last data records
There is no simple answer to your question.
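The shuffling point can be sketched in plain Python. The split_tail helper below only imitates the "take the last fraction" behaviour described above and is not Keras code; the sample names and labels are invented, and they are deliberately sorted by label to show what goes wrong without a shuffle.

```python
import random


def shuffle_in_unison(samples, labels, seed=0):
    """Shuffle samples and labels with the same permutation so pairs stay aligned."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    return [samples[i] for i in order], [labels[i] for i in order]


def split_tail(samples, labels, validation_split):
    """Hold out the last fraction of the data as the validation set."""
    cut = int(len(samples) * (1.0 - validation_split))
    return (samples[:cut], labels[:cut]), (samples[cut:], labels[cut:])


# Invented data, sorted by label: five samples of class 0, then five of class 1.
samples = ["tweet_%d" % i for i in range(10)]
labels = [0] * 5 + [1] * 5

# Without shuffling, the held-out tail contains a single class.
_, (_, val_labels_sorted) = split_tail(samples, labels, 0.2)

# Shuffling first gives the tail a chance to contain a mix of classes.
shuffled_samples, shuffled_labels = shuffle_in_unison(samples, labels, seed=42)
_, (_, val_labels_shuffled) = split_tail(shuffled_samples, shuffled_labels, 0.2)

print(val_labels_sorted, val_labels_shuffled)
```

Note that shuffle=True in fit only reorders the training batches each epoch; it does not change which records end up in the validation split.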

Accuracy goes to 0.0000 when training RNN with Keras?

I'm trying to use custom word-embeddings from Spacy for training a sequence -> label RNN query classifier. Here's my code:
word_vector_length = 300
dictionary_size = v.num_tokens + 1
word_vectors = v.get_word_vector_dictionary()
embedding_weights = np.zeros((dictionary_size, word_vector_length))
max_length = 186
for word, index in dictionary._get_raw_id_to_token().items():
    if word in word_vectors:
        embedding_weights[index,:] = word_vectors[word]
model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
input_length= max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation= 'relu', return_sequences=False)))
model.add(Dense(v.num_labels, activation= 'sigmoid'))
model.compile(loss = 'binary_crossentropy',
optimizer = 'adam',
metrics = ['accuracy'])
model.fit(X_train, Y_train, batch_size=200, nb_epoch=20)
Here, the word_vectors are taken from spacy.vectors and have length 300. The input is a NumPy array which looks like [0,0,12,15,0...] of length 186, where the integers are the token ids in the input, and I've constructed the embedding weight matrix accordingly. The target is a vector like [0,0,1,0,...0] of length 26 for each training sample, indicating the label that should go with this piece of vectorized text.
This looks like it should work, but during the first epoch the training accuracy is continually decreasing... and by the end of the first epoch/for the rest of training, it's exactly 0 and I'm not sure why this is happening. I've trained plenty of models with keras/TF before and never encountered this issue.
Any idea what might be happening here?
Are the labels always one-hot, meaning only one element of the label vector is one and the rest are zero?
If so, then maybe try using a softmax activation with a categorical crossentropy loss, as in the following official example:
https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py#L202
This will help constrain the network to output a probability distribution on the last layer (i.e. the softmax layer outputs sum to 1).
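A small plain-Python sketch of the difference, using three classes instead of 26 for brevity. The logits and the one-hot label are made-up values, and the helpers are illustrative rather than the Keras implementations.

```python
import math


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit):
    """Logistic function applied to one output independently of the others."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical pre-activation outputs of the last layer.
logits = [2.0, 1.0, 0.1]

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(z) for z in logits]

# One-hot target: the first class is the correct one.
one_hot = [1, 0, 0]
loss = -sum(t * math.log(p) for t, p in zip(one_hot, softmax_probs))

print(sum(softmax_probs), sum(sigmoid_probs), loss)
```

With independent sigmoids every class gets its own yes/no probability, which suits multi-label targets; for exactly one correct class per sample, the softmax outputs compete with each other and form a single distribution.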
