How to implement a multi-label text classifier in Keras? - keras

I've been trying to build a multi-label text classifier in Keras that uses GloVe embeddings. I'm currently experimenting with the TREC-6 data set available at http://cogcomp.org/Data/QA/QC/. I'm only considering the 6 broad labels for my classification problem and am ignoring the sub-labels.
Since it's a multi-label classification problem, given a sentence, my neural network should output every label whose probability is greater than 0.1. The issue is that the network almost always assigns only one label to a question, which is fine when the question belongs to a single category. When I combine questions from different categories, however, it still gives only one label with high confidence most of the time, although I want all relevant labels to be identified.
I'm absolutely sure the pre-processing steps are correct. I get the feeling that there's some issue in my model.
I started by experimenting with CNNs only, following the paper "Convolutional Neural Networks for Sentence Classification" at https://www.aclweb.org/anthology/D14-1181.pdf. On a teacher's advice, and after seeing that CNNs fail on long questions that cover several relevant topics, I moved on to LSTMs and BiLSTMs. I started with this approach https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/80568 and kept modifying parameters and adding or removing layers, hoping to get a good result, but I've failed so far.
I also tried copy-pasting some code for an attention mechanism and adding it after my LSTM layers, but it doesn't help.
My current model looks somewhat like this. I'll paste most of the rest of my code for clarity. The model and training code is in the sentence_classifier_cnn() function.
import json
import os
import re
import sys

import numpy as np


class SentenceClassifier:
    def __init__(self):
        self.MAX_SEQUENCE_LENGTH = 200
        self.EMBEDDING_DIM = 100
        self.LABEL_COUNT = 0
        self.WORD_INDEX = dict()
        self.LABEL_ENCODER = None

    def clean_str(self, string):
        """
        Cleans each string and converts it to lower case.
        """
        string = re.sub(r"\'s", "", string)
        string = re.sub(r"\'ve", "", string)
        string = re.sub(r"n\'t", " not", string)
        string = re.sub(r"\'re", "", string)
        string = re.sub(r"\'d", "", string)
        string = re.sub(r"\'ll", "", string)
        string = re.sub(r"[^A-Za-z0-9]", " ", string)
        string = re.sub(r"\s{2,}", " ", string)
        return string.strip().lower()

    def loader_encoder(self, table, type="json"):
        """
        Load and encode data from dataset.
        type = "sql" means get data from MySQL database.
        type = "json" means get data from .json file.
        """
        if type == "json":
            with open('data/' + table + '.json', 'r', encoding='utf8') as f:
                datastore = json.load(f)
            questions = []
            tags = []
            for row in datastore:
                questions.append(row['question'])
                tags.append(row['tags'].split(','))
        tokenizer = Tokenizer(lower=True, char_level=False)
        tokenizer.fit_on_texts(questions)
        self.WORD_INDEX = tokenizer.word_index
        questions_encoded = tokenizer.texts_to_sequences(questions)
        questions_encoded_padded = pad_sequences(questions_encoded, maxlen=self.MAX_SEQUENCE_LENGTH, padding='post')
        # Drop empty tags; build new lists instead of deleting while iterating
        tags = [[tag for tag in ele if len(tag) > 0] for ele in tags]
        encoder = MultiLabelBinarizer()
        tags_encoded = encoder.fit_transform(tags)
        self.LABEL_ENCODER = encoder
        self.LABEL_COUNT = len(tags_encoded[0])  # No. of labels
        print("\tUnique Tokens in Training Data: ", len(self.WORD_INDEX))
        return questions_encoded_padded, tags_encoded

    def load_embeddings(self, EMBED_PATH='./embeddings/glove.6B.100d.txt'):
        """
        Load pre-trained embeddings into memory.
        """
        embeddings_index = {}
        try:
            f = open(EMBED_PATH, encoding='utf-8')
        except FileNotFoundError:
            print("Embeddings missing.")
            sys.exit()
        for line in f:
            values = line.rstrip().rsplit(' ')
            word = values[0]
            vec = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = vec
        f.close()
        print("\tNumber of tokens in embeddings file: ", len(embeddings_index))
        return embeddings_index

    def create_embedding_matrix(self, embeddings_index):
        """
        Creates an embedding matrix for all the words (vocab) in the training data with shape (vocab, EMBEDDING_DIM).
        Out-of-vocab words will be randomly initialized to values between -0.25 and +0.25.
        """
        words_not_found = []
        vocab = len(self.WORD_INDEX) + 1
        embedding_matrix = np.random.uniform(-0.25, 0.25, size=(vocab, self.EMBEDDING_DIM))
        for word, i in self.WORD_INDEX.items():
            if i >= vocab:
                continue
            embedding_vector = embeddings_index.get(word)
            if (embedding_vector is not None) and len(embedding_vector) > 0:
                embedding_matrix[i] = embedding_vector
            else:
                words_not_found.append(word)
        # print('Number of null word embeddings: %d' % np.sum(np.sum(embedding_matrix, axis=1) == 0))
        print("\tShape of embedding matrix: ", str(embedding_matrix.shape))
        print("\tNo. of words not found in pre-trained embeddings: ", len(words_not_found))
        return embedding_matrix

    def sentence_classifier_cnn(self, embedding_matrix, x, y, table, load_saved=0):
        """
        A static CNN model.
        Makes use of the Keras functional API for constructing the model.
        If load_saved=1, THEN load old model, ELSE train new model
        model_name = table + ".model.h5"
        if load_saved == 1 and os.path.exists('./saved/' + model_name):
            print("\nLoading saved model...")
            model = load_model('./saved/' + model_name)
            print("Model Summary")
            print(model.summary())
        """
        print("\nTraining model...")
        inputs = Input(shape=(self.MAX_SEQUENCE_LENGTH,), dtype='int32')
        embedding = Embedding(input_dim=(len(self.WORD_INDEX) + 1), output_dim=self.EMBEDDING_DIM,
                              weights=[embedding_matrix],
                              input_length=self.MAX_SEQUENCE_LENGTH)(inputs)
        X = keras.layers.SpatialDropout1D(0.3)(embedding)
        X = keras.layers.Bidirectional(keras.layers.CuDNNLSTM(64, return_sequences = True))(X)
        #X2 = keras.layers.Bidirectional(keras.layers.CuDNNGRU(128, return_sequences = False))(X)
        X = keras.layers.Conv1D(32, kernel_size=2, padding='valid', kernel_initializer='normal')(X)
        X = keras.layers.GlobalMaxPooling1D()(X)
        #X = Attention(self.MAX_SEQUENCE_LENGTH)(X)
        X = Dropout(0.5)(X)
        X = keras.layers.Dense(16, activation="relu")(X)
        X = Dropout(0.5)(X)
        X = keras.layers.BatchNormalization()(X)
        output = Dense(units=self.LABEL_COUNT, activation='sigmoid')(X)
        model = Model(inputs=inputs, outputs=output, name='intent_classifier')
        print("Model Summary")
        print(model.summary())
        cbk = OutputObserver(model, classifier)
        model.compile(loss='binary_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
        model.fit(x, y,
                  batch_size=30,
                  epochs=23,
                  verbose=2,
                  callbacks = [cbk])
        #keras.utils.vis_utils.plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
        return model

    def tag_question(self, model, question):
        question = self.clean_str(question)
        question_encoded = [[self.WORD_INDEX[w] for w in question.split(' ') if w in self.WORD_INDEX]]
        question_encoded_padded = pad_sequences(question_encoded, maxlen=self.MAX_SEQUENCE_LENGTH, padding='post')
        predictions = model.predict(question_encoded_padded)
        possible_tags = []
        for i, probability in enumerate(predictions[0]):
            if probability >= 0.01:
                possible_tags.append([self.LABEL_ENCODER.classes_[i], probability])
        # sort in place by the probability in each sub-list, in descending order
        possible_tags.sort(reverse=True, key=lambda x: x[1])
        print(possible_tags)

    def setup_classifier(self, table):
        '''
        '''
        print("Loading Data Set...")
        x, y = self.loader_encoder(table)
        embeddings_index = self.load_embeddings()
        print("\nGenerating embedding matrix...")
        embedding_matrix = self.create_embedding_matrix(embeddings_index)
        # Loading / Training model
        model = self.sentence_classifier_cnn(embedding_matrix, x, y, table, load_saved=1)
        return model, embeddings_index

    def connect_to_db(self):
        mydb = mysql.connector.connect(host="localhost", user="root", passwd="root", database="questiondb")
        cursor = mydb.cursor()
        return mydb, cursor
As you can see, I've used a callback to print my predictions after each epoch.
I've tried to predict labels for all kinds of questions but I get good results only for questions that fall under one category.
For instance,
classifier.tag_question(model, "how many days before new year?")
gives
[['numeric', 0.99226487]]
as the output. But a more complex question like
classifier.tag_question(model, "who is the prophet of the muslim people and where is india located and how much do fruits costs there?")
gives something like
[['human', 0.9990531]]
as the output although labels like 'location' and 'numeric' are also relevant.
I used a callback to print the prediction for that question after every epoch, and I see something like this.
Epoch 1/23
- 19s - loss: 0.6581 - acc: 0.6365
[['human', 0.69752634], ['location', 0.40014982], ['entity', 0.32047516], ['abbreviation', 0.23877779], ['numeric', 0.23324837], ['description', 0.15995058]]
Epoch 2/23
- 12s - loss: 0.4525 - acc: 0.8264
[['human', 0.7437608], ['location', 0.18141672], ['entity', 0.14474556], ['numeric', 0.09171515], ['description', 0.053900182], ['abbreviation', 0.05283475]]
Epoch 3/23
- 12s - loss: 0.3854 - acc: 0.8478
[['human', 0.86335427], ['location', 0.12673976], ['entity', 0.09847507], ['numeric', 0.064431995], ['description', 0.035599917], ['abbreviation', 0.02441895]]
Epoch 4/23
- 12s - loss: 0.3634 - acc: 0.8509
[['human', 0.90795004], ['location', 0.10085008], ['entity', 0.09804481], ['numeric', 0.050411616], ['description', 0.032810867], ['abbreviation', 0.014970899]]
Epoch 5/23
- 13s - loss: 0.3356 - acc: 0.8582
[['human', 0.8365586], ['entity', 0.1130701], ['location', 0.10253032], ['numeric', 0.039931685], ['description', 0.02874279]]
Epoch 6/23
- 13s - loss: 0.3142 - acc: 0.8657
[['human', 0.95577633], ['entity', 0.088555306], ['location', 0.055004593], ['numeric', 0.015950901], ['description', 0.01428318]]
Epoch 7/23
- 13s - loss: 0.2942 - acc: 0.8750
[['human', 0.89538944], ['entity', 0.130977], ['location', 0.06350105], ['description', 0.023014158], ['numeric', 0.019377537]]
Epoch 8/23
- 13s - loss: 0.2739 - acc: 0.8802
[['human', 0.9725125], ['entity', 0.061141968], ['location', 0.026945814], ['description', 0.010931551]]
Epoch 9/23
- 13s - loss: 0.2579 - acc: 0.8914
[['human', 0.9797143], ['entity', 0.042518377], ['location', 0.027904237]]
Epoch 10/23
- 13s - loss: 0.2380 - acc: 0.9020
[['human', 0.7897601], ['entity', 0.14315197], ['location', 0.07439863], ['description', 0.019453615], ['numeric', 0.010681627]]
Epoch 11/23
- 13s - loss: 0.2250 - acc: 0.9104
[['human', 0.9886158], ['entity', 0.024878502], ['location', 0.015951043]]
Epoch 12/23
- 13s - loss: 0.2131 - acc: 0.9178
[['human', 0.9677731], ['entity', 0.03698206], ['location', 0.026153017]]
Epoch 13/23
- 13s - loss: 0.2029 - acc: 0.9204
[['human', 0.9514474], ['entity', 0.053581357], ['location', 0.029657435]]
Epoch 14/23
- 13s - loss: 0.1915 - acc: 0.9285
[['human', 0.9706739], ['entity', 0.0328649], ['location', 0.013876333]]
Epoch 15/23
- 13s - loss: 0.1856 - acc: 0.9300
[['human', 0.9328136], ['location', 0.05573874], ['entity', 0.025918543]]
Epoch 16/23
- 13s - loss: 0.1802 - acc: 0.9318
[['human', 0.9895527], ['entity', 0.014941782], ['location', 0.011972391]]
Epoch 17/23
- 13s - loss: 0.1717 - acc: 0.9373
[['human', 0.9426272], ['entity', 0.03754583], ['location', 0.023379702]]
Epoch 18/23
- 13s - loss: 0.1614 - acc: 0.9406
[['human', 0.99186605]]
Epoch 19/23
- 13s - loss: 0.1573 - acc: 0.9432
[['human', 0.9926062]]
Epoch 20/23
- 13s - loss: 0.1511 - acc: 0.9448
[['human', 0.9993554]]
Epoch 21/23
- 13s - loss: 0.1591 - acc: 0.9426
[['human', 0.9964465]]
Epoch 22/23
- 13s - loss: 0.1507 - acc: 0.9451
[['human', 0.999688]]
Epoch 23/23
- 13s - loss: 0.1524 - acc: 0.9436
[['human', 0.9990531]]
I've tried varying my parameters hundreds of times, especially my network size, batch size and epochs to try and avoid over-fitting.
I know my question is ridiculously long but I'm running out of patience and any help would be appreciated.
Here's the link to my colab notebook - https://colab.research.google.com/drive/1EOklUw7efOv69HvWKpuKVy1LSzcvTTCk.
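For reference, the thresholding described in the question (report every label whose sigmoid output exceeds a cutoff) can be sketched in plain Python. The function name `labels_above_threshold` and the sample scores are illustrative assumptions, not part of the original code; the label names are the ones that appear in the training log above.

```python
def labels_above_threshold(labels, probabilities, threshold=0.1):
    """Return (label, probability) pairs above the cutoff, highest first."""
    selected = [(label, p) for label, p in zip(labels, probabilities) if p > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-label sigmoid scores for one question.
labels = ["abbreviation", "description", "entity", "human", "location", "numeric"]
scores = [0.02, 0.03, 0.10, 0.91, 0.40, 0.23]
print(labels_above_threshold(labels, scores))
```

Because each sigmoid output is scored independently, several labels can pass the cutoff at once. Whether they do depends on whether the training data contains examples that carry more than one label.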

Related

Using custom pre-trained word embeddings

I have a fairly simple script that classifies intents from natural language queries and works pretty well. I want to add a word embedding layer to it from a custom pre-trained model with 200 dimensions. I'm following the Keras pretrained_word_embeddings tutorial, but with what I have so far the training is very slow and, even worse, the model doesn't learn: accuracy doesn't improve from epoch to epoch. I think I have not configured the layers correctly or the parameters are wrong. Could you help?
with open("tf-kr_esp.json") as f:
    rows = json.load(f)
for row in rows["utterances"]:
    w = nltk.word_tokenize(row["text"])
    words.extend(w)
    documents.append((w, row["intent"]))
    if row["intent"] not in classes:
        classes.append(row["intent"])
words = sorted(list(set(words)))
classes = sorted(list(set(classes)))
word_index = dict(zip(words, range(len(words))))
embeddings_index = {}
with open('embeddings.txt') as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs
num_tokens = len(words) + 2
embedding_dim = 200
hits = 0
misses = 0
# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # Words not found in embedding index will be all-zeros.
        # This includes the representation for "padding" and "OOV"
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        misses += 1
print("Converted %d words (%d misses)" % (hits, misses))
embedding_layer = Embedding(
num_tokens,
embedding_dim,
embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
trainable=False,
)
# create our training data
training = []
output_empty = [0] * len(classes)
for doc in documents:
    bag = []
    pattern_words = doc[0]
    for w in words:
        bag.append(1) if w in pattern_words else bag.append(0)
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])
random.shuffle(training)
training = np.array(training, dtype="object")
train_x = list(training[:,0])
train_y = list(training[:,1])
int_sequences_input = tf.keras.Input(shape=(None,), dtype="int64")
embedded_sequences = embedding_layer(int_sequences_input)
x = layers.Conv1D(128, 5, activation="relu")(embedded_sequences)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
preds = layers.Dense(69, activation="softmax")(x)
model = tf.keras.Model(int_sequences_input, preds)
model.summary()
#sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(np.array(train_x), np.array(train_y), epochs=20, batch_size=128, verbose=1)
Epoch 1/20
116/116 [==============================] - 279s 2s/step - loss: 4.2157 - accuracy: 0.0485
Epoch 2/20
116/116 [==============================] - 279s 2s/step - loss: 4.1861 - accuracy: 0.0550
Epoch 3/20
116/116 [==============================] - 281s 2s/step - loss: 4.1607 - accuracy: 0.0550
Epoch 4/20
116/116 [==============================] - 283s 2s/step - loss: 4.1387 - accuracy: 0.0550
Epoch 5/20
116/116 [==============================] - 286s 2s/step - loss: 4.1202 - accuracy: 0.0550
Epoch 6/20
116/116 [==============================] - 284s 2s/step - loss: 4.1047 - accuracy: 0.0550
Epoch 7/20
116/116 [==============================] - 286s 2s/step - loss: 4.0915 - accuracy: 0.0550
Epoch 8/20
116/116 [==============================] - 283s 2s/step - loss: 4.0806 - accuracy: 0.0550
Epoch 9/20
116/116 [==============================] - 280s 2s/step - loss: 4.0716 - accuracy: 0.0550
Epoch 10/20
116/116 [==============================] - 283s 2s/step - loss: 4.0643 - accuracy: 0.0550
How many classes do you have?
The embedding dimension of 200 is fine, but training with pretrained vectors can take a long time. To speed it up, you can lower the number of features in the convolutional layers. You can also use Adam as the optimizer instead of SGD, since SGD usually converges much more slowly than Adam.
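One thing worth checking in the question's code: `train_x` is built as a bag-of-words vector of 0s and 1s with one entry per vocabulary word, while a Keras `Embedding` layer looks up rows by integer token index. A minimal sketch of encoding tokens as padded index sequences follows; the helper names, the reserved ids (0 for padding, 1 for out-of-vocabulary words) and `max_len` are assumptions chosen for illustration.

```python
def build_word_index(tokenized_docs):
    """Map each distinct word to an id, reserving 0 (padding) and 1 (OOV)."""
    vocab = sorted({word for doc in tokenized_docs for word in doc})
    return {word: i + 2 for i, word in enumerate(vocab)}


def encode(tokens, word_index, max_len):
    """Turn tokens into a fixed-length list of ids, padded with 0 at the end."""
    ids = [word_index.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


# Toy tokenized utterances standing in for the nltk output.
docs = [["book", "a", "flight"], ["cancel", "my", "flight", "please"]]
word_index = build_word_index(docs)
train_x = [encode(doc, word_index, max_len=5) for doc in docs]
print(train_x)
```

With this encoding the embedding matrix needs `len(word_index) + 2` rows, which matches the `num_tokens = len(words) + 2` already used in the question, and each input row is as long as the longest sentence rather than as long as the vocabulary.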

"Your input ran out of data" always at step # of epochs

I have been trying to make my first data generator for a model.fit() with Keras. The dataset I'm trying to make has two inputs, an image and a float value. All of my image names and values are stored in a csv file. I believe I made my generator incorrectly because no matter what my batch size is, I always get the error "Your input ran out of data" at the step equal to my epochs. So if my epochs are set to 100 my model will run until it reaches step 100. My dataset is about 100000 images/values big. If anyone could help me find a solution that would be great.
I am currently using:
python 3.8
tf-gpu 2.4.0rc1
keras 2.4.3
pandas 1.1.4
Code:
IMG_SIZE = 400
Version = 1
batch_size = 64
val = .05
val_aug = ImageDataGenerator(rescale=1/255)
aug = ImageDataGenerator(
rescale=1/255,
rotation_range=30,
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.2,
zoom_range=0.2,
channel_shift_range=25,
horizontal_flip=True,
fill_mode='constant')
df = pd.read_csv('F:/DATA/Vote/Vote_Age.csv')
df = df.sample(frac = 1)
cut = int(len(df) * val)
train_df = df[cut:]
val_df = df[0:cut]
print(f'Training dataset: {len(train_df)}')
print(f'Val dataset: {len(val_df)}')
train_steps = int(len(train_df) / batch_size)
val_steps = int(len(val_df) / batch_size)
def data(df, generator, batch_size, IMG_SIZE):
    z = 0
    while True:
        df = df.sample(frac = 1)
        for i in range(int(len(df) / batch_size)):
            images, ages, votes = [], [], []
            for x in range(batch_size):
                csv_row = df.iloc[(z), :]
                z += 1
                image_path = f'F:/DATA/Vote/Images/{int(csv_row[0])}.jpg'
                image = cv2.resize(cv2.imread(image_path), (int(IMG_SIZE), int(IMG_SIZE)))
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                image = image.reshape(-1, IMG_SIZE, IMG_SIZE, 3)
                generator.fit(image)
                image = generator.flow(image, batch_size=1)
                image = image.next()
                image = image.reshape(IMG_SIZE, IMG_SIZE, 3)
                images.append(image)
                ages.append(csv_row[1])
                votes.append(int(csv_row[2]))
            images = np.array(images)
            ages = np.array(ages)
            votes = np.array(votes)
            return [[images, ages], [votes]]
#########
#Model was very big and unnecessary to include
#########
train_dataset = train_data(train_df, batch_size, IMG_SIZE)
val_dataset = val_data(val_df, batch_size, IMG_SIZE)
model.fit(
x = train_dataset[0],
y = train_dataset[1],
validation_data=(val_dataset[0], val_dataset[1]),
steps_per_epoch=train_steps,
validation_steps=val_steps,
callbacks=earlyStop,
epochs=100, batch_size=batch_size,
workers=multiprocessing.cpu_count(),
verbose=1)
model.save(f'F:/DATA/Vote/Models/YiffModel{Version}')
Output:
...
83/1484 [>.............................] - ETA: 4:46 - loss: 7578.7731
84/1484 [>.............................] - ETA: 4:46 - loss: 7575.5172
85/1484 [>.............................] - ETA: 4:46 - loss: 7572.2818
86/1484 [>.............................] - ETA: 4:46 - loss: 7569.0662
87/1484 [>.............................] - ETA: 4:46 - loss: 7565.8702
88/1484 [>.............................] - ETA: 4:45 - loss: 7562.6932
89/1484 [>.............................] - ETA: 4:45 - loss: 7559.5349
90/1484 [>.............................] - ETA: 4:45 - loss: 7556.3948
91/1484 [>.............................] - ETA: 4:45 - loss: 7553.2726
92/1484 [>.............................] - ETA: 4:44 - loss: 7550.1679
93/1484 [>.............................] - ETA: 4:44 - loss: 7547.0802
94/1484 [>.............................] - ETA: 4:44 - loss: 7544.0094
95/1484 [>.............................] - ETA: 4:44 - loss: 7540.9549
96/1484 [>.............................] - ETA: 4:43 - loss: 7537.9164
97/1484 [>.............................] - ETA: 4:43 - loss: 7534.8937
98/1484 [>.............................] - ETA: 4:43 - loss: 7531.8863
99/1484 [=>............................] - ETA: 4:43 - loss: 7528.8939
100/1484 [=>............................] - ETA: 4:43 - loss: 7525.9163
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 148400 batches). You may need to use the repeat() function when building your dataset.
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 78 batches). You may need to use the repeat() function when building your dataset.
1484/1484 [==============================] - 30s 15ms/step - loss: 7250.9988 - val_loss: 13595.9355
C:\Users\Tristan\anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
2021-01-05 15:37:04.425018: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
C:\Users\Tristan\anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1402: UserWarning: `layer.updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`layer.updates` will be removed in a future version. '
WARNING:tensorflow:FOR KERAS USERS: The object that you are saving contains one or more Keras models or layers. If you are loading the SavedModel with `tf.keras.models.load_model`, continue reading (otherwise, you may ignore the following instructions). Please change your code to save with `tf.keras.models.save_model` or `model.save`, and confirm that the file "keras.metadata" exists in the export directory. In the future, Keras will only load the SavedModels that have this file. In other words, `tf.saved_model.save` will no longer write SavedModels that can be recovered as Keras models (this will apply in TF 2.5).
FOR DEVS: If you are overwriting _tracking_metadata in your class, this property has been used to save metadata in the SavedModel. The metadta field will be deprecated soon, so please move the metadata to a different file.
libpng warning: iCCP: known incorrect sRGB profile
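The warning says the input must be able to supply `steps_per_epoch * epochs` batches. A function that uses `return` hands back a single batch and then ends, whereas a Python generator that uses `yield` inside `while True` can keep producing batches indefinitely. Below is a minimal sketch of that pattern on plain lists; `batch_generator` and the toy rows are hypothetical stand-ins for the CSV rows and image loading in the question.

```python
import random


def batch_generator(rows, batch_size, seed=0):
    """Yield full batches forever, reshuffling the rows at the start of each pass."""
    rng = random.Random(seed)
    rows = list(rows)
    steps = len(rows) // batch_size
    while True:
        rng.shuffle(rows)
        for step in range(steps):
            start = step * batch_size  # index restarts at 0 on every pass
            yield rows[start:start + batch_size]


rows = list(range(10))  # stand-in for (image path, age, vote) records
gen = batch_generator(rows, batch_size=4)
steps_per_epoch, epochs = 2, 3
batches = [next(gen) for _ in range(steps_per_epoch * epochs)]
print(len(batches))
```

In the question's code the row counter `z` is never reset, so even with `yield` it would run past the end of the DataFrame on the second pass; deriving the start index from the loop variable avoids that. A generator like this would be passed to `model.fit` directly, rather than indexed with `[0]` and `[1]`.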

Keras: high training and validation accuracy but bad predictions

I'm implementing a bidirectional LSTM in Keras. During training, both training accuracy and validation accuracy are about 0.83, and both losses are about 0.45.
Epoch 1/50
32000/32000 [==============================] - 597s 19ms/step - loss: 0.4611 - accuracy: 0.8285 - val_loss: 0.4515 - val_accuracy: 0.8316
Epoch 2/50
32000/32000 [==============================] - 589s 18ms/step - loss: 0.4563 - accuracy: 0.8299 - val_loss: 0.4514 - val_accuracy: 0.8320
Epoch 3/50
32000/32000 [==============================] - 584s 18ms/step - loss: 0.4561 - accuracy: 0.8299 - val_loss: 0.4513 - val_accuracy: 0.8318
Epoch 4/50
32000/32000 [==============================] - 612s 19ms/step - loss: 0.4560 - accuracy: 0.8300 - val_loss: 0.4513 - val_accuracy: 0.8319
Epoch 5/50
32000/32000 [==============================] - 572s 18ms/step - loss: 0.4559 - accuracy: 0.8299 - val_loss: 0.4512 - val_accuracy: 0.8318
This is my model:
model = tf.keras.Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(Bidirectional(LSTM(units=100, return_sequences=True), input_shape=(timesteps, features)))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))
I normalized my dataset through scikit-learn StandardScaler.
I have a custom loss:
def get_top_one_probability(vector):
    return (K.exp(vector) / K.sum(K.exp(vector)))

def listnet_loss(real_labels, predicted_labels):
    return -K.sum(get_top_one_probability(real_labels) * tf.math.log(get_top_one_probability(predicted_labels)))
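To make the loss concrete, here is a plain-Python sketch of the same computation for a single list of scores: a softmax turns each score vector into top-one probabilities, and the loss is the cross-entropy between the two distributions. The function names mirror the ones above but operate on Python lists; subtracting the maximum before exponentiating is an added numerical-stability step that is not in the question's code.

```python
import math


def top_one_probability(scores):
    """Softmax over a list of scores (shifted by the max for stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(real_scores, predicted_scores):
    """Cross-entropy between the top-one distributions of the two score lists."""
    p_real = top_one_probability(real_scores)
    p_pred = top_one_probability(predicted_scores)
    return -sum(r * math.log(p) for r, p in zip(p_real, p_pred))


real = [2.0, 1.0, 0.0]
print(listnet_loss(real, real))             # the minimum for this target
print(listnet_loss(real, [0.0, 1.0, 2.0]))  # larger: the ranking is reversed
```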
These are the model.compile and model.fit settings:
model.compile(loss=listnet_loss, optimizer=keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95), metrics=["accuracy"])
model.fit(training_dataset, training_dataset_labels, validation_split=0.2, batch_size=1,
epochs=number_of_epochs, workers=10, verbose=1,
callbacks=[SaveModelCallback(), keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)])
This is my test phase:
scaler = StandardScaler()
scaler.fit(test_dataset)
test_dataset = scaler.transform(test_dataset)
test_dataset = test_dataset.reshape((int(test_dataset.shape[0]/20), 20, test_dataset.shape[1]))
# Read model
json_model_file = open('/content/drive/My Drive/Tesi_magistrale/LSTM/models_padded_2/model_11.json', 'r')
loaded_model_json = json_model_file.read()
json_model_file.close()
model = model_from_json(loaded_model_json)
model.load_weights("/content/drive/My Drive/Tesi_magistrale/LSTM/models_weights_padded_2/model_11_weights.h5")
with open("/content/drive/My Drive/Tesi_magistrale/LSTM/predictions/padded/en_ewt-padded.H.pred", "w+") as predictions_file:
    predictions = model.predict(test_dataset)
I also rescaled the test set. After the line predictions = model.predict(test_dataset) I apply some business logic to process my predictions (the same logic is also used in the training phase).
I get very bad results on the test set, even though the results during training are good.
What am I doing wrong?
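One detail in the test phase above: a new `StandardScaler` is fitted on the test set, so the test features are standardized with different means and variances from the ones the model saw during training. The usual practice is to reuse the statistics computed on the training data. A small plain-Python sketch of the idea, with hypothetical helper names and toy numbers:

```python
import math


def fit_standardizer(values):
    """Return the mean and (population) standard deviation of the given values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def standardize(values, mean, std):
    """Apply previously fitted statistics to any data split."""
    return [(v - mean) / std for v in values]


train = [10.0, 12.0, 14.0]
test = [20.0, 22.0, 24.0]

mean, std = fit_standardizer(train)
shifted = standardize(test, mean, std)              # keeps the train/test difference
refit = standardize(test, *fit_standardizer(test))  # erases it by re-centering on the test set
print(shifted)
print(refit)
```

With scikit-learn this corresponds to keeping the scaler that was fitted on the training set and calling only `transform` on the test set.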
Somehow, the Keras image generator works well when combined with the fit() or fit_generator() function, but fails miserably when combined with predict_generator() or the predict() function.
When using the PlaidML Keras back-end for an AMD processor, I would rather loop through all test images one by one and get the prediction for each image in each iteration.
import os
from PIL import Image
import keras
import numpy

# code for creating and training the model is not included
print("Prediction result:")
dir = "/path/to/test/images"
files = os.listdir(dir)
correct = 0
total = 0
# dictionary to label all animal category classes
classes = {
    0: 'This is Cat',
    1: 'This is Dog',
}
for file_name in files:
    total += 1
    image = Image.open(dir + "/" + file_name).convert('RGB')
    image = image.resize((100, 100))
    image = numpy.expand_dims(image, axis=0)
    image = numpy.array(image)
    image = image / 255
    pred = model.predict_classes([image])[0]
    animals_category = classes[pred]
    # compare in lower case: the class labels are capitalized ("Cat", "Dog")
    if ("cat" in file_name) and ("cat" in animals_category.lower()):
        print(correct, ". ", file_name, animals_category)
        correct += 1
    elif ("dog" in file_name) and ("dog" in animals_category.lower()):
        print(correct, ". ", file_name, animals_category)
        correct += 1
print("accuracy: ", (correct / total))

NaNs with customised weighted F1-Score in Keras

I need to compute a weighted F1-score that penalizes errors on my least frequent label more heavily (a typical binary classification problem with an unbalanced dataset).
Unfortunately, I don't get a valid F1-score.
These are my metric functions:
def sensitivity(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())

def specificity(y_true, y_pred):
    true_negatives = K.sum(K.round(K.clip((1-y_true) * (1-y_pred), 0, 1)))
    possible_negatives = K.sum(K.round(K.clip(1-y_true, 0, 1)))
    return true_negatives / (possible_negatives + K.epsilon())

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall))

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(0.001),
              metrics=[sensitivity, specificity, 'accuracy', f1])
and here I train the model and evaluate it:
model.fit(x_train, y_train, epochs=12, batch_size=32, verbose=1, class_weight=class_weights_dict, validation_split=0.3)
classes = model.predict(x_test)
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128, verbose=1)
I always get nan as the F1 score. Is something wrong conceptually or programmatically? The data are the same ones I used with another classifier from the scikit-learn library (SVM), and that worked.
These are results:
Epoch 1/12
5133/5133 [==============================] - 5s 976us/step - loss: 0.6955 - sensitivity: 0.0561 - specificity: 0.9377 - acc: 0.8712 - f1: nan - val_loss: 0.6884 - val_sensitivity: 0.8836 - val_specificity: 0.0000e+00 - val_acc: 0.0723 - val_f1: nan
Epoch 2/12
5133/5133 [==============================] - 5s 894us/step - loss: 0.6954 - sensitivity: 0.3865 - specificity: 0.5548 - acc: 0.5398 - f1: nan - val_loss: 0.6884 - val_sensitivity: 0.0000e+00 - val_specificity: 1.0000 - val_acc: 0.9277 - val_f1: nan
Epoch 3/12
5133/5133 [==============================] - 5s 925us/step - loss: 0.6953 - sensitivity: 0.3928 - specificity: 0.5823 - acc: 0.5696 - f1: nan - val_loss: 0.6884 - val_sensitivity: 0.0000e+00 - val_specificity: 1.0000 - val_acc: 0.9277 - val_f1: nan
Epoch 4/12
5133/5133 [==============================] - 5s 935us/step - loss: 0.6954 - sensitivity: 0.1309 - specificity: 0.8504 - acc: 0.7976 - f1: nan - val_loss: 0.6884 - val_sensitivity: 0.0000e+00 - val_specificity: 1.0000 - val_acc: 0.9277 - val_f1: nan
etc.
Final result:
[0.6859536773606656, 0.0, 1.0, 0.9321705426356589, nan]
Regarding the nan in your f1 metric:
If you look at the log, your validation sensitivity is 0, which means your precision and recall are both zero as well. So in the f1 calculation you are dividing by zero and getting a nan.
Add K.epsilon() to the denominator, as you have done in the other functions.
On a side note, judging by your loss, which barely improved on the training set, your network has learnt nothing. I'd advise you to start by increasing the number of epochs, making the network deeper, and not passing anything to the class_weight argument (you mention not using weighted computation yet, but your code does set some class weights).
Also check whether any of the batches has an f1 score equal to nan.
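The division-by-zero case described above is easy to reproduce with plain numbers. The sketch below uses a hypothetical `safe_f1` helper that works on raw counts and a hand-picked `EPSILON` standing in for `K.epsilon()`; it only illustrates the arithmetic, not the Keras metric itself.

```python
EPSILON = 1e-7  # assumed small constant, standing in for K.epsilon()


def safe_f1(true_positives, predicted_positives, possible_positives):
    """F1 from raw counts, with EPSILON guarding every denominator."""
    precision = true_positives / (predicted_positives + EPSILON)
    recall = true_positives / (possible_positives + EPSILON)
    return 2 * precision * recall / (precision + recall + EPSILON)


print(safe_f1(0, 0, 25))    # no positive predictions: 0.0 rather than an undefined 0/0
print(safe_f1(20, 25, 40))  # close to 2 * 0.8 * 0.5 / 1.3
```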

Keras - Classifier not learning from Transfer-Values of a Pre-Trained Model

I'm currently trying to use a pre-trained network and test it on this dataset.
Originally, I used VGG19 and fine-tuned only the classifier at the end to fit my 120 classes. I then made all layers trainable, hoping that deeper training would improve performance. The problem is that the model is very slow (even when I let it run overnight, I only got a couple of epochs and reached an accuracy of around 45%; I have a GTX 1070 GPU).
Then my thinking was to freeze all layers of this model, since I have only 10k images, and train only the last few Dense layers, but it's still not really fast.
After watching this video (at around 2 min 30 s), I decided to replicate the principle of transfer values with InceptionResNetV2.
I processed every picture and saved the output in a NumPy matrix with the following code.
# Loading pre-trained Model + freeze layers
model = applications.inception_resnet_v2.InceptionResNetV2(
include_top=False,
weights='imagenet',
pooling='avg')
for layer in model.layers:
    layer.trainable = False
# Extraction of features and saving
a = True
for filename in glob.glob('train/resized/*.jpg'):
    name_img = os.path.basename(filename)[:-4]
    class_ = label[label["id"] == name_img]["breed"].values[0]
    input_img = np.expand_dims(np.array(Image.open(filename)), 0)
    pred = model.predict(input_img)
    if a:
        X = np.array(pred)
        y = np.array(class_)
        a = False
    else:
        X = np.vstack((X, np.array(pred)))
        y = np.vstack((y, class_))
np.savez_compressed('preprocessed.npz', X=X, y=y)
X is a matrix of shape (10222, 1536) and y is (10222, 1).
Afterwards, I designed my classifier (several topologies), and I have no idea why it is not able to learn anything.
# Just to One-Hot-Encode labels properly to (10222, 120)
label_binarizer = sklearn.preprocessing.LabelBinarizer()
y = label_binarizer.fit_transform(y)
model = Sequential()
model.add(Dense(512, input_dim=X.shape[1]))
# model.add(Dense(2048, activation="relu"))
# model.add(Dropout(0.5))
# model.add(Dense(256))
model.add(Dense(120, activation='softmax'))
model.compile(
loss = "categorical_crossentropy",
optimizer = "Nadam", # I tried several ones
metrics=["accuracy"]
)
model.fit(X, y, epochs=100, batch_size=64,
callbacks=[early_stop], verbose=1,
shuffle=True, validation_split=0.10)
Below you can find the output from the model :
Train on 9199 samples, validate on 1023 samples
Epoch 1/100
9199/9199 [==============================] - 2s 185us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 2/100
9199/9199 [==============================] - 1s 100us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 3/100
9199/9199 [==============================] - 1s 98us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 4/100
9199/9199 [==============================] - 1s 96us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 5/100
9199/9199 [==============================] - 1s 99us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 6/100
9199/9199 [==============================] - 1s 96us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
I tried changing topologies and activation functions and adding dropout, but nothing brings any improvement.
I have no idea what is wrong with my approach. Is the X matrix incorrect? Isn't it allowed to use the pre-trained model only as a feature extractor and then perform the classification with a second model?
Many thanks for your feedbacks,
Regards,
Nicolas
You'll need to call preprocess_input before feeding the image array to the model. It normalizes the values of input_img from [0, 255] into [-1, 1], which is the desired input range for InceptionResNetV2.
input_img = np.expand_dims(np.array(Image.open(filename)), 0)
input_img = applications.inception_resnet_v2.preprocess_input(input_img.astype('float32'))
pred = model.predict(input_img)
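The rescaling mentioned above maps pixel values from [0, 255] to [-1, 1], which amounts to dividing by 127.5 and subtracting 1. A plain-Python sketch of that arithmetic follows; the function name `scale_to_unit_range` is made up for illustration, and in practice `preprocess_input` should be used as shown.

```python
def scale_to_unit_range(pixels):
    """Map pixel intensities from [0, 255] to [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


print(scale_to_unit_range([0, 127.5, 255]))
```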
