Using custom pre-trained word embeddings - python-3.x

I have a fairly simple script to classify intents from natural language queries working pretty well, to which I want to add a word embedding layer from a pre-trained custom model of 200 dims. I'm trying to help myself with this tutorial Keras pretrained_word_embeddings But with what I have achieved so far, the training is very very slow! and even worse the model doesn't learn, accuracy doesn't improve with each epoch, something impossible to handle. I think I have not configured the layers correctly or the parameters are not correct. Could you help with this??
with open("tf-kr_esp.json") as f:
rows = json.load(f)
for row in rows["utterances"]:
w = nltk.word_tokenize(row["text"])
words.extend(w)
documents.append((w, row["intent"]))
if row["intent"] not in classes:
classes.append(row["intent"])
words = sorted(list(set(words)))
classes = sorted(list(set(classes)))
word_index = dict(zip(words, range(len(words))))
embeddings_index = {}
with open('embeddings.txt') as f:
for line in f:
word, coefs = line.split(maxsplit=1)
coefs = np.fromstring(coefs, "f", sep=" ")
embeddings_index[word] = coefs
num_tokens = len(words) + 2
embedding_dim = 200
hits = 0
misses = 0
# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
embedding_vector = embeddings_index.get(word)
if embedding_vector is not None:
# Words not found in embedding index will be all-zeros.
# This includes the representation for "padding" and "OOV"
embedding_matrix[i] = embedding_vector
hits += 1
else:
misses += 1
print("Converted %d words (%d misses)" % (hits, misses))
embedding_layer = Embedding(
num_tokens,
embedding_dim,
embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
trainable=False,
)
# create our training data
training = []
output_empty = [0] * len(classes)
for doc in documents:
bag = []
pattern_words = doc[0]
for w in words:
bag.append(1) if w in pattern_words else bag.append(0)
output_row = list(output_empty)
output_row[classes.index(doc[1])] = 1
training.append([bag, output_row])
random.shuffle(training)
training = np.array(training, dtype="object")
train_x = list(training[:,0])
train_y = list(training[:,1])
int_sequences_input = tf.keras.Input(shape=(None,), dtype="int64")
embedded_sequences = embedding_layer(int_sequences_input)
x = layers.Conv1D(128, 5, activation="relu")(embedded_sequences)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
preds = layers.Dense(69, activation="softmax")(x)
model = tf.keras.Model(int_sequences_input, preds)
model.summary()
#sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(np.array(train_x), np.array(train_y), epochs=20, batch_size=128, verbose=1)
Epoch 1/20
116/116 [==============================] - 279s 2s/step - loss: 4.2157 - accuracy: 0.0485
Epoch 2/20
116/116 [==============================] - 279s 2s/step - loss: 4.1861 - accuracy: 0.0550
Epoch 3/20
116/116 [==============================] - 281s 2s/step - loss: 4.1607 - accuracy: 0.0550
Epoch 4/20
116/116 [==============================] - 283s 2s/step - loss: 4.1387 - accuracy: 0.0550
Epoch 5/20
116/116 [==============================] - 286s 2s/step - loss: 4.1202 - accuracy: 0.0550
Epoch 6/20
116/116 [==============================] - 284s 2s/step - loss: 4.1047 - accuracy: 0.0550
Epoch 7/20
116/116 [==============================] - 286s 2s/step - loss: 4.0915 - accuracy: 0.0550
Epoch 8/20
116/116 [==============================] - 283s 2s/step - loss: 4.0806 - accuracy: 0.0550
Epoch 9/20
116/116 [==============================] - 280s 2s/step - loss: 4.0716 - accuracy: 0.0550
Epoch 10/20
116/116 [==============================] - 283s 2s/step - loss: 4.0643 - accuracy: 0.0550

Can you mention how many number of class you have got?
and also the embedding dimension is 200 that is okay but it is reality that pretrained vectors takes long time to train on the new embeddings. To make it more fast you can lower your input features in Convolutional layers. also you can use Adam as an optimizer instead of SGD. As SGD is much slower than Adam.

Related

Record time steps in trainig sequential tensorflow and keras model

I'm training a sequential model with tensorflow and keras.
# create the Sequential model
model = Sequential()
# add layers
model.add(Dense(1, activation='relu', input_shape=(2,)))
model.add(Dense(1, activation='sigmoid'))
# configure the network
model.compile(optimizer=SGD(lr=0.1), loss='mean_squared_error', metrics=['accuracy'])
# train the network
history = model.fit(X_train, y_train, validation_split=0.2, epochs=500, verbose=1)
Where the verbose is:
Epoch 1/500
2/2 [==============================] - 0s 159ms/step - loss: 0.2474 - accuracy: 0.4499 - val_loss: 0.2482 - val_accuracy: 0.5714
Epoch 2/500
2/2 [==============================] - 0s 40ms/step - loss: 0.2469 - accuracy: 0.4980 - val_loss: 0.2478 - val_accuracy: 0.7857
Epoch 3/500
2/2 [==============================] - 0s 36ms/step - loss: 0.2459 - accuracy: 0.6568 - val_loss: 0.2474 - val_accuracy: 0.7143
Epoch 4/500
2/2 [==============================] - 0s 44ms/step - loss: 0.2459 - accuracy: 0.6820 - val_loss: 0.2470 - val_accuracy: 0.7143
Epoch 5/500
2/2 [==============================] - 0s 49ms/step - loss: 0.2453 - accuracy: 0.7636 - val_loss: 0.2468 - val_accuracy: 0.7143
...
I can retrieve "loss/accuracy" history:
# Returns [0.24733777344226837, 0.24695482850074768, 0.24644054472446442, ... ]
history.history['loss']
# Returns [0.4716981053352356, 0.5283018946647644, 0.6415094137191772, ... ]
history.history['accuracy']
Is there a similar way to retrieve times used in each step?
Something like:
# Should return
[159, 40, 36, 44 ... ]
Thanks!
They are not part of the callback but rather via progress bar which is not exposed outside. However, you can write a custom callback and calculate these values. Note that the steps shown is the number of batches while training.
Sample Code:
class MyCallback(keras.callbacks.Callback):
def on_train_begin(self, logs={}):
self.times = []
def on_epoch_begin(self, epoch, logs=None):
self.current_batch_times = []
def on_train_batch_begin(self, batch, logs=None):
self.start = time.time()
def on_train_batch_end(self, batch, logs=None):
self.current_batch_times.append(time.time() - self.start)
def on_epoch_end(self, epoch, logs=None):
self.times.append(np.mean(self.current_batch_times))
my_callback = MyCallback()
history = model.fit(np.random.randn(32*100,2), np.random.randn(32*100,1),
validation_split=0.2, epochs=5, verbose=1, batch_size=32,
callbacks=[my_callback])
print ([np.round(t*1000) for t in my_callback.times])
Output:
[962.0, 938.0, 918.0, 932.0, 912.0]

How to extract cell state of LSTM model through model.fit()?

My LSTM model is like this, and I would like to get state_c
def _get_model(input_shape, latent_dim, num_classes):
inputs = Input(shape=input_shape)
lstm_lyr,state_h,state_c = LSTM(latent_dim,dropout=0.1,return_state = True)(inputs)
fc_lyr = Dense(num_classes)(lstm_lyr)
soft_lyr = Activation('relu')(fc_lyr)
model = Model(inputs, [soft_lyr,state_c])
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
return model
model =_get_model((n_steps_in, n_features),latent_dim ,n_steps_out)
history = model.fit(X_train,Y_train)
But I canot extract the state_c from the history. How to return that?
I am unsure of what you mean by "How to get state_c", because your LSTM layer is already returning the state_c with the flag return_state=True. I assume you are trying to train the multi-output model in this case. Currently, you only have a single output but your model is compiled with multiple outputs.
Here is how you work with multi-output models.
from tensorflow.keras import layers, Model, utils
def _get_model(input_shape, latent_dim, num_classes):
inputs = layers.Input(shape=input_shape)
lstm_lyr,state_h,state_c = layers.LSTM(latent_dim,dropout=0.1,return_state = True)(inputs)
fc_lyr = layers.Dense(num_classes)(lstm_lyr)
soft_lyr = layers.Activation('relu')(fc_lyr)
model = Model(inputs, [soft_lyr,state_c]) #<------- One input, 2 outputs
model.compile(optimizer='adam', loss='mse')
return model
#Dummy data
X = np.random.random((100,15,5))
y1 = np.random.random((100,4))
y2 = np.random.random((100,7))
model =_get_model((15, 5), 7 , 4)
model.fit(X, [y1,y2], epochs=4) #<--------- #One input, 2 outputs
Epoch 1/4
4/4 [==============================] - 2s 6ms/step - loss: 0.6978 - activation_9_loss: 0.2388 - lstm_9_loss: 0.4591
Epoch 2/4
4/4 [==============================] - 0s 6ms/step - loss: 0.6615 - activation_9_loss: 0.2367 - lstm_9_loss: 0.4248
Epoch 3/4
4/4 [==============================] - 0s 7ms/step - loss: 0.6349 - activation_9_loss: 0.2392 - lstm_9_loss: 0.3957
Epoch 4/4
4/4 [==============================] - 0s 8ms/step - loss: 0.6053 - activation_9_loss: 0.2392 - lstm_9_loss: 0.3661

Keras: high training and validation accuracy but bad predictions

I'm implementing a Bidirectional LSTM in Keras. During the training, either training accuracy and validation accuracy are 0.83 and also losses are 0.45.
Epoch 1/50
32000/32000 [==============================] - 597s 19ms/step - loss: 0.4611 - accuracy: 0.8285 - val_loss: 0.4515 - val_accuracy: 0.8316
Epoch 2/50
32000/32000 [==============================] - 589s 18ms/step - loss: 0.4563 - accuracy: 0.8299 - val_loss: 0.4514 - val_accuracy: 0.8320
Epoch 3/50
32000/32000 [==============================] - 584s 18ms/step - loss: 0.4561 - accuracy: 0.8299 - val_loss: 0.4513 - val_accuracy: 0.8318
Epoch 4/50
32000/32000 [==============================] - 612s 19ms/step - loss: 0.4560 - accuracy: 0.8300 - val_loss: 0.4513 - val_accuracy: 0.8319
Epoch 5/50
32000/32000 [==============================] - 572s 18ms/step - loss: 0.4559 - accuracy: 0.8299 - val_loss: 0.4512 - val_accuracy: 0.8318
This is my model:
model = tf.keras.Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(Bidirectional(LSTM(units=100, return_sequences=True), input_shape=(timesteps, features)))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))
I normalized my dataset through scikit-learn StandardScaler.
I have a custom loss:
def get_top_one_probability(vector):
return (K.exp(vector) / K.sum(K.exp(vector)))
def listnet_loss(real_labels, predicted_labels):
return -K.sum(get_top_one_probability(real_labels) * tf.math.log(get_top_one_probability(predicted_labels)))
These are the model.compile and model.fit settings:
model.compile(loss=listnet_loss, optimizer=keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95), metrics=["accuracy"])
model.fit(training_dataset, training_dataset_labels, validation_split=0.2, batch_size=1,
epochs=number_of_epochs, workers=10, verbose=1,
callbacks=[SaveModelCallback(), keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)])
This is my test phase:
scaler = StandardScaler()
scaler.fit(test_dataset)
test_dataset = scaler.transform(test_dataset)
test_dataset = test_dataset.reshape((int(test_dataset.shape[0]/20), 20, test_dataset.shape[1]))
# Read model
json_model_file = open('/content/drive/My Drive/Tesi_magistrale/LSTM/models_padded_2/model_11.json', 'r')
loaded_model_json = json_model_file.read()
json_model_file.close()
model = model_from_json(loaded_model_json)
model.load_weights("/content/drive/My Drive/Tesi_magistrale/LSTM/models_weights_padded_2/model_11_weights.h5")
with open("/content/drive/My Drive/Tesi_magistrale/LSTM/predictions/padded/en_ewt-padded.H.pred", "w+") as predictions_file:
predictions = model.predict(test_dataset)
I rescaled also the test set. After line predictions = model.predict(test_dataset) I put some business logic to process my predictions (this logic is also used in the training phase).
I get very bad results on test set, also if the results in training are good.
What I do in a wrong way?
Somehow, the image generator of Keras works well when combined with fit() or fit_generator() function, but fails miserably when combined
with predict_generator() or the predict() function.
When using Plaid-ML Keras back-end for AMD processor, I would rather loop through all test images one-by-one and get the prediction for each image in each iteration.
import os
from PIL import Image
import keras
import numpy
# code for creating dan training model is not included
print("Prediction result:")
dir = "/path/to/test/images"
files = os.listdir(dir)
correct = 0
total = 0
#dictionary to label all animal category class.
classes = {
0:'This is Cat',
1:'This is Dog',
}
for file_name in files:
total += 1
image = Image.open(dir + "/" + file_name).convert('RGB')
image = image.resize((100,100))
image = numpy.expand_dims(image, axis=0)
image = numpy.array(image)
image = image/255
pred = model.predict_classes([image])[0]
animals_category = classes[pred]
if ("cat" in file_name) and ("cat" in sign):
print(correct,". ", file_name, animals_category)
correct+=1
elif ("dog" in file_name) and ("dog" in animals_category):
print(correct,". ", file_name, animals_category)
correct+=1
print("accuracy: ", (correct/total))

How to implement a multi-label text classifier in Keras?

I've been trying to create a multi-label text classifier that uses GloVe embeddings using Keras. I'm currently experimenting with the TREC-6 data set available at http://cogcomp.org/Data/QA/QC/. I'm only considering the 5 broad labels for my classification problem and am ignoring the sub-labels.
Since it's a multi-label classification problem, given a sentence, my neural network should output all labels with a probability greater than 0.1. Issue is that the network almost always classifies a question with only 1 label, which is fine when I'm asking a question that belongs to only one category. When I combine questions from different categories however, it still gives only one label with a high confidence most of the time, although I want all relevant labels to be identified.
I'm absolutely sure the pre-processing steps are correct. I get the feeling that there's some issue in my model.
I started experimenting with only CNNs in the beginning by referring to the paper "Convolutional Neural Networks for Sentence Classification" at https://www.aclweb.org/anthology/D14-1181.pdf, but on a teacher's advice and after seeing that they fail for long questions with different relevant topics, I tried experimenting with LSTMs and BiLSTMS. I started with this approach https://www.kaggle.com/c/quora-insincere-questions-classification/discussion/80568 and kept modifying some parameters / adding and removing layers hoping to get a good result but I've failed so far.
I tried copy pasting some code for Attention mechanism and adding that after my LSTM layers as well but it doesn't help.
My current model looks somewhat like this. I'll paste most of the rest of my code for clarity. The model and training code is present in the sentence_classifier() function.
class SentenceClassifier:
def __init__(self):
self.MAX_SEQUENCE_LENGTH = 200
self.EMBEDDING_DIM = 100
self.LABEL_COUNT = 0
self.WORD_INDEX = dict()
self.LABEL_ENCODER = None
def clean_str(self, string):
"""
Cleans each string and convert to lower case.
"""
string = re.sub(r"\'s", "", string)
string = re.sub(r"\'ve", "", string)
string = re.sub(r"n\'t", " not", string)
string = re.sub(r"\'re", "", string)
string = re.sub(r"\'d", "", string)
string = re.sub(r"\'ll", "", string)
string = re.sub(r"[^A-Za-z0-9]", " ", string)
string = re.sub(r"\s{2,}", " ", string)
return string.strip().lower()
def loader_encoder(self, table, type="json"):
"""
Load and encode data from dataset.
type = "sql" means get data from MySQL database.
type = "json" means get data from .json file. """
if type == "json":
with open('data/' + table + '.json', 'r', encoding='utf8') as f:
datastore = json.load(f)
questions = []
tags = []
for row in datastore:
questions.append(row['question'])
tags.append(row['tags'].split(','))
tokenizer = Tokenizer(lower=True, char_level=False)
tokenizer.fit_on_texts(questions)
self.WORD_INDEX = tokenizer.word_index
questions_encoded = tokenizer.texts_to_sequences(questions)
questions_encoded_padded = pad_sequences(questions_encoded, maxlen=self.MAX_SEQUENCE_LENGTH, padding='post')
for i, ele in enumerate(tags):
for j, tag in enumerate(ele):
if len(tag) == 0 or tag == ',':
del tags[i][j]
encoder = MultiLabelBinarizer()
encoder.fit(tags)
self.LABEL_ENCODER = encoder
tags_encoded = encoder.fit_transform(tags)
self.LABEL_COUNT = len(tags_encoded[0]) #No. of labels
print("\tUnique Tokens in Training Data: ", len(self.WORD_INDEX))
return questions_encoded_padded, tags_encoded
def load_embeddings(self, EMBED_PATH='./embeddings/glove.6B.100d.txt'):
"""
Load pre-trained embeddings into memory.
"""
embeddings_index = {}
try:
f = open(EMBED_PATH, encoding='utf-8')
except FileNotFoundError:
print("Embeddings missing.")
sys.exit()
for line in f:
values = line.rstrip().rsplit(' ')
word = values[0]
vec = np.asarray(values[1:], dtype='float32')
embeddings_index[word] = vec
f.close()
print("\tNumber of tokens in embeddings file: ", len(embeddings_index))
return embeddings_index
def create_embedding_matrix(self, embeddings_index):
"""
Creates an embedding matrix for all the words(vocab) in the training data with shape (vocab, EMBEDDING_DIM).
Out-of-vocab words will be randomly initialized to values between +0.25 and -0.25.
"""
words_not_found = []
vocab = len(self.WORD_INDEX) + 1
embedding_matrix = np.random.uniform(-0.25, 0.25, size=(vocab, self.EMBEDDING_DIM))
for word, i in self.WORD_INDEX.items():
if i >= vocab:
continue
embedding_vector = embeddings_index.get(word)
if (embedding_vector is not None) and len(embedding_vector) > 0:
embedding_matrix[i] = embedding_vector
else:
words_not_found.append(word)
# print('Number of null word embeddings: %d' % np.sum(np.sum(embedding_matrix, axis=1) == 0))
print("\tShape of embedding matrix: ", str(embedding_matrix.shape))
print("\tNo. of words not found in pre-trained embeddings: ", len(words_not_found))
return embedding_matrix
def sentence_classifier_cnn(self, embedding_matrix, x, y, table, load_saved=0):
"""
A static CNN model.
Makes uses of Keras functional API for constructing the model.
If load_saved=1, THEN load old model, ELSE train new model
model_name = table + ".model.h5"
if load_saved == 1 and os.path.exists('./saved/' + model_name):
print("\nLoading saved model...")
model = load_model('./saved/' + model_name)
print("Model Summary")
print(model.summary())
"""
print("\nTraining model...")
inputs = Input(shape=(self.MAX_SEQUENCE_LENGTH,), dtype='int32')
embedding = Embedding(input_dim=(len(self.WORD_INDEX) + 1), output_dim=self.EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=self.MAX_SEQUENCE_LENGTH)(inputs)
X = keras.layers.SpatialDropout1D(0.3)(embedding)
X = keras.layers.Bidirectional(keras.layers.CuDNNLSTM(64, return_sequences = True))(X)
#X2 = keras.layers.Bidirectional(keras.layers.CuDNNGRU(128, return_sequences = False))(X)
X = keras.layers.Conv1D(32, kernel_size=2, padding='valid', kernel_initializer='normal')(X)
X = keras.layers.GlobalMaxPooling1D()(X)
#X = Attention(self.MAX_SEQUENCE_LENGTH)(X)
X = Dropout(0.5)(X)
X = keras.layers.Dense(16, activation="relu")(X)
X = Dropout(0.5)(X)
X = keras.layers.BatchNormalization()(X)
output = Dense(units=self.LABEL_COUNT, activation='sigmoid')(X)
model = Model(inputs=inputs, outputs=output, name='intent_classifier')
print("Model Summary")
print(model.summary())
cbk = OutputObserver(model, classifier)
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x, y,
batch_size=30,
epochs=23,
verbose=2,
callbacks = [cbk])
#keras.utils.vis_utils.plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
return model
def tag_question(self, model, question):
question = self.clean_str(question)
question_encoded = [[self.WORD_INDEX[w] for w in question.split(' ') if w in self.WORD_INDEX]]
question_encoded_padded = pad_sequences(question_encoded, maxlen=self.MAX_SEQUENCE_LENGTH, padding='post')
predictions = model.predict(question_encoded_padded)
possible_tags = []
for i, probability in enumerate(predictions[0]):
if probability >= 0.01:
possible_tags.append([self.LABEL_ENCODER.classes_[i], probability])
possible_tags.sort(reverse=True, key=lambda x:x[1]) #sort in place on the basis of the probability in each sub-list in descending order
print(possible_tags)
def setup_classifier(self, table):
'''
'''
print("Loading Data Set...")
x, y = self.loader_encoder(table)
embeddings_index = self.load_embeddings()
print("\nGenerating embedding matrix...")
embedding_matrix = self.create_embedding_matrix(embeddings_index)
#Loading / Training model
model = self.sentence_classifier_cnn(embedding_matrix, x, y, table, load_saved=1)
return model, embeddings_index
def connect_to_db(self):
mydb = mysql.connector.connect(host="localhost", user="root", passwd="root", database="questiondb")
cursor = mydb.cursor()
return mydb, cursor
As you can see, I've used a callback to print my predictions after each step.
I've tried to predict labels for all kinds of questions but I get good results only for questions that fall under one category.
For instance,
classifier.tag_question(model, "how many days before new year?")
gives
[['numeric', 0.99226487]]
as the output. But a more complex question like
classifier.tag_question(model, "who is the prophet of the muslim people and where is india located and how much do fruits costs there?")
gives something like
[['human', 0.9990531]]
as the output although labels like 'location' and 'numeric' are also relevant.
I used a callback to predict the prediction for that question after every epoch and I see something like this.
Epoch 1/23
- 19s - loss: 0.6581 - acc: 0.6365
[['human', 0.69752634], ['location', 0.40014982], ['entity', 0.32047516], ['abbreviation', 0.23877779], ['numeric', 0.23324837], ['description', 0.15995058]]
Epoch 2/23
- 12s - loss: 0.4525 - acc: 0.8264
[['human', 0.7437608], ['location', 0.18141672], ['entity', 0.14474556], ['numeric', 0.09171515], ['description', 0.053900182], ['abbreviation', 0.05283475]]
Epoch 3/23
- 12s - loss: 0.3854 - acc: 0.8478
[['human', 0.86335427], ['location', 0.12673976], ['entity', 0.09847507], ['numeric', 0.064431995], ['description', 0.035599917], ['abbreviation', 0.02441895]]
Epoch 4/23
- 12s - loss: 0.3634 - acc: 0.8509
[['human', 0.90795004], ['location', 0.10085008], ['entity', 0.09804481], ['numeric', 0.050411616], ['description', 0.032810867], ['abbreviation', 0.014970899]]
Epoch 5/23
- 13s - loss: 0.3356 - acc: 0.8582
[['human', 0.8365586], ['entity', 0.1130701], ['location', 0.10253032], ['numeric', 0.039931685], ['description', 0.02874279]]
Epoch 6/23
- 13s - loss: 0.3142 - acc: 0.8657
[['human', 0.95577633], ['entity', 0.088555306], ['location', 0.055004593], ['numeric', 0.015950901], ['description', 0.01428318]]
Epoch 7/23
- 13s - loss: 0.2942 - acc: 0.8750
[['human', 0.89538944], ['entity', 0.130977], ['location', 0.06350105], ['description', 0.023014158], ['numeric', 0.019377537]]
Epoch 8/23
- 13s - loss: 0.2739 - acc: 0.8802
[['human', 0.9725125], ['entity', 0.061141968], ['location', 0.026945814], ['description', 0.010931551]]
Epoch 9/23
- 13s - loss: 0.2579 - acc: 0.8914
[['human', 0.9797143], ['entity', 0.042518377], ['location', 0.027904237]]
Epoch 10/23
- 13s - loss: 0.2380 - acc: 0.9020
[['human', 0.7897601], ['entity', 0.14315197], ['location', 0.07439863], ['description', 0.019453615], ['numeric', 0.010681627]]
Epoch 11/23
- 13s - loss: 0.2250 - acc: 0.9104
[['human', 0.9886158], ['entity', 0.024878502], ['location', 0.015951043]]
Epoch 12/23
- 13s - loss: 0.2131 - acc: 0.9178
[['human', 0.9677731], ['entity', 0.03698206], ['location', 0.026153017]]
Epoch 13/23
- 13s - loss: 0.2029 - acc: 0.9204
[['human', 0.9514474], ['entity', 0.053581357], ['location', 0.029657435]]
Epoch 14/23
- 13s - loss: 0.1915 - acc: 0.9285
[['human', 0.9706739], ['entity', 0.0328649], ['location', 0.013876333]]
Epoch 15/23
- 13s - loss: 0.1856 - acc: 0.9300
[['human', 0.9328136], ['location', 0.05573874], ['entity', 0.025918543]]
Epoch 16/23
- 13s - loss: 0.1802 - acc: 0.9318
[['human', 0.9895527], ['entity', 0.014941782], ['location', 0.011972391]]
Epoch 17/23
- 13s - loss: 0.1717 - acc: 0.9373
[['human', 0.9426272], ['entity', 0.03754583], ['location', 0.023379702]]
Epoch 18/23
- 13s - loss: 0.1614 - acc: 0.9406
[['human', 0.99186605]]
Epoch 19/23
- 13s - loss: 0.1573 - acc: 0.9432
[['human', 0.9926062]]
Epoch 20/23
- 13s - loss: 0.1511 - acc: 0.9448
[['human', 0.9993554]]
Epoch 21/23
- 13s - loss: 0.1591 - acc: 0.9426
[['human', 0.9964465]]
Epoch 22/23
- 13s - loss: 0.1507 - acc: 0.9451
[['human', 0.999688]]
Epoch 23/23
- 13s - loss: 0.1524 - acc: 0.9436
[['human', 0.9990531]]
I've tried varying my parameters hundreds of times, especially my network size, batch size and epochs to try and avoid over-fitting.
I know my question is ridiculously long but I'm running out of patience and any help would be appreciated.
Here's the link to my colab notebook - https://colab.research.google.com/drive/1EOklUw7efOv69HvWKpuKVy1LSzcvTTCk.

Keras - Classifier not learning from Transfer-Values of a Pre-Trained Model

I'm currently trying to use a pre-trained network and test in on this dataset.
Originally, I used VGG19 and just fine-tuned only the classifier at the end to fit with my 120 classes. I let all layers trainable to maybe improve performance by having a deeper training. The problem is that the model is very slow (even if I let it run for a night, I only got couple of epochs and reach an accuracy of around 45% - I have a GPU GTX 1070).
Then, my thinking was to freeze all layers from this model as I have only 10k images and only train the few last Denses layers but it's still not realy fast.
After watching this video (at around 2 min 30s), I decided to replicate the principle of Transfer-Values with InceptionResnetv2.
I processed every pictures and saved the output in a numpy matrix with the following code.
# Loading pre-trained Model + freeze layers
model = applications.inception_resnet_v2.InceptionResNetV2(
include_top=False,
weights='imagenet',
pooling='avg')
for layer in model.layers:
layer.trainable = False
# Extraction of features and saving
a = True
for filename in glob.glob('train/resized/*.jpg'):
name_img = os.path.basename(filename)[:-4]
class_ = label[label["id"] == name_img]["breed"].values[0]
input_img = np.expand_dims(np.array(Image.open(filename)), 0)
pred = model.predict(input_img)
if a:
X = np.array(pred)
y = np.array(class_)
a = False
else:
X = np.vstack((X, np.array(pred)))
y = np.vstack((y, class_))
np.savez_compressed('preprocessed.npz', X=X, y=y)
X is a matrix of shape (10222, 1536) and y is (10222, 1).
After, I designed my classifier (several topologies) and I have no idea why it is not able to perform any learning.
# Just to One-Hot-Encode labels properly to (10222, 120)
label_binarizer = sklearn.preprocessing.LabelBinarizer()
y = label_binarizer.fit_transform(y)
model = Sequential()
model.add(Dense(512, input_dim=X.shape[1]))
# model.add(Dense(2048, activation="relu"))
# model.add(Dropout(0.5))
# model.add(Dense(256))
model.add(Dense(120, activation='softmax'))
model.compile(
loss = "categorical_crossentropy",
optimizer = "Nadam", # I tried several ones
metrics=["accuracy"]
)
model.fit(X, y, epochs=100, batch_size=64,
callbacks=[early_stop], verbose=1,
shuffle=True, validation_split=0.10)
Below you can find the output from the model :
Train on 9199 samples, validate on 1023 samples
Epoch 1/100
9199/9199 [==============================] - 2s 185us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 2/100
9199/9199 [==============================] - 1s 100us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 3/100
9199/9199 [==============================] - 1s 98us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 4/100
9199/9199 [==============================] - 1s 96us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 5/100
9199/9199 [==============================] - 1s 99us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 6/100
9199/9199 [==============================] - 1s 96us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
I tried to change topologies, activation functions, add dropouts but nothing creates any improvements.
I have no idea what is wrong in my way of doing this. Is the X matrix incorrect ? Isn't it allowed to use the pre-trained model only as feature extractor then perform the classification with a second model ?
Many thanks for your feedbacks,
Regards,
Nicolas
You'll need to call preprocess_input before feeding the image array to the model. It normalizes the values of input_img from [0, 255] into [-1, 1], which is the desired input range for InceptionResNetV2.
input_img = np.expand_dims(np.array(Image.open(filename)), 0)
input_img = applications.inception_resnet_v2.preprocess_input(input_img.astype('float32'))
pred = model.predict(input_img)

Resources