model.compile(loss="binary_crossentropy", optimizer='adam', metrics = ['accuracy'])
the_fit = model.fit(X_train, y_train.values, epochs=20,
sample_weight=weights_train,validation_data=(X_test,y_test))
I used the code above to train a neural network, and the output reported the training and validation loss below.
Epoch 1/20
69254/69254 [==============================] - 209s 3ms/step - loss: 0.0061 - accuracy: 0.7263 - val_loss: 0.5178 - val_accuracy: 0.7220
You can see a huge difference between the training loss and val_loss, even though the two accuracies are close.
[train_loss,train_acc] = model.evaluate(X_train,y_train)
print('train accuracy: ',train_acc)
print('train loss: ',train_loss)
This second way of evaluating revealed that the training loss is actually close to 0.5, as the training accuracy implied. Does anyone know why the metrics printed during training are misleading, reporting a training loss of 0.0061? Thank you!
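One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the loss printed during training is computed with sample_weight applied, while val_loss and model.evaluate(X_train, y_train) are computed without weights, because no weights are passed in validation_data or to evaluate. If the weights are much smaller than 1, the weighted loss shrinks by roughly that factor, while accuracy is unaffected. A minimal plain-Python sketch with made-up labels, probabilities and weights:

```python
import math

def binary_crossentropy(y_true, p):
    """Per-sample binary cross-entropy for a predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Hypothetical data: four samples with mediocre predictions.
y_true = [1, 0, 1, 0]
y_prob = [0.6, 0.4, 0.55, 0.45]
# Hypothetical sample weights, all far below 1.
weights = [0.012, 0.012, 0.012, 0.012]

per_sample = [binary_crossentropy(y, p) for y, p in zip(y_true, y_prob)]

# What evaluate() without sample_weight would average.
unweighted_loss = sum(per_sample) / len(per_sample)
# Weighted losses are summed and divided by the number of samples,
# not by the sum of the weights, so small weights give a small loss.
weighted_loss = sum(w * l for w, l in zip(weights, per_sample)) / len(per_sample)

print("unweighted:", unweighted_loss)
print("weighted:  ", weighted_loss)
```

To compare like with like, the same weights can be passed to model.evaluate through its sample_weight argument.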
I am training an LSTM model on my current dataset to predict multiclass categories. There are 18 mutually exclusive categories and the dataset has only ~500 rows (a really small dataset). I am handling the class imbalance using the following:
import numpy as np
from sklearn.utils import class_weight
class_weights = list(class_weight.compute_class_weight('balanced',
                                                       classes=np.unique(df['categories']),
                                                       y=df['categories']))
# Map class index -> weight, the format expected by class_weight in model.fit()
weights = {}
for index, weight in enumerate(class_weights):
    weights[index] = weight
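For reference, the 'balanced' heuristic assigns each class the weight n_samples / (n_classes * class_count). The sketch below reproduces that formula in plain Python on made-up labels; the helper name balanced_class_weights is hypothetical. Note that keying the dictionary by enumerate index only matches the model's targets if the labels were encoded to integers in the same sorted order that np.unique returns.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * counts[cls]) for cls in sorted(counts)}

# Hypothetical imbalanced labels: 6 x "a", 3 x "b", 1 x "c".
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
weights_by_label = balanced_class_weights(labels)

# Integer-keyed dictionary, assuming labels are encoded in sorted order.
weights = {index: weights_by_label[cls]
           for index, cls in enumerate(sorted(weights_by_label))}
print(weights)
```

The rarest class receives the largest weight, so its errors contribute more to the training loss.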
After this I build my LSTM model. I evaluate it using the PR AUC from tf.metrics, since this is an imbalanced classification problem:
METRICS = [tf.metrics.AUC(name='prc', curve='PR')]  # precision-recall curve
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(18, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=METRICS)
print(model.summary())
and finally:
history = model.fit(X_train,
y_train,
batch_size=10,
epochs=10,
verbose=1,
class_weight=weights,
validation_data=(X_test,y_test))
Now when I look at the results, the training prc is coming out to be really high whereas my val_prc is really low. An example with 10 epochs:
Epoch 1/10
30/30 [==============================] - 5s 174ms/step - loss: 2.9951 - prc: 0.0682 -
val_loss: 2.8865 - val_prc: 0.0639
Epoch 2/10
30/30 [==============================] - 5s 169ms/step - loss: 2.9556 - prc: 0.0993 -
val_loss: 2.8901 - val_prc: 0.0523
.....
Epoch 8/10
30/30 [==============================] - 6s 189ms/step - loss: 1.2494 - prc: 0.6415 -
val_loss: 3.0662 - val_prc: 0.0728
Epoch 9/10
30/30 [==============================] - 6s 210ms/step - loss: 0.9237 - prc: 0.8302 -
val_loss: 3.0624 - val_prc: 0.1006
Epoch 10/10
30/30 [==============================] - 6s 184ms/step - loss: 0.7452 - prc: 0.9017 -
val_loss: 3.5035 - val_prc: 0.0821
My questions are:
Is the evaluation metric correct that I am using considering it is an imbalanced class problem?
Am I treating the imbalance correctly with the code that I have written in the first place and, most importantly, am I using it correctly in model.fit()?
How can I resolve this? Is there any alternative approach that you can suggest?
I'm running a multiclass classification problem using the below resnet model:
resnet = tf.keras.applications.ResNet50(
include_top=False ,
weights='imagenet' ,
input_shape=(96, 96, 3) ,
pooling="avg"
)
for layer in resnet.layers:
    layer.trainable = True
model_resnet = tf.keras.Sequential()
model_resnet.add(resnet)
model_resnet.add(tf.keras.layers.Flatten())
model_resnet.add(tf.keras.layers.Dense(8, activation='softmax',name='output') )
model_resnet.compile( loss="sparse_categorical_crossentropy" , optimizer=tf.keras.optimizers.Adam(learning_rate=0.001) ,metrics=['accuracy'])
I also used a train and a test generator as below:
train_generator=img_gen.flow_from_dataframe(dataframe=train_dataset,x_col="file_loc",y_col='expr',target_size=(96, 96),batch_size=91,class_mode="raw")
test_generator=img_gen.flow_from_dataframe(dataframe=test_dataset,x_col="file_loc",target_size=(96, 96),batch_size=93,y_col=None,shuffle=False,class_mode=None)
When I run the code below, I get the expected results and everything works fine:
model_resnet.fit_generator(train_generator,
steps_per_epoch=STEP_SIZE_TRAIN_resnet,
epochs=20
)
I wanted to compute the validation accuracy of every epoch so I wrote something like this
model_path = f"/content/weights" + "{val_accuracy:.4f}.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(
model_path,
monitor='val_accuracy',
save_best_only=True,
mode='max',
verbose=1
)
history = model_resnet.fit_generator(
train_generator,
epochs=5,
steps_per_epoch=STEP_SIZE_TRAIN_resnet,
validation_data=test_generator,
validation_steps=STEP_SIZE_TEST_resnet,
max_queue_size=1,
shuffle=True,
callbacks=[checkpoint],
verbose=1
)
The problem is that for every epoch the validation loss and validation accuracy remain zero even though the training loss and accuracy change. I ran this code for over 20 epochs and it doesn't change at all. I can't find what I am doing wrong, since without the validation arguments it works perfectly. Does anyone have any idea?
Epoch 1: val_accuracy improved from -inf to 0.00000, saving model to /content/weights0.0000.hdf5
500/500 [==============================] - 30s 60ms/step - loss: 1.0213 - accuracy: 0.6546 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/5
500/500 [==============================] - ETA: 0s - loss: 0.9644 - accuracy: 0.6672
Epoch 2: val_accuracy did not improve from 0.00000
500/500 [==============================] - 29s 58ms/step - loss: 0.9644 - accuracy: 0.6672 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Edit: I didn't specify the test labels of the test dataset because I used to compute the accuracy score as below:
y_pred = model_resnet.predict(test_generator)
y_pred_max = np.argmax(y_pred, axis=1)
y_true = test_dataset["expr"].to_numpy()
print("accuracy",accuracy_score(y_true, y_pred_max))
I changed the test_generator as below:
test_generator=img_gen.flow_from_dataframe(dataframe=test_dataset,x_col="file_loc",target_size=(96, 96),batch_size=93,y_col='expr',shuffle=False,class_mode=None)
But nothing has changed; the validation metrics are still zero.
As @Dr.Snoopy said, the problems were that I didn't specify the test labels in the test generator (they are required to compute accuracy) and that the two generators used different class modes; the correct one was "raw" in both.
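A minimal sketch of that fix, assuming the same img_gen, dataframes and column names as in the question. The helper make_labeled_generator is hypothetical and only exists to guarantee that both generators get the label column and the same class mode:

```python
def make_labeled_generator(img_gen, dataframe, batch_size, shuffle):
    """Build a generator that yields (images, labels) batches."""
    return img_gen.flow_from_dataframe(
        dataframe=dataframe,
        x_col="file_loc",
        y_col="expr",        # labels are needed to compute val_loss / val_accuracy
        target_size=(96, 96),
        batch_size=batch_size,
        shuffle=shuffle,
        class_mode="raw",    # same class mode for train and test
    )

# Intended usage with the objects from the question:
# train_generator = make_labeled_generator(img_gen, train_dataset, 91, True)
# test_generator = make_labeled_generator(img_gen, test_dataset, 93, False)
```

With class_mode=None the generator yields images only, so there is nothing to compare predictions against during validation.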
I am trying to build a classifier in TensorFlow 2.1 for CIFAR10 using ResNet50 pre-trained on ImageNet from keras.applications, and then stacking a small FNN on top of it:
# Load ResNet50 pre-trained on imagenet
resn = applications.resnet50.ResNet50(weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling='avg', include_top=False)
# Load CIFAR10
(c10_train, c10_test), info = tfds.load(name='cifar10', split=['train', 'test'], with_info=True, as_supervised=True)
# Make sure all the layers are not trainable
for layer in resn.layers:
    layer.trainable = False
# Transfer learning for CIFAR10: fine-tune the network by stacking a trainable FNN on top of ResNet
from tensorflow.keras import models, layers
def build_model():
    model = models.Sequential()
    # Feature extractor
    model.add(resn)
    # Small FNN
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.4))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
                  metrics=['accuracy'])
    return model
# Build the resulting net
resn50_c10 = build_model()
I am facing the following issue when validating or testing the accuracy:
history = resn50_c10.fit_generator(c10_train.shuffle(1000).batch(BATCH_SIZE), validation_data=c10_test.batch(BATCH_SIZE), epochs=20)
Epoch 1/20
25/25 [==============================] - 113s 5s/step - loss: 0.9659 - accuracy: 0.6634 - val_loss: 2.8157 - val_accuracy: 0.1000
Epoch 2/20
25/25 [==============================] - 109s 4s/step - loss: 0.8908 - accuracy: 0.6920 - val_loss: 2.8165 - val_accuracy: 0.1094
Epoch 3/20
25/25 [==============================] - 116s 5s/step - loss: 0.8743 - accuracy: 0.7038 - val_loss: 2.7555 - val_accuracy: 0.1016
Epoch 4/20
25/25 [==============================] - 132s 5s/step - loss: 0.8319 - accuracy: 0.7166 - val_loss: 2.8398 - val_accuracy: 0.1013
Epoch 5/20
25/25 [==============================] - 132s 5s/step - loss: 0.7903 - accuracy: 0.7253 - val_loss: 2.8624 - val_accuracy: 0.1000
Epoch 6/20
25/25 [==============================] - 132s 5s/step - loss: 0.7697 - accuracy: 0.7325 - val_loss: 2.8409 - val_accuracy: 0.1000
Epoch 7/20
25/25 [==============================] - 132s 5s/step - loss: 0.7515 - accuracy: 0.7406 - val_loss: 2.7697 - val_accuracy: 0.1000
#... (same for the remaining epochs)
Although the model seems to learn adequately from the training split, neither the accuracy nor the loss on the validation set improves at all. What is causing this behavior?
I am ruling out overfitting, since I am applying Dropout and the model never really improves on the test set.
What I have done so far:
Checked that the one-hot labelling is consistent across train and test
Tried different FNN configurations
Tried the method fit_generator instead of fit
Preprocessed the images and resized them with different input_shapes
and always ran into the same problem.
Any hint would be extremely appreciated.
The problem is likely due to loading the data with tfds and then passing it to Keras .fit.
Try to load your data with
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
And then
resn50_c10.fit(x=x_train, y=y_train, batch_size=BATCH_SIZE, epochs=20, verbose=1, callbacks=None, validation_split=0.2, validation_data=None, shuffle=True)
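Note that cifar10.load_data() returns integer class labels of shape (N, 1), whereas the model in the question is compiled with categorical_crossentropy, which expects one-hot vectors. A minimal sketch of the conversion in plain Python, on made-up labels; the helper name one_hot is hypothetical, and in practice keras.utils.to_categorical does the same job:

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[int(label)] = 1.0
        encoded.append(row)
    return encoded

# Hypothetical labels for three CIFAR10 images.
y_example = [3, 0, 9]
y_one_hot = one_hot(y_example, num_classes=10)
print(y_one_hot[0])
```

The images would likewise need whatever resizing and scaling the tfds pipeline applied, which the question does not show.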
Apparently the problem was caused solely by the use of ResNet50.
As a workaround, I downloaded and used other pre-trained deep networks such as keras.applications.vgg16.VGG16, keras.applications.densenet.DenseNet121 and the accuracy on the test set increased as expected.
UPDATE
The above part of this answer is just a palliative. In order to understand what is really happening and eventually use transfer learning properly with ResNet50, keep on reading.
The root cause appears to be found in how Keras handles the Batch Normalization layer:
During fine-tuning, if a Batch Normalization layer is frozen it uses the mini-batch statistics. I believe this is incorrect and it can lead to reduced accuracy especially when we use Transfer learning. A better approach in this case would be to use the values of the moving mean and variance.
As explained more in-depth here: https://github.com/keras-team/keras/pull/9965
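To illustrate why this matters, here is a small plain-Python sketch of batch normalization applied to the same activations in two ways: with the statistics of the current mini-batch, and with stored moving statistics. All numbers are made up, and the learned scale and offset are left out for brevity; this is not the Keras implementation.

```python
def batchnorm(values, mean, var, eps=1e-3):
    """Normalize values with the given mean and variance."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

# Hypothetical moving statistics stored in the pre-trained layer.
moving_mean, moving_var = 0.0, 1.0
# Hypothetical activations for one mini-batch of the new dataset.
batch = [4.0, 5.0, 6.0]

batch_mean = sum(batch) / len(batch)
batch_var = sum((v - batch_mean) ** 2 for v in batch) / len(batch)

# Training-mode behavior: normalize with the mini-batch statistics.
train_mode = batchnorm(batch, batch_mean, batch_var)
# Inference-mode behavior: normalize with the stored moving statistics.
inference_mode = batchnorm(batch, moving_mean, moving_var)

print(train_mode)
print(inference_mode)
```

The two modes feed very different values to the following layers. If the frozen network sees one mode during training and the other during validation, the classifier on top is trained on one distribution and evaluated on another, which is consistent with a validation accuracy stuck near chance.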
Even though the correct approach has been implemented in TensorFlow 2, tf.keras.applications still references the TensorFlow 1.0 behavior for Batch Normalization. That's why we need to explicitly inject the reference to TensorFlow 2 by adding the argument layers=tf.keras.layers when loading the model. So in my case (TensorFlow 2.1), the loading of ResNet50 becomes
resn = applications.resnet50.ResNet50(weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling='avg', include_top=False, layers=tf.keras.layers)
and that will do the trick.
Credits for the solution to @rpeloff: https://github.com/keras-team/keras/pull/9965#issuecomment-549126009
I am using an LSTM architecture to create a chatbot. I am using GloVe embedding.
During training, my training accuracy gets stuck at a very low value (0.1969) and makes no progress. My code is attached below. Can you tell me what can be done to improve the training?
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense, LSTM
from keras.optimizers import Adam
model=Sequential()
model.add(Embedding(max_words,embedding_dim,input_length=maxlen))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.add(LSTM(units=100,return_sequences=True, kernel_initializer="glorot_normal", recurrent_initializer="glorot_normal", activation='sigmoid'))
model.summary()
model.layers[0].set_weights([embedding_matrix])
model.layers[0].trainable = False
model.compile(loss='cosine_proximity', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train,
epochs = 500,
batch_size = 32,
validation_data=(x_val,y_val))
Epoch 498/500
60/60 [==============================] - 0s 3ms/step - loss: -0.1303 - acc: 0.1969 - val_loss: -0.1785 - val_acc: 0.2909
Epoch 499/500
60/60 [==============================] - 0s 3ms/step - loss: -0.1303 - acc: 0.1969 - val_loss: -0.1785 - val_acc: 0.2909
Epoch 500/500
60/60 [==============================] - 0s 3ms/step - loss: -0.1303 - acc: 0.1969 - val_loss: -0.1785 - val_acc: 0.2909
Further training (on the same conversation data set ) does not improve accuracy.
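For reading the log: in the Keras version used here, cosine_proximity is the negative cosine similarity between target and prediction, so -1 is the best possible value and 0 means the two vectors are orthogonal. A plateau at -0.13 therefore suggests predictions that are barely aligned with the targets. A small sketch of the quantity in plain Python, on made-up vectors (an illustration, not the Keras implementation):

```python
import math

def cosine_proximity(y_true, y_pred):
    """Negative cosine similarity between two vectors."""
    dot = sum(t * p for t, p in zip(y_true, y_pred))
    norm_true = math.sqrt(sum(t * t for t in y_true))
    norm_pred = math.sqrt(sum(p * p for p in y_pred))
    return -dot / (norm_true * norm_pred)

# Same direction: the best case, close to -1.
perfect = cosine_proximity([1.0, 2.0], [2.0, 4.0])
# Orthogonal vectors: no alignment, equal to 0.
orthogonal = cosine_proximity([1.0, 0.0], [0.0, 1.0])

print(perfect, orthogonal)
```

Because the targets here are embedding vectors rather than class labels, the accuracy metric is hard to interpret; the loss is probably the more informative number to watch.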
Add a BatchNormalization layer after the Embedding layer and after each LSTM layer. This helps regularize the learning; it helped in my case. Also, take a look at your data, as there might be an issue there too.
import keras
from keras.models import Sequential
from keras.layers import Embedding, BatchNormalization, LSTM, Dropout, Dense
# vocab_size and pad_length are defined elsewhere (vocabulary size and padded sequence length)
#clear session
keras.backend.clear_session()
model = Sequential()
#embedding layer
model.add(Embedding(vocab_size, 50, input_length=pad_length))
#Batch Norm layer
model.add(BatchNormalization())
#First LSTM layer
model.add(LSTM(units=100, return_sequences=True))
#Batch Norm layer
model.add(BatchNormalization())
#add dropout too
model.add(Dropout(0.25))
#Second LSTM layer
model.add(LSTM(units=100))
#Batch Norm
model.add(BatchNormalization())
model.add(Dense(1, activation="sigmoid"))
I'm currently trying to use a pre-trained network and test it on this dataset.
Originally, I used VGG19 and fine-tuned only the classifier at the end to fit my 120 classes. I then made all layers trainable, hoping that deeper training would improve performance. The problem is that the model is very slow: even after running overnight on a GTX 1070 GPU, I only got a couple of epochs and reached an accuracy of around 45%.
Then my idea was to freeze all layers of this model, since I have only 10k images, and train only the last few Dense layers, but it's still not really fast.
After watching this video (at around 2 min 30 s), I decided to replicate the principle of transfer values with InceptionResNetV2.
I processed every picture and saved the output in a NumPy matrix with the following code.
# Loading pre-trained Model + freeze layers
model = applications.inception_resnet_v2.InceptionResNetV2(
include_top=False,
weights='imagenet',
pooling='avg')
for layer in model.layers:
    layer.trainable = False
# Extraction of features and saving
a = True
for filename in glob.glob('train/resized/*.jpg'):
    name_img = os.path.basename(filename)[:-4]
    class_ = label[label["id"] == name_img]["breed"].values[0]
    input_img = np.expand_dims(np.array(Image.open(filename)), 0)
    pred = model.predict(input_img)
    if a:
        X = np.array(pred)
        y = np.array(class_)
        a = False
    else:
        X = np.vstack((X, np.array(pred)))
        y = np.vstack((y, class_))
np.savez_compressed('preprocessed.npz', X=X, y=y)
X is a matrix of shape (10222, 1536) and y is (10222, 1).
Next, I designed my classifier (I tried several topologies), and I have no idea why it is not able to learn anything.
# Just to One-Hot-Encode labels properly to (10222, 120)
label_binarizer = sklearn.preprocessing.LabelBinarizer()
y = label_binarizer.fit_transform(y)
model = Sequential()
model.add(Dense(512, input_dim=X.shape[1]))
# model.add(Dense(2048, activation="relu"))
# model.add(Dropout(0.5))
# model.add(Dense(256))
model.add(Dense(120, activation='softmax'))
model.compile(
loss = "categorical_crossentropy",
optimizer = "Nadam", # I tried several ones
metrics=["accuracy"]
)
model.fit(X, y, epochs=100, batch_size=64,
callbacks=[early_stop], verbose=1,
shuffle=True, validation_split=0.10)
Below you can find the output from the model :
Train on 9199 samples, validate on 1023 samples
Epoch 1/100
9199/9199 [==============================] - 2s 185us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 2/100
9199/9199 [==============================] - 1s 100us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 3/100
9199/9199 [==============================] - 1s 98us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 4/100
9199/9199 [==============================] - 1s 96us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 5/100
9199/9199 [==============================] - 1s 99us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
Epoch 6/100
9199/9199 [==============================] - 1s 96us/step - loss: 15.9639 - acc: 0.0096 - val_loss: 15.8975 - val_acc: 0.0137
I tried changing topologies and activation functions and adding dropout, but nothing brings any improvement.
I have no idea what is wrong with my approach. Is the X matrix incorrect? Isn't it allowed to use the pre-trained model only as a feature extractor and then perform the classification with a second model?
Many thanks for your feedback,
Regards,
Nicolas
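A side note on the log above: a constant loss near 16 is what categorical cross-entropy produces when the softmax output is saturated, that is, when the network assigns probability 1 to a single class for every sample. Keras clips probabilities to a small epsilon (1e-7 by default) before taking the logarithm, so each wrong prediction costs about -log(1e-7). The sketch below checks this arithmetic in plain Python; the saturation itself is an assumption about this run, not something the log proves.

```python
import math

EPS = 1e-7  # default Keras epsilon used to clip probabilities

def clipped_crossentropy(p_true_class):
    """Cross-entropy for one sample, given the probability of its true class."""
    p = min(max(p_true_class, EPS), 1.0 - EPS)
    return -math.log(p)

loss_when_wrong = clipped_crossentropy(0.0)  # all mass on a wrong class
loss_when_right = clipped_crossentropy(1.0)  # all mass on the true class

# With the reported training accuracy, the average loss would be roughly:
accuracy = 0.0096
expected_loss = accuracy * loss_when_right + (1 - accuracy) * loss_when_wrong

print(loss_when_wrong, expected_loss)
```

The result lands very close to the 15.9639 in the log, which points at exploding activations in the first Dense layer, for example because of badly scaled input features.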
You'll need to call preprocess_input before feeding the image array to the model. It normalizes the values of input_img from [0, 255] into [-1, 1], which is the desired input range for InceptionResNetV2.
input_img = np.expand_dims(np.array(Image.open(filename)), 0)
input_img = applications.inception_resnet_v2.preprocess_input(input_img.astype('float32'))
pred = model.predict(input_img)
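As far as I know, for InceptionResNetV2 this preprocessing amounts to x / 127.5 - 1 applied to every pixel. A plain-Python sketch of that arithmetic, with a hypothetical helper name and made-up pixel values:

```python
def scale_to_unit_range(pixels):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]

# Darkest, middle and brightest possible pixel values.
scaled = scale_to_unit_range([0.0, 127.5, 255.0])
print(scaled)
```

Since the features were extracted without this step, the saved X matrix has to be recomputed before training the classifier again.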