By "progress bar" I mean the standard progress bar that shows up with tf.keras.Model.fit
As I understand, it shows a running average of your selected metrics (over the current epoch), but I want it to show the value at the last completed iteration.
Is there a built-in way to make this change? And if not, what would be the easiest way?
I made a callback a while ago to solve this problem.
from keras.callbacks import Callback

class print_on_end(Callback):
    def on_batch_end(self, batch, logs={}):
        # Print a newline after every batch so each progress-bar update ends up on its own line
        print()
You want to call it like this.
model.fit(training_dataset, steps_per_epoch=num_training_samples, epochs=EPOCHS, validation_data=validation_dataset, callbacks=[print_on_end()])
But this callback still prints the running averages, just on separate lines, so I don't think it's what you want.
Try this instead:
import keras

class LossAndErrorPrintingCallback(keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        # logs["loss"] holds the loss value Keras reports at the end of this batch
        print("For batch {}, loss is {:7.2f}.".format(batch, logs["loss"]))
This callback prints the loss of every batch, so it should be what you are looking for.
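You pass it to fit the same way as the previous callback, for example:
model.fit(training_dataset, epochs=EPOCHS, callbacks=[LossAndErrorPrintingCallback()])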
(If you need a metric instead, just replace logs["loss"] with logs["name of the metric"], e.g. logs["mean_absolute_error"].)
EDIT:
To check the name of the metric inside logs, you can print the keys of the logs dict and find the one you are looking for.
class PrintKeys(keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        # List every key available in the logs dict at the end of each training batch
        keys = list(logs.keys())
        print(keys)
In that method you should only find the keys for the loss and your compiled metrics.
source:
https://keras.io/guides/writing_your_own_callbacks/
An example of the class hierarchy of the MeanSquaredError metric is
MeanSquaredError->MeanMetricWrapper->Mean->Reduce->Metric
Is there a built-in way?
The main problem is that all metrics are subclasses of the Reduce metric, which performs the aggregation, and there is no hook provided to change the behaviour of the Reduce base class.
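If you want to confirm this hierarchy in your own installation, printing the method resolution order is enough. A quick check (the chain below is what Keras 2.x / tf.keras reports; newer Keras releases may organise these classes differently):
import keras

# Walk the method resolution order of MeanSquaredError to see the aggregation chain
for cls in keras.metrics.MeanSquaredError.__mro__:
    print(cls.__name__)
# Expected (roughly): MeanSquaredError, MeanMetricWrapper, Mean, Reduce, Metric, ...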
How to achieve this most easily
Given the hierarchy above, you can get what you want by creating a new metric that subclasses MeanMetricWrapper
and overrides its update_state method
so that it first calls self.reset_states and then MeanMetricWrapper.update_state.
That way, the underlying Reduce base class only ever aggregates a single value. Working example below:
#! /usr/bin/env python
import numpy as np
import keras
from keras.metrics import MeanMetricWrapper
x=np.linspace(0, 1, 20000)[:,np.newaxis,np.newaxis]
y=np.sin(x*2*np.pi)
model = keras.Sequential()
model.add(keras.layers.Dense(4, activation="tanh", input_shape=(1,1)))
model.add(keras.layers.Dense(4, activation="tanh"))
model.add(keras.layers.Dense(4))
#####
# The instantaneous metric variant
class InstMetric(MeanMetricWrapper):
    def __init__(self, fn, **kwargs):
        """fn is the callable metric function you want to use in your metric."""
        super().__init__(fn=fn, **kwargs)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Clear the running aggregate so only the current batch contributes to the result
        self.reset_states()
        return super().update_state(y_true, y_pred, sample_weight=sample_weight)
#####
model.compile(optimizer='adam', loss='mean_squared_error',
              metrics=[
                  keras.metrics.MeanSquaredError(name="MSE"),
                  InstMetric(keras.metrics.mean_squared_error, name="IMSE"),
              ])
model.fit(x=x, y=y, epochs=1, batch_size=5, steps_per_epoch=1000)
Storing this script as inst_demo.py and piping it through tr to unfold the carriage-return-driven progress bar onto separate lines in the terminal, you get:
$> ./inst_demo.py | tr \\r \\n
1/1000 [..............................] - ETA: 8:07 - loss: 0.4656 - MSE: 0.4656 - IMSE: 0.4656
42/1000 [>.............................] - ETA: 1s - loss: 0.4874 - MSE: 0.4874 - IMSE: 0.4133
87/1000 [=>............................] - ETA: 1s - loss: 0.4685 - MSE: 0.4685 - IMSE: 0.4764
132/1000 [==>...........................] - ETA: 1s - loss: 0.4627 - MSE: 0.4627 - IMSE: 0.5445
175/1000 [====>.........................] - ETA: 0s - loss: 0.4558 - MSE: 0.4558 - IMSE: 0.7689
217/1000 [=====>........................] - ETA: 0s - loss: 0.4443 - MSE: 0.4443 - IMSE: 0.1058
264/1000 [======>.......................] - ETA: 0s - loss: 0.4258 - MSE: 0.4258 - IMSE: 0.4162
311/1000 [========>.....................] - ETA: 0s - loss: 0.4090 - MSE: 0.4090 - IMSE: 0.1716
356/1000 [=========>....................] - ETA: 0s - loss: 0.3889 - MSE: 0.3889 - IMSE: 0.3417
400/1000 [===========>..................] - ETA: 0s - loss: 0.3707 - MSE: 0.3707 - IMSE: 0.1271
445/1000 [============>.................] - ETA: 0s - loss: 0.3532 - MSE: 0.3532 - IMSE: 0.0729
489/1000 [=============>................] - ETA: 0s - loss: 0.3383 - MSE: 0.3383 - IMSE: 0.2310
535/1000 [===============>..............] - ETA: 0s - loss: 0.3248 - MSE: 0.3248 - IMSE: 0.1228
580/1000 [================>.............] - ETA: 0s - loss: 0.3143 - MSE: 0.3143 - IMSE: 0.2670
625/1000 [=================>............] - ETA: 0s - loss: 0.3048 - MSE: 0.3048 - IMSE: 0.1762
671/1000 [===================>..........] - ETA: 0s - loss: 0.2962 - MSE: 0.2962 - IMSE: 0.0751
715/1000 [====================>.........] - ETA: 0s - loss: 0.2896 - MSE: 0.2896 - IMSE: 0.0650
756/1000 [=====================>........] - ETA: 0s - loss: 0.2831 - MSE: 0.2831 - IMSE: 0.2332
799/1000 [======================>.......] - ETA: 0s - loss: 0.2773 - MSE: 0.2773 - IMSE: 0.1026
841/1000 [========================>.....] - ETA: 0s - loss: 0.2721 - MSE: 0.2721 - IMSE: 0.1238
888/1000 [=========================>....] - ETA: 0s - loss: 0.2673 - MSE: 0.2673 - IMSE: 0.1471
936/1000 [===========================>..] - ETA: 0s - loss: 0.2631 - MSE: 0.2631 - IMSE: 0.2242
986/1000 [============================>.] - ETA: 0s - loss: 0.2580 - MSE: 0.2580 - IMSE: 0.2704
1000/1000 [==============================] - 2s 1ms/step - loss: 0.2574 - MSE: 0.2574 - IMSE: 0.2773
So you get an instantaneous value each time the progress bar is updated.
You can also derive such an instantaneous variant directly from one of the built-in Keras metric classes if you don't need to pick the wrapped metric function at construction time, as in the sketch below.
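For example, a fixed-MSE variant could look like this (InstMSE is a name introduced here for illustration; depending on your TF/Keras version the reset method is called reset_states or reset_state):
import keras

class InstMSE(keras.metrics.MeanSquaredError):
    """Instantaneous MSE: the same reset-before-update trick, fixed to the MSE metric."""
    def update_state(self, y_true, y_pred, sample_weight=None):
        # Drop the running aggregate before adding the current batch
        self.reset_states()
        return super().update_state(y_true, y_pred, sample_weight=sample_weight)

# Usage:
# model.compile(optimizer="adam", loss="mean_squared_error", metrics=[InstMSE(name="IMSE")])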
Related
I am training a DeepFM model with DeepCTR. During training, the metrics are calculated incorrectly: the confusion-matrix metrics (TP, FP, TN, FN) don't seem to reset after each epoch and keep accumulating.
The metrics are defined as
METRICS = [
tf.keras.metrics.TruePositives(name='tp'),
tf.keras.metrics.FalsePositives(name='fp'),
tf.keras.metrics.TrueNegatives(name='tn'),
tf.keras.metrics.FalseNegatives(name='fn'),
tf.keras.metrics.Precision(name='precision'),
tf.keras.metrics.Recall(name='recall'),
tf.keras.metrics.AUC(name='auc'),
tf.keras.metrics.AUC(name='prc', curve='PR'), # precision-recall curve
tf.keras.metrics.BinaryCrossentropy(name='bce'),
]
The sum of the confusion-matrix entries keeps growing every epoch, far exceeding the number of training samples I have.
Epoch 1/5
69/69 [==============================] - 4s 29ms/step - loss: 0.6810 - tp: 154.6667 - fp: 333572.0625 - tn: 239617.6094 - fn: 52.3478 - precision: 4.5102e-04 - recall: 0.7916 - auc: 0.5877 - prc: 5.0342e-04 - bce: 0.6911
Epoch 2/5
69/69 [==============================] - 2s 27ms/step - loss: 0.5650 - tp: 442.0580 - fp: 782765.5625 - tn: 917517.3125 - fn: 179.7101 - precision: 5.5941e-04 - recall: 0.7099 - auc: 0.6618 - prc: 9.4619e-04 - bce: 0.6555
Epoch 3/5
69/69 [==============================] - 2s 28ms/step - loss: 0.1199 - tp: 792.6667 - fp: 973029.1250 - tn: 1854353.7500 - fn: 237.1304 - precision: 8.1280e-04 - recall: 0.7671 - auc: 0.7935 - prc: 0.0402 - bce: 0.5320
Epoch 4/5
69/69 [==============================] - 2s 27ms/step - loss: 0.0099 - tp: 1195.8406 - fp: 984204.6250 - tn: 2970279.2500 - fn: 241.0000 - precision: 0.0012 - recall: 0.8311 - auc: 0.8924 - prc: 0.1829 - bce: 0.3886
Epoch 5/5
69/69 [==============================] - 2s 29ms/step - loss: 0.0074 - tp: 1603.8406 - fp: 984474.1875 - tn: 4097109.2500 - fn: 241.0000 - precision: 0.0016 - recall: 0.8688 - auc: 0.9353 - prc: 0.2926 - bce: 0.3021
I tried overriding the TruePositives class as
class TruePositives(tf.keras.metrics.TruePositives):
    def reset_states(self):
        print('Resetting TruePositives')
        super().reset_states()

    def update_state(self, y_true, y_pred, sample_weight=None):
        return super().update_state(y_true, y_pred, sample_weight)

    def result(self):
        return super().result()
to check if the reset function is being called.
It is not being called.
Similar problem encountered here
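One way to confirm (and work around) the missing reset, in the same spirit as the reset-before-update trick above, is a callback that explicitly resets every compiled metric at the start of each epoch. A hedged sketch (the method is reset_states in older TF 2.x and reset_state in newer releases):
import tensorflow as tf

class ResetMetricsOnEpochBegin(tf.keras.callbacks.Callback):
    """Force a reset of all compiled metrics at the start of every epoch."""
    def on_epoch_begin(self, epoch, logs=None):
        for metric in self.model.metrics:
            metric.reset_states()
            print("Reset {} at the start of epoch {}".format(metric.name, epoch))

# Usage:
# model.fit(..., callbacks=[ResetMetricsOnEpochBegin()])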
I've been following the Keras tutorial on forecasting timeseries data.
https://keras.io/examples/timeseries/timeseries_weather_forecasting/
I wanted to compare the LSTM approach with the basic machine-learning approach.
So I created a Dense-layer model as follows:
model2 = models.Sequential()
model2.add(layers.Input(shape=(inputs.shape[1], inputs.shape[2])))
model2.add(layers.Flatten())
model2.add(layers.Dense(64,activation='relu'))
model2.add(layers.Dense(8,activation='relu'))
model2.add(layers.Dense(1))
model2.summary()
model2.compile(optimizer=keras.optimizers.RMSprop(), loss="mae", metrics=['mse', 'mae'])
history = model2.fit(
    dataset_train,
    epochs=10,
    validation_data=dataset_val,
)
I ran all of the Keras tutorial sample code in Google Colab and added model2 at the end.
However, in the results the mae metric looks fine, but the mse metric looks strange: as shown below, the training mse is around 0.1 while val_mse is over 100.
Is this normal, or did I do something wrong?
Epoch 1/10
1172/1172 [==============================] - 68s 57ms/step - loss: 0.5176 - mse: 0.5927 - mae: 0.5176 - val_loss: 1.1439 - val_mse: 120.2718 - val_mae: 1.1439
Epoch 2/10
1172/1172 [==============================] - 64s 55ms/step - loss: 0.2998 - mse: 0.1554 - mae: 0.2998 - val_loss: 1.0518 - val_mse: 140.6306 - val_mae: 1.0518
Epoch 3/10
1172/1172 [==============================] - 65s 55ms/step - loss: 0.2767 - mse: 0.1299 - mae: 0.2767 - val_loss: 0.9180 - val_mse: 103.4829 - val_mae: 0.9180
Epoch 4/10
1172/1172 [==============================] - 65s 55ms/step - loss: 0.2667 - mse: 0.1215 - mae: 0.2667 - val_loss: 0.8420 - val_mse: 83.6165 - val_mae: 0.8420
Epoch 5/10
1172/1172 [==============================] - 65s 55ms/step - loss: 0.2628 - mse: 0.1185 - mae: 0.2628 - val_loss: 0.8389 - val_mse: 89.2020 - val_mae: 0.8389
Epoch 6/10
1172/1172 [==============================] - 64s 55ms/step - loss: 0.2573 - mse: 0.1140 - mae: 0.2573 - val_loss: 0.8562 - val_mse: 105.4153 - val_mae: 0.8562
Epoch 7/10
1172/1172 [==============================] - 65s 55ms/step - loss: 0.2539 - mse: 0.1108 - mae: 0.2539 - val_loss: 0.8436 - val_mse: 96.0179 - val_mae: 0.8436
Epoch 8/10
1172/1172 [==============================] - 69s 59ms/step - loss: 0.2514 - mse: 0.1096 - mae: 0.2514 - val_loss: 0.8834 - val_mse: 121.4520 - val_mae: 0.8834
Epoch 9/10
1172/1172 [==============================] - 65s 55ms/step - loss: 0.2491 - mse: 0.1081 - mae: 0.2491 - val_loss: 0.9360 - val_mse: 145.4284 - val_mae: 0.9360
Epoch 10/10
1172/1172 [==============================] - 65s 55ms/step - loss: 0.2487 - mse: 0.1112 - mae: 0.2487 - val_loss: 0.8668 - val_mse: 110.2743 - val_mae: 0.8668
I am training a simple model in Keras for a label classification task with the following code.
The dataset has 5 classes, so the final layer of the network has 5 outputs.
The labels are one-hot encoded. Here are my results:
32/4000 [..............................] - ETA: 0s - loss: 0.2264 - acc: 0.8750
2176/4000 [===============>..............] - ETA: 0s - loss: 0.3092 - acc: 0.8755
4000/4000 [==============================] - 0s 26us/step - loss: 0.2870 - acc: 0.8805 - val_loss: 15.9636 - val_acc: 0.0070
Epoch 99/100
32/4000 [..............................] - ETA: 0s - loss: 0.1408 - acc: 0.9688
2176/4000 [===============>..............] - ETA: 0s - loss: 0.2696 - acc: 0.8824
4000/4000 [==============================] - 0s 25us/step - loss: 0.2729 - acc: 0.8868 - val_loss: 15.9731 - val_acc: 0.0070
Epoch 100/100
32/4000 [..............................] - ETA: 0s - loss: 0.2299 - acc: 0.9375
2176/4000 [===============>..............] - ETA: 0s - loss: 0.2861 - acc: 0.8787
4000/4000 [==============================] - 0s 25us/step - loss: 0.2763 - acc: 0.8865 - val_loss: 15.9791 - val_acc: 0.0070
10/1000 [..............................] - ETA: 0s
1000/1000 [==============================] - 0s 26us/step
32/5000 [..............................] - ETA: 0s
5000/5000 [==============================] - 0s 9us/step
When I run tests at the end of training I get almost 100% error on the test data.
I have looked at many related posts but could not figure out what is wrong.
Any advice?
I am using a CNN similar to AlexNet for an image-related regression task. I defined an RMSE loss function. However, during the first epoch of training the loss takes huge values, and only from the second epoch onwards does it drop to a meaningful level. Here it is:
1/51 [..............................] - ETA: 847s - loss: 104.1821 - acc: 0.2500 - root_mean_squared_error: 104.1821
2/51 [>.............................] - ETA: 470s - loss: 5277326.0910 - acc: 0.5938 - root_mean_squared_error: 5277326.0910
3/51 [>.............................] - ETA: 345s - loss: 3518246.7337 - acc: 0.5000 - root_mean_squared_error: 3518246.7337
4/51 [=>............................] - ETA: 281s - loss: 2640801.3379 - acc: 0.6094 - root_mean_squared_error: 2640801.3379
5/51 [=>............................] - ETA: 241s - loss: 2112661.3062 - acc: 0.5000 - root_mean_squared_error: 2112661.3062
6/51 [==>...........................] - ETA: 214s - loss: 1760566.4758 - acc: 0.4375 - root_mean_squared_error: 1760566.4758
7/51 [===>..........................] - ETA: 194s - loss: 1509067.6495 - acc: 0.4464 - root_mean_squared_error: 1509067.6495
8/51 [===>..........................] - ETA: 178s - loss: 1320442.6319 - acc: 0.4570 - root_mean_squared_error: 1320442.6319
9/51 [====>.........................] - ETA: 165s - loss: 1173734.9212 - acc: 0.4792 - root_mean_squared_error: 1173734.9212
10/51 [====>.........................] - ETA: 155s - loss: 1056369.3193 - acc: 0.4875 - root_mean_squared_error: 1056369.3193
11/51 [=====>........................] - ETA: 146s - loss: 960343.5998 - acc: 0.4943 - root_mean_squared_error: 960343.5998
12/51 [======>.......................] - ETA: 139s - loss: 880320.3762 - acc: 0.5052 - root_mean_squared_error: 880320.3762
13/51 [======>.......................] - ETA: 131s - loss: 812608.7112 - acc: 0.5216 - root_mean_squared_error: 812608.7112
14/51 [=======>......................] - ETA: 125s - loss: 754570.1939 - acc: 0.5402 - root_mean_squared_error: 754570.1939
15/51 [=======>......................] - ETA: 120s - loss: 704269.2443 - acc: 0.5479 - root_mean_squared_error: 704269.2443
16/51 [========>.....................] - ETA: 114s - loss: 660256.3035 - acc: 0.5508 - root_mean_squared_error: 660256.3035
17/51 [========>.....................] - ETA: 109s - loss: 621420.7248 - acc: 0.5607 - root_mean_squared_error: 621420.7248
18/51 [=========>....................] - ETA: 104s - loss: 586900.8398 - acc: 0.5712 - root_mean_squared_error: 586900.8398
19/51 [==========>...................] - ETA: 100s - loss: 556014.6719 - acc: 0.5806 - root_mean_squared_error: 556014.6719
20/51 [==========>...................] - ETA: 95s - loss: 528216.9077 - acc: 0.5875 - root_mean_squared_error: 528216.9077
21/51 [===========>..................] - ETA: 91s - loss: 503065.7743 - acc: 0.5967 - root_mean_squared_error: 503065.7743
22/51 [===========>..................] - ETA: 87s - loss: 480206.3521 - acc: 0.6094 - root_mean_squared_error: 480206.3521
23/51 [============>.................] - ETA: 83s - loss: 459331.8636 - acc: 0.6114 - root_mean_squared_error: 459331.8636
24/51 [=============>................] - ETA: 80s - loss: 440196.2991 - acc: 0.6159 - root_mean_squared_error: 440196.2991
25/51 [=============>................] - ETA: 76s - loss: 422590.8381 - acc: 0.6162 - root_mean_squared_error: 422590.8381
26/51 [==============>...............] - ETA: 73s - loss: 406339.5179 - acc: 0.6178 - root_mean_squared_error: 406339.5179
27/51 [==============>...............] - ETA: 69s - loss: 391292.6992 - acc: 0.6238 - root_mean_squared_error: 391292.6992
28/51 [===============>..............] - ETA: 66s - loss: 377319.9851 - acc: 0.6306 - root_mean_squared_error: 377319.9851
29/51 [===============>..............] - ETA: 63s - loss: 364310.7557 - acc: 0.6336 - root_mean_squared_error: 364310.7557
30/51 [================>.............] - ETA: 60s - loss: 352169.1059 - acc: 0.6385 - root_mean_squared_error: 352169.1059
31/51 [=================>............] - ETA: 57s - loss: 340810.8854 - acc: 0.6401 - root_mean_squared_error: 340810.8854
32/51 [=================>............] - ETA: 53s - loss: 330162.1334 - acc: 0.6455 - root_mean_squared_error: 330162.1334
33/51 [==================>...........] - ETA: 50s - loss: 320158.7622 - acc: 0.6553 - root_mean_squared_error: 320158.7622
34/51 [==================>...........] - ETA: 47s - loss: 310744.0080 - acc: 0.6645 - root_mean_squared_error: 310744.0080
35/51 [===================>..........] - ETA: 44s - loss: 301866.8259 - acc: 0.6714 - root_mean_squared_error: 301866.8259
36/51 [====================>.........] - ETA: 41s - loss: 293483.0129 - acc: 0.6762 - root_mean_squared_error: 293483.0129
37/51 [====================>.........] - ETA: 39s - loss: 285552.8197 - acc: 0.6757 - root_mean_squared_error: 285552.8197
38/51 [=====================>........] - ETA: 36s - loss: 278039.4488 - acc: 0.6752 - root_mean_squared_error: 278039.4488
39/51 [=====================>........] - ETA: 33s - loss: 270911.4670 - acc: 0.6795 - root_mean_squared_error: 270911.4670
40/51 [======================>.......] - ETA: 30s - loss: 264140.2391 - acc: 0.6820 - root_mean_squared_error: 264140.2391
41/51 [=======================>......] - ETA: 27s - loss: 257699.1895 - acc: 0.6852 - root_mean_squared_error: 257699.1895
42/51 [=======================>......] - ETA: 25s - loss: 251564.6846 - acc: 0.6890 - root_mean_squared_error: 251564.6846
43/51 [========================>.....] - ETA: 22s - loss: 245715.4124 - acc: 0.6933 - root_mean_squared_error: 245715.4124
44/51 [========================>.....] - ETA: 19s - loss: 240131.9916 - acc: 0.6960 - root_mean_squared_error: 240131.9916
45/51 [=========================>....] - ETA: 16s - loss: 234796.6948 - acc: 0.7007 - root_mean_squared_error: 234796.6948
46/51 [=========================>....] - ETA: 14s - loss: 229693.3717 - acc: 0.7045 - root_mean_squared_error: 229693.3717
47/51 [==========================>...] - ETA: 11s - loss: 224807.2748 - acc: 0.7055 - root_mean_squared_error: 224807.2748
48/51 [===========================>..] - ETA: 8s - loss: 220125.0731 - acc: 0.7077 - root_mean_squared_error: 220125.0731
49/51 [===========================>..] - ETA: 5s - loss: 215634.5638 - acc: 0.7117 - root_mean_squared_error: 215634.5638
50/51 [============================>.] - ETA: 3s - loss: 211323.1692 - acc: 0.7144 - root_mean_squared_error: 211323.1692
51/51 [============================>.] - ETA: 0s - loss: 207180.6328 - acc: 0.7151 - root_mean_squared_error: 207180.6328
52/51 [==============================] - 143s - loss: 203253.6237 - acc: 0.7157 - root_mean_squared_error: 203253.6237 - val_loss: 44.4203 - val_acc: 0.9878 - val_root_mean_squared_error: 44.4203
Epoch 2/128
1/51 [..............................] - ETA: 117s - loss: 52.6087 - acc: 0.7188 - root_mean_squared_error: 52.6087
How should I understand this behavior? Here is my implementation. First, the rmse function:
from keras import backend as K
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
Then for the model:
model.compile(optimizer="rmsprop", loss=root_mean_squared_error, metrics=['accuracy', root_mean_squared_error])
Then fit the model:
estimator = alexmodel()
datagen = ImageDataGenerator()
datagen.fit(x_train)
start = time.time()
history = estimator.fit_generator(datagen.flow(x_train, x_train, batch_size=batch_size, shuffle=True),
                                  epochs=epochs,
                                  steps_per_epoch=x_train.shape[0] / batch_size,
                                  validation_data=(x_test, y_test))
end = time.time()
Can anyone tell me why that is? Is anything potentially wrong?
So, it's important to normalize your data. It seems that you haven't normalized your target, and since a network is usually initialized so that it produces small values at the beginning, this is what made your loss so huge during the first epoch. I still advise you to normalize your target (using either StandardScaler or MinMaxScaler), because forcing the network to produce large-scale outputs pushes its weights towards much higher absolute values, which is something you should avoid; see the sketch below.
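A minimal sketch of that target normalization, assuming y_train and y_test hold the regression targets from your code and using scikit-learn's StandardScaler:
from sklearn.preprocessing import StandardScaler

target_scaler = StandardScaler()
y_train_scaled = target_scaler.fit_transform(y_train.reshape(-1, 1))  # fit on training targets only
y_test_scaled = target_scaler.transform(y_test.reshape(-1, 1))

# Train on the scaled targets, then undo the scaling when interpreting predictions:
# predictions = target_scaler.inverse_transform(model.predict(x_test))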
I'm a newbie in deep learning and Keras. I really hope folks with experience in this field could help me answer the following question.
I downloaded the cifar10_cnn.py example from the Keras GitHub repository. I ran it with Python 3.5.2 and Keras 2.0.2, and tried both backends, TensorFlow 0.12.0-rc0 and Theano 0.9.0. Unfortunately both of them print output like the following:
Epoch 1/200
1/1562 [..............................] - ETA: 92s - loss: 2.2861 - acc: 0.1562
3/1562 [..............................] - ETA: 65s - loss: 2.3133 - acc: 0.1354
5/1562 [..............................] - ETA: 59s - loss: 2.3202 - acc: 0.1125
7/1562 [..............................] - ETA: 57s - loss: 2.3168 - acc: 0.1071
What I expected is something like the following:
Epoch 1/200
32/50000 [..............................] - ETA: 3138s - loss: 2.3238 - acc: 0.0625
64/50000 [..............................] - ETA: 1579s - loss: 2.3165 - acc: 0.0625
96/50000 [..............................] - ETA: 1059s - loss: 2.3091 - acc: 0.0625
128/50000 [..............................] - ETA: 798s - loss: 2.3070 - acc: 0.0781
160/50000 [..............................] - ETA: 643s - loss: 2.3056 - acc: 0.0750
You can see that 50000/32 = 1562.5, but I don't know why the output changed like that. It's very confusing for a newcomer to see the numerator at 1 and the denominator at 1562. Is this change related to Python 3?
Another thing that confuses me is where the output comes from: which API produces it?
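For reference, the progress line itself is rendered by Keras' ProgbarLogger callback, which uses keras.utils.Progbar under the hood, and the switch from sample counts to step counts most likely comes from the Keras 2 version of cifar10_cnn.py training via fit_generator, which reports steps_per_epoch (50000 / 32 ≈ 1562) rather than individual samples. A minimal sketch of the Progbar API (the loss values here are made up purely for illustration):
import numpy as np
from keras.utils import Progbar

bar = Progbar(target=1562)  # total number of steps to report against
for step in range(1, 1563):
    fake_loss = float(np.random.rand())  # stand-in value; a real loop would report the batch loss
    bar.update(step, values=[("loss", fake_loss)])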