How can I save loss and accuracy metrics in mlflow after each epoch? - keras

I would like to see metrics like loss and accuracy as graphs, by storing each metric's value after every epoch during the training/testing phase of a Keras model.
PS: I know we can do this by using MLflow's autolog feature for Keras, as shown below, but I don't want to use that.
mlflow.keras.autolog()

After searching the internet and combining a few concepts, I was able to solve the problem I had asked about. In Keras, we can create custom callbacks that are called at various points (start/end of an epoch, batch, etc.) during the training, testing, and prediction phases of a model.
So, I created a Keras custom callback to store the loss/accuracy values after each epoch as MLflow metrics, like below.
import mlflow
from tensorflow import keras  # or `import keras` for standalone Keras

class CustomCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        mlflow.log_metrics({
            "loss": logs["loss"],
            "sparse_categorical_accuracy": logs["sparse_categorical_accuracy"],
            "val_loss": logs["val_loss"],
            "val_sparse_categorical_accuracy": logs["val_sparse_categorical_accuracy"],
        })
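As a side note, mlflow.log_metrics also accepts a step argument, so the points in the MLflow UI can be indexed by the epoch number rather than by call order; inside on_epoch_end that would look like:

# Optional: pass the epoch index as the metric step so the x-axis is the epoch number.
mlflow.log_metrics({"loss": logs["loss"], "val_loss": logs["val_loss"]}, step=epoch)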
I called the above callback during the training of my model, like below.
history = model.fit(
    features_train,
    labels_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks=[CustomCallback()],
    validation_split=0.2,
)
The Keras custom callback stored all the values after each epoch during training, and I was able to view them as graphs in the MLflow UI.

Related

Keras Creating custom loss function that takes into account only certain features

I want to create a custom loss function that takes into account only some of the output features (not all). I am training a sequence regression LSTM neural network on data that looks like this: my input shape is (number_of_samples, 200 timesteps, 4 features) and my output shape is (number_of_samples, 200 timesteps, 6 features). My first, basic model looks like this:
from tensorflow.keras.layers import Input, LSTM, TimeDistributed, Dense
from tensorflow.keras.models import Model

inputs1 = Input(shape=(None, num_input_features))
lstm1 = LSTM(10, return_sequences=True)(inputs1)
lstm2 = LSTM(10, return_sequences=True)(lstm1)
outputs1 = TimeDistributed(Dense(num_output_features))(lstm2)
model_proba = Model(inputs=inputs1, outputs=outputs1)
I want to train a model only on the first 4 features of my output (the last 2 features are not relevant for training, I just want to predict them without training on that data). I have tried creating a custom loss function that looks like this:
from tensorflow.keras.losses import MeanSquaredError

def custom_loss(y_true, y_pred):
    # Loss that doesn't take into account the last 2 output features
    y_true_r = y_true[:, :, :4]
    y_pred_r = y_pred[:, :, :4]
    mse = MeanSquaredError()
    return mse(y_true_r, y_pred_r)
But the problem is that during training, the weights and biases connected to the 2 output features that are not in the loss function are not trained; they still have their initial values after training.
As far as I understand, the loss doesn't depend on these weights and biases, so their gradient is 0 and there are no weight or bias updates. So I want to know: is it possible to train these weights and biases in relation to the global loss value?
P.S. I am using the Adam optimizer.
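For reference, a minimal sketch of how such a masked loss would typically be wired in, assuming the model_proba defined above and placeholder arrays X_train / y_train with the shapes described in the question:

# X_train: (samples, 200, 4), y_train: (samples, 200, 6) -- hypothetical arrays
model_proba.compile(optimizer="adam", loss=custom_loss)
model_proba.fit(X_train, y_train, epochs=10, batch_size=32)

Because the loss only looks at the first 4 output features, the gradient with respect to the weights feeding the last 2 outputs is exactly zero, which is why those weights keep their initial values.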

LSTM Autoencoder for Anomaly detection in time series, correct way to fit model

I'm trying to find correct examples of using an LSTM Autoencoder for anomaly detection in time series data. On the internet I see a lot of examples where the LSTM Autoencoder model is fitted with labels that are the future time steps of the feature sequences (as in ordinary time series forecasting with an LSTM), but I would suppose this kind of model should be trained with labels that are the same sequence as the sequence of features (the previous time steps).
The first link on Google for this search, for example: https://towardsdatascience.com/time-series-of-price-anomaly-detection-with-lstm-11a12ba4f6d9
1. This function defines the way the labels (the y values) are obtained:
def create_sequences(X, y, time_steps=TIME_STEPS):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X.iloc[i:(i + time_steps)].values)
        ys.append(y.iloc[i + time_steps])
    return np.array(Xs), np.array(ys)

X_train, y_train = create_sequences(train[['Close']], train['Close'])
X_test, y_test = create_sequences(test[['Close']], test['Close'])
2. The model is fitted as follows:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1,
                    callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode='min')],
                    shuffle=False)
Could you kindly comment on the way the autoencoder is implemented in the linked towardsdatascience.com article?
Is that the correct method, or should the model be fitted the following way?
model.fit(X_train,X_train)
Thanks in advance!
This is a time series auto-encoder; if you want to predict the future, it goes this way. How an auto-encoder or any machine learning model is fitted differs from problem to problem. You cannot train and fit one model or workflow for all problems. A time series task can mean working with data already collected for a time period, or it can mean predicting future values from the collected data; the two are constructed differently. For example, time series data for the subsurface earth is modeled differently from a weather forecast; one model cannot work for both.
By definition, an autoencoder is any model attempting to reproduce its input, independent of the type of architecture (LSTM, CNN, ...).
Framed this way, it is an unsupervised task, so the training would be: model.fit(X_train, X_train).
Now, what she does in the article you linked is to use a common LSTM autoencoder architecture, but applied to time series forecasting:
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(RepeatVector(X_train.shape[1]))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(X_train.shape[2])))
She's pre-processing the data in a way to get X_train = [x(t-seq)....x(t)] and y_train = x(t+1)
for i in range(len(X) - time_steps):
    Xs.append(X.iloc[i:(i + time_steps)].values)
    ys.append(y.iloc[i + time_steps])
So the model does not, per se, reproduce the input it is fed, but that doesn't mean it's not a valid implementation, since it produces valuable predictions.
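For comparison, a minimal sketch of the reconstruction-style setup described above, where each window is both the input and the target (create_windows is a hypothetical helper; train, TIME_STEPS, and model are the objects from the quoted code):

import numpy as np

def create_windows(X, time_steps):
    # Each window serves as both the input and the reconstruction target.
    Xs = [X.iloc[i:i + time_steps].values for i in range(len(X) - time_steps)]
    return np.array(Xs)

X_train_w = create_windows(train[['Close']], TIME_STEPS)
# Autoencoder training: the model learns to reproduce its own input.
history = model.fit(X_train_w, X_train_w, epochs=100, batch_size=32,
                    validation_split=0.1, shuffle=False)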

Azure ML Service dump logs

With the AzureML service, how can I dump the correct loss curve or accuracy curve across epochs for Keras deep learning running on multiple nodes with Horovod?
The loss vs. epochs plot from Keras deep learning using Horovod and AzureML appears to have issues.
Training CNN with Keras/Horovod (2 GPUs) and AMLS SDK generates weird graphs
It seems like you might be training 2 models and the averaging of the gradients from the different nodes is not happening. Can you share more of your training script -- are you wrapping your optimizer in a DistributedOptimizer like so:
# Horovod: adjust learning rate based on number of GPUs.
opt = keras.optimizers.Adadelta(1.0 * hvd.size())
# Horovod: add Horovod Distributed Optimizer.
opt = hvd.DistributedOptimizer(opt)
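And then compiling the model with the wrapped optimizer, for example (the loss and metrics here are just placeholders):

model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])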
In addition, you really only want one machine to log, so usually only attach an AzureML logger for rank 0, like so:
import tensorflow as tf
from azureml.core import Run

class LogToAzureMLCallback(tf.keras.callbacks.Callback):
    def on_batch_end(self, batch, logs=None):
        Run.get_context().log('acc', logs['acc'])

    def on_epoch_end(self, epoch, logs=None):
        Run.get_context().log('epoch_acc', logs['acc'])
callbacks = [
    # Horovod: broadcast initial variable states from rank 0 to all other processes.
    # This is necessary to ensure consistent initialization of all workers when
    # training is started with random weights or restored from a checkpoint.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0)
]

# Horovod: save checkpoints only on worker 0 and only log to AzureML from worker 0.
if hvd.rank() == 0:
    callbacks.append(keras.callbacks.ModelCheckpoint('./checkpoint-{epoch}.h5'))
    callbacks.append(LogToAzureMLCallback())

model.fit(x_train, y_train,
          batch_size=batch_size,
          callbacks=callbacks,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
How are you logging these metrics? From the graph, it looks like there are two sets of data points interleaved.

HP Tuning with keras model and setting hyperparameterMetric to a Evaluation Metric and not training metric

It is about Hyperparameter Tuning with GCP.
With estimators, I can easily set the desired hyperparameterMetric to the proper metric on the evaluation data. But I don't see how I can do that for a Keras (tf.keras or keras) model.
I mean, where can I "assign" the right metric? I need the hyperparameterMetric to be a metric computed on the evaluation data.
Edit:
model.fit returns a History object whose history attribute is a dict like:
{'acc': [0.9843952109499714],
'loss': [0.050826362343496051],
'val_acc': [0.98403786838658314],
'val_loss': [0.0502210383056177]
}
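For reference, a minimal sketch of reading the last validation accuracy out of that dict, with x_train, y_train, x_val, and y_val as placeholder arrays:

history = model.fit(x_train, y_train, epochs=1, validation_data=(x_val, y_val))
final_val_acc = history.history['val_acc'][-1]  # validation accuracy of the last epoch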
Does GCP work now if I just set my desired validation metric to 'val_acc' in the config file, without doing anything else?
You must use the Keras TensorBoard callback.
Use the "epoch_" prefix: hyperparameterMetricTag=epoch_val_acc for validation accuracy and hyperparameterMetricTag=epoch_acc for training accuracy.
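A rough sketch of what that looks like in the training script (the log directory and data names are placeholders; on AI Platform you would typically point log_dir at the job's output location):

import tensorflow as tf

# The TensorBoard callback writes per-epoch scalars with the "epoch_" prefix
# (e.g. epoch_val_acc), which is what the tuning config's metric tag should match.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir='gs://my-bucket/logs')

model.fit(x_train, y_train,
          epochs=EPOCHS,
          validation_data=(x_val, y_val),
          callbacks=[tensorboard_cb])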

How Keras really fits the models via epochs

I am a bit confused about how Keras fits models. In general, Keras models are fitted by simply calling model.fit(...), something like the following:
model.fit(X_train, y_train, epochs=300, batch_size=64, validation_data=(X_test, y_test))
My question is: because I passed the testing data via the argument validation_data=(X_test, y_test), does this mean that each epoch is independent? In other words, I understand that at each epoch Keras trains the model using the training data (after shuffling it) and then tests the trained model using the provided validation_data. If that's the case, then no matter how many epochs I choose, I only take the results of the last epoch!
If this scenario is correct, why do we need multiple epochs? Unless the epochs are dependent somehow, where each epoch starts from the network weights left by the previous epoch, correct?
Thank you
When Keras fits your model, it passes through the whole dataset at each epoch, in steps of your batch_size.
For example, if you have a dataset of 1000 items and a batch_size of 8, the weights of your model are updated using 8 items at a time, until the model has seen your entire dataset.
At the end of that epoch, the model runs predictions on your validation set.
If we ran only one epoch, the weights would be updated only once per element (because the model saw the complete dataset only one time).
But in order to minimize the loss function by backpropagation, we need to update those weights many times to reach the optimal loss, so we pass through the whole dataset multiple times; in other words, multiple epochs.
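To make the arithmetic concrete, using the 1000-item dataset and batch size of 8 from the example above and the 300 epochs from the question:

import math

num_samples, batch_size, epochs = 1000, 8, 300
updates_per_epoch = math.ceil(num_samples / batch_size)  # 125 weight updates per epoch
total_updates = updates_per_epoch * epochs                # 37,500 updates over the whole run
print(updates_per_epoch, total_updates)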
I hope I'm clear; ask if you need more information.
