Is it ok to run same model multiple times? - python-3.x

My Question is I ran a keras model for 100 epochs(gave epochs=100) and stopped for some time for cooling purpose of CPU and GPU.
Than I ran 100 epochs again and loss is decreasing from where it has stopped in the previous 100 epochs .
Is it works in all conditions.
Like if there are on 1000 epochs I want to train my model, can I stop after every 100 epochs and wait until my cpu and GPU cools and run the next 100 epochs.
Can I do this?

It will not work in all conditions. For examples if you shuffle the data and perform a validation split like this :
fit(x,y,epochs=1, verbose=1, validation_split=0.2, shuffle=True)
You will use the entire dataset for training which is not what you expect.
Furthermore, by doing multiple fit you will erase history information (accuracy, loss, etc at each epoch), given by :
model.history
So some callback functions that use this history will not work properly, like EarlyStopping (source code here).
Otherwise, it works as it does not mess around with the keras optimizer as you can see in the source code of keras optimizers (Adadelta optimizer).
However, I do not recommend to do this. Because it could cause bugs in future development. A cleaner way to do that would be to create a custom callback function with a delay like this :
import time
class DelayCallback(keras.callbacks.Callback):
def __init__(self,delay_value=10, epoch_to_complete=10):
self.delay_value = delay_value # in second
self.epoch_to_complete = epoch_to_complete
def on_epoch_begin(self, epoch, logs={}):
if (epoch+1) % self.epoch_to_complete == 0:
print("cooling down")
time.sleep(self.delay_value)
return
model.fit(x_train, y_train,
batch_size=32,
epochs=20,
verbose=1, callbacks=[DelayCallback()])

Related

Azure ML Service dump logs

With the AzureML service, how can I dump the correct Loss curve or Accuracy curve for different epochs for keras deep learning on multiple nodes with Horovod?
The Loss vs epochs plt from Keras deep learning using Horovod and AzureML appears to have issues.
Training CNN with Keras/Horovod (2 GPUs) and AMLS SDK generates weird graphs
It seems like you might be training 2 models and the averaging of the gradients from the different nodes is note happening. Can you share more of your training script -- are you wrapping your optimizer in a DistributedOptimizer like so:
# Horovod: adjust learning rate based on number of GPUs.
opt = keras.optimizers.Adadelta(1.0 * hvd.size())
# Horovod: add Horovod Distributed Optimizer.
opt = hvd.DistributedOptimizer(opt)
In addition, you really only want one machine to log, so usually only attach an AzureML logger for rank 0, like so:
class LogToAzureMLCallback(tf.keras.callbacks.Callback):
def on_batch_end(self, batch, logs=None):
Run.get_context().log('acc',logs['acc'])
def on_epoch_end(self, epoch, logs=None):
Run.get_context().log('epoch_acc',logs['acc'])
callbacks = [
# Horovod: broadcast initial variable states from rank 0 to all other processes.
# This is necessary to ensure consistent initialization of all workers when
# training is started with random weights or restored from a checkpoint.
hvd.callbacks.BroadcastGlobalVariablesCallback(0)
]
# Horovod: save checkpoints only on worker 0 and only log to AzureML from worker 0.
if hvd.rank() == 0:
callbacks.append(keras.callbacks.ModelCheckpoint('./checkpoint-{epoch}.h5'))
callbacks.append(LogToAzureMLCallback())
model.fit(x_train, y_train,
batch_size=batch_size,
callbacks=callbacks,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
How are you logging these metrics? From the graph it looks like there's two sets of datapoints interleaved.

Early Stopping, Model has gone through how many epochs?

I am using Keras. I am training my Neural Network and using Early Stopping. My patience is 10 and the epoch with the lowest validation loss is 15. My network runs til 25 epochs and stops however my model is the one with 25 epochs not 15 if I understand correctly
Is there an easy way to revert to the 15 epoch model or do I need to re-instantiate the model and run 15 epochs?
Yes, there is one, the restore_best_weights parameter in the EarlyStopping callback, set this to True and Keras will keep track of the weights producing the best loss:
callback = EarlyStopping(..., restore_best_weights=True)
See all the parameters for this callback here.
Yes, you get the model (weights) corresponding to the epoch when it stops. A commonly used strategy is to save the model whenever the validation loss/acc improves.
Early Stopping doesn't work the way you are thinking, that it should return the lowest loss or highest accuracy model, it works if there is no improvement in model accuracy or loss, for about x epochs (10 in your case, the patience parameter) then it will stop.
you should use callback modelcheckpoint functions instead e.g.
keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=True, save_weights_only=False, mode='auto', period=1)
https://keras.io/callbacks/
This will save or checkpoint best model encountered during the training history.

How to use Adam optim considering of its adaptive learning rate?

In the Adam optimization algorithm, the learning speed is adjusted according to the number of iterations. I don't quite understand Adam's design, especially when using batch training. When using batch training, if there are 19,200 pictures, each time 64 pictures are trained, it is equivalent to 300 iterations. If our epoch has 200 times, there are a total of 60,000 iterations. I don't know if such multiple iterations will reduce the learning speed to a very small size. So when we are training, shall we initialize the optim after each epoch, or do nothing throughout the process?
Using pytorch. I have tried to initialize the optim after each epoch if I use batch train, and I do nothing when the number of data is small.
For expample, I don't know whether the two pieces of code is correct:
optimizer = optim.Adam(model.parameters(), lr=0.1)
for epoch in range(100):
###Some code
optim.step()
Another piece of code:
for epoch in range(100):
optimizer = optim.Adam(model.parameters(), lr=0.1)
###Some code
optim.step()
You can read the official paper here https://arxiv.org/pdf/1412.6980.pdf
Your update looks somewhat like this (for brevity, sake I have omitted the warmup-phase):
new_theta = old_theta-learning_rate*momentum/(velocity+eps)
The intuition here is that if momentum>velocity, then the optimizer is in a plateau, so the the learning_rate is increased because momentum/velocity > 1. on the other hand if momentum<velocity, then the optimizer is in a steep slope or noisy region, so the learning_rate is decreased.
The learning_rate isn't necessarily decreased throughout the training, as you have mentioned in you question.

How Keras really fits the models via epochs

I am a bit confused on how Keras fits the models. In general, Keras models are fitted by simply using model.fit(...) something like the following:
model.fit(X_train, y_train, epochs=300, batch_size=64, validation_data=(X_test, y_test))
My question is: Because I stated the testing data by the argument validation_data=(X_test, y_test), does it mean that each epoch is independent? In other words, I understand that at each epoch, Keras train the model using the training data (after getting shuffled) followed by testing the trained model using the provided validation_data. If that's the case, then no matter how many epochs I choose, I only take the results of the last epoch!!
If this scenario is correct, so we do we need multiple epoches? Unless these epoches are dependent somwhow where each epoch uses the same NN weights from the previous epoch, correct?
Thank you
When Keras fit your model it pass throught all the dataset at each epoch by a step corresponding to your batch_size.
For exemple if you have a dataset of 1000 items and a batch_size of 8, the weight of your model will be updated by using 8 items and this until it have seen all your data set.
At the end of that epoch, the model will try to do a prediction on your validation set.
If we have made only one epoch, it would mean that the weight of the model is updated only once per element (because it only "saw" one time the complete dataset).
But in order to minimize the loss function and by backpropagation, we need to update those weights multiple times in order to reach the optimum loss, so pass throught all the dataset multiple times, in other word, multiple epochs.
I hope i'm clear, ask if you need more informations.

Overfitting after one epoch

I am training a model using Keras.
model = Sequential()
model.add(LSTM(units=300, input_shape=(timestep,103), use_bias=True, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(units=536))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
while True:
history = model.fit_generator(
generator = data_generator(x_[train_indices],
y_[train_indices], batch = batch, timestep=timestep),
steps_per_epoch=(int)(train_indices.shape[0] / batch),
epochs=1,
verbose=1,
validation_steps=(int)(validation_indices.shape[0] / batch),
validation_data=data_generator(
x_[validation_indices],y_[validation_indices], batch=batch,timestep=timestep))
It is a multiouput classification accoriding to scikit-learn.org definition:
Multioutput regression assigns each sample a set of target values.This can be thought of as predicting several properties for each data-point, such as wind direction and magnitude at a certain location.
Thus, it is a recurrent neural network I tried out different timestep sizes. But the result/problem is mostly the same.
After one epoch, my train loss is around 0.0X and my validation loss is around 0.6X. And this values keep stable for the next 10 epochs.
Dataset is around 680000 rows. Training data is 9/10 and validation data is 1/10.
I ask for intuition behind that..
Is my model already over fittet after just one epoch?
Is 0.6xx even a good value for a validation loss?
High level question:
Therefore it is a multioutput classification task (not multi class), I see the only way by using sigmoid an binary_crossentropy. Do you suggest an other approach?
I've experienced this issue and found that the learning rate and batch size have a huge impact on the learning process. In my case, I've done two things.
Reduce the learning rate (try 0.00005)
Reduce the batch size (8, 16, 32)
Moreover, you can try the basic steps for preventing overfitting.
Reduce the complexity of your model
Increase the training data and also balance each sample per class.
Add more regularization (Dropout, BatchNorm)

Resources