Early Stopping and Callbacks with Keras when using SageMaker - keras

I am using SageMaker to train a Keras model. I need to implement early stopping when training the model.
Is there a way to pass callbacks such as EarlyStopping, History, etc.?
Traditionally, we pass these through the callbacks parameter of Keras's fit function:
results = model.fit(train_x_trim, train_y_trim,
                    validation_data=(test_x, test_y),
                    epochs=FLAGS.epoch,
                    verbose=0,
                    callbacks=[tboard, checkpointer, early_stopping, history])
However, when using SageMaker, we call the estimator's fit function instead, which doesn't accept callbacks.
from sagemaker.tensorflow import TensorFlow
iris_estimator = TensorFlow(entry_point='training_code.py',
                            role=role, output_path=model_location,
                            code_location=custom_code_upload_location,
                            train_instance_count=1,
                            train_instance_type='ml.c4.xlarge',
                            training_steps=1000,
                            evaluation_steps=100)
Any idea how to implement callbacks in SageMaker?

I apologize for the late response.
It looks like the Keras code you specified above is essentially your algorithm code. This would be defined in your user script, which would be "training_code.py" in the SageMaker Python SDK example you provided.
Starting with TensorFlow 1.11, the SageMaker predefined TensorFlow containers have support for "script mode". You should be able to specify your Keras callbacks within your user script.
For more information: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/README.rst#tensorflow-sagemaker-estimators-and-models
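A minimal sketch of what a script-mode entry point could look like, with the callbacks passed to model.fit inside the user script. Assumptions, not confirmed by the answer above: the hyperparameter names (--epochs, --patience), the SM_MODEL_DIR and SM_CHANNEL_TRAINING environment variables used for local paths, and the synthetic data standing in for real input. On the estimator side (v1 SDK, TF 1.11+), you would point entry_point at this file, enable script mode and pass hyperparameters={'epochs': 50, 'patience': 5}.

```python
# training_code.py: a minimal script-mode sketch (hyperparameter names are assumptions)
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Estimator hyperparameters arrive as command-line flags (names assumed here).
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--patience", type=int, default=5)
    # Local directories that SageMaker exposes through environment variables (assumed).
    parser.add_argument("--sm-model-dir", dest="sm_model_dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAINING",
                                               "/opt/ml/input/data/training"))
    # The TF estimator may pass extra flags such as --model_dir; ignore unknown ones.
    args, _unknown = parser.parse_known_args(argv)
    return args


def build_callbacks(args):
    import tensorflow as tf  # imported lazily so parse_args() works without TF

    return [
        # Stop once val_loss has not improved for `patience` epochs.
        tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=args.patience),
        # Keep the best model in the directory SageMaker uploads to S3 after training.
        tf.keras.callbacks.ModelCheckpoint(
            os.path.join(args.sm_model_dir, "best.h5"),
            monitor="val_loss", save_best_only=True),
    ]


def main():
    import numpy as np
    import tensorflow as tf

    args = parse_args()
    # Synthetic stand-in data; a real script would read files under args.train.
    x = np.random.rand(200, 4).astype("float32")
    y = (x.sum(axis=1) > 2.0).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # fit() returns a History object, so no separate History callback is needed.
    history = model.fit(x, y, validation_split=0.2, epochs=args.epochs,
                        verbose=2, callbacks=build_callbacks(args))
    print("epochs run:", len(history.history["loss"]))


# Only train when SM_MODEL_DIR is set (as inside a SageMaker container), so running
# or importing this file elsewhere does not start training.
if __name__ == "__main__" and "SM_MODEL_DIR" in os.environ:
    main()
```

Because EarlyStopping monitors val_loss, the fit call needs validation data (here validation_split=0.2); without it the callback has nothing to watch.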

Related

How to use Keras effectively agnostic to backend

I am trying out some existing examples that use Keras models. Most of them use Keras with TensorFlow (or PyTorch or Theano).
Due to limited resources and cost cutting, I am using PlaidML to work with an AMD GPU. As Keras supports pluggable backends, I think this may not be an issue.
Please share your thoughts on writing against the Keras API and later plugging in the desired backend.
I have this concern because the samples use Keras from TensorFlow (import tensorflow.keras), whereas I am using plain Keras (import keras) with a pluggable backend.
What is the equivalent of the following statements?
img = tf.io.decode_png(img, channels=1)
# 3. Convert to float32 in [0, 1] range
img = tf.image.convert_image_dtype(img, tf.float32)
Are there any limitations to using the plain Keras API?
I just used PIL's Image to read and convert the image, and it works the same as the TensorFlow API version. Most of the Keras API can be used regardless of the backend. There are some caveats with PlaidML, though: some functions, such as the CTC loss ctc_batch_cost, are not available. I got an error like
The Keras backend function 'ctc_batch_cost' is not yet implemented in
Plaid. You can help us prioritize by letting us know if this function
is important to you, and as always, contributions are welcome!
There are some posts that provide sample implementations, but it is not straightforward. The response from PlaidML was that it may not be available soon.
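A minimal sketch of the PIL-based replacement for the two TensorFlow calls in the question, assuming 8-bit PNG input and that numpy and Pillow are installed. tf.image.convert_image_dtype scales uint8 values by 1/255 into [0, 1], and decode_png(..., channels=1) yields an (H, W, 1) array, so the sketch reproduces both. For RGB inputs, PIL's grayscale conversion may not match TensorFlow's bit for bit.

```python
import numpy as np
from PIL import Image


def load_grayscale_float(path):
    """Approximates tf.io.decode_png(..., channels=1) followed by
    tf.image.convert_image_dtype(img, tf.float32) for 8-bit PNGs."""
    with Image.open(path) as im:
        gray = im.convert("L")                               # one 8-bit channel
        arr = np.asarray(gray, dtype=np.float32) / 255.0     # scale to [0, 1]
    return arr[..., np.newaxis]                              # (H, W) -> (H, W, 1)
```

The result is a plain numpy array, which any Keras backend (including PlaidML) can consume.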

Pytorch Lightning Inference

I trained a model using PyTorch Lightning and especially appreciated how easy it makes using multiple GPUs. Now that training is done, how can I still make use of Lightning's GPU features to run inference on a test set and store/export the predictions?
The documentation on inference does not target that.
Thanks in advance.
You can implement the validation_epoch_end on your LightningModule which is called "at the end of the validation epoch with the outputs of all validation steps". For this to work you also need to define validation_step on that same module.
Once this is done, you can run validation using your trainer and a given dataloader by calling:
trainer.validate(pl_module, dataloaders=validation_dataloader)

Porting pre-trained keras models and run them on IPU

I am trying to port two pre-trained Keras models to the IPU machine. I managed to load and run them using IPUStrategy.scope, but I don't know if I am doing it the right way. My pre-trained models are in .h5 file format.
I load them this way:
def first_model():
    model = tf.keras.models.load_model("./model1.h5")
    return model
After searching your ipu.keras.models.py file, I couldn't find any load methods for my pre-trained models, which is why I used tf.keras.models.load_model().
Then I use this code to run:
cfg = ipu.utils.create_ipu_config()
cfg = ipu.utils.auto_select_ipus(cfg, 1)
ipu.utils.configure_ipu_system(cfg)
ipu.utils.move_variable_initialization_to_cpu()
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    model = first_model()
    print('compile attempt\n')
    model.compile("sgd", "categorical_crossentropy", metrics=["accuracy"])
    print('compilation completed\n')
    print('running attempt\n')
    res = model.predict(input_img)[0]
    print('run completed\n')
You can see the output here: link
So I have some difficulty understanding whether and how the system is working properly.
Basically, model.compile won't compile my model, but when I use model.predict the system first compiles and then runs. Why is that happening? Is there another way to run pre-trained Keras models on an IPU chip?
Another question I have is whether it's possible to load a pre-trained Keras model inside an ipu.keras.Model, use model.fit/evaluate to further train and evaluate it, and then save it for future use.
One last question is about the compilation of the graph. Is there a way to avoid recompiling the graph every time I use model.predict() in a different strategy.scope()?
I am using the TensorFlow 2.1.2 wheel.
Thank you for your time.
To add some context, the Graphcore TensorFlow wheel includes a port of Keras for the IPU, available as tensorflow.python.ipu.keras. You can access the API documentation for IPU Keras at this link. This module contains IPU-specific, optimised replacements for the TensorFlow Keras classes Model and Sequential, plus higher-performance, multi-IPU classes such as PipelineModel and PipelineSequential.
As per your specific issue, you are right when you mention that there are no IPU-specific ways to load pre-trained Keras models at present. I would encourage you, as you appear to have access to IPUs, to reach out to Graphcore Support. When doing so, please attach your pre-trained Keras model model1.h5 and a self-contained reproducer of your code.
Switching to the recompilation question: an executable cache prevents recompilation, and you can set one up with the environment variable TF_POPLAR_FLAGS='--executable_cache_path=./cache'. I'd also recommend taking a look at the following resources:
This tutorial gathers several considerations around recompilation and how to avoid it when using TensorFlow 2 on the IPU.
The Graphcore TensorFlow documentation here explains how to use the pre-compile mode on the IPU.
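A small sketch of setting the cache flag from Python rather than the shell, assuming TF_POPLAR_FLAGS holds space-separated flags and that it must be set before TensorFlow configures the IPU system (setting it before importing tensorflow is the safe choice). The helper name enable_executable_cache is hypothetical; it keeps any flags already present and replaces an earlier cache path instead of duplicating it.

```python
import os


def enable_executable_cache(cache_dir="./cache"):
    """Hypothetical helper: add --executable_cache_path to TF_POPLAR_FLAGS.

    Call before importing tensorflow so the Poplar backend sees the flag.
    """
    flag = f"--executable_cache_path={cache_dir}"
    existing = os.environ.get("TF_POPLAR_FLAGS", "")
    # Keep other flags, but drop any previous cache path so it is not set twice.
    kept = [f for f in existing.split() if not f.startswith("--executable_cache_path=")]
    os.environ["TF_POPLAR_FLAGS"] = " ".join(kept + [flag])
    return os.environ["TF_POPLAR_FLAGS"]
```

The shell equivalent is simply TF_POPLAR_FLAGS='--executable_cache_path=./cache' python your_script.py; with the cache in place, later runs that build the same graph can reuse the compiled executable.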

XGBoost get classifier object from booster object?

I usually get feature importance using
regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is <class 'xgboost.sklearn.XGBClassifier'>.
However, I have a pickled XGBoost model, which when unpacked returns an object of type <class 'xgboost.core.Booster'>. This is the same kind of object I would get by running regr.get_booster().
I have found a few solutions for getting variable importance from a booster object, but is there a way to get to the classifier object from the booster object so I can just apply the same feature_importances_ command? This seems like the most straightforward solution, or it seems like I have to write a function that mimics the output of feature_importances_ in order for it to fit my logged feature importances...
So ideally I'd have something like
import pickle

with open("xgboost-model", "rb") as f:
    xgb_booster = pickle.load(f)
assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
xgb_classifier = xgb_booster.get_classifier()  # wished-for method; Booster has no such method
xgb_classifier.feature_importances_
Are there any limitations to what can be done with a booster object in terms of recovering the classifier? I figure there's some combination of save/load/dump that will get me what I need, but I'm stuck for now...
Also, for context, the pickled model is the output from AWS SageMaker, so I'm just unpacking it to do some further evaluation.
Based on my own experience trying to recreate a classifier from a booster object generated by SageMaker I learned the following:
It doesn't appear to be possible to recreate the classifier from the booster. :(
https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster has the details on the booster class so you can review what it can do.
Crazy things you can do however:
You can create a classifier object and then override the booster within it:
xgb_classifier = xgb.XGBClassifier(**xgboost_params)
[..]
xgb_classifier._Booster = booster
This is nearly useless unless you fit it; otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate whether fitting would provide the feature data required to be functional.)
You can remove the booster object from the classifier and then pickle the classifier using xgboost directly. Then later restore the SageMaker booster back into it. This abomination is closer and appears to work, but is not truly a rehydrated classifier object from the SageMaker output alone.
Recommendation
If you’re not stuck using the SageMaker training solution you can certainly use XGBoost directly to train with. At that point you have access to everything you need to dump/save the data for use in a different context.
I know you're after feature importance so I hope this gets you closer, I had a different use case and was ultimately able to leverage the booster for what I needed.
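Since the question mentions writing a function that mimics feature_importances_, here is a hedged sketch that works from the Booster alone. It assumes the booster provides get_score(importance_type=...), a feature_names attribute and num_features(), as xgboost.Booster does in recent versions, and it follows the sklearn wrapper's approach of filling unused features with 0 and normalising to sum to 1. The default importance_type="gain" is an assumption; check the importance_type your logged classifier used.

```python
import numpy as np


def booster_feature_importances(booster, importance_type="gain"):
    """Approximate XGBClassifier.feature_importances_ from a raw Booster."""
    scores = booster.get_score(importance_type=importance_type)
    # Without stored feature names, XGBoost keys the scores as "f0", "f1", ...
    names = booster.feature_names or [f"f{i}" for i in range(booster.num_features())]
    # Features never used in a split are absent from get_score(); count them as 0.
    values = np.array([scores.get(name, 0.0) for name in names], dtype=np.float32)
    total = values.sum()
    # Normalise so the importances sum to 1, like the sklearn wrapper.
    return values / total if total > 0 else values
```

Usage would be booster_feature_importances(xgb_booster) on the unpickled SageMaker model; the result is indexed in the same feature order as the training data.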
I was able to get an xgboost.XGBClassifier model virtually identical to an xgboost.Booster model by
(1) extracting all tuning parameters from the booster model using this:
import json
json.loads(your_booster_model.save_config())
(2) applying these same tuning parameters and then training an XGBClassifier model on the same training dataset used to train the Booster model.
Note: one mistake I made was forgetting to explicitly assign the same seed/random_state in both the Booster and Classifier versions.
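The JSON from save_config() is nested, and the exact nesting differs between XGBoost versions, so one hedged way to carry out step (1) is to flatten it into dotted keys and search for the parameters you need. The helper names flatten_config and find_param are hypothetical. Values in the config are typically stored as strings (e.g. "6"), so convert them before passing them to XGBClassifier.

```python
import json


def flatten_config(config_json):
    """Flatten booster.save_config() output into {"a.b.c": value} pairs."""
    flat = {}

    def walk(node, prefix):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{prefix}.{key}" if prefix else key)
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{prefix}[{i}]")
        else:
            flat[prefix] = node  # leaf value (often a string)

    walk(json.loads(config_json), "")
    return flat


def find_param(flat, name):
    """Return every flattened entry whose last key component is `name`."""
    return {k: v for k, v in flat.items() if k == name or k.endswith("." + name)}
```

For example, find_param(flatten_config(booster.save_config()), "max_depth") shows where max_depth lives in your version's config, and find_param(..., "seed") helps with the seed mismatch mentioned in the note above.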

Training one model with several GPU's

How can you program Keras or TensorFlow to partition training across multiple GPUs? Say you are on an Amazon EC2 instance with 8 GPUs and want to use all of them to train faster, but your code is written for a single CPU or GPU.
Yes, you can run Keras models on multiple GPUs. This is only possible with the TensorFlow backend for the time being, because the Theano feature is still rather new. We are looking at adding support for multi-GPU in Theano in the near future (it should be fairly straightforward).
With the TensorFlow backend, you can achieve this the same way as you would in pure TensorFlow: by using the with tf.device(d) scope when defining Keras layers.
Originally from here
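A minimal sketch of the tf.device approach as data parallelism, assuming TensorFlow 2.x in eager mode (where soft device placement falls back gracefully if fewer GPUs exist) and a Keras model that can be called on a batch. The helper names split_sizes and data_parallel_forward are hypothetical. It only covers the forward pass: for training, you would call it inside a tf.GradientTape so gradients flow back through each device. Newer TensorFlow releases also offer tf.distribute.MirroredStrategy, which automates this pattern.

```python
def split_sizes(batch_size, num_parts):
    """Split batch_size into num_parts near-equal chunk sizes."""
    base, extra = divmod(batch_size, num_parts)
    return [base + (1 if i < extra else 0) for i in range(num_parts)]


def data_parallel_forward(model, x, num_gpus):
    """Run `model` on slices of `x`, one slice per GPU, and merge on the CPU."""
    import tensorflow as tf  # lazy import; TF 2.x eager mode assumed

    batch = int(x.shape[0])
    num_parts = max(1, min(num_gpus, batch))  # avoid empty slices
    shards = tf.split(x, split_sizes(batch, num_parts), axis=0)
    outputs = []
    for i, shard in enumerate(shards):
        # Place this replica's computation on GPU i.
        with tf.device(f"/gpu:{i}"):
            outputs.append(model(shard))
    # Merge the per-GPU results on the CPU.
    with tf.device("/cpu:0"):
        return tf.concat(outputs, axis=0)
```

On the 8-GPU EC2 instance from the question, preds = data_parallel_forward(model, x_batch, num_gpus=8) would give each GPU one eighth of the batch.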
